摘要
针对中文新闻网页的特点,使用了包括统计特征、位置特征和词性特征等在内的多种特征综合评定候选关键词的权重大小。对于部分分词结果不能良好地反映主题的问题,提出了一种基于有向图的组合词生成方法,旨在找出高频次的相邻词作为组合词。实验结果表明,该方法较传统的TF-IDF方法效率有较大提升,能够有效提取出新闻网页关键词。
Considering the characteristics of Chinese news Web pages, this paper uses many features including statistical feature, position feature and POS(Part of Speech)feature to evaluate the weight of candidate keywords. In order to solve the problem of that some segmentation cannot reflect the theme, this paper proposes a compound words generation method based on directed graph, which aims to find adjacency words for compound words. The experimental results show that this method is vastly superior to the conventional TF-IDF method in efficiency and can extract keyword from news Web page efficiently.
出处
《计算机工程与应用》
CSCD
2014年第19期222-226,共5页
Computer Engineering and Applications
关键词
提取
组合特征
组合词
有向图
新闻网页
extraction
multi-features
compound words
directed graph
news Web page