Abstract
The core tasks of information retrieval include document classification and ranking, and effectively measuring the weights of the feature terms in a document is one of the key techniques involved. A text graph is built for each document from word relationships such as co-occurrence. The approach assumes that adjacent words influence each other's importance. Combining this idea with the term-frequency characteristics of the document's feature words, it computes the weight of each word iteratively. It then incorporates global properties of the text graph, such as its density, to rank information retrieval results. Experiments show that the algorithm performs well on standard datasets.
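The abstract does not state the exact formulas, so the following is only a minimal sketch of the pipeline it describes, built on illustrative assumptions that are not taken from the paper. Words are linked when they are adjacent (a co-occurrence window of 2). Term weights are iterated TextRank-style with a damping factor `d = 0.85`, and the normalized term frequency serves as the prior that each word's weight is pulled toward. Graph density is taken as 2E / (V(V - 1)). A document's score is the sum of its weights for the query terms, scaled by `(1 + density)`. All function names (`build_text_graph`, `term_weights`, `graph_density`, `rank_documents`) are hypothetical.

```python
from collections import Counter


def build_text_graph(tokens, window=2):
    """Undirected co-occurrence graph (assumption: link words within `window` positions)."""
    graph = {}
    for i, w in enumerate(tokens):
        graph.setdefault(w, set())  # register the word even if it has no neighbours
        for v in tokens[i + 1 : i + window]:
            if v != w:
                graph[w].add(v)
                graph.setdefault(v, set()).add(w)
    return graph


def term_weights(tokens, window=2, d=0.85, iters=100, tol=1e-10):
    """Iterate weights: neighbours share importance, term frequency acts as the prior (assumed form)."""
    graph = build_text_graph(tokens, window)
    if not graph:
        return {}, graph
    tf = Counter(tokens)
    total = sum(tf.values())
    prior = {w: tf[w] / total for w in graph}
    weights = dict(prior)
    for _ in range(iters):
        new = {}
        for v, nbrs in graph.items():
            # Each neighbour u spreads its weight evenly over its own neighbours.
            spread = sum(weights[u] / len(graph[u]) for u in nbrs)
            new[v] = (1 - d) * prior[v] + d * spread
        delta = max(abs(new[v] - weights[v]) for v in graph)
        weights = new
        if delta < tol:
            break
    return weights, graph


def graph_density(graph):
    """Undirected graph density 2E / (V(V - 1)); 0 for graphs with fewer than 2 nodes."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * edges / (n * (n - 1))


def rank_documents(docs, query):
    """Hypothetical scoring: sum of query-term weights, scaled by (1 + text-graph density)."""
    def score(doc_id):
        weights, graph = term_weights(docs[doc_id])
        base = sum(weights.get(q, 0.0) for q in query)
        return base * (1 + graph_density(graph))

    return sorted(docs, key=score, reverse=True)


docs = {
    "d1": "text graph term weight text graph".split(),
    "d2": "document ranking uses term weight".split(),
    "d3": "cooking recipes with fresh herbs".split(),
}
print(rank_documents(docs, ["text", "graph"]))  # d1 ranks first; d2 and d3 contain no query term
```

Using term frequency as the prior (rather than a uniform one) is one simple way to "combine" graph propagation with frequency. Other combinations, such as multiplying the converged weights by TF, would fit the abstract's description equally well.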
Source
Computer Systems & Applications (《计算机系统应用》)
2012, Issue 6, pp. 216-218 and 194 (4 pages)
Funding
Natural Science Foundation of the Hunan Provincial Department of Education (06C658)
Keywords
text graph
co-occurrence relation
document ranking
term weight