Abstract
Most keyword weighting methods treat keywords as independent units and ignore the correlations among them. Starting from these correlations, this paper proposes that a keyword's weight is affected by contributions from other keywords. Taking the iterative computation of web page weights in the PageRank algorithm as its theoretical basis, it presents an iterative weight calculation model based on mutual voting among keywords. Each keyword is abstracted as a node in the model, and its initial weight is obtained with the classic TF-IDF method. The improved weighting method is applied to the Reuters21578 Top10 and 20Newsgroup datasets. The experimental results show that the new algorithm differentiates keyword weights fairly clearly, which helps distinguish the relative importance of keywords in a text.
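The abstract does not give the update formula, so the following is only a minimal sketch of what a PageRank-style voting iteration seeded with TF-IDF might look like. Everything specific here is an assumption for illustration rather than the authors' method: the function names `tfidf` and `vote_weights`, the co-occurrence `window` used to link terms, the `damping` factor of 0.85, and the use of normalized TF-IDF as the teleport prior.

```python
import math
from collections import Counter


def tfidf(docs):
    """Classic TF-IDF per document: (count / doc length) * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def vote_weights(doc, init, window=3, damping=0.85, tol=1e-8, max_iter=200):
    """Iterate PageRank-style votes among the terms of one document.

    Assumption (not from the paper): two distinct terms are linked when they
    co-occur within `window` consecutive tokens, and each term splits its
    current weight among its neighbours in proportion to link counts.
    """
    links = {t: Counter() for t in set(doc)}
    for i in range(len(doc)):
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[i] != doc[j]:
                links[doc[i]][doc[j]] += 1
                links[doc[j]][doc[i]] += 1

    # Initial weights: TF-IDF scores normalized to sum to 1.
    total = sum(init.values()) or 1.0
    base = {t: init.get(t, 0.0) / total for t in links}
    w = dict(base)

    for _ in range(max_iter):
        new = {}
        for t in links:
            # Votes received from every neighbour u of term t.
            votes = sum(
                w[u] * c / sum(links[u].values()) for u, c in links[t].items()
            )
            new[t] = (1 - damping) * base[t] + damping * votes
        delta = sum(abs(new[t] - w[t]) for t in links)
        w = new
        if delta < tol:  # stop once the weights have converged
            break
    return w


# Toy corpus, invented for illustration only.
docs = [
    ["term", "weight", "vote", "term", "model"],
    ["web", "page", "rank", "model"],
    ["vote", "model", "weight", "iteration"],
]
initial = tfidf(docs)
final = vote_weights(docs[0], initial[0])
```

In this sketch a term whose TF-IDF score is zero (here `model`, which occurs in every toy document) can still end up with a positive weight through the votes of its neighbours, which is one way the mutual-contribution idea described in the abstract could play out.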
Source
Network New Media Technology (《网络新媒体技术》)
2015, No. 3, pp. 37-41 (5 pages)
Keywords
term weight
voting model
iterative convergence
weight differentiation
feature discrimination