Abstract
Feature evaluation methods for automatic text summarization have received little attention from researchers. Previous work simply applied TF.IDF, a feature evaluation method designed for document collections, to single-document summarization. TF.IDF cannot exclude the noise introduced by low-frequency words, so its feature scores contain obvious errors and the features of a text cannot be computed accurately. This paper introduces information entropy and proposes TF.IDF.H, a feature evaluation method for single documents. Experimental results indicate that the new method captures the topical features of an article accurately and improves summary quality.
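The abstract does not state the TF.IDF.H formula, so the following is only an illustrative sketch of how an entropy factor could damp low-frequency noise in a single document, not the authors' definition. It assumes the document is already tokenized into segments (for example paragraphs), that an IDF table from a background corpus is available, and that H is the Shannon entropy of a term's distribution across those segments; the function name `entropy_weighted_tfidf` and the sample data are hypothetical.

```python
import math
from collections import Counter


def entropy_weighted_tfidf(segments, idf):
    """Score terms of ONE document as tf * idf * H (illustrative only).

    segments: list of token lists, e.g. the paragraphs of the document.
    idf:      dict mapping term -> IDF value (assumed to come from a
              background corpus; terms missing from it get 0.0).
    H is the Shannon entropy (base 2) of how a term's occurrences are
    spread over the segments. This is an assumption for illustration.
    """
    per_segment = [Counter(seg) for seg in segments]

    # Term frequency over the whole document.
    tf = Counter()
    for counts in per_segment:
        tf.update(counts)

    scores = {}
    for term, total in tf.items():
        h = 0.0
        for counts in per_segment:
            n = counts[term]  # Counter yields 0 for absent terms
            if n:
                p = n / total
                h -= p * math.log2(p)
        scores[term] = total * idf.get(term, 0.0) * h
    return scores


# Hypothetical toy document: "typo" occurs once and has a high IDF.
segments = [
    ["entropy", "feature", "summary"],
    ["entropy", "feature"],
    ["entropy", "typo"],
]
idf = {"entropy": 1.0, "feature": 1.0, "summary": 1.0, "typo": 5.0}
scores = entropy_weighted_tfidf(segments, idf)
```

Under these assumptions, a term that occurs only once has zero entropy and therefore a zero score, whereas plain TF.IDF would rank the rare high-IDF term "typo" above every other term in the toy document.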
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core Journal (北大核心)
2004, No. 33, pp. 176-178, 183 (4 pages)
Funding
Major Program of the National Natural Science Foundation of China (No. 79990584)
National 973 Basic Research Program (No. G1998030414)
Keywords
automatic text summarization, text mining, feature evaluation, information entropy