Abstract
Based on the characteristics of Chinese news reports, this paper analyzes the shortcomings of information gain, compares it with the weight of evidence for text, and uses the idea behind the weight of evidence for text to remedy those shortcomings. A feature selection algorithm based on two-layer thresholds is proposed, and a topic/report representation model based on two-layer thresholds is designed and implemented. In the topic detection and tracking (TDT) evaluation, the best performance of the two-layer-threshold topic/report representation model is 3.321% higher than that of the model based on information gain, which shows that the new algorithm and model perform better.
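The two scores the abstract compares can be sketched as follows. This is a minimal illustration using the standard textbook definitions of information gain and weight of evidence for text over document-level term presence; it is not the paper's implementation. The function name `term_scores`, the toy corpus, and the `eps` clamping used to avoid `log(0)` are all assumptions made for this example. The abstract does not say how the two thresholds are combined, so no selection rule is shown.

```python
import math
from collections import Counter


def term_scores(docs, eps=1e-9):
    """Return {term: (information_gain, weight_of_evidence)}.

    docs is a list of (set_of_terms, class_label) pairs.
    eps is an illustrative clamp (an assumption) that keeps the
    odds ratio in the weight-of-evidence formula finite.
    """
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    vocab = set().union(*(terms for terms, _ in docs))
    scores = {}
    for t in vocab:
        # Number of documents of each class that contain the term.
        with_t = Counter(label for terms, label in docs if t in terms)
        n_t = sum(with_t.values())  # at least 1, since t is in the vocabulary
        n_not = n - n_t
        p_t = n_t / n
        ig = 0.0
        wet = 0.0
        for c, n_c in class_counts.items():
            p_c = n_c / n
            p_c_t = with_t[c] / n_t
            p_c_not = (n_c - with_t[c]) / n_not if n_not else 0.0
            # Information gain: H(C) - H(C | term present/absent).
            ig -= p_c * math.log(p_c)
            if p_c_t > 0:
                ig += p_t * p_c_t * math.log(p_c_t)
            if p_c_not > 0:
                ig += (1 - p_t) * p_c_not * math.log(p_c_not)
            # Weight of evidence for text: absolute log odds ratio,
            # weighted by P(c) and P(t).
            q = min(max(p_c_t, eps), 1 - eps)
            r = min(max(p_c, eps), 1 - eps)
            wet += p_c * p_t * abs(math.log(q * (1 - r) / (r * (1 - q))))
        scores[t] = (ig, wet)
    return scores


# Toy corpus (hypothetical): two classes, two documents each.
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"vote", "party"}, "politics"),
    ({"vote", "team"}, "politics"),
]
scores = term_scores(docs)
```

In this toy corpus, "goal" appears only in one class and receives high values on both scores, while "team" is spread evenly across classes and scores zero on both.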
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journals
2013, Issue S2, pp. 117-120, 130 (5 pages)
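Funding
Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDD201105, ICDD201205)
National Natural Science Foundation of China (61271304)
Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037)
2013 Self-Funded Science and Technology Research Project of Higher Education Institutions of Hebei Province (Z2013162)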
Keywords
information gain
weight of evidence for text
vector space model
topic/report representation model
feature selection