
Topic/reports representation model based on two thresholds

Cited by: 1
Abstract: Based on the characteristics of Chinese news reports, this paper analyzes the shortcomings of information gain, compares it with the weight of evidence for text, and uses the idea behind the weight-of-evidence algorithm to remedy those shortcomings. It proposes a feature selection algorithm based on two thresholds, and designs and implements a topic/reports representation model built on that algorithm. In the topic detection and tracking (TDT) evaluation, the best performance of the two-threshold topic/reports representation model is 3.321% higher than that of the model based on information gain, which shows that the new algorithm and model perform better.
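Source: Journal of Huazhong University of Science and Technology (Natural Science Edition); indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2013, Issue S2, pp. 117-120, 130 (5 pages in total).
Funding: Open Project of the Beijing Key Laboratory of Internet Culture and Digital Dissemination (ICDD201105, ICDD201205); National Natural Science Foundation of China (61271304); Key Project of the Science and Technology Development Plan of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation (KZ201311232037); 2013 Self-Funded Science and Technology Research Project of Hebei Province Higher Education Institutions (Z2013162).
Keywords: information gain; weight of evidence of the text; vector space model; topic/reports representation model; feature selection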
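
The abstract names information gain and the weight of evidence for text but does not give formulas or say how the two thresholds are applied. The sketch below is therefore only an illustration under stated assumptions: it uses the standard textbook definitions of the two scores computed from document-level term presence, and it assumes (hypothetically) that a term is kept only when it clears a threshold on each score. The function names, the `eps` clipping constant, and the threshold values are all invented for this example and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def term_scores(docs, labels, eps=1e-9):
    """Return {term: (information_gain, weight_of_evidence)}.

    docs   : list of token lists (one per report)
    labels : list of topic labels, same length as docs
    eps    : clipping constant for the odds ratio (an assumption of this sketch)
    """
    n = len(docs)
    class_count = Counter(labels)
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)  # document frequency per term
    df_by_class = defaultdict(Counter)            # term -> class -> document frequency
    for s, c in zip(doc_sets, labels):
        for t in s:
            df_by_class[t][c] += 1

    # Entropy of the class distribution, the first term of information gain.
    class_entropy = -sum((k / n) * math.log(k / n) for k in class_count.values())

    scores = {}
    for t in df:
        p_t = df[t] / n
        ig = class_entropy
        woe = 0.0
        for c, k in class_count.items():
            p_c = k / n
            with_t = df_by_class[t][c]
            p_c_t = with_t / df[t]                # P(c | t present)
            if p_c_t > 0:
                ig += p_t * p_c_t * math.log(p_c_t)
            if df[t] < n:
                p_c_not = (k - with_t) / (n - df[t])  # P(c | t absent)
                if p_c_not > 0:
                    ig += (1 - p_t) * p_c_not * math.log(p_c_not)
            # Weight of evidence: |log odds ratio| weighted by P(c) * P(t).
            a = min(max(p_c_t, eps), 1 - eps)
            b = min(max(p_c, eps), 1 - eps)
            woe += p_c * p_t * abs(math.log(a * (1 - b) / (b * (1 - a))))
        scores[t] = (ig, woe)
    return scores


def select_two_thresholds(scores, ig_min, woe_min):
    """Keep terms that clear both thresholds (hypothetical combination rule)."""
    return sorted(t for t, (ig, woe) in scores.items()
                  if ig >= ig_min and woe >= woe_min)
```

On a toy corpus, a term that appears equally often in every topic scores close to zero on both measures and is dropped, while a term confined to one topic clears both thresholds. The actual thresholds and combination rule used in the paper may differ.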