
Influence of Part-of-speeches on the Network Topic Detection of Chinese News and Micro-blog

Cited by: 2
Abstract: Experiments were conducted on two representative corpora, news and micro-blog, to determine how different part-of-speech (POS) features and their combinations affect topic detection on these two widely used network platforms. The results show that, when a single POS type is used as the feature, nouns give the best detection results, and named entities greatly reduce the dimensionality of the clustering features while maintaining accuracy. When combinations of POS types are used as features, combining nouns or named entities with numerals, time phrases, adjectives, and quantifiers improves the accuracy of topic detection on news, whereas combining nouns or named entities with adjectives, quantifiers, numerals, special symbols, and URLs achieves good results on the micro-blog corpus.
Source: Journal of Beijing University of Technology (《北京工业大学学报》), 2015, No. 4, pp. 526-533 (8 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: Key Program of the National Natural Science Foundation of China (613300194)
Keywords: topic detection; part-of-speech; text feature; news; micro-blog
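The abstract's core idea, restricting the clustering feature set to words of selected parts of speech, can be sketched as below. This is a minimal illustration, not the authors' implementation. It assumes the text has already been segmented and POS-tagged, uses tag labels loosely modeled on the PKU/ICTCLAS tag set (`n`, `nr`, `ns`, `nt`, `m`, `t`, `a`, `q`), and the `url`/`sym` tags, the tag groupings, the sample documents, and the function names `select_features` and `vocabulary_size` are all hypothetical.

```python
from collections import Counter

# Hypothetical tag labels loosely modeled on the PKU/ICTCLAS tag set:
# n = noun, nr/ns/nt = person/place/organization names (named entities),
# m = numeral, t = time phrase, a = adjective, q = quantifier.
# "url" and "sym" are made-up labels for URLs and special symbols.
NOUN = {"n"}
NAMED_ENTITY = {"nr", "ns", "nt"}
NEWS_COMBO = NOUN | NAMED_ENTITY | {"m", "t", "a", "q"}
MICROBLOG_COMBO = NOUN | NAMED_ENTITY | {"a", "q", "m", "sym", "url"}

Token = tuple[str, str]  # (word, POS tag)


def select_features(tokens: list[Token], keep_tags: set[str]) -> Counter:
    """Keep only words whose POS tag is in keep_tags and count their occurrences."""
    return Counter(word for word, tag in tokens if tag in keep_tags)


def vocabulary_size(docs: list[list[Token]], keep_tags: set[str]) -> int:
    """Feature-space dimension: number of distinct words that survive the POS filter."""
    vocab: set[str] = set()
    for doc in docs:
        vocab.update(select_features(doc, keep_tags))
    return len(vocab)


# Two tiny, hand-tagged sample documents (hypothetical data).
docs = [
    [("北京", "ns"), ("地铁", "n"), ("新", "a"), ("开通", "v"),
     ("3", "m"), ("条", "q"), ("线路", "n")],
    [("北京", "ns"), ("暴雨", "n"), ("昨天", "t"), ("导致", "v"),
     ("航班", "n"), ("延误", "v")],
]

print(vocabulary_size(docs, NOUN))          # 4
print(vocabulary_size(docs, NAMED_ENTITY))  # 1
print(vocabulary_size(docs, NEWS_COMBO))    # 9
```

In this toy data the named-entity filter keeps far fewer dimensions than the noun filter, which mirrors in miniature the abstract's point about named entities shrinking the feature space; the toy numbers say nothing about detection accuracy, which the paper measures on real corpora.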