
A Phrase Extraction Method Based on a Double-Layer Corpus Filter

Text Topic Extraction Based on Double-linguistic-filter
Abstract: Text topic extraction is an effective way to distill the information in a text. Chinese text is built from basic words, but the information granularity of an individual word is too small, so word-level extraction cannot fully express the semantics of a text fragment. Phrases carry richer fine-grained semantic information and better represent the topic of a fragment. This paper therefore proposes a double-linguistic-filter method, consisting of a part-of-speech (lexical category) filter and a phrase-extension rule filter, that removes redundant information from the text corpus and extracts topic phrases. The extracted phrases are close to a refined semantic expression of the text. Experiments show that the method produces reasonably reliable results and is applicable in practice, and it may suggest further approaches to text mining.
Source: Computer and Modernization (《计算机与现代化》), 2015, No. 12, pp. 7-14 (8 pages)
Keywords: phrase extraction; information extraction; rule mining
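
The two-layer filtering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the tag set or the extension rules, so `KEEP_TAGS`, `EXTEND_RULES`, and both function names are hypothetical, and the input is assumed to be already segmented and part-of-speech tagged.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Assumption: a toy tag set where n = noun, v = verb, a = adjective.
KEEP_TAGS = {"n", "v", "a"}

# Assumption: illustrative extension rules, written as
# (tag of the phrase's last word, tag of the next word).
EXTEND_RULES = {("a", "n"), ("n", "n"), ("v", "n")}


def pos_filter(tokens: List[Token]) -> List[List[Token]]:
    """Layer 1: drop tokens with unwanted tags and split the text into
    contiguous runs of kept tokens, so that words separated by a dropped
    token are never merged later."""
    runs: List[List[Token]] = []
    current: List[Token] = []
    for word, tag in tokens:
        if tag in KEEP_TAGS:
            current.append((word, tag))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def extend_phrases(runs: List[List[Token]], min_words: int = 2) -> List[str]:
    """Layer 2: greedily extend a phrase while the adjacent tag pair
    matches a rule; keep only phrases with at least min_words words."""
    phrases: List[str] = []
    for run in runs:
        phrase = [run[0]]
        for token in run[1:]:
            if (phrase[-1][1], token[1]) in EXTEND_RULES:
                phrase.append(token)
            else:
                if len(phrase) >= min_words:
                    phrases.append("".join(w for w, _ in phrase))
                phrase = [token]
        if len(phrase) >= min_words:
            phrases.append("".join(w for w, _ in phrase))
    return phrases


# Hand-tagged example input; "u" marks a particle that layer 1 removes.
tokens = [("文本", "n"), ("主题", "n"), ("提取", "v"), ("技术", "n"),
          ("的", "u"), ("有效", "a"), ("方法", "n")]
print(extend_phrases(pos_filter(tokens)))
```

Splitting into runs in the first layer, rather than simply deleting tokens, keeps the second layer from joining words that were not adjacent in the original text. A real system would likely replace both constants with rules derived from a corpus.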

