Mining Construction Rules of Chinese Keyphrase Based on Rough Set Theory (cited by: 17)
Abstract: Phrases carry more information than single words and better represent the main topic of an article; most of what are commonly called keywords are in fact phrases. The problem is that automatic keyphrase indexing lacks the guidance of unified rules. Taking advantage of the strengths of rough set theory in data generalization and knowledge reduction, this paper mines a manually labeled keyphrase corpus drawn from People's Daily and obtains a number of construction rules for Chinese keyphrases. These rules can be used for automatic keyword extraction and can also guide manual keyword indexing. Experimental results show that the acquired rules considerably improve the performance of automatic keyword extraction.
Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2007, No. 2, pp. 371-374 (4 pages)
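Funding: National Natural Science Foundation of China (No. 60435020); Ministry of Education-Microsoft Key Laboratory of Language and Speech Fund (No. 01307620)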
Keywords: keyword extraction; keyphrase; rough set theory; rule mining
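The abstract names two rough set operations, data generalization and knowledge reduction, without giving details. As a rough illustration only, the sketch below shows knowledge reduction on a toy decision table: it finds the minimal attribute subsets (reducts) that still determine the decision, then reads one rule off each consistent indiscernibility class. The table, the attribute names (`pos1`, `pos2`, `len`, `key`), and all function names are hypothetical and are not taken from the paper; the exhaustive subset search is a simple stand-in, not the authors' algorithm.

```python
from itertools import combinations

# Hypothetical toy decision table: each row is a candidate two-word phrase.
# Condition attributes: POS tag of each word and a length bucket.
# Decision attribute "key": whether an annotator marked it as a keyphrase.
ROWS = [
    {"pos1": "n", "pos2": "n", "len": "short", "key": True},
    {"pos1": "n", "pos2": "n", "len": "long", "key": True},
    {"pos1": "v", "pos2": "n", "len": "short", "key": True},
    {"pos1": "a", "pos2": "n", "len": "long", "key": True},
    {"pos1": "n", "pos2": "v", "len": "short", "key": False},
    {"pos1": "d", "pos2": "n", "len": "long", "key": False},
    {"pos1": "d", "pos2": "v", "len": "short", "key": False},
    {"pos1": "v", "pos2": "v", "len": "long", "key": False},
]


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over attrs."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(classes.values())


def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region (classes with a unanimous decision)."""
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({rows[i][decision] for i in block}) == 1
    )
    return consistent / len(rows)


def reducts(rows, conditions, decision):
    """Minimal attribute subsets that preserve the dependency of the full set."""
    full = dependency(rows, conditions, decision)
    found = []
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # a smaller reduct is already contained in this subset
            if dependency(rows, subset, decision) == full:
                found.append(subset)
    return found


def rules(rows, attrs, decision):
    """One (conditions, decision) rule per consistent indiscernibility class."""
    out = set()
    for block in partition(rows, attrs):
        labels = {rows[i][decision] for i in block}
        if len(labels) == 1:
            cond = tuple((a, rows[block[0]][a]) for a in attrs)
            out.add((cond, labels.pop()))
    return sorted(out)


if __name__ == "__main__":
    for reduct in reducts(ROWS, ["pos1", "pos2", "len"], "key"):
        print("reduct:", reduct)
        for cond, label in rules(ROWS, reduct, "key"):
            print("  IF", cond, "THEN key =", label)
```

In this toy table the length bucket turns out to be redundant, so the rules are stated over the two POS attributes alone. Exhaustive search is exponential in the number of attributes and is only practical for small tables like this one.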