Keyphrase Extraction from News Web Pages Based on Semantic Relations (cited by: 10)
Abstract: A keyphrase extraction method for news web pages based on semantic relations is proposed. It considers two kinds of relations: the semantic similarity between words in the HowNet knowledge base, and the correlation between words in their specific context. Lexical chains are used to represent the semantic relations between words as a graph, from which the keyphrases of a news web page are extracted. The method was tested on 120 news web pages with editorial summaries selected from the NetEase website. The experimental results show that it achieves substantially higher precision and recall than both a term-frequency-based method and a method that builds lexical chains from HowNet semantic similarity alone. When three keyphrases are extracted, precision and recall improve by 27.77% and 21.38%, respectively, over the term-frequency-based method.
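Source: Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2009, No. 1, pp. 145-148 (4 pages)
Funding: National Natural Science Foundation of China (60573174); Open Project of the Institute of Automation, Chinese Academy of Sciences, "HTML News Web Page Filtering and Summarization System"
Keywords: keyphrase extraction; lexical chain; semantic relation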
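
The idea summarized in the abstract, combining knowledge-base similarity with in-context correlation in one word graph and scoring the result by precision and recall, might be sketched roughly as follows. This is a minimal illustration and not the paper's algorithm: the `kb_sim` lookup table (a stand-in for HowNet similarity), the sentence-level co-occurrence measure, the linear mixing weight `alpha`, and the weighted-degree ranking are all assumptions, and lexical chains are simplified here to a single weighted graph.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences):
    """Count how many sentences each unordered word pair shares."""
    counts = Counter()
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            counts[(a, b)] += 1
    return counts


def extract_keyphrases(sentences, kb_sim, k=3, alpha=0.5):
    """Rank words by weighted degree in a semantic-relation graph.

    kb_sim: hypothetical dict {(a, b): similarity in [0, 1]} with a < b,
            standing in for a knowledge-base similarity such as HowNet's.
    alpha:  assumed mixing weight between the two relation types.
    """
    co = cooccurrence(sentences)
    max_co = max(co.values(), default=1)
    score = Counter()
    for pair in set(co) | set(kb_sim):
        # Edge weight: knowledge-base similarity mixed with normalized co-occurrence.
        weight = alpha * kb_sim.get(pair, 0.0) + (1 - alpha) * co.get(pair, 0) / max_co
        for word in pair:
            score[word] += weight
    # Sort by score (descending), then alphabetically for a stable order.
    ranked = sorted(score, key=lambda w: (-score[w], w))
    return ranked[:k]


def precision_recall(extracted, gold):
    """Precision and recall of extracted keyphrases against a gold set."""
    hits = len(set(extracted) & set(gold))
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

Calling `extract_keyphrases` on a tokenized page with `k=3` and passing the result to `precision_recall` together with manually assigned keyphrases mirrors the evaluation setting described above, where three keyphrases are extracted per page.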
References (10)

  • 1 TURNEY P D. Learning to extract keyphrases from text[R/OL]. Ottawa: National Research Council of Canada, (1999-02-17)[2008-11-10]. http://iit-iti.nrc.gc.ca/iit-publications-iti/docs/NRC-41622.pdf.
  • 2 WITTEN I H, PAYNTER G W, FRANK E, et al. KEA: Practical automatic keyphrase extraction[C]//Proceedings of the 4th ACM Conference on Digital Libraries. New York: ACM Press, 1999: 254-255.
  • 3 SILBER H G, McCOY K F. Efficient text summarization using lexical chains[C]//Proceedings of the 5th International Conference on Intelligent User Interfaces. New York: ACM Press, 2000: 252-255.
  • 4 MORRIS J, HIRST G. Lexical cohesion computed by thesaural relations as an indicator of the structure of text[J]. Computational Linguistics, 1991, 17(1): 21-48.
  • 5 LI Sujian, WANG Houfeng, YU Shiwen, XIN Chengsheng. Application of the maximum entropy model to automatic keyword indexing[J]. Chinese Journal of Computers, 2004, 27(9): 1192-1197. Cited by: 92
  • 6 LIU Yuanchao, WANG Xiaolong, XU Zhiming, LIU Bingquan. Mining construction rules of Chinese keyphrases based on rough set theory[J]. Acta Electronica Sinica, 2007, 35(2): 371-374. Cited by: 17
  • 7 SUO Hongguang, LIU Yushu, CAO Shuying. A keyword extraction method based on lexical chains[J]. Journal of Chinese Information Processing, 2006, 20(6): 25-30. Cited by: 88
  • 8 LIU Qun, LI Sujian. Word semantic similarity computation based on HowNet[J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76.
  • 9 PEAT H J, WILLETT P. The limitations of term co-occurrence data for query expansion in document retrieval systems[J]. Journal of the American Society for Information Science, 1991, 42(5): 378-383.
  • 10 DONG Zhendong, DONG Qiang. HowNet and the study of Chinese[J]. Contemporary Linguistics, 2001, 3(1): 33-44. Cited by: 56
