Research on the Automatic Indexing Method Based on the Filtering and Weight Smoothing Strategies

Cited by: 1
Abstract: This paper proposes a method for automatically extracting index terms based on filtering and weight smoothing strategies. The method uses variable stop words to segment the document, filters candidate index terms by part of speech, term frequency, and word position, and applies a weight-bias strategy to keep extraction balanced between phrases and single words. It combines results from natural language processing, such as part-of-speech tagging, with statistical information. Because it does not rely on how words are distributed across a document collection, it can extract index terms directly from a single document, and it performs well when the length of the document to be indexed is limited.
Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, PKU Core Journal, 2014, No. 2, pp. 103-106 (4 pages)
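Funding: An outcome of the Institute of Scientific and Technical Information of China key work project "Construction and Application of a Multilingual Scientific and Technical Information Semantic Association Network" (No. ZD2012-3-3) and the Institute of Scientific and Technical Information of China pre-research project "Research on an Improved Automatic Indexing Method for Scientific and Technical Literature Based on Sentence Parsing" (No. YY-201218).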
Keywords: part-of-speech filtering rules; weight; automatic indexing
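
The pipeline outlined in the abstract (segment at stop words, filter candidates by part of speech, frequency, and position, then weight phrases against single words) can be illustrated with a minimal sketch. The abstract does not give the paper's filtering rules or weighting formulas, so everything concrete here is an assumption: the tag set (`"n"`, `"a"`), the noun-head rule, and the parameters `lead_zone`, `lead_bonus`, and `alpha` are hypothetical, and the input is assumed to be already tokenized and part-of-speech tagged.

```python
import math
from collections import Counter


def split_candidates(tagged, stop_words):
    """Cut a tagged token sequence into candidate terms at stop words."""
    chunks, current = [], []
    for index, (token, pos) in enumerate(tagged):
        if token in stop_words:
            if current:
                chunks.append(current)
            current = []
        else:
            current.append((token, pos, index))
    if current:
        chunks.append(current)
    return chunks


def extract_index_terms(tagged, stop_words, allowed_pos=frozenset({"n", "a"}),
                        min_freq=1, lead_zone=0.2, lead_bonus=1.5,
                        alpha=0.5, top_k=5):
    """Rank candidate index terms from one document (illustrative scoring only)."""
    freq = Counter()
    first_seen = {}
    for chunk in split_candidates(tagged, stop_words):
        # Hypothetical POS rule: every token has an allowed tag, last token is a noun.
        if any(pos not in allowed_pos for _, pos, _ in chunk) or chunk[-1][1] != "n":
            continue
        term = tuple(token for token, _, _ in chunk)
        freq[term] += 1
        first_seen.setdefault(term, chunk[0][2])

    total = max(len(tagged), 1)
    scores = {}
    for term, count in freq.items():
        if count < min_freq:
            continue
        # Position: terms first seen in the leading part of the text get a bonus.
        position = lead_bonus if first_seen[term] < lead_zone * total else 1.0
        # Length: a mild logarithmic boost so rarer phrases can compete with single words.
        length = 1.0 + alpha * math.log(len(term))
        scores[term] = count * position * length

    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(" ".join(term), score) for term, score in ranked[:top_k]]


if __name__ == "__main__":
    sample = [("automatic", "a"), ("indexing", "n"), ("of", "p"),
              ("documents", "n"), ("uses", "v"), ("weight", "n"),
              ("and", "c"), ("automatic", "a"), ("indexing", "n")]
    for term, score in extract_index_terms(sample, {"of", "uses", "and"}):
        print(f"{term}\t{score:.2f}")
```

Because the stop-word set is passed in as an argument, it can be varied per document or per domain, which is one plausible reading of "variable stop words". The scoring uses only counts and positions inside the single input document, with no collection-level statistics such as inverse document frequency.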