Research on Automatic Indexing Method for Scientific Literatures Based on Multi-filtering Strategies (基于多重过滤策略的科技文献自动标引方法研究)

Cited by: 1

Abstract: This paper proposes an automatic indexing method for scientific literature based on multi-filtering strategies. The method does not rely on a large-scale training corpus and can easily be embedded as a processing module in other text-processing pipelines. Experimental results verify its feasibility. The paper also proposes an evaluation method for index terms based on secondary literature. Although this evaluation method depends heavily on the quality of the abstracts and keywords given in the secondary literature, it is valuable when human and material resources are insufficient to build a high-quality test set. Formulating a more rational and effective evaluation scheme remains imperative.

Authors: GAO Yingfan (高影繁), XU Hongjiao (徐红姣), WANG Huilin (王惠临)

Affiliation: Institute of Scientific and Technical Information of China (中国科学技术信息研究所)

Source: Information Studies: Theory & Application (《情报理论与实践》), CSSCI, PKU Core Journal, 2012, No. 12, pp. 98-100, 110 (4 pages)

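Funding: Supported by the Institute of Scientific and Technical Information of China discipline construction project "Natural Language Processing" (No. XK2011-6); the Institute of Scientific and Technical Information of China key work project "Research and Application Demonstration of Key Technologies for Multilingual Information Access" (No. ZD2011-3-3); and the Institute of Scientific and Technical Information of China pre-research fund for scientific research projects (No. YY-201121).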

Keywords: multi-filtering; S&T document; automatic indexing

Classification code: G254.36 [Culture and Science: Library Science]

References (6)

1. LUHN H P. The automatic creation of literature abstracts [J]. IBM Journal of Research and Development, 1958 (2): 159-165.
2. WITTEN I H, PAYNTER G W, FRANK E, et al. KEA: practical automatic keyphrase extraction [C] // Proceedings of the 4th ACM Conference on Digital Libraries (DL'99), Berkeley, USA, 1999: 254-255.
3. KELLEHER D, LUZ S. Automatic hypertext keyphrase detection [C] // Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, UK, 2005: 1608-1609.
4. EL-BELTAGY S R, RAFEA A. KP-Miner: a keyphrase extraction system for English and Arabic documents [J]. Information Systems, 2009, 34 (1): 132-144.
5. LIU Zhiyuan, HUANG Wenyi, ZHENG Yabin, et al. Automatic keyphrase extraction via topic decomposition [C] // Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Cambridge, USA, 2010: 366-376.
6. ZHAO W X, JIANG Jing, HE Jing, et al. Topical keyphrase extraction from Twitter [C] // Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, USA, 2011: 379-388.

Co-cited literature (9)

1. TURNEY P D. Learning algorithms for keyphrase extraction [J]. Information Retrieval, 2000, 2 (4): 303-336.
2. WITTEN I H, PAYNTER G W, FRANK E, et al. KEA: practical automatic keyphrase extraction [C] // Proceedings of the 4th ACM Conference on Digital Libraries. Berkeley, USA: ACM Press, 1999: 254-255.
3. HULTH A. Improved automatic keyword extraction given more linguistic knowledge [C] // Proceedings of EMNLP'03. Stroudsburg: ACL, 2003.
4. NGUYEN T, KAN M Y. Keyphrase extraction in scientific publications [C] // Proceedings of the 10th International Conference on Asian Digital Libraries, 2007: 317-326.
5. MIHALCEA R, TARAU P. TextRank: bringing order into texts [C] // Proceedings of EMNLP. 2004: 404-411.
6. PASQUIER C. Task 5: single document keyphrase extraction using sentence clustering and latent Dirichlet allocation [C] // Proceedings of the ACL Workshop on Semantic Evaluation. 2010: 154-157.
7. LIU Zhiyuan, CHEN Xinxiong, ZHENG Yabin, et al. Automatic keyphrase extraction by bridging vocabulary gap [C] // Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011: 135-144.
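8. LIU Kaiying, XUE Cuifang, ZHENG Jiaheng, ZHOU Xiaoqiang. Regions and techniques for extracting feature information from Chinese texts (中文文本中抽取特征信息的区域与技术) [J]. Journal of Chinese Information Processing, 1998, 12 (2): 1-7. Cited by: 45
9. LI Peng, WANG Bin, SHI Zhiwei, CUI Yachao, LI Hengxun. Tag-TextRank: a tag-based keyword extraction method for web pages (Tag-TextRank:一种基于Tag的网页关键词抽取方法) [J]. Journal of Computer Research and Development, 2012, 49 (11): 2344-2351. Cited by: 56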

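Citing literature (1)

1. GAO Yingfan, XU Hongjiao, DU Feng. Research on an automatic indexing method based on filtering and weight-smoothing strategies (基于过滤与权重平滑策略的自动标引方法研究) [J]. Information Studies: Theory & Application, 2014, 37 (2): 103-106. Cited by: 1

Second-level citing literature (1)

1. LI Qianju, LI Sida, LIU Jianyi. An automatic keyword indexing method based on knowledge organization (一种基于知识组织的关键词自动标引方法) [J]. Information Science (情报科学), 2016, 34 (11): 107-110. Cited by: 8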
