Research on the Structure Recognition of Academic Texts Under Different Characteristics
Cited by: 21
Abstract: As full-text scientific papers become available in large numbers, mining knowledge from them benefits both the in-depth knowledge organization and the precise retrieval of academic literature. Recognizing the structure of academic texts is the basis for such work, because it supports understanding these texts at a deeper, more semantic level and thereby advances research on academic text mining. This paper takes the structural functions of academic texts as its research object and uses 1579 papers published in the Journal of the Association for Information Science and Technology (JASIST) as the dataset. Preliminary experiments compared three models: a bidirectional long short-term memory neural network, a support vector machine, and conditional random fields. Based on their performance, the conditional random field model was chosen for further study. With this model, structural function recognition was recast as a sequence labeling problem over sentence units; the study searched for the best recognition model and examined how different features affect recognition. The best model achieved an overall F-measure of 92.88% on the open test. The experimental results show that lexical information in chapter titles and feature words in chapter content contribute strongly to recognizing the functional structure of academic texts and produce satisfactory results, whereas the structure length feature interferes with the performance of the conditional random field method. Finally, the paper summarizes the causes of recognition errors and points out problems and directions for further study.
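The sequence labeling setup described in the abstract can be illustrated with a minimal sketch. It assumes each paper is a list of (chapter title, sentences) pairs and turns every sentence into a feature dictionary built from the two feature types the abstract reports as helpful: words in the chapter title and feature words in the chapter content. The function names, the cue-word list, and the input layout are hypothetical; the abstract does not give the authors' actual feature templates. The length feature is left out because the abstract reports that it interfered with the model.

```python
from typing import Dict, List, Tuple

# Hypothetical cue words; the paper's real feature vocabulary is not
# given in the abstract.
CUE_WORDS = {"propose", "dataset", "experiment", "results", "conclude"}


def sentence_features(title: str, sentence: str) -> Dict[str, bool]:
    """Build a feature dict for one sentence unit (illustrative only)."""
    feats: Dict[str, bool] = {}
    # Lexical information from the chapter title.
    for word in title.lower().split():
        feats[f"title={word}"] = True
    # Feature words found in the chapter content.
    for token in sentence.lower().split():
        word = token.strip(".,;:()")
        if word in CUE_WORDS:
            feats[f"cue={word}"] = True
    return feats


def paper_to_sequence(
    chapters: List[Tuple[str, List[str]]]
) -> List[Dict[str, bool]]:
    """Flatten a paper into one sequence of sentence-level feature dicts."""
    return [
        sentence_features(title, sentence)
        for title, sentences in chapters
        for sentence in sentences
    ]
```

A sequence produced this way, paired with one structural-function label per sentence, is the kind of input a conditional random field toolkit that accepts per-item feature dictionaries could be trained on.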
Authors: Wang Dongbo; Gao Ruiqing; Ye Wenhao; Zhou Xin; Zhu Danhao (College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095; Department of Information Management, Nanjing University, Nanjing 210093; Department of Computer Science and Technology, Nanjing University, Nanjing 210093)
Source: Journal of the China Society for Scientific and Technical Information (情报学报; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2018, No. 10, pp. 997-1008 (12 pages)
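Funding: Major Project of the National Social Science Fund of China, "Research on the Disciplinary Construction of Information Science and the Future Development Path of Information Work" (17ZDA291)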
Keywords: text classification; conditional random fields; chapter structure; deep learning