Research on Automatic Discovery and Tagging Methods for Citation Metadata: The Case of Foreign-Language Citations (Cited by: 3)

Automatically Detecting and Tagging Foreign Language Citation Metadata
Abstract [Objective] Building on a review of current citation metadata extraction methods, this paper explores an approach to the automatic extraction of bibliographic metadata that combines semantic knowledge with machine learning. [Methods] A neural network model was used to train word vectors on a manually segmented corpus. Because metadata of the same type tends to cluster in one region of the vector space, a support vector machine (SVM) classifier was then used to classify and tag the metadata automatically. [Results] In experiments that used foreign-language citation data as the test set, the method achieved high precision and recall, and it handled citations containing multiple languages and abbreviations particularly well. [Limitations] Fine-grained extraction of time-related content in citation metadata remains limited. [Conclusions] The experimental results indicate that the method is effective for automatically detecting and tagging citation metadata, and that it substantially improves applicability and fault tolerance.
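The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 2-D vectors, the tokens, and the labels `AUTHOR`, `YEAR`, and `TITLE` are hypothetical stand-ins for word vectors that the paper trains with a neural network, and the classifier is a simple linear SVM (hinge loss with an L2 penalty, one-vs-rest) written with the standard library only, since the abstract does not specify the SVM variant used.

```python
# Hypothetical toy data: (token, 2-D "word vector", metadata type).
# In the described pipeline the vectors would come from a neural word-vector
# model trained on manually segmented citations; these values are invented
# so that each metadata type forms its own cluster.
TRAIN = [
    ("Gibaldi,", (1.0, 0.1), "AUTHOR"),
    ("Turabian,", (0.9, -0.1), "AUTHOR"),
    ("Joseph.", (1.1, 0.0), "AUTHOR"),
    ("(1998).", (-0.5, 0.9), "YEAR"),
    ("(1999).", (-0.6, 1.0), "YEAR"),
    ("(2001).", (-0.4, 0.8), "YEAR"),
    ("Handbook", (-0.5, -0.9), "TITLE"),
    ("Manual", (-0.4, -1.0), "TITLE"),
    ("Publishing", (-0.6, -0.8), "TITLE"),
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(points, targets, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss (primal linear SVM)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, targets):
            margin = y * (dot(w, x) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: push the hyperplane away from this point.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def fit_one_vs_rest(train):
    """Train one binary SVM per metadata type (that type vs. all others)."""
    points = [vec for _, vec, _ in train]
    models = {}
    for label in sorted({lab for _, _, lab in train}):
        targets = [1.0 if lab == label else -1.0 for _, _, lab in train]
        models[label] = train_linear_svm(points, targets)
    return models


def tag(models, vec):
    """Return the metadata type whose classifier gives the highest score."""
    return max(models, key=lambda label: dot(models[label][0], vec) + models[label][1])


models = fit_one_vs_rest(TRAIN)

# Vectors of unseen tokens that fall near one of the clusters.
for name, vec in [("near authors", (0.95, 0.05)), ("near years", (-0.5, 0.85))]:
    print(name, "->", tag(models, vec))
```

In practice the toy dictionary would be replaced by vectors from a trained embedding model and the hand-written trainer by a library SVM; the sketch only shows why clustering by metadata type makes the tagging problem a straightforward classification task.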
Authors: Jiang Lin (姜霖), Wang Dongbo (王东波)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), CSSCI, CSCD, 2017, No. 1, pp. 47-54 (8 pages)
Keywords: Bibliographic Metadata; Metadata Extraction; Machine Learning; Neural Network