Abstract
Automatically identifying the research methods used in academic articles is important for evaluating research methods, analyzing how methods are used, and retrieving methods. Automatic classification of research methods requires a large training corpus, but annotating research methods in articles is costly, so making full use of existing annotated data is important for reducing annotation cost. Taking library and information science as the study domain, this paper first compares, through experiments, a monolingual method based on English abstracts with a cross-lingual method based on full text, which demonstrates the need for a cross-lingual method. It then compares the performance of two cross-lingual methods on cross-lingual research method classification. Finally, it validates the full-text processing method for academic articles proposed in this paper. The experimental results show that the cross-lingual method based on the full text of academic articles clearly outperforms the monolingual method based on English abstracts, and that the machine-translation-based method outperforms the method based on a cross-lingual pre-trained model. In addition, the long-text processing method for article full text yields a clear improvement over the baseline method.
Authors
Tian Liang (田亮)
Li Bowen (李博闻)
Zhang Chengzhi (章成志)
Source
《图书馆建设》
CSSCI
Peking University Core Journal (北大核心)
2022, No. 1, pp. 75-86 (12 pages)
Library Development
Funding
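Major Project of the National Social Science Fund of China, "Research on Multilingual Information Organization and Retrieval for Resource Integration of the Three Major Public Digital Culture Projects" (Grant No. 19ZDA341)
Jiangsu Province Postgraduate Research and Practice Innovation Program, "Research on Cross-lingual Automatic Classification of Research Methods in Domain-specific Academic Articles" (Grant No. KYCX21_0424)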
Keywords
Automatic classification of research methods
Cross-lingual text classification
Multi-label classification
Full text of academic articles