Abstract
[Research purpose] This paper examines the influence of Natural Language Processing (NLP) on other disciplines, with the aim of promoting practical application of the technology and innovative research. The NLP topic taxonomy and dataset constructed here are intended to support future topic identification for related papers and the study of interdisciplinary knowledge diffusion from NLP. [Research method] Using the Chinese Library Classification (CLC) and the citation relationships between papers, we collected 2159 typical NLP papers and 1376 atypical papers from CNKI. We then visualized and analyzed the frequency of the journals and CLC codes of these papers, proposed a 4-level topic taxonomy for the NLP field, built the multi-topic paper classification dataset "NLP-others" on that taxonomy, and performed multi-label classification of the papers. [Research conclusion] NLP influences the natural sciences, the social sciences, and the humanities to varying degrees, and is most closely connected with library and information science. The relevant techniques can even be extended to sequences other than natural language. Knowledge bases and knowledge graphs, neural networks, and public opinion analysis are the most widely mentioned or applied technologies; LDA, LSTM, CRF, and BERT are the models and algorithms frequently applied in other fields.
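The frequency analysis of journals and CLC codes mentioned in the abstract can be illustrated with a minimal sketch. The records below are hypothetical placeholders, not data from the paper, and the function name `frequency_tables` is an assumption introduced only for illustration. The sketch also assumes that the first letter of a CLC code identifies its top-level class.

```python
from collections import Counter

# Hypothetical records: each paper has one journal and one or more CLC codes.
# These values are illustrative only and are not taken from the paper's dataset.
papers = [
    {"journal": "Journal A", "clc": ["TP391", "G250"]},
    {"journal": "Journal B", "clc": ["TP391"]},
    {"journal": "Journal A", "clc": ["H087", "TP391"]},
]


def frequency_tables(records):
    """Count how often each journal, CLC code, and top-level CLC class occurs."""
    journal_counts = Counter(r["journal"] for r in records)
    clc_counts = Counter(code for r in records for code in r["clc"])
    # Assumption: the leading letter of a CLC code is its top-level class.
    main_class_counts = Counter(code[0] for r in records for code in r["clc"])
    return journal_counts, clc_counts, main_class_counts


journal_counts, clc_counts, main_class_counts = frequency_tables(papers)

# List codes and classes from most to least frequent.
print(clc_counts.most_common())
print(main_class_counts.most_common())
```

Because a paper may carry several CLC codes, each code is counted once per paper that lists it, which matches the multi-label nature of the classification task described in the abstract.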
Authors
Jiang Yanting (蒋彦廷)
Hu Renfen (胡韧奋)
Affiliations: Chengdu Aeronautic Polytechnic, Chengdu 610100; Sichuan University of Media and Communications, Chengdu 611745; Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875; School of Chinese Language & Culture, Beijing Normal University, Beijing 100875
Source
Journal of Intelligence (《情报杂志》)
CSSCI
PKU Core Journal (北大核心)
2021, No. 12, pp. 169-176 (8 pages)
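Funding
This work is one of the research outputs of the following projects:
National Natural Science Foundation of China, Young Scientists Fund project "Research on Knowledge Representation and Processing for the Intelligent Collation of Ancient Books" (No. 62006021)
Ministry of Education Humanities and Social Sciences Fund project "Intelligent Evaluation Methods for the Text Readability of International Chinese Teaching Materials" (No. 18YJAZH112)
State Language Commission 13th Five-Year Research Plan Key Project (Global Chinese Alliance special program) "Research on Intelligent Text Readability Evaluation Methods for International Chinese Education and Construction of an Analysis System" (No. ZDI135-141)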