Abstract
Titles of National Social Science Foundation (NSSF) projects were collected and segmented into words to form training and test corpora. On this basis, discipline-category determination models for NSSF projects were built with a conditional random field (CRF) model and a bidirectional long short-term memory (BiLSTM) model, validated from multiple angles and at multiple levels, and compared with the experimental results of a support vector machine (SVM) model. Using the corresponding performance evaluation metrics, the paper verifies the overall performance of traditional machine learning models on a small-scale corpus, shows that adding manually designed features does not make the overall performance of the CRF model stand out, and presents a case analysis of the CRF model's performance.
Authors
沈思
翁小颖
孙豪
王东波
SHEN Si; WENG Xiaoying; SUN Hao; WANG Dongbo (School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China; College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China)
Source
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2020, No. 4, pp. 118-124 (7 pages)
Journal of Hunan University:Natural Sciences
Funding
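National Natural Science Foundation of China (71974094)
National Social Science Foundation of China (19FTQB015)
Natural Science Foundation of Jiangsu Province (BK20190450)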
Keywords
machine learning
conditional random field
National Social Science Foundation
text mining