Abstract
With the rapid development of China's economy and the continuous improvement of its capacity for technological innovation, the efficient organization and classification of information is the basis for personalized industry management and tracking analysis. Based on the characteristics and development patterns of industry information, this paper proposes a Chinese industry classification model built on the fastText algorithm. First, a keyword database for industry classification is constructed. Next, word segmentation and weight calculation are carried out using the feature lexicon. Finally, a classifier model is built to classify Chinese industry text automatically. The experiment used 80,000 test documents covering business scope, enterprise information, and public opinion information. The results show that the proposed model achieves higher classification accuracy than Bayes, decision tree, KNN, and other classification algorithms, and that it works well in application.
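The first two steps of the pipeline (lexicon-driven segmentation and weight calculation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the lexicon entries, the sample texts and labels, the choice of forward maximum matching for segmentation, and the choice of smoothed TF-IDF for weighting. The helper names `segment`, `tfidf`, and `to_fasttext_line` are hypothetical. The only fastText convention relied on is the `__label__` prefix that its supervised mode expects at the start of each training line.

```python
import math
from collections import Counter

# Hypothetical keyword lexicon; the paper's actual keyword database is not shown here.
LEXICON = {"软件开发", "软件", "技术咨询", "餐饮服务", "食品销售"}
MAX_LEN = max(len(word) for word in LEXICON)


def segment(text):
    """Forward maximum matching: at each position take the longest lexicon entry."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + n] in LEXICON:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here, so skip one character
    return tokens


def tfidf(docs_tokens):
    """Smoothed TF-IDF weight per token per document (an assumed weighting scheme)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for toks in docs_tokens for t in set(toks))
    weighted = []
    for toks in docs_tokens:
        counts = Counter(toks)
        weighted.append({
            t: (c / len(toks)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, c in counts.items()
        })
    return weighted


def to_fasttext_line(label, tokens):
    """Format one training example in fastText's supervised text format."""
    return "__label__" + label + " " + " ".join(tokens)


# Illustrative business-scope snippets with made-up labels.
SAMPLES = [
    ("software", "软件开发、技术咨询"),
    ("catering", "餐饮服务;食品销售"),
    ("software", "软件销售及技术咨询"),
]

tokenized = [segment(text) for _, text in SAMPLES]
weights = tfidf(tokenized)
lines = [to_fasttext_line(label, toks)
         for (label, _), toks in zip(SAMPLES, tokenized)]

for line in lines:
    print(line)
```

Assuming the `fasttext` Python package is installed, lines like these could be written to a text file and passed to `fasttext.train_supervised(input=...)` to train the classifier. How the computed weights enter the authors' model is not stated in the abstract, so the sketch only computes them and leaves that step open.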
Authors
WU Zhen (吴震)
RAN Xiaoyan (冉晓燕)
MIAO Quan (苗权)
LIU Chunyan (刘纯艳)
ZHANG Dong (张栋)
WEI Na (魏娜)
Affiliations:
National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
Beijing Branch of National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100055, China
Great Wall Computer Software & System Inc., Beijing 100190, China
Source
Journal of Beijing University of Aeronautics and Astronautics (《北京航空航天大学学报》)
EI
CAS
CSCD
PKU Core (北大核心)
2022, Issue 2, pp. 193-198 (6 pages)