Abstract
To reduce the time and labor cost of comparison work in the shipping domain, this paper analyzes and compares commonly used text classification algorithms and adapts them to that domain. The traditional feature extraction algorithm (TF-IDF) is improved so that term weights are assigned more reasonably, and the fasttext classification algorithm is improved to resolve the problem of supplying input-layer parameters, yielding the proposed C-fasttext algorithm. In experiments, the classification performance of the C-fasttext model is compared with that of the naive Bayes algorithm, the support vector machine algorithm, and the traditional fasttext model. The results show that the improved C-fasttext algorithm achieves the highest accuracy, 91.59%; the traditional fasttext classification algorithm reaches 88.27%; the naive Bayes method reaches 76.19%; and the support vector machine algorithm is lowest, at only 59.98%. The matching accuracy of the improved algorithm exceeds 90% and its corpus coverage exceeds 95%, which meets the experimental requirements.
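For readers unfamiliar with the baseline that the paper modifies, the following is a minimal sketch of classic TF-IDF weighting. It is not the improved weighting proposed in the paper, whose details the abstract does not give. The function name `tf_idf` and the toy corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (classic TF-IDF).

    Baseline formulation only: tf = count / document length,
    idf = log(N / document frequency). This is NOT the paper's improved variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for illustration only.
docs = [
    ["hull", "steel", "plate"],
    ["hull", "engine", "pump"],
    ["engine", "pump", "valve"],
]
weights = tf_idf(docs)
```

In this sketch a term that appears in fewer documents (such as "steel") receives a larger weight than one shared across documents (such as "hull"), and a term present in every document receives a weight of zero.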
Authors
CHEN Haotian (陈浩天)
LIU Xiaodong (刘晓东)
CHEN Haotian; LIU Xiaodong (Wuhan Research Institute of Posts and Telecommunications, Wuhan 430070, China; Wuhan Hongxu Information Technology Co., Ltd., Wuhan 430070, China)
Source
Electronic Design Engineering (《电子设计工程》)
2023, Issue 2, pp. 72-76 (5 pages)