Rice Knowledge Text Classification Based on Deep Convolution Neural Network
Times cited: 12
Abstract: To address inaccurate text feature extraction and the degradation of classification performance that occurs as networks grow deeper, a rice knowledge text classification method based on a deep convolutional neural network is proposed. Given the characteristics of rice knowledge texts, Word2Vec was used for text vectorization and compared with OneHot, TF-IDF and Hashing. Word2Vec achieved higher classification accuracy (86.44%) and effectively alleviates the sparsity and incompleteness of text vector representations. By adjusting the structure of the residual network (ResNet) and analyzing how the residual module structure and network depth affect the classification network, nine classification network structures were built. Tests showed that the network with a 4-layer residual module structure had the best feature extraction accuracy, with a Top-1 accuracy of 99.79%. Using the selected 4-layer residual module structure as the backbone and replacing its pooling layer with a capsule network (CapsNet), a rice knowledge text classification model was designed. A comparison with six text classification models (FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN) showed that the proposed model accurately classifies rice knowledge texts across different sample sizes and levels of complexity: its precision, recall and F1 score are no lower than 95.17%, 95.83% and 95.50%, respectively, and its accuracy is 98.62%. The model classifies rice knowledge texts accurately and efficiently and meets practical application requirements.

Extracting data on rice weeds, pests, diseases and cultivation management from agricultural text is a typical text classification problem, and it underpins key-information extraction, text data mining and intelligent agricultural question answering. Chinese texts, and agricultural texts in particular, tend to be redundant, sparse and poorly standardized. Deep learning can extract the key features of a text automatically, and the resulting models adapt and transfer well. Therefore, to address the degradation of classification performance caused by inaccurate text feature extraction and deeper network hierarchies, a rice knowledge text classification method for a question-answering system was proposed. Chinese text data on rice pests, weeds, and cultivation and management were crawled with Scrapy (a Python crawling framework) from sources such as the Hownet expert online system and a crop-planting Q&A website, and used as training and test samples. The texts were segmented with Jieba, and useless symbols and stop words were removed. Because Chinese segmentation results depend heavily on the segmentation lexicon, a rice-related corpus was built on the basis of the Sogou agricultural corpus to extend the basic Jieba dictionary. This improved the precision of segmentation of rice knowledge texts, reduced incorrect and missed segmentations, and improved recognition of specialized terms for rice diseases, insect pests, weeds, pesticides, and cultivation and management. Word2Vec was used to vectorize the text data and was compared with One-Hot, TF-IDF and Hashing; Word2Vec effectively addressed typical problems of text vectors such as sparsity and incomplete information. Starting from the basic ResNet structure, nine rice knowledge text classification models were constructed by varying the design of the residual module and the network depth. Tests showed that the network with a 4-layer residual module structure achieved good feature extraction accuracy, with a Top-1 accuracy of 99.79%. In a convolutional neural network, the pooling layer performs down-sampling, which discards some information about the relative positions of text phrases and thus reduces classification accuracy. The selected 4-layer residual module structure was therefore used as the backbone, its pooling layer was replaced by CapsNet, and a rice knowledge text classification model, named RIC Net, was designed. A comparison with six text classification models (FastText, BiLSTM, Atten-BiGRU, RCNN, DPCNN and TextCNN) showed that the proposed model precisely classifies rice knowledge texts of different sample sizes and levels of complexity: its precision, recall and F1 score were no lower than 95.17%, 95.83% and 95.50%, respectively, and its accuracy reached 98.62%. The model classifies rice knowledge texts accurately and efficiently and meets practical application requirements.
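The abstract notes that Chinese segmentation depends heavily on the lexicon, which is why the Jieba dictionary was extended with rice terms. The sketch below illustrates that effect with a simple forward-maximum-matching segmenter. This is a deliberately swapped-in technique, not Jieba's algorithm (Jieba builds a prefix-dictionary DAG, picks the highest-probability path and falls back to an HMM for unknown words). The lexicons, stop-word list and sample question are hypothetical; with the real library, domain words are typically added through `jieba.load_userdict` or `jieba.add_word`.

```python
def fmm_segment(text, lexicon):
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when nothing matches."""
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stopwords(tokens, stopwords):
    """Drop stop words and punctuation tokens."""
    return [t for t in tokens if t not in stopwords]


# Hypothetical lexicons: a generic base dictionary plus a few rice-domain terms.
base_lexicon = {"水稻", "发生", "怎么", "防治"}
rice_terms = {"稻瘟病", "纹枯病", "二化螟"}
stopwords = {"了", "？", "，", "的"}

# "Rice blast has broken out in the rice; how should it be controlled?"
question = "水稻发生了稻瘟病怎么防治？"
print(remove_stopwords(fmm_segment(question, base_lexicon), stopwords))
# ['水稻', '发生', '稻', '瘟', '病', '怎么', '防治']
print(remove_stopwords(fmm_segment(question, base_lexicon | rice_terms), stopwords))
# ['水稻', '发生', '稻瘟病', '怎么', '防治']
```

Without the domain terms, the disease name 稻瘟病 (rice blast) is broken into single characters; with them it survives as one token that a classifier can use.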
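The abstract compares Word2Vec with One-Hot, TF-IDF and Hashing representations. A minimal pure-Python sketch of the three sparse baselines, on hypothetical pre-segmented documents, shows why they are sparse: each vector has one slot per vocabulary word (or hash bucket), and most slots are zero for any single short text. The smoothed IDF formula, the CRC32 hash and `n_features=8` are illustrative assumptions, not the paper's settings.

```python
import math
import zlib
from collections import Counter

# Hypothetical, already-segmented documents (stop words removed).
docs = [
    ["水稻", "稻瘟病", "防治"],
    ["水稻", "二化螟", "防治", "农药"],
    ["水稻", "施肥", "管理"],
]
vocab = sorted({w for d in docs for w in d})


def one_hot(doc):
    """Binary bag of words over the vocabulary."""
    present = set(doc)
    return [1 if w in present else 0 for w in vocab]


def tf_idf(doc):
    """Normalized term frequency times a smoothed IDF, ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    counts = Counter(doc)
    weights = []
    for w in vocab:
        df = sum(1 for d in docs if w in d)
        idf = math.log((1 + n) / (1 + df)) + 1
        weights.append(counts[w] / len(doc) * idf)
    return weights


def hashing(doc, n_features=8):
    """Feature hashing: count each token in bucket CRC32(token) mod n_features."""
    vec = [0] * n_features
    for w in doc:
        vec[zlib.crc32(w.encode("utf-8")) % n_features] += 1
    return vec


def sparsity(vec):
    """Fraction of zero entries in a vector."""
    return sum(1 for x in vec if x == 0) / len(vec)


for d in docs:
    print(d, one_hot(d), [round(x, 3) for x in tf_idf(d)], hashing(d),
          round(sparsity(one_hot(d)), 2))
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make hashed features change between runs. Note also that TF-IDF gives the ubiquitous word 水稻 (rice) a lower weight than the rarer 稻瘟病 (rice blast), yet none of these representations can tell that two different words are related, which is the gap dense Word2Vec vectors are meant to close.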
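Word2Vec produces dense vectors by learning from words that co-occur within a context window. The abstract does not say whether the skip-gram or CBOW variant was used; the sketch below assumes skip-gram and shows only how its (center, context) training pairs are generated from a segmented sentence, not the training itself. The `window` values and the sample sentence are illustrative. In practice a library such as gensim (`gensim.models.Word2Vec`, with `sg=1` selecting skip-gram) performs the training.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within +/- window positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical segmented sentence: rice / rice blast / control / pesticide.
sentence = ["水稻", "稻瘟病", "防治", "农药"]
for center, context in skipgram_pairs(sentence, window=1):
    print(center, "->", context)
```

Each pair becomes one training example in which the model learns to predict the context word from the center word, so words that appear in similar contexts end up with similar vectors.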
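The nine candidate networks differ in residual-module design and depth. A residual module adds its input back to the output of a small stack of convolutions, y = ReLU(x + F(x)), so an extra module can fall back to an identity mapping instead of degrading accuracy as depth grows. The single-channel, pure-Python sketch below assumes a conv-ReLU-conv body with zero padding and no batch normalization; it illustrates the shortcut only and is not the paper's module design, which the abstract does not specify. The four-block stack merely echoes the selected 4-layer structure, and the kernels are hypothetical.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals len(x)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, k1, k2):
    """y = ReLU(x + F(x)) with F = conv -> ReLU -> conv and an identity shortcut."""
    f = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(x, f)])


def residual_stack(x, blocks):
    """Apply residual blocks in sequence; blocks is a list of (k1, k2) kernel pairs."""
    for k1, k2 in blocks:
        x = residual_block(x, k1, k2)
    return x


x = [1.0, 2.0, 3.0, 0.5]
zero = [0.0, 0.0, 0.0]
smooth = [0.25, 0.5, 0.25]
print(residual_stack(x, [(zero, zero)] * 4))  # zero-weight blocks pass x through unchanged
print(residual_stack(x, [(smooth, smooth)] * 4))
```

The first call shows the property that motivates residual networks: even four blocks with all-zero weights leave a non-negative input unchanged, so adding depth does not by itself destroy the signal.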
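In RIC Net the pooling layer is replaced by a capsule network so that down-sampling does not discard the relative positions of text phrases. The abstract does not describe the capsule layer itself; the sketch below assumes the routing-by-agreement scheme of the original CapsNet (squash non-linearity, softmax coupling coefficients, agreement measured by dot products). Capsule sizes, the iteration count and the sample vectors are hypothetical, and the prediction vectors `u_hat`, which a real layer computes with learned transformation matrices, are given directly.

```python
import math


def length(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def squash(s):
    """Capsule non-linearity: keeps the direction of s and maps its length into [0, 1)."""
    sq = sum(a * a for a in s)
    if sq == 0.0:
        return [0.0] * len(s)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * a for a in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement.

    u_hat[i][j] is the prediction vector from lower capsule i for upper capsule j.
    Returns the output vectors v_j of the upper capsules.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for it in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients, sum to 1 over j
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(s_j) for s_j in s]
        if it < iterations - 1:
            for i in range(n_in):
                for j in range(n_out):
                    # Agreement: dot product between a prediction and the current output.
                    b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Three lower capsules that agree on upper (class) capsule 0 and disagree on capsule 1.
u_hat = [
    [[1.0, 0.0], [0.1, 0.0]],
    [[1.0, 0.1], [-0.1, 0.0]],
    [[0.9, 0.0], [0.0, 0.1]],
]
outputs = dynamic_routing(u_hat)
print([round(length(v), 3) for v in outputs])
```

The length of each output capsule acts as the score for its class; because the lower capsules agree on capsule 0, routing sends more of their output there and it ends up much longer than capsule 1.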
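The reported figures are precision, recall, F1, accuracy and Top-1 accuracy. The minimal sketch below computes them per class (one-vs-rest) from predictions; the class names, labels and scores are hypothetical toy data. The abstract does not state whether the reported values are per-class minima or macro averages, although the "no lower than" wording suggests per-class values. As a consistency check, the harmonic mean of the two reported lower bounds, 2 × 0.9517 × 0.9583 / (0.9517 + 0.9583) ≈ 0.9550, matches the reported F1 lower bound of 95.50%.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = (precision, recall, f1_from(precision, recall))
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def top_k_accuracy(y_true, scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring labels."""
    hits = 0
    for t, row in zip(y_true, scores):
        best = sorted(range(len(labels)), key=lambda i: row[i], reverse=True)[:k]
        hits += any(labels[i] == t for i in best)
    return hits / len(y_true)


# Hypothetical classes and predictions.
labels = ["disease", "pest", "weed", "cultivation"]
y_true = ["disease", "disease", "pest", "pest", "weed", "cultivation"]
y_pred = ["disease", "pest", "pest", "pest", "weed", "cultivation"]
print(accuracy(y_true, y_pred))
for label, (p, r, f) in per_class_metrics(y_true, y_pred, labels).items():
    print(label, round(p, 3), round(r, 3), round(f, 3))
print(round(f1_from(0.9517, 0.9583), 4))  # 0.955
```

For a single-label classifier that predicts the highest-scoring class, Top-1 accuracy and plain accuracy coincide; they differ only when k > 1.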
Authors: FENG Shuai; XU Tongyu; ZHOU Yuncheng; ZHAO Dongxue; JIN Ning; WANG Haoriqin (College of Information and Electrical Engineering, Shenyang Agricultural University, Shenyang 110161, China; Liaoning Agricultural Information Technology Center, Shenyang Agricultural University, Shenyang 110161, China)
Source: Transactions of the Chinese Society for Agricultural Machinery, 2021, No. 3, pp. 257-264 (8 pages). Indexed in EI, CAS, CSCD and the Peking University core journal list.
Funding: National Key Research and Development Program of China (2018YFD0300309).
Keywords: rice knowledge text; text classification; deep convolutional neural network; vectorization; feature extraction; classification model
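Related literature: references (13); secondary references (85); co-citing literature (437); co-cited literature (177); citing literature (12); secondary citing literature (59)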