Clustering of Agricultural Trade Friction News Text Based on Improved Text Representation and Its Application Prospects
Abstract Traditional text representation methods, when applied to news text on agricultural trade friction, yield high-dimensional, sparse data and represent structural and semantic information inadequately, which raises the time and computational complexity of text clustering. To address these problems, this study builds on the Word2Vec word-vector representation and combines it with TF-IDF to represent news text, proposing the Keywords-based Text Representation Matrix (KTRM), which serves as input to the deep learning clustering model DEC. Parameter-tuning experiments, a comparison of text representation methods, and a comparison of clustering methods were carried out on a labeled news corpus, and the method was then applied to real news text on agricultural trade friction. The results showed that both clustering accuracy (ACC) and normalized mutual information (NMI) improved significantly, verifying the effectiveness of the method. Finally, the application prospects of the method are discussed.
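The abstract does not spell out how KTRM is assembled. The sketch below is one plausible reading, not the authors' implementation: rank each document's terms by TF-IDF, keep the top k as keywords, and stack their word vectors into a k x d matrix. The function names `tfidf_scores` and `ktrm`, the toy corpus, and the two-dimensional `toy_vectors` (standing in for trained Word2Vec vectors) are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def ktrm(doc_scores, vectors, k, dim):
    """Stack the vectors of the k highest-scoring keywords into a k x dim matrix."""
    # Sort by score, highest first; ties are broken alphabetically.
    ranked = sorted(doc_scores, key=lambda t: (-doc_scores[t], t))
    # Skip terms that have no word vector (assumption: out-of-vocabulary terms are dropped).
    keywords = [t for t in ranked if t in vectors][:k]
    matrix = [list(vectors[t]) for t in keywords]
    # Zero-pad so every document yields a matrix of the same shape.
    matrix += [[0.0] * dim for _ in range(k - len(matrix))]
    return keywords, matrix


# Toy tokenized corpus and made-up 2-d vectors (hypothetical, for illustration only).
docs = [
    ["soybean", "tariff", "tariff", "trade"],
    ["pork", "import", "trade"],
    ["soybean", "export", "quota", "trade"],
]
toy_vectors = {
    "soybean": [0.9, 0.1], "tariff": [0.2, 0.8], "pork": [0.7, 0.3],
    "import": [0.1, 0.9], "export": [0.3, 0.7], "quota": [0.4, 0.6],
    "trade": [0.5, 0.5],
}

scores = tfidf_scores(docs)
keywords, matrix = ktrm(scores[0], toy_vectors, k=2, dim=2)
print(keywords, matrix)
```

In this toy corpus "trade" occurs in every document, so its TF-IDF score is zero and it is never chosen ahead of a more distinctive term. A fixed-shape matrix per document is what makes the representation usable as input to a downstream clustering model such as DEC.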
Authors Pan Yao; Wang Mo; Wang Jian (Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, Key Laboratory of Agricultural Data, Ministry of Agriculture and Rural Affairs, Beijing 100081)
Source Agricultural Outlook (《农业展望》), 2020, No. 6, pp. 80-88 (9 pages)
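Funding Key Project of the Basic Scientific Research Fund of the Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, "Research on Building Scientific Data Publishing Capacity" (JBYW-AII-2020-35); Basic Scientific Research Fund Project of the Institute of Agricultural Information, Chinese Academy of Agricultural Sciences, "Quantitative Study of the Impact of China-US Trade Friction on the Global and Chinese Soybean Markets" (JBYW-AII-2020-17); National Social Science Fund of China project "Research on Relevance Criteria and Usage Patterns of Scientific Data Users" (14BTQ056).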
Keywords trade friction of agricultural products; text representation; word vector; text clustering