
Text Information Extraction Based on the TF-IDF Algorithm (cited by: 12)
Abstract: With the arrival of the big data era, data volumes are growing geometrically. Text is the kind of information people encounter most often, and key information, as a concise summary of a text's topic, gives readers a quick way to understand what the text is about. How to mine key information from text quickly and effectively has therefore become a central research question. This paper takes the Benxi municipal government work report as its research object. It abstracts the text information, uses the TF-IDF algorithm to automatically extract frequently occurring phrases in batch, counts how often each frequent phrase occurs, and extracts the key information from those counts. The extraction results show the overall direction of the government's development of Benxi and indicate that the city is actively responding to national calls while steadily moving its government work forward.
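Authors: YU Tao (于韬), WANG Hong-yan (王洪岩) (Liaoning Institute of Science and Technology, Benxi, Liaoning 117004, China)
Institution: Liaoning Institute of Science and Technology (辽宁科技学院)
Source: Science & Technology Vision (《科技视界》), 2018, No. 16, pp. 117-118 (2 pages)
Funding: Intelligent Recommendation System Based on a Literature Knowledge Graph (201811430044); Liaoning Provincial Department of Education Science and Technology Research Youth Project (L2017lkyqn-01); Liaoning Institute of Science and Technology Youth Fund (Qn201603); Liaoning Institute of Science and Technology Soft Science Project for Serving Local Innovation and Development (20162rkx-06)
Keywords: key information extraction; TF-IDF algorithm; frequent phrases; word frequency statistics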
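The abstract describes scoring phrases with TF-IDF and keeping the highest-scoring ones as key information. The record does not give the paper's exact formula or preprocessing, so the following is only a minimal sketch under stated assumptions: documents are already segmented into tokens, term frequency is the count divided by document length, and inverse document frequency is the unsmoothed log(N / df). The function names `tf_idf` and `top_terms` and the toy tokens are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists (assumed to be segmented already).
    Weight = (term count / document length) * log(N / document frequency).
    This unsmoothed variant is an assumption, not the paper's stated formula.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k terms with the highest TF-IDF weight in one document."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be segmented report text.
    reports = [
        ["economy", "growth", "economy", "environment"],
        ["environment", "tourism", "growth"],
        ["economy", "tourism", "tourism", "steel"],
    ]
    for i, w in enumerate(tf_idf(reports)):
        print(i, top_terms(w, k=2))
```

With this weighting, a term that occurs in every document gets weight zero, which is why TF-IDF favors phrases that are frequent in one report but not common to all of them. Chinese text would first need a word-segmentation step, which is outside this sketch.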