Research on Keyword Extraction Technology Based on Remote Learning (基于远程学习的关键词提取技术研究) Cited by: 1

Research on Extraction Technology Based on Remote Learning
Abstract With the development of Internet technology, the explosive growth in the volume of text has made text data harder to process, and traditional text clustering and keyword extraction techniques cannot adequately meet the need for precise filtering of large data sets. To address this, the paper combines text clustering with keyword extraction. It first clusters the text with a latent semantic model based on the LDA algorithm, which yields both the clustering result and the topic words extracted by LDA. The FP-growth algorithm is then used to analyze the LDA output and mine the text, producing a Chinese keyword set. Drawing on the idea of a web knowledge base, a Chinese comparison algorithm built on Baidu Baike (Baidu encyclopedia) is proposed to screen the Chinese keyword set, filtering out many noise words. Experiments comparing against existing methods show that the approach performs text clustering and keyword extraction well on a given Chinese corpus; in particular, after adding the Baidu Baike-based remote-learning screening step, the accuracy of the system improves substantially.
Authors CAO Cong-hui (曹聪慧); LAN Qiang (兰强); HOU Qun (侯群); QI Wei-min (漆为民) (School of Artificial Intelligence, Jianghan University, Wuhan 430056, Hubei; Dongfeng Motor Finance Co., Ltd., Wuhan 430056, Hubei)
Source Computer & Telecommunication (《电脑与电信》), 2021, No. 8, pp. 1-5, 9 (6 pages in total)
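Funding Guiding Project of the Scientific Research Program of the Hubei Provincial Department of Education (No. B2020224); 2019 Open Project of the Hubei Provincial Key Discipline of Management Science and Engineering, Jianghan University (No. ZDXK2019YB05); Jianghan University Research Start-up Fund for High-Level Talent (No. 2019032).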
Keywords text clustering; keyword extraction; LDA algorithm; remote learning; Chinese alignment algorithm
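
The abstract describes a three-stage pipeline: LDA topic words, frequent-pattern mining over those words, then screening against an encyclopedia. The following is a minimal sketch of the last two stages only, under stated assumptions and not the authors' implementation. It assumes the LDA topic words per document are already available, uses brute-force itemset counting as a plain stand-in for FP-growth (for itemsets up to `max_size` it finds the same frequent itemsets, without the FP-tree's efficiency), and replaces the Baidu Baike lookup with a hypothetical in-memory set of entry titles matched exactly. The abstract does not specify how the Chinese comparison algorithm works, so the exact-match rule, the function names, and the sample data are all illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {frozenset of words: support count} for itemsets up to max_size.

    Brute-force counting, used as a simple stand-in for FP-growth.
    """
    counts = Counter()
    for words in transactions:
        unique = sorted(set(words))  # ignore repeated words within one document
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def filter_by_knowledge_base(candidates, entry_titles):
    """Keep only candidates that exactly match a knowledge-base entry title.

    `entry_titles` is a hypothetical local set standing in for Baidu Baike.
    """
    return [word for word in candidates if word in entry_titles]


def extract_keywords(topic_words_per_doc, entry_titles, min_support=2):
    """Mine frequent words from topic-word lists, then screen out noise words."""
    itemsets = frequent_itemsets(topic_words_per_doc, min_support)
    # Every word that occurs in at least one frequent itemset is a candidate.
    candidates = sorted({word for itemset in itemsets for word in itemset})
    return filter_by_knowledge_base(candidates, entry_titles)


# Illustrative topic words for three documents (assumed LDA output).
topic_words_per_doc = [
    ["文本", "聚类", "算法", "我们"],  # text, clustering, algorithm, we
    ["文本", "聚类", "提出"],          # text, clustering, propose
    ["算法", "文本", "我们"],          # algorithm, text, we
]
# Hypothetical set of encyclopedia entry titles.
entry_titles = {"文本", "聚类", "算法"}

keywords = extract_keywords(topic_words_per_doc, entry_titles, min_support=2)
```

In this toy input, "我们" (we) is frequent enough to survive the mining step but has no matching entry title, so the screening step drops it as a noise word; "提出" (propose) never reaches the screening step because it falls below the support threshold.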
