
A Survey of Text Mining Techniques (cited by: 29)

On Technology of Document Mining
Abstract: Document mining (DM), also known as text mining, is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. Research on document mining will greatly improve people's ability to extract information from massive volumes of text data, and it has high commercial value. The paper first reviews the research status of DM, then lays out a framework for DM and describes in detail the information extraction techniques, the related mining techniques, and the evaluation methods used in it. Finally, it points out the importance of DM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.
Authors: Mei Xin (梅馨), Xing Guifen (邢桂芬)
Source: Journal of Jiangsu University: Natural Science Edition (《江苏大学学报(自然科学版)》), indexed in EI and CAS, 2003, No. 5, pp. 72-76 (5 pages)
Funding: Key Science and Technology Fund project of the Ministry of Education (1633000004)
Keywords: document mining; information extraction; information retrieval; data mining; knowledge acquisition