Abstract
Document mining (DM), also known as text mining, is the process of analyzing a semantically rich document or set of documents to understand the content and meaning of the information they contain. Research on document mining will greatly improve people's ability to extract information from massive volumes of text data, and it has high commercial value. This paper first reviews the state of research on DM. It then lays out a framework for DM and describes in detail the information extraction techniques, the related mining techniques, and the evaluation methods used in document mining. Finally, it points out the importance of DM in knowledge discovery and highlights the upcoming challenges of document mining and the opportunities it offers.
Source
《江苏大学学报(自然科学版)》
EI
CAS
2003, No. 5, pp. 72-76 (5 pages)
Journal of Jiangsu University: Natural Science Edition
Funding
Supported by the Key Science and Technology Fund of the Ministry of Education (Project No. 1633000004)
Keywords
document mining (text mining)
information extraction
information retrieval
data mining
knowledge discovery
knowledge acquisition