摘要
作为从浩瀚的 Web信息资源中发现潜在的、有价值知识的一种有效技术 ,Web挖掘正悄然兴起 ,倍受关注 .目前 ,Web挖掘的研究正处于发展阶段 ,尚无统一的结论 ,需要国内外学者在理论上开展更多的讨论 .同时 ,Web挖掘系统的开发对其研究也将起到很大推进作用 .首先探讨了 Web挖掘的有关理论 ,从 Web挖掘的定义、Web挖掘与 Web信息检索的关系、Web挖掘任务的分类与功能等方面加以阐述 .然后重点分析了 Web文本挖掘的方法 ,包括 :文本的特征表示、文本分类与文本聚类 .在此基础上简单介绍了一个 Web文本挖掘系统原型Web Miner.Web Miner采用了多 agent体系结构 ,将多维文本分析与文本挖掘这两种技术有机地结合起来 ,以帮助用户快速、有效地挖掘 Web上的 HTML 文档 .
With the flood of information on the Web, Web mining is a new research issue which draws great interest from many communities. Currently, there is no agreement about Web mining yet. It needs more discussion among scientists in order to define what it is exactly. Meanwhile, the development of Web mining system will promote its research in turn. In this paper, a systemic discussion about the principle of Web mining is presented, including the definition, the relationship between information mining and retrieval on the Web, the taxonomy and function. Then the methods of text mining on the Web are discussed in detail and a prototype of Web text mining system WebMiner is introduced. WebMiner is a multi agent system which combines text mining and multi dimension text analysis in order to help user in mining HTML documents on the Web efficiently and effectively.
出处
《计算机研究与发展》
EI
CSCD
北大核心
2000年第5期513-520,共8页
Journal of Computer Research and Development
关键词
文本挖掘
文本分类
文本聚类
信息检索
WEB
Web mining, text mining, text categorization, text clustering, multi dimension text analysis