Abstract
Information overload has become a major problem in the course of global informatization. The Web holds a vast amount of information, yet finding what one needs on it is difficult. People interact with the Web by clicking through links and by using search engines, but neither lets them obtain the information they need accurately and quickly. Web data mining techniques are a good way to address this problem. This paper presents the basic theory of Web data mining and, according to the object being mined, divides it into three categories: Web content mining, Web link structure mining, and Web usage mining (mining of access information). Exploiting the particular structural properties of HTML pages, it proposes a general framework for Web data mining systems and discusses some specific implementation techniques.
Source
《计算机工程与设计》
CSCD
2002, No. 7, pp. 36-38, 45 (4 pages in total)
Computer Engineering and Design