Abstract
The success of Web-based information management, business intelligence, and decision-making systems depends on selecting and using high-quality information from the Web. However, Web source quality is a serious problem because of the particular characteristics of the Web: the high dynamics and autonomy of Web sources, the enormous volume and varied types of Web data, and the differing quality requirements of Web applications. Research on Web source quality has begun both in China and abroad. This paper analyzes the quality requirements that advanced Web-based applications (e.g., business intelligence) place on Web sources and information, and identifies the challenges that Web source quality poses. It surveys the state of the art in Web source quality pattern discovery and evaluation methods, discusses in depth the principles of applying data mining and related techniques to discover and handle Web source quality anomalies, and points out the open problems and the research directions in Web source quality mining that need further study.
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal
2010, No. 8, pp. 201-207 (7 pages)
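Funding
National Natural Science Foundation of China (60573165)
Ministry of Education fund for returned overseas scholars
Southwest Jiaotong University Science and Technology Development Fund (2007A14)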
Keywords
Web source quality
Quality pattern mining
Metadata management
Quality evaluation approaches