3 articles found
A Data Mining System for Large-Scale Databases (cited by: 28)
1
Authors: 钱卫宁, 魏藜, 王焱, 钱海蕾, 周傲英. 《软件学报》 (Journal of Software), EI, CSCD, PKU Core. 2002, No. 8, pp. 1540-1545 (6 pages)
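Data mining combines database technology, artificial intelligence, and statistics, and is currently an active research area. To integrate the main data mining techniques and make them work together, a data mining system named Golden-Eye was developed on the basis of research into basic data mining algorithms. The system implements some of the latest results in data mining research. It integrates two data preparation operations, generalization and data cleaning, along with basic data mining operations: association rule discovery, exception rule discovery, temporal pattern discovery, classifier construction, and cluster analysis. It also provides basic management of mining operations and graphical display of results. The framework is designed for completeness, coordination, and efficiency. From the bottom up, it combines a storage control module, a data preprocessing module, a mining operation module, and a mining repository management module. The bottom layer manages all data, including intermediate results, in a unified way, and the top layer gives users a visual interface. Experimental results show that the system can successfully carry out user-specified data mining operations on large-scale databases.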
Keywords: large-scale database; data mining system; data preprocessing; storage control; knowledge discovery
A New Classification Method to Overcome Over-Branching
2
Authors: 周傲英, 钱卫宁, 钱海蕾, 金文. Journal of Computer Science & Technology, SCIE, EI, CSCD. 2002, No. 1, pp. 18-27 (10 pages)
Classification is an important technique in data mining. The decision trees built by most of the existing classification algorithms commonly feature over-branching, which will lead to poor efficiency in the subsequent classification period. In this paper, we present a new value-oriented classification method, which aims at building accurately proper-sized decision trees while reducing over-branching as much as possible, based on the concepts of frequent-pattern-node and exceptive-child-node. The experiments show that while using relevant analysis as pre-processing, our classification method, without loss of accuracy, can eliminate the over-branching greatly in decision trees more effectively and efficiently than other algorithms do.
Clustering DTDs: An Interactive Two-Level Approach
3
Authors: 周傲英, 钱卫宁, 钱海蕾, 张龙, 梁宇奇, 金文. Journal of Computer Science & Technology, SCIE, EI, CSCD. 2002, No. 6, pp. 807-819 (13 pages)
XML (eXtensible Markup Language) is a standard which is widely applied in data representation and data exchange. However, as an important concept of XML, DTD (Document Type Definition) is not taken full advantage of in current applications. In this paper, a new method for clustering DTDs is presented, and it can be used in XML document clustering. The two-level method clusters the elements in DTDs and clusters DTDs separately. Element clustering forms the first level and provides element clusters, which are the generalization of relevant elements. DTD clustering utilizes the generalized information and forms the second level in the whole clustering process. The two-level method has the following advantages: 1) It takes into consideration both the content and the structure within DTDs; 2) The generalized information about elements is more useful than the separated words in the vector model; 3) The two-level method facilitates the searching of outliers. The experiments show that this method is able to categorize the relevant DTDs effectively.