5 articles found
A Scalable Classification Algorithm Implemented with Database Technology  Cited by: 14
1
Authors: 刘红岩 陆宏钧 陈剑 《软件学报》 (Journal of Software) EI CSCD PKU Core, 2002, No. 6, pp. 1075-1081 (7 pages)
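This paper focuses on efficient, scalable classification algorithms that tightly integrate data-mining classification techniques with database technology. It proposes a method for constructing a classifier based on grouping and counting, in which the main computational tasks are carried out with the database system's Structured Query Language (SQL). To improve execution efficiency, it also proposes optimization strategies and a strategy for pruning redundant rules, and it combines the discovery of classification rules with the selection of relevant attributes. With these methods and strategies, the classification algorithm can quickly discover a concise set of rules from large data sets. Besides accuracy comparable to existing classification algorithms and high execution efficiency, the algorithm scales well with both the number of training tuples and the number of attributes, and it is easy to implement.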
Keywords: database; scalability; classification algorithm; data mining; Structured Query Language; knowledge discovery
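As a rough illustration of the grouping-and-counting idea in the abstract above, the sketch below lets the database do the counting with `GROUP BY` and derives one-attribute rules from the counts. It is not the authors' algorithm: the table name `train`, the function `group_count_rules`, and the majority-class rule selection are assumptions made for this example, and the paper's optimization and rule-pruning strategies are not modeled.

```python
import sqlite3


def group_count_rules(rows, attributes, class_col="cls"):
    """Count (attribute value, class) pairs in SQL and keep the majority class per value.

    Hypothetical helper: column names are interpolated into SQL, so they must be
    trusted identifiers supplied by the caller.
    """
    cols = list(attributes) + [class_col]
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE train ({', '.join(c + ' TEXT' for c in cols)})")
    conn.executemany(
        f"INSERT INTO train VALUES ({', '.join('?' for _ in cols)})", rows
    )
    rules = {}
    for attr in attributes:
        # The database performs the grouping and counting; Python only reads results.
        query = (
            f"SELECT {attr}, {class_col}, COUNT(*) AS n FROM train "
            f"GROUP BY {attr}, {class_col} ORDER BY n DESC"
        )
        best = {}
        for value, label, n in conn.execute(query):
            # Rows arrive in descending count order, so the first row seen for a
            # value carries its most frequent class.
            best.setdefault(value, (label, n))
        rules[attr] = best
    conn.close()
    return rules
```

Calling `group_count_rules(rows, ["outlook", "humidity"])` on a list of `(outlook, humidity, cls)` tuples returns, for each attribute, a mapping from attribute value to its most frequent class and the supporting count.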
The Integrated Fuzzy Clustering Algorithm GFC for Switching Regressions (in English)  Cited by: 6
2
Authors: 王士同 江海峰 陆宏钧 《软件学报》 (Journal of Software) EI CSCD PKU Core, 2002, No. 10, pp. 1905-1914 (10 pages)
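Several methods are available for solving switching regression problems. Building on the proposed gravitational clustering algorithm GC, which is based on Newton's law of gravitation, and combining it with fuzzy clustering, the paper further proposes a new integrated fuzzy clustering algorithm, GFC. Theoretical analysis shows that GFC converges to a local minimum. Experimental results show that GFC is more effective than standard fuzzy clustering algorithms for switching regression problems, especially in convergence speed.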
Keywords: switching regression; integrated fuzzy clustering algorithm; GFC
Data Extraction from the Web Based on Pre-Defined Schema  Cited by: 4
3
Authors: 孟小峰 陆宏钧 王海燕 谷明哲 《Journal of Computer Science & Technology》 SCIE EI CSCD, 2002, No. 4, pp. 371-382 (12 pages)
With the development of the Internet, the World Wide Web has become an invaluable information source for most organizations. However, most documents available from the Web are in HTML form which is originally designed for document formatting with little consideration of its contents. Effectively extracting data from such documents remains a nontrivial task. In this paper, we present a schema-guided approach to extracting data from HTML pages. Under the approach, the user defines a schema specifying what to be extracted and provides sample mappings between the schema and the HTML page. The system will induce the mapping rules and generate a wrapper that takes the HTML page as input and produces the required data in the form of XML conforming to the user-defined schema. A prototype system implementing the approach has been developed. The preliminary experiments indicate that the proposed semi-automatic approach is not only easy to use but also able to produce a wrapper that extracts required data from inputted pages with high accuracy.
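To make the wrapper idea in the abstract above concrete, here is a minimal sketch of a wrapper that takes an HTML page and emits XML shaped by a user-defined schema. It is only an illustration under strong assumptions: the mapping rules are written by hand as regular expressions, whereas the paper induces them from sample mappings, and the names `build_wrapper`, `record_pattern`, and `field_patterns` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def build_wrapper(root_tag, record_pattern, field_patterns):
    """Return a function that maps an HTML string to an XML string.

    root_tag:        name of the XML root element (part of the assumed schema)
    record_pattern:  regex whose first group captures one record's HTML
    field_patterns:  schema field name -> regex whose first group captures the value
    """
    def wrapper(html):
        root = ET.Element(root_tag)
        for record_html in re.findall(record_pattern, html, flags=re.S):
            record = ET.SubElement(root, "record")
            for field_name, pattern in field_patterns.items():
                match = re.search(pattern, record_html, flags=re.S)
                if match:
                    # Only fields found on the page are emitted.
                    ET.SubElement(record, field_name).text = match.group(1).strip()
        return ET.tostring(root, encoding="unicode")

    return wrapper
```

For a page such as `<ul><li><b>Book A</b> <i>12.50</i></li></ul>`, a wrapper built with `build_wrapper("books", r"<li>(.*?)</li>", {"title": r"<b>(.*?)</b>", "price": r"<i>(.*?)</i>"})` produces one `record` element with `title` and `price` children. Regular expressions are a brittle choice for real HTML; they are used here only to keep the sketch short.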
Managing Very Large Document Collections Using Semantics
4
Authors: 王国仁 陆宏钧 于戈 鲍玉斌 《Journal of Computer Science & Technology》 SCIE EI CSCD, 2003, No. 3, pp. 403-406 (4 pages)
In this paper, a system is presented where documents are no longer identified by their file names. Instead, a document is represented by its semantics in terms of descriptor and content vector. The descriptor of a document consists of a set of attributes, such as date of creation, its type, its size, annotations, etc. The content vector of a document consists of a set of terms extracted from the document. In this paper, a semantic document management system XBASE is designed and implemented based on the semantics and the functions of three main modules, X-Loader, X-Explorer and X-Query.
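The descriptor and content-vector representation described above can be pictured with a small data structure. The sketch below is an assumption-laden illustration, not XBASE itself: `Document`, `make_document`, and `find` are hypothetical names, terms are extracted with a naive lowercase word split, and the X-Loader, X-Explorer, and X-Query modules are not modeled.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    # Attributes such as creation date, type, size, or annotations.
    descriptor: dict
    # Terms extracted from the document, with their frequencies.
    content_vector: Counter = field(default_factory=Counter)


def make_document(text, **attributes):
    """Build a Document from raw text; size is derived, other attributes are supplied."""
    terms = re.findall(r"[a-z]+", text.lower())
    return Document(
        descriptor=dict(attributes, size=len(text)),
        content_vector=Counter(terms),
    )


def find(docs, term, **attribute_filter):
    """Select documents by content term and descriptor attributes, not by file name."""
    return [
        d for d in docs
        if d.content_vector[term.lower()] > 0
        and all(d.descriptor.get(k) == v for k, v in attribute_filter.items())
    ]
```

With this shape, a lookup such as `find(docs, "sql", type="memo")` retrieves documents by what they contain and how they are described, which is the point the abstract makes about no longer identifying documents by file name.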
A Fast Scalable Classifier Tightly Integrated with RDBMS
5
Authors: 刘红岩 陆宏钧 陈剑 《Journal of Computer Science & Technology》 SCIE EI CSCD, 2002, No. 2, pp. 152-159 (8 pages)
In this paper, we report our success in building efficient scalable classifiers by exploring the capabilities of modern relational database management systems (RDBMS). In addition to high classification accuracy, the unique features of the approach include its high training speed, linear scalability, and simplicity in implementation. More importantly, the major computation required in the approach can be implemented using standard functions provided by the modern relational DBMS. Besides, with the effective rule pruning strategy, the algorithm proposed in this paper can produce a compact set of classification rules. The results of experiments conducted for performance evaluation and analysis are presented.