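Abstract: Data mining (DM), also known as knowledge discovery in databases (KDD), refers to extracting implicit, previously unknown, distinctive, and potentially valuable information or patterns from large databases or data warehouses. Starting from the definition and market prospects of data mining, and taking into account both the importance that enterprises now attach to data mining technology and the substantial benefits it brings to their development, this paper identifies and analyzes, from several perspectives, the main problems that data mining technology faces in its development and application. Drawing on the ideas of the Open Grid Services Architecture (OGSA), and using a layered description organized around function, structure, scheduling cost, and grid service objectives, the paper designs a five-layer grid data mining architecture (GDMA). The architecture is service-centric: it hides the heterogeneity of resources behind a uniform grid service interface and, to address users' specific needs, includes a data mining client based on workflows and Web services.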
Abstract: The rapid development of the Internet of Things imposes new requirements on data mining systems, because traditional distributed, network-based data mining lacks the capability to meet them. To meet the needs of the Internet of Things, this paper proposes a novel distributed data-mining model that provides seamless access between cloud computing and distributed data mining. The model is based on the cloud computing architecture, which belongs to the type of incredible nodes.
Funding: Supported by the Scientific Research Project of the Education Department of Hubei Province, China (No. Q20181508); the Youths Science Foundation of Wuhan Institute of Technology (No. k201622); the Surveying and Mapping Geographic Information Public Welfare Scientific Research Special Industry (No. 201412014); the Educational Commission of Hubei Province, China (No. Q20151504); the National Natural Science Foundation of China (Nos. 41501505, 61502355, and 61502354); the China Postdoctoral Science Foundation (No. 2015M581887); the Key Program of Higher Education Institutions of Henan Province, China (No. 17A520040); and the Natural Science Foundation of Henan Province, China (No. 162300410177).
Abstract: Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, as with any data-mining technique, it requires sufficient training data, here object usage scenarios (OUSs). Existing approaches obtain more training data by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for OUSs that exploits the inheritance relationships in object-oriented programs. Given an object-oriented program p, the number of OUSs that can be collected from a run of p is generally no greater than the number of objects used during the run. With our technique, up to n times more OUSs can be obtained, where n is the average number of superclasses of all general OUSs. To evaluate the technique, we implemented it in our earlier prototype tool, ISpec Miner, and used the tool to mine protocols from several real-world programs. Experimental results show that our technique collects 1.95 times more OUSs than general approaches, which makes accurate and complete API protocols more likely to be obtained. Furthermore, our technique can mine API protocols for classes that are never used in the programs, which is valuable for validating software architectures and program documentation and for program understanding. Although the technique introduces some runtime overhead, the overhead is small and acceptable.
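The abstract does not say how an OUS recorded for one class is turned into OUSs for its superclasses. The following is a minimal sketch of one plausible reading, not the ISpec Miner implementation: each observed OUS is copied to every superclass, keeping only the calls to methods that the superclass itself exposes. The `HIERARCHY` table, the `superclasses` and `oversample` helpers, and the projection rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical class model (an assumption, not taken from the paper):
# class name -> (direct superclass or None, methods callable on that class).
Hierarchy = Dict[str, Tuple[Optional[str], Set[str]]]

HIERARCHY: Hierarchy = {
    "Reader": (None, {"read", "close"}),
    "BufferedReader": ("Reader", {"read", "close", "readLine"}),
}

# An OUS is modelled here as (class name, ordered method calls on one object).
OUS = Tuple[str, List[str]]


def superclasses(cls: str, hierarchy: Hierarchy) -> List[str]:
    """Return the superclass chain of cls, nearest ancestor first."""
    chain: List[str] = []
    parent = hierarchy[cls][0]
    while parent is not None:
        chain.append(parent)
        parent = hierarchy[parent][0]
    return chain


def oversample(ous: OUS, hierarchy: Hierarchy) -> List[OUS]:
    """Return the original OUS plus one projected OUS per superclass.

    Assumed projection rule: keep only the calls whose method is
    available on the superclass, preserving their order.
    """
    cls, calls = ous
    result: List[OUS] = [ous]
    for sup in superclasses(cls, hierarchy):
        visible = hierarchy[sup][1]
        projected = [m for m in calls if m in visible]
        if projected:  # skip superclasses that saw no relevant calls
            result.append((sup, projected))
    return result


observed: OUS = ("BufferedReader", ["readLine", "read", "close"])
samples = oversample(observed, HIERARCHY)
```

Under these assumptions, one observed `BufferedReader` scenario also yields a scenario for `Reader`, even if no `Reader` object was ever created directly. This matches the abstract's claims that the number of additional OUSs is bounded by the number of superclasses and that protocols can be mined for classes never used in the program.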