Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3).
Abstract: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which capture only the differences between objects in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is $O(|C|^2 |U|^2)$.
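To make the notion of a discernibility matrix concrete, here is a minimal Python sketch using the standard rough-set definition: for each pair of objects with different decisions, record the condition attributes on which they differ. It is not the paper's improved DS algorithm or its attribute rank function, which the abstract does not specify. The function name `discernibility_matrix` and the toy decision table are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """Map each object pair (i, j), i < j, with different decisions to the
    set of condition-attribute indices on which the two objects differ."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            matrix[(i, j)] = {
                a for a, (x, y) in enumerate(zip(rows[i], rows[j])) if x != y
            }
    return matrix


# Toy decision table (hypothetical data): three condition attributes per object.
rows = [
    ("sunny", "hot", "high"),
    ("sunny", "hot", "low"),
    ("rainy", "mild", "high"),
]
decisions = ["no", "yes", "yes"]

matrix = discernibility_matrix(rows, decisions)
for pair, attrs in sorted(matrix.items()):
    print(pair, sorted(attrs))
```

Objects 1 and 2 share the same decision, so the matrix has no entry for that pair. This is the information a difference-only method discards and a DS method would also use.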
Funding: Supported by the National Natural Science Foundation of China (grant 71961005) and the Guangxi Science and Technology Program (grant 1598007-15).
Abstract: The maturity of big data analysis theory and tools improves the efficiency and reduces the cost of mining massive data. This paper discusses a method of mining product customer demand based on big data and further studies the configuration of product function attributes. First, the Hadoop platform was used to perform word segmentation on product attribute data, and feature word extraction based on the Apriori algorithm was used to mine product customer demand information. Then the MapReduce model on the big data platform was applied for efficient parallel data processing, yielding product attributes with research value together with their weights and attribute levels. Next, the cloud model and the MNL model were employed to construct the product function attribute configuration model, and an improved artificial bee colony algorithm was used to solve it, obtaining the optimal solution. Finally, an example illustrates the feasibility of the proposed method.
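As an illustration of the Apriori step mentioned above, the following Python sketch mines frequent word sets from a few tokenized reviews on a single machine. It is a textbook Apriori under stated assumptions, not the paper's Hadoop/MapReduce implementation. The function name `apriori`, the absolute `min_support` threshold, and the review data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset that occurs in at
    least min_support transactions (min_support is an absolute count)."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical tokenized product reviews (feature words only).
reviews = [
    {"battery", "screen", "price"},
    {"battery", "screen"},
    {"battery", "price"},
    {"screen", "camera"},
]

frequent = apriori(reviews, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the candidate set small: a three-word set is never counted unless all of its two-word subsets are already frequent.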
Funding: Supported by the Joint Large-Scale Scientific Facility Funds of the NSFC and CAS (U1532258), the Program for New Century Excellent Talents in University (NCET-13-0342), the Shandong Natural Science Funds for Distinguished Young Scholar (JQ201402), and the National Key Basic Research Program of China under Contract (2015CB856700).
Abstract: A fast physics analysis framework has been developed based on SNiPER to process the increasingly large data sample collected by BESIII. In this framework, a reconstructed event data model with SmartRef is designed to improve the speed of input/output operations, and the necessary physics analysis tools are migrated from BOSS to SNiPER. A real physics analysis, $e^+e^- \to \pi^+\pi^- J/\psi$, is used to test the new framework, which achieves a factor of 10.3 improvement in input/output speed compared to BOSS. Further tests show that the improvement is mainly attributed to the new reconstructed event data model and the lazy-loading functionality provided by SmartRef.
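The lazy-loading idea credited above can be sketched in a few lines of Python: a reference object defers reading its target until first access and then caches it, so parts of an event that an analysis never touches are never read. This is only an illustration of the general pattern. The class `LazyRef`, the helper `make_loader`, and the event layout are hypothetical and are not the SNiPER or SmartRef API.

```python
class LazyRef:
    """Hold a loader callable and invoke it only on first access."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._value = None

    def get(self):
        if not self._loaded:
            self._value = self._loader()  # the expensive read happens here, once
            self._loaded = True
        return self._value


reads = []  # records which parts of the event were actually read


def make_loader(name):
    """Build a loader that simulates reading one part of an event from disk."""
    def load():
        reads.append(name)
        return {"name": name, "payload": [0.0] * 4}
    return load


# A hypothetical event whose parts are referenced lazily.
event = {
    "tracks": LazyRef(make_loader("tracks")),
    "showers": LazyRef(make_loader("showers")),
}

tracks = event["tracks"].get()
event["tracks"].get()  # second access reuses the cached object
print(reads)
```

Only the `tracks` part is read, and it is read once. The `showers` part is never loaded because nothing asks for it.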