Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3).
Abstract: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which capture only the differences between objects in an information system. Because a discernibility matrix does not reveal similarities, the resulting rules may not be the simplest ones. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is O(|C|^2 |U|^2).
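For readers unfamiliar with discernibility matrices, the following is a minimal sketch of the difference-only baseline that the abstract criticizes. It is not the DS method or the paper's attribute rank function, neither of which is specified in the abstract. The function names, the toy decision table, and the frequency-based greedy rule are assumptions chosen for illustration.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions):
    """Map each pair of objects with different decisions to the set of
    condition-attribute indices on which the two objects differ."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                k for k, (a, b) in enumerate(zip(rows[i], rows[j])) if a != b
            )
            if diff:  # skip inconsistent pairs that no attribute separates
                matrix[(i, j)] = diff
    return matrix


def greedy_reduct(rows, decisions):
    """Hypothetical heuristic: repeatedly pick the attribute that occurs in
    the most matrix entries not yet covered by a selected attribute."""
    remaining = list(discernibility_matrix(rows, decisions).values())
    selected = []
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Ties are broken in favour of the lowest attribute index.
        best = max(sorted(counts), key=counts.get)
        selected.append(best)
        remaining = [e for e in remaining if best not in e]
    return selected


# Toy decision table: four objects, three condition attributes.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
selected_attributes = greedy_reduct(rows, decisions)
```

Building the matrix compares every pair of the |U| objects on every one of the |C| condition attributes, which gives a sense of where polynomial factors in |U| and |C| arise in algorithms of this family.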
Funding: This work was financially supported by the Fundamental Research Program of Shanxi Province, China (Nos. 202103021224263 and 202203021212493).
Abstract: Municipal solid waste landfill is the main disposal option for domestic garbage, and microbial activities play an important role in it. However, despite the widespread practice of landfilling, the metagenomic microbial profiles of landfill sites remain poorly characterized. In this study, we used a combination of physicochemical analysis, ultraviolet-visible spectrophotometry, and high-throughput Illumina shotgun sequencing to systematically investigate the changes in soil enzyme activities, microbial community structure, and functional attributes in aged refuse collected from the Xingou Municipal Solid Waste Landfill in Taiyuan, China, with ordinary topsoil from an area within 5 km of the landfill as control soil. Except for neutral phosphatase (P = 0.065), the activities of urease, laccase, dehydrogenase, sucrase, neutral protease, and β-glucosidase were all significantly reduced (P < 0.05) in the aged refuse compared with the control soil. In contrast, catalase activity was significantly elevated in the aged refuse. Compared with the control soil, the aged refuse was characterized by higher richness and diversity of microbial communities, as reflected by higher values of the community richness estimators (Chao1 and ACE) and diversity indices (Shannon and Simpson). In total, 186 phyla, 4354 genera, and 34459 species were identified, with 132 phyla, 1914 genera, and 7369 species showing significantly different abundances between the aged refuse and the control soil. Actinobacteria and Acidobacteria were the dominant phyla in the control soil, whereas Proteobacteria, Euryarchaeota (archaea), and Firmicutes predominated in the aged refuse. Notably, Euryarchaeota and Methanoculleus were the major taxa detected in the aged refuse but were almost completely absent in the control soil. Xenobiotic biodegradation and bacterial chemotaxis were the main functions of the microflora in the aged refuse, whereas the carbohydrate, amino acid, energy, and lipid metabolism pathways were significantly enriched in the control soil. Moreover, the aged refuse contained a high abundance of genes involved in quorum sensing. These findings reveal close associations between enzyme activities and variations in the microbial community structure and in the genes actively involved in biodegradation at landfill sites. The landfill environment was characterized by a more complex spectrum of microbial activities than expected. Further investigations are needed to gain a more comprehensive understanding of the microbial community structure and functional attributes, as well as their potential influencing factors, in the landfill environment.
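As a reminder of what the richness and diversity measures named in the abstract compute, here is a minimal sketch using their textbook definitions. The abstract does not say which variant of each index the authors used, so the Gini-Simpson form (1 - sum of p_i^2) and the classic Chao1 estimator with a bias-corrected fallback are assumptions, as are the function names and the example counts. ACE is omitted.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 richness: observed taxa plus a correction based on the number
    of singletons (f1) and doubletons (f2)."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 == 0:
        # Bias-corrected form, used when there are no doubletons.
        return observed + f1 * (f1 - 1) / 2.0
    return observed + f1 * f1 / (2.0 * f2)


# Hypothetical read counts per taxon for one sample.
even_sample = [10, 10, 10, 10]
skewed_sample = [5, 1, 1, 2]
```

An evenly distributed sample maximizes the Shannon index for a given number of taxa, while rare taxa (singletons and doubletons) push the Chao1 estimate above the observed count.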
Funding: Supported by the National Natural Science Foundation of China (grant 71961005) and the Guangxi Science and Technology Program (grant 1598007-15).
Abstract: The maturity of big data analysis theory and tools has improved the efficiency and reduced the cost of mining massive data. This paper discusses a method for mining customer demand for products based on big data and further studies the configuration of product function attributes. First, the Hadoop platform was used to perform word segmentation on product attribute data, and feature-word extraction based on the Apriori algorithm was used to mine customer demand information. Then the MapReduce model on the big data platform was applied for efficient parallel data processing, yielding the product attributes worth studying together with their weights and attribute levels. After that, the cloud model and the MNL model were employed to construct the product function attribute configuration model, and an improved artificial bee colony algorithm was used to solve it and obtain the optimal solution. Finally, an example is given to illustrate the feasibility of the proposed method.
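Assuming that MNL here refers to the multinomial logit model, the choice probability of product i with utility u_i is exp(u_i) / sum_j exp(u_j). The sketch below shows only that formula. How the paper derives utilities from attribute weights and the cloud model is not described in the abstract, so the utility values and the function name are hypothetical.

```python
import math


def mnl_choice_probabilities(utilities):
    """Return multinomial logit choice probabilities for a list of utilities."""
    # Subtracting the maximum keeps exp() from overflowing and does not
    # change the resulting probabilities.
    shift = max(utilities)
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utilities for three candidate product configurations.
probabilities = mnl_choice_probabilities([1.0, 2.0, 0.5])
```

A configuration search such as the artificial bee colony algorithm mentioned in the abstract would typically evaluate an objective built from probabilities of this kind for each candidate configuration.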
Funding: Supported by the Joint Large-Scale Scientific Facility Funds of the NSFC and CAS (U1532258), the Program for New Century Excellent Talents in University (NCET-13-0342), the Shandong Natural Science Funds for Distinguished Young Scholar (JQ201402), and the National Key Basic Research Program of China under Contract (2015CB856700).
Abstract: A fast physics analysis framework has been developed based on SNiPER to process the increasingly large data sample collected by BESIII. In this framework, a reconstructed event data model with SmartRef is designed to improve the speed of input/output operations, and the necessary physics analysis tools are migrated from BOSS to SNiPER. A real physics analysis, e⁺e⁻ → π⁺π⁻J/ψ, is used to test the new framework and achieves a factor of 10.3 improvement in input/output speed compared to BOSS. Further tests show that the improvement is mainly attributed to the new reconstructed event data model and the lazy-loading functionality provided by SmartRef.
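The lazy-loading idea credited above can be illustrated with a small generic sketch: a reference object defers reading its target until the first access, so an analysis that never touches a given part of the event never pays for reading it. This is not the SNiPER or SmartRef API. The class name `LazyRef`, the loader function, and the sample values are hypothetical.

```python
class LazyRef:
    """Holds a loader callable and reads the referenced object on first use."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._value = None

    def get(self):
        # Perform the (potentially expensive) read only once, on demand.
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


reads = []  # records each simulated disk read


def load_tracks():
    """Stand-in for reading one part of an event from disk."""
    reads.append("tracks")
    return [1.2, 3.4]


ref = LazyRef(load_tracks)  # nothing has been read at this point
```

Calling `ref.get()` triggers the read the first time and returns the cached object afterwards.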