Outlier detection is a key research area in data mining technologies,as outlier detection can identify data inconsistent within a data set.Outlier detection aims to find an abnormal data size from a large data size an...Outlier detection is a key research area in data mining technologies,as outlier detection can identify data inconsistent within a data set.Outlier detection aims to find an abnormal data size from a large data size and has been applied in many fields including fraud detection,network intrusion detection,disaster prediction,medical diagnosis,public security,and image processing.While outlier detection has been widely applied in real systems,its effectiveness is challenged by higher dimensions and redundant data attributes,leading to detection errors and complicated calculations.The prevalence of mixed data is a current issue for outlier detection algorithms.An outlier detection method of mixed data based on neighborhood combinatorial entropy is studied to improve outlier detection performance by reducing data dimension using an attribute reduction algorithm.The significance of attributes is determined,and fewer influencing attributes are removed based on neighborhood combinatorial entropy.Outlier detection is conducted using the algorithm of local outlier factor.The proposed outlier detection method can be applied effectively in numerical and mixed multidimensional data using neighborhood combinatorial entropy.In the experimental part of this paper,we give a comparison on outlier detection before and after attribute reduction.In a comparative analysis,we give results of the enhanced outlier detection accuracy by removing the fewer influencing attributes in numerical and mixed multidimensional data.展开更多
With the rapid development of next-generation sequencing technologies, bacterial identification becomes a very important and essential step in processing genomic data, especially for metagenomic data. Many computation...With the rapid development of next-generation sequencing technologies, bacterial identification becomes a very important and essential step in processing genomic data, especially for metagenomic data. Many computational methods have been developed and some of them are widely used to address the problems in bacterial identification. In this article we review the algorithms of these methods, discuss their drawbacks, and propose future computational methods that use genomic data to characterize bacteria. In addition, we tackle two specific computational problems in bacterial identification, namely, the detection of host-specific bacteria and the detection of disease-associated bacteria, by offering potential solutions as a starting point for those who are interested in the area.展开更多
基金The authors would like to acknowledge the support of Southern Marine Science and Engineering Guangdong Laboratory(Zhuhai)(SML2020SP007)The paper is supported under the National Natural Science Foundation of China(Nos.61772280 and 62072249).
文摘Outlier detection is a key research area in data mining technologies,as outlier detection can identify data inconsistent within a data set.Outlier detection aims to find an abnormal data size from a large data size and has been applied in many fields including fraud detection,network intrusion detection,disaster prediction,medical diagnosis,public security,and image processing.While outlier detection has been widely applied in real systems,its effectiveness is challenged by higher dimensions and redundant data attributes,leading to detection errors and complicated calculations.The prevalence of mixed data is a current issue for outlier detection algorithms.An outlier detection method of mixed data based on neighborhood combinatorial entropy is studied to improve outlier detection performance by reducing data dimension using an attribute reduction algorithm.The significance of attributes is determined,and fewer influencing attributes are removed based on neighborhood combinatorial entropy.Outlier detection is conducted using the algorithm of local outlier factor.The proposed outlier detection method can be applied effectively in numerical and mixed multidimensional data using neighborhood combinatorial entropy.In the experimental part of this paper,we give a comparison on outlier detection before and after attribute reduction.In a comparative analysis,we give results of the enhanced outlier detection accuracy by removing the fewer influencing attributes in numerical and mixed multidimensional data.
基金supported by the National Institute of Health of USA under Grant No. R21/R33 GM078601USDANIFA’s Evans-Allen Grant (Project NO: MOX-Zheng) under Grant No. 0223248International Exchange and Cooperation Officeof Nanjing Medical University of China
文摘With the rapid development of next-generation sequencing technologies, bacterial identification becomes a very important and essential step in processing genomic data, especially for metagenomic data. Many computational methods have been developed and some of them are widely used to address the problems in bacterial identification. In this article we review the algorithms of these methods, discuss their drawbacks, and propose future computational methods that use genomic data to characterize bacteria. In addition, we tackle two specific computational problems in bacterial identification, namely, the detection of host-specific bacteria and the detection of disease-associated bacteria, by offering potential solutions as a starting point for those who are interested in the area.