Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2018XD004).
Abstract: Distance-based outlier detection identifies outliers by computing the distances between points in the dataset, but its computational complexity is particularly high on multidimensional datasets. In addition, traditional outlier detection methods do not consider how frequently subsets occur, so the detected outliers do not fit the definition of outliers (i.e., rarely appearing). Pattern mining-based outlier detection approaches address this problem, but they do not take the importance of each pattern into account during detection, so the detected outliers may not reflect the actual situation. To address these problems, a two-phase minimal weighted rare pattern mining-based outlier detection approach, called MWRPM-Outlier, is proposed to detect outliers effectively on the weight data stream. In the pattern mining phase, a method called MWRPM quickly mines the minimal weighted rare patterns; in the outlier detection phase, two deviation factors are defined to measure the abnormal degree of each transaction on the weight data stream. Experimental results show that the proposed MWRPM-Outlier approach performs excellently in outlier detection and that MWRPM is superior in weighted rare pattern mining.
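To make the cost of the distance-based baseline concrete, here is a minimal sketch of the classic rule "a point is an outlier if fewer than `min_neighbors` other points lie within `radius`". It is not the MWRPM-Outlier method from the abstract; the function name, parameters, and sample data are illustrative assumptions. The nested loop shows why the brute-force approach scales quadratically with the number of points.

```python
import math


def distance_based_outliers(points, radius, min_neighbors):
    """Return indices of points with fewer than `min_neighbors` other
    points within `radius` (brute force, O(n^2) distance computations).

    Hypothetical helper for illustration only; not from the paper.
    """
    outliers = []
    for i, p in enumerate(points):
        neighbors = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= radius:
                neighbors += 1
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers


# Assumed toy data: four clustered points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(distance_based_outliers(points, radius=0.5, min_neighbors=2))  # [4]
```

Every pair of points is compared once per outer iteration, so doubling the dataset roughly quadruples the work, and each comparison grows with the number of dimensions.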
Abstract: Outlier detection has significant practical value in data mining. Outlier detection algorithms based on distinct theories have different definitions and mining processes. Based on an analysis of the criteria and theory behind existing outlier detection algorithms, a three-dimensional space graph for constructing applied algorithms and an improved GridOf algorithm are proposed.
Key words: outlier - detection - three-dimensional space graph - data mining
CLC number: TP 311.13 - TP 391
Foundation item: Supported by the National Natural Science Foundation of China (70371015).
Biography: ZHANG Jing (1975-), female, Ph.D., lecturer; research direction: data mining and knowledge discovery.
Funding: Supported by the National Basic Research Program of China (973 Program: 2013CB329004).
Abstract: As data services rapidly penetrate daily life, mobile networks are becoming more complicated and the amount of data transmitted keeps increasing. Traditional statistical methods for anomalous cell detection cannot adapt to this evolution of networks, and data mining has become the mainstream approach. In this paper, we propose a novel kernel density-based local outlier factor (KLOF) that assigns each object a degree of being an outlier. First, the notion of KLOF is introduced, which captures the relative degree of isolation. Then, by analyzing its properties, including the tightness of its upper and lower bounds and its sensitivity to density perturbation, we find that KLOF is much greater than 1 for outliers. Lastly, KLOF is applied to a real-world dataset to detect anomalous cells with abnormal key performance indicators (KPIs) and thereby verify its reliability. The experiment shows that KLOF finds outliers efficiently. It can guide operators toward faster and more efficient troubleshooting.
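The abstract does not give the KLOF formula, so the following is only a sketch of the general idea behind density-ratio scores: compare a point's kernel density estimate with the average density of its nearest neighbours. The Gaussian kernel, the bandwidth, the choice of `k`, and all names are assumptions made for illustration. Under this construction, a point inside a uniform cluster scores close to 1 and an isolated point scores well above 1.

```python
import math


def kernel_density(x, data, bandwidth):
    """Unnormalised Gaussian kernel density of x over `data` (assumed kernel)."""
    total = sum(
        math.exp(-math.dist(x, y) ** 2 / (2 * bandwidth ** 2)) for y in data
    )
    return total / len(data)


def density_ratio_scores(data, k, bandwidth):
    """Score = mean density of the k nearest neighbours / own density.

    Illustrative stand-in for a kernel density-based local outlier factor;
    not the definition used in the paper.
    """
    densities = [kernel_density(x, data, bandwidth) for x in data]
    scores = []
    for i, x in enumerate(data):
        # Indices of the other points, sorted by distance to x.
        others = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: math.dist(x, data[j]),
        )
        neighbours = others[:k]
        mean_density = sum(densities[j] for j in neighbours) / k
        scores.append(mean_density / densities[i])
    return scores


# Assumed toy data: a tight cluster plus one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
scores = density_ratio_scores(data, k=2, bandwidth=0.5)
print([round(s, 2) for s in scores])
```

The isolated point's neighbours all sit in the dense cluster, so the ratio for that point is large, while each cluster point is compared with neighbours of equal density.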
Funding: Supported by the National Natural Science Foundation of China [grant number 72271033], the Beijing Municipal Education Commission and Beijing Natural Science Foundation [grant number KZ202110017025], and the National Undergraduate Innovation and Entrepreneurship Plan Project (2022J00244).
Abstract: Air pollution is a major issue for the national economy and people's livelihood. At present, research on air pollution mostly focuses on pollutant emissions in a specific industry or region as a whole, and little attention is paid to enterprise pollutant emissions at the micro level. Limited by the amount and time granularity of data from enterprises, enterprise pollutant emissions are still understudied. Driven by big data on air pollution emissions of industrial enterprises monitored in Beijing-Tianjin-Hebei, this paper carries out data mining of enterprise pollution emissions, including association analysis between different features based on grey association, association mining between different data based on association rules, and outlier detection based on clustering. The results show that: (1) the industries that mainly affect NOx and SO2 in Beijing-Tianjin-Hebei are the electric power and heat production and supply industry and the metal smelting and processing industries; (2) the districts near Hengshui and Shijiazhuang in Hebei province form strong association rules; (3) the industrial enterprises in Beijing-Tianjin-Hebei are divided into six clusters, three of which are outliers with excessive emissions of total VOCs, PM, and NH3, respectively.
Abstract: In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transform (DFT) to transform time series from the time domain to the frequency domain, so that each time series can be mapped to a point in k-dimensional space. A cluster-based algorithm is then developed to mine outliers from these points. The algorithm first partitions the input points into disjoint clusters and then prunes the clusters that are judged unable to contain outliers. Our algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
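As a rough sketch of the mapping step described above, the function below computes the first `k` DFT coefficients of a series and uses their normalised magnitudes as coordinates of a point in k-dimensional space. Using magnitudes (rather than real and imaginary parts) and dividing by the series length are assumptions; the abstract does not specify how the coefficients are turned into coordinates.

```python
import cmath
import math


def dft_features(series, k):
    """Map a time series to k features: |X_f| / n for f = 0 .. k-1.

    Direct O(n * k) evaluation of the DFT definition; the feature choice
    is an illustrative assumption, not the paper's exact mapping.
    """
    n = len(series)
    features = []
    for f in range(k):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)
        )
        features.append(abs(coeff) / n)
    return features


# Assumed toy series: one full sine cycle sampled at 8 points.
sine = [math.sin(2 * math.pi * t / 8) for t in range(8)]
flat = [2.0] * 8
print([round(v, 3) for v in dft_features(sine, 3)])
print([round(v, 3) for v in dft_features(flat, 3)])
```

The sine wave concentrates its energy at frequency index 1 and the constant series at index 0, so the two series land at clearly different points, which is what a subsequent clustering step would rely on.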
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61374135.
Abstract: Purpose: Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and has attracted much attention in recent times. In the past, outlier detection research in safety care could be viewed as searching for needles in a haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and in patient safety care in particular. Design/methodology/approach: A combined DM technique (clustering and nearest neighbor) is used for outlier detection, which provides a clear understanding of, and meaningful insights into, data behaviors for healthcare safety. The resulting implicit knowledge is essential to a proper clinical decision-making process. The method is important semantically, and the novel treatment of patients' events and situations is shown to play a significant role in patient care safety and medication. Findings: The paper presents a novel, integrated methodology that can be applied to different kinds of biological data analysis. It integrates DM techniques to optimize performance in the field of health and medical science. The integrated outlier detection method can be extended to search for valuable information and implicit knowledge based on selected patient factors. On this basis, outliers are detected as clusters and point events, and novel ideas are proposed to empower clinical services with customer satisfaction in mind. The work can also serve as a baseline for further healthcare strategy development and research. Research limitations/implications: This paper focuses mainly on outlier detection. Outlier isolation, which is essential for investigating why an outlier occurred and for communicating how to mitigate it, is not addressed. The research can therefore be extended to the hierarchy of patient problems. Originality/value: DM is a dynamic and successful gateway for discovering useful knowledge to enhance healthcare performance and patient safety. Outlier detection on clinical data is a basic task in achieving a healthcare strategy. The authors therefore focus on combined DM techniques for deep analysis of clinical data, which supports an optimal level of clinical decision-making. Proper clinical decisions can be reached through attribute selection, which is important for identifying the influential factors or parameters of healthcare services. Integrated clustering and nearest-neighbor techniques therefore give more acceptable detection of outliers in such complex data, which could be fundamental to further analysis of healthcare and patient safety situations.
Funding: This work is financially supported by the National Natural Science Foundation of China (No. 52004096), the Hebei Province High-End Iron and Steel Metallurgical Joint Research Fund Project, China (No. E2019209314), the Scientific Research Program Project of Hebei Education Department, China (No. QN2019200), and the Tangshan Science and Technology Planning Project, China (No. 19150241E).
Abstract: Blast furnace data processing is prone to problems such as outliers. To overcome these problems and identify an improved method for processing blast furnace data, we conducted an in-depth study of blast furnace data. Based on data samples from selected iron and steel companies, data types were classified according to their characteristics; appropriate methods were then selected to handle the missing values and outliers in the original blast furnace data. Linear interpolation was used to fill in the data classified as continuous, the K-nearest neighbor (KNN) algorithm was used to fill in correlated data with internal regularities, and periodic statistical data were filled with the average. The error rate of the filling was low, and the fitting degree was over 85%. For the screening of outliers, corresponding indicator parameters were added according to the continuity, relevance, and periodicity of the different data, and a variety of algorithms were used for processing. Analysis of the screening results shows that a large amount of effective information in the data was retained and invalid outliers were eliminated. Standardized processing of blast furnace big data, as the basis of applied research on such data, can serve as an important means of improving data quality and retaining data value.
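The linear interpolation step mentioned above can be sketched as follows. This is a minimal illustration assuming evenly spaced samples with missing values marked as `None`; the function name and the decision to leave leading and trailing gaps unfilled are assumptions, not details taken from the paper.

```python
def fill_linear(values):
    """Fill `None` gaps that lie between two known samples by linear
    interpolation; gaps at either end are left as `None` (assumed policy).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per sample between the two known values.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Assumed toy readings with two interior gaps and one trailing gap.
readings = [1.0, None, None, 4.0, None]
print(fill_linear(readings))  # [1.0, 2.0, 3.0, 4.0, None]
```

Interpolation only makes sense for slowly varying, continuous signals, which matches the abstract's choice of different filling methods for correlated and periodic data.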
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61025007, 61328202, 61173029, 61100024, 61332006, and 61073063, the National High Technology Research and Development 863 Program of China under Grant No. 2012AA011004, and the National Basic Research 973 Program of China under Grant No. 2011CB302200-G.
Abstract: Outlier detection on data streams is an important task in data mining, and the challenges grow when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which uses pruning to determine the nature of uncertain elements quickly and thus improve efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost. It estimates outlier probabilities and can effectively reduce the amount of calculation. The cost of the incremental PCUOD algorithm can satisfy the demands of uncertain data streams. Finally, a new method for parameter variable queries to CUOD is proposed, enabling the concurrent execution of different queries. To the best of our knowledge, this paper is the first work to perform outlier detection on uncertain data streams that can handle parameter variable queries simultaneously. Our methods are verified using both real and synthetic data. The results show that they reduce the required storage and running time.
Abstract: This paper discusses the detection of outliers by hybridizing the Rough_Outlier algorithm with Negative Association Rules. An optimization algorithm, Binary Particle Swarm Optimization (Binary PSO), is used to improve the computation of Non_Reduct in order to detect outliers. With Binary PSO, the rules generated by the Rough_Outlier algorithm are optimized, so that significant outlier objects are detected. The outlier detection process is then enhanced by hybridizing it with Negative Association Rules, and frequent and infrequent item sets are generated from the outlier rules. Results show that the hybrid Rough_Negative algorithm is able to uncover meaningful knowledge about outliers from the frequent and infrequent item sets. This knowledge can then be used by domain experts for better decision making.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61872163 and 61806084), the China Postdoctoral Science Foundation project (2018M631872), and the Jilin Provincial Education Department project (JJKH20190160KJ).
Abstract: Mining outliers in heterogeneous networks is crucial to many applications, but challenges abound. In this paper, we focus on identifying meta-path-based outliers in a heterogeneous information network (HIN) and on calculating the similarity between different types of objects. We propose a meta-path-based outlier detection method (MPOutliers) for heterogeneous information networks that handles these problems together under a unified framework. MPOutliers calculates the heterogeneous reachable probability by combining different types of objects and their relationships. It discovers the semantic information among nodes in heterogeneous networks instead of considering only the network structure. It also computes the closeness degree between nodes of the same type, which extends over the whole heterogeneous network. Moreover, each node is assigned a reliable weighting to measure its authority degree. Substantial experiments on two real datasets (AMiner and a Movies dataset) show that the proposed method is effective and efficient for outlier detection.
Funding: Project (61362021) supported by the National Natural Science Foundation of China; Project (2016GXNSFAA380149) supported by the Natural Science Foundation of Guangxi Province, China; Projects (2016YJCXB02, 2017YJCX34) supported by the Innovation Project of GUET Graduate Education, China; Project (2011KF11) supported by the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, China.
Abstract: Outlier detection is an important task in data mining. In some sophisticated multidimensional datasets, it is difficult to find the clustering centers and to measure the deviation degree of each potential outlier. In this work, an effective outlier detection method based on multi-dimensional clustering and local density (ODBMCLD) is proposed. ODBMCLD first identifies the center objects by the local density peaks of the data objects and clusters the whole dataset around these center objects. Then, outlier objects belonging to different clusters are marked as candidates for abnormal data. Finally, the top N points among these candidates, those with the highest outlier factors, are chosen as the final anomaly objects. The feasibility and effectiveness of the method are verified by experiments.
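The abstract does not spell out how ODBMCLD computes local density peaks, so the sketch below follows the widely used density-peak idea instead: for each point, compute a local density `rho` (number of neighbours within a cutoff) and `delta`, the distance to the nearest point of higher density. Points with high `rho` and high `delta` are natural centre candidates, and points with low `rho` and high `delta` are natural outlier candidates. The cutoff, the names, and the tie handling are all assumptions.

```python
import math


def density_peak_stats(points, cutoff):
    """Return (rho, delta) per point.

    rho[i]   = number of other points closer than `cutoff`.
    delta[i] = distance to the nearest point with strictly higher rho,
               or the largest distance from i if no such point exists.
    Illustrative sketch only; not the paper's exact procedure. Points that
    tie for the highest rho all fall into the second case.
    """
    n = len(points)
    rho = [
        sum(
            1
            for j in range(n)
            if j != i and math.dist(points[i], points[j]) < cutoff
        )
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [
            math.dist(points[i], points[j]) for j in range(n) if rho[j] > rho[i]
        ]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(math.dist(points[i], points[j]) for j in range(n)))
    return rho, delta


# Assumed toy data: three clustered points and one distant point.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho, delta = density_peak_stats(pts, cutoff=0.5)
print(rho, [round(d, 2) for d in delta])
```

In this toy example the distant point has zero local density and is far from every denser point, which is the profile a top-N ranking by outlier factor would be expected to surface.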
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2012AA040910).
Abstract: Outlier detection is an important type of data mining that is used extensively in application areas. The traditional cell-based outlier detection algorithm not only takes a large amount of time to process massive data but also uses a lot of machine resources, which leads to an imbalanced machine load. This paper presents a MapReduce-based, cell-based outlier detection algorithm combined with a single-layer perceptron, which parallelizes outlier detection. Experiments show that the improved algorithm effectively increases both the efficiency and the accuracy of outlier detection.
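To illustrate why cell-based detection lends itself to parallelization, here is a simplified single-machine sketch: each point is assigned to a grid cell (a "map"-like step), points are counted per cell (a "reduce"-like step), and a point is flagged when its own cell plus the adjacent cells hold fewer than `min_count` points. The cell size, the threshold, and the neighbourhood rule are illustrative assumptions; the perceptron component and the actual MapReduce job from the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product


def cell_based_outliers(points, cell_size, min_count):
    """Flag points whose cell and adjacent cells contain < min_count points.

    Simplified sketch of cell-based detection; thresholds and the
    neighbourhood definition are assumptions, not the paper's algorithm.
    """
    # "Map": assign each point to the integer coordinates of its grid cell.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]
    # "Reduce": count the points that fall into each cell.
    counts = Counter(cell_of)
    outliers = []
    for i, cell in enumerate(cell_of):
        total = sum(
            counts[tuple(c + o for c, o in zip(cell, offset))]
            for offset in product((-1, 0, 1), repeat=len(cell))
        )
        if total < min_count:
            outliers.append(i)
    return outliers


# Assumed toy data: four nearby points and one isolated point.
samples = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (9.0, 9.0)]
print(cell_based_outliers(samples, cell_size=1.0, min_count=2))  # [4]
```

Because the per-cell counts depend only on which cell a point falls into, the assignment and counting steps can be distributed across machines independently, which is the property a MapReduce formulation exploits.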