Abstract: With the development of the Global Positioning System (GPS), wireless technology and location-aware services, it is possible to collect a large quantity of trajectory data. In the field of data mining for moving objects, anomaly detection is a hot topic. Building on existing work on anomalous trajectory detection for moving objects, this paper introduces the classical trajectory outlier detection (TRAOD) algorithm and then proposes a density-based trajectory outlier detection (DBTOD) algorithm, which compensates for the TRAOD algorithm's inability to detect anomalies when the trajectory is local and dense. Results of applying the proposed algorithm to the Elk1993 and Deer1995 datasets are also presented and show the effectiveness of the algorithm.
Abstract: With the development of the data age, data quality has become a major concern. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the iForest algorithm keeps generating isolation trees, the differences between trees gradually shrink or even vanish, which wastes memory and reduces the efficiency of outlier detection. Moreover, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the forest by selecting better isolation trees according to their detection accuracy and the differences between them, thereby removing duplicate, similar and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated in terms of accuracy, recall, ROC curves, AUC and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
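For orientation, the scoring step of the standard isolation forest (not the GA-iForest tree selection, which the abstract does not specify) can be sketched as follows. The function names are illustrative assumptions; the formula is the usual iForest anomaly score, in which the mean path length of a point across the trees is normalized by the average path length c(n) for a subsample of n points.

```python
def harmonic(i: int) -> float:
    """Exact harmonic number H(i) = 1 + 1/2 + ... + 1/i, with H(0) = 0."""
    return sum(1.0 / k for k in range(1, i + 1))


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful binary-search-tree lookup
    over n points, used to normalize isolation-tree path lengths."""
    if n < 2:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)).

    Scores close to 1 suggest an outlier (short paths, easy to isolate);
    scores well below 0.5 suggest a normal point.
    """
    if n < 2:
        raise ValueError("subsample size n must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(n))


if __name__ == "__main__":
    # A point isolated after ~2 splits scores higher than one needing ~12.
    print(anomaly_score(2.0, 256), anomaly_score(12.0, 256))
```

A tree-selection scheme such as the one described above would act before this step, by deciding which trees contribute to the mean path length.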
Abstract: The cluster-based scenic area is a special form within the scenic area system. Such scenic areas are typically scattered and have diversified landscape resources, so planning should be based on the actual conditions of the local area, and pertinent measures should be applied. By elaborating the detailed work in the overall planning of Jingyanggang Scenic Area, such as landscape division, spatial layout and sightseeing structure, the authors discuss several problems that deserve more attention in the planning of cluster-based scenic areas.
Funding: Supported by the Fund for Basic Research of National Non-Profit Research Institutes (No. XK2012-2, ZD2012-7-2) and the Fund for Preresearch Project of ISTIC (No. YY201208).
Abstract: Similarity matching and information presentation are two key factors in information retrieval. In this paper, a saliency-based matching algorithm is proposed for user-oriented search, based on psychological studies of human perception. Major emphasis is placed on the saliently similar aspects of the objects being compared, so the search results are more agreeable to users. After relevant results are obtained, a cluster-based browsing algorithm based on social network analysis is adopted for presenting the search results. Because the results are organized in clustered lists, the user can gain a general understanding of the whole collection by viewing only a small part of the results and can rapidly locate those of major interest. Experimental results demonstrate the advantages of the proposed algorithm over traditional work.
Funding: Supported by the National Natural Science Foundation of China (No. 61101164).
Abstract: Underwater wireless sensor networks (UWSNs) have attracted wide attention in recent years. Research on their capacity is still at an early stage and lacks adequate performance evaluation for network construction. This paper addresses the subject through theoretical analysis and simulation, aiming to provide insights for the construction of actual UWSNs. Based on the structural features of cluster-based UWSNs and the propagation characteristics of underwater acoustic signals, and using the signal-to-interference-plus-noise ratio (SINR), we define capacity performance metrics such as outage probability and transmission capacity. Based on the theory of stochastic geometry, an analytical network capacity model for cluster-based UWSNs is presented. The simulation results verify the validity of the theoretical analysis, and the cause of the error between theoretical and simulation results is also explained.
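The abstract does not give the exact metric definitions, so the following is only a minimal sketch under a common convention: SINR is the received signal power divided by the sum of interference and noise powers (all in linear units), and outage probability is the fraction of SINR samples that fall below a decoding threshold. The function names and the sample values are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Sequence


def sinr(signal_power: float,
         interference_powers: Iterable[float],
         noise_power: float) -> float:
    """Signal-to-interference-plus-noise ratio in linear (not dB) units."""
    return signal_power / (sum(interference_powers) + noise_power)


def outage_probability(sinr_samples: Sequence[float], threshold: float) -> float:
    """Empirical outage probability: share of samples with SINR < threshold."""
    if not sinr_samples:
        raise ValueError("at least one SINR sample is required")
    below = sum(1 for s in sinr_samples if s < threshold)
    return below / len(sinr_samples)


if __name__ == "__main__":
    # Hypothetical received powers for four transmissions in one cluster.
    samples = [
        sinr(10.0, [1.0, 1.0], 3.0),
        sinr(4.0, [2.0], 2.0),
        sinr(9.0, [0.5, 0.5], 2.0),
        sinr(1.0, [1.0], 1.0),
    ]
    print(outage_probability(samples, threshold=2.0))
```

In a simulation study, the samples would come from random node placements and an acoustic propagation model rather than fixed numbers.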
Abstract: In many applications, such as computational fluid dynamics, weather prediction, image processing and Markov chain state computation, the order n of the matrix is often very large, and no serial algorithm can solve the problem. A distributed cluster-based solution for very large systems of linear equations is discussed, including the definitions of notation, the partition of the matrix, the communication mechanism and a master-slave algorithm. The computing cost is O(n^3/N), the memory cost is O(n^2/N), the I/O cost is O(n^2/N), and the communication cost is O(Nn ), where N is the number of computing nodes or processes. Tests show that the solution can effectively solve systems with double-precision matrices of size below 10^6 × 10^6.
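The abstract does not state how the matrix is partitioned. As a hedged illustration of where an O(n^2/N) per-node memory cost can come from, the sketch below assumes a contiguous row-block partition of an n × n matrix over N workers; `partition_rows` and `entries_per_worker` are hypothetical helper names.

```python
def partition_rows(n: int, num_workers: int) -> list[tuple[int, int]]:
    """Split row indices 0..n-1 into contiguous half-open ranges, one per worker.

    The first (n mod num_workers) workers receive one extra row, so block
    sizes differ by at most one.
    """
    if n < 0 or num_workers < 1:
        raise ValueError("n must be >= 0 and num_workers must be >= 1")
    base, extra = divmod(n, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def entries_per_worker(n: int, num_workers: int) -> list[int]:
    """Matrix entries each worker stores under the row-block assumption:
    roughly n * n / num_workers per worker."""
    return [(stop - start) * n for start, stop in partition_rows(n, num_workers)]


if __name__ == "__main__":
    print(partition_rows(10, 3))
    print(entries_per_worker(10, 3))
```

With this layout, each worker holds about n/N full rows, which is one simple way to reach the stated per-node memory bound; the paper's actual scheme may differ.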
Funding: Supported by the Ministry of Education Doctor Foundation in China under Grant No. 20050699037.
Abstract: In recent years, several random key pre-distribution schemes have been proposed to bootstrap keys for encryption, but the problem of key and node revocation has received relatively little attention. In this paper, based on a random key pre-distribution scheme using clustering, we present a novel random key revocation protocol that is well suited to large-scale networks and removes compromised information efficiently. The revocation protocol guarantees network security with low memory consumption and communication load, and it combines centralized and distributed revocation, offering both timeliness and accuracy of revocation.
Abstract: IEEE 802.11p is a standard for vehicular communication systems, known as Wireless Access in Vehicular Environment (WAVE). Using that standard as the MAC protocol in Vehicular Ad-Hoc Networks (VANETs) with a high density of nodes may degrade performance, in particular packet loss and delay, whenever collisions happen. Introducing Time Division Multiple Access (TDMA) schemes can improve the performance. However, TDMA scheduling is difficult to manage under high traffic density, high vehicle mobility and a dynamic network topology. This paper proposes a cluster-based TDMA scheme driven by traffic priority in VANETs. The clustered traffic is classified as high or low priority, and the priority is embedded in the TDMA MAC header. The evaluation results obtained with the NS3 simulator show that the proposed approach performs better with a high density of nodes.
Abstract: Multihop cellular networks (MCNs) are an exciting and fledgling area of wireless communication that offers huge potential in terms of coverage enhancement, data rates, power reduction and various other quality-of-service improvements. However, resource allocation in MCNs is an NP-hard problem. Hence, significant research is needed in this field in order to design the radio network efficiently. In this paper, the optimal position of relay stations in a hierarchical cluster-based two-hop cellular network is investigated. Vector algebra is used to derive a general equation for the carrier-to-interference ratio (C/I) of a mobile station. It is observed that when the transmit powers of the base station (BS) and the gateway (GTW)/relay station (RS) are the same, the RSs should be located close to the midpoint between the BS and the edge of the cell. However, when the transmit power of the BS is greater than that of the GTW, the RSs should be placed closer to the edge of the cell in order to maximize the minimum C/I at any point in the cell. This in turn permits a higher-order modulation scheme at the physical layer and hence a higher data rate for all users in the system.
Abstract: Optimal resource allocation with the objective of maximizing system capacity is an NP-hard problem in multihop cellular networks. Hence, different heuristic algorithms have been developed over the years to improve network system capacity. In this paper, a novel cluster-based architecture is proposed for a two-hop cellular network in which the transmission distance between any communicating pair is restricted to half the cell radius. In this design, a given radio resource is used by two simultaneously communicating pairs in every hexagonal cell, but for only half the time slot period. The characteristic feature of this cluster-based design is that it enables a frequency reuse ratio of one. The proposed hierarchical system is analyzed and tested under realistic propagation conditions, including lognormal shadowing. The system capacity of the cluster-based design is observed to be 2.5 times that of a single-hop cellular system with no relaying. In addition, the cluster-based design achieves higher capacity than state-of-the-art two-hop algorithms. This is an important finding, since the hierarchical cluster-based approach has fewer degrees of freedom in selecting the routing path for the end-to-end connection. Practical routing algorithms should be able to benefit from this.
Abstract: This paper studies the application of the cluster-based approach to enhancing the competitiveness of Thailand's SME industry. The author employed a qualitative method based on in-depth interviews. The results show that the Ratchaburi orchid cluster in Thailand has adopted the cluster-based approach because its members realized that it was useful and could enable them to produce good-quality orchids for the international market. The findings also show how individuals have worked together and helped each other in order to build a good horizontal support network and create competitive advantages. In addition, the research relates to knowledge management, which refers to a method of development that requires cluster members to exchange, share and distribute information, interact with each other, create closer business relationships and build mutual benefit. Therefore, the cluster members help each other to create a culture that values learning by making a commitment and sharing information to strengthen the cluster.
Funding: Supported by the major project of the National Social Science Foundation of China, "Big Data-driven Semantic Evaluation System of Science and Technology Literature" (Grant No. 21&ZD329).
Abstract: Purpose: To address the "anomalies" that occur when scientific breakthroughs emerge, this study focuses on identifying early signs and nascent stages of breakthrough innovations from the perspective of outliers, aiming to achieve early identification of scientific breakthroughs in papers. Design/methodology/approach: This study utilizes semantic technology to extract research entities from the titles and abstracts of papers to represent each paper's research content. Outlier detection methods are then employed to measure and analyze the anomalies in breakthrough papers during their early stages. The development and evolution process is traced using literature time tags. Finally, a case study is conducted using the key publications of the 2021 Nobel Prize laureates in Physiology or Medicine. Findings: Through manual analysis of all identified outlier papers, the effectiveness of the proposed method for early identification of potential scientific breakthroughs is verified. Research limitations: The study's applicability has only been empirically tested in the biomedical field. More data from various fields are needed to validate the robustness and generalizability of the method. Practical implications: This study provides a valuable supplement to current methods for early identification of scientific breakthroughs, effectively supporting technological intelligence decision-making and services. Originality/value: The study introduces a novel approach to early identification of scientific breakthroughs by leveraging outlier analysis of research entities, offering a more sensitive, precise and fine-grained alternative to traditional citation-based evaluations, which enhances the ability to identify nascent breakthrough innovations.
Abstract: This paper investigates the application of machine learning to develop a response model for cardiovascular problems, using AdaBoost together with two outlier detection methodologies: Z-Score combined with Grey Wolf Optimization (GWO), and Interquartile Range (IQR) combined with Ant Colony Optimization (ACO). The performance indices show that, compared with Z-Score and GWO with AdaBoost, IQR and ACO with AdaBoost is less accurate (89.0% vs. 86.0%) and less discriminative (Area Under the Curve (AUC) score of 93.0% vs. 91.0%). The Z-Score and GWO method also outperformed the other in precision, scoring 89.0%, and its recall was satisfactory at 90.0%. Thus, the paper helps to reveal specific benefits and drawbacks of different outlier detection and feature selection techniques, which are important to consider in further improving cardiovascular diagnostics. Collectively, these findings can improve knowledge of heart disease prediction and patient treatment through enhanced machine learning (ML) techniques. This work lays the groundwork for more precise diagnosis models by highlighting the benefits of combining multiple optimization methodologies. Future studies should focus on maximizing patient outcomes and model efficacy through research on these combinations.
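The two outlier rules named above can be sketched with the Python standard library. This is a minimal illustration under assumed defaults (a Z-Score cutoff of 3 and an IQR multiplier of 1.5, neither stated in the abstract); it does not cover the GWO or ACO feature selection or the AdaBoost classifier.

```python
import statistics
from typing import Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> list[bool]:
    """Flag values whose distance from the mean exceeds `threshold`
    population standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_outliers(values: Sequence[float], k: float = 1.5) -> list[bool]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


if __name__ == "__main__":
    # Hypothetical resting heart-rate readings with one implausible entry.
    readings = [62, 64, 66, 70, 71, 68, 65, 67, 69, 63,
                66, 64, 68, 70, 65, 67, 66, 69, 64, 190]
    print(zscore_outliers(readings))
    print(iqr_outliers(readings))
```

Note that with a population standard deviation the largest possible Z-Score in a sample of size n is sqrt(n - 1), so a cutoff of 3 can never fire on ten or fewer points; the IQR rule has no such limit.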
Abstract: Changepoint detection faces challenges when outliers are present. This paper proposes a multivariate changepoint detection method, RWPCA-RFPOP, which is based on the robust WPCA projection direction and the robust RFPOP method. The method is doubly robust and is suitable for detecting mean changepoints in multivariate normal data that contain outliers and have high correlations between variables. Simulation results demonstrate that the method reliably estimates both the number and the location of changepoints in the presence of outliers. Finally, the method is applied to an ACGH dataset.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 11201003), the Provincial Natural Science Research Project of Anhui Colleges (Grant No. KJ2016A263), the Natural Science Foundation of Anhui Province (Grant No. 1408085MA07), and the PhD Research Startup Foundation of Anhui Normal University (Grant No. 2014bsqdjj34).
Funding: Supported by the National Natural Science Foundation (Grant Nos. 91644216 and 41575128), the CAS Information Technology Program (Grant No. XXH13506-302), and the Guangdong Provincial Science and Technology Development Special Fund (No. 2017B020216007).
Abstract: Although quality assurance and quality control procedures are routinely applied in most air quality networks, outliers can still occur due to instrument malfunctions, the influence of harsh environments and the limitations of measuring methods. Such outliers pose challenges for data-powered applications such as data assimilation, statistical analysis of pollution characteristics and ensemble forecasting. Here, a fully automatic outlier detection method was developed based on the probability of residuals, which are the discrepancies between the observed and the estimated concentration values. The estimation can be conducted using filtering (or regression when appropriate) to discriminate four types of outliers, characterized respectively by temporal and spatial inconsistency, instrument-induced low variances, periodic calibration exceptions, and PM10 concentrations lower than PM2.5. This probabilistic method was applied to detect all four types of outliers in hourly surface measurements of six pollutants (PM2.5, PM10, SO2, NO2, CO and O3) from 1436 stations of the China National Environmental Monitoring Network during 2014-2016. Among the measurements, 0.65%-5.68% are marked as outliers, with PM10 and CO more prone to outliers. The method successfully identifies a trend of decreasing outliers from 2014 to 2016, which corresponds to known improvements in the quality assurance and quality control procedures of the China National Environmental Monitoring Network. The outliers can have a significant impact on the annual mean concentrations of PM2.5, with differences exceeding 10 μg/m³ at 66 sites.
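The residual-probability idea described in this abstract can be sketched roughly as follows. This is not the authors' method. It assumes a centered running median as the filter, a normal distribution for the residuals with a scale estimated from the median absolute deviation, and a hypothetical probability cutoff `p_threshold`. It addresses only the temporal-inconsistency outlier type.

```python
import statistics
from statistics import NormalDist


def residual_outliers(series, window=5, p_threshold=1e-3):
    """Flag indices whose residual from a running-median estimate is improbable."""
    half = window // 2
    estimates = []
    for i in range(len(series)):
        # Centered window, truncated at the ends of the series.
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        estimates.append(statistics.median(series[lo:hi]))

    # Residual = observed value minus filtered estimate.
    residuals = [obs - est for obs, est in zip(series, estimates)]

    # Robust scale: 1.4826 * MAD approximates the standard deviation for normal data.
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return []  # no spread to judge residuals against

    std_normal = NormalDist()
    flagged = []
    for i, r in enumerate(residuals):
        z = abs(r - center) / scale
        p = 2.0 * (1.0 - std_normal.cdf(z))  # two-sided tail probability
        if p < p_threshold:
            flagged.append(i)
    return flagged
```

On a hypothetical hourly series such as `[35, 36, 38, 37, 39, 240, 40, 41, 39, 42, 43]`, only the spike at index 5 is flagged. The median filter is used here because a single spike barely moves it, so the residual at the spike stays large.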
Funding: Supported by the Aeronautical Science Foundation of China (20111052010), the Jiangsu Graduates Innovation Project (CXZZ120163), the "333" Project of Jiangsu Province, and the Qing Lan Project of Jiangsu Province.
Abstract: With the development of the Global Positioning System (GPS), wireless technology and location-aware services, it has become possible to collect large quantities of trajectory data. In the field of data mining for moving objects, anomaly detection is a hot topic. Building on prior work in anomalous trajectory detection for moving objects, this paper introduces the classical trajectory outlier detection (TRAOD) algorithm and then proposes a density-based trajectory outlier detection (DBTOD) algorithm, which compensates for the TRAOD algorithm's inability to detect anomalous defects when the trajectory is local and dense. Results of applying the proposed algorithm to the Elk1993 and Deer1995 datasets are also presented and show the effectiveness of the algorithm.
Funding: Supported by State Grid Liaoning Electric Power Supply Co., Ltd., with financial support from the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).
Abstract: In the data age, data quality has become a problem that attracts much attention. As a field of data mining, outlier detection is closely related to data quality. The isolation forest (iForest) algorithm is one of the more prominent outlier detection algorithms for numerical data in recent years. As the isolation forest algorithm keeps generating isolation trees, the differences between the trees gradually decrease or even vanish, which wastes memory and reduces the efficiency of outlier detection. In addition, some of the constructed isolation trees cannot detect outliers. In this paper, an improved iForest-based method, GA-iForest, is proposed. The method optimizes the isolation forest by selecting better isolation trees according to their detection accuracy and their difference from one another, thereby removing duplicate, similar and poorly performing isolation trees and improving the accuracy and stability of outlier detection. The experimental environment is built on the Ubuntu system and the Spark platform, and the outlier datasets provided by ODDS are used for testing. The performance of the proposed method is evaluated using accuracy, recall, ROC curves, AUC and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection, but also reduces the number of isolation trees by 20%-40% compared with the original iForest method.
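For orientation, the baseline isolation forest that GA-iForest starts from can be sketched in a few dozen lines. This sketch covers only the standard iForest scoring scheme. The tree selection step of GA-iForest is not shown, and all function names, parameter defaults and the dictionary-based tree layout are assumptions made for illustration.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random dimension at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}  # cannot split identical values
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Depth at which `point` lands, plus an adjustment for unsplit leaves."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build `n_trees` isolation trees, each on a random subsample of `data`."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size


def anomaly_score(forest, point):
    """Score in (0, 1]; values near 1 indicate points that are isolated quickly."""
    trees, sample_size = forest
    mean_path = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))
```

Each tree here is a separate dictionary held in a list, so a selection step of the kind the abstract describes could be modelled as filtering that list before scoring. How GA-iForest measures the accuracy and difference of individual trees is not specified in the abstract and is not guessed at here.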