With the rapid development of the aviation industry and the emergence of crowd-sourcing projects such as Flightradar24 and FlightAware, large amounts of air traffic data, particularly four-dimensional (4D) trajectory data, have become publicly available. To guarantee the accuracy and reliability of results, data cleansing, including error identification and mitigation, is the first step in analyzing 4D trajectory data. This paper investigates data cleansing techniques for 4D trajectory data. A back-propagation (BP) neural network is applied to repair errors. Newton interpolation is used to obtain evenly spaced samples from each flight's 4D trajectory data. Furthermore, a new method is proposed to compress the data while maintaining the intrinsic characteristics of the trajectories. Density-based spatial clustering of applications with noise (DBSCAN) is then applied to identify remaining outliers among the sample points. Experiments are performed on one day of 4D trajectory data over Europe. The results show that the proposed method is more efficient and effective than existing approaches. The work contributes to the first step of data preprocessing and lays a foundation for downstream 4D trajectory analysis.
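As a concrete illustration of two stages described above, the following is a minimal Python sketch of Newton divided-difference interpolation for even-spaced resampling and DBSCAN for flagging residual outliers. The segment length, eps, and min_samples values are illustrative assumptions rather than the paper's settings, and a real pipeline would interpolate piecewise over short segments to avoid high-degree polynomial oscillation.

```python
# Minimal sketch: Newton interpolation to resample one trajectory dimension
# at evenly spaced timestamps, then DBSCAN to flag residual outliers.
# Parameter values are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import DBSCAN

def newton_interpolate(t_known, y_known, t_query):
    """Evaluate the Newton divided-difference polynomial at t_query."""
    t_known = np.asarray(t_known, dtype=float)
    coef = np.asarray(y_known, dtype=float).copy()
    n = len(t_known)
    for j in range(1, n):  # build divided-difference coefficients in place
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (t_known[j:] - t_known[:n - j])
    result = np.full(len(t_query), coef[-1])
    for j in range(n - 2, -1, -1):  # Horner evaluation of the Newton form
        result = result * (t_query - t_known[j]) + coef[j]
    return result

# Toy segment: uneven timestamps (s) and latitudes. Keep segments short,
# since a single high-degree polynomial oscillates on long windows.
t = np.array([0.0, 4.0, 9.0, 15.0, 22.0])
lat = np.array([47.10, 47.12, 47.15, 47.19, 47.25])
t_even = np.linspace(t[0], t[-1], 12)
lat_even = newton_interpolate(t, lat, t_even)

# DBSCAN on normalized 4D sample points; label -1 marks residual outliers.
points = np.random.rand(200, 4)  # stand-in for scaled [t, lat, lon, alt]
outliers = DBSCAN(eps=0.3, min_samples=5).fit_predict(points) == -1
```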
Diabetic retinopathy is a critical eye condition that, if not treated, can lead to vision loss. Traditional methods of diagnosing and treating the disease are time-consuming and expensive. However, machine learning and deep transfer learning (DTL) techniques have shown promise in medical applications, including detecting, classifying, and segmenting diabetic retinopathy, and offer higher accuracy and performance. Computer-Aided Diagnosis (CAD) is crucial in speeding up classification and providing accurate disease diagnoses. Overall, these technological advancements hold great potential for improving the management of diabetic retinopathy. The study's objective was to differentiate between the classes of diabetic retinopathy and to verify the model's capability to distinguish between them. The robustness of the model was evaluated using metrics such as accuracy (ACC), precision (PRE), recall (REC), and area under the curve (AUC). The study used data cleansing techniques, transfer learning (TL), and convolutional neural network (CNN) methods to identify and categorize the diseases associated with diabetic retinopathy (DR). It employed the VGG-16 CNN model, incorporating intelligent parameters that enhanced its robustness. The outcomes surpassed the results obtained with the auto-enhancement (AE) filter, with an ACC of over 98%. The manuscript provides visual aids such as graphs and tables, along with the techniques and frameworks used, to enhance understanding. This study highlights the significance of optimized deep TL in improving the classification metrics for the four separate classes of DR and emphasizes the value of the VGG-16 CNN classification technique in this context.
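A generic transfer-learning setup along these lines can be sketched in Python with tf.keras: the pretrained VGG-16 backbone is frozen and a small classification head is trained for the four DR classes. The head layers and hyperparameters below are placeholder assumptions; the paper's "intelligent parameters" are not reproduced.

```python
# Sketch of VGG-16 transfer learning for 4-class DR grading (tf.keras).
# Head architecture and hyperparameters are placeholders, not the paper's.
import tensorflow as tf

NUM_CLASSES = 4  # the four DR classes referred to in the abstract

base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # hypothetical datasets
```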
Nowadays, deep learning methods are widely applied to analyze and predict the trends of various disaster events and to offer alternatives for making appropriate decisions, supporting water resource management and short-term planning. In this paper, the water levels of the Pattani River in southern Thailand are predicted at hourly resolution over a 7-day forecast horizon. A Time Series Transformer and linear regression were applied in this work, and both produced water level forecasts with high accuracy. Moreover, a water level forecasting dashboard was developed for monitoring the water levels of the Pattani River.
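The abstract does not detail either model; below is a minimal sketch of only the linear-regression half, assuming lagged hourly levels as features and a recursive roll-forward to cover the 168-hour (7-day) horizon. The 24-hour lag window and the synthetic series are assumptions for illustration.

```python
# Sketch: lagged-feature linear regression rolled forward recursively to
# produce a 168-hour forecast. Window length is an assumed value.
import numpy as np
from sklearn.linear_model import LinearRegression

def make_supervised(series, n_lags=24):
    """Turn an hourly series into (lag-window, next-value) training pairs."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def forecast(model, history, horizon=168, n_lags=24):
    """Recursive multi-step forecast: feed each prediction back as input."""
    window = list(history[-n_lags:])
    preds = []
    for _ in range(horizon):
        next_val = model.predict(np.array(window[-n_lags:]).reshape(1, -1))[0]
        preds.append(next_val)
        window.append(next_val)
    return np.array(preds)

levels = np.sin(np.linspace(0, 60, 2000)) + 2.0  # stand-in for gauge readings
X, y = make_supervised(levels)
model = LinearRegression().fit(X, y)
next_week = forecast(model, levels)  # 168 hourly predictions
```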
Association rule learning is a machine learning method used to find underlying associations in large datasets. Whether intentionally or unintentionally present, noise in training instances causes overfitting while building the classifier and negatively impacts classification accuracy. This paper applies instance reduction techniques to the datasets before mining the association rules and building the classifier. Instance reduction techniques were originally developed to reduce memory requirements in instance-based learning; this paper utilizes them to remove noise from the dataset before training the association rule classifier. Extensive experiments were conducted to assess the accuracy of association rules with different instance reduction techniques, namely Decremental Reduction Optimization Procedure 3 (DROP3), DROP5, All k-Nearest Neighbors (ALLKNN), Edited Nearest Neighbor (ENN), and Repeated Edited Nearest Neighbor (RENN), at different noise ratios. Experiments show that instance reduction techniques substantially improved the average classification accuracy at three noise levels: 0%, 5%, and 10%. The RENN algorithm achieved the highest accuracy, with a significant improvement on seven of the eight datasets used from the University of California Irvine (UCI) machine learning repository. The improvements were more apparent in the 5% and 10% noise cases. When RENN was applied, the average classification accuracy for the eight datasets improved from 70.47% to 76.65% in the zero-noise test compared to the original test, from 66.08% to 77.47% in the 5%-noise case, and from 59.89% to 77.59% in the 10%-noise case. Higher confidence was also reported when building the association rules with RENN. These results indicate that RENN is a good solution for removing noise and avoiding overfitting during the construction of the association rule classifier, especially in noisy domains.
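ENN and RENN are simple enough to sketch directly. The Python version below, using scikit-learn only for the neighbor search, drops any instance whose k nearest neighbors outvote its label; RENN repeats the pass until the edit stabilizes. The choice k=3 and the assumption of integer class labels are illustrative, and the association-rule classifier trained afterwards is not reproduced here.

```python
# Sketch of ENN and RENN noise filtering before classifier training.
# Assumes integer class labels; k=3 is an illustrative choice.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def enn_filter(X, y, k=3):
    """One Edited Nearest Neighbor pass; returns a boolean keep-mask."""
    # Ask for k+1 neighbors so each point can skip itself as a neighbor.
    knn = KNeighborsClassifier(n_neighbors=k + 1).fit(X, y)
    neighbors = knn.kneighbors(X, return_distance=False)[:, 1:]
    # Majority vote of the k neighbors' (non-negative integer) labels.
    votes = np.array([np.bincount(y[idx]).argmax() for idx in neighbors])
    return votes == y

def renn_filter(X, y, k=3):
    """Repeated ENN: re-run the edit until no instance is removed."""
    keep = np.ones(len(y), dtype=bool)
    while keep.sum() > k + 1:  # stop before the subset is too small to vote
        mask = enn_filter(X[keep], y[keep], k)
        if mask.all():
            break
        # Map removed subset positions back to original indices.
        keep[np.flatnonzero(keep)[~mask]] = False
    return X[keep], y[keep]

# Usage: filter the training set, then train any classifier on the result.
X = np.random.rand(300, 5)
y = (X[:, 0] > 0.5).astype(int)
X_clean, y_clean = renn_filter(X, y)
```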
Data quality management, especially data cleansing, has been extensively studied for many years in the areas of data management and visual analytics. In this paper, we first review and explore the relevant work from the research areas of data management, visual analytics, and human-computer interaction. Then, for different types of data such as multimedia data, textual data, trajectory data, and graph data, we summarize common methods for improving data quality by leveraging data cleansing techniques at different analysis stages. Based on a thorough analysis, we propose a general visual analytics framework for interactively cleansing data. Finally, the challenges and opportunities are analyzed and discussed in the context of data and humans.