Funding: Supported by the National Natural Science Foundation of China (No. 62002144) and the Ministry of Education Chunhui Plan Research Project (Nos. 202200345, HZKY20220125).
Abstract: Accurate geospatial data are essential for geographic information systems (GIS), environmental monitoring, and urban planning. The deep integration of the open Internet with geographic information technology has made the integrity and security of spatial data increasingly difficult to guarantee. In this paper, we treat abnormal spatial data as missing data and focus on recovering it. Existing geospatial data recovery methods require complete datasets for training, which makes recovery time-consuming and limits generalization. To address these issues, we propose TGAIN, a GAIN-LSTM-based geospatial data recovery method with two main components: (1) a long short-term memory (LSTM) recurrent neural network serves as the generator, analyzing geospatial temporal data and capturing its temporal correlation; (2) a cue-masked fusion matrix mechanism is used to construct the complete TGAIN network, so that the recovered data match the original distribution of the input data. Experimental results on two publicly accessible datasets show that TGAIN surpasses four contemporary and traditional models in mean absolute error (MAE), root mean square error (RMSE), mean square error (MSE), mean absolute percentage error (MAPE), coefficient of determination (R²), and average computational time across various missing-data rates. TGAIN also recovers data more accurately and robustly than existing models, especially at high missing rates. The model helps improve the integrity of geospatial data and provides data support for practical applications such as urban traffic optimization prediction and personal mobility analysis.
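The abstract lists the evaluation metrics but not how they are computed. As a minimal sketch, assuming the metrics are scored only at positions that were missing in the input (a common convention for imputation, not confirmed by the abstract), they could be calculated as below. The function name `recovery_metrics` and the mask convention (1 = observed, 0 = missing) are illustrative assumptions, not part of the paper.

```python
import math


def recovery_metrics(truth, recovered, observed_mask):
    """Score recovered values at missing positions only (mask value 0).

    Hypothetical helper: the name and the mask convention are assumptions.
    """
    pairs = [(t, r) for t, r, m in zip(truth, recovered, observed_mask) if m == 0]
    if not pairs:
        raise ValueError("no missing positions to score")
    n = len(pairs)
    errors = [r - t for t, r in pairs]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # MAPE is undefined where the true value is zero; such pairs are skipped.
    nonzero = [(t, r) for t, r in pairs if t != 0]
    if nonzero:
        mape = 100.0 * sum(abs((r - t) / t) for t, r in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    # R^2 = 1 - SS_res / SS_tot, computed over the scored positions.
    mean_t = sum(t for t, _ in pairs) / n
    ss_tot = sum((t - mean_t) ** 2 for t, _ in pairs)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}


if __name__ == "__main__":
    # Toy series: positions 1 and 3 were missing and have been recovered.
    truth = [1.0, 2.0, 3.0, 4.0]
    recovered = [1.0, 2.5, 3.0, 3.0]
    observed_mask = [1, 0, 1, 0]
    print(recovery_metrics(truth, recovered, observed_mask))
```

Restricting the score to missing positions keeps observed values, which any method can copy through unchanged, from inflating the result.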
Funding: Supported by the National Basic Research Program of China (2007CB307100, 2007CB307106).
Abstract: Cost-sensitive learning has been applied to the multi-class imbalance problem in Internet traffic classification and has achieved considerable results. However, classification performance on minority classes with few bytes remains unsatisfactory, because existing research focuses only on classes with a large number of bytes. This paper therefore studies the class-dependent misclassification cost. First, the flow-rate-based cost matrix (FCM) is investigated. Second, a new cost matrix, the weighted cost matrix (WCM), is proposed; it calculates a reasonable weight for each cost in the FCM by taking into account the data imbalance degree and the classification accuracy of each class. WCM further improves classification performance on the difficult minority class (the class with more flows but worse classification accuracy). Experimental results on twelve real traffic datasets show that FCM and WCM achieve more than 92% flow g-mean and 80% byte g-mean on average; on a test set collected one year later, WCM is more stable than FCM.
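The abstract reports flow g-mean and byte g-mean without defining them. A minimal sketch, assuming the usual definition of g-mean as the geometric mean of per-class recall, with recall counted per flow for the flow g-mean and weighted by byte count for the byte g-mean. The record fields `true`, `pred`, and `bytes` and the function name `g_mean` are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def g_mean(records, weight_key=None):
    """Geometric mean of per-class recall.

    records: iterable of dicts with 'true' and 'pred' labels (assumed layout).
    weight_key: None counts every flow once (flow g-mean); a field name such
    as 'bytes' weights each flow by that field (byte g-mean).
    """
    total = defaultdict(float)
    correct = defaultdict(float)
    for rec in records:
        weight = 1.0 if weight_key is None else float(rec[weight_key])
        total[rec["true"]] += weight
        if rec["pred"] == rec["true"]:
            correct[rec["true"]] += weight

    recalls = [correct[c] / total[c] for c in total if total[c] > 0]
    if not recalls:
        raise ValueError("no labelled records to score")
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Toy flows: one small 'web' flow is right, one larger 'web' flow is wrong.
    flows = [
        {"true": "web", "pred": "web", "bytes": 100},
        {"true": "web", "pred": "p2p", "bytes": 300},
        {"true": "p2p", "pred": "p2p", "bytes": 900},
        {"true": "p2p", "pred": "p2p", "bytes": 100},
    ]
    print("flow g-mean:", g_mean(flows))
    print("byte g-mean:", g_mean(flows, weight_key="bytes"))
```

Because the geometric mean collapses toward zero when any single class has low recall, it penalizes a classifier that ignores minority classes, which is why it suits imbalanced traffic data better than overall accuracy.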