Abstract: Traditional principal curve algorithms achieve good results on small datasets, but the computing and storage capacity of a single node cannot meet the requirements of extracting principal curves from massive data, and distributed parallelization of the algorithm is currently one of the most effective ways to address this kind of problem. This paper proposes a distributed soft K-segments principal curve algorithm (Distributed soft k-segments principal curve, DisSKPC) based on the MapReduce framework. First, building on a distributed K-Means algorithm, the dataset is granulated with a recursive granulation method to determine the size of each granule and to preserve the correlation of the data within a granule. Then the soft K-segments principal curve algorithm is invoked to compute a local principal component segment for the data of each granule, and the noise variance is proposed as a criterion for eliminating overfitted segments that may arise in dense, high-curvature data regions. Finally, a Hamiltonian path and a greedy algorithm are used to connect these local principal component segments into an optimal curve passing through the middle of the data cloud. Experimental results show that the proposed DisSKPC algorithm has good feasibility and scalability.
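The following Python sketch illustrates the pipeline described in this abstract on a single node, under stated assumptions: the MapReduce parallelization is omitted, the soft K-segments step is approximated by a per-granule first-principal-component segment, and the function names (granulate, local_segment, connect_segments) and the noise-variance threshold are illustrative, not the paper's code.

```python
# Hypothetical single-node sketch of the DisSKPC pipeline described above.
# Names and parameters are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

def granulate(X, max_size=200):
    """Recursively split the data into granules no larger than max_size."""
    if len(X) <= max_size:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    granules = []
    for k in (0, 1):
        granules.extend(granulate(X[labels == k], max_size))
    return granules

def local_segment(G):
    """Fit the first principal component line segment of one granule."""
    center = G.mean(axis=0)
    _, _, Vt = np.linalg.svd(G - center, full_matrices=False)
    direction = Vt[0]                       # leading principal direction
    proj = (G - center) @ direction         # 1-D projections onto the line
    endpoints = (center + proj.min() * direction,
                 center + proj.max() * direction)
    # residual variance orthogonal to the segment, used as a noise estimate
    noise_var = np.mean(np.sum((G - center) ** 2, axis=1) - proj ** 2)
    return endpoints, noise_var

def connect_segments(segments):
    """Greedy nearest-neighbour chaining of segment midpoints
    (a heuristic stand-in for the Hamiltonian-path connection step)."""
    if not segments:
        return []
    mids = np.array([(a + b) / 2 for a, b in segments])
    order, remaining = [0], set(range(1, len(mids)))
    while remaining:
        last = mids[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(mids[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return [segments[i] for i in order]

def disskpc_sketch(X, max_size=200, noise_threshold=1.0):
    granules = granulate(X, max_size)
    fitted = [local_segment(G) for G in granules if len(G) >= 2]
    # drop segments whose orthogonal noise variance suggests overfitting
    kept = [seg for seg, nv in fitted if nv < noise_threshold]
    return connect_segments(kept)
```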
Funding: Gansu Province Higher Education Innovation Fund Project (No. 2020B-104) and the "Innovation Star" Project for Outstanding Postgraduates of Gansu Province (No. 2021CXZX-606).
Abstract: Rapid and precise localization of faults in the on-board equipment of a train control system is a significant factor in ensuring reliable train operation. Taking the text data of the on-board equipment fault tracking table as samples, an on-board equipment fault diagnosis model is designed based on the combination of a convolutional neural network (CNN) and particle swarm optimization-support vector machines (PSO-SVM). Because fault text data are high-dimensional and sparse, the CNN is used for feature extraction. To reduce the influence of the imbalance among fault sample categories on classification accuracy, the PSO-SVM algorithm is introduced: the fully connected classification part of the CNN is replaced by PSO-SVM, the extracted features are classified precisely, and intelligent diagnosis of on-board equipment faults is implemented. Tests on the on-board equipment fault text data recorded by a railway bureau and comparisons with other models indicate that this model clearly improves the evaluation indexes and can serve as an effective model for on-board equipment fault diagnosis.
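A minimal sketch of the described pipeline is given below, assuming fault records are already tokenized into fixed-length integer sequences. The network shape, the PSO update rule, the class_weight="balanced" setting, and the helper names (TextCNN, pso_svm) are assumptions for illustration, not the paper's implementation.

```python
# Illustrative CNN feature extractor whose fully connected head is replaced
# by an SVM tuned with a basic particle swarm; not the paper's actual code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

class TextCNN(nn.Module):
    """1-D convolutional feature extractor for tokenized fault text."""
    def __init__(self, vocab_size, embed_dim=64, n_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        return self.pool(x).squeeze(-1)         # (batch, n_filters) features

def pso_svm(features, labels, n_particles=10, n_iter=20):
    """Tune SVM (C, gamma) with a basic particle swarm, then fit the SVM."""
    rng = np.random.default_rng(0)
    low, high = np.array([0.1, 1e-4]), np.array([100.0, 1.0])
    pos = rng.uniform(low, high, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):
        svm = SVC(C=p[0], gamma=p[1], class_weight="balanced")
        return cross_val_score(svm, features, labels, cv=3).mean()

    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, low, high)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return SVC(C=gbest[0], gamma=gbest[1], class_weight="balanced").fit(features, labels)

# Usage sketch: extract CNN features, then classify with the PSO-tuned SVM.
# cnn = TextCNN(vocab_size=5000)
# with torch.no_grad():
#     feats = cnn(token_batch).numpy()
# clf = pso_svm(feats, label_batch)
```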
Abstract: Recently, the expertise accumulated in the field of geovisualization has found application in the visualization of abstract multidimensional data, through so-called spatialization methods. Spatialization methods aim at visualizing multidimensional data in low-dimensional representational spaces by making use of spatial metaphors and applying dimension reduction techniques. Spatial metaphors are able to provide a metaphoric framework for the visualization of information at different levels of granularity. The present paper investigates how the issue of granularity is handled in representative examples of spatialization methods. Furthermore, this paper introduces the prototyping tool Geo-Scape, which provides an interactive spatialization environment for representing and exploring multidimensional data at different levels of granularity, making use of a kernel density estimation technique and the landscape "smoothness" metaphor. A demonstration scenario is then presented to show how Geo-Scape helps to discover knowledge in a large set of data by grouping the data into meaningful clusters on the basis of a similarity measure and organizing them at different levels of granularity.
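The sketch below illustrates, under stated assumptions, the kind of spatialization workflow the abstract describes (it is not Geo-Scape's actual code): project records to 2-D, build a kernel density "landscape", and control granularity via the KDE bandwidth and the number of clusters. The function name spatialize and the choice of MDS and agglomerative clustering are illustrative.

```python
# Hypothetical spatialization workflow: 2-D layout + KDE landscape + clusters.
import numpy as np
from sklearn.manifold import MDS
from sklearn.neighbors import KernelDensity
from sklearn.cluster import AgglomerativeClustering

def spatialize(X, bandwidth=0.5, n_clusters=5, grid_size=100):
    """Return a 2-D layout, a density grid (the landscape), and cluster labels."""
    # 1. Dimension reduction: place similar records close together in 2-D.
    layout = MDS(n_components=2, random_state=0).fit_transform(X)

    # 2. Kernel density estimation: the "smoothness" of the landscape is
    #    controlled by the bandwidth (larger bandwidth = coarser granularity).
    kde = KernelDensity(bandwidth=bandwidth).fit(layout)
    xs = np.linspace(layout[:, 0].min(), layout[:, 0].max(), grid_size)
    ys = np.linspace(layout[:, 1].min(), layout[:, 1].max(), grid_size)
    grid = np.array([[x, y] for y in ys for x in xs])
    landscape = np.exp(kde.score_samples(grid)).reshape(grid_size, grid_size)

    # 3. Grouping: cluster the layout so the data can be explored at a chosen
    #    level of granularity (fewer clusters = coarser view).
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(layout)
    return layout, landscape, labels
```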