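The growing volume of internal power-grid geographic information system (GIS) data makes it increasingly difficult to maintain grid data storage performance. To address this, a distributed storage method for power-grid GIS data based on random forest is proposed. Starting from power-grid GIS data obtained from the grid GIS spatial information service platform through cross-origin resource sharing (CORS), the method selects data features according to their class-discrimination values, classifies the data with the random forest algorithm, distributes the classified data appropriately across different servers, and stores them using parallel processing, thereby achieving distributed storage of power-grid GIS data. Experimental data show that, with the proposed method, the classification accuracy for power-grid GIS data reached 96.8% and the minimum distributed storage time was 5.2 s, which supports the claim that the proposed method offers better data storage performance.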
Funding: Project (KC18071) supported by the Application Foundation Research Program of Xuzhou, China; Projects (2017YFC0804401, 2017YFC0804409) supported by the National Key R&D Program of China.
Abstract: The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the programming model of resilient distributed datasets (RDD). For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and for the same text sets, the Spark PNBA clearly outperforms the Hadoop PNBA on key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
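The train-then-predict flow described in this abstract can be illustrated with a minimal sketch. It is not the paper's Spark implementation: it uses plain Python, with `map` and `reduce` standing in for RDD transformations, and the function names (`count_partition`, `merge`, `train`, `predict`), the pre-tokenized input, and the Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce

def count_partition(docs):
    """Map step: count documents per class and (class, token) pairs in one partition."""
    class_docs = Counter()
    word_counts = Counter()
    for label, tokens in docs:
        class_docs[label] += 1
        for tok in tokens:
            word_counts[(label, tok)] += 1
    return class_docs, word_counts

def merge(a, b):
    """Reduce step: add the counters produced by two partitions."""
    return a[0] + b[0], a[1] + b[1]

def train(partitions):
    """Combine per-partition counts into one model (needs at least one partition)."""
    class_docs, word_counts = reduce(merge, map(count_partition, partitions))
    vocab = {tok for (_, tok) in word_counts}
    words_per_class = Counter()
    for (label, _), n in word_counts.items():
        words_per_class[label] += n
    return class_docs, word_counts, words_per_class, vocab

def predict(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing assumed)."""
    class_docs, word_counts, words_per_class, vocab = model
    n_docs = sum(class_docs.values())

    def score(label):
        s = math.log(class_docs[label] / n_docs)
        denom = words_per_class[label] + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                s += math.log((word_counts[(label, tok)] + 1) / denom)
        return s

    return max(class_docs, key=score)
```

Because each partition is counted independently and the counters are merged by addition, the same structure maps naturally onto a distributed map and reduce-by-key, which is the property a parallel version would rely on.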
Abstract: The classification algorithm is one of the key factors affecting the performance of an automatic text classification system and plays an important role in automatic classification research. This paper comparatively analyzes k-NN, VSM, and a hybrid classification algorithm proposed by our research group. Some 2000 pieces of Internet news provided by ChinaInfoBank are used in the experiment. The results show that the hybrid algorithm proposed by the group outperforms the other two algorithms.
Funding: The National Natural Science Foundation of China (No. 41471371).
Abstract: In order to detect the traffic patterns of moving objects in the city more accurately and quickly, a parallel algorithm for detecting traffic patterns using stay points and moving features is proposed. First, the features of the stay points in different traffic patterns are extracted: the stay points of each traffic pattern are identified, and a clustering algorithm is used to mine the features of the stay points that are unique to each pattern. Then, the moving features of different traffic patterns are extracted from the trajectory of a moving object, including the maximum speed, the average speed, and the stopping rate. A classifier is constructed to predict the traffic pattern of a trajectory from its stay points and moving features. Finally, a parallel algorithm based on Spark is proposed to detect traffic patterns. Experimental results show that the stay points and moving features reflect the differences between traffic patterns well, and that the detection accuracy is higher than that of other methods. In addition, the parallel algorithm speeds up the identification of traffic patterns.
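The three moving features named in this abstract can be computed from a trajectory roughly as follows. This is a sketch under stated assumptions: the abstract does not define the stopping rate, so here it is taken to be the fraction of trajectory segments whose speed falls below a hypothetical threshold `stop_speed`; the function name `moving_features` and the planar `(t, x, y)` input format are also illustrative.

```python
import math

def moving_features(points, stop_speed=0.5):
    """Compute simple moving features for one trajectory.

    points: list of (t_seconds, x_metres, y_metres) with strictly increasing t.
    stop_speed: assumed threshold (m/s) below which a segment counts as stopped.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    speeds = []
    total_dist = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        dist = math.hypot(x1 - x0, y1 - y0)
        total_dist += dist
        speeds.append(dist / dt)
    duration = points[-1][0] - points[0][0]
    return {
        "max_speed": max(speeds),
        "avg_speed": total_dist / duration,
        # Assumed definition: share of segments slower than stop_speed.
        "stop_rate": sum(s < stop_speed for s in speeds) / len(speeds),
    }
```

Each trajectory is processed independently, so a feature extractor of this shape could be applied per trajectory in a parallel map before the classifier is trained.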
Abstract: Every public speaker prepares his or her speech meticulously. Witty remarks emerge in an endless stream and demonstrate the rhetorical beauty of English to a great extent. Almost every speaker employs parallelism in his or her public speeches. The present paper studies the concept, the classification, and the significance of parallelism in English.
Funding: The authors received no specific funding for this work.
Abstract: The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because of the way it operates, its application may be limited to problems with a moderate number of instances, particularly when run time is a consideration. However, the classification of large amounts of data has become a fundamental task in many real-world applications, so it is logical to scale the k-Nearest Neighbor method to large datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) which uses a parallel centroid-based and hierarchical clustering algorithm to separate the training dataset into multiple parts. The clustering algorithm uses four stages of successive refinement and generates high-quality clusters. The k-Nearest Neighbor approach then uses these clusters to predict the test datasets. Finally, sets of experiments are conducted on the UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to both classification accuracy and run-time performance.
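The general idea of restricting the neighbor search to one part of the training set can be sketched as below. This is not the KNN-CCL algorithm itself: the four-stage clustering is not reproduced, the centroids are assumed to be given by some earlier clustering step, and the names `build_partitions` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def build_partitions(train, centroids):
    """Assign each (vector, label) training pair to its nearest centroid."""
    parts = defaultdict(list)
    for x, label in train:
        parts[nearest(x, centroids)].append((x, label))
    return parts

def knn_predict(query, parts, centroids, k=3):
    """Majority vote among the k nearest neighbours inside the query's partition only."""
    candidates = parts.get(nearest(query, centroids), [])
    if not candidates:
        raise ValueError("the partition nearest to the query is empty")
    neighbours = sorted(candidates, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Searching a single partition trades some accuracy near cluster boundaries for a smaller candidate set per query, which is the kind of trade-off a clustering-based k-NN has to manage.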
Abstract: Osteoporosis is a systemic disease characterized by low bone mass, impaired bone microstructure, increased bone fragility, and a higher risk of fractures. It commonly affects postmenopausal women and the elderly. Orthopantomography, also known as panoramic radiography, is a widely used imaging technique in dental examinations due to its low cost and easy accessibility. Previous studies have shown that the mandibular cortical index (MCI) derived from orthopantomography can serve as an important indicator of osteoporosis risk. To address this, this study proposes a parallel Transformer network based on multiple instance learning. By introducing parallel modules that alleviate optimization issues and integrating multiple-instance learning with the Transformer architecture, our model effectively extracts information from image patches. Our model achieves an accuracy of 86% and an AUC score of 0.963 on an osteoporosis dataset, which demonstrates its promising and competitive performance.
Funding: This work was supported by the National Natural Science Foundation of China [grant numbers 41471313, 41101356, 41671391]; the Fundamental Research Funds for the Central Universities [grant number 2016XZZX004-02]; the Science and Technology Project of Zhejiang Province [grant numbers 2015C33021, 2013C33051]; and the Major Program of China High Resolution Earth Observation System [grant number 07-Y30B10-9001].
Abstract: Symmetry is a common feature in the real world. It may be used to improve a classification by using the point symmetry-based distance as a measure for clustering. However, calculating the point symmetry-based distance is time consuming. Although an efficient parallel point symmetry-based K-means algorithm (ParSym) has been proposed to overcome this limitation, ParSym may get stuck in sub-optimal solutions because of the K-means technique it uses. In this study, we propose a novel parallel point symmetry-based genetic clustering (ParSymG) algorithm for unsupervised classification. The genetic algorithm is introduced to overcome the sub-optimization problem caused by inappropriate selection of initial centroids in ParSym. A message passing interface (MPI) is used to implement the distributed master-slave paradigm. To make the algorithm more time-efficient, a three-phase speedup strategy is adopted for population initialization, image partition, and kd-tree structure-based nearest neighbor searching. The advantages of ParSymG over the existing ParSym and parallel K-means (PKM) algorithms are demonstrated through case studies using three different types of remotely sensed images. The speedup and time-gain results indicate the excellent scalability of the ParSymG algorithm.
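The abstract does not define the point symmetry-based distance, so the sketch below assumes one commonly used formulation: reflect the point through the candidate centroid, average the distances from the reflected point to its `knear` nearest data points, and multiply by the Euclidean distance between the point and the centroid. The function name `ps_distance` is hypothetical, and a brute-force sort stands in for the kd-tree search mentioned in the abstract.

```python
import math

def ps_distance(x, centroid, points, knear=2):
    """Point symmetry-based distance of x with respect to centroid (assumed formulation).

    points: the data set used to look up neighbours of the reflected point.
    knear: number of nearest neighbours averaged into the symmetry term.
    """
    # Reflect x through the centroid: x* = 2c - x.
    reflected = tuple(2 * c - xi for xi, c in zip(x, centroid))
    # Brute-force nearest-neighbour lookup; a kd-tree would replace this at scale.
    nearest = sorted(math.dist(reflected, p) for p in points)[:knear]
    d_sym = sum(nearest) / len(nearest)
    return d_sym * math.dist(x, centroid)
```

The nearest-neighbor lookup runs once per point per centroid, which is why this distance is expensive and why both a spatial index and parallel execution are attractive here.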
Funding: This research was partially supported by the National Science Foundation through the award SMA-1416509.
Abstract: Data with high spectral, spatial, vertical, and temporal resolution are increasingly available, making it a serious challenge to process big remote-sensing images effectively and efficiently. This article describes how to conduct supervised image classification by implementing maximum likelihood classification (MLC) over big image data on a field programmable gate array (FPGA) cloud. Compared with our prior work implementing MLC on a conventional cluster of multicore computers and on a graphics processing unit, FPGAs achieve the best performance relative to the conventional CPU cluster and a K40 GPU, and are more energy efficient. The proposed pipelined thread approach can be extended to other image-processing solutions to handle big data in the future.
Abstract: Supervised image classification has been widely used in a variety of remote sensing applications. As large volumes of satellite imagery and aerial photos become increasingly available, high-performance image processing solutions are required to handle data at this scale. This paper describes how the maximum likelihood classification approach is parallelized for implementation on a computer cluster and on a graphics processing unit to achieve high performance when processing big imagery data. The solution is scalable and satisfies the needs of change detection, object identification, and exploratory analysis on large-scale, high-resolution imagery data in remote sensing applications.
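A rough sketch of why maximum likelihood classification parallelizes well is given below. It is a simplification rather than the paper's implementation: each class is modelled as a Gaussian with independent bands (a diagonal covariance instead of a full covariance matrix), class priors are assumed equal, and the chunk loop runs serially where a cluster or GPU would process chunks concurrently. All function names and the `chunk_size` parameter are illustrative.

```python
import math

def fit_class(samples):
    """Estimate the per-band mean and variance of one class from training pixels."""
    n = len(samples)
    bands = list(zip(*samples))
    means = [sum(b) / n for b in bands]
    # A small floor keeps the variance positive for constant bands.
    variances = [max(sum((v - m) ** 2 for v in b) / n, 1e-6)
                 for b, m in zip(bands, means)]
    return means, variances

def log_likelihood(pixel, stats):
    """Gaussian log-likelihood of a pixel, assuming independent bands."""
    means, variances = stats
    return sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
               for v, m, var in zip(pixel, means, variances))

def classify_chunk(pixels, models):
    """Label each pixel with the class of highest likelihood (equal priors assumed)."""
    return [max(models, key=lambda name: log_likelihood(p, models[name]))
            for p in pixels]

def classify_image(pixels, models, chunk_size=2):
    """Classify chunk by chunk; chunks are independent and could run in parallel."""
    labels = []
    for start in range(0, len(pixels), chunk_size):
        labels.extend(classify_chunk(pixels[start:start + chunk_size], models))
    return labels
```

Since every pixel is scored independently against the same small set of class statistics, the work can be divided among workers with no communication beyond distributing the model and collecting the labels.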