The electromagnetic time-reversal (TR) technique has the characteristic of spatiotemporal focusing in a time-reversal cavity (TRC), which can be used for pulse compression and thus for forming an electromagnetic pulse with high peak power. A time-reversed pulse-compression method in a single channel achieves a high pulse-compression gain; however, single-channel pulse compression can generate only limited gain. This paper proposes a novel TR power-combination method in a multichannel TRC, based on TR pulse-compression theory, to obtain higher peak power. First, the TR power-combination model is given, and the crosstalk properties of the associated channels and their influence on the time-reversal performance are studied. Then, the power-combination performance of TR pulse compression, such as the combined signal-to-noise ratio (SNR) and the combined compression gain, is analyzed by numerical simulation and experiment. The results show that the proposed method has clear advantages over pulse-compression methods using a single-channel cavity and is more convenient for power combination.
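A toy numerical sketch of the single-channel mechanism this work builds on (the impulse response and all parameters below are invented for illustration, not taken from the paper): when the time-reversed channel impulse response is re-injected, the received signal is the autocorrelation of that response, so energy dispersed over many multipath arrivals refocuses into one high-peak sample.

```python
# Single-channel time-reversal pulse compression, sketched with a
# synthetic decaying multipath impulse response (an assumption).
import numpy as np

rng = np.random.default_rng(0)

# Simulated impulse response of a reverberant cavity (illustrative).
h = rng.standard_normal(256) * np.exp(-np.arange(256) / 64.0)

# Direct transmission of a unit impulse: peak limited by the strongest path.
direct_peak = np.max(np.abs(h))

# Time-reversed transmission: received signal = h convolved with reversed h,
# i.e., the autocorrelation of h, whose peak equals sum(h**2).
tr_received = np.convolve(h, h[::-1])
tr_peak = np.max(np.abs(tr_received))

gain = tr_peak / direct_peak
print(f"compression gain ≈ {gain:.1f}")
```

By the Cauchy-Schwarz inequality the autocorrelation is maximized at zero lag, which is exactly the spatiotemporal focusing exploited for pulse compression.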
Support vector machines (SVMs) are recognized as a powerful tool for linear classification. When combined with a sparsity-inducing nonconvex penalty, SVMs can perform classification and variable selection simultaneously. However, nonconvex penalized SVMs generally cannot be solved globally and efficiently because of their nondifferentiability, nonconvexity, and nonsmoothness. Existing solutions to nonconvex penalized SVMs typically operate in a serial fashion and thus cannot fully exploit the parallel computing power of modern multi-core machines. Moreover, the fact that much real-world data is stored in a distributed manner urgently calls for a parallel and distributed solution. To address this challenge, we propose an efficient algorithm based on the alternating direction method of multipliers (ADMM) that solves nonconvex penalized SVMs in a parallel and distributed way. We design several techniques to reduce the computation and synchronization cost of the proposed parallel algorithm. A time-complexity analysis demonstrates its low time complexity, and its convergence is guaranteed. Experimental evaluations on four LIBSVM benchmark datasets demonstrate the efficiency of the proposed parallel algorithm.
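As a loose illustration of the parallel structure (not the paper's algorithm): consensus ADMM splits the data into blocks, each block solves a local SVM subproblem independently, and a global variable is updated by averaging plus a proximal step. The sketch below substitutes a squared hinge loss and an l1 penalty via soft-thresholding for the paper's nonconvex penalty; all sizes and hyperparameters are invented.

```python
# Consensus-ADMM sketch for a distributed linear SVM (illustrative).
import numpy as np

rng = np.random.default_rng(1)
n, d, blocks = 200, 10, 4
X = rng.standard_normal((n, d))
w_true = np.zeros(d); w_true[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(n))

Xs = np.array_split(X, blocks); ys = np.array_split(y, blocks)
rho, lam = 1.0, 0.05
x = np.zeros((blocks, d)); u = np.zeros((blocks, d)); z = np.zeros(d)

def local_update(Xb, yb, z, u, x0, rho, steps=50, lr=0.01):
    """Approximate argmin of squared hinge loss + (rho/2)||x - z + u||^2."""
    x = x0.copy()
    for _ in range(steps):
        margins = 1 - yb * (Xb @ x)
        active = margins > 0
        grad = -2 * (Xb[active] * (yb[active] * margins[active])[:, None]).sum(axis=0)
        grad = grad / len(yb) + rho * (x - z + u)
        x -= lr * grad
    return x

for _ in range(50):
    for b in range(blocks):                      # embarrassingly parallel step
        x[b] = local_update(Xs[b], ys[b], z, u[b], x[b], rho)
    xbar, ubar = x.mean(axis=0), u.mean(axis=0)
    # z-update: soft-thresholding, the prox of the l1 surrogate penalty.
    z = np.sign(xbar + ubar) * np.maximum(np.abs(xbar + ubar) - lam / (rho * blocks), 0)
    u += x - z                                   # dual update

acc = np.mean(np.sign(X @ z) == y)
print(f"training accuracy: {acc:.2f}")
```

The inner loop over blocks is where a real implementation would dispatch work to cores or machines; only `z` and the duals need synchronization, which is the cost the paper's techniques aim to reduce.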
The density peak (DP) algorithm has been widely used in scientific research due to its novel and effective density-peak-based clustering approach. However, the DP algorithm uses each pair of data points several times when determining cluster centers, yielding high computational complexity. In this paper, we focus on accelerating the time-consuming DP algorithm with a graphics processing unit (GPU). We analyze the principle of the algorithm to locate its computational bottlenecks and evaluate its potential for parallelism. In light of this analysis, we propose an efficient parallel DP algorithm targeting GPU architectures and implement it with the compute unified device architecture (CUDA), called the CUDA-DP platform. Specifically, we use shared memory to improve data locality, which reduces the amount of global memory access. To exploit the coalesced memory access mechanism of the GPU, we convert the data structure of the CUDA-DP program from array-of-structures to structure-of-arrays. In addition, we introduce a binary search-and-sampling method to avoid sorting a large array. Experimental results show that CUDA-DP achieves a 45-fold speedup over the CPU-based density peaks implementation.
Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); this good performance, however, relies heavily on substantial amounts of labeled data. In some areas, such as the medical, financial, and military domains, labeled data is very scarce while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but they neglect the large amount of entity information in unlabeled data that may benefit the NER task. In this study, we propose a semi-supervised method for NER that learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are generated automatically for unlabeled data and used as if they were true labels. Our semi-supervised framework comprises three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even using only pre-trained static word embeddings and no external knowledge, our method achieves performance comparable to state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
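The three-step loop can be sketched with a toy setup; here a nearest-centroid classifier and a confidence threshold stand in for the neural NER model and the learned pseudo-label evaluation module (both substitutions, and the synthetic data, are purely illustrative).

```python
# Sketch of the train / filter-pseudo-labels / retrain loop (illustrative).
import numpy as np

rng = np.random.default_rng(3)
X_lab = np.vstack([rng.normal(-2, 0.5, (5, 2)), rng.normal(2, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y_unl_true = np.array([0] * 50 + [1] * 50)   # held out, only for evaluation

def fit_centroids(X, y):
    return np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(cent, X):
    d = np.linalg.norm(X[:, None] - cent[None], axis=-1)
    conf = d.min(axis=1) / d.sum(axis=1)     # smaller ratio = more confident
    return d.argmin(axis=1), conf

X_cur, y_cur = X_lab.copy(), y_lab.copy()
for _ in range(3):                           # iterative refinement
    cent = fit_centroids(X_cur, y_cur)       # step 1: train the model
    pseudo, conf = predict(cent, X_unl)      # generate pseudo labels
    keep = conf < 0.2                        # step 2: filter low-quality labels
    X_cur = np.vstack([X_lab, X_unl[keep]])  # step 3: build new labeled data
    y_cur = np.concatenate([y_lab, pseudo[keep]])

acc = np.mean(predict(fit_centroids(X_cur, y_cur), X_unl)[0] == y_unl_true)
print(f"accuracy on the unlabeled pool: {acc:.2f}")
```

In the paper the filtering step is itself a learned module rather than a fixed threshold, which is what lets erroneous pseudo labels be rejected even when the base model is confidently wrong.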
Fingerprints have been widely used in a variety of biometric identification systems over the past several years due to their uniqueness and immutability. With the rapid development of fingerprint identification techniques, many fingerprint identification systems urgently need to handle large-scale fingerprint storage and highly concurrent recognition queries, which pose huge challenges. In this context, we design and implement Pegasus, a distributed, load-balancing fingerprint identification system comprising a distributed feature extraction subsystem and a distributed feature storage subsystem. The feature extraction procedure uses the Hadoop Image Processing Interface (HIPI) library to increase overall processing speed; the feature storage subsystem optimizes MongoDB's default load-balancing strategy to improve the efficiency and robustness of Pegasus. Experiments and simulations show that Pegasus reduces the time cost of feature extraction by 70%. Pegasus also balances the access load among front-end mongos nodes to within 5%. Additionally, Pegasus reduces data migration among back-end data shards by over 40%, yielding a more reasonable data distribution based on the operation load (insertions, deletions, updates, and queries) of each shard.
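The load-aware migration idea can be caricatured in a few lines. The greedy chunk-mover below is an invented stand-in, not Pegasus's actual policy: given per-chunk operation loads, it repeatedly moves a chunk from the hottest shard to the coldest one, but only when the move strictly narrows the load gap.

```python
# Toy operation-load-aware chunk rebalancing (illustrative, not Pegasus).
def rebalance(shards):
    """Move chunks from the hottest to the coldest shard while a move
    strictly narrows the gap between their total loads."""
    while True:
        tot = {k: sum(v) for k, v in shards.items()}
        hot = max(tot, key=tot.get)
        cold = min(tot, key=tot.get)
        diff = tot[hot] - tot[cold]
        # Candidate chunks that keep hot >= cold after moving (no overshoot).
        ok = [c for c in shards[hot] if 2 * c < diff]
        if not ok:
            return shards
        c = max(ok)                  # the biggest safe chunk narrows the gap most
        shards[hot].remove(c)
        shards[cold].append(c)

# Per-chunk operation loads on three shards (invented numbers).
shards = rebalance({"s0": [30, 25, 10], "s1": [8], "s2": [7]})
print({k: sum(v) for k, v in shards.items()})
```

The initial load spread of 58 shrinks to 14 after two moves; a real system would weight insertions, deletions, updates, and queries separately when computing each chunk's load.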
High-dimensional data arising from diverse scientific research fields and industrial development have led to increased interest in sparse learning, owing to model parsimony and computational advantage. Under the sparsity assumption, many computational problems can be handled efficiently in practice. Structured sparse learning encodes structural information about the variables and has been quite successful in numerous research fields. As various types of structure have been discovered, many kinds of structured regularization have been proposed. These regularizations have greatly improved the efficacy of sparse learning algorithms by exploiting specific structural information. In this article, we present a systematic review of structured sparse learning, including ideas, formulations, algorithms, and applications. We present the algorithms in the unified framework of minimizing the sum of a loss function and a penalty function, summarize publicly accessible software implementations, and compare the computational complexity of typical optimization methods for structured sparse learning problems. In experiments, we present applications in unsupervised learning, for structured signal recovery and hierarchical image reconstruction, and in supervised learning, in the context of a novel graph-guided logistic regression.
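One representative instance of the loss-plus-penalty framework the survey covers is group-lasso regression solved by proximal gradient descent, where the structural information is a partition of the variables into groups. The sketch below uses invented problem sizes and a two-group structure.

```python
# Proximal gradient for group lasso: min ||Ax - b||^2 / 2 + lam * sum_g ||x_g||_2
# (sizes, groups, and lam are illustrative).
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 10
groups = [np.arange(0, 5), np.arange(5, 10)]
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:5] = [1.0, -2.0, 1.5, 0.5, -1.0]    # second group truly inactive
b = A @ x_true + 0.01 * rng.standard_normal(n)

lam = 5.0
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
x = np.zeros(d)
for _ in range(500):
    g = A.T @ (A @ x - b)                   # gradient of the smooth loss
    v = x - g / L                           # gradient step
    for idx in groups:                      # prox step: group soft-thresholding
        nrm = np.linalg.norm(v[idx])
        scale = max(0.0, 1.0 - lam / (L * nrm)) if nrm > 0 else 0.0
        v[idx] = scale * v[idx]
    x = v

print("group norms:", [round(float(np.linalg.norm(x[g])), 3) for g in groups])
```

The group penalty zeroes out the inactive group as a whole rather than coefficient by coefficient, which is exactly the structural behavior a plain l1 penalty cannot provide.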
The Internet-based virtual computing environment (iVCE) has been proposed to combine data centers and other kinds of computing resources on the Internet to provide efficient and economical services. Virtual machines (VMs) are widely used in iVCE to isolate different users and jobs and to ensure trustworthiness, but VMs traditionally require a long time to boot, which cannot meet the needs of iVCE's large-scale, highly dynamic applications. To address this problem, we design and implement VirtMan, a fast booting system for large numbers of virtual machines in iVCE. VirtMan uses the Linux Small Computer System Interface (SCSI) target to remotely mount the source image in a scalable hierarchy, and leverages the homogeneity of a set of VMs to transfer only the necessary image data at runtime. We have implemented VirtMan both as a standalone system and for OpenStack. On our 100-server testbed, VirtMan boots 1000 VMs (with a 15 GB image of Windows Server 2008) on 100 physical servers in less than 120 s, three orders of magnitude faster than current public clouds.
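The "transfer only the necessary image data at runtime" idea amounts to copy-on-read: a block is fetched from the remote source image only the first time a VM reads it, so a boot that touches a small working set moves only a small fraction of the image. The sketch below is a minimal stand-in, not VirtMan's actual SCSI-based mechanism; all names and block counts are invented.

```python
# Copy-on-read image access (illustrative sketch, not VirtMan's design).
class LazyImage:
    def __init__(self, source_blocks):
        self.source = source_blocks      # stands in for the remote source image
        self.cache = {}                  # blocks already transferred locally
        self.fetched = 0                 # count of remote fetches

    def read(self, block_id):
        if block_id not in self.cache:
            self.cache[block_id] = self.source[block_id]  # one remote fetch
            self.fetched += 1
        return self.cache[block_id]

source = {i: f"data-{i}" for i in range(10_000)}  # a 10k-block image
img = LazyImage(source)
boot_working_set = list(range(300)) * 5           # boot re-reads 300 blocks
for blk in boot_working_set:
    img.read(blk)
print(f"fetched {img.fetched} of {len(source)} blocks")
```

Only 300 of 10,000 blocks cross the network despite 1,500 reads; VirtMan additionally shares such fetches across homogeneous VMs through its mount hierarchy.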
Although the solution of non-convex problems has recently attracted much attention, convex optimization remains important in machine learning, especially when an interpretable model is required: the solution to a convex problem is a global minimum, and the final model can be explained mathematically. Typically, the convex problem is recast as a regularized risk minimization problem to prevent overfitting. The cutting plane method (CPM) is one of the best solvers for such convex problems, whether or not the objective function is differentiable. However, CPM and its variants handle large-scale, data-intensive cases poorly because they access the entire dataset in each iteration, which substantially increases the computational burden and memory cost. To alleviate this problem, we propose a novel algorithm, the mini-batch cutting plane method (MBCPM), which iterates with estimated cutting planes calculated on small batches of sampled data and is therefore capable of handling large-scale problems. Furthermore, MBCPM adopts a "sink" operation that detects and adjusts noisy estimates to guarantee convergence. Numerical experiments on extensive real-world datasets demonstrate the effectiveness of MBCPM, which outperforms bundle methods for regularized risk minimization as well as popular stochastic gradient descent methods in terms of convergence speed.
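The cutting-plane idea behind MBCPM can be shown in one dimension: each mini-batch contributes an affine lower model (a "cut") of the empirical risk at the current iterate, and the master step minimizes the piecewise-linear maximum of the cuts plus the regularizer. The sketch omits the paper's "sink" adjustment, uses a grid-search master problem, and invents all data and parameters.

```python
# One-dimensional mini-batch cutting-plane sketch (illustrative).
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(3.0, 1.0, 1000)   # risk(w) = mean squared distance to data
lam = 0.01                          # L2 regularization weight

cuts = []                           # list of (slope, intercept) pairs
w = 0.0
for _ in range(30):
    batch = rng.choice(data, 64, replace=False)   # mini-batch estimate
    risk = np.mean((w - batch) ** 2)
    slope = np.mean(2 * (w - batch))
    cuts.append((slope, risk - slope * w))        # cut passes through (w, risk)
    # Master step: minimize max-of-cuts + lam * v^2 (toy grid search).
    grid = np.linspace(-10, 10, 4001)
    model = np.max([a * grid + b for a, b in cuts], axis=0) + lam * grid ** 2
    w = float(grid[np.argmin(model)])

print(f"w ≈ {w:.2f} (the regularized risk minimizer is near the data mean 3.0)")
```

Because each cut is built from only 64 of the 1000 samples, the per-iteration cost is independent of the dataset size; the accumulated bundle of cuts still pins down the minimizer, which is the property MBCPM scales up.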
Funding: Project supported by the National Key R&D Program of China (Grant No. 2021YFC2203503).
Funding: Project supported by the Major State Research Development Program of China (No. 2016YFB0201305).
Funding: Supported by the National Basic Research Program (973) of China (No. 2014CB340303), the National Natural Science Foundation of China (Nos. 61502509 and 61222205), the Program for New Century Excellent Talents in University, and the Fok Ying-Tong Education Foundation (No. 141066).
Funding: Project supported by the National Key Research and Development Program of China (No. 2016YFB0201305) and the National Natural Science Foundation of China (No. 61872376).
Funding: Project supported by the National Basic Research Program (973) of China (No. 2014CB340303), the National Natural Science Foundation of China (Nos. 61222205 and 61402490), the Program for New Century Excellent Talents in University, China (No. 141066), and the Fok Ying-Tong Education Foundation.
Funding: Project supported by the National Natural Science Foundation of China (No. 61303264).
Funding: Supported by the National Natural Science Foundation of China (Nos. 61379055 and 61379053).
Funding: Project supported by the National Key R&D Program of China (No. 2018YFB0204300) and the National Natural Science Foundation of China (Nos. 61872376 and 61806216).