Funding: Supported by the Youth Foundation of the Education Bureau, Sichuan Province (09ZB036) and the Technology Bureau, Sichuan Province (2006j13-141).
Abstract: The atoms in most organic molecules are carbon, oxygen, nitrogen, sulfur, halogens, etc. Based on the three-dimensional structure of a molecule, a molecular structural characterization (MSC) method called the improved molecular electronegativity-distance vector (I-MEDV) was developed. It was used to describe the structures of 37 compounds from Styrax japonicus Sieb. flowers. A QSRR model was built by multiple linear regression (MLR); its correlation coefficient (R1) was 0.980. Four vectors were then selected by stepwise multiple regression (SMR) to build a second model, whose correlation coefficient (R2) was 0.975. Both models were evaluated by leave-one-out (LOO) cross-validation, giving correlation coefficients (Rcv) of 0.948 and 0.968, respectively. The results show that I-MEDV can successfully describe the structures of organic compounds, and that the models have good stability and predictive ability.
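The leave-one-out procedure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a single descriptor and an ordinary least-squares line instead of the multi-descriptor MLR model, and the function names `fit_line`, `loo_predictions`, and `pearson_r` are hypothetical.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

def pearson_r(a, b):
    """Correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)
```

Under these assumptions, a cross-validated coefficient in the spirit of Rcv would be `pearson_r(loo_predictions(xs, ys), ys)`, where `xs` holds descriptor values and `ys` the observed retention values.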
Funding: Supported by the Foundation of Returned Scholars (Main Program) of Shanxi Province (200902).
Abstract: Polychlorinated dibenzothiophenes (PCDTs) are classified as persistent organic pollutants in the environment, so the analysis of PCDTs by their gas chromatographic behavior is of great significance. Quantitative structure-retention relationship (QSRR) analysis is a useful technique for relating chromatographic retention time to molecular structure. In this paper, a QSRR study of 37 PCDTs was carried out using molecular electronegativity distance vector (MEDV) descriptors with multiple linear regression (MLR) and partial least-squares regression (PLS). The fitted correlation coefficient R, the leave-one-out (LOO) cross-validation (CV) coefficient, and Q2ext were 0.9951, 0.9942, and 0.9839 for the MLR model and 0.9925, 0.9915, and 0.9833 for the PLS model, respectively. The results show that the models estimate the internal sample set very well and predict the external sample set well. With MEDV descriptors, the QSRR model provides a simple and rapid way to predict the gas-chromatographic retention indices of polychlorinated dibenzothiophenes when standard samples are lacking or experimental conditions are poor.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61201310), the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201160), and the China Postdoctoral Science Foundation (Grant No. 20110491067).
Abstract: Based on the framework of support vector machines (SVM) using the one-against-one (OAO) strategy, a new multi-class kernel method based on a directed acyclic graph (DAG) and probabilistic distance is proposed to raise multi-class classification accuracy. The topology of the DAG is constructed by rearranging the sequence of nodes in the graph. DAG is equivalent to guided operation of SVM on a list, and the classification performance depends on the sequence of nodes in the graph. The Jeffries-Matusita distance (JMD) is introduced to estimate the separability of each class, and the implementation list is initialized with all classes arranged in a certain sequence. To verify the effectiveness of the proposed method, numerical analysis is conducted on UCI data and hyperspectral data. Comparative studies using the standard OAO and DAG classification methods are also conducted, and the results show better performance and higher accuracy for the proposed JMD-DAG method.
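As a rough illustration of the separability measure named in this abstract, the sketch below computes the Jeffries-Matusita distance between two classes. It assumes each class is a univariate Gaussian and uses the common convention JM = 2(1 - exp(-B)), where B is the Bhattacharyya distance. The ordering helper `rank_class_pairs` is hypothetical; the abstract does not say how the paper turns JMD values into a node sequence.

```python
from itertools import combinations
from math import exp, log, sqrt

def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussian classes."""
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    var_term = 0.5 * log((var1 + var2) / (2.0 * sqrt(var1 * var2)))
    return mean_term + var_term

def jm_distance(mean1, var1, mean2, var2):
    """Jeffries-Matusita distance in [0, 2]; larger means more separable."""
    return 2.0 * (1.0 - exp(-bhattacharyya(mean1, var1, mean2, var2)))

def rank_class_pairs(stats):
    """Order class pairs from most to least separable.

    stats maps a class label to a (mean, variance) tuple. Using this order
    to arrange DAG nodes is an assumption made for illustration only.
    """
    pairs = combinations(sorted(stats), 2)
    return sorted(
        pairs,
        key=lambda pair: jm_distance(*stats[pair[0]], *stats[pair[1]]),
        reverse=True,
    )
```

For real hyperspectral data the classes are multivariate, so the Bhattacharyya term would use mean vectors and covariance matrices instead of scalars.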
Abstract: Support vector machines (SVMs) are less favored for large-scale data mining than for pattern recognition and machine learning because their training complexity depends heavily on the size of the data set. This paper presents a geometric distance-based SVM (GDB-SVM). It takes the distance between a point and the classification hyperplane as the classification rule, and is designed on the basis of theoretical analysis and geometric intuition. The experimental code is derived from LibSVM, with Microsoft Visual C++ 6.0 as the compiling and editing environment. In four of five predicted results, GDB-SVM outperforms the one-against-all (OAA) method; in three of five, it outperforms the one-against-one (OAO) method. Experiments on real data sets show that GDB-SVM is not only superior to OAA and OAO but also highly scalable to large data sets while achieving high classification accuracy.
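The point-to-hyperplane distance that this abstract uses as a classification rule can be sketched as below. The distance formula itself is standard; the decision rule (one hyperplane per class, pick the largest signed distance) is an assumption for illustration, since the abstract does not spell out how GDB-SVM combines the distances. The function names are hypothetical.

```python
from math import sqrt

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify_by_distance(hyperplanes, x):
    """Pick the class whose hyperplane puts x farthest on its positive side.

    hyperplanes maps a class label to a (w, b) pair; this one-hyperplane-
    per-class layout is assumed, not taken from the paper.
    """
    return max(
        hyperplanes,
        key=lambda label: signed_distance(*hyperplanes[label], x),
    )
```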
Abstract: Many animals possess actively movable tactile sensors on their heads to explore the near-range space. During locomotion, an antenna is used for near-range orientation, for example in detecting, localizing, probing, and negotiating obstacles. The bionic tactile sensor used in the present work was inspired by the antennae of stick insects. The sensor can detect an obstacle and its location in three-dimensional (3D) space. The vibration signals are analyzed in the frequency domain using the Fast Fourier Transform (FFT) to estimate distances. Signal processing algorithms, an Artificial Neural Network (ANN), and a Support Vector Machine (SVM) are used for analysis and prediction. These three prediction techniques are compared for both distance estimation and material classification. The accuracy of distance estimation deteriorates towards the tip of the probe because of the change in vibration modes. Since the vibration data within that region have a high variance, the accuracy of both distance estimation and material classification is lower towards the tip. The change in vibration mode is analyzed mathematically, and a solution is proposed to estimate the distance along the full range of the probe.
Funding: Key Program of the Natural Science Foundation of Shandong Province (No. ZR2013FZ002); Program of Science and Technology of Suzhou (No. ZXY2013030); Independent Innovation Foundation of Shandong University (No. 2012DX008).
Abstract: Seizure detection is essential for long-term monitoring of epileptic patients. This paper investigates the detection of epileptic seizures in multi-channel long-term intracranial electroencephalogram (iEEG) recordings. The algorithm performs a five-scale wavelet decomposition of the iEEG and transforms the sum of three frequency bands into a histogram for computing the distance. The proposed method combines a novel feature called EMD-L1, an efficient algorithm for the earth mover's distance (EMD), with a support vector machine (SVM) for binary classification between seizures and non-seizures. The EMD-L1 used in this method has low time complexity and high processing speed because it exploits the L1 metric structure. Smoothing and collar techniques are applied to the raw outputs of the SVM classifier to obtain more accurate results. Several evaluation criteria are used to compare the algorithm with other conventional methods on the same dataset from the Freiburg EEG database. Experimental results show that the proposed method achieves high sensitivity and specificity and a low false detection rate, which are 95.73%, 98.45%, and 0.33/h, respectively. The algorithm is robust and accurate, can potentially perform real-time analysis of EEG data, and may serve as a seizure detection tool for long-term EEG monitoring.
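To make the histogram distance in this abstract concrete, here is a minimal sketch of the earth mover's distance for the simplest case: two one-dimensional histograms with equal total mass and unit spacing between bins, where the EMD reduces to the sum of absolute differences of the cumulative sums. This is not the EMD-L1 algorithm itself, which targets more general histograms; the function name `emd_1d` is hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms of equal mass.

    With ground distance |i - j| between bins i and j, the EMD equals the
    L1 distance between the two cumulative histograms.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    carried = 0.0  # mass that still has to move past the current bin
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        carried += a - b
        total += abs(carried)
    return total
```

A feature vector for the classifier could then hold such distances between a segment's histogram and reference histograms, although how the paper builds its features is not stated in the abstract.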
Abstract: In the present paper we investigate the relationships among the Wiener number W, hyper-Wiener number R, Wiener vector WV, hyper-Wiener vector HWV, Wiener polynomial H, hyper-Wiener polynomial HH, and distance distribution DD of a (molecular) graph. It is shown that for connected graphs G and G*, the following five statements are equivalent: ?; and that if G and G* have the same distance distribution DD then they have the same W and R, but the converse is not true. Therefore, we further investigate graphs with the same distance distribution. Some construction methods for finding graphs with the same distance distribution are given.
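The claim that equal distance distributions imply equal W and R follows from the standard definitions, in which both numbers are sums over pairwise distances. The sketch below computes the distance distribution of a connected, unweighted graph by breadth-first search and derives W and R from it. It assumes the usual definitions W = sum of d(u, v) and R = (1/2) * sum of (d(u, v) + d(u, v)^2) over unordered vertex pairs; the function names are hypothetical.

```python
from collections import Counter, deque

def bfs_distances(adj, start):
    """Shortest-path lengths from start in an unweighted graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_distribution(adj):
    """Count unordered vertex pairs at each distance (graph must be connected)."""
    counts = Counter()
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        for v in nodes[i + 1:]:
            counts[dist[v]] += 1
    return counts

def wiener(dd):
    """Wiener number: sum of distances over all unordered pairs."""
    return sum(d * count for d, count in dd.items())

def hyper_wiener(dd):
    """Hyper-Wiener number: half the sum of (d + d^2) over all pairs."""
    return sum((d + d * d) * count for d, count in dd.items()) / 2
```

Because `wiener` and `hyper_wiener` take only the distribution as input, two graphs with the same distribution necessarily get the same values.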
Funding: Supported by the National 863 High Technology Research and Development Program Foundation of China (No. 2006AA-01Z208); Six Talented Eminence Foundation of Jiangsu Province (06-E-043), China; Natural Science Foundation of Jiangsu Province, China (No. BK2007236); Scientific Innovation Project for Postgraduates of Universities in Jiangsu Province (CX08B-082Z).
Abstract: In location-aided routing for Mobile Ad hoc NETworks (MANET), node mobility and inaccurate location information may result in constant flooding, which reduces network performance. In this paper, Distance-Based Location-Aided Routing (DBLAR) for MANET is proposed. By tracing the location information of destination nodes and using the change in distance between nodes to adjust route discovery dynamically, the proposed routing algorithm avoids flooding the whole network. In addition, a Distance Update Threshold (DUT) is set to balance the timeliness of node location information against its update overhead, while detection of the relative distance vector is used to adjust the forwarding condition. Simulation results show that DBLAR performs better than LAR1 in terms of packet delivery ratio, average end-to-end delay, and routing load, and that the settings of DUT and the relative distance vector have a significant impact on the algorithm.
Funding: Supported by the National Science and Technology Support Project of the 12th Five-Year Plan of China (Grant No. 2014BAL01B04) and the Sichuan Provincial Department of Land and Resources Research Project (Grant No. KJ-2018-13).
Abstract: Aiming at the rapid identification of rural buildings in complex environments from high-spatial-resolution images, an improved Mahalanobis distance colour segmentation method (IMDCSM) is proposed and realised in Red, Green and Blue (RGB) space. Vector sets with a lower degree of dispersion are obtained by filtering the colour vector sets of the building samples, and a standard ellipsoid equation can be constructed from these vector sets. The threshold for the colour range of interest can be selected flexibly and intuitively by changing the shape and size of this ellipsoid. Then, according to the position of each image pixel's colour vector relative to the ellipsoid, all building information can be extracted quickly. To verify the effectiveness of the proposed method, unmanned aerial vehicle (UAV) images of two areas in the suburbs of Chengdu city and Deyang city were used as experimental data for image segmentation, and the existing colour segmentation method based on the Mahalanobis distance was selected as the benchmark. The experimental results demonstrate that the completeness and correctness of this method reached 95% and 83.0%, respectively, values that are higher than those of the Mahalanobis distance colour segmentation method (MDCSM). In general, this method is suitable for the rapid extraction of rural building information and provides a new threshold selection method for classification.
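The ellipsoid test described in this abstract can be illustrated with the plain Mahalanobis distance: a pixel is accepted when its colour vector lies inside the ellipsoid defined by the sample mean, the sample covariance, and a threshold. The sketch below shows only that baseline idea, not the improved method (the sample filtering step is omitted), and all function names and the threshold parameter are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of colour vectors."""
    n = len(samples)
    return [sum(s[k] for s in samples) / n for k in range(len(samples[0]))]

def covariance(samples, mean):
    """Sample covariance matrix (divides by n - 1)."""
    n, dims = len(samples), len(mean)
    return [
        [sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
         for j in range(dims)]
        for i in range(dims)
    ]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def mahalanobis_sq(pixel, mean, cov):
    """Squared Mahalanobis distance (pixel - mean)^T cov^-1 (pixel - mean)."""
    diff = [p - m for p, m in zip(pixel, mean)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))

def is_building(pixel, mean, cov, threshold):
    """True when the pixel colour lies inside the threshold ellipsoid."""
    return mahalanobis_sq(pixel, mean, cov) <= threshold ** 2
```

The covariance matrix must be non-singular, which requires building samples whose colours vary in all three channels. Changing `threshold` grows or shrinks the ellipsoid, which corresponds to the threshold selection the abstract mentions.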
Funding: Supported by the Tsinghua University Spring Breeze Fund (2020Z99CFY044), the Tsinghua University start-up fund, and the Tsinghua University Education Foundation fund (042202008).
Abstract: The severe acute respiratory syndrome COVID-19 was discovered on December 31, 2019 in China. Subsequently, many COVID-19 cases were reported in many other countries. However, some positive COVID-19 samples had been reported earlier than those officially accepted by health authorities in other countries, such as France and Italy. Thus, it is of great importance to determine the place where SARS-CoV-2 was first transmitted to humans. To this end, we analyze genomes of SARS-CoV-2 using the k-mer natural vector method and compare the similarities of global SARS-CoV-2 genomes under a new natural metric. Because it is commonly accepted that SARS-CoV-2 originated from bat coronavirus RaTG13, we only need to determine which SARS-CoV-2 genome sequence is closest to bat coronavirus RaTG13 under our natural metric. According to our analysis, SARS-CoV-2 most likely already existed in other countries such as France, India, the Netherlands, England, and the United States before the outbreak in Wuhan, China.
Abstract: A novel permutation-dependent Baire distance is introduced for multi-channel data. The optimal permutation is given by minimizing the sum of these pairwise distances. It is shown that for most practical cases the minimum is attained by a new gradient descent algorithm introduced in this article. The algorithm has biquadratic time complexity: quadratic both in the number of channels and in the size of the data. The optimal permutation allows us to introduce a novel Baire-distance kernel Support Vector Machine (SVM). Applied to benchmark hyperspectral remote sensing data, this new SVM produces results that are comparable with the classical linear SVM, but with higher kernel target alignment.
Funding: Project (No. [2005]555) supported by the Hi-Tech Research and Development Program (863) of China.
Abstract: Various index structures have recently been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector representation and one-dimensional (1D) transformation can break the curse of dimensionality. Based on these two techniques, a novel high-dimensional index is proposed, called the Bit-code and Distance based index (BD). BD is based on a special partitioning strategy that is optimized for high-dimensional data. Using the definitions of the bit code and the transformation function, a high-dimensional vector is first approximately represented and then transformed into a 1D value, the key managed by a B+-tree. A new KNN search algorithm is also proposed that exploits the bit code and distance to prune the search space more effectively. Extensive experiments on both synthetic and real data demonstrate that BD outperforms existing index structures for KNN search in high-dimensional spaces.
Abstract: In the present paper, the problem of handwritten character recognition is tackled with a multiresolution technique using the discrete wavelet transform (DWT) and the Euclidean distance metric (EDM). The technique has been tested and found to be more accurate and faster. Characters are classified into 26 pattern classes based on appropriate properties. Features of the handwritten character images are extracted by the DWT at an appropriate level of multiresolution, and each pattern class is then characterized by a mean vector. Distances from the input pattern vector to all the mean vectors are computed by EDM, and the minimum distance determines the class membership of the input pattern vector. The proposed method achieves a recognition accuracy of 90% for handwritten characters even with fewer samples.
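The classification step in this abstract is a minimum-distance (nearest-mean) classifier, which can be sketched as follows. The sketch assumes the DWT feature vectors have already been extracted and all have the same length; the function names `class_means` and `nearest_mean_class` are hypothetical.

```python
from math import dist  # Euclidean distance, available from Python 3.8

def class_means(training):
    """Compute one mean feature vector per pattern class.

    training maps a class label to a list of equally long feature vectors.
    """
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in training.items()
    }

def nearest_mean_class(means, features):
    """Assign the class whose mean vector is closest in Euclidean distance."""
    return min(means, key=lambda label: dist(means[label], features))
```

With 26 character classes, `means` would hold 26 entries, and classifying an input costs 26 distance computations regardless of how many training samples were used.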
Funding: This work was supported in part by the National Natural Science Foundation of China under Grant 61271093 and Grant 61471146, and by the Program of the Ministry of Education for New Century Excellent Talents under Grant NCET-12-0150.
Abstract: Distance metric learning plays an important role in many machine learning tasks. In this paper, we propose a method for learning a Mahalanobis distance metric. By formulating the metric learning problem with relative distance constraints, we suggest a Relative Distance Constrained Metric Learning (RDCML) model that can be easily implemented and effectively solved by a modified support vector machine (SVM) approach. Experimental results on UCI datasets and handwritten digit datasets show that RDCML achieves better or comparable classification accuracy compared with state-of-the-art metric learning methods.
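Abstract: The emotional tendency of the stock market is the main reason for its high uncertainty. Stock trend prediction methods that rely directly on historical data adapt poorly to the volatility of market sentiment and perform unsatisfactorily in practice. To address the difficulty of predicting stock market turning points caused by unstable market sentiment, this paper proposes a hidden semi-Markov model stock turning point prediction method based on sentiment vector (SV-HSMM). Because market sentiment cannot be observed directly, the main features related to market sentiment are selected and fused into market sentiment using a Markov blanket. A hidden semi-Markov model is used to model the market environment and to construct the structural relationships among market sentiment, market state, and state duration. A sentiment vector is introduced to smooth the volatility of sentiment, and the Kullback-Leibler (KL) distance is used to quantify sentiment intensity. Turning points are then predicted through dynamic inference on the hidden semi-Markov model. The results show that the sentiment vector method achieves better prediction performance.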
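The Kullback-Leibler distance named in this abstract has a standard definition for discrete distributions, sketched below. How the paper builds the two distributions from the sentiment vector is not stated in the abstract, so the inputs here are simply two probability lists; the function name `kl_divergence` is hypothetical.

```python
from math import log

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    p and q are discrete distributions over the same outcomes. Terms with
    p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The measure is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; it is zero only when the two distributions coincide.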
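Abstract: To address the frequent movement and limited energy of swarm robot systems in emergency scenarios, a Clustered Ad hoc On-Demand Distance Vector Protocol Based on Energy and Speed (ESC-AODV) is proposed to extend the operating time of the swarm robot network and improve communication reliability. Routing performance replaces hop count as the routing criterion: when the destination node receives a duplicate Route Request (RREQ) packet whose routing performance value is smaller, it replies with a Routing Reply (RREP) packet, thereby selecting a better route. A clustering structure is introduced, and the backbone network formed by cluster heads and gateways reduces the number of broadcast floods. Experimental results show that, with a large number of nodes, the improved ESC-AODV protocol extends network lifetime while outperforming AODV and the AODV Routing Protocol Based on Energy, Load and Speed (ELS-AODV) in average end-to-end delay, packet delivery ratio, throughput, and routing overhead. ESC-AODV saves network energy, improves reliability, and achieves better network performance.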
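The duplicate-RREQ rule in this abstract can be sketched as a small piece of destination-side state. The sketch treats the routing performance value as an opaque number where smaller is better, because the abstract does not give its formula. It also assumes the destination replies to the first RREQ of each discovery, as in standard AODV. The class name `DestinationNode` and its fields are hypothetical.

```python
class DestinationNode:
    """Decides whether an incoming RREQ deserves an RREP reply."""

    def __init__(self):
        # (source, request_id) -> smallest routing performance value seen
        self.best_metric = {}

    def on_rreq(self, source, request_id, route_metric):
        """Return True when an RREP should be sent for this RREQ copy."""
        key = (source, request_id)
        previous = self.best_metric.get(key)
        if previous is None or route_metric < previous:
            # First copy, or a duplicate that arrived over a better route.
            self.best_metric[key] = route_metric
            return True
        # Duplicate over an equal or worse route: stay silent.
        return False
```

Compared with replying only to the first RREQ, this lets a later copy that travelled a better route override the earlier choice, at the cost of extra RREP traffic.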