Zero-shot learning enables the recognition of samples from new classes by transferring models learned from semantic features and existing sample features to classes that have never been seen before. Feature consistency across different feature types and domain shift are two critical issues in zero-shot learning. To address both issues, this paper proposes a new model structure. The traditional approach maps semantic features and visual features into the same feature space; building on this, the proposed model uses a dual discriminator approach. This dual discriminator approach further enhances the consistency between semantic and visual features. It also aligns unseen-class semantic features with training-set samples, providing some information about the unseen classes. In addition, a new feature fusion method is proposed. This method is equivalent to adding a perturbation to the seen-class features, which reduces the degree to which the model's classification results are biased towards the seen classes. The feature fusion method also provides some information about the unseen classes, improving classification accuracy in generalized zero-shot learning and reducing domain bias. The proposed method is validated and compared with other methods on four datasets, and the experimental results show that it achieves promising results.
Scarring is one of the biggest areas of unmet need in the long-term success of glaucoma filtration surgery. Quantitative evaluation of the scar tissue and the post-operative structure with micron-scale resolution facilitates the development of anti-fibrosis techniques. However, distinguishing conjunctiva, sclera, and scar tissue in the surgical area still relies on pathologists' experience. Since polarized light imaging is sensitive to the anisotropic properties of media, it is well suited to discriminating scar tissue in the subconjunctival and episcleral area by characterizing small differences in the proportion, organization, and orientation of the fibers. In this paper, we defined the conjunctiva, sclera, and scar tissue as three target tissues after glaucoma filtration surgery and obtained their polarization characteristics from tissue sections using a Mueller matrix microscope. A discrimination score based on parameters derived from the Mueller matrix and machine learning was calculated and tested as a diagnostic index. The discrimination scores of the three target tissues differed significantly from each other (p<0.001). Visualization of the discrimination results showed significant contrast between target tissues. This study demonstrated that Mueller matrix imaging is effective in ocular scar discrimination and paves the way for its application to other forms of ocular fibrosis as a substitute for or supplement to clinical practice.
Recently, sparse representation classification (SRC) and Fisher discrimination dictionary learning (FDDL) have emerged as important methods for vehicle classification. In this paper, inspired by recent breakthroughs in discrimination dictionary learning and multi-task joint covariate selection, we address vehicle classification in real-world applications by formulating it as a multi-task joint sparse representation model based on Fisher discrimination dictionary learning, which merges the strengths of multiple features from multiple sensors. To improve classification accuracy in complex scenes, we develop a new method for vehicle classification, called multi-task joint sparse representation classification based on Fisher discrimination dictionary learning. In the proposed method, acoustic and seismic data are captured by multiple heterogeneous sensors measuring the same physical event simultaneously, and multi-dimensional frequency spectrum features of the sensor data are extracted using Mel frequency cepstral coefficients (MFCC). Moreover, we extend our model to handle sparse environmental noise. We experimentally demonstrate the benefits of joint information fusion from different sensors, based on Fisher discrimination dictionary learning, in vehicle classification tasks.
Periodontitis is closely related to many systemic diseases, linked by different periodontal pathogens. To unravel the relationship between periodontitis and systemic diseases, it is very important to correctly discriminate the major periodontal pathogens. To achieve convenient, efficient, and high-accuracy bacterial species classification, the authors use Raman spectroscopy combined with machine learning algorithms to distinguish three major periodontal pathogens: Porphyromonas gingivalis (Pg), Fusobacterium nucleatum (Fn), and Aggregatibacter actinomycetemcomitans (Aa). The results show that this novel method can successfully discriminate the three pathogens. With the extra trees machine learning algorithm, the classification accuracies for the three categories on the original data were 94.7% at the sample level and 93.9% at the spectrum level. This study provides a fast, simple, and accurate method for differentiating periodontal pathogens.
Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and lack specificity. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking western Henan as an example, the study selected 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, and the 11 factors with the most significant influence on landslides were selected by the recursive feature elimination (RFE) method. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct spatial distribution models of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model was highly adaptable to landslide data. The landslide susceptibility maps of the five models show the overall distribution. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and high susceptibility areas were 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province and provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
Sparse-representation-based single-channel source separation, which aims to recover each source's signal using its corresponding sub-dictionary, has attracted considerable attention. The basic premise of this model is that each sub-dictionary possesses discriminative information about its corresponding source, and this information can be used to recover almost every sample from that source. However, more generally, the samples from a source are composed not only of discriminative information but also of common information shared with other sources. This paper proposes learning a discriminative high-fidelity dictionary to improve separation performance. The innovations are threefold. First, an extra sub-dictionary is added to a conventional union dictionary so that the source-specific sub-dictionaries capture only the purely discriminative information for their corresponding sources, because the common information is collected in the additional sub-dictionary. Second, a task-driven learning algorithm is designed to optimize the new union dictionary and a set of weights that indicate how much of the common information should be allocated to each source. Third, a source separation scheme based on the learned dictionary is presented. Experimental results on a human speech dataset provide evidence that our algorithm achieves better separation performance than both state-of-the-art and traditional algorithms.
Cross-project defect prediction (CPDP) aims to predict defects in a target project by using a prediction model built on source projects. The main problem in CPDP is the large distribution gap between the source project and the target project, which prevents the prediction model from performing well. Most existing methods overlook the class discrimination of the learned features. Finding an effective model that transfers from the source project to the target project is challenging. In this paper, we propose an unsupervised domain adaptation approach for CPDP based on discriminative subspace learning (DSL). DSL treats the data from the two projects as coming from two domains and maps the data into a common feature space. It employs cross-domain alignment with discriminative information from different projects to reduce the distribution difference between the projects' data and incorporates class-discriminative information. Specifically, DSL first uses subspace-learning-based domain adaptation to reduce the distribution gap between the data of different projects. Then, it makes full use of the class label information of the source project and transfers the discrimination ability of the source project to the target project in the common space. Comprehensive experiments on five projects verify that DSL can build an effective prediction model and improves on the related competing methods by at least 7.10% and 11.08% in terms of G-measure and AUC, respectively.
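The abstract reports gains in G-measure without defining it. As a rough illustration only, the sketch below assumes the definition commonly used in defect-prediction studies: the harmonic mean of recall (probability of detection) and specificity (one minus the false-alarm rate). The function name and the counts are hypothetical and are not taken from the paper.

```python
def g_measure(tp, fp, tn, fn):
    """Harmonic mean of recall and specificity (assumed definition)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if recall + specificity == 0:
        return 0.0
    return 2 * recall * specificity / (recall + specificity)


# Hypothetical confusion counts for a defect predictor on a target project.
score = g_measure(tp=30, fp=20, tn=80, fn=10)
print(round(score, 4))
```

Because it is a harmonic mean, the score drops sharply when either recall or specificity is low, which is why it is often preferred over plain accuracy on imbalanced defect data.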
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we compared the models, which yielded prediction error rates of 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
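The comparison above rests on error percentages read from a confusion matrix. A minimal sketch of that calculation follows, assuming rows are true classes and columns are predicted classes; the matrix values are invented for illustration and are not the study's data.

```python
def error_rate(confusion):
    """Fraction of samples off the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical 3-class AQI confusion matrix (rows: true, columns: predicted).
example = [
    [40, 5, 5],
    [6, 30, 4],
    [2, 3, 5],
]
print(f"{error_rate(example):.0%}")
```

The same function can be applied to each model's confusion matrix, and the model with the lowest value is the one with the fewest misclassified samples.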
Intelligent diagnosis of mechanical faults driven by big data is an important means of ensuring the safe operation of equipment. Among these methods, deep learning-based machinery fault diagnosis approaches have received increasing attention and achieved some results. However, using transfer learning alone may yield insufficient performance, and domain bias may cause target samples to be misclassified when deep models are built to learn domain-invariant features. To address these problems, a deep discriminative adversarial domain adaptation neural network (DDADAN) is proposed for bearing fault diagnosis. In this method, the raw vibration data are first converted into frequency-domain data by the Fast Fourier Transform, and an improved deep convolutional neural network with wide first-layer kernels is used as a feature extractor to extract deep fault features. Then, domain-invariant features are learned from the fault data with correlation alignment-based domain adversarial training. Furthermore, to enhance the discriminative property of the features, discriminative feature learning is embedded into the network to make the features compact within each class and separable between classes. Finally, the performance and anti-noise capability of the proposed method are evaluated using two bearing fault datasets. The results demonstrate that the proposed method can handle the domain shift caused by different working conditions and maintains more than 97.53% accuracy on various transfer tasks. Furthermore, the proposed method achieves high diagnostic accuracy under varying noise levels.
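The first step described above, converting raw vibration data to the frequency domain, can be sketched as follows. This is an illustration only: it uses a direct discrete Fourier transform from the standard library, which gives the same result as the Fast Fourier Transform but more slowly, and the test signal, its length, and the function name are assumptions rather than details from the paper.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided magnitude spectrum via a direct DFT (O(n^2), not an FFT)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(acc) / n)
    return spectrum


# Hypothetical "vibration" signal: a pure tone with 5 cycles over 64 samples.
n = 64
signal = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
spec = magnitude_spectrum(signal)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
print(peak_bin)
```

In a pipeline like the one described, a spectrum of this kind, rather than the raw waveform, would be the input to the convolutional feature extractor.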
Transfer learning aims to transfer source models to a target domain. Feature matching can alleviate domain shift effectively, but this process ignores the relationship between marginal distribution matching and conditional distribution matching. The discriminative information of both domains, which is important for improving performance on the target domain, is also neglected. In this paper, we propose a novel method called Balanced Discriminative Transfer Feature Learning for Visual Domain Adaptation (BDTFL). The proposed method adaptively balances the two kinds of distribution matching and captures the category-discriminative information of both domains. Balanced feature matching therefore achieves more accurate feature matching and adapts itself to different scenes. At the same time, discriminative information is exploited to alleviate category confusion during feature matching. With the assistance of the category-discriminative information captured from both domains, the source classifier can be transferred to the target domain more accurately, boosting the performance of target classification. Extensive experiments show the superiority of BDTFL on popular visual cross-domain benchmarks.
Lithofacies identification is a crucial task in reservoir characterization and modeling. Information about the vast inter-well area can be supplemented by facies identification from seismic data. However, the relationship between lithofacies and seismic information is affected by many factors and is complicated. Machine learning has received extensive attention in recent years, and the support vector machine (SVM) is a promising method for lithofacies classification. Lithofacies classification involves identifying various types of lithofacies and is generally a nonlinear problem, which needs to be solved by means of a kernel function. Multi-kernel learning SVM is one of the main tools for solving nonlinear multi-class classification problems. However, it is very difficult to determine the kernel function and its parameters, a choice that is subject to human factors. In addition, its computational efficiency is low. A lithofacies classification method based on a local deep multi-kernel learning support vector machine (LDMKL-SVM), which can consider low-dimensional global features and high-dimensional local features, is developed. The method automatically learns the parameters of the kernel function and the SVM to build a relationship between lithofacies and seismic elastic information. The calculation is accelerated at no cost in discriminant accuracy for multi-class lithofacies identification. Both the model data test results and the field data application results confirm the advantages of the method. This contribution offers an effective method for lithofacies recognition and reservoir prediction using SVM.
This study proposes a machine learning approach to forecasting currency exchange rates by applying sentiment analysis to messages on Twitter (called tweets). A dataset of the exchange rates between the United States Dollar (USD) and the Pakistani Rupee (PKR) was formed by collecting information from a forex website, together with a collection of tweets from the business community in Pakistan containing finance-related words. The dataset was collected in raw form and was preprocessed using natural language processing. Response variable labeling was then applied to the standardized dataset, with the response variable divided into two classes: “1” indicated an increase in the exchange rate and “−1” indicated a decrease. To better represent the dataset, we used linear discriminant analysis and principal component analysis to visualize the data in three-dimensional vector space. Clusters obtained using a sampling approach were then used for data optimization. Five machine learning classifiers (the simple logistic classifier, random forest, bagging, naïve Bayes, and the support vector machine) were applied to the optimized dataset. The results show that the simple logistic classifier yielded the highest accuracy, 82.14%, for forecasting the USD/PKR exchange rate.
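The response-variable labeling described above can be sketched as below. The abstract does not state the comparison interval or how unchanged rates are treated, so this sketch assumes consecutive observations are compared and leaves ties unlabeled; the rate values and the function name are hypothetical.

```python
def label_rate_changes(rates):
    """Label each step 1 (rate rose) or -1 (rate fell) versus the previous value."""
    labels = []
    for prev, curr in zip(rates, rates[1:]):
        if curr > prev:
            labels.append(1)
        elif curr < prev:
            labels.append(-1)
        else:
            # Unchanged rate: the abstract does not say how ties are handled.
            labels.append(None)
    return labels


# Hypothetical USD/PKR closing rates on consecutive days.
rates = [100.0, 101.5, 101.0, 101.0]
print(label_rate_changes(rates))
```

Each label would then be paired with the features extracted from the tweets of the corresponding period before a classifier is trained.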
Interior Alaska has a short growing season of 110 d. Knowledge of the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timing of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed models from the other algorithms, with the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among 32 models fit to forecast maturity time, two models from LDA outperformed the models from the other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that machine learning models are a promising technique for forecasting the timing of flowering and maturity of barley.
Owing to the combined influences of factors such as ore-forming temperature, fluid, and metal sources, sphalerite tends to incorporate diverse contents of trace elements during the formation of different types of lead-zinc (Pb-Zn) deposits. Therefore, trace elements in sphalerite have long been used to distinguish Pb-Zn deposit types. However, previous discriminant diagrams usually contain two or three dimensions, which limits their ability to reveal the complicated interrelations between sphalerite trace elements and Pb-Zn deposit types. In this study, we aim to show that sphalerite trace elements can be used to classify Pb-Zn deposit types and to extract, using machine learning algorithms, the key factors among sphalerite trace elements that discriminate Pb-Zn deposit types. A dataset of nearly 3600 sphalerite spot analyses from 95 Pb-Zn deposits worldwide, determined by LA-ICP-MS, was compiled from peer-reviewed publications. It contains 12 elements (Mn, Fe, Co, Cu, Ga, Ge, Ag, Cd, In, Sn, Sb, and Pb) from 5 deposit types: Sedimentary Exhalative (SEDEX), Mississippi Valley Type (MVT), Volcanic Massive Sulfide (VMS), skarn, and epithermal deposits. Random Forests (RF) was applied to the data, and the results show that sphalerite trace elements can successfully discriminate the different types of Pb-Zn deposits except for VMS deposits, most of which are misclassified as skarn and epithermal types. To further discriminate VMS deposits, future studies could focus on increasing the representation of VMS deposits in datasets and on applying other geological factors along with sphalerite trace elements when constructing the classification model. RF's feature importance and permutation feature importance were adopted to evaluate the significance of each element for classification. In addition, a visualization tool, t-distributed stochastic neighbor embedding (t-SNE), was used to verify the results of both classification and evaluation. The results presented here show that Mn, Co, and Ge have significant impacts on the classification of Pb-Zn deposits, and that In, Ga, Sn, Cd, and Fe also have relatively important effects compared with the remaining elements, confirming that the discrimination of Pb-Zn deposits is mainly controlled by multiple elements in sphalerite. Our study hence shows that machine learning algorithms can provide new insights into conventional geochemical analyses, inspiring future research on constructing classification models of mineral deposits using mineral geochemistry data.
BACKGROUND: Despite the frequent progression from Parkinson's disease (PD) to Parkinson's disease dementia (PDD), the basis for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM: To explore the prediction accuracy of sociodemographic factors, Parkinson's motor symptoms, Parkinson's non-motor symptoms, and rapid eye movement sleep disorder for diagnosing EOPD using PD multicenter registry data. METHODS: This study analyzed 342 Parkinson patients (66 EOPD patients and 276 PD patients with normal cognition) younger than 65 years. An EOPD prediction model was developed using a random forest algorithm, and its accuracy was compared with that of the naive Bayesian model and discriminant analysis. RESULTS: The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and that of the naive Bayesian model (85.8%). In the random forest model, the Korean Mini Mental State Examination (K-MMSE) score, the Korean Montreal Cognitive Assessment (K-MoCA) score, the sum of boxes in the Clinical Dementia Rating (CDR), the global score of the CDR, the motor score of the Unified Parkinson's Disease Rating Scale (UPDRS), and the Korean Instrumental Activities of Daily Living (K-IADL) score were confirmed as the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION: Parkinson-related motor symptoms (e.g., motor score of UPDRS) and instrumental daily performance (e.g., K-IADL score), in addition to cognitive screening indicators (e.g., K-MMSE score and K-MoCA score), were predictors with high accuracy in EOPD prediction.
A multi-layer dictionary learning algorithm that jointly uses global constraints and Fisher discrimination (JGCFD-MDL) is proposed for image classification tasks. The algorithm reveals the manifold structure of the data by learning a global constraint dictionary and introduces a Fisher discriminative constraint dictionary to minimize the intra-class dispersion of samples and increase the inter-class dispersion. To further quantify the abstract features that characterize the data, a multi-layer dictionary learning framework is constructed to obtain high-level complex semantic structures and improve image classification performance. Finally, the algorithm is verified on a multi-label dataset of court costumes from the Ming and Qing Dynasties, where it obtains better performance. Experiments show that, compared with the local similarity algorithm, the average precision is improved by 3.34%. Compared with the single-layer dictionary learning algorithm, the one-error is improved by 1.00% and the average precision is improved by 0.54%. Experiments also show that it performs better on general datasets.
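The Fisher criterion mentioned above trades off intra-class dispersion against inter-class dispersion. The sketch below computes the traces of the within-class and between-class scatter for plain feature vectors. It is an illustration under assumptions: in Fisher-constrained dictionary learning the criterion is typically applied to coding coefficients rather than raw samples, and the function name and toy data here are hypothetical.

```python
def scatter_traces(samples, labels):
    """Return (within-class, between-class) scatter traces for labeled vectors."""
    dim = len(samples[0])
    overall = [sum(s[d] for s in samples) / len(samples) for d in range(dim)]
    within = 0.0
    between = 0.0
    for c in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # Spread of each class around its own mean.
        within += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
        # Distance of the class mean from the overall mean, weighted by class size.
        between += len(members) * sum((mean[d] - overall[d]) ** 2 for d in range(dim))
    return within, between


# Two tight, well-separated toy classes.
samples = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
labels = [0, 0, 1, 1]
within, between = scatter_traces(samples, labels)
print(within, between)
```

A discriminative representation in this sense is one where the first value is small relative to the second.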
Agriculture 5.0 is an emerging concept in which sensors, big data, the Internet of Things (IoT), robots, and Artificial Intelligence (AI) are used for agricultural purposes. Unlike Agriculture 4.0, Agriculture 5.0 makes robots and AI the focus of implementation. One application of Agriculture 5.0 is weed management, where robots are used to discriminate weeds from crops or plants so that proper action can be taken to remove the weeds. This paper presents an in-depth review of Machine Learning (ML) techniques used for discriminating weeds from crops or plants. We specifically present a detailed explanation of the five steps required to use ML algorithms to distinguish between weeds and plants.
Funding: Supported by the Natural Science Foundation of Beijing (No. 7194266); Beijing Municipal Administration of Hospitals' Youth Program (No. QML20191206); Fundamental Research Funds for the Central Public Welfare Research Institutes (No. XTCX2021002); Scientific and Technological Innovation Project of China Academy of Chinese Medical Sciences (No. CI2021A00601).
Funding: This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61771299, 61771322, 61375015, and 61301027.
Abstract: Recently, sparse representation classification (SRC) and Fisher discrimination dictionary learning (FDDL) have emerged as important methods for vehicle classification. In this paper, inspired by recent breakthroughs in discrimination dictionary learning and multi-task joint covariate selection, we address vehicle classification in real-world applications by formulating it as a multi-task joint sparse representation model based on Fisher discrimination dictionary learning, which merges the strengths of multiple features from multiple sensors. To improve classification accuracy in complex scenes, we develop a new method, called multi-task joint sparse representation classification based on Fisher discrimination dictionary learning. In the proposed method, acoustic and seismic data sets are captured by multiple heterogeneous sensors measuring the same physical event simultaneously, and multi-dimensional frequency spectrum features of the sensor data are extracted using Mel frequency cepstral coefficients (MFCC). Moreover, we extend our model to handle sparse environmental noise. We experimentally demonstrate the benefits of joint information fusion from different sensors, based on Fisher discrimination dictionary learning, in vehicle classification tasks.
Funding: Funded by the Major Program of Social Science Foundation of Tianjin Municipal Education Commission (2019JWZD53).
Abstract: Periodontitis is closely related to many systemic diseases, linked by different periodontal pathogens. To unravel the relationship between periodontitis and systemic diseases, it is very important to correctly discriminate the major periodontal pathogens. To achieve convenient, efficient, and high-accuracy bacterial species classification, the authors use Raman spectroscopy combined with machine learning algorithms to distinguish three major periodontal pathogens: Porphyromonas gingivalis (Pg), Fusobacterium nucleatum (Fn), and Aggregatibacter actinomycetemcomitans (Aa). The results show that this method can successfully discriminate the three pathogens. With the extra trees algorithm, the classification accuracies for the three categories on the original data were 94.7% at the sample level and 93.9% at the spectrum level. This study provides a fast, simple, and accurate method for differentiating periodontal pathogens.
Funding: This work was financially supported by the National Natural Science Foundation of China (41972262), Hebei Natural Science Foundation for Excellent Young Scholars (D2020504032), Central Plains Science and Technology Innovation Leader Project (214200510030), and Key Research and Development Project of Henan Province (221111321500).
Abstract: Landslides are a serious natural disaster, second only to earthquakes and floods, and pose a great threat to people's lives and property. Traditional landslide research relies on experience-driven or statistical models, and its assessment results are subjective, difficult to quantify, and poorly targeted. As a new research method for landslide susceptibility assessment, machine learning can greatly improve the accuracy of landslide susceptibility models by constructing statistical models. Taking western Henan as an example, the study selected 16 landslide influencing factors covering topography, geological environment, hydrological conditions, and human activities, and used the recursive feature elimination (RFE) method to select the 11 factors with the most significant influence on landslides. Five machine learning methods [Support Vector Machines (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA)] were used to construct the spatial distribution model of landslide susceptibility. The models were evaluated by the receiver operating characteristic curve and statistical indices. After analysis and comparison, the XGBoost model (AUC 0.8759) performed the best and was suitable for dealing with regression problems. The model adapted well to the landslide data. The landslide susceptibility maps of the five models show the overall distribution. The extremely high and high susceptibility areas are distributed in the Funiu Mountain range in the southwest, the Xiaoshan Mountain range in the west, and the Yellow River Basin in the north. These areas have large terrain fluctuations, complicated geological structural environments, and frequent human engineering activities. The extremely high and high susceptibility areas covered 12043.3 km² and 3087.45 km², accounting for 47.61% and 12.20% of the total study area, respectively. Our study reflects the distribution of landslide susceptibility in western Henan Province, which provides a scientific basis for regional disaster warning, prediction, and resource protection. The study has important practical significance for subsequent landslide disaster management.
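The abstract above ranks models by the area under the ROC curve (AUC). As a minimal sketch of how that number can be computed, assuming binary labels (1 = landslide, 0 = no landslide) and one model score per sample, the AUC equals the fraction of positive/negative pairs in which the positive sample receives the higher score. The function name `roc_auc` and the sample data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for six map cells, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples; it is meant to show the definition, and a sort-based implementation would be the usual choice for large maps.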
Funding: This work was supported by the National Natural Science Foundation of China (62001489) and the scientific research planning project of National University of Defense Technology (JS19-04).
Abstract: Sparse-representation-based single-channel source separation, which aims to recover each source's signal using its corresponding sub-dictionary, has attracted considerable attention. The basic premise of this model is that each sub-dictionary possesses discriminative information about its corresponding source, and that this information can be used to recover almost every sample from that source. In a more general sense, however, the samples from a source are composed not only of discriminative information but also of common information shared with other sources. This paper proposes learning a discriminative high-fidelity dictionary to improve separation performance. The innovations are threefold. First, an extra sub-dictionary is added to the conventional union dictionary so that the common information is collected there and the source-specific sub-dictionaries capture only the purely discriminative information for their corresponding sources. Second, a task-driven learning algorithm is designed to optimize the new union dictionary together with a set of weights that indicate how much of the common information should be allocated to each source. Third, a source separation scheme based on the learned dictionary is presented. Experimental results on a human speech dataset provide evidence that the algorithm achieves better separation performance than both state-of-the-art and traditional algorithms.
Funding: This paper was supported by the National Natural Science Foundation of China (61772286, 61802208, and 61876089), China Postdoctoral Science Foundation Grant 2019M651923, and the Natural Science Foundation of Jiangsu Province of China (BK0191381).
Abstract: Cross-project defect prediction (CPDP) aims to predict defects in a target project by using a prediction model built on source projects. The main problem in CPDP is the large distribution gap between the source project and the target project, which prevents the prediction model from performing well. Most existing methods overlook the class discrimination of the learned features. Finding an effective model that transfers from the source project to the target project is challenging. In this paper, we propose an unsupervised domain adaptation approach for CPDP based on discriminative subspace learning (DSL). DSL treats the data from two projects as coming from two domains and maps the data into a common feature space. It employs cross-domain alignment with discriminative information from different projects to reduce the distribution difference between projects and incorporates class discriminative information. Specifically, DSL first uses subspace learning-based domain adaptation to reduce the distribution gap between projects. Then, it makes full use of the class label information of the source project and transfers the discrimination ability of the source project to the target project in the common space. Comprehensive experiments on five projects verify that DSL can build an effective prediction model and improve on related competing methods by at least 7.10% in G-measure and 11.08% in AUC.
Abstract: Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we compared the models, which yielded prediction error rates of 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
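The comparison above rests on a confusion matrix and an error percentage per model. The following is a minimal sketch of both quantities, assuming the AQI is predicted as a category; the category names, the sample predictions, and the helper names `confusion_matrix` and `error_rate` are hypothetical and are not the study's data or code.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict of counts: counts[true_label][predicted_label]."""
    pairs = Counter(zip(y_true, y_pred))
    return {t: {p: pairs[(t, p)] for p in labels} for t in labels}


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


# Hypothetical AQI categories and predictions, for illustration only.
labels = ["good", "moderate", "unhealthy"]
y_true = ["good", "good", "moderate", "moderate", "unhealthy",
          "unhealthy", "good", "moderate", "unhealthy", "good"]
y_pred = ["good", "moderate", "moderate", "moderate", "unhealthy",
          "moderate", "good", "good", "unhealthy", "good"]

cm = confusion_matrix(y_true, y_pred, labels)
err = error_rate(y_true, y_pred)  # 3 of the 10 predictions are wrong
```

The diagonal of `cm` holds the correct predictions; every off-diagonal count contributes to the error rate.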
Funding: Supported by the Natural Science Foundation of Henan Province (232300420094) and the Science and Technology Research Project of Henan Province (222102220092).
Abstract: Intelligent, big-data-driven diagnosis of mechanical faults is an important means of ensuring the safe operation of equipment. Among these methods, deep learning-based machinery fault diagnosis approaches have received increasing attention and achieved some results. However, using transfer learning alone may lead to insufficient performance, and domain bias may cause misclassification of target samples when deep models are built to learn domain-invariant features. To address these problems, a deep discriminative adversarial domain adaptation neural network (DDADAN) is proposed for bearing fault diagnosis. In this method, the raw vibration data are first converted into frequency-domain data by the Fast Fourier Transform, and an improved deep convolutional neural network with wide first-layer kernels is used as a feature extractor to extract deep fault features. Then, domain-invariant features are learned from the fault data with correlation alignment-based domain adversarial training. Furthermore, to enhance the discriminative property of the features, discriminative feature learning is embedded into the network to make the features compact within each class and separable between classes. Finally, the performance and anti-noise capability of the proposed method are evaluated using two bearing fault datasets. The results demonstrate that the proposed method can handle the domain offset caused by different working conditions and maintains more than 97.53% accuracy on various transfer tasks. Furthermore, the proposed method achieves high diagnostic accuracy under varying noise levels.
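The first step of the pipeline above converts raw vibration data to the frequency domain. As a rough illustration of what that step produces, the sketch below computes a magnitude spectrum with a naive discrete Fourier transform; a Fast Fourier Transform returns the same values more efficiently. The synthetic signal and the name `magnitude_spectrum` are assumptions for illustration and do not reproduce the paper's preprocessing.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n//2."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(acc))
    return spectrum


# Hypothetical vibration signal: 64 samples of a sinusoid completing 8 cycles.
n = 64
signal = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
spec = magnitude_spectrum(signal)

# Index of the strongest frequency bin; for this signal it is bin 8.
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

A periodic fault component shows up as a peak in a single bin, which is one reason frequency-domain input can be easier for a convolutional feature extractor to work with than the raw waveform.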
Abstract: Transfer learning aims to transfer source models to a target domain. Feature matching can alleviate domain shift effectively, but it ignores the relationship between marginal distribution matching and conditional distribution matching. The discriminative information of both domains, which is important for improving performance on the target domain, is also neglected. In this paper, we propose a novel method called Balanced Discriminative Transfer Feature Learning for Visual Domain Adaptation (BDTFL). The proposed method adaptively balances the two distribution matchings and captures the category discriminative information of both domains. Balanced feature matching therefore achieves more accurate feature matching and adapts itself to different scenes. At the same time, discriminative information is exploited to alleviate category confusion during feature matching. With the help of the category discriminative information captured from both domains, the source classifier can be transferred to the target domain more accurately, boosting target classification performance. Extensive experiments show the superiority of BDTFL on popular visual cross-domain benchmarks.
Funding: Financially supported by the National Natural Science Foundation of China (41774129, 41904116) and the Foundation Research Project of Shaanxi Provincial Key Laboratory of Geological Support for Coal Green Exploitation (MTy2019-20).
Abstract: Lithofacies identification is crucial in reservoir characterization and modeling. Facies identification from seismic data can fill in the vast inter-well area. However, the relationship between lithofacies and seismic information is complicated and affected by many factors. Machine learning has received extensive attention in recent years, and the support vector machine (SVM) is a promising method for lithofacies classification. Lithofacies classification involves identifying various types of lithofacies and is generally a nonlinear problem, which needs to be solved by means of a kernel function. Multi-kernel learning SVM is one of the main tools for solving nonlinear multi-class problems. However, the kernel function and its parameters are very difficult to determine, and the choice depends on human judgment. In addition, its computational efficiency is low. A lithofacies classification method based on a local deep multi-kernel learning support vector machine (LDMKL-SVM), which can consider low-dimensional global features and high-dimensional local features, is developed. The method automatically learns the parameters of the kernel function and the SVM to build a relationship between lithofacies and seismic elastic information. Computation is accelerated at no cost to discriminant accuracy for multi-class lithofacies identification. Both the model data test results and the field data application results confirm the advantages of the method. This contribution offers an effective SVM-based method for lithofacies recognition and reservoir prediction.
Abstract: This study proposes a machine learning approach to forecasting currency exchange rates by applying sentiment analysis to messages on Twitter (called tweets). A dataset of the exchange rates between the United States Dollar (USD) and the Pakistani Rupee (PKR) was formed by collecting information from a forex website, together with a collection of tweets from the business community in Pakistan containing finance-related words. The dataset was collected in raw form and preprocessed with natural language processing. Response variable labeling was then applied to the standardized dataset, with the response variable divided into two classes: "1" indicated an increase in the exchange rate and "−1" indicated a decrease. To better represent the dataset, we used linear discriminant analysis and principal component analysis to visualize the data in three-dimensional vector space. Clusters obtained using a sampling approach were then used for data optimization. Five machine learning classifiers (the simple logistic classifier, random forest, bagging, naïve Bayes, and the support vector machine) were applied to the optimized dataset. The results show that the simple logistic classifier yielded the highest accuracy, 82.14%, in forecasting the USD/PKR exchange rate.
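The response labeling described above can be sketched in a few lines. This is a minimal sketch under two assumptions that the abstract does not state: each day is compared with the previous day, and an unchanged rate is labeled −1. The function name `label_direction` and the sample rates are hypothetical.

```python
def label_direction(rates):
    """Label each day after the first: 1 if the rate rose versus the previous day, else -1.

    Assumption: an unchanged rate is treated as "no increase" and labeled -1.
    """
    return [1 if today > prev else -1 for prev, today in zip(rates, rates[1:])]


# Hypothetical daily USD/PKR closing rates, for illustration only.
rates = [155.2, 155.9, 155.4, 156.1, 156.1]
labels = label_direction(rates)  # one label per day, starting from the second day
```

The first day has no predecessor and therefore receives no label, so the label list is one element shorter than the rate series.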
Abstract: Interior Alaska has a short growing season of 110 days. Knowing the timing of crop flowering and maturity provides information for agricultural decision making. In this study, six machine learning algorithms, namely Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs), k-nearest neighbor (kNN), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Random Forest (RF), were selected to forecast the timing of barley flowering and maturity based on the Alaska Crop Datasets and climate data from 1991 to 2016 in Fairbanks, Alaska. Among 32 models fit to forecast flowering time, two from LDA, 12 from SVMs, four from NB, and three from RF outperformed models from other algorithms with the highest accuracy. Models from kNN performed worst in forecasting flowering time. Among 32 models fit to forecast maturity time, two models from LDA outperformed the models from other algorithms. Models from kNN and RPART performed worst in forecasting maturity time. Models from machine learning methods also provided a variable importance explanation. In this study, four out of six algorithms gave the same variable importance order. Sowing date was the most important variable for forecasting flowering but a less important variable for forecasting maturity. The daily maximum temperature may be more important than the daily minimum temperature for fitting flowering models, while the daily minimum temperature may be more important than the daily maximum temperature for fitting maturity models. The results indicate that machine learning models are a promising technique for forecasting the timing of barley flowering and maturity.
Funding: We would like to acknowledge the financial support of the Ministry of Science and Technology of China (Grant No. 2021YFC2900300) and the National Natural Science Foundation of China (Grant Nos. 41772074 and 42172103).
Abstract: Owing to the combined influence of ore-forming temperature and fluid and metal sources, sphalerite tends to incorporate diverse contents of trace elements during the formation of different types of lead-zinc (Pb-Zn) deposits. Trace elements in sphalerite have therefore long been used to distinguish Pb-Zn deposit types. However, previous discriminant diagrams usually contain two or three dimensions, which limits their ability to reveal the complicated interrelations between sphalerite trace elements and Pb-Zn deposit types. In this study, we aim to show that sphalerite trace elements can be used to classify Pb-Zn deposit types and to extract the key factors among those trace elements that discriminate Pb-Zn deposit types using machine learning algorithms. A dataset of nearly 3600 sphalerite spot analyses from 95 Pb-Zn deposits worldwide, determined by LA-ICP-MS, was compiled from peer-reviewed publications. It contains 12 elements (Mn, Fe, Co, Cu, Ga, Ge, Ag, Cd, In, Sn, Sb, and Pb) from 5 deposit types: Sedimentary Exhalative (SEDEX), Mississippi Valley Type (MVT), Volcanic Massive Sulfide (VMS), skarn, and epithermal deposits. Random Forests (RF) is applied to the data, and the results show that sphalerite trace elements can successfully discriminate different types of Pb-Zn deposits except for VMS deposits, most of which are misclassified as skarn and epithermal types. To further discriminate VMS deposits, future studies could focus on increasing the amount of VMS deposit data in datasets and on including other geological factors along with sphalerite trace elements when constructing the classification model. RF's feature importance and permutation feature importance were adopted to evaluate the significance of each element for classification. In addition, a visualization tool, t-distributed stochastic neighbor embedding (t-SNE), was used to verify the results of both classification and evaluation. The results show that Mn, Co, and Ge have significant impacts on the classification of Pb-Zn deposits, and that In, Ga, Sn, Cd, and Fe also have relatively important effects compared with the remaining elements, confirming that Pb-Zn deposit discrimination is mainly controlled by multiple elements in sphalerite. Our study hence shows that machine learning algorithms can provide new insights into conventional geochemical analyses, inspiring future research on constructing classification models of mineral deposits using mineral geochemistry data.
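Permutation feature importance, one of the two measures named above, can be sketched independently of any particular classifier: shuffle one feature column at a time and record how much the accuracy drops. In the sketch below, `toy_model` is a hypothetical stand-in for a trained model and looks only at feature 0; the data, the function names, and the repeat count are likewise assumptions, and none of this reproduces the study's Random Forest.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with the shuffled value in position j.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: uses only feature 0."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical two-feature data; feature 1 carries no signal for this model.
rows = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
        [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
imp = permutation_importance(toy_model, rows, labels)
```

Shuffling a feature the model ignores leaves every prediction unchanged, so its importance is exactly zero, while shuffling a feature the model relies on can only lower accuracy or leave it unchanged.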
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Education, Nos. NRF-2018R1D1A1B07041091 and NRF-2019S1A5A8034211.
Abstract: BACKGROUND: Despite the frequent progression from Parkinson's disease (PD) to Parkinson's disease dementia (PDD), the basis for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM: To explore the prediction accuracy of sociodemographic factors, Parkinson's motor symptoms, Parkinson's non-motor symptoms, and rapid eye movement sleep disorder for diagnosing EOPD using PD multicenter registry data. METHODS: This study analyzed 342 Parkinson patients younger than 65 years (66 EOPD patients and 276 PD patients with normal cognition). An EOPD prediction model was developed using a random forest algorithm, and its accuracy was compared with that of a naive Bayesian model and discriminant analysis. RESULTS: The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and the naive Bayesian model (85.8%). In the random forest model, the Korean Mini Mental State Examination (K-MMSE) score, Korean Montreal Cognitive Assessment (K-MoCA), sum of boxes in Clinical Dementia Rating (CDR), global score of CDR, motor score of the Unified Parkinson's Disease Rating Scale (UPDRS), and Korean Instrumental Activities of Daily Living (K-IADL) score were confirmed as the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION: Parkinson-related motor symptoms (e.g., motor score of UPDRS) and instrumental daily performance (e.g., K-IADL score), in addition to cognitive screening indicators (e.g., K-MMSE score and K-MoCA score), were predictors with high accuracy in EOPD prediction.
Funding: Supported by the National Key Research and Development Project (2021YFF0901701).
Abstract: A multi-layer dictionary learning algorithm that combines global constraints and Fisher discrimination (JGCFD-MDL) is proposed for image classification tasks. The algorithm reveals the manifold structure of the data by learning a global constraint dictionary and introduces a Fisher discriminative constraint dictionary to minimize the intra-class dispersion of samples and increase the inter-class dispersion. To further quantify the abstract features that characterize the data, a multi-layer dictionary learning framework is constructed to obtain high-level complex semantic structures and improve image classification performance. Finally, the algorithm is verified on a multi-label dataset of court costumes from the Ming and Qing Dynasties, where it obtains better performance. Experiments show that, compared with the local similarity algorithm, the average precision is improved by 3.34%. Compared with the single-layer dictionary learning algorithm, the one-error is improved by 1.00% and the average precision is improved by 0.54%. Experiments also show that the algorithm performs better on general datasets.
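The Fisher discriminative constraint mentioned above aims for small intra-class dispersion and large inter-class dispersion. As a minimal sketch of that criterion in its simplest setting, the code below computes the ratio of between-class scatter to within-class scatter for a one-dimensional feature. The function name `fisher_ratio` and the two toy datasets are hypothetical; the paper applies the idea to dictionary coding coefficients, which is not reproduced here.

```python
def fisher_ratio(samples_by_class):
    """Between-class scatter divided by within-class scatter for a 1-D feature.

    Assumes at least one class has non-identical samples (within-class scatter > 0).
    """
    all_values = [v for vals in samples_by_class.values() for v in vals]
    global_mean = sum(all_values) / len(all_values)
    within = 0.0
    between = 0.0
    for vals in samples_by_class.values():
        mean = sum(vals) / len(vals)
        within += sum((v - mean) ** 2 for v in vals)
        between += len(vals) * (mean - global_mean) ** 2
    return between / within


# Hypothetical features: same class means, different spread inside each class.
tight = {"a": [1.0, 1.5, 2.0], "b": [5.0, 5.5, 6.0]}
loose = {"a": [0.0, 1.5, 3.0], "b": [4.0, 5.5, 7.0]}

ratio_tight = fisher_ratio(tight)
ratio_loose = fisher_ratio(loose)
```

With the class means held fixed, the tighter classes give the larger ratio, which is the direction a Fisher-style constraint pushes the learned representation.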
Funding: ASEAN-India Collaborative R&D scheme under the ASEAN-India S&T Development Fund (AISTDF), File Number: CRD/2020/000248, and partly by Universitas Indonesia's International Indexed Publication (PUTI) Q2 Grant, year 2023, number: NKB-803/UN2.RST/HKP.05.00/2023.
Abstract: Agriculture 5.0 is an emerging concept in which sensors, big data, the Internet of Things (IoT), robots, and Artificial Intelligence (AI) are used for agricultural purposes. Unlike Agriculture 4.0, Agriculture 5.0 makes robots and AI the focus of implementation. One application of Agriculture 5.0 is weed management, in which robots discriminate weeds from crops or plants so that proper action can be taken to remove the weeds. This paper presents an in-depth review of Machine Learning (ML) techniques used for discriminating weeds from crops or plants. We specifically present a detailed explanation of the five steps required to use ML algorithms to distinguish between weeds and plants.