Railway track work may contain several kinds of internal defects with different shapes and distribution rules, and these defects affect the safety of high-speed trains. Establishing reliable detection models and methods for these internal defects remains a challenging task. To address this challenge, this study proposes an intelligent detection method for internal defects of railway tracks based on a generalization feature cluster. First, the defects are classified and counted according to their shape and location features. Then, generalized features of the internal defects are extracted and formulated based on the maximum difference between different types of defects and the maximum tolerance among defects of the same type. Finally, the extracted generalized features are expressed by function constraints and formulated as generalization feature clusters to classify and identify internal defects in the railway track. Furthermore, to improve detection reliability and speed, a dimension-reduction method for the generalization feature clusters is presented. Based on the reduced-dimension features and the strongly constrained generalized features, a K-means clustering algorithm is developed for defect clustering, and good clustering results are achieved. For defects in the rail head region, the clustering accuracy is over 95% and the Davies-Bouldin index (DBI) is negligible, which validates the proposed generalization features with strong constraints. Experimental results show that the accuracy of the proposed method based on generalization feature clusters is up to 97.55% and the average detection time is 0.12 s/frame, indicating good adaptability, accuracy, and detection speed under complex working environments. The proposed algorithm can effectively detect internal defects in railway tracks using an established generalization feature cluster model.
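The railway abstract above uses the Davies-Bouldin index (DBI) to judge clustering quality. As a rough illustration only, the following sketch computes the textbook DBI for clusters that have already been formed; the toy 2-D points and the function names are hypothetical and are not taken from the paper. Lower values mean more compact, better separated clusters.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Textbook Davies-Bouldin index for a list of clusters (lists of points)."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: two compact, well separated groups.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
print(davies_bouldin(tight))
```

In this reading, a "negligible" DBI means that within-cluster scatter is small relative to the distance between cluster centroids.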
Nearly half of coal mine disasters in China have been found to occur in clusters or to be accompanied by nearby earthquakes, and all disaster types are involved. Stress disturbances seem to exist among mining areas and to be responsible for the observed clustering. The earthquakes accompanying coal mine disasters may be vital geophysical evidence for tectonic stress disturbances around mining areas. This paper analyzes all the possible causative factors to demonstrate the authenticity and reliability of the observed phenomena. A quantitative study was performed on the degree of clustering, and space-time distribution curves were obtained. Under a threshold of 100 km, 47% of disasters are involved in cluster series and 372 coal mine disasters are accompanied by earthquakes. The majority of cluster series, lasting 1-2 days, correspond well with nearby earthquakes and are speculated to be related to local stress disturbance, while the minority, lasting longer than 4 days, correspond well with fatal earthquakes and are speculated to be related to regional stress disturbance. The cluster series possess multiple properties, such as area, distance, and related disasters, and these properties correspond well with the energy and magnitude of the earthquakes. This indicates that the cluster series of coal mine disasters and earthquakes are linked with fatal earthquakes and may serve as footprints of regional stress disturbance. Speculations relating to the geological model are made, and five disaster-causing models are examined. The findings are suggested to have broad scientific significance for earthquake research and disaster prevention.
Dissolved oxygen (DO) content is an important index of river water quality. Water quality sensors have been used in China for urban river water monitoring and DO content prediction. However, water quality sensors are expensive, difficult to maintain, and have a short operation period. This study developed a scientific and accurate method for predicting DO content changes using fish school features. The behavioral features of a Carassius auratus fish school were described using two-dimensional fish school images. The degree of DO content decline was graded into five levels, and the corresponding numerical ranges of the cluster characteristic parameters were determined by considering the opinions of ichthyologists. Finally, the variation of DO content was predicted using the characteristic parameters of the fish school and a multiple-input single-output Takagi-Sugeno fuzzy neural network. The prediction results were basically consistent with the actual variations of DO content. Therefore, it is feasible to use the behavioral features of the fish school to dynamically predict the level of DO content in water, and the method is especially suitable for predicting a sharp decline of DO content within a relatively short time.
The following questions are discussed: the feature cluster, the feature cluster concept, and the reasoning formula. The shortcomings of the approaches based on approach direction and feed direction are analyzed. The concept of the feature tool axis direction and its definition method are presented. The features of a practical part are also clustered by tool axis direction.
The critical technical problem of underwater bottom object detection is establishing a stable feature space for echo signal classification. Previous literature focuses more on the characteristics of object echoes in feature space and treats reverberation only as interference. In this paper, reverberation is considered a kind of signal with steady characteristics, and the clustering of reverberation in the frequency discrete wavelet transform (FDWT) feature space is studied. To extract the identifying information of echo signals, feature compression and cluster analysis are adopted, and a criterion of separability between object echoes and reverberation is given. The experimental data processing results show that reverberation has a steady pattern in the FDWT feature space that differs from that of object echoes. This demonstrates that reverberation and object echoes are separable.
A principal direction Gaussian image (PDGI)-based algorithm is proposed to extract regular swept surfaces from a point cloud. First, the PDGI of the regular swept surface is constructed from the point cloud; the bounding box of the Gaussian sphere is then uniformly partitioned into a number of small cubes (3D grids), and the PDGI points on the Gaussian sphere are associated with the corresponding 3D grids. Second, cluster analysis is used to sort out the group of 3D grids that contain more PDGI points. Using a connected-region growing algorithm, the congregation point or the great circle is detected from the 3D grids. The translational direction is thus determined by the congregation point, and the direction of the rotational axis is determined by the great circle. In addition, the positional point of the rotational axis is obtained by intersecting all the projected normal lines of the rotational surface on the plane perpendicular to the estimated direction of the rotational axis. Finally, a pattern search method is applied to optimize the translational direction and the rotational axis. Experiments illustrate the feasibility of the algorithm.
To solve the problem of indoor place recognition for indoor service robots, a novel algorithm, clustering of features and images (CFI), is proposed in this work. Unlike traditional indoor place recognition methods, which are based on kernels or bags of features with a large-margin classifier, CFI is based on feature matching, image similarity, and clustering of features and images. It establishes independent local feature clusters by feature cloud registration to represent each room, and defines an image distance to describe the similarity between images or feature clusters, which determines the label of query images. In addition, it improves recognition speed by image scaling, and uses state inertia and a hidden Markov model to constrain state transitions and eliminate implausible misrecognitions, achieving remarkable precision and speed. A series of experiments on standard databases shows that the algorithm achieves a recognition rate of up to 97% at a speed of over 30 fps, which is much superior to traditional methods. This precision and speed demonstrate its discriminative power in complicated environments.
Pattern recognition of seismic and morphostructural nodes plays an important role in seismic hazard assessment. It is well known in seismology that tectonic nodes are areas prone to large earthquakes. They are identified by morphostructural analysis. In this study, the Alborz region is taken as the case study, and the locations of future events are forecast using a Kohonen self-organizing neural network. The study shows how the network can predict earthquake locations and identify seismogenic nodes prone to earthquakes of M5.5+ in the west of Alborz, Iran, using earthquake catalog data from the International Institute of Earthquake Engineering and Seismology. First, the main faults and tectonic lineaments were identified using the MZ (land zoning) method. Then, using pattern recognition, past recorded events were generalized to the future in order to show the regions of probable future earthquakes. In other words, hazardous nodes were determined among all nodes using self-organizing feature maps (SOFM) generated from the new catalog. The input data, extracted from the catalog, consist of the longitude and latitude of past events between 1980 and 2015 with magnitude greater than or equal to 4.5. It is concluded that node D1 is the leading candidate for large earthquakes, while the other nodes have a lower level of this potential.
Cross-project defect prediction (CPDP) uses labeled data from external source software projects to compensate for the shortage of useful data in the target project, in order to build a meaningful classification model. However, the distribution gap between software features extracted from the source and target projects may be too large for the mixed data to be useful for training. In this paper, we propose a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), to alleviate the distribution differences by feature selection. FeSCH includes two phases. The feature clustering phase clusters features using a density-based clustering method, and the feature selection phase selects features from each cluster using a ranking strategy. For CPDP, we design three different heuristic ranking strategies for the second phase. To investigate the prediction performance of FeSCH, we design experiments based on real-world software projects and study the effects of design options in FeSCH (such as ranking strategy, feature selection ratio, and classifiers). The experimental results demonstrate the effectiveness of FeSCH. First, compared with state-of-the-art baseline methods, FeSCH achieves better performance, and its performance is less affected by the classifier used. Second, FeSCH enhances performance by effectively selecting features across feature categories, and provides guidelines for selecting useful features for defect prediction.
Most stream data classification algorithms apply a supervised learning strategy that requires massive labeled data. Such approaches are impractical because labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while generating high classification accuracy at high speed.
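The CFDT abstract above relies on micro-clusters that summarize a stream in a single scan. The abstract does not say which statistics are stored, so the sketch below assumes a BIRCH-style clustering feature (count, per-dimension linear sum, per-dimension squared sum); the class name and the sample points are hypothetical.

```python
import math


class MicroCluster:
    """One-pass summary of absorbed points: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim  # per-dimension sum of values
        self.ss = [0.0] * dim  # per-dimension sum of squared values

    def add(self, x):
        """Absorb one point; the point itself is not stored."""
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        """Root of the summed per-dimension variance around the centroid."""
        var = sum(
            ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss)
        )
        return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


mc = MicroCluster(dim=2)
for point in [(1.0, 2.0), (3.0, 4.0)]:
    mc.add(point)
print(mc.centroid(), mc.radius())
```

Because the three statistics are additive, each arriving point updates the summary in constant time, which is what makes a single scan of the stream sufficient under this assumption.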
Battery fault diagnosis is essential for ensuring the reliability and safety of electric vehicles (EVs). Existing battery fault diagnosis methods struggle to detect faults at an early stage from real-world vehicle data, because lithium-ion battery systems usually exhibit inconsistencies that are difficult to distinguish from faults. This paper introduces a fault diagnosis method based on signal decomposition and two-dimensional feature clustering. Symplectic geometry mode decomposition (SGMD) is used to obtain the components characterizing battery states, and distance-based similarity measures, using the normalized extended average voltage and dynamic time warping distances, are established to evaluate the state of the batteries. Two-dimensional feature clustering based on DBSCAN is developed to reduce the number of feature thresholds and to distinguish flawed cells from the rest of the battery pack with only one parameter over a wide range of values. The proposed method can achieve fault diagnosis and voltage anomaly identification as early as 43 days ahead of thermal runaway. Results from four electric vehicles and comparisons with traditional methods validate the proposed method, showing strong robustness, high reliability, and long-time-scale warning; the method is also easy to implement online.
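The battery abstract above separates flawed cells from a pack by running DBSCAN on two-dimensional features. The following is a minimal plain-Python DBSCAN sketch on hypothetical per-cell feature pairs (for example, two distance measures per cell); the feature values, `eps`, and `min_pts` are illustrative assumptions, not the paper's settings. Points labeled -1 are noise, which in this reading would be the candidate abnormal cells.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbj)
        cluster += 1
    return labels


# Hypothetical features: five consistent cells and one deviating cell.
cells = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (1.0, 1.2)]
print(dbscan(cells, eps=0.3, min_pts=3))
```

A density-based method fits this use because the healthy cells form one dense group of unknown shape, so no per-feature threshold has to be tuned; only the neighborhood settings matter.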
For industrial processes, new scarce faults are usually judged by experts. The lack of instances for these faults causes a severe data imbalance problem for a diagnosis model and leads to low performance. In this article, a new diagnosis method with few-shot learning based on a class-rebalance strategy is proposed to handle the problem. The proposed method transforms instances of the different faults into a feature embedding space, so that the fault features form separate feature clusters. The fault representations are calculated as the centers of the feature clusters, and the representations of new faults can also be calculated effectively from a few support instances. Fault diagnosis is therefore achieved by estimating the feature similarity between instances and faults. A cluster loss function is designed to enhance the feature clustering performance. In addition, a class-rebalance strategy with data augmentation is designed to imitate potential faults with different causes and degrees of severity, which improves the model’s generalizability and the diagnosis performance of the proposed method. Simulations of fault diagnosis with the proposed method were performed on the Tennessee-Eastman benchmark. The proposed method achieved average diagnosis accuracies ranging from 81.8% to 94.7% for the eight selected faults, with the number of support instances ranging from 3 to 50. The simulation results verify the effectiveness of the proposed method.
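The few-shot abstract above represents each fault by the center of its feature cluster and diagnoses an instance by feature similarity. A minimal sketch of that idea follows, assuming the embeddings are already computed and that similarity is measured by Euclidean distance; both are assumptions, and the fault names and vectors are hypothetical.

```python
import math


def prototypes(support):
    """Map each fault label to the mean of its support embeddings."""
    protos = {}
    for label, vectors in support.items():
        n = len(vectors)
        dim = len(vectors[0])
        protos[label] = tuple(sum(v[d] for v in vectors) / n for d in range(dim))
    return protos


def diagnose(query, protos):
    """Return the label whose prototype is closest to the query embedding."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))


# Hypothetical 2-D embeddings with two support instances per fault.
support = {
    "fault_a": [(0.0, 0.0), (0.0, 2.0)],
    "fault_b": [(4.0, 4.0), (6.0, 4.0)],
}
protos = prototypes(support)
print(diagnose((1.0, 1.0), protos))
```

Adding a new fault in this scheme only requires averaging its few support embeddings into one more prototype, without retraining the classifier.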
Based on an analysis of the unique shapes and writing styles of Uyghur characters, we design a framework for a prototype character recognition system and carry out systematic theoretical and experimental research on its modules. In the preprocessing procedure, we use linear and nonlinear normalization based on the dot density method. Both structural and statistical features are extracted, because some Uyghur characters are very similar to one another. For clustering analysis, we adopt a dynamic clustering algorithm based on the minimum spanning tree (MST) and use k-nearest neighbor matching as the classifier. Tests of the prototype system show that the recognition rates for the four character types (independent, suffix, intermediate, and initial) are 74.67%, 70.42%, 63.33%, and 72.02%, respectively; when five candidates are allowed, the recognition rates for those characters are 94.34%, 94.19%, 93.15%, and 95.86%, respectively. The ideas and methods used in this paper are also applicable to the recognition of characters of other languages in the Altaic language family.
In this paper, we target similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. We propose a feature space representation model based on key points, which extracts the key features from the subsequences of the original data supply chain and simplifies them into feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criterion; the cluster containing the query object is then the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.
In some image classification tasks, the similarities among categories differ, and samples are usually misclassified into highly similar categories. To distinguish highly similar categories, more specific features are required so that the classifier can improve its classification performance. In this paper, we propose a novel two-level hierarchical feature learning framework based on the deep convolutional neural network (CNN), which is simple and effective. First, the deep feature extractors of the different levels are trained using a transfer learning method that fine-tunes the pre-trained deep CNN model toward the new target dataset. Second, the general feature extracted from all the categories and the specific feature extracted from highly similar categories are fused into a feature vector. The final feature representation is then fed into a linear classifier. Finally, experiments on the Caltech-256, Oxford Flower-102, and Tasmania Coral Point Count (CPC) datasets demonstrate the strong expressive ability of the deep features produced by two-level hierarchical feature learning. Our proposed method effectively increases classification accuracy in comparison with flat multi-class classification methods.
Funding (railway track internal defect detection): National Natural Science Foundation of China (Grant No. 61573233); Guangdong Provincial Natural Science Foundation of China (Grant No. 2018A0303130188); Guangdong Provincial Science and Technology Special Funds Project of China (Grant No. 190805145540361); Special Projects in Key Fields of Colleges and Universities in Guangdong Province of China (Grant No. 2020ZDZX2005).
Funding (dissolved oxygen prediction from fish school features): Supported by the Natural Science Foundation of Changzhou City, China (Grants No. CE20195026 and CE20205031); the Teaching Steering Committee of Electronics Information Specialty in Colleges and Universities of the Ministry of Education (Grant No. 2020-YB-42); and the Jiangsu Overseas Visiting Scholar Program for University Prominent Young and Middle Aged Teachers and Presidents.
Funding (feature clustering by tool axis direction): National Natural Science Foundation of China (No. 59875006).
Funding (reverberation clustering in FDWT feature space): Supported by the National Natural Science Foundation of China under Grant No. 51279033.
Funding (PDGI-based swept surface extraction): Supported by the Key Program of the National Natural Science Foundation of China (No. 50435020).
Funding (CFI indoor place recognition): Supported by the National Natural Science Foundation of China (Nos. 61305103 and 61473103); the Natural Science Foundation of Heilongjiang Province (No. QC2014C072); the Postdoctoral Science Foundation of Heilongjiang (No. LBH-Z14108); and the Self-Planned Task of the State Key Laboratory of Robotics and System (HIT) (No. SKLRS201609B).
文摘In order to solve the problem of indoor place recognition for indoor service robot, a novel algorithm, clustering of features and images (CFI), is proposed in this work. Different from traditional indoor place recognition methods which are based on kernels or bag of features, with large margin classifier, CFI proposed in this work is based on feature matching, image similarity and clustering of features and images. It establishes independent local feature clusters by feature cloud registration to represent each room, and defines image distance to describe the similarity between images or feature clusters, which determines the label of query images. Besides, it improves recognition speed by image scaling, with state inertia and hidden Markov model constraining the transition of the state to kill unreasonable wrong recognitions and achieves remarkable precision and speed. A series of experiments are conducted to test the algorithm based on standard databases, and it achieves recognition rate up to 97% and speed is over 30 fps, which is much superior to traditional methods. Its impressive precision and speed demonstrate the great discriminative power in the face of complicated environment.
文摘Pattern recognition of seismic and mor- phostructural nodes plays an important role in seismic hazard assessment. This is a known fact in seismology that tectonic nodes are prone areas to large earthquake and have this potential. They are identified by morphostructural analysis. In this study, the Alborz region has considered as studied case and locations of future events are forecast based on Kohonen Self-Organized Neural Network. It has been shown how it can predict the location of earthquake, and identifies seismogenic nodes which are prone to earthquake of M5.5+ at the West of Alborz in Iran by using International Institute Earthquake Engineering and Seismology earthquake catalogs data. First, the main faults and tectonic lineaments have been identified based on MZ (land zoning method) method. After that, by using pattern recognition, we generalized past recorded events to future in order to show the region of probable future earthquakes. In other word, hazardous nodes have determined among all nodes by new catalog generated Self-organizing feature maps (SOFM). Our input data are extracted from catalog, consists longitude and latitude of past event between 1980-2015 with magnitude larger or equal to 4.5. It has concluded node D1 is candidate for big earthquakes in comparison with other nodes and other nodes are in lower levels of this potential.
Abstract: Cross-project defect prediction (CPDP) uses labeled data from external source software projects to compensate for the shortage of useful data in the target project, in order to build a meaningful classification model. However, the distribution gap between software features extracted from the source and target projects may be too large for the mixed data to be useful for training. In this paper, we propose a novel cluster-based method, FeSCH (Feature Selection Using Clusters of Hybrid-Data), to alleviate the distribution differences by feature selection. FeSCH includes two phases. The feature clustering phase clusters features using a density-based clustering method, and the feature selection phase selects features from each cluster using a ranking strategy. For CPDP, we design three different heuristic ranking strategies in the second phase. To investigate the prediction performance of FeSCH, we design experiments based on real-world software projects, and study the effects of design options in FeSCH (such as ranking strategy, feature selection ratio, and classifiers). The experimental results demonstrate the effectiveness of FeSCH. First, compared with state-of-the-art baseline methods, FeSCH achieves better performance, and its performance is less affected by the classifier used. Second, FeSCH enhances performance by effectively selecting features across feature categories, and provides guidelines for selecting useful features for defect prediction.
Funding: Supported by the National Natural Science Foundation of China (No. 60673024) and the "Eleventh Five" Preliminary Research Project of PLA (No. 102060206)
Abstract: Most stream data classification algorithms apply a supervised learning strategy, which requires massive amounts of labeled data. Such approaches are impractical since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams that contain both unlabeled examples and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in the tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while achieving high classification accuracy at high speed.
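A single-pass statistical summary of this kind can be sketched as follows. The abstract does not specify CFDT's micro-cluster structure, so this example assumes the common clustering-feature triple (count, linear sum, squared sum); the names `MicroCluster`, `summarize_stream`, and the `max_dist` threshold are hypothetical.

```python
import math

class MicroCluster:
    """Additive summary of absorbed points: count, linear sum, squared sum."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def add(self, point):
        # Summaries are updated in O(dimensions); the raw point is not stored.
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the members from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2
                  for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))


def summarize_stream(stream, max_dist=1.0):
    """Scan the stream once; absorb each point into the nearest micro-cluster
    if its centroid is within max_dist (an assumed threshold), else open a new one."""
    clusters = []
    for point in stream:
        nearest = min(clusters,
                      key=lambda c: math.dist(c.centroid(), point),
                      default=None)
        if nearest is not None and math.dist(nearest.centroid(), point) <= max_dist:
            nearest.add(point)
        else:
            clusters.append(MicroCluster(point))
    return clusters

stream = [(0, 0), (0.2, 0), (5, 5), (5, 5.2), (0, 0.2)]
clusters = summarize_stream(stream)
```

Because the three statistics are additive, each example is touched only once and the centroid and radius can be read off at any time, which is what makes such summaries suitable for incremental tree induction.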
Funding: The National Natural Science Foundation of China (Nos. 51977007 and 52007006) and the Natural Science Foundation of Beijing under grant 3212033.
Abstract: Battery fault diagnosis is essential for ensuring the reliability and safety of electric vehicles (EVs). Existing battery fault diagnosis methods struggle to detect faults at an early stage from real-world vehicle data, since lithium-ion battery systems are usually accompanied by inconsistencies that are difficult to distinguish from faults. A fault diagnosis method based on signal decomposition and two-dimensional feature clustering is introduced in this paper. Symplectic geometry mode decomposition (SGMD) is introduced to obtain the components characterizing battery states, and distance-based similarity measures with the normalized extended average voltage and dynamic time warping distances are established to evaluate the state of the batteries. Two-dimensional feature clustering based on DBSCAN is developed to reduce the number of feature thresholds and to differentiate flawed cells from the battery pack with only one parameter over a wide range of values. The proposed method can achieve fault diagnosis and voltage anomaly identification as early as 43 days ahead of thermal runaway. The results for four electric vehicles and the comparison with other traditional methods validate the proposed method, which offers strong robustness, high reliability, and long-time-scale warning, and is easy to implement online.
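The dynamic time warping distance used as one of the similarity measures can be sketched in a few lines. This is the textbook formulation with an absolute-difference cost, not necessarily the exact variant or normalization used in the paper; the function name `dtw_distance` and the sample voltage series are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance, O(len(a) * len(b)) time.

    Uses the absolute difference as the local cost and keeps only two rows
    of the cumulative-cost table in memory.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)      # row 0: only the empty alignment costs 0
    for x in a:
        curr = [inf]                   # column 0 is unreachable after row 0
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: insertion, match, deletion.
            curr.append(cost + min(prev[j], prev[j - 1], curr[j - 1]))
        prev = curr
    return prev[-1]

# Hypothetical cell-voltage curves (volts): one cell lags, one sags.
reference = [3.7, 3.7, 3.6, 3.5]
lagging = [3.7, 3.7, 3.7, 3.6, 3.5]
sagging = [3.7, 3.5, 3.2, 3.0]

d_lag = dtw_distance(reference, lagging)
d_sag = dtw_distance(reference, sagging)
```

A curve that is merely delayed in time stays close to the reference under DTW, while a curve whose shape genuinely departs from it does not, which is why such a distance is a plausible input feature for separating ordinary inconsistency from faults.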
Funding: Supported by the National Natural Science Foundation of China (Nos. 61733004 and 62103413) and the National Key Research and Development Program of China (No. 2018YFD0400902).
Abstract: In industrial processes, new and scarce faults are usually judged by experts. The lack of instances for these faults causes a severe data imbalance problem for a diagnosis model and leads to low performance. In this article, a new diagnosis method with few-shot learning based on a class-rebalance strategy is proposed to handle the problem. The proposed method transforms instances of the different faults into a feature embedding space, so that the fault features form separate feature clusters. The fault representations are calculated as the centers of the feature clusters. The representations of new faults can also be calculated effectively from a few support instances. Fault diagnosis is then achieved by estimating the feature similarity between instances and faults. A cluster loss function is designed to enhance the feature clustering performance. In addition, a class-rebalance strategy with data augmentation is designed to imitate potential faults with different causes and degrees of severity, which improves the model's generalizability and the diagnosis performance of the proposed method. Simulations of fault diagnosis with the proposed method were performed on the Tennessee-Eastman benchmark. The proposed method achieved average diagnosis accuracies ranging from 81.8% to 94.7% for the eight selected faults, with the number of support instances ranging from 3 to 50. The simulation results verify the effectiveness of the proposed method.
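The idea of representing each fault by the center of its feature cluster can be sketched as follows. The embedding network and the cluster loss are omitted; the vectors are assumed to be already embedded, Euclidean distance is assumed as the similarity measure, and the names `class_prototypes` and `diagnose` are hypothetical.

```python
import math
from collections import defaultdict

def class_prototypes(support):
    """Map each label to the mean of its support embeddings.

    support: iterable of (embedding, label) pairs, all embeddings of equal length.
    """
    groups = defaultdict(list)
    for vec, label in support:
        groups[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}

def diagnose(query, prototypes):
    """Return the label whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-D embeddings; the new fault has only three support instances.
support = [
    ((0, 0), "normal"), ((0, 2), "normal"),
    ((10, 10), "fault_A"), ((12, 10), "fault_A"), ((10, 12), "fault_A"),
]
prototypes = class_prototypes(support)
label = diagnose((9, 9), prototypes)
```

Adding a new fault class only requires averaging its few support embeddings into a new prototype, so no retraining of the classifier head is needed in this sketch.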
Funding: Supported by the National Natural Science Foundation of China (No. 61065001)
Abstract: Based on an analysis of the unique shapes and writing styles of Uyghur characters, we design a framework for a prototype character recognition system and carry out systematic theoretical and experimental research on its modules. In the preprocessing procedure, we use linear and nonlinear normalization based on the dot density method. Both structural and statistical features are extracted, because some characters in Uyghur literature are very similar. In the clustering analysis, we adopt a dynamic clustering algorithm based on the minimum spanning tree (MST), and use k-nearest neighbor matching as the classifier. The test results of the prototype system show that the recognition rates for characters of the four different types (independent, suffix, intermediate, and initial) are 74.67%, 70.42%, 63.33%, and 72.02%, respectively; the recognition rates with five candidates for those characters are 94.34%, 94.19%, 93.15%, and 95.86%, respectively. The ideas and methods used in this paper are also applicable to the recognition of characters from other languages in the Altaic family.
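MST-based clustering can be illustrated with a minimal sketch. The abstract does not describe the dynamic clustering algorithm in detail, so the example below uses a common variant as an assumption: build the minimum spanning tree with Kruskal's algorithm and drop its k-1 longest edges. The function name `mst_clusters` and the sample points are hypothetical.

```python
import math
from itertools import combinations

def mst_clusters(points, k):
    """Split points into k clusters by dropping the k-1 longest MST edges.

    Kruskal's algorithm adds MST edges from shortest to longest, so stopping
    once k components remain is equivalent to building the full tree and
    then removing its k-1 longest edges.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: math.dist(points[e[0]], points[e[1]]))
    components = len(points)
    for i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:            # edge joins two components, so it is an MST edge
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]

# Hypothetical 2-D feature vectors forming three visually separate groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
labels = mst_clusters(points, k=3)
```

The resulting cluster labels could then serve as a coarse first stage before a k-nearest-neighbor match within the selected cluster.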
Funding: Partly supported by the National Natural Science Foundation of China (Nos. 61532012, 61370196, and 61672109)
Abstract: In this paper, we target similarity search among data supply chains, which plays an essential role in optimizing the supply chain and extending its value. This problem is very challenging for application-oriented data supply chains because the high complexity of the data supply chain makes the computation of similarity extremely complex and inefficient. We propose a feature space representation model based on key points, which can extract the key features from the subsequences of the original data supply chain and simplify it into a feature vector form. Then, we formulate the similarity computation of the subsequences based on the multiscale features. Further, we propose an improved hierarchical clustering algorithm for similarity search over the data supply chains. The main idea is to separate the subsequences into disjoint groups such that each group meets one specific clustering criterion; thus, the cluster containing the query object is the similarity search result. The experimental results show that the proposed approach is both effective and efficient for data supply chain retrieval.
Funding: Project supported by the National Natural Science Foundation of China (No. 61379074) and the Zhejiang Provincial Natural Science Foundation of China (Nos. LZ12F02003 and LY15F020035)
Abstract: In some image classification tasks, the degree of similarity differs between pairs of categories, and samples are usually misclassified into highly similar categories. To distinguish highly similar categories, more specific features are required so that the classifier can improve the classification performance. In this paper, we propose a novel two-level hierarchical feature learning framework based on the deep convolutional neural network (CNN), which is simple and effective. First, the deep feature extractors of different levels are trained using a transfer learning method that fine-tunes the pre-trained deep CNN model toward the new target dataset. Second, the general feature extracted from all the categories and the specific feature extracted from highly similar categories are fused into a feature vector. The final feature representation is then fed into a linear classifier. Finally, experiments using the Caltech-256, Oxford Flower-102, and Tasmania Coral Point Count (CPC) datasets demonstrate that the deep features resulting from two-level hierarchical feature learning have strong expressive power. Our proposed method effectively increases the classification accuracy in comparison with flat multi-class classification methods.