Hierarchical Text Classification (HTC) aims to match text to hierarchical labels. Existing methods overlook two critical issues. First, some texts cannot be fully matched to leaf-node labels and need to be classified to the correct parent node, rather than treating leaf nodes as the final classification target. Second, error propagation occurs when a misclassification at a parent node propagates down the hierarchy, ultimately leading to inaccurate predictions at the leaf nodes. To address these limitations, we propose an uncertainty-guided, depth-aware HTC model called DepthMatch. Specifically, we design an uncertainty-based early stopping strategy that identifies incomplete matches between text and labels and assigns such texts to the corresponding parent-node labels. This approach dynamically determines the classification depth by leveraging evidence to quantify and accumulate uncertainty. Experimental results show that DepthMatch outperforms recent strong baselines on four commonly used public datasets: WOS (Web of Science), RCV1-V2 (Reuters Corpus Volume I), AAPD (Arxiv Academic Paper Dataset), and BGC. Notably, on the BGC dataset, it improves Micro-F1 and Macro-F1 scores by at least 1.09% and 1.74%, respectively.
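To make the early-stopping idea concrete, one common way to turn evidence into uncertainty is the subjective-logic formulation from evidential deep learning: non-negative evidence e_k for each of K child labels gives Dirichlet parameters α_k = e_k + 1, strength S = Σα_k, and uncertainty u = K/S. The sketch below descends the label tree and stops at the current (parent) node once accumulated uncertainty would exceed a threshold. The accumulation rule, the `threshold`, the toy tree, and all function names are hypothetical; DepthMatch's actual evidence network and accumulation scheme are not reproduced here.

```python
from typing import Callable, Dict, List


def dirichlet_uncertainty(evidence: List[float]) -> float:
    """Subjective-logic uncertainty u = K / S, with alpha_k = e_k + 1."""
    k = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    return k / strength


def predict_path(
    children: Dict[str, List[str]],
    evidence_fn: Callable[[str], List[float]],
    threshold: float = 0.5,
    root: str = "ROOT",
) -> List[str]:
    """Descend the label tree; stop at the current node when accumulated
    uncertainty would exceed `threshold` (hypothetical accumulation rule)."""
    path: List[str] = []
    node = root
    acc = 0.0  # accumulated uncertainty along the path so far
    while children.get(node):
        evidence = evidence_fn(node)  # one evidence value per child label
        u = dirichlet_uncertainty(evidence)
        new_acc = 1.0 - (1.0 - acc) * (1.0 - u)
        if new_acc > threshold:
            break  # incomplete match: keep the parent label as the answer
        acc = new_acc
        best = max(range(len(evidence)), key=evidence.__getitem__)
        node = children[node][best]
        path.append(node)
    return path


if __name__ == "__main__":
    tree = {"ROOT": ["CS", "Bio"], "CS": ["ML", "DB"]}
    ev = {"ROOT": [18.0, 0.0], "CS": [0.5, 0.4]}  # weak evidence below "CS"
    print(predict_path(tree, ev.__getitem__))
```

With weak, ambiguous evidence at the second level, the walk stops at "CS" instead of forcing a leaf prediction.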
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computational cost make them challenging to deploy on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 uses a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational cost. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To balance complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on MobileNetV1 is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn image features with fewer parameters, which yields a lightweight and computationally inexpensive network. Furthermore, the depthwise dilated convolution in the DDSC layer effectively expands the receptive field of the filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses a parallel multi-resolution branch architecture to process the input feature map and extract multi-scale feature information from the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining performance comparable to the MobileNetV1 baseline.
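The parameter savings from depthwise separable convolution, and the receptive-field growth from dilation, can be checked with simple arithmetic. A standard k×k convolution with C_in input and C_out output channels has k·k·C_in·C_out weights. A depthwise separable one has k·k·C_in (depthwise) plus C_in·C_out (pointwise). Dilation d enlarges the effective kernel to k + (k − 1)(d − 1) without adding weights. The layer sizes below are illustrative, biases are ignored, and none of this is the paper's exact DDSC configuration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise layer."""
    return k * k * c_in + c_in * c_out


def effective_kernel(k: int, dilation: int) -> int:
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # illustrative layer sizes
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(std, sep, round(1 - sep / std, 3))
    print(effective_kernel(3, 1), effective_kernel(3, 2))
```

For this layer the separable form needs about 12% of the standard weights, and a dilation of 2 turns a 3×3 kernel into a 5×5 footprint at no extra parameter cost.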
Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, underscoring the need for organizing and interpreting large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods to determine the appropriate classification granularity are missing. We recently introduced a technique to systematically identify when to stop subdividing clusters, based on the fundamental principle that cells must differ more between than within clusters. Here we present the corresponding protocol to classify cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions are applicable to any cellular dataset that can be organized as two-dimensional matrices of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol using cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
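The stopping principle (cells should differ more between than within clusters) can be illustrated with a permutation test on a candidate two-way split. The test compares mean between-cluster distance with mean within-cluster distance and accepts the split only if shuffled labels rarely produce a gap that large. This is a hypothetical stand-in, not the published protocol's statistical test. Euclidean distance, `n_perm`, and `alpha` are assumptions, and each side of the split should contain at least two cells.

```python
import math
import random
from itertools import combinations
from typing import Sequence


def _gap(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean between-cluster minus mean within-cluster Euclidean distance."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def accept_split(points, labels, n_perm: int = 999, alpha: float = 0.05,
                 seed: int = 0) -> bool:
    """Permutation test: keep the split only if the observed gap is unusual."""
    rng = random.Random(seed)
    observed = _gap(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # label counts are preserved by shuffling
        if _gap(points, shuffled) >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return p_value < alpha
```

In a full hierarchy, this check would run at every candidate subdivision, and recursion would stop on any branch where the split is rejected.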
We explore techniques for using N-gram information to categorize Chinese text documents hierarchically, so that the classifier is freed from the burden of large dictionaries and complex word-segmentation processing and is thereby domain- and time-independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchical classification of Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
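Character N-grams sidestep word segmentation because features are read directly off the raw character sequence. The sketch below uses character bigrams, raw counts, and a nearest-profile cosine classifier that could be applied at each level of a category tree. The choice of N, the weighting, and the toy category profiles are assumptions, not the paper's configuration.

```python
import math
from collections import Counter
from typing import Dict


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams; no dictionary or segmenter needed."""
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(v * b[k] for k, v in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text: str, profiles: Dict[str, Counter], n: int = 2) -> str:
    """Assign the category whose n-gram profile is most similar to the text."""
    vec = char_ngrams(text, n)
    return max(profiles, key=lambda cat: cosine(vec, profiles[cat]))
```

For example, `char_ngrams("文本分类")` yields the bigrams 文本, 本分, and 分类 without consulting any word list.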
Big data is becoming increasingly important because of the enormous amount of information generated and stored in recent years, and it poses a challenge to data mining techniques and data management. Given the geometric growth of information in the era of big data, this paper studies possible approaches to balancing the maximization of information value against the protection of information privacy, and proposes the Nine-Cells information matrix for hierarchical classification. Furthermore, the paper applies rough set theory along the two dimensions of value and privacy to establish an information classification method, and puts forward countermeasures for information security. Taking spam messages as an example, massive volumes of spam messages can be classified, and a targeted hierarchical management strategy is then put forward. This paper proposes a personal information index system, an information management platform, and possible solutions to protect information security and utilize information value in the age of big data.
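Rough set theory classifies objects by how the equivalence classes of their attribute values relate to a target set. The lower approximation collects the classes lying entirely inside the target, so those objects certainly belong to it. The upper approximation collects the classes that overlap the target, so those objects possibly belong to it. In the sketch below, each message is described by hypothetical discretized `value` and `privacy` levels, and the target is a hand-labeled set of spam message IDs. Both encodings are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def approximations(
    attrs: Dict[str, Tuple[Hashable, ...]], target: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (lower, upper) approximations of `target` under the
    indiscernibility relation induced by identical attribute tuples."""
    classes: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for obj, key in attrs.items():
        classes[key].add(obj)
    lower: Set[str] = set()
    upper: Set[str] = set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the target
            lower |= members
        if members & target:    # class overlaps the target
            upper |= members
    return lower, upper
```

Objects in the upper but not the lower approximation form the boundary region. These are the messages whose (value, privacy) description alone cannot settle their category.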
Developing an efficient algorithm for large-scale classification problems is a challenging topic in many machine learning applications. In this paper, a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) is proposed to deal with this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree, drawing on the ideas of unsupervised and supervised clustering. It then trains an SVM locally on each labeled subtree at a fixed layer of the tree. Experimental results show that, compared with existing popular algorithms such as the core vector machine and the decision-tree support vector machine, HCFLL significantly improves training and testing speed with comparable testing accuracy.
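Clustering-feature (CF) trees, as introduced in BIRCH, summarize each subcluster by CF = (N, LS, SS): the point count, the linear sum, and the sum of squared norms. CFs are additive, so merging subclusters is cheap, and the centroid and radius follow directly from the three values. The sketch below shows only a standard CF entry. HCFLL's modified tree and its use of class labels are not reproduced.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    n: int = 0
    ls: List[float] = field(default_factory=list)  # linear sum of points
    ss: float = 0.0                                 # sum of squared norms

    def add(self, x: Sequence[float]) -> None:
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        """CF additivity: (N1 + N2, LS1 + LS2, SS1 + SS2)."""
        if not self.ls or not other.ls:
            ls = list(self.ls or other.ls)
        else:
            ls = [a + b for a, b in zip(self.ls, other.ls)]
        return ClusteringFeature(self.n + other.n, ls, self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of members to the centroid."""
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

Because only (N, LS, SS) is stored, a tree node can decide whether to absorb a new point by checking the merged radius, without revisiting the raw data.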
This article presents a new fast image classification method based on a structural approach. The method relies on a system of hierarchical features derived from the bitwise data distribution of the set of descriptors that describe an image. The article also proposes a spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments show that computing the relevance of two descriptions from their distributions takes about 1000 times less time than the traditional voting procedure, in which the sets of descriptors are compared directly. Introducing the system of hierarchical features reduces computation time by a further factor of 2–3 while maintaining high classification efficiency. The method's robustness to additive noise has been studied experimentally. According to the results, the limiting level of the feature hierarchy that still allows reliable classification at noise standard deviations below 30 is the 8-bit distribution. Computing costs increase proportionally as the bit depth of the distribution decreases. The method can be used in applications where object identification time is critical.
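One plausible reading of comparing descriptor sets "by their distributions" is to bin each descriptor by its most significant bits and compare the resulting histograms. This replaces voting over all descriptor pairs, cutting the cost from O(|A|·|B|) to roughly O(|A| + |B| + 2^bits), and keeping fewer bits gives a coarser level of the feature hierarchy. The sketch assumes each descriptor is summarized as a hypothetical 16-bit integer code and uses an L1-based similarity. The descriptor type, bit depths, and similarity measure are all assumptions, not the article's exact procedure.

```python
from typing import List, Sequence


def bit_histogram(codes: Sequence[int], bits: int, total_bits: int = 16) -> List[float]:
    """Normalized histogram over the top `bits` bits of each descriptor code."""
    hist = [0.0] * (1 << bits)
    shift = total_bits - bits
    for c in codes:
        hist[c >> shift] += 1.0
    n = len(codes)
    return [h / n for h in hist]


def relevance(a: Sequence[int], b: Sequence[int], bits: int, total_bits: int = 16) -> float:
    """1 minus half the L1 distance between histograms (1.0 means identical)."""
    ha = bit_histogram(a, bits, total_bits)
    hb = bit_histogram(b, bits, total_bits)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(ha, hb))
```

Descriptors that differ only in their low-order bits fall into the same bins, so two images with nearly identical descriptors score 1.0 at an 8-bit level.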
Text makes up most of the resources on the Internet, which places ever higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model of bidirectional encoder representations from transformers, hierarchical attention networks, and dilated convolution networks (BERT_HAN_DCN), built on the BERT pre-trained model because of its superior feature extraction ability. The model combines the advantages of HAN and DCN to obtain rich semantic information, fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax makes samples of the same class harder to learn and similar features harder to distinguish. AM-softmax is therefore introduced to replace the traditional softmax. Finally, the fused model is validated. On two datasets, the hybrid model outperforms general single models built on the BERT pre-trained model, such as HAN and DCN, in accuracy and F1-score, and the network with AM-softmax outperforms the one with the general softmax.
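AM-softmax (additive margin softmax) operates on cosine similarities between L2-normalized features and class weights. It subtracts a margin m from the target-class cosine and multiplies all cosines by a scale s before applying cross-entropy, which pushes same-class samples into tighter clusters. Below is a minimal single-sample loss. The values s = 30 and m = 0.35 are illustrative and not taken from the manuscript.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(feature: Sequence[float], class_weights: Sequence[Sequence[float]],
                    target: int, s: float = 30.0, m: float = 0.35) -> float:
    """Additive-margin softmax loss for a single sample."""
    f = _normalize(feature)
    cosines = [sum(a * b for a, b in zip(f, _normalize(w))) for w in class_weights]
    # margin applies only to the target class; scale applies to all classes
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    top = max(logits)  # numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]
```

Setting m = 0 recovers a scaled (normalized) softmax. A positive margin raises the loss for the same prediction, which forces the model to separate classes by more than m in cosine space.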
We propose two models in this paper. A concept association model is put forward to obtain the co-occurrence relationships among keywords in documents. A hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, addressing the extremely high dimensionality of the document feature space. Experimental results indicate that the association model captures co-occurrence relations among keywords in documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently, shrinking the vector space to only about 10% of its original dimensionality. Key words: text classification; concept association; hierarchical clustering; Hamming clustering. CLC number: TN 915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate, research direction: information filtering and text classification.
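Hamming clustering can shrink a feature space by grouping features whose binary occurrence patterns are nearly identical. In the sketch below, each term is represented by a hypothetical binary vector over categories, and terms are grouped in a single greedy pass using a Hamming-distance threshold. The paper's hierarchical procedure is not reproduced.

```python
from typing import Dict, List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_cluster(vectors: Dict[str, Sequence[int]], max_dist: int = 1) -> List[List[str]]:
    """Greedy grouping: a term joins the first cluster whose representative
    (its first member) is within `max_dist`; otherwise it starts a new one."""
    clusters: List[List[str]] = []
    reps: List[Sequence[int]] = []
    for term, vec in vectors.items():
        for idx, rep in enumerate(reps):
            if hamming(vec, rep) <= max_dist:
                clusters[idx].append(term)
                break
        else:
            clusters.append([term])
            reps.append(vec)
    return clusters
```

Each resulting cluster can then be treated as a single dimension, which is how grouping reduces the size of the category feature vector.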
Based on tropical cyclone (TC) observations in the western North Pacific from 2000 to 2008, this paper adopts the particle swarm optimization (PSO) algorithm from evolutionary computation to optimize a comprehensive classification rule, and applies the optimized rule to forecasting TC intensity change. During optimization, a hierarchical pruning strategy is incorporated into the PSO algorithm to narrow the search area and thus enhance local search ability, yielding a hierarchical PSO algorithm. The TC intensity classification rule involves core attributes, including 12-HMWS, MPI, and Rainrate, which play vital roles in TC intensity change. The testing accuracy of the new rule mined by the hierarchical PSO algorithm reaches 89.6%. The study shows that the novel classification method for TC intensity change analysis based on the hierarchical PSO algorithm not only makes the source of the rule's core attributes easy to explain, but also has great potential to improve forecasting of TC intensity change.
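Narrowing the search area during PSO can be sketched as a standard global-best PSO run in stages, with the bounds shrinking around the best position found so far after each stage. The objective, the inertia and acceleration coefficients (0.7, 1.5, 1.5), the shrink factor, and the stage counts below are illustrative assumptions. The paper's hierarchical pruning rule and its encoding of classification rules are not reproduced.

```python
import random
from typing import Callable, List, Sequence, Tuple


def hierarchical_pso(
    f: Callable[[Sequence[float]], float],
    lo: List[float], hi: List[float],
    stages: int = 3, iters: int = 60, swarm: int = 20,
    shrink: float = 0.5, seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    dim = len(lo)
    best_x: List[float] = []
    best_f = float("inf")
    for _ in range(stages):
        pos = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(swarm)]
        if best_x:
            pos[0] = list(best_x)  # carry the incumbent into the new stage
        vel = [[0.0] * dim for _ in range(swarm)]
        pbest = [list(p) for p in pos]
        pbest_f = [f(p) for p in pos]
        g = min(range(swarm), key=pbest_f.__getitem__)
        if pbest_f[g] < best_f:
            best_x, best_f = list(pbest[g]), pbest_f[g]
        for _ in range(iters):
            for i in range(swarm):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (0.7 * vel[i][d]
                                 + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                                 + 1.5 * r2 * (best_x[d] - pos[i][d]))
                    pos[i][d] = min(max(pos[i][d] + vel[i][d], lo[d]), hi[d])
                fx = f(pos[i])
                if fx < pbest_f[i]:
                    pbest[i], pbest_f[i] = list(pos[i]), fx
                    if fx < best_f:
                        best_x, best_f = list(pos[i]), fx
        # hierarchical pruning: shrink the search box around the best solution
        half = [shrink * (h - l) / 2 for l, h in zip(lo, hi)]
        lo, hi = ([max(l, b - w) for l, b, w in zip(lo, best_x, half)],
                  [min(h, b + w) for h, b, w in zip(hi, best_x, half)])
    return best_x, best_f
```

Each stage restarts the swarm inside a smaller box, trading global exploration for the finer local search the abstract describes.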
With the deterioration of the environment, it is imperative to protect coastal wetlands. Classifying coastal wetlands using multi-source remote sensing data and object-based hierarchical classification is an effective approach. Object-based hierarchical classification using remote sensing indices (OBH-RSI) is proposed to achieve fine classification of coastal wetlands. First, the original categories are divided into four groups according to their characteristics. Second, the training and test maps of each group are extracted according to the remote sensing indices. Third, the four groups are passed through the classifier in order. Finally, the results of the four groups are combined to obtain the final classification map. Experimental results on the Yellow River Delta dataset show that the overall accuracy, average accuracy, and kappa coefficient of the proposed strategy all exceed 94%.
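Remote sensing indices such as NDVI, (NIR − Red)/(NIR + Red), and NDWI in McFeeters' form, (Green − NIR)/(Green + NIR), are per-object quantities that can route objects into coarse groups before a finer classifier runs. The sketch below assigns an object to one of four groups based on its mean band reflectances. The thresholds and group names are hypothetical, not OBH-RSI's actual grouping.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def ndwi(green: float, nir: float) -> float:
    """Normalized difference water index (McFeeters form)."""
    return (green - nir) / (green + nir)


def coarse_group(green: float, red: float, nir: float) -> str:
    """Route an object to a first-level group (thresholds are hypothetical)."""
    if ndwi(green, nir) > 0.3:
        return "water"
    v = ndvi(nir, red)
    if v > 0.5:
        return "dense_vegetation"
    if v > 0.2:
        return "sparse_vegetation"
    return "bare_or_built"
```

A group-specific classifier would then separate the finer wetland classes within each group, and the per-group maps would be merged into the final result.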
Artificial intelligence, which has recently emerged with the rapid development of information technology, is drawing attention as a tool for solving various problems demanded by society and industry. In particular, convolutional neural networks (CNNs), a type of deep learning technology, are prominent in computer vision tasks such as image classification, recognition, and object tracking. Training these CNN models requires a large amount of data, and a lack of data can degrade performance due to overfitting. As research on CNN architecture development and optimization has become active, ensemble techniques have emerged that perform image classification by combining features extracted from multiple CNN models. In this study, data augmentation and contour image extraction were performed to overcome the data shortage problem. In addition, we propose a hierarchical ensemble technique that achieves high image classification accuracy even when trained on a small amount of data. First, we trained pretrained VGGNet, GoogLeNet, ResNet, DenseNet, and EfficientNet models on the UCMerced land use dataset and the contour image of each image. We then applied the hierarchical ensemble technique to each possible case in which the models can be deployed. These experiments were performed with training-set proportions of 30%, 50%, and 70%, yielding a performance improvement of up to 4.68% over the average accuracy of all models.
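One straightforward way to explore every case in which the models can be deployed is to enumerate each non-empty subset of models and score it by soft voting, which averages class probabilities. The sketch below uses hypothetical model names and probability outputs on a validation set. The paper's actual hierarchical combination rule may differ.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def soft_vote(probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-sample class probabilities across models, then argmax."""
    n_samples, n_classes = len(probs[0]), len(probs[0][0])
    preds = []
    for i in range(n_samples):
        avg = [sum(p[i][c] for p in probs) / len(probs) for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def best_combination(outputs: Dict[str, Sequence[Sequence[float]]],
                     labels: Sequence[int]) -> Tuple[Tuple[str, ...], float]:
    """Score every non-empty subset of models; return the most accurate one."""
    best: Tuple[str, ...] = ()
    best_acc = -1.0
    names = sorted(outputs)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            preds = soft_vote([outputs[n] for n in combo])
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best_acc:
                best, best_acc = combo, acc
    return best, best_acc
```

With five backbones there are only 31 subsets, so exhaustive enumeration is cheap compared with training any single model.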
This paper presents a symbiosis-based hybrid modified DNA-ABC optimization algorithm, which combines modified DNA concepts with the artificial bee colony (ABC) algorithm to aid hierarchical fuzzy classification. In the literature, the ABC algorithm is traditionally applied to constrained and unconstrained optimization problems. In this research, it is combined with modified DNA concepts and applied to fuzzy classification. Moreover, to the best of our knowledge, previous research has not combined the ABC algorithm with DNA computing for hierarchical fuzzy classification to explore the merits of cooperative coevolution. This paper is therefore the first to apply the mechanism of symbiosis to create a hybrid modified DNA-ABC algorithm for hierarchical fuzzy classification. In this study, the partition number and the shape of the membership functions are determined by the symbiosis-based hybrid modified DNA-ABC optimization algorithm, which provides both sufficient global exploration and adequate local exploitation for hierarchical fuzzy classification. The proposed algorithm is applied to five benchmark University of California, Irvine (UCI) datasets, and the results demonstrate its efficiency.
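The quantities the DNA-ABC search tunes, namely the partition number and the membership-function shape, can be pictured as a uniform fuzzy partition of one input attribute into K triangular sets. The sketch below fixes the shape to symmetric triangles on [lo, hi]. In the paper both the number and the shape are optimized, so these choices are assumptions.

```python
from typing import List


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def uniform_partition(x: float, k: int, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Membership degrees of x in k evenly spaced triangular fuzzy sets (k >= 2)."""
    step = (hi - lo) / (k - 1)
    peaks = [lo + i * step for i in range(k)]
    return [triangular(x, p - step, p, p + step) for p in peaks]
```

Within [lo, hi], the membership degrees of neighboring sets sum to 1. An optimizer would search over k and over per-set peak and foot positions instead of the fixed, even spacing used here.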
Purpose: Many science, technology and innovation (STI) resources carry several different labels. Many approaches that perform well on benchmark datasets have been proposed for the multi-label classification task of assigning such labels automatically to an instance of interest, and several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets are not completely in line with those of benchmark datasets. The main purpose of this paper is therefore to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in statement, data quality, and purpose. Additionally, open-source tools designed for multi-label classification differ intrinsically in their approaches to data processing and feature selection, which in turn affects the performance of a multi-label classification approach. In the near future, we will enhance experimental precision and reinforce the validity of our conclusions by controlling variables more rigorously through expanded parameter settings. Practical implications: The observed Macro-F1 and Micro-F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing improvements in deep learning algorithms and large-scale models, the efficacy of multi-label classification is expected to improve significantly and reach a level of practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
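Since the comparison hinges on Micro-F1 and Macro-F1, it helps to recall how they differ. Micro-F1 pools true positives, false positives, and false negatives over all labels before computing F1, so frequent labels dominate. Macro-F1 averages per-label F1 scores, so rare labels count equally. The sketch below computes both over labels that appear in either the gold or the predicted sets, using toy data for illustration.

```python
from typing import List, Set, Tuple


def micro_macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Micro- and Macro-F1 over labels seen in gold or predicted label sets."""
    labels = set().union(*gold, *pred)
    if not labels:
        return 0.0, 0.0
    tp_sum = fp_sum = fn_sum = 0
    per_label = []
    for lab in labels:
        tp = sum(lab in g and lab in p for g, p in zip(gold, pred))
        fp = sum(lab not in g and lab in p for g, p in zip(gold, pred))
        fn = sum(lab in g and lab not in p for g, p in zip(gold, pred))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        per_label.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    micro = 2 * tp_sum / (2 * tp_sum + fp_sum + fn_sum)
    macro = sum(per_label) / len(per_label)
    return micro, macro
```

Suppose a frequent label is always predicted correctly but a rare label is always missed. Micro-F1 then stays high (6/7 in the toy case gold = [{A}, {A}, {A, B}], pred = [{A}, {A}, {A}]), while Macro-F1 drops to 0.5. This is why the two scores can diverge on unbalanced real-world label distributions.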
A basic approach to multi-class classification is disassembly, which decomposes a multi-class classification task into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a multi-class classification method that combines K-means with multi-task relationship learning (MTRL). The method first uses the One-vs-Rest splitting strategy to decompose the multi-class task into binary classification tasks. K-means is then used to downsample the dataset of each task, which helps prevent overfitting while reducing training cost. Finally, the sampled datasets are fed to MTRL, and multiple binary classifiers are trained jointly. With MTRL, the method exploits inter-task relationships during training, improving the classification accuracy of each binary classifier. Experimental results on the Iris, Wine, Multiple Features, Wireless Indoor Localization, and Avila datasets demonstrate the effectiveness of the proposed approach.
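The One-vs-Rest split followed by K-means downsampling can be sketched as two steps. First, the data are relabeled once per class, with that class positive and all others negative. Then each side of every binary task is replaced with K-means centroids. The sketch assumes plain Lloyd's K-means, a chosen number of centroids `k`, and replacement of points by centroids (rather than, for example, keeping the nearest real sample). The MTRL training step is not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's algorithm; returns k centroids (requires k <= len(points))."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def one_vs_rest_downsampled(
    data: Dict[str, List[Point]], k: int
) -> Dict[str, Tuple[List[Point], List[Point]]]:
    """For each class: (downsampled positives, downsampled negatives)."""
    tasks = {}
    for cls, pos in data.items():
        neg = [p for other, pts in data.items() if other != cls for p in pts]
        tasks[cls] = (kmeans(pos, min(k, len(pos))), kmeans(neg, min(k, len(neg))))
    return tasks
```

Each resulting pair would become one binary task. A multi-task learner such as MTRL would then train all of them jointly, sharing information about how the tasks relate.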
Accurate identification and classification of power quality disturbances are key to ensuring high-quality electrical energy. In this study, feature vectors are constructed from the statistical characteristics of the disturbance signal's wavelet transform coefficients and from its wavelet transform energy distribution. These vectors are then used to train and test multi-class SVM algorithms. Experimental results demonstrate that multi-class SVMs using the Gaussian radial basis function, the exponential radial basis function, and the hyperbolic tangent function as kernel functions are suitable for power quality disturbance classification.
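Wavelet-energy features can be illustrated with a multilevel Haar decomposition. At each level, the signal splits into approximation coefficients (pairwise sums divided by √2) and detail coefficients (pairwise differences divided by √2). The share of total energy in each band then forms a feature vector for the SVM. The Haar wavelet, the number of levels, and the relative-energy normalization are assumptions. The study's wavelet family and its statistical features are not specified here.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar level: (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_energy_features(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Relative energy of each detail band, plus the final approximation band."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    energies = [sum(c * c for c in b) for b in bands]
    total = sum(energies)
    if total == 0:
        return [0.0] * len(energies)
    return [e / total for e in energies]
```

A smooth (constant) signal puts all of its energy in the final approximation band. A rapidly alternating one puts it in the first detail band, so disturbances such as transients or harmonics shift the energy profile in distinguishable ways.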
SMS spam poses a significant challenge to maintaining user privacy and security. Recently, spammers have employed fraudulent writing styles to bypass spam detection systems. To address the challenge of sophisticated SMS spam, this paper introduces a novel two-level detection system that uses deep learning techniques for effective spam identification. The system comprises five steps, beginning with the preprocessing of SMS data. RoBERTa word embeddings are then applied to convert text into a numerical format for deep learning analysis. Feature extraction is performed using a Convolutional Neural Network (CNN) for word-level analysis and a Bidirectional Long Short-Term Memory (BiLSTM) network for sentence-level analysis. This two-level feature extraction captures both individual words and sentence structure. The novel part of the proposed approach is the Hierarchical Attention Network (HAN), which fuses and selects features at the two levels through an attention mechanism. By attending to both words and sentences, the HAN focuses on the most pertinent aspects of each message and captures meaningful features that reflect both word-level and sentence-level semantics. In the classification step, the model classifies messages as spam or ham. This hybrid deep learning method improves feature representation and enhances the model's spam detection capability. By significantly reducing the incidence of SMS spam, our model contributes to a safer mobile communication environment, protecting users against potential phishing attacks and scams and aiding compliance with privacy and security regulations. The model's performance was evaluated using the SMS Spam Collection Dataset from the UCI Machine Learning Repository. Cross-validation is employed to account for the dataset's imbalanced nature, ensuring a reliable evaluation. The proposed model achieved an accuracy of 99.48%, underscoring its efficiency in identifying SMS spam.
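The attention step of a Hierarchical Attention Network pools a sequence of hidden vectors h_i into one vector in three steps. It computes u_i = tanh(W h_i + b), takes weights α_i = softmax(u_iᵀ u_w) with a learned context vector u_w, and outputs Σ α_i h_i. The same pooling is applied first at word level and then at sentence level. Below is a dependency-free sketch with hypothetical toy weights. A real model learns W, b, and u_w, and typically re-encodes sentence vectors (for example with a bidirectional RNN) before the second pooling, a step omitted here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def attention_pool(h: Sequence[Vector], W: Sequence[Vector], b: Vector,
                   u_w: Vector) -> Tuple[Vector, List[float]]:
    """Return (pooled vector, attention weights) for hidden states h."""
    scores = []
    for hi in h:
        u = [math.tanh(sum(w * x for w, x in zip(row, hi)) + bj) for row, bj in zip(W, b)]
        scores.append(sum(a * c for a, c in zip(u, u_w)))
    top = max(scores)  # stable softmax
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    pooled = [sum(a * hi[d] for a, hi in zip(alphas, h)) for d in range(len(h[0]))]
    return pooled, alphas


def han_document(sentences, word_params, sent_params) -> Vector:
    """Word-level pooling per sentence, then sentence-level pooling."""
    sent_vecs = [attention_pool(words, *word_params)[0] for words in sentences]
    return attention_pool(sent_vecs, *sent_params)[0]
```

The attention weights make the fusion inspectable: the words and sentences that drive a spam decision receive the largest α values.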
Skin lesion classification plays a crucial role in the early detection and diagnosis of various skin conditions. Recent advances in computer-aided diagnostic techniques have been instrumental in enabling timely intervention, thereby improving patient outcomes, particularly in rural communities lacking specialized expertise. Despite the widespread adoption of convolutional neural networks (CNNs) in skin disease detection, their effectiveness has been hindered by the limited size and class imbalance of publicly accessible skin lesion datasets. In this context, a two-step hierarchical binary classification approach using hybrid machine learning and deep learning (DL) techniques is proposed. Experiments conducted on the International Skin Imaging Collaboration (ISIC 2017) dataset demonstrate the effectiveness of the hierarchical approach in handling large class imbalances. Specifically, using DenseNet121 (DNET) as a feature extractor and random forest (RF) as a classifier yielded the most promising results, achieving a balanced multiclass accuracy (BMA) of 91.07%, compared with 88.66% for the pure deep learning model (end-to-end DNET). The RF ensemble was significantly more efficient than other machine learning classifiers at helping DL address the challenge of learning with limited data. Furthermore, the hybrid hierarchical predictive model demonstrated improved performance while significantly reducing computational time, indicating its potential for efficient real-world skin lesion classification.
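Balanced multiclass accuracy (BMA) is the mean of per-class recalls, so a majority class cannot inflate it. A two-step hierarchical binary scheme routes each sample through a first binary decision and, for one branch only, a second binary decision. The sketch below assumes the three ISIC 2017 classes (melanoma, seborrheic keratosis, nevus) and a hypothetical split: step 1 separates nevus from other lesions, and step 2 separates melanoma from seborrheic keratosis. The two classifiers are placeholder callables, not the paper's DenseNet121 + random forest pipeline.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Sequence


def two_step_predict(x: Any, is_nevus: Callable[[Any], bool],
                     is_melanoma: Callable[[Any], bool]) -> str:
    """Step 1 separates nevus from other lesions; step 2 splits the rest."""
    if is_nevus(x):
        return "nevus"
    return "melanoma" if is_melanoma(x) else "seborrheic_keratosis"


def balanced_multiclass_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Predicting "nevus" for every image in a set of 8 nevi and 2 melanomas gives 80% plain accuracy but only 50% BMA. This is why BMA is the fairer metric under heavy class imbalance.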
With the development of deep learning and convolutional neural networks (CNNs), the accuracy of automatic food recognition based on visual data has improved significantly. Some studies have shown that deeper models achieve higher accuracy. However, very deep neural networks are prone to overfitting and consume huge computing resources. In this paper, a new deep-learning-based classification scheme is proposed for automatic food-ingredient recognition. We construct an up-to-date combinational convolutional neural network (CBNet) with a subnet merging technique. First, two different neural networks are used to learn features of interest. Then, a well-designed feature fusion component aggregates the features from the subnetworks, extracting richer and more precise features for image classification. To learn more complementary features, corresponding fusion strategies are also proposed, including auxiliary classifiers and hyperparameter settings. Finally, CBNet variants based on the well-known VGGNet, ResNet, and DenseNet are evaluated on a dataset of 41 major food-ingredient categories with 100 images per category. Theoretical analysis and experimental results demonstrate that CBNet achieves promising accuracy for multi-class classification and improves the performance of convolutional neural networks.
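Subnet merging with auxiliary classifiers is commonly trained with a weighted sum of losses. The main head sees the fused features of both subnetworks, while each subnetwork's auxiliary head contributes its own cross-entropy term. The sketch below assumes concatenation as the fusion operator and an auxiliary weight of `aux_weight = 0.3`. Neither is CBNet's published setting.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample (stable log-sum-exp)."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


def fuse(feat_a: Sequence[float], feat_b: Sequence[float]) -> List[float]:
    """Concatenate the two subnetworks' feature vectors."""
    return list(feat_a) + list(feat_b)


def combined_loss(main_logits: Sequence[float], aux_logits_a: Sequence[float],
                  aux_logits_b: Sequence[float], target: int,
                  aux_weight: float = 0.3) -> float:
    """Main loss on fused features plus weighted auxiliary losses."""
    return (cross_entropy(main_logits, target)
            + aux_weight * (cross_entropy(aux_logits_a, target)
                            + cross_entropy(aux_logits_b, target)))
```

The auxiliary terms push each subnetwork to stay discriminative on its own, which encourages the two branches to contribute complementary rather than redundant features to the fused representation.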
A vast amount of information has been produced in recent years, posing a huge challenge to information management. Making better use of big data has important theoretical and practical significance for effectively handling and managing messages. In this paper, we propose a nine-rectangle-grid information model based on information value and privacy, and then present information use policies based on rough set theory. Recurrent neural networks are employed to classify OTT messages. Content of interest to users is incorporated into the classification process during the annotation of OTT messages, yielding a reliable trained classification model. Experimental results show that the proposed method achieves accurate classification and can therefore be used for effective distribution and control of OTT messages.
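A recurrent classifier reads a message token by token. It updates a hidden state h_t = tanh(W_x x_t + W_h h_{t-1} + b) at each step and scores the classes from the final state. The dependency-free forward pass below uses tiny hypothetical weights and one-hot token inputs. The trained model, its vocabulary, and the paper's specific RNN variant are not shown.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_classify(tokens: Sequence[Sequence[float]], W_x: Matrix, W_h: Matrix,
                 b: Sequence[float], W_out: Matrix) -> List[float]:
    """Elman RNN over input vectors; returns softmax class probabilities."""
    h = [0.0] * len(b)
    for x in tokens:
        h = [math.tanh(p + q + c) for p, q, c in zip(matvec(W_x, x), matvec(W_h, h), b)]
    logits = matvec(W_out, h)
    top = max(logits)  # stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Because the hidden state carries context from earlier tokens, the same word can push the prediction differently depending on what preceded it, which a bag-of-words classifier cannot capture.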
Funding for the hierarchical text classification (DepthMatch) paper: sponsored by the National Key Research and Development Program of China (No. 2021YFF0704100), the National Natural Science Foundation of China (No. 62136002), the Chongqing Natural Science Foundation (No. cstc2022ycjh-bgzxm0004), and the Science and Technology Commission of Chongqing Municipality (CSTB2023NSCQ-LZX0006).
Funding: Supported in part by NIH grants R01NS39600, U01MH114829, and RF1MH128693 (to GAA).
Abstract: Many fields, such as neuroscience, are experiencing a vast proliferation of cellular data, which underscores the need to organize and interpret large datasets. A popular approach partitions data into manageable subsets via hierarchical clustering, but objective methods for determining the appropriate classification granularity are missing. We recently introduced a technique that systematically identifies when to stop subdividing clusters. It is based on the fundamental principle that cells must differ more between clusters than within them. Here we present the corresponding protocol, which classifies cellular datasets by combining data-driven unsupervised hierarchical clustering with statistical testing. These general-purpose functions apply to any cellular dataset that can be organized as a two-dimensional matrix of numerical values, including molecular, physiological, and anatomical datasets. We demonstrate the protocol with cellular data from the Janelia MouseLight project to characterize morphological aspects of neurons.
Funding: Supported by the China Postdoctoral Science Foundation.
Abstract: We explore techniques that use N-gram information to categorize Chinese text documents hierarchically. With this approach, the classifier is freed from the burden of large dictionaries and complex word segmentation, and is consequently domain- and time-independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
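The sketch below shows why character N-grams remove the need for a dictionary or a word segmenter: overlapping character windows are extracted directly from the raw string. The top-down, nearest-profile classifier and its category tree are illustrative assumptions. The abstract does not specify the classifier used at each level.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams; no dictionary or word segmentation needed."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_profile(text: str, orders=(1, 2)) -> Counter:
    """Bag of character n-grams of several orders (unigrams and bigrams by default)."""
    profile = Counter()
    for n in orders:
        profile.update(char_ngrams(text, n))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify_top_down(text: str, tree: dict) -> list[str]:
    """tree maps category -> (profile Counter, subtree dict); pick the closest category per level."""
    doc, path = ngram_profile(text), []
    while tree:
        best = max(tree, key=lambda c: cosine(doc, tree[c][0]))
        path.append(best)
        tree = tree[best][1]
    return path


print(char_ngrams("中文文本分类", 2))  # ['中文', '文文', '文本', '本分', '分类']
```

In practice, each category profile would be built from the training documents of that category rather than from a single string.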
Abstract: Big data is becoming increasingly important because of the enormous amount of information generated and stored in recent years, and it poses a challenge to data mining techniques and data management. Given the geometric explosion of information in the big data era, this paper studies possible ways to balance maximizing the value of information against protecting its privacy. It proposes a Nine-Cells information matrix for hierarchical classification. Furthermore, the paper uses rough set theory to build an information classification method along two dimensions, value and privacy, and puts forward countermeasures for information security. Spam messages serve as an example: massive numbers of spam messages can be classified, and a targeted hierarchical management strategy is then put forward. This paper proposes a personal information index system, an information management platform, and possible solutions that protect information security and exploit information value in the age of big data.
Funding: National Natural Science Foundation of China (No. 61070033); Fundamental Research Funds for the Central Universities, China (No. 2012ZM0061).
Abstract: Developing an efficient algorithm for large-scale classification problems is a challenging topic in many machine learning applications. In this paper, we propose a support vector machine (SVM) algorithm based on hierarchical clustering and fixed-layer local learning (HCFLL) to address this problem. First, HCFLL hierarchically clusters a given dataset into a modified clustering feature tree, drawing on the ideas of unsupervised and supervised clustering. Then, it locally trains an SVM on each labeled subtree at a fixed layer of the tree. Experimental results show that HCFLL significantly improves training and testing speeds with comparable testing accuracy, compared with popular existing algorithms such as the core vector machine and the decision tree support vector machine.
Abstract: We present the results of developing a new fast method for classifying images using a structural approach. The method relies on a system of hierarchical features derived from the bitwise data distribution of the set of descriptors that describe an image. The article also proposes a spatial data processing apparatus, which simplifies and accelerates the classification process. Experiments showed that computing the relevance of two descriptions from their distributions takes about 1000 times less time than the traditional voting procedure, in which the sets of descriptors are compared directly. Introducing the system of hierarchical features reduces the calculation time by a further factor of 2–3 while keeping classification efficiency high. The noise immunity of the method to additive noise was studied experimentally. According to the results, the limiting level of the feature hierarchy for reliable classification at a noise standard deviation below 30 is the 8-bit distribution. Computing costs increase proportionally as the bit distribution decreases. The method can be used in applications where object identification time is critical.
Funding: Fundamental Research Funds for the Central Universities, China (No. 2232018D3-17).
Abstract: Text makes up most of the resources on the Internet, which places ever-higher demands on the accuracy of text classification. In this manuscript, we first design a hybrid model of bidirectional encoder representations from transformers, hierarchical attention networks, and dilated convolution networks (BERT_HAN_DCN). It is built on the BERT pre-trained model for that model's superior feature-extraction ability. The design takes advantage of both the HAN and DCN models, which helps capture abundant semantic information by fusing contextual semantic features with hierarchical characteristics. Second, the traditional softmax algorithm makes samples of the same class harder to learn, so similar features become harder to distinguish. To address this, AM-softmax replaces the traditional softmax. Finally, we validate the fused model. On two datasets, the hybrid model achieves higher accuracy and F1-score than general single models built on the BERT pre-trained model, such as HAN and DCN. In addition, the model with the improved AM-softmax outperforms the model with the general softmax.
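AM-softmax (additive margin softmax) replaces the target-class logit s·cos θ_y with s·(cos θ_y − m). This forces same-class features to sit closer to their class direction. The sketch below follows that published definition. The defaults s = 30 and m = 0.35 are commonly cited values, not the settings used in this work.

```python
import math


def cosine_logits(x: list[float], class_weights: list[list[float]]) -> list[float]:
    """cos(theta_j) between a feature vector and each class weight vector."""
    nx = math.sqrt(sum(v * v for v in x))
    out = []
    for w in class_weights:
        nw = math.sqrt(sum(v * v for v in w))
        out.append(sum(a * b for a, b in zip(x, w)) / (nx * nw))
    return out


def am_softmax_loss(cosines: list[float], target: int, s: float = 30.0, m: float = 0.35) -> float:
    """Additive-margin softmax loss: subtract m from the target cosine, scale all by s."""
    logits = [s * (c - m) if j == target else s * c for j, c in enumerate(cosines)]
    top = max(logits)  # log-sum-exp shift for numerical stability
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]
```

With s = 1 and m = 0, the loss reduces to ordinary softmax cross-entropy on the cosines. Any positive margin raises the loss for the same features, which is what pushes the network toward tighter classes.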
Abstract: We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in documents. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the document feature space. Experimental results indicate that the concept association model obtains the co-occurrence relations among keywords in documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the vector space is only about 10% of its original dimensionality. Key words: text classification; concept association; hierarchical clustering; Hamming clustering. CLC number: TN 915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate, research direction: information filtering and text classification.
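The abstract does not define the hierarchical Hamming clustering model. The sketch below shows one plausible reading, and it is an assumption. Each keyword is a binary occurrence vector over training documents. Keywords whose vectors lie within a Hamming-distance threshold are merged by single-linkage agglomeration, and each resulting cluster becomes one reduced feature dimension.

```python
def hamming(a: list[int], b: list[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def hamming_cluster(vectors: list[list[int]], max_dist: int) -> list[list[int]]:
    """Single-linkage agglomerative clustering of binary vectors (returns index groups).

    Merging stops once the closest pair of clusters is farther apart than max_dist.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(vectors[a], vectors[b]) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_dist:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


def reduce_documents(docs: list[list[int]], clusters: list[list[int]]) -> list[list[int]]:
    """Collapse each keyword cluster into one binary dimension (1 if any member occurs)."""
    return [[int(any(doc[f] for f in cluster)) for cluster in clusters] for doc in docs]
```

Raising `max_dist` merges more keywords and shrinks the feature space further, at the cost of blurring distinctions between them.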
Funding: National Natural Science Foundation of China (41201045); Jiangsu Qing Lan Project (2016); Natural Science Foundation of Jiangsu Province (BK20151458).
Abstract: Using tropical cyclone (TC) observations in the western North Pacific from 2000 to 2008, this paper applies the particle swarm optimization (PSO) algorithm from evolutionary computation to optimize a comprehensive classification rule. It then applies the optimized rule to forecasting TC intensity change. During the optimization, a hierarchical pruning strategy is adopted in the PSO algorithm to narrow the search area and thus enhance its local search ability; the result is a hierarchical PSO algorithm. The TC intensity classification rule involves core attributes, including 12-HMWS, MPI, and Rainrate, which play vital roles in TC intensity change. The testing accuracy of the new rule mined by the hierarchical PSO algorithm reaches 89.6%. The study shows that the novel classification method for analyzing TC intensity change, based on the hierarchical PSO algorithm, makes the source of the rule's core attributes easy to explain. It also has great potential to improve the forecasting of TC intensity change.
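For reference, here is the standard global-best PSO update on a continuous objective. The hierarchical pruning of the search area and the encoding of classification rules as particles are not described in the abstract and are not reproduced. The inertia weight `w` and coefficients `c1`, `c2` are common textbook choices, not the study's settings.

```python
import random
from typing import Callable


def pso(objective: Callable[[list[float]], float], dim: int, bounds: tuple[float, float],
        n_particles: int = 30, iters: int = 200, w: float = 0.7,
        c1: float = 1.5, c2: float = 1.5, seed: int = 0) -> tuple[list[float], float]:
    """Minimize `objective` with global-best particle swarm optimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A pruning stage of the kind the abstract describes could re-run this loop with `bounds` narrowed around `gbest`. That extension is only suggested here.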
Funding: Supported by the Beijing Natural Science Foundation (No. JQ20021), the National Natural Science Foundation of China (Nos. 61922013, 61421001, and U1833203), and the Remote Sensing Monitoring Project of Geographical Elements in Shandong Yellow River Delta National Nature Reserve.
Abstract: With the deterioration of the environment, it is imperative to protect coastal wetlands. Classifying coastal wetlands with multi-source remote sensing data and object-based hierarchical classification is an effective method. We propose object-based hierarchical classification using remote sensing indices (OBH-RSI) to achieve fine classification of coastal wetlands. First, the original categories are divided into four groups according to their characteristics. Second, the training and test maps of each group are extracted according to the remote sensing indices. Third, the four groups are passed through the classifier in order. Finally, the results of the four groups are combined to obtain the final classification map. Experimental results on the Yellow River Delta dataset show that the overall accuracy, average accuracy, and kappa coefficient of the proposed strategy all exceed 94%.
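The abstract does not name the remote sensing indices or thresholds. The sketch below uses two widely used normalized-difference indices, NDVI = (NIR − Red)/(NIR + Red) and NDWI = (Green − NIR)/(Green + NIR). It shows how such indices can route objects into coarse groups before a per-group classifier runs. The three groups and both thresholds are illustrative assumptions; the paper uses four groups.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0 for an all-zero pixel."""
    s = a + b
    return 0.0 if s == 0 else (a - b) / s


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    return normalized_difference(green, nir)


def assign_group(pixel: dict, water_thr: float = 0.0, veg_thr: float = 0.3) -> str:
    """Route a pixel/object to a coarse group (illustrative thresholds, not the paper's)."""
    if ndwi(pixel["green"], pixel["nir"]) > water_thr:
        return "water"
    if ndvi(pixel["nir"], pixel["red"]) > veg_thr:
        return "vegetation"
    return "other"
```

Each group would then be classified separately, and the group results merged into one map, as in the four-step procedure above.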
Abstract: Artificial intelligence, which has emerged with the rapid development of information technology, is drawing attention as a tool for solving various problems in society and industry. In particular, convolutional neural networks (CNNs), a type of deep learning technology, are prominent in computer vision tasks such as image classification, recognition, and object tracking. Training these CNN models requires a large amount of data, and a lack of data can degrade performance through overfitting. As CNN architecture development and optimization studies have become active, ensemble techniques have emerged that classify images by combining features extracted from multiple CNN models. In this study, we performed data augmentation and contour image extraction to overcome the data shortage problem. In addition, we propose a hierarchical ensemble technique that achieves high image classification accuracy even when trained on a small amount of data. First, we trained pretrained VGGNet, GoogLeNet, ResNet, DenseNet, and EfficientNet models on the UCMerced land use dataset and on the contour image of each image. We then applied the hierarchical ensemble technique to every possible combination of deployed models. We ran these experiments with training-set proportions of 30%, 50%, and 70%. The ensemble improved performance by up to 4.68% over the average accuracy of all the models.
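The hierarchical ensemble itself is not detailed in the abstract. As a simplified stand-in, the sketch below evaluates plain soft voting (averaged class probabilities) over every non-empty subset of models and keeps the most accurate subset. With five CNNs, that means 31 combinations. In practice, the subset should be selected on validation data, not on the test split.

```python
from itertools import combinations


def soft_vote(prob_vectors: list[list[float]]) -> list[float]:
    """Average several class-probability vectors for one sample."""
    n = len(prob_vectors)
    return [sum(col) / n for col in zip(*prob_vectors)]


def best_combination(model_probs: dict[str, list[list[float]]],
                     labels: list[int]) -> tuple[tuple[str, ...], float]:
    """Try soft voting over every non-empty subset of models; return (subset, accuracy).

    model_probs maps a model name to one probability vector per sample.
    """
    names = sorted(model_probs)
    best = None
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            correct = 0
            for i, y in enumerate(labels):
                avg = soft_vote([model_probs[m][i] for m in subset])
                correct += max(range(len(avg)), key=avg.__getitem__) == y
            acc = correct / len(labels)
            if best is None or acc > best[1]:
                best = (subset, acc)
    return best
```

The toy case in the test shows the point of combining models: two models that are each right only half the time can be jointly right on every sample.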
Abstract: This paper presents a symbiosis-based hybrid modified DNA-ABC optimization algorithm. It combines modified DNA concepts with the artificial bee colony (ABC) algorithm to aid hierarchical fuzzy classification. According to the literature, the ABC algorithm is traditionally applied to constrained and unconstrained optimization problems; in this research, it is combined with modified DNA concepts and applied to fuzzy classification. Moreover, to the best of our knowledge, no previous research has combined the ABC algorithm with DNA computing for hierarchical fuzzy classification to explore the merits of cooperative coevolution. This paper is therefore the first to apply the mechanism of symbiosis to create a hybrid modified DNA-ABC algorithm for hierarchical fuzzy classification. In this study, the symbiosis-based hybrid modified DNA-ABC optimization algorithm extracts the partition number and the shape of the membership function. It provides both sufficient global exploration and adequate local exploitation for hierarchical fuzzy classification. We apply the proposed algorithm to five benchmark University of California, Irvine (UCI) datasets, and the results demonstrate its efficiency.
Funding: The Natural Science Foundation of China (Grant Numbers 72074014 and 72004012).
Abstract: Purpose: Many science, technology and innovation (STI) resources carry several different labels. The literature proposes many approaches with good benchmark performance for the multi-label classification task, which automatically assigns the appropriate labels to an instance of interest. Several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multi-label patent and publication datasets do not fully match those of benchmark datasets. The main purpose of this paper is therefore to comprehensively evaluate seven multi-label classification methods on real-world datasets. Research limitations: The three real-world datasets differ in statement, data quality, and purpose. Additionally, open-source tools designed for multi-label classification differ intrinsically in how they handle data processing and feature selection, which in turn affects the performance of a multi-label classification approach. In future work, we will improve experimental precision and strengthen our conclusions by controlling variables more rigorously through expanded parameter settings. Practical implications: The Macro-F1 and Micro-F1 scores observed on real-world datasets typically fall short of those achieved on benchmark datasets, which underscores the complexity of real-world multi-label classification tasks. Deep learning approaches offer promising solutions because they accommodate the hierarchical relationships and interdependencies among labels. With ongoing improvements in deep learning algorithms and large-scale models, multi-label classification is expected to improve significantly and reach practical utility in the foreseeable future. Originality/value: (1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with a more complex hierarchical label structure and a more balanced document-label distribution. (3) The MLkNN method works better on the larger-scale dataset with a more unbalanced document-label distribution.
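Micro-F1 and Macro-F1 can diverge sharply on unbalanced label distributions like the ones described above. Micro-F1 pools true positives, false positives, and false negatives across all labels. Macro-F1 averages the per-label F1 scores, so rare labels weigh as much as common ones. The sketch below computes both from label sets. F1 is taken as 0 when a label has no true positives, false positives, or false negatives, which is one common convention.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: list[set], y_pred: list[set], labels: list) -> tuple[float, float]:
    """Micro-F1 pools counts over all labels; Macro-F1 averages per-label F1 scores."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for truth, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in truth:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in truth:
                counts[label][2] += 1
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1_score(tp, fp, fn)
    macro = sum(f1_score(*c) for c in counts.values()) / len(labels)
    return micro, macro
```

A classifier that always predicts the frequent label "a" on three documents scores a Micro-F1 of 4/7 but a Macro-F1 of only about 0.27. The rare labels "b" and "c" each contribute an F1 of 0.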
Funding: Supported by the National Natural Science Foundation of China (61703131, 61703129, 61701148, 61703128).
Abstract: A basic approach to multi-class classification is decomposition: splitting a multi-class classification task into several binary classification tasks. To improve the accuracy of multi-class classification when samples are insufficient, this paper proposes a method that combines K-means with multi-task relationship learning (MTRL). The method first uses the One-vs.-Rest splitting strategy to decompose the multi-class task into binary classification tasks. K-means then downsamples the dataset of each task, which helps prevent overfitting while reducing training costs. Finally, the sampled datasets are fed to MTRL, and multiple binary classifiers are trained jointly. With MTRL, the method exploits inter-task relationships during training to improve the classification accuracy of each binary classifier. Experimental results on the Iris, Wine, Multiple Features, Wireless Indoor Localization, and Avila datasets demonstrate the effectiveness of the proposed approach.
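The sketch below covers the first two steps: One-vs.-Rest decomposition and K-means downsampling. The MTRL training step is not shown. Which side of each binary task gets downsampled is not stated in the abstract. Here we assume the larger "rest" side is replaced by at most `k` K-means centroids. `binary_task` and its parameters are hypothetical names.

```python
import random


def one_vs_rest(labels: list, classes: list) -> dict:
    """One binary target vector per class: 1 for that class, 0 for the rest."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def kmeans(points: list[list[float]], k: int, iters: int = 20, seed: int = 0) -> list[list[float]]:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def binary_task(X: list, y: list, positive, k: int, seed: int = 0) -> tuple[list, list]:
    """Build one OvR task, shrinking the 'rest' side to at most k K-means centroids."""
    pos = [x for x, t in zip(X, y) if t == positive]
    rest = [x for x, t in zip(X, y) if t != positive]
    if len(rest) > k:
        rest = kmeans(rest, k, seed=seed)
    return pos + rest, [1] * len(pos) + [0] * len(rest)
```

Replacing raw samples with centroids keeps each binary task small and less imbalanced. The trade-off is that fine-grained boundary information on the downsampled side is lost.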
Abstract: Accurate identification and classification of various power quality disturbances is key to ensuring high-quality electrical energy. In this study, feature vectors are built from two sources: the statistical characteristics of the disturbance signal's wavelet transform coefficients, and the wavelet transform energy distribution. These vectors are then used to train and test SVM multi-class algorithms. Experimental results show that SVM multi-class algorithms using the Gaussian radial basis function, the exponential radial basis function, and the hyperbolic tangent function as kernel functions are suitable for power quality disturbance classification.
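The sketch below shows only the energy-distribution part of the feature vector. It assumes a Haar wavelet and relative energy per decomposition level; the abstract names neither the mother wavelet nor the number of levels. The statistical features and the SVM are not shown. Because the Haar transform used here is orthonormal, the level energies sum to the signal's energy, so normalizing them gives a distribution.

```python
import math


def haar_dwt(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every decomposition level")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(signal: list[float], levels: int) -> list[float]:
    """Relative energy of each detail level (finest first) plus the final approximation."""
    energies, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    total = sum(energies)
    return [e / total for e in energies] if total else energies
```

A steady signal puts all its energy in the approximation, while a rapidly alternating one puts it in the finest detail level. Disturbances such as sags, swells, or transients shift energy between levels in characteristic ways, which is what makes these features useful to the SVM.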
Abstract: SMS spam poses a significant challenge to user privacy and security. Recently, spammers have adopted fraudulent writing styles to bypass spam detection systems. To address this challenge, this paper introduces a novel two-level detection system that uses deep learning techniques for effective spam identification. The system comprises five steps, beginning with preprocessing of the SMS data. RoBERTa word embeddings then convert the text into a numerical format for deep learning analysis. Feature extraction uses a Convolutional Neural Network (CNN) for word-level analysis and a Bidirectional Long Short-Term Memory (BiLSTM) network for sentence-level analysis. This two-level feature extraction captures both individual words and sentence structure. The novel part of the proposed approach is the Hierarchical Attention Network (HAN), which fuses and selects features at the two levels through an attention mechanism. The HAN processes both words and sentences and focuses on the aspects of each message most relevant to spam detection. In this way, it captures meaningful features from both word-level and sentence-level semantics. In the classification step, the model classifies messages as spam or ham. This hybrid deep learning method improves the feature representation and enhances the model's spam detection capability. By significantly reducing the incidence of SMS spam, our model contributes to a safer mobile communication environment. It protects users against potential phishing attacks and scams and aids compliance with privacy and security regulations. We evaluated the model's performance on the SMS Spam Collection Dataset from the UCI Machine Learning Repository. Cross-validation accounts for the dataset's imbalanced nature and ensures a reliable evaluation. The proposed model achieved an accuracy of 99.48%, which underscores its efficiency in identifying SMS spam.
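The attention step at the heart of a Hierarchical Attention Network can be written compactly. Each hidden vector h_i is scored as u_w · tanh(W h_i + b), the scores are softmax-normalized, and the output is the weighted sum of the h_i. The sketch below uses plain lists. W, b, and the context vector u_w are learned parameters in a real model; here they are inputs. How the paper fuses the CNN and BiLSTM features around this step is not shown.

```python
import math


def attention_pool(vectors: list[list[float]], W: list[list[float]], b: list[float],
                   context: list[float]) -> tuple[list[float], list[float]]:
    """HAN-style attention: score_i = context . tanh(W h_i + b); output = sum_i softmax_i * h_i."""
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi) for row, bi in zip(W, b)]
        scores.append(sum(c * ui for c, ui in zip(context, u)))
    top = max(scores)  # shift for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(vectors[0])
    pooled = [sum(wt * h[d] for wt, h in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights
```

Applied twice, this gives the two levels: word vectors are pooled into sentence vectors, and sentence vectors into a message vector. The weights show which words or sentences drove the spam decision.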
Funding: Supported by the EU Commission under Project ECS 0000024 “Rome Technopole”, No. CUP H33C22000420001.
Abstract: Skin lesion classification plays a crucial role in the early detection and diagnosis of various skin conditions. Recent advances in computer-aided diagnostic techniques have enabled timely intervention and improved patient outcomes, particularly in rural communities that lack specialized expertise. Convolutional neural networks (CNNs) are widely adopted in skin disease detection. However, their effectiveness has been limited by the small size and class imbalance of publicly accessible skin lesion datasets. In this context, we propose a two-step hierarchical binary classification approach using hybrid machine learning and deep learning (DL) techniques. Experiments on the International Skin Imaging Collaboration (ISIC 2017) dataset show that the hierarchical approach handles large class imbalances effectively. Specifically, using DenseNet121 (DNET) as a feature extractor and random forest (RF) as a classifier gave the most promising results. This combination achieved a balanced multiclass accuracy (BMA) of 91.07%, compared with 88.66% for the pure deep learning model (end-to-end DNET). The RF ensemble was significantly more efficient than other machine learning classifiers at helping DL learn from limited data. Furthermore, the hybrid hierarchical predictive model improved performance while significantly reducing computational time, which suggests it could classify skin lesions efficiently in real-world applications.
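Balanced multiclass accuracy is usually defined as the mean of per-class recall, and we assume that definition here. It explains why BMA is the metric of choice on an imbalanced dataset like ISIC 2017. The sketch also shows the control flow of a generic two-step binary hierarchy. The step classifiers and labels are hypothetical hooks, since the abstract does not say which class is separated first.

```python
from collections import defaultdict


def balanced_multiclass_accuracy(y_true: list, y_pred: list) -> float:
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def two_step_predict(x, step1, step2, first_label, other_labels):
    """Hierarchical binary routing (hypothetical hooks): step1 separates first_label
    from the rest; step2 then chooses between the two remaining labels."""
    if step1(x):
        return first_label
    return other_labels[0] if step2(x) else other_labels[1]
```

A model that always predicts the majority class reaches 75% plain accuracy on a 3:1 split but only 50% BMA. This is why BMA exposes the class-imbalance problem described above.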
Funding: This paper is partially supported by the National Natural Science Foundation of China (Grant No. 61772561), the Key Research & Development Plan of Hunan Province (Grant No. 2018NK2012), the Postgraduate Research and Innovative Project of Central South University of Forestry and Technology (Grant No. 20183012), the Graduate Education and Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 2018JG005), and the Teaching Reform Project of Central South University of Forestry and Technology (Grant No. 20180682).
Abstract: With the development of deep learning and Convolutional Neural Networks (CNNs), the accuracy of automatic food recognition from visual data has improved significantly. Some studies have shown that deeper models achieve higher accuracy. However, very deep neural networks are prone to overfitting and consume huge computing resources. In this paper, we propose a new deep learning classification scheme for automatic food-ingredient recognition. We construct a combinational convolutional neural network (CBNet) with a subnet merging technique. First, two different neural networks learn the features of interest. Then, a well-designed feature fusion component aggregates the features from the subnetworks, extracting richer and more precise features for image classification. To learn more complementary features, we also propose corresponding fusion strategies, including auxiliary classifiers and hyperparameter settings. Finally, we evaluate CBNet, built on the well-known VGGNet, ResNet, and DenseNet, on a dataset of 41 major food-ingredient categories with 100 images per category. Theoretical analysis and experimental results show that CBNet achieves promising multi-class accuracy and improves the performance of convolutional neural networks.
Funding: This work is supported by the Research on Big Data in Application for Education of BUPT (No. 2018Y0403), the Fundamental Research Funds of BUPT (Nos. 2018XKJC07 and 2018RC27), and the National Natural Science Foundation of China (No. 61571059).
Abstract: A vast amount of information has been produced in recent years, which poses a huge challenge to information management. Making better use of big data has great theoretical and practical significance for handling and managing messages effectively. In this paper, we propose a nine-rectangle-grid information model based on information value and privacy, and then present information use policies based on rough set theory. Recurrent neural networks classify OTT messages. Content of user interest is incorporated into the classification process while the OTT messages are annotated, which yields a reliable trained classification model. Experimental results show that the proposed method classifies accurately and can therefore support effective distribution and control of OTT messages.