Abstract: This paper presents a holistic analysis of problem and incident tickets in a real production cloud service environment. Using different bags of words extracted from the tickets, we apply principal component analysis (PCA) to examine their clustering characteristics. K-means and latent Dirichlet allocation (LDA) are then applied to reveal the potential clusters within this cloud environment. The second part of our study uses a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to classify the tickets, with the goal of predicting the optimal dispatching department for a given ticket. Experimental results show that, because of the unique characteristics of ticket descriptions, pre-processing with domain knowledge is critical for both clustering and classification. Our classification model achieves 86% accuracy in predicting the target dispatching department.
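A minimal plain-Python sketch of the bag-of-words and K-means step follows. Several choices here are hypothetical and not taken from the paper: whitespace tokenization with raw counts, the four ticket strings, K = 2, and seeding the centroids with the first K vectors. PCA, LDA, the domain-knowledge pre-processing the abstract emphasizes, and the BERT classifier are not shown.

```python
from collections import Counter

def bag_of_words(docs):
    """Map each document to a term-count vector over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means, seeded with the first k vectors for reproducibility."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical ticket texts.
tickets = [
    "disk full on db server",
    "vpn login failure",
    "db server disk latency",
    "vpn password reset login",
]
vocab, vectors = bag_of_words(tickets)
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

The storage tickets and the VPN tickets end up in separate clusters because they share most of their terms. Real ticket text would need the domain-specific cleaning the abstract describes before such counts become meaningful.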
Funding: Supported by the General Projects of the ISTIC Innovation Foundation, “Problem innovation solution mining based on text generation model” (MS2024-03).
Abstract: Purpose: A text-generation-based method for identifying multidisciplinary problems is proposed that does not rely on large-scale data annotation. Design/methodology/approach: The proposed method first identifies the research objective types and disciplinary labels of papers using a text classification technique. Second, it generates an abstractive title for each paper from its abstract and research objective type using a generative pre-trained language model. Third, it extracts problem phrases from the generated titles according to regular-expression rules. Fourth, it builds problem relation networks and uses a weighted community detection algorithm to identify which phrases refer to the same problem. Finally, it identifies multidisciplinary problems based on the disciplinary labels of the papers. Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field. Research limitations: The method still needs to be applied in other multidisciplinary fields to validate its effectiveness. Practical implications: Multidisciplinary problem identification can help governments gather multidisciplinary forces to solve complex real-world problems. It can also help research management authorities fund valuable multidisciplinary problems and help researchers borrow ideas from other disciplines. Originality/value: This study proposes a novel text-generation-based method that identifies multidisciplinary problems from generated abstractive paper titles, without the data annotation required by standard sequence labeling techniques.
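The phrase-extraction and multidisciplinary-flagging steps can be illustrated with a small sketch. The two title patterns, the sample titles, and the discipline labels are hypothetical, since the paper's regular-expression rules are not given. Grouping by exact normalized phrase is a deliberately simple stand-in for the weighted community detection the method actually uses.

```python
import re
from collections import defaultdict

# Hypothetical title patterns; the paper's actual regular-expression rules are not given.
PATTERNS = [
    re.compile(r"^(?:research|study) on (?P<problem>.+?)(?: based on .+)?$", re.IGNORECASE),
    re.compile(r"^how to (?P<problem>.+?)\??$", re.IGNORECASE),
]

def extract_problem(title):
    """Return the problem phrase captured by the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.match(title.strip())
        if match:
            return match.group("problem").strip().lower()
    return None

def multidisciplinary_problems(papers, min_disciplines=2):
    """papers: (generated_title, discipline_label) pairs; keep problems spanning several disciplines."""
    disciplines = defaultdict(set)
    for title, discipline in papers:
        problem = extract_problem(title)
        if problem is not None:
            # Exact-phrase grouping: a simple stand-in for weighted community detection.
            disciplines[problem].add(discipline)
    return {p: d for p, d in disciplines.items() if len(d) >= min_disciplines}

# Hypothetical generated titles and discipline labels.
papers = [
    ("Research on carbon capture cost reduction based on machine learning", "Computer Science"),
    ("Study on carbon capture cost reduction", "Chemical Engineering"),
    ("How to reduce carbon emissions of steel plants?", "Metallurgy"),
]
print(multidisciplinary_problems(papers))
```

Only the carbon-capture problem is kept, because it is the only one whose papers carry two discipline labels.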
Abstract: This paper examines inverse problems for the motion of dynamic systems described by systems of ordinary differential equations and gives a classification of such inverse problems. It shows that inverse problems can be divided into two types: synthesis inverse problems and inverse problems of measurement (recognition). Each type requires its own approach to problem statement and solution methods. A regularization method is proposed for obtaining stable solutions of inverse problems. In some cases, an estimate of the solution can be used instead of solving the recognition inverse problem itself. Within this framework, two practical inverse problems of measurement are considered.
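The abstract does not say which regularization method is used. As an illustration only, the sketch below applies Tikhonov regularization, a standard choice, to a hypothetical discretized linear measurement model A x ≈ b. It solves (AᵀA + λI) x = Aᵀb with plain Gaussian elimination. The matrix, the data, and the λ values are made up to show how a nearly singular problem becomes stable.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def tikhonov(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the normal equations (A^T A + lam I) x = A^T b."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Atb)

# Hypothetical, nearly singular measurement model.
A = [[1.0, 1.0], [1.0, 1.0001]]
b_clean = [2.0, 2.0001]
b_noisy = [2.0, 2.0002]  # a 1e-4 perturbation of the second measurement
x_clean = tikhonov(A, b_clean, 0.0)
x_noisy = tikhonov(A, b_noisy, 0.0)
x_reg = tikhonov(A, b_noisy, 1e-3)
```

With λ = 0, a change of 1e-4 in the second measurement moves the solution from about (1, 1) to about (0, 2). With λ = 1e-3, the solution for the perturbed data stays close to (1, 1).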
Abstract: High-dimensional datasets present significant challenges for classification tasks. Dimensionality reduction, a crucial part of data preprocessing, has received substantial attention because it can improve classification performance. However, identifying the optimal features in a high-dimensional dataset remains computationally demanding and calls for efficient algorithms. This paper applies the Arithmetic Optimization Algorithm (AOA) as a novel approach to finding the optimal feature subset. AOA is specifically modified for feature selection by means of a transfer function. In addition, two enhancements are incorporated into AOA to overcome limitations such as limited precision, slow convergence, and susceptibility to local optima. The first enhancement proposes a new method for selecting the solutions to be improved during the search, which improves the original algorithm's accuracy and convergence speed. The second enhancement introduces a local search with neighborhood strategies (AOA_NBH) during the AOA exploitation phase. AOA_NBH explores the vast search space and helps the algorithm escape local optima. Our results show that incorporating neighborhood methods improves the results and achieves significant improvement over state-of-the-art methods.
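The following is a sketch of how a transfer function can turn a continuous AOA position into a feature subset. The abstract does not name the transfer function. The S-shaped sigmoid, the weighted error-plus-subset-size fitness, and the weight alpha = 0.99 are common conventions in wrapper feature selection and are assumptions here, not the paper's confirmed settings. The AOA update rules and the AOA_NBH local search are not shown.

```python
import math
import random

def sigmoid_transfer(x):
    """S-shaped transfer function: maps a continuous coordinate to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(position, rng):
    """Feature j is selected (1) when a uniform draw falls below its transfer probability."""
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]

def fitness(mask, error_rate, alpha=0.99):
    """Lower is better: weighted classification error plus the fraction of features kept."""
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)

rng = random.Random(0)
position = [-6.0, 6.0, 0.0, 6.0]  # hypothetical continuous AOA search-agent position
mask = binarize(position, rng)
# error_rate would come from training a classifier on the selected columns; 0.10 is a placeholder.
score = fitness(mask, error_rate=0.10)
```

Strongly negative coordinates are almost never selected and strongly positive ones almost always are. The fitness then trades accuracy against subset size.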
Abstract: Big data refers to a set of data that, because of its size or complexity, cannot be stored or processed with the usual data management tools or applications. The term has become prominent in recent years owing to the massive development of technology. Soon afterwards, the term “big data mining”, i.e., mining from big data, emerged as a new and interconnected field of research. Classification is an important stage in data mining because it helps people make better decisions in a variety of settings, including scientific endeavors, biomedical research, and industrial applications. The probabilistic neural network (PNN) is a commonly used and successful method for classification and pattern recognition problems. In this study, the authors propose combining the PNN, a data mining technique, with the vibrating particles system (VPS), a metaheuristic algorithm, into a method named “VPS-PNN” to solve classification problems more effectively. The proposed method was tested on eleven common benchmark medical datasets from the machine-learning library. VPS-PNN outperforms the PNN, biogeography-based optimization, the enhanced water cycle algorithm (E-WCA), and the firefly algorithm (FA) in terms of convergence speed and classification accuracy.
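For orientation, here is a minimal probabilistic neural network in plain Python. It uses one Gaussian kernel per training sample (pattern layer), per-class averaging with equal priors (summation layer), and an argmax (output layer). The abstract does not state which PNN parameters VPS optimizes. Here the smoothing width sigma is assumed to be the tuned quantity, and a tiny grid search stands in for VPS. The toy data are hypothetical.

```python
import math
from collections import defaultdict

def pnn_predict(train_X, train_y, x, sigma):
    """Classify x with a probabilistic neural network (Gaussian Parzen windows, equal priors)."""
    kernel_sum = defaultdict(float)
    count = defaultdict(int)
    for xi, yi in zip(train_X, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
        # Pattern layer: one Gaussian unit per training sample.
        kernel_sum[yi] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        count[yi] += 1
    # Summation layer averages per class; output layer picks the largest density estimate.
    return max(kernel_sum, key=lambda c: kernel_sum[c] / count[c])

def accuracy(train_X, train_y, test_X, test_y, sigma):
    """Fraction of correctly classified test samples for a given sigma."""
    hits = sum(pnn_predict(train_X, train_y, x, sigma) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["a", "a", "b", "b"]
test_X = [[0.5, 0.5], [5.0, 5.5]]
test_y = ["a", "b"]
# A small grid over sigma stands in for the VPS metaheuristic search.
best_sigma = max([0.5, 1.0, 2.0], key=lambda s: accuracy(train_X, train_y, test_X, test_y, s))
```

A metaheuristic such as VPS would replace the grid with a population-based search over sigma, or over per-feature widths, using validation accuracy as the objective.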
Abstract: A unified land use classification system is the basis for the unified management of land and sea, urban and rural areas, and aboveground and underground space. In November 2020, the Ministry of Natural Resources of the People's Republic of China issued the Classification Guide for Land and Space Survey, Planning and Use Control of Land and Sea (for Trial Implementation). The guide aims to establish a unified national land and sea use classification system and to lay an important foundation for the scientific planning, unified management, rational use, and protection of natural resources. It also aims to speed up the construction of a new pattern of land and space development and protection. However, the Classification Guide still has some obvious shortcomings. This paper analyzes problems in this classification standard in terms of logicality, rigor, and comprehensiveness, and puts forward suggestions for further improvement. This work has practical significance for better guiding land use and land resources management, and thus for achieving the goal of unified management of natural resources.
Funding: The National Defence Basic Research Program of China (No. S0500A001), the National High Technology Research and Development Program of China (863 Program) (No. 2002AA411030), and the Scientific and Technological Innovation Foundation of Jiangsu Province, China.
Abstract: A modified multisurface proximal support vector machine classifier via generalized eigenvalues (GEPSVM) is proposed. By defining a new principle, we design a new GEPSVM-based classification approach, namely the maximum or minimum plane distance GEPSVM (MPDGEPSVM). Unlike GEPSVM, our approach obtains the two planes by solving two simple eigenvalue problems, which avoids singularity problems. Compared with GEPSVM, our approach achieves better classification performance. Moreover, MPDGEPSVM is over one order of magnitude faster than GEPSVM and almost two orders of magnitude faster than SVM. Computational results on public datasets from the UCI database illustrate the efficiency of MPDGEPSVM.
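For orientation, the GEPSVM criterion that MPDGEPSVM modifies can be written as a Rayleigh quotient. The notation is not taken from this abstract: A and B are the two class matrices, e is a vector of ones, δ is a Tikhonov term, and the plane is xᵀw − γ = 0. It follows the common presentation of GEPSVM. The last objective is only one plausible difference-based criterion that leads to an ordinary eigenvalue problem. It is an assumption, not necessarily the paper's exact formulation.

```latex
% GEPSVM plane for class A: close to the rows of A, far from the rows of B
\min_{z=[w;\gamma]\neq 0}\;
  \frac{\lVert A w - e\gamma\rVert^2 + \delta\lVert z\rVert^2}{\lVert B w - e\gamma\rVert^2}
  = \frac{z^\top G z}{z^\top H z},
\qquad
G = [A\;\; -e]^\top [A\;\; -e] + \delta I,\quad
H = [B\;\; -e]^\top [B\;\; -e]

% Minimizer: eigenvector of the smallest eigenvalue of a generalized problem (H may be singular)
G z = \lambda H z

% Assumed difference-based alternative: ordinary symmetric eigenvalue problem,
% take the eigenvector of the largest eigenvalue
\max_{\lVert z\rVert = 1}\; z^\top (H - G)\, z
\;\Longrightarrow\; (H - G)\, z = \lambda z
```

A criterion of the second kind never inverts H, which is one way two standard eigenvalue problems can sidestep the singularity issue the abstract mentions.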
Funding: This research was supported by the Taif University Researchers Supporting Project (TURSP-2020/254), Taif University, Taif, Saudi Arabia.
Abstract: Datasets with imbalanced class distributions are difficult to handle with standard classification algorithms. In supervised learning, class imbalance is still considered a challenging research problem. Many machine learning techniques are designed to operate on balanced datasets. Various state-of-the-art under-sampling, over-sampling, and hybrid strategies have therefore been proposed for imbalanced datasets, but highly skewed datasets still pose problems of generalization and of noise generation during resampling. To overcome these problems, this paper proposes a majority clustering model for classifying imbalanced datasets, known as MCBC-SMOTE (Majority Clustering for Balanced Classification-SMOTE). The model converts the binary classification problem into a multi-class problem. In the proposed algorithm, the number of clusters for the majority class is determined with the elbow method. The minority class is then over-sampled to match the average size of the clustered majority classes, producing a symmetrical class distribution. The proposed technique is cost-effective, reduces noise generation, and successfully removes both between-class and within-class imbalance. Evaluations on diverse real datasets show better classification results than existing state-of-the-art methods on several performance metrics.
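The two ingredients described above can be sketched as follows: choosing the number of majority clusters with the elbow method, and over-sampling the minority class. This sketch rests on several assumptions. The elbow is taken as the largest second difference of a precomputed SSE curve, which is one of several elbow heuristics. "Over-sampled as an average of clustered majority classes" is read as over-sampling to the mean majority-cluster size, which is an interpretation. The K-means step itself is omitted, and standard SMOTE interpolation generates the synthetic samples. Function names and toy data are hypothetical.

```python
import random

def elbow_k(sse_by_k):
    """Return k at the sharpest bend of the SSE curve (largest second difference).

    sse_by_k[i] is the within-cluster sum of squared errors obtained with k = i + 1.
    """
    best_k, best_bend = 1, float("-inf")
    for i in range(1, len(sse_by_k) - 1):
        bend = sse_by_k[i - 1] - 2 * sse_by_k[i] + sse_by_k[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def smote(minority, n_new, k=2, seed=0):
    """Standard SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for idx, p in enumerate(minority) if idx != base_idx]
        neighbours = sorted(others, key=lambda p: squared_distance(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

def balance_to_cluster_mean(minority, majority_cluster_sizes, k=2, seed=0):
    """Over-sample the minority class up to the mean size of the majority-class clusters."""
    target = round(sum(majority_cluster_sizes) / len(majority_cluster_sizes))
    n_new = max(0, target - len(minority))
    return minority + smote(minority, n_new, k=k, seed=seed)

k_major = elbow_k([100.0, 40.0, 30.0, 25.0, 22.0])  # hypothetical SSE curve for k = 1..5
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = balance_to_cluster_mean(minority, majority_cluster_sizes=[10, 12])
```

With two majority clusters of sizes 10 and 12, the minority class grows from 3 to 11 samples. Each majority cluster and the minority class then form roughly equal-sized classes in the multi-class problem.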
Abstract: This paper presents a symbiosis-based hybrid modified DNA-ABC optimization algorithm, which combines modified DNA concepts with the artificial bee colony (ABC) algorithm, to aid hierarchical fuzzy classification. In the literature, the ABC algorithm has traditionally been applied to constrained and unconstrained optimization problems. In this research, it is instead combined with modified DNA concepts and applied to fuzzy classification. Moreover, to the best of our knowledge, previous research has not combined the ABC algorithm with DNA computing for hierarchical fuzzy classification to explore the merits of cooperative coevolution. This paper is therefore the first to apply the mechanism of symbiosis to create a hybrid modified DNA-ABC algorithm for hierarchical fuzzy classification. In this study, the symbiosis-based hybrid modified DNA-ABC optimization algorithm determines the partition number and the shape of the membership functions. It provides both sufficient global exploration and adequate local exploitation for hierarchical fuzzy classification. The proposed algorithm is applied to five benchmark University of California, Irvine (UCI) datasets, and the results demonstrate its efficiency.
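For readers unfamiliar with ABC, the sketch below implements the standard artificial bee colony loop on a simple sphere test function. It includes the employed, onlooker, and scout phases with greedy selection. It is not the paper's hybrid: the modified DNA operators and the symbiotic coevolution are omitted. In the paper's setting, a food source would encode partition numbers and membership-function shapes rather than a point in the plane. All parameter values are illustrative.

```python
import random

def abc_minimize(objective, dim, bounds, n_sources=10, limit=20, cycles=100, seed=0):
    """Standard artificial bee colony for minimizing a non-negative objective."""
    rng = random.Random(seed)
    low, high = bounds

    def random_source():
        return [rng.uniform(low, high) for _ in range(dim)]

    def try_improve(i):
        # Candidate v_ij = x_ij + phi * (x_ij - x_kj) on one random dimension j, partner k != i.
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(dim)
        v = list(sources[i])
        v[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        v[j] = min(max(v[j], low), high)
        fv = objective(v)
        # Greedy selection: keep the candidate only if it is better.
        if fv < values[i]:
            sources[i], values[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    sources = [random_source() for _ in range(n_sources)]
    values = [objective(s) for s in sources]
    trials = [0] * n_sources
    best_value = min(values)
    best = list(sources[values.index(best_value)])

    for _ in range(cycles):
        for i in range(n_sources):  # employed bees: one trial per source
            try_improve(i)
        # Onlooker bees favour sources with high fitness 1 / (1 + f), valid for f >= 0.
        weights = [1.0 / (1.0 + f) for f in values]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        # Memorize the best source before scouts may discard it.
        i_best = min(range(n_sources), key=values.__getitem__)
        if values[i_best] < best_value:
            best_value, best = values[i_best], list(sources[i_best])
        for i in range(n_sources):  # scout bees: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = objective(sources[i])
                trials[i] = 0
    return best, best_value

def sphere(x):
    return sum(t * t for t in x)

best, best_value = abc_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Employed and onlooker bees supply local exploitation around good sources, while scouts restart stagnant ones for global exploration. The abstract credits the hybrid with balancing exactly these two aspects.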
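Abstract: Long-tailed classification is an unavoidable and challenging task in the real world. Traditional methods usually focus only on the imbalanced distribution between classes. Recent studies, however, have begun to address the long-tailed distribution within classes: within the same class, samples with head attributes far outnumber those with tail attributes. Because attributes are implicit and their combinations are complex, intra-class imbalance is even harder to handle. To this end, this paper proposes a generalized long-tailed classification framework based on a leading forest with a multi-center loss (Cognisance). It aims to build a multi-granularity joint solution model for long-tailed classification under the paradigm of invariant feature learning. First, the framework builds a Coarse-Grained Leading Forest (CLF) through unsupervised learning to better characterize the intra-class distribution of samples with respect to different attributes. The CLF is then used to construct different environments for invariant risk minimization. Second, a new metric-learning loss, the Multi-Center Loss (MCL), is designed to gradually eliminate confounding attributes during feature learning. Moreover, Cognisance does not depend on a specific model architecture and can be integrated with other long-tailed classification methods as an independent component. Experimental results on the ImageNet-GLT and MSCOCO-GLT datasets show that the proposed framework achieves the best performance. Existing methods integrated with it all gain 2% to 8% in Top-1 accuracy.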
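The abstract does not give the formula for the Multi-Center Loss. As a hedged illustration of the general idea only, the sketch below assumes each class keeps several centers, for example one per attribute cluster. It penalizes each feature's squared distance to the nearest center of its own class, so samples with different attributes need not collapse onto a single class center. The function name, centers, and toy features are hypothetical. The CLF and invariant risk minimization components are not shown.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def multi_center_loss(features, labels, centers):
    """Hypothetical multi-center loss: mean squared distance from each feature vector
    to the nearest of its own class's centers. centers maps label -> list of center vectors."""
    total = 0.0
    for feature, label in zip(features, labels):
        total += min(squared_distance(feature, c) for c in centers[label])
    return total / len(features)

features = [[0.0, 0.0], [4.0, 4.0], [10.0, 10.0]]
labels = [0, 0, 1]
# Class 0 has two attribute-specific centers; class 1 has one.
multi = multi_center_loss(features, labels, {0: [[0.0, 0.0], [4.0, 4.0]], 1: [[9.0, 10.0]]})
# The same data with a single center per class, as in an ordinary center loss.
single = multi_center_loss(features, labels, {0: [[2.0, 2.0]], 1: [[9.0, 10.0]]})
```

Here two centers for class 0 give a loss of 1/3. Forcing both class-0 samples onto one center at (2, 2) gives 17/3.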