Three-way concept analysis is an important tool for information processing, and rule acquisition is one of its research hotspots. Compared with three-way concept lattices, three-way semi-concept lattices have three-way operators with weaker constraints and can therefore generate more concepts. This article discusses the general problem of rule acquisition for three-way semi-concept lattices. The authors construct the finer relation of three-way semi-concept lattices and propose a method of rule acquisition for them. The authors also discuss the set of decision rules and the relationships among the decision rules of object-induced three-way semi-concept lattices, object-induced three-way concept lattices, classical concept lattices, and semi-concept lattices. Finally, examples are provided to illustrate the validity of the conclusions.
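To make the phrase "operators with weaker constraints" concrete, the following is a minimal sketch, not the authors' method. It assumes the commonly used object-induced three-way operators (attributes shared by all objects, attributes shared by none) and assumes that a semi-concept only needs one of the two closure conditions to hold. The function names and the toy context are hypothetical.

```python
def obj_to_attrs(context, attributes, objs):
    """Map an object set to (attributes all of them have, attributes none of them have)."""
    pos = frozenset(a for a in attributes if all(a in context[o] for o in objs))
    neg = frozenset(a for a in attributes if all(a not in context[o] for o in objs))
    return pos, neg


def attrs_to_objs(context, pos, neg):
    """Map (pos, neg) to the objects that have every attribute in pos and none in neg."""
    return frozenset(o for o, has in context.items()
                     if pos <= has and not (neg & has))


def is_three_way_concept(context, attributes, objs, pos, neg):
    # Both directions must close on each other.
    return (obj_to_attrs(context, attributes, objs) == (pos, neg)
            and attrs_to_objs(context, pos, neg) == objs)


def is_three_way_semi_concept(context, attributes, objs, pos, neg):
    # Assumed weaker condition: only one direction has to hold.
    return (obj_to_attrs(context, attributes, objs) == (pos, neg)
            or attrs_to_objs(context, pos, neg) == objs)


# Hypothetical toy context: object -> set of attributes it possesses.
context = {"o1": frozenset("ab"), "o2": frozenset("a"), "o3": frozenset("c")}
attributes = frozenset("abc")
```

Under these assumptions every concept is also a semi-concept, but not the reverse, which is why a semi-concept lattice can contain more elements.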
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of the final clusters, are derived from the key phrases using WordNet. The initial centers and the membership matrix are the most important factors affecting clustering performance. Orthogonal topic concept sub-spaces are built from the topic concept phrases that represent the topics of the texts, and the initialization of the centers and the membership matrix depends on the concept vectors in these sub-spaces. The results show that, unlike the random initialization of traditional fuzzy c-means clustering, initialization based on the text content can improve clustering precision.
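The role of the initial centers can be seen in a small sketch of standard fuzzy c-means that takes its starting centers from the caller instead of drawing them at random. This is an illustrative assumption, not the TCS2FCM implementation: the concept-vector construction is not reproduced, and the function names and the fuzzifier default `m=2.0` are hypothetical choices.

```python
import math


def memberships(points, centers, m=2.0):
    """Return the membership matrix: one row per point, one column per center."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: share full membership among those centers.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of the points weighted by membership ** m.

    Assumes every cluster receives some non-zero membership.
    """
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iters=50):
    """Run a fixed number of fuzzy c-means iterations from the given centers."""
    centers = list(initial_centers)
    for _ in range(iters):
        centers = update_centers(points, memberships(points, centers, m), m)
    return centers, memberships(points, centers, m)
```

Passing centers derived from the content (here, any caller-chosen vectors) replaces the random draw that the abstract contrasts against; the update rules themselves are unchanged.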
Cloud computing has developed into an important information technology paradigm that provides on-demand services. Meanwhile, its energy consumption has attracted growing attention from both the academic and industrial communities. In this paper, from the perspective of cloud tasks, the relationship between cloud tasks and cloud platform energy consumption is established and analyzed on the basis of the multidimensional attributes of cloud tasks. Furthermore, a three-way clustering algorithm for cloud tasks is proposed to save energy. In the algorithm, the cloud tasks are first classified into three categories according to the content properties of the cloud tasks and resources, respectively. Next, cloud tasks and cloud resources are clustered according to their computation characteristics (e.g., computation-intensive, data-intensive). Subsequently, greedy scheduling is performed. The simulation results show that, compared with the general greedy scheduling algorithm, the proposed algorithm can significantly reduce the energy cost and improve resource utilization.
We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. The experimental results indicate that the concept association model captures the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the size of the vector space is only about 10% of the original dimensionality. Key words: text classification; concept association; hierarchical clustering; Hamming clustering. CLC number: TN 915.08. Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090). Biography: Su Gui-yang (1974-), male, Ph.D. candidate; research direction: information filtering and text classification.
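As a rough illustration of how clustering under Hamming distance can shrink a feature space, the sketch below groups binary feature vectors with single-linkage agglomerative clustering, so that each resulting group could stand in for one merged dimension. The abstract does not give the model's details, so the linkage rule, the `max_dist` threshold, and the function names are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def hierarchical_hamming(vectors, max_dist):
    """Merge the closest clusters (single linkage) until the gap exceeds max_dist."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(hamming(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > max_dist:
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters


# Hypothetical binary occurrence vectors for four features.
features = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
groups = hierarchical_hamming(features, max_dist=1)  # -> [[0, 1], [2, 3]]
```

Here four features collapse into two groups; the reduction ratio in practice depends entirely on the data and the threshold.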
Given the constant growth of data in large databases such as wire transfer databases, incremental clustering algorithms play an increasingly important role in Data Mining (DM). However, few traditional clustering algorithms can both handle categorical data and explain their output clearly. Based on the idea of dynamic clustering, this paper proposes an incremental conceptive clustering algorithm, which introduces the Semantic Core Tree (SCT) to deal with large volumes of categorical wire transfer data for detecting money laundering. In addition, a rule generation algorithm is presented to express the clustering result in the form of knowledge. When this idea is applied to financial data mining, the efficiency of searching for the characteristics of money laundering data is improved.
Clustering high-dimensional data is challenging because data dimensionality increases the distance between data points, resulting in sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data by finding relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams. Data streams are not only high-dimensional, but also unbounded and evolving. This necessitates the development of subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many articles have contributed to the literature review on data stream clustering, there is currently no specific review on subspace clustering algorithms in high-dimensional data streams. Therefore, this article aims to systematically review the existing literature on subspace clustering of data streams in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles for the final analysis. The analysis focused on two research questions related to the general clustering process and dealing with the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based top-down subspace clustering approach is more widely used than the others because it is able to distinguish true clusters and outliers using projected microclusters. Most algorithms implicitly adapt to the evolving nature of the data stream by using a time fading function that is sensitive to outliers. Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
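A time fading function of the kind mentioned above can be sketched as follows. The exponential form `2 ** (-decay * elapsed)` is one widely used choice and is assumed here; the review itself does not fix a formula. The class name, its fields, and the `decay` parameter are hypothetical.

```python
class FadingMicroCluster:
    """Micro-cluster summary whose statistics decay as stream time advances."""

    def __init__(self, dim, decay):
        self.decay = decay            # larger values forget old points faster
        self.weight = 0.0             # faded count of absorbed points
        self.linear_sum = [0.0] * dim  # faded per-dimension sum of points
        self.last_t = 0.0

    def fade_to(self, t):
        """Apply the assumed fading factor 2 ** (-decay * elapsed time)."""
        factor = 2.0 ** (-self.decay * (t - self.last_t))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_t = t

    def insert(self, point, t):
        """Fade the summary to time t, then absorb the new point with weight 1."""
        self.fade_to(t)
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

    def center(self):
        return [s / self.weight for s in self.linear_sum]
```

Because old points lose weight without any explicit change detection, the summary drifts toward recent data; a micro-cluster whose weight falls below some threshold can then be treated as an outlier or discarded.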
Purpose: Formal concept analysis (FCA) and concept lattice theory (CLT) are introduced for constructing a network of IDR topics and for evaluating their effectiveness for knowledge structure exploration. Design/methodology/approach: We introduced the theory and applications of FCA and CLT, and then proposed a method for interdisciplinary knowledge discovery based on CLT. As an example of empirical analysis, interdisciplinary research (IDR) topics in Information & Library Science (LIS) and Medical Informatics, and in LIS and Geography-Physical, were utilized as empirical fields. Subsequently, we carried out a comparative analysis with two other IDR topic recognition methods. Findings: The CLT approach is suitable for IDR topic identification and prediction. Research limitations: IDR topic recognition based on the CLT is not sensitive to the interdisciplinarity of topic terms, since the data can only reflect whether there is a relationship between the discipline and the topic terms. Moreover, the CLT cannot clearly represent a large number of concepts. Practical implications: A deeper understanding of the IDR topics was obtained as the structural and hierarchical relationships between them were identified, which can help achieve more precise identification and prediction of IDR topics. Originality/value: IDR topic identification based on CLT performed well, and this theory has several advantages for identifying and predicting IDR topics. First, in a concept lattice, there is a partial order relation between interconnected nodes, and consequently, a complete concept lattice can present hierarchical properties. Second, clustering analysis of IDR topics based on concept lattices can yield clusters that highlight the essential knowledge features and help display the semantic relationship between different IDR topics. Furthermore, the Hasse diagram automatically displays all the IDR topics associated with the different disciplines, thus forming clusters of specific concepts and visually retaining and presenting the associations of IDR topics through multiple inheritance relationships between the concepts.
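The lattice nodes discussed above are formal concepts: pairs of a discipline set and a topic-term set that determine each other. A minimal brute-force sketch of how they can be enumerated from a binary discipline-by-term table follows. It is only practical for tiny contexts, and the discipline names, topic terms, and function names are hypothetical illustrations rather than the paper's data or procedure.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that possess every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            objs = extent(context, frozenset(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Hypothetical context: discipline -> topic terms associated with it.
context = {
    "LIS": frozenset({"retrieval", "bibliometrics"}),
    "Medical Informatics": frozenset({"retrieval", "ehr"}),
    "Geography": frozenset({"gis"}),
}
concepts = formal_concepts(context)
```

In this toy context the concept whose extent is both LIS and Medical Informatics has the intent `{"retrieval"}`, which is the kind of shared topic the approach surfaces. Ordering the concepts by extent inclusion gives the partial order drawn in a Hasse diagram.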
Drug taxonomy could be described as an inherent structure of different pharmaceutical componential drugs. Unfortunately, the literature does not always provide a clear path to define and classify adverse drug events. While not a systematic review, this paper uses examples from the literature to illustrate problems that investigators will confront as they develop a conceptual framework for their research. It also proposes a targeted taxonomy that can facilitate a clear and consistent approach to understanding different drugs and could aid in comparing the results of past and future studies. To build the drug taxonomy, symptom information was selected, clustered, and adapted for this purpose. Finally, although national or international agreement on a taxonomy for different drugs is a distant or unachievable goal, individual investigations and the literature as a whole will be improved by prospective, explicit classification of different drugs using this new pharmacy information system (PIS) and by including the study's approach to classification in publications. The PIS allows users to find information quickly by following the semantic connections that surround every drug linked to the subject. It provides quicker search and a faster, more intuitive understanding of the focus. This research work aspires to become a leading provider of encyclopedia services for scientists and educators, and to attract the scientific community: universities and research and development groups.
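The advanced mathematics course group holds a pivotal position among the compulsory public courses in universities. As information technology develops and gradually permeates education, the teaching mode of the advanced mathematics course group is changing rapidly. In recent years, curriculum-based ideological and political education has become increasingly popular in the teaching of the advanced mathematics course group. However, from the standpoint of the Outcome Based Education (OBE) philosophy, it is not hard to see that some shortcomings in the ideological and political teaching of the course group remain to be resolved. Based on the OBE philosophy and the current state of teaching in the advanced mathematics course group, this paper proposes suggestions for innovative teaching of curriculum-based ideological and political education in the course group, with the aim of improving the corresponding teaching system under the OBE philosophy.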
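Data stream classification is an important research topic in data mining, but concept drift and expensive labeling costs in data streams pose great challenges to classification. Most existing work uses online classification techniques based on active learning, which alleviate the problems of concept drift and limited labels to some extent, but these methods have low classification efficiency and ignore memory overhead. To address these problems, a data stream classification method combining micro-clustering and active learning (CALC) is proposed. A new hybrid query strategy for active learning is proposed and combined with error-based representation learning to measure the importance of each micro-cluster during maintenance, and a set of micro-clusters is maintained dynamically to adapt to the concept drift that arises in the data stream. A lazy learning method based on micro-clusters is used to classify the data stream and to update the cached micro-clusters online. Experiments on three real datasets and three synthetic datasets show that CALC outperforms existing data stream classification algorithms in classification accuracy and memory overhead. Compared with the baseline model, online reliable semi-supervised learning on evolving data streams (ORSL), CALC improves classification accuracy, with average accuracy gains on the six datasets of 5.07, 2.41, 1.04, 1.03, 3.47, and 0.64 percentage points, respectively.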
Funding (rule acquisition for three-way semi-concept lattices): Central University Basic Research Fund of China, Grant/Award Number: FWNX04; Ningxia Natural Science Foundation, Grant/Award Number: 2021AAC03203; National Natural Science Foundation of China, Grant/Award Number: 61662001.
Funding (TCS2FCM text clustering): The National Natural Science Foundation of China (No. 60672056); Open Fund of MOE-MS Key Laboratory of Multimedia Computing and Communication (No. 06120809).
Funding (three-way clustering of cloud tasks): Supported by the Harbin Technology Bureau Youth Talented Project (2014RFQXJ073) and China Postdoctoral Fund Projects (2014M561330).
Funding (incremental conceptive clustering of wire transfer data): Supported by the National Natural Science Foundation of China (60403027), the Natural Science Foundation of Hubei Province (2005ABA258), and the Opening Foundation of State Key Laboratory of Software Engineering (SKLSE05-07).
Funding (IDR topic identification with concept lattices): An outcome of the project "Study on the Recognition Method of Innovative Evolving Trajectory based on Topic Correlation Analysis of Science and Technology" (No. 71704170), supported by the National Natural Science Foundation of China; the project "Study on Regularity and Dynamics of Knowledge Diffusion among Scientific Disciplines" (No. 71704063), supported by the National Natural Science Foundation of China; and the Youth Innovation Promotion Association, CAS (Grant No. 2016159).