Funding: Supported by the National Natural Science Foundation of China (60403027), the Natural Science Foundation of Hubei Province (2005ABA258), and the Opening Foundation of State Key Laboratory of Software Engineering (SKLSE05-07).
Abstract: Given the continuous growth of data in large databases such as wire transfer databases, incremental clustering algorithms play an increasingly important role in data mining (DM). However, few traditional clustering algorithms can both handle categorical data and explain their output clearly. Based on the idea of dynamic clustering, this paper proposes an incremental conceptual clustering algorithm that introduces the Semantic Core Tree (SCT) to handle large volumes of categorical wire transfer data for detecting money laundering. In addition, a rule generation algorithm is presented to express the clustering results as knowledge. Applying this approach to financial data mining improves the efficiency of searching for the characteristics of money laundering data.
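The abstract does not describe how the Semantic Core Tree is built, so the following is only a minimal sketch of the general idea using a simpler, swapped-in technique: a leader-style incremental clustering of categorical records, followed by rule generation that describes each cluster by its dominant attribute values. The similarity measure, the `threshold` and `min_support` parameters, and all helper names are hypothetical; none of them comes from the paper.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster summary: member count plus per-attribute value counts."""
    size: int = 0
    counts: dict = field(default_factory=dict)  # attribute -> Counter of values

    def add(self, record):
        self.size += 1
        for attr, value in record.items():
            self.counts.setdefault(attr, Counter())[value] += 1

    def similarity(self, record):
        # Average, over the record's attributes, of the fraction of members
        # that share the record's value for that attribute.
        if self.size == 0 or not record:
            return 0.0
        total = sum(self.counts.get(a, Counter())[v] / self.size for a, v in record.items())
        return total / len(record)


def incremental_cluster(records, threshold=0.5):
    """Assign each arriving record to its most similar cluster, or open a new one."""
    clusters = []
    labels = []
    for rec in records:
        best, best_sim = None, -1.0
        for i, c in enumerate(clusters):
            s = c.similarity(rec)
            if s > best_sim:
                best, best_sim = i, s
        if best is None or best_sim < threshold:
            clusters.append(Cluster())
            best = len(clusters) - 1
        clusters[best].add(rec)  # only counts are updated; old records are not revisited
        labels.append(best)
    return clusters, labels


def cluster_rules(clusters, min_support=0.8):
    """Describe each cluster as a conjunction of its dominant attribute=value pairs."""
    rules = []
    for k, c in enumerate(clusters):
        conds = []
        for attr in sorted(c.counts):
            value, n = c.counts[attr].most_common(1)[0]
            if n / c.size >= min_support:
                conds.append(f"{attr}={value}")
        rules.append(f"IF {' AND '.join(conds) or 'TRUE'} THEN cluster {k}")
    return rules
```

For example, wire transfer records with hypothetical attributes such as `channel`, `country` and `amount_band` can be fed in arrival order. Because each cluster keeps only value counts, a new record is absorbed without rescanning earlier data, which is what makes the scheme incremental, and the rules give a readable description of each cluster.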
Abstract: We propose two models in this paper. A concept association model is put forward to obtain the co-occurrence relationships among keywords in documents, and a hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the document feature space. Experimental results indicate that the concept association model captures the co-occurrence relations among keywords in documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vectors efficiently: the reduced vector space is only about 10% of the original dimensionality.
Key words: text classification; concept association; hierarchical clustering; Hamming clustering
CLC number: TN915.08
Foundation item: Supported by the National 863 Project of China (2001AA142160, 2002AA145090)
Biography: Su Gui-yang (1974-), male, Ph.D. candidate, research direction: information filtering and text classification.
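The abstract does not say exactly which vectors are clustered or which linkage is used. The sketch below assumes that each keyword is represented by a binary vector over categories (1 if the keyword is a feature of that category), clusters these vectors by single-linkage agglomerative clustering on Hamming distance cut at a hypothetical threshold `max_dist`, and then sums a document's features within each group to obtain the lower-dimensional vector. It illustrates the general technique only, not the authors' model.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def hamming_clusters(feature_vectors, max_dist):
    """Single-linkage agglomerative clustering of binary vectors by Hamming distance.

    Pairs are merged in increasing order of distance (Kruskal-style union-find),
    stopping once the distance exceeds max_dist, which cuts the hierarchy at
    that level. Returns groups of feature indices.
    """
    n = len(feature_vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = sorted(
        (hamming(feature_vectors[i], feature_vectors[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    for d, i, j in pairs:
        if d > max_dist:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def reduce_vector(doc_vector, groups):
    """Project a per-keyword document vector onto the merged keyword groups."""
    return [sum(doc_vector[i] for i in g) for g in groups]
```

A larger `max_dist` merges more keywords and so yields a smaller reduced space; the threshold therefore controls how aggressively the dimensionality is cut.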
Funding: This work is an outcome of the project "Study on the Recognition Method of Innovative Evolving Trajectory based on Topic Correlation Analysis of Science and Technology" (No. 71704170), supported by the National Natural Science Foundation of China; the project "Study on Regularity and Dynamics of Knowledge Diffusion among Scientific Disciplines" (No. 71704063), supported by the National Natural Science Foundation of China; and the Youth Innovation Promotion Association, CAS (Grant No. 2016159).
Abstract:
Purpose: Formal concept analysis (FCA) and concept lattice theory (CLT) are introduced to construct a network of interdisciplinary research (IDR) topics and to evaluate their effectiveness for exploring knowledge structure.
Design/methodology/approach: We introduce the theory and applications of FCA and CLT, and then propose a method for interdisciplinary knowledge discovery based on CLT. For the empirical analysis, IDR topics in Information & Library Science (LIS) and Medical Informatics, and in LIS and Geography-Physical, were used as empirical fields. We then carried out a comparative analysis with two other IDR topic recognition methods.
Findings: The CLT approach is suitable for identifying and predicting IDR topics.
Research limitations: IDR topic recognition based on CLT is not sensitive to the interdisciplinarity of topic terms, since the data only reflect whether a relationship exists between a discipline and a topic term. Moreover, CLT cannot clearly represent a large number of concepts.
Practical implications: Identifying the structural and hierarchical relationships between IDR topics gave a deeper understanding of them, which can support more precise identification and prediction of IDR topics.
Originality/value: IDR topic identification based on CLT performed well, and the theory has several advantages for identifying and predicting IDR topics. First, the interconnected nodes of a concept lattice are related by a partial order, so a complete concept lattice exhibits hierarchical properties. Second, clustering IDR topics with concept lattices yields clusters that highlight essential knowledge features and help display the semantic relationships between different IDR topics. Furthermore, the Hasse diagram automatically displays all IDR topics associated with the different disciplines, forming clusters of specific concepts and visually preserving the associations among IDR topics through multiple inheritance relationships between concepts.
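To make the concept lattice idea concrete, here is a small sketch of formal concept analysis on a binary context of disciplines (objects) and topic terms (attributes). It enumerates every formal concept by closing the object intents under intersection, then computes the cover relation that a Hasse diagram draws. The function names and the example context are hypothetical, and this brute-force approach is only practical for small contexts, which is consistent with the limitation that CLT struggles to present large numbers of concepts.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small binary context.

    context maps each object (e.g. a discipline) to the set of attributes
    (e.g. topic terms) it has. Every concept intent is an intersection of
    object intents (the empty intersection being the full attribute set),
    so closing the object intents under intersection yields all intents.
    """
    attributes = frozenset().union(*context.values())
    intents = {attributes}
    for obj_attrs in context.values():
        obj_attrs = frozenset(obj_attrs)
        intents |= {obj_attrs & b for b in intents}
    concepts = []
    for intent in intents:
        # Extent: all objects that have every attribute of the intent.
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Largest extents (most general concepts) first; ties broken by object names.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[0])))
    return concepts


def hasse_edges(concepts):
    """Cover relation: (i, j) means concept i lies directly below concept j."""
    edges = []
    for i, (ext_i, _) in enumerate(concepts):
        for j, (ext_j, _) in enumerate(concepts):
            if ext_i < ext_j and not any(ext_i < ext_k < ext_j for ext_k, _ in concepts):
                edges.append((i, j))
    return edges
```

For instance, with the hypothetical context `{"LIS": {"bibliometrics", "text mining"}, "MedInf": {"text mining", "EHR"}, "Geo": {"GIS"}}`, one of the six concepts is (`{LIS, MedInf}`, `{text mining}`): a topic term shared by two disciplines, i.e. a candidate IDR topic, sitting above the single-discipline concepts in the Hasse diagram.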
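Abstract: Data stream classification is an important research topic in data mining, but concept drift and the high cost of labeling in data streams pose great challenges to classification. Most existing work adopts online classification techniques based on active learning, which alleviate the problems of concept drift and limited labels to some extent, but these methods have low classification efficiency and ignore memory overhead. To address these problems, this paper proposes a data stream classification method combining micro-clustering and active learning (CALC). A new hybrid query strategy for active learning is proposed and combined with error-based representation learning to measure the importance of each micro-cluster during maintenance, and a set of micro-clusters is maintained dynamically to adapt to concept drift in the data stream. A micro-cluster-based lazy learning method is used to classify the data stream and to update the cached micro-clusters online. Experiments on three real and three synthetic datasets show that CALC outperforms existing data stream classification algorithms in classification accuracy and memory overhead. Compared with the baseline model (online reliable semi-supervised learning on evolving data streams, ORSL), CALC improves classification accuracy: the average accuracy on the six datasets increased by 5.07, 2.41, 1.04, 1.03, 3.47, and 0.64 percentage points, respectively.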
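The CALC abstract describes micro-clusters that are maintained with an importance score and used for lazy classification, but it gives no formulas. The sketch below is a simplified illustration under stated assumptions, not CALC itself: each micro-cluster keeps the classic cluster-feature summary (count, linear sum, squared sum) plus a class label, and prediction uses the nearest centroid. The importance score `hits`, the outside-the-boundary query rule in `should_query`, and the parameters `max_clusters`, `radius_factor` and `min_radius` are all hypothetical stand-ins for the paper's hybrid query strategy and error-based representation learning.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Cluster-feature summary: count n, linear sum ls, squared sum ss, label."""
    n: int
    ls: list
    ss: float
    label: str
    hits: int = 0  # hypothetical importance: times this was the correct nearest cluster

    @classmethod
    def from_point(cls, x, label):
        return cls(1, list(x), sum(v * v for v in x), label)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # RMS distance of members to the centroid, derived from n, ls and ss.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))

    def absorb(self, x):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)


class MicroClusterClassifier:
    """Lazy classifier that keeps only a bounded set of labeled micro-clusters."""

    def __init__(self, max_clusters=50, radius_factor=2.0, min_radius=1.0):
        self.clusters = []
        self.max_clusters = max_clusters
        self.radius_factor = radius_factor
        self.min_radius = min_radius  # boundary used while a cluster has radius 0

    def _nearest(self, x):
        return min(self.clusters, key=lambda mc: math.dist(x, mc.centroid()))

    def _boundary(self, mc):
        return max(self.radius_factor * mc.radius(), self.min_radius)

    def predict(self, x):
        # Lazy learning: label of the nearest micro-cluster (needs >= 1 cluster).
        return self._nearest(x).label

    def should_query(self, x):
        # Hypothetical query rule: ask for a label when x falls outside the
        # boundary of its nearest micro-cluster (unfamiliar region, possible drift).
        if not self.clusters:
            return True
        mc = self._nearest(x)
        return math.dist(x, mc.centroid()) > self._boundary(mc)

    def learn(self, x, label):
        """Update the micro-clusters with a labeled point (e.g. a queried one)."""
        if self.clusters:
            mc = self._nearest(x)
            if mc.label == label:
                mc.hits += 1
                if math.dist(x, mc.centroid()) <= self._boundary(mc):
                    mc.absorb(x)
                    return
        self.clusters.append(MicroCluster.from_point(x, label))
        if len(self.clusters) > self.max_clusters:
            # Evict the least important older cluster to bound memory.
            idx = min(range(len(self.clusters) - 1), key=lambda i: self.clusters[i].hits)
            del self.clusters[idx]
```

In a stream loop, one would call `predict` on each arriving point and call `learn` only for points where `should_query` returns True and a label is obtained. The `max_clusters` cap keeps memory bounded regardless of stream length, which is the kind of memory concern the abstract raises.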
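Abstract: The higher mathematics course group occupies a pivotal position among the public compulsory courses in universities. With the development of modern information technology and its gradual penetration into education, the teaching mode of the higher mathematics course group is changing rapidly. In recent years, ideological and political education in the curriculum has become increasingly popular in teaching the higher mathematics course group. However, from the perspective of the Outcome Based Education (OBE) philosophy, it is not difficult to find shortcomings in the ideological and political teaching of this course group that need to be addressed. Based on the OBE philosophy and the current teaching situation of the higher mathematics course group, this paper proposes innovative teaching suggestions for ideological and political education in the course group, with the aim of improving its ideological and political teaching system under the OBE philosophy.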