Abstract
An ontology-based text clustering method is presented. It introduces WordNet into text representation and defines a key concept set; the concept nodes in WordNet and the semantic relations among them are used to reduce the dimensionality of the text feature vector and thereby improve clustering quality. During clustering, the algorithm computes text similarity from each text's key concept set and concept feature vector, and labels each cluster with key concept sets so that every cluster in the result comes with an explanation. Experimental results show that, compared with other text clustering algorithms, the method effectively reduces the dimensionality of the text feature vector and improves both clustering quality and the interpretability of the clustering results.
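The abstract does not give the similarity formula or the exact definition of the key concept set, so the following is only a minimal sketch of the general idea under stated assumptions. The term-to-concept map is a hypothetical hand-written stand-in for WordNet lookups, the key concept set is assumed to be the k most frequent concepts of a document, and the similarity is assumed to be a weighted sum (weight `alpha`) of the Jaccard overlap of key concept sets and the cosine of concept vectors. None of these choices is confirmed by the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy map standing in for WordNet term-to-concept lookups.
TERM_TO_CONCEPT = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "apple": "fruit", "banana": "fruit", "pear": "fruit",
}

def concept_vector(tokens):
    """Count concepts instead of raw terms; unmapped terms are dropped."""
    return Counter(TERM_TO_CONCEPT[t] for t in tokens if t in TERM_TO_CONCEPT)

def key_concepts(vector, k=1):
    """Assumed definition: the k most frequent concepts of a document."""
    return {concept for concept, _ in vector.most_common(k)}

def cosine(u, v):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Jaccard overlap between two key concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(tokens_a, tokens_b, alpha=0.5, k=1):
    """Assumed combination: weighted sum of key-concept overlap and cosine."""
    va, vb = concept_vector(tokens_a), concept_vector(tokens_b)
    overlap = jaccard(key_concepts(va, k), key_concepts(vb, k))
    return alpha * overlap + (1 - alpha) * cosine(va, vb)

doc_a = ["car", "truck", "apple"]
doc_b = ["bus", "car", "banana"]
doc_c = ["apple", "pear", "banana"]
print(similarity(doc_a, doc_b), similarity(doc_a, doc_c))
```

In this toy example, three distinct terms per document collapse into at most two concept dimensions, and the key concept of a cluster's documents (for instance `vehicle`) could serve as a readable label for that cluster.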
Source
《吉林大学学报(理学版)》
CAS
CSCD
PKU Core Journals (北大核心)
2010, No. 2, pp. 277-283 (7 pages)
Journal of Jilin University: Science Edition
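Funding
National Natural Science Foundation of China (Grant Nos. 60973040, 60903098)
Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (Grant No. 200801830021)
Natural Science Foundation of Jilin Province (Grant No. 20070533)
Jilin University Basic Scientific Research Fund, Interdisciplinary and Innovation Project (Grant No. 200810025)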