Data Clustering Method with Feature Semantic Weight (cited by: 1)
Abstract: To address the feature selection problem in clustering, this paper proposes a data clustering method based on feature semantic weights. The user specifies a set of required features (the Must-Link set); the method then computes the semantic relatedness (semantic relativity) between features and selects the features related to the specified set as a supplement. The semantic relatedness is also used to determine a semantic weight for each feature. Building on this weight calculation, the traditional K-Means clustering algorithm is modified to produce FSW-KMeans, a clustering algorithm with feature semantic weights. Experimental results show that FSW-KMeans considerably improves clustering accuracy and efficiency.
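Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, Issue 4, pp. 64-66 (3 pages)
Funding: National Natural Science Foundation of China (50674086); Jiangsu Province Social Development Science and Technology Program Fund (BS2006002); Specialized Research Fund for the Doctoral Program of Higher Education (20060290508); China University of Mining and Technology Fund (0D090229)
Keywords: ontology; feature semantic weight; semantic relativity; FSW-KMeans algorithm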
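
The abstract outlines a pipeline (required features, semantic relatedness, feature weights, weighted K-Means) but gives no formulas, so the following is only a minimal sketch of that general idea and not the authors' FSW-KMeans. Everything specific in it is an assumption made for illustration: the relatedness matrix values, the `threshold` cutoff, the rule "weight = highest relatedness to any required feature", and the function names `semantic_weights` and `weighted_kmeans`. In the paper the relatedness is derived from an ontology; here it is simply supplied as a matrix.

```python
import random

def semantic_weights(relatedness, required, threshold=0.5):
    """Hypothetical weighting rule (not taken from the paper): a required
    feature gets raw weight 1; any other feature gets its highest relatedness
    to a required feature, or 0 if that value is below the threshold."""
    raw = []
    for f in range(len(relatedness)):
        if f in required:
            raw.append(1.0)
        else:
            best = max(relatedness[f][r] for r in required)
            raw.append(best if best >= threshold else 0.0)
    total = sum(raw)
    return [w / total for w in raw]  # normalise so the weights sum to 1

def weighted_sq_dist(a, b, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def weighted_kmeans(points, weights, k, iters=100, seed=0):
    """Plain K-Means (Lloyd's iteration) using the weighted distance above."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centers[c], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Made-up relatedness matrix for three features; feature 0 is user-required.
relatedness = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
weights = semantic_weights(relatedness, required={0})

# Made-up data: features 0 and 1 separate two groups, feature 2 is noise.
points = [
    [0.0, 0.1, 9.0], [0.2, 0.0, -7.0], [0.1, 0.2, 3.0],
    [5.0, 5.1, -8.0], [5.2, 4.9, 6.0], [4.9, 5.0, 1.0],
]
labels, centers = weighted_kmeans(points, weights, k=2)
print(weights)
print(labels)
```

In this toy setup the third feature falls below the threshold and receives weight 0, so its large noisy values no longer influence the distance and the grouping is driven by the two semantically related features. Normalising the weights to sum to 1 is a convenience here, not something the abstract specifies.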