Sub-topic Detection for Chinese Multi-documents Based on Semi-supervised Learning (基于半监督学习的中文多文档子主题划分)

Cited by: 1
Abstract: To partition sub-topics more effectively during automatic multi-document summarization, a sub-topic partitioning method based on semi-supervised learning is proposed. First, the semantic similarity between sentences is computed. Next, hierarchical clustering is used to assign topic labels to high-confidence sentences, producing a small set of sentences with known topic categories. On that basis, constrained k-means clustering is run over all sentences, and the number of sub-topics k is determined by cross-validation. Finally, k-means clustering yields the sub-topics of the document set. Experimental results show that the method effectively improves the sub-topic recognition rate.
Author: 徐晓丹 (Xu Xiaodan)
Source: Journal of Zhejiang Normal University: Natural Sciences (《浙江师范大学学报(自然科学版)》), CAS, 2011, No. 3, pp. 302-305 (4 pages)
Keywords: multi-document summarization; sub-topic; semi-supervised learning; k-means clustering
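
The pipeline outlined in the abstract (similarity, high-confidence seeding by hierarchical clustering, then constrained k-means) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine similarity on toy numeric vectors (standing in for the paper's semantic sentence similarity), the single-link merge threshold, and the choice of k as the number of seed groups are all assumptions made for illustration. The cross-validation step that selects k is omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def seed_groups(vectors, threshold=0.9, min_size=2):
    """Single-link agglomerative merging (assumed stand-in for the paper's
    hierarchical clustering): merge two clusters while any cross-cluster pair
    has similarity >= threshold; keep clusters with >= min_size members as seeds."""
    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if any(cosine(vectors[i], vectors[j]) >= threshold
                       for i in clusters[a] for j in clusters[b]):
                    clusters[a] += clusters.pop(b)
                    merged = True
                    break
            if merged:
                break
    return [sorted(c) for c in clusters if len(c) >= min_size]

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def constrained_kmeans(vectors, seeds, iters=20):
    """Seed-constrained k-means: centroids start at the seed means, seeded
    points keep their label, and every other point joins the centroid it is
    most similar to (cosine similarity, an assumption of this sketch)."""
    fixed = {i: k for k, group in enumerate(seeds) for i in group}
    centroids = [mean([vectors[i] for i in group]) for group in seeds]
    labels = [None] * len(vectors)
    for _ in range(iters):
        new = [fixed.get(i, max(range(len(centroids)),
                                key=lambda k: cosine(v, centroids[k])))
               for i, v in enumerate(vectors)]
        if new == labels:  # assignments are stable, so stop early
            break
        labels = new
        for k in range(len(centroids)):
            # Never empty: each cluster always retains its fixed seed points.
            centroids[k] = mean([vectors[i] for i, l in enumerate(labels) if l == k])
    return labels

# Toy "sentence vectors": two tight pairs plus two less certain sentences.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.3], [0.3, 0.7]]
seeds = seed_groups(vectors, threshold=0.98)
labels = constrained_kmeans(vectors, seeds)
print(seeds)   # [[0, 1], [2, 3]]
print(labels)  # [0, 0, 1, 1, 0, 1]
```

In this toy run only the two tight pairs pass the seeding threshold, so they act as the small labeled set, and the remaining two sentences are assigned in the constrained clustering pass.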


