Research on English-Chinese Bilingual News Clustering Based on Mixed Strategy (基于混合策略的英汉双语新闻聚类研究)

Cited by: 2
Abstract: English-Chinese bilingual document clustering is a valuable research topic. Using a monolingual text clustering algorithm on a corpus of English-Chinese bilingual news documents, the paper compares clustering based on Chinese text alone, on English text alone, and on a mixture of English and Chinese. The experimental results show that the mixed English-Chinese method achieves relatively good clustering results.
Source: Information Science (《情报科学》), CSSCI, 北大核心 (Peking University Core Journals), 2013, No. 1, pp. 118-122 (5 pages)
Funding: Major Project of the Key Research Bases for Humanities and Social Sciences, Ministry of Education (08JJD870225); 2011 Nanjing University Graduate Research and Innovation Fund (2011CW12)
Keywords: bilingual clustering; multilingual clustering; mixed strategy method

References (14)

  • 1. Boley D, Gini M, Gross R, et al. Partitioning-based clustering for web document categorization[J]. Decision Support System Journal, 1999, 27(3): 329-341.
  • 2. Mao J, Jain A K. A self-organizing network for hyperellipsoidal clustering[J]. IEEE Trans. Neural Networks, 1996, 7(2): 16-29.
  • 3. Cai W L, Chen S C, Zhang D Q. Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation[J]. Pattern Recognition, 2007, 40(3): 825-833.
  • 4. Zhang Chengzhi, Wang Huilin. 多语言文本聚类研究综述 [A survey of multilingual text clustering research][J]. 现代图书情报技术, 2009(6): 31-36. Cited by: 4.
  • 5. Chen H H, Lin C J. A Multilingual News Summarizer[C]. In Proceedings of the 18th International Conference on Computational Linguistics, 2000: 159-165.
  • 6. Lawrence J L. Newsblaster Russian-English Clustering Performance Analysis[R]. Columbia Computer Science Technical Reports, 2003.
  • 7. David K, Evans J, Klavans R. Columbia Newsblaster: Multilingual News Summarization on the Web Demonstration[A]. HLT-NAACL 2004[C]. PA, USA, 2004: 1-4.
  • 8. Mathieu B, Besancon R, Fluhr C. Multilingual document clusters discovery[C]. In Proceedings of RIAO2004, 2004: 1-10.
  • 9. Montalvo S, Martinez R, Casillas A, et al. Multilingual Document Clustering: an Heuristic Approach Based on Cognate Named Entities[C]. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, 2006: 1145-1152.
  • 10. Dumais S T, Letsche T A, Littman M L, et al. Automatic Cross-Language Information Retrieval using Latent Semantic Indexing[C]. In Proceedings of the AAAI Symposium on Cross-language Text and Speech Retrieval. American Association for Artificial Intelligence, 1997: 15-21.
