融合要素及主题的汉越双语新闻话题分析 (Cited by: 3)

Analysis of Sino-Vietnamese Bilingual News Topics Mixing Elements and Themes
Abstract: Bilingual topic analysis and discovery is an active research area in China and abroad, but little work targets specific types of text. For Sino-Vietnamese bilingual news, this paper builds on a Sino-Vietnamese text similarity method based on bilingual topic distribution words and proposes news-specific element features that combine titles, keywords, and entities. These features are fused into the text similarity computation to construct a bilingual text similarity matrix, and the Sino-Vietnamese news texts are then clustered with an adaptive K-means algorithm to analyze bilingual news topics. Experimental results show that the proposed method achieves higher accuracy, recall, and F-measure than a method that considers only news text similarity and than standard K-means clustering.
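Source: Computer Engineering (《计算机工程》), 2016, No. 9, pp. 186-191 (6 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list (北大核心).
Funding: National Natural Science Foundation of China (61462055, 61472168, 61262041); Key Project of the Natural Science Foundation of Yunnan Province (2013FA130)
Keywords: bilingual news topic analysis; Sino-Vietnamese bilingual; text similarity; topic; adaptive clustering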
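
The abstract describes fusing topic-based similarity with title, keyword, and entity features into a single bilingual similarity matrix. The following is a minimal sketch of that idea only, not the paper's method: the weights, the use of cosine and Jaccard measures, the field names, and the toy documents are all assumptions made for illustration. It also assumes the documents already share one topic space and that their terms have been mapped to a common vocabulary. The adaptive K-means step is left out because the abstract does not specify it.

```python
from math import sqrt

# Hypothetical weights; the abstract does not state the values used in the paper.
WEIGHTS = {"topic": 0.4, "title": 0.2, "keywords": 0.2, "entities": 0.2}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(d1, d2, weights=WEIGHTS):
    """Weighted sum of topic similarity and news-element overlaps (assumed scheme)."""
    parts = {
        "topic": cosine(d1["topic"], d2["topic"]),
        "title": jaccard(d1["title"], d2["title"]),
        "keywords": jaccard(d1["keywords"], d2["keywords"]),
        "entities": jaccard(d1["entities"], d2["entities"]),
    }
    return sum(weights[name] * value for name, value in parts.items())


def similarity_matrix(docs):
    """Pairwise fused similarity for every pair of documents."""
    n = len(docs)
    return [[fused_similarity(docs[i], docs[j]) for j in range(n)] for i in range(n)]


# Toy data, invented for illustration: two reports on the same story and one unrelated.
docs = [
    {"topic": [0.8, 0.1, 0.1], "title": {"trade", "agreement"},
     "keywords": {"export", "tariff"}, "entities": {"China", "Vietnam"}},
    {"topic": [0.7, 0.2, 0.1], "title": {"trade", "agreement"},
     "keywords": {"export", "rice"}, "entities": {"China", "Vietnam"}},
    {"topic": [0.1, 0.1, 0.8], "title": {"football", "final"},
     "keywords": {"sports"}, "entities": {"Hanoi"}},
]

matrix = similarity_matrix(docs)
for row in matrix:
    print(["%.3f" % value for value in row])
```

A matrix of this kind could then be handed to a clustering step. In this toy data the two trade reports score far closer to each other than either does to the sports report.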

References (13)

  • 1. Yang Yiming, Pierce T, Carbonell J. A Study of Retrospective and On-line Event Detection [C]//Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, USA: ACM Press, 1998: 28-36.
  • 2. Allan J, Papka R, Lavrenko V. On-line New Event Detection and Tracking [C]//Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, USA: ACM Press, 1998: 37-45.
  • 3. Ahmed A, Ho Q, Eisenstein J, et al. Unified Analysis of Streaming News [C]//Proceedings of the 20th International Conference on World Wide Web. New York, USA: ACM Press, 2011: 267-276.
  • 4. Allan J, Bolivar A, Connell M, et al. UMass TDT 2003 Research Summary [C]//Proceedings of Workshop on Topic Detection and Tracking. Berlin, Germany: Springer, 2003: 1-4.
  • 5. Levow G A, Oard D W. Signal Boosting for Translingual Topic Tracking [M]. Berlin, Germany: Springer, 2002.
  • 6. Jin H, Schwartz R, Sista S, et al. Topic Tracking for Radio, TV Broadcast and Newswire [C]//Proceedings of DARPA Broadcast News Workshop. Budapest, Hungary: [s. n.], 1999: 199-204.
  • 7. Leek T, Jin H, Sista S, et al. The BBN Crosslingual Topic Detection and Tracking System [C]//Proceedings of Topic Detection and Tracking Workshop. Berlin, Germany: Springer, 1999: 214-221.
  • 8. Xu Ge, Wang Houfeng. The Development of Topic Models in Natural Language Processing [J]. Chinese Journal of Computers, 2011, 34(8): 1423-1436. (Cited by: 236)
  • 9. Ni Xiaochuan, Sun Jiantao, Hu Jian, et al. Mining Multilingual Topics from Wikipedia [C]//Proceedings of the 18th International Conference on World Wide Web. New York, USA: ACM Press, 2009: 1155-1156.
  • 10. Mimno D, Wallach H M, Naradowsky J, et al. Polylingual Topic Models [C]//Proceedings of Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2009: 880-889.
