Using Text Space Representation Model in Text Similarity Computing (Chinese title: 基于文本空间表示模型的文本相似度计算研究) Cited by: 4
Abstract Building on an analysis of existing text representation methods and their shortcomings, this study proposes the text space representation model, a text representation method that decomposes a text hierarchically into paragraphs, sentences, and words. On the basis of this model, it explores an algorithm for computing text similarity that takes the paragraph as the basic unit, with the goal of detecting similar texts. Finally, a test set is constructed and detection experiments are run on it; the results indicate that the method is effective at finding similar texts.
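Source 《现代情报》 (Journal of Modern Information), CSSCI, 2013, No. 2, pp. 21-23, 124 (4 pages in total)
Funding One of the research outputs of the Hubei Provincial Department of Education humanities and social sciences project "Research on Library Information Service Models in a Cloud Computing Environment" (2012Q190)
Keywords text similarity; text space representation model; paragraph; algorithm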
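
The abstract names a three-level hierarchy (paragraphs, sentences, words) and a paragraph-level similarity computation, but it does not give the model's details or the similarity formula. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: paragraphs are assumed to be separated by blank lines, words are assumed to be whitespace-delimited (Chinese text would need a word segmenter instead), each paragraph is compared by cosine similarity over raw word counts, and the document score is the average best-match paragraph score. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def build_text_space(text):
    """Return a nested list: paragraphs -> sentences -> lowercase words.

    Assumption: blank lines separate paragraphs, and sentences end with
    ASCII or full-width terminal punctuation.
    """
    space = []
    for para in re.split(r"\n\s*\n", text.strip()):
        sentences = [s for s in re.split(r"[.!?。!?]+", para) if s.strip()]
        words = [re.findall(r"\w+", s.lower()) for s in sentences]
        words = [w for w in words if w]
        if words:
            space.append(words)
    return space


def paragraph_vector(paragraph):
    """Flatten one paragraph's sentences into a bag-of-words count vector."""
    return Counter(w for sentence in paragraph for w in sentence)


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def text_similarity(text_a, text_b):
    """Average, over paragraphs of text_a, of the best match in text_b.

    Assumed aggregation rule; note that it is not symmetric in its arguments.
    """
    paras_a = [paragraph_vector(p) for p in build_text_space(text_a)]
    paras_b = [paragraph_vector(p) for p in build_text_space(text_b)]
    if not paras_a or not paras_b:
        return 0.0
    best = [max(cosine(pa, pb) for pb in paras_b) for pa in paras_a]
    return sum(best) / len(best)
```

Working at the paragraph level means a document that copies a single paragraph still yields one high paragraph score, which a whole-document bag of words could dilute. How the published method weights sentences and words inside each paragraph is not stated in the abstract.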