
Document clustering based on association relations between terms
Cited by: 1
Abstract: Existing vector space models ignore the semantic relationships between terms when representing documents. To address this shortcoming, this paper proposes a new document vector representation method based on association rules. Within the generalized vector space model, the frequent co-occurrence of terms is analyzed to obtain term co-occurrence semantics. Association rules are then used to analyze the correlation between terms and to mine the latent association semantics between terms in a document. Documents are represented by a linear weighting of the co-occurrence semantics and the association semantics. Experimental results show that, compared with the BOW model and the GVSM model, document clustering with the association-rule-based vector representation is more accurate.
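Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2016, No. 7, pp. 86-90 (5 pages)
Funding: National Youth Science Foundation of China (No.61003162); General Project of the Education Department of Liaoning Province (No.L2013131)
Keywords: document clustering; association relations; term co-occurrence; document similarity; latent semantics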
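The abstract describes the representation only in outline, so the following is a minimal sketch of the general idea rather than the paper's actual formulas. It assumes a cosine-style co-occurrence score over document sets as the co-occurrence semantics, association-rule confidence as the association semantics, and a hypothetical mixing weight `alpha` for the linear weighting. All function names and the toy corpus are illustrative.

```python
from itertools import combinations
from math import sqrt


def term_doc_sets(docs):
    """Map each term to the set of indices of the documents containing it."""
    index = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            index.setdefault(term, set()).add(i)
    return index


def cooccurrence_similarity(index):
    """Assumed co-occurrence semantics: cosine over binary document sets."""
    terms = sorted(index)
    sim = {(t, t): 1.0 for t in terms}
    for a, b in combinations(terms, 2):
        both = len(index[a] & index[b])
        score = both / sqrt(len(index[a]) * len(index[b]))
        sim[(a, b)] = sim[(b, a)] = score
    return sim


def association_confidence(index, n_docs, min_support=0.0):
    """Assumed association semantics: confidence(a -> b) = supp(a, b) / supp(a).

    Rules whose support falls below min_support get a score of 0.
    """
    conf = {}
    for a in index:
        for b in index:
            both = len(index[a] & index[b])
            support = both / n_docs
            conf[(a, b)] = both / len(index[a]) if support >= min_support else 0.0
    return conf


def represent(doc, index, cooc, assoc, alpha=0.5):
    """Spread each term's frequency over related terms.

    The relatedness of s to t is alpha * cooc + (1 - alpha) * assoc,
    where alpha is a hypothetical tuning parameter.
    """
    counts = {t: doc.count(t) for t in set(doc)}
    vector = []
    for t in sorted(index):
        weight = sum(
            tf * (alpha * cooc.get((s, t), 0.0) + (1 - alpha) * assoc.get((s, t), 0.0))
            for s, tf in counts.items()
        )
        vector.append(weight)
    return vector


# Toy corpus (illustrative only): each document is a list of terms.
docs = [["cluster", "document", "term"], ["document", "term"], ["cluster", "rule"]]
index = term_doc_sets(docs)
cooc = cooccurrence_similarity(index)
assoc = association_confidence(index, len(docs))
vectors = [represent(doc, index, cooc, assoc) for doc in docs]
```

The resulting `vectors` could then be passed to any standard clustering routine. In this sketch, a document gains weight on terms it does not contain but that are related to its own terms, which is the effect a plain BOW vector cannot express.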

