
News Recommendation Technology Combining Semantic Analysis with TF-IDF Method (cited by: 11)
Abstract: In news recommender systems, TF-IDF weighting is commonly combined with the cosine similarity measure. This technique, however, does not take the actual semantics of the text into account. The paper therefore proposes a new method that combines content-based analysis with semantic analysis. The method combines the inverse document frequency of synonym sets (synsets) with semantic similarity, using WordNet synsets for the similarity computation. User profiles were constructed for experimental testing, which verified the effectiveness of the method. The experimental results show that the proposed semantic method outperforms the TF-IDF method.
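The contrast the abstract draws can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `tf_idf_vectors`, `cosine`, `to_synsets`, and the `TOY_SYNSETS` table are hypothetical, and the table is a hand-made stand-in for WordNet synsets. The paper's semantic similarity measure is not reproduced here. The sketch only shows why term-level TF-IDF with cosine similarity scores two synonymous texts as unrelated, while weighting synonym sets instead of raw terms does not.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy synonym table; a real system would look up WordNet synsets.
TOY_SYNSETS = {
    "car": "auto.n", "automobile": "auto.n",
    "buy": "buy.v", "purchase": "buy.v",
}

def to_synsets(doc):
    """Map each token to its synonym-set identifier, or keep the token."""
    return [TOY_SYNSETS.get(tok, tok) for tok in doc]

# A user profile and two candidate news items, already tokenized.
profile = ["buy", "car", "dealer"]
news_a = ["purchase", "automobile", "market"]   # same topic, different words
news_b = ["football", "match", "score"]         # unrelated topic
docs = [profile, news_a, news_b]

term_vecs = tf_idf_vectors(docs)
syn_vecs = tf_idf_vectors([to_synsets(d) for d in docs])

term_score_a = cosine(term_vecs[0], term_vecs[1])  # no shared terms
syn_score_a = cosine(syn_vecs[0], syn_vecs[1])     # shared synonym sets
syn_score_b = cosine(syn_vecs[0], syn_vecs[2])     # still unrelated

print(term_score_a, syn_score_a, syn_score_b)
```

With term-level weights the profile and `news_a` share no vocabulary, so their similarity is zero. After mapping tokens to synonym sets they overlap, while the unrelated `news_b` stays at zero.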
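Authors: Zhou You (周由), Dai Muhong (戴牡红)
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2013, No. 11A, pp. 267-269, 300 (4 pages)
Funding: Supported by the Natural Science Foundation of Hunan Province (2011FJ3034)
Keywords: news recommendation system; semantic analysis; semantic similarity; WordNet synset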


