
Research on Short Text Recommendation Merging Sentential Semantic Structure Model (cited by: 1)
Abstract: Traditional recommender systems based on collaborative filtering need to collect data about users' interests and preferences, which to some extent touches on personal privacy. Information security and privacy protection are currently among the active topics in data mining. To avoid the security problems caused by leakage of user information, this paper proposes a short text recommendation method that incorporates a sentential semantic structure model. The method takes the sentence as its unit of analysis. It first uses the LDA topic model to build an article-topic matrix and divide the texts into subtopics. It then uses the sentential semantic structure model to extract the semantic cases of each sentence, which yields the sentence's semantic features. Building on the LDA topic model, the sentential semantic structure is used to compute the pairwise semantic similarity between sentences and to construct a similarity matrix, from which the sentences' relational features are obtained. The semantic and relational features are combined by weighting to give each sentence a weight, and the weight of an article is taken to be the highest weight of any single sentence in it. All articles are ranked by weight and, after redundancy removal, recommended in ranked order. At a compression ratio of 0.5%, ROUGE-1 reaches 31.388% and ROUGE-SU* reaches 15.701%. The experimental results indicate that sentence-level short text recommendation enriches the feature information of the text and deepens the level of semantic analysis. Because no user information is collected during data processing, the method avoids security problems such as user information leakage and recommends texts to users more safely and quickly.
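The ranking stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sentence already has a semantic score and an LDA-style topic vector, and it substitutes plain cosine similarity for the paper's semantic-structure-based similarity. The function names, the mixing parameter `alpha`, the mean-similarity relational feature, and `redundancy_threshold` are all hypothetical choices made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_articles(sentences, alpha=0.5, redundancy_threshold=0.95):
    """Rank articles by their best sentence and drop near-duplicates.

    sentences: list of (article_id, semantic_score, topic_vector) tuples.
    Returns article ids in recommendation order.
    """
    n = len(sentences)
    vectors = [vec for _, _, vec in sentences]
    # Pairwise similarity matrix over all sentences.
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # Article weight = highest sentence weight; remember that sentence's vector.
    best = {}
    for i, (article_id, semantic, vec) in enumerate(sentences):
        # Relational feature (assumed): mean similarity to every other sentence.
        relational = (sum(sim[i]) - sim[i][i]) / (n - 1) if n > 1 else 0.0
        weight = alpha * semantic + (1 - alpha) * relational
        if article_id not in best or weight > best[article_id][0]:
            best[article_id] = (weight, vec)

    # Sort all articles by weight, highest first.
    ranked = sorted(best.items(), key=lambda item: item[1][0], reverse=True)

    # Redundancy removal (assumed): skip an article whose top sentence is
    # nearly identical to the top sentence of an already selected article.
    selected, kept_vectors = [], []
    for article_id, (_, vec) in ranked:
        if any(cosine(vec, kept) >= redundancy_threshold for kept in kept_vectors):
            continue
        selected.append(article_id)
        kept_vectors.append(vec)
    return selected
```

With equal weighting (`alpha=0.5`), an article whose best sentence duplicates the topic vector of a higher-ranked article is dropped, so the recommendation list covers distinct subtopics. Nothing in the sketch uses per-user data, which matches the privacy argument in the abstract.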
Source: Journal of Information Security Research (《信息安全研究》), 2015, No. 1, pp. 67-73 (7 pages)
Funding: National 242 Information Security Program (2005C48); Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Fund (2011CX01015)
Keywords: microblog; short text recommendation; topic model; natural language processing; information security
