Abstract
Traditional recommender systems based on collaborative filtering must collect data on users' interests and preferences, which to some extent touches on personal privacy. Information security and privacy protection are currently among the active topics in data mining. To avoid the security problems caused by leaks of user information, this paper proposes a short-text recommendation method that incorporates a sentential semantic structure model. The method takes the sentence as its unit of analysis. First, an LDA topic model is used to build an article-topic matrix and partition the articles into subtopics. Next, the sentential semantic structure model extracts the semantic cases of each sentence to obtain its semantic features. Building on the LDA topic model, the sentential semantic structure is then used to compute the pairwise semantic similarity between sentences and to construct a similarity matrix, which yields the sentences' relational features. The semantic and relational features are combined by weighting to give each sentence a weight; each article is scored by the highest weight of any single sentence it contains; and the articles are ranked by that score and recommended in order after redundant items are removed. At a compression ratio of 0.5%, ROUGE-1 reaches 31.388% and ROUGE-SU* reaches 15.701%. The experimental results indicate that sentence-level short-text recommendation enriches the feature information of the text and deepens the semantic analysis. Because no user information is collected during data processing, the method avoids security problems such as user information leaks and allows text to be recommended to users more safely and quickly.
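The ranking stage described in the abstract (combine sentence features into a weight, score each article by its best sentence, rank, then drop redundant items) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the paper's formulation: the per-sentence semantic score and topic vector are taken as precomputed inputs (standing in for the sentential semantic structure model and LDA outputs), cosine similarity stands in for the semantic similarity measure, the relational feature is taken to be the mean similarity to all other sentences, and `alpha` and `redundancy_threshold` are hypothetical parameters.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_weights(sentences, alpha=0.5):
    """Weight each sentence from its semantic score and its relational score.

    sentences: list of (semantic_score, topic_vector) pairs.
    The relational score is assumed to be the mean similarity to the other
    sentences, i.e. the row mean of the similarity matrix without the diagonal.
    """
    weights = []
    for i, (semantic, vec) in enumerate(sentences):
        sims = [cosine(vec, other) for j, (_, other) in enumerate(sentences) if j != i]
        relational = sum(sims) / len(sims) if sims else 0.0
        weights.append(alpha * semantic + (1 - alpha) * relational)
    return weights


def recommend(articles, alpha=0.5, redundancy_threshold=0.9):
    """Rank articles by their highest sentence weight and skip near-duplicates.

    articles: dict mapping article id -> list of (semantic_score, topic_vector).
    Returns article ids in recommendation order.
    """
    # Pool sentences from all articles, remembering which article owns each one.
    pooled = [(aid, sem, vec) for aid, sents in articles.items() for sem, vec in sents]
    weights = sentence_weights([(sem, vec) for _, sem, vec in pooled], alpha)

    # Article score = weight of its best sentence; keep that sentence's vector.
    best = {}
    for (aid, _, vec), w in zip(pooled, weights):
        if aid not in best or w > best[aid][0]:
            best[aid] = (w, vec)

    ranked = sorted(best, key=lambda aid: best[aid][0], reverse=True)

    # Greedy redundancy removal: drop an article whose best sentence is too
    # similar to the best sentence of an article that was already selected.
    selected = []
    for aid in ranked:
        vec = best[aid][1]
        if all(cosine(vec, best[s][1]) < redundancy_threshold for s in selected):
            selected.append(aid)
    return selected


if __name__ == "__main__":
    sample = {
        "a1": [(0.9, [1.0, 0.0]), (0.2, [0.8, 0.2])],
        "a2": [(0.85, [1.0, 0.0])],
        "a3": [(0.5, [0.0, 1.0])],
    }
    print(recommend(sample))
```

No user profile appears anywhere in the inputs, which mirrors the privacy argument of the abstract: the ranking depends only on the text itself.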
Source
Journal of Information Security Research (《信息安全研究》)
2015, No. 1, pp. 67-73 (7 pages)
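Funding
National 242 Information Security Program (2005C48)
Beijing Institute of Technology Science and Technology Innovation Program, Special Fund for Cultivating Major Projects (2011CX01015)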
Keywords
microblog
short text recommendation
topic model
natural language processing
information security