Abstract
The research is decomposed into three subtasks. First, PageRank and TrustRank are applied to the web data to eliminate spam pages. Second, the TC-PageRank algorithm, whose weights combine the topic relevance between pages, the time difference, and the proportion of online reviews, extracts a set of pages that are highly relevant to the product topic and contain a large amount of online review data. Finally, taking into account how a page's similarity to the product topic and its link growth affect its authority, an improved HITS algorithm determines the set of authoritative pages that serve as data sources for online review analysis. Block-wise matrix computation based on MapReduce reduces the time and space complexity of the algorithms. Simulation experiments verify the feasibility and accuracy of the method.
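The block-wise matrix computation mentioned in the abstract can be illustrated with a minimal sketch. It uses standard PageRank by power iteration, not the paper's TC-PageRank, whose weights are not given in the abstract. The function name `pagerank_blocked`, the block size, and the damping factor of 0.85 are assumptions made for illustration, and the map and reduce steps run in a single process rather than on an actual MapReduce cluster.

```python
from collections import defaultdict


def pagerank_blocked(links, n, block=2, damping=0.85, iters=50):
    """Standard PageRank by power iteration (illustrative, hypothetical helper).

    The matrix-vector product is computed block by block in a
    map/reduce style. `links` is a list of (src, dst) page indices.
    """
    # Out-degree of every page that has at least one out-link.
    out_deg = defaultdict(int)
    for src, _dst in links:
        out_deg[src] += 1

    # Group the nonzero transition entries by (row block, column block).
    # Entry weight is 1 / outdegree(src), stored at row dst, column src.
    blocks = defaultdict(list)
    for src, dst in links:
        blocks[(dst // block, src // block)].append((dst, src, 1.0 / out_deg[src]))

    rank = [1.0 / n] * n
    for _ in range(iters):
        # "Map": each block emits partial sums for the rows it covers.
        partials = []
        for entries in blocks.values():
            acc = defaultdict(float)
            for dst, src, weight in entries:
                acc[dst] += weight * rank[src]
            partials.append(acc)

        # "Reduce": add up the partial sums per row.
        summed = [0.0] * n
        for acc in partials:
            for dst, value in acc.items():
                summed[dst] += value

        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_deg[i] == 0)
        rank = [
            (1.0 - damping) / n + damping * (summed[i] + dangling / n)
            for i in range(n)
        ]
    return rank


if __name__ == "__main__":
    # Toy graph: pages 0 -> 1 -> 2 -> 0 form a cycle, page 3 links to page 0.
    print(pagerank_blocked([(0, 1), (1, 2), (2, 0), (3, 0)], n=4))
```

Each block only needs the slice of the rank vector that matches its column range, which is why partitioning helps when the full matrix and vector do not fit on one worker.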
Source
《软科学》
CSSCI
Peking University Core Journal
2015, No. 4, pp. 94-99 (6 pages)
Soft Science
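Funding
National Natural Science Foundation of China (71302087)
Jiangsu Province Graduate Research and Innovation Program for Regular Universities (KYZZ_0287)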