Abstract
A log mining and personalization system was designed and developed to exploit the characteristics of mobile search and government websites together with the strengths of Hadoop in big data processing. First, Flume and HDFS are used to collect and store massive volumes of logs, providing the data source and the programming interface for log mining. Second, MapReduce is used to analyze the logs efficiently: from the tags and navigation bars of search result pages, a vector space model of web pages and a user interest model are built. Third, based on the user interest model, the K-means clustering algorithm, again run with MapReduce, groups users with similar interests into interest groups. Finally, the distance between a search result page and the user's interest group is computed to judge whether the user is interested in that page; the system then adjusts the ranking of search results and pushes new pages to the user accordingly, thereby implementing personalized search and push functions.
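The clustering and re-ranking steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the paper runs these steps with MapReduce on Hadoop, whereas this example uses plain Python. The tag vocabulary, the sample click data, the term-frequency weighting, the Euclidean distance, and the farthest-first centroid initialization are all assumptions chosen for illustration.

```python
import math

def tag_vector(tags, vocabulary):
    """Term-frequency vector of a page's tags over a fixed vocabulary, scaled to unit length."""
    vec = [float(tags.count(term)) for term in vocabulary]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def user_interest(clicked_pages, vocabulary):
    """User interest vector: mean of the tag vectors of the pages the user clicked."""
    vecs = [tag_vector(tags, vocabulary) for tags in clicked_pages]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: distance(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=20):
    """Plain K-means with a deterministic farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

def rerank(results, centroid, vocabulary):
    """Order search results by distance to the interest-group centroid, closest first."""
    return sorted(results,
                  key=lambda r: distance(tag_vector(r["tags"], vocabulary), centroid))

# Hypothetical data: each user is a list of clicked pages, each page a list of tags.
vocabulary = ["tax", "health", "education"]
clicks = [
    [["tax", "tax"], ["tax", "education"]],
    [["tax"]],
    [["health"], ["health", "education"]],
    [["health", "health"]],
]
users = [user_interest(pages, vocabulary) for pages in clicks]
centroids, labels = kmeans(users, k=2)

results = [
    {"title": "Clinic hours", "tags": ["health"]},
    {"title": "Tax filing guide", "tags": ["tax"]},
    {"title": "School enrollment", "tags": ["education"]},
]
# Re-rank the results for the first user, using that user's interest group.
ranked = rerank(results, centroids[labels[0]], vocabulary)
print(labels)
print([r["title"] for r in ranked])
```

In this sketch a smaller distance is read as stronger interest, so sorting by distance plays the role of the ranking adjustment. A distance threshold could be added to decide which pages to push, but the abstract gives no such value.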
Source
《科技导报》
CAS
CSCD
PKU Core Journal (北大核心)
2014, Issue 36, pp. 110-116 (7 pages)
Science & Technology Review