Abstract
This paper proposes a method for modeling users' personalized requirements based on the vector space model. The keyword weighting algorithm is improved: a web page is divided into four kinds of logical sections, and a keyword's overall weight is obtained as a weighted combination of its weights in each kind of section. Following a content-based construction principle and a feedback principle, construction of the user profile is divided into a training stage and an adaptive learning stage. In the training stage, an initial user profile is trained from the sample documents and keywords supplied by the user, using a class-centroid text categorization algorithm. In the adaptive learning stage, a periodic adaptive learning mechanism based on the Rocchio algorithm adjusts the user profile according to the user's evaluation of the filtering results, which improves the system's ability to track changes in the user's personalized requirements. A prototype personalized information filtering system was developed. Its filtering performance was tested against the Baidu search engine, using China Clothing Network (中国服装网) as the experimental data source. The experimental results show that the system updates its index more promptly, responds faster, and returns information that is more accurate, more reasonable, and closer to the user's actual needs.
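The three steps named in the abstract (section-weighted keyword weights, a class-centroid initial profile, and Rocchio-style feedback) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the four logical sections or give any coefficients, so the section names, `SECTION_WEIGHTS`, and the `alpha`, `beta`, and `gamma` defaults below are hypothetical, and normalized term frequency stands in for whatever per-section weight the paper actually uses.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

# Hypothetical section names and weights; the abstract only says "four kinds
# of logical sections" and does not specify them.
SECTION_WEIGHTS = {"title": 0.4, "headings": 0.3, "anchor": 0.2, "body": 0.1}


def section_weighted_vector(sections: Dict[str, List[str]]) -> Vector:
    """Combine per-section normalized term frequencies into one weight vector."""
    vec: Vector = {}
    for name, tokens in sections.items():
        if not tokens:
            continue
        weight = SECTION_WEIGHTS.get(name, 0.0)
        for term, count in Counter(tokens).items():
            vec[term] = vec.get(term, 0.0) + weight * count / len(tokens)
    return vec


def centroid(vectors: List[Vector]) -> Vector:
    """Class centroid: the component-wise mean of a list of sparse vectors."""
    if not vectors:
        return {}
    total: Vector = {}
    for v in vectors:
        for term, w in v.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def rocchio_update(profile: Vector, relevant: List[Vector],
                   non_relevant: List[Vector], alpha: float = 1.0,
                   beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """One Rocchio feedback step; non-positive weights are dropped."""
    pos, neg = centroid(relevant), centroid(non_relevant)
    updated: Vector = {}
    for term in set(profile) | set(pos) | set(neg):
        w = (alpha * profile.get(term, 0.0)
             + beta * pos.get(term, 0.0)
             - gamma * neg.get(term, 0.0))
        if w > 0:
            updated[term] = w
    return updated


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, usable to score a document against the profile."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

In this sketch the initial profile would be `centroid(...)` over the vectors of the user's sample documents, and `rocchio_update` would be called once per feedback period with the documents the user rated as relevant or not relevant. A document is then kept or filtered out by comparing `cosine(profile, doc_vector)` with a threshold, which is also an assumption here.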
Source
Computer & Digital Engineering (《计算机与数字工程》)
2014, No. 10, pp. 1940-1944, 1990 (6 pages in total)
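Funding
Supported by:
Zhejiang Province Philosophy and Social Sciences Planning Project "Exploring Methods for Extracting Knowledge Genes Based on Patent Citation Networks" (No. 13NDJC19YBM)
Zhejiang Province Soft Science Research Program Project "Enhancing Enterprises' Independent Innovation Capability under Technical Standards: Based on the Formation and Management of Patent Pools" (No. 2013C35064)
Zhejiang Province Human Resources and Social Security Scientific Research Project "Research on a Public Information Service Platform for Knowledge Innovation under Technical Standards" (No. R2013A012)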