Abstract
User interest ontologies address a weakness of keyword-based user interest models, which cannot express user interests at the semantic level. Most such ontologies, however, are built from a domain ontology, which makes it hard to capture the multiple facets of a user's interests or the user's latent interests, and constructing the domain ontology is itself difficult. This paper therefore proposes a method for constructing a user interest ontology based on word co-occurrence. The user's set of interesting Web pages is identified from Web browsing records and converted through data processing into a set of user interest texts. Concepts are extracted using TFIDF as the indicator, relationships between concepts are extracted from word co-occurrence statistics, and a scale-free K-medoids clustering algorithm is used to adjust the result. Merging the ontologies of related users then yields a multi-user ontology, which reflects user interests more completely at the semantic level and helps discover latent interests.
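The concept and relation extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy corpus, the choice of a whole document as the co-occurrence window, and the top-k cutoff are all assumptions made for illustration, and the K-medoids adjustment and multi-user merging steps are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_scores(docs):
    """Return the highest TF-IDF score each term reaches in any document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    best = {}
    for doc in docs:
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), tf * idf)
    return best


def extract_concepts(docs, top_k):
    """Keep the top_k terms by TF-IDF as candidate concepts (hypothetical cutoff)."""
    scores = tfidf_scores(docs)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def cooccurrence(docs, concepts):
    """Count, for each concept pair, the documents in which both appear."""
    concept_set = set(concepts)
    pairs = Counter()
    for doc in docs:
        present = sorted(concept_set & set(doc))
        pairs.update(combinations(present, 2))
    return pairs


# Toy, already tokenized "user interest texts" (invented for illustration).
docs = [
    ["ontology", "semantic", "web", "ontology"],
    ["ontology", "semantic", "retrieval"],
    ["football", "league", "football"],
    ["football", "league", "score"],
]
concepts = extract_concepts(docs, top_k=4)
relations = cooccurrence(docs, concepts)
```

Concept pairs with high co-occurrence counts would be candidates for relations in the ontology; a clustering step such as K-medoids could then regroup the concepts, as the abstract describes.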
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
PKU Core Journal
2012, No. 5, pp. 99-102 (4 pages)
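Funding
Major Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education: "Decision-Oriented Research on Enterprise Information Resource Integration" (Project No. 2009JJD870002)
One of the outcomes of the Humanities and Social Sciences Research Project of the Ministry of Education: "Research on Enterprise Information Resource Integration" (Project No. 2008JA870013)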
Keywords
user interest
ontology construction
word co-occurrence