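Abstract
General-purpose text search engines rely on keyword matching, which makes search relatively inefficient. To address this problem, and building on an analysis of measures of semantic relatedness, this paper uses the information contained in Wikipedia's rich link structure to propose a topic search strategy based on link-structure analysis. An entry relatedness algorithm is designed to describe the distance between terms and to re-rank entries by relatedness. The experiments introduce a user evaluation mechanism and compare the results with those of a traditional search strategy. The results show that the strategy broadens topic coverage while maintaining a high degree of user-intent recognition.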
Current text search engines often have low search efficiency because they rely on keyword matching. Based on a comparison of previous work, a thematic search strategy is proposed that exploits the rich information implied by the link structure of Wikipedia. The strategy defines a measure of distance between words, DBW, underpinned by a computational model of thematic communities. The algorithm is used to rank and reorder keywords, to discover the closest keyword clusters, and to improve the quality of search results. In the experiments, a user appraisal mechanism is introduced and the results are compared with those of traditional search engines. The results show that the strategy expands thematic coverage while maintaining a high degree of user-intent recognition.
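The abstract does not define DBW or the thematic communities model, so the following is only a minimal sketch of the general idea of a link-based distance between entries, followed by re-ranking. It substitutes a plain Jaccard distance over link neighbourhoods for the paper's measure; the toy link graph `LINKS` and the function names `link_distance` and `rank_related` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy link graph: entry -> set of entries it links to.
# A real graph would be extracted from Wikipedia's link structure.
LINKS: Dict[str, Set[str]] = {
    "Search engine": {"Information retrieval", "Web crawler", "Index"},
    "Information retrieval": {"Search engine", "Index", "Relevance"},
    "Web crawler": {"Search engine", "Index", "Hyperlink"},
    "Jaguar": {"Cat", "South America"},
}


def link_distance(a: str, b: str, links: Dict[str, Set[str]]) -> float:
    """Jaccard distance between the link neighbourhoods of two entries.

    Assumption: this is a stand-in for the paper's DBW measure, not DBW itself.
    Each entry is included in its own neighbourhood, so directly linked
    entries share at least those two members. 0.0 = identical, 1.0 = disjoint.
    """
    na = links.get(a, set()) | {a}
    nb = links.get(b, set()) | {b}
    return 1.0 - len(na & nb) / len(na | nb)


def rank_related(query: str, links: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return all other entries ordered from closest to farthest from `query`."""
    scored = [(e, link_distance(query, e, links)) for e in links if e != query]
    # Sort by distance, breaking ties alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    for entry, dist in rank_related("Search engine", LINKS):
        print(f"{entry}: {dist:.2f}")
```

In this toy graph the entries that share links with "Search engine" rank ahead of the unrelated "Jaguar" entry, which is the kind of re-ordering by relatedness that the abstract describes at a high level.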
Source
《北京工业大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2011, No. 4, pp. 614-618, 623 (6 pages)
Journal of Beijing University of Technology
Funding
Supported by the National Natural Science Foundation of China (70671007)
Keywords
Wikipedia
network clustering
knowledge discovery in databases (KDD)