Abstract
To further improve the precision of semantic search and the user experience, a semantic search approach based on multi-classification semantic analysis (MSA) and personalization is presented. First, the target documents are transformed into vectors with a modified MSA method, and a term vector database (TVDB) is built. Then, the documents are classified by a support vector machine (SVM) and written into a tag index together with their categories. During search, guided by the TVDB, the user's search history and personal information are used to optimize the search results. The experimental results show that the average precision, the average discounted cumulative gain (DCG) and the average normalized discounted cumulative gain (nDCG) obtained with this approach are 0.7, 7.267 and 0.890, respectively, which are 31%, 36% and 19% higher than the mean of the results obtained with the Lucene method and the Yahoo Directory method. In terms of time cost, the average time per query is 0.669 s, only 0.326 s more than with the Lucene method. The approach therefore improves the precision and overall relevance of semantic search at a small additional time cost.
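For readers unfamiliar with the ranking metrics reported above, the following is a minimal sketch of how DCG and nDCG are commonly computed for one ranked result list. It assumes the widely used form with gain equal to the graded relevance and a log2(rank + 1) discount; the abstract does not state which variant the paper uses, and the function names and the sample relevance grades are illustrative only, not taken from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance scores.

    Assumed variant: gain = relevance, discount = log2(rank + 1), ranks from 1.
    """
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same grades in ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades (0-3) for the top six results of one query.
ranked = [3, 2, 3, 0, 1, 2]
print(f"DCG  = {dcg(ranked):.3f}")
print(f"nDCG = {ndcg(ranked):.3f}")
```

Averaging these per-query values over a query set gives average DCG and nDCG figures of the kind quoted in the abstract.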
Source
《东南大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core
2014, No. 2, pp. 261-265 (5 pages)
Journal of Southeast University:Natural Science Edition
Funding
National Natural Science Foundation of China (61001197, 61372182)
State Grid Corporation of China Science and Technology Project (522722130292)