Abstract
Topic drift and word mismatch are difficult problems in natural language processing, and combining text mining with information retrieval can help to address them. This paper therefore proposes a Vietnamese-English cross-language (VECL) query post-translation expansion algorithm based on all-weighted positive and negative association pattern mining. The algorithm uses a new method for computing the support and correlation degree of all-weighted positive and negative itemsets, together with a pattern evaluation framework, to mine positive and negative association patterns related to the original query terms from the user relevance feedback document set of the initial VECL retrieval. Expansion terms are then extracted from these patterns to carry out post-translation query expansion. Compared with existing cross-language query expansion algorithms based on pseudo relevance feedback and weighted association pattern mining, the proposed algorithm effectively reduces query topic drift and word mismatch and improves cross-language information retrieval performance. The pattern mining method can also be used in recommender systems to improve their accuracy.
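The abstract does not give the support or correlation formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes each feedback document is a mapping from term to weight (for example tf-idf), uses a simplified weighted support (the itemset's weight in the documents that contain it, divided by total weight times itemset size), and replaces the paper's correlation measure and evaluation framework with a single hypothetical threshold `min_support`. The names `weighted_support` and `expansion_terms` are illustrative.

```python
def weighted_support(itemset, docs):
    """Simplified weighted support (an assumption, not the paper's formula):
    weight of the itemset's terms in documents containing all of them,
    divided by (total weight of all terms * itemset size)."""
    items = set(itemset)
    total = sum(sum(doc.values()) for doc in docs)
    if not items or total == 0:
        return 0.0
    hit = sum(
        sum(doc[t] for t in items)
        for doc in docs
        if all(t in doc for t in items)
    )
    return hit / (total * len(items))


def expansion_terms(query_terms, docs, min_support=0.2):
    """Split candidate terms into positive and negative lists.

    Positive: the term is frequent together with some query term.
    Negative: the term is frequent on its own but not with any query term.
    Assumes query_terms is non-empty.
    """
    candidates = set().union(*docs) - set(query_terms)
    positive, negative = [], []
    for term in sorted(candidates):
        joint = max(weighted_support((q, term), docs) for q in query_terms)
        if joint >= min_support:
            positive.append(term)
        elif weighted_support((term,), docs) >= min_support:
            negative.append(term)
    return positive, negative


# Toy feedback set with made-up weights, for illustration only.
docs = [
    {"economy": 2.0, "trade": 1.0, "export": 1.0},
    {"economy": 1.0, "trade": 2.0},
    {"football": 3.0, "league": 2.0},
]
positive, negative = expansion_terms(["economy"], docs)
```

In this toy run, positive terms would be appended to the translated query, while negative terms could be used to filter out candidates that risk topic drift; how the paper actually combines the two is not stated in the abstract.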
Authors
黄名选
蒋曹清
HUANG Ming-xuan; JIANG Cao-qing (Guangxi Key Laboratory Cultivation Base of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China; School of Information and Statistics, Guangxi University of Finance and Economics, Nanning, Guangxi 530003, China)
Source
《电子学报》
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2018, No. 12, pp. 3029-3036 (8 pages)
Acta Electronica Sinica
Funding
National Natural Science Foundation of China (Nos. 61762006, 61662003, 61262028)
Keywords
natural language processing
information retrieval
text mining
pattern mining
query expansion
recommender system