Abstract
To correctly understand retrieval intent and objectively express users' subjective information, this paper partitions users' textual retrieval requests by exploiting the conditional random field (CRF) model's high semantic discrimination rate and ambiguity-resolution rate. The context of each keyword is selected as a feature so that richer information is captured. On this basis, a CRF-based algorithm for partitioning text retrieval requests, CRF_Q, is proposed to clearly delimit the boundary between two consecutive retrieval terms. In experiments combining two attributes, anchor-text similarity and retrieval-term similarity, the decision tree model and the CRF_Q algorithm performed best, and the comprehensive evaluation index of CRF_Q was 4.4% higher than that of the decision tree model.
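The abstract frames query partitioning as sequence labeling. A CRF assigns a label to each token, and a term boundary falls wherever a new term begins. The Python sketch below shows only the decoding side of a linear-chain CRF, and it rests on several assumptions. The B/I label set, the ±1 context-window feature template, the hand-set weights, and every function name (`context_features`, `viterbi`, `to_terms`) are hypothetical illustrations, not the paper's CRF_Q model. Training the weights, for example by maximizing conditional log-likelihood on annotated queries, is omitted.

```python
"""Minimal linear-chain CRF decoder for splitting a query into retrieval terms.

Hypothetical sketch: the B/I label set, the +/-1 context-window feature
template and the hand-set weights are illustrative assumptions, not the
CRF_Q model from the paper. Weight training is omitted.
"""
from typing import Dict, List, Optional, Tuple

LABELS = ("B", "I")  # B: token begins a retrieval term; I: token continues it

Weights = Dict[Tuple[str, str], float]      # (feature, label) -> weight
Transitions = Dict[Tuple[str, str], float]  # (previous label, label) -> weight


def context_features(tokens: List[str], i: int) -> List[str]:
    """Features for position i: a bias, the token, and its +/-1 context window."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "bias",
        f"w0={tokens[i]}",
        f"w-1={prev_tok}",
        f"w+1={next_tok}",
        f"w-1w0={prev_tok}|{tokens[i]}",
    ]


def emission(feats: List[str], label: str, weights: Weights) -> float:
    """Sum of weights of the active features under a given label."""
    return sum(weights.get((f, label), 0.0) for f in feats)


def viterbi(tokens: List[str], weights: Weights, trans: Transitions) -> List[str]:
    """Return the highest-scoring B/I label sequence for the tokens."""
    n = len(tokens)
    if n == 0:
        return []
    feats = [context_features(tokens, i) for i in range(n)]
    neg_inf = float("-inf")
    score = [{y: neg_inf for y in LABELS} for _ in range(n)]
    back: List[Dict[str, Optional[str]]] = [{y: None for y in LABELS} for _ in range(n)]
    # A query cannot start in the middle of a term, so position 0 is forced to B.
    score[0]["B"] = emission(feats[0], "B", weights)
    for i in range(1, n):
        for y in LABELS:
            # Best previous label given the transition score into y.
            prev, best = max(
                ((yp, score[i - 1][yp] + trans.get((yp, y), 0.0)) for yp in LABELS),
                key=lambda pair: pair[1],
            )
            score[i][y] = best + emission(feats[i], y, weights)
            back[i][y] = prev
    # Trace the best path backwards from the highest-scoring final label.
    path = [max(LABELS, key=lambda y: score[n - 1][y])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


def to_terms(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens into terms: every B opens a new term, I extends the last one."""
    terms: List[List[str]] = []
    for tok, lab in zip(tokens, labels):
        if lab == "B" or not terms:
            terms.append([tok])
        else:
            terms[-1].append(tok)
    return [" ".join(t) for t in terms]


# Hypothetical hand-set weights; a real model would learn them from labeled queries.
DEMO_WEIGHTS: Weights = {
    ("bias", "B"): 0.5,
    ("w-1w0=machine|learning", "I"): 2.0,
    ("w-1w0=conditional|random", "I"): 2.0,
    ("w-1w0=random|field", "I"): 2.0,
}
DEMO_TRANS: Transitions = {("I", "I"): -0.5}  # mildly discourage very long terms

if __name__ == "__main__":
    query = "machine learning conditional random field tutorial".split()
    labels = viterbi(query, DEMO_WEIGHTS, DEMO_TRANS)
    print(labels)                   # ['B', 'I', 'B', 'I', 'I', 'B']
    print(to_terms(query, labels))  # ['machine learning', 'conditional random field', 'tutorial']
```

The pair feature `w-1w0` ties each token to its left neighbour, so the model can learn that "random" following "conditional" continues a term rather than starting one. This is one simple way to use keyword context as a feature. For character-segmented Chinese queries, `to_terms` would join with an empty string instead of a space.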
Source
Journal of Yangzhou University (Natural Science Edition) (《扬州大学学报(自然科学版)》)
CAS
Peking University Core Journal
2016, No. 4, pp. 47-49, 53 (4 pages)
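Funding
National Spark Program project (2011GA690190)
Philosophy and Social Science Research Project of Jiangsu Higher Education Institutions (2015SJD702)
Scientific Research Fund of Huaiyin Institute of Technology (HGC1422)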