Abstract
Online forums contain a large amount of useful information, and users can conveniently obtain the knowledge they need by searching forum data. However, the hierarchical structure of forum data poses a serious challenge to content retrieval. To address this problem, the paper proposes a multi-granularity search method based on a hierarchical scoring function. First, forum data are represented as a tree-shaped hierarchy, and a hierarchical scoring function that combines several factors is defined over four granularities: topics, posts, sentences, and words. Second, to keep results of different granularities from duplicating one another, a constrained model for maximizing the returned results is proposed. Finally, this maximization model is transformed into a maximum independent set problem, and a heuristic optimization algorithm is given. Experiments show that the proposed method retrieves forum data more efficiently and more accurately than related work.
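The non-overlap constraint described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's scoring function or its heuristic, so everything here is an assumption made for illustration: the `Node` class, the hand-assigned scores, and the greedy rule (take the highest-scoring unit, then reject any ancestor or descendant of a unit already taken). This is a common greedy heuristic for independent-set style selection on a tree, not necessarily the authors' algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """One unit of forum content (thread, post, or sentence). Hypothetical type."""
    name: str
    score: float  # assumed relevance score; the paper's scoring function is not shown
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach child under this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    """Yield every proper ancestor of node, nearest first."""
    current = node.parent
    while current is not None:
        yield current
        current = current.parent


def flatten(root):
    """Return all nodes of the tree in depth-first order."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(reversed(n.children))
    return out


def select_non_overlapping(root, k):
    """Greedily pick up to k nodes so that no pick contains another pick."""
    chosen, chosen_ids = [], set()
    blocked = set()  # ids of nodes that are ancestors of an already chosen node
    for node in sorted(flatten(root), key=lambda n: n.score, reverse=True):
        if len(chosen) == k:
            break
        if id(node) in blocked:
            continue  # node would contain a chosen result
        if any(id(a) in chosen_ids for a in ancestors(node)):
            continue  # node is contained in a chosen result
        chosen.append(node)
        chosen_ids.add(id(node))
        blocked.update(id(a) for a in ancestors(node))
    return chosen


# Toy forum tree with made-up scores.
thread = Node("thread", 0.5)
post1 = thread.add(Node("post1", 0.9))
post2 = thread.add(Node("post2", 0.6))
post1.add(Node("s1", 0.95))
post1.add(Node("s2", 0.2))
post2.add(Node("s3", 0.4))

results = [n.name for n in select_non_overlapping(thread, 3)]
```

With these toy scores, `post1` is skipped because its sentence `s1` was already selected, and `s3` is skipped because its enclosing `post2` was selected, so no returned item repeats text from another.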
Source
《计算机应用研究》
CSCD
Peking University Core Journal (北大核心)
2016, No. 1, pp. 101-103, 121 (4 pages in total)
Application Research of Computers
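Funding
Hubei Province International Exchange and Cooperation Project (2012IHA0140)
Guiding Project of the Science and Technology Research Program of the Hubei Provincial Department of Education (B2014153)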
Keywords
forum
information retrieval
hierarchical scoring function
multi-granularity searching