Abstract
This paper proposes a candidate phrase extraction algorithm based on the PAT-Tree. The PAT-Tree data structure is modified to handle variable-length Chinese strings as well as non-Chinese characters. Mutual information is used to assess how strongly the parts of a string are associated, and a forward filtering algorithm, designed around the characteristics of news reports and internet hot words, is proposed to discover internet hot words. Compared with similar algorithms, the proposed algorithm does not require complex linguistic rules or a scoring formula for candidate phrases, so it is simpler to implement and faster. Experiments demonstrate the effectiveness and correctness of the algorithm.
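To make the mutual-information step concrete, here is a minimal sketch of scoring a candidate string by how strongly its parts are associated. It is not the paper's implementation. A plain substring counter stands in for the PAT-Tree, probabilities are approximated as count divided by text length, and the score is taken as the lowest pointwise mutual information over all two-way splits. The names `ngram_counts` and `mutual_information` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text, max_len):
    """Count every substring of length 1..max_len.

    Assumption: this brute-force counter replaces the PAT-Tree,
    which would answer the same frequency queries more efficiently.
    """
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def mutual_information(candidate, counts, total_chars):
    """Lowest pointwise mutual information over all two-way splits.

    Assumption: probabilities are approximated as count / total_chars.
    Candidates that never occur, or are shorter than two characters,
    score negative infinity.
    """
    if len(candidate) < 2 or counts[candidate] == 0:
        return float("-inf")
    p_whole = counts[candidate] / total_chars
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


text = "网络热词网络热词新闻报道网络"
counts = ngram_counts(text, max_len=4)
word_score = mutual_information("网络", counts, len(text))
boundary_score = mutual_information("词网", counts, len(text))
```

In this toy text the real word "网络" scores higher than "词网", which merely straddles a word boundary. Pointwise mutual information is known to favor rare strings, so a practical system would typically combine it with a frequency threshold. The filtering the paper actually applies is not specified in the abstract.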
Source
《计算机与现代化》
2013, No. 3, pp. 58-62, 66 (6 pages in total)
Computer and Modernization