Abstract
Feature-filtering-based automatic extraction is a new method for extracting new words. The range of the feature-fragment set is determined by analyzing new words from the past five years: their composition, their distribution and frequency in the corpus, the frequency of the characters they use, and the probability of their structural patterns, together with background knowledge. Feature fragments are first removed from the target sentences, and the remaining strings are collected by feature filtering. This string set is then filtered according to the word-formation characteristics and structural types of new words and the probability of single characters, which yields the final set of new-word candidates. While maintaining a relatively high recall, the method produces fewer strings, which makes filtering out junk strings more efficient and in turn improves precision.
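The two-stage pipeline described above can be illustrated with a minimal Python sketch. It is only one possible reading of the abstract, not the authors' implementation: the contents of `FEATURE_FRAGMENTS`, the length limits, and the `known_words` check are all hypothetical stand-ins, and the second stage here uses simple length and dictionary tests in place of the word-formation and structure-type filters the paper refers to.

```python
import re

# Hypothetical feature fragments: strings assumed unlikely to occur inside a new word.
# The paper derives this set from corpus analysis; these entries are illustrative only.
FEATURE_FRAGMENTS = {"的", "了", "是", "在", "和", "我们", "，", "。"}


def split_on_fragments(sentence, fragments):
    """Stage 1: remove feature fragments and return the strings left between them."""
    # Try longer fragments first so a multi-character fragment wins over its prefix.
    ordered = sorted(fragments, key=len, reverse=True)
    pattern = "|".join(re.escape(f) for f in ordered)
    return [s for s in re.split(pattern, sentence) if s]


def filter_candidates(strings, min_len=2, max_len=4, known_words=frozenset()):
    """Stage 2 (stand-in): keep strings of plausible length that are not known words."""
    return [
        s for s in strings
        if min_len <= len(s) <= max_len and s not in known_words
    ]


def extract_candidates(sentence, fragments=FEATURE_FRAGMENTS, known_words=frozenset()):
    """Run both stages on one sentence and return the candidate new words."""
    return filter_candidates(
        split_on_fragments(sentence, fragments), known_words=known_words
    )
```

Calling `extract_candidates("我们在微博的给力了", known_words={"微博"})` leaves only the string that is not already in the assumed dictionary. In practice the fragment set and the second-stage rules would be estimated from an annotated corpus rather than written by hand.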
Source
《北华大学学报(社会科学版)》
2012, No. 5, pp. 18-22 (5 pages)
Journal of Beihua University (Social Sciences)
Keywords
New words
Feature fragments
Feature filtering
Automatic extraction