Abstract
This article aims to improve the accuracy and speed of text classification. The tf algorithm is used to assign initial weights to feature terms, and a stop-word list then screens out non-content words. The paper introduces an original probability-distribution method that weights terms with an L-E operator, so that feature terms in special positions or with a wide distribution are weighted exponentially and better results converge faster. A genetic algorithm with crossover and mutation operators and a suitable objective function speeds up retrieval and raises the probability of reaching the optimal result. Combining these steps into a hybrid algorithm eliminates interference from synonyms and non-feature terms.
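As a rough illustration of the first two steps named in the abstract (tf weighting and stop-word screening), a minimal Python sketch might look like the following. The function name `tf_weights`, the stop-word list, and the choice of relative frequency as the tf weight are all assumptions made for this example; the abstract does not give the paper's actual formula or word list.

```python
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "of", "is", "and", "to"}


def tf_weights(tokens, stop_words=STOP_WORDS):
    """Return the relative term frequency of every token that is not a stop word.

    Assumption: tf is taken as count / number of kept tokens.
    """
    kept = [t.lower() for t in tokens if t.lower() not in stop_words]
    if not kept:
        return {}
    counts = Counter(kept)
    total = len(kept)
    return {term: n / total for term, n in counts.items()}


doc = (
    "the genetic algorithm speeds up the classification of text "
    "and the algorithm converges"
).split()
weights = tf_weights(doc)
```

Here `weights` maps each remaining term to its share of the document, which could serve as the initial weight before any position- or distribution-based reweighting is applied.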
Source
Computer & Telecommunication (《电脑与电信》)
2015, No. 3, pp. 49-52 (4 pages)
Funding
Daxia Fund project
Project No.: 2013DX-241
Keywords
genetic algorithm
text classification
feature term