Abstract
Traditional text-processing methods cannot represent the features within a text, so they are poorly suited to extracting and classifying fuzzy features. This paper proposes an information-processing technique for texts with highly heterogeneous (mixed) features that retains strong convergence. Iterative classification and control techniques based on multimedia information are used to classify the text, features both between texts and within each text are extracted, and an iteratively controlled TFIDF algorithm computes the feature weights. The improved algorithm was compared with the traditional algorithm on 22 categories of text. The results show that the iteratively controlled TFIDF algorithm classifies text at a finer granularity and extracts more features, converges quickly, and is stable, which suggests good practical value.
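For orientation, the baseline that the iteratively controlled variant builds on, standard TF-IDF weighting, can be sketched as below. This is a minimal illustration of the textbook formula only. The abstract does not specify the iterative control step or the within-text feature extraction, so neither is reproduced here. The function name `tfidf_weights` and the toy documents in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Compute textbook TF-IDF weights.

    docs: list of documents, each a list of tokens.
    Returns one {term: weight} dict per document.
    NOTE: hypothetical helper; this is plain TF-IDF, not the
    paper's iteratively controlled variant.
    """
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / total                 # relative term frequency
            idf = math.log(n_docs / df[term])  # unsmoothed inverse document frequency
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights
```

With two toy documents `[["a", "b", "a"], ["b", "c"]]`, a term that occurs in every document (`"b"`) receives weight zero, while a term confined to one document (`"a"`) receives a positive weight. Plain TF-IDF is therefore a between-text measure, which is the limitation the abstract says the proposed method addresses.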
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journal (北大核心)
2014, No. 4, pp. 71-73 (3 pages)
Funding
Yancheng Teachers University, university-level Natural Science Research Fund (11YCKL032)
Keywords
iterative algorithm
TFIDF
high feature heterogeneity
text classification