期刊文献+

属性序下的粗糙集与KNN相结合的英文文本分类研究 被引量:1

English text classification study based on rough set under attribute sequence and KNN
下载PDF
导出
摘要 由于KNN(K Nearest Neighbor)文本分类器的待分类文本数据维数和计算次数较高,其耗费的时间和空间成本也很高,故引入粗糙集的属性约简算法对待分类的数据进行预处理。提出了基于属性序的处理方法和算法,解决粗糙集属性约简中的NP-hard问题。降低算法计算量从算法本身和运算技巧两个层面出发:在粗糙集区分矩阵的关键环节正区域计算上提出递减式计算方法,减少等价类的计算工作量;运用去停止词的查表法、位置信息在属性序中的引入及倒排索引的检索方法等来进一步降低系统的运行时间和空间成本。通过实验验证,经过粗糙集约简处理的KNN分类系统在分类的准确度、召回率与F值上与没有约简的KNN分类器效果相当,但是系统的时间和空间成本大幅降低。 Due to the high data dimensionality and high computing frequency of the KNN (K Nearest Neighbor) text classifier, which leads to a very high cost of time and space, a novel method is proposed to reduce the cost of the KNN, in which the data is pretreated by the attribute reduction algorithm of rough set, and then an attribute sequence based treatment method and algorithm is employed to solve the NP- hard problem of the rough set attribute reduction. The efforts on the optimized algorithm and operation techniques are made to reduce the amount of calculation. On one hand, the algorithm is optimized by decreasing the workload of the equivalence class, which is realized by employing a novel decreasing type calculation method in the calculation of positive area. It is the key of the discernibility matrix of rough set. On the other hand, the operating cost of whole system is further reduced by using calculation methods including the look-up table in cutting stop word, the location information in attribute sequence, and inverted index retrieval. The method is tested, and the results show no great difference with the KNN classifiers without rough set reduction in classification accuracy, recall rate, and F values, however time and space cost of the system is reduced significantly.
作者 朱敏玲
出处 《黑龙江大学自然科学学报》 CAS 北大核心 2015年第3期404-408,共5页 Journal of Natural Science of Heilongjiang University
基金 国家自然科学基金资助项目(61272375) 北京市教育委员会科技计划科研项目(KM201411232015) 北京市属高等学校创新团队建设与教师职业发展计划项目(IDHT20130519) 北京信息科技大学开放实验室课题(ICDD2015)
关键词 粗糙集 KNN 属性序 文本分类 属性约简 rough set KNN attribute sequence text classification attribute reduction
  • 相关文献

参考文献18

二级参考文献104

共引文献1014

同被引文献12

引证文献1

二级引证文献1

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部