Abstract
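First, fast equivalence-class and positive-region algorithms are designed based on an improved Hash and bit operations, and these serve as the basis for computing the core. A fast core computation algorithm based on global positive region inconsistency is then designed. Unlike existing algorithms, which must compute positive regions over and over during core computation, this work analyzes the characteristics of a core attribute ai in depth and captures two kinds of inconsistency between the positive region formed by C-{ai} and the global positive region, so the complete positive region of C-{ai} need not be computed repeatedly. Three theorems prove the correctness and effectiveness of identifying core attributes by global positive region inconsistency. The algorithm is tested comprehensively on 21 UCI data sets as well as ultra-high-dimensional and massive data sets. The results show that, for decision tables with many or few entities, many or few attributes, and with or without a core, the algorithm outperforms existing algorithms of the same kind in most cases and is especially suitable for large decision tables.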
This paper first proposes basic algorithms for computing equivalence classes and positive regions, based on bit vectors and an improved Hash algorithm. A core attribute computation algorithm is then designed based on global positive region inconsistency. Unlike current algorithms, which need to compute complete positive regions repeatedly when seeking the attribute core, this paper studies the characteristics of a core attribute ai and captures the inconsistencies between the positive region of C-{ai} and the global positive region, so the complete positive regions of C-{ai} do not need to be computed repeatedly. The correctness of attribute core recognition based on global positive region inconsistency is proved by 3 theorems. 21 UCI data sets, ultra-high-dimensional data sets and massive data sets were used to test the proposed algorithms. The results show that the attribute core computation algorithm performs well whether the number of entities and attributes is large or small, and that it is especially suitable for processing large decision tables.
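The idea of detecting a core attribute without rebuilding the full positive region of C-{ai} can be sketched as follows. This is a minimal illustration and not the paper's algorithm: it uses plain Python dictionaries instead of the improved Hash and bit vectors, and it assumes the two kinds of inconsistency are (1) two global-positive-region objects with different decisions falling into the same C-{ai} class, and (2) a global-positive-region object sharing a C-{ai} class with an object outside the global positive region. The function names `positive_region` and `core_attributes`, and the row layout (tuples indexed by column), are hypothetical.

```python
from collections import defaultdict


def positive_region(rows, cond, dec):
    """Indices of rows whose equivalence class under `cond` has one decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        # Hash each object by its values on the condition attributes.
        classes[tuple(row[a] for a in cond)].append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][dec] for i in members}) == 1:
            pos.update(members)
    return pos


def core_attributes(rows, cond, dec):
    """Attributes a for which dropping a creates an inconsistency with POS_C(D)."""
    global_pos = positive_region(rows, cond, dec)  # computed once
    core = []
    for a in cond:
        reduced = [c for c in cond if c != a]
        pos_dec = {}      # C-{a} key -> decision of global-positive objects seen
        boundary = set()  # C-{a} keys holding an object outside POS_C(D)
        for i, row in enumerate(rows):
            key = tuple(row[c] for c in reduced)
            if i in global_pos:
                # Assumed type 2 (meets a boundary object) or type 1 (decision clash).
                if key in boundary or pos_dec.setdefault(key, row[dec]) != row[dec]:
                    core.append(a)
                    break  # stop early; no full positive region is built
            else:
                if key in pos_dec:
                    core.append(a)
                    break
                boundary.add(key)
    return core


# Columns 0..2 are condition attributes, column 3 is the decision.
# Column 2 duplicates column 0, so only column 1 is indispensable.
table = [(0, 0, 0, "n"), (0, 1, 0, "y"), (1, 0, 1, "y"), (1, 1, 1, "y")]
print(core_attributes(table, [0, 1, 2], 3))
```

The scan over each C-{ai} stops at the first inconsistency, so a core attribute is often recognized after inspecting only part of the table; a non-core attribute still requires one full pass.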
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2015, No. 8, pp. 259-264 (6 pages)
Computer Science
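Funding
National Natural Science Foundation of China project: Research on behavioral trust resistant to reputation collusion attacks under DS evidential reasoning (71401045)
Undergraduate Entrepreneurship and Innovation Training Program: Research on reasoning algorithms and models for resisting reputation collusion fraud in e-commerce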
Keywords
Rough set, Attribute core, Global positive region, Inconsistency