Abstract
The two-phase method classifies rare classes efficiently. The first phase (the P-phase) learns P-rules that predict the presence of the target class. It aims for high recall: the P-rules are chosen for high support and a reasonable level of accuracy, so that together they cover the great majority of the positive examples (examples of the target class). The second phase (the N-phase) learns N-rules that predict the absence of the target class. It is trained only on the positive and negative examples (examples not of the target class) covered by the P-rules, and it aims for high precision: the N-rules remove as many as possible of the false positives introduced by the P-rules while retaining as many positive examples as possible. In the testing phase, a scoring mechanism assigns a score to each P-rule and N-rule, and records are classified according to these scores.
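The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Several parts are assumptions made for illustration: each rule is a single attribute-value test, the thresholds `p_min_acc` and `n_min_acc` are hypothetical parameters, the toy records and their attribute names (`proto`, `flag`) are invented, and the scoring mechanism is replaced by a plain decision ("some P-rule fires and no N-rule fires").

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Rule = Tuple[str, str]  # (attribute, value): fires when record[attribute] == value
Labelled = List[Tuple[Record, bool]]  # True marks the rare target class


def fires(rule: Rule, rec: Record) -> bool:
    attr, value = rule
    return rec.get(attr) == value


def best_rule(data: Labelled, target: bool, min_acc: float) -> Optional[Rule]:
    """Among rules whose accuracy is at least min_acc, return the one that
    covers the most records labelled `target` (highest support)."""
    candidates = sorted({(a, v) for rec, _ in data for a, v in rec.items()})
    best, best_support = None, 0
    for rule in candidates:
        covered = [label for rec, label in data if fires(rule, rec)]
        support = sum(1 for label in covered if label == target)
        if covered and support / len(covered) >= min_acc and support > best_support:
            best, best_support = rule, support
    return best


def sequential_cover(data: Labelled, target: bool, min_acc: float) -> List[Rule]:
    """Sequential covering: learn a rule, drop the records it covers, repeat."""
    rules, remaining = [], list(data)
    while any(label == target for _, label in remaining):
        rule = best_rule(remaining, target, min_acc)
        if rule is None:  # no acceptable rule is left
            break
        rules.append(rule)
        remaining = [(r, l) for r, l in remaining if not fires(rule, r)]
    return rules


def train_two_phase(data: Labelled, p_min_acc: float = 0.5,
                    n_min_acc: float = 0.9) -> Tuple[List[Rule], List[Rule]]:
    # P-phase: high-support rules for the target class (favours recall).
    p_rules = sequential_cover(data, True, p_min_acc)
    # N-phase: trained only on the records that the P-rules cover.
    covered = [(r, l) for r, l in data if any(fires(p, r) for p in p_rules)]
    n_rules = sequential_cover(covered, False, n_min_acc)
    return p_rules, n_rules


def predict(rec: Record, p_rules: List[Rule], n_rules: List[Rule]) -> bool:
    # Simplified stand-in for the scoring mechanism (an assumption).
    return any(fires(p, rec) for p in p_rules) and not any(fires(n, rec) for n in n_rules)


# Invented toy data: three target-class records among eight.
DATA: Labelled = [
    ({"proto": "tcp", "flag": "S0"}, True),
    ({"proto": "tcp", "flag": "S0"}, True),
    ({"proto": "tcp", "flag": "REJ"}, True),
    ({"proto": "tcp", "flag": "SF"}, False),
    ({"proto": "tcp", "flag": "SF"}, False),
    ({"proto": "udp", "flag": "SF"}, False),
    ({"proto": "udp", "flag": "SF"}, False),
    ({"proto": "udp", "flag": "SF"}, False),
]

p_rules, n_rules = train_two_phase(DATA)
print(p_rules)  # [('proto', 'tcp')]
print(n_rules)  # [('flag', 'SF')]
```

On this toy data the P-phase accepts the broad rule `proto == tcp`, which covers all three positives but also two negatives, and the N-phase then learns `flag == SF` to remove those two false positives.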
Source
《周口师范学院学报》
CAS
2003, No. 5, pp. 25-29 (5 pages)
Journal of Zhoukou Normal University
Keywords
two-phase
sequential covering technique
rare class