Abstract
Non-co-occurrence data does not follow a joint probability distribution of two variables X and Y; instead, it appears as a sample of values of an unknown function Z(X, Y). Once non-co-occurrence data has been transformed into co-occurrence form, Shannon entropy can be used to measure information quantitatively and to drive clustering. However, existing algorithms assume that every attribute contributes equally to clustering and give all features the same weight, ignoring the different effects that representative attributes and irrelevant (redundant) attributes have on clustering quality. This paper therefore proposes a two-stage attribute weighting IB algorithm for non-co-occurrence data (TSAW-sIB, written TPAW-sIB in the English abstract). Across the two stages of the co-occurrence transformation, the data is examined from three viewpoints (non-co-occurrence, co-occurrence, and joint) so that representative attributes are highlighted and redundant attributes are suppressed, yielding a representation that reflects the characteristics of the data more accurately before clustering. Experiments show that TSAW-sIB outperforms the ROCK, COOLCAT, and LIMBO algorithms.
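As a rough illustration of what a co-occurrence transformation with attribute weights can look like, the sketch below turns categorical records into a joint distribution p(x, y), where x is a record and y is an (attribute, value) pair, and then measures the mutual information I(X; Y) in bits. The weighting scheme, the function names `joint_distribution` and `mutual_information`, and the toy records are assumptions made for illustration; this is not the weighting procedure of TSAW-sIB, which the abstract does not spell out.

```python
import math
from collections import defaultdict


def joint_distribution(records, weights):
    """Build p(x, y) with x = record index and y = (attribute index, value).

    Assumption: p(x) = 1/n, and p(y | x) is proportional to the weight of
    the attribute that y belongs to (hypothetical weighting scheme).
    """
    n = len(records)
    total_weight = sum(weights)
    joint = defaultdict(float)
    for i, record in enumerate(records):
        for j, value in enumerate(record):
            joint[(i, (j, value))] += (1.0 / n) * (weights[j] / total_weight)
    return dict(joint)


def mutual_information(joint):
    """Return I(X; Y) in bits for a joint distribution {(x, y): p}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Toy data: attribute 0 separates the records, attribute 1 is constant
# (redundant) and carries no information about which record is which.
records = [("red", "x"), ("red", "x"), ("blue", "x"), ("blue", "x")]

uniform = joint_distribution(records, [1.0, 1.0])   # equal weights
weighted = joint_distribution(records, [3.0, 1.0])  # favour attribute 0

mi_uniform = mutual_information(uniform)
mi_weighted = mutual_information(weighted)
print(mi_uniform, mi_weighted)
```

On this toy data, shifting weight away from the constant attribute raises I(X; Y) from 0.5 to 0.75 bits, which is the kind of effect that suppressing redundant attributes is meant to achieve.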
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2012, Issue 10, pp. 2278-2282 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 60773048 and 61170223)
Keywords
non-co-occurrence data
feature weighting
two-stage
information bottleneck method
clustering