摘要
聚类按关联进行分类,关联和聚类分析的基础是相似性计算。通常相似性是指绝对相似性,具有对称性。但自然语言研究中发现大部分规律都是偏向的,具有不对称性,需要用偏向的思路来考察不对称的关联和聚类策略:以类似条件概率的概率蕴涵指标来描写特征间的不对称关联,并在此基础上定义优势关系、紧密关系、控制中心、中途岛等关联特性;基于偏向相似性的聚类策略,从而能更好地处理语言本体研究中的"假性孤立点"、数据稀疏问题和家族象似性类型的聚类。
Cluster analysis is the task of grouping a set of objects by associations of these objects. The diameters of cluster and association analysis are similarity measures, which often involves the absolute similarity of the symmetry property. But most rules found in natural languages are inclined and have asymmetrical forms. We describes the asymmetrical associationby a parameter of Probability Entailment, i.e. the conditional probability, to represent the asymmetrical associations among features. And then we define the Domination Relation, the Tight Relation, the Control Center, and the Midway island. A strategy for cluster based on inclined similarity measures is presented to deal with issues likethe false isolated points, data sparsity and family iconicity.
出处
《中文信息学报》
CSCD
北大核心
2017年第1期205-211,220,共8页
Journal of Chinese Information Processing
基金
教育部人文社会科学规划基金(13YJA740005)
关键词
不对称性
条件概率
关联
聚类
asymmetry, conditional probability, association, cluster