Abstract
Similarity measurement of samples plays a crucial role in machine learning, especially in clustering tasks, and the samples involved in many practical problems, such as genes, proteins, and drugs, usually have different dimensions. How to measure the similarity of samples with different dimensions is therefore particularly important. Since most existing similarity measures apply only to samples of the same dimension, a new method for measuring the similarity of samples with different dimensions is proposed based on the Cheng-norm. With this method, similarities are measured from the weighted adjacency matrices and from the weighted Laplacian matrices of protein weighted graphs, and a hierarchical clustering algorithm is then used to cluster the protein samples by function. The expected clustering results are obtained, thereby verifying the effectiveness of the proposed similarity measure.
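The pipeline described in the abstract can be illustrated with a minimal Python sketch, offered under stated assumptions rather than as the authors' implementation. It builds the weighted adjacency matrix W and the weighted Laplacian L = D - W of a small graph, then runs bottom-up hierarchical clustering on a precomputed dissimilarity matrix. The abstract does not define the Cheng-norm, so the dissimilarity values below are hypothetical toy numbers standing in for Cheng-norm-based distances between graphs of different sizes. The function names `weighted_matrices` and `average_linkage` and the choice of average linkage are also assumptions, since the abstract does not state the linkage criterion.

```python
from itertools import combinations


def weighted_matrices(n, edges):
    """Return (W, L) for an undirected weighted graph on nodes 0..n-1.

    W is the weighted adjacency matrix; L = D - W is the weighted
    Laplacian, with D the diagonal matrix of weighted degrees.
    Assumes no self-loops.
    """
    W = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        W[u][v] = W[v][u] = float(w)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])  # weighted degree of node i
    return W, L


def average_linkage(dist, k):
    """Merge clusters bottom-up until k remain (average linkage).

    dist is a symmetric matrix of pairwise dissimilarities between samples.
    """
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        # Mean pairwise dissimilarity between two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: gap(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


# A 3-node toy graph: edges are (node, node, weight).
W, L = weighted_matrices(3, [(0, 1, 2.0), (1, 2, 0.5)])

# Hypothetical dissimilarities between four samples (NOT real Cheng-norm values).
toy_dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters = average_linkage(toy_dist, 2)
print(clusters)  # [[0, 1], [2, 3]]
```

In an actual experiment, each entry of the dissimilarity matrix would come from the Cheng-norm-based measure applied to a pair of protein graph matrices, which may differ in size. The clustering step only needs the resulting pairwise values.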
Authors
GUO Zhiwei; CHEN Xinzhuang (College of Mathematics and Computer Science, Yan'an University, Yan'an 716000, China)
Source
Journal of Yan'an University: Natural Science Edition
2022, No. 2, pp. 29-35 (7 pages)
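Funding
National Natural Science Foundation of China (62041212)
Natural Science Basic Research Program of Shaanxi Province (2020JM-548)
Doctoral Research Start-up Project of Yan'an University (YDBK2021-03)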
Keywords
Cheng-norm
samples with different dimensions
similarity
machine learning
hierarchical clustering algorithm