Abstract
Traditional similarity measures cannot effectively handle high-dimensional, relatively sparse datasets. To address this problem, the study proposes a quadratic similarity measure Y(x_i, x_j). The measure overcomes, to some extent, the low reliability of traditional similarity measures on high-dimensional sparse data. Clustering was performed with the k-means and k-medoids methods. The results show that the quadratic similarity measure is effective and stable on high-dimensional, relatively sparse datasets, which provides a favorable basis for further research and analysis.
Authors
汪颖
张立莹
Wang Ying; Zhang Liying (Dalian Jiaotong University, Dalian 116028, China)
Source
《黑龙江科学》
2022, No. 22, pp. 15-18, 25 (5 pages in total)
Heilongjiang Science
Keywords
Clustering
RNA sequencing
Similarity measure