Abstract
Flow cytometry data are traditionally analyzed by manual gating, which is inefficient and depends on expert experience. Many automatic clustering algorithms for flow cytometry data have been proposed in recent years, but none handles small-sample populations well, that is, populations with few events and a sparse distribution. This paper proposes a density-distance-based optimization of t-mixture model clustering for flow cytometry data that addresses the difficulty of distinguishing such small populations. The method uses a density-distance center algorithm to locate the initial center of each population, passes these centers to the t-mixture algorithm as initial values, and obtains the number of samples in each population by maximum likelihood estimation, which yields the clustering. Experiments show that, compared with classical model-based algorithms, the density-distance-based t-mixture model algorithm is more stable and reliable, and adapts better to small-sample populations and to overlapping populations.
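The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how centers are scored, so the sketch assumes a density-peaks style rule (pick the points with the largest product of local density `rho` and distance-to-denser-point `delta`), a Gaussian kernel with a hypothetical cutoff `dc`, and a t-mixture EM with a fixed, hypothetical degrees-of-freedom value `nu`. The function names and the synthetic data are likewise illustrative only.

```python
from math import lgamma
import numpy as np

def density_distance_centers(X, k, dc=1.0):
    """Return k points with the largest rho * delta (assumed scoring rule)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0  # kernel density, self term removed
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        # distance to the nearest denser point; the densest point gets its max distance
        delta[i] = d[i, denser].min() if denser.any() else d[i].max()
    return X[np.argsort(rho * delta)[-k:]]

def fit_t_mixture(X, centers, nu=4.0, n_iter=100):
    """EM for a multivariate t-mixture with fixed nu, started from the given centers."""
    n, p = X.shape
    k = len(centers)
    mu = centers.astype(float).copy()
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(p)] * k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        log_pdf = np.empty((n, k))
        maha = np.empty((n, k))
        for j in range(k):
            diff = X - mu[j]
            maha[:, j] = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov[j]), diff)
            _, logdet = np.linalg.slogdet(cov[j])
            log_pdf[:, j] = (lgamma((nu + p) / 2) - lgamma(nu / 2)
                             - 0.5 * p * np.log(nu * np.pi) - 0.5 * logdet
                             - 0.5 * (nu + p) * np.log1p(maha[:, j] / nu))
        # E-step: responsibilities r and t-distribution weights u
        log_r = np.log(pi) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        u = (nu + p) / (nu + maha)
        # M-step: weighted maximum likelihood updates
        for j in range(k):
            w = r[:, j] * u[:, j]
            mu[j] = w @ X / w.sum()
            diff = X - mu[j]
            cov[j] = (w[:, None] * diff).T @ diff / r[:, j].sum() + 1e-6 * np.eye(p)
        pi = r.mean(axis=0)
    return r.argmax(axis=1), mu, pi

# Synthetic stand-in for cytometry events: one large and one small population.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),
               rng.normal(10.0, 0.5, size=(40, 2))])
centers = density_distance_centers(X, k=2)
labels, mu, pi = fit_t_mixture(X, centers)
sizes = np.bincount(labels, minlength=2)  # number of samples assigned to each population
print(sizes, mu)
```

Starting EM from density-distance centers rather than random points is what gives the small population its own component from the first iteration; with random initialization, both components can start inside the large population.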
Authors
Zhao Qijie (赵其杰)
Ke Zhennan (柯震南)
Tao Jing (陶靖)
Lu Jianxia (卢建霞)
(School of Mechatronics Engineering and Automation, Shanghai University, Shanghai 200072, China; Shanghai Key Laboratory of Intelligent Manufacturing and Robotics, Shanghai 200072, China; Shanghai Nayan Biotechnology Co., Ltd., Shanghai 201108, China)
Source
Chinese Journal of Scientific Instrument (《仪器仪表学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2017, No. 9, pp. 2130-2137 (8 pages)
Funding
Supported by the Shanghai Pujiang Talent Program (17PJ1432300)