Abstract
Early diagnosis of cancer can significantly improve patient survival. The tri-classification problem considered here matches an unknown sample against known samples to predict whether it is healthy, benign, or cancerous. For ovarian cancer protein mass spectrometry data, which are complex and hard to separate, this paper proposes a tri-classification prediction model based on a Gaussian mixture model and a BP neural network. First, redundancy is removed from the raw data. Second, a first feature set is extracted by variance ranking and intersection screening, and the parameters obtained by fitting a Gaussian mixture model are used as a second feature set. Finally, a BP neural network classifies the samples into the three classes, reaching an accuracy of 72.9%, which is higher than the known result. The results indicate that the model can serve as a candidate tool for tri-classification of ovarian cancer mass spectrometry data.
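The feature-extraction steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the intersection screening is done or how the Gaussian mixture model is applied to a spectrum, so the rule used here (keep the features that rank in the top k by variance in every sample group) and the one-dimensional EM fit are assumptions for illustration only, and all function names are hypothetical. The BP neural network stage is not shown.

```python
import math
from statistics import pvariance


def top_variance_features(samples, k):
    """Return the indices of the k features with the largest variance."""
    n_features = len(samples[0])
    variances = [pvariance([s[j] for s in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return set(ranked[:k])


def intersect_selected(groups, k):
    """Assumed screening rule: keep features in the top k of every group."""
    selected = [top_variance_features(g, k) for g in groups]
    return sorted(set.intersection(*selected))


def fit_gmm_1d(xs, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM; return (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    # Spread the initial means evenly over the data range.
    means = [lo + (i + 0.5) * (hi - lo) / n_components for i in range(n_components)]
    variances = [max(pvariance(xs), 1e-6)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for c in range(n_components):
            nk = sum(r[c] for r in resp) or 1e-12
            weights[c] = nk / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[c] = max(spread, 1e-6)  # variance floor for stability
    return weights, means, variances


# Toy data (made up): two groups of samples with three features each.
groups = [
    [[0, 0, 5], [10, 0, 5], [0, 1, 5]],
    [[1, 0, 2], [9, 3, 2], [1, 0, 2]],
]
feature_set_one = intersect_selected(groups, k=2)

# Toy 1-D signal (made up): the fitted parameters play the role of feature set two.
signal = [0.0, 0.1, -0.1, 0.05, -0.05, 10.0, 10.1, 9.9, 10.05, 9.95]
weights, means, variances = fit_gmm_1d(signal)
feature_set_two = weights + means + variances
```

In a pipeline of the kind the abstract describes, the two feature sets would then be combined per sample and passed to the classifier.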
Authors
马敬山
魏东
任福全
李玉双
MA Jing-shan; WEI Dong; REN Fu-quan; LI Yu-shuang (School of Science, Yanshan University, Qinhuangdao 066004, China; Hebei Dataport Technology Co., Ltd, Qinhuangdao 066004, China)
Source
《数学的实践与认识》
Peking University Core Journal (北大核心)
2020, Issue 7, pp. 147-153 (7 pages)
Mathematics in Practice and Theory
Funding
National Natural Science Foundation of China (61807029).
Keywords
ovarian cancer mass spectrometry data (卵巢癌质谱数据)
Gaussian mixture model (高斯混合模型)
BP neural network (BP神经网络)
tri-classification (三分类)