Abstract
Breast cancer is one of the leading causes of death among women worldwide, and accurate diagnosis is one of the most important steps in its treatment. This paper explains the principles of the logistic regression model in detail and uses the Logistic Regression algorithm of the Sklearn (scikit-learn) machine learning library to classify the Breast Cancer Wisconsin (Diagnostic) data set. Because the data set has two class labels (malignant and benign), it is well suited to a logistic regression model. Classification results from logistic regression models built on two features show that the highest accuracy (95.72%) is obtained when the mean radius and the largest perimeter are selected. Compared with previous methods, this approach improves performance to some extent.
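The two-feature setup described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the author's code: it assumes scikit-learn is installed, that the paper's "largest perimeter" corresponds to scikit-learn's `worst perimeter` feature (the mean of the three largest values), and that a 70/30 stratified split with a `StandardScaler` step is used. The split, `random_state`, scaling step, and the helper names `two_feature_accuracy` and `rank_feature_pairs` are hypothetical choices made for this sketch. `rank_feature_pairs` shows one plausible way to compare all feature pairs; the paper does not state how its pairs were searched.

```python
# A minimal sketch, assuming scikit-learn is installed. The 70/30 stratified
# split, random_state, and StandardScaler step are illustrative assumptions,
# not settings reported in the paper.
from itertools import combinations

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

DATA = load_breast_cancer()  # Wisconsin (Diagnostic): 569 samples, 30 features
NAMES = list(DATA.feature_names)


def two_feature_accuracy(feat_a, feat_b, test_size=0.3, random_state=0):
    """Fit logistic regression on two named features; return test accuracy."""
    cols = [NAMES.index(feat_a), NAMES.index(feat_b)]
    X = DATA.data[:, cols]
    y = DATA.target  # scikit-learn encoding: 0 = malignant, 1 = benign
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    # Standardizing the features helps the default lbfgs solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # fraction of correct test predictions


def rank_feature_pairs(top=5):
    """Score every pair of the 30 features; return the `top` best pairs."""
    scores = [(two_feature_accuracy(a, b), a, b)
              for a, b in combinations(NAMES, 2)]
    return sorted(scores, reverse=True)[:top]


if __name__ == "__main__":
    print("mean radius + worst perimeter:",
          round(two_feature_accuracy("mean radius", "worst perimeter"), 4))
    for acc, a, b in rank_feature_pairs():
        print(f"{acc:.4f}  {a} + {b}")
```

The accuracy this prints depends on the assumed split and preprocessing, so it need not equal the 95.72% reported in the paper, and the best-ranked pair may differ as well.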
Author
LIU Lei (刘蕾), Dalian Neusoft Information University, Dalian 116023, China
Source
Software Engineering (《软件工程》), 2018, No. 2, pp. 21-23, 17 (4 pages in total)
Keywords
breast cancer data set
logistic regression classification algorithm
prediction