In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selec...In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.展开更多
The International Skin Imaging Collaboration(ISIC)datasets are pivotal resources for researchers in machine learning for medical image analysis,especially in skin cancer detection.These datasets contain tens of thousa...The International Skin Imaging Collaboration(ISIC)datasets are pivotal resources for researchers in machine learning for medical image analysis,especially in skin cancer detection.These datasets contain tens of thousands of dermoscopic photographs,each accompanied by gold-standard lesion diagnosis metadata.Annual challenges associated with ISIC datasets have spurred significant advancements,with research papers reporting metrics surpassing those of human experts.Skin cancers are categorized into melanoma and non-melanoma types,with melanoma posing a greater threat due to its rapid potential for metastasis if left untreated.This paper aims to address challenges in skin cancer detection via visual inspection and manual examination of skin lesion images,processes historically known for their laboriousness.Despite notable advancements in machine learning and deep learning models,persistent challenges remain,largely due to the intricate nature of skin lesion images.We review research on convolutional neural networks(CNNs)in skin cancer classification and segmentation,identifying issues like data duplication and augmentation problems.We explore the efficacy of Vision Transformers(ViTs)in overcoming these challenges within ISIC dataset processing.ViTs leverage their capabilities to capture both global and local relationships within images,reducing data duplication and enhancing model generalization.Additionally,ViTs alleviate augmentation issues by effectively leveraging original data.Through a thorough examination of ViT-based methodologies,we illustrate their pivotal role in enhancing ISIC image classification and segmentation.This study offers valuable insights for researchers and practitioners looking to utilize ViTs for improved analysis of dermatological images.Furthermore,this paper emphasizes the crucial role of mathematical and computational modeling processes in advancing skin cancer detection methodologies,highlighting their significance in improving algorithmic performance and interpretability.展开更多
Breast cancer(BC)is the most widespread tumor in females worldwide and is a severe public health issue.BC is the leading reason of death affecting females between the ages of 20 to 59 around the world.Early detection ...Breast cancer(BC)is the most widespread tumor in females worldwide and is a severe public health issue.BC is the leading reason of death affecting females between the ages of 20 to 59 around the world.Early detection and therapy can help women receive effective treatment and,as a result,decrease the rate of breast cancer disease.The cancer tumor develops when cells grow improperly and attack the healthy tissue in the human body.Tumors are classified as benign or malignant,and the absence of cancer in the breast is considered normal.Deep learning,machine learning,and transfer learning models are applied to detect and identify cancerous tissue like BC.This research assists in the identification and classification of BC.We implemented the pre-trained model AlexNet and proposed model Breast cancer identification and classification(BCIC),which are machine learning-based models,by evaluating them in the form of comparative research.We used 3 datasets,A,B,and C.We fuzzed these datasets and got 2 datasets,A2C and B3C.Dataset A2C is the fusion of A,B,and C with 2 classes categorized as benign and malignant.Dataset B3C is the fusion of datasets A,B,and C with 3 classes classified as benign,malignant,and normal.We used customized AlexNet according to our datasets and BCIC in our proposed model.We achieved an accuracy of 86.5%on Dataset B3C and 76.8%on Dataset A2C by using AlexNet,and we achieved the optimum accuracy of 94.5%on Dataset B3C and 94.9%on Dataset A2C by using proposed model BCIC at 40 epochs with 0.00008 learning rate.We proposed fuzzed dataset model using transfer learning.We fuzzed three datasets to get more accurate results and the proposed model achieved the highest prediction accuracy using fuzzed dataset transfer learning technique.展开更多
Breast Cancer(BC)is considered the most commonly scrutinized can-cer in women worldwide,affecting one in eight women in a lifetime.Mammogra-phy screening becomes one such standard method that is helpful in identifying...Breast Cancer(BC)is considered the most commonly scrutinized can-cer in women worldwide,affecting one in eight women in a lifetime.Mammogra-phy screening becomes one such standard method that is helpful in identifying suspicious masses’malignancy of BC at an initial level.However,the prior iden-tification of masses in mammograms was still challenging for extremely dense and dense breast categories and needs an effective and automatic mechanisms for helping radiotherapists in diagnosis.Deep learning(DL)techniques were broadly utilized for medical imaging applications,particularly breast mass classi-fication.The advancements in the DL field paved the way for highly intellectual and self-reliant computer-aided diagnosis(CAD)systems since the learning cap-ability of Machine Learning(ML)techniques was constantly improving.This paper presents a new Hyperparameter Tuned Deep Hybrid Denoising Autoenco-der Breast Cancer Classification(HTDHDAE-BCC)on Digital Mammograms.The presented HTDHDAE-BCC model examines the mammogram images for the identification of BC.In the HTDHDAE-BCC model,the initial stage of image preprocessing is carried out using an average median filter.In addition,the deep convolutional neural network-based Inception v4 model is employed to generate feature vectors.The parameter tuning process uses the binary spider monkey opti-mization(BSMO)algorithm.The HTDHDAE-BCC model exploits chameleon swarm optimization(CSO)with the DHDAE model for BC classification.The experimental analysis of the HTDHDAE-BCC model is performed using the MIAS database.The experimental outcomes demonstrate the betterments of the HTDHDAE-BCC model over other recent approaches.展开更多
Context/Objectives: Hepatocellular carcinoma occurs mainly and increasingly in developing countries, where the prognosis is particularly poor. The Barcelona Clinic Liver Cancer classification is used to guide the trea...Context/Objectives: Hepatocellular carcinoma occurs mainly and increasingly in developing countries, where the prognosis is particularly poor. The Barcelona Clinic Liver Cancer classification is used to guide the treatment of hepatocellular carcinoma. The aim of this retrospective study was to describe the Barcelona Clinic Liver Cancer classification and the treatment of hepatocellular carcinoma in a University Hospital in Côte d’Ivoire. Methods: Patients with hepatocellular carcinoma hospitalized in the hepato-gastroenterology unit of the University Hospital of Yopougon from 01 January 2012 to 30 June 2017 were included. The diagnosis of hepatocellular carcinoma was based on the presence of hepatic nodules on the abdominal ultrasound scan, typical images with the helical scanner associated or not with an increase of the α-fetoprotein higher than 200 ng/ml or with histology. Demographic, clinical, biological and radiological data were determined at the time of diagnosis. Patients were classified according to the Barcelona Clinic Liver Cancer classification. Their treatment was specified. Results: There were 258 patients whose median age was 48.1 years. Viral hepatitis B virus was the primary cause of hepatocellular carcinoma in 64.7% of cases. The severity of the underlying cirrhosis was Child-Pugh A in 12.1%, B in 63.6% and C in 24.3% of cases. The median size of the tumor was 63 mm. The α-fetoprotein level was higher than 200 mg/ml in 56.03% of cases. The Eastern Cooperative Oncology Group (ECOG)/World Health Organization (WHO) system was ≥2 in 82.9%. The Barcelona Clinic Liver Cancer classification was A in 1.3%, B in 0%, C in 55.2% and D in 43.5% of patients. There was no transplantation or hepatic resection. Very few patients (1.9%) received radio-frequency curative therapy. The treatment was predominantly symptomatic in 97.8% of patients. During hospitalization 43.7% of patients died. Conclusion: Hepatocellular carcinoma occurs on a liver with severe cirrhosis at a late stage. This does not allow cure treatment and explains a high mortality rate during hospitalization. Hepatitis B virus is the main risk factor and immunization at birth will reduce the incidence of this cancer in Africa.展开更多
Proposed system has been developed to extract the optimal features from the breast tumors using Enhanced Cuckoo Search (ECS) and presented in this paper. The texture feature, intensity histogram feature, radial distan...Proposed system has been developed to extract the optimal features from the breast tumors using Enhanced Cuckoo Search (ECS) and presented in this paper. The texture feature, intensity histogram feature, radial distance feature and shape features have been extracted and the optimal feature set has been obtained using ECS. The overall accuracy of a minimum distance classifier and k-Nearest Neighbor (k-NN) on validation samples is used as a fitness value for ECS. The new approach is carried out on the extracted feature dataset. The proposed system selects only the minimum number of features and performed the accuracy of 98.75% with Minimum Distance Classifier and 99.13% with k-NN Classifier. The performance of the new ECS is compared with the Cuckoo Search and Harmony Search. This result shows that the ECS algorithm is more accurate than the other algorithm. The proposed system can provide valuable information to the physician in medical pathology.展开更多
This study evaluates the performance and reliability of a vision transformer (ViT) compared to convolutional neural networks (CNNs) using the ResNet50 model in classifying lung cancer from CT images into four categori...This study evaluates the performance and reliability of a vision transformer (ViT) compared to convolutional neural networks (CNNs) using the ResNet50 model in classifying lung cancer from CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have made significant advancements in medical imaging, their limited capacity to capture long-range dependencies has led to the exploration of ViTs, which leverage self-attention mechanisms for a more comprehensive global understanding of images. The study utilized a dataset of 748 lung CT images to train both models with standardized input sizes, assessing their performance through conventional metrics—accuracy, precision, recall, F1 score, specificity, and AUC—as well as cross entropy, a novel metric for evaluating prediction uncertainty. Both models achieved similar accuracy rates (95%), with ViT demonstrating a slight edge over ResNet50 in precision and F1 scores for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross entropy analysis showed that the ViT model had lower average uncertainty, particularly in the LUAD, Normal, and LUSC classes, compared to ResNet50. This finding suggests that ViT predictions are generally more reliable, though ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, as cross entropy offers deeper insights into the reliability and confidence of model predictions. The results highlight the importance of incorporating cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification, providing a nuanced understanding of their performance and reliability. While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.展开更多
Skin cancer diagnosis is difficult due to lesion presentation variability. Conventionalmethods struggle to manuallyextract features and capture lesions spatial and temporal variations. This study introduces a deep lea...Skin cancer diagnosis is difficult due to lesion presentation variability. Conventionalmethods struggle to manuallyextract features and capture lesions spatial and temporal variations. This study introduces a deep learning-basedConvolutional and Recurrent Neural Network (CNN-RNN) model with a ResNet-50 architecture which usedas the feature extractor to enhance skin cancer classification. Leveraging synergistic spatial feature extractionand temporal sequence learning, the model demonstrates robust performance on a dataset of 9000 skin lesionphotos from nine cancer types. Using pre-trained ResNet-50 for spatial data extraction and Long Short-TermMemory (LSTM) for temporal dependencies, the model achieves a high average recognition accuracy, surpassingprevious methods. The comprehensive evaluation, including accuracy, precision, recall, and F1-score, underscoresthe model’s competence in categorizing skin cancer types. This research contributes a sophisticated model andvaluable guidance for deep learning-based diagnostics, also this model excels in overcoming spatial and temporalcomplexities, offering a sophisticated solution for dermatological diagnostics research.展开更多
The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data,however,makes the classification quite challenging. Altho...The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data,however,makes the classification quite challenging. Although principal component analysis (PCA) is of particular interest for the high-dimensional data,it may overemphasize some aspects and ignore some other important information contained in the richly complex data,because it displays only the difference in the first twoor three-dimensional PC subspaces. Based on PCA,a principal component accumulation (PCAcc) method was proposed. It employs the information contained in multiple PC subspaces and improves the class separability of cancers. The effectiveness of the present method was evaluated by four commonly used gene expression datasets,and the results show that the method performs well for cancer classification.展开更多
Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classificatio...Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machines) based recursive fea- ture elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles compared with using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles at the accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have higher capacity in normal tissue classifications than miRNA. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classifications.展开更多
One goal of precise oncology is to re-classify cancer based on molecular features rather than its tissue origin. Integrative clustering of large-scale multi-omics data is an important way for molecule-based cancer cla...One goal of precise oncology is to re-classify cancer based on molecular features rather than its tissue origin. Integrative clustering of large-scale multi-omics data is an important way for molecule-based cancer classification. The data heterogeneity and the complexity of inter-omics variations are two major challenges for the integrative clustering analysis. According to the different strategies to deal with these difficulties, we summarized the clustering methods as three major categories: direct integrative clustering, clustering of clusters and regulatory integrative clustering. A few practical considerations on data pre-processing, post-clustering analysis and pathway-based analysis arc also discussed.展开更多
The early detection of skin cancer,particularly melanoma,presents a substantial risk to human health.This study aims to examine the necessity of implementing efficient early detection systems through the utilization o...The early detection of skin cancer,particularly melanoma,presents a substantial risk to human health.This study aims to examine the necessity of implementing efficient early detection systems through the utilization of deep learning techniques.Nevertheless,the existing methods exhibit certain constraints in terms of accessibility,diagnostic precision,data availability,and scalability.To address these obstacles,we put out a lightweight model known as Smart MobiNet,which is derived from MobileNet and incorporates additional distinctive attributes.The model utilizes a multi-scale feature extraction methodology by using various convolutional layers.The ISIC 2019 dataset,sourced from the International Skin Imaging Collaboration,is employed in this study.Traditional data augmentation approaches are implemented to address the issue of model overfitting.In this study,we conduct experiments to evaluate and compare the performance of three different models,namely CNN,MobileNet,and Smart MobiNet,in the task of skin cancer detection.The findings of our study indicate that the proposed model outperforms other architectures,achieving an accuracy of 0.89.Furthermore,the model exhibits balanced precision,sensitivity,and F1 scores,all measuring at 0.90.This model serves as a vital instrument that assists clinicians efficiently and precisely detecting skin cancer.展开更多
BACKGROUND Rectal cancer is characterized by more local recurrence(LR)and lung metastasis than colon cancer.However,the diagnosis of rectal cancer is not standardized as there is no global consensus on its definition ...BACKGROUND Rectal cancer is characterized by more local recurrence(LR)and lung metastasis than colon cancer.However,the diagnosis of rectal cancer is not standardized as there is no global consensus on its definition and classification.The classification of rectal cancer differs between Japanese and Western guidelines.AIM To clarify the characteristics of rectal cancer by comparing the tumor location and characteristics of rectal cancer with those of colon cancer according to each set of guidelines.METHODS A total of 958 patients with Stage II and III colorectal cancer were included in the analysis:607 with colon cancer and 351 with rectal cancer.Localization of rectal cancers was assessed by enema examination and rigid endoscopy.According to Japan guidelines,rectal cancer is classified as Rb(below the peritoneal inversion),Ra(between the inferior margin of second sacral vertebrae and Rb)or RS(between Ra and sacral promontory).RESULTS There were no significant differences between RS rectal cancer and colon cancer in the rates of liver and lung metastasis or LR.Lung metastasis and LR were significantly more common among Rb rectal cancer(in Japan)than in colon cancer(P=0.0043 and P=0.0002,respectively).Lung metastases and LR occurred at significantly higher rates in rectal cancer measuring≤12 cm and≤10 cm than in colon cancers(P=0.0117,P=0.0467,P=0.0036,P=0.0010).Finally,the rates of liver metastasis,lung metastasis,and LR in rectal cancers measuring 11 cm to 15 cm were 6.9%,2.8%,and 5.7%,respectively.These were equivalent to the rates in colon cancer.CONCLUSION High rectal cancer may be treated with the same treatment strategies as colon cancer.There was no difference in the classification of colorectal cancer between Japan and Western countries.展开更多
Gastric cancer (GC) is a major public health issue. It is considered the 5th most common cancer diagnosed worldwide and it is one of the main causes of malignant disease-associated morbidity and mortality. The corners...Gastric cancer (GC) is a major public health issue. It is considered the 5th most common cancer diagnosed worldwide and it is one of the main causes of malignant disease-associated morbidity and mortality. The cornerstone of curative treatment is still surgery, and since the rate of relapse is high, a multidisciplinary approach is warranted in most developed countries. And while there have been recent developments in the perioperative scenario namely the FLOT regimen, little has advanced considering patient selection. We have reviewed the major trials in this setting and provide some insights from recently reported microsatellite instability (MSI) in a subgroup analysis in the MAGIC trial patients that seem to suggest an opportunity to patient selection. Furthermore, GC subtyping may prove helpful selecting candidates to immunotherapy or even multimodal therapy in the future. As the paradigm is moving towards a precision oncology model, GC patient selection remains one the biggest challenges in oncology but seems closer to clinical practice reality as new developments are being reported.展开更多
Background:Gastric cancer is a highly heterogeneous disease,presenting a major obstacle to personalized treatment.Effective markers of the immune checkpoint blockade response are needed for precise patient classificat...Background:Gastric cancer is a highly heterogeneous disease,presenting a major obstacle to personalized treatment.Effective markers of the immune checkpoint blockade response are needed for precise patient classification.We,therefore,divided patients with gastric cancer according to collagen gene expression to indicate their prognosis and treatment response.Methods:We collected data for 1250 patients with gastric cancer from four cohorts.For the TCGA‐STAD cohort,we used consensus clustering to stratify patients based on expression levels of 44 collagen genes and compared the prognosis and clinical characteristics between collagen subtypes.We then identified distinct transcriptomic and genetic alteration signatures for the subtypes.We analyzed the associations of collagen subtypes with the responses to chemotherapy,immunotherapy,and targeted therapy.We also established a platform‐independent collagen‐subtype predictor.We verified the findings in three validation cohorts(GSE84433,GSE62254,and GSE15459)and compared the collagen subtyping method with other molecular subtyping methods.Results:We identified two subtypes of gastric adenocarcinoma:a highexpression collagen subtype(CS‐H)and a low‐expression collagen subtype(CS‐L).Collagen subtype was an independent prognostic factor,with better overall survival in the CS‐L subgroup.The inflammatory response,angiogenesis,and phosphoinositide 3‐kinase(PI3K)/Akt pathways were transcriptionally active in the CS‐H subtype,while DNA repair activity was significantly greater in the CS‐L subtype.PIK3CA was frequently amplified in the CS‐H subtype,while PIK3C2A,PIK3C2G,and PIK3R1 were frequently deleted in the CS‐L subtype.CS‐H subtype tumors were more sensitive to fluorouracil,while CS‐L subtype tumors were more sensitive to immune checkpoint blockade.CS‐L subtype was predicted to be more sensitive to HER2‐targeted drugs,and CS‐H subtype was predicted to be more sensitive to vascular endothelial growth factor and PI3K pathway‐targeting drugs.Collagen subtyping also has the potential to be combined with existing molecular subtyping methods for better patient classification.Conclusions:We classified gastric cancers into two subtypes based on collagen gene expression and validated these subtypes in three validation cohorts.The collagen subgroups differed in terms of prognosis,clinical characteristics,transcriptome,and genetic alterations.The subtypes were closely related to patient responses to chemotherapy,immunotherapy,and targeted therapy.展开更多
Objective: To develop a new bioinformatic tool based on a data-mining approach for extraction of the most infor- mative proteins that could be used to find the potential biomarkers for the detection of cancer. Methods...Objective: To develop a new bioinformatic tool based on a data-mining approach for extraction of the most infor- mative proteins that could be used to find the potential biomarkers for the detection of cancer. Methods: Two independent datasets from serum samples of 253 ovarian cancer and 167 breast cancer patients were used. The samples were examined by surface- enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The datasets were used to extract the informative proteins using a data-mining method in the discrete stationary wavelet transform domain. As a dimensionality re- duction procedure, the hard thresholding method was applied to reduce the number of wavelet coefficients. Also, a distance measure was used to select the most discriminative coefficients. To find the potential biomarkers using the selected wavelet coefficients, we applied the inverse discrete stationary wavelet transform combined with a two-sided t-test. Results: From the ovarian cancer dataset, a set of five proteins were detected as potential biomarkers that could be used to identify the cancer patients from the healthy cases with accuracy, sensitivity, and specificity of 100%. Also, from the breast cancer dataset, a set of eight proteins were found as the potential biomarkers that could separate the healthy cases from the cancer patients with accuracy of 98.26%, sensitivity of 100%, and specificity of 95.6%. Conclusion: The results have shown that the new bioinformatic tool can be used in combination with the high-throughput proteomic data such as SELDI-TOF MS to find the potential biomarkers with high discriminative power.展开更多
This study aimed to assess the role of the National Comprehensive Cancer Network (NCCN) risk classification in predicting biochemical recurrence (BCR) after radical prostatectomy (RP) in Chinese prostate cancer ...This study aimed to assess the role of the National Comprehensive Cancer Network (NCCN) risk classification in predicting biochemical recurrence (BCR) after radical prostatectomy (RP) in Chinese prostate cancer patients. We included a consecutive cohort of 385 patients with prostate cancer who underwent RP at Fudan University Shanghai Cancer Center (Shanghai, China) from March 2011 to December 2014. Gleason grade groups were applied at analysis according to the 2014 International Society of Urological Pathology Consensus. Risk groups were stratified according to the NCCN Clinical Practice Guidelines in Oncology: Prostate Cancer version 1, 2017. All 385 patients were divided into BCR and non-BCR groups. The clinicopathological characteristics were compared using an independent sample t-test, Chi-squared test, and Fisher's exact test. BCR-free survival was compared using the log-rank test and multivariable Cox proportional hazard analysis. During median follow-up of 48 months (range: 1-78 months), 31 (8.05%) patients experienced BCR. The BCR group had higher prostate-specific antigen level at diagnosis (46.54 ± 39.58 ng m1-1 vs 21.02 ± 21.06 ng ml-1, P= 0.001), more advanced pT stage (P= 0.002), and higher pN1 rate (P〈 0.001). NCCN risk classification was a significant predictor of BCR {P = 0.0006) and BCR-free survival (P = 0.003) after RP. As NCCN risk level increased, there was a significant decreasing trend in BCR-free survival rate (Ptrend = 0.0002). This study confirmed and validated that NCCN risk classification was a significant predictor of BCR and BCR-free survival after RP.展开更多
Background:With improvements in next-generation DNA sequencing technology,lower cost is needed to collect genetic data.More machine learning techniques can be used to help with cancer analysis and diagnosis.Methods:We...Background:With improvements in next-generation DNA sequencing technology,lower cost is needed to collect genetic data.More machine learning techniques can be used to help with cancer analysis and diagnosis.Methods:We developed an ensemble machine learning system named performance-weighted-voting model for cancer type classification in 6,249 samples across 14 cancer types.Our ensemble system consists of five weak classifiers(logistic regression,SVM,random forest,XGBoost and neural networks).We first used cross-validation to get the predicted results for the five classifiers.The weights of the five weak classifiers can be obtained based on their predictive performance by solving linear regression functions.The final predicted probability of the performance-weighted-voting model for a cancer type can be determined by the summation of each classifier's weight multiplied by its predicted probability.Results:Using the somatic mutation count of each gene as the input feature,the overall accuracy of the performance-weighted-voting model reached 71.46%,which was significantly higher than the five weak classifiers and two other ensemble models:the hard-voting model and the soft-voting model.In addition,by analyzing the predictive pattern of the performance-weighted-voting model,we found that in most cancer types,higher tumor mutational burden can improve overall accuracy.Conclusion:This study has important clinical significance for identifying the origin of cancer,especially for those where the primary cannot be determined.In addition,our model presents a good strategy for using ensemble systems for cancer type classification.展开更多
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensions, small sample size, and big noise of gene expression data. We thus proposed a ...It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensions, small sample size, and big noise of gene expression data. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM) in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance as evaluated using five cancer gene expression datasets based on a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to be accurately classified, only based on three genes including CSRP1, MYLg, and GUCA2B.展开更多
The proposition of cancer cells in a tumor sample,named as tumor purity,is an intrinsic factor of tumor samples and has potentially great influence in variety of analyses including differential methylation,subclonal d...The proposition of cancer cells in a tumor sample,named as tumor purity,is an intrinsic factor of tumor samples and has potentially great influence in variety of analyses including differential methylation,subclonal deconvolution and subtype clustering.Infinium-Purify is an integrated R package for estimating and accounting for tumor purity based on DNA methylation Infinium 450 k array data.InfiniumPurify has three main functions getPurity,InfiniumDMC and InfiniumClust,which could infer tumor purity,differential methylation analysis and tumor sample cluster accounting for estimated or user-provided tumor purities,respectively.The InfiniumPurify package provides a comprehensive analysis of tumor purity in cancer methylation research.展开更多
基金the Deputyship for Research and Innovation,“Ministry of Education”in Saudi Arabia for funding this research(IFKSUOR3-014-3).
文摘In this study,our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization(GWO)with Harris Hawks Optimization(HHO)for feature selection.Themotivation for utilizingGWOandHHOstems fromtheir bio-inspired nature and their demonstrated success in optimization problems.We aimto leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification.We selected leave-one-out cross-validation(LOOCV)to evaluate the performance of both two widely used classifiers,k-nearest neighbors(KNN)and support vector machine(SVM),on high-dimensional cancer microarray data.The proposed method is extensively tested on six publicly available cancer microarray datasets,and a comprehensive comparison with recently published methods is conducted.Our hybrid algorithm demonstrates its effectiveness in improving classification performance,Surpassing alternative approaches in terms of precision.The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification,thereby advancing the development ofmore efficient treatment strategies.The proposed hybridmethod offers a promising solution to the gene selection problem in microarray-based cancer classification.It improves the accuracy and efficiency of cancer diagnosis and treatment,and its superior performance compared to other methods highlights its potential applicability in realworld cancer classification tasks.By harnessing the complementary search mechanisms of GWO and HHO,we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
文摘The International Skin Imaging Collaboration(ISIC)datasets are pivotal resources for researchers in machine learning for medical image analysis,especially in skin cancer detection.These datasets contain tens of thousands of dermoscopic photographs,each accompanied by gold-standard lesion diagnosis metadata.Annual challenges associated with ISIC datasets have spurred significant advancements,with research papers reporting metrics surpassing those of human experts.Skin cancers are categorized into melanoma and non-melanoma types,with melanoma posing a greater threat due to its rapid potential for metastasis if left untreated.This paper aims to address challenges in skin cancer detection via visual inspection and manual examination of skin lesion images,processes historically known for their laboriousness.Despite notable advancements in machine learning and deep learning models,persistent challenges remain,largely due to the intricate nature of skin lesion images.We review research on convolutional neural networks(CNNs)in skin cancer classification and segmentation,identifying issues like data duplication and augmentation problems.We explore the efficacy of Vision Transformers(ViTs)in overcoming these challenges within ISIC dataset processing.ViTs leverage their capabilities to capture both global and local relationships within images,reducing data duplication and enhancing model generalization.Additionally,ViTs alleviate augmentation issues by effectively leveraging original data.Through a thorough examination of ViT-based methodologies,we illustrate their pivotal role in enhancing ISIC image classification and segmentation.This study offers valuable insights for researchers and practitioners looking to utilize ViTs for improved analysis of dermatological images.Furthermore,this paper emphasizes the crucial role of mathematical and computational modeling processes in advancing skin cancer detection methodologies,highlighting their significance in improving algorithmic performance and interpretability.
基金supported by Research Fund from University of Johannes-burg,Johannesburg City,South Africa.
文摘Breast cancer(BC)is the most widespread tumor in females worldwide and is a severe public health issue.BC is the leading reason of death affecting females between the ages of 20 to 59 around the world.Early detection and therapy can help women receive effective treatment and,as a result,decrease the rate of breast cancer disease.The cancer tumor develops when cells grow improperly and attack the healthy tissue in the human body.Tumors are classified as benign or malignant,and the absence of cancer in the breast is considered normal.Deep learning,machine learning,and transfer learning models are applied to detect and identify cancerous tissue like BC.This research assists in the identification and classification of BC.We implemented the pre-trained model AlexNet and proposed model Breast cancer identification and classification(BCIC),which are machine learning-based models,by evaluating them in the form of comparative research.We used 3 datasets,A,B,and C.We fuzzed these datasets and got 2 datasets,A2C and B3C.Dataset A2C is the fusion of A,B,and C with 2 classes categorized as benign and malignant.Dataset B3C is the fusion of datasets A,B,and C with 3 classes classified as benign,malignant,and normal.We used customized AlexNet according to our datasets and BCIC in our proposed model.We achieved an accuracy of 86.5%on Dataset B3C and 76.8%on Dataset A2C by using AlexNet,and we achieved the optimum accuracy of 94.5%on Dataset B3C and 94.9%on Dataset A2C by using proposed model BCIC at 40 epochs with 0.00008 learning rate.We proposed fuzzed dataset model using transfer learning.We fuzzed three datasets to get more accurate results and the proposed model achieved the highest prediction accuracy using fuzzed dataset transfer learning technique.
基金This project was supported by the Deanship of Scientific Research at Prince SattamBin Abdulaziz University under research Project#(PSAU-2022/01/20287).
文摘Breast Cancer(BC)is considered the most commonly scrutinized can-cer in women worldwide,affecting one in eight women in a lifetime.Mammogra-phy screening becomes one such standard method that is helpful in identifying suspicious masses’malignancy of BC at an initial level.However,the prior iden-tification of masses in mammograms was still challenging for extremely dense and dense breast categories and needs an effective and automatic mechanisms for helping radiotherapists in diagnosis.Deep learning(DL)techniques were broadly utilized for medical imaging applications,particularly breast mass classi-fication.The advancements in the DL field paved the way for highly intellectual and self-reliant computer-aided diagnosis(CAD)systems since the learning cap-ability of Machine Learning(ML)techniques was constantly improving.This paper presents a new Hyperparameter Tuned Deep Hybrid Denoising Autoenco-der Breast Cancer Classification(HTDHDAE-BCC)on Digital Mammograms.The presented HTDHDAE-BCC model examines the mammogram images for the identification of BC.In the HTDHDAE-BCC model,the initial stage of image preprocessing is carried out using an average median filter.In addition,the deep convolutional neural network-based Inception v4 model is employed to generate feature vectors.The parameter tuning process uses the binary spider monkey opti-mization(BSMO)algorithm.The HTDHDAE-BCC model exploits chameleon swarm optimization(CSO)with the DHDAE model for BC classification.The experimental analysis of the HTDHDAE-BCC model is performed using the MIAS database.The experimental outcomes demonstrate the betterments of the HTDHDAE-BCC model over other recent approaches.
文摘Context/Objectives: Hepatocellular carcinoma occurs mainly and increasingly in developing countries, where the prognosis is particularly poor. The Barcelona Clinic Liver Cancer classification is used to guide the treatment of hepatocellular carcinoma. The aim of this retrospective study was to describe the Barcelona Clinic Liver Cancer classification and the treatment of hepatocellular carcinoma in a University Hospital in Côte d’Ivoire. Methods: Patients with hepatocellular carcinoma hospitalized in the hepato-gastroenterology unit of the University Hospital of Yopougon from 01 January 2012 to 30 June 2017 were included. The diagnosis of hepatocellular carcinoma was based on the presence of hepatic nodules on the abdominal ultrasound scan, typical images with the helical scanner associated or not with an increase of the α-fetoprotein higher than 200 ng/ml or with histology. Demographic, clinical, biological and radiological data were determined at the time of diagnosis. Patients were classified according to the Barcelona Clinic Liver Cancer classification. Their treatment was specified. Results: There were 258 patients whose median age was 48.1 years. Viral hepatitis B virus was the primary cause of hepatocellular carcinoma in 64.7% of cases. The severity of the underlying cirrhosis was Child-Pugh A in 12.1%, B in 63.6% and C in 24.3% of cases. The median size of the tumor was 63 mm. The α-fetoprotein level was higher than 200 mg/ml in 56.03% of cases. The Eastern Cooperative Oncology Group (ECOG)/World Health Organization (WHO) system was ≥2 in 82.9%. The Barcelona Clinic Liver Cancer classification was A in 1.3%, B in 0%, C in 55.2% and D in 43.5% of patients. There was no transplantation or hepatic resection. Very few patients (1.9%) received radio-frequency curative therapy. The treatment was predominantly symptomatic in 97.8% of patients. During hospitalization 43.7% of patients died. Conclusion: Hepatocellular carcinoma occurs on a liver with severe cirrhosis at a late stage. This does not allow cure treatment and explains a high mortality rate during hospitalization. Hepatitis B virus is the main risk factor and immunization at birth will reduce the incidence of this cancer in Africa.
文摘Proposed system has been developed to extract the optimal features from the breast tumors using Enhanced Cuckoo Search (ECS) and presented in this paper. The texture feature, intensity histogram feature, radial distance feature and shape features have been extracted and the optimal feature set has been obtained using ECS. The overall accuracy of a minimum distance classifier and k-Nearest Neighbor (k-NN) on validation samples is used as a fitness value for ECS. The new approach is carried out on the extracted feature dataset. The proposed system selects only the minimum number of features and performed the accuracy of 98.75% with Minimum Distance Classifier and 99.13% with k-NN Classifier. The performance of the new ECS is compared with the Cuckoo Search and Harmony Search. This result shows that the ECS algorithm is more accurate than the other algorithm. The proposed system can provide valuable information to the physician in medical pathology.
文摘This study evaluates the performance and reliability of a vision transformer (ViT) compared to convolutional neural networks (CNNs) using the ResNet50 model in classifying lung cancer from CT images into four categories: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), large cell carcinoma (LULC), and normal. Although CNNs have made significant advancements in medical imaging, their limited capacity to capture long-range dependencies has led to the exploration of ViTs, which leverage self-attention mechanisms for a more comprehensive global understanding of images. The study utilized a dataset of 748 lung CT images to train both models with standardized input sizes, assessing their performance through conventional metrics—accuracy, precision, recall, F1 score, specificity, and AUC—as well as cross entropy, a novel metric for evaluating prediction uncertainty. Both models achieved similar accuracy rates (95%), with ViT demonstrating a slight edge over ResNet50 in precision and F1 scores for specific classes. However, ResNet50 exhibited higher recall for LULC, indicating fewer missed cases. Cross entropy analysis showed that the ViT model had lower average uncertainty, particularly in the LUAD, Normal, and LUSC classes, compared to ResNet50. This finding suggests that ViT predictions are generally more reliable, though ResNet50 performed better for LULC. The study underscores that accuracy alone is insufficient for model comparison, as cross entropy offers deeper insights into the reliability and confidence of model predictions. The results highlight the importance of incorporating cross entropy alongside traditional metrics for a more comprehensive evaluation of deep learning models in medical image classification, providing a nuanced understanding of their performance and reliability. While the ViT outperformed the CNN-based ResNet50 in lung cancer classification based on cross-entropy values, the performance differences were minor and may not hold clinical significance. Therefore, it may be premature to consider replacing CNNs with ViTs in this specific application.
文摘Skin cancer diagnosis is difficult due to lesion presentation variability. Conventionalmethods struggle to manuallyextract features and capture lesions spatial and temporal variations. This study introduces a deep learning-basedConvolutional and Recurrent Neural Network (CNN-RNN) model with a ResNet-50 architecture which usedas the feature extractor to enhance skin cancer classification. Leveraging synergistic spatial feature extractionand temporal sequence learning, the model demonstrates robust performance on a dataset of 9000 skin lesionphotos from nine cancer types. Using pre-trained ResNet-50 for spatial data extraction and Long Short-TermMemory (LSTM) for temporal dependencies, the model achieves a high average recognition accuracy, surpassingprevious methods. The comprehensive evaluation, including accuracy, precision, recall, and F1-score, underscoresthe model’s competence in categorizing skin cancer types. This research contributes a sophisticated model andvaluable guidance for deep learning-based diagnostics, also this model excels in overcoming spatial and temporalcomplexities, offering a sophisticated solution for dermatological diagnostics research.
基金supported by the National Natural Science Foundation of China (20835002)International Science and Technology Cooperation Program of the Ministry of Science and Technology (MOST) of China (2008DFA32250)
文摘The classification of cancer is a major research topic in bioinformatics. The nature of high dimensionality and small size associated with gene expression data,however,makes the classification quite challenging. Although principal component analysis (PCA) is of particular interest for the high-dimensional data,it may overemphasize some aspects and ignore some other important information contained in the richly complex data,because it displays only the difference in the first twoor three-dimensional PC subspaces. Based on PCA,a principal component accumulation (PCAcc) method was proposed. It employs the information contained in multiple PC subspaces and improves the class separability of cancers. The effectiveness of the present method was evaluated by four commonly used gene expression datasets,and the results show that the method performs well for cancer classification.
基金supported by a grant from the National High-tech R&D Program (863 Program, No. 2006AA02Z331) to Liangbiao Chen
文摘Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machines) based recursive fea- ture elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles compared with using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles at the accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have higher capacity in normal tissue classifications than miRNA. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classifications.
文摘One goal of precise oncology is to re-classify cancer based on molecular features rather than its tissue origin. Integrative clustering of large-scale multi-omics data is an important way for molecule-based cancer classification. The data heterogeneity and the complexity of inter-omics variations are two major challenges for the integrative clustering analysis. According to the different strategies to deal with these difficulties, we summarized the clustering methods as three major categories: direct integrative clustering, clustering of clusters and regulatory integrative clustering. A few practical considerations on data pre-processing, post-clustering analysis and pathway-based analysis arc also discussed.
文摘The early detection of skin cancer,particularly melanoma,presents a substantial risk to human health.This study aims to examine the necessity of implementing efficient early detection systems through the utilization of deep learning techniques.Nevertheless,the existing methods exhibit certain constraints in terms of accessibility,diagnostic precision,data availability,and scalability.To address these obstacles,we put out a lightweight model known as Smart MobiNet,which is derived from MobileNet and incorporates additional distinctive attributes.The model utilizes a multi-scale feature extraction methodology by using various convolutional layers.The ISIC 2019 dataset,sourced from the International Skin Imaging Collaboration,is employed in this study.Traditional data augmentation approaches are implemented to address the issue of model overfitting.In this study,we conduct experiments to evaluate and compare the performance of three different models,namely CNN,MobileNet,and Smart MobiNet,in the task of skin cancer detection.The findings of our study indicate that the proposed model outperforms other architectures,achieving an accuracy of 0.89.Furthermore,the model exhibits balanced precision,sensitivity,and F1 scores,all measuring at 0.90.This model serves as a vital instrument that assists clinicians efficiently and precisely detecting skin cancer.
文摘BACKGROUND Rectal cancer is characterized by more local recurrence(LR)and lung metastasis than colon cancer.However,the diagnosis of rectal cancer is not standardized as there is no global consensus on its definition and classification.The classification of rectal cancer differs between Japanese and Western guidelines.AIM To clarify the characteristics of rectal cancer by comparing the tumor location and characteristics of rectal cancer with those of colon cancer according to each set of guidelines.METHODS A total of 958 patients with Stage II and III colorectal cancer were included in the analysis:607 with colon cancer and 351 with rectal cancer.Localization of rectal cancers was assessed by enema examination and rigid endoscopy.According to Japan guidelines,rectal cancer is classified as Rb(below the peritoneal inversion),Ra(between the inferior margin of second sacral vertebrae and Rb)or RS(between Ra and sacral promontory).RESULTS There were no significant differences between RS rectal cancer and colon cancer in the rates of liver and lung metastasis or LR.Lung metastasis and LR were significantly more common among Rb rectal cancer(in Japan)than in colon cancer(P=0.0043 and P=0.0002,respectively).Lung metastases and LR occurred at significantly higher rates in rectal cancer measuring≤12 cm and≤10 cm than in colon cancers(P=0.0117,P=0.0467,P=0.0036,P=0.0010).Finally,the rates of liver metastasis,lung metastasis,and LR in rectal cancers measuring 11 cm to 15 cm were 6.9%,2.8%,and 5.7%,respectively.These were equivalent to the rates in colon cancer.CONCLUSION High rectal cancer may be treated with the same treatment strategies as colon cancer.There was no difference in the classification of colorectal cancer between Japan and Western countries.
文摘Gastric cancer (GC) is a major public health issue. It is considered the 5th most common cancer diagnosed worldwide and it is one of the main causes of malignant disease-associated morbidity and mortality. The cornerstone of curative treatment is still surgery, and since the rate of relapse is high, a multidisciplinary approach is warranted in most developed countries. And while there have been recent developments in the perioperative scenario namely the FLOT regimen, little has advanced considering patient selection. We have reviewed the major trials in this setting and provide some insights from recently reported microsatellite instability (MSI) in a subgroup analysis in the MAGIC trial patients that seem to suggest an opportunity to patient selection. Furthermore, GC subtyping may prove helpful selecting candidates to immunotherapy or even multimodal therapy in the future. As the paradigm is moving towards a precision oncology model, GC patient selection remains one the biggest challenges in oncology but seems closer to clinical practice reality as new developments are being reported.
基金Science and Technology Program of Fujian Province of China,Grant/Award Number:2019YZ016006Scientific Research Foundation of Fujian Cancer Hospital,Grant/Award Number:2022YNG08。
文摘Background:Gastric cancer is a highly heterogeneous disease,presenting a major obstacle to personalized treatment.Effective markers of the immune checkpoint blockade response are needed for precise patient classification.We,therefore,divided patients with gastric cancer according to collagen gene expression to indicate their prognosis and treatment response.Methods:We collected data for 1250 patients with gastric cancer from four cohorts.For the TCGA‐STAD cohort,we used consensus clustering to stratify patients based on expression levels of 44 collagen genes and compared the prognosis and clinical characteristics between collagen subtypes.We then identified distinct transcriptomic and genetic alteration signatures for the subtypes.We analyzed the associations of collagen subtypes with the responses to chemotherapy,immunotherapy,and targeted therapy.We also established a platform‐independent collagen‐subtype predictor.We verified the findings in three validation cohorts(GSE84433,GSE62254,and GSE15459)and compared the collagen subtyping method with other molecular subtyping methods.Results:We identified two subtypes of gastric adenocarcinoma:a highexpression collagen subtype(CS‐H)and a low‐expression collagen subtype(CS‐L).Collagen subtype was an independent prognostic factor,with better overall survival in the CS‐L subgroup.The inflammatory response,angiogenesis,and phosphoinositide 3‐kinase(PI3K)/Akt pathways were transcriptionally active in the CS‐H subtype,while DNA repair activity was significantly greater in the CS‐L subtype.PIK3CA was frequently amplified in the CS‐H subtype,while PIK3C2A,PIK3C2G,and PIK3R1 were frequently deleted in the CS‐L subtype.CS‐H subtype tumors were more sensitive to fluorouracil,while CS‐L subtype tumors were more sensitive to immune checkpoint blockade.CS‐L subtype was predicted to be more sensitive to HER2‐targeted drugs,and CS‐H subtype was predicted to be more sensitive to vascular endothelial growth factor and PI3K pathway‐targeting drugs.Collagen subtyping also has the potential to be combined with existing molecular subtyping methods for better patient classification.Conclusions:We classified gastric cancers into two subtypes based on collagen gene expression and validated these subtypes in three validation cohorts.The collagen subgroups differed in terms of prognosis,clinical characteristics,transcriptome,and genetic alterations.The subtypes were closely related to patient responses to chemotherapy,immunotherapy,and targeted therapy.
文摘Objective: To develop a new bioinformatic tool based on a data-mining approach for extraction of the most infor- mative proteins that could be used to find the potential biomarkers for the detection of cancer. Methods: Two independent datasets from serum samples of 253 ovarian cancer and 167 breast cancer patients were used. The samples were examined by surface- enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The datasets were used to extract the informative proteins using a data-mining method in the discrete stationary wavelet transform domain. As a dimensionality re- duction procedure, the hard thresholding method was applied to reduce the number of wavelet coefficients. Also, a distance measure was used to select the most discriminative coefficients. To find the potential biomarkers using the selected wavelet coefficients, we applied the inverse discrete stationary wavelet transform combined with a two-sided t-test. Results: From the ovarian cancer dataset, a set of five proteins were detected as potential biomarkers that could be used to identify the cancer patients from the healthy cases with accuracy, sensitivity, and specificity of 100%. Also, from the breast cancer dataset, a set of eight proteins were found as the potential biomarkers that could separate the healthy cases from the cancer patients with accuracy of 98.26%, sensitivity of 100%, and specificity of 95.6%. Conclusion: The results have shown that the new bioinformatic tool can be used in combination with the high-throughput proteomic data such as SELDI-TOF MS to find the potential biomarkers with high discriminative power.
基金This study was sponsored by the National Natural Science Foundation of China (No. 81472377) and the Natural Science Foundation of Shanghai (No. 16ZR1406500). The authors also thank Wei-Yi Yang, Cui-Zhu Zhang, and Ying Shen for helping with follow-up of patients.
文摘This study aimed to assess the role of the National Comprehensive Cancer Network (NCCN) risk classification in predicting biochemical recurrence (BCR) after radical prostatectomy (RP) in Chinese prostate cancer patients. We included a consecutive cohort of 385 patients with prostate cancer who underwent RP at Fudan University Shanghai Cancer Center (Shanghai, China) from March 2011 to December 2014. Gleason grade groups were applied at analysis according to the 2014 International Society of Urological Pathology Consensus. Risk groups were stratified according to the NCCN Clinical Practice Guidelines in Oncology: Prostate Cancer version 1, 2017. All 385 patients were divided into BCR and non-BCR groups. The clinicopathological characteristics were compared using an independent sample t-test, Chi-squared test, and Fisher's exact test. BCR-free survival was compared using the log-rank test and multivariable Cox proportional hazard analysis. During median follow-up of 48 months (range: 1-78 months), 31 (8.05%) patients experienced BCR. The BCR group had higher prostate-specific antigen level at diagnosis (46.54 ± 39.58 ng m1-1 vs 21.02 ± 21.06 ng ml-1, P= 0.001), more advanced pT stage (P= 0.002), and higher pN1 rate (P〈 0.001). NCCN risk classification was a significant predictor of BCR {P = 0.0006) and BCR-free survival (P = 0.003) after RP. As NCCN risk level increased, there was a significant decreasing trend in BCR-free survival rate (Ptrend = 0.0002). This study confirmed and validated that NCCN risk classification was a significant predictor of BCR and BCR-free survival after RP.
文摘Background:With improvements in next-generation DNA sequencing technology,lower cost is needed to collect genetic data.More machine learning techniques can be used to help with cancer analysis and diagnosis.Methods:We developed an ensemble machine learning system named performance-weighted-voting model for cancer type classification in 6,249 samples across 14 cancer types.Our ensemble system consists of five weak classifiers(logistic regression,SVM,random forest,XGBoost and neural networks).We first used cross-validation to get the predicted results for the five classifiers.The weights of the five weak classifiers can be obtained based on their predictive performance by solving linear regression functions.The final predicted probability of the performance-weighted-voting model for a cancer type can be determined by the summation of each classifier's weight multiplied by its predicted probability.Results:Using the somatic mutation count of each gene as the input feature,the overall accuracy of the performance-weighted-voting model reached 71.46%,which was significantly higher than the five weak classifiers and two other ensemble models:the hard-voting model and the soft-voting model.In addition,by analyzing the predictive pattern of the performance-weighted-voting model,we found that in most cancer types,higher tumor mutational burden can improve overall accuracy.Conclusion:This study has important clinical significance for identifying the origin of cancer,especially for those where the primary cannot be determined.In addition,our model presents a good strategy for using ensemble systems for cancer type classification.
基金supported by the National Natural Science Foundation of China(Grant No.61672386)Humanities and Social Sciences Planning Project of Ministry of Education,China(Grant No.16YJAZH071)+1 种基金Anhui Provincial Natural Science Foundation of China(Grant No.1708085MF142)the Natural Science Research Key Project of Anhui Colleges,China(Grant No.KJ2014A266)
文摘It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensions, small sample size, and big noise of gene expression data. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM) in this study. IG was initially employed to filter irrelevant and redundant genes. Then, further removal of redundant genes was performed using SVM to eliminate the noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Compared to other related algorithms, IG-SVM showed the highest classification accuracy and superior performance as evaluated using five cancer gene expression datasets based on a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to be accurately classified, only based on three genes including CSRP1, MYLg, and GUCA2B.
基金This project was partially supported by the National Natural Science Foundation of China(61702325 and 61572327)Shanghai Science and Technology Innovation Action Plan(16391902900)National Institute of Health(R01GM122083).
文摘The proposition of cancer cells in a tumor sample,named as tumor purity,is an intrinsic factor of tumor samples and has potentially great influence in variety of analyses including differential methylation,subclonal deconvolution and subtype clustering.Infinium-Purify is an integrated R package for estimating and accounting for tumor purity based on DNA methylation Infinium 450 k array data.InfiniumPurify has three main functions getPurity,InfiniumDMC and InfiniumClust,which could infer tumor purity,differential methylation analysis and tumor sample cluster accounting for estimated or user-provided tumor purities,respectively.The InfiniumPurify package provides a comprehensive analysis of tumor purity in cancer methylation research.