377 articles found
Prediction of Lung Cancer Stage Using Tumor Gene Expression Data
1
Authors: Yadi Gu. Journal of Cancer Therapy, 2024, No. 8, pp. 287-302 (16 pages)
Lung cancer remains a significant global health challenge, and identifying it at an early stage is essential for improving patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. By applying Log2 normalization to gene expression data and conducting Wilcoxon rank sum tests, the researchers employed various classifiers and Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, due to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior performance in terms of specificity, AUC (Area Under the Curve), and precision. The models were evaluated on sensitivity, AUC, and specificity, with the aim of achieving high sensitivity and AUC while maintaining reasonable specificity.
Keywords: lung cancer detection; stage prediction; gene expression data; XGBoost; machine learning
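As a rough illustration of the two preprocessing steps named in the abstract above (Log2 normalization, then a Wilcoxon rank-sum test per gene), a minimal Python sketch might look like the following. The function names, the pseudocount, and the use of the normal approximation without tie correction are assumptions made for illustration; the abstract does not describe the paper's actual pipeline in this detail.

```python
import math


def log2_normalize(values, pseudocount=1.0):
    """Log2-transform expression values.

    The pseudocount is an assumption; it avoids taking log2(0).
    """
    return [math.log2(v + pseudocount) for v in values]


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). No tie or continuity correction is applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])                      # rank sum of the first group
    mean_w = n_a * (n_a + n_b + 1) / 2        # expected rank sum under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical expression values of one gene in two stage groups.
early_stage = log2_normalize([1, 2, 3, 4])
late_stage = log2_normalize([10, 20, 30, 40])
z_score, p_value = rank_sum_test(early_stage, late_stage)
```

In a feature-selection setting, genes would typically be ranked by such p-values before classifiers are trained on the top-ranked subsets; with small groups, an exact test would be preferable to the normal approximation.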
Deep Learning Enabled Microarray Gene Expression Classification for Data Science Applications
2
Authors: Areej A. Malibari, Reem M. Alshehri, Fahd N. Al-Wesabi, Noha Negm, Mesfer Al Duhayyim, Anwer Mustafa Hilal, Ishfaq Yaseen, Abdelwahed Motwakel. Computers, Materials & Continua (SCIE, EI), 2022, No. 11, pp. 4277-4290 (14 pages)
In bioinformatics applications, the examination of microarray data for diagnosing diseases has received significant interest. Microarray gene expression data are characterized by a massive search space, which poses a primary challenge for the appropriate selection of genes. Microarray data classification draws on multiple disciplines such as bioinformatics, machine learning (ML), data science, and pattern classification. This paper designs an optimal deep neural network based microarray gene expression classification (ODNN-MGEC) model for bioinformatics applications. The proposed ODNN-MGEC technique first normalizes the data to a uniform scale. An improved fruit fly optimization (IFFO) based feature selection technique is then used to reduce the high dimensionality of the biomedical data. A deep neural network (DNN) model is applied to classify the microarray gene expression data, and the hyperparameters of the DNN model are tuned using the Symbiotic Organisms Search (SOS) algorithm. The use of the IFFO and SOS algorithms paves the way for maximizing gene expression classification outcomes. To examine the improved outcomes of the ODNN-MGEC technique, a wide-ranging experimental analysis is carried out on benchmark datasets. An extensive comparison with recent approaches demonstrates the enhanced outcomes of the ODNN-MGEC technique in terms of different measures.
Keywords: bioinformatics; data science; microarray gene expression; data classification; deep learning; metaheuristics
A Novel Soft Clustering Approach for Gene Expression Data
3
Authors: E. Kavitha, R. Tamilarasan, Arunadevi Baladhandapani, M. K. Jayanthi Kannan. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 12, pp. 871-886 (16 pages)
Gene expression data are represented as a condition matrix in which each row represents a gene and each column a condition. Microarrays are used in the lab to detect the expression of thousands of genes at a time. Genes encode proteins, which in turn dictate cell function. The production of messenger RNA and its processing are the two main stages of gene expression. The complexity of biological networks, together with the volume of data containing imprecision and outliers, increases the challenge of dealing with them. Clustering methods are hence essential for identifying the patterns present in massive gene data. Many techniques, including hierarchical, partitioning, grid-based, density-based, model-based and soft clustering approaches, have been applied to gene expression data. Understanding gene regulation and extracting other useful information from these data is possible only through effective clustering algorithms. Though many methods are discussed in the literature, we concentrate on providing a soft clustering approach for analyzing gene expression data. The population elements are grouped based on the fuzziness principle, and a degree of membership is assigned to every element. An improved Fuzzy clustering by Local Approximation of Memberships (FLAME) is proposed in this work, which overcomes the limitations of other approaches in dealing with non-linear relationships and provides better segregation of biological functions.
Keywords: reinforcement; membership; centroid; threshold; statistics; bioinformatics; gene expression data
Incorporating heterogeneous biological data sources in clustering gene expression data
4
Authors: Gang-Guo Li, Zheng-Zhi Wang. Health, 2009, No. 1, pp. 17-23 (7 pages)
In this paper, a similarity measure between genes based on protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures for protein-protein interaction data and ChIP-chip data, a combined dissimilarity measure is defined. The combined distance measure is introduced into the K-means method, yielding an improved K-means method. The improved K-means method and three other clustering methods are evaluated on a real dataset. Their performance is assessed by a prediction accuracy analysis using known gene annotations. Our results show that the improved K-means method outperforms the other clustering methods. The performance of the improved K-means method is also tested by varying the tuning coefficients of the combined dissimilarity measure. The results show that incorporating heterogeneous data sources in clustering gene expression data is helpful and meaningful, and that the coefficients for genome-wide or complete data sources should be given larger values when constructing the combined dissimilarity measure.
Keywords: statistical analysis; similarity/dissimilarity measure; gene expression data; clustering; data fusion
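The idea of a combined dissimilarity measure with tuning coefficients, as summarized in the abstract above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weighted-average form, the use of `1 - r` as the per-source dissimilarity, the function names, and the representation of the protein-protein interaction similarity as a number in [0, 1] are all hypothetical and not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def combined_dissimilarity(expr_a, expr_b, chip_a, chip_b,
                           ppi_similarity, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three per-source dissimilarities (assumed form).

    expr_*: expression profiles of two genes
    chip_*: ChIP-chip profiles of the same two genes
    ppi_similarity: assumed to be a similarity score in [0, 1]
    weights: tuning coefficients for (expression, ChIP-chip, PPI)
    """
    w_expr, w_chip, w_ppi = weights
    d_expr = 1.0 - pearson(expr_a, expr_b)
    d_chip = 1.0 - pearson(chip_a, chip_b)
    d_ppi = 1.0 - ppi_similarity
    total = w_expr + w_chip + w_ppi
    return (w_expr * d_expr + w_chip * d_chip + w_ppi * d_ppi) / total


# Hypothetical profiles for two genes.
d = combined_dissimilarity(
    expr_a=[1, 2, 3], expr_b=[2, 4, 6],   # perfectly correlated
    chip_a=[1, 2, 3], chip_b=[3, 2, 1],   # perfectly anti-correlated
    ppi_similarity=1.0,                   # assumed maximal PPI similarity
)
```

A distance of this kind could replace the Euclidean distance in the assignment step of K-means; raising a weight increases the influence of the corresponding data source, which mirrors the abstract's remark about giving genome-wide or complete sources larger coefficients.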
Challenges Analyzing RNA-Seq Gene Expression Data
5
Authors: Liliana López-Kleine, Cristian González-Prieto. Open Journal of Statistics, 2016, No. 4, pp. 628-636 (9 pages)
The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an important choice has to be made during pre-processing between two paths: transform the RNA-sequencing count data to a continuous variable, or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of existing and newly appearing gene expression data. Moreover, a comparative analysis of RNA-seq count data and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.
Keywords: RNA-Seq analysis; count data; preprocessing; differential expression; gene co-expression network
GEO (Gene Expression Omnibus): A High-Throughput Gene Expression Database (cited by: 9)
6
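Authors: 刘华, 马文丽, 郑文岭. 中国生物化学与分子生物学报 (CAS, CSCD, PKU Core), 2007, No. 3, pp. 236-244 (9 pages)
The GEO (Gene Expression Omnibus) database covers a broad range of high-throughput experimental data, including single-channel and dual-channel microarray-based measurements of mRNA abundance as well as experimental data on genomic DNA and protein molecules. Data from non-array-based high-throughput functional genomics and proteomics technologies, such as serial analysis of gene expression (SAGE) and protein identification technologies, are also archived. To date, GEO holds data covering 10,000 hybridization experiments and SAGE libraries from 30 different organisms. This paper gives an overview of querying and browsing GEO, data download and formats, data analysis, and storage and updating, with an emphasis on the use of controlled vocabularies in the GEO data browser, and discusses data mining in GEO and the prospects for applying GEO in molecular biology. GEO is publicly accessible at http://www.ncbi.nlm.nih.gov/projects/geo/.
Keywords: gene expression; database; controlled vocabulary; data mining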
Identify the signature genes for diagnose of uveal melanoma by weight gene co-expression network analysis (cited by: 10)
7
Authors: Kai Shi, Zhi-Tong Bing, Gui-Qun Cao, Ling Guo, Ya-Na Cao, Hai-Ou Jiang, Mei-Xia Zhang. International Journal of Ophthalmology (English edition) (SCIE, CAS), 2015, No. 2, pp. 269-274 (6 pages)
AIM: To identify and understand the relationship between co-expression patterns and clinical traits in uveal melanoma, weighted gene co-expression network analysis (WGCNA) is applied to gene expression levels and patient clinical features. Uveal melanoma is the most common primary eye tumor in adults. Although many studies have identified important genes and pathways relevant to the progression of uveal melanoma, the relationship between co-expression and clinical traits at the systems level remains unclear. We employ WGCNA to investigate the relationship between molecular data and phenotype in this study. METHODS: Gene expression profiles of uveal melanoma and patient clinical traits were collected from the Gene Expression Omnibus (GEO) database. Gene co-expression was calculated with WGCNA, an R package, which was used to analyze the correlation between pairs of gene expression levels. Gene functions were annotated with Gene Ontology (GO). RESULTS: We identified four co-expression modules significantly correlated with clinical traits. The blue module correlates positively with radiotherapy treatment. The purple module correlates positively with tumor location (sclera) and negatively with patient age. The red module correlates positively with sclera and negatively with tumor thickness. The black module correlates positively with the largest tumor diameter (LTD). Additionally, we identified the hub gene (the gene with top connectivity to other genes) in each module. The hub genes RPS15A, PTGDS, CD53 and MSI2 might play a vital role in the progression of uveal melanoma. CONCLUSION: From WGCNA and hub gene calculation, we identified RPS15A, PTGDS, CD53 and MSI2 as possible targets or diagnostic markers for uveal melanoma.
Keywords: weighted gene co-expression network analysis; microarray data; gene ontology
A Survey on Acute Leukemia Expression Data Classification Using Ensembles
8
Authors: Abdel Nasser H. Zaied, Ehab Rushdy, Mona Gamal. Computer Systems Science & Engineering (SCIE, EI), 2023, No. 11, pp. 1349-1364 (16 pages)
Acute leukemia is an aggressive disease with high mortality rates worldwide. The error rate can be as high as 40% when classifying acute leukemia into its subtypes, so there is an urgent need to support hematologists during the classification process. More than two decades ago, researchers used microarray gene expression data to classify cancer and adopted acute leukemia as a test case. The high classification accuracy they achieved confirmed that it is possible to classify cancer subtypes using microarray gene expression data. Ensemble machine learning is an effective method that combines individual classifiers to classify new samples. Ensemble classifiers are recognized as powerful algorithms with numerous advantages over traditional classifiers. Over the past few decades, researchers have devoted a great deal of attention to ensemble classifiers in a wide variety of fields, including but not limited to disease diagnosis, finance, bioinformatics, healthcare, manufacturing, and geography. This paper reviews recent ensemble classifier approaches used for classifying acute leukemia gene expression data. Moreover, a framework for classifying acute leukemia gene expression data is proposed, which uses both the pairwise correlation gene selection method and the Rotation Forest of Bayesian Networks. Experimental outcomes show that the classification accuracy achieved by acute leukemia ensemble classifiers constructed according to the suggested framework compares well with the accuracy achieved in other studies.
Keywords: leukemia; classification; ensemble; rotation forest; pairwise correlation; Bayesian networks; gene expression data; microarray; gene selection
DENGENE: A High-Accuracy Density-Based Clustering Algorithm for Gene Expression Data (cited by: 1)
9
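Authors: 孙亮, 赵芳, 王永吉. 计算机应用研究 (CSCD, PKU Core), 2007, No. 4, pp. 58-61 (4 pages)
Based on the characteristics of gene expression data, a high-accuracy density-based clustering algorithm, DENGENE, is proposed. By defining a consistency test and introducing peak points to improve the search direction, DENGENE handles gene expression data better. To evaluate its performance, the algorithm was tested on two widely used test data sets, namely Saccharomyces cerevisiae gene expression data sets. Experimental results show that, compared with five model-based algorithms, the CAST algorithm, and K-means clustering, DENGENE achieves significant improvements in noise filtering and clustering accuracy.
Keywords: gene expression data; cluster analysis; density-based clustering; consistency test; peak point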
Data Mining Based on Principal Component Analysis Application to the Nitric Oxide Response in Escherichia coli
10
Authors: AiLing Teh, Donovan Layton, Daniel R. Hyduke, Laura R. Jarboe, Derrick K. Rollins Sr. Journal of Statistical Science and Application, 2014, No. 1, pp. 1-18 (18 pages)
This work evaluates a recently developed multivariate statistical method based on the creation of pseudo or latent variables using principal component analysis (PCA). The application is the data mining of gene expression data to find a small subset of the most important genes in a set of thousands or tens of thousands of genes from a relatively small number of experimental runs. The method was previously developed and evaluated on artificially generated data and real data sets. Those evaluations assessed its ability to rank genes against known truth in simulated data studies and to identify known important genes in real data studies. The purpose of the work described here is to identify a ranked set of genes in an experimental study and then, for a few of the most highly ranked unverified genes, to verify their importance experimentally. The method was evaluated using the transcriptional response of Escherichia coli to treatment with four distinct inhibitory compounds: nitric oxide, S-nitrosoglutathione, serine hydroxamate and potassium cyanide. Our analysis identified genes previously recognized in the response to these compounds and also identified new genes. Three of these new genes, ycbR, yJhA and yahN, were found to significantly (p-values < 0.002) affect the sensitivity of E. coli to nitric oxide-mediated growth inhibition. Given that the three genes were not highly ranked in the selected ranked set (RS), these results support strong sensitivity in the ability of the method to identify genes related to challenge by NO and GSNO. This ability to identify genes related to the response to an inhibitory compound is important for engineering tolerance to inhibitory metabolic products, such as biofuels, and for the utilization of cheap sugar streams, such as biomass-derived sugars or hydrolysate.
Keywords: data mining; principal component analysis (PCA); gene expression data analysis
Modeling of gene regulatory networks: A review
11
Authors: Nedumparambathmarath Vijesh, Swarup Kumar Chakrabarti, Janardanan Sreekumar. Journal of Biomedical Science and Engineering, 2013, No. 2, pp. 223-231 (9 pages)
Gene regulatory networks play an important role in the molecular mechanisms underlying biological processes. Modeling these networks is an important challenge to be addressed in the post-genomic era. Several methods have been proposed for estimating gene networks from gene expression data. Computational methods for developing network models and analyzing their functionality have proved to be valuable tools in bioinformatics applications. In this paper we review the different methods for reconstructing gene regulatory networks.
Keywords: gene network; gene expression data; gene regulation
The application of hidden markov model in building genetic regulatory network
12
Authors: Rui-Rui Ji, Ding Liu, Wen Zhang. Journal of Biomedical Science and Engineering, 2010, No. 6, pp. 633-637 (5 pages)
The research focus in the post-genomic era has shifted from sequence to function. Building a genetic regulatory network (GRN) can help in understanding the regulatory mechanisms between genes and the function of organisms. Probabilistic GRNs have received increasing attention recently. This paper discusses the Hidden Markov Model (HMM) approach as a tool for building GRNs. Different genes with similar expression levels are treated as different states when training the HMM. The probable regulatory genes of target genes can be found through the resulting state transition matrix, and the definite regulatory functions can be predicted using a nonlinear regression algorithm. Experiments on artificial and real-life datasets show the effectiveness of HMM in building GRNs.
Keywords: genetic regulatory network; hidden Markov model; state transition; gene expression data
Bioinformatics Analysis of Vascular Samples Identifies Potential Key Genes Associated with Moyamoya Disease
13
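Authors: 刘洋, 杨俊华, 吴俊, 王硕. 中国卒中杂志 (PKU Core), 2024, No. 4, pp. 431-439 (9 pages)
Objective: To identify and analyze, by bioinformatics methods, the differentially expressed genes (DEGs) in vascular samples from patients with moyamoya disease, in order to explore the potential pathogenesis of the disease. Methods: Cerebral vascular samples from patients with moyamoya disease and patients with internal carotid artery aneurysm were studied. The GSE141025 data set from the Gene Expression Omnibus (GEO) was analyzed with the R package limma (linear models for microarray data). The data set comprises one middle cerebral artery sample and one superficial temporal artery sample from each of 4 patients with moyamoya disease and 4 patients with internal carotid artery aneurysm, 16 samples in total. Twelve samples (the middle cerebral artery and superficial temporal artery samples from the moyamoya patients and the superficial temporal artery samples from the aneurysm patients) were selected for DEG screening. Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the screened DEGs were performed with the R functional enrichment package clusterProfiler. A protein-protein interaction (PPI) network was constructed using the STRING database, and the network visualization software Cytoscape was used to visualize the protein network and screen hub genes. Results: A total of 138 DEGs were identified between the middle cerebral artery and superficial temporal artery samples of patients with moyamoya disease, including 18 upregulated and 120 downregulated genes. GO enrichment analysis showed that these DEGs were significantly enriched in extracellular matrix, receptor ligand activity, and growth factor activity, which may be related to the vascular lesions and neuroprotective mechanisms associated with moyamoya disease. KEGG pathway analysis suggested that the DEGs were mainly enriched in the tyrosine metabolism pathway. PPI network analysis screened out 9 hub genes: periostin (POSTN), brain-derived neurotrophic factor (BDNF), platelet-derived growth factor receptor alpha (PDGFRA), Thy-1 cell surface antigen (THY1), collagen type XV alpha 1 chain (COL15A1), fibroblast growth factor 7 (FGF7), lumican (LUM), laminin subunit alpha 2 (LAMA2), and reelin (RELN). In addition, the upregulated gene delta-like canonical Notch ligand 4 (DLL4) was found for the first time in this study to possibly play an important role in moyamoya disease, and it may be related to the pathological angiogenesis of the disease. Conclusion: Dysregulated expression of the extracellular matrix and of growth factors and their receptors may be involved in the pathogenesis of moyamoya disease. The hub genes identified by the DEG analysis (POSTN, BDNF, PDGFRA, THY1, COL15A1, FGF7, LUM, LAMA2, RELN) and DLL4 may play a role in the pathological development of moyamoya disease.
Keywords: moyamoya disease; bioinformatics analysis; gene expression data; hub gene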
Integration of genome scale data for identifying new players in colorectal cancer
14
Authors: Viktorija Sokolova, Elisabetta Crippa, Manuela Gariboldi. World Journal of Gastroenterology (SCIE, CAS), 2016, No. 2, pp. 534-545 (12 pages)
Colorectal cancers (CRCs) display a wide variety of genomic aberrations that may be either causally linked to their development and progression or might serve as biomarkers for their presence. Recent advances in rapid high-throughput genetic and genomic analysis have helped to identify a plethora of alterations that can potentially serve as new cancer biomarkers and thus help to improve CRC diagnosis, prognosis, and treatment. Each distinct data type (copy number variations, gene and microRNA expression, CpG island methylation) provides an investigator with a different, partially independent, and complementary view of the entire genome. However, elucidation of gene function will require more information than can be provided by analyzing a single type of data. The integration of knowledge obtained from different sources is becoming increasingly essential for obtaining an interdisciplinary view of large amounts of information, and also for cross-validating experimental results. The integration of numerous types of genetic and genomic data derived from public sources, via ad-hoc bioinformatics tools and statistical methods, facilitates the discovery and validation of novel, informative biomarkers. This combinatory approach will also enable researchers to understand more accurately and comprehensively the associations between different biologic pathways, mechanisms, and phenomena, and to gain new insights into the etiology of CRC.
Keywords: colorectal cancer; copy number variations; gene expression; miRNA expression; methylome; data integration
The Role of ELOVL3 in Predicting the Prognosis of Hepatocellular Carcinoma
15
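Authors: 宁娜, 巴图, 穆尼沙·买买提力, 玉苏甫卡迪尔·麦麦提尼加提, 卢爽, 陈雄. 诊断病理学杂志, 2024, No. 8, pp. 756-761 (6 pages)
Objective: To explore molecular markers that accurately predict the prognosis of patients with hepatocellular carcinoma. Methods: mRNA expression data and the corresponding clinical prognostic data of liver hepatocellular carcinoma (LIHC) samples were downloaded from the TCGA database, and clinicopathological and follow-up data were collected for 91 patients with hepatocellular carcinoma (HCC) who underwent hepatectomy at the People's Hospital of Xinjiang Uygur Autonomous Region between January 1, 2017 and January 1, 2022. After differentially expressed genes (DEGs) were screened with R, univariate Cox regression analysis and LASSO regression analysis were used to identify risk genes associated with the survival of HCC patients. The expression of ELOVL3 in different liver tissues was analyzed with the R package ggplot2 and by immunohistochemical staining, and Kaplan-Meier survival analysis was performed with the survival package. Results: Initial screening yielded 752 DEGs, including 247 upregulated and 505 downregulated genes. Cox regression analysis then identified a set of 175 genes significantly associated with the survival of HCC patients, and LASSO regression analysis selected 14 risk genes closely related to survival. ELOVL3 was finally chosen as the target gene on the basis of pilot experiments and a literature review. Analysis of TCGA-LIHC showed that ELOVL3 expression was significantly higher in cancer tissue than in normal liver tissue (P<0.001), and also higher in cancer tissue than in matched adjacent tissue (P<0.001). Immunohistochemical staining showed that ELOVL3 was positive in the liver cancer tissue of 59 patients (64.8%) and not expressed in that of 32 patients (35.2%); expression was negative in all matched adjacent tissues. Survival analysis showed that the 5-year survival rate of ELOVL3-positive patients was significantly lower than that of ELOVL3-negative patients (r=0.028); patients with high ELOVL3 expression had lower overall survival than patients with low expression, and the higher the expression, the worse the prognosis (HR=1.7, P=0.008). Conclusion: ELOVL3 expression is associated with poor prognosis in HCC patients and has potential as a novel molecular marker for predicting HCC prognosis.
Keywords: hepatocellular carcinoma; TCGA database; differentially expressed genes; prognosis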
Research Progress on the Application of Artificial Intelligence to Tumor Gene Expression Data (cited by: 1)
16
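Authors: 李坤鹏, 王泽朋, 周玉, 李四海. 中国医学物理学杂志 (CSCD), 2024, No. 3, pp. 389-396 (8 pages)
Tumors are serious diseases affecting human health, and early diagnosis is essential for improving treatment success and patient survival. The study of tumor gene expression data has become a main tool for revealing the mechanisms of tumor diseases, and artificial intelligence plays an important role in analyzing such data. From the perspective of machine learning methods, this paper discusses the potential advantages of supervised learning, unsupervised learning, and deep learning in tumor prediction and classification, with particular attention to the effect of feature selection algorithms on gene screening and their importance for high-dimensional gene expression data. By comprehensively reviewing the application and development of artificial intelligence in tumor gene expression data analysis, the paper aims to provide a reference for future research directions and to promote further development.
Keywords: gene expression data; artificial intelligence; machine learning; feature selection; review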
Bayesian Network Disease Sample Classification Algorithm Based on Gene Association Analysis
17
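Authors: 李志杰, 廖旭红, 李元香, 李青蓝. 计算机应用 (CSCD, PKU Core), 2024, No. 11, pp. 3449-3458 (10 pages)
Gene expression data are a specific type of big data in biology. Although gene expression values are ordinary real numbers, their similarity is measured not by Euclidean distance but by whether the expression values rise and fall together. Current gene Bayesian networks take gene expression levels as the node random variables and do not reflect this subspace-pattern similarity. Therefore, a Bayesian network disease Classification algorithm based on Gene Association analysis (BCGA) is proposed, which learns a Bayesian network from class-labeled disease sample-gene expression data and predicts the class of new disease samples. First, the disease samples are discretized and filtered to select genes, and the dimension-reduced gene expression values are sorted and replaced by gene column subscripts. Second, the sequence of gene column subscripts is decomposed into a set of atomic sequences of length 2, in which the frequent atomic sequences correspond to associations between pairs of genes. Finally, causal relationships are measured by gene association entropy and used for Bayesian network structure learning. Parameter learning in BCGA also becomes easy: the conditional probability distribution of a gene node is obtained simply by counting the occurrences of the atomic sequences of that gene and of its parent-node genes. Experimental results on multiple tumor and non-tumor gene expression data sets show that, compared with existing algorithms of the same kind, BCGA significantly improves disease classification accuracy and effectively shortens analysis time. In addition, because BCGA uses gene association entropy in place of conditional independence and gene atomic sequences in place of gene expression values, it fits gene expression data better.
Keywords: gene expression data; frequent atomic sequence; gene association entropy; gene sequence Bayesian network; disease classification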
Gene Expression Data Classification Using Consensus Independent Component Analysis (cited by: 7)
18
Authors: Chun-Hou Zheng, De-Shuang Huang, Xiang-Zhen Kong, Xing-Ming Zhao. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2008, No. 2, pp. 74-82 (9 pages)
We propose a new method for tumor classification from gene expression data, which consists of three main steps. First, the original DNA microarray gene expression data are modeled by independent component analysis (ICA). Second, the most discriminant eigenassays extracted by ICA are selected by the sequential floating forward selection technique. Finally, a support vector machine is used to classify the modeled data. To show the validity of the proposed method, we applied it to classify three DNA microarray datasets involving various human normal and tumor tissue samples. The experimental results show that the method is efficient and feasible.
Keywords: independent component analysis; feature selection; support vector machine; gene expression data
Mining and Integrating Reliable Decision Rules for Imbalanced Cancer Gene Expression Data Sets (cited by: 4)
19
Authors: Hualong Yu (1), Jun Ni (2), Yuanyuan Dan (3), Sen Xu (4). Tsinghua Science and Technology (SCIE, EI, CAS), 2012, No. 6, pp. 666-673 (8 pages)
Affiliations: 1. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China; 2. Department of Radiology, Carver College of Medicine, The University of Iowa, Iowa City, IA 52242, USA; 3. School of Biology and Chemical Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China; 4. School of Information Engineering, Yancheng Institute of Technology, Yancheng 224051, China
There have been many skewed cancer gene expression datasets in the post-genomic era. Extracting differentially expressed genes or constructing decision rules from these skewed datasets with traditional algorithms seriously underestimates the performance of the minority class, leading to inaccurate diagnosis in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired as decision rules to distinguish both classes, and these decision rules are then integrated into an ensemble learning framework by majority voting to recognize test examples, thus avoiding tedious data normalization and classifier construction. The mining and integrating of a few reliable decision rules gave higher or at least comparable classification performance relative to many traditional class imbalance learning algorithms on four benchmark imbalanced cancer gene expression datasets.
Keywords: cancer gene expression data; class imbalance; paired differential expression genes; decision rule; ensemble learning; majority voting
Outlier Analysis for Gene Expression Data (cited by: 3)
20
Authors: Chao Yan, Guo-Liang Chen, Yi-Fei Shen. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2004, No. 1, pp. 13-21 (9 pages)
The rapid development of technologies that generate arrays of gene data enables a global view of the transcription levels of hundreds of thousands of genes simultaneously. The outlier detection problem for gene data is important but is complicated by high dimensionality. The sparsity of data in high-dimensional space makes each point a relatively good outlier under traditional distance-based definitions, so finding outliers in high-dimensional data is more complex. In this paper, some basic outlier analysis algorithms are discussed and a new genetic algorithm is presented. This algorithm finds the best dimension projections based on a revised cell-based algorithm and gives explanations for the solutions. It can solve the outlier detection problem for gene expression data and for other high-dimensional data as well.
Keywords: gene expression data; outlier analysis; cell-based algorithm; genetic algorithm