Funding: Project (No. 20040248001) supported by the Ph.D. Programs Foundation of Ministry of Education of China
Abstract: Background knowledge is important for data mining, especially in complicated situations. Ontological engineering is the successor of knowledge engineering. Sharable knowledge bases built on ontologies can provide background knowledge to direct the data mining process. This paper gives a general introduction to the method and presents a practical analysis example using a support vector machine (SVM) as the classifier. Gene Ontology and its accompanying annotations form a large knowledge base on which much research has been carried out. A microarray dataset is the output of a DNA chip. With the help of Gene Ontology, we present a more elaborate analysis of microarray data than previous researchers. The method can also be used in other fields with similar scenarios.
Funding: the National Natural Science Foundation of China (No. 60371034); the Scientific Research Foundation of Third Military Medical University (2007XG20)
Abstract: Objective: To discuss normalization strategies and methods for processing and analyzing data from different chips, combining statistics, mathematics and bioinformatics, in order to find genes with significant differences. Methods: Using Excel and SPSS software, high- and low-density chips were analyzed through total intensity normalization (TIN) and locally weighted linear regression normalization (LWLRN). Results: These methods effectively reduced systematic errors and made the data more comparable and reliable. Conclusion: These methods can identify genes with significant differences, although normalization methods are still being developed and need further improvement. Major breakthroughs in the normalization, analysis and transformation of microarray data can be expected with the development of non-linear techniques and of computer software and hardware.
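The abstract names total intensity normalization (TIN) without giving its formula. As a minimal sketch, assuming the common two-channel formulation in which one channel is rescaled so that both channels have the same summed intensity, the idea can be illustrated as follows. The function names and the toy intensities are hypothetical and are not taken from the paper, which used Excel and SPSS.

```python
import math


def total_intensity_normalize(red, green):
    """Rescale the red channel so both channels share the same total intensity.

    Assumption: a single global scaling factor, sum(green) / sum(red).
    """
    if len(red) != len(green):
        raise ValueError("channels must have the same number of spots")
    total_red = sum(red)
    if total_red <= 0:
        raise ValueError("red channel total must be positive")
    factor = sum(green) / total_red
    return [r * factor for r in red], factor


def log2_ratios(red, green):
    """Per-spot log2(red / green) expression ratios."""
    return [math.log2(r / g) for r, g in zip(red, green)]


# Toy data: the red channel is uniformly twice as bright as the green one,
# which is the kind of systematic bias a global factor can remove.
red = [200.0, 400.0, 800.0, 600.0]
green = [100.0, 200.0, 400.0, 300.0]

red_norm, factor = total_intensity_normalize(red, green)
ratios = log2_ratios(red_norm, green)
```

A single global factor only corrects intensity-independent bias; intensity-dependent bias is what a locally weighted regression approach such as LWLRN is meant to address.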
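Abstract: [Objective] To explore possible pathogenic genes of breast cancer and their biological significance. [Methods] Based on the Breast-2 (79) database, an internationally used public breast cancer test set, an integrated decision information factor (DIF) method was proposed to effectively select candidate pathogenic genes and to perform breast cancer recognition. Weighted co-expression network analysis was performed on the raw gene data in R to identify important gene modules in the network; Pathway enrichment analysis of the important gene modules was carried out with the DAVID software to verify whether they were statistically significant; the DIF method was used to select 2 candidate pathogenic genes from the statistically significant important gene modules; and breast cancer recognition was completed with an inverse space sparse representation classification model. [Results] The weighted gene co-expression network yielded 3 gene modules, 2 of which were statistically significant according to Pathway enrichment analysis. When the 2 candidate pathogenic genes selected from these two modules by the DIF gene selection method were used for breast cancer recognition, the accuracy reached 71.07%, which was 13.93%, 11.19% and 8.57% higher than that of the signal noise ratio (SNR), receiver operating characteristic curve (ROC), and the ratio of between-groups to within-groups sum of squares (BW) methods, respectively. [Conclusion] The candidate pathogenic genes obtained by the proposed integrated DIF gene selection method can effectively identify breast cancer and have clear biological significance.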
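The abstract does not define the DIF score, so it is not reproduced here. The two baseline criteria it compares against, SNR and BW, have widely used textbook forms, and a minimal sketch of those forms is given below. The use of the sample standard deviation, the function names, and the toy expression values are assumptions for illustration, not details from the paper.

```python
from statistics import mean, stdev


def snr_score(class_a, class_b):
    """Signal-to-noise ratio of one gene: (mean_a - mean_b) / (sd_a + sd_b).

    Assumption: sample standard deviations; each class needs >= 2 samples.
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def bw_ratio(groups):
    """Between-groups to within-groups sum of squares for one gene.

    Raises ZeroDivisionError if every group is constant (within sum is 0).
    """
    all_values = [v for g in groups for v in g]
    overall = mean(all_values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return between / within


# Hypothetical expression values: gene -> (tumor samples, normal samples)
genes = {
    "gene_1": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),  # well separated
    "gene_2": ([3.0, 4.0, 5.0], [2.0, 4.0, 6.0]),  # identical class means
}

snr = {name: snr_score(t, n) for name, (t, n) in genes.items()}
bw = {name: bw_ratio([t, n]) for name, (t, n) in genes.items()}

# Rank genes by absolute SNR, most discriminative first.
ranking = sorted(snr, key=lambda name: abs(snr[name]), reverse=True)
```

Both scores treat each gene independently, which is one reason a method that first restricts attention to enriched co-expression modules may select different genes.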