Abstract
Identifying the subcellular locations of a protein is one of the basic problems of proteomics. Some proteins may exist simultaneously in two or more subcellular locations, which makes the localization problem more complicated. Gene Ontology and pseudo amino acid composition are each used to represent a protein as a real-valued vector. The idea of ranking, taken from multi-label learning, is adopted to compute a score vector V, in which the value of each component indicates the probability that the query protein belongs to the corresponding subcellular location. The nearest neighbor algorithm is then used to predict the number n of subcellular locations of the protein (human proteins, according to the original English abstract). Finally, the n subcellular locations corresponding to the top n score components in V are assigned to the query protein.
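The prediction step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the score vector V is computed, so the sketch assumes each score is the fraction of the k nearest training proteins annotated with that location, and it assumes n is taken from the single nearest neighbor. The function names, the Euclidean distance, the parameter `k`, and the toy data are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_locations(query, train, locations, k=3):
    """Return (predicted_locations, score_vector) for one query protein.

    train: list of (feature_vector, set_of_locations) pairs.
    locations: ordered list of all candidate subcellular locations.
    """
    # Rank training proteins by distance to the query (closest first).
    ranked = sorted(train, key=lambda item: euclidean(query, item[0]))
    neighbors = ranked[:k]

    # ASSUMPTION: score = fraction of the k neighbors carrying each location.
    scores = [
        sum(loc in labels for _, labels in neighbors) / len(neighbors)
        for loc in locations
    ]

    # ASSUMPTION: n = number of locations of the single nearest neighbor.
    n = len(ranked[0][1])

    # Keep the n locations with the highest scores (ties keep list order).
    order = sorted(range(len(locations)), key=lambda i: -scores[i])
    return [locations[i] for i in order[:n]], scores

# Toy data: 2-dimensional feature vectors, purely illustrative.
locations = ["nucleus", "cytoplasm", "membrane"]
train = [
    ([0.0, 0.0], {"nucleus"}),
    ([0.1, 0.0], {"nucleus", "cytoplasm"}),
    ([1.0, 1.0], {"membrane"}),
    ([0.0, 0.2], {"nucleus", "cytoplasm"}),
]
predicted, scores = predict_locations([0.1, 0.05], train, locations)
print(predicted)
```

In practice the feature vectors would come from the Gene Ontology or pseudo amino acid composition representations mentioned in the abstract; the two-dimensional vectors here only keep the example short.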
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2012, Issue 6, pp. 126-128 (3 pages)
Funding
National Natural Science Foundation of China (No. 60961003)
Natural Science Foundation of Jiangxi Province (No. 2010GQS0127)