A Parallel Algorithm for Mining Interactive Features from Large-Scale Sequences (一种面向大规模序列数据的交互特征并行挖掘算法)

Cited by: 8
Abstract: Sequences are an important data type found across many application domains, so feature selection from sequence data has broad practical use. Interactive features are a set of features whose joint correlation with the target is significantly stronger than that of any individual member; each feature may be only weakly correlated with the target on its own. Mining interactive features from large-scale sequence data is computationally very challenging because of the combinatorial explosion of loci. To address this problem, and taking high-throughput sequencing data in biology as the setting, this paper proposes a new algorithm for mining high-order interactive features based on parallel processing and evolutionary computation. Because the number of loci is the fundamental factor limiting the efficiency of interaction mining, the algorithm abandons the sequence-block parallel strategy of existing methods and instead parallelizes over blocks of loci, which gives it an inherent efficiency advantage. The paper further introduces the concept of the maximal allelic common subsequence (MACS) and designs a MACS-based strategy for partitioning feature regions. This strategy narrows the search for interactive features to many "fragment" spaces and guarantees that no interactive features span different fragments, which avoids the high communication cost caused by computational coupling. Finally, a parallel ant colony algorithm based on substitution search performs the interactive feature selection. Extensive experiments on real and synthetic datasets show that the proposed PACOIFS algorithm outperforms comparable algorithms in both effectiveness and efficiency.
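The definition of interactive features and the combinatorial explosion of loci can be illustrated with a small sketch. This is not the PACOIFS algorithm and says nothing about how MACS is computed. It assumes empirical mutual information as the correlation measure (the abstract does not name one) and uses hypothetical toy data in which the label is the XOR of two binary loci; the names `mutual_information`, `locus_a`, `locus_b`, and the figure of 10,000 loci are all illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import comb, log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two equal-length sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy data: 100 samples covering all four genotype pairs equally often.
samples = list(product([0, 1], repeat=2)) * 25
locus_a = [a for a, _ in samples]
locus_b = [b for _, b in samples]
label = [a ^ b for a, b in samples]  # assumed target: XOR of the two loci

# Each locus alone carries no information about the label ...
mi_a = mutual_information(locus_a, label)
mi_b = mutual_information(locus_b, label)
# ... but the pair of loci, taken together, determines it completely.
mi_ab = mutual_information(samples, label)

# Size of the search space: number of 3-locus subsets among an assumed 10,000 loci.
n_candidates = comb(10_000, 3)

print(mi_a, mi_b, mi_ab, n_candidates)
```

Under these assumptions each locus has zero mutual information with the label while the pair has one full bit, which is the extreme case of the "weak individually, strong jointly" pattern. Even at order 3 the candidate count for 10,000 loci is about 1.7 × 10^11, which suggests why exhaustive enumeration is impractical and why the search space is partitioned and explored heuristically.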
Authors: Zhao Yuhai (赵宇海); Yin Ying (印莹); Li Yuan (李源); Wang Siyao (汪嗣尧); Wang Guoren (王国仁). School of Computer Science and Engineering, Northeastern University, Shenyang 110819
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2019, No. 5, pp. 992-1006 (15 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
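Funding: National Key Research and Development Program of China (2018YFB1004402); National Natural Science Foundation of China, General Program (61772124)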
Keywords: interactive features; data mining; large-scale sequence; ant colony algorithm; parallel computation; maximal allelic common subsequence (MACS)
