Biological Sequence Clustering Method Based on Multi-decoder Autoencoder Model

Abstract: This paper proposes an autoencoder model based on multiple decoders to learn representations of biological sequence data, and then applies the k-means algorithm to cluster the learned sequence representations. Experimental results show that the proposed method performs well on DNA sequence datasets.
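The abstract does not describe the architecture, so the following is only a minimal sketch of the general pipeline (shared encoder, several decoders, then k-means on the encoder output), not the authors' model. Everything specific is an assumption made for illustration: the k-mer frequency features, one linear decoder per k-mer view, the plain gradient-descent training loop, and the hypothetical helper names `kmer_freq`, `train_multi_decoder_ae`, and `kmeans`.

```python
import itertools
import numpy as np

def kmer_freq(seq, k):
    """Normalized k-mer frequency vector; assumes seq uses only A, C, G, T."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {m: i for i, m in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    return v / max(v.sum(), 1)

def train_multi_decoder_ae(views, dim=2, lr=0.1, epochs=300, seed=0):
    """Train one shared linear encoder with one linear decoder per view.

    Assumption: each decoder reconstructs a different feature view of the
    same sequence; the loss is the sum of mean squared reconstruction errors.
    """
    rng = np.random.default_rng(seed)
    x = np.hstack(views)                      # encoder input: all views concatenated
    n = x.shape[0]
    enc = rng.normal(0, 0.1, (x.shape[1], dim))
    decs = [rng.normal(0, 0.1, (dim, v.shape[1])) for v in views]
    for _ in range(epochs):
        z = x @ enc                           # latent representation
        grad_enc = np.zeros_like(enc)
        for j, target in enumerate(views):
            err = (z @ decs[j] - target) / n  # gradient w.r.t. reconstruction j
            grad_enc += x.T @ (err @ decs[j].T)
            decs[j] -= lr * (z.T @ err)
        enc -= lr * grad_enc
    return x @ enc                            # final embeddings

def kmeans(z, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [z[0]]
    for _ in range(1, k):
        d = np.min([((z - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):           # keep old center if cluster is empty
                centers[j] = z[labels == j].mean(0)
    return labels

# Toy DNA sequences, made up for this example only.
seqs = ["ATATTAATATTA", "TTAATATAATAT", "GCGCCGGCGGCC", "CCGGCGCGGCGC"]
views = [np.array([kmer_freq(s, k) for s in seqs]) for k in (1, 2)]
embedding = train_multi_decoder_ae(views, dim=2)
labels = kmeans(embedding, k=2)
```

The embeddings returned by the encoder are what gets clustered; a real model would presumably use nonlinear layers and a sequence-aware input encoding, neither of which is specified in the abstract.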

Authors: CHEN Cheng; LIN Jie (School of Mathematics and Statistics, Fujian Normal University, Fuzhou 350117, China)

Affiliation: School of Mathematics and Statistics, Fujian Normal University

Source: Journal of Fujian Normal University: Natural Science Edition (CAS), 2022, Issue 6, pp. 1-9 (9 pages)

Funding: National Natural Science Foundation of China (61472082).

Keywords: biological sequence; clustering; autoencoder; representation learning

Classification number: Q811.4 [Biology: Bioengineering]
