Biological sequence classification algorithm based on density-aware patterns
Abstract: Existing pattern-based sequence classification algorithms achieve unsatisfactory accuracy on biological sequences and require long model training times. To address these problems, the concept of a density-aware pattern is proposed, together with BSC (Biological Sequence Classifier), a biological sequence classification algorithm based on density-aware patterns. First, frequent sequence patterns that satisfy the density-aware concept are mined from the biological sequences. Then, the mined patterns are filtered and sorted to form classification rules. Finally, the classification rules are used to predict the class of unlabeled sequences. Experiments on four real biological sequence datasets analyze the influence of BSC's parameters on the results and yield recommended parameter settings. On these datasets, BSC improves accuracy by at least 2.03 percentage points over four other pattern-based baseline algorithms. The results indicate that BSC achieves high classification accuracy and execution efficiency on biological sequences.
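Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2018, Issue 2, pp. 427-432 (6 pages)
Funding: National Natural Science Foundation of China (61572332, 81473446); China Postdoctoral Science Foundation Special Funding (2016T90850); Fundamental Research Funds for the Central Universities (2016SCU04A22)
Keywords: biological sequence; sequence classification; sequential pattern; density-aware pattern; classification rule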
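
The three-step pipeline in the abstract (mine patterns, filter and sort them into rules, classify with the rules) can be illustrated with a small sketch. The abstract does not define a density-aware pattern, so everything specific here is an assumption made for illustration, not the BSC algorithm itself: patterns are taken to be contiguous substrings, "density" is taken to be the fraction of start positions at which a pattern occurs, and the names `density`, `dense_patterns`, `train`, `classify` and all thresholds are hypothetical.

```python
from collections import Counter, defaultdict

def density(pattern, seq):
    """ASSUMED density: fraction of start positions in seq where pattern occurs."""
    slots = len(seq) - len(pattern) + 1
    if slots <= 0:
        return 0.0
    hits = sum(seq.startswith(pattern, i) for i in range(slots))
    return hits / slots

def dense_patterns(seq, max_len, min_density):
    """Substrings of length <= max_len whose density in seq reaches min_density."""
    candidates = {seq[i:i + k] for k in range(1, max_len + 1)
                  for i in range(len(seq) - k + 1)}
    return {p for p in candidates if density(p, seq) >= min_density}

def train(data, max_len=3, min_density=0.2, min_support=0.5, min_conf=0.8):
    """data: list of (sequence, label). Returns (sorted rules, default label)."""
    class_sizes = Counter(label for _, label in data)
    counts = defaultdict(Counter)  # pattern -> label -> supporting sequences
    for seq, label in data:
        for p in dense_patterns(seq, max_len, min_density):
            counts[p][label] += 1
    rules = []
    for p, by_label in counts.items():
        total = sum(by_label.values())
        for label, n in by_label.items():
            support = n / class_sizes[label]  # share of the class containing p
            conf = n / total                  # share of p's sequences in this class
            if support >= min_support and conf >= min_conf:
                rules.append((p, label, conf, support))
    # Rank: higher confidence, then higher support, then longer pattern.
    rules.sort(key=lambda r: (-r[2], -r[3], -len(r[0]), r[0]))
    default = class_sizes.most_common(1)[0][0]
    return rules, default

def classify(seq, rules, default, min_density=0.2):
    """Return the label of the first rule whose pattern is dense in seq."""
    for p, label, _, _ in rules:
        if density(p, seq) >= min_density:
            return label
    return default

# Toy data invented for this sketch; not from the paper's datasets.
toy = [("ACACACGT", "A"), ("ACACTTAC", "A"),
       ("GGTGGAGG", "B"), ("TGGCGGGG", "B")]
rules, default = train(toy)
print(classify("ACACAC", rules, default))  # A
print(classify("GGGGTG", rules, default))  # B
```

Classifying with the first matching rule in a ranked list is one common design for rule-based classifiers; whether BSC ranks or combines its rules this way is not stated in the abstract.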