Abstract
To address the unsatisfactory classification accuracy and long model training time of existing pattern-based sequence classification algorithms on biological sequences, the concept of a density-aware pattern is proposed, together with a biological sequence classification algorithm based on density-aware patterns, named BSC (Biological Sequence Classifier). First, frequent sequence patterns that satisfy the density-aware concept are mined from the biological sequences. Then, the mined patterns are filtered and sorted to form classification rules. Finally, the classification rules are used to predict the class of unclassified sequences. Experiments on four real biological sequence datasets were used to analyze the influence of BSC's parameters on the results and to provide recommended parameter settings. The classification results show that, compared with four other pattern-based baseline algorithms, BSC improves accuracy on the experimental datasets by at least 2.03 percentage points. The results indicate that BSC achieves high classification accuracy and execution efficiency on biological sequences.
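The three steps named in the abstract (mine patterns, turn them into ranked rules, classify with the rules) can be illustrated with a toy sketch. This is not the BSC algorithm itself: the abstract does not define "density", so the sketch assumes a pattern is a contiguous k-mer and its density in a sequence is its occurrence count divided by the number of k-length windows. The function names, the thresholds `min_density` and `min_support`, and the ranking by confidence then support are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def kmer_density(seq, k):
    """Map each k-mer in seq to (occurrences / number of k-length windows).
    ASSUMPTION: this stands in for the paper's density notion, which the abstract does not define."""
    windows = len(seq) - k + 1
    if windows <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(windows))
    return {p: c / windows for p, c in counts.items()}

def build_rules(seqs, labels, k=2, min_density=0.2, min_support=0.5):
    """Steps 1 and 2: mine k-mers that are dense in enough sequences of a class,
    then sort them into rules (confidence, support, pattern, class)."""
    class_sizes = Counter(labels)
    hits = defaultdict(Counter)  # pattern -> {class: number of sequences where it is dense}
    for s, y in zip(seqs, labels):
        for p, d in kmer_density(s, k).items():
            if d >= min_density:
                hits[p][y] += 1
    rules = []
    for p, per_class in hits.items():
        total = sum(per_class.values())
        for y, n in per_class.items():
            support = n / class_sizes[y]   # share of the class's sequences matched
            confidence = n / total         # share of matched sequences in this class
            if support >= min_support:
                rules.append((confidence, support, p, y))
    # Highest confidence first, then highest support; pattern and class break ties.
    rules.sort(key=lambda r: (-r[0], -r[1], r[2], r[3]))
    return rules

def classify(seq, rules, default, k=2, min_density=0.2):
    """Step 3: return the class of the first rule whose pattern is dense in seq."""
    dense = {p for p, d in kmer_density(seq, k).items() if d >= min_density}
    for _confidence, _support, p, y in rules:
        if p in dense:
            return y
    return default  # no rule fired

# Toy training data (made up for this sketch, not from the paper's datasets).
train = ["AAAACAAA", "AAGAAAAT", "CGCGCGAT", "TCGCGCGC"]
labels = ["x", "x", "y", "y"]
rules = build_rules(train, labels)
print(classify("GCGCGCTT", rules, default="x"))
```

Ordering rules before matching means the first rule to fire decides the class, which is one common way to use ranked rules; the paper's actual filtering and ranking criteria may differ.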
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2018, Issue 2, pp. 427-432 (6 pages)
Journal of Computer Applications
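Funding
National Natural Science Foundation of China (61572332, 81473446)
China Postdoctoral Science Foundation Special Funding Project (2016T90850)
Fundamental Research Funds for the Central Universities (2016SCU04A22)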
Keywords
biological sequence
sequence classification
sequential pattern
density-aware pattern
classification rule