Funding: Jointly funded by the National Natural Science Foundation of China (grants No. 41172104, 41202078 and 41372117) and the Major National S&T Program of China (grant No. 2011ZX05009-002).
Abstract: Objective: Debris flows are cohesive sediment gravity flows that occur in both subaerial and subaqueous settings. Subaerial debris flows have been well studied as a geological hazard. By comparison, subaqueous debris flows, which involve complex sediment compositions and sedimentary processes, are poorly understood. The main objective of this work is to establish a classification scheme and facies sequence models for subaqueous debris flows, in order to better understand their sedimentary processes and depositional characteristics.
Abstract: In recent years, convolutional neural networks, deep learning models able to extract high-level abstract features from minimally preprocessed data, have been widely used. In this research, we proposed a new approach to classifying DNA sequences with a convolutional neural network by treating the sequences as text data. We used one-hot vectors to represent the sequences as model input; this representation preserves the essential position information of each nucleotide in a sequence. We evaluated the proposed model on 12 DNA sequence datasets and achieved significant improvements on all of them. This result shows the potential of convolutional neural networks for DNA sequence classification and for solving other sequence problems in bioinformatics.
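The one-hot representation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the channel order `ACGT`, the zero-row handling of ambiguous bases such as `N`, and the helper names `one_hot_encode` and `conv1d_valid` are all hypothetical. The second function shows why position information survives: a filter sliding over the matrix produces one score per position, as a CNN's first convolutional layer would (a trained network learns its filter weights; here one is hand-set).

```python
from typing import List

ALPHABET = "ACGT"  # channel order is an assumption; any fixed order works


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode a DNA string as a len(seq) x 4 matrix, one row per nucleotide.

    Row i stays aligned with sequence position i, so positional information is kept.
    Symbols outside ACGT (e.g. 'N') become all-zero rows.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def conv1d_valid(x: List[List[int]], kernel: List[List[int]]) -> List[int]:
    """Slide a k x 4 filter along the sequence (stride 1, no padding).

    Each output is the elementwise product-sum of the filter with one window,
    which is what a CNN's first convolutional layer computes per position.
    """
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        scores.append(sum(a * b for w_row, k_row in zip(window, kernel)
                          for a, b in zip(w_row, k_row)))
    return scores
```

For example, `conv1d_valid(one_hot_encode("ACGTCG"), one_hot_encode("CG"))` returns `[0, 2, 0, 0, 2]`: the hand-set "CG" filter fires exactly at the two positions where that dinucleotide starts.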
Funding: This research is supported by the Tier-1 Research Grant, vote no. H938, from the Research Management Office (RMC), Universiti Tun Hussein Onn Malaysia, and the Ministry of Higher Education, Malaysia.
Abstract: Recently, many researchers have used nature-inspired metaheuristic algorithms because of their ability to perform optimally on complex problems. Among these, the bat algorithm has become popular for solving problems simply, owing to its strong tendency to converge to the global optimum most of the time. However, the standard bat algorithm with a random walk still tends to get stuck in local minima. To solve this problem, this research proposes a bat algorithm with a Levy flight random walk. The proposed Bat with Levy flight algorithm is then hybridized with three different variants of ANN. The proposed BatLFBP is applied to the problem of classifying insulin DNA sequences of healthy Homo sapiens. For classification performance, the proposed models, Bat Levy Flight Artificial Neural Network (BatLFANN) and Bat Levy Flight Back Propagation (BatLFBP), are compared with other state-of-the-art algorithms, namely Bat Artificial Neural Network (BatANN), Bat Back Propagation (BatBP), Bat Gaussian Distribution Artificial Neural Network (BatGDANN) and Bat Gaussian Distribution Back Propagation (BatGDBP), in terms of mean squared error (MSE) and accuracy. The simulation results show that on WL5 the proposed BatLFANN achieved 99.88153% accuracy with an MSE of 0.001185, and BatLFBP achieved 99.834185% accuracy with an MSE of 0.001658. On WL10, BatLFANN achieved 99.89899% accuracy with an MSE of 0.00101, and BatLFBP achieved 99.84473% accuracy with an MSE of 0.004553. Similarly, on WL15, BatLFANN achieved 99.82853% accuracy with an MSE of 0.001715, and BatLFBP achieved 99.3262% accuracy with an MSE of 0.006738. These accuracies are better than those of the other hybrid models.
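The Levy flight random walk mentioned above can be illustrated with a short sketch. It uses Mantegna's algorithm, a standard way to draw Levy-stable steps, and plugs the step into the bat algorithm's local walk around the current best solution, where the standard algorithm uses a uniform factor eps in [-1, 1] times the mean loudness. The abstract does not specify how the authors generate or apply the Levy steps, so the step generator, the scale `alpha`, the exponent `beta = 1.5`, and the function names are assumptions for illustration only.

```python
import math
import random
from typing import List, Optional


def mantegna_sigma(beta: float) -> float:
    """Scale of the numerator Gaussian in Mantegna's algorithm (0 < beta <= 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Draw one heavy-tailed Levy-stable step s = u / |v|**(1/beta)."""
    if rng is None:
        rng = random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = 0.0
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def local_search(best: List[float], mean_loudness: float, alpha: float = 0.01,
                 beta: float = 1.5, rng: Optional[random.Random] = None) -> List[float]:
    """Local walk around the current best bat.

    Standard bat algorithm: x_new = x_best + eps * <A>, eps ~ U(-1, 1).
    Levy variant sketched here: eps is replaced by alpha * levy_step(beta).
    """
    if rng is None:
        rng = random.Random()
    return [x + alpha * levy_step(beta, rng) * mean_loudness for x in best]
```

The design rationale is that Levy steps are mostly small but occasionally very long, so the walk keeps refining near the best solution while sometimes jumping far enough to leave a local minimum. For `beta = 1.5`, `mantegna_sigma` is about 0.6966.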
Funding: Supported by the Australian Research Council Linkage Project under Grant No. LP0775041 and the Early Career Researcher Grant under Grant No. 2007002448 from the University of Technology, Sydney, Australia.
Abstract: From a data mining perspective, sequence classification means building a classifier from frequent sequential patterns. However, mining the complete set of sequential patterns in a large dataset can be extremely time-consuming, and the large number of discovered patterns also makes pattern selection and classifier building very time-consuming. In sequence classification, discovering discriminative patterns is much more important than discovering a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class; in this step, an aggressive strategy is employed to select a small set of sequential patterns. Second, pattern pruning and a serial coverage test are applied to the mined patterns, and the patterns that pass the serial test are used to build the sub-classifier at the first level of the final classifier. Third, the training samples that are not covered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space can be reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security area, where it proves effective and efficient at predicting debt occurrences from customer activity sequence data.
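The mine, cover, and feed-back loop described above can be sketched on a toy scale. This is a hypothetical simplification, not the authors' algorithm: patterns are limited to ordered subsequences of length 1 or 2, confidence stands in for the unspecified correlation measure, the coverage test follows a common rule-list convention (keep a pattern if it correctly covers at least one remaining sample, then drop every sample it covers), and lowering `min_conf` by `conf_step` is an assumed "parameter update". All names are illustrative.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]
Sample = Tuple[Sequence[str], str]   # (activity sequence, class label)
Rule = Tuple[Pattern, str, float]    # (pattern, predicted label, confidence)


def contains(seq: Sequence[str], pat: Pattern) -> bool:
    """True if pat occurs in seq as an ordered, not necessarily contiguous, subsequence."""
    it = iter(seq)
    return all(item in it for item in pat)  # 'in' consumes the iterator, preserving order


def mine_level(samples: List[Sample], min_conf: float, min_sup: int) -> List[Rule]:
    """Mine class-correlated patterns (length <= 2), strongest first."""
    candidates = set()
    for seq, _ in samples:
        for n in (1, 2):
            candidates.update(combinations(seq, n))
    rules = []
    for pat in candidates:
        labels = [y for s, y in samples if contains(s, pat)]
        if len(labels) < min_sup:
            continue
        for label in set(labels):
            conf = labels.count(label) / len(labels)
            if conf >= min_conf:
                rules.append((pat, label, conf))
    # Highest confidence first, longer patterns first, then a deterministic tie-break.
    rules.sort(key=lambda r: (-r[2], -len(r[0]), r[0], r[1]))
    return rules


def serial_coverage(rules: List[Rule], samples: List[Sample]) -> Tuple[List[Rule], List[Sample]]:
    """Keep rules that correctly cover a remaining sample; remove the samples they cover."""
    kept, uncovered = [], list(samples)
    for pat, label, conf in rules:
        hits = [y for s, y in uncovered if contains(s, pat)]
        if label in hits:
            kept.append((pat, label, conf))
            uncovered = [(s, y) for s, y in uncovered if not contains(s, pat)]
    return kept, uncovered


def build_hierarchy(samples: List[Sample], min_conf: float = 0.9, min_sup: int = 2,
                    conf_step: float = 0.1, conf_floor: float = 0.5) -> List[List[Rule]]:
    """Each loop mines only the still-uncovered samples and adds one classifier level."""
    levels, remaining = [], list(samples)
    while remaining and min_conf >= conf_floor:
        kept, remaining = serial_coverage(mine_level(remaining, min_conf, min_sup), remaining)
        if kept:
            levels.append(kept)
        min_conf -= conf_step  # hypothetical parameter update for the next loop
    return levels


def classify(levels: List[List[Rule]], seq: Sequence[str],
             default: Optional[str] = None) -> Optional[str]:
    """Apply levels in order; the first matching pattern decides the label."""
    for rules in levels:
        for pat, label, _ in rules:
            if contains(seq, pat):
                return label
    return default
```

Usage on toy data: with training samples `(("a", "b", "c"), "debt")`, `(("a", "c"), "debt")`, `(("b", "d"), "ok")` and `(("d", "b"), "ok")`, a single level suffices. The rules kept are the pattern `("a", "c")` for "debt" and `("d",)` for "ok", so `("x", "a", "y", "c")` is classified as "debt".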
Abstract: The interaction energy between DNA strands in promoters is of great functional importance. The distribution of DNA strand interaction energy along promoter sequences was visualized. Separating promoters into groups by their energetic properties makes it possible to evaluate how promoter strength depends on these properties. Analysis of the groups (clusters) of promoters, distributed by the DNA strand interaction energy in the −55, −35, −10 and +6 sequence regions, indicates a connection between these energetic properties and transcriptional activity.
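A per-region energy profile of the kind described above can be sketched with a deliberately crude stand-in. The abstract does not give the energy model, so this sketch swaps in hydrogen-bond counts (A·T pairs have two hydrogen bonds, G·C pairs have three) averaged over a window around each position. It ignores base stacking, which dominates real duplex stability, so it is only a proxy. The window half-width, the function names, and the convention that promoter coordinates run ..., −2, −1, +1, +2, ... with +1 at string index `tss_index` are assumptions.

```python
from typing import Dict, Sequence

# Hydrogen bonds per base pair: a crude proxy for strand interaction energy.
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def to_index(pos: int, tss_index: int) -> int:
    """Map a promoter coordinate (no position 0; -1 precedes +1) to a 0-based string index."""
    if pos == 0:
        raise ValueError("promoter coordinates have no position 0")
    return tss_index + pos - 1 if pos > 0 else tss_index + pos


def hbond_profile(seq: str, tss_index: int,
                  positions: Sequence[int] = (-55, -35, -10, 6),
                  half_width: int = 3) -> Dict[int, float]:
    """Mean hydrogen bonds per base pair in a window centred on each promoter position."""
    profile = {}
    for pos in positions:
        centre = to_index(pos, tss_index)
        start, end = centre - half_width, centre + half_width + 1
        if start < 0 or end > len(seq):
            raise ValueError(f"window around {pos:+d} falls outside the sequence")
        window = seq[start:end].upper()
        profile[pos] = sum(HBONDS.get(b, 0) for b in window) / len(window)
    return profile
```

A higher value marks a GC-richer window that is harder to melt; with a real nearest-neighbour energy table the same windowing structure would apply, and the resulting per-region vectors could then be clustered.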
Funding: Supported by the National Basic Research Program of China (Grant No. 2009CB126007) and the ‘948’ Project of China.
Abstract: A recombinant inbred line (RIL) population of the F8 and F9 generations, derived from a cross between a typical indica rice (Qishanzhan) and a typical japonica rice (Akihikari), was used to study the difference between morphological differentiation based on phenotypic characters and genetic differentiation based on indica- and japonica-specific SSR markers, and to evaluate the relationship between vascular bundle characters and both morphological and genetic differentiation. The results showed that the frequency distributions of both morphological and genetic differentiation were skewed toward the japonica type in the filial generation, and that the population leaned further toward the japonica type by genetic differentiation than by morphological differentiation. Compared with the genetic differentiation, the degree of consistency of classification was about 50% for each of Cheng's index, the ratio of large to small vascular bundle number in the panicle neck (RLSVB), and the ratio of large vascular bundle number in the second internode from the top to that in the panicle neck (RLVB). When the total score of Cheng's index was combined with the vascular bundle number ratios, the degree of consistency with the genetic differentiation increased significantly, to about 80%. Therefore, vascular bundle characters could serve as a helpful supplement for subspecies classification.
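The "degree of consistency" figures above are percentages of agreement. A minimal sketch follows, assuming the degree of consistency is simply the share of lines whose subspecies call under one method matches the call from the SSR-based genetic differentiation. The abstract does not define it formally, and the labels below are toy values, not the study's data.

```python
from typing import Sequence


def consistency_degree(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of lines whose subspecies call matches the reference (genetic) call."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```

With toy calls `genetic = ["japonica", "japonica", "indica", "japonica"]` and `morph = ["japonica", "indica", "indica", "indica"]`, two of four lines agree, so `consistency_degree(morph, genetic)` gives 50.0.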