Abstract: To classify DNA sequences, k-mer frequency is widely used because it converts variable-length sequences into fixed-length numerical feature vectors. However, for fixed-length DNA sequence classification, subsequences starting at a specific position of the given sequence can also be used as categorical features. In a performance evaluation on six datasets of fixed-length DNA sequences, our algorithm based on this idea achieved comparable or better performance than other state-of-the-art algorithms.
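The two feature types contrasted in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `kmer_frequency` and `positional_kmers`, the `pos{i}` feature labels, and the restriction to the alphabet ACGT are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequency(seq, k):
    """Count every k-mer in seq and return a fixed-length vector.

    The vector always has 4**k entries (lexicographic order over ACGT),
    regardless of how long seq is.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def positional_kmers(seq, k):
    """Return the k-mer starting at each position as a categorical feature.

    This only yields comparable features across samples when all
    sequences have the same length (hypothetical feature names).
    """
    return {f"pos{i}": seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    print(kmer_frequency("ACGTAC", 2))
    print(positional_kmers("ACGTAC", 2))
```

The frequency vector discards where a k-mer occurs, while the positional features keep that information, which is only meaningful when position `i` refers to the same location in every sequence.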
Abstract: Fifty bacterial strains isolated from dairy products, skin, and blood from cancer and kidney failure dialysis patients were identified to 22 species in the following genera: Brevibacterium, Corynebacterium, Arthrobacter, Actinomyces, Exiguobacterium, Kocuria, Micrococcus, Rothia, and Rhodococcus. They were classified numerically using a set of 52 phenetic characteristics, the simple matching coefficient (Ssm), and unweighted average linkage clustering between groups in SPSS. They were grouped into 7 main clusters and 29 sub-clusters in the hierarchical tree. Twelve isolates of different species from the genera Brevibacterium, Arthrobacter, Corynebacterium, Kocuria, Rhodococcus, and Rothia were selected from the taxonomic clusters and probed for the lin gene by PCR. One species, Kocuria rhizophila, which inhibited most of the test organisms, did not have the lin gene in the chromosome, while the species Corynebacterium glucuronolyticum, Arthrobacter cumminsii, and Arthrobacter oxydans have the lin gene. Our results establish a wide distribution of the structural gene encoding linocin M18 within coryneform bacteria and also in the genus Kocuria.
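The numerical classification step named in this abstract (simple matching coefficient followed by unweighted average linkage) can be sketched as follows. This is an illustration under assumptions, not the SPSS procedure the authors ran: the characteristics are assumed to be coded as 0/1 values, and the function names and the merge-history output format are hypothetical.

```python
def simple_matching(a, b):
    """Simple matching coefficient: fraction of characteristics that agree
    (shared presences and shared absences both count as matches)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of characteristics")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_linkage(profiles):
    """Agglomerative clustering with unweighted average linkage.

    Returns the merge history as (cluster_a, cluster_b, similarity),
    where clusters are tuples of strain indices.
    """
    clusters = [(i,) for i in range(len(profiles))]
    merges = []

    def similarity(c1, c2):
        # Mean of all pairwise similarities between members of the two groups.
        total = sum(simple_matching(profiles[i], profiles[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Pick the most similar pair of clusters and merge it.
        s, c1, c2 = max(
            ((similarity(c1, c2), c1, c2)
             for idx, c1 in enumerate(clusters)
             for c2 in clusters[idx + 1:]),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, s))
    return merges


if __name__ == "__main__":
    strains = [[1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]  # toy 0/1 profiles
    for step in average_linkage(strains):
        print(step)
```

Cutting the merge history at a chosen similarity level gives the main clusters; cutting at a higher level gives sub-clusters.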
Abstract: One of the critical aspects of mine design is slope stability analysis and the determination of stable slopes. In the Chador-Malu iron ore mine, one of the most important iron ore mines in central Iran, it was considered vital to perform a comprehensive slope stability analysis. First, we divided the rock hosting the existing pit into six zones and prepared a geotechnical map. Then, the value of MRMR (Mining Rock Mass Rating) was determined for each zone. Because the Chador-Malu iron ore mine is located in a highly tectonic area and the rock mass is completely crushed, the Hoek-Brown failure criterion was found suitable for estimating geo-mechanical parameters. After that, the values of cohesion (c) and friction angle (φ) were calculated for the different geotechnical zones, and the corresponding graphs and equations were derived as a function of slope height. Stability analyses using numerical and limit equilibrium methods showed that some instability problems might occur as the slope height increases. Therefore, stable slopes for each geotechnical zone and prepared section were calculated and presented as a function of slope height.
Funding: This research is funded by the National Natural Science Foundation of China (61771154).
Abstract: Conventional sparse representation-based image classification usually codes the samples independently, which ignores the correlation information present in the data. If the correlation information hidden in the data can be exploited, the classification result can be improved significantly. To this end, this paper proposes a novel weighted supervised sparse coding method for the image classification problem. The proposed method first explores the structural information hidden in the data using a low-rank representation. It then introduces the extracted structural information into a novel weighted sparse representation model to code the samples in a supervised way. Experimental results show that the proposed method is superior to many conventional image classification methods.
Funding: The National Natural Science Foundation of China (No. 40625001); the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KZCX2-YW-409).
Abstract: Pedogenetic soil horizons are one of the fundamental building blocks of modern soil classification; however, in soils of urban areas, which are often strongly disturbed by human activities, horizons are difficult to distinguish, but substitutive morphological layers may be identified. To identify the characteristic soil layers in an urban environment, 224 soil layers of 36 in-situ pedons were examined and described in urban and suburban Nanjing, and 27 variables were extracted for multivariate analysis. Three groups and six subdivisions were identified by TwoStep cluster analysis combined with hierarchical cluster analysis based on factor scores. Soil-forming factors and soil-forming processes could be interpreted from the principal component analysis (PCA) of variables, cluster analysis of soil layers, and discriminant analysis of soil layer groups and their subdivisions. Parent materials, moisture regimes, organic matter accumulation, and especially nutrient accumulation were the main causes of characteristic soil layer formation. The numerical approaches used in this study were useful tools for identifying characteristic soil layers in urban soils.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41975176, 42061134009) and the High Performance Computing Center of Nanjing University of Information Science and Technology.
Abstract: The growth trajectory of hailstones in clouds determines the ground intensity and spatial distribution of hailfall. A systematic study of hail trajectories can help improve the current scientific understanding of the mechanisms by which hail forms in semi-arid regions of China and, in doing so, improve the quality of hail forecasts and warnings and help to prevent and mitigate disasters. In this study, the WRF v3.7.1 model was employed to provide the background field to drive the hailstone trajectory model. Cluster analysis was then used to classify hail trajectories in order to investigate the characteristics of different types of hail trajectories and the microphysical characteristics of hail formation. The differences in hail trajectories might be mainly due to differences in the background flow fields and microphysical fields of hail clouds in different regions. Comparative analysis revealed that as the maximum particle size of ground hailfall increased, the maximum supercooled cloud water content and the maximum updraft velocity during hailstone formation and growth also increased. The larger the hailstone when it reaches its maximum height, the larger the hailstone that reaches the ground. Overall, the formation and growth of hailstones are caused by the joint action of the dynamical flow field and cloud microphysical processes. The physical processes of hailstone growth and the main growth regions differ for different types of hail trajectories. Therefore, different catalytic schemes should be adopted in artificial hail prevention operations for different hail clouds and trajectories, owing to differences in hail formation processes and ground hailfall characteristics.
Abstract: Complex proteins are needed for many biological activities. Folding amino acid chains reveals their properties and functions. They support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative protein identification and function. Despite technical advances and protein sequence data exploration, the "basic structure" problem of bioinformatics (the automatic deduction of a protein's properties from its amino acid sequence) remains unsolved. Protein function inference from amino acid sequences is the main biological data challenge. This study analyzes whether raw sequencing can characterize biological facts. A massive corpus of protein sequences and the related protein families of the Globin-like superfamily are used to generate a solid vector representation. A coding technique for each sequence in each family was devised using two representations to identify each amino acid precisely. A bispectral analysis converts the encoded protein numerical sequences into images for better discrimination between protein sequences and families. Training and validation employed 70% of the dataset, while 30% was used for testing. This paper examined the performance of multistage deep learning models for differentiating between sixteen protein families after encoding and representing each encoded sequence by a higher-order spectral representation image (bispectrum). Cascading minimized false positive and false negative cases in all phases. The initial stage focused on two classes (six groups and ten groups). The subsequent stages focused on the few classes almost accurately separated in the first stage and decreased the overlapping cases between families that appeared in single-stage deep learning classification. The single-stage technique had 64.2% +/- 22.8% accuracy, 63.3% +/- 17.1% precision, and a 63.2% +/- 19.4% F1-score. The two-stage technique yielded 92.2% +/- 4.9% accuracy, 92.7% +/- 7.0% precision, and a 92.3% +/- 5.0% F1-score. This work provides balanced, reliable, and precise predictions for all families in all measures. The new model was resilient to family variances and provided high-scoring results.
Abstract: Proteins are essential for many biological functions. For example, folding amino acid chains reveals their functionalities by maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein's properties from its amino acid sequence is called "basic structure". It remains a critical unsolved challenge in bioinformatics despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study considers using raw sequencing to explain biological facts, using a large corpus of protein sequences and the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. Subsequently, the encoded protein numerical sequences are transformed into an image using bispectral analysis to identify essential characteristics for discriminating between protein sequences and their families. A deep convolutional neural network (CNN) classifies the resulting images, and non-normalized and normalized encoding techniques were developed. Initially, the dataset was split 70/30 for training and testing. Subsequently, the dataset was split into 70% training, 15% validation, and 15% testing. The suggested methods are evaluated using accuracy, precision, and recall. Without validation, the non-normalized method had 70% accuracy, 72% precision, and 71% recall; with validation, it had 68% accuracy, 67% precision, and 67% recall. Meanwhile, the normalized approach without validation had 92.4% accuracy, 94.3% precision, and 91.1% recall; with validation, it had 90% accuracy, 91.2% precision, and 89.7% recall. Note that both algorithms outperform the rest. The paper shows that bispectrum-based nonlinear analysis using deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architecture. They offered the best inference performance, as the proposed approach improves categorization and prediction. Several instances show successful multi-class prediction in the massive data of molecular biology.
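The step of turning an encoded numerical sequence into a bispectrum image can be sketched with the textbook direct estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2), where X is the discrete Fourier transform of the sequence. This is a generic single-segment estimate, not necessarily the estimator used in the paper; the input is assumed to be already encoded as numbers (the abstract does not specify the amino acid encoding), and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Plain O(n^2) discrete Fourier transform (fine for short sequences)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def bispectrum_magnitude(x):
    """Direct single-segment bispectrum estimate as an n x n magnitude grid.

    Entry [f1][f2] is |X(f1) * X(f2) * conj(X(f1 + f2))|, with the
    frequency index f1 + f2 wrapped modulo n.
    """
    n = len(x)
    X = dft(x)
    return [
        [abs(X[f1] * X[f2] * X[(f1 + f2) % n].conjugate()) for f2 in range(n)]
        for f1 in range(n)
    ]


if __name__ == "__main__":
    encoded = [1.0, 0.0, 0.0, 0.0]  # toy numeric sequence, not a real protein
    for row in bispectrum_magnitude(encoded):
        print([round(v, 3) for v in row])
```

The resulting grid can be rendered as a grayscale image and passed to an image classifier; in practice the estimate is usually averaged over several segments to reduce variance.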
Abstract: Chinese abbreviations, which contain few words yet deliver a wealth of information, are a vital component of the Chinese language. However, the tremendous differences between Chinese and English make translating Chinese abbreviations into English an arduous task. Based on analyses of the structure and word-formation patterns of Chinese abbreviations, this paper classifies Chinese abbreviations, summarizes the translation methods, and notes some points requiring attention in translation. A systematic analysis of the structure and classification of Chinese abbreviations will help reduce mistakes in their translation.