A dynamic geometry system, an important application of geometric constraint solving, is widely used in elementary mathematics education; it is also a fundamental environment for automated theorem proving in geometry. Geometric constraint solving often encounters critical points at which a geometric element may degenerate. Such degenerate situations usually deserve close attention during learning and exploration, yet many of them cannot be presented completely even by well-known dynamic geometry software. In this paper, the mechanisms that cause a geometric element to degenerate are analyzed, and relevant definitions and formalized descriptions of the problem are provided according to modern Euclidean geometry theories. To solve the problem, the data structure is optimized and a domain model is designed for the geometric elements and the constraint relationships among them in the dynamic geometry system; furthermore, an update algorithm for the elements is proposed based on this domain model. Instances show that the proposed domain model and update algorithm cope effectively with geometric element degeneracy in the constraint solving process, thereby unifying dynamic geometry drawing with the geometric intuition of the user.
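As an illustration of what a degenerate element looks like in practice, the sketch below computes the intersection of two circles and reports a status alongside the points. It is not the domain model or update algorithm of the paper; the function name `circle_intersections`, the status labels and the tolerance `eps` are assumptions made for this example.

```python
from math import hypot, sqrt
from typing import List, Tuple

Point = Tuple[float, float]


def circle_intersections(c1: Point, r1: float, c2: Point, r2: float,
                         eps: float = 1e-9) -> Tuple[str, List[Point]]:
    """Return (status, points) for the intersection of two circles.

    status is "regular" (two points), "tangent" (critical point, one point)
    or "degenerate" (no well-defined intersection point).
    """
    (x1, y1), (x2, y2) = c1, c2
    d = hypot(x2 - x1, y2 - y1)
    # Concentric circles, circles too far apart, or one circle inside the other.
    if d < eps or d > r1 + r2 + eps or d < abs(r1 - r2) - eps:
        return "degenerate", []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # distance from c1 to the chord
    h = sqrt(max(r1 * r1 - a * a, 0.0))          # half of the chord length
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    if h < eps:
        return "tangent", [(mx, my)]
    ox, oy = -(y2 - y1) * h / d, (x2 - x1) * h / d
    return "regular", [(mx + ox, my + oy), (mx - ox, my - oy)]
```

Returning a status instead of raising an error lets a dependent element stay in the construction while the user drags a circle through the tangent position and beyond, which is the kind of situation the abstract describes.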
Abstract: The dominance-based rough set approach (DRSA) permits representation and analysis of all phenomena involving a monotonicity relationship between some measures or perceptions. DRSA also has some merits within granular computing, as it extends the paradigm of granular computing to ordered data, specifies a syntax and modality of information granules appropriate for dealing with ordered data, and enables computing with words and reasoning about ordered data. Granular computing with ordered data is a very general paradigm, because other modalities of information constraints, such as veristic, possibilistic and probabilistic modalities, also have to deal with ordered value sets (with qualifiers relative to grades of truth, possibility and probability), which gives DRSA a large area of applications.
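The monotonicity idea can be made concrete with the standard DRSA approximations of an upward union of ordered classes. The sketch below is a minimal illustration on a toy table; the data in `TABLE` and all function names are hypothetical, and all criteria are assumed to be of gain type (larger is better).

```python
from typing import Dict, Set, Tuple

# Hypothetical toy table: object -> (evaluations on gain-type criteria, class rank).
TABLE: Dict[str, Tuple[Tuple[float, ...], int]] = {
    "a": ((3, 3), 2),
    "b": ((2, 2), 1),
    "c": ((1, 1), 2),   # inconsistent: worse than b on all criteria, better class
    "d": ((0, 0), 1),
}


def dominates(x: Tuple[float, ...], y: Tuple[float, ...]) -> bool:
    """True if x is at least as good as y on every criterion."""
    return all(xi >= yi for xi, yi in zip(x, y))


def upward_union(table, t: int) -> Set[str]:
    """Objects assigned to class t or better."""
    return {o for o, (_, cls) in table.items() if cls >= t}


def lower_approx_upward(table, t: int) -> Set[str]:
    """Objects whose dominating set lies entirely inside the upward union."""
    target = upward_union(table, t)
    return {o for o, (vals, _) in table.items()
            if {p for p, (pv, _) in table.items() if dominates(pv, vals)} <= target}


def upper_approx_upward(table, t: int) -> Set[str]:
    """Objects that dominate at least one member of the upward union."""
    target = upward_union(table, t)
    return {o for o, (vals, _) in table.items()
            if any(dominates(vals, table[p][0]) for p in target)}
```

On this table, only `a` belongs with certainty to "class 2 or better", while `b` and `c` fall in the boundary because their evaluations and class assignments violate the dominance principle.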
Abstract: We present two recent methods, called UTAGMS and GRIP, from the viewpoint of robust ranking of multi-criteria alternatives. In these methods, the preference information provided by a single or multiple Decision Makers (DMs) is composed of holistic judgements of some selected alternatives, called reference alternatives. The judgements express pairwise comparisons of some reference alternatives (in UTAGMS), and comparisons of selected pairs of reference alternatives from the viewpoint of intensity of preference (in GRIP). Ordinal regression is used to find additive value functions compatible with this preference information. The whole set of compatible value functions is then used in Linear Programming (LP) to calculate necessary and possible weak preference relations in the set of all alternatives, and in the set of all pairs of alternatives. While the necessary relation is true for all compatible value functions, the possible relation is true for at least one compatible value function. The necessary relation is a partial preorder and the possible relation is a complete and negatively transitive relation. The necessary relations show consequences of the given preference information that are robust because they are "always true". We illustrate this methodology with an example.
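The difference between the necessary and the possible relation can be sketched on a toy problem. Note the substitution: instead of the LP over all compatible additive value functions described in the abstract, this example enumerates a finite grid of weighted sums, so it only approximates the idea. The alternatives in `ALTS`, the judgement in `REFERENCE` and the grid size `STEPS` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical integer scores on two gain-type criteria (scale 0..10).
ALTS: Dict[str, Tuple[int, ...]] = {"a": (9, 2), "b": (5, 5), "c": (3, 9), "d": (1, 1)}
# Holistic judgement of the DM: reference alternative a is at least as good as b.
REFERENCE: List[Tuple[str, str]] = [("a", "b")]
STEPS = 10  # integer weights summing to STEPS keep all comparisons exact


def value(w: Tuple[int, ...], x: Tuple[int, ...]) -> int:
    """Weighted-sum value of an alternative (a special additive value function)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def compatible_weights() -> List[Tuple[int, ...]]:
    """Grid weight vectors that reproduce every judgement in REFERENCE."""
    n = len(next(iter(ALTS.values())))
    grid = (w for w in product(range(STEPS + 1), repeat=n) if sum(w) == STEPS)
    return [w for w in grid
            if all(value(w, ALTS[x]) >= value(w, ALTS[y]) for x, y in REFERENCE)]


def necessary(x: str, y: str, weights) -> bool:
    """x is at least as good as y for ALL compatible value functions."""
    return all(value(w, ALTS[x]) >= value(w, ALTS[y]) for w in weights)


def possible(x: str, y: str, weights) -> bool:
    """x is at least as good as y for AT LEAST ONE compatible value function."""
    return any(value(w, ALTS[x]) >= value(w, ALTS[y]) for w in weights)
```

With these numbers, `a` is necessarily at least as good as `d`, whereas `a` versus `c` is only possible in both directions, so the pair stays incomparable in the necessary relation.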
Abstract: Deep Learning (DL) is a subfield of machine learning that has a significant impact on extracting new knowledge. DL makes it possible to extract advanced data representations and knowledge, and highly effective DL techniques help to uncover more hidden knowledge. Deep learning has a promising future owing to its strong performance and accuracy. To leverage DL effectively, we need to understand its fundamentals and the state of the art. This paper surveys DL approaches, advantages, drawbacks, architectures, and methods to give a straightforward and clear understanding of the field from different views. Moreover, the existing related methods are compared with each other, and applications of DL are described in areas such as medical image analysis and handwriting recognition.
Funding: Supported by grants from the National Key R&D Program of China (No. 2021YFF1201003), the National Science Fund for Distinguished Young Scholars (No. 81925023), the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101420006), the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011), the High-level Hospital Construction Project (No. DFJHBF202105), and the National Science Foundation for Young Scientists of China (No. 82001986).
Abstract: Background: Artificial intelligence (AI) technology represented by deep learning has made remarkable achievements in digital pathology, enhancing the accuracy and reliability of diagnosis and prognosis evaluation. The spatial distribution of CD3⁺ and CD8⁺ T cells within the tumor microenvironment has been demonstrated to have a significant impact on the prognosis of colorectal cancer (CRC). This study aimed to investigate the prognostic ability of CD3_CT (CD3⁺ T-cell density in the core of the tumor [CT]) in patients with CRC by using AI technology. Methods: The study enrolled 492 patients from two distinct medical centers, with 358 patients assigned to the training cohort and 134 patients allocated to the validation cohort. To facilitate tissue segmentation and T-cell quantification in whole-slide images (WSIs), a fully automated workflow based on deep learning was devised. Upon completion of tissue segmentation and subsequent cell segmentation, a comprehensive analysis was conducted. Results: The evaluation of various positive T-cell densities revealed comparable discriminatory ability between CD3_CT and CD3-CD8 (the combination of CD3⁺ and CD8⁺ T-cell density within the CT and invasive margin) in predicting mortality (C-index in training cohort: 0.65 vs. 0.64; validation cohort: 0.69 vs. 0.69). CD3_CT was confirmed as an independent prognostic factor, with high CD3_CT density associated with increased overall survival (OS) in the training cohort (hazard ratio [HR] = 0.22, 95% confidence interval [CI]: 0.12–0.38, P < 0.001) and the validation cohort (HR = 0.21, 95% CI: 0.05–0.92, P = 0.037). Conclusions: We quantified the spatial distribution of CD3⁺ and CD8⁺ T cells within tissue regions in WSIs using AI technology. CD3_CT was confirmed as a stage-independent predictor for OS in CRC patients. Moreover, CD3_CT shows promise in simplifying the CD3-CD8 system and facilitating its practical application in clinical settings.
Abstract: COVID-19 has caused severe health complications and produced a substantial adverse economic impact around the world. Forecasting the trend of COVID-19 infections could help in executing policies to effectively reduce the number of new cases. In this study, we apply a decomposition and ensemble model to forecast COVID-19 confirmed cases, deaths, and recoveries in Pakistan for the upcoming month until the end of July. The data are decomposed with the Ensemble Empirical Mode Decomposition (EEMD) technique, which splits a series into small components called Intrinsic Mode Functions (IMFs). Each IMF is modelled with the Autoregressive Integrated Moving Average (ARIMA) model. The data used in this study are obtained from the official, publicly available website of Pakistan designated for the COVID-19 outbreak, which is updated daily. Our analyses reveal that the numbers of recoveries, new cases, and deaths are increasing exponentially in Pakistan. Based on the selected EEMD-ARIMA model, the confirmed cases are expected to rise from 213,470 to 311,454 by 31 July 2020, an increase of almost 1.46 times, with a 95% prediction interval of 246,529 to 376,379. Recoveries are expected to almost double, from 100,802 to 193,495 by 31 July 2020, with a 95% prediction interval of 162,414 to 224,579. Deaths are expected to increase from 4,395 to 6,751, which is almost 1.54 times, with a 95% prediction interval of 5,617 to 7,885. Thus, the COVID-19 forecasting results for Pakistan are alarming for the next month until 31 July 2020. They also confirm that the EEMD-ARIMA model is useful for the short-term forecasting of COVID-19, and that it is capable of keeping track of the real COVID-19 data in nearly all scenarios. The decomposition and ensemble strategy can help decision-makers develop short-term strategies regarding the current number of disease occurrences until an appropriate vaccine is developed.
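The growth factors quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes ratios from the figures stated above and confirms that each point forecast lies inside its prediction interval; it does not reproduce the EEMD-ARIMA model. The dictionary layout and function names are assumptions for this example.

```python
# Point forecasts (start, end) and 95% prediction intervals quoted in the abstract.
FORECASTS = {
    "confirmed":  {"start": 213470, "end": 311454, "pi": (246529, 376379)},
    "recoveries": {"start": 100802, "end": 193495, "pi": (162414, 224579)},
    "deaths":     {"start": 4395,   "end": 6751,   "pi": (5617, 7885)},
}


def growth_factor(start: int, end: int) -> float:
    """Ratio of the forecast value to the starting value."""
    return end / start


def inside_interval(value: int, interval: tuple) -> bool:
    """True if the point forecast lies within the prediction interval."""
    lo, hi = interval
    return lo <= value <= hi


for name, f in FORECASTS.items():
    ratio = growth_factor(f["start"], f["end"])
    ok = inside_interval(f["end"], f["pi"])
    print(f"{name}: x{ratio:.2f}, forecast inside 95% interval: {ok}")
```

Rounded to two decimals, the ratios are 1.46 for confirmed cases, 1.92 for recoveries and 1.54 for deaths, consistent with the "almost 1.46 times", "almost two times" and "almost 1.54 times" stated in the text.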
Abstract: We present the main ideas of a recently proposed method for interactive multiobjective optimization, which is based on a logical preference model built using the Dominance-based Rough Set Approach (DRSA).
Funding: This work was supported by the National Natural Science Foundation of China (61972109, 61632002) and the Natural Science Foundation of Guangdong Province of China (2018A030313380).
Abstract: Background: Coronaviruses can cross the species barrier and infect humans, causing a severe respiratory syndrome. SARS-CoV-2, potentially of bat origin, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronaviruses for early warning. Methods: The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) Database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, whereas 2159 non-human-origin viruses were regarded as negative samples. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified and then used with the multidimensional scaling method to explore the pattern of human coronaviruses. Results: The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g=3) feature. The predictive model achieved a maximum accuracy (ACC) of 98.18% and a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters of human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses have the same human receptor (angiotensin converting enzyme II). The big gap in the distance curve suggests that the origin of SARS-CoV-2 is not clear and that surveillance in the field should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and that public health remains challenged. Conclusions: The optimal feature (GGAP, g=3) performed well in predicting infection risk and could be used to explore the evolutionary dynamics in a simple, fast and large-scale manner. The study may be beneficial for the surveillance of genome mutations of coronaviruses in the field.
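A short sketch may help to show what a g-gap dipeptide feature and the MCC look like in code. It follows one common definition of GGAP (frequencies of residue pairs separated by g positions, giving a 400-dimensional vector) and the standard MCC formula; whether the study used exactly this normalization is an assumption, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def ggap_composition(seq: str, g: int) -> List[float]:
    """Frequencies of residue pairs (seq[i], seq[i+g+1]) over the sequence."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:              # skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

For example, `ggap_composition("ACDEFGH", 3)` counts the pairs AF, CG and DH, each with frequency 1/3. A vector of this kind would then be passed to a classifier such as a random forest.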
Abstract: Background: Influenza B virus can cause epidemics with high pathogenicity, so it poses a serious threat to public health. A feature representation algorithm is proposed in this paper to identify the pathogenicity phenotype of influenza B virus. Methods: The dataset included all 11 influenza virus proteins encoded in eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and input into the random forest classifier to output informative features about the class label and probabilistic prediction. The sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensions and included knowledge from different perspectives, and were used to build the machine learning model for pathogenicity identification. Results: Forty signature positions were obtained by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were generated directly from the 67 random forest models, the dimensions of the class and probabilistic features were optimized to 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and the amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method. Conclusions: The optimized informative features had the best performance and were used to build a predictive model to identify the phenotype of influenza B virus with high pathogenicity and provide early risk warning for disease control.
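Entropy screening of sequence positions can be sketched as follows. The example computes the Shannon entropy of each column of a toy alignment and ranks the columns; the logarithm base, the toy alignment and the function names are assumptions, since the abstract does not state how the value of 1.06 was computed.

```python
from collections import Counter
from math import log
from typing import List, Sequence, Tuple


def position_entropy(column: Sequence[str], base: float = 2.0) -> float:
    """Shannon entropy of the residues observed at one alignment position."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log(c / n, base) for c in counts.values())


def rank_positions(alignment: List[str], base: float = 2.0) -> List[Tuple[int, float]]:
    """Return (position index, entropy) pairs sorted from most to least variable."""
    scores = [(i, position_entropy(col, base)) for i, col in enumerate(zip(*alignment))]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical aligned fragments of equal length.
TOY_ALIGNMENT = ["AKT", "AKS", "ARS", "AKC"]
```

A fully conserved column has entropy 0, and the score grows as residues become more evenly mixed, so the top-ranked columns are candidate signature positions.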
Funding: Supported by the National Natural Science Foundation of China (61972109, 62172114, 61632002).
Abstract: Background: Coronaviruses can be isolated from bats, civets, pangolins, birds and other wild animals. As animal-origin pathogens, coronaviruses can cross the species barrier and cause pandemics in humans. In this study, a deep learning model for early prediction of pandemic risk is proposed based on the sequences of viral genomes. Methods: A total of 3257 genomes were downloaded from the Coronavirus Genome Resource Library. We present a deep learning model of cross-species coronavirus infection that combines a bidirectional gated recurrent unit network with a one-dimensional convolution. The genome sequence of an animal-origin coronavirus is input directly to extract features and predict pandemic risk. The best performance was explored with the use of a pre-trained DNA vector and an attention mechanism. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were used to evaluate the predictive models. Results: The six specific models achieved good performance for the corresponding virus groups (1 for AUROC and 1 for AUPR). The general model with the pre-trained vector and attention mechanism provided excellent predictions for all virus groups (1 for AUROC and 1 for AUPR), while the models without the pre-trained vector or the attention mechanism showed an obvious reduction in performance (about 5–25%). Re-training experiments showed that the general model has good transfer learning capability (average for six groups: 0.968 for AUROC and 0.942 for AUPR) and should give reasonable predictions for the potential pathogen of the next pandemic. The artificial negative data, in which the coding region of the spike protein was replaced, were also predicted correctly (100% accuracy). An easy-to-use tool written in Python was created to implement our predictor. Conclusions: A robust deep learning model with a pre-trained vector and an attention mechanism mastered the features of the whole genomes of animal-origin coronaviruses and could predict the risk of cross-species infection for early warning of the next pandemic.
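The two evaluation metrics named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' evaluation code: the function names are hypothetical, AUROC is computed as the fraction of correctly ordered positive/negative pairs, and AUPR is approximated by average precision, which is one common estimator and may differ from the one used in the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, used here as an assumed estimator of AUPR."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy labels (1 = high pandemic risk) and model scores, invented for the demo.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

A perfectly separating model scores 1 on both metrics, which is the value the abstract reports for the specific and general models.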
Funding: Supported by the Sichuan Science and Technology Program (2018GZDZX0041), the National Key R&D Program of China (2018YFB1005100, 2018YFB1005104), and the Specialized Fund for Science and Technology Platform and Talent Team Project of Guizhou Province (Qian Ke He Ping TaiRen Cai[2016]5609).
Abstract: Dynamic geometry software, a kind of computer-assisted instruction (CAI) software, is closely tied to mathematics and is widely used in mathematics teaching in primary and secondary schools. Meanwhile, web technology has also become an important means of assisting education and teaching. This paper describes a development process for web-based dynamic geometry software and analyses the specific requirements that such software places on a graphical application programming interface (API). Based on experiments comparing two hypertext markup language (HTML)5 graphical API technologies, scalable vector graphics (SVG) and Canvas, on different devices and browsers, we conclude that Canvas is the more suitable graphical API technology for web-based dynamic geometry software, and we further propose principles and methods for an object-oriented Canvas design. Dynamic geometry software based on the newly designed Canvas has technical advantages and educational value, and incorporates aesthetic education into mathematics education.
Funding: Supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.
Abstract: This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, for any big dataset D on which it is posed, there exists a fraction D_Q of D such that Q(D) = Q(D_Q), and the cost of identifying D_Q is independent of the size of D. It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to RA_aggr, i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify D_Q for RA_aggr queries Q. (2) While it is undecidable to determine whether an RA_aggr query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded RA_aggr queries, i.e., a core subclass of boundedly evaluable RA_aggr queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of RA_aggr queries and to generate query plans for bounded RA_aggr queries. (5) As a proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
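The core idea of this abstract, answering an aggregate query from a small, index-identified fraction of D instead of scanning D, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the class `BagAccessIndex`, its `bound` parameter and the sample rows are hypothetical and are not the paper's formal definition of a bag access schema.

```python
from collections import Counter, defaultdict


class BagAccessIndex:
    """Hypothetical index for an access constraint from attribute x to y.

    For every x-value it stores the distinct y-values with their
    multiplicities (bag semantics) and enforces that at most `bound`
    distinct y-values exist per x-value.
    """

    def __init__(self, rows, x, y, bound):
        self.bound = bound
        self._index = defaultdict(Counter)
        for row in rows:  # one offline pass over D to build the index
            self._index[row[x]][row[y]] += 1
        for x_val, ys in self._index.items():
            if len(ys) > bound:
                raise ValueError(
                    f"{x_val!r} has more than {bound} distinct {y} values"
                )

    def fetch(self, x_val):
        """Return {y_value: multiplicity}; reads at most `bound` entries."""
        return dict(self._index.get(x_val, {}))


def group_count_by_scan(rows, x, y, x_val):
    """Baseline: SELECT y, COUNT(*) WHERE x = x_val GROUP BY y, scanning D."""
    return dict(Counter(row[y] for row in rows if row[x] == x_val))


def group_count_bounded(index, x_val):
    """The same query answered from the index alone, without reading D."""
    return index.fetch(x_val)


# Invented sample data: each user may appear in at most two distinct cities.
rows = [
    {"user": "u1", "city": "Paris"},
    {"user": "u1", "city": "Paris"},
    {"user": "u1", "city": "Rome"},
    {"user": "u2", "city": "Oslo"},
]
index = BagAccessIndex(rows, x="user", y="city", bound=2)
print(group_count_bounded(index, "u1") == group_count_by_scan(rows, "user", "city", "u1"))
```

At query time the bounded version reads a number of index entries limited by `bound`, whatever the size of `rows`. Keeping multiplicities in the index is what lets the count remain correct under bag semantics in this sketch.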
Funding: Supported by the Sichuan Science and Technology Program of China under Grant Nos. 2018GZDZX0041 and 2020YFG0011, the National Natural Science Foundation of China under Grant No. 11701118, the Guangzhou Academician and Expert Workstation under Grant No. 20200115-9, and the Key Disciplines of Guizhou Province of China-Computer Science and Technology under Grant No. ZDXK[2018]007.
Abstract: A dynamic geometry system, as an important application in the field of geometric constraint solving, is widely used in elementary mathematics education; moreover, it is also a fundamental environment for automated theorem proving in geometry. In a geometric constraint solving process, a situation involving a critical point is often encountered, and geometric element degeneracy may occur at this point. Such degenerate situations usually deserve particular attention during learning and exploration. However, many of them cannot be completely presented even by well-known dynamic geometry software. In this paper, the mechanisms causing the degeneracy of a geometric element are analyzed, and relevant definitions and formalized descriptions of the problem are provided according to the relevant modern Euclidean geometry theories. To solve the problem, the data structure is optimized, and a domain model is designed for the geometric elements and the constraint relationships among them in the dynamic geometry system; furthermore, an update algorithm for the elements is proposed based on the novel domain model. In addition, instances show that the proposed domain model and update algorithm can effectively cope with geometric element degeneracy in the geometric constraint solving process, thereby unifying the dynamic geometry drawing with the geometric intuition of the user.
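A small example may help to show what degeneracy at a critical point looks like in practice. The sketch below is not the paper's domain model or update algorithm. It is a hypothetical illustration in which a line is dragged across a circle, so the two intersection points merge at tangency and then vanish, and a dependent element (the chord midpoint) has to be updated accordingly. The function names and the tolerance `eps` are assumptions.

```python
import math


def circle_line_intersections(center, radius, a, b, eps=1e-9):
    """Intersect a circle with the infinite line through points a and b.

    Returns a list of 0, 1 (tangent) or 2 points.
    """
    cx, cy = center
    dx, dy = b[0] - a[0], b[1] - a[1]
    fx, fy = a[0] - cx, a[1] - cy
    qa = dx * dx + dy * dy
    if qa < eps:
        raise ValueError("a and b coincide, so the line itself is degenerate")
    qb = 2.0 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - radius * radius
    disc = qb * qb - 4.0 * qa * qc
    if disc < -eps:
        return []  # the line misses the circle
    if abs(disc) <= eps:
        ts = [-qb / (2.0 * qa)]  # critical point: the line is tangent
    else:
        root = math.sqrt(disc)
        ts = [(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)]
    return [(a[0] + t * dx, a[1] + t * dy) for t in ts]


def update_chord_midpoint(center, radius, a, b):
    """Update a dependent element after the line has been dragged.

    Returns the chord midpoint, the tangent point when the chord has
    collapsed to a single point, or None when the element is undefined.
    """
    pts = circle_line_intersections(center, radius, a, b)
    if not pts:
        return None
    if len(pts) == 1:
        return pts[0]
    (x1, y1), (x2, y2) = pts
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


# Drag a horizontal line y = h upward across the unit circle.
for h in (0.0, 1.0, 2.0):
    print(h, update_chord_midpoint((0.0, 0.0), 1.0, (-2.0, h), (2.0, h)))
```

Returning `None` rather than raising an error keeps the dependent element in the construction, so it can reappear when the user drags the line back into the circle.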