Funding: Under the auspices of the National Natural Science Foundation of China (No. 40671133) and the Fundamental Research Funds for the Central Universities (No. GK200902015).
Abstract: This paper proposes a semi-supervised regression model that uses a co-training algorithm based on support vector machines to retrieve water quality variables from SPOT 5 remote sensing data. The model consists of two support vector regressors (SVRs), which describe the nonlinear relationship between the water quality variables and the SPOT 5 spectrum, and a semi-supervised co-training algorithm was established for the two SVRs. The model was used to retrieve the concentrations of four representative pollution indicators of the Weihe River in Shaanxi Province, China: the permanganate index (CODMn), ammonia nitrogen (NH3-N), chemical oxygen demand (COD) and dissolved oxygen (DO). A spatial distribution map of these variables was also produced for part of the Weihe River. SVR can readily model nonlinear mappings, and semi-supervised learning can exploit both labeled and unlabeled samples. By integrating the two SVRs through semi-supervised learning, we provide an operational method for cases where paired (labeled) samples are limited. The results show that the model performs much better than multiple statistical regression, can quickly provide an overall picture of water pollution conditions for management, and can be extended to hyperspectral remote sensing applications.
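The co-training idea can be illustrated with a short sketch. This is a minimal, hypothetical example and not the authors' implementation. A from-scratch SVR is beyond a short standard-library example, so the sketch substitutes COREG-style co-training: two k-nearest-neighbour regressors that use different Minkowski distance orders (p = 2 and p = 5) stand in for the two SVRs. Each regressor pseudo-labels the unlabeled sample whose addition most reduces its own error on nearby labeled samples, and it passes that sample to the other regressor. The names `KNNRegressor`, `confidence`, `co_train` and `predict` are illustrative only. In practice each regressor could be swapped for an SVR, for example scikit-learn's `SVR`.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def minkowski(a: Point, b: Point, p: float) -> float:
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


class KNNRegressor:
    """Stand-in for an SVR: mean label of the k nearest labeled points (Minkowski order p)."""

    def __init__(self, k: int, p: float):
        self.k, self.p = k, p
        self.X: List[Point] = []
        self.y: List[float] = []

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def neighbors(self, x: Point) -> List[int]:
        order = sorted(range(len(self.X)), key=lambda i: minkowski(self.X[i], x, self.p))
        return order[: self.k]

    def predict(self, x: Point) -> float:
        idx = self.neighbors(x)
        return sum(self.y[i] for i in idx) / len(idx)


def confidence(reg: KNNRegressor, x: Point, y_hat: float) -> float:
    """COREG-style score: drop in squared error on x's labeled neighbours after adding (x, y_hat)."""
    trial = KNNRegressor(reg.k, reg.p).fit(reg.X + [x], reg.y + [y_hat])
    delta = 0.0
    for i in reg.neighbors(x):
        xi, yi = reg.X[i], reg.y[i]
        delta += (yi - reg.predict(xi)) ** 2 - (yi - trial.predict(xi)) ** 2
    return delta


def co_train(X_lab, y_lab, X_unl, k: int = 3, rounds: int = 10) -> List[KNNRegressor]:
    views = [(list(X_lab), list(y_lab)), (list(X_lab), list(y_lab))]
    h = [KNNRegressor(k, 2).fit(*views[0]), KNNRegressor(k, 5).fit(*views[1])]
    pool = list(X_unl)
    for _ in range(rounds):
        added = False
        for j in (0, 1):
            best, best_delta = None, 0.0
            for u in pool:
                y_hat = h[j].predict(u)
                d = confidence(h[j], u, y_hat)
                if d > best_delta:
                    best, best_delta = (u, y_hat), d
            if best is not None:
                # Regressor j's most confident pseudo-label is added to the OTHER regressor's data.
                views[1 - j][0].append(best[0])
                views[1 - j][1].append(best[1])
                pool.remove(best[0])
                added = True
        if not added:
            break
        for j in (0, 1):
            h[j].fit(*views[j])
    return h


def predict(h: List[KNNRegressor], x: Point) -> float:
    """Final estimate: average of the two co-trained regressors."""
    return 0.5 * (h[0].predict(x) + h[1].predict(x))
```

Here each input point would be a pixel's band reflectances and each label a measured concentration. Averaging the two regressors mirrors the common co-training choice of combining both learners at prediction time.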
Funding: Supported by the National Natural Science Foundation of China, Grant Nos. 71974167 and 71573225.
Abstract:
Purpose: This study uses the LDA model to explore the underlying research topics in CRISPR research and identifies trends in knowledge transfer from science to technology in this area over the past 10 years.
Design/methodology/approach: We collected publications on CRISPR published between 2011 and 2020 from the Web of Science and traced all patents citing them via lens.org. In total, 15,904 articles and 18,985 patents were downloaded and analyzed. The LDA model was applied to identify the underlying research topics, and several indicators were introduced to measure knowledge transfer from the research topics of scientific publications to the IPC-4 classes of patents.
Findings: Emerging research topics on CRISPR were identified and their evolution over time was displayed. An overall picture of knowledge transition from research topics to the technological classes of patents was also presented. Across all CRISPR topics, the average first transition year, the ratio of articles cited by patents, and the NPR transition rate were 1.08, 15.57%, and 1.19, respectively. These values indicate a much faster and more intensive transition than in general fields. Transition patterns also differ among research topics.
Research limitations: Our research is limited to publications retrieved from the Web of Science and their citing patents indexed in lens.org. A limitation inherent in LDA analysis is the manual interpretation and labeling of "topics".
Practical implications: Our study provides useful references for policymakers in allocating scientific resources and regulating financial budgets to meet the challenges posed by CRISPR as a transformative technology.
Originality/value: The LDA model is applied here for the first time to topic identification in transformative research, as exemplified by CRISPR. In addition, the dataset of all citing patents in this area provides a full picture for detecting knowledge transition between science and technology (S&T).
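Two of the indicators above can be computed per topic from article metadata: the ratio of articles cited by patents and the lag to the first citing patent. The following is a minimal sketch under stated assumptions, not the paper's definitions. It assumes each article is assigned to a single dominant LDA topic and that the lag is the earliest citing-patent year minus the publication year. The NPR transition rate is not reproduced because the abstract does not define it. The `Article` record and `transfer_indicators` function are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Article:
    topic: str                        # dominant LDA topic of the article (assumed assignment rule)
    pub_year: int
    citing_patent_years: List[int] = field(default_factory=list)


def transfer_indicators(articles: List[Article]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-topic share of articles cited by patents and mean lag to the first citing patent."""
    by_topic: Dict[str, List[Article]] = {}
    for a in articles:
        by_topic.setdefault(a.topic, []).append(a)
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for topic, arts in by_topic.items():
        cited = [a for a in arts if a.citing_patent_years]
        # Assumed definition: lag = earliest citing-patent year minus publication year.
        lags = [min(a.citing_patent_years) - a.pub_year for a in cited]
        result[topic] = {
            "cited_ratio": len(cited) / len(arts),
            "mean_first_transition_years": mean(lags) if lags else None,
        }
    return result
```

Uncited articles count toward the ratio's denominator but not toward the mean lag, which is why the two indicators are reported separately.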
Funding: Supported by the 2020 Chinese Academy of Sciences literature and information capability construction project "Construction of a strategic information research and consultation system in the field of science and technology" (Grant No. E290001).
Abstract:
Purpose: This article aims to describe the global research profile and development trends of single-cell research from the perspectives of bibliometric analysis and semantic mining.
Design/methodology/approach: Literature on single-cell research published between 2009 and 2019 was extracted from Clarivate Analytics' Web of Science Core Collection. First, bibliometric analyses were performed with Thomson Data Analyzer (TDA). Second, research topics and their evolution trends were identified with the LDA topic model. Third, adapting the post-discretization method used in topic evolution analysis, the topics were also discretized by country to detect their spatial distribution.
Findings: The number of single-cell research publications increased significantly over the last decade. Topics in the field fall into three categories: single-cell research methods, mechanisms of biological processes, and clinical applications of single-cell technologies. The differing trends of these categories indicate that technological innovation drives the development of applied research. Topic strength in cancer diagnosis and treatment has grown continuously and rapidly, which indicates that this topic has received extensive attention in recent years. Some countries have relatively balanced topic distributions, whereas in others a few topics are markedly dominant.
Research limitations: The data analyzed in this study include only publications indexed in the Web of Science Core Collection.
Practical implications: This study provides insights into research progress in the single-cell field and identifies the topics of greatest interest, which reflect potential opportunities and challenges. The national topic distribution analysis based on the post-discretization method extends topic analysis from the time dimension to the spatial dimension.
Originality/value: This paper combines bibliometric analysis with the LDA model to analyze evolution trends in the single-cell research field. Extending post-discretized analysis from the time dimension to the spatial dimension is distinctive and insightful.
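The LDA step and the country-level post-discretization can be sketched in a few lines. Below is a minimal collapsed Gibbs sampler for LDA, followed by a per-country aggregation of document-topic proportions. Both are illustrative sketches under stated assumptions, not the paper's pipeline. The hyperparameters, the function names `lda_gibbs` and `topic_share_by_country`, and the aggregation rule are hypothetical. The rule sums each document's topic proportions into its country's vector and then normalizes.

```python
import random
from collections import defaultdict
from typing import Dict, List


def lda_gibbs(docs: List[List[str]], K: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions (theta)."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                          # total tokens per topic
    z: List[List[int]] = []               # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1; nkw[k][vocab[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], vocab[w]
                # Remove the token's current assignment, then resample it from the full conditional.
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]


def topic_share_by_country(theta: List[List[float]], countries: List[str]) -> Dict[str, List[float]]:
    """Hypothetical post-discretization: sum each document's theta into its country, then normalize."""
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0] * len(theta[0]))
    for row, country in zip(theta, countries):
        acc[country] = [a + b for a, b in zip(acc[country], row)]
    return {c: [v / sum(vals) for v in vals] for c, vals in acc.items()}
```

Discretizing by publication year instead of by country gives the usual topic-evolution view. Swapping the grouping key is the time-to-space extension the abstract describes.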
Funding: Supported in part by the National Key R&D Program of China under Grant 2018YFA0701601, in part by the National Natural Science Foundation of China (Grant Nos. U22A2002, 61941104 and 62201605), and in part by the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute.
Abstract: In the upcoming large-scale Internet of Things (IoT), defending against malicious traffic is increasingly challenging because of the heterogeneity of IoT devices and the diversity of IoT communication protocols. In this paper, we propose a semi-supervised learning-based approach to detect malicious traffic at the access side. It overcomes the resource bottleneck of traditional malicious traffic defenders deployed at the victim side, and its pre-training stage requires no labeled traffic data. Specifically, we design a coarse-grained behavior model of IoT devices through self-supervised learning on unlabeled traffic data. We then fine-tune this model with a transfer learning method, using a small amount of labeled data to improve its accuracy in malicious traffic detection. Experimental results on the CICDDoS2019 dataset show that our method achieves an accuracy of 99.52% and an F1-score of 99.52% with only 1% of the labeled training data. Moreover, with 1% of the training data, our method outperforms state-of-the-art supervised learning-based methods in accuracy, precision, recall and F1-score.
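The four metrics reported above all derive from a binary confusion matrix. The sketch below shows how they relate, assuming a hypothetical label encoding in which `1` marks malicious traffic. It does not reproduce the paper's model or its CICDDoS2019 evaluation.

```python
from typing import Dict, Hashable, Sequence


def binary_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                   positive: Hashable = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary detector (positive = malicious, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn          # every remaining pair is benign/benign
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}
```

Any label other than `positive` is treated as benign. Multi-class attack labels can therefore be scored after collapsing them to malicious versus benign.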