Funding: Supported by the National Natural Science Foundation of China (No. 51674032).
Abstract: The accuracy of quantitative laser-induced breakdown spectroscopy (LIBS) depends strongly on the number of certified standard samples used for training. In practical applications, however, only a limited number of standard samples with labeled certified concentrations are available. A novel semi-supervised LIBS quantitative analysis method is proposed, based on a co-training regression model with selection of effective unlabeled samples. The main idea is to obtain better regression performance by adding effective unlabeled samples during semi-supervised learning. First, effective unlabeled samples are selected with respect to the testing samples using the Euclidean metric. Two initial regression models based on the least squares support vector machine, with different parameters, are trained separately on the labeled samples; the effective unlabeled samples predicted by the two models are then used to enlarge the training dataset according to a labeling confidence estimate. The final predictions on the testing samples are weighted combinations of the predictions of the two updated regression models. Chromium concentration analysis experiments were carried out on 23 certified standard high-alloy steel samples, of which 5 samples with labeled concentrations and 11 unlabeled samples were used to train the regression models and the remaining 7 samples were used for testing. As the number of effective unlabeled samples increased, the root mean square error of the proposed method fell from 1.80% to 0.84% and the relative prediction error fell from 9.15% to 4.04%.
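Note: the selection step in the abstract above (choosing effective unlabeled samples relative to the testing samples by Euclidean metric) can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so the rule used here (keep the k unlabeled spectra whose nearest testing spectrum is closest) is an assumption, and the function names and toy vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_effective_unlabeled(unlabeled, testing, k):
    """Return indices of the k unlabeled samples closest to any testing sample.

    Assumed rule (not taken from the paper): score each unlabeled sample by
    its distance to the nearest testing sample, then keep the k smallest.
    """
    scored = []
    for idx, sample in enumerate(unlabeled):
        nearest = min(euclidean(sample, t) for t in testing)
        scored.append((nearest, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]


# Toy two-dimensional "spectra", purely illustrative.
unlabeled = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.2], [9.0, 0.0]]
testing = [[1.0, 1.0], [5.0, 4.0]]
print(select_effective_unlabeled(unlabeled, testing, 2))
```

In a co-training setup the selected samples would then be pseudo-labeled by the two regressors; that part is omitted here because the abstract gives no detail on the confidence estimate.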
Funding: This work was supported by the National Natural Science Foundation of China.
Abstract: A fluoroimmunoassay method using an unlabeled europium chelate is described. The principle is similar to that of fluoroimmunoassay methods that use lanthanide chelates as labels. The procedure is simple because the labeling process is omitted. The detection limit is about 10^(10) mol/L antigen. The relative standard deviation of the immunoassay is less than 10%. The recoveries of human serum albumin and estradiol protein conjugate are 96-105% and 111%, respectively.
Funding: Supported by the National Commission of Natural Science Foundation of China.
Abstract: A fluoroimmunoassay method using an unlabeled terbium chelate is described. The principle is similar to that of fluoroimmunoassay methods that use lanthanide chelates as labels. The procedure is simple because the labeling process is unnecessary. The recoveries of HSA and of albumin in urine are 107% and 95%, respectively. The standard deviation is less than 10%.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB1710001.
Abstract: In the last decade, there has been significant progress in time series classification. However, in real-world industrial settings, it is expensive and difficult to obtain high-quality labeled data. As a result, the positive and unlabeled learning (PU-learning) problem has attracted growing attention. Current PU-learning approaches for time series data suffer from low accuracy because of the lack of negatively labeled time series. In this paper, we propose a novel shapelet-based two-step (2STEP) PU-learning approach. In the first step, we generate shapelet features from the positive time series and use them to select a set of negative examples. In the second step, using both positive and negative time series, we select the final features and build the classification model. The experimental results show that our 2STEP approach improves the average F1 score on 15 datasets by 9.1% compared with the baselines, and achieves the highest F1 score on 10 of the 15 time series datasets.
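Note: the first step described above relies on shapelet features. A minimal sketch of the usual shapelet-to-series distance (the smallest Euclidean distance between the shapelet and any same-length window of the series) follows. How 2STEP actually picks negatives is not stated in the abstract, so the ranking rule in `pick_likely_negatives` (take the unlabeled series farthest from a positive shapelet) is an assumption, and all names and data are hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet must not be longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def pick_likely_negatives(unlabeled, shapelet, n):
    """Assumed rule: the n unlabeled series least similar to a positive shapelet."""
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: shapelet_distance(unlabeled[i], shapelet),
        reverse=True,
    )
    return ranked[:n]


# Toy data: a "bump" shapelet taken from a positive series.
bump = [0.0, 1.0, 0.0]
unlabeled_series = [
    [0.0, 0.0, 1.0, 0.0, 0.0],  # contains the bump exactly
    [2.0, 2.0, 2.0, 2.0, 2.0],  # flat, far from the bump
    [0.0, 0.5, 1.0, 0.5, 0.0],  # a wider bump
]
print(pick_likely_negatives(unlabeled_series, bump, 1))
```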
Abstract: Identifying and correcting grammatical errors in text written by non-native writers has received increasing attention in recent years. Although a number of annotated corpora have been established to facilitate data-driven grammatical error detection and correction, they remain limited in quantity and coverage because human annotation is labor-intensive, time-consuming, and expensive. In this work, we propose to use unlabeled data to train neural-network-based grammatical error detection models. The basic idea is to cast error detection as a binary classification problem and derive positive and negative training examples from unlabeled data. We introduce an attention-based neural network to capture long-distance dependencies that influence the word being checked. Experiments show that the proposed approach significantly outperforms SVMs and convolutional networks with a fixed-size context window.
Funding: Supported in part by the National Key Research and Development Project (2018YFB2100801), in part by the National Natural Science Foundation of China (61972080), in part by the Shanghai Rising-Star Program (19QA1400300), and in part by the Open Research Project from the Key Laboratory of the Ministry of Education for Embedded System and Service Computing (ESSCKF2021-01).
Abstract: Large numbers of smart sensors (e.g., road-side cameras) that communicate with nearby base stations could launch distributed denial of service (DDoS) attack storms in intelligent transportation systems. DDoS attacks disable the services provided by base stations. In this paper, considering uneven communication traffic flows and privacy preservation, we give a hidden Markov model-based prediction model that exploits the multi-step characteristic of DDoS within a federated learning framework to predict whether DDoS attacks will occur on base stations in the future. In federated learning, however, we need to consider poisoning attacks by malicious participants. Without security protection, poisoning attacks can paralyze intelligent transportation systems. Traditional poisoning attacks mainly apply to classification models with labeled data. In this paper, we propose a reinforcement learning-based poisoning method specifically for poisoning the prediction model with unlabeled data. In addition, previous defense strategies rely on validation datasets with labeled data on the server. This is unrealistic, since the local training datasets are not uploaded to the server for privacy reasons, and our datasets are also unlabeled. We therefore give a validation dataset-free defense strategy based on Dempster-Shafer (D-S) evidence theory that avoids anomalous aggregation and yields a robust global model for precise DDoS prediction. In our experiments, we simulate 3000 points in combination with the DARPA2000 dataset to carry out evaluations. The results indicate that our poisoning method can successfully poison the global prediction model with unlabeled data in a short time. We also compare our proposed defense algorithm with three widely used defense algorithms. The results show that our defense method excludes poisoners with high accuracy and achieves a high attack prediction probability.
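Note: the defense above is built on Dempster-Shafer evidence theory. A minimal sketch of Dempster's rule of combination for two mass functions is given below. How the paper derives the masses or applies the result during aggregation is not stated in the abstract; the frame {honest, poisoner} and the mass values here are hypothetical illustrations.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalize so the remaining masses sum to one.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


HONEST = frozenset({"honest"})
POISONER = frozenset({"poisoner"})
EITHER = HONEST | POISONER  # ignorance: mass not committed to either hypothesis

# Two hypothetical pieces of evidence about one federated-learning participant.
evidence_a = {HONEST: 0.6, POISONER: 0.1, EITHER: 0.3}
evidence_b = {HONEST: 0.5, POISONER: 0.2, EITHER: 0.3}

fused = combine(evidence_a, evidence_b)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 4))
```

A server could, for example, exclude a participant when the fused mass on `POISONER` exceeds some threshold; that decision rule is likewise an assumption and not taken from the abstract.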
Funding: Supported by the National Natural Science Foundation of China (Nos. 90604025 and 60703059) and the Chinese Young Faculty Research Fund (No. 20070003093).
Abstract: Support vector machines (SVMs) aim to find an optimal separating hyperplane that maximizes the separation between two classes of training examples (more precisely, maximizes the margin between the two classes). The choice of the cost parameter for training the SVM model is always a critical issue. This analysis studies how the cost parameter determines the hyperplane, especially for classification using only positive and unlabeled data. An algorithm is given that computes the entire solution path and chooses the 'best' cost parameter while training the SVM model. The performance of the algorithm is compared with conventional implementations that use default values of the cost parameter, on two synthetic data sets and two real-world data sets. The results show that the algorithm achieves better results for classification with positive and unlabeled data.
Funding: Supported by the CAS West Light Grant (xbzgzdsys-202206) and the National Key Research and Development Program of China (2021YFA1401003).
Abstract: Super-resolution (SR) microscopy has dramatically enhanced our understanding of biological processes. However, scattering media in thick specimens severely limit the spatial resolution, often rendering the images unclear or indistinguishable. In addition, live-cell imaging struggles to achieve high temporal resolution for fast-moving subcellular structures. Here, we present the principles of synthetic wave microscopy (SWM), which extracts three-dimensional information from thick unlabeled specimens while avoiding photobleaching and phototoxicity. SWM exploits multiple-wave interferometry to reveal the specimen's phase information in the area of interest, which is not affected by the scattering media in the optical path. SWM achieves ~0.42λ/NA resolution at an imaging speed of up to 106 pixels/s. SWM provides better temporal resolution and sensitivity than most conventional microscopes currently available while maintaining exceptional SR and anti-scattering capabilities. Penetrating scattering media is challenging for conventional imaging techniques. Remarkably, SWM retains its efficacy even at low signal-to-noise ratios. It facilitates the visualization of dynamic subcellular structures in live cells, including the tubular endoplasmic reticulum (ER), lipid droplets, mitochondria, and lysosomes.
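Abstract: Text classification is one of the key problems in information retrieval. Extracting more reliable negative examples and constructing an accurate and efficient classifier are the two important problems in PU (positive and unlabeled) text classification. However, many existing methods for extracting reliable negatives yield only a small number of them, and the quality of the resulting classifiers leaves room for improvement. A clustering-based semi-supervised active classification method is proposed that addresses both of these steps. Unlike traditional negative-extraction methods, it combines clustering with the property that positive documents should share as few feature terms as possible with negative documents, removing as many positive examples as possible from the unlabeled data set and thereby obtaining more reliable negatives. The classifier is built by combining SVM active learning with an improved Rocchio method, and an improved TFIDF (term frequency inverse document frequency) is used for feature extraction, which significantly improves classification accuracy. Classification results were tested on three different data sets (RCV1, Reuters-21578, 20 Newsgroups). The experimental results show that clustering-based identification of reliable negatives obtains more reliable negatives while keeping the error rate low, and that introducing active learning also significantly improves classification accuracy.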
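Abstract: Accurate extraction of urban impervious surfaces is important for research on the ecological environment, water and heat cycles, and the urban heat island effect. Using WorldView high-resolution remote sensing imagery, this paper proposes a method for extracting urban impervious surfaces from high-resolution imagery based on the PUL (Positive and Unlabeled Learning) algorithm. The method requires no negative samples; a small number of positive samples together with unlabeled samples is enough to train the classification model. The results show that PUL outperforms the one-class support vector machine (OCSVM) and the maximum entropy (MAXENT) model. Across different numbers of positive samples, the overall accuracy and kappa coefficient of PUL are better than those of OCSVM and MAXENT, with a highest overall accuracy of 91.27% and a highest kappa coefficient of 0.8255, so impervious surfaces can be extracted quickly and effectively from high-resolution remote sensing imagery.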
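Note: the two figures of merit reported above, overall accuracy and the kappa coefficient, are both computed from a confusion matrix. A minimal sketch using the standard definitions follows. The confusion matrix is invented for illustration and is not data from the paper.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what the row and column totals predict by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2x2 confusion matrix: rows are reference classes
# (impervious, pervious), columns are predicted classes.
confusion = [
    [45, 5],
    [10, 40],
]
print(overall_accuracy(confusion), cohen_kappa(confusion))
```

Kappa is lower than overall accuracy here because part of the agreement would be expected by chance from the class totals alone.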
Funding: This work was supported by the National Research Foundation of Korea (No. 2020R1A2C1014829) and by the Korea Medical Device Development Fund grant, which is funded by the Government of the Republic of Korea (the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, the Ministry of Health and Welfare, and the Ministry of Food and Drug Safety) (grant KMDF_PR_20200901_0095).
Abstract: In practical classification problems, one of the main challenges is obtaining enough labeled data for training. Moreover, even when sufficient labeled data has been accumulated, most datasets exhibit a long-tailed distribution with heavy class imbalance, which biases the model towards the majority class. To alleviate such class imbalance, semi-supervised learning methods that use additional unlabeled data have been considered. However, their accuracy is typically much lower than that of supervised learning. In this study, under the assumption that additional unlabeled data is available, we propose iterative semi-supervised learning algorithms that iteratively correct the labels of the extra unlabeled data based on softmax probabilities. The results show that the proposed algorithms achieve accuracy as high as that of supervised learning. To validate the proposed algorithms, we tested two scenarios: a balanced unlabeled dataset and an imbalanced unlabeled dataset. In both scenarios, our proposed semi-supervised learning algorithms provided higher accuracy than previous state-of-the-art methods. Code is available at https://github.com/HeewonChung92/iterative-semi-learning.
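Note: the abstract above describes correcting the labels of unlabeled data from softmax probabilities. A minimal sketch of one such correction pass follows. The confidence-threshold rule, the threshold value, and the function names are assumptions made for illustration; the authors' actual procedure is in the repository linked in the abstract and may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relabel(logits_batch, current_labels, threshold=0.9):
    """One correction pass: overwrite a pseudo-label only when the model is confident.

    Assumed rule: replace the label with the arg-max class when its softmax
    probability reaches `threshold`, otherwise keep the current label.
    """
    updated = []
    for logits, old in zip(logits_batch, current_labels):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        updated.append(best if probs[best] >= threshold else old)
    return updated


# Hypothetical model outputs for three unlabeled samples and three classes.
logits_batch = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 0.0, 5.0]]
pseudo_labels = [1, 1, 0]
print(relabel(logits_batch, pseudo_labels))
```

In an iterative scheme this pass would alternate with retraining the classifier on the labeled data plus the current pseudo-labels.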
Abstract: An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to form clusters by comparing the distances between instances in unlabeled training data sets. Once the data instances are clustered, anomalous clusters can be identified using the normal cluster ratio, and the identified clusters can be used to detect intrusions in real data. The benefit of the algorithm is that it does not need labeled training data sets. Experiments on the KDD99 data sets show that this approach can efficiently detect unknown intrusions in real network connections.
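Note: the idea above (cluster unlabeled connections by distance, then separate normal from anomalous clusters with a ratio) can be illustrated with a minimal sketch. The abstract gives no algorithmic detail, so the single-pass fixed-width clustering, the `width` and `normal_ratio` parameters, and the toy points are all assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(points, width):
    """Single-pass clustering: join the nearest center within `width`, else start a new cluster."""
    centers, members = [], []
    for p in points:
        best, best_d = None, width
        for i, c in enumerate(centers):
            d = distance(p, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            centers.append(p)
            members.append([p])
        else:
            members[best].append(p)
    return centers, members


def label_clusters(members, normal_ratio):
    """Assumed rule: a cluster holding at least `normal_ratio` of all points is normal."""
    total = sum(len(m) for m in members)
    return ["normal" if len(m) / total >= normal_ratio else "anomaly" for m in members]


# Toy connection features: five similar records and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (0.1, 0.1), (0.2, 0.0)]
centers, members = cluster(points, width=1.0)
print(label_clusters(members, normal_ratio=0.5))
```

A new connection could then be classified by the label of its nearest cluster center, on the assumption that normal traffic greatly outnumbers attacks in the training data.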
Abstract: Emerging mobile medical technology is becoming a new trend in disease prevention and control. Based on this technology, we present disease prediction models based on transfer learning. Breast cancer data were used to build our model. A basic model is first built with neural networks. With unlabeled data, transfer learning is an appropriate way to revise the model and increase its accuracy. The test results show that the algorithm is suitable for data classification, especially for unlabeled health data.
Funding: Project supported by the National Key Research and Development Program of China (No. 2016YFB0201305) and the National Natural Science Foundation of China (No. 61872376).
Abstract: Deep learning models have achieved state-of-the-art performance in named entity recognition (NER); this performance, however, relies heavily on substantial amounts of labeled data. In some specific areas, such as the medical, financial, and military domains, labeled data is very scarce, while unlabeled data is readily available. Previous studies have used unlabeled data to enrich word representations, but they neglect a large amount of entity information in unlabeled data that may benefit the NER task. In this study, we propose a semi-supervised method for NER tasks that learns to create high-quality labeled data by applying a pre-trained module to filter out erroneous pseudo labels. Pseudo labels are automatically generated for unlabeled data and used as if they were true labels. Our semi-supervised framework includes three steps: constructing an optimal single neural model for a specific NER task, learning a module that evaluates pseudo labels, and iteratively creating new labeled data and improving the NER model. Experimental results on two English NER tasks and one Chinese clinical NER task demonstrate that our method further improves the performance of the best single neural model. Even when we use only pre-trained static word embeddings and do not rely on any external knowledge, our method achieves performance comparable to state-of-the-art models on the CoNLL-2003 and OntoNotes 5.0 English NER tasks.
Abstract: INTRODUCTION Atopic dermatitis (AD) is a common chronic inflammatory skin disorder that is characterized by dry skin and disturbed skin barrier functions. Mutations in the filaggrin (FLG) gene, the gene coding profilaggrin/filaggrin, have a great impact on the epidermal barrier function and are an important predisposing factor for AD. However, in both Europeans and Asians,