Funding: This work is supported by the National Natural Science Foundation of China (61402225, 61728204), Innovation Funding (NJ20160028, NT2018028, NS2018057), the Aeronautical Science Foundation of China (2016551500), the State Key Laboratory for Smart Grid Protection and Operation Control Foundation, and the Science and Technology Funds from National State Grid Ltd., China Degree and Graduate Education Fund.
Abstract: Active learning has been widely used to reduce the labeling cost of supervised learning. By selecting specific instances to train the model, it improves model performance within a limited number of steps. However, little work has examined how effective active learning is in this setting. In this paper, we propose a deep active learning model with bidirectional encoder representations from transformers (BERT) for text classification. BERT uses the self-attention mechanism to integrate contextual information, which helps accelerate the convergence of training. For the active learning process, we design an instance selection strategy based on posterior probabilities: Margin, Intra-correlation and Inter-correlation (MII). Selected instances are characterized by small margin, low intra-cohesion and high inter-cohesion. We conduct extensive experiments and analyses with our methods. We compare the effect of the learner, and we assess the effect of the sampling strategy and text classification on three real datasets. The results show that our method outperforms the baselines in terms of accuracy.
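As a rough illustration of the "small margin" criterion only, the sketch below ranks unlabeled instances by the gap between their two largest posterior probabilities and picks those with the smallest gap. It does not implement the intra-correlation or inter-correlation terms of MII, which the abstract does not define; the function names and the toy posteriors are hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest posterior probabilities of one instance."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_smallest_margin(posteriors: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k instances the classifier is least sure about.

    Assumption: 'least sure' means smallest margin; this is plain margin
    sampling, not the full MII strategy from the paper.
    """
    order = sorted(range(len(posteriors)), key=lambda i: margin(posteriors[i]))
    return order[:k]


# Toy posteriors for three unlabeled texts over three classes (made up).
posteriors = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # ambiguous prediction, small margin
    [0.50, 0.30, 0.20],  # in between
]
picked = select_smallest_margin(posteriors, k=2)
print(picked)
```

In an active learning loop, the picked indices would be sent for labeling and added to the training set before the next round.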
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61772268 and the Fundamental Research Funds for the Central Universities of China under Grant Nos. NS2018057 and NJ2018014.
Abstract: Entity resolution (ER) is a significant task in data integration that aims to detect all entity profiles corresponding to the same real-world entity. Because ER has inherently quadratic complexity, blocking was proposed to make it tractable. Blocking offers an approximate solution: it clusters similar entity profiles into blocks, so that pair-wise comparisons need only be performed inside each block, which reduces the computational cost of ER. This paper presents a comprehensive survey of existing blocking technologies. We summarize and analyze all classic blocking methods, with emphasis on different block construction and optimization techniques. We find that traditional blocking ER methods that depend on a fixed schema may not work in highly heterogeneous information spaces. Using schema information flexibly is of great significance for efficiently processing data with the new features of this era. Machine learning is an important tool for ER, but end-to-end and efficient machine learning methods still need to be explored. We also summarize the most promising directions for future work, including real-time blocking ER, incremental blocking ER, and deep learning with ER.
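To make the cost reduction concrete, here is a minimal sketch of one simple blocking scheme, token blocking, in which every token that appears in a profile acts as a block key. The choice of token blocking, the function names, and the toy profiles are assumptions for illustration; the survey covers many other block construction methods.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(profiles):
    """Group profile ids by shared token; keep only blocks with 2+ profiles."""
    blocks = defaultdict(set)
    for pid, text in profiles.items():
        for token in set(text.lower().split()):
            blocks[token].add(pid)
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical entity profiles, flattened to plain text (no fixed schema).
profiles = {
    "a": "Apple iPhone 12",
    "b": "iphone 12 by apple",
    "c": "Samsung Galaxy S21",
    "d": "galaxy s21 samsung phone",
}
blocks = token_blocking(profiles)
pairs = candidate_pairs(blocks)
all_pairs = len(profiles) * (len(profiles) - 1) // 2
print(sorted(pairs), "instead of", all_pairs, "pairs")
```

Only pairs that share a block are passed on to the expensive matching step; profiles that share no token are never compared, which is where the approximation comes from.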
Funding: The authors are grateful to Dr. Kathryn Williams for her critical comments during the preparation of this manuscript. This work is supported by grants awarded by the National Institutes of Health (Nos. GM079359 and CA133086). This work is also supported by the National Basic Research Program of China (No. 2011CB911000), the National Natural Science Foundation of China (Nos. 21221003 and 21327009) and the China National Instrumentation Program (No. 2011YQ03012412).
Abstract: Ultrathin two-dimensional (2D) porous Zn(OH)2 nanosheets (PNs) were fabricated using one-dimensional Cu nanowires as backbones. The PNs have a thickness of approximately 3.8 nm and a pore size of 4-10 nm. To form "smart" porous nanosheets, DNA aptamers were covalently conjugated to the surface of the PNs. These ultrathin nanosheets show good biocompatibility, efficient cellular uptake, and promising pH-stimulated drug release.