Structured illumination microscopy (SIM) has been widely used in live-cell superresolution (SR) imaging. However, conventional physical model-based SIM SR reconstruction algorithms are prone to artifacts when handling raw images with low signal-to-noise ratios (SNRs). Deep-learning (DL)-based methods can address this challenge but may lead to degradation and hallucinations. By combining the physical inversion model with a total deep variation (TDV) regularization, we propose a hybrid restoration method (TDV-SIM) that outperforms conventional or DL methods in suppressing artifacts and hallucinations while maintaining resolution. We demonstrate the performance superiority of TDV-SIM in restoring actin filaments, endoplasmic reticulum, and mitochondrial cristae from extremely low SNR raw images. Thus, TDV-SIM represents the ideal method for prolonged live-cell SR imaging with minimal exposure and photodamage. Overall, TDV-SIM proves the power of integrating model-based reconstruction methods with DL ones, possibly leading to the rapid exploration of similar strategies in high-fidelity reconstructions of other microscopy methods.
Unlike small molecules, the topological complexity of macromolecules remains largely unexplored due to the huge synthetic challenge. Herein, we report the development of orthogonal active templates for concise and selective synthesis of protein [n]heterocatenanes toward protein olympiadanes. An active template (AT-Snoop) was first developed based on the isopeptide-bond-forming RrgA domain with comparable efficiency and excellent orthogonality to the previously reported active template (AT-Spy) based on the CnaB2 domain. Their combination facilitated the selective synthesis of protein [n]catenanes from multiple components in one step, and the resulting structures were verified by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, size exclusion chromatography, liquid chromatography-mass spectrometry, and proteolytic digestion experiments. The results offered a promising solution to the daunting challenge of precision synthesis of protein olympiadane with five distinct ring components. The success not only provided new tools for protein topology engineering but also spurred and fueled the future exploitation of topology-related functional benefits in protein science.
The expansion of protein topological diversity requires new and efficient synthetic tools. Herein, we report the second and third generations of the SpyStapler-mediated SpyTag/BDTag ligation system for the efficient synthesis of 4-arm star proteins and the repurposing of the third generation as an active template to enable the synthesis of higher-order protein [n]catenanes (n = 3, 4, and 5). SpyStapler003 has a higher affinity to its cognate SpyTag and BDTag reactive pairs relative to the original SpyStapler. Hence, it can overcome much more profound steric hindrance in protein ligation and improve the efficiency of the resulting active template tool to facilitate the construction of radial protein [n]catenanes. Various proteins of interest, such as dihydrofolate reductase and the nanobody KN035, can be modularly incorporated into the [n]catenanes with intact activity. Combination of passive and active template strategies gives rise to linear protein [4]catenanes, which further expands the current topological diversity. Moreover, higher-order protein catenation leads not only to enhanced thermal stability and proteolytic resistance but also to higher affinity of the nanobody via multivalent effects. Our study provides tools useful for bioconjugation and new topological protein scaffolds for the multivalent display of enzymes and antibodies.
Action potentials (APs) in neurons are generated at the axon initial segment (AIS). AP dynamics, including initiation and propagation, are intimately associated with neuronal excitability and neurotransmitter release kinetics. Most learning and memory studies at the single-neuron level have relied on the use of animal models, most notably rodents. Here, we studied AP initiation and propagation in cultured hippocampal neurons from Sprague-Dawley (SD) rats and C57BL/6 (C57) mice with genetically encoded voltage indicator (GEVI)-based voltage imaging. Our data showed that APs traveled bidirectionally in neurons from both species; forward-propagating APs (fpAPs) had a different speed than backpropagating APs (bpAPs). Additionally, we observed distinct AP propagation characteristics in AISs emerging from the somatic envelope compared to those originating from dendrites. Compared with rat neurons, mouse neurons exhibited higher bpAP speed and lower fpAP speed, more distally located ankyrin G (AnkG) in AISs, and longer Nav1.2 lengths in AISs. Moreover, during AIS plasticity, AnkG and Nav1.2 showed distal shifts in location and shorter lengths of labeled AISs in rat neurons; in mouse neurons, however, they showed a longer AnkG-labeled length and more distal Nav1.2 location. Our findings suggest that hippocampal neurons in SD rats and C57 mice may have different AP propagation speeds, different AnkG and Nav1.2 patterns in the AIS, and different AIS plasticity properties, indicating that comparisons between these species must be carefully considered.
Vision plays a peculiar role in intelligence. Visual information, forming a large part of the sensory information, is fed into the human brain to formulate various types of cognition and behaviours that make humans intelligent agents. Recent advances have led to the development of brain-inspired algorithms and models for machine vision. One of the key components of these methods is the utilization of the computational principles underlying biological neurons. Additionally, advanced experimental neuroscience techniques have generated different types of neural signals that carry essential visual information. Thus, there is a high demand for mapping out functional models for reading out visual information from neural signals. Here, we briefly review recent progress on this issue with a focus on how machine learning techniques can help in the development of models for contending with various types of neural signals, from fine-scale neural spikes and single-cell calcium imaging to coarse-scale electroencephalography (EEG) and functional magnetic resonance imaging recordings of brain signals.
A sememe is defined as the minimum semantic unit of languages in linguistics. Sememe knowledge bases are built by manually annotating sememes for words and phrases. HowNet is the most well-known sememe knowledge base. It has been extensively utilized in many natural language processing tasks in the era of statistical natural language processing and proven to be effective and helpful for understanding and using languages. In the era of deep learning, although data are thought to be of vital importance, some studies have worked on incorporating sememe knowledge bases like HowNet into neural network models to enhance system performance. Some successful attempts have been made in tasks including word representation learning, language modeling, semantic composition, etc. In addition, considering the high cost of manually annotating and updating sememe knowledge bases, some work has tried to use machine learning methods to automatically predict sememes for words and phrases to expand sememe knowledge bases. Besides, some studies try to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on the application and expansion of sememe knowledge bases and point out some future directions of research on sememes.
In fluorescence microscopy, computational algorithms have been developed to suppress noise, enhance contrast, and even enable super-resolution (SR). However, the local quality of the images may vary on multiple scales, and these differences can lead to misconceptions. Current mapping methods fail to estimate the local quality finely, which makes it challenging to associate that quality with SR-scale content. Here, we develop a rolling Fourier ring correlation (rFRC) method to evaluate the reconstruction uncertainties down to the SR scale. To visually pinpoint regions with low reliability, a filtered rFRC is combined with a modified resolution-scaled error map (RSM), offering a comprehensive and concise map for further examination. We demonstrate their performance on various SR imaging modalities, and the resulting quantitative maps enable better SR images to be integrated from different reconstructions. Overall, we expect that our framework can become a routinely used tool for biologists in assessing their image datasets in general and inspire further advances in the rapidly developing field of computational imaging.
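As a rough illustration of the quantity underlying the rFRC abstract above, the following is a minimal sketch of a plain (global) Fourier ring correlation between two images of the same scene. It is not the authors' rolling implementation: the function name `fourier_ring_correlation`, the integer ring binning, and the small stabilizing constant are all assumptions made for this example. A rolling variant would, under the same assumptions, apply such a function to small sliding windows to obtain a local quality map.

```python
import numpy as np

def fourier_ring_correlation(img_a, img_b):
    """Plain FRC curve for two equal-size 2-D images (hypothetical helper).

    Returns one correlation value per integer-radius ring in Fourier space,
    from the zero-frequency ring outward.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same shape")
    # Centered 2-D spectra of both images.
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    ny, nx = img_a.shape
    y, x = np.indices((ny, nx))
    # Integer ring index = truncated distance from the spectrum centre.
    ring = np.hypot(y - ny // 2, x - nx // 2).astype(int).ravel()
    n_rings = min(ny, nx) // 2
    # Sum the cross-spectrum and both power spectra over each ring.
    cross = np.bincount(ring, weights=(fa * np.conj(fb)).real.ravel())[:n_rings]
    pow_a = np.bincount(ring, weights=(np.abs(fa) ** 2).ravel())[:n_rings]
    pow_b = np.bincount(ring, weights=(np.abs(fb) ** 2).ravel())[:n_rings]
    # Small constant (an assumption) guards against division by zero.
    return cross / np.sqrt(pow_a * pow_b + 1e-12)

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))
same = fourier_ring_correlation(image, image)                      # identical inputs
noise = fourier_ring_correlation(image, rng.normal(size=(32, 32))) # unrelated inputs
```

Identical inputs give a curve near 1 on every ring, while unrelated noise images give values scattered around 0 at higher frequencies; in practice, the spatial frequency at which the curve drops below a chosen threshold is read as a resolution estimate.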
Chemical topology refers to the three-dimensional arrangement (i.e., connectivity and spatial relationship) of a molecule's constituent atoms and bonds. The molecular mechanism for translation defines the linear configuration of all nascent proteins. Nontrivial protein topology arises only upon post-translational processing events and often imparts functional benefits such as enhanced stability, making topology a unique dimension for protein engineering. Utilizing the assembly-reaction synergy, our group has developed several methods for the effective and convenient cellular synthesis of a variety of topological proteins, such as lasso proteins, protein rotaxanes, and protein catenanes. The work opens access to new protein classes and paves the way toward illustrating the topological effects on the structure-function relationship of proteins, which lays a solid foundation for exploring the practical application of topological proteins.
Network embedding, as an approach to learning low-dimensional representations of nodes, has been proved extremely useful in many applications, e.g., node classification and link prediction. Unfortunately, existing network embedding models are vulnerable to random or adversarial perturbations, which may degrade the performance of network embedding when applied to downstream tasks. To achieve robust network embedding, researchers introduce adversarial training to regularize the embedding learning process by training on a mixture of adversarial examples and original examples. However, existing methods generate adversarial examples heuristically, failing to guarantee the imperceptibility of generated adversarial examples, and thus limit the power of adversarial training. In this paper, we propose a novel method, Identity-Preserving Adversarial Training (IPAT), for network embedding, which generates imperceptible adversarial examples with explicit identity-preserving regularization. We formalize such identity-preserving regularization as a multi-class classification problem where each node represents a class, and we encourage each adversarial example to be discriminated as the class of its original node. Extensive experimental results on real-world datasets demonstrate that our proposed IPAT method significantly improves the robustness of network embedding models and the generalization of the learned node representations on various downstream tasks.
Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. It is difficult, however, for a model trained on high-resource languages to include the information of both parent and child models, because the initially trained model contains only the lexicon features and word embeddings of the parent model rather than the features of the child language. In this work, we aim to address this issue by proposing the language-independent Hybrid Transfer Learning (HTL) method for LRLs, which shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train the High-Resource Languages (HRLs) as the parent model with its vocabularies. Then, we combine the parent and child language pairs using the oversampling method to train the hybrid model, initialized from the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. Besides, we report some exciting discoveries about the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods in two languages, Azerbaijani (Az) and Uzbek (Uz). Meanwhile, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child languages Az→Zh and Uz→Zh, respectively.
Brain-inspired computer vision aims to learn from biological systems to develop advanced image processing techniques. However, its progress so far has not been impressive. We recognize that a main obstacle is that the current paradigm for brain-inspired computer vision has not captured the fundamental nature of biological vision, i.e., that biological vision is targeted at processing spatio-temporal patterns. Recently, a new paradigm for developing brain-inspired computer vision has been emerging, which emphasizes the spatio-temporal nature of visual signals and brain-inspired models for processing this type of data. In this paper, we review some recent primary works towards this new paradigm, including the development of spike cameras, which acquire spiking signals directly from visual scenes, and the development of computational models learned from neural systems that are specialized to process spatio-temporal patterns, including models for object detection, tracking, and recognition. We also discuss future directions to improve the paradigm.
Frequentist model averaging has received much attention from econometricians and statisticians in recent years. A key problem with frequentist model average estimators is the choice of weights. This paper develops a new approach to choosing weights based on an approximation of generalized cross validation. The resultant least squares model average estimators are proved to be asymptotically optimal in the sense of achieving the lowest possible squared errors. In particular, the optimality is established under both discrete and continuous weight sets. Compared with the existing approach based on the Mallows criterion, the conditions required for the asymptotic optimality of the proposed method are more reasonable. Simulation studies and a real data application show good performance of the proposed estimators.
Web search provides a promising way for people to obtain information and has been extensively studied. With the surge of deep learning and large-scale pre-training techniques, various neural information retrieval models have been proposed, and they have demonstrated the power to improve search (especially ranking) quality. All these existing search methods follow a common paradigm, i.e., index-retrieve-rerank, where they first build an index of all documents based on document terms (i.e., a sparse inverted index) or representation vectors (i.e., a dense vector index), then retrieve and rerank retrieved documents based on the similarity between the query and documents via ranking models. In this paper, we explore a new paradigm of information retrieval without an explicit index but only with a pre-trained model. Instead, all of the knowledge of the documents is encoded into model parameters, which can be regarded as a differentiable indexer and optimized in an end-to-end manner. Specifically, we propose a pre-trained model-based information retrieval (IR) system called DynamicRetriever, which directly returns document identifiers for a given query. Under such a framework, we implement two variants to explore how to train the model from scratch and how to combine the advantages of dense retrieval models. Compared with existing search methods, the model-based IR system parameterizes the traditional static index with a pre-training model, which converts the document semantic mapping into a dynamic and updatable process. Extensive experiments conducted on the public search benchmark Microsoft machine reading comprehension (MS MARCO) verify the effectiveness and potential of our proposed new paradigm for information retrieval.
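For readers unfamiliar with the index-retrieve-rerank paradigm that the abstract above contrasts with model-based retrieval, the following toy sketch shows the three stages on a sparse inverted index. It is only an illustration: the function names, the example documents, and the Jaccard overlap used for reranking are assumptions standing in for the learned ranking models the abstract refers to, and nothing here reflects DynamicRetriever itself.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Index stage: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Retrieve stage: collect every document sharing a term with the query."""
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return candidates

def rerank(docs, candidates, query):
    """Rerank stage: order candidates by Jaccard overlap with the query
    (a stand-in for a learned ranking model); ties break on document id."""
    q_terms = set(query.lower().split())

    def score(doc_id):
        d_terms = set(docs[doc_id].lower().split())
        return len(q_terms & d_terms) / len(q_terms | d_terms)

    return sorted(candidates, key=lambda d: (-score(d), d))

# Hypothetical mini-corpus, made up for this example.
docs = {
    "d1": "neural ranking models for web search",
    "d2": "dense vector index for retrieval",
    "d3": "protein catenane synthesis",
}
query = "web search index"
index = build_inverted_index(docs)
ranked = rerank(docs, retrieve(index, query), query)
print(ranked)  # ['d1', 'd2']
```

In this pipeline the index is a static data structure built ahead of time; the paradigm described in the abstract instead encodes the documents in model parameters and returns document identifiers directly.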
Genetically encoded covalent peptide tagging tools, such as the SpyTag/SpyCatcher reactive pair, have been demonstrated to be versatile and useful for protein modification. Herein, we present a superpositively charged SpyCatcher bearing a theoretical net charge of +21 that is capable of accomplishing multiple unrelated, independent tasks to enrich this toolbox and cultivate new functions. The SpyCatcher(+21) possessed stimuli-responsive reactivity toward SpyTag and could serve as a potent and general platform for the delivery of proteins, including RNase A, into HeLa cells. Remarkably, the delivered RNase A caused substantial proliferation inhibition toward HeLa cells. In addition, the superpositively charged SpyCatcher could form a coacervate with plasmid DNA for further study of gene delivery and liquid–liquid phase separation. These findings demonstrate the robustness of the SpyTag/SpyCatcher structure against surface mutation and the prospect of applying supercharging technology to diverse functional proteins to create moonlighting proteins.
The combinatorial optimization problem (COP), which aims to find the optimal solution in discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete and require much more time to solve as the problem scale increases. Troubled by this, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning have been proposed. Some works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and reported good results. However, CSA is not something that current general-purpose processors can handle easily, and there is no special hardware for it. To perform CSA efficiently, we propose a software and hardware co-design. In software, we quantize the weight and output using appropriate bit widths, and then modify the calculations that are not suitable for hardware implementation. In hardware, we design a specialized processing-in-memory hardware architecture named COPPER based on the memristor. COPPER is capable of efficiently running the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER can perform CSA remarkably well in both speed and energy.
Funding (TDV-SIM study): Supported by grants from the National Science and Technology Major Project Program (Grant Nos. 2021YFA1100201, 2022YFF0712500, and 2022YFC3400600); the National Natural Science Foundation of China (Grant Nos. 92054301, 81925022, 92150301, 32170691, 62103071, and 31901061); the Beijing Natural Science Foundation (Grant No. Z20J00059); the Lingang Laboratory (Grant No. LG-QS-202206-06); the Clinical Medicine Plus X-Young Scholars Project, Peking University; the Fundamental Research Funds for the Central Universities; the Natural Science Foundation of Chongqing (Grant No. cstc2021jcyj-msxmX0526); the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202100630); the Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA16021200); and the High-Performance Computing Platform of Peking University.
Funding (protein [n]heterocatenane study): The National Natural Science Foundation of China (grant nos. 21991132, 21925102, 92056118, 22101010, 22201017, and 22201016); the National Key R&D Program of China (grant no. 2020YFA0908100); and the Beijing National Laboratory for Molecular Sciences (grant no. BNLMS-CXXM-202006). Supported by the National Center for Protein Sciences at Peking University.
Funding (SpyStapler study): Support from the National Natural Science Foundation of China (grant nos. 21991132, 21925102, 92056118, 22101010, 22201016, and 22201017); the National Key R&D Program of China (grant no. 2020YFA0908100); and the Beijing National Laboratory for Molecular Sciences (grant no. BNLMSCXXM-202006).
Funding (action potential study): Supported by the National Science and Technology Innovation 2030-Major Program of "Brain Science and Brain-Like Research" (2022ZD0211800); the National Natural Science Foundation of China General Research Grant (81971679, 21727806, 31771147) and Major Research Grant (91632305, 32088101); the Ministry of Science and Technology (2018YFA0507600, 2017YFA0503600); the Qidong-PKU SLS Innovation Fund (2016000663); and the Fundamental Research Funds for the Central Universities and National Key R&D Program of China (2020AAA0105200). Sponsored by the Bayer Investigator Award.
Funding (review on reading out visual information from neural signals): Supported by the National Natural Science Foundation of China (Nos. 62176003 and 62088102) and the Royal Society Newton Advanced Fellowship of the UK (No. NAF-R1-191082).
Funding (sememe knowledge base survey): The National Key Research and Development Program of China (2018YFB1004503) and the National Natural Science Foundation of China (NSFC Grant Nos. 61732008, 61532010).
Funding (rFRC study): Supported by the National Natural Science Foundation of China (grant no. T2222009 to H.L., grant no. 32227802 to L.C., grant no. 81925022 to L.C., grant no. 92054301 to L.C., grant no. 62305083 to W.Z., grant no. 12174208 to P.L., grant no. 32301257 to S.Z., grant no. 32222022 to Y.J., grant no. 32071458 to H.M.); the National Key Research and Development Program of China (grant no. 2022YFC3400600 to L.C.); the Natural Science Foundation of Heilongjiang Province (grant no. YQ2021F013 to H.L.); the Beijing Natural Science Foundation (grant no. Z20J00059 to L.C.); the Guangdong Major Project of Basic and Applied Basic Research (grant no. 2020B0301030009 to P.L.); the China Postdoctoral Science Foundation (grant no. 2023T160163 to W.Z., grant no. 2022M720971 to W.Z.); and the Heilongjiang Provincial Postdoctoral Science Foundation (grant no. LBH-Z22027 to W.Z.). L.C. acknowledges support from the High-performance Computing Platform of Peking University.
Abstract: In fluorescence microscopy, computational algorithms have been developed to suppress noise, enhance contrast, and even enable super-resolution (SR). However, the local quality of the images may vary on multiple scales, and these differences can lead to misconceptions. Current mapping methods fail to finely estimate the local quality, which makes it challenging to relate their estimates to SR-scale content. Here, we develop a rolling Fourier ring correlation (rFRC) method to evaluate the reconstruction uncertainties down to the SR scale. To visually pinpoint regions with low reliability, a filtered rFRC is combined with a modified resolution-scaled error map (RSM), offering a comprehensive and concise map for further examination. We demonstrate their performance on various SR imaging modalities, and the resulting quantitative maps enable better SR images integrated from different reconstructions. Overall, we expect that our framework can become a routinely used tool for biologists in assessing their image datasets in general and inspire further advances in the rapidly developing field of computational imaging.
Funding: We are grateful for the financial support from the National Key R&D Program of China (No. 2020YFA0908100), the National Natural Science Foundation of China (Nos. 21991132, 21925102, 92056118, 22101010, 22201016, 22201017), and the Beijing National Laboratory for Molecular Sciences (BNLMS-CXXM-202006).
Abstract: Chemical topology refers to the three-dimensional arrangement (i.e., connectivity and spatial relationship) of a molecule's constituent atoms and bonds. The molecular mechanism of translation defines the linear configuration of all nascent proteins. Nontrivial protein topology arises only upon post-translational processing events and often imparts functional benefits such as enhanced stability, making topology a unique dimension for protein engineering. Utilizing the assembly-reaction synergy, our group has developed several methods for the effective and convenient cellular synthesis of a variety of topological proteins, such as lasso proteins, protein rotaxanes, and protein catenanes. The work opens access to new protein classes and paves the way toward elucidating topological effects on the structure-function relationship of proteins, which lays a solid foundation for exploring the practical application of topological proteins.
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. U21B2046 and 62102402, and the National Key Research and Development Program of China under Grant No. 2020AAA0105200.
Abstract: Network embedding, as an approach to learning low-dimensional representations of nodes, has proved extremely useful in many applications, e.g., node classification and link prediction. Unfortunately, existing network embedding models are vulnerable to random or adversarial perturbations, which may degrade the performance of network embedding when applied to downstream tasks. To achieve robust network embedding, researchers introduce adversarial training to regularize the embedding learning process by training on a mixture of adversarial examples and original examples. However, existing methods generate adversarial examples heuristically, failing to guarantee the imperceptibility of the generated adversarial examples, and thus limit the power of adversarial training. In this paper, we propose a novel method, Identity-Preserving Adversarial Training (IPAT), for network embedding, which generates imperceptible adversarial examples with explicit identity-preserving regularization. We formalize such identity-preserving regularization as a multi-class classification problem where each node represents a class, and we encourage each adversarial example to be classified as the class of its original node. Extensive experimental results on real-world datasets demonstrate that our proposed IPAT method significantly improves the robustness of network embedding models and the generalization of the learned node representations on various downstream tasks.
Funding: Supported by the National Key R&D Program of China (No. 2017YFB0202204), the National Natural Science Foundation of China (Nos. 61925601, 61761166008, and 61772302), the Beijing Advanced Innovation Center for Language Resources (No. TYR17002), and the NExT++ project, which is supported by the National Research Foundation, Prime Minister's Office, Singapore under its IRC@Singapore Funding Initiative.
Abstract: Most State-Of-The-Art (SOTA) Neural Machine Translation (NMT) systems today achieve outstanding results based only on large parallel corpora. Large-scale parallel corpora for high-resource languages are easily obtainable. However, the translation quality of NMT for morphologically rich languages is still unsatisfactory, mainly because of the data sparsity problem encountered in Low-Resource Languages (LRLs). In the low-resource NMT paradigm, Transfer Learning (TL) has developed into one of the most efficient methods. However, it is difficult for a model trained on high-resource languages to include the information of both the parent and child models, because the initially trained model contains only the lexicon features and word embeddings of the parent model rather than the features of the child languages. In this work, we aim to address this issue by proposing the language-independent Hybrid Transfer Learning (HTL) method for LRLs, which shares lexicon embeddings between parent and child languages without leveraging back translation or manually injecting noise. First, we train the High-Resource Languages (HRLs) as the parent model with its vocabularies. Then, we combine the parent and child language pairs using the oversampling method to train the hybrid model, initialized by the previously trained parent model. Finally, we fine-tune the morphologically rich child model using the hybrid model. Besides, we report some interesting findings on the original TL approach. Experimental results show that our model consistently outperforms five SOTA methods on two languages, Azerbaijani (Az) and Uzbek (Uz). Meanwhile, our approach is practical and significantly better, achieving improvements of up to 4.94 and 4.84 BLEU points for the low-resource child language pairs Az → Zh and Uz → Zh, respectively.
Funding: Supported by the National Key R&D Program of China (No. 2020AAA0105200), the Science and Technology Innovation 2030-Brain Science and Brain-inspired Intelligence Project (No. 2021ZD0200204), the National Key Research and Development Program of China (No. 2020AAA0130401), and Huawei Technology Co., Ltd, China (No. YBN2019105137).
Abstract: Brain-inspired computer vision aims to learn from biological systems to develop advanced image processing techniques. However, its progress so far is not impressive. We recognize that a main obstacle is that the current paradigm for brain-inspired computer vision has not captured the fundamental nature of biological vision, i.e., that biological vision is targeted at processing spatio-temporal patterns. Recently, a new paradigm for developing brain-inspired computer vision has been emerging, which emphasizes the spatio-temporal nature of visual signals and brain-inspired models for processing this type of data. In this paper, we review some recent primary works toward this new paradigm, including the development of spike cameras, which acquire spiking signals directly from visual scenes, and the development of computational models learned from neural systems that are specialized to process spatio-temporal patterns, including models for object detection, tracking, and recognition. We also discuss future directions for improving the paradigm.
Funding: National Key R&D Program of China (2020AAA0105200), the Ministry of Science and Technology of China (Grant no. 2016YFB0502301), the National Natural Science Foundation of China (Grant nos. 11871294, 12031016, 11971323, 71925007, 72042019, 72091212 and 12001559), and a joint grant from the Academy for Multidisciplinary Studies, Capital Normal University.
Abstract: Frequentist model averaging has received much attention from econometricians and statisticians in recent years. A key problem with frequentist model average estimators is the choice of weights. This paper develops a new approach for choosing weights based on an approximation of generalized cross validation. The resultant least squares model average estimators are proved to be asymptotically optimal in the sense of achieving the lowest possible squared errors. In particular, the optimality is established under both discrete and continuous weight sets. Compared with the existing approach based on the Mallows criterion, the conditions required for the asymptotic optimality of the proposed method are more reasonable. Simulation studies and a real data application show good performance of the proposed estimators.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61872370 and 61832017), the Beijing Outstanding Young Scientist Program (No. BJJWZYJH012019100020098), the Beijing Academy of Artificial Intelligence (BAAI), the Outstanding Innovative Talents Cultivation Funded Programs 2021 of Renmin University of China, and the Intelligent Social Governance Platform, Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Renmin University of China.
Abstract: Web search provides a promising way for people to obtain information and has been extensively studied. With the surge of deep learning and large-scale pre-training techniques, various neural information retrieval models have been proposed, and they have demonstrated the power to improve search quality, especially ranking quality. All these existing search methods follow a common paradigm, i.e., index-retrieve-rerank, where they first build an index of all documents based on document terms (i.e., a sparse inverted index) or representation vectors (i.e., a dense vector index), then retrieve documents and rerank the retrieved documents based on the similarity between the query and documents via ranking models. In this paper, we explore a new paradigm of information retrieval without an explicit index but only with a pre-trained model. Instead, all of the knowledge of the documents is encoded into model parameters, which can be regarded as a differentiable indexer and optimized in an end-to-end manner. Specifically, we propose a pre-trained model-based information retrieval (IR) system called DynamicRetriever, which directly returns document identifiers for a given query. Under such a framework, we implement two variants to explore how to train the model from scratch and how to combine the advantages of dense retrieval models. Compared with existing search methods, the model-based IR system parameterizes the traditional static index with a pre-training model, which converts the document semantic mapping into a dynamic and updatable process. Extensive experiments conducted on the public search benchmark Microsoft machine reading comprehension (MS MARCO) verify the effectiveness and potential of our proposed new paradigm for information retrieval.
Funding: Support from the National Key R&D Program of China (grant nos. 2020YFA0908100, 2020AAA0105200), the National Natural Science Foundation of China (NSFC, grant nos. 21991132, 21925102, 92056118, 8200907120, 8200907121), and the Beijing National Laboratory for Molecular Sciences (BNLMS, grant no. BNLMS-CXXM-202006).
Abstract: Genetically encoded covalent peptide tagging tools, such as the SpyTag/SpyCatcher reactive pair, have been demonstrated to be versatile and useful for protein modification. Herein, we present a superpositively charged SpyCatcher bearing a theoretical net charge of +21 that is capable of accomplishing multiple unrelated, independent tasks, to enrich this toolbox and cultivate new functions. The SpyCatcher(+21) possessed stimuli-responsive reactivity toward SpyTag and could serve as a potent and general platform for the delivery of proteins, including RNase A, into HeLa cells. Remarkably, the delivered RNase A caused substantial proliferation inhibition of HeLa cells. In addition, the superpositively charged SpyCatcher could form a coacervate with plasmid DNA for further study of gene delivery and liquid–liquid phase separation. These findings demonstrate the robustness of the SpyTag/SpyCatcher structure against surface mutation and the prospect of applying supercharging technology to diverse functional proteins to create moonlighting proteins.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61832020, 62032001, 92064006, and 62274036), the Beijing Academy of Artificial Intelligence (BAAI) of China, and the 111 Project of China (No. B18001).
Abstract: The combinatorial optimization problem (COP), which aims to find the optimal solution in a discrete space, is fundamental in various fields. Unfortunately, many COPs are NP-complete and require much more time to solve as the problem scale increases. As a result, researchers may prefer fast methods even if they are not exact, so approximation algorithms, heuristic algorithms, and machine learning methods have been proposed. Some works proposed chaotic simulated annealing (CSA) based on the Hopfield neural network and achieved good results. However, CSA is not something that current general-purpose processors can handle easily, and there is no special hardware for it. To perform CSA efficiently, we propose a software and hardware co-design. In software, we quantize the weight and output using appropriate bit widths, and then modify the calculations that are not suitable for hardware implementation. In hardware, we design a specialized processing-in-memory hardware architecture named COPPER based on the memristor. COPPER is capable of efficiently running the modified quantized CSA algorithm and supports pipelining for further acceleration. The results show that COPPER can perform CSA remarkably well in both speed and energy.