Funding: This work is supported by the NSFC (Grant Nos. 61671087, 61962009, 61003287), the Fok Ying Tong Education Foundation (Grant No. 131067), the Major Scientific and Technological Special Project of Guizhou Province (Grant No. 20183001), the Foundation of State Key Laboratory of Public Big Data (Grant No. 2018BDKFJJ018), CCF-Tencent Open Fund WeBank Special Funding (CCF-WebankRAGR20180104), the High-quality and Cutting-edge Disciplines Construction Project for Universities in Beijing (Internet Information, Communication University of China), the Fundamental Research Funds for the Central Universities, and the Fundamental Research Funds for the Central Universities No. 2019XD-A02.
Abstract: Most existing blockchain schemes rely on the design principle of “openness and transparency” to achieve data security, which usually requires transaction data to be stored in plaintext. However, this inevitably raises problems of data privacy and operating performance. In this paper, we propose a novel blockchain scheme called Cipherchain, which can process and maintain transaction data in ciphertext while guaranteeing immutability and auditability. Specifically, in our scheme, transactions are encrypted locally using a searchable encryption scheme called multi-user public key encryption with conjunctive keyword search (mPECK), and can be accessed by multiple designated participants after being appended to the globally consistent distributed ledger. By introducing an execution-consensus-update paradigm for the transaction flow, Cipherchain not only allows transaction data to exist in ciphertext but also ensures that overall system performance is not greatly affected by cryptographic operations and other local execution work. In addition, Cipherchain is a promising scheme for combining “blockchain + cloud computing” and “permissioned blockchain + public blockchain”.
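The execution-consensus-update flow can be illustrated with a toy sketch. It assumes that authorized participants share a symmetric key, and it uses HMAC-SHA256 keyword tags in place of mPECK trapdoors. This is a different and much simpler technique than the pairing-based mPECK scheme. Random bytes stand in for real ciphertexts, and the names `execute_locally`, `Ledger.commit`, `Ledger.search` and `Ledger.verify` are hypothetical, not part of Cipherchain.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field

GENESIS = b"\x00" * 32


def keyword_tag(key: bytes, keyword: str) -> bytes:
    """Deterministic keyword tag/trapdoor via HMAC-SHA256 (symmetric stand-in, NOT mPECK)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


@dataclass
class EncryptedTx:
    ciphertext: bytes  # opaque payload; nodes never see plaintext
    tags: frozenset    # keyword tags attached during local execution


def execute_locally(key: bytes, keywords) -> EncryptedTx:
    """Execution phase (client side): produce ciphertext and tags before consensus.
    Random bytes stand in for a real ciphertext; no actual encryption happens here."""
    return EncryptedTx(secrets.token_bytes(32),
                       frozenset(keyword_tag(key, k) for k in keywords))


def block_hash(prev: bytes, txs) -> bytes:
    """Chain each block to its predecessor by hashing prev hash + ciphertexts."""
    h = hashlib.sha256(prev)
    for tx in txs:
        h.update(tx.ciphertext)
    return h.digest()


@dataclass
class Ledger:
    blocks: list = field(default_factory=list)

    def commit(self, txs) -> None:
        """Consensus + update (toy): order opaque txs into a hash-chained block."""
        txs = list(txs)
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        self.blocks.append({"prev": prev, "txs": txs, "hash": block_hash(prev, txs)})

    def search(self, trapdoors) -> list:
        """Conjunctive keyword search: a tx matches only if it carries every trapdoor."""
        return [tx for b in self.blocks for tx in b["txs"]
                if all(t in tx.tags for t in trapdoors)]

    def verify(self) -> bool:
        """Audit: recompute the hash chain to detect tampering."""
        prev = GENESIS
        for b in self.blocks:
            if b["prev"] != prev or b["hash"] != block_hash(prev, b["txs"]):
                return False
            prev = b["hash"]
        return True


# Usage: the key is assumed to be shared by authorized participants.
key = secrets.token_bytes(32)
ledger = Ledger()
ledger.commit([execute_locally(key, ["alice", "invoice"]),
               execute_locally(key, ["bob", "invoice"])])
ledger.commit([execute_locally(key, ["alice", "refund"])])
```

In this sketch, nodes only order and hash opaque bytes. The cost of consensus therefore does not depend on the cryptographic work done during local execution.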
Abstract: While the Harris-Todaro model is a traditional approach for researching the urban-rural dichotomy, it fails to account for families’ goal of maximizing their current utility under intertemporal decision-making conditions. To fill this gap, this paper establishes an urban-rural dichotomy model involving labor migration and education, in which family utility is assumed to derive from consumption and children’s educational achievement. The steady-state path derived through the Bellman equation suggests that increasing educational investment and family education intensity leads to a significant urban-rural difference in children’s educational achievement. Compared with the traditional Harris-Todaro model, this model relaxes the transversality condition, while migrant families are constrained by the unavailability of loans. Four hypotheses are proposed and tested empirically. The analysis first uses ordinary least squares (OLS) regression; because omitted variables cause endogeneity, the instrumental variable (IV) method with two-stage least squares (2SLS) regression is then applied. The results demonstrate that the household registration system can explain 44.5% of the difference in educational achievement, and that the initial difference is amplified 4.73-fold after nine years of compulsory education. This divergence could widen the differences caused by household registration status, resulting in larger income gaps and the intergenerational inheritance of identities.
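The OLS-then-2SLS strategy described above can be sketched with NumPy on synthetic data. The data-generating process is invented for illustration and is unrelated to the paper's household data. It has an instrument `z`, an omitted variable `u` and a true effect of 2.0. The helper names `ols` and `two_stage_least_squares` are hypothetical.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients for y = X @ beta (X should already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def two_stage_least_squares(y, x_endog, Z, W=None):
    """Manual 2SLS point estimates; the coefficient on x_endog is the last entry."""
    n = len(y)
    const = np.ones((n, 1))
    W = const if W is None else np.column_stack([const, W])
    # Stage 1: regress the endogenous regressor on instruments + exogenous controls.
    first = np.column_stack([W, Z])
    x_hat = first @ ols(first, x_endog)
    # Stage 2: regress the outcome on the fitted values + exogenous controls.
    second = np.column_stack([W, x_hat])
    return ols(second, y)


# Synthetic example: u is omitted from the regression and drives both x and y.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)   # instrument: affects y only through x
u = rng.normal(size=n)   # omitted variable -> endogeneity
x = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * x + 3.0 * u + rng.normal(size=n)

beta_ols = ols(np.column_stack([np.ones(n), x]), y)
beta_iv = two_stage_least_squares(y, x, z)
print(f"OLS slope: {beta_ols[-1]:.3f}, 2SLS slope: {beta_iv[-1]:.3f}")
```

In this setup, OLS overstates the effect because `x` absorbs part of `u`, while 2SLS recovers a value close to 2.0. The manual second stage gives correct point estimates but not correct standard errors, so a dedicated IV routine is preferable for inference.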
Abstract: The rise of large language models (LLMs) has shifted the community from single-task-oriented natural language processing (NLP) research to a holistic, end-to-end multi-task learning paradigm. Within this line of research, LLM-based prompting methods have attracted much attention. This is partly due to the technological advantages of prompt engineering (PE) and partly due to the underlying NLP principles revealed by various prompting methods. Traditional supervised learning usually requires training a model on labeled data before making predictions. In contrast, PE methods directly exploit the capabilities of existing LLMs (e.g., GPT-3 and GPT-4) by composing appropriate prompts, especially in few-shot or zero-shot scenarios. Given the abundance of studies on prompting and the ever-evolving nature of this field, this article aims to 1) offer a novel perspective for reviewing existing PE methods within the well-established framework of communication theory, 2) facilitate a deeper understanding of the development trends of existing PE methods used in three typical tasks, and 3) shed light on promising research directions for future PE methods.
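The following is a minimal sketch of composing a few-shot prompt. It assumes a simple `Input:`/`Output:` template, which is a hypothetical format and not one prescribed by the article. Passing no demonstrations yields a zero-shot prompt. Sending the resulting string to an LLM such as GPT-3 or GPT-4 is outside the scope of this sketch.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(instruction: str,
                          demos: Sequence[Tuple[str, str]],
                          query: str) -> str:
    """Compose instruction + (input, output) demonstrations + the open query.
    An empty `demos` list produces a zero-shot prompt."""
    parts = [instruction.strip(), ""]
    for inp, out in demos:
        parts.append(f"Input: {inp}\nOutput: {out}\n")
    # Leave the final Output slot empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)


few_shot = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("A moving, beautifully shot film.", "positive"),
     ("Two hours I will never get back.", "negative")],
    "The plot was dull but the acting was superb.",
)
zero_shot = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [],
    "The plot was dull but the acting was superb.",
)
print(few_shot)
```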
Funding: Supported by the National Natural Science Foundation of China (42201319, 42001281, 42201347 and 42001324), the Guangdong Basic and Applied Basic Research Foundation (2023A1515011946 and 2023A1515011216), the Open Funding Project of the Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources (MESTA-2021-B003), and the Independent Research Project of Guangming Laboratory Project: Moonshot Carbon Credit Rating Driven by AI and Remote Sensing Big Data (23400002).
Abstract: China is experiencing accelerated urbanisation, with a large number of people moving from rural to urban areas [1]. This has resulted in large losses in net primary production (NPP), biodiversity and carbon stocks, as well as increases in environmental pollution and CO₂ emissions [2–4]. In 2015, 196 countries signed the Paris Agreement and committed to setting long-term goals to jointly manage climate change and reduce their individual emissions. The aim is to keep the increase in global average temperature below 2 ℃ above pre-industrial levels and to limit the rise to 1.5 ℃ by the end of the 21st century [5]. China is bolstering its efforts to achieve its climate change mitigation goals and has announced a plan to achieve carbon neutrality by 2060 [6]. This carbon neutrality goal challenges the current policies that promote rapid urbanisation across China.
Funding: Supported by the National Key Research and Development Program of China (No. 2018AAA0101100).
Abstract: In the past decades, artificial intelligence (AI) has achieved unprecedented success, with statistical models becoming the central entity in AI. However, the centralized training and inference paradigm for building and using these models faces increasing privacy and legal challenges. To bridge the gap between data privacy and the need for data fusion, federated learning (FL) has emerged as an AI paradigm for solving data-silo and data-privacy problems. Based on secure distributed AI, federated learning emphasizes data security throughout the lifecycle, which includes data preprocessing, training, evaluation and deployment. FL maintains data security by using methods such as secure multi-party computation (MPC), differential privacy and hardware solutions. These methods are used to build and run distributed multi-party machine-learning systems and statistical models over different data sources. Beyond data privacy, we argue that the concept of “model” also matters: when federated models are developed and deployed, they are exposed to various risks, including plagiarism, illegal copying and misuse. To address these issues, we introduce FedIPR, a novel ownership verification scheme. FedIPR embeds watermarks into FL models to verify model ownership and protect model intellectual property rights (IPR, or IP-right for short). While security is at the core of FL, many articles still refer to distributed machine learning with no security guarantee as “federated learning”, which does not satisfy the intended definition of FL. To this end, in this paper, we reiterate the concept of federated learning and propose secure federated learning (SFL). Its ultimate goal is to build trustworthy and safe AI with strong privacy and IP-right protection. We provide a comprehensive overview of existing works, including threats, attacks and defenses in each phase of SFL from a lifecycle perspective.
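The sketch below illustrates how a watermark can be embedded into model parameters and later verified, using a generic sign-based scheme. A secret key consists of selected parameter indices and a ±1 signature. A hinge penalty pushes the sign of each selected parameter toward its bit, and ownership is checked through the bit detection rate. This is a simplified stand-in written under those assumptions, not FedIPR's exact construction. In federated training, the penalty would be added to each client's local task loss, whereas here it is optimized on its own.

```python
import numpy as np


def sign_hinge_loss_grad(w, idx, bits, margin=0.1):
    """Hinge penalty sum(max(0, margin - b_i * w[idx_i])) and its gradient w.r.t. w."""
    s = bits * w[idx]
    active = s < margin                      # bits not yet embedded with enough margin
    loss = float(np.sum(np.maximum(0.0, margin - s)))
    grad = np.zeros_like(w)
    # d/dw of (margin - b * w) is -b wherever the hinge is active.
    np.add.at(grad, idx[active], -bits[active])
    return loss, grad


def embed_watermark(w, idx, bits, lr=0.05, steps=200, margin=0.1):
    """Gradient descent on the watermark penalty alone (illustration only)."""
    w = w.copy()
    for _ in range(steps):
        loss, grad = sign_hinge_loss_grad(w, idx, bits, margin)
        if loss == 0.0:
            break
        w -= lr * grad
    return w


def detection_rate(w, idx, bits):
    """Fraction of signature bits recovered from parameter signs."""
    return float(np.mean(np.sign(w[idx]) == bits))


rng = np.random.default_rng(0)
weights = rng.normal(size=1000)                      # stand-in for flattened model params
key_idx = rng.choice(1000, size=64, replace=False)   # secret parameter positions
signature = rng.choice([-1.0, 1.0], size=64)         # secret owner signature
marked = embed_watermark(weights, key_idx, signature)
print(detection_rate(weights, key_idx, signature), detection_rate(marked, key_idx, signature))
```

Only the selected parameters change. A claimant holding a different signature would recover roughly half of the bits by chance.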
Abstract: Robustness is a long-standing challenge for automatic speech recognition (ASR), because any deployed ASR system encounters speech that is much noisier than its clean training corpora. However, it is impractical to annotate every type of noisy environment. In this work, we propose a novel phonetic-semantic pre-training (PSP) framework that seamlessly integrates pre-training, self-supervised learning and fine-tuning, allowing a model to effectively improve ASR performance in practical noisy environments. PSP consists of three fundamental stages. First, the phone-to-word transducer (PWT) is pre-trained to map generated phone sequences to the target text using only unpaired text data. Second, the PWT continues training on more complex data generated by an empirical phone-perturbation heuristic, together with self-supervised signals from recovering the tainted phones. Third, the resulting PWT is fine-tuned on real-world speech data. We perform experiments on two real-life datasets collected from industrial scenarios and on synthetic noisy datasets. On the two real-life datasets, PSP improves the traditional ASR pipeline with relative character error rate (CER) reductions of 28.63% and 26.38%, respectively. PSP also demonstrates robustness on highly noisy synthetic speech datasets.
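The phone-perturbation step and the reported metric can be sketched as follows. The perturbation probabilities and the `confusable` phone map are hypothetical placeholders for the paper's empirical heuristic. `cer` is the standard Levenshtein-based character error rate. `relative_reduction` shows how a relative CER reduction, such as the 28.63% above, is computed from a baseline CER and an improved CER.

```python
import random


def perturb_phones(phones, confusable, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=None):
    """Taint a phone sequence with substitutions, deletions and insertions.
    The clean `phones` sequence serves as the self-supervised recovery target."""
    rng = rng or random.Random()
    out = []
    for ph in phones:
        r = rng.random()
        if r < p_del:
            continue                                   # deletion
        if r < p_del + p_sub and ph in confusable:
            out.append(rng.choice(confusable[ph]))     # substitution by a confusable phone
        else:
            out.append(ph)                             # keep unchanged
        if rng.random() < p_ins:
            out.append(rng.choice(phones))             # spurious insertion
    return out


def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution / match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edits needed to turn hyp into ref, over len(ref)."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def relative_reduction(baseline_cer, improved_cer):
    """Relative CER reduction, e.g. 0.20 -> 0.15 is a 25% relative reduction."""
    return (baseline_cer - improved_cer) / baseline_cer


noisy = perturb_phones(["n", "i", "h", "ao"], {"n": ["l"], "ao": ["ou"]},
                       rng=random.Random(0))
print(noisy, cer("abc", "abd"), relative_reduction(0.20, 0.15))
```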
Funding: Supported by the National Key Research and Development Program of China (2021YFF1200900, 2021YFF1201200), the National Natural Science Foundation of China (31970638, 61572361), the Shanghai Artificial Intelligence Technology Standard Project (19DZ2200900), the Shanghai Shuguang Scholars Project, the WeBank Scholars Project, and the Fundamental Research Funds for the Central Universities.
Abstract: The rapid accumulation of large-scale single-cell RNA-seq datasets from multiple institutions presents remarkable opportunities for automatic cell annotation through integrative analyses. However, the privacy issue has long existed but been ignored. Data regulation laws prohibit data transmission across institutions, so we cannot access and utilize all the reference datasets distributed across institutions globally. To this end, we present scPrivacy, the first generalized prototype for automatic single-cell type identification that supports single-cell annotation through privacy-preserving collaboration. We evaluated scPrivacy on a comprehensive set of publicly available benchmark datasets for single-cell type identification. The evaluation simulates a scenario in which reference datasets are rapidly generated and distributed across multiple institutions but cannot be integrated directly or exposed to each other because of data privacy regulations. The results demonstrate the effectiveness, time efficiency and robustness of scPrivacy for the privacy-preserving integration of multiple institutional datasets in single-cell annotation.
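The sketch below gives a rough illustration of annotating cells without pooling raw data. Each institution shares only per-cell-type expression sums and counts, and a coordinator averages them into centroids. Query cells are then labeled by cosine similarity to the nearest centroid. This nearest-centroid scheme is a swapped-in simplification chosen for brevity, not scPrivacy's actual method. Sharing aggregates is not by itself a formal privacy guarantee. All function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def local_summaries(X, labels):
    """Runs inside one institution: per-cell-type expression sums and cell counts.
    Only these aggregates leave the institution, never the cell-level matrix X."""
    sums, counts = {}, {}
    for ct in np.unique(labels):
        mask = labels == ct
        sums[str(ct)] = X[mask].sum(axis=0)
        counts[str(ct)] = int(mask.sum())
    return sums, counts


def aggregate(summaries):
    """Coordinator: combine institutions' sums/counts into global cell-type centroids."""
    total_sum, total_n = {}, defaultdict(int)
    for sums, counts in summaries:
        for ct, s in sums.items():
            total_sum[ct] = total_sum.get(ct, 0) + s
            total_n[ct] += counts[ct]
    return {ct: total_sum[ct] / total_n[ct] for ct in total_sum}


def annotate(X_query, centroids):
    """Label each query cell with the cell type of its most cosine-similar centroid."""
    types = list(centroids)
    C = np.stack([centroids[t] for t in types])
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    Qn = X_query / np.linalg.norm(X_query, axis=1, keepdims=True)
    return [types[i] for i in np.argmax(Qn @ Cn.T, axis=1)]
```

In use, each institution calls `local_summaries` on its own log-normalized expression matrix. The coordinator calls `aggregate` on the returned pairs, and `annotate` then labels new cells against the shared centroids.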