As one of the most widely used languages in the world, Chinese is distinct from most Western languages in many properties, thus providing a unique opportunity for understanding the brain basis of human language and cognition. In recent years, non-invasive neuroimaging techniques such as magnetic resonance imaging (MRI) have blazed a new trail for comprehensively studying the specific neural correlates of Chinese language processing and of Chinese speakers. We review the application of functional MRI (fMRI) in such studies and some essential findings on the brain systems engaged in processing Chinese: for example, the use of task fMRI and resting-state fMRI to observe the reading and writing of logographic characters and the production and perception of tonal speech. We also discuss the underlying cognitive neuroscience and several potential research directions concerning the brain and the Chinese language, which may be informative for future research.
While large language models (LLMs) have made significant strides in natural language processing (NLP), they continue to face challenges in adequately addressing the intricacies of the Chinese language in certain scenarios. We propose a framework called Six-Writings multimodal processing (SWMP) to enable direct integration of Chinese NLP (CNLP) with morphological and semantic elements. The first part of SWMP, known as Six-Writings pictophonetic coding (SWPC), is introduced at a suitable level of granularity for radicals and components, enabling effective representation of Chinese characters and words. We conduct several experiments, including the following: (1) We establish an experimental database consisting of images and SWPC for Chinese characters, enabling dual-mode processing and matrix generation for CNLP. (2) We characterize various generative modes of Chinese words, such as thousands of Chinese idioms, used as question-and-answer (Q&A) prompt functions, facilitating analogies via SWPC. The experiments achieve 100% accuracy on all questions in the Chinese morphological dataset (CA8-Mor-10177). (3) A fine-tuning mechanism is proposed to refine word-embedding results using SWPC, yielding an average relative error of ≤25% for 39.37% of the questions in the Chinese wOrd Similarity dataset (COS960). The results demonstrate that the SWMP/SWPC methods effectively capture the distinctive features of Chinese and offer a promising mechanism for enhancing CNLP with better efficiency.
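The pictophonetic principle behind SWPC can be illustrated with a minimal sketch. The mini-lexicon and function below are hypothetical and illustrative only; they are not the paper's actual SWPC scheme, but they show the kind of structural signal (shared semantic radical vs. shared phonetic component) that such a coding can expose to an NLP system.

```python
# Hypothetical mini-lexicon: character -> (semantic radical, phonetic component).
# The decompositions follow the pictophonetic (xingsheng) principle; the table
# and function names are illustrative assumptions, not the paper's SWPC codes.
LEXICON = {
    "妈": ("女", "马"),  # "mother": woman radical + "ma" phonetic
    "吗": ("口", "马"),  # question particle: mouth radical + "ma" phonetic
    "河": ("氵", "可"),  # "river": water radical + "ke" phonetic
    "湖": ("氵", "胡"),  # "lake": water radical + "hu" phonetic
}

def shared_components(a, b):
    """Report which structural slots two characters share."""
    ra, pa = LEXICON[a]
    rb, pb = LEXICON[b]
    return {"radical": ra == rb, "phonetic": pa == pb}

# "妈" and "吗" share a phonetic component (similar sound, different meaning);
# "河" and "湖" share a semantic radical (related meaning: both involve water).
print(shared_components("妈", "吗"))
print(shared_components("河", "湖"))
```

A real system would replace the hand-written table with a full decomposition database and feed the resulting codes into the dual-mode (image + code) representation the paper describes.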
The expansion of Chinese natural language processing (NLP) has stimulated research in the broader NLP domain. However, existing large language models have limitations in comprehending and reasoning in Chinese. This paper addresses these limitations by enhancing the comprehension and reasoning capabilities of Chinese language models while minimizing resource requirements. We propose LLaMA-LoRA, a neural prompt-engineering framework that builds upon the LLaMA-13B model and incorporates the Low-Rank Adaptation (LoRA) technique for refinement. Chain-of-Thought (CoT) prompts are crucial for generating intermediate reasoning chains in language models, but their effectiveness can be limited by isolated language patterns, and erroneous reasoning produced by conventional prompts negatively impacts model performance. We therefore introduce automatic prompts to encourage reasoning-chain generation and accurate answer inference. Training the model on an extensive corpus of Chinese CoT data further enhances its comprehension and reasoning abilities. The LLaMA-LoRA model demonstrates exceptional performance across numerous Chinese language tasks, surpassing the benchmark performance of related language models such as GPT-3.5, ChatGLM, and OpenAssistant, and delivering accurate, comprehensive, and professional answers. The availability of our open-source model code facilitates further research on logical reasoning chains over Chinese text.
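The core idea of LoRA referenced above can be sketched in a few lines of NumPy. The frozen weight W is left untouched; only two small low-rank matrices A and B are trained, and the effective update is scaled by alpha/r. The dimensions and variable names here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 512, 512   # shape of one frozen pretrained weight matrix (illustrative)
r = 8             # LoRA rank, with r << min(d, k)
alpha = 16        # LoRA scaling factor

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    # y = W x + (alpha / r) * B A x; only A and B receive gradient updates.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k)
# Because B starts at zero, the adapted model initially matches the frozen model.
assert np.allclose(lora_forward(x), W @ x)

full_params = d * k          # parameters in the full weight matrix
lora_params = r * (d + k)    # trainable parameters added by LoRA
print(lora_params / full_params)
```

With these (assumed) dimensions, LoRA trains about 3% of the parameters of the full matrix, which is what makes adapting a 13B-parameter model tractable on modest hardware.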
Funding: the National Natural Science Foundation of China (Grants 81790650, 81790651, 81727808, 81627901, and 31771253), the Beijing Municipal Science and Technology Commission (Grants Z171100000117012 and Z181100001518003), and the Collaborative Research Fund of the Chinese Institute for Brain Research, Beijing (No. 2020-NKXPT-02).
Funding: Project partially supported by the Brazilian National Council for Scientific and Technological Development (CNPq) (No. 309545/2021-8).
Funding: supported by the Science and Technology Program of Sichuan Province (Grant No. 2023YFS0424), the "Open Bidding for Selecting the Best Candidates" Science and Technology Project of Chengdu (Grant No. 2023-JB00-00020-GX), and the National Natural Science Foundation of China (Grant Nos. 61902324, 11426179, and 61872298).