The rise of social networking has enabled the development of multilingual, Internet-accessible digital documents. Such documents need to be evaluated through Cross-Language Text Summarization (CLTS), which involves generating target documents from source documents in disparate languages. The documents must be processed using contextual semantic data together with a decoding scheme. This paper presents a multilingual cross-language document processing approach based on abstractive summarization. The proposed model is the Hidden Markov Model LSTM Reinforcement Learning (HMMlstmRL) model. First, the model uses a Hidden Markov Model to compute keywords among the cross-language words for clustering. In the second stage, bidirectional long short-term memory (LSTM) networks are used for keyword extraction in the cross-language process. Finally, HMMlstmRL uses a voting scheme in reinforcement learning to identify and extract the keywords. The proposed HMMlstmRL performs 2% better than the conventional bidirectional LSTM model.
The main goal of English teaching in colleges and universities is to cultivate students' ability to use the language, but many students still cannot communicate orally with fluency after years of study. For this reason, teachers need to analyze the linguistic features of oral English corpora in depth and formulate reasonable teaching strategies to improve students' oral expression skills. This paper outlines the linguistic features of oral English corpora, compares oral English corpora with written English corpora, and explores effective teaching strategies, with the aim of providing guidelines for teachers.
Speech recognition for Tibetan, one of China's minority languages, was until recently not researched as extensively as speech recognition for Chinese and English. This, along with the relatively small Tibetan corpus, has resulted in unsatisfactory performance of Tibetan speech recognition based on end-to-end models. This paper aims to achieve accurate Tibetan speech recognition using a small amount of Tibetan training data. We demonstrate effective methods for Tibetan end-to-end speech recognition via cross-language transfer learning from three aspects: modeling unit selection, transfer learning method, and source language selection. Experimental results show that the Chinese-Tibetan multi-language learning method using a multi-language character set as the modeling unit yields the best performance, with a Tibetan Character Error Rate (CER) of 27.3%, a reduction of 26.1% compared with the language-specific model. Our method also achieves 2.2% higher accuracy using less data than the method using Tibetan multi-dialect transfer learning under the same model structure and data set.
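For reference, the Character Error Rate reported in the abstract above is conventionally computed as the character-level edit distance between the recognized text and the reference, divided by the reference length. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard definition; the function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

With this definition, one wrong character in a four-character reference gives a CER of 0.25. Because insertions are counted, CER can exceed 1.0 when the hypothesis is much longer than the reference.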
The present paper describes the use of free online language resources for translating and expanding queries in cross-language information retrieval (CLIR). In a previous study, we proposed a method in which queries were translated by two machine translation systems on the Language Grid. The queries were then expanded using an online dictionary to translate compound words or word phrases. A concept base was used to compare back-translated words with the original query in order to delete mistranslated words. To evaluate the proposed method, we constructed a CLIR system and used the science documents of the NTCIR1 dataset. The proposed method achieved high precision. However, proper nouns (names of people and places) appear infrequently in science documents. In information retrieval, proper nouns present unique problems. Since proper nouns are usually unknown words, they are difficult to find in monolingual dictionaries, not to mention bilingual dictionaries. Furthermore, the user's initial query is not always the best description of the desired information. Query expansion is often proposed as a way to solve this problem and to create a better query representation. Wikipedia was used to translate compound words or word phrases. It was also used, together with a concept base, to expand queries. The NTCIR1 and NTCIR6 datasets were used to evaluate the proposed method. With the proposed method, the CLIR system achieved a high rate of precision. The proposed system ranked higher than the NTCIR1 and NTCIR6 participating systems.
Bilingual word vectors have been widely exploited in cross-language information retrieval research. However, most of this research currently focuses on similar language pairs. Very few studies explore the impact of using bilingual word vectors for cross-language information retrieval in long-distance language pairs. This paper systematically analyzes the retrieval performance of various European languages (English, German, Italian, French, Finnish, Dutch) as well as Asian languages (Chinese, Japanese) in the ad hoc task of the CLEF 2002–2003 campaign. Genetic proximity was used to visually represent the relationships between languages and to compare their cross-lingual retrieval performance in various settings. The results show that differences in language vocabulary dramatically affect retrieval performance. At the same time, the term-by-term translation retrieval method performs slightly better than the simple vector addition retrieval method. This shows that the translation-based retrieval model can still maintain its advantage under the new semantic scheme.
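The two retrieval strategies compared in the abstract above can be illustrated on toy data. The abstract gives no implementation details, so this sketch assumes one common reading: "vector addition" sums word vectors in a shared bilingual space and ranks documents by cosine similarity, while "term-by-term translation" replaces each query word with its nearest target-language word and then counts term matches. The embeddings in `EMB`, the function names, and the two-document collection are all hypothetical.

```python
import math

# Hypothetical toy embeddings in a shared English-German space (illustrative only).
EMB = {
    "dog": [0.90, 0.10], "cat": [0.10, 0.90],
    "hund": [0.88, 0.12], "katze": [0.12, 0.88],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def add_vectors(words):
    """Vector addition: represent a text as the sum of its word vectors."""
    total = [0.0] * len(next(iter(EMB.values())))
    for w in words:
        for i, x in enumerate(EMB[w]):
            total[i] += x
    return total


def translate_term(word, target_vocab):
    """Term-by-term translation: nearest target-language word in the shared space."""
    return max(target_vocab, key=lambda t: cosine(EMB[word], EMB[t]))


def rank_by_addition(query, docs):
    """Rank document ids by cosine similarity of summed query and document vectors."""
    q = add_vectors(query)
    return sorted(docs, key=lambda d: cosine(q, add_vectors(docs[d])), reverse=True)


def rank_by_translation(query, docs, target_vocab):
    """Translate each query term, then rank documents by count of matching terms."""
    translated = {translate_term(w, target_vocab) for w in query}
    return sorted(docs, key=lambda d: sum(t in translated for t in docs[d]), reverse=True)
```

On a collection such as `{"d1": ["hund", "hund"], "d2": ["katze"]}` with the English query `["dog"]`, both functions place `d1` first. A real system would replace the match count with a proper monolingual ranking function.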
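To address the scarcity of corpora in the electromechanical equipment domain, the insufficient mining of relation-type features, and the presence of overlapping triplets in text, a triplet extraction method named TBPA (Triplet extraction Based on Prompt and Antagonistic training) is proposed, which combines prompt learning and prior knowledge with iterative adversarial training. First, a BERT (Bidirectional Encoder Representations from Transformers) model is fine-tuned on a self-built corpus to obtain feature vectors of the input text. Next, the Projected Gradient Descent (PGD) method is used for iterative adversarial training at the embedding layer, improving the model's resistance to perturbed samples and its generalization to real samples. Then, a single-layer head-tail pointer network identifies the head entities, prompt learning templates are used to obtain the domain prior features corresponding to each head entity, and the character vectors are combined with the prompt vectors predicted from the Prompt template. Finally, under a hierarchical tagging framework, a single-layer head-tail pointer network identifies, one by one, the tail entities corresponding to all predefined relation types. Compared with the baseline model CasRel, TBPA improves precision, recall, and F1 score by 3.10, 6.12, and 4.88 percentage points, respectively. Experimental results show that TBPA has certain advantages in triplet extraction for the coal mine electromechanical equipment domain.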
The present article provides a critical review of Randi Reppen's impressive book Using Corpora in the Language Classroom. It argues that the book, despite a few slight flaws, has a strong practical orientation and is a laudable effort to help English language teachers realize the importance and practicality of bringing corpora into the classroom in the digital age. The book is particularly worth reading for language teachers (especially beginner teachers) who want to breathe new life into their English teaching.
Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of the models still need to be improved. Additionally, due to differences in programming languages, most methods lack cross-language detection generality. To address these issues, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capability. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves a CIDEr score for English and Arabic of 87.9% and 81.7%, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Population aging and declining birth rates have forced the public to focus on young people's views on marriage. Based on critical discourse analysis and Fairclough's three-dimensional discourse analysis model, this paper builds a "Chinese Media News Report Corpus on the Topic of 'Marriage'" from news articles collected from China Daily. It finds that the discourse is neutral and objective regarding the advantages and disadvantages of marriage, but that overall it still conveys a traditional view in which marriage is inevitable and closely tied to fertility. Although this is shaped by policy and by social factors including the declining fertility rate, it deviates from young people's current views on marriage, resulting in serious consequences such as rejection by young people. In addition, this research finds that men and women differ greatly in their views on marriage, and that men's resistance to marriage is far greater than women's, which departs from public perception. The reasons behind this need to be explored in order to address young people's problems with marriage and relationships today and to support the healthy development of marriage among the young.
The paper first reviews developments in language transfer research. Using the results supplied by CEM and AntConc, the author analyzes the lexical errors made by students in the TEM-8 test and reveals the contributions that language transfer may make to SLA and EFL language teaching.
This paper examines the application of electronic corpora to lexical learning in the English classroom. It starts with a literature review of basic issues in corpus linguistics and the theories underlying lexical study, followed by a discussion of the specific aspects of lexical learning in which a corpus might provide insight. Building on other research in this area, such as data-driven learning (DDL), the paper goes further by exploring the possibility and ways of applying corpora in the vocabulary learning classroom.
A novel visualized sound description, called the sound dendrogram, is proposed to make manual annotation easier when building large speech corpora. It is a lattice structure built from a group of "seed regions" through an iterative merging procedure. A simple but reliable method for extracting "seed regions" and an advanced distance metric are adopted to construct the sound dendrogram, so that it can present the structural character of speech from coarse to fine in a visualized way. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are very easy to identify. The sound dendrogram can be a powerful assistant tool during the manual annotation of speech corpora.
In the present study, we aimed to investigate a protective role for resveratrol against the effects of immobilization stress on the corpora lutea (CL) of mice in early pregnancy. A total of 45 early-pregnant mice were divided into a no immobilization stress (NIS) group, an immobilization stress (IS) group, and an immobilization and resveratrol treatment (IS+RES) group (n=15 each). Mice were immobilized in plastic tubes (50 mL) for 3 h per day from day 1 to day 7 of pregnancy. In the IS+RES group, 5 mg kg⁻¹ d⁻¹ of resveratrol was administered just prior to application of stress. We analyzed apoptotic activity in CL by Western blotting (WB), transmission electron microscopy (TEM), and immunohistochemistry (IHC). Serum progesterone levels were measured by radioimmunoassay (RIA). IHC results showed that the intensity of positive staining for Bax was increased and that for Bcl-2 was decreased in CL after IS, while resveratrol treatment reversed these changes in Bax and Bcl-2 staining. WB revealed that immobilization stress up-regulated the expression of Bax and caspase-9 and down-regulated Bcl-2 expression, while resveratrol treatment attenuated the effects of immobilization stress on the expression of Bax, Bcl-2, and caspase-9. According to our TEM results, apoptosis, as defined by chromatin condensation, was found in CL after immobilization stress, while resveratrol inhibited the apoptosis. We also demonstrated that immobilization stress decreased progesterone concentrations and ovarian expression of StAR, while resveratrol restored progesterone concentrations and StAR expression to normal. These results indicate that immobilization stress induced luteal regression while resveratrol inhibited it, suggesting that resveratrol plays a protective role in the corpora lutea of mice during early pregnancy.
The motivation for this research comes from a gap in finding common ground for medical context learning through analytics for the different purposes of diagnosing, recommending, prescribing, or treating patients, based on uniform phenotype features from patients' profiles. While searching for possible solutions for medical context learning, the authors found that no unified corpora tagged with medical nomenclature were available to train analytics for medical context learning. Therefore, we demonstrate a mechanism for building uniform NER (Named Entity Recognition) tagged medical corpora, fed with a data set of 14407 endocrine patients in Comma Separated Values (CSV) format who were diagnosed with diabetes mellitus and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged for understanding the medical context with uniformity, for which we are conducting different experiments using common natural language processing (NLP) techniques and frameworks such as TensorFlow, Keras, Long Short-Term Memory (LSTM), and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum accuracy achieved for model validation was 0.8846.
Digitalization has changed the way information is processed, and new techniques for legal data processing are evolving. Text mining helps to analyze and search court cases available as digital text documents in order to extract case reasoning and related data. This sort of case processing helps professionals and researchers refer to previous cases with more accuracy in less time. The rapid development of judicial ontologies seems to offer interesting solutions for legal knowledge formalization. Mining context information from corpora through ontologies is a challenging and interesting field. This research paper presents a three-tier contextual text mining framework through ontologies for judicial corpora. The framework comprises the judicial corpus, text mining processing resources, and ontologies for mining contextual text from corpora, making text and data mining more reliable and faster. A top-down ontology construction approach has been adopted in this paper. The judicial corpus has been selected with a dataset sufficient to process and evaluate the results. The experimental results and evaluations show significant improvements in comparison with the available techniques.
BACKGROUND: Insulin-like growth factor-1 (IGF-1), one of the important members of the growth factor family, participates in the regulation of many physiological functions and behaviors and has a very strong neuroprotective effect. However, the expression of IGF-1 following cerebral ischemia/reperfusion is still disputed. OBJECTIVE: To observe the expression of IGF-1 mRNA and protein in the corpora striata on the ischemic side at the early stage of middle cerebral artery ischemia/reperfusion in rhesus monkeys. DESIGN: A completely randomized, controlled animal experiment. SETTING: Institute of Cerebrovascular Disease, Affiliated Hospital of Medical College of Qingdao University. MATERIALS: ① A total of 17 rhesus monkeys, of either gender, aged 4 to 5 years, were enrolled. Seven rhesus monkeys observed with gene chip were randomly divided into 2 groups: sham operation group (n=3) and ischemia/reperfusion group (n=4). Ten rhesus monkeys observed with in situ hybridization and immunohistochemistry were randomly divided into 2 groups: sham operation group (n=3) and ischemia/reperfusion group (n=7). Rhesus monkeys observed under the microscope were divided into 2 groups: sham operation group (n=6) and ischemia/reperfusion group (n=11). ② Materials used in the experiment: cresyl violet (Sigma Company, America); immunohistochemical reagent kit (Huamei Bio-engineering Company); in situ hybridization reagent kit (Boshide Bio-engineering Co. Ltd, Wuhan); 12 800 dots chip (Boxing Company, Shanghai).
METHODS: This experiment was carried out at the Institute of Cerebrovascular Disease, Affiliated Hospital of Medical College of Qingdao University, from January 2001 to December 2003. ① The origin of the middle cerebral artery was blocked for 2 hours to create middle cerebral artery ischemia/reperfusion models. ② After ischemia/reperfusion for 24 hours, cerebral tissue sections of the rhesus monkeys were prepared and stained with cresyl violet. Image analysis was performed with 5001W image analysis software. Morphological changes in the corpora striata on the operative side were compared between the two groups. Total RNA was extracted from cerebral tissue. ③ Gene chip detection: Cy3-dUTP and Cy5-dUTP were used respectively for reverse transcription labeling. The sample was reverse transcribed into cDNA and then hybridized with cDNA of cerebral tissue. Genes with separate absolute values of Cy3 and Cy5 > 800 and Cy3/Cy5 > 2 (high expression) or < 0.5 (low expression) were identified as differentially expressed genes. ④ The expression of IGF-1 mRNA and protein in the corpora striata on the ischemic side was detected in the sham operation group and the ischemia/reperfusion group at 9 and 24 hours after ischemia/reperfusion with in situ hybridization and immunohistochemistry. Cells with brown granules were IGF-1 protein positive. ⑤ Analysis of variance was used to compare measurement data among groups. MAIN OUTCOME MEASURES: ① Changes in the morphological structure of the corpora striata on the ischemic side in rhesus monkeys. ② Changes in cerebral gene expression profiles at ischemia/reperfusion between the two groups. ③ Expression of IGF-1 mRNA and protein in the corpora striata at ischemia/reperfusion between the two groups.
RESULTS: ① Pathological change: Obvious pathological changes of cerebral infarction appeared in the ischemia/reperfusion group, while there were no such changes in the sham operation group. ② Change of gene expression profile: There were 4480 genes with differential expression between the ischemia/reperfusion group and the sham operation group, of which 260 genes had high expression with absolute values over 800 and 63 genes had low expression. The Cy3/Cy5 ratio of IGF-1 was 0.379, indicating relatively low expression. ③ IGF-1 mRNA and protein positive cell counts in the corpora striata on the cerebral ischemic side [IGF-1 mRNA: (9.72±1.18), (9.11±0.76), (14.77±0.60) counts/field; IGF-1 protein: (15.11±1.83), (15.39±0.78), (34.62±0.97) counts/field, P < 0.05-0.01]. CONCLUSION: IGF-1 mRNA and protein are expressed at low levels in rhesus monkeys at middle cerebral artery ischemia/reperfusion.
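The differential-expression rule stated in the METHODS above (both Cy3 and Cy5 signals above 800, with a Cy3/Cy5 ratio above 2 counted as high expression and below 0.5 as low expression) can be written as a small filter. This is a sketch of that rule only; the function name `classify_spot`, the label strings, and the example intensities are hypothetical and do not come from the study's data.

```python
def classify_spot(cy3, cy5, min_signal=800.0, high=2.0, low=0.5):
    """Classify one chip spot using the thresholds quoted in the abstract.

    Returns "filtered" when either channel is at or below min_signal,
    otherwise "high", "low" or "unchanged" according to the Cy3/Cy5 ratio.
    """
    if cy3 <= min_signal or cy5 <= min_signal:
        return "filtered"  # signal too weak to call
    ratio = cy3 / cy5
    if ratio > high:
        return "high"
    if ratio < low:
        return "low"
    return "unchanged"


# Hypothetical intensities chosen so that the ratio equals the reported 0.379.
igf1_call = classify_spot(cy3=3790.0, cy5=10000.0)
```

With these illustrative intensities the call is "low", matching the abstract's description of IGF-1 as relatively low in expression.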
Funding (Tibetan speech recognition paper): This work was supported by three projects. Zhao Y received Grant Nos. 61976236 and 2020MDJC06; Bi X J received Grant No. 20&ZD279.
Funding (bilingual word vectors paper): National Natural Science Foundation of China under Project No. 61876062; Scientific Research Fund of Hunan Provincial Education Department of China under Project No. 16K030; Hunan Provincial Natural Science Foundation of China under Project No. 2017JJ2101; Hunan Provincial Innovation Foundation for Postgraduate under Project No. CX2018B671.
Funding: Funded by the Major Science and Technology Projects in Henan Province, China, Grant No. 221100210600.
Abstract: Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations. However, due to limitations in code representation and neural network design, the validity and practicality of such models still need to be improved. Additionally, due to differences between programming languages, most methods lack cross-language detection generality. To address these issues, in this paper, we analyze the shortcomings of previous code representations and neural networks. We propose a novel hierarchical code representation that combines Concrete Syntax Trees (CST) with Program Dependence Graphs (PDG). Furthermore, we introduce a Tree-Graph-Gated-Attention (TGGA) network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection (HCRVD) system. This system enables cross-language vulnerability detection at the function level. The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities. It benefits from the hierarchical code representation learning method and outperforms the baseline in cross-language vulnerability detection by 9.772% and 11.819% on the C/C++ and Java datasets, respectively. Moreover, HCRVD has some ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects. HCRVD shows good validity, generality, and practicality.
Abstract: Cross-lingual image description, the task of generating image captions in a target language from images and descriptions in a source language, is addressed in this study through a novel approach that combines neural network models and semantic matching techniques. Experiments conducted on the Flickr8k and AraImg2k benchmark datasets, featuring images and descriptions in English and Arabic, showcase remarkable performance improvements over state-of-the-art methods. Our model, equipped with the Image & Cross-Language Semantic Matching module and the Target Language Domain Evaluation module, significantly enhances the semantic relevance of generated image descriptions. For English-to-Arabic and Arabic-to-English cross-language image descriptions, our approach achieves a CIDEr score for English and Arabic of 87.9% and 81.7%, respectively, emphasizing the substantial contributions of our methodology. Comparative analyses with previous works further affirm the superior performance of our approach, and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language. In summary, this study advances the field of cross-lingual image description, offering an effective solution for generating image captions across languages, with the potential to impact multilingual communication and accessibility. Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
Abstract: Population aging and declining birth rates have drawn public attention to young people's views on marriage. Drawing on critical discourse analysis and Fairclough's three-dimensional discourse analysis model, this paper builds a "Chinese Media News Report Corpus on the Topic of 'Marriage'" from news collected from China Daily. It finds that the discourse is neutral and objective regarding the advantages and disadvantages of marriage but, in general, still reflects a traditional view in which marriage is inevitable and closely tied to fertility. Although this view is shaped by policy and by social factors, including the declining fertility rate, it deviates from young people's current view of marriage, with serious consequences such as rejection by young people. In addition, the study finds that men and women differ greatly in their views on marriage and that men's resistance to marriage is far greater than women's, which departs from public perception. The reasons behind this need to be explored in order to address the marriage and relationship problems of today's young people and to support the healthy development of marriage among the young.
Abstract: The paper first reviews developments in language transfer research. Using results supplied by CEM and AntConc, the author analyzes the lexical errors committed by students in the TEM-8 test and reveals the contributions that language transfer may make to SLA and EFL language teaching.
Abstract: This paper examines the application of electronic corpora to lexical learning in the English classroom. It starts with a literature review of basic issues in corpus linguistics and the theories underlying lexical study, followed by a discussion of the specific aspects of lexical learning in which a corpus might provide some insight. Building on other research in this area, such as data-driven learning (DDL), the paper goes further by exploring the possibility and ways of applying corpora in the vocabulary learning classroom.
Abstract: A novel visualized sound description, called the sound dendrogram, is proposed to make manual annotation easier when building large speech corpora. It is a lattice structure built from a group of "seed regions" through an iterative merging procedure. A simple but reliable method for extracting "seed regions" and an advanced distance metric are adopted to construct the sound dendrogram, so that it can present the structural character of speech, ranging from coarse to fine, in a visualized way. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are very easy to identify. The sound dendrogram can be a powerful assistant tool during the manual annotation of speech corpora.
Funding: The authors wish to thank Prof. Emeritus Reinhold J. Hutz, PhD, of the Department of Biological Sciences, University of Wisconsin-Milwaukee, USA, for his editing and helpful advice. This work was supported by the National Natural Science Foundation of China (31501956 and 31572403).
Abstract: In the present study, we aimed to investigate a protective role for resveratrol against the effects of immobilization stress on corpora lutea (CL) of mice in early pregnancy. A total of 45 early-pregnant mice were divided into a no immobilization stress (NIS) group, an immobilization stress (IS) group, and an immobilization and resveratrol treatment (IS+RES) group (n=15). Mice were immobilized in plastic tubes (50 mL) for 3 h per day from day 1 to day 7 of pregnancy. In the IS+RES group, 5 mg kg⁻¹ d⁻¹ of resveratrol was administered just prior to application of stress. We analyzed apoptotic activity in CL by Western blotting analysis (WB), transmission electron microscopy (TEM), and immunohistochemistry (IHC). Serum progesterone levels were examined with radioimmunoassay (RIA). IHC results showed that the intensity of positive staining for Bax was increased and that for Bcl-2 was decreased in CL after IS, while resveratrol treatment reversed the positive staining for Bax and Bcl-2. WB revealed that immobilization stress up-regulated the expression of Bax and caspase-9 and down-regulated Bcl-2 expression, while resveratrol treatment attenuated the effects of immobilization stress on the expression of Bax, Bcl-2, and caspase-9. According to our TEM results, apoptosis as defined by chromatin condensation was found in CL after immobilization stress, while resveratrol inhibited the apoptosis. We also demonstrated that immobilization stress decreased progesterone concentrations and ovarian expression of StAR, while resveratrol restored the concentrations of progesterone and expression of StAR to normal. These results indicate that immobilization stress induced luteal regression while resveratrol inhibited it, suggesting that resveratrol plays a protective role on corpora lutea of mice during early pregnancy.
Funding: This research is supported by Shifa International Hospital, Pakistan. The endocrine patients' data contributed for the diagnosis of diabetes and its comorbidities was of great value in arriving at the observations of this experimental study.
Abstract: The motivation for this research comes from a gap found in discovering common ground for medical context learning through analytics, for the different purposes of diagnosing, recommending, prescribing, or treating patients, based on uniform phenotype features from patients' profiles. While searching for possible solutions for medical context learning, the authors found that a unified corpus tagged with medical nomenclature was missing for training such analytics. Therefore, we demonstrate a mechanism for producing uniform NER (Named Entity Recognition) tagged medical corpora. One corpus is fed with a data set of 14407 endocrine patients, in Comma Separated Values (CSV) format, diagnosed with diabetes mellitus and comorbid diseases. The other corpus is the ICD-10-CM coding scheme in text format, taken from www.icd10data.com. The ICD-10-CM corpus is to be tagged so that the medical context can be understood uniformly, for which we are conducting different experiments using common natural language processing (NLP) techniques and frameworks such as TensorFlow, Keras, Long Short-Term Memory (LSTM), and Bi-LSTM. In our preliminary experiments, label sets in the form of (instance, label) pairs were tagged with a Sequential() model built on TensorFlow.Keras and Bi-LSTM NLP algorithms. The maximum accuracy achieved for model validation was 0.8846.
Abstract: Digitalization has changed the way information is processed, and new techniques of legal data processing are evolving. Text mining helps to analyze and search court cases available as digital text documents in order to extract case reasoning and related data. This sort of case processing helps professionals and researchers refer to previous cases with more accuracy in less time. The rapid development of judicial ontologies promises interesting solutions for legal knowledge formalization. Mining context information from corpora through ontologies is a challenging and interesting field. This research paper presents a three-tier contextual text mining framework through ontologies for judicial corpora. The framework comprises the judicial corpus, text mining processing resources, and ontologies for mining contextual text from corpora, making text and data mining more reliable and faster. A top-down ontology construction approach has been adopted in this paper. The judicial corpus has been selected with a sufficient dataset to process and evaluate the results. The experimental results and evaluations show significant improvements in comparison with the available techniques.
Funding: The Natural Science Foundation of Shandong Province, No. Y2004C04
Abstract: BACKGROUND: Insulin-like growth factor-1 (IGF-1), one of the important members of the growth factor family, participates in the regulation of many physiological functions and behaviors and has a very strong neuroprotective effect. However, the expression of IGF-1 following cerebral ischemia/reperfusion is still disputed. OBJECTIVE: To observe the expression of IGF-1 mRNA and protein in the corpora striata of the ischemic side at the early stage of middle cerebral artery ischemia/reperfusion in rhesus monkeys. DESIGN: A completely randomized grouping design, controlled animal experiment. SETTING: Institute of Cerebrovascular Disease, Affiliated Hospital of Medical College of Qingdao University. MATERIALS: ① A total of 17 rhesus monkeys, of either gender, aged 4 to 5 years, were enrolled. Seven rhesus monkeys observed with gene chip were randomly divided into 2 groups: sham operation group (n=3) and ischemia/reperfusion group (n=4). Ten rhesus monkeys observed with in situ hybridization and immunohistochemistry were randomly divided into 2 groups: sham operation group (n=3) and ischemia/reperfusion group (n=7). Rhesus monkeys observed under the microscope were divided into 2 groups: sham operation group (n=6) and ischemia/reperfusion group (n=11). ② Materials used in the experiment: cresyl violet (Sigma Company, America); immunohistochemical reagent kit (Huamei Bio-engineering Company); in situ hybridization reagent kit (Boshide Bio-engineering Co. Ltd, Wuhan); 12 800 dots chip (Boxing Company, Shanghai). METHODS: This experiment was carried out at the Institute of Cerebrovascular Disease, Affiliated Hospital of Medical College of Qingdao University from January 2001 to December 2003. ① The onset area of the middle cerebral artery was blocked for 2 hours to create middle cerebral artery ischemia/reperfusion models. ② After ischemia/reperfusion for 24 hours, cerebral tissue sections of rhesus monkeys were prepared and stained with cresyl violet.
Image analysis was performed with 5001W image analysis software. Morphological change of the corpora striata on the operative side was compared between the two groups of rhesus monkeys. Total RNA was extracted from cerebral tissue. ③ Detection of gene chip: Cy3-dUTP and Cy5-dUTP were used respectively to perform reverse transcription labeling. The sample was reverse transcribed into cDNA, then hybridized with cDNA of cerebral tissue. Genes with separate absolute values of cy3 and cy5 > 800 and cy3/cy5 > 2 (high expression) or < 0.5 (low expression) were identified as genes with differential expression. ④ The expressions of IGF-1 mRNA and protein in the corpora striata of the ischemic side were compared between the sham operation group and the ischemia/reperfusion group at 9 and 24 hours after ischemia/reperfusion with in situ hybridization and immunohistochemical methods. Brown granules were IGF-1 protein positive cells. ⑤ Analysis of variance was used to compare measurement data among groups. MAIN OUTCOME MEASURES: ① Change of morphological structure of the corpora striata at the ischemic side in rhesus monkeys. ② Change of cerebral gene expression profiles at ischemia/reperfusion in rhesus monkeys between the two groups. ③ Expression of IGF-1 mRNA and protein in the corpora striata at ischemia/reperfusion in rhesus monkeys between the two groups. RESULTS: ① Pathological change: Obvious pathological change of cerebral infarction appeared in the ischemia/reperfusion group, while there was no such pathological change in the sham operation group. ② Change of gene expression profile: There were 4480 genes with differential expression between the ischemia/reperfusion group and the sham operation group, of which 260 genes had high expression with an absolute value over 800, and 63 genes had low expression; cy3/cy5 of IGF-1 was 0.379, indicating relatively low expression.
③ IGF-1 mRNA and protein positive cell counts in the corpora striata at the cerebral ischemic side [IGF-1 mRNA: (9.72±1.18), (9.11±0.76), (14.77±0.60) counts/field; IGF-1 protein: (15.11±1.83), (15.39±0.78), (34.62±0.97) counts/field, P < 0.05–0.01]. CONCLUSION: IGF-1 mRNA and protein show low expression in rhesus monkeys at the early stage of middle cerebral artery ischemia/reperfusion.
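The gene chip selection rule stated in the METHODS of this abstract (cy3 and cy5 values above 800, with a cy3/cy5 ratio above 2 for high expression or below 0.5 for low expression) can be written as a short sketch. The function name and the reading that both channels must exceed 800 are assumptions made here for illustration; the abstract does not say which sample was labeled with which dye, and the example intensities in the usage note are hypothetical.

```python
def classify_expression(cy3, cy5, min_signal=800.0, high=2.0, low=0.5):
    """Classify one gene from its two channel intensities.

    Assumption: a gene is considered only if BOTH channels exceed min_signal.
    """
    if cy3 <= min_signal or cy5 <= min_signal:
        return "filtered"
    ratio = cy3 / cy5
    if ratio > high:
        return "high"
    if ratio < low:
        return "low"
    return "unchanged"
```

Under this rule, the reported IGF-1 ratio of 0.379 falls below the 0.5 threshold; for example, hypothetical intensities of 1000 and 2640 give a ratio of about 0.379 and are classified as `"low"`.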