This work reports progress on previous related work, based on an experiment to improve the intelligence of robotic systems with the aim of achieving richer linguistic communication between humans and robots. In this paper, the authors attempt an algorithmic approach to natural language generation through hole semantics, applying the OMAS-III computational model as a grammatical formalism. In the original work a technical language was used, while in the later works this has been replaced by a limited Greek natural-language dictionary. This effort gave the evolving system the ability to ask questions, and the authors developed an initial dialogue system using these techniques. The results show that these techniques can lead to a more sophisticated dialogue system in the future.
Along with the development of big data, various Natural Language Generation (NLG) systems have recently been developed by different companies. The aim of this paper is to propose a better understanding of how these systems are designed and used. We study one of them in detail: the NLG system developed by the company Nomao. First, we show that the development of this system carries strong economic stakes, since Nomao's business model partly depends on it. Then, through an eye-movement analysis conducted with 28 participants, we show that the texts generated by Nomao's NLG system contain syntactic and semantic structures that are easy to read but lack the socio-semantic coherence that would improve their understanding. From a scientific perspective, our results highlight the importance of socio-semantic coherence in text-based communication produced by NLG systems.
Both analyzing a large amount of observed space weather data and alleviating personal experience bias are significant challenges in generating artificial space weather forecast products. With natural language generation methods based on the sequence-to-sequence model, space weather forecast texts can be generated automatically. To conduct our generation tasks at a fine-grained level, a taxonomy of space weather phenomena based on their descriptions is presented. Our MDH (Multi-Domain Hybrid) model is then proposed for generating space weather summaries in two stages. The model is composed of three sequence-to-sequence-based deep neural network sub-models (one pre-trained Bidirectional and Auto-Regressive Transformers (BART) model and two Transformer models). To evaluate how well MDH performs, quality evaluation metrics based on two prevalent automatic metrics and our own human metric are presented. The comprehensive scores of the three summary generation tasks on the test datasets are 70.87, 93.50, and 92.69, respectively. The results suggest that MDH can generate space weather summaries with high accuracy and coherence as well as suitable length, which can assist forecasters in producing high-quality space weather forecast products despite the scarcity of training data.
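The fine-grained generation setup above starts by tagging raw descriptions with taxonomy labels. A minimal sketch of that idea is shown below; the labels and keywords are illustrative assumptions, not the paper's actual taxonomy.

```python
# Hypothetical keyword rules for tagging space weather descriptions.
# The taxonomy below is invented for illustration only.
TAXONOMY = {
    "solar_flare": ["flare", "x-ray burst"],
    "geomagnetic_storm": ["kp index", "geomagnetic"],
    "proton_event": ["proton flux", "sep"],
}

def classify_description(text):
    """Return all taxonomy labels whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        label
        for label, keywords in TAXONOMY.items()
        if any(kw in lowered for kw in keywords)
    )

print(classify_description("An M-class flare raised the proton flux sharply."))
```

A real system would feed each tagged subset of descriptions to its own sequence-to-sequence sub-model, which is the multi-domain idea in MDH.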
The expert system is an important field of artificial intelligence. At present, the traditional interface of an expert system consists of commands, menus, and windows, which limits the application of expert systems and dampens users' enthusiasm for using them. In combination with a study of an expert system for network fault diagnosis, this article discusses a natural language interface for expert systems. This interface can understand and generate Chinese sentences, allowing users and field experts to diagnose network faults conveniently. The article first proposes an extended production rule, then introduces in detail the methods for generating Chinese sentences from conceptual graphs and the model of the expert system. Using this model, the network fault diagnosis expert system and its natural language interface have been developed in Prolog.
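Generation from conceptual graphs can be sketched as template filling over relation triples. The sketch below is a loose illustration of the idea, not the paper's Prolog rule base: it uses English templates (the paper generates Chinese), and the relation names are assumptions.

```python
# Illustrative templates keyed by relation name; all names are assumptions.
TEMPLATES = {
    "cause": "{a} causes {b}.",
    "attr": "{a} has the attribute {b}.",
}

def generate(triples):
    """Render each (concept, relation, concept) triple with its template."""
    sentences = []
    for a, rel, b in triples:
        template = TEMPLATES.get(rel, "{a} is related to {b}.")
        sentences.append(template.format(a=a, b=b))
    return " ".join(sentences)

graph = [("broken cable", "cause", "link failure"),
         ("router", "attr", "unreachable")]
print(generate(graph))
```

In a diagnosis session, the conceptual graph would come from the inference engine's conclusions, and the generated sentences would be shown to the user as the diagnosis.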
Software testing is an important and cost-intensive activity in software development, with the major cost contribution coming from test case generation. Requirement-based testing is an approach in which test cases are derived from requirements without considering the implementation's internal structure; it covers both functional and non-functional requirements. The objective of this study is to explore approaches that generate test cases from requirements. A systematic literature review, based on two research questions and extensive quality assessment criteria, identifies 30 primary studies out of 410 studies spanning 2000 to 2018. The review's findings show that 53% of journal papers, 42% of conference papers, and 5% of book chapters address requirement-based testing. Most of the studies use UML, activity, and use case diagrams for test case generation from requirements. One of the significant lessons learned is that most software testing errors are traced back to errors in natural language requirements. A substantial amount of work focuses on UML diagrams for test case generation, which cannot capture all of the developed system's attributes. Furthermore, there is a lack of UML-based models that can generate test cases from natural language requirements by refining them in context. Coverage criteria indicate how efficiently the testing has been performed: 12.37% of studies use requirements coverage, 20% cover path coverage, and 17% study basic coverage.
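The core idea of deriving test cases directly from natural language requirements can be sketched with a simple pattern rule: each "shall" statement becomes a test-case stub. This is only an illustration of the concept; the surveyed studies mostly work through UML, activity, and use case diagrams rather than a regex like this one.

```python
import re

# Hypothetical rule: every "The system shall ..." sentence yields a stub.
SHALL = re.compile(r"The system shall (?P<action>[^.]+)\.", re.IGNORECASE)

def derive_test_cases(requirements_text):
    """One test-case stub per 'shall' statement found in the requirements."""
    return [
        {"id": f"TC-{i + 1}", "verify": match.group("action").strip()}
        for i, match in enumerate(SHALL.finditer(requirements_text))
    ]

reqs = ("The system shall log every failed login. "
        "The system shall lock the account after five failures.")
for case in derive_test_cases(reqs):
    print(case["id"], "-", case["verify"])
```

The lesson cited in the review, that most testing errors trace back to natural language requirements, is visible even here: an ambiguous "shall" sentence produces an ambiguous test stub.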
A paraphrase is an expression of a text with alternative words and ordering, intended to achieve better clarity. Paraphrases have been found vital for augmenting training datasets, which helps enhance the performance of machine learning models intended for various natural language processing (NLP) tasks. Thus, automatic paraphrase generation has recently received increasing attention. However, evaluating the quality of generated paraphrases is technically challenging. In the literature, the value of generated paraphrases tends to be determined by their impact on the performance of other NLP tasks. This kind of evaluation is referred to as extrinsic evaluation, and it requires considerable computational resources to train and test the models. So far, very little attention has been paid to the role of intrinsic evaluation, in which the quality of a generated paraphrase is judged against a predefined ground truth (reference paraphrases). In fact, it is also very challenging to find ideal and complete reference paraphrases. Therefore, in this study, we propose a semantic, meaning-oriented automatic evaluation metric that helps evaluate the quality of generated paraphrases against the original text, which is an intrinsic evaluation approach. Further, we evaluate the quality of the paraphrases by assessing their impact on other NLP tasks, which is an extrinsic evaluation method. The goal is to explore the relationship between intrinsic and extrinsic evaluation methods. To ensure the effectiveness of the proposed evaluation methods, extensive experiments were done on different publicly available datasets. The experimental results demonstrate that our proposed intrinsic and extrinsic evaluation strategies are promising, and further reveal a significant correlation between the intrinsic and extrinsic evaluation approaches.
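The intrinsic, reference-free setup described above (scoring a paraphrase against the original text rather than against other tasks) can be sketched with a crude lexical proxy. The paper's metric is semantic and meaning-oriented; plain Jaccard token overlap below only illustrates the shape of such a metric.

```python
# Crude intrinsic score: token-set overlap between paraphrase and original.
# A real semantic metric would compare meanings, not surface tokens.
def jaccard_similarity(text_a, text_b):
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

original = "the cat sat on the mat"
paraphrase = "a cat rested on the mat"
print(round(jaccard_similarity(original, paraphrase), 3))
```

Note the weakness that motivates a semantic metric: a good paraphrase deliberately changes words, so lexical overlap punishes exactly the behavior we want.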
UML class diagram generation from textual requirements is an important task in object-oriented design and programming courses. This study proposes a method for automatically generating class diagrams from Chinese textual requirements on the basis of Natural Language Processing (NLP) and mapping rules for sentence pattern matching. First, classes are identified from the requirements through entity recognition rules and candidate class pruning rules using NLP. Second, class attributes and relationships between classes are extracted using mapping rules for sentence pattern matching, also on the basis of NLP. Third, we developed an assistant tool, integrated into a precision micro classroom system, for automatic generation of class diagrams to effectively assist the teaching of object-oriented design and programming. Results are evaluated with precision, accuracy, and recall on eight requirements from an object-oriented design and programming course, using ground truth created by teachers. Our research should benefit beginners in object-oriented design and programming, whether students or software developers, by helping them create correct domain models represented as UML class diagrams.
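A sentence-pattern mapping rule of the kind described above can be sketched as follows. The paper processes Chinese requirements; the English pattern "X has a Y" below is an assumption used only to illustrate how a pattern maps to an attribute of a class.

```python
import re

# Hypothetical pattern: "<Class> has a <attribute>" marks an attribute.
HAS_A = re.compile(r"(?P<cls>[A-Z]\w*) has an? (?P<attr>\w+)")

def extract_attributes(requirements):
    """Map each matched class name to its list of attributes."""
    model = {}
    for match in HAS_A.finditer(requirements):
        model.setdefault(match.group("cls"), []).append(match.group("attr"))
    return model

text = "Student has a name. Student has an address. Course has a title."
print(extract_attributes(text))
```

The resulting dictionary is a tiny domain model; rendering it as a UML class diagram is then a straightforward serialization step.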
Text generation is an essential research area in artificial intelligence (AI) and natural language processing, and it provides key technical support for the rapid development of AI-generated content (AIGC). It is based on technologies such as natural language processing, machine learning, and deep learning, which enable models to learn language rules through training and automatically generate text that meets grammatical and semantic requirements. In this paper, we sort and systematically summarize the main research progress in text generation and review recent text generation papers, focusing on presenting a detailed understanding of the technical models. In addition, several typical text generation application systems are presented. Finally, we address some challenges and future directions in AI text generation. We conclude that improving the quality, quantity, interactivity, and adaptability of generated text can fundamentally advance the development of AI text generation.
Cyber security addresses the protection of information systems in cyberspace. These systems face multiple attacks on a daily basis, with the level of complexity becoming increasingly challenging. Despite the existence of multiple solutions, attackers are still quite successful at identifying vulnerabilities to exploit. This is why cyber deception is increasingly being used to divert attackers' attention and thereby enhance the security of information systems. To be effective, deception environments need fake data, and this is where Natural Language Processing (NLP) comes in. Many cyber security models have used NLP for vulnerability detection in information systems, email classification, fake citation detection, and other tasks. Although NLP is used for text generation, existing models seem unsuitable for data generation in a deception environment. Our goal is to use NLP text generation to produce data, in the deception context, that will be used to build multi-level deception in information systems. Our model consists of three components: the connection component; the deception component, composed of several states in which an attacker may be, depending on whether or not they are malicious; and the text generation component. The text generation component takes the real data of the information system as input and produces several texts as output, usable at different deception levels.
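The deception component's states can be sketched as a small finite state machine: a connection starts unclassified and sinks into deeper deception levels as suspicious behavior accumulates. The state names, events, and transitions below are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("unclassified", "suspicious_action"): "observed",
    ("observed", "suspicious_action"): "decoy_level_1",
    ("decoy_level_1", "suspicious_action"): "decoy_level_2",
}

def run_session(events, state="unclassified"):
    """Advance the deception state once per observed event."""
    for event in events:
        # Unknown (state, event) pairs leave the state unchanged.
        state = TRANSITIONS.get((state, event), state)
    return state

print(run_session(["suspicious_action", "benign_action", "suspicious_action"]))
```

Each deception level would then be populated with generated fake texts, so that deeper levels expose less of the real system's data.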
Multimodal sentence summarization (MMSS) is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image. Although existing methods have achieved promising success in MMSS, they overlook the powerful generation ability of generative pre-trained language models (GPLMs), which have been shown to be effective in many text generation tasks. To fill this research gap, we propose using GPLMs to promote the performance of MMSS. Notably, adopting GPLMs to solve MMSS inevitably faces two challenges: 1) What fusion strategy should we use to inject visual information into GPLMs properly? 2) How do we keep the GPLM's generation ability intact to the utmost extent when the visual feature is injected? To address these two challenges, we propose a vision-enhanced generative pre-trained language model for MMSS, dubbed Vision-GPLM. In Vision-GPLM, we obtain features of the visual and textual modalities with two separate encoders and utilize a text decoder to produce a summary. In particular, we use multi-head attention to fuse the features extracted from the visual and textual modalities, injecting the visual feature into the GPLM. Meanwhile, we train Vision-GPLM in two stages: a vision-oriented pre-training stage and a fine-tuning stage. In the vision-oriented pre-training stage, we train the visual encoder with the masked language model task while the other components are frozen, aiming to obtain homogeneous representations of text and image. In the fine-tuning stage, we train all the components of Vision-GPLM on the MMSS task. Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.
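The attention-based fusion step can be illustrated with a single head of scaled dot-product attention in plain Python: a textual query attends over visual features, so the decoder sees an image-conditioned mixture. Vision-GPLM uses multi-head attention over learned high-dimensional features; the toy 2-d vectors here are only a sketch of the mechanism.

```python
import math

def attend(query, keys, values):
    """One head of scaled dot-product attention over toy list vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the scores.
    max_s = max(scores)
    exps = [math.exp(s - max_s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

text_query = [1.0, 0.0]                  # textual hidden state (toy)
visual_keys = [[1.0, 0.0], [0.0, 1.0]]   # two image-region features (toy)
visual_values = [[5.0, 0.0], [0.0, 5.0]]
fused = attend(text_query, visual_keys, visual_values)
print([round(x, 3) for x in fused])
```

The query aligned with the first image region receives most of the attention mass, which is exactly how the fusion lets text selectively pull in visual evidence.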
Understanding the content of source code and its regular expressions is very difficult when they are written in an unfamiliar language. Pseudo-code explains and describes the content of code without relying on the syntax of a particular programming language; however, writing pseudo-code for each code instruction is laborious. Recently, neural machine translation has been used to generate textual descriptions for source code. In this paper, a novel deep learning-based transformer (DLBT) model is proposed for automatic pseudo-code generation from source code. The proposed model uses deep learning based on Neural Machine Translation (NMT) to work as a language translator. The DLBT is based on the transformer, an encoder-decoder structure, and has three major components: tokenizer and embeddings, transformer, and post-processing. Each code line is tokenized into a dense vector; the transformer then captures the relatedness between the source code and the matching pseudo-code without the need for a Recurrent Neural Network (RNN). At the post-processing step, the generated pseudo-code is optimized. The proposed model is assessed using a real Python dataset containing more than 18,800 lines of source code written in Python. The experiments show promising performance compared with other machine translation methods such as RNNs: the proposed DLBT records an accuracy of 47.32 and a BLEU score of 68.49.
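The BLEU score reported above is built from modified n-gram precision. The sketch below computes the unigram building block for a generated pseudo-code line against a reference; full BLEU additionally combines higher-order n-grams and a brevity penalty, which are omitted here.

```python
from collections import Counter

def modified_unigram_precision(candidate, reference):
    """Clipped unigram matches divided by candidate length (BLEU-1 core)."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    clipped = sum(min(count, ref_counts[tok])
                  for tok, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

reference = "if x is greater than 0 print x"
candidate = "if x greater than 0 then print x"
print(round(modified_unigram_precision(candidate, reference), 3))
```

Clipping (the `min` against the reference count) prevents a candidate from gaming the score by repeating a common token.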
To cope with complex and changeable factors such as speech theme and environment, which make it difficult for a speaker to prepare a speech text in a short time, this paper proposes a speech generation and demonstration system based on deep learning. The system is built on the PyTorch deep learning framework and trained on the basis of GPT-2 and an open-source pre-trained model. It generates multiple speeches according to the topics given by users, then produces the final speech and the corresponding voice demonstration audio through text modification, speech synthesis, and other technologies, helping users quickly obtain the target document and audio. Experiments show that the text generated by this model is fluent and easy to use, which helps shorten speakers' preparation time and improves the confidence of impromptu speakers. In addition, the paper explores the application prospects of text generation and has certain reference value.
To address the loss of semantics in LCA (Lowest Common Ancestor)-based XML keyword queries, an XML keyword query technique based on Natural Language Generation (NLG) is proposed. NLG content planning is applied to the XML document to produce a set of message sentences targeted at the user's query; filtering this message-sentence set both achieves semantics-based XML keyword querying and greatly improves query efficiency.
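The LCA computation that this approach builds on can be sketched with the standard library's XML tree: for the elements matching each keyword, walk up the tree to find their lowest common ancestor. The document structure and tag names below are illustrative.

```python
import xml.etree.ElementTree as ET

def lowest_common_ancestor(root, node_a, node_b):
    """LCA of two elements in an ElementTree rooted at `root`."""
    # ElementTree stores no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    ancestors = set()
    node = node_a
    while node is not None:
        ancestors.add(node)
        node = parents.get(node)
    node = node_b
    while node not in ancestors:
        node = parents.get(node)
    return node

doc = ET.fromstring(
    "<library><book><title>XML</title><author>Kim</author></book></library>")
title = doc.find(".//title")
author = doc.find(".//author")
print(lowest_common_ancestor(doc, title, author).tag)
```

The semantic loss the paper targets is visible here: the LCA (`book`) says the two keywords co-occur in one subtree, but nothing about how they are related, which is what the generated message sentences recover.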
Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness; the challenge lies in effectively generating factual text with sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to build a conditional variational autoencoder network, enabling the model to generate text based on input facts. Building upon this foundation, the input text is passed to a discriminator along with the generated text; through adversarial training, the model is encouraged to generate text that is indistinguishable to the discriminator, thereby enhancing the quality of the generated text. To further improve factual correctness, and inspired by natural language inference systems, an entailment recognition task is introduced and trained together with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is added to the model's loss, forcing the generator to produce text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and the factual correctness of the generated text while sacrificing only a small amount of diversity. Furthermore, under a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
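The training objective described above combines four signals: the CVAE's reconstruction and KL terms, an adversarial term from the discriminator, and a penalty driven by the entailment recognizer when the generated text contradicts the input facts. The sketch below shows only how such terms compose; the weights are illustrative assumptions, not the paper's hyperparameters.

```python
def total_loss(reconstruction, kl, adversarial, entailment_contradiction,
               kl_weight=1.0, adv_weight=0.5, fact_weight=2.0):
    """Weighted sum of CVAE, adversarial, and fact-consistency terms.

    `entailment_contradiction` is assumed to be 0 when the entailment
    recognizer judges the generated text consistent with the facts, and
    positive when it detects a contradiction.
    """
    return (reconstruction
            + kl_weight * kl
            + adv_weight * adversarial
            + fact_weight * entailment_contradiction)

# A batch whose text contradicts its facts is penalized more heavily.
print(total_loss(1.2, 0.3, 0.4, 0.0))  # consistent with facts
print(total_loss(1.2, 0.3, 0.4, 0.5))  # contradicts facts
```

The relatively large `fact_weight` encodes the paper's trade-off: accept a small loss of diversity in exchange for factual consistency.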
文摘This work is about the progress of previous related work based on an experiment to improve the intelligence of robotic systems,with the aim of achieving more linguistic communication capabilities between humans and robots.In this paper,the authors attempt an algorithmic approach to natural language generation through hole semantics and by applying the OMAS-III computational model as a grammatical formalism.In the original work,a technical language is used,while in the later works,this has been replaced by a limited Greek natural language dictionary.This particular effort was made to give the evolving system the ability to ask questions,as well as the authors developed an initial dialogue system using these techniques.The results show that the use of these techniques the authors apply can give us a more sophisticated dialogue system in the future.
文摘Along with the development of big data, various Natural Language Generation systems (NLGs) have recently been developed by different companies. The aim of this paper is to propose a better understanding of how these systems are designed and used. We propose to study in details one of them which is the NLGs developed by the company Nomao. First, we show the development of this NLGs underlies strong economic stakes since the business model of Nomao partly depends on it. Then, thanks to an eye movement analysis conducted with 28 participants, we show that the texts generated by Nomao’s NLGs contain syntactic and semantic structures that are easy to read but lack socio-semantic coherence which would improve their understanding. From a scientific perspective, our research results highlight the importance of socio-semantic coherence in text-based communication produced by NLGs.
基金Supported by the Key Research Program of the Chinese Academy of Sciences(ZDRE-KT-2021-3)。
文摘Both analyzing a large amount of space weather observed data and alleviating personal experience bias are significant challenges in generating artificial space weather forecast products.With the use of natural language generation methods based on the sequence-to-sequence model,space weather forecast texts can be automatically generated.To conduct our generation tasks at a fine-grained level,a taxonomy of space weather phenomena based on descriptions is presented.Then,our MDH(Multi-Domain Hybrid)model is proposed for generating space weather summaries in two stages.This model is composed of three sequence-to-sequence-based deep neural network sub-models(one Bidirectional Auto-Regressive Transformers pre-trained model and two Transformer models).Then,to evaluate how well MDH performs,quality evaluation metrics based on two prevalent automatic metrics and our innovative human metric are presented.The comprehensive scores of the three summaries generating tasks on testing datasets are 70.87,93.50,and 92.69,respectively.The results suggest that MDH can generate space weather summaries with high accuracy and coherence,as well as suitable length,which can assist forecasters in generating high-quality space weather forecast products,despite the data being starved.
基金This work was supported by the National Natural Science Foundation of China (No.60173066) .
文摘The expert system is an important field of the artificial intelligence. The traditional interface of the expert system is the command, menu and window at present. It limits the application of the expert system and embarrasses the enthusiasm of using expert system. Combining with the study on the expert system of network fault diagnosis, the natural language interface of the expert system has been discussed in this article. This interface can understand and generate Chinese sentences. Using this interface, the user and field experts can use the expert system to diagnose the fault of network conveniently. In the article, first, the extended production rule has been proposed. Then the methods of Chinese sentence generation from conceptual graphs and the model of expert system are introduced in detail. Using this model, the network fault diagnosis expert system and its natural language interface have been developed with Prolog.
文摘Software testing is an important and cost intensive activity in software development.The major contribution in cost is due to test case generations.Requirement-based testing is an approach in which test cases are derivative from requirements without considering the implementation’s internal structure.Requirement-based testing includes functional and nonfunctional requirements.The objective of this study is to explore the approaches that generate test cases from requirements.A systematic literature review based on two research questions and extensive quality assessment criteria includes studies.The study identies 30 primary studies from 410 studies spanned from 2000 to 2018.The review’s nding shows that 53%of journal papers,42%of conference papers,and 5%of book chapters’address requirementsbased testing.Most of the studies use UML,activity,and use case diagrams for test case generation from requirements.One of the signicant lessons learned is that most software testing errors are traced back to errors in natural language requirements.A substantial amount of work focuses on UML diagrams for test case generations,which cannot capture all the system’s developed attributes.Furthermore,there is a lack of UML-based models that can generate test cases from natural language requirements by rening them in context.Coverage criteria indicate how efciently the testing has been performed 12.37%of studies use requirements coverage,20%of studies cover path coverage,and 17%study basic coverage.
文摘Paraphrase is an expression of a text with alternative words and orders to achieve a better clarity. Paraphrases have been found vital for augmenting training dataset, which aid to enhance performance of machine learning models that intended for various natural language processing (NLP) tasks. Thus, recently, automatic paraphrase generation has received increasing attention. However, evaluating quality of generated paraphrases is technically challenging. In the literature, the importance of generated paraphrases is tended to be determined by their impact on the performance of other NLP tasks. This kind of evaluation is referred as extrinsic evaluation, which requires high computational resources to train and test the models. So far, very little attention has been paid to the role of intrinsic evaluation in which quality of generated paraphrase is judged against predefined ground truth (reference paraphrases). In fact, it is also very challenging to find ideal and complete reference paraphrases. Therefore, in this study, we propose semantic or meaning oriented automatic evaluation metric that helps to evaluate quality of generated paraphrases against the original text, which is an intrinsic evaluation approach. Further, we evaluate quality of the paraphrases by assessing their impact on other NLP tasks, which is an extrinsic evaluation method. The goal is to explore the relationship between intrinsic and extrinsic evaluation methods. To ensure the effectiveness of proposed evaluation methods, extensive experiments are done on different publicly available datasets. The experimental results demonstrate that our proposed intrinsic and extrinsic evaluation strategies are promising. The results further reveal that there is a significant correlation between intrinsic and extrinsic evaluation approaches.
基金This work is supported by the Collaborative education project of QST Innovation Technology Group Co.,Ltd and the Ministry of Education of PRC(NO.201801243022).
文摘UML Class diagram generation from textual requirements is an important task in object-oriented design and programing course.This study proposes a method for automatically generating class diagrams from Chinese textual requirements on the basis of Natural Language Processing(NLP)and mapping rules for sentence pattern matching.First,classes are identified through entity recognition rules and candidate class pruning rules using NLP from requirements.Second,class attributes and relationships between classes are extracted using mapping rules for sentence pattern matching on the basis of NLP.Third,we developed an assistant tool integrated into a precision micro classroom system for automatic generation of class diagram,to effectively assist the teaching of object-oriented design and programing course.Results are evaluated with precision,accuracy and recall from eight requirements of object-oriented design and programing course using truth values created by teachers.Our research should benefit beginners of object-oriented design and programing course,who may be students or software developers.It helps them to create correct domain models represented in the UML class diagram.
基金Project supported by the National Natural Science Foundation of China(No.62272100)the Consulting Project of Chinese Academy of Engineering(No.2023-XY-09)+1 种基金the Major Project of the National Social Science Fund of China(No.21ZD11)the Fundamental Research Funds for the Central Universities,China。
文摘Text generation is an essential research area in artificial intelligence(AI)technology and natural language processing and provides key technical support for the rapid development of AI-generated content(AIGC).It is based on technologies such as natural language processing,machine learning,and deep learning,which enable learning language rules through training models to automatically generate text that meets grammatical and semantic requirements.In this paper,we sort and systematically summarize the main research progress in text generation and review recent text generation papers,focusing on presenting a detailed understanding of the technical models.In addition,several typical text generation application systems are presented.Finally,we address some challenges and future directions in AI text generation.We conclude that improving the quality,quantity,interactivity,and adaptability of generated text can help fundamentally advance AI text generation development.
文摘Cyber security addresses the protection of information systems in cyberspace. These systems face multiple attacks on a daily basis, with the level of complication getting increasingly challenging. Despite the existence of multiple solutions, attackers are still quite successful at identifying vulnerabilities to exploit. This is why cyber deception is increasingly being used to divert attackers’ attention and, therefore, enhance the security of information systems. To be effective, deception environments need fake data. This is where Natural Language (NLP) Processing comes in. Many cyber security models have used NLP for vulnerability detection in information systems, email classification, fake citation detection, and many others. Although it is used for text generation, existing models seem to be unsuitable for data generation in a deception environment. Our goal is to use text generation in NLP to generate data in the deception context that will be used to build multi-level deception in information systems. Our model consists of three (3) components, including the connection component, the deception component, composed of several states in which an attacker may be, depending on whether he is malicious or not, and the text generation component. The text generation component considers as input the real data of the information system and allows the production of several texts as output, which are usable at different deception levels.
文摘Multimodal sentence summarization(MMSS)is a new yet challenging task that aims to generate a concise summary of a long sentence and its corresponding image.Although existing methods have gained promising success in MMSS,they overlook the powerful generation ability of generative pre-trained language models(GPLMs),which have shown to be effective in many text generation tasks.To fill this research gap,we propose to using GPLMs to promote the performance of MMSS.Notably,adopting GPLMs to solve MMSS inevitably faces two challenges:1)What fusion strategy should we use to inject visual information into GPLMs properly?2)How to keep the GPLM′s generation ability intact to the utmost extent when the visual feature is injected into the GPLM.To address these two challenges,we propose a vision enhanced generative pre-trained language model for MMSS,dubbed as Vision-GPLM.In Vision-GPLM,we obtain features of visual and textual modalities with two separate encoders and utilize a text decoder to produce a summary.In particular,we utilize multi-head attention to fuse the features extracted from visual and textual modalities to inject the visual feature into the GPLM.Meanwhile,we train Vision-GPLM in two stages:the vision-oriented pre-training stage and fine-tuning stage.In the vision-oriented pre-training stage,we particularly train the visual encoder by the masked language model task while the other components are frozen,aiming to obtain homogeneous representations of text and image.In the fine-tuning stage,we train all the components of Vision-GPLM by the MMSS task.Extensive experiments on a public MMSS dataset verify the superiority of our model over existing baselines.
Abstract: Understanding the content of source code and its regular expressions is very difficult when they are written in an unfamiliar language. Pseudo-code explains and describes the content of the code without relying on the syntax or technology of a particular programming language. However, writing pseudo-code for each code instruction is laborious. Recently, neural machine translation has been used to generate textual descriptions of source code. In this paper, a novel deep learning-based transformer (DLBT) model is proposed for automatic pseudo-code generation from source code. The proposed model uses deep learning based on Neural Machine Translation (NMT) to act as a language translator. The DLBT is built on the transformer, an encoder-decoder structure, and has three major components: tokenizer and embeddings, transformer, and post-processing. Each code line is tokenized into dense vectors. The transformer then captures the relatedness between the source code and the matching pseudo-code without the need for a Recurrent Neural Network (RNN). In the post-processing step, the generated pseudo-code is optimized. The proposed model is assessed on a real Python dataset containing more than 18,800 lines of source code written in Python. The experiments show promising results compared with other machine translation methods such as RNNs. The proposed DLBT achieves accuracy and BLEU scores of 47.32 and 68.49, respectively.
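The tokenizer-and-embeddings stage described above can be sketched as follows. This is a hypothetical, minimal illustration (the class name, regex, and random embeddings are assumptions, not the paper's actual implementation) of splitting a Python source line into tokens and mapping each to a dense vector:

```python
import re
import random

random.seed(0)

class CodeTokenizer:
    """Toy sketch of a code tokenizer with a learned-style embedding table:
    each distinct token gets a dense vector (random here; trained in DLBT)."""

    def __init__(self, dim=8):
        self.dim = dim
        self.vocab = {}        # token -> integer id
        self.embeddings = {}   # id -> dense vector

    def tokenize(self, line):
        # split into identifiers, numbers, and single operator/punctuation chars
        return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", line)

    def embed(self, line):
        vectors = []
        for tok in self.tokenize(line):
            if tok not in self.vocab:
                idx = len(self.vocab)
                self.vocab[tok] = idx
                self.embeddings[idx] = [random.gauss(0, 1)
                                        for _ in range(self.dim)]
            vectors.append(self.embeddings[self.vocab[tok]])
        return vectors

tok = CodeTokenizer()
vecs = tok.embed("for i in range(10):")
print(len(vecs))  # one dense vector per token
```

The resulting sequence of vectors is what an encoder-decoder transformer would consume in place of raw text.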
Abstract: Given complex and changeable factors such as speech topic and setting, it is difficult for a speaker to prepare a speech text in a short time. This paper therefore proposes a speech generation and demonstration system based on deep learning. Built on the PyTorch deep learning framework and trained using the GPT-2 architecture and an open-source pretrained model, the system generates multiple speeches according to topics given by users; it then produces the final speech and the corresponding voice demonstration audio through text modification, speech synthesis, and other technologies, helping users quickly obtain the target document and audio. Experiments show that the text generated by this model is fluent and easy to use, which helps shorten speakers’ preparation time and improves the confidence of impromptu speakers. In addition, the paper explores the application prospects of text generation and has reference value for related work.
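At the core of GPT-2-style generation is sampling the next token from the model's output distribution. The following is a minimal sketch of top-k, temperature-controlled sampling over a toy five-token vocabulary; the logit values are invented for illustration and no actual language model is involved:

```python
import math
import random

random.seed(42)

def top_k_sample(logits, k=3, temperature=0.8):
    """Draw one next-token index: keep the k highest-scoring candidates,
    then sample from a temperature-scaled softmax over the survivors."""
    top = sorted(enumerate(logits), key=lambda p: p[1], reverse=True)[:k]
    weights = [math.exp(score / temperature) for _, score in top]
    # inverse-transform sampling over the unnormalized weights
    r = random.random() * sum(weights)
    for (idx, _), w in zip(top, weights):
        r -= w
        if r <= 0:
            return idx
    return top[-1][0]

logits = [2.0, 0.5, 1.5, -1.0, 0.1]  # toy scores for 5 candidate tokens
token = top_k_sample(logits)
print(token)  # always one of the top-3 candidates: 0, 2, or 1
```

Lower temperatures concentrate probability on the highest-scoring tokens (more predictable text), while higher temperatures flatten the distribution (more varied text), which is how such systems trade fluency against diversity.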
Abstract: To address the loss of semantics in LCA (Lowest Common Ancestor)-based XML keyword queries, this paper proposes an XML keyword query technique based on Natural Language Generation (NLG). The content-planning stage of NLG is applied to the XML document to produce a set of message sentences targeted at the user's query; filtering this message set both enables semantics-based XML keyword search and greatly improves query efficiency.
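The content-planning idea sketched in this abstract can be illustrated as follows. The XML snippet, the sentence template, and the `content_planning` helper are all hypothetical (invented for this example, not taken from the paper); the point is that verbalizing each matching subtree as a message sentence preserves the element relationships that a bare LCA node set loses:

```python
import xml.etree.ElementTree as ET

# Invented example document
XML = """<library>
  <book><title>AI</title><author>Lee</author><year>2020</year></book>
  <book><title>NLP</title><author>Chen</author><year>2021</year></book>
</library>"""

def content_planning(root, keywords):
    """Sketch of NLG content planning over XML: for each element matching
    the keyword query, verbalize its subtree as one message sentence."""
    messages = []
    for book in root.iter("book"):
        fields = {child.tag: child.text for child in book}
        if any(kw in fields.values() for kw in keywords):
            messages.append(
                f'"{fields["title"]}" was written by {fields["author"]} '
                f'in {fields["year"]}.'
            )
    return messages

root = ET.fromstring(XML)
msgs = content_planning(root, ["Chen"])
print(msgs)  # ['"NLP" was written by Chen in 2021.']
```

Filtering the message set (here, by keyword match) then serves as the semantics-aware query answer in place of a raw LCA node list.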
Funding: Supported by the Science and Technology Department of Sichuan Province (No. 2021YFG0156).
Abstract: Generating diverse and factual text is challenging and is receiving increasing attention. By sampling from the latent space, variational autoencoder-based models have recently enhanced the diversity of generated text. However, existing research predominantly depends on summarization models to offer paragraph-level semantic information for enhancing factual correctness; the challenge lies in generating factual text with sentence-level variational autoencoder-based models. In this paper, a novel model called the fact-aware conditional variational autoencoder is proposed to balance the factual correctness and diversity of generated text. Specifically, our model encodes the input sentences and uses them as facts to condition a variational autoencoder network; training this conditional network enables the model to generate text grounded in the input facts. Building upon this foundation, the input text is passed to a discriminator along with the generated text. Through adversarial training, the model is encouraged to generate text that the discriminator cannot distinguish from the input, thereby enhancing the quality of the generated text. To further improve factual correctness, inspired by natural language inference systems, an entailment recognition task is trained jointly with the discriminator via multi-task learning. Moreover, based on the entailment recognition results, a penalty term is added to the model's loss, forcing the generator to produce text consistent with the facts. Experimental results demonstrate that, compared with competitive models, our model achieves substantial improvements in both the quality and the factual correctness of the generated text while sacrificing only a small amount of diversity. Furthermore, under a comprehensive evaluation of diversity and quality metrics, our model also demonstrates the best performance.
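The conditional-VAE machinery underlying this abstract can be sketched as follows. This is a minimal NumPy illustration of the reparameterization trick and the KL regularization term; the toy `encode` function standing in for the recognition network, and all dimensions, are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(fact_vec):
    """Toy stand-in for the recognition network: map the encoded input
    facts (the condition) to the mean and log-variance of the latent q."""
    mu = 0.5 * fact_vec
    log_var = -np.abs(fact_vec)
    return mu, log_var

def reparameterize(mu, log_var):
    # z = mu + sigma * eps: sampling written so gradients can flow through
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # closed-form KL( q(z | x, fact) || N(0, I) ), part of the VAE loss
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

fact = rng.standard_normal(16)   # encoded input sentence acting as the condition
mu, log_var = encode(fact)
z = reparameterize(mu, log_var)  # latent sample fed to the decoder
print(z.shape, kl_divergence(mu, log_var) >= 0)
```

In the full model, the reconstruction loss, this KL term, the adversarial discriminator loss, and the entailment-based penalty would be combined into one training objective; sampling different `z` values for the same fact is what yields diverse outputs.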