Regular Expression Generation Based on Natural Language Syntax Information
Abstract: Regular expressions are composed of characters and metacharacters and define a matching rule that can be used to check whether a string conforms to a desired pattern. Many developers find regular expressions difficult to write during software development, so generating them from natural language requirement descriptions has become a research focus. In recent years, systems that translate natural language descriptions into regular expressions have produced some results, but they often handle only simple serialized text. This paper explores a method for converting natural language queries into regular expressions that carry out the described function. Given the successful application of syntactic parsing in natural language processing, the model exploits the structural information of natural language: it embeds the syntax parse tree by hierarchical aggregation and uses a Tree-Transformer architecture, which is suited to tree-structured input, to encode the natural language description with self-attention. The decoder uses cross-attention to predict the regular expression. The model is validated on two public datasets. Experiments show that it improves the quality of the generated regular expressions and outperforms existing models on the DFA-Equal-Acc evaluation metric.
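Authors: WANG Hao; WU Junhua (College of Computer and Information Engineering, Nanjing Tech University, Nanjing 211816, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2024, Issue S02, pp. 92-97 (6 pages)
Funding: Key Project of the Higher Education Informatization Research Program, Jiangsu Provincial Association of Educational Technology in Higher Education Institutions (2021JSETKT023).
Keywords: regular expression generation; Tree-Transformer; syntactic parsing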
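
The abstract reports results on DFA-Equal-Acc but does not define it. As the metric is commonly described, a generated regular expression counts as correct when it accepts exactly the same language as the reference expression, even if the two differ as strings. The sketch below is not the paper's evaluation code and does not build DFAs. It approximates the idea by comparing two patterns on every string up to a fixed length over a small alphabet. The function name `bounded_equivalent`, the alphabet `"ab1"`, and the length bound are all assumptions made for illustration.

```python
import re
from itertools import product


def bounded_equivalent(pattern_a: str, pattern_b: str,
                       alphabet: str = "ab1", max_len: int = 4) -> bool:
    """Return True if both patterns accept the same strings over `alphabet`
    up to length `max_len`.

    This is only a bounded approximation of language equivalence: two
    patterns that first disagree on a longer string, or on a character
    outside `alphabet`, would still be reported as equivalent.
    """
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "".join(chars)
            # fullmatch requires the whole string to match the pattern.
            accepted_a = regex_a.fullmatch(text) is not None
            accepted_b = regex_b.fullmatch(text) is not None
            if accepted_a != accepted_b:
                return False
    return True


if __name__ == "__main__":
    # Different spellings of the same language.
    print(bounded_equivalent("(a|b)*", "[ab]*"))
    # "a*" accepts the empty string, "a+" does not.
    print(bounded_equivalent("a*", "a+"))
```

A string-level exact-match metric would mark `(a|b)*` against `[ab]*` as wrong, while an equivalence-based metric accepts it. A faithful implementation would convert both expressions to minimal DFAs and compare those rather than enumerate strings.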