One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general because the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has gone into finding and optimizing solutions to this problem, which is, in effect, feature selection for NLP. This paper looks at the development of these various techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
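The Latent Semantic Analysis approach the abstract mentions can be sketched in a few lines: build a term-document count matrix, take a truncated SVD to obtain low-dimensional term vectors, and compare them with cosine similarity. The toy corpus below is invented for illustration; it is a minimal sketch of the idea, not any paper's actual pipeline.

```python
import numpy as np

# Toy corpus (illustrative data, not from the surveyed papers).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "stocks fell on the market".split(),
    "the market saw stocks rise".split(),
]

vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}

# Term-document count matrix (rows = terms, columns = documents).
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        X[index[w], j] += 1

# LSA: truncated SVD projects terms into a low-rank "semantic space"
# (here k = 2 dimensions), where words sharing contexts land nearby.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vectors = U[:, :k] * S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "cat" and "dog" occur in near-identical contexts; "stocks" does not.
print(cosine(term_vectors[index["cat"]], term_vectors[index["dog"]]))
print(cosine(term_vectors[index["cat"]], term_vectors[index["stocks"]]))
```

The same cosine-over-vectors comparison carries over unchanged to learned embeddings such as Word2Vec or GloVe; only the way the vectors are produced differs.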
Automatically mapping a requirement specification to a design model in Software Engineering is an open, complex problem. Existing methods use a complex manual process that draws on knowledge from the requirement specification/modeling and the design, and tries to find a good match between them. The key task done by designers is to convert a natural language based requirement specification (or a corresponding UML based representation) into a predominantly computer language based design model; the process is very complex because there is a very large gap between natural language and computer language. Moreover, this is not a simple language conversion, but rather a complex knowledge conversion that must lead to a meaningful design implementation. In this paper, we describe an automated method to map a Requirement Model to a Design Model and thus automate, or partially automate, the Structured Design (SD) process. We believe this is the first logical step toward mapping a more complex requirement specification to a design model. We call the method IRTDM (Intelligent Agent based requirement model to design model mapping). The main theme of IRTDM is to use AI (Artificial Intelligence) based algorithms, semantic representation using Ontology or Predicate Logic, design structures from some well known design framework, and Machine Learning algorithms for learning over time. Semantics helps convert the natural language based requirement specification (and the associated UML representation) into a high level design model, which is then mapped to design structures. AI methods can also be used to convert high level design structures into a lower level design, which can then be refined further by a manual and/or semi-automated process.
We emphasize that automation is one of the key ways to minimize software cost, and that it is very important for all, especially for “Design for the Bottom 90% People”, or BOP (Base of the Pyramid) people.
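The predicate-style requirement-to-design mapping this abstract describes can be sketched as a rule pass over requirement facts. All of the names below (predicates, the noun-to-class and verb-to-method rules) are illustrative assumptions, not the paper's actual IRTDM representation.

```python
# Hypothetical requirement facts in a predicate-like form:
# actors and entities are nouns, actions are (actor, verb) pairs.
requirements = [
    ("actor", "Customer"),
    ("action", "Customer", "place_order"),
    ("entity", "Order"),
]

def map_to_design(facts):
    """Toy mapping rules: nouns become classes, verbs become methods."""
    design = {"classes": set(), "methods": []}
    for fact in facts:
        if fact[0] in ("actor", "entity"):
            design["classes"].add(fact[1])
        elif fact[0] == "action":
            actor, verb = fact[1], fact[2]
            design["classes"].add(actor)
            design["methods"].append((actor, verb))
    return design

print(map_to_design(requirements))
```

A real system would of course extract the facts from natural language or UML and apply far richer rules; the sketch only shows the shape of the knowledge conversion step.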
Domain-specific metamodeling language (DSMML) defined by informal methods cannot strictly represent its structural semantics, so its properties, such as consistency, cannot be holistically and systematically verified. In response, this paper proposes a formal representation of the structural semantics of DSMML, named extensible markup language (XML) based metamodeling language (XMML), together with a consistency verification method for its metamodels. First, we describe our approach to formalization; based on this, we present a method for verifying the consistency of XMML and its metamodels using first-order logical inference. Then, a formalization automatic mapping engine for metamodels is designed to show the feasibility of our formal method.
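One kind of consistency rule that first-order inference over a metamodel can check is that every reference target is a declared element, i.e. ∀(owner, name, target) ∈ references: target ∈ classes. The dictionary layout below is an assumed encoding for illustration, not the actual XMML schema.

```python
# Hypothetical metamodel encoding (assumed layout, not real XMML).
metamodel = {
    "classes": {"State", "Transition"},
    "references": [
        ("Transition", "source", "State"),
        ("Transition", "target", "State"),
        ("Transition", "guard", "Condition"),  # "Condition" is undeclared
    ],
}

def undeclared_targets(mm):
    """Violations of: every reference target must be a declared class."""
    return [(o, n, t) for o, n, t in mm["references"]
            if t not in mm["classes"]]

print(undeclared_targets(metamodel))
# One violation: the "guard" reference points at the undeclared "Condition".
```

A full verifier would encode many such universally quantified rules and discharge them with a logic engine rather than list comprehensions, but each rule reduces to a check of this shape.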
A new scheme is presented to detect a large number of keywords in voice controlled switchboard tasks. The new scheme is based on two stages. In the first stage, the N best syllable candidates with their corresponding acoustic scores are generated by an acoustic recognizer. In the second stage, a semantic model based parser is applied to determine the optimum keywords by searching through the lattice of N best candidates. The experimental results show that even when the spoken input deviates from the predefined syntactic constraints, the parser still demonstrates high performance. For comparison purposes, the most common alternative, incorporating the syntactic knowledge of the task directly into the acoustic recognizer in the form of a finite state network, is also investigated. Furthermore, to address sparse data problems, out of domain data in the form of newspaper text are used to obtain a more robust combined semantic model. The experiments show that the combined semantic model improves the keyword detection rate from 90.07% to 92.91% when 80 ungrammatical sentences which do not conform to the task grammar are used as test material.
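The two-stage idea, an acoustic recognizer proposing N-best hypotheses and a semantic model rescoring them, can be sketched as a linear interpolation of log-scores. All hypotheses, scores, and the interpolation weight below are invented for illustration; the paper's parser searches a syllable lattice rather than a flat list.

```python
# Hypothetical N-best list from the acoustic stage: (hypothesis, log-score).
n_best = [
    ("transfer call", -12.1),
    ("trans fur call", -11.8),   # acoustically best, semantically implausible
    ("cancel call", -15.0),
]

# Toy "semantic model": log-probabilities of known task keywords.
semantic_logp = {"transfer call": -1.2, "cancel call": -1.6}

def rescore(hyps, lam=0.7, oov_logp=-8.0):
    """Pick the hypothesis maximizing an acoustic/semantic interpolation."""
    scored = [(h, lam * a + (1 - lam) * semantic_logp.get(h, oov_logp))
              for h, a in hyps]
    return max(scored, key=lambda s: s[1])[0]

print(rescore(n_best))  # the semantic stage overrides the acoustic winner
```

Here the semantic penalty on the out-of-vocabulary string "trans fur call" lets the second stage recover "transfer call" even though it was not the acoustic recognizer's top choice, which is the effect the abstract reports on ungrammatical input.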
Funding: the Yunnan Provincial Department of Education Research Fund Key Project (No. 2011z025) and General Project (No. 2011y214)
Funding: the State High-Tech Development Plan of China (No. 863-306-02-1) and the Chinese 211 Engineering Project (No. 96103-2)