One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) over the last two decades has been the development of text representation techniques that address the so-called curse of dimensionality. The problem plagues NLP in general because the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP over this period has gone into finding and optimizing solutions to this problem, which is effectively a problem of feature selection. This survey paper looks at the development of these techniques, which leverage a variety of statistical methods that rest on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. We look at the development of some of the most popular of these techniques from a mathematical as well as a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, which are typically referred to as word embeddings. In reviewing algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
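As a rough illustration of the distributional hypothesis mentioned in this abstract, the sketch below builds sparse co-occurrence count vectors from a toy corpus and compares words by cosine similarity. The corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example; they are not taken from the surveyed paper, and real systems such as Word2Vec or GloVe learn dense vectors rather than using raw counts.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" appear in similar contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
]

vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
print(f"cat~dog: {sim_cat_dog:.3f}, cat~milk: {sim_cat_milk:.3f}")
```

In this toy setting, "cat" and "dog" share most of their context words, so they score as more similar to each other than "cat" does to "milk", even though "cat" and "milk" occur in the same sentence.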
Nowadays, we can use the multi-task learning approach to train a machine-learning algorithm to learn multiple related tasks instead of training it to solve a single task. In this work, we propose an algorithm for estimating textual similarity scores and then use these scores in multiple tasks such as text ranking, essay grading, and question answering systems. We used several vectorization schemes to represent the Arabic texts in the SemEval2017-task3-subtask-D dataset. The schemes include lexical-based similarity features, frequency-based features, and pre-trained model-based features. We also used contextual embedding models such as Arabic Bidirectional Encoder Representations from Transformers (AraBERT). We used the AraBERT model in two different variants. First, it serves as a feature extractor alongside the features of the text vectorization schemes; we fed those features to various regression models to predict a value that represents the relevancy score between Arabic text units. Second, AraBERT is adopted as a pre-trained model, and its parameters are fine-tuned to estimate the relevancy scores between Arabic sentences. To evaluate the research results, we conducted several experiments to compare the use of the AraBERT model in its two variants. In terms of Mean Absolute Percentage Error (MAPE), the results show minor variance between AraBERT v0.2 as a feature extractor (21.7723) and the fine-tuned AraBERT v2 (21.8211). On the other hand, AraBERT v0.2-Large as a feature extractor outperforms the fine-tuned AraBERT v2 model on the data set used in terms of the coefficient of determination (R²), with values of 0.014050 and −0.032861, respectively.
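The two metrics named in this abstract can be sketched with their standard textbook definitions, as below. The function names `mape` and `r2` and all sample values are hypothetical and unrelated to the SemEval data, and reporting MAPE as a percentage is an assumption about the paper's convention. The second example shows why R² can be negative, as it is for one of the reported models: the predictions fit worse than always predicting the mean of the targets.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Assumes no target is zero."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical relevancy scores, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
close_pred = [1.1, 1.8, 3.3, 3.6]     # each prediction is off by 10%
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean

mape_close = mape(y_true, close_pred)
r2_close = r2(y_true, close_pred)
r2_reversed = r2(y_true, reversed_pred)
print(mape_close, r2_close, r2_reversed)
```

A model that always predicts the mean of the targets gets R² = 0, so values near zero, like those reported in the abstract, indicate that the predictions explain very little of the variance in the targets.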
In recent years, mobile devices have become widespread and refined, and they have offered increased convenience in human life. For these reasons, a variety of embedded systems have been designed, and improving methods for developing embedded software systematically has become an important issue. Platform-based design is one example of an embedded-system design method that can reduce design cost by raising a design's abstraction level. However, platform-based design lacks precise definitions for platforms and design processes. This paper provides an approach that combines aspects with platform-based design methods for developing embedded software. The approach is built on platform-based design methodology and uses the separation of concerns (SoC) concept to define the aspects and to reduce the crosscutting concerns in embedded system modeling. To model aspects, we use UML notation extended with aspects to describe both the static structure and the dynamic structure of the embedded system. We use an example of a digital photo frame system to demonstrate our approach.
The choice of methods or design languages is a crucial phase in the development of systems and software, including real-time and embedded systems. An open question that remains in the design of these types of systems is whether to build a method, or to choose one among those existing, that is capable of covering the life cycle of a project, and particularly the development phases. This article contributes to answering the question by proposing an approach based on a multi-criteria comparative study of a few languages and methods dedicated to the design of real-time and embedded systems. The underlying objective of this work is to present designers with a wide range of approaches and with elements that can guide their choices. To reach this goal, we propose different comparison criteria. Each criterion is divided into sub-criteria, so that designers can refine their choices according to the qualities they prefer and wish to have in the method or language. We also define a rating scale, which is used to assess the selected languages and methods. The scores obtained from this assessment are presented in tables, one table per criterion, followed by a summary table giving the overall scores. Charts built from these tables are provided and are intended to facilitate the designers' judgement and thus their choice.
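This paper first summarizes the development history of AADL (Architecture Analysis and Design Language) and its main modeling elements. Second, from the perspective of model-driven design and implementation, it surveys the research on and application of AADL at different stages, summarizes the research hotspots, analyzes the shortcomings of existing research, and gives an overview of AADL modeling and analysis tools and of application practice. Finally, it discusses directions for the development and research of AADL.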