One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem that plagues NLP in general because the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has gone into finding and optimizing solutions to this problem, which is effectively one of feature selection. This paper looks at the development of these techniques, which leverage a variety of statistical methods that rest on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as a data structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, which are typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
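The distributional hypothesis mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the toy corpus, the window size of 2, and the helper names `cooccurrence_vectors` and `cosine` are all assumptions made for illustration. Each word is represented by the counts of its neighbouring words, and two words are compared by the cosine of the angle between their count vectors.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus (hypothetical): "cat" and "dog" appear in similar contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on the market",
]
vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_market = cosine(vectors["cat"], vectors["market"])
# Words sharing contexts ("cat", "dog") score higher than unrelated ones.
print(sim_cat_dog > sim_cat_market)
```

Raw counts like these grow with vocabulary size, which is the dimensionality problem the abstract describes; methods such as Latent Semantic Analysis and word embeddings replace them with much lower-dimensional dense vectors.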
Wave-particle and wave-wave interactions in space plasmas are essentially non-linear or non-Gaussian processes. Their physical properties can be described using higher-order statistical analysis methods (higher-order moments and bi- and tri-correlation or bi- and tri-spectrum). The question addressed in this paper is the usefulness of higher-order statistical analysis for identifying wave-particle interactions in space plasmas. The signals handled are from the ARCAD-3 ISOPROBE experiment in the ELF frequency range, then strong electrostatic turbulence and electron density irregularities. Second- and third-order statistical analyses are applied: first to the time series associated with each type of measurement, then to the two types together. All results are presented for one typical case. Correlation functions estimated over the corresponding time intervals point to the existence of a non-linear interaction between these fluctuations and the electrostatic field.
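As a rough illustration of why third-order statistics can flag non-Gaussian behaviour, the sketch below estimates a third-order moment function on two synthetic series. It does not reproduce the paper's analysis or the ARCAD-3 data: the signals, the sample size, the seed, and the helper name `third_order_moment` are assumptions. For a zero-mean Gaussian series the third-order moment is close to zero at every lag, while a quadratic non-linearity makes it clearly non-zero.

```python
import random

def third_order_moment(x, tau1, tau2):
    """Estimate E[x(t) x(t+tau1) x(t+tau2)] after removing the sample mean.

    Lags tau1 and tau2 are assumed to be non-negative integers.
    """
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    span = n - max(tau1, tau2)
    return sum(c[t] * c[t + tau1] * c[t + tau2] for t in range(span)) / span

rng = random.Random(0)  # fixed seed so the sketch is repeatable
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Hypothetical non-linear signal: a quadratic term added to the Gaussian series.
nonlinear = [g + 0.5 * (g * g - 1.0) for g in gaussian]

m3_gaussian = third_order_moment(gaussian, 0, 0)
m3_nonlinear = third_order_moment(nonlinear, 0, 0)
# The Gaussian estimate stays near zero; the non-linear one does not.
print(abs(m3_gaussian) < abs(m3_nonlinear))
```

The bispectrum named in the abstract is the two-dimensional Fourier transform of the third-order cumulant, which for a zero-mean series equals this moment function taken over the lags (tau1, tau2), so the same idea carries over to the frequency domain.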