The proliferation of forums and blogs leads to challenges and opportunities for processing large amounts of information. The information shared on various topics often contains opinionated words which are qualitative ...The proliferation of forums and blogs leads to challenges and opportunities for processing large amounts of information. The information shared on various topics often contains opinionated words which are qualitative in nature. These qualitative words need statistical computations to convert them into useful quantitative data. This data should be processed properly since it expresses opinions. Each of these opinion bearing words differs based on the significant meaning it conveys. To process the linguistic meaning of words into data and to enhance opinion mining analysis, we propose a novel weighting scheme, referred to as inferred word weighting(IWW). IWW is computed based on the significance of the word in the document(SWD) and the significance of the word in the expression(SWE) to enhance their performance. The proposed weighting methods give an analytic view and provide appropriate weights to the words compared to existing methods. In addition to the new weighting methods, another type of checking is done on the performance of text classification by including stop-words. Generally, stop-words are removed in text processing. When this new concept of including stop-words is applied to the proposed and existing weighting methods, two facts are observed:(1) Classification performance is enhanced;(2) The outcome difference between inclusion and exclusion of stop-words is smaller in the proposed methods, and larger in existing methods. The inferences provided by these observations are discussed. Experimental results of the benchmark data sets show the potential enhancement in terms of classification accuracy.展开更多
词性是自然语言处理的基本要素,词语顺序包含了所传达的语义与语法信息,它们都是自然语言中的关键信息.在word embedding模型中如何有效地将两者结合起来,是目前研究的重点.本文提出的Structured word2vec on POS联合了词语顺序与词性...词性是自然语言处理的基本要素,词语顺序包含了所传达的语义与语法信息,它们都是自然语言中的关键信息.在word embedding模型中如何有效地将两者结合起来,是目前研究的重点.本文提出的Structured word2vec on POS联合了词语顺序与词性两种信息,不仅使模型可以感知词语位置顺序,而且利用词性关联信息来建立上下文窗口内词语之间的固有句法关系.Structured word2vec on POS将词语按其位置顺序定向嵌入,对词向量和词性相关加权矩阵进行联合优化.实验通过词语类比、词相似性任务,证明了所提出的方法的有效性.展开更多
文摘The proliferation of forums and blogs leads to challenges and opportunities for processing large amounts of information. The information shared on various topics often contains opinionated words which are qualitative in nature. These qualitative words need statistical computations to convert them into useful quantitative data. This data should be processed properly since it expresses opinions. Each of these opinion bearing words differs based on the significant meaning it conveys. To process the linguistic meaning of words into data and to enhance opinion mining analysis, we propose a novel weighting scheme, referred to as inferred word weighting(IWW). IWW is computed based on the significance of the word in the document(SWD) and the significance of the word in the expression(SWE) to enhance their performance. The proposed weighting methods give an analytic view and provide appropriate weights to the words compared to existing methods. In addition to the new weighting methods, another type of checking is done on the performance of text classification by including stop-words. Generally, stop-words are removed in text processing. When this new concept of including stop-words is applied to the proposed and existing weighting methods, two facts are observed:(1) Classification performance is enhanced;(2) The outcome difference between inclusion and exclusion of stop-words is smaller in the proposed methods, and larger in existing methods. The inferences provided by these observations are discussed. Experimental results of the benchmark data sets show the potential enhancement in terms of classification accuracy.
文摘词性是自然语言处理的基本要素,词语顺序包含了所传达的语义与语法信息,它们都是自然语言中的关键信息.在word embedding模型中如何有效地将两者结合起来,是目前研究的重点.本文提出的Structured word2vec on POS联合了词语顺序与词性两种信息,不仅使模型可以感知词语位置顺序,而且利用词性关联信息来建立上下文窗口内词语之间的固有句法关系.Structured word2vec on POS将词语按其位置顺序定向嵌入,对词向量和词性相关加权矩阵进行联合优化.实验通过词语类比、词相似性任务,证明了所提出的方法的有效性.