Sentiment analysis is increasingly important in modern natural language processing, and sentiment classification is one of its most popular applications. The crucial part of sentiment classification is feature extraction. In this paper, two approaches to feature extraction, feature selection and feature embedding, are compared, and Word2Vec is then used as the embedding method. In the experiment, Chinese documents are used as the corpus, and three methods are used to obtain the features of a document: average word vectors, Doc2Vec, and weighted average word vectors. The resulting samples are then fed to three machine learning algorithms for classification, of which the support vector machine (SVM) gives the best result. Finally, the parameters of the random forest are analyzed.
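The abstract does not state which weights are used for the weighted average of word vectors. The sketch below assumes TF-IDF weights (term count times the smoothed inverse document frequency ln((1 + n) / (1 + df)) + 1, the default form in scikit-learn's TfidfVectorizer) and uses hypothetical three-dimensional vectors in place of a trained Word2Vec model; the function names are illustrative only. It shows how a tokenized document is reduced to one fixed-length feature vector that a classifier such as an SVM can consume. Doc2Vec, which learns document vectors directly, is not sketched here.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec model.
WORD_VECTORS = {
    "好": [0.9, 0.1, 0.0],     # "good"
    "电影": [0.2, 0.8, 0.1],   # "movie"
    "差": [-0.8, 0.1, 0.2],    # "bad"
}


def average_vector(tokens, vectors):
    """Unweighted mean of the vectors of in-vocabulary tokens (zeros if none)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        dim = len(next(iter(vectors.values())))
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n_docs = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def weighted_average_vector(tokens, vectors, idf):
    """TF-IDF-weighted mean of in-vocabulary token vectors (assumed weighting)."""
    tf = Counter(t for t in tokens if t in vectors)
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for t, count in tf.items():
        w = count * idf.get(t, 1.0)  # unseen tokens fall back to weight 1.0
        total += w
        for i, x in enumerate(vectors[t]):
            acc[i] += w * x
    return [x / total for x in acc] if total else acc


if __name__ == "__main__":
    corpus = [["好", "电影"], ["差", "电影"]]
    idf = idf_weights(corpus)
    print(average_vector(corpus[0], WORD_VECTORS))
    print(weighted_average_vector(corpus[0], WORD_VECTORS, idf))
```

Because "电影" appears in every document, its IDF weight is lower than that of "好", so the weighted average leans toward the sentiment-bearing word, while the plain average treats both words equally.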
We propose an approach to learning sample embeddings for analyzing multi-dimensional datasets. The basic idea is to extract rules from the given dataset and learn the embedding of each sample based on the rules it satisfies. The approach can filter out pattern-irrelevant attributes, producing distinct visual structures in the projection for samples that satisfy the same rules. In addition, analysts can interpret a visual structure through the rules that the involved samples satisfy, which improves the pattern interpretability of the projection. Our research involves two methods for realizing and applying the approach. First, we give a method to learn a rule-based embedding for each sample. Second, we integrate the method into a system that supports an analytical workflow. Case studies on a real-world dataset and quantitative experimental results demonstrate the usability and effectiveness of our approach.
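The abstract does not describe how the rule-based embedding is learned. As a simplified, hypothetical stand-in (not the authors' method), the sketch below represents each sample by a binary vector marking which rules it satisfies and compares samples with Jaccard similarity; nothing is learned here. The rules and attribute names are invented for illustration. Because each rule tests only a few attributes, an attribute that no rule mentions (here `zip`) has no effect on the vector, which mirrors the filtering of pattern-irrelevant attributes described above.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]
Rule = Callable[[Sample], bool]

# Hypothetical rules, as might be mined from a dataset. The attribute "zip"
# is used by no rule, so it cannot influence any sample's rule vector.
RULES: List[Rule] = [
    lambda s: s["age"] < 30 and s["income"] > 50,
    lambda s: s["age"] >= 30,
    lambda s: s["income"] <= 50,
]


def rule_embedding(sample: Sample, rules: List[Rule]) -> List[float]:
    """Binary vector: component i is 1.0 if the sample satisfies rule i."""
    return [1.0 if rule(sample) else 0.0 for rule in rules]


def jaccard(a: List[float], b: List[float]) -> float:
    """Jaccard similarity of two binary rule vectors (1.0 if both are empty)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


if __name__ == "__main__":
    samples = [
        {"age": 25, "income": 60, "zip": 10001},
        {"age": 27, "income": 80, "zip": 94110},
        {"age": 45, "income": 40, "zip": 10001},
    ]
    vectors = [rule_embedding(s, RULES) for s in samples]
    for v in vectors:
        print(v)
    print(jaccard(vectors[0], vectors[1]), jaccard(vectors[0], vectors[2]))
```

Samples 0 and 1 differ in `zip` but satisfy the same rule, so their vectors coincide. In a real system, such vectors (or embeddings learned from them) would then be passed to a projection technique for visualization.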
Funding: National Natural Science Foundation of China (No. 71331008)