In the age of the internet, social media connect us all at our fingertips. People are linked through different social media platforms. The social network Twitter allows people to tweet their thoughts on any particular event or a specific political body, which provides a diverse range of political insights. This paper addresses text processing of a multilingual dataset covering Urdu, English, and Roman Urdu. Its aims are to explore machine learning solutions for sentiment analysis and train models, collect data on government from Twitter, apply sentiment analysis, and provide a Python library that classifies text sentiment. The training data contained tweets in three languages: English (200k), Urdu (200k), and Roman Urdu (11k). Five different classification models are applied to determine sentiment, and the use of an ensemble technique to build on the acquired results is then explored. The Logistic Regression model performed best with an accuracy of 75%, followed by the Linear Support Vector classifier and the Stochastic Gradient Descent model, both at 74% accuracy. Lastly, the Multinomial Naïve Bayes and Complement Naïve Bayes models both achieved 73% accuracy.
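The abstract names its classifiers and an ensemble step but gives no implementation details, so the following is only a minimal sketch under stated assumptions. It shows one of the five model families (Multinomial Naïve Bayes with Laplace smoothing) written in plain Python, plus a simple majority vote as one possible ensemble technique. The class name `TinyMultinomialNB`, the whitespace tokenizer, the `majority_vote` helper, and the toy tweets are all hypothetical illustrations, not the paper's library or data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's real
    # preprocessing for Urdu, English, and Roman Urdu is not described.
    return text.lower().split()


class TinyMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[label] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_vote(predictions):
    # Hard voting: the label predicted by the most models wins.
    return Counter(predictions).most_common(1)[0][0]


# Invented toy tweets (English and Roman Urdu), for illustration only.
TRAIN = [
    ("the government did a great job", "positive"),
    ("great policy good decision", "positive"),
    ("yeh hukumat bohat achi hai", "positive"),
    ("terrible policy bad decision", "negative"),
    ("the government failed again", "negative"),
    ("yeh hukumat bohat buri hai", "negative"),
]

model = TinyMultinomialNB().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("great job"))
print(model.predict("hukumat buri hai"))
print(majority_vote(["positive", "negative", "positive"]))
```

In practice each of the five classifiers would contribute one label per tweet to `majority_vote`; whether the paper uses hard voting or another ensemble scheme is not stated in the abstract.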
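[Purpose/Significance] By summarizing domestic and international research on mapping techniques in the multilingual ontology field and analyzing the EuroWordNet case, this paper provides a reference for establishing mapping mechanisms in cross-language information retrieval systems in China. [Method/Process] EuroWordNet, a relatively mature multilingual ontology database, is selected as the case, and its Inter-Lingual-Index (ILI) mechanism is analyzed from four aspects: database design, ontology construction, concept storage, and the mapping treatment of cultural differences across languages. [Result/Conclusion] An embedded database structure design, concept extraction together with the definition of synset correspondences, fine-grained concept storage, and the establishment of complex equivalence relations are the keys to building a mapping mechanism for cross-language information retrieval.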
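The idea of an inter-lingual index with complex equivalence relations can be illustrated with a small data-structure sketch. This is an assumption-laden toy, not EuroWordNet's actual database schema: the `Synset` and `InterLingualIndex` classes, the `ili:` identifiers, and the synset IDs are hypothetical, and the relation labels `eq_synonym` and `eq_has_hyponym` follow EuroWordNet's naming as commonly described rather than anything listed in the abstract.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Synset:
    language: str      # e.g. "en", "nl", "es"
    synset_id: str     # hypothetical identifier
    lemmas: tuple      # words that share this meaning


@dataclass
class InterLingualIndex:
    # Maps an ILI record id to a list of (equivalence relation, synset) pairs.
    links: dict = field(default_factory=dict)

    def link(self, ili_id, relation, synset):
        self.links.setdefault(ili_id, []).append((relation, synset))

    def translate(self, synset, target_language, relations=("eq_synonym",)):
        """Find target-language synsets that share an ILI record with `synset`."""
        results = []
        for entries in self.links.values():
            if any(s == synset for _, s in entries):
                for relation, s in entries:
                    if s.language == target_language and relation in relations:
                        results.append(s)
        return results


finger_en = Synset("en", "en-001", ("finger",))
toe_en = Synset("en", "en-002", ("toe",))
vinger_nl = Synset("nl", "nl-001", ("vinger",))
dedo_es = Synset("es", "es-001", ("dedo",))  # covers both finger and toe

ili = InterLingualIndex()
ili.link("ili:finger", "eq_synonym", finger_en)
ili.link("ili:finger", "eq_synonym", vinger_nl)
ili.link("ili:toe", "eq_synonym", toe_en)
# A broader Spanish concept is tied to two narrower ILI records.
ili.link("ili:finger", "eq_has_hyponym", dedo_es)
ili.link("ili:toe", "eq_has_hyponym", dedo_es)

print(ili.translate(finger_en, "nl"))
print(ili.translate(finger_en, "es", relations=("eq_synonym", "eq_has_hyponym")))
print(ili.translate(dedo_es, "en"))
```

Routing every language through shared ILI records means each wordnet only needs links to the index, not to every other language, and the non-synonym relations give a place to record cases where one language's concept is broader or narrower than another's.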