The global growth of the Internet and the rapid expansion of social networks such as Facebook make multilingual sentiment analysis of social media content increasingly necessary. This paper performs the first sentiment analysis of code-mixed Bambara-French Facebook comments. We develop four Long Short-Term Memory (LSTM)-based models and two Convolutional Neural Network (CNN)-based models, and use these six models, together with Naïve Bayes and Support Vector Machines (SVM), to conduct experiments on a dataset we constructed. Social media text written in Bambara is scarce. To mitigate this limitation, this paper uses dictionaries of character and word indexes to produce character and word embeddings in place of pre-trained word vectors. We investigate the effect of comment length on the models and compare their performance. The best-performing model is a one-layer CNN, with an accuracy of 83.23%.
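The abstract does not give implementation details for the character and word index dictionaries. A minimal Python sketch of the general idea might look as follows. It assumes whitespace tokenization for words, index 0 reserved for padding and 1 for unknown tokens, and fixed-length sequences. The function names (`build_index`, `encode`, `init_embeddings`) and the toy comments are hypothetical and not taken from the paper. The embedding table starts random and would be trained along with the rest of the network, which is how such embeddings can replace pre-trained word vectors.

```python
import random
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def build_index(docs):
    """Build a token -> index dictionary; index 0 is padding, 1 is unknown."""
    counts = Counter(tok for doc in docs for tok in doc)
    index = {PAD: 0, UNK: 1}
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        index[tok] = len(index)
    return index

def encode(tokens, index, max_len):
    """Map tokens to indexes (unseen -> UNK), then truncate/pad to max_len."""
    ids = [index.get(tok, index[UNK]) for tok in tokens][:max_len]
    return ids + [index[PAD]] * (max_len - len(ids))

def init_embeddings(vocab_size, dim, seed=0):
    """Randomly initialised embedding table; in a real model these rows are trained."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]

# Toy, made-up code-mixed comments (not from the paper's dataset).
comments = ["i ni ce merci", "merci beaucoup"]

word_docs = [c.split() for c in comments]  # word-level tokens
char_docs = [list(c) for c in comments]    # character-level tokens

word_index = build_index(word_docs)
char_index = build_index(char_docs)

word_ids = encode(word_docs[0], word_index, max_len=6)
char_ids = encode(char_docs[0], char_index, max_len=20)

emb = init_embeddings(len(word_index), dim=8)
word_vectors = [emb[i] for i in word_ids]  # one 8-dim vector per position
print(word_ids)  # prints [5, 6, 4, 2, 0, 0]
```

Reserving index 0 for padding is a common convention because many embedding layers can be told to ignore that index; the character-level path works the same way, only with single characters as tokens, which helps with the spelling variation typical of low-resource social media text.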
Chinese-English code-mixing with distinct linguistic features has become prevalent in domestic trademarks over the past three to four decades. The reasons behind this prevalence have not been addressed in previous studies. In this paper, Giddens' modernity theory is used as the theoretical framework to account for the main reason behind the prevalence of Chinese-English code-mixing: English code in Chinese business logos plays a positive role as a reflection of China's modernization and its participation in economic globalization.
The study concerns the individual bilingualism of students in higher education. Single-word mixing (MIX) is part of the process by which they progress from beginner to advanced proficiency. However, learners should be aware that more mixing is not necessarily better; rather, the more "standard and complete" their language use, the better.
Code-mixing is a natural phenomenon in multilingual and bilingual communities. As a result of language contact in China, code-mixing is on the rise. This paper explores the attitudes of contemporary college students towards Chinese-English code-mixing. Through an online survey and data analysis, it finds that the need to speak expressively, to create a humorous effect, and to use euphemism to avoid awkward situations are the three leading factors accounting for code-mixing. As speakers, students tend to use Chinese-English code-mixing in informal situations with people they are close to; as listeners, they expect such expressions to add humor and make topics easier to discuss. In terms of usage, they are more willing to accept English expressions as a way to avoid certain disadvantages. Most remain neutral on whether to support or oppose Chinese-English code-mixing.
Language plays a significant role in business, trade and commerce. Bargaining in open-air markets often involves the speech acts of negotiating, compromising and manoeuvring, which can result either in conflict or in persuading the potential buyer to patronize a seller. This article examined the sociolinguistic aspects of language use between sellers and buyers in Ipata, a popular market in Ilorin, north-central Nigeria. The call strategies, spiels, honorification, pragmatic mechanics and sociolinguistic styles employed by vendors were observed. The objectives of the study were to: identify the number of languages used in Ipata market; investigate the factors that influence the choice among these languages between sellers and buyers at a particular time; analyze their sociolinguistic and stylistic features; and discuss some of the barriers that could cause intercultural communication breakdown between sellers and buyers in a market situation. Oral interviews, systematic observation and Bauman's (2001) method of street recording were used to gather data for the study. Sixty-five people, thirty-six females and twenty-nine males, were interviewed. The survey, which spanned three months, found that open-air markets share some universal features; however, owing to cultural relativity, variations occur. Accordingly, it established that Ipata market harbours different languages, and it identified calls and spiels with their characteristic stylistic, sociolinguistic and discourse features. In conclusion, the study argued that studying the verbal discourse of marketplaces is significant because it reveals the relationship between language, culture and society.
The discussion of the origin, linguistic features, language planning policies and status of HKE (Hong Kong English) aims to present a general overview of HKE. The literature review finds, among other things, that: HKE has a simpler vowel system and a smaller number of vowel contrasts; all fricatives are voiceless for most HKE speakers; HKE speakers tend to simplify final consonant clusters or omit the final consonant; the subject of a relative clause is usually missing in "zero"-subject relatives; code-mixing, code-switching and direct translation (from Cantonese into English, or direct translation of the sound of loanwords) are common in HKE speakers' speech; and HKE enjoys the status of a second language in Hong Kong.
Purpose - Normalization is an important step in any natural language processing application that handles social media text. Social media text poses problems that are not present in regular text. Recently, a considerable amount of work has been done in this direction, but mostly for English. Users whose native language is not English often code-mix it with English and post such text on social media in the Roman script, which further aggravates the normalization problem. This paper discusses the concept of normalization with respect to code-mixed social media text and proposes a model to normalize such text. Design/methodology/approach - The system is divided into two phases: candidate generation and most-probable-sentence selection. Candidate generation is treated as a machine translation task in which Roman text is the source language and Gurmukhi text is the target language. A character-based translation system is proposed to generate candidate tokens. Once candidates are generated, the second phase uses beam search to select the most probable sentence based on a hidden Markov model. Findings - Character error rate (CER) and bilingual evaluation understudy (BLEU) scores are reported. The proposed system is compared with the Akhar software and the RB_R2G system, both of which can also transliterate Roman text to Gurmukhi. The proposed system outperforms the Akhar software. For ill-formed text, the CER and BLEU scores are 0.268121 and 0.6807939, respectively. Research limitations/implications - The system was observed to produce dialectal variations of a word, or words with minor errors such as missing diacritics. A spell checker could improve the output by correcting these minor errors. Extensive experimentation is needed to optimize the language identifier, which would further improve the output. The language model also needs further exploration. Incorporating wider context, particularly from social media text, is an important area for further investigation. Practical implications - The practical contributions of this study are: (1) a parallel dataset of Roman and Gurmukhi text; (2) a dataset annotated with language tags; (3) a normalization system, the first of its kind, that proposes a translation-based solution for normalizing noisy social media text from Roman to Gurmukhi and can be extended to any pair of scripts; and (4) better analysis of social media text using the proposed system. Theoretically, the study contributes to a better understanding of text normalization in the social media context and opens the door to further research on multilingual social media text normalization. Originality/value - Existing research focuses on normalizing monolingual text. This study contributes towards the development of a normalization system for multilingual text.
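The second phase of the normalization system can be illustrated with a generic HMM-style beam search, sketched below in Python. The sketch assumes that each Roman input token already has a list of scored target-script candidates (standing in for the output of the character-based translation phase), that adjacent target tokens are scored by a bigram model acting as the HMM transition term, and that candidate scores act as emission terms, with everything summed in log space. All function names, tokens and probabilities are hypothetical placeholders; this is not the authors' implementation.

```python
import math

def make_bigram_logp(table, floor=1e-6):
    """Return log P(tok | prev) from a toy bigram table, flooring unseen pairs."""
    def logp(prev, tok):
        return math.log(table.get((prev, tok), floor))
    return logp

def beam_search(candidates, transition_logp, beam_width=3, start="<s>"):
    """HMM-style beam search over per-position candidate tokens.

    candidates[t] is a list of (target_token, emission_logp) pairs for input
    position t; transition_logp(prev, tok) scores adjacent target tokens.
    Returns (best_token_sequence, total_log_score).
    """
    beam = [([start], 0.0)]
    for options in candidates:
        expanded = []
        for seq, score in beam:
            for tok, emit_lp in options:
                # HMM score: transition from the previous state + emission of this state.
                new_score = score + transition_logp(seq[-1], tok) + emit_lp
                expanded.append((seq + [tok], new_score))
        # Prune to the `beam_width` highest-scoring partial sentences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
    best_seq, best_score = beam[0]
    return best_seq[1:], best_score  # drop the start symbol

# Hypothetical placeholder tokens: A1/A2 are candidates for the first Roman
# token, B1/B2 for the second; all probabilities are invented for illustration.
candidates = [
    [("A1", math.log(0.6)), ("A2", math.log(0.4))],
    [("B1", math.log(0.5)), ("B2", math.log(0.5))],
]
lm = make_bigram_logp({
    ("<s>", "A1"): 0.5, ("<s>", "A2"): 0.5,
    ("A1", "B1"): 0.1, ("A1", "B2"): 0.1,
    ("A2", "B1"): 0.9, ("A2", "B2"): 0.1,
})

best, score = beam_search(candidates, lm, beam_width=2)
print(best, round(math.exp(score), 4))  # prints ['A2', 'B1'] 0.09
```

With `beam_width=1` the search degenerates to greedy decoding: it commits to A1 because of its higher candidate score and can no longer reach the better sentence A2 B1 (probability 0.09 versus 0.015), which is why keeping several hypotheses and scoring them with sentence context matters.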
Funding: Supported by the National Natural Science Foundation of China (61272451, 61572380, 61772383 and 61702379) and the Major State Basic Research Development Program of China (2014CB340600).