期刊文献+
共找到10篇文章
< 1 >
每页显示 20 50 100
Sentiment Analysis on Twitter Data Using Term Frequency-Inverse Document Frequency
1
作者 Akash Addiga Sikha Bagui 《Journal of Computer and Communications》 2022年第8期117-128,共12页
This study is an exploratory analysis of applying natural language processing techniques such as Term Frequency-Inverse Document Frequency and Sentiment Analysis on Twitter data. The uniqueness of this work is establi... This study is an exploratory analysis of applying natural language processing techniques such as Term Frequency-Inverse Document Frequency and Sentiment Analysis on Twitter data. The uniqueness of this work is established by determining the overall sentiment of a politician’s tweets based on TF-IDF values of terms used in their published tweets. By calculating the TF-IDF value of terms from the corpus, this work displays the correlation between TF-IDF score and polarity. The results of this work show that calculating the TF-IDF score of the corpus allows for a more accurate representation of the overall polarity since terms are given a weight based on their uniqueness and relevance rather than just the frequency at which they appear in the corpus. 展开更多
关键词 Sentiment Analysis Twitter Data Term frequency Inverse Term frequency Term frequency-Inverse document frequency (TF-IDF) Social Media
下载PDF
A Machine Learning-Based Technique with Intelligent WordNet Lemmatize for Twitter Sentiment Analysis
2
作者 S.Saranya G.Usha 《Intelligent Automation & Soft Computing》 SCIE 2023年第4期339-352,共14页
Laterally with the birth of the Internet,the fast growth of mobile stra-tegies has democratised content production owing to the widespread usage of social media,resulting in a detonation of short informal writings.Twi... Laterally with the birth of the Internet,the fast growth of mobile stra-tegies has democratised content production owing to the widespread usage of social media,resulting in a detonation of short informal writings.Twitter is micro-blogging short text and social networking services,with posted millions of quick messages.Twitter analysis addresses the topic of interpreting users’tweets in terms of ideas,interests,and views in a range of settings andfields.This type of study can be useful for a variation of academics and applications that need knowing people’s perspectives on a given topic or event.Although sentiment examination of these texts is useful for a variety of reasons,it is typically seen as a difficult undertaking due to the fact that these messages are frequently short,informal,loud,and rich in linguistic ambiguities such as polysemy.Furthermore,most contemporary sentiment analysis algorithms are based on clean data.In this paper,we offers a machine-learning-based sentiment analysis method that extracts features from Term Frequency and Inverse Document Frequency(TF-IDF)and needs to apply deep intelligent wordnet lemmatize to improve the excellence of tweets by removing noise.We also utilise the Random Forest network to detect the emotion of a tweet.To authenticate the proposed approach performance,we conduct extensive tests on publically accessible datasets,and thefindings reveal that the suggested technique significantly outperforms sentiment classification in multi-class emotion text data. 展开更多
关键词 Random Forest sentiment analysis social media term frequency and inverse document frequency TWITTER wordnet lemmatize
下载PDF
Environmental complaint insights through text mining based on the driver,pressure,state,impact,and response(DPSIR)framework:Evidence from an Italian environmental agency
3
作者 Fabiana MANSERVISI Michele BANZI +5 位作者 Tomaso TONELLI Paolo VERONESI Susanna RICCI Damiano DISTANTE Stefano FARALLI Giuseppe BORTONE 《Regional Sustainability》 2023年第3期261-281,共21页
Individuals,local communities,environmental associations,private organizations,and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality,illegal waste disposal,wa... Individuals,local communities,environmental associations,private organizations,and public representatives and bodies may all be aggrieved by environmental problems concerning poor air quality,illegal waste disposal,water contamination,and general pollution.Environmental complaints represent the expressions of dissatisfaction with these issues.As the timeconsuming of managing a large number of complaints,text mining may be useful for automatically extracting information on stakeholder priorities and concerns.The paper used text mining and semantic network analysis to crawl relevant keywords about environmental complaints from two online complaint submission systems:online claim submission system of Regional Agency for Prevention,Environment and Energy(Arpae)(“Contact Arpae”);and Arpae's internal platform for environmental pollution(“Environmental incident reporting portal”)in the Emilia-Romagna Region,Italy.We evaluated the total of 2477 records and classified this information based on the claim topic(air pollution,water pollution,noise pollution,waste,odor,soil,weather-climate,sea-coast,and electromagnetic radiation)and geographical distribution.Then,this paper used natural language processing to extract keywords from the dataset,and classified keywords ranking higher in Term Frequency-Inverse Document Frequency(TF-IDF)based on the driver,pressure,state,impact,and response(DPSIR)framework.This study provided a systemic approach to understanding the interaction between people and environment in different geographical contexts and builds sustainable and healthy communities.The results showed that most complaints are from the public and associated with air pollution and odor.Factories(particularly foundries and ceramic industries)and farms are identified as the drivers of environmental issues.Citizen believed that environmental issues mainly affect human well-being.Moreover,the keywords of“odor”,“report”,“request”,“presence”,“municipality”,and“hours”were the most influential and meaningful concepts,as demonstrated by their high degree and betweenness centrality values.Keywords connecting odor(classified as impacts)and air pollution(classified as state)were the most important(such as“odor-burnt plastic”and“odor-acrid”).Complainants perceived odor annoyance as a primary environmental concern,possibly related to two main drivers:“odor-factory”and“odorsfarms”.The proposed approach has several theoretical and practical implications:text mining may quickly and efficiently address citizen needs,providing the basis toward automating(even partially)the complaint process;and the DPSIR framework might support the planning and organization of information and the identification of stakeholder concerns and priorities,as well as metrics and indicators for their assessment.Therefore,integration of the DPSIR framework with the text mining of environmental complaints might generate a comprehensive environmental knowledge base as a prerequisite for a wider exploitation of analysis to support decision-making processes and environmental management activities. 展开更多
关键词 Environmental complaints Text mining approach Term frequency-Inverse document frequency(TF-IDF) DRIVER PRESSURE STATE impact and response(DPSIR)framework Semantic network analysis Regional Agency for Prevention Environment and Energy(Arpae)
下载PDF
ISTC: A New Method for Clustering Search Results 被引量:2
4
作者 ZHANG Wei XU Baowen +1 位作者 ZHANG Weifeng XU Junling 《Wuhan University Journal of Natural Sciences》 CAS 2008年第4期501-504,共4页
A new common phrase scoring method is proposed according to term frequency-inverse document frequency (TFIDF) and independence of the phrase. Combining the two properties can help identify more reasonable common phr... A new common phrase scoring method is proposed according to term frequency-inverse document frequency (TFIDF) and independence of the phrase. Combining the two properties can help identify more reasonable common phrases, which improve the accuracy of clustering. Also, the equation to measure the in-dependence of a phrase is proposed in this paper. The new algorithm which improves suffix tree clustering algorithm (STC) is named as improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system is implemented and used to cluster several groups of web search results obtained from Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering. 展开更多
关键词 Web search results clustering suffix tree term frequency-inverse document frequency (TFIDF) independence of phrases
下载PDF
Spark 平台下基于加权词向量的文本分类方法
5
作者 蔡宇翔 王佳斌 郑天华 《现代计算机》 2022年第3期25-30,共6页
针对Spark平台下文本分类中文本表示方法不够完善,导致分类准确率低的问题,结合SparkML下的TF-IDF算法和Word2vec模型,提出一种基于SparkML的加权词向量文本表示方法。首先对文本进行分词,去停用词等预处理,基于SparkML计算出每个词语... 针对Spark平台下文本分类中文本表示方法不够完善,导致分类准确率低的问题,结合SparkML下的TF-IDF算法和Word2vec模型,提出一种基于SparkML的加权词向量文本表示方法。首先对文本进行分词,去停用词等预处理,基于SparkML计算出每个词语的词频和逆文档频率,同时计算词语的词向量。使用词语的TF-IDF值作为词向量的权重,将文本表示为加权词向量,再使用SVM分类器进行分类。在THUNews数据集上进行实验。实验结果表明,该方法相比于传统的TF-IDF算法、平均Word2Vec词向量文本表示,可以提升分类的精度。 展开更多
关键词 SPARK 文本分类 TF-IDF(term frequency-inverse document frequency) Word2Vec 支持向量机 文本表示
下载PDF
Enhanced Topic-Aware Summarization Using Statistical Graph Neural Networks
6
作者 Ayesha Khaliq Salman Afsar Awan +2 位作者 Fahad Ahmad Muhammad Azam Zia Muhammad Zafar Iqbal 《Computers, Materials & Continua》 SCIE EI 2024年第8期3221-3242,共22页
The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity.Curr... The rapid expansion of online content and big data has precipitated an urgent need for efficient summarization techniques to swiftly comprehend vast textual documents without compromising their original integrity.Current approaches in Extractive Text Summarization(ETS)leverage the modeling of inter-sentence relationships,a task of paramount importance in producing coherent summaries.This study introduces an innovative model that integrates Graph Attention Networks(GATs)with Transformer-based Bidirectional Encoder Representa-tions from Transformers(BERT)and Latent Dirichlet Allocation(LDA),further enhanced by Term Frequency-Inverse Document Frequency(TF-IDF)values,to improve sentence selection by capturing comprehensive topical information.Our approach constructs a graph with nodes representing sentences,words,and topics,thereby elevating the interconnectivity and enabling a more refined understanding of text structures.This model is stretched to Multi-Document Summarization(MDS)from Single-Document Summarization,offering significant improvements over existing models such as THGS-GMM and Topic-GraphSum,as demonstrated by empirical evaluations on benchmark news datasets like Cable News Network(CNN)/Daily Mail(DM)and Multi-News.The results consistently demonstrate superior performance,showcasing the model’s robustness in handling complex summarization tasks across single and multi-document contexts.This research not only advances the integration of BERT and LDA within a GATs but also emphasizes our model’s capacity to effectively manage global information and adapt to diverse summarization challenges. 展开更多
关键词 Summarization graph attention network bidirectional encoder representations from transformers Latent Dirichlet Allocation term frequency-inverse document frequency
下载PDF
Hybrid Approach to Document Anomaly Detection:An Application to Facilitate RPA in Title Insurance
7
作者 Abhijit Guha Debabrata Samanta 《International Journal of Automation and computing》 EI CSCD 2021年第1期55-72,共18页
Anomaly detection(AD)is an important aspect of various domains and title insurance(TI)is no exception.Robotic process automation(RPA)is taking over manual tasks in TI business processes,but it has its limitations with... Anomaly detection(AD)is an important aspect of various domains and title insurance(TI)is no exception.Robotic process automation(RPA)is taking over manual tasks in TI business processes,but it has its limitations without the support of artificial intelligence(AI)and machine learning(ML).With increasing data dimensionality and in composite population scenarios,the complexity of detecting anomalies increases and AD in automated document management systems(ADMS)is the least explored domain.Deep learning,being the fastest maturing technology can be combined along with traditional anomaly detectors to facilitate and improve the RPAs in TI.We present a hybrid model for AD,using autoencoders(AE)and a one-class support vector machine(OSVM).In the present study,OSVM receives input features representing real-time documents from the TI business,orchestrated and with dimensions reduced by AE.The results obtained from multiple experiments are comparable with traditional methods and within a business acceptable range,regarding accuracy and performance. 展开更多
关键词 Anomaly detection title insurance autoencoder one-class support vector machine(OSVM) term frequency-inverse document frequency(TF-IDF) robotic process automation dimensionality reduction
原文传递
Fusion Model for Tentative Diagnosis Inference Based on Clinical Narratives
8
作者 Ying Yu Junwen Duan Min Li 《Tsinghua Science and Technology》 SCIE EI CAS CSCD 2023年第4期686-695,共10页
In general,physicians make a preliminary diagnosis based on patients’admission narratives and admission conditions,largely depending on their experiences and professional knowledge.An automatic and accurate tentative... In general,physicians make a preliminary diagnosis based on patients’admission narratives and admission conditions,largely depending on their experiences and professional knowledge.An automatic and accurate tentative diagnosis based on clinical narratives would be of great importance to physicians,particularly in the shortage of medical resources.Despite its great value,little work has been conducted on this diagnosis method.Thus,in this study,we propose a fusion model that integrates the semantic and symptom features contained in the clinical text.The semantic features of the input text are initially captured by an attention-based Bidirectional Long Short-Term Memory(BiLSTM)network.The symptom concepts,recognized from the input text,are then vectorized by using the term frequency-inverse document frequency method based on the relations between symptoms and diseases.Finally,two fusion strategies are utilized to recommend the most potential candidate for the international classification of diseases code.Model training and evaluation are performed on a public clinical dataset.The results show that both fusion strategies achieved a promising performance,in which the best performance obtained a top-3 accuracy of 0.7412. 展开更多
关键词 tentative diagnosis clinical narrative Bidirectional Long Short-Term Memory(BiLSTM) Term frequencyInverse document frequency(TF-IDF) fusion strategy
原文传递
Spontaneous Language Analysis in Alzheimer’s Disease:Evaluation of Natural Language Processing Technique for Analyzing Lexical Performance
9
作者 刘宁 袁贞明 《Journal of Shanghai Jiaotong university(Science)》 EI 2022年第2期160-167,共8页
Language disorder,a common manifestation of Alzheimer’s disease(AD),has attracted widespread attention in recent years.This paper uses a novel natural language processing(NLP)method,compared with latest deep learning... Language disorder,a common manifestation of Alzheimer’s disease(AD),has attracted widespread attention in recent years.This paper uses a novel natural language processing(NLP)method,compared with latest deep learning technology,to detect AD and explore the lexical performance.Our proposed approach is based on two stages.First,the dialogue contents are summarized into two categories with the same category.Second,term frequency—inverse document frequency(TF-IDF)algorithm is used to extract the keywords of transcripts,and the similarity of keywords between the groups was calculated separately by cosine distance.Several deep learning methods are used to compare the performance.In the meanwhile,keywords with the best performance are used to analyze AD patients’lexical performance.In the Predictive Challenge of Alzheimer’s Disease held by iFlytek in 2019,the proposed AD diagnosis model achieves a better performance in binary classification by adjusting the number of keywords.The F1 score of the model has a considerable improvement over the baseline of 75.4%,and the training process of which is simple and efficient.We analyze the keywords of the model and find that AD patients use less noun and verb than normal controls.A computer-assisted AD diagnosis model on small Chinese dataset is proposed in this paper,which provides a potential way for assisting diagnosis of AD and analyzing lexical performance in clinical setting. 展开更多
关键词 natural language processing(NLP) Alzheimer's disease(AD) mild cognitive impairment term frequency-inverse document frequency(TF-IDF) bag of words
原文传递
Web-Based Biomedical Literature Mining
10
作者 安建福 薛惠平 +2 位作者 陈瑛 吴建国 章鲁 《Journal of Shanghai Jiaotong university(Science)》 EI 2012年第4期494-499,共6页
With an upsurge in biomedical literature,using data-mining method to search new knowledge from literature has drawing more attention of scholars.In this study,taking the mining of non-coding gene literature from the n... With an upsurge in biomedical literature,using data-mining method to search new knowledge from literature has drawing more attention of scholars.In this study,taking the mining of non-coding gene literature from the network database of PubMed as an example,we first preprocessed the abstract data,next applied the term occurrence frequency(TF) and inverse document frequency(IDF)(TF-IDF) method to select features,and then established a biomedical literature data-mining model based on Bayesian algorithm.Finally,we assessed the model through area under the receiver operating characteristic curve(AUC),accuracy,specificity,sensitivity,precision rate and recall rate.When 1 000 features are selected,AUC,specificity,sensitivity,accuracy rate,precision rate and recall rate are 0.868 3,84.63%,89.02%,86.83%,89.02% and 98.14%,respectively.These results indicate that our method can identify the targeted literature related to a particular topic effectively. 展开更多
关键词 Bayesian algorithm term occurrence frequency(TF) and inverse document frequency(IDF)(TFIDF) DATA-MINING
原文传递
上一页 1 下一页 到第
使用帮助 返回顶部