Abstract: Alongside the birth of the Internet, the rapid growth of mobile strategies has democratised content production through the widespread use of social media, resulting in an explosion of short, informal texts. Twitter is a micro-blogging and social networking service on which millions of short messages are posted. Twitter analysis addresses the problem of interpreting users' tweets in terms of ideas, interests, and views across a range of settings and fields. This type of study can be useful for a variety of academic studies and applications that need to know people's perspectives on a given topic or event. Although sentiment analysis of these texts is useful for many reasons, it is typically regarded as a difficult task because the messages are frequently short, informal, noisy, and rich in linguistic ambiguities such as polysemy. Furthermore, most contemporary sentiment analysis algorithms assume clean data. In this paper, we propose a machine-learning-based sentiment analysis method that extracts Term Frequency-Inverse Document Frequency (TF-IDF) features and applies deep intelligent WordNet lemmatization to improve the quality of tweets by removing noise. We also use a Random Forest model to detect the emotion of a tweet. To validate the performance of the proposed approach, we conduct extensive tests on publicly available datasets, and the findings show that the proposed technique significantly improves sentiment classification performance on multi-class emotion text data.
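The noise-removal and TF-IDF steps mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `clean_tweet` and `tfidf` helpers, the cleaning rules, the example tweets, and the weighting variant (tf = count / document length, idf = log(N / df)) are all assumptions chosen for illustration, and the WordNet lemmatization and Random Forest stages are left out.

```python
import math
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet, drop URLs and @mentions, and keep alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z]+", text)


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up example tweets, for illustration only.
tweets = [
    "Loving the new phone!! https://example.com",
    "@support my phone is broken, so angry",
    "What a lovely sunny day",
]
tokens = [clean_tweet(t) for t in tweets]
vectors = tfidf(tokens)
```

In this toy corpus "phone" occurs in two of the three tweets, so it receives a lower weight than a term such as "loving" that occurs in only one. The resulting weight dictionaries are the kind of feature representation a classifier would then be trained on.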
Funding: Project (No. 60082003) supported by the National Natural Science Foundation of China
Abstract: This paper presents an improved term frequency/inverse document frequency (TF-IDF) approach that uses confidence, support and characteristic words to enhance the recall and precision of text classification. Synonyms defined by a lexicon are processed in the improved TF-IDF approach. We discuss and analyze in detail the relationship among confidence, recall and precision. Experiments in the science and technology domain gave promising results: the new TF-IDF approach improves the precision and recall of text classification compared with the conventional TF-IDF approach.
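The abstract does not define confidence and support. One plausible reading, assumed here, is the association-rule sense applied to a rule "term implies class": support is the fraction of all documents that contain the term and belong to the class, and confidence is the fraction of documents containing the term that belong to the class. The sketch below follows that assumption only; the function name `support_confidence` and the toy data are hypothetical, and the way the paper combines these values with TF-IDF is not shown.

```python
def support_confidence(docs, labels, term, target):
    """Support and confidence of the rule `term -> target`.

    docs: list of token lists; labels: one class label per document.
    Definitions follow the association-rule convention (an assumption).
    """
    # Labels of the documents that contain the term.
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    both = sum(1 for lab in with_term if lab == target)
    support = both / len(docs) if docs else 0.0
    confidence = both / len(with_term) if with_term else 0.0
    return support, confidence


# Made-up labelled documents, for illustration only.
docs = [
    ["laser", "optics", "beam"],
    ["laser", "surgery"],
    ["gene", "protein"],
    ["laser", "diode", "optics"],
]
labels = ["physics", "medicine", "biology", "physics"]
sup, conf = support_confidence(docs, labels, "laser", "physics")
```

Here "laser" appears in three documents, two of which are labelled "physics", giving a support of 2/4 and a confidence of 2/3. Under this reading, a term with high confidence for a class is a natural candidate for a characteristic word of that class.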
Abstract: With the upsurge in biomedical literature, using data-mining methods to discover new knowledge from the literature has drawn increasing attention from scholars. In this study, taking the mining of non-coding gene literature from the PubMed online database as an example, we first preprocessed the abstract data, then applied the term frequency-inverse document frequency (TF-IDF) method to select features, and then established a biomedical literature data-mining model based on a Bayesian algorithm. Finally, we assessed the model using the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, precision and recall. When 1,000 features are selected, the AUC, specificity, sensitivity, accuracy, precision and recall are 0.8683, 84.63%, 89.02%, 86.83%, 89.02% and 98.14%, respectively. These results indicate that our method can effectively identify the targeted literature related to a particular topic.
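The evaluation measures listed in this abstract can be computed from a binary confusion matrix, as in the sketch below. It uses the standard textbook definitions, and the counts are made up for illustration; they are not the study's data. Under these definitions recall is the same quantity as sensitivity, so it is not computed separately; the abstract reports the two as distinct figures and does not state the definitions it uses.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 80 true positives, 20 false positives,
# 90 true negatives, 10 false negatives.
m = binary_metrics(tp=80, fp=20, tn=90, fn=10)
```

AUC is not included because it depends on the ranking of predicted scores across all thresholds and cannot be derived from a single confusion matrix.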