Abstract: Sentiment analysis is the task of classifying opinions expressed in any form of online text. The spread of social media has made sentiment analysis part of daily life. Social media applications and websites have become the foremost source of data for sentiment analysis in various fields. These platforms carry a wide range of subject matter, such as movie and product reviews, consumer opinions, and testimonies, all of which can be used for sentiment analysis. Rapidly uncovering the opinions in this web content offers many benefits, of which profit-making is one of the most vital. According to a recent study, 81% of consumers conduct online research prior to making a purchase. However, the reviews available online are too numerous for humans to process and analyze. Hence, machine learning classifiers are among the prominent tools used to classify sentiment and extract valuable information for companies such as hotels and game companies. Understanding people's sentiments towards different commodities helps to improve services for contextual promotions, referral systems, and market research. Therefore, this study proposes a sentiment-based detection framework to enable the rapid uncovering of opinionated content in hotel reviews. A Naive Bayes classifier was used to process and analyze the dataset to detect the polarity of the words. The dataset from Datafiniti’s Business Database, obtained from Kaggle, was used for the experiments in this study. The performance evaluation of the model shows a test accuracy of 96.08%, an F1-score of 96.00%, a precision of 96.00%, and a recall of 96.00%. The results were compared with state-of-the-art classifiers and showed promising performance, with much better performance metrics.
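As a rough illustration of how a Naive Bayes classifier can assign polarity to review text, the following is a minimal sketch of a multinomial Naive Bayes model with Laplace smoothing. The toy reviews, the labels, and the function names `train_nb` and `predict_nb` are hypothetical and are not taken from the study, whose preprocessing and dataset are not described here.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model; alpha is the Laplace smoothing constant."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"class_counts": class_counts, "word_counts": word_counts,
            "vocab": vocab, "alpha": alpha, "n_docs": len(docs)}

def predict_nb(model, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n in model["class_counts"].items():
        score = math.log(n / model["n_docs"])  # log prior
        total = sum(model["word_counts"][label].values())
        for t in tokens:
            count = model["word_counts"][label][t]
            score += math.log((count + model["alpha"]) /
                              (total + model["alpha"] * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy hotel reviews, for illustration only.
reviews = ["great room and friendly staff", "clean hotel great location",
           "dirty room and rude staff", "terrible service dirty bathroom"]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(reviews, labels)
print(predict_nb(model, "friendly staff great location"))
print(predict_nb(model, "rude staff terrible bathroom"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing constant keeps a word that never appeared with a class from forcing that class's probability to zero.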
Funding: This work is supported in part by the National Science Foundation of China (Nos. 61672392, 61373038) and in part by the National Key Research and Development Program of China (No. 2016YFC1202204).
Abstract: As software grows in scale, software updates and maintenance become increasingly important. However, frequent code updates make software more likely to introduce new defects, so predicting defects quickly and accurately for each software change has become an important problem for developers. Current defect prediction methods often fail to capture the feature information of defects comprehensively, and their detection performance is not yet satisfactory. Therefore, this paper proposes a novel defect prediction model named ITNB (Improved Transfer Naive Bayes), based on an improved transfer Naive Bayes algorithm, which mainly considers two aspects: (1) because edge data in the test set may affect the similarity calculation and the final prediction result, the edge data of the test set are removed when calculating the data similarity between the training set and the test set; (2) because each feature dimension affects defect prediction differently, a formula for training data weights is constructed from feature dimension weights and data gravity, and the prior and conditional probabilities of the training data are then calculated from the weight information, yielding a weighted Bayesian classifier for software defect prediction. To evaluate the ITNB model, we use six datasets from large open source projects, namely Bugzilla, Columba, Mozilla, JDT, Platform and PostgreSQL, and compare the ITNB model with the transfer Naive Bayes (TNB) model. The experimental results show that the ITNB model achieves better results than the TNB model in terms of accuracy, precision and pd for within-project and cross-project defect prediction.
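To make the idea of computing prior and conditional probabilities from training data weights concrete, here is a minimal sketch of an instance-weighted Naive Bayes classifier over discretized features. It is not the ITNB formula: the abstract does not give the weight calculation, so the per-instance weights are simply supplied as input, and the feature values, weights, and function names are hypothetical.

```python
from collections import defaultdict

def fit_weighted_nb(X, y, weights):
    """Estimate Laplace-smoothed priors and conditionals from weighted counts."""
    classes = sorted(set(y))
    n_features = len(X[0])
    total_w = sum(weights)
    class_w = defaultdict(float)          # class -> summed instance weight
    feat_w = defaultdict(float)           # (feature index, value, class) -> summed weight
    values = [set() for _ in range(n_features)]
    for row, c, w in zip(X, y, weights):
        class_w[c] += w
        for j, v in enumerate(row):
            feat_w[(j, v, c)] += w
            values[j].add(v)
    prior = {c: (class_w[c] + 1.0) / (total_w + len(classes)) for c in classes}

    def cond(j, v, c):
        # Weighted count of value v in class c, smoothed over the values seen for feature j.
        return (feat_w[(j, v, c)] + 1.0) / (class_w[c] + len(values[j]))

    return classes, prior, cond

def predict_weighted_nb(model, row):
    """Return the class maximizing prior times the product of conditionals."""
    classes, prior, cond = model
    scores = {}
    for c in classes:
        p = prior[c]
        for j, v in enumerate(row):
            p *= cond(j, v, c)
        scores[c] = p
    return max(scores, key=scores.get)

# Hypothetical discretized change metrics and instance weights, for illustration only.
X = [("high", "many"), ("high", "few"), ("low", "few"), ("low", "many")]
y = ["buggy", "buggy", "clean", "clean"]
weights = [2.0, 1.0, 1.0, 0.5]  # stand-ins for similarity-derived weights
wnb = fit_weighted_nb(X, y, weights)
print(predict_weighted_nb(wnb, ("high", "many")))
print(predict_weighted_nb(wnb, ("low", "few")))
```

Training instances with larger weights contribute more to both the class priors and the per-feature conditionals, which is how a transfer setting can favor source data that resembles the target project.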
Funding: The National Natural Science Foundation of China (No. 60072006).
Abstract: Based on the current state of research on component retrieval, the component description model based on facet classification is improved by adding semantic features. Furthermore, a component retrieval process model is put forward by combining the domain ontology with the relative concept match algorithm. A component reasoning engine and a component classification engine are described in detail, and a component classification algorithm is provided that uses the Naive Bayes algorithm based on domain ontology. The experimental results show that the recall ratio and the precision ratio are clearly improved by the semantics-based method, and demonstrate the feasibility and effectiveness of the proposed method.
Funding: Supported by the Jiangsu Province University Students Practice Innovation and Entrepreneurship Training Program (Project No. 201910329031Y, "Research on the influence of new media platform of Public Security Colleges under the background of big data"); "Research on the reform and innovation of network public opinion teaching in public security colleges and universities from the perspective of overall national security" (Project No. C-B/2020/01/27); the Jiangsu Province modern education technology research project "Research on the innovation of public security network public opinion teaching mode based on modern information technology" (Project No. 2017-R-59195); and the key teaching reform project of Jiangsu Police Institute "Research on the reconstruction of online and offline hybrid 'golden course' teaching system of Internet information inspection course" (Project No. 2019A30).
Abstract: This paper analyzes the microblog data published by the official account in a certain province of China to find out which kinds of Weibo posts are more likely to be forwarded, from the perspective of new police media. A new topic-based model is proposed. First, the LDA topic clustering algorithm is used to extract the topic categories with forwarding heat from microblogs with high forwarding numbers; then the Naive Bayesian algorithm is applied to the topic categories. The sample data is processed to predict the type of microblog forwarding. To evaluate this method, a large amount of online microblog data is analyzed. The experimental results show that the proposed method can accurately predict the forwarding of Weibo.
Funding: With partial support from Spain’s grants TIN2013-42351-P, TIN2016-76406-P, TIN2015-70308-REDT, as well as S2013/ICE-2845 CASI-CAM-CM; supported also by project FACIL–Ayudas Fundación BBVA a Equipos de Investigación Científica 2016.
Abstract: General noise cost functions have recently been proposed for support vector regression (SVR). When applied to tasks whose underlying noise distribution is similar to the one assumed for the cost function, these models should perform better than classical ε-SVR. On the other hand, uncertainty estimates for SVR have received limited attention in the literature until now and still have unaddressed problems. With this in mind, three main goals are addressed here. First, we propose a framework that combines general noise SVR models with the naive online R minimization algorithm (NORMA) as the optimization method, and then gives nonconstant error intervals that depend on the input data, aided by clustering techniques. We give the theoretical details required to implement this framework for Laplace, Gaussian, Beta, Weibull and Marshall–Olkin generalized exponential distributions. Second, we test the proposed framework on two real-world regression problems using data from two public competitions about solar energy. Results show the validity of our models and an improvement over classical ε-SVR. Finally, in accordance with the principle of reproducible research, we make sure that the data and model implementations used for the experiments are easily and publicly accessible.
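The combination of a noise-specific cost function with NORMA can be sketched as online kernel gradient descent in which only the loss derivative changes. The sketch below is an assumption-laden illustration, not the authors' implementation: it assumes the cost is the negative log-likelihood of the assumed noise density (absolute error for Laplace, squared error for Gaussian), uses a one-dimensional RBF kernel, and all function names, hyperparameters, and the sine-wave data are hypothetical.

```python
import math

def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (x - z) ** 2)

def eps_insensitive_grad(r, eps=0.1):
    """Derivative of max(0, |r| - eps) with respect to the prediction, r = f(x) - y."""
    if abs(r) <= eps:
        return 0.0
    return 1.0 if r > 0 else -1.0

def laplace_grad(r, b=1.0):
    """Derivative of |r| / b, the Laplace negative log-likelihood up to a constant."""
    if r == 0:
        return 0.0
    return (1.0 if r > 0 else -1.0) / b

def gaussian_grad(r, sigma=1.0):
    """Derivative of r^2 / (2 sigma^2), the Gaussian negative log-likelihood up to a constant."""
    return r / sigma ** 2

def norma_fit(xs, ys, loss_grad, eta=0.2, lam=0.01, gamma=1.0, epochs=20):
    """NORMA-style update: shrink old coefficients, then add one kernel term per sample."""
    centers, alphas = [], []
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = sum(a * rbf(c, x, gamma) for a, c in zip(alphas, centers))
            g = loss_grad(pred - y)
            alphas = [(1.0 - eta * lam) * a for a in alphas]  # regularization shrinkage
            if g != 0.0:
                centers.append(x)
                alphas.append(-eta * g)  # gradient step along k(x, .)

    def predict(x):
        return sum(a * rbf(c, x, gamma) for a, c in zip(alphas, centers))

    return predict

# Hypothetical noise-free toy data, for illustration only.
xs = [0.5 * i for i in range(13)]
ys = [math.sin(x) for x in xs]
predict_gauss = norma_fit(xs, ys, gaussian_grad)
predict_eps = norma_fit(xs, ys, eps_insensitive_grad)
mae_gauss = sum(abs(predict_gauss(x) - y) for x, y in zip(xs, ys)) / len(xs)
baseline_mae = sum(abs(y) for y in ys) / len(ys)  # error of always predicting 0
print(mae_gauss, baseline_mae)
```

Swapping `loss_grad` is the only change needed to move between the ε-insensitive loss and a loss matched to a different noise assumption, which is the design point the framework relies on; the input-dependent error intervals and clustering step described in the abstract are not covered by this sketch.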