Abstract: The growth of the internet and technology has had a significant effect on social interactions. False information has become an important research topic due to the massive amount of misinformed content on social networks. It is very easy for any user to spread misinformation through the media. Therefore, misinformation is a problem for professionals, organizers, and societies. Hence, it is essential to assess the credibility and validity of the news articles being shared on social media. The core challenge is to distinguish between accurate and false information. Recent studies focus on news article content, such as titles and descriptions, which has limited their performance. However, there are two commonly agreed-upon features of misinformation: first, the title and text of an article, and second, user engagement. For the news context, we extracted different types of user engagement with articles, for example, tweets (read-only), retweets, likes, and shares. We calculate user credibility and combine it with the article content and the user's context. After combining both features, we used three natural language processing (NLP) feature extraction techniques, i.e., Term Frequency-Inverse Document Frequency (TF-IDF), Count-Vectorizer (CV), and Hashing-Vectorizer (HV). We then applied different machine learning classifiers to label news as real or fake: Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), Gradient Boosting (GB), and K-Nearest Neighbors (KNN). The proposed method was tested on a real-world dataset, FakeNewsNet, whose repository we refined according to our required features. The dataset contains more than 23,000 articles with millions of user engagements. The highest accuracy score is 93.4%, achieved using count-vector features and a random forest classifier. Our findings confirm that the proposed classifier can effectively classify misinformation in social networks.
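To make the classification stage concrete, the following is a minimal sketch of a count-vector plus random forest pipeline of the kind described above, using scikit-learn; the toy texts and labels are hypothetical placeholders rather than the authors' actual refined FakeNewsNet data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy stand-in for the combined article-content + engagement text (illustrative only)
df = pd.DataFrame({
    "text": ["celebrity cures cancer with juice", "senate passes budget bill",
             "aliens endorse candidate", "local team wins championship"] * 25,
    "label": [1, 0, 1, 0] * 25,  # 1 = fake, 0 = real
})
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

# Count-vector features with a random forest gave the best reported accuracy (93.4%)
vectorizer = CountVectorizer()
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, preds))
```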
Abstract: The demand for cloud computing has increased manifold in the recent past. More specifically, on-demand computing has seen a rapid rise as organizations rely mostly on cloud service providers for their day-to-day computing needs. The cloud service provider fulfills different user requirements using virtualization, where a single physical machine can host multiple virtual machines. Each virtual machine potentially represents a different user environment, such as operating system, programming environment, and applications. However, these cloud services use a large amount of electrical energy and produce greenhouse gases. To reduce the electricity cost and greenhouse gases, energy-efficient algorithms must be designed. One specific area where energy-efficient algorithms are required is virtual machine consolidation, where the objective is to utilize the minimum possible number of hosts to accommodate the required virtual machines while respecting the service level agreement requirements. This research work formulates virtual machine migration as an online problem and develops optimal offline and online algorithms for the single-host virtual machine migration problem under a service level agreement constraint for an over-utilized host. The online algorithm is analyzed using a competitive analysis approach. In addition, an experimental analysis of the proposed algorithm on real-world data is conducted to showcase its improved performance against the benchmark algorithms. Our proposed online algorithm consumed 25% less energy and performed 43% fewer migrations than the benchmark algorithms.
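As a rough illustration of the online formulation, the sketch below shows a generic threshold-based migration decision in the spirit of competitive online algorithms (ski-rental style reasoning); it is not the authors' actual algorithm, and all cost values are illustrative assumptions.

```python
# Keep absorbing the per-interval SLA-violation cost on an over-utilized host
# until the accumulated cost reaches the one-off migration cost, then migrate.
def online_migration(violation_costs, migration_cost):
    """violation_costs: per-interval SLA penalties observed online.
    Returns the interval index at which the VM is migrated, or None."""
    accumulated = 0.0
    for t, cost in enumerate(violation_costs):
        accumulated += cost
        if accumulated >= migration_cost:   # threshold reached: pay to migrate
            return t
    return None                             # host recovered before migrating

# Example: small penalties each interval against a migration cost of 10 units
print(online_migration([1.5, 2.0, 3.0, 4.0], migration_cost=10.0))  # -> 3
```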
Funding: This work is funded by Newton Institutional Links 2020-21 project 623718881, jointly by the British Council and the National Research Council of Thailand (www.britishcouncil.org). The corresponding author is the project PI.
Abstract: The problem of missing values has long been studied by researchers working in data science and bioinformatics, especially in the analysis of gene expression data that facilitates the early detection of cancer. Many attempts show improvements made by excluding samples with missing information from the analysis process, while others have tried to fill the gaps with possible values. While the former is simple, the latter safeguards against information loss. For that, a neighbour-based (KNN) approach has proven more effective than other global estimators. This paper extends that work by introducing a new summarization method to the KNN model. It is the first study to apply the concept of the ordered weighted averaging (OWA) operator to such a problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbour-based models. Using missing-value ratios from 1% to 20% and a set of six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. Specifically, at missing rates of 5% and 20%, the best NRMSE scores averaged across datasets are 0.65 and 0.69, whereas the corresponding measures obtained by the existing techniques included in this study are 0.80 and 0.84, respectively.
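The sketch below illustrates the general idea of OWA-weighted KNN imputation for a single missing entry; the linearly decaying weight scheme and toy matrix are illustrative assumptions and not necessarily one of the two variants proposed in the paper.

```python
import numpy as np

def owa_knn_impute(data, row, col, k=5):
    """data: 2-D array with np.nan marking missing values.
    Estimates data[row, col] from the k nearest rows that are complete
    on the relevant columns, using OWA-style position weights."""
    observed = [j for j in range(data.shape[1])
                if j != col and not np.isnan(data[row, j])]
    candidates = [i for i in range(data.shape[0])
                  if i != row and not np.isnan(data[i, col])
                  and not np.any(np.isnan(data[i, observed]))]
    dists = [np.linalg.norm(data[i, observed] - data[row, observed]) for i in candidates]
    nearest = [candidates[i] for i in np.argsort(dists)[:k]]
    values = np.sort(data[nearest, col])[::-1]             # OWA orders the arguments
    weights = np.arange(len(values), 0, -1, dtype=float)   # linearly decaying weights
    weights /= weights.sum()
    return float(np.dot(weights, values))

# Toy example: estimate the missing third value of the first sample
X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.0],
              [0.9, 1.9, 2.8],
              [5.0, 5.0, 9.0]])
print(owa_knn_impute(X, row=0, col=2, k=2))
```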
Funding: This work is funded by Newton Institutional Links 2020-21 project 623718881, jointly by the British Council and the National Research Council of Thailand (www.britishcouncil.org). The first author is the project PI, with the other author participating as a Co-I.
Abstract: A mix of numerical and nominal data types is commonly present in many modern-age data collections. Examples include banking data, sales history, and healthcare records, where both continuous attributes like age and nominal ones like blood type are exploited to characterize account details, business transactions, or individuals. However, only a few standard clustering techniques and consensus clustering methods have been provided to examine such data thus far. Given this insight, the paper introduces novel extensions of the link-based cluster ensemble, LCEWCT and LCEWTQ, that are accurate for analyzing mixed-type data. They promote diversity within an ensemble through different initializations of the k-prototypes algorithm as base clusterings and then refine the summarized data using a link-based approach. Based on the evaluation metric of NMI (Normalized Mutual Information), averaged across different combinations of benchmark datasets and experimental settings, these new models reach an improved level of 0.34, while the best model found in the literature obtains only around 0.24. Besides, the parameter analysis included herein helps to enhance their performance even further, given the relations between clustering quality and algorithmic variables specific to the underlying link-based models. Moreover, another significant factor, ensemble size, is examined in a way that justifies the tradeoff between complexity and accuracy.
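As a simplified, self-contained analogue of the ensemble idea (using one-hot encoding and k-means rather than k-prototypes, and a plain co-association consensus rather than the link-based refinement), the sketch below shows how diverse base clusterings of mixed-type data can be combined and scored with NMI; the toy data and true labels are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
age = np.r_[rng.normal(30, 3, 50), rng.normal(60, 3, 50)].reshape(-1, 1)   # numerical
blood = rng.choice(["A", "B", "O"], size=(100, 1))                          # nominal
true = np.r_[np.zeros(50, int), np.ones(50, int)]

X = np.hstack([StandardScaler().fit_transform(age),
               OneHotEncoder().fit_transform(blood).toarray()])

# Base clusterings with different initialisations provide ensemble diversity
ensemble = [KMeans(n_clusters=2, n_init=1, random_state=s).fit_predict(X)
            for s in range(10)]

# Co-association: fraction of base clusterings that place each pair together
co = np.mean([np.equal.outer(lbl, lbl).astype(float) for lbl in ensemble], axis=0)
dist = 1.0 - co
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
consensus = fcluster(Z, t=2, criterion="maxclust")
print("NMI:", normalized_mutual_info_score(true, consensus))
```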
Funding: This work was supported by the Pakistan Space and Upper Atmosphere Research Commission [grant number NSP-654-20].
Abstract: Anomaly detection in Hyperspectral Imagery (HSI) has received considerable attention because of its potential application in several areas. Numerous anomaly detection algorithms for HSI have been proposed in the literature; however, due to the use of different datasets in previous studies, an extensive performance comparison of these algorithms has been missing. In this paper, an overview of the current state of research in hyperspectral anomaly detection is presented by broadly dividing all the previously proposed algorithms into eight categories. In addition, this paper presents the most comprehensive comparative analysis to date in hyperspectral anomaly detection by evaluating 22 algorithms on 17 publicly available datasets. Results indicate that attribute and edge-preserving filtering-based detection (AED), local summation anomaly detection based on collaborative representation and inverse distance weight (LSAD-CR-IDW), and the local summation unsupervised nearest regularized subspace with an outlier removal anomaly detector (LSUNRSORAD) perform better, as indicated by the mean and median values of the area under the receiver operating characteristic (ROC) curves. Finally, this paper studies the effect of various dimensionality reduction techniques on anomaly detection. Results indicate that reducing the number of components to around 20 improves performance; however, any further decrease deteriorates it.
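For context, the sketch below implements the classical global RX detector on a PCA-reduced cube and scores it with ROC AUC, mirroring the evaluation described above; RX is a standard baseline in this literature rather than one of the top-performing methods named, and the synthetic cube and ground truth are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
H, W, B = 40, 40, 100                        # height, width, spectral bands
cube = rng.normal(0.0, 1.0, (H, W, B))
truth = np.zeros((H, W), dtype=int)
cube[5:8, 5:8] += 4.0                        # implant a small anomalous patch
truth[5:8, 5:8] = 1

pixels = cube.reshape(-1, B)
reduced = PCA(n_components=20).fit_transform(pixels)   # ~20 components, per the finding above

# Global RX: Mahalanobis distance of each pixel to the background statistics
mean = reduced.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(reduced, rowvar=False))
diff = reduced - mean
scores = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

print("ROC AUC:", roc_auc_score(truth.ravel(), scores))
```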