Abstract: Big Data (BD), a collection of very large volumes of data, is used extensively in fields such as finance, industry, business, and medicine. However, processing such massive amounts of data is complicated and time-consuming. To design a Distribution Preserving Framework for BD, this work proposes a novel methodology that combines Manhattan Distance (MD)-centered Partition Around Medoid (MD-PAM) with a Conjugate Gradient Artificial Neural Network (CG-ANN) and proceeds through several steps to reduce the complexity of BD. First, in the pre-processing phase, repeated data are removed using the map-reduce function; missing data are then handled by either substituting or ignoring the missing values. After that, the data are converted into a normalized form. Next, to improve classification performance, the dimensionality of the data is reduced using Gaussian Kernel (GK)-Fisher Discriminant Analysis (GK-FDA). The processed data are then converted into a structured format and passed to the partitioning phase, where MD-PAM partitions the data and groups them into clusters. Finally, in the classification phase, CG-ANN classifies the data so that users can easily retrieve the data they need. The publicly available NSL-KDD datasets are used to compare the results of CG-ANN with existing methodologies. The experimental results show that the proposed CG-ANN achieves efficient results at a reduced computation cost. The proposed work outperforms existing systems in terms of accuracy, sensitivity, and specificity.
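The pre-processing and partitioning steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the map-reduce deduplication is replaced by an in-memory, order-preserving dedup, rows with missing values are simply ignored, normalization is assumed to be min-max scaling, and the GK-FDA and CG-ANN stages are omitted. The sample rows, the helper names, and the naive medoid initialisation are all hypothetical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def min_max_normalize(rows):
    """Scale every column to [0, 1] (assumed normalization scheme)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

def total_cost(points, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(manhattan(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Partition Around Medoids with a greedy swap phase.

    Starts from the first k points (a naive, hypothetical choice) and keeps
    swapping a medoid with a non-medoid while the total cost strictly drops.
    """
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: manhattan(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels

# Hypothetical raw records: one exact duplicate and one row with a missing value.
raw = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (1, 1), (None, 3)]

unique = list(dict.fromkeys(raw))                  # drop repeated records
complete = [r for r in unique if None not in r]    # ignore rows with missing values
points = min_max_normalize(complete)
medoids, labels = pam(points, k=2)
print(medoids, labels)
```

The swap loop recomputes the full cost for every candidate, which is fine for a toy example but would need the usual PAM optimisations (or a distributed variant) at Big Data scale.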
Funding: This work was funded by The Campaign for Tobacco-Free Kids. The funder was not involved in the design and conduct of the study; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the article for publication.
Abstract: Objective: To develop tobacco control strategies by analyzing online tobacco marketing information in China. Methods: Using web-crawler software, this study acquired 106,485 pieces of online tobacco marketing information published on 11 different Internet platforms, including Weibo, WeChat, and Baidu, from January to June 2018. The data were used to investigate the characteristics and social networks of online tobacco marketing via content analysis and social network analysis. Results: The total volume of online tobacco marketing during the study period was high and followed an upward trend. Of all the marketing subjects, those involving "flavor capsule", "Marlboro", and "Esse" were the most popular. The Weibo platform had the highest volume of online tobacco marketing information as well as the largest proportion of explicit marketing information. It was followed by other platforms such as Baidu Search, Baidu Tieba, and Xiaohongshu, where implicit marketing information predominated. The overall network structure of tobacco websites was significantly centralized, with traditional and novel tobacco websites forming two clusters with almost no intersections. The China Tobacco Science and Education Website (http://www.tobaccoinfo.com.cn/) and E-Cigarette Home (http://ecigm.com/) were the nodes with the highest degree centrality within their respective "circles", while the China Tobacco Monopoly Bureau Website (http://www.tobacco.gov.cn/) was the node with the highest closeness centrality. By contrast, Baidu Tieba's overall network structure was more decentralized, and the degree of correlation between different nodes was relatively low. Conclusion: Online tobacco marketing demonstrated high volumes, wide coverage, and an intertwined network, thereby creating major obstacles for tobacco control. To address this issue, the government should strengthen network supervision of tobacco marketing and revise its current regulations. Meanwhile, Internet platforms should improve self-regulation by comprehensively removing and blocking tobacco-related information. Lastly, the media and the public should advocate for the associated policies and support Internet platform supervision.
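The two network measures reported in this abstract, degree centrality and closeness centrality, can rank nodes differently. The sketch below shows this on a small hypothetical graph, assuming the standard normalized definitions for an undirected, connected network. The node names and edges are invented for illustration and are not the study's data.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths (BFS)."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Hypothetical network: two hub-centered clusters joined only through "bridge".
edges = [
    ("hub_a", "a1"), ("hub_a", "a2"), ("hub_a", "a3"),
    ("hub_b", "b1"), ("hub_b", "b2"), ("hub_b", "b3"),
    ("bridge", "hub_a"), ("bridge", "hub_b"),
]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
top_degree = max(degree, key=degree.get)
top_closeness = max(closeness, key=closeness.get)
print(top_degree, top_closeness)
```

In this toy graph each hub has the most direct links inside its own cluster, while the bridging node is, on average, closest to everyone, which is the kind of split between the two measures that the results describe.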
Abstract: In the data retrieval process of a data recommendation system, matching prediction and similarity identification play a major role in the ontology. Several methods exist to improve retrieval accuracy and reduce search time. However, in a data recommendation system, searching for the best match for given query data becomes complex, and the accuracy of the query recommendation process suffers. To improve the performance of data validation, this paper proposes a novel data similarity estimation and clustering method that retrieves the relevant data with the best match in big data processing. An advanced model of the Logarithmic Directionality Texture Pattern (LDTP) method, combined with a Metaheuristic Pattern Searching (MPS) system, is used to estimate the similarity of the query data across the entire database. The overall work was implemented for the data recommendation application. The data are indexed and grouped into clusters to form a paged database structure, which reduces computation time during search. In addition, a neural network predicts the relevance of feature attributes in the database, and the matching index is sorted to provide the recommended data for the given query data. This is achieved using the Distributional Recurrent Neural Network (DRNN), an enhanced neural network model that finds relevance based on the correlation factor of the feature set. The DRNN classifier is trained by estimating the correlation factor of the attributes of the dataset. These are formed into clusters and paged with proper indexing based on the MPS similarity metric parameter. The overall performance of the proposed work is evaluated by varying the size of the training database across 60%, 70%, and 80%. The measures considered for performance analysis are precision, recall, F1-score, and the accuracy of data retrieval, together with the query recommendation output and a comparison with other state-of-the-art methods.
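For reference, the precision, recall, and F1-score named above can be computed for a single query as in the following sketch, assuming the standard set-based retrieval definitions. The document identifiers are hypothetical, and the abstract does not state how the authors aggregate these measures across queries.

```python
def retrieval_metrics(retrieved, relevant):
    """Return (precision, recall, F1) for one query's retrieved set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical query result: 4 items recommended, 3 items truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
precision, recall, f1 = retrieval_metrics(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```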