This study aims to explore the application of Bayesian analysis based on neural networks and deep learning in data visualization. The research background is that, with the increasing volume and complexity of data, traditional data analysis methods can no longer meet practical needs. The research methods include building neural network and deep learning models, optimizing and improving them through Bayesian analysis, and applying them to the visualization of large-scale data sets. The results show that neural networks combined with Bayesian analysis and deep learning methods can effectively improve the accuracy and efficiency of data visualization and enhance the intuitiveness and depth of data interpretation. The significance of the research is that it provides a new solution for data visualization in the big data environment and helps to further promote the development and application of data science.
A Schwann cell has regenerative capabilities and is an important cell in the peripheral nervous system. This microarray study is part of a bioinformatics study that focuses mainly on Schwann cells. Microarray data provide information on differences between microarray-based and experiment-based gene expression analyses. According to microarray data, several genes exhibit increased expression (fold change) but are weakly expressed in experimental studies (based on morphology, protein, and mRNA levels). In contrast, some genes are weakly expressed in microarray data and highly expressed in experimental studies; such genes may represent future target genes in Schwann cell studies. These studies allow us to learn about additional genes that could be used to achieve targeted results from experimental studies. In the current big data study, by retrieving more than 5000 scientific articles from PubMed/NCBI, Google Scholar, and Google, 1016 (up- and downregulated) genes were determined to be related to Schwann cells. However, no experiment was performed in the laboratory; rather, the present study is part of a big data analysis. Our study will contribute to our understanding of Schwann cell biology by aiding in the identification of genes. Based on a comparative analysis of all microarray data, we conclude that the microarray could be a good tool for predicting the expression and intensity of different genes of interest in actual experiments.
Big data on product sales are an emerging resource for supporting modular product design to meet customers' diversified requirements on product specification combinations. To better facilitate decision-making in modular product design, correlations among specifications and components, originating from customers' conscious and subconscious preferences, can be investigated by using big data on product sales. This study proposes a framework and the associated methods for supporting modular product design decisions based on correlation analysis of product specifications and components using big sales data. The correlations of the product specifications are determined by analyzing the collected product sales data. By building the relations between the product components and specifications, a matrix for measuring the correlation among product components is formed for component clustering. Six rules for supporting the decision-making of modular product design are proposed based on the frequency analysis of the specification values per component cluster. A case study of electric vehicles illustrates the application of the proposed method.
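The central computation, deriving a component-component correlation matrix from specification correlations and a specification-component relation matrix and then clustering the components, can be sketched as follows. This is a minimal illustration of the idea rather than the paper's exact procedure; the sales columns, the relation matrix R, the component names, and the hierarchical-clustering settings are all hypothetical.

```python
# Minimal sketch (not the paper's exact method): derive a component-component
# correlation matrix from specification correlations observed in sales data,
# then cluster components. Column names and the spec-component relation matrix
# are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical sales records: one row per sold product configuration.
sales = pd.DataFrame({
    "battery_kwh": [40, 60, 60, 75, 40, 75, 60, 40],
    "motor_kw":    [110, 150, 150, 200, 110, 200, 150, 110],
    "range_km":    [300, 420, 430, 520, 310, 510, 415, 295],
})

# Correlation among product specifications, estimated from the sales data.
spec_corr = sales.corr().abs().to_numpy()

# Hypothetical binary relation matrix R: rows = components, cols = specifications.
components = ["battery_pack", "drive_unit", "thermal_module"]
R = np.array([[1, 0, 1],    # battery_pack relates to battery_kwh and range_km
              [0, 1, 1],    # drive_unit relates to motor_kw and range_km
              [1, 1, 0]])   # thermal_module relates to battery_kwh and motor_kw

# Component correlation: propagate specification correlations through R.
comp_corr = R @ spec_corr @ R.T

# Hierarchical clustering on (1 - normalized correlation) as a distance.
norm = comp_corr / comp_corr.max()
dist = 1.0 - norm[np.triu_indices_from(norm, k=1)]   # condensed distance vector
labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(dict(zip(components, labels)))
```

Average-linkage clustering is just one reasonable choice here; any clustering method that accepts a similarity or distance matrix over components would fit the same framework.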
As of 2020, the issue of user satisfaction has generated a significant amount of interest. Therefore, we employ a big data approach for exploring user satisfaction among Uber users. We develop a research model of user satisfaction by expanding the list of user experience (UX) elements (i.e., pragmatic, expectation confirmation, hedonic, and burden) with additional elements, namely risk, cost, promotion, anxiety, sadness, and anger. Subsequently, we collect 125,768 comments from online reviews of Uber services and perform a sentiment analysis to extract the UX elements. The results of a regression analysis reveal the following: hedonic, promotion, and pragmatic elements significantly and positively affect user satisfaction, while burden, cost, and risk have a substantial negative influence. However, the influence of expectation confirmation on user satisfaction is not supported. Moreover, sadness, anxiety, and anger are positively related to the perceived risk of users. Compared with sadness and anxiety, anger plays a more important role in increasing the perceived burden of users. Based on these findings, we also provide some theoretical implications for future UX literature and some core suggestions for establishing strategies for Uber and similar services. The proposed big data approach may be utilized in other UX studies in the future.
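The overall pipeline, scoring each review for UX elements and regressing satisfaction on those scores, can be illustrated with a minimal sketch. It is not the authors' actual model: the keyword lexicon, the review texts, the ratings, and the use of a plain OLS regression are placeholder assumptions.

```python
# Minimal illustrative sketch (not the paper's model): score reviews against a tiny
# keyword lexicon for a few UX elements, then regress a satisfaction rating on the
# element scores with ordinary least squares. Lexicon, columns, and data are made up.
import pandas as pd
import statsmodels.api as sm

lexicon = {
    "hedonic":   ["fun", "enjoyable", "pleasant"],
    "pragmatic": ["fast", "convenient", "easy"],
    "risk":      ["unsafe", "dangerous", "scam"],
}

reviews = pd.DataFrame({
    "text": ["fast and convenient ride", "driver felt unsafe", "fun and easy trip",
             "easy booking, pleasant driver", "dangerous driving, felt like a scam"],
    "rating": [5, 2, 5, 4, 1],
})

def element_score(text, words):
    """Count lexicon hits in a review as a crude element score."""
    t = text.lower()
    return sum(t.count(w) for w in words)

for element, words in lexicon.items():
    reviews[element] = reviews["text"].apply(lambda t: element_score(t, words))

X = sm.add_constant(reviews[list(lexicon)])
model = sm.OLS(reviews["rating"], X).fit()
print(model.params)   # coefficient signs hint at each element's association with satisfaction
```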
Method development has always been and will continue to be a core driving force of microbiome science. In this perspective, we argue that in the next decade, method development in microbiome analysis will be driven by three key changes in both ways of thinking and technological platforms: ① a shift from dissecting microbiota structure by sequencing to tracking microbiota state, function, and intercellular interaction via imaging; ② a shift from interrogating a consortium or population of cells to probing individual cells; and ③ a shift from microbiome data analysis to microbiome data science. Some of the recent method-development efforts by Chinese microbiome scientists and their international collaborators that underlie these technological trends are highlighted here. It is our belief that the China Microbiome Initiative has the opportunity to deliver outstanding "Made-in-China" tools to the international research community, by building an ambitious, competitive, and collaborative program at the forefront of method development for microbiome science.
Graphical methods are used for construction. Data analysis and visualization are an important application area of big data, and visual analysis is also an important method of big data analysis. Data visualization refers to presenting data in a visual form, such as a chart or map, to help people understand its meaning; it helps people extract meaning from data quickly and easily. Visualization can fully reveal the patterns, trends, and dependencies in data that are hard to find in other forms of display. Big data visualization analysis combines the advantages of computers with interactive analysis methods and interactive technologies, which can be static or interactive, and directly and effectively helps people understand the information behind big data. It is indispensable in the era of big data and can be very intuitive when used properly. Graphical analysis also turns valuable information hidden in complex data relationships into a powerful tool, and it represents a significant business opportunity. With the rise of big data, important technologies suited to dealing with complex relationships have emerged, and graphics come in a variety of shapes and sizes for a variety of business problems. The first step of graphic analysis is to get the right data and clarify the analysis goal. In short, to choose the right method, one must understand the relative strengths and weaknesses of each method and understand the data. The key steps to obtaining the data are: define the target; collect; clean; connect.
In this paper, we describe a method of emotion analysis on social big data. Social big data means text data that emerges on Internet social networking services. We collect multilingual web corpora and annotate emotion tags on these corpora for the purpose of emotion analysis. Because these data are constructed by manual annotation, their quality is high but their quantity is low. If we create an emotion analysis model based on this high-quality corpus and use the model for the analysis of social big data, we may be able to statistically analyze the emotional senses and behavior of people in Internet communications, which we could not know before. In this paper, we create an emotion analysis model that integrates the high-quality emotion corpus and the automatically constructed corpus that we created in our past studies, and then analyze a large-scale corpus consisting of Twitter tweets based on the model. Through the results of a time-series analysis on the large-scale corpus and of a model evaluation, we show the effectiveness of our proposed method.
In the era of big data, huge volumes of data are generated from online social networks, sensor networks, mobile devices, and organizations' enterprise systems. This phenomenon provides organizations with unprecedented opportunities to tap into big data to mine valuable business intelligence. However, traditional business analytics methods may not be able to cope with the flood of big data. The main contribution of this paper is the illustration of the development of a novel big data stream analytics framework, named BDSASA, that leverages a probabilistic language model to analyze the consumer sentiments embedded in hundreds of millions of online consumer reviews. In particular, an inference model is embedded into the classical language modeling framework to enhance the prediction of consumer sentiments. The practical implication of our research work is that organizations can apply our big data stream analytics framework to analyze consumers' product preferences, and hence develop more effective marketing and production strategies.
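As a hedged stand-in for the probabilistic language modeling idea (the BDSASA inference model itself is not reproduced here), a per-class unigram language model with Bayesian inference over sentiment classes, i.e., multinomial Naive Bayes, can be sketched as follows; the training reviews and labels are invented.

```python
# Illustrative sketch only: multinomial Naive Bayes, i.e. a per-class unigram language
# model with Bayesian inference over classes, as a stand-in for the probabilistic
# language modeling idea described above. Training data are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_reviews = ["great battery life, love it", "terrible quality, broke fast",
                 "works perfectly, highly recommend", "awful support, waste of money"]
train_labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_reviews, train_labels)

new_reviews = ["love the quality", "waste of money, broke quickly"]
print(clf.predict(new_reviews))        # predicted sentiment labels
print(clf.predict_proba(new_reviews))  # class posterior probabilities
```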
Opinion (sentiment) analysis on big data streams, from the constantly generated text streams on social media networks to hundreds of millions of online consumer reviews, provides organizations in every field with opportunities to discover valuable intelligence from massive user-generated text streams. However, traditional content analysis frameworks are inefficient at handling the unprecedentedly large volume of unstructured text streams and the complexity of the text analysis tasks required for real-time opinion analysis on big data streams. In this paper, we propose a parallel real-time sentiment analysis system, the Social Media Data Stream Sentiment Analysis Service (SMDSSAS), that performs multiple phases of sentiment analysis of social media text streams effectively in real time, with two fully analytic opinion mining models to combat the scale of text data streams and the complexity of sentiment analysis processing on unstructured text streams. We propose two aspect-based opinion mining models, a Deterministic and a Probabilistic sentiment model, for real-time sentiment analysis on user-given, topic-related data streams. Experiments on the Twitter stream traffic captured during the pre-election weeks of the 2016 Presidential election, for real-time analysis of public opinion toward the two presidential candidates, showed that the proposed system correctly predicted Donald Trump as the winner of the 2016 Presidential election. The cross-validation results showed that the proposed sentiment models, with the real-time streaming components in our proposed framework, effectively delivered the analysis of opinions on the two presidential candidates with an average accuracy of 81% for the Deterministic model and 80% for the Probabilistic model, which represent improvements of 1%-22% over results in the existing literature.
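A deterministic, lexicon-based score computed incrementally over a stream of topic-related posts conveys the flavor of the deterministic model, though it is not SMDSSAS itself; the sentiment lexicon and the sample stream below are invented for illustration.

```python
# Minimal sketch of a deterministic, lexicon-based sentiment score applied to a stream
# of topic-related posts; this is not the SMDSSAS models themselves. The lexicon and
# the sample "stream" are invented for illustration.
POSITIVE = {"win", "great", "strong", "support"}
NEGATIVE = {"lose", "weak", "corrupt", "against"}

def deterministic_score(text):
    """Return (#positive - #negative) / #tokens for one post; 0 for empty text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

def stream_average(posts):
    """Incrementally average scores, as a streaming component would."""
    total, count = 0.0, 0
    for post in posts:
        total += deterministic_score(post)
        count += 1
        yield total / count

stream = ["great debate, strong support", "corrupt and weak", "he will win"]
for running_avg in stream_average(stream):
    print(round(running_avg, 3))
```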
This study developed a new methodology for analyzing the risk level of marine spill accidents from two perspectives, namely, marine traffic density and sensitive resources. Through a case study conducted in Busan, South Korea, detailed procedures of the methodology were proposed and its scalability was confirmed. To analyze the risk from a more detailed and microscopic viewpoint, vessel routes as hazard sources were delineated on the basis of automatic identification system (AIS) big data. The outliers and errors of the AIS big data were removed using the density-based spatial clustering of applications with noise (DBSCAN) algorithm, and a marine traffic density map was evaluated by combining all of the gridded routes. The vulnerability of the marine environment was identified on the basis of the sensitive resource map constructed by the Korea Coast Guard, in a manner similar to the National Oceanic and Atmospheric Administration environmental sensitivity index approach. In this study, aquaculture sites, water intake facilities of power plants, and beach/resort areas were selected as representative indicators for each category. The vulnerability values of neighboring cells decreased according to the Euclidean distance from the resource cells. The two resulting maps were aggregated to construct a final sensitive resource and traffic density (SRTD) risk analysis map of the Busan-Ulsan sea areas. We confirmed the effectiveness of the SRTD risk analysis by comparing it with actual marine spill accident records. Results show that all of the marine spill accidents in 2018 occurred within 2 km of high-risk cells (level 6 and above). Thus, if accident management and monitoring capabilities are concentrated on high-risk cells, which account for only 6.45% of the total study area, it is expected that most marine spill accidents can be coped with effectively.
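The two preprocessing steps, removing AIS outliers with DBSCAN and gridding the remaining positions into a traffic density map, might look roughly like the following sketch; the coordinates, eps/min_samples values, and grid resolution are hypothetical rather than the study's actual configuration.

```python
# Illustrative sketch with hypothetical coordinates and parameters: remove AIS position
# outliers with DBSCAN, then grid the remaining points into a simple traffic density map.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Hypothetical AIS positions (lon, lat) along a route, plus a few scattered outliers.
route = np.column_stack([np.linspace(129.0, 129.4, 500) + rng.normal(0, 0.005, 500),
                         np.linspace(35.0, 35.3, 500) + rng.normal(0, 0.005, 500)])
outliers = rng.uniform([128.5, 34.5], [130.0, 35.8], size=(10, 2))
points = np.vstack([route, outliers])

# DBSCAN labels sparse, isolated position fixes as noise (label -1).
labels = DBSCAN(eps=0.02, min_samples=10).fit_predict(points)
clean = points[labels != -1]

# Grid the cleaned positions into a coarse density map (counts per cell).
density, _, _ = np.histogram2d(clean[:, 0], clean[:, 1], bins=[30, 20])
print("removed", (labels == -1).sum(), "outliers; max cell count =", int(density.max()))
```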
There are large amounts of biological and experimental data from genomics, proteomics, drug screening, medicinal chemistry, etc. These data must be analyzed with special methods from statistics, bioinformatics, and computer science. Big data analysis is an effective way to build scientific hypotheses and explore internal mechanisms. Here, gene expression is taken as an example to illustrate the basic procedure of big data analysis.
The issue of privacy protection for mobile social networks is a frontier topic in the field of social network applications. Existing research on user privacy protection in mobile social networks mainly focuses on privacy-preserving data publishing and access control. There is little research on the association of user privacy information, which makes it difficult to design personalized privacy protection strategies and also increases the complexity of user privacy settings. Therefore, this paper concentrates on the association of user privacy information using big data analysis tools, so as to provide data support for the design of personalized privacy protection strategies.
Big data analysis has penetrated all fields of society and has brought about profound changes. However, there is relatively little research on big data supporting student management in colleges and universities. Taking student card information as the research sample and scholarship evaluation as an example, the big data are analyzed using Spark big data mining technology and the K-Means clustering algorithm. The analysis covers students' daily behavior from multiple dimensions and can prevent unreasonable scholarship evaluation caused by unfair factors such as plagiarism and votes of teachers and students. At the same time, students' absenteeism, physical health, and psychological status can be predicted in advance, which makes student management work more proactive, accurate, and effective.
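A minimal PySpark sketch of the clustering step might look like the following; the per-student feature columns, the sample values, and the number of clusters are hypothetical placeholders, not the study's actual setup.

```python
# Minimal PySpark sketch of K-Means clustering on hypothetical campus-card aggregates;
# feature columns, values, and k are placeholders, not the study's actual configuration.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("StudentCardKMeans").getOrCreate()

# Hypothetical per-student aggregates derived from campus card records.
df = spark.createDataFrame(
    [(1, 820.5, 34, 7.8), (2, 1510.0, 2, 10.2), (3, 640.0, 55, 7.2), (4, 1720.5, 5, 11.0)],
    ["student_id", "canteen_spend", "library_visits", "first_entry_hour"])

assembler = VectorAssembler(
    inputCols=["canteen_spend", "library_visits", "first_entry_hour"],
    outputCol="features")
features = assembler.transform(df)

model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
clustered = model.transform(features)           # adds a "prediction" column with cluster ids
clustered.groupBy("prediction").count().show()  # cluster sizes as a quick sanity check
spark.stop()
```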
With the arrival of the era of big data, the audit thinking mode has been pushed to change. Under the influence of big data, audit will become an activity of continuous behavior. Through cloud data, the staff can monitor the operation status and risk assessment of the whole enterprise, analyze, control, and respond to risks in a timely manner, and protect the enterprise by reducing risks. With the advent of the era of big data, audit data analysis is becoming more and more important. At the same time, analyzing large amounts of data also brings challenges to auditors, and how to deal with and solve these challenges has become an urgent problem. This paper mainly studies the challenges that the changes in audit approaches and methods bring to audit data analysis under the background of big data, along with corresponding countermeasures, so as to continuously innovate and improve audit technology in practice and promote the healthy and rapid development of the social economy.
Technological evolution is giving rise to a unified (Industrial) Internet of Things network, in which loosely coupled smart manufacturing devices build smart manufacturing systems and enable comprehensive collaboration possibilities that increase the dynamics and volatility of their ecosystems. On the one hand, this evolution generates a huge field for exploitation; on the other hand, it also increases complexity, introducing new challenges and requirements that demand new approaches to several issues. One challenge is the analysis of such systems, which generate huge amounts of (continuously generated) data potentially containing valuable information useful for several use cases, such as knowledge generation, key performance indicator (KPI) optimization, diagnosis, prediction, feedback to design, or decision support. This work presents a review of big data analysis in smart manufacturing systems. It includes the status quo in research, innovation, and development, the next challenges, and a comprehensive list of potential use cases and exploitation possibilities.
By using CiteSpace software to create a knowledge map of authors, institutions, and keywords, the literature on the spatio-temporal behavior of Chinese residents based on big data in the architectural planning discipline, published in the China Academic Network Publishing Database (CNKI), was analyzed and discussed. It is found that there was a lack of communication and cooperation among research institutions and scholars, and that the research hotspots involved four main areas: "application in tourism research", "application in traffic travel research", "application in work-housing relationship research", and "application in personal family life research".
Rough set theory is relatively new to the area of soft computing for handling uncertain big data efficiently. It also provides a powerful way to calculate the importance degree of vague and uncertain big data to help in decision making. Risk assessment is very important for safe and reliable investment. Risk management involves assessing the risk sources and designing strategies and procedures to mitigate those risks to an acceptable level. In this paper, we emphasize the classification of different types of risk factors and find a simple and effective way to calculate the risk exposure. The study uses the rough set method to classify and judge the safety attributes related to investment policy. The method, which is based on intelligent knowledge acquisition, provides an innovative way for risk analysis. With this approach, we are able to calculate the significance of each factor and the relative risk exposure based on the original data, without assigning weights subjectively.
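The significance of a condition attribute in rough set theory is commonly measured as the drop in the dependency degree (the share of objects in the positive region) when that attribute is removed. A small sketch on a hypothetical investment decision table, with invented attributes and values, is shown below; it illustrates the standard measure rather than the paper's specific data.

```python
# Sketch of the standard rough-set significance measure (drop in dependency degree when
# an attribute is removed), on a tiny hypothetical decision table of risk factors.
def blocks(rows, attrs):
    """Partition row indices into indiscernibility classes over the given attributes."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(groups.values())

def dependency(rows, cond_attrs, dec_attr):
    """gamma(C, D) = |positive region| / |U|."""
    dec_blocks = blocks(rows, [dec_attr])
    pos = set()
    for b in blocks(rows, cond_attrs):
        if any(b <= d for d in dec_blocks):   # block consistently maps to one decision
            pos |= b
    return len(pos) / len(rows)

def significance(rows, cond_attrs, dec_attr, attr):
    reduced = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, dec_attr) - dependency(rows, reduced, dec_attr)

table = [  # hypothetical investment records
    {"market": "high", "credit": "low",  "policy": "stable",   "risk": "high"},
    {"market": "high", "credit": "high", "policy": "stable",   "risk": "medium"},
    {"market": "low",  "credit": "high", "policy": "stable",   "risk": "low"},
    {"market": "low",  "credit": "low",  "policy": "unstable", "risk": "high"},
    {"market": "low",  "credit": "high", "policy": "unstable", "risk": "medium"},
]
conds = ["market", "credit", "policy"]
for a in conds:
    print(a, significance(table, conds, "risk", a))
```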
Quantitative analysis of digital images requires detection and segmentation of the borders of the object of interest. Accurate segmentation is required for volume determination, 3D rendering, radiation therapy, and surgery planning. In medical images, segmentation has traditionally been done by human experts. Substantial computational and storage requirements become especially acute when object orientation and scale have to be considered. Therefore, automated or semi-automated segmentation techniques are essential if these software applications are ever to gain widespread clinical use. Many methods have been proposed to detect and segment 2D shapes, most of which involve template matching. Advanced segmentation techniques called snakes, or active contours, have been used with deformable models or templates. The main purpose of this work is to apply segmentation techniques to the definition of 3D organs (anatomical structures) when big data information has been stored and must be organized by doctors for medical diagnosis. The processes would be applied to CT images from patients with COVID-19.
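A 2D active contour (snake) of the kind mentioned above can be run with scikit-image; the sketch below uses a synthetic bright disk instead of a CT slice, and the snake parameters are illustrative only. A real pipeline would apply this slice by slice, or use 3D deformable models, over CT volumes.

```python
# Illustrative 2D snake (active contour) with scikit-image on a synthetic blob;
# parameters are illustrative, and a CT-based workflow would replace the synthetic image.
import numpy as np
from skimage.draw import disk
from skimage.filters import gaussian
from skimage.segmentation import active_contour

# Synthetic "organ": a bright disk on a dark background.
image = np.zeros((200, 200))
rr, cc = disk((100, 100), 40)
image[rr, cc] = 1.0

# Initial contour: a circle placed roughly around the structure of interest.
theta = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 60 * np.sin(theta), 100 + 60 * np.cos(theta)])

snake = active_contour(gaussian(image, sigma=3, preserve_range=True),
                       init, alpha=0.015, beta=10, gamma=0.001)
print(snake.shape)  # (200, 2) array of contour points hugging the disk boundary
```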
The bug tracking system is well known as a project support tool for open source software, and many categorical data sets are recorded on it. In the past, many reliability assessment methods have been proposed in the research area of software reliability. There are also several software project analyses based on software effort data, such as earned value management. In particular, software reliability growth models can be applied to the system testing phase of software development. On the other hand, software effort analysis can be applied to all development phases, because fault data are recorded only in the testing phase. We focus on the big fault data and effort data of open source software. Such data are difficult to assess by using typical statistical assessment methods, because the data recorded on the bug tracking system are large scale. We also discuss a jump diffusion process model based on an estimation method for the jump parameters that uses discriminant analysis. Moreover, we analyze actual big fault data to show numerical examples of software effort assessment considering many categorical data sets.
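As a simpler stand-in for the reliability models discussed above (not the paper's jump diffusion model), the classic Goel-Okumoto software reliability growth model m(t) = a(1 - e^(-bt)) can be fitted to cumulative fault counts by least squares; the fault-count data below are invented.

```python
# Simple stand-in sketch (not the paper's jump diffusion model): fit the classic
# Goel-Okumoto growth model m(t) = a * (1 - exp(-b * t)) to cumulative fault counts.
# The fault-count data are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

def goel_okumoto(t, a, b):
    """Expected cumulative number of detected faults by time t."""
    return a * (1.0 - np.exp(-b * t))

weeks = np.arange(1, 13)
cumulative_faults = np.array([12, 25, 34, 45, 52, 58, 63, 66, 70, 72, 74, 75])

(a_hat, b_hat), _ = curve_fit(goel_okumoto, weeks, cumulative_faults, p0=[100.0, 0.1])
print(f"estimated total faults a = {a_hat:.1f}, detection rate b = {b_hat:.3f}")
print(f"predicted faults by week 20: {goel_okumoto(20, a_hat, b_hat):.1f}")
```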
Clinical databases have accumulated large quantities of information about patients and their medical conditions. Current challenges in biomedical research and clinical practice include information overload and the need to optimize workflows, processes, and guidelines to increase capacity while reducing costs and improving efficiency. There is an urgent need for integrative and interactive machine learning solutions, because no medical doctor or biomedical researcher can keep pace today with the increasingly large and complex data sets, often called "Big Data".