Sandbody connectivity is a key constraint on exploration effectiveness in the Bohai A Oilfield. Conventional connectivity studies often rely on methods such as seismic attribute fusion, but the development of contiguous composite sandbodies in this area makes it difficult to characterize connectivity changes with conventional seismic attributes. To address this problem, this study proposes a big data analysis method based on the Deep Forest algorithm to predict sandbody connectivity. First, typical sandbodies with reliable connectivity were selected from the abundant exploration and development sandbody data in the study area. Then, sensitive seismic attributes were extracted to obtain training samples. Finally, a mapping model between attribute combinations and sandbody connectivity was established with the Deep Forest algorithm through machine learning. This method achieves the first quantitative determination of connectivity for continuous composite sandbodies in the Bohai Oilfield. Compared with conventional connectivity discrimination methods such as high-resolution processing and seismic attribute analysis, it incorporates the sandbody characteristics of the study area during machine learning and judges connectivity jointly from multiple seismic attributes. The results show that the method predicts the connectivity of continuous composite sandbodies with high accuracy and timeliness. Applied to the Bohai A Oilfield, it successfully identified multiple sandbody connectivity relationships and provided strong support for the subsequent exploration potential assessment and well placement optimization. The method also offers a new approach to studying sandbody connectivity under similarly complex geological conditions.
Method development has always been and will continue to be a core driving force of microbiome science. In this perspective, we argue that in the next decade, method development in microbiome analysis will be driven by three key changes in both ways of thinking and technological platforms: ① a shift from dissecting microbiota structure by sequencing to tracking microbiota state, function, and intercellular interaction via imaging; ② a shift from interrogating a consortium or population of cells to probing individual cells; and ③ a shift from microbiome data analysis to microbiome data science. Some of the recent method-development efforts by Chinese microbiome scientists and their international collaborators that underlie these technological trends are highlighted here. It is our belief that the China Microbiome Initiative has the opportunity to deliver outstanding "Made-in-China" tools to the international research community by building an ambitious, competitive, and collaborative program at the forefront of method development for microbiome science.
Privacy protection for mobile social networks is a frontier topic in the field of social network applications. Existing research on user privacy protection in mobile social networks mainly focuses on privacy-preserving data publishing and access control. There is little research on the association of user privacy information, which makes it hard to design personalized privacy protection strategies and also increases the complexity of user privacy settings. Therefore, this paper concentrates on the association of user privacy information, using big data analysis tools to provide data support for the design of personalized privacy protection strategies.
Quantitative analysis of digital images requires detection and segmentation of the borders of the object of interest. Accurate segmentation is required for volume determination, 3D rendering, radiation therapy, and surgery planning. In medical images, segmentation has traditionally been done by human experts. Substantial computational and storage requirements become especially acute when object orientation and scale have to be considered. Therefore, automated or semi-automated segmentation techniques are essential if these software applications are ever to gain widespread clinical use. Many methods have been proposed to detect and segment 2D shapes, most of which involve template matching. Advanced segmentation techniques called Snakes, or active contours, have been used with deformable models or templates. The main purpose of this work is to apply segmentation techniques to the definition of 3D organs (anatomical structures) when big data information has been stored and must be organized by doctors for medical diagnosis. The processes would be applied to CT images from patients with COVID-19.
Metastasis is the greatest contributor to cancer-related death. In the era of precision medicine, it is essential to predict and to prevent the spread of cancer cells to significantly improve patient survival. Thanks to the application of a variety of high-throughput technologies, accumulating big data enables researchers and clinicians to identify aggressive tumors as well as patients with a high risk of cancer metastasis. However, there have been few large-scale gene collection studies to enable metastasis-related analyses. In the last several years, emerging efforts have identified pro-metastatic genes in a variety of cancers, providing us the ability to generate a pro-metastatic gene cluster for big data analyses. We carefully selected 285 genes with in vivo evidence of promoting metastasis reported in the literature. These genes have been investigated in different tumor types. We used two datasets downloaded from The Cancer Genome Atlas database, specifically, datasets of clear cell renal cell carcinoma and hepatocellular carcinoma, for validation tests, and excluded any genes for which elevated expression level correlated with longer overall survival in any of the datasets. Ultimately, 150 pro-metastatic genes remained in our analyses. We believe this collection of pro-metastatic genes will be helpful for big data analyses, and eventually will accelerate anti-metastasis research and clinical intervention.
In view of the frequent fluctuation of garlic prices under the market economy, this paper analyzes price fluctuation in the circulation link of the garlic industry chain and discusses how multidisciplinary methods can be applied in the agricultural industry. On the basis of the big data platform of the garlic industry chain, it constructs a GARCH model to analyze the fluctuation pattern of garlic prices in the circulation link and, combined with economic analysis, provides services to the garlic industry from the angle of price fluctuation. The research shows that the average rate of change of the garlic price exhibits clustering ("agglomeration") and cyclical behavior, with the characteristics of fragility, left skew, and a non-normal distribution, and that the fitted values of the GARCH model are very close to the true values. Finally, the paper looks at forms of industrial service from the perspective of garlic price fluctuation.
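The clustering behavior described in this abstract is what a GARCH model captures: a large price move raises the model's variance for the following periods. The abstract does not state the model order or the fitted parameters, so the following is only a minimal sketch that assumes a GARCH(1,1) form; the function name `garch11_variance`, the parameter values, and the return series are all hypothetical.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model (assumed order).

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Start the recursion from the unconditional (long-run) variance.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical price returns: a calm stretch followed by a shock.
returns = [0.01, -0.01, 0.00, 0.08, -0.06, 0.01]
variances = garch11_variance(returns, omega=0.0001, alpha=0.2, beta=0.7)
```

In this toy series the variance estimate rises in the period after the 0.08 shock and stays elevated afterwards, which is the "agglomeration" pattern in miniature.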
The year 2011 is considered the first year of the big data market in China. China's big data market is expected to grow faster than the global average, and China will see rapid expansion of the market in the next few years. This paper presents the overall development of big data in China in terms of market scale and development stages, enterprise development along the industry chain, technology standards, and industrial applications. It points out the issues and challenges facing big data development in China and proposes making policies and creating support mechanisms for big data transactions and personal privacy protection.
Monitoring, understanding, and predicting origin-destination (OD) flows in a city is an important problem for city planning and human activity. Taxi GPS traces, a typical kind of crowd-sensed data, can be used to mine the semantics of OD flows. In this paper, we first construct and analyze a complex network of OD flows based on large-scale taxi GPS traces from a city in China. The spatiotemporal analysis of the OD flow network shows distinctive patterns in the OD flows. Then, based on a novel complex network model, we propose a semantics mining method for OD flows that combines the point-of-interest (POI) network and the public transport network with the OD flow network. The proposed method offers a novel way to predict location characteristics and future traffic conditions accurately.
In recent years, China has successfully set up multiple single-product big data platforms. As an indigenous and unique plant in China, the peony offers substantial economic returns, strong social benefits, and a profound cultural heritage. Its seed oil, an emerging edible oil, has attracted much attention. Heze city is one of the best places for cultivating peonies. In this context, a study of the big data of the peony industry in Heze city has practical significance. This paper begins with a literature review of big data platforms for entire industries. Referring to established single-product big data platforms, it reports a case study of the peony industry in Heze city that identifies potential difficulties and problems in building a big data platform for the peony industry that incorporates the five dimensions of service, management, application, resource, and technology.
The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It reflects the heart's electrical activity and provides valuable diagnostic clues about the health of the entire body. ECG has therefore been widely used in biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used public ECG datasets and provide an overview of data preprocessing. We then describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we discuss some of the challenges in ECG analysis and provide suggestions for further research.
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we compared the models, which yielded prediction error rates of 22%, 23%, 20%, and 27% for LDA, QDA, logistic regression, and KNN, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
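The model comparison in this abstract rests on confusion matrices and error percentages. As a rough illustration of how an error rate is read off a confusion matrix, here is a minimal sketch; the AQI category labels, the ten sample predictions, and the function names are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Fraction of predictions that fall off the matrix diagonal."""
    matrix = confusion_matrix(actual, predicted)
    wrong = sum(n for (a, p), n in matrix.items() if a != p)
    return wrong / len(actual)


# Hypothetical AQI categories for ten site-days.
actual = ["good", "good", "moderate", "unhealthy", "moderate",
          "good", "unhealthy", "moderate", "good", "moderate"]
predicted = ["good", "moderate", "moderate", "unhealthy", "moderate",
             "good", "moderate", "moderate", "good", "good"]

matrix = confusion_matrix(actual, predicted)
rate = error_rate(actual, predicted)
```

Running the same two functions on each candidate model's predictions gives one error rate per model, which is the kind of figure the abstract reports for LDA, QDA, logistic regression, and KNN.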
In recent years, the widespread use of electronic services and social networks has produced large volumes of information of many types, such as videos, photos, and text. Because of the volume of this information and its lack of a specific structure, it cannot be handled with traditional relational databases, and modern solutions are needed that also meet processing-speed requirements. Data storage for processing, the way data is accessed in memory, network communication, and the features required of a distributed system are the issues that big data storage solutions must address. This paper presents the advantages and challenges of big data along with its special features and characteristics, introduces the technologies in use, examines storage methods, and identifies opportunities for further research.
The advent of the big data era has made data visualization a crucial tool for enhancing the efficiency and insights of data analysis. This theoretical research examines the current applications and potential future trends of data visualization in big data analysis. The article first systematically reviews the theoretical foundations and technological evolution of data visualization and analyzes the challenges that visualization faces in the big data environment, such as massive data processing, real-time visualization requirements, and multi-dimensional data display. Through extensive literature research, it explores innovative application cases and theoretical models of data visualization in multiple fields, including business intelligence, scientific research, and public decision-making. The study finds that interactive, real-time, and immersive visualization technologies may become the main directions for future development and analyzes their potential for enhancing user experience and data comprehension. The paper also examines the theoretical potential of artificial intelligence for enhancing data visualization capabilities, such as automated chart generation, intelligent recommendation of visualization schemes, and adaptive visualization interfaces, and considers the role of data visualization in promoting interdisciplinary collaboration and data democratization. Finally, the paper proposes theoretical suggestions for promoting innovation in and wider adoption of data visualization technology, including strengthening visualization literacy education, developing standardized visualization frameworks, and promoting open-source sharing of visualization tools. This study provides a comprehensive theoretical perspective for understanding the importance of data visualization in the big data era and its future development directions.
In the United States, the buildings sector accounts for about 76% of electricity use and 40% of all primary energy use and associated greenhouse gas emissions. Occupant behavior has drawn increasing research interest because of its impact on building energy consumption. However, studying occupant behavior at urban scale remains a challenge, and very few studies have been conducted. As an effort to couple big data analysis with human mobility modeling, this study explored urban-scale human mobility using three months of Global Positioning System (GPS) data from 93,000 users in the Phoenix metropolitan area. The research extracted stay points from the raw data and identified users' home, work, and other locations with a density-based spatial clustering algorithm. Daily mobility patterns were then constructed from the different types of locations. We propose a novel approach to predicting urban-scale daily human mobility patterns with a 12-hour prediction horizon, using a Long Short-Term Memory (LSTM) neural network model. Results show that the developed models achieved around 85% average accuracy and about 86% mean precision. The developed models can be further applied to analyze urban-scale occupant behavior and building energy demand and flexibility, and can contribute to urban planning.
This paper proposes a method for improving the data security of wireless sensor networks based on blockchain technology. Blockchain technology is applied to data transfer to build a highly secure wireless sensor network. In this network, the relay stations use microcontrollers and embedded devices, and the microcontrollers, such as Raspberry Pi and Arduino Yun, represent mobile databases. The proposed system uses microcontrollers to facilitate the connection of various sensor devices. By adopting blockchain encryption, the security of sensing data can be effectively improved. A blockchain is a concatenated transaction record that is protected by cryptography. Each block contains the cryptographic hash of the previous block, the corresponding timestamp, and transaction data. The transaction data denote the sensing data of the wireless sensor network. The proposed system uses a hash value calculated by the Merkle tree algorithm, which makes the transferred data difficult to tamper with. In addition, the proposed system can serve as a private cloud data center. In this study, the system visualizes the data uploaded by sensors and creates relevant charts based on big data analysis. Since the web server of the proposed system is built on an embedded operating system, it is easy to model and visualize the corresponding graphics using the Python or JavaScript programming language. Finally, this study creates an embedded-system mobile database and web server, which use the JavaScript programming language and the Node.js runtime environment to apply blockchain technology to mobile databases. The proposed method is verified by an experiment using about 1600 data records. The results show that the probability of data being changed is almost zero.
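The block structure described in this abstract (each block holding the previous block's hash, a timestamp, and the sensing data) can be sketched in a few lines. This is an illustrative hash chain only, assuming SHA-256 over a JSON encoding; it omits the Merkle tree the abstract mentions, and the field names, sensor readings, and function names are hypothetical rather than the authors' implementation.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's previous hash, timestamp, and data."""
    payload = json.dumps(
        {k: block[k] for k in ("prev_hash", "timestamp", "data")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, timestamp, data):
    """Append a block that stores the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"prev_hash": prev_hash, "timestamp": timestamp, "data": data}
    block["hash"] = block_hash(block)
    chain.append(block)


def is_valid(chain):
    """Check every stored hash and every link to the previous block."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(block):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical sensor readings appended as two blocks.
chain = []
append_block(chain, 1700000000, {"sensor": "temp-1", "value": 21.5})
append_block(chain, 1700000060, {"sensor": "temp-1", "value": 21.7})
```

Changing any stored reading after the fact makes `is_valid(chain)` return `False`, because the recomputed hash no longer matches the stored one; this is the tamper-evidence property the abstract relies on.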
Sentiment analysis deals with sentiment classification based on reviews made by users in a social network. Sentiment classification accuracy is evaluated using various selection methods, especially those that deal with algorithm selection. In this work, every sentiment received through user expressions is ranked in order to categorise sentiments as informative or non-informative. To do so, the work focuses on the Query Expansion Ranking (QER) algorithm, which takes user text as input, processes it for sentiment analysis, and finally labels the result as informative or non-informative. The challenge is to convert non-informative into informative using classifiers such as multinomial Bayes and entropy modelling, along with traditional sentiment analysis algorithms such as Support Vector Machine (SVM) and decision trees. The work also applies simulated annealing along with QER to classify data based on sentiment analysis. Because the input volume grows very quickly, the work also addresses big data concepts for information retrieval and processing. The comparison of results shows that the QER algorithm proved versatile when compared with the results of SVM. This work uses Twitter user comments to evaluate sentiment analysis.
Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
The proliferation of textual data in society is currently overwhelming; in particular, unstructured textual data is constantly being generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, etc. While the amount of textual data is increasing rapidly, users' ability to summarise, understand, and make sense of such data for making better business and living decisions remains limited. This paper studies how to analyse textual data, based on layered software patterns, to extract insightful user intelligence from a large collection of documents and to use such information to improve user operations and performance.
Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
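Two of the classical techniques named in this survey abstract, hash partitioning and reservoir sampling, are small enough to show directly. The sketch below is a single-machine illustration under stated assumptions: it uses CRC32 as the stable hash, the textbook "Algorithm R" for the reservoir, and hypothetical function names and record data; it is not how any particular Hadoop framework implements either technique.

```python
import random
import zlib


def hash_partition(records, num_partitions, key=str):
    """Assign each record to a partition by a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        h = zlib.crc32(key(record).encode("utf-8"))
        partitions[h % num_partitions].append(record)
    return partitions


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical records: the integers 0..999.
records = list(range(1000))
parts = hash_partition(records, num_partitions=4)
sample = reservoir_sample(iter(records), k=10, rng=random.Random(0))
```

Hash partitioning places every record in exactly one partition, while the reservoir keeps memory use fixed at `k` items no matter how long the stream is, which is why it suits record-level sampling over large inputs.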
This study built an intelligent analysis platform for learning behavior that integrates big data and Artificial Intelligence (AI) technologies, mines and analyzes students' learning data, and supports the personalized customization of learning resources and the accurate matching of intelligent learning partners. With the help of advanced algorithms and multi-dimensional data fusion strategies, the platform not only promotes positive interaction and collaboration in the learning environment but also gives teachers comprehensive, in-depth portraits of students' learning, which support the implementation of precision education and the personalized adjustment of teaching strategies. A recommender system based on user similarity evaluation and a collaborative filtering mechanism is designed, and its technical architecture and implementation process are described in detail.
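The recommender described in this abstract combines user similarity evaluation with collaborative filtering. The abstract does not say which similarity measure or weighting scheme is used, so the following is only a minimal sketch that assumes cosine similarity over co-rated resources and a similarity-weighted average; the learner names, resource names, and ratings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the items two users have both rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=2):
    """Rank unseen items by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]


# Hypothetical learner ratings of learning resources (1 = poor, 5 = excellent).
ratings = {
    "alice": {"algebra_video": 5, "python_quiz": 4},
    "bob": {"algebra_video": 5, "python_quiz": 4, "stats_notes": 5, "essay_guide": 1},
    "carol": {"algebra_video": 1, "python_quiz": 2, "essay_guide": 5, "stats_notes": 2},
}
suggestions = recommend(ratings, "alice")
```

Because "alice" rates the shared resources the same way "bob" does, his ratings carry the most weight, so `stats_notes` is ranked ahead of `essay_guide` for her. The same similarity score could also serve for matching learning partners, though the abstract does not say how the platform does this.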
Funding (microbiome method-development perspective): We are grateful for the support from the National Natural Science Foundation of China (NSFC) (31425002, 91231205, 81430011, 61303161, 31470220, and 31327001), and from the Frontier Science Research Program, the Soil-Microbe System Function and Regulation Program, and the Science and Technology Service Network Initiative (STS) of the Chinese Academy of Sciences (CAS).
Funding (mobile social network privacy paper): We thank the anonymous reviewers and editors for their very constructive comments. This work was supported by the National Social Science Foundation Project of China under Grant 16BTQ085.
Abstract: Quantitative analysis of digital images requires detection and segmentation of the borders of the object of interest. Accurate segmentation is required for volume determination, 3D rendering, radiation therapy, and surgery planning. In medical images, segmentation has traditionally been done by human experts. Substantial computational and storage requirements become especially acute when object orientation and scale have to be considered. Therefore, automated or semi-automated segmentation techniques are essential if these software applications are ever to gain widespread clinical use. Many methods have been proposed to detect and segment 2D shapes, most of which involve template matching. Advanced segmentation techniques called Snakes, or active contours, have been used, considering deformable models or templates. The main purpose of this work is to apply segmentation techniques to the definition of 3D organs (anatomical structures) when big data information has been stored and must be organized by doctors for medical diagnosis. The processes would be applied to CT images from patients with COVID-19.
Funding: Supported by grants from the National Natural Science Foundation of China (No. 81272340, No. 81472386, No. 81672872), the National High Technology Research and Development Program of China (863 Program) (No. 2012AA02A501), the Science and Technology Planning Project of Guangdong Province, China (No. 2014B020212017, No. 2014B050504004, and No. 2015B050501005), and the Natural Science Foundation of Guangdong Province, China (No. 2016A030311011).
Abstract: Metastasis is the greatest contributor to cancer-related death. In the era of precision medicine, it is essential to predict and to prevent the spread of cancer cells to significantly improve patient survival. Thanks to the application of a variety of high-throughput technologies, accumulating big data enables researchers and clinicians to identify aggressive tumors as well as patients with a high risk of cancer metastasis. However, there have been few large-scale gene collection studies to enable metastasis-related analyses. In the last several years, emerging efforts have identified pro-metastatic genes in a variety of cancers, providing us the ability to generate a pro-metastatic gene cluster for big data analyses. We carefully selected 285 genes with in vivo evidence of promoting metastasis reported in the literature. These genes have been investigated in different tumor types. We used two datasets downloaded from The Cancer Genome Atlas database, specifically, datasets of clear cell renal cell carcinoma and hepatocellular carcinoma, for validation tests, and excluded any genes for which elevated expression level correlated with longer overall survival in any of the datasets. Ultimately, 150 pro-metastatic genes remained in our analyses. We believe this collection of pro-metastatic genes will be helpful for big data analyses, and eventually will accelerate anti-metastasis research and clinical intervention.
Abstract: In view of the frequent fluctuation of garlic prices under the market economy and the current state of garlic prices, the fluctuation of garlic prices in the circulation link of the garlic industry chain is analyzed, and the mode of multidisciplinary application in the agricultural industry is discussed. On the basis of the big data platform of the garlic industry chain, this paper constructs a GARCH model to analyze the fluctuation law of garlic prices in the circulation link and provides garlic industry services from the angle of price fluctuation combined with economic analysis. The research shows that the average rate of change of the garlic price exhibits “agglomeration” and cyclical phenomena, with the characteristics of fragility, left skew, and a non-normal distribution, and that the fitted values of the GARCH model are very close to the true values. Finally, the paper looks into the form of industrial services from the perspective of garlic price fluctuation.
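The “agglomeration” of price changes described in this abstract is what a GARCH model captures through a recursive conditional variance. The abstract does not state the model order or the fitted parameters, so the following is only a minimal sketch that assumes a GARCH(1,1) form, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1], with illustrative parameter values; the function names `garch11_variances` and `simulate_garch11` are hypothetical and not taken from the paper.

```python
import math
import random


def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances of an assumed GARCH(1,1) model.

    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1],
    started from the unconditional variance omega / (1 - alpha - beta).
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate n returns from the same assumed model with Gaussian shocks."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)
    returns = []
    for _ in range(n):
        r = math.sqrt(var) * rng.gauss(0.0, 1.0)
        returns.append(r)
        # A large shock raises the next variance, producing volatility clusters.
        var = omega + alpha * r * r + beta * var
    return returns


if __name__ == "__main__":
    # Illustrative parameters only; they are not estimates from garlic prices.
    simulated = simulate_garch11(10, omega=0.01, alpha=0.1, beta=0.8)
    for r, v in zip(simulated, garch11_variances(simulated, 0.01, 0.1, 0.8)):
        print(f"return={r:+.4f}  conditional variance={v:.4f}")
```

In practice the parameters would be estimated from the observed price series, typically by maximum likelihood, rather than fixed by hand as they are here.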
Abstract: The year 2011 is considered the first year of the big data market in China. China's big data growth will be faster than the global average growth rate, and China will usher in a rapid expansion of its big data market in the next few years. This paper presents the overall development of big data in China in terms of market scale and development stages, enterprise development in the industry chain, technology standards, and industrial applications. The paper points out the issues and challenges facing big data development in China and proposes making policies and creating support approaches for big data transactions and personal privacy protection.
Funding: This work is supported by the Shandong Provincial Natural Science Foundation, China, under Grant No. ZR2017MG011. This work is also supported by the Key Research and Development Program of Shandong Province (2017GGX90103).
Abstract: Monitoring, understanding, and predicting origin-destination (OD) flows in a city is an important problem for city planning and human activity. Taxi GPS traces, a typical kind of crowd-sensed data, can be used to mine the semantics of OD flows. In this paper, we first construct and analyze a complex network of OD flows based on large-scale taxi GPS traces of a city in China. The spatiotemporal analysis of the OD flow complex network shows that there are distinctive patterns in OD flows. Then, based on a novel complex network model, a semantics mining method for OD flows is proposed by compounding the Points of Interest (POI) network and the public transport network with the OD flow network. The proposed method would offer a novel way to predict location characteristics and future traffic conditions accurately.
Abstract: In recent years, China has successfully set up multiple single-product big data platforms. As an indigenous and unique plant in China, the peony offers immense economic returns, strong social benefits, and a profound cultural heritage. Its seed oil, as an emerging edible oil, has attracted much attention. Heze city is one of the places best suited to cultivating peonies. In this context, a study of the big data of the peony industry in Heze city has practical significance. This paper begins with a literature review of big data platforms for the entire industry. Referring to established single-product big data platforms, it reports the results of a case study of the peony industry in Heze city that identifies potential difficulties and problems in building a big data platform for the peony industry that incorporates the five dimensions of service, management, application, resource, and technology.
Funding: Supported by the NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1909208), the Science and Technology Major Project of Changsha (kh2202004), and the Changsha Municipal Natural Science Foundation (kq2202106).
Abstract: The electrocardiogram (ECG) is a low-cost, simple, fast, and non-invasive test. It can reflect the heart's electrical activity and provide valuable diagnostic clues about the health of the entire body. Therefore, ECG has been widely used in various biomedical applications such as arrhythmia detection, disease-specific detection, mortality prediction, and biometric recognition. In recent years, ECG-related studies have been carried out using a variety of publicly available datasets, with many differences in the datasets used, data preprocessing methods, targeted challenges, and modeling and analysis techniques. Here we systematically summarize and analyze ECG-based automatic analysis methods and applications. Specifically, we first review 22 commonly used ECG public datasets and provide an overview of data preprocessing processes. Then we describe some of the most widely used applications of ECG signals and analyze the advanced methods involved in these applications. Finally, we elucidate some of the challenges in ECG analysis and provide suggestions for further research.
Abstract: Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Using the confusion matrix and error percentages, we selected the best-performing model; the prediction error rates were 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
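The model comparison in this abstract rests on a confusion matrix and the error percentage derived from it. As a minimal sketch of that evaluation step only (the AQI category labels and the example predictions below are invented for illustration and are not data from the study), the error rate is the share of off-diagonal entries:

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def error_rate(matrix):
    """Fraction of samples that fall off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total


if __name__ == "__main__":
    # Hypothetical AQI categories and predictions, for illustration only.
    labels = ["good", "moderate", "unhealthy"]
    y_true = ["good", "good", "moderate", "unhealthy", "moderate"]
    y_pred = ["good", "moderate", "moderate", "unhealthy", "good"]
    cm = confusion_matrix(y_true, y_pred, labels)
    print(cm)
    print(f"error rate: {error_rate(cm):.0%}")
```

Computing the same quantity for each candidate model on a held-out test set gives error percentages that can be compared directly.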
Abstract: In recent years, the widespread use of electronic services and social networks has generated large volumes of information which, besides being large, contains various types of content such as videos, photos, and text. Due to the high volume and the lack of specificity of this information, handling it with traditional relational databases is not possible, and modern solutions should be used to process it so that processing speed is also addressed. Data storage for processing, the way data is accessed in memory, network communication, and the features required of a distributed system are the issues that solutions for storing big data should address. In this paper, the advantages and challenges of big data and its special features and characteristics are presented; the technologies in use are introduced, storage methods are studied, and research opportunities for future work are outlined.
Abstract: The advent of the big data era has made data visualization a crucial tool for enhancing the efficiency and insights of data analysis. This theoretical research delves into the current applications and potential future trends of data visualization in big data analysis. The article first systematically reviews the theoretical foundations and technological evolution of data visualization, and thoroughly analyzes the challenges faced by visualization in the big data environment, such as massive data processing, real-time visualization requirements, and multi-dimensional data display. Through extensive literature research, it explores innovative application cases and theoretical models of data visualization in multiple fields including business intelligence, scientific research, and public decision-making. The study reveals that interactive visualization, real-time visualization, and immersive visualization technologies may become the main directions for future development, and analyzes the potential of these technologies in enhancing user experience and data comprehension. The paper also examines the theoretical potential of artificial intelligence technology in enhancing data visualization capabilities, such as automated chart generation, intelligent recommendation of visualization schemes, and adaptive visualization interfaces. The research also focuses on the role of data visualization in promoting interdisciplinary collaboration and data democratization. Finally, the paper proposes theoretical suggestions for promoting data visualization technology innovation and application popularization, including strengthening visualization literacy education, developing standardized visualization frameworks, and promoting open-source sharing of visualization tools. This study provides a comprehensive theoretical perspective for understanding the importance of data visualization in the big data era and its future development directions.
Funding: Supported by the U.S. National Science Foundation (Award No. 1949372 and No. 2125775), and in part through computational resources provided by Syracuse University.
Abstract: In the United States, the buildings sector accounts for about 76% of electricity use and 40% of all primary energy use and associated greenhouse gas emissions. Occupant behavior has drawn increasing research interest due to its impact on building energy consumption. However, studying occupant behavior at urban scale remains a challenge, and very few studies have been conducted. As an effort to couple big data analysis with human mobility modeling, this study explored urban-scale human mobility using three months of Global Positioning System (GPS) data from 93,000 users in the Phoenix Metropolitan Area. This research extracted stay points from raw data and identified users' home, work, and other locations with a Density-Based Spatial Clustering algorithm. Then, daily mobility patterns were constructed using the different types of locations. We propose a novel approach to predict urban-scale daily human mobility patterns with a 12-hour prediction horizon, using a Long Short-Term Memory (LSTM) neural network model. Results show that the developed models achieved around 85% average accuracy and about 86% mean precision. The developed models can be further applied to analyze urban-scale occupant behavior and building energy demand and flexibility, and can contribute to urban planning.
Funding: Supported by the Department of Electrical Engineering, National Chin-Yi University of Technology. The authors thank the National Chin-Yi University of Technology and Takming University of Science and Technology, Taiwan, for supporting this research.
Abstract: This paper proposes a method for improving the data security of wireless sensor networks based on blockchain technology. Blockchain technology is applied to data transfer to build a highly secure wireless sensor network. In this network, the relay stations use microcontrollers and embedded devices, and the microcontrollers, such as Raspberry Pi and Arduino Yun, represent mobile databases. The proposed system uses microcontrollers to facilitate the connection of various sensor devices. By adopting blockchain encryption, the security of sensing data can be effectively improved. A blockchain is a concatenated transaction record that is protected by cryptography. Each block contains the cryptographic hash of the previous block, the corresponding timestamp, and transaction data. The transaction data denote the sensing data of the wireless sensor network. The proposed system uses a hash value calculated by the Merkle tree algorithm, which makes the data transferred by the system difficult to tamper with. In addition, the proposed system can serve as a private cloud data center. In this study, the system visualizes the data uploaded by sensors and creates relevant charts based on big data analysis. Since the web server of the proposed system is built on an embedded operating system, it is easy to model and visualize the corresponding graphics using the Python or JavaScript programming language. Finally, this study creates an embedded-system mobile database and web server, which use the JavaScript programming language and the Node.js runtime environment to apply blockchain technology to mobile databases. The proposed method is verified experimentally using about 1600 data records. The results show that the probability of data being altered is almost zero.
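The block structure described in this abstract (previous-block hash, timestamp, and transaction data summarized by a Merkle tree hash) can be sketched in a few lines. The paper reports a JavaScript and Node.js implementation; the Python below is only an illustrative sketch, and the choice of SHA-256, the JSON serialization, the duplication of the last node on odd levels, and all field and function names are assumptions rather than details from the paper.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(records):
    """Merkle root over JSON-serialized records (last node duplicated on odd levels)."""
    level = [sha256_hex(json.dumps(r, sort_keys=True).encode()) for r in records]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def block_hash(prev_hash, timestamp, records):
    """Hash of the block header: previous hash, timestamp, and Merkle root."""
    header = {"prev_hash": prev_hash, "timestamp": timestamp,
              "merkle_root": merkle_root(records)}
    return sha256_hex(json.dumps(header, sort_keys=True).encode())


def make_block(prev_hash, timestamp, records):
    return {"prev_hash": prev_hash, "timestamp": timestamp,
            "records": records, "hash": block_hash(prev_hash, timestamp, records)}


def verify_chain(chain):
    """Recompute every block hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected = block_hash(block["prev_hash"], block["timestamp"], block["records"])
        if expected != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical sensor readings, for illustration only.
    genesis = make_block("0" * 64, 0, [{"sensor": "t1", "value": 21.5}])
    second = make_block(genesis["hash"], 1, [{"sensor": "t1", "value": 21.7},
                                             {"sensor": "h1", "value": 40}])
    chain = [genesis, second]
    print(verify_chain(chain))             # chain is intact
    chain[0]["records"][0]["value"] = 99   # tamper with a stored reading
    print(verify_chain(chain))             # tampering is detected
```

Because each block hash covers the Merkle root of its records and the hash of the previous block, changing any stored reading invalidates that block and every block linked after it.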
Abstract: Sentiment analysis deals with sentiment classification based on the reviews made by users in a social network. Sentiment classification accuracy is evaluated using various selection methods, especially those that deal with algorithm selection. In this work, every sentiment received through user expressions is ranked in order to categorise sentiments as informative or non-informative. To do so, the work focuses on the Query Expansion Ranking (QER) algorithm, which takes user text as input, processes it for sentiment analysis, and finally produces a result of informative or non-informative. The challenge is to convert non-informative into informative using classifiers such as multinomial Bayes and entropy modelling, along with traditional sentiment analysis algorithms such as Support Vector Machine (SVM) and decision trees. The work also applies simulated annealing along with QER to classify data based on sentiment analysis. As the input volume grows very fast, the work also addresses the concept of big data for information retrieval and processing. The result comparison shows that the QER algorithm proved to be versatile when compared with the results of SVM. This work uses Twitter user comments for evaluating sentiment analysis.
Funding: Supported by the National Natural Science Foundation of China (No. 61972261) and the Basic Research Foundations of Shenzhen (Nos. JCYJ20210324093609026 and JCYJ20200813091134001).
Abstract: Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or in the cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely large data sets of terabytes in size. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limits, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to overcome these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
Abstract: The proliferation of textual data in society is currently overwhelming. In particular, unstructured textual data is constantly being generated via call centre logs, emails, documents on the web, blogs, tweets, customer comments, customer reviews, etc. While the amount of textual data is increasing rapidly, users' ability to summarise, understand, and make sense of such data for making better business and living decisions remains limited. This paper studies how to analyse textual data, based on layered software patterns, for extracting insightful user intelligence from a large collection of documents and for using such information to improve user operations and performance.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61972261) and the National Key R&D Program of China (No. 2017YFC0822604-2).
Abstract: Computer clusters with the shared-nothing architecture are the major computing platforms for big data processing and analysis. In cluster computing, data partitioning and sampling are two fundamental strategies to speed up the computation of big data and increase scalability. In this paper, we present a comprehensive survey of the methods and techniques of data partitioning and sampling with respect to big data processing and analysis. We start with an overview of the mainstream big data frameworks on Hadoop clusters. The basic methods of data partitioning are then discussed, including three classical horizontal partitioning schemes: range, hash, and random partitioning. Data partitioning on Hadoop clusters is also discussed with a summary of new strategies for big data partitioning, including the new Random Sample Partition (RSP) distributed model. The classical methods of data sampling are then investigated, including simple random sampling, stratified sampling, and reservoir sampling. Two common methods of big data sampling on computing clusters are also discussed: record-level sampling and block-level sampling. Record-level sampling is not as efficient as block-level sampling on big distributed data. On the other hand, block-level sampling on data blocks generated with the classical data partitioning methods does not necessarily produce good representative samples for approximate computing of big data. In this survey, we also summarize the prevailing strategies and related work on sampling-based approximation on Hadoop clusters. We believe that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
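The three classical horizontal partitioning schemes (range, hash, and random) and reservoir sampling named in this abstract are simple enough to sketch on in-memory lists. The following is a single-machine illustration, not code from the survey and not how Hadoop implements these schemes; the function names, the CRC32 hash, and the round-robin split used for random partitioning are choices made for this example.

```python
import bisect
import random
import zlib


def range_partition(records, boundaries, key=lambda r: r):
    """Partition i holds keys below boundaries[i]; the last partition holds the rest."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_right(boundaries, key(r))].append(r)
    return parts


def hash_partition(records, num_partitions, key=lambda r: r):
    """Assign each record by a deterministic hash of its key."""
    parts = [[] for _ in range(num_partitions)]
    for r in records:
        parts[zlib.crc32(str(key(r)).encode()) % num_partitions].append(r)
    return parts


def random_partition(records, num_partitions, rng):
    """Shuffle the records, then deal them out round-robin."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::num_partitions] for i in range(num_partitions)]


def reservoir_sample(stream, k, rng):
    """Uniform sample of k items from a stream of unknown length (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    data = list(range(100))
    print([len(p) for p in range_partition(data, [25, 50, 75])])
    print([len(p) for p in hash_partition(data, 4)])
    print([len(p) for p in random_partition(data, 4, random.Random(1))])
    print(reservoir_sample(iter(data), 10, random.Random(2)))
```

Only random partitioning makes each block look like a random sample of the whole data set, which relates to the abstract's point that block-level sampling on classically partitioned blocks need not be representative.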
Abstract: This study built an intelligent analysis platform for learning behavior, which deeply integrated cutting-edge big data and Artificial Intelligence (AI) technology, mined and analyzed students' learning data, and realized the personalized customization of learning resources and the accurate matching of intelligent learning partners. With the help of advanced algorithms and multi-dimensional data fusion strategies, the platform not only promotes positive interaction and collaboration in the learning environment but also provides teachers with comprehensive and in-depth portraits of students' learning, which provides solid support for the implementation of precision education and the personalized adjustment of teaching strategies. In this study, a recommender system based on user similarity evaluation and a collaborative filtering mechanism is carefully designed, and its technical architecture and implementation process are described in detail.
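A recommender that combines user similarity evaluation with collaborative filtering, as this abstract describes, can be illustrated with a small user-based example. The abstract does not specify the similarity measure or the scoring rule, so the sketch below assumes cosine similarity over ratings and a similarity-weighted average; the student names, resources, ratings, and function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


if __name__ == "__main__":
    # Hypothetical ratings of learning resources by students.
    ratings = {
        "alice": {"algebra": 5, "geometry": 4},
        "bob": {"algebra": 5, "geometry": 4, "calculus": 5, "poetry": 1},
        "carol": {"poetry": 5, "drama": 4},
    }
    print(recommend(ratings, "alice"))
```

The same similarity scores could also be used to suggest learning partners by ranking other users rather than items, though how the platform matches partners is not stated in the abstract.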