How can we efficiently store and mine dynamically generated dense tensors for modeling the behavior of multidimensional dynamic data? Much of the multidimensional dynamic data in the real world is generated in the form of time-growing tensors. For example, air quality tensor data consists of multiple sensory values gathered from wide locations for a long time. Such data, accumulated over time, is redundant and consumes a lot of memory in its raw form. We need a way to efficiently store dynamically generated tensor data that increase over time and to model their behavior on demand between arbitrary time blocks. To this end, we propose a Block Incremental Dense Tucker Decomposition (BID-Tucker) method for efficient storage and on-demand modeling of multidimensional spatiotemporal data. Assuming that tensors come in unit blocks where only the time domain changes, our proposed BID-Tucker first slices the blocks into matrices and decomposes them via singular value decomposition (SVD). The SVDs of the time × space sliced matrices are stored instead of the raw tensor blocks to save space. When modeling from data is required at particular time blocks, the SVDs of the corresponding time blocks are retrieved and incremented to be used for Tucker decomposition. The factor matrices and core tensor of the decomposed results can then be used for further data analysis. We compared our proposed BID-Tucker with D-Tucker, which our method extends, and vanilla Tucker decomposition. We show that our BID-Tucker is faster than both D-Tucker and vanilla Tucker decomposition and uses less memory for storage with a comparable reconstruction error. We applied our proposed BID-Tucker to model the spatial and temporal trends of air quality data collected in South Korea from 2018 to 2022. We were able to model the spatial and temporal air quality trends and to verify unusual events, such as chronic ozone alerts and large fire events.
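The storage idea in the abstract above, slicing each time-block tensor into a time × space matrix and keeping only its truncated SVD, can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' implementation; the block shape, rank, and synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress_block(block, rank):
    # Slice the (time, location, feature) block into a time x (location*feature)
    # matrix, then keep only the top-`rank` SVD factors.
    mat = block.reshape(block.shape[0], -1)
    U, s, Vt = np.linalg.svd(mat, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank], block.shape

def decompress_block(U, s, Vt, shape):
    # Rebuild an approximation of the original block from the stored factors.
    return ((U * s) @ Vt).reshape(shape)

# A nearly low-rank synthetic block: 24 hours x 10 stations x 3 pollutants.
base = rng.standard_normal((24, 2)) @ rng.standard_normal((2, 30))
block = base.reshape(24, 10, 3) + 0.01 * rng.standard_normal((24, 10, 3))

U, s, Vt, shape = compress_block(block, rank=5)
approx = decompress_block(U, s, Vt, shape)
rel_err = np.linalg.norm(approx - block) / np.linalg.norm(block)
stored = U.size + s.size + Vt.size
print(stored, block.size, float(np.round(rel_err, 4)))
```

When the data are close to low-rank, as redundant sensor data typically are, the stored factors are far smaller than the raw block while the relative reconstruction error stays small.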
As big data becomes an apparent challenge when building a business intelligence (BI) system, there is a motivation to handle this challenging issue in higher education institutions (HEIs). Monitoring quality in HEIs encompasses handling huge amounts of data coming from different sources. This paper reviews big data and analyses cases from the literature regarding quality assurance (QA) in HEIs. It also outlines a framework that can address the big data challenge in HEIs to handle QA monitoring using BI dashboards, and a prototype dashboard is presented. The dashboard was developed using a utilisation tool to monitor QA in HEIs and to provide visual representations of big data. The prototype dashboard enables stakeholders to monitor compliance with QA standards while addressing the big data challenge associated with the substantial volume of data managed by HEIs' QA systems. The paper also outlines how the developed system integrates big data from social media into the monitoring dashboard.
With the development of information technology, a large amount of product quality data from the entire manufacturing process is accumulated, but it is not explored and used effectively. Traditional product quality prediction models have many disadvantages, such as high complexity and low accuracy. To overcome these problems, we propose an optimized data equalization method to pre-process the dataset and design a simple but effective product quality prediction model: a radial basis function model optimized by the firefly algorithm with a Levy flight mechanism (RBFFALM). First, the new data equalization method is introduced to pre-process the dataset, which reduces the dimension of the data, removes redundant features, and improves the data distribution. Then the RBFFALM is used to predict product quality. Comprehensive experiments conducted on real-world product quality datasets validate that the new RBFFALM model, combined with the new data pre-processing method, outperforms previous methods in predicting product quality.
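The predictor named above is an RBF network whose centers and width the paper tunes with a Levy-flight firefly search. As a hedged illustration of the RBF part only, the sketch below fixes the centers and width by hand and fits the output-layer weights by least squares; the synthetic data, `gamma`, and center choice are all assumptions, and no firefly optimization is performed.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_design(X, centers, gamma):
    # Gaussian radial-basis activations: phi[i, j] = exp(-gamma * ||x_i - c_j||^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Stand-in data: 80 samples of 2 process parameters and a smooth "quality" response.
X = rng.uniform(-1, 1, (80, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

# In the paper the centers and width would be tuned by the firefly search;
# here they are fixed by hand purely for illustration.
centers = X[:20]
gamma = 4.0

Phi = rbf_design(X, centers, gamma)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # output-layer weights
fit_error = float(np.abs(Phi @ w - y).mean())
print(round(fit_error, 3))
```

The firefly search would replace the hand-picked `centers` and `gamma` with values that minimize a validation error, but the RBF evaluation itself is unchanged.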
In contrast with research on new models, little attention has been paid to the impact of low- or high-quality data feeding a dialogue system. The present paper makes the first attempt to fill this gap by extending our previous work on question-answering (QA) systems, investigating the effect of misspelling on QA agents and how context changes can enhance the responses. Instead of using large language models trained on huge datasets, we propose a method that enhances the model's score by modifying only the quality and structure of the data fed to the model. It is important to identify the features that modify agent performance, because a high rate of wrong answers can make students lose interest in using the QA agent as an additional tool for distance learning. The results demonstrate that the accuracy of the proposed context simplification exceeds 85%. These findings shed light on the importance of question data quality and context complexity as key dimensions of the QA system. In conclusion, the experimental results on questions and contexts showed that controlling and improving the various aspects of data quality around the QA system can significantly enhance its robustness and performance.
According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018, and 5G connections will generate nearly three times more traffic than 4G connections. While bringing a boom to the network, this also presents unprecedented challenges in terms of flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to cause local network congestion through the concentration of elephant flows, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for minimizing congestion and guaranteeing quality of service (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service, and different traffic scheduling strategies are used according to the characteristics of the different service types in data centers. Elephant flows, which tend to cause local congestion, are handled by reroute scheduling: a path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the time delay, and the fast merit-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path, reducing the possibility of local network congestion. Equal-cost multi-path (ECMP) protocols with faster response times are used to schedule mouse flows of shorter duration, guaranteeing the quality of service of the network and achieving isolated transmission of the various types of data streams. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness than ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms another traffic scheduling strategy that reroutes elephant flows (namely Hedera). Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29%, and normalized total throughput by 6.74% and 2.64%, respectively; MCQG improves link utilization by 23.25% and 15.07%; in addition, the average round-trip delay and packet loss rate fluctuate significantly less than in the two compared strategies.
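The abstract describes a path evaluation function built from the maximum link utilization on a path, its elephant-flow count, and its delay. The exact functional form and weights are not given, so the sketch below uses an assumed weighted sum and replaces the sparrow search with an exhaustive minimum over a handful of hypothetical candidate paths.

```python
def path_cost(max_link_util, elephant_count, delay_ms,
              w_util=0.5, w_count=0.3, w_delay=0.2):
    # Hypothetical weighted evaluation; the paper's exact form, weights,
    # and normalizations are not stated in the abstract.
    return (w_util * max_link_util
            + w_count * (elephant_count / 10)
            + w_delay * (delay_ms / 100))

# Candidate paths as (max link utilization, elephant flows, delay in ms).
candidate_paths = {
    "p1": (0.90, 4, 12.0),   # heavily utilised path
    "p2": (0.35, 1, 20.0),   # lightly loaded, slightly longer delay
    "p3": (0.60, 3, 8.0),
}

# The sparrow search would explore a large path space; with only a few
# candidates a plain minimum suffices for illustration.
best = min(candidate_paths, key=lambda p: path_cost(*candidate_paths[p]))
print(best)
```

Under these assumed weights the lightly loaded path wins even though its delay is the largest, which mirrors the strategy's goal of steering elephant flows away from congested links.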
Objectives: Big data has revolutionized nursing and health care and raised concerns. This research aims to help nurses understand big data sets to provide better patient care. Methods: Big data in nursing has sparked a global revolution and raised concerns, but few studies have focused on helping nurses understand big data to provide the best patient care. This systematic review was conducted based on PRISMA guidelines. PubMed, MEDLINE, CINAHL, Google Scholar, and ResearchGate were searched for studies from 2010 to 2020. Results: The most common uses of big data in nursing were investigated in eight papers published between 2015 and 2018. All of the research showed improvements in patient outcomes and healthcare delivery when big data was used in medical-surgical, emergency department, critical care unit, community, systems biology, and leadership applications. Big data is, however, not yet taught to nurses. Conclusions: Big data applications in nursing and health care improve early intervention and decision-making. Big data provides a comprehensive view of a patient's status and social determinants of health, allowing treatment that draws on all metaparadigms and avoids a singular focus. Big data can help prepare nurses and improve patient outcomes by improving quality, safety, and outcomes.
At present, water pollution has become an important factor affecting and restricting national and regional economic development. Total phosphorus is one of the main sources of water pollution and eutrophication, so the prediction of total phosphorus in water quality is of clear research significance. This paper selects total phosphorus and turbidity data for analysis by crawling the data of a water quality monitoring platform. By constructing an attribute-object mapping relationship, the correlation between the two indicators was analyzed and used to predict future data. First, the monthly mean and daily mean concentrations of total phosphorus and turbidity were calculated after cleaning the outliers, and the correlation between them was analyzed. Second, the correlation coefficients at different times and frequencies were used to predict the values for the next five days, and the data trend was visualized with Python. Finally, the real values were compared with the predicted values, and the results showed that the correlation between total phosphorus and turbidity is useful in predicting water quality.
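The correlation-based prediction step described above can be illustrated with synthetic numbers: compute the Pearson correlation between total phosphorus and turbidity, fit a linear relation, and predict phosphorus from a turbidity value. The data values and the linear form are assumptions for illustration, not the platform's measurements.

```python
import numpy as np

# Hypothetical daily means: total phosphorus (mg/L) and turbidity (NTU).
tp   = np.array([0.10, 0.12, 0.15, 0.13, 0.18, 0.20, 0.17])
turb = np.array([ 8.0,  9.5, 12.0, 10.5, 14.0, 16.0, 13.5])

r = np.corrcoef(tp, turb)[0, 1]             # Pearson correlation coefficient
slope, intercept = np.polyfit(turb, tp, 1)  # linear fit: TP ~ turbidity

# Predict total phosphorus from a hypothetical forecast turbidity of 15 NTU.
tp_pred = slope * 15.0 + intercept
print(round(float(r), 3), round(float(tp_pred), 3))
```

A correlation near 1 is what licenses using turbidity, which is cheap to measure continuously, as a proxy predictor for total phosphorus.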
Human life would be impossible without adequate air quality. Consistent advancements in practically every aspect of contemporary human life have harmed air quality. Everyday industrial, transportation, and home activities turn up dangerous contaminants in our surroundings. This study investigated two years' worth of air quality and outlier detection data from two Indian cities. Studies on air pollution have used numerous methodologies, with the various gases seen as a vector whose components are the gas concentration values for each observation performed. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of the cities of Delhi and Kolkata, and the outcomes were compared to those from the traditional method. In the evaluation and comparison of these models' performances, the functional approach performed well.
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different analyses using a machine learning approach to determine the model with the best performance. By employing the confusion matrix and error percentages, we selected the best-performing model, which yielded prediction error rates of 22%, 23%, 20%, and 27%, respectively, for the LDA, QDA, logistic regression, and KNN models. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
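The error percentages quoted above are read off confusion matrices. A minimal sketch of that computation follows, using a hypothetical 3-class AQI-category confusion matrix rather than the study's data.

```python
def error_rate(confusion):
    # Overall error = 1 - (trace / total), for a confusion matrix whose
    # rows are true classes and columns are predicted classes.
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1 - correct / total

# Hypothetical confusion matrix for one model over 120 test sites,
# with AQI bucketed into three categories (e.g. good/moderate/unhealthy).
cm = [[40, 5, 2],
      [6, 34, 4],
      [1, 6, 22]]

print(round(error_rate(cm), 2))
```

Comparing one such error rate per model (LDA, QDA, logistic regression, KNN) is exactly how the study ranks the four candidates.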
In the era of big data, the construction and implementation of a quality control audit system are particularly crucial. This article delves into the impact of big data technology on quality control auditing, establishes a quality control auditing system for the big data era, and elucidates the pathway to realizing this system. Through the application of big data technology to quality control audits, there is an enhancement in audit efficiency, more accurate risk assessment, and robust support for the sustainable development of enterprises.
Offshore waters provide resources for human beings while, on the other hand, threatening them with marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly mean wave height, temperature, and pressure real-time observation data taken at the Xiaomaidao station (in Qingdao, China) from June 1, 2017, to May 31, 2018, to explore the data quality using eight quality control methods and to identify the most effective methods for the Xiaomaidao station. After applying the eight quality control methods, the percentages of the mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. Compared against marine disaster (wave alarm report) data, the values that failed the tests were mainly due to the influence of aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test method is not effective for it; a correlation test with other related parameters would be more useful for the mean wave height.
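Two of the standard ocean-station checks implied above, a plausibility range test and an adjacent-point continuity (spike) test, together with the resulting pass percentage, can be sketched as follows. The wave-height values and both thresholds are illustrative, not the station's.

```python
def range_check(values, lo, hi):
    # True where a value lies inside a climatological plausibility window.
    return [lo <= v <= hi for v in values]

def spike_check(values, max_jump):
    # True where the jump from the previous point is plausibly small;
    # the first point has no predecessor, so it passes by default.
    flags = [True]
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) <= max_jump)
    return flags

# Hypothetical hourly mean wave heights (m) with one obviously bad value.
waves = [0.8, 0.9, 1.1, 9.9, 1.2, 1.0, 1.3]
passed = [a and b for a, b in zip(range_check(waves, 0.0, 5.0),
                                  spike_check(waves, 2.0))]
pass_rate = 100 * sum(passed) / len(passed)
print(round(pass_rate, 2))
```

Note that the spike test flags both the jump up to the bad value and the jump back down, which is why genuine storm waves need the correlation test the abstract recommends instead of a bare continuity check.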
Multisensor data fusion (MDF) is an emerging technology to fuse data from multiple sensors in order to make a more accurate estimation of the environment through measurement and detection. Applications of MDF span a wide spectrum of military and civilian areas. With the rapid evolution of computers and the proliferation of micro-mechanical/electrical systems sensors, the utilization of MDF is being popularized in research and applications. This paper focuses on the application of MDF for high-quality data analysis and processing in measurement and instrumentation. A practical, general data fusion scheme was established on the basis of feature extraction and the merging of data from multiple sensors. The scheme integrates artificial neural networks for high-performance pattern recognition. A number of successful applications in the areas of NDI (Non-Destructive Inspection) corrosion detection, food quality and safety characterization, and precision agriculture are described and discussed in order to motivate new applications in these or other areas. The paper gives an overall picture of using the MDF method to increase the accuracy of data analysis and processing in measurement and instrumentation across different areas of application.
Since the British National Archives put forward the concept of digital continuity in 2007, several developed countries have worked out digital continuity action plans. However, technologies for guaranteeing digital continuity are still lacking. This paper first analyzes the requirements of digital continuity guarantees for electronic records based on data quality theory and points out the necessity of data quality guarantees for electronic records. Moreover, we convert the digital continuity guarantee of electronic records into ensuring the consistency, completeness, and timeliness of electronic records, and construct the first technology framework of the digital continuity guarantee for electronic records. Finally, temporal functional dependency technology is utilized to build the first integration method to ensure the consistency, completeness, and timeliness of electronic records.
We first analyzed GPS precipitable water vapor (GPS/PWV) data available from a ground-based GPS observation network in Guangdong from 1 August 2009 to 27 August 2012 and then developed a quality control method applied before GPS/PWV data are assimilated into the GRAPES 3DVAR system. This method can reject outliers effectively. After establishing the criterion for quality control, we ran three numerical experiments to investigate the impact on the precipitation forecast of assimilating the GPS/PWV data with and without quality control. In the numerical experiments, two precipitation cases (on 6-7 May 2010 and 27-28 April 2012) that occurred in the annual first rainy season of Guangdong were selected. The results indicated that after quality control, only the GPS/PWV data that deviate little from the NCEP/PWV data are assimilated into the system, which leads to a reasonable adjustment of the initial water vapor above Guangdong and eventually improves the intensity and location of the 24-h precipitation forecast significantly.
This study proposes a method to derive the climatological limit thresholds that can be used in an operational/historical quality control procedure for Chinese high-vertical-resolution (5-10 m) radiosonde temperature and wind speed data. The whole atmosphere is divided into 64 vertical bins, and profiles are constructed from the percentiles of the values in each vertical bin. Based on the percentile profiles (PPs), some objective criteria are developed to obtain the thresholds. Tibetan Plateau field data are used to validate the effectiveness of the method in application to experimental data. The results show that the derived thresholds for 120 operational stations and 3 experimental stations are effective in detecting gross errors, and the PPs can clearly and instantly illustrate the characteristics of a radiosonde variable and reveal the distribution of errors.
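The percentile-profile idea can be sketched directly: compute per-bin percentile thresholds over many profiles, then flag observations that fall outside them. The synthetic temperatures and the 0.5/99.5 percentile choice below are assumptions; the paper derives its thresholds from objective criteria on the PPs rather than fixed percentiles.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical radiosonde temperatures (K): 64 vertical bins x 500 profiles,
# warm near the surface and cooling with height, plus noise.
temps = 220 + 60 * np.linspace(1, 0, 64)[:, None] \
        + 3 * rng.standard_normal((64, 500))

# Percentile profiles: per-bin climatological limits for gross-error checks.
lo = np.percentile(temps, 0.5, axis=1)
hi = np.percentile(temps, 99.5, axis=1)

def gross_error_flags(profile, lo, hi):
    # True marks a suspect value falling outside its bin's thresholds.
    return (profile < lo) | (profile > hi)

obs = temps[:, 0].copy()
obs[10] = 400.0                      # inject an obvious gross error
flags = gross_error_flags(obs, lo, hi)
print(int(flags.sum()), bool(flags[10]))
```

Because every vertical bin gets its own limits, a value that is normal near the surface can still be flagged as impossible in the stratosphere, which is the point of profile-shaped thresholds.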
This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness, and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for the database of a geographical information system (GIS). Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and Cubic Product. The methodology can be used to determine how quality risks associated with diverse data sources affect the derived data. In the construction business, the GIS is the prime source of information on the location of cables, and detection time strongly depends on whether maps indicate the presence of cables. Poor data quality in the GIS can contribute to increased risk or higher risk-avoidance costs. A case study provides a numerical example of the calculation of the trade-offs between risk and detection costs and of the costs of data quality. We conclude that the model contributes valuable new insight.
Due to the influence of terrain structure, meteorological conditions, and various other factors, there are anomalous data in automatic dependent surveillance-broadcast (ADS-B) messages. ADS-B equipment can be used for the positioning of general aviation aircraft. To acquire accurate aircraft position information and detect anomalous data, an ADS-B anomaly detection model based on deep learning and the difference of Gaussians (DoG) approach is proposed. First, according to the characteristics of ADS-B data, the ADS-B position data are transformed into a coordinate system whose origin is set at the take-off point. Then, based on kinematic principles, clearly infeasible ADS-B points are removed, and the details of the ADS-B position data are extracted by the DoG approach. Finally, a long short-term memory (LSTM) neural network is used in place of a plain recurrent neural network (RNN), whose severe gradient reduction hampers the processing of ADS-B data. The position data are reconstructed by a sequence-to-sequence (seq2seq) model composed of LSTM units, and the reconstruction error is used to detect the anomalous data. Based on real flight data of general aviation aircraft, the simulation results show that anomalous data can be detected effectively by the proposed method of reconstructing ADS-B data with the seq2seq model, and its running time is reduced. Compared with the RNN, the accuracy of anomaly detection is increased by 2.7%. The performance of the proposed model is better than that of traditional anomaly detection models.
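The difference-of-Gaussians step, smoothing the track at two scales and subtracting to isolate fine detail, can be illustrated on a 1-D synthetic track; the kernel widths and the track itself are assumptions, and the real method applies this to ADS-B position coordinates.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Discrete normalized Gaussian kernel of length 2*radius + 1.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def difference_of_gaussians(series, sigma_fine=1.0, sigma_coarse=3.0, radius=9):
    # Smooth the track twice and subtract: the residual keeps the
    # fine-scale detail that a single heavy smoothing would discard.
    fine = np.convolve(series, gaussian_kernel(sigma_fine, radius), mode="same")
    coarse = np.convolve(series, gaussian_kernel(sigma_coarse, radius), mode="same")
    return fine - coarse

# Hypothetical 1-D slice of a track: slow drift plus a sharp oscillation.
t = np.linspace(0, 10, 200)
track = 0.5 * t + 0.3 * np.sin(12 * t)
detail = difference_of_gaussians(track)
# Away from the edges, the slow drift cancels and the oscillation survives.
print(round(float(np.abs(detail[50:150]).max()), 2))
```

Feeding the seq2seq model this detail signal, rather than the raw drifting track, makes small positional irregularities easier to reconstruct and score.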
This paper introduces the implementation and data analysis associated with a state-wide power quality monitoring and analysis system in China. Corporation specifications on power quality monitors as well as on communication protocols are formulated for data transmission. A big data platform and related technologies are utilized for data storage and computation. Compliance verification analysis and a power quality performance assessment are conducted, and a visualization tool for result presentation is finally presented.
Sea surface temperature (SST) data obtained from coastal stations in Jiangsu, China during 2010-2014 are quality controlled before analysis of their characteristic semidiurnal and seasonal cycles, including the correlation with the variation of the tide. Quality control of the data includes the validation of extreme values and the checking of hourly values based on temporally adjacent data points, with 0.15°C/h considered a suitable threshold for detecting abnormal values. The diurnal variation amplitude of the SST data is greater in spring and summer than in autumn and winter. The diurnal variation of SST has a bimodal structure on most days, i.e., SST has a significant semidiurnal cycle. Moreover, the semidiurnal cycle of SST is negatively correlated with the tidal data from March to August, but positively correlated with the tidal data from October to January. Little correlation is detected in the remaining months because of the weak coastal offshore SST gradients. The quality control and understanding of coastal SST data are particularly relevant with regard to the validation of indirect measurements such as satellite-derived data.
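The hourly-value check with the 0.15°C/h threshold reported above can be sketched directly; the sample SST series is invented, and a fuller implementation would also compare each point against its following neighbor before rejecting it.

```python
THRESH = 0.15  # deg C per hour, the threshold reported in the study

def flag_hourly_sst(sst):
    # Return indices of hourly SST values whose change from the
    # previous hour exceeds the rate-of-change threshold.
    return [i for i in range(1, len(sst))
            if abs(sst[i] - sst[i - 1]) > THRESH]

# Hypothetical hourly SST series (deg C) with one suspect jump at index 3.
series = [15.02, 15.05, 15.10, 15.60, 15.14, 15.16]
print(flag_hourly_sst(series))
```

A single bad value is flagged twice, once for the jump up and once for the jump back down, which is why such checks are usually paired with the extreme-value validation the abstract also mentions.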
Water is one of the basic resources for human survival. Water pollution monitoring and protection have become a major problem for many countries all over the world. Most traditional water quality monitoring systems, however, generally focus only on water quality data collection, ignoring data analysis and data mining. In addition, dirty data and data loss may occur due to power or transmission failures, further affecting data analysis and its application. To meet these needs, using Internet of Things, cloud computing, and big data technologies, we designed and implemented a water quality monitoring data intelligent service platform in C# and PHP. The platform includes modules for monitoring point addition, monitoring point map labeling, monitoring data uploading, monitoring data processing, early warning when monitoring indicators exceed standards, and other functions. Using this platform, we can realize the automatic collection of water quality monitoring data, data cleaning, data analysis, intelligent early warning and early warning information push, and other functions. For better security and convenience, we deployed the system in the Tencent Cloud and tested it. The testing results showed that the data analysis platform runs well and will provide decision support for water resource protection.
Funding (BID-Tucker): Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2022-0-00369) and by the National Research Foundation of Korea grant funded by the Korean government (2018R1A5A1060031, 2022R1F1A1065664).
Funding (RBFFALM): Supported by the National Science and Technology Innovation 2030 Next-Generation Artificial Intelligence Major Project (2018AAA0101801) and the National Natural Science Foundation of China (72271188).
Abstract: In contrast with research on new models, little attention has been paid to the impact of low- or high-quality data feeding a dialogue system. The present paper makes a first attempt to fill this gap by extending our previous work on question-answering (QA) systems, investigating the effect of misspellings on QA agents and how context changes can enhance the responses. Instead of using large language models trained on huge datasets, we propose a method that enhances the model's score by modifying only the quality and structure of the data fed to the model. It is important to identify the features that affect agent performance, because a high rate of wrong answers can make students lose interest in using the QA agent as an additional tool for distance learning. The results demonstrate that the accuracy of the proposed context simplification exceeds 85%. These findings shed light on the importance of question data quality and context complexity as key dimensions of a QA system. In conclusion, the experimental results on questions and contexts showed that controlling and improving the various aspects of data quality around the QA system can significantly enhance its robustness and performance.
Funding: This work is funded by the National Natural Science Foundation of China under Grant No. 61772180 and the Key R&D Plan of Hubei Province (2020BHB004, 2020BAB012).
Abstract: According to Cisco's Internet Report 2020 white paper, there will be 29.3 billion connected devices worldwide by 2023, up from 18.4 billion in 2018, and 5G connections will generate nearly three times more traffic than 4G connections. While bringing a boom to the network, this also presents unprecedented challenges for flow forwarding decisions. The path assignment mechanism used in traditional traffic scheduling methods tends to cause local network congestion through the concentration of elephant flows, resulting in unbalanced network load and degraded quality of service. Using the centralized control of software-defined networks, this study proposes a data center traffic scheduling strategy for congestion minimization and quality-of-service guarantees (MCQG). The ideal transmission path is selected for data flows while considering the network congestion rate and quality of service, and different scheduling strategies are used according to the characteristics of different service types in data centers. Elephant flows, which tend to cause local congestion, are rerouted: a path evaluation function is formed from the maximum link utilization on the path, the number of elephant flows, and the delay, and the fast optimum-seeking capability of the sparrow search algorithm is used to find the path with the lowest actual link overhead as the rerouting path, reducing the likelihood of local congestion. Equal-cost multi-path (ECMP) protocols with faster response times are used to schedule mouse flows of shorter duration, guaranteeing the network's quality of service and achieving isolated transmission of the various types of data streams. The experimental results show that the proposed strategy has higher throughput, better network load balancing, and better robustness than ECMP under different traffic models. In addition, because it can fully utilize the resources in the network, MCQG also outperforms another traffic scheduling strategy that reroutes elephant flows (namely Hedera). Compared with ECMP and Hedera, MCQG improves average throughput by 11.73% and 4.29%, normalized total throughput by 6.74% and 2.64%, and link utilization by 23.25% and 15.07%, respectively; in addition, its average round-trip delay and packet loss rate fluctuate significantly less than those of the two compared strategies.
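The path evaluation described above, combining bottleneck link utilization, elephant flow count, and delay into a single overhead score, can be sketched as follows. The weights, function names, and the exhaustive scan standing in for the sparrow search algorithm are illustrative assumptions, not the MCQG implementation.

```python
def path_cost(link_utils, elephant_counts, link_delays,
              w_util=0.5, w_eleph=0.3, w_delay=0.2):
    """Hypothetical MCQG-style path evaluation: combine the bottleneck
    (maximum) link utilization, the number of elephant flows already on
    the path, and the end-to-end delay into one overhead score.
    The weights are illustrative, not taken from the paper."""
    return (w_util * max(link_utils)          # bottleneck utilization in [0, 1]
            + w_eleph * sum(elephant_counts)  # elephant flows sharing the path
            + w_delay * sum(link_delays))     # summed per-link delay (ms)

def pick_reroute_path(candidates):
    """Choose the candidate with the lowest evaluated overhead (the paper
    uses a sparrow search algorithm; an exhaustive scan stands in here)."""
    return min(range(len(candidates)),
               key=lambda i: path_cost(*candidates[i]))

# Two candidate paths: (link utilizations, elephant counts, link delays).
paths = [([0.9, 0.3], [2, 1], [1, 1]),
         ([0.4, 0.5], [0, 1], [2, 2])]
best = pick_reroute_path(paths)
```

On a real topology the candidate set would be the k-shortest paths between the flow's endpoints, and the metaheuristic would search that set rather than scanning it.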
Abstract: Objectives: Big data has revolutionized nursing and health care and raised concerns. This research aims to help nurses understand big data sets in order to provide better patient care. Methods: Although big data in nursing has sparked a global revolution and raised concerns, few studies have focused on helping nurses understand big data to provide the best patient care. This systematic review was conducted based on PRISMA guidelines; PubMed, MEDLINE, CINAHL, Google Scholar, and ResearchGate were searched for studies from 2010-2020. Results: The most common uses of big data in nursing were investigated in eight papers published between 2015 and 2018. All of the research showed improvements in patient outcomes and healthcare delivery when big data was used in medical-surgical, emergency department, critical care unit, community, systems biology, and leadership applications. Big data is not yet taught to nurses. Conclusions: Big data applications in nursing and health care improve early intervention and decision-making. Big data provides a comprehensive view of a patient's status and social determinants of health, allowing treatment using all metaparadigms and avoiding a singular focus. Big data can help prepare nurses and improve patient outcomes by improving quality, safety, and outcomes.
Funding: Supported by the National Natural Science Foundation of China (No. 51775185), the Natural Science Foundation of Hunan Province (No. 2022JJ90013), the Intelligent Environmental Monitoring Technology Hunan Provincial Joint Training Base for Graduate Students in the Integration of Industry and Education, Hunan Normal University University-Industry Cooperation, and the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province, Open Project (Grant Number 20181901CRP04).
Abstract: At present, water pollution has become an important factor affecting and restricting national and regional economic development. Total phosphorus is one of the main sources of water pollution and eutrophication, so predicting total phosphorus in water quality has real research significance. This paper selects total phosphorus and turbidity data for analysis by crawling the data of a water quality monitoring platform. By constructing an attribute-object mapping relationship, the correlation between the two indicators was analyzed and used to predict future data. First, the monthly and daily mean concentrations of total phosphorus and turbidity were calculated after cleaning outliers, and the correlation between them was analyzed. Second, the correlation coefficients at different times and frequencies were used to predict values for the next five days, and the data trend was visualized with Python. Finally, the real values were compared with the predicted values, and the results showed that the correlation between total phosphorus and turbidity is useful for predicting water quality.
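A minimal sketch of the correlation-based prediction idea: estimate the Pearson correlation between turbidity and total phosphorus, then use it in a simple linear fit to predict total phosphorus from a new turbidity reading. The function names and toy data are hypothetical; the paper's actual attribute-object mapping is not reproduced here.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def predict_tp(turbidity_hist, tp_hist, turbidity_new):
    """Predict total phosphorus from a new turbidity value via the
    correlation-scaled linear fit y = my + r * (sy / sx) * (x - mx)."""
    n = len(tp_hist)
    r = pearson(turbidity_hist, tp_hist)
    mx, my = sum(turbidity_hist) / n, sum(tp_hist) / n
    sx = (sum((x - mx) ** 2 for x in turbidity_hist) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in tp_hist) / n) ** 0.5
    return my + r * (sy / sx) * (turbidity_new - mx)

# Toy daily means: perfectly proportional turbidity and total phosphorus.
turb = [1.0, 2.0, 3.0, 4.0]
tp = [0.1, 0.2, 0.3, 0.4]
```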
Abstract: Human living would be impossible without air quality. Consistent advancements in practically every aspect of contemporary human life have harmed air quality, and everyday industrial, transportation, and home activities release dangerous contaminants into our surroundings. This study investigated two years' worth of air quality and outlier detection data from two Indian cities. Studies on air pollution have used numerous methodologies, with the various gases treated as a vector whose components are the gas concentration values of each observation. In our technique, we use curves to represent the monthly average of daily gas emissions. The approach, which is based on functional depth, was used to find outliers in the gas emissions of Delhi and Kolkata, and the outcomes were compared to those of the traditional method. In the evaluation and comparison of these models' performances, the functional approach performed well.
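As a rough illustration of depth-based outlier detection on emission curves, the sketch below computes a simplified modified band depth (averaging over pairs that exclude the curve itself) and flags the lowest-depth monthly curves. This is an assumption-laden toy, not the study's exact functional-depth procedure.

```python
from itertools import combinations

def modified_band_depth(curves):
    """Simplified modified band depth: for each curve, the average
    fraction of time points at which it lies inside the pointwise band
    of every pair of *other* curves."""
    n, T = len(curves), len(curves[0])
    depths = []
    for i in range(n):
        pairs = [(j, k) for j, k in combinations(range(n), 2)
                 if i not in (j, k)]
        total = 0.0
        for j, k in pairs:
            inside = sum(
                1 for t in range(T)
                if min(curves[j][t], curves[k][t]) <= curves[i][t]
                <= max(curves[j][t], curves[k][t]))
            total += inside / T
        depths.append(total / len(pairs))
    return depths

def flag_outlier_curves(curves, frac=0.25):
    """Flag the given fraction of monthly curves with the lowest depth."""
    depths = modified_band_depth(curves)
    k = max(1, int(len(curves) * frac))
    return sorted(range(len(curves)), key=lambda i: depths[i])[:k]

# Four toy monthly emission curves; the last one sits far outside the rest.
curves = [[1.0, 3.0], [3.0, 1.0], [2.0, 2.0], [10.0, 10.0]]
```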
Abstract: Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different analyses using a machine learning approach to determine the model with the best performance. Employing the confusion matrix and error percentages, we selected the best-performing model, with prediction error rates of 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of such techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
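To make the model comparison concrete, here is a from-scratch k-nearest-neighbours sketch classifying toy pollutant readings into AQI categories, together with the kind of error-rate helper used to rank the four models. The features, labels, and value of k are illustrative assumptions, not the study's data.

```python
def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbours majority vote with squared Euclidean
    distance."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, x)), y)
                   for row, y in zip(train_X, train_y))
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def error_rate(y_true, y_pred):
    """Fraction of misclassified samples, as used to compare models."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy training data: (PM2.5, ozone) readings labelled with AQI categories.
train_X = [(10, 20), (12, 18), (90, 80), (95, 85)]
train_y = ["Good", "Good", "Unhealthy", "Unhealthy"]
preds = [knn_predict(train_X, train_y, x) for x in [(11, 19), (92, 83)]]
```

In practice libraries such as scikit-learn provide all four classifiers and the confusion matrix, so the comparison reduces to fitting each model and tabulating the error rates.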
Abstract: In the era of big data, the construction and implementation of a quality control audit system are particularly crucial. This article delves into the impact of big data technology on quality control auditing, establishes a quality control auditing system for the big data era, and elucidates the pathway to realizing this system. Applying big data technology to quality control audits enhances audit efficiency, enables more accurate risk assessment, and provides robust support for the sustainable development of enterprises.
Funding: Supported by the National Key Research and Development Program of China (Nos. 2016YFC1402000, 2018YFC1407003, 2017YFC1405300).
Abstract: Offshore waters provide resources for human beings while, on the other hand, threatening them through marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly mean wave height, temperature, and pressure real-time observation data taken at the Xiaomaidao station (in Qingdao, China) from June 1, 2017, to May 31, 2018, to explore the data quality using eight quality control methods and to identify the most effective method for the Xiaomaidao station. After applying the eight quality control methods, the percentages of the mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. Combined with the marine disaster (wave alarm report) data, the values that failed the tests were found to be mainly due to aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test method is not effective for it; a correlation test with other related parameters would be more useful for the mean wave height.
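Two of the generic checks commonly found in such quality control suites, a range test and a continuity (spike) test, can be sketched as below, along with the pass-rate computation of the kind reported above. The thresholds and toy wave-height series are assumptions, not the station's actual configuration.

```python
def range_test(values, lo, hi):
    """Pass if the value lies within physically plausible bounds."""
    return [lo <= v <= hi for v in values]

def continuity_test(values, max_jump):
    """Pass if the value does not jump too far from the previous sample."""
    flags = [True]  # first sample has no predecessor
    for prev, cur in zip(values, values[1:]):
        flags.append(abs(cur - prev) <= max_jump)
    return flags

def pass_rate(values, lo, hi, max_jump):
    """Percentage of samples passing both checks."""
    ok = [r and c for r, c in zip(range_test(values, lo, hi),
                                  continuity_test(values, max_jump))]
    return 100.0 * sum(ok) / len(ok)

# Toy hourly mean wave heights (m): the 5.0 m spike fails both checks,
# and the return to 0.6 m fails the continuity check.
rate = pass_rate([0.5, 0.6, 0.55, 5.0, 0.6], lo=0.0, hi=3.0, max_jump=1.0)
```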
Abstract: Multisensor data fusion (MDF) is an emerging technology that fuses data from multiple sensors in order to make a more accurate estimation of the environment through measurement and detection. Applications of MDF span a wide spectrum of military and civilian areas. With the rapid evolution of computers and the proliferation of micro-mechanical/electrical systems sensors, the use of MDF is becoming popular in research and applications. This paper focuses on the application of MDF for high-quality data analysis and processing in measurement and instrumentation. A practical, general data fusion scheme was established on the basis of feature extraction and the merging of data from multiple sensors, integrating artificial neural networks for high-performance pattern recognition. A number of successful applications in non-destructive inspection (NDI) corrosion detection, food quality and safety characterization, and precision agriculture are described and discussed in order to motivate new applications in these and other areas. The paper gives an overall picture of using MDF to increase the accuracy of data analysis and processing in measurement and instrumentation across different areas of application.
Funding: This work is supported by the NSFC (Nos. 61772280, 61772454), the Changzhou Sci&Tech Program (No. CJ20179027), and the PAPD fund from NUIST. Prof. Jin Wang is the corresponding author.
Abstract: Since the British National Archives put forward the concept of digital continuity in 2007, several developed countries have worked out digital continuity action plans; however, technologies for guaranteeing digital continuity are still lacking. This paper first analyzes the requirements of digital continuity guarantees for electronic records based on data quality theory and points out the necessity of data quality guarantees for electronic records. We then recast the digital continuity guarantee of electronic records as ensuring their consistency, completeness, and timeliness, and construct the first technology framework for guaranteeing the digital continuity of electronic records. Finally, temporal functional dependency technology is used to build the first integration method to ensure the consistency, completeness, and timeliness of electronic records.
Funding: Natural Science Foundation of Guangdong Province (2016A030313140), Project 973 (2015CB452802), the Natural Science Foundation of China (41405104), and the Science and Technology Program of Guangzhou City (201604020012).
Abstract: We first analyzed GPS precipitable water vapor (GPS/PWV) data available from a ground-based GPS observation network in Guangdong from 1 August 2009 to 27 August 2012 and then developed a quality control method applied before GPS/PWV data are assimilated into the GRAPES 3DVAR system. This method rejects outliers effectively. After establishing the quality control criterion, we ran three numerical experiments to investigate the impact on precipitation forecasts of assimilating the GPS/PWV data with and without quality control. In the numerical experiments, two precipitation cases (on 6-7 May 2010 and 27-28 April 2012) that occurred in the annual first rainy season of Guangdong were selected. The results indicated that after quality control, only the GPS/PWV data that deviate little from the NCEP/PWV data are assimilated into the system, which reasonably adjusts the initial water vapor above Guangdong and eventually improves the intensity and location of the 24-h precipitation forecast significantly.
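The core of the quality control step, rejecting GPS/PWV samples that deviate too far from the NCEP/PWV reference, can be sketched as follows; the 5 mm threshold and function name are illustrative assumptions, not the paper's calibrated criterion.

```python
def qc_gps_pwv(gps_pwv, ncep_pwv, max_dev=5.0):
    """Keep only the GPS/PWV samples whose absolute deviation from the
    NCEP/PWV reference is within max_dev (mm; threshold illustrative)."""
    return [g for g, r in zip(gps_pwv, ncep_pwv) if abs(g - r) <= max_dev]

# Toy series: the third GPS sample deviates far from the reference.
kept = qc_gps_pwv([30.0, 32.0, 60.0, 28.0], [31.0, 30.0, 33.0, 29.0])
```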
基金supported by the National Innovation Project for Meteorological Science and Technology grant number CMAGGTD003-5the National Key R&D Program of China grant number2017YFC1501801。
Abstract: This study proposes a method to derive the climatological limit thresholds that can be used in operational and historical quality control procedures for Chinese high-vertical-resolution (5-10 m) radiosonde temperature and wind speed data. The whole atmosphere is divided into 64 vertical bins, and profiles are constructed from the percentiles of the values in each vertical bin. Based on these percentile profiles (PPs), objective criteria are developed to obtain the thresholds. Tibetan Plateau field data are used to validate the effectiveness of the method on experimental data. The results show that the derived thresholds for 120 operational stations and 3 experimental stations are effective in detecting gross errors, and the PPs can clearly and instantly illustrate the characteristics of a radiosonde variable and reveal the distribution of errors.
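The percentile-profile idea can be sketched as follows: derive (low, high) limits for each vertical bin from the percentiles of historical observations, then flag sounding levels outside their bin's limits. The percentile levels, nearest-rank rule, and toy data are assumptions, not the study's exact criteria.

```python
def nearest_rank(sorted_vals, q):
    """Nearest-rank percentile of a pre-sorted list (simplified)."""
    idx = min(len(sorted_vals) - 1, int(q / 100.0 * len(sorted_vals)))
    return sorted_vals[idx]

def bin_thresholds(obs_by_bin, q_lo=0.5, q_hi=99.5):
    """Per vertical bin, (low, high) limits from historical percentiles."""
    out = []
    for obs in obs_by_bin:
        s = sorted(obs)
        out.append((nearest_rank(s, q_lo), nearest_rank(s, q_hi)))
    return out

def gross_errors(profile, thresholds):
    """Indices of levels where a sounding exceeds its bin's limits."""
    return [i for i, (v, (lo, hi)) in enumerate(zip(profile, thresholds))
            if not lo <= v <= hi]

# Two toy vertical bins with 100 historical values each; the second
# sounding level (500) lies far outside its bin's climatology.
hist = [list(range(100)), list(range(100, 200))]
bad_levels = gross_errors([50, 500], bin_thresholds(hist))
```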
Funding: The National Natural Science Foundation of China (Nos. 70772021, 70372004) and the China Postdoctoral Science Foundation (No. 20060400077).
Abstract: This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness, and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for the database of a geographical information system (GIS). Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and Cubic Product. They can be used to determine how quality risks associated with diverse data sources affect the derived data. The GIS is the prime source of information on the location of cables, and in the construction business detection time strongly depends on whether maps indicate the presence of cables. Poor data quality in the GIS can contribute to increased risk or higher risk-avoidance costs. A case study provides a numerical example of the calculation of the trade-offs between risk and detection costs and of the costs of data quality. We conclude that the model contributes valuable new insight.
Funding: Supported by the National Key R&D Program of China (No. 2018AAA0100804), the Talent Project of Revitalization Liaoning (No. XLYC1907022), the Key R&D Projects of Liaoning Province (No. 2020JH2/10100045), the Capacity Building of Civil Aviation Safety (No. TMSA1614), the Natural Science Foundation of Liaoning Province (No. 2019-MS-251), the Scientific Research Project of Liaoning Provincial Department of Education (Nos. L201705, L201716), the High-Level Innovation Talent Project of Shenyang (No. RC190030), and the Second Young and Middle-Aged Talents Support Program of Shenyang Aerospace University.
Abstract: Due to the influence of terrain structure, meteorological conditions, and various other factors, there are anomalous data in automatic dependent surveillance-broadcast (ADS-B) messages. ADS-B equipment can be used for the positioning of general aviation aircraft. To acquire accurate aircraft position information and detect anomalous data, an ADS-B anomaly detection model based on deep learning and the difference of Gaussians (DoG) approach is proposed. First, according to the characteristics of ADS-B data, the ADS-B position data are transformed into a coordinate system whose origin is set at the take-off point. Then, based on kinematic principles, clearly anomalous ADS-B data are removed, and the details of the ADS-B position data are extracted by the DoG approach. Finally, a long short-term memory (LSTM) neural network is used in place of a recurrent neural network (RNN), which suffers from severe gradient vanishing, for processing ADS-B data. The position data are reconstructed by a sequence-to-sequence (seq2seq) model composed of LSTM networks, and the reconstruction error is used to detect the anomalous data. Based on real flight data of general aviation aircraft, the simulation results show that anomalous data can be detected effectively by the proposed method of reconstructing ADS-B data with the seq2seq model, and its running time is reduced. Compared with the RNN, the accuracy of anomaly detection is increased by 2.7%. The performance of the proposed model is better than that of traditional anomaly detection models.
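The difference-of-Gaussians step, which extracts the band-pass "detail" of a position track by subtracting a coarse Gaussian smoothing from a fine one, can be sketched as below. The kernel widths and edge handling are illustrative assumptions; the LSTM seq2seq part is not reproduced here.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three sigma."""
    radius = int(3 * sigma)
    ks = [math.exp(-0.5 * (i / sigma) ** 2)
          for i in range(-radius, radius + 1)]
    s = sum(ks)
    return [k / s for k in ks]

def smooth(series, sigma):
    """Convolve with a Gaussian kernel, clamping indices at the edges."""
    kern = gaussian_kernel(sigma)
    r = len(kern) // 2
    out = []
    for i in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kern):
            idx = min(max(i + j - r, 0), len(series) - 1)
            acc += w * series[idx]
        out.append(acc)
    return out

def dog_detail(series, sigma_fine=1.0, sigma_coarse=2.0):
    """Difference of Gaussians: fine minus coarse smoothing keeps the
    band-pass detail of the position track."""
    return [f - c for f, c in zip(smooth(series, sigma_fine),
                                  smooth(series, sigma_coarse))]

# A flat track yields ~zero detail; a positional glitch stands out.
flat = dog_detail([5.0] * 20)
spike = dog_detail([0.0] * 10 + [10.0] + [0.0] * 10)
```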
Funding: Supported by the State Grid Science and Technology Project (GEIRI-DL-71-17-002).
Abstract: This paper introduces the implementation of, and data analysis associated with, a state-wide power quality monitoring and analysis system in China. Corporation specifications on power quality monitors as well as on communication protocols are formulated for data transmission. A big data platform and related technologies are utilized for data storage and computation. Compliance verification analysis and a power quality performance assessment are conducted, and a visualization tool for result presentation is finally described.
Funding: The Open Fund of the State Key Laboratory of Satellite Ocean Environment Dynamics under contract No. SOED1402 and the Youth Science and Technology Foundation of East China Sea Branch, SOA, under contract No. 201624.
Abstract: Sea surface temperature (SST) data obtained from coastal stations in Jiangsu, China during 2010-2014 are quality controlled before analysis of their characteristic semidiurnal and seasonal cycles, including their correlation with tidal variation. Quality control of the data includes validation of extreme values and checking of hourly values against temporally adjacent data points, with 0.15°C/h considered a suitable threshold for detecting abnormal values. The diurnal variation amplitude of the SST data is greater in spring and summer than in autumn and winter. The diurnal variation of SST has a bimodal structure on most days, i.e., SST has a significant semidiurnal cycle. Moreover, the semidiurnal cycle of SST is negatively correlated with the tidal data from March to August, but positively correlated with the tidal data from October to January. Little correlation is detected in the remaining months because of the weak coastal offshore SST gradients. The quality control and understanding of coastal SST data are particularly relevant to the validation of indirect measurements such as satellite-derived data.
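The 0.15°C/h adjacency check described above can be sketched directly; the function name and example series are illustrative.

```python
def sst_continuity_check(hourly_sst, max_rate=0.15):
    """Flag hourly SST values whose change from the previous hour
    exceeds the 0.15 deg C/h threshold the study found suitable."""
    flags = [False]  # first value has no predecessor to compare against
    for prev, cur in zip(hourly_sst, hourly_sst[1:]):
        flags.append(abs(cur - prev) > max_rate)
    return flags

# Toy hourly series: the jump of 0.95 deg C in one hour is flagged.
flags = sst_continuity_check([20.0, 20.1, 20.05, 21.0, 21.05])
```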
Funding: The National Natural Science Foundation of China (No. 61304208), the Scientific Research Fund of Hunan Province Education Department (18C0003), a research project on teaching reform in colleges and universities of the Hunan Province Education Department (20190147), the Changsha City Science and Technology Plan Program (K1501013-11), and Hunan Normal University University-Industry Cooperation. This work was carried out at the 2011 Collaborative Innovation Center for Development and Utilization of Finance and Economics Big Data Property, Universities of Hunan Province (open project grant number 20181901CRP04).
Abstract: Water is one of the basic resources for human survival. Water pollution monitoring and protection have become a major problem for many countries all over the world. Most traditional water quality monitoring systems, however, generally focus only on water quality data collection, ignoring data analysis and data mining. In addition, dirty data and data loss may occur due to power or transmission failures, further affecting data analysis and its application. To meet these needs, using Internet of Things, cloud computing, and big data technologies, we designed and implemented a water quality monitoring data intelligent service platform in C# and PHP. The platform includes modules for monitoring point addition, monitoring point map labeling, monitoring data uploading, monitoring data processing, and early warning when monitoring indicators exceed their standards, among other functions. Using this platform, we can realize the automatic collection of water quality monitoring data, data cleaning, data analysis, intelligent early warning, early warning information push, and other functions. For better security and convenience, we deployed the system in Tencent Cloud and tested it. The testing results showed that the data analysis platform runs well and will provide decision support for water resource protection.
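The early-warning module's threshold logic might look like the sketch below (in Python for consistency with the other sketches, though the platform itself is written in C# and PHP); the indicator names and limit values are invented for illustration and are not the platform's actual standards.

```python
# Illustrative surface-water limits; NOT the platform's actual standards.
STANDARDS = {
    "total_phosphorus": 0.2,  # mg/L
    "turbidity": 50.0,        # NTU
}

def early_warnings(sample):
    """Return a warning message for each indicator exceeding its limit."""
    msgs = []
    for key, limit in STANDARDS.items():
        value = sample.get(key)
        if value is not None and value > limit:
            msgs.append(f"{key} = {value} exceeds limit {limit}")
    return msgs

# One reading over its limit, one under, one indicator missing entirely.
alerts = early_warnings({"total_phosphorus": 0.5, "turbidity": 10.0})
```

In the deployed system such messages would be pushed to stakeholders rather than returned as a list.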