How can we efficiently store and mine dynamically generated dense tensors for modeling the behavior of multidimensional dynamic data? Much of the multidimensional dynamic data in the real world is generated in the form of time-growing tensors. For example, air quality tensor data consists of multiple sensory values gathered from wide locations over a long time. Such data, accumulated over time, is redundant and consumes a lot of memory in its raw form. We need a way to efficiently store dynamically generated tensor data that grows over time and to model its behavior on demand between arbitrary time blocks. To this end, we propose a Block Incremental Dense Tucker Decomposition (BID-Tucker) method for efficient storage and on-demand modeling of multidimensional spatiotemporal data. Assuming that tensors come in unit blocks where only the time domain changes, our proposed BID-Tucker first slices the blocks into matrices and decomposes them via singular value decomposition (SVD). The SVDs of the time×space sliced matrices are stored instead of the raw tensor blocks to save space. When modeling is required for particular time blocks, the SVDs of the corresponding time blocks are retrieved and incremented for use in Tucker decomposition. The factor matrices and core tensor of the decomposition can then be used for further data analysis. We compared our proposed BID-Tucker with D-Tucker, which our method extends, and with vanilla Tucker decomposition. We show that BID-Tucker is faster than both D-Tucker and vanilla Tucker decomposition and uses less memory for storage, with comparable reconstruction error. We applied BID-Tucker to model the spatial and temporal trends of air quality data collected in South Korea from 2018 to 2022, and were also able to verify unusual events, such as chronic ozone alerts and large fire events.
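A minimal sketch of the storage step, assuming dense NumPy blocks; the block shape, target rank, and function name are illustrative rather than the authors' implementation:

```python
import numpy as np

def store_block(block, rank):
    """Compress one (time, location, feature) tensor block: slice it along
    time into a time x (location*feature) matrix and keep its truncated SVD."""
    t = block.shape[0]
    sliced = block.reshape(t, -1)                      # time x space matrix
    U, s, Vt = np.linalg.svd(sliced, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank, :]         # much smaller than the raw block

# A 24-hour block from 100 stations with 6 pollutants, kept at rank 10.
block = np.random.rand(24, 100, 6)
U, s, Vt = store_block(block, rank=10)
approx = (U * s) @ Vt                                  # on-demand reconstruction
print(np.linalg.norm(block.reshape(24, -1) - approx)) # reconstruction error
```

Storing U, s, and Vt at rank r costs r(t + loc·feat + 1) values per block instead of t·loc·feat, which is where the memory saving comes from.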
As big data becomes an apparent challenge to handle when building a business intelligence (BI) system, there is a motivation to address this issue in higher education institutions (HEIs). Monitoring quality in HEIs encompasses handling huge amounts of data coming from different sources. This paper reviews big data and analyses cases from the literature regarding quality assurance (QA) in HEIs. It also outlines a framework that can address the big data challenge in HEIs by handling QA monitoring with BI dashboards, and a prototype dashboard is presented. The dashboard was developed using a utilisation tool to monitor QA in HEIs and provide visual representations of big data. The prototype dashboard enables stakeholders to monitor compliance with QA standards while addressing the big data challenge associated with the substantial volume of data managed by HEIs' QA systems. The paper also outlines how the developed system integrates big data from social media into the monitoring dashboard.
Offshore waters provide resources for human beings while, on the other hand, threatening them through marine disasters. Ocean stations are part of offshore observation networks, and the quality of their data is of great significance for exploiting and protecting the ocean. We used hourly mean wave height, temperature, and pressure real-time observation data taken at the Xiaomaidao station (Qingdao, China) from June 1, 2017 to May 31, 2018 to explore the data quality using eight quality control methods and to identify the most effective method for the Xiaomaidao station. After applying the eight quality control methods, the percentages of the mean wave height, temperature, and pressure data that passed the tests were 89.6%, 88.3%, and 98.6%, respectively. Comparison with marine disaster (wave alarm report) data showed that values failed the tests mainly because of aging observation equipment and missing data transmissions. The mean wave height is often affected by dynamic marine disasters, so the continuity test method is not effective for it; a correlation test with other related parameters is more useful for the mean wave height.
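Two of the generic checks such procedures rely on, a range test and a correlation test against a physically related parameter, can be sketched as follows; the thresholds and the synthetic wind-wave relationship are illustrative, not the station's operational settings:

```python
import numpy as np

def range_test(values, lo, hi):
    """Flag samples outside physically plausible bounds."""
    return (values < lo) | (values > hi)

def correlation_test(values, related, window=24, min_r=0.3):
    """Flag windows where a parameter loses its expected correlation with a
    physically related one (e.g. wave height vs. wind speed)."""
    flags = np.zeros(len(values), dtype=bool)
    for i in range(len(values) - window + 1):
        if np.corrcoef(values[i:i + window], related[i:i + window])[0, 1] < min_r:
            flags[i:i + window] = True
    return flags

rng = np.random.default_rng(2)
wind = rng.uniform(2, 15, 96)                 # hourly wind speed (m/s)
wave = 0.2 * wind + rng.normal(0, 0.2, 96)    # hourly mean wave height (m)
wave[40:64] = rng.uniform(0.5, 3.0, 24)       # faulty stretch: decorrelated values
print(range_test(wave, 0.0, 20.0).sum())      # gross-bound violations
print(correlation_test(wave, wind).sum())     # hours inside low-correlation windows
```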
Multisensor data fusion (MDF) is an emerging technology that fuses data from multiple sensors to make a more accurate estimation of the environment through measurement and detection. Applications of MDF span a wide spectrum of military and civilian areas. With the rapid evolution of computers and the proliferation of micro-mechanical/electrical systems sensors, the use of MDF is becoming popular in research and applications. This paper focuses on the application of MDF for high-quality data analysis and processing in measurement and instrumentation. A practical, general data fusion scheme was established on the basis of feature extraction and merging of data from multiple sensors. This scheme integrates artificial neural networks for high-performance pattern recognition. A number of successful applications in NDI (Non-Destructive Inspection) corrosion detection, food quality and safety characterization, and precision agriculture are described and discussed to motivate new applications in these or other areas. The paper gives an overall picture of using MDF to increase the accuracy of data analysis and processing in measurement and instrumentation across different application areas.
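A minimal sketch of the feature-extract-and-merge scheme, assuming scikit-learn is available; the two synthetic sensors, toy features, and network size are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_features(signal):
    """Toy per-sensor feature extraction: mean, std, and peak value."""
    return [signal.mean(), signal.std(), signal.max()]

rng = np.random.default_rng(0)
X, y = [], []
for label in (0, 1):                            # e.g. corroded vs. sound specimens
    for _ in range(50):
        s1 = rng.normal(label, 1.0, 256)        # sensor 1 reading
        s2 = rng.normal(label * 0.5, 1.0, 256)  # sensor 2 reading
        X.append(extract_features(s1) + extract_features(s2))  # merged feature vector
        y.append(label)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(np.array(X), np.array(y))
print(clf.score(np.array(X), np.array(y)))      # training accuracy of the fused model
```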
We first analyzed GPS precipitable water vapor (GPS/PWV) data available from a ground-based GPS observation network in Guangdong from 1 August 2009 to 27 August 2012 and then developed a quality control method to apply before GPS/PWV data is assimilated into the GRAPES 3DVAR system. This method can reject outliers effectively. After establishing the quality control criterion, we conducted three numerical experiments to investigate the impact on precipitation forecasts of assimilating the GPS/PWV data with and without quality control. In the numerical experiments, two precipitation cases (6-7 May 2010 and 27-28 April 2012) that occurred in the annually first raining season of Guangdong were selected. The results indicated that after quality control, only GPS/PWV data that deviates little from the NCEP/PWV data is assimilated into the system; this yields a reasonable adjustment of the initial water vapor above Guangdong and eventually improves the intensity and location of the 24-h precipitation forecast significantly.
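The deviation test amounts to a simple filter; a sketch assuming the rejection threshold is a multiple of the standard deviation of the GPS minus NCEP differences (the threshold and the data are illustrative):

```python
import numpy as np

def qc_pwv(gps_pwv, ref_pwv, k=1.5):
    """Keep only GPS/PWV samples whose deviation from the reference
    (e.g. NCEP-derived PWV) stays within k standard deviations."""
    diff = gps_pwv - ref_pwv
    return np.abs(diff - diff.mean()) <= k * diff.std()

gps = np.array([52.1, 48.7, 61.3, 95.0, 50.2])   # mm
ncep = np.array([51.0, 49.5, 60.0, 52.0, 50.0])  # mm
print(qc_pwv(gps, ncep))   # the 95.0 outlier is rejected before assimilation
```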
Since the British National Archive put forward the concept of digital continuity in 2007, several developed countries have worked out digital continuity action plans. However, technologies to guarantee digital continuity are still lacking. This paper first analyzes the requirements of a digital continuity guarantee for electronic records based on data quality theory, and points out the necessity of a data quality guarantee for electronic records. We then recast the digital continuity guarantee of electronic records as ensuring their consistency, completeness, and timeliness, and construct the first technology framework for the digital continuity guarantee of electronic records. Finally, temporal functional dependency technology is used to build the first integration method to ensure the consistency, completeness, and timeliness of electronic records.
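As one concrete reading of the last step, a temporal functional dependency requires records with the same key to agree on a dependent attribute within a validity window; a toy consistency check under that assumption (the record layout and window are hypothetical):

```python
from collections import defaultdict

def check_temporal_fd(records, window):
    """Flag pairs of records with the same key but different values
    recorded within `window` time units of each other (an FD violation)."""
    by_key = defaultdict(list)
    for key, value, t in records:
        by_key[key].append((t, value))
    violations = []
    for key, entries in by_key.items():
        entries.sort()
        for (t1, v1), (t2, v2) in zip(entries, entries[1:]):
            if t2 - t1 <= window and v1 != v2:
                violations.append((key, t1, t2))
    return violations

records = [("doc-17", "draft", 0), ("doc-17", "final", 1), ("doc-17", "final", 5)]
print(check_temporal_fd(records, window=2))   # [('doc-17', 0, 1)]
```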
This study proposes a method to derive climatological limit thresholds that can be used in an operational/historical quality control procedure for Chinese high-vertical-resolution (5-10 m) radiosonde temperature and wind speed data. The whole atmosphere is divided into 64 vertical bins, and profiles are constructed from the percentiles of the values in each vertical bin. Based on the percentile profiles (PPs), objective criteria are developed to obtain the thresholds. Tibetan Plateau field data are used to validate the effectiveness of the method on experimental data. The results show that the derived thresholds for 120 operational stations and 3 experimental stations are effective in detecting gross errors, and the PPs can clearly and directly illustrate the characteristics of a radiosonde variable and reveal the distribution of errors.
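A minimal sketch of the percentile-profile idea: build per-bin limits from percentiles of historical values. The 64 bins follow the paper; the percentile levels and the synthetic temperature profile are illustrative:

```python
import numpy as np

def percentile_thresholds(heights, values, n_bins=64, p_lo=0.5, p_hi=99.5):
    """Per-bin climatological limits from percentiles of historical data."""
    edges = np.linspace(heights.min(), heights.max(), n_bins + 1)
    bins = np.clip(np.digitize(heights, edges) - 1, 0, n_bins - 1)
    lo = np.array([np.percentile(values[bins == b], p_lo) for b in range(n_bins)])
    hi = np.array([np.percentile(values[bins == b], p_hi) for b in range(n_bins)])
    return edges, lo, hi

rng = np.random.default_rng(1)
h = rng.uniform(0, 30000, 100_000)                 # sounding heights (m)
t = 288 - 0.0065 * h + rng.normal(0, 2, h.size)    # synthetic temperatures (K)
edges, lo, hi = percentile_thresholds(h, t)
print(lo[0], hi[0])   # limits for the lowest bin; values outside are gross errors
```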
Water is one of the basic resources for human survival. Water pollution monitoring and protection have become a major problem for many countries all over the world. Most traditional water quality monitoring systems, however, generally focus only on water quality data collection, ignoring data analysis and data mining. In addition, dirty data and data loss may occur due to power or transmission failures, further affecting data analysis and its application. To meet these needs, using Internet of Things, cloud computing, and big data technologies, we designed and implemented a water quality monitoring data intelligent service platform in C# and PHP. The platform includes modules for adding monitoring points, labeling monitoring points on a map, uploading monitoring data, processing monitoring data, and early warning when monitoring indicators exceed standards. Using this platform, we can automatically collect water quality monitoring data and perform data cleaning, data analysis, intelligent early warning, and early-warning information push. For better security and convenience, we deployed the system on Tencent Cloud and tested it. The testing results showed that the data analysis platform runs well and will provide decision support for water resource protection.
This paper presents a methodology to determine three data quality (DQ) risk characteristics: accuracy, comprehensiveness, and nonmembership. The methodology provides a set of quantitative models to confirm the information quality risks for a geographical information system (GIS) database. Four quantitative measures are introduced to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Selection, Projection, and Cubic Product. They can be used to determine how quality risks associated with diverse data sources affect derived data. The GIS is the prime source of information on the location of cables, and in the construction business detection time strongly depends on whether maps indicate the presence of cables. Poor data quality in the GIS can contribute to increased risk or higher risk-avoidance costs. A case study provides a numerical example of the calculation of the trade-offs between risk and detection costs, and of the costs of data quality. We conclude that the model contributes valuable new insight.
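The flavor of such propagation models can be seen in a toy calculation: under an independence assumption, a tuple produced by a product of two sources is correct only if both inputs are, so accuracies multiply, while a selection inherits the accuracy of its input (the numbers and the independence assumption are illustrative, not the paper's models):

```python
def product_accuracy(acc_a, acc_b):
    """Accuracy of a tuple built from two sources, assuming independent errors."""
    return acc_a * acc_b

cable_map_accuracy = 0.95   # probability a cable record is correct (illustrative)
parcel_accuracy = 0.90      # probability a land-parcel record is correct (illustrative)
print(product_accuracy(cable_map_accuracy, parcel_accuracy))  # 0.855 for the joined tuple
```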
This paper introduces the implementation and data analysis associated with a state-wide power quality monitoring and analysis system in China. Corporation specifications for power quality monitors and for communication protocols are formulated for data transmission. A big data platform and related technologies are used for data storage and computation. Compliance verification analysis and a power quality performance assessment are conducted, and a visualization tool for presenting the results is described.
Sea surface temperature (SST) data obtained from coastal stations in Jiangsu, China during 2010-2014 are quality controlled before analysis of their characteristic semidiurnal and seasonal cycles, including the correlation with tidal variation. Quality control of the data includes validation of extreme values and checking of hourly values against temporally adjacent data points, with 0.15°C/h considered a suitable threshold for detecting abnormal values. The diurnal variation amplitude of the SST data is greater in spring and summer than in autumn and winter. The diurnal variation of SST has a bimodal structure on most days, i.e., SST has a significant semidiurnal cycle. Moreover, the semidiurnal cycle of SST is negatively correlated with the tidal data from March to August, but positively correlated from October to January. Little correlation is detected in the remaining months because of weak coastal offshore SST gradients. The quality control and understanding of coastal SST data are particularly relevant to the validation of indirect measurements such as satellite-derived data.
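The adjacent-point check reduces to a rate-of-change test; a minimal sketch using the paper's 0.15°C/h threshold on hourly samples (the data are made up):

```python
import numpy as np

def sst_spike_check(sst_hourly, max_rate=0.15):
    """Flag hourly SST samples whose change from the previous hour
    exceeds max_rate (degC per hour)."""
    flags = np.zeros(len(sst_hourly), dtype=bool)
    flags[1:] = np.abs(np.diff(sst_hourly)) > max_rate
    return flags

sst = np.array([12.30, 12.38, 12.41, 12.90, 12.47])
print(sst_spike_check(sst))   # the 12.90 spike and the drop after it are flagged
```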
Background: Most previous studies that estimate the effect of nurse staffing on the quality of acute hospital care have used stochastic methods, and their results are mixed. Objective: To measure the magnitude of the long-run effect of nurse-staffing level on the quality of acute care services. Data: The density of practicing nurses per 1000 population, as a proxy for nurse-staffing level, and three Health Care Quality Indicators (HCQI), namely 30-day mortality per 100 patients from acute myocardial infarction (MORTAMIO), hemorrhagic stroke (MORTHSTO), and ischemic stroke (MORTISTO), were collected as part of an ongoing project by OECD.org in panels of 26 OECD countries over the 2005-2015 period. Method: Panel data analysis. Results: There were stable long-run relationships from nurse-staffing level to the enhancement of HCQI, i.e., a 1% increase in nurse-staffing level would reduce patient mortality rates based on MORTAMIO, MORTHSTO, and MORTISTO by 0.65%, 0.60%, and 0.80%, respectively. Furthermore, the role of nurse-staffing level in increasing overall HCQI was simulated at the highest level in Sweden (-3.53), Denmark (-3.31), Canada (-2.59), Netherlands (-2.33), Finland (-2.09), Switzerland (-1.72), Australia (-1.64), and United States (-1.53). Conclusion: A higher nurse-staffing level is associated with higher quality of acute care services in OECD countries. The nursing characteristics of Sweden, Denmark, Canada, Netherlands, Finland, Switzerland, Australia, and United States would be good patterns for other countries seeking to maximize nursing outcomes in the care of patients with acute and life-threatening conditions by reducing the risk of complications, mortality, and adverse clinical outcomes.
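Those percentage figures are elasticities, which is what the slope of a log-log panel regression estimates; a minimal fixed-effects sketch on synthetic data (a common reading of "panel data analysis", not necessarily the authors' exact estimator):

```python
import numpy as np

# log(mortality)_it = a_i + b * log(nurses)_it + e_it, so b is the elasticity.
rng = np.random.default_rng(3)
n_countries, n_years, true_b = 26, 11, -0.65
a = rng.normal(2.0, 0.3, n_countries)                    # country fixed effects
log_n = rng.normal(2.2, 0.4, (n_countries, n_years))     # log nurse density
log_m = a[:, None] + true_b * log_n + rng.normal(0, 0.05, (n_countries, n_years))

# Within (fixed-effects) estimator: demean within each country, then OLS slope.
x = (log_n - log_n.mean(axis=1, keepdims=True)).ravel()
y = (log_m - log_m.mean(axis=1, keepdims=True)).ravel()
b_hat = (x * y).sum() / (x * x).sum()
print(round(b_hat, 3))   # close to -0.65: a 1% staffing rise, ~0.65% fewer deaths
```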
The Lightning Mapping Imager (LMI) on the FY-4A (Feng Yun-4A) geostationary satellite achieves lightning positioning through optical imaging and has the advantages of high temporal resolution, high stability, and continuous observation. In this study, FY-4A LMI lightning event, group, and flash data from April to August 2018 are selected, and their quality is assessed through qualitative and quantitative comparison with data from the ground-based Advanced Time of Arrival and Direction (ADTD) lightning observation network and the International Space Station (ISS) Lightning Imaging Sensor (LIS). The results show that the spatial distributions of FY-4A lightning are consistent with those of the ground-based ADTD and ISS LIS. The temporal variation in FY-4A lightning group frequency is consistent with that of ADTD strokes, which shows that the FY-4A LMI can capture lightning occurrence in inland China. Quantitative statistics show that the consistency rate between FY-4A LMI and ISS LIS events is relatively high, but lower for lightning group and flash data. Compared with the lightning observations of the ISS LIS and the ground-based ADTD, the FY-4A LMI reports fewer lightning events over the Tibetan Plateau; applications of Tibetan Plateau lightning data therefore require further processing and consideration.
One of the goals of data collection is to prepare for decision-making, so high quality requirements must be satisfied. Rational evaluation of data quality is an effective way to identify data problems in time, so that the quality of data after evaluation satisfies the requirements of the decision maker. A fuzzy neural network based method for data quality evaluation is proposed. First, the criteria for the evaluation of data quality are selected to construct the fuzzy sets of evaluation grades; then, using the learning ability of a neural network, an objective evaluation of membership is carried out, which can be used for the effective evaluation of data quality. This research has been applied in the "data report of national compulsory education outlay guarantee" platform of the Chinese Ministry of Education. The method can be used for effective evaluation of data quality in general, and the data quality situation can be assessed more completely, objectively, and promptly.
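A minimal sketch of the fuzzy-grading idea: triangular membership functions map each criterion score to grades, and the grade memberships are averaged over criteria. The grade definitions here are illustrative, and in the paper the memberships are learned by a neural network rather than fixed:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

GRADES = {"poor": (-0.5, 0.0, 0.5), "fair": (0.2, 0.5, 0.8), "good": (0.5, 1.0, 1.5)}

def grade_memberships(scores):
    """Average each grade's membership over all criterion scores in [0, 1]."""
    return {g: float(np.mean([tri(s, *abc) for s in scores]))
            for g, abc in GRADES.items()}

# Criterion scores, e.g. completeness, accuracy, timeliness.
print(grade_memberships([0.9, 0.7, 0.4]))
```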
A knowledge-based network for the Yidong Bridge section of the Dongyang River, a tributary of the Qiantang River in Zhejiang Province, China, is established to model water quality in areas with small data. Then, based on normal transformation of variables with routine monitoring data and a normality assumption for variables without routine monitoring data, a conditional linear Gaussian Bayesian network is constructed. A "two-constraint selection" procedure is proposed to estimate potential parameter values under small data. Among all potential parameter values, the most probable ones are selected as "representatives". Finally, the risks of pollutant concentrations exceeding national water quality standards are calculated, and pollution reduction decisions are proposed for decision-making reference. The final results show that the conditional linear Gaussian Bayesian network and the "two-constraint selection" procedure are very useful for evaluating risks when data are limited and can help managers make sound decisions under small data.
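The exceedance-risk step has a closed form once a pollutant is modeled as Gaussian: the risk is the upper tail beyond the standard. A sketch with illustrative numbers, assuming SciPy is available:

```python
from scipy.stats import norm

def exceedance_risk(mean, std, standard):
    """P(concentration > standard) for a Gaussian-modeled pollutant."""
    return norm.sf(standard, loc=mean, scale=std)

# Illustrative: a pollutant modeled as N(0.8, 0.3^2) against a 1.0 mg/L standard.
print(round(exceedance_risk(0.8, 0.3, 1.0), 3))   # about 0.25
```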
Purpose: This paper describes the definition of data quality procedures for knowledge organizations such as Higher Education Institutions. The main purpose is to present the flexible approach developed for monitoring the data quality of the European Tertiary Education Register (ETER) database, illustrating its functioning and highlighting the main challenges that still have to be faced in this domain. Design/methodology/approach: The proposed data quality methodology is based on two kinds of checks, one to assess the consistency of cross-sectional data and the other to evaluate the stability of multiannual data. The methodology has an operational and empirical orientation: the proposed checks do not assume any theoretical distribution for determining the threshold parameters that identify potential outliers, inconsistencies, and errors in the data. Findings: We show that the proposed cross-sectional and multiannual checks are helpful for identifying outliers and extreme observations and for detecting ontological inconsistencies not described in the available metadata. For this reason, they may be a useful complement to the processing of the available information. Research limitations: The coverage of the study is limited to European Higher Education Institutions, and the cross-sectional and multiannual checks are not yet completely integrated. Practical implications: Considering the quality of the available data and information is important for data quality-aware empirical investigations, highlighting problems and areas where to invest for improving the coverage and interoperability of data in future data collection initiatives. Originality/value: The data-driven quality checks proposed in this paper may be useful as a reference for building and monitoring the data quality of new or existing databases for other countries or systems characterized by high heterogeneity and complexity of the units of analysis, without relying on pre-specified theoretical distributions.
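A distribution-free multiannual stability check in that spirit can fence year-over-year ratios of an indicator by empirical quantiles of the population rather than an assumed distribution; the fence quantiles and data are illustrative:

```python
import numpy as np

def stability_check(panel, q=0.02):
    """Flag institution-year ratios outside empirical [q, 1-q] fences.
    `panel` is an (institutions x years) array of a single indicator."""
    ratios = panel[:, 1:] / panel[:, :-1]        # year-over-year ratios
    lo, hi = np.quantile(ratios, [q, 1 - q])
    return (ratios < lo) | (ratios > hi), (lo, hi)

rng = np.random.default_rng(4)
students = rng.normal(10000, 500, (200, 6)).clip(min=1)   # enrolment, 200 HEIs x 6 years
students[7, 3] = 400                                      # a suspicious break in the series
flags, fences = stability_check(students)
print(flags.sum(), fences)   # the break at institution 7 is among the flagged ratios
```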
Objective speech quality is difficult to measure without the input reference speech. Mapping methods using data mining are investigated and designed to improve output-based speech quality assessment. The degraded speech is first separated into three classes (unvoiced, voiced, and silence); then the consistency between the degraded speech signal and a pre-trained reference model for each class is calculated and mapped to an objective speech quality score using data mining. A fuzzy Gaussian mixture model (GMM) is used to generate the artificial reference model, trained on perceptual linear predictive (PLP) features. Mean opinion score (MOS) mapping methods including multivariate non-linear regression (MNLR), fuzzy neural network (FNN), and support vector regression (SVR) are designed and compared with the standard ITU-T P.563 method. Experimental results show that the assessment methods with data mining perform better than ITU-T P.563. Moreover, FNN and SVR are more efficient than MNLR, and FNN performs best, with a 14.50% increase in the correlation coefficient and a 32.76% decrease in the root-mean-square MOS error.
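A minimal sketch of the final mapping stage, assuming the per-class consistency scores are already computed; scikit-learn's SVR stands in for the paper's regressors, and the data are synthetic:

```python
import numpy as np
from sklearn.svm import SVR

# Features: consistency of (unvoiced, voiced, silence) frames with the reference model.
rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (200, 3))                          # synthetic consistency scores
mos = 1 + 4 * X.mean(axis=1) + rng.normal(0, 0.2, 200)   # synthetic listener MOS in [1, 5]

model = SVR(kernel="rbf", C=10.0).fit(X, mos)
print(round(float(model.predict([[0.9, 0.8, 0.95]])[0]), 2))   # predicted MOS
```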
Water resources are among the basic resources for human survival, and water protection has become a major problem for countries around the world. However, most traditional water quality monitoring research is still concerned with the collection of water quality indicators and ignores the analysis of water quality monitoring data and its value. In this paper, using the Laravel and AdminLTE frameworks, we describe how to design and implement a water quality data visualization platform based on Baidu ECharts. Through deployed water quality sensors, the collected water quality indicator data is transmitted in real time over the 4G network to a big data processing platform deployed on Tencent Cloud. The collected monitoring data is analyzed, and the results are visualized with Baidu ECharts. The test results showed that the designed system runs well and will provide decision support for water resource protection.
Most GIS databases contain data errors. The quality of data sources such as traditional paper maps or more recent remote sensing data determines spatial data quality. In the past several decades, different statistical measures have been developed to evaluate data quality for different types of data, such as nominal categorical data, ordinal categorical data, and numerical data. Although these methods were originally proposed for medical or psychological research, they have been widely used to evaluate spatial data quality. In this paper, we first review statistical methods for evaluating data quality, discuss under what conditions we should use them and how to interpret the results, and then briefly discuss statistical software and packages that can be used to compute these data quality measures.
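For nominal categorical data, the classic chance-corrected agreement measure of this kind is Cohen's kappa (named here as an example; the paper surveys a range of such measures). A sketch comparing two classified land-cover maps:

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two nominal classifications."""
    labels = np.union1d(a, b)
    po = np.mean(a == b)                                             # observed agreement
    pe = sum(np.mean(a == lab) * np.mean(b == lab) for lab in labels)  # chance agreement
    return (po - pe) / (1 - pe)

map1 = np.array(["water", "urban", "forest", "urban", "forest", "water"])
map2 = np.array(["water", "urban", "forest", "forest", "forest", "water"])
print(round(cohens_kappa(map1, map2), 3))   # 0.75
```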
Hiding secret data in binary images is more difficult than in other formats, since binary images require only one bit of representation to indicate black and white. This study proposes a new method for data hiding in binary images that uses an optimized bit position to replace a secret bit. The method manipulates sub-divided blocks: the parity of a specified block decides whether or not a change is needed to embed a secret bit. By finding the best position to insert a secret bit in each divided block, the image quality of the resulting stego-image can be improved while maintaining low computational complexity. The experimental results show that the proposed method improves upon previous work.
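A minimal sketch of parity-based embedding in one block: if the block parity already equals the secret bit, nothing changes; otherwise one pixel is flipped. Which pixel to flip is where the paper's optimized position comes in; flipping the first pixel here is a stand-in for that quality-aware choice:

```python
import numpy as np

def embed_bit(block, secret_bit):
    """Embed one secret bit into a binary block via parity; flip at most one pixel."""
    block = block.copy()
    if block.sum() % 2 != secret_bit:
        block.flat[0] ^= 1        # a smarter scheme picks the least visible pixel
    return block

def extract_bit(block):
    return int(block.sum() % 2)

block = np.array([[0, 1, 1], [1, 0, 0], [1, 1, 0]])
stego = embed_bit(block, secret_bit=0)
print(extract_bit(stego), int(np.abs(stego - block).sum()))  # 0, and <=1 pixel changed
```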