Advances in technology require upgrades in the law. One such area involves data brokers, which have thus far gone unregulated. Data brokers use artificial intelligence to aggregate information into data profiles about individual Americans, derived from consumer use of the internet and connected devices. Data profiles are then sold for profit. Government investigators use a legal loophole to purchase this data instead of obtaining a search warrant, which the Fourth Amendment would otherwise require. Consumers have lacked a reasonable means to contest or correct the information data brokers collect. Americans may not even be aware of the risks of data aggregation, which upends the reasonable-expectations test used in search warrant analysis. Data aggregation should be controlled and regulated, which is the direction some privacy laws take. Legislatures must step forward to safeguard against shadowy data-profiling practices, whether abroad or at home. In the meantime, courts can modify their search warrant analysis by incorporating data privacy principles.
In 2020, 11% of Irish electricity was consumed by data centres. The Irish data centre industry and the cooling methods it uses will require reform in the coming years to meet EU energy policies. The resale of heat, alternative cooling methods, and carbon reduction measures are all possible routes to compliance. This study aims to determine the technical and economic viability of reselling waste heat from data centres. This was assessed using a novel application of thermodynamics to determine the waste heat recovery potential of Irish data centres, with current methods of heat generation used for economic comparison. The paper also explores policy surrounding waste heat recovery within the industry. The Recoverable Carnot Equivalent Power (RCEP), the maximum useable heat that can be recovered from a data centre rack, is calculated theoretically for the three potential cooling methods for Irish data centres: air, hybrid, and immersion cooling. Under current operating conditions, which are optimised for cooling performance, air cooling has the highest potential RCEP of 0.39 kW/rack, approximately 8% of the input electrical power captured as useable heat. This indicates that Irish data centres have the energy potential to be heat providers in the Irish economy. The study highlights the technical and economic aspects of the prevalent cooling techniques and determines that the air cooling heat recovery cost can be reduced to 0.01 €/kWhth using offsetting, which is financially competitive with current heating solutions in Ireland.
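As a rough, hedged illustration of the Carnot-equivalent idea behind RCEP (one plausible reading is that the recoverable heat per rack is weighted by the Carnot factor between exhaust and ambient temperatures; the formula interpretation, temperatures, and heat-capture fraction below are assumptions for illustration, not figures or definitions from the study):

```python
# Hedged sketch: a Carnot-equivalent recoverable power, read as
#   RCEP ~= Q_waste * (1 - T_ambient / T_exhaust).
# The temperatures and heat-capture fraction are illustrative assumptions only.
def carnot_equivalent_power(q_waste_kw: float, t_exhaust_k: float, t_ambient_k: float) -> float:
    """Work-equivalent (exergy) of recovered heat, in kW."""
    return q_waste_kw * (1.0 - t_ambient_k / t_exhaust_k)

rack_power_kw = 5.0                          # assumed electrical input per rack
q_waste_kw = 0.9 * rack_power_kw             # assume ~90% of input is capturable as heat
rcep = carnot_equivalent_power(q_waste_kw, t_exhaust_k=313.0, t_ambient_k=283.0)
print(f"RCEP ~ {rcep:.2f} kW/rack ({100 * rcep / rack_power_kw:.0f}% of input power)")
```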
Mitigating increasing cyberattack incidents may require strategies such as reinforcing organizations' networks with Honeypots and effectively analyzing attack traffic to detect zero-day attacks and vulnerabilities. Effective detection and mitigation of cyberattacks typically require both computerized and visual analyses. However, most security analysts are not adequately trained in visualization principles and/or methods, which are required for effective visual perception of the useful attack information hidden in attack data. Additionally, although Honeypots have proven useful in cyberattack research, no studies have comprehensively investigated visualization practices in the field. In this paper, we review visualization practices and methods commonly used in the discovery and communication of attack patterns based on Honeypot network traffic data. Using the PRISMA methodology, we identified and screened 218 papers and evaluated only the 37 papers with high impact. Most Honeypot papers conducted summary statistics of Honeypot data based on static data metrics such as IP address, port, and packet size. They visually analyzed Honeypot attack data using simple graphical methods (such as line, bar, and pie charts) that tend to hide useful attack information. Furthermore, only a few papers conducted extended attack analysis, commonly visualizing attack data using scatter and linear plots. Papers rarely included simple yet sophisticated graphical methods, such as box plots and histograms, which allow for critical evaluation of analysis results. While a significant number of automated visualization tools incorporate visualization standards by default, constructing effective and expressive graphical methods for easy pattern discovery and explainable insights still requires applied knowledge and skill in visualization principles and tools, and occasionally an interdisciplinary collaboration with peers. We therefore suggest the need, going forward, for non-classical graphical methods for visualizing attack patterns and communicating analysis results. We also recommend training investigators in visualization principles and standards for effective visual perception and presentation.
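The kind of "simple yet sophisticated" views recommended above can be produced in a few lines; the sketch below draws a histogram and a per-port box plot of packet sizes, with synthetic placeholder traffic rather than Honeypot data from the reviewed papers.

```python
# Hedged sketch: histogram and box plot of packet sizes grouped by destination port.
# The traffic values are synthetic placeholders, not data from the reviewed papers.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
ports = rng.choice([22, 23, 80, 443], size=2000, p=[0.4, 0.3, 0.2, 0.1])
sizes = np.where(ports == 22, rng.normal(120, 30, 2000), rng.normal(400, 150, 2000)).clip(40)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.hist(sizes, bins=40)
ax1.set(title="Packet size distribution", xlabel="bytes")
ax2.boxplot([sizes[ports == p] for p in (22, 23, 80, 443)])
ax2.set_xticks([1, 2, 3, 4], ["22", "23", "80", "443"])
ax2.set(title="Packet size by destination port", xlabel="port", ylabel="bytes")
plt.tight_layout()
plt.show()
```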
Time series forecasting has become an important aspect of data analysis and has many real-world applications. However, undesirable missing values are often encountered, which may adversely affect many forecasting tasks. In this study, we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Our approach does not use a simulation to generate pseudo-missing data; instead, it performs imputation on actual missing data and measures the performance of the forecasting model created therefrom. In the experiment, therefore, several time series forecasting models are trained using different training datasets prepared with each imputation method. Subsequently, the performance of the imputation methods is evaluated by comparing the accuracy of the forecasting models. The results obtained from a total of four experimental cases show that the k-nearest neighbor technique is the most effective in reconstructing missing data and contributes positively to time series forecasting compared with the other imputation methods.
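A minimal sketch of this evaluation protocol — impute, train a forecaster on the imputed series, and compare forecast error — with a toy sine series, a simple lag-based linear forecaster, and scikit-learn imputers standing in for the study's models and data:

```python
# Illustrative sketch (toy data, simple forecaster): compare imputation methods by the
# accuracy of a forecasting model trained on the imputed series, as in the study's protocol.
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)  # toy series
series[rng.choice(t.size, 60, replace=False)] = np.nan                   # missing observations

def lagged(y, lags=5):
    X = np.column_stack([y[i:y.size - lags + i] for i in range(lags)])
    return X, y[lags:]

scores = {}
for name, imputer in {"knn": KNNImputer(n_neighbors=5),
                      "mean": SimpleImputer(strategy="mean")}.items():
    segments = series.reshape(-1, 10)                    # windows of 10 samples as rows
    y = imputer.fit_transform(segments).ravel()          # impute, then flatten back
    X, target = lagged(y)
    split = int(0.8 * target.size)
    model = LinearRegression().fit(X[:split], target[:split])
    scores[name] = mean_absolute_error(target[split:], model.predict(X[split:]))

print(scores)  # lower forecast error suggests a better imputation method for this task
```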
Environmental systems, including the atmosphere, the oceans, and biological systems, can be modeled by mathematical equations to estimate their states. These equations can be solved with numerical methods, which require initial and boundary conditions. Prediction and simulation for different case studies are major reasons for the great importance of these models. Satellite data from a wide range of sensors provide observations that indicate the system state. Both numerical models and satellite data therefore provide estimates of the system state, and the best estimate must be obtained from these different estimates. Assimilating observations into numerical weather models with data assimilation techniques provides an improved estimate of the system state. In this work, highlights of the mathematical perspective on data assimilation methods are introduced. Least squares estimation techniques are introduced first, because they are the basic mathematical building block for data assimilation methods. A stochastic version of least squares is included to handle error in both the model and the observations. Three- and four-dimensional variational assimilation (3D-Var and 4D-Var, respectively) are then treated. Finally, the Kalman filter and its derivatives (KF, EKF, EnKF) and hybrid filters are introduced.
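As a numerical illustration of the least-squares building block mentioned above: the analysis x_a = x_b + K(y − Hx_b), with gain K = BHᵀ(HBHᵀ + R)⁻¹, weights the background against the observations by their error covariances; for a linear H this is also the minimizer of the 3D-Var cost function J(x) = (x − x_b)ᵀB⁻¹(x − x_b) + (y − Hx)ᵀR⁻¹(y − Hx). The matrices below are small arbitrary examples, not values from the text.

```python
# Illustrative sketch of the least-squares / BLUE analysis step that underlies 3D-Var:
#   x_a = x_b + K (y - H x_b),  K = B H^T (H B H^T + R)^{-1}
# All matrices below are small arbitrary examples, not values from the text.
import numpy as np

x_b = np.array([1.0, 2.0, 3.0])            # background (model) state
B = np.diag([0.5, 0.5, 0.5])               # background error covariance
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])            # observation operator (observes components 1 and 3)
y = np.array([1.4, 2.6])                   # observations
R = np.diag([0.1, 0.1])                    # observation error covariance

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # gain: weights model vs. observations
x_a = x_b + K @ (y - H @ x_b)                  # analysis state
print(x_a)
```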
The development of adaptation measures to climate change relies on data from climate models or impact models. Analyzing these large data sets, or an ensemble of such data sets, requires statistical methods. This paper describes the methodological approach to collecting, structuring, and publishing the statistical methods that have been used or developed by former or present adaptation initiatives. The intention is to communicate the knowledge achieved and thus support future users. A key component is the participation of users in the development process. The main elements of the approach are standardized, template-based descriptions of the methods, including the specific applications, references, and a method assessment. All contributions have been quality checked, sorted, and placed in a larger context. The result is a report on statistical methods that is freely available in printed and online versions. Examples of how to use the methods are presented in this paper and are also included in the brochure.
With the ongoing advancements in sensor networks and data acquisition technologies across systems such as manufacturing, aviation, and healthcare, data-driven vibration control (DDVC) has attracted broad interest from both the industrial and academic communities. Input shaping (IS), as a simple and effective feedforward method, is in great demand in DDVC. It convolves the desired input command with an impulse sequence, without requiring parametric dynamics or the closed-loop system structure, thereby suppressing the residual vibration. Based on a thorough investigation into state-of-the-art DDVC methods, this survey makes the following efforts: 1) introducing IS theory and typical input shapers; 2) categorizing recent progress in DDVC methods; 3) summarizing commonly adopted metrics for DDVC; and 4) discussing the engineering applications and future trends of DDVC. In doing so, this study provides a systematic and comprehensive overview of existing DDVC methods, from design to optimization perspectives, aiming to promote future research on this emerging and vital issue.
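As a concrete illustration of the input-shaping idea summarized above, the sketch below builds the standard two-impulse zero-vibration (ZV) shaper for an assumed vibration mode and convolves it with a step command; the frequency, damping ratio, and time step are illustrative assumptions, not values from the survey.

```python
# Hedged sketch of a two-impulse zero-vibration (ZV) input shaper; the natural frequency
# and damping ratio are illustrative assumptions, not values from the survey.
import numpy as np

wn, zeta, dt = 2.0 * np.pi * 1.0, 0.05, 0.001      # assumed mode: 1 Hz, 5% damping
wd = wn * np.sqrt(1.0 - zeta**2)                    # damped natural frequency
K = np.exp(-zeta * np.pi / np.sqrt(1.0 - zeta**2))
amps = np.array([1.0, K]) / (1.0 + K)               # impulse amplitudes A1, A2 (sum to 1)
times = np.array([0.0, np.pi / wd])                 # impulse times t1, t2

# Build the shaper as a discrete impulse sequence and convolve it with a step command.
shaper = np.zeros(int(round(times[-1] / dt)) + 1)
for a, t in zip(amps, times):
    shaper[int(round(t / dt))] += a
step = np.ones(5000)                                 # unshaped step command
shaped = np.convolve(step, shaper)[: step.size]      # shaped command fed to the plant
print(shaped[:5], shaped[-5:])                       # rises in two steps of A1, then A1 + A2
```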
In response to the lack of reliable physical parameters in process simulation of butadiene extraction, a large amount of phase equilibrium data was collected in the context of the actual process of butadiene production by the acetonitrile method. The accuracy of five prediction methods applied to the butadiene extraction process, UNIFAC (UNIQUAC Functional-group Activity Coefficients), UNIFAC-LL, UNIFAC-LBY, UNIFAC-DMD, and COSMO-RS, was verified using partial phase equilibrium data. The results showed that the UNIFAC-DMD method had the highest accuracy in predicting phase equilibrium data for the missing systems. COSMO-RS predicted multiple systems with good accuracy, and a large number of missing phase equilibrium data were estimated using the UNIFAC-DMD and COSMO-RS methods. The predicted phase equilibrium data were checked for consistency. The NRTL-RK (non-Random Two-Liquid-Redlich-Kwong equation of state) and UNIQUAC thermodynamic models were used to correlate the phase equilibrium data. Industrial device simulations were used to verify the accuracy of the thermodynamic model applied to the butadiene extraction process. The simulation results showed that the average deviations of the results simulated with the correlated thermodynamic model from the actual values were less than 2%, much smaller than the deviations of the simulations using the commercial simulation software Aspen Plus and its built-in database (>10%), indicating that the obtained phase equilibrium data are highly accurate and reliable. The best phase equilibrium data and thermodynamic model parameters for butadiene extraction are provided. This improves the accuracy and reliability of the design, optimization, and control of the process, and provides a basis and guarantee for developing a more environmentally friendly and economical butadiene extraction process.
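A small sketch of the binary NRTL activity-coefficient equations used in such correlations; the interaction parameters below are placeholders, not the values fitted for the butadiene/acetonitrile systems in the paper.

```python
# Hedged sketch of the binary NRTL activity-coefficient model used when correlating
# vapour-liquid equilibrium data; tau12, tau21 and alpha below are placeholder values,
# not the parameters fitted for the butadiene/acetonitrile systems in the paper.
import math

def nrtl_binary(x1: float, tau12: float, tau21: float, alpha: float = 0.3):
    """Return (gamma1, gamma2) for a binary mixture from the NRTL equations."""
    x2 = 1.0 - x1
    G12, G21 = math.exp(-alpha * tau12), math.exp(-alpha * tau21)
    ln_g1 = x2**2 * (tau21 * (G21 / (x1 + x2 * G21))**2
                     + tau12 * G12 / (x2 + x1 * G12)**2)
    ln_g2 = x1**2 * (tau12 * (G12 / (x2 + x1 * G12))**2
                     + tau21 * G21 / (x1 + x2 * G21)**2)
    return math.exp(ln_g1), math.exp(ln_g2)

print(nrtl_binary(x1=0.3, tau12=1.2, tau21=0.8))   # placeholder parameters
```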
Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., data that sum to a constant such as 100%. The statistical linear model is the most commonly used technique for identifying hidden relationships between underlying random variables of interest; however, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique for finding relationships between variables of interest in various applications. When estimating linear regression parameters, which are useful for tasks such as future prediction and analysis of the partial effects of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can make data recovery costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds the best estimates of parameters in statistical models that depend on unobserved variables or data, in the sense of maximum likelihood or maximum a posteriori (MAP) estimation. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
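A compact illustration of the EM iteration for missing data, using the textbook bivariate-normal case rather than the paper's compositional setup (the data and missing pattern are synthetic assumptions): the E step replaces missing coordinates and their squares by conditional expectations, and the M step re-estimates the mean and covariance from the completed sufficient statistics.

```python
# Hedged sketch: EM for a bivariate normal with values missing in the second coordinate.
# This is the generic textbook construction, not the paper's compositional-data setup.
import numpy as np

rng = np.random.default_rng(1)
n = 500
true_mean, true_cov = np.array([0.0, 1.0]), np.array([[1.0, 0.6], [0.6, 2.0]])
X = rng.multivariate_normal(true_mean, true_cov, size=n)
miss = rng.random(n) < 0.3                      # 30% of x2 values are missing
X[miss, 1] = np.nan

mu, S = np.nanmean(X, axis=0), np.diag(np.nanvar(X, axis=0))  # crude starting values
for _ in range(50):
    # E step: E[x2 | x1] and E[x2^2 | x1] for the missing entries
    cond_mean = mu[1] + S[0, 1] / S[0, 0] * (X[miss, 0] - mu[0])
    cond_var = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
    x2 = X[:, 1].copy()
    x2[miss] = cond_mean
    x2_sq = x2 ** 2
    x2_sq[miss] += cond_var                      # add conditional variance for missing rows
    # M step: re-estimate mean and covariance from completed sufficient statistics
    mu = np.array([X[:, 0].mean(), x2.mean()])
    S = np.array([[np.mean(X[:, 0] ** 2) - mu[0] ** 2,
                   np.mean(X[:, 0] * x2) - mu[0] * mu[1]],
                  [np.mean(X[:, 0] * x2) - mu[0] * mu[1],
                   np.mean(x2_sq) - mu[1] ** 2]])

print("estimated mean:", mu, "\nestimated covariance:\n", S)
```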
In source detection in the Tianlai project, accurately locating the interferometric fringe in visibility data strongly influences downstream tasks such as physical parameter estimation and weak source exploration. Considering that traditional locating methods are time-consuming and supervised methods require a great quantity of expensive labeled data, in this paper we first investigate the characteristics of interferometric fringes in simulation and in the real scenario separately, and then integrate an almost parameter-free unsupervised clustering method with a seed-filling or eraser algorithm to propose a hierarchical, plug-and-play method that improves location accuracy. We first apply our method to locate the interferometric fringes of single and multiple sources in simulation data, then apply it to real data taken from the Tianlai radio telescope array, and finally compare it with state-of-the-art unsupervised methods. The results show that our method is robust in different scenarios and can effectively improve location measurement accuracy.
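A generic sketch of the unsupervised-clustering idea for localisation — cluster the bright pixels of an amplitude map and report each cluster's bounding box — using DBSCAN on synthetic data; the threshold, clustering parameters, and data are illustrative assumptions, not the Tianlai pipeline.

```python
# Hedged sketch: locate a bright, fringe-like region in a 2-D amplitude map by clustering
# its bright pixels with DBSCAN and reporting each cluster's bounding box. Synthetic data
# and parameters are illustrative only, not the Tianlai pipeline described in the paper.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
amp = 0.1 * rng.random((200, 200))               # noise background
amp[80:120, 50:150] += np.abs(np.sin(np.linspace(0, 20 * np.pi, 100)))  # fringe-like stripe

ys, xs = np.nonzero(amp > 0.5)                   # bright pixels above an assumed threshold
labels = DBSCAN(eps=3, min_samples=10).fit_predict(np.column_stack([ys, xs]))
for lab in set(labels) - {-1}:                   # -1 is DBSCAN's noise label
    sel = labels == lab
    print(f"cluster {lab}: rows {ys[sel].min()}-{ys[sel].max()}, "
          f"cols {xs[sel].min()}-{xs[sel].max()}")
```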
We study continuous data assimilation (CDA) applied to projection and penalty methods for the Navier-Stokes (NS) equations. Penalty and projection methods are more efficient than consistent NS discretizations, but they are less accurate due to modeling error (penalty) and splitting error (projection). We show analytically and numerically that, with measurement data and properly chosen parameters, CDA can effectively remove these splitting and modeling errors and provide long-time optimally accurate solutions.
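As a toy illustration of the CDA nudging mechanism on a much simpler PDE than Navier-Stokes (a 1D heat equation): the assimilated run is relaxed toward coarse-grid observations of a reference solution and is drawn toward it despite a wrong initial condition. The interpolation operator, nudging parameter, and grids are assumptions, not the paper's projection/penalty setting.

```python
# Hedged sketch of continuous data assimilation (nudging) on a 1-D heat equation,
# not the Navier-Stokes projection/penalty setting analysed in the paper.
# The interpolation operator I_H is coarse-grid sampling; mu, nu, grids are assumptions.
import numpy as np

nx, nu, dt, mu = 200, 0.01, 1e-4, 50.0
x = np.linspace(0, 1, nx)
u_true = np.sin(np.pi * x) + 0.5 * np.sin(3 * np.pi * x)   # reference ("truth") solution
u_da = np.zeros_like(x)                                    # assimilated run, wrong start

def lap(u):
    d = np.zeros_like(u)
    d[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / (x[1] - x[0]) ** 2
    return d                                               # Dirichlet ends stay fixed

obs_idx = np.arange(0, nx, 10)                             # coarse observation points (I_H)
for step in range(20000):
    u_true = u_true + dt * nu * lap(u_true)
    nudge = np.zeros_like(x)
    nudge[obs_idx] = u_true[obs_idx] - u_da[obs_idx]       # I_H(u_obs) - I_H(u_da)
    u_da = u_da + dt * (nu * lap(u_da) + mu * nudge)

print("max error after assimilation:", np.abs(u_true - u_da).max())
```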
An anisotropic diffusion filter can be used to model a flow-dependent background error covariance matrix, which can be achieved by solving the advection-diffusion equation. Because of the directionality of the advection term, the discretization method must be chosen very carefully. The finite analytic method is an alternative scheme for solving the advection-diffusion equation. As a combination of analytical and numerical methods, it not only has high calculation accuracy but also has an automatic upwind characteristic. To demonstrate its ability, one-dimensional steady and unsteady advection-diffusion equations are solved with the finite analytic method, using the more widely used upwind difference method as a control. The results indicate that the finite analytic method has higher accuracy than the upwind difference method, and it still performs better in the two-dimensional case. In a three-dimensional variational assimilation experiment, the finite analytic method effectively improves the accuracy of the analysis field, and its effect is significantly better than that of the upwind difference and central difference methods. Moreover, it remains the more effective solution method in the strong-flow regions where the advection-diffusion filter performs most prominently.
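A small sketch of the comparison baseline referred to above: a first-order upwind discretization of the 1D steady advection-diffusion equation u·dc/dx = D·d²c/dx², checked against its exact solution c(x) = (e^(Pe·x) − 1)/(e^(Pe) − 1). It shows the discretization error that a higher-accuracy scheme such as the finite analytic method aims to reduce; the grid size and Péclet number are arbitrary choices, not the paper's test cases.

```python
# Hedged sketch: first-order upwind finite differences for the 1-D steady
# advection-diffusion problem  u*dc/dx = D*d2c/dx2,  c(0)=0, c(1)=1,
# compared against the exact solution. Grid size and Peclet number are arbitrary.
import numpy as np

u, D, n = 1.0, 0.05, 41                      # velocity, diffusivity, grid points (Pe = 20)
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

A = np.zeros((n, n))
b = np.zeros(n)
A[0, 0] = A[-1, -1] = 1.0
b[-1] = 1.0                                  # boundary conditions c(0)=0, c(1)=1
for i in range(1, n - 1):
    A[i, i - 1] = -u / dx - D / dx**2        # upwind convection (u > 0) plus diffusion
    A[i, i]     =  u / dx + 2 * D / dx**2
    A[i, i + 1] = -D / dx**2

c_num = np.linalg.solve(A, b)
Pe = u / D                                    # Peclet number for a unit-length domain
c_exact = (np.exp(Pe * x) - 1.0) / (np.exp(Pe) - 1.0)
print("max upwind error:", np.abs(c_num - c_exact).max())  # first-order schemes smear the front
```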
Seeing is an important index for evaluating the quality of an astronomical site. To estimate seeing at the Muztagh-Ata site quantitatively as a function of height and time, the European Centre for Medium-Range Weather Forecasts reanalysis database (ERA5) is used. Seeing calculated from ERA5 is consistent with the Differential Image Motion Monitor seeing at a height of 12 m. The results show that seeing decays exponentially with height at the Muztagh-Ata site; in 2021 it decayed fastest with height in fall and most slowly in summer, and the seeing condition is better in fall than in summer. The median value of seeing at 12 m is 0.89 arcsec, with a maximum of 1.21 arcsec in August and a minimum of 0.66 arcsec in October. The median value of seeing at 12 m is 0.72 arcsec in the nighttime and 1.08 arcsec in the daytime. Seeing is a combination of annual and roughly biannual variations with the same phase as temperature and wind speed, indicating that the variation of seeing with time is influenced by temperature and wind speed. The Richardson number Ri is used to analyze atmospheric stability, and the variations of seeing are consistent with Ri between layers. These quantitative results can provide an important reference for a telescope observation strategy.
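A small sketch of the gradient Richardson number used above to characterise atmospheric stability, computed from potential temperature and wind differences between two levels; the profile values are placeholders, not ERA5 data from the study.

```python
# Hedged sketch: gradient Richardson number between two vertical levels,
#   Ri = (g / theta_mean) * (d_theta/dz) / ((du/dz)^2 + (dv/dz)^2).
# The profile values below are placeholders, not ERA5 data from the study.
def richardson_number(theta1, theta2, u1, u2, v1, v2, z1, z2, g=9.81):
    dz = z2 - z1
    dtheta_dz = (theta2 - theta1) / dz
    du_dz, dv_dz = (u2 - u1) / dz, (v2 - v1) / dz
    shear_sq = du_dz**2 + dv_dz**2
    theta_mean = 0.5 * (theta1 + theta2)
    return (g / theta_mean) * dtheta_dz / shear_sq

# Example layer: stable stratification and moderate shear (illustrative numbers only)
print(richardson_number(theta1=288.0, theta2=289.0, u1=3.0, u2=6.0,
                        v1=0.0, v2=1.0, z1=12.0, z2=112.0))
```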
Pulsar detection has recently become an active research topic in radio astronomy. One of the essential procedures for pulsar detection is pulsar candidate sifting (PCS), a procedure for identifying potential pulsar signals in a survey. However, pulsar candidates are always class-imbalanced: most candidates are non-pulsars such as RFI, and only a tiny fraction come from real pulsars. Class imbalance can greatly affect the performance of machine learning (ML) models, resulting in a heavy cost when real pulsars are misjudged. To deal with this problem, we focus on techniques for choosing relevant features that discriminate pulsars from non-pulsars, known as feature selection. Feature selection is the process of selecting a subset of the most relevant features from a feature pool. Features that distinguish pulsars from non-pulsars can significantly improve the performance of the classifier even if the data are highly imbalanced. In this work, an algorithm for feature selection called the K-fold Relief-Greedy (KFRG) algorithm is designed. KFRG is a two-stage algorithm: in the first stage, it filters out irrelevant features according to their K-fold Relief scores, while in the second stage, it removes redundant features and selects the most relevant ones by a forward greedy search strategy. Experiments on the data set of the High Time Resolution Universe survey verified that ML models based on KFRG are capable of PCS, correctly separating pulsars from non-pulsars even when the candidates are highly class-imbalanced.
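A generic two-stage sketch in the spirit of KFRG — a Relief-style relevance filter followed by a greedy forward search scored by a cross-validated classifier — on synthetic, class-imbalanced data. The scoring details, thresholds, classifier, and data are assumptions for illustration, not the exact KFRG algorithm or the HTRU data.

```python
# Hedged sketch in the spirit of a two-stage filter + greedy-forward feature selection.
# Stage 1 uses a basic Relief-style score, stage 2 a greedy forward search with a
# cross-validated classifier; details, thresholds and data are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           weights=[0.95, 0.05], random_state=0)   # imbalanced candidates
Xn = (X - X.min(0)) / (X.max(0) - X.min(0))                         # normalise for Relief

def relief_scores(Xn, y, n_rounds=200):
    w = np.zeros(Xn.shape[1])
    for i in rng.choice(len(y), n_rounds, replace=False):
        d = np.abs(Xn - Xn[i]).sum(axis=1)
        d[i] = np.inf
        hit = np.argmin(np.where(y == y[i], d, np.inf))     # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))    # nearest other-class sample
        w += np.abs(Xn[i] - Xn[miss]) - np.abs(Xn[i] - Xn[hit])
    return w / n_rounds

keep = np.argsort(relief_scores(Xn, y))[-10:]          # stage 1: drop low-relevance features

selected, best = [], -np.inf
while True:                                            # stage 2: greedy forward search
    gains = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, selected + [f]], y, cv=5, scoring="f1").mean()
             for f in keep if f not in selected}
    if not gains:
        break
    f, score = max(gains.items(), key=lambda kv: kv[1])
    if score <= best:
        break
    selected.append(f)
    best = score

print("selected features:", selected, "cv F1:", round(best, 3))
```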
The development of technologies such as big data and blockchain has brought convenience to life, but at the same time privacy and security issues are becoming more and more prominent. The K-anonymity algorithm is an effective privacy-preserving algorithm with low computational complexity that can safeguard users' privacy by anonymizing big data. However, the algorithm currently focuses only on improving user privacy while ignoring data availability, and ignoring the impact of quasi-identifier attributes on sensitive attributes reduces the usability of the processed data for statistical analysis. Based on this, we propose a new K-anonymity algorithm that addresses the privacy and security problem in the context of big data while guaranteeing improved data usability. Specifically, we construct a new information loss function based on information quantity theory. Considering that different quasi-identifier attributes have different impacts on sensitive attributes, we assign a weight to each quasi-identifier attribute when designing the information loss function. In addition, to reduce information loss, we improve K-anonymity in two ways. First, we make the information loss smaller than in the original table while guaranteeing privacy, based on common artificial intelligence algorithms, namely the greedy algorithm and 2-means clustering. Second, we improve the 2-means clustering algorithm by designing a mean-center method to select the initial centers of mass. We then design the K-anonymity algorithm of this scheme based on the constructed information loss function, the improved 2-means clustering algorithm, and the greedy algorithm, which reduces information loss. Finally, we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
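A small, generic sketch of a weighted information-loss measure for one equivalence class of generalized numeric quasi-identifiers (normalized interval width per attribute, weighted and averaged); the weights, attributes, and loss form are illustrative assumptions, not the paper's exact definition.

```python
# Hedged sketch: a weighted information-loss measure for one equivalence class after
# generalising numeric quasi-identifiers to ranges. The per-attribute weights and the
# normalised-range loss form are illustrative assumptions, not the paper's definition.
import numpy as np

def class_information_loss(records: np.ndarray, domain_ranges: np.ndarray,
                           weights: np.ndarray) -> float:
    """records: (k, q) values of one k-anonymous group; returns a loss in [0, 1]."""
    spans = records.max(axis=0) - records.min(axis=0)        # generalised interval widths
    per_attribute = spans / domain_ranges                     # normalise by the full domain
    return float(np.average(per_attribute, weights=weights))

# Toy group of k = 3 records over quasi-identifiers (age, zip prefix, income)
group = np.array([[34.0, 53711.0, 41000.0],
                  [37.0, 53715.0, 45000.0],
                  [41.0, 53703.0, 52000.0]])
domains = np.array([80.0, 99999.0, 200000.0])                # assumed attribute domains
weights = np.array([0.5, 0.2, 0.3])                          # assumed impact on sensitive attr.
print(class_information_loss(group, domains, weights))
```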
The accurate estimation of parameters is the premise for establishing a high-fidelity simulation model of a valve-controlled cylinder system. Bench test data are easily obtained, but it is challenging to emulate actual loads in research on parameter estimation of valve-controlled cylinder systems. Although the operating data of the control valve contain actual load information, acquiring that information remains challenging. This paper proposes a method that fuses bench test data and operating data for parameter estimation to address these problems. The proposed method is based on Bayesian theory, and its core is a pooled fusion of prior information from bench test and operating data. First, a system model is established and the parameters in the model are analysed. Second, the bench and operating data of the system are collected. Then, the model parameters and weight coefficients are estimated using the data fusion method. Finally, the effects of the data fusion method, the Bayesian method, and the particle swarm optimisation (PSO) algorithm on the estimated system model parameters are compared. The research shows that the weight coefficient represents the contribution of different prior information to the parameter estimation result. Parameter estimation based on the data fusion method performs better than the Bayesian method and the PSO algorithm. Increasing load complexity leads to a decrease in model accuracy, highlighting the crucial role of the data fusion method in parameter estimation studies.
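A toy illustration of pooling prior information from two data sources before a Bayesian update: a scalar parameter with Gaussian priors from bench and operating data, combined by a linear opinion pool with a weight coefficient and updated on a grid. All numbers are assumptions, not the paper's valve-controlled cylinder model.

```python
# Hedged toy illustration: fuse two priors for one parameter with a linear opinion pool,
#   p(theta) = w * p_bench(theta) + (1 - w) * p_operating(theta),
# then update with data on a grid. All numbers are assumptions, not the paper's model.
import numpy as np
from scipy.stats import norm

theta = np.linspace(0.0, 2.0, 2001)                      # grid over the unknown parameter
prior_bench = norm.pdf(theta, loc=0.9, scale=0.10)       # prior from bench-test data
prior_oper = norm.pdf(theta, loc=1.1, scale=0.25)        # prior from operating data
w = 0.7                                                  # weight coefficient of the bench prior
prior = w * prior_bench + (1 - w) * prior_oper           # pooled prior

obs = np.array([1.02, 0.97, 1.05, 1.00])                 # new measurements (assumed noise 0.1)
loglik = sum(norm.logpdf(y, loc=theta, scale=0.1) for y in obs)
post = prior * np.exp(loglik - loglik.max())
post /= post.sum()                                       # normalise over the grid
print("posterior mean:", float((theta * post).sum()))
```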
Artificial immune detection can be used to detect network intrusions in an adaptive manner, and proper matching methods can improve the accuracy of immune detection. This paper proposes an artificial immune detection model for network intrusion data based on a quantitative matching method. The proposed model defines the detection process using network data, expresses features as decimal values, and simulates artificial immune mechanisms to define the immune elements. Then, to improve the accuracy of similarity calculation, a quantitative matching method is proposed. The model uses mathematical methods to train and evolve immune elements, increasing the diversity of immune recognition and allowing for the successful detection of unknown intrusions. The objective of the proposed model is to accurately identify known intrusions and to extend identification to unknown intrusions through signature detection and immune detection, overcoming the disadvantages of traditional methods. The experimental results show that the proposed model detects intrusions effectively: it has an average detection rate of more than 99.6% and a false alarm rate of 0.0264%, and it outperforms existing immune intrusion detection methods in terms of comprehensive detection performance.
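A generic sketch of the matching step in an artificial-immune detector (negative-selection flavour): detectors and traffic records are numeric feature vectors, and a record is flagged when it lies within a quantitative match radius of any detector. The features, distances, and thresholds are illustrative assumptions, not the matching method defined in the paper.

```python
# Hedged sketch of quantitative matching between numeric feature vectors and immune
# "detectors" (negative-selection flavour). Features, similarity measure and thresholds
# are illustrative assumptions, not the matching method defined in the paper.
import numpy as np

rng = np.random.default_rng(3)
normal_traffic = rng.normal(0.3, 0.05, size=(500, 4))     # assumed "self" records (4 features)

# Generate candidate detectors and keep only those far from every self record.
detectors = []
while len(detectors) < 300:
    d = rng.random(4)
    if np.linalg.norm(normal_traffic - d, axis=1).min() > 0.45:   # self-tolerance radius
        detectors.append(d)
detectors = np.array(detectors)

def is_intrusion(record: np.ndarray, match_radius: float = 0.35) -> bool:
    """A record is flagged when it matches (lies close to) any detector."""
    return bool(np.linalg.norm(detectors - record, axis=1).min() < match_radius)

print(is_intrusion(np.array([0.31, 0.29, 0.33, 0.28])))   # self-like record -> likely False
print(is_intrusion(np.array([0.70, 0.65, 0.72, 0.68])))   # anomalous record -> likely True
```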
Attitude is one of the crucial parameters of a space object and plays a vital role in collision prediction and debris removal. Analyzing light curves to determine attitude is the most commonly used method. In photometric observations, outliers may exist in the obtained light curves due to various reasons, so preprocessing is required to remove them and obtain high-quality light curves. Through statistical analysis, the causes of outliers can be categorized into two main types: first, the brightness of the object significantly increases when a star passes nearby, referred to as "stellar contamination," and second, the brightness markedly decreases due to cloud cover, referred to as "cloudy contamination." The traditional approach of manually inspecting images for contamination is time-consuming and labor-intensive, so we propose machine learning methods as a substitute. Convolutional Neural Networks and SVMs are employed to identify cases of stellar contamination and cloudy contamination, achieving F1 scores of 1.00 and 0.98 on a test set, respectively. We also explore other machine learning methods, such as ResNet-18 and Light Gradient Boosting Machine, and conduct comparative analyses of the results.
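A compact sketch of the classification-and-scoring step: an SVM trained on simple summary features of synthetic light curves and evaluated with the F1 score. The features and light curves are assumptions, not the paper's data set or preprocessing.

```python
# Hedged sketch: classify light curves as "clean" vs. "contaminated" with an SVM and
# report the F1 score. The synthetic light curves and the two summary features used
# here are illustrative assumptions, not the paper's data set or feature pipeline.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(4)

def synthetic_curve(contaminated: bool, n=200):
    flux = 1.0 + 0.02 * rng.standard_normal(n)
    if contaminated:                      # e.g., a brightness jump from a passing star
        i = rng.integers(50, 150)
        flux[i:i + 20] += 0.3
    return flux

curves = [synthetic_curve(c) for c in ([False] * 300 + [True] * 300)]
labels = np.array([0] * 300 + [1] * 300)
features = np.array([[np.std(c), np.ptp(c)] for c in curves])   # simple summary features

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3,
                                          random_state=0, stratify=labels)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("F1 on test set:", round(f1_score(y_te, clf.predict(X_te)), 3))
```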