Despite the maturity of ensemble numerical weather prediction (NWP), the resulting forecasts are still, more often than not, under-dispersed. As such, forecast calibration tools have become popular. Among those tools, quantile regression (QR) is highly competitive in terms of both flexibility and predictive performance. Nevertheless, a long-standing problem of QR is quantile crossing, which greatly limits the interpretability of QR-calibrated forecasts. On this point, this study proposes a non-crossing quantile regression neural network (NCQRNN) for calibrating ensemble NWP forecasts into a set of reliable quantile forecasts without crossing. The overarching design principle of NCQRNN is to add, on top of the conventional QRNN structure, another hidden layer that imposes a non-decreasing mapping from the combined output of the nodes of the last hidden layer to the nodes of the output layer, through a triangular weight matrix with positive entries. The empirical part of the work considers a solar irradiance case study, in which four years of ensemble irradiance forecasts at seven locations, issued by the European Centre for Medium-Range Weather Forecasts, are calibrated via NCQRNN, as well as via an eclectic mix of benchmarking models, ranging from the naïve climatology to state-of-the-art deep-learning and other non-crossing models. Formal and stringent forecast verification suggests that the forecasts post-processed via NCQRNN attain the maximum sharpness subject to calibration, amongst all competitors. Furthermore, the proposed conception to resolve quantile crossing is remarkably simple yet general, and thus has broad applicability, as it can be integrated with many shallow- and deep-learning-based neural networks.
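The triangular positive-weight construction can be sketched in a few lines: multiplying positive increments by a lower-triangular matrix of ones is just a cumulative sum, so successive outputs can never cross. This is a minimal illustration of the idea on made-up numbers, not the authors' NCQRNN architecture:

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)); always strictly positive.
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def non_crossing_quantiles(raw):
    # A lower-triangular all-ones weight matrix applied to positive
    # entries is a cumulative sum, so the outputs are non-decreasing
    # and the predicted quantiles cannot cross.
    q = [raw[0]]  # the lowest quantile level is left unconstrained
    for z in raw[1:]:
        q.append(q[-1] + softplus(z))
    return q

# Unconstrained network outputs mapped to ordered quantiles.
q = non_crossing_quantiles([2.1, -0.3, 0.8, -1.5])
```

Because the increments are strictly positive, the ordering holds for any raw outputs, which is what makes the trick attachable to arbitrary network backbones.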
This article develops a procedure for screening variables, in ultra-high-dimensional settings, based on their predictive significance. This is achieved by ranking the variables according to the variance of their respective marginal regression functions (RV-SIS). We show that, under some mild technical conditions, the RV-SIS possesses a sure screening property, as defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS has competitive performance compared to other screening procedures, and outperforms them in many different model settings.
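A crude sketch of the RV-SIS idea (score each predictor by the variance of an estimate of its marginal regression function) using binned conditional means in place of the paper's smoother; the data and bin count are illustrative:

```python
import random
import statistics

def rv_sis_scores(X, y, n_bins=5):
    # For each predictor x_j, estimate E[y | x_j] by binned conditional
    # means (a stand-in for the smoother used in RV-SIS), then score the
    # predictor by the variance of the fitted values.
    n, p = len(X), len(X[0])
    size = n // n_bins  # demo assumes n is divisible by n_bins
    scores = []
    for j in range(p):
        pairs = sorted((row[j], yi) for row, yi in zip(X, y))
        fitted = []
        for b in range(n_bins):
            chunk = pairs[b * size:(b + 1) * size]
            m = statistics.fmean(v for _, v in chunk)
            fitted.extend([m] * len(chunk))
        scores.append(statistics.pvariance(fitted))
    return scores

random.seed(0)
X = [[random.gauss(0, 1) for _ in range(10)] for _ in range(300)]
y = [3 * row[2] + random.gauss(0, 0.5) for row in X]  # only x_2 is active
scores = rv_sis_scores(X, y)
top = max(range(10), key=lambda j: scores[j])
```

An inactive predictor yields a nearly flat marginal regression function (low variance), so ranking by this score pushes active predictors to the top, which is the screening principle.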
Objective: Previous studies on the association between lipid profiles and chronic kidney disease (CKD) have yielded inconsistent results and no defined thresholds for blood lipids. Methods: A prospective cohort study including 32,351 subjects who completed baseline and follow-up surveys over 5 years was conducted. Restricted cubic splines and Cox models were used to examine the association between lipid profiles and CKD. A regression discontinuity design was used to determine the cutoff values of lipid profiles significantly associated with an increased risk of CKD. Results: Over a median follow-up of 2.2 (0.5, 4.2) years, 648 (2.00%) subjects developed CKD. The lipid profiles that were significantly and linearly related to CKD included total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), TC/HDL-C, and TG/HDL-C, whereas low-density lipoprotein cholesterol (LDL-C) and LDL-C/HDL-C were nonlinearly correlated with CKD. TC, TG, TC/HDL-C, and TG/HDL-C showed an upward jump at the cutoff value, increasing the risk of CKD by 0.90%, 1.50%, 2.30%, and 1.60%, respectively, whereas HDL-C showed a downward jump at the cutoff value, reducing this risk by 1.0%. Females and participants with dyslipidemia had a higher risk of CKD, and the cutoff values differed across population subgroups. Conclusion: There was a significant association between lipid profiles and CKD in a prospective cohort from Northwest China, with TG, TC/HDL-C, and TG/HDL-C showing stronger risk associations. The specific cutoff values of lipid profiles may provide a clinical reference for screening or diagnosing CKD risk.
Concentrate copper grade (CCG) is one of the important production indicators of copper flotation processes, and keeping the CCG at the set value is of great significance to the economic benefit of copper flotation industrial processes. This paper addresses the fluctuation problem of CCG through an operational optimization method. Firstly, a density-based affinity propagation algorithm is proposed so that more suitable working-condition categories can be obtained for the complex raw ore properties. Next, a Bayesian network (BN) is applied to explore the relationship between the operational variables and the CCG. Based on the analysis results of the BN, a weighted Gaussian process regression model is constructed to predict the CCG with higher prediction accuracy. To ensure that the predicted CCG stays close to the set value with smaller operation adjustments and lower prediction uncertainty, an index-oriented adaptive differential evolution (IOADE) algorithm is proposed, whose convergence performance is superior to that of the traditional differential evolution and adaptive differential evolution methods. Finally, the effectiveness and feasibility of the proposed methods are verified by experiments on a copper flotation industrial process.
This study aims to predict the undrained shear strength of remolded soil samples using non-linear regression analyses, fuzzy logic, and artificial neural network modeling. A total of 1306 undrained shear strength results from 230 different remolded soil test settings reported in 21 publications were collected, utilizing six different measurement devices. Although water content, plastic limit, and liquid limit were used as input parameters for fuzzy logic and artificial neural network modeling, the liquidity index or water content ratio was considered as an input parameter for the non-linear regression analyses. In the non-linear regression analyses, 12 different regression equations were derived for the prediction of the undrained shear strength of remolded soil. Feed-forward backpropagation and the TANSIG transfer function were used for artificial neural network modeling, while the Mamdani inference system was preferred with trapezoidal and triangular membership functions for fuzzy logic modeling. The experimental results of 914 tests were used for training the artificial neural network models, 196 for validation, and 196 for testing. It was observed that the accuracy of the artificial neural network and fuzzy logic modeling was higher than that of the non-linear regression analyses. Furthermore, a simple and reliable regression equation was proposed for assessing undrained shear strength values, with higher coefficients of determination.
The burning of crop residues in fields is a significant global biomass burning activity; it is a key element of the terrestrial carbon cycle and an important source of atmospheric trace gases and aerosols. Accurate estimation of cropland burned area is both crucial and challenging, especially for the small and fragmented burn scars in China. Here we developed an automated burned area mapping algorithm implemented with Sentinel-2 Multi Spectral Instrument (MSI) data, and tested its effectiveness on the Songnen Plain, Northeast China, using satellite imagery from 2020. We employed a logistic regression method to integrate multiple spectral variables into a synthetic indicator, and compared the results with manually interpreted burned area reference maps and the Moderate-Resolution Imaging Spectroradiometer (MODIS) MCD64A1 burned area product. The overall accuracy of the single-variable logistic regression was 77.38% to 86.90% and 73.47% to 97.14% for the 52TCQ and 51TYM cases, respectively. In comparison, multiple-variable logistic regression on Sentinel-2 images improved the accuracy of the burned area map to 87.14% and 98.33% for the 52TCQ and 51TYM cases, respectively. The balance of omission error and commission error was also improved. The integration of multiple spectral variables combined with a logistic regression method proves effective for burned area detection, offering a highly automated process with an automatic threshold determination mechanism. The method exhibits excellent extensibility and flexibility, taking the image tile as the operating unit. It is suitable for burned area detection at a regional scale and can also be implemented with other satellite data.
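The fusion step, a logistic regression that maps several spectral variables to one synthetic burned/unburned indicator thresholded at 0.5, can be sketched on toy data; the index values below are made up, not actual Sentinel-2 reflectances:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the logistic log-loss.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

random.seed(1)
# Two hypothetical spectral "indices"; burned pixels sit lower in both.
X = ([[random.gauss(-1.0, 0.4), random.gauss(-0.8, 0.4)] for _ in range(100)]
     + [[random.gauss(1.0, 0.4), random.gauss(0.9, 0.4)] for _ in range(100)])
y = [1] * 100 + [0] * 100
w, b = train_logistic(X, y)
pred = [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) > 0.5) for xi in X]
accuracy = sum(p_ == t for p_, t in zip(pred, y)) / len(y)
```

The fitted probability itself is the "synthetic indicator"; thresholding it at 0.5 plays the role of the automatic threshold determination mentioned in the abstract.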
Efficient water quality monitoring and ensuring the safety of drinking water by government agencies, in areas where the resource is constantly depleted due to anthropogenic or natural factors, cannot be overemphasized. The above statement holds for West Texas, Midland, and Odessa precisely. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models for the prediction of total dissolved solids (TDS) and sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers have contributed immensely to providing water for uses ranging from domestic to agricultural and industrial. The data were obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models used in this study gave accurate predictions of the observed data (TDS and SAR) for both aquifers, with R² values consistently greater than 0.83. The Random Forest model gave a better prediction of TDS and SAR concentration, with an average R, MAE, RMSE, and MSE of 0.977, 0.015, 0.029, and 0.00, respectively. For XGBoost, an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037, and 0.00, respectively, were achieved. The overall performance of the models produced was impressive. From this study, we can clearly see that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in an area of high hydrocarbon activity like Midland, Odessa, and West Texas at large.
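The reported scores (R², MAE, RMSE, MSE) follow standard formulas, sketched here for reference with toy values rather than the aquifer results:

```python
import math
import statistics

def regression_metrics(y_true, y_pred):
    # Standard goodness-of-fit measures for a regression model.
    n = len(y_true)
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / n
    mse = sum(e * e for e in errs) / n
    rmse = math.sqrt(mse)
    ybar = statistics.fmean(y_true)
    ss_tot = sum((t - ybar) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy observed vs. predicted values (not TDS or SAR measurements).
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

RMSE is never smaller than MAE (it penalizes large errors more), which is why the two are usually reported together.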
In this paper, a logistic regression (LR) statistical analysis is presented for a set of variables used in experimental measurements in reversed field pinch (RFP) machines, commonly known as the "slinky mode" (SM), observed to travel around the torus in the Madison Symmetric Torus (MST). The LR analysis is used with the modified Sine-Gordon dynamic equation model to predict, with high confidence, whether the slinky mode will lock or not lock, when compared to the experimentally measured motion of the slinky mode. It is observed that under certain conditions the slinky mode "locks" at or near the intersection of poloidal and/or toroidal gaps in MST. Locked modes cease to travel around the torus, while unlocked modes keep traveling without a change in energy, making it hard to determine an exact set of conditions for predicting locking/unlocking behaviour. The significant key model parameters determined by the LR analysis are shown to improve the Sine-Gordon model's ability to determine the locking/unlocking of magnetohydrodynamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of the slinky mode, as proven by relational comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
This study aims to analyze and predict the relationship between the average price per box in the cigarette market of City A and government procurement, providing a scientific basis and support for decision-making. By reviewing relevant theories and literature, qualitative prediction methods, regression prediction models, and other related theories were explored. Through the analysis of annual cigarette sales data and government procurement data in City A, a comprehensive understanding of the development of the tobacco industry and the economic trends of tobacco companies in the county was obtained. By predicting and analyzing the average price per box of cigarette sales across different years, corresponding prediction results were derived and compared with actual sales data. The prediction results indicate that the correlation coefficient between the average price per box of cigarette sales and government procurement is 0.982, implying that government procurement explains 96.4% of the variation in the average price per box of cigarettes. These findings offer an in-depth exploration of the relationship between the average price per box of cigarettes in City A and government procurement, providing a scientific foundation for corporate decision-making and market operations.
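The link between the reported correlation of 0.982 and the 96.4% figure is simply r squared, and a least-squares fit recovers both quantities at once. A sketch with made-up numbers, not City A's data:

```python
from statistics import fmean

def simple_linreg(x, y):
    # Ordinary least squares for one predictor, plus Pearson's r;
    # r**2 is the fraction of variance explained by the fit.
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

procurement = [10, 12, 15, 18, 21, 25]       # hypothetical procurement volumes
avg_price = [200, 214, 238, 259, 283, 311]   # hypothetical average prices per box
slope, intercept, r = simple_linreg(procurement, avg_price)
explained = r * r  # share of price variation explained by procurement
```

With r = 0.982 as in the abstract, r² = 0.964, i.e. the stated 96.4%.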
Layered pavements usually exhibit complicated mechanical properties, owing to complex material properties under the external environment. In some cases, such as launching missiles or rockets, layered pavements are required to bear large impulse loads. However, traditional methods cannot non-destructively and quickly detect the internal structure of pavements. Thus, accurate and fast prediction of the mechanical properties of layered pavements is of great importance and necessity. In recent years, machine learning has shown great superiority in solving nonlinear problems. In this work, we present a method for predicting the maximum deflection and damage factor of layered pavements under instantaneous large impact, based on random forest regression with deflection basin parameters obtained from falling weight deflectometer testing. The regression coefficient R² on the testing datasets is above 0.94 when predicting the elastic moduli of structural layers and mechanical responses, which indicates that the prediction results are highly consistent with finite element simulation results. This paper provides a novel method for fast and accurate prediction of pavement mechanical responses under instantaneous large impact loads using partial structural parameters of pavements, and it has application potential in the non-destructive evaluation of pavement structure.
Objective: To investigate the trend of mortality from COVID-19 before and after the national vaccination program, using joinpoint regression analysis of data from 19 February 2020 to 5 September 2022. Methods: In the present study, a joinpoint regression analysis of monthly collected data on confirmed COVID-19 deaths in Iran from February 19, 2020 to September 5, 2022 was performed. Results: After national vaccination in Iran, the trend of new monthly deaths due to COVID-19 was decreasing. The monthly percent change from the beginning of the pandemic to the 19th month was 6.62% (95% CI: 1.1, 12.4), an increasing trend. From the 19th month to the end of the 31st month, the mortality trend was decreasing, with a monthly percent change of -20.05% (95% CI: -30.3, -8.3) (P = 0.002). The average monthly percent change was -5%, with a 95% CI of (-10.5, 0.9). Conclusions: Along with other health measures, such as quarantine, mask wearing, hand washing, and social distancing, national vaccination significantly reduces the mortality rate of COVID-19.
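Joinpoint regression fits piecewise-linear trends and locates the month at which the slope changes. Below is a bare-bones sketch that finds one joinpoint by grid search over synthetic monthly counts; real joinpoint software fits log-linear segments with continuity constraints and significance tests, so this is only the core idea:

```python
def fit_segment(t, y):
    # OLS line for one segment; returns (slope, intercept, sse).
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sxx
    intercept = my - slope * mt
    sse = sum((yi - slope * ti - intercept) ** 2 for ti, yi in zip(t, y))
    return slope, intercept, sse

def one_joinpoint(t, y):
    # Try every admissible split month and keep the one whose two OLS
    # segments give the smallest total squared error.
    best = None
    for k in range(2, len(t) - 2):
        s1, s2 = fit_segment(t[:k], y[:k]), fit_segment(t[k:], y[k:])
        total = s1[2] + s2[2]
        if best is None or total < best[0]:
            best = (total, t[k], s1[0], s2[0])
    return best  # (sse, joinpoint month, slope before, slope after)

months = list(range(31))
# Synthetic monthly deaths: rising for 19 months, then falling.
deaths = ([100 + 6 * m for m in range(19)]
          + [208 - 8 * (m - 19) for m in range(19, 31)])
sse, jp, before, after = one_joinpoint(months, deaths)
```

On this synthetic series the detected joinpoint is month 19, mirroring the abstract's change from an increasing to a decreasing trend at the 19th month.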
Machine Learning (ML) has changed clinical diagnostic procedures drastically. Especially in cardiovascular disease (CVD), the use of ML is indispensable for reducing human error. Numerous studies have focused on disease prediction, but because prediction depends on multiple parameters, further investigation is required to upgrade clinical procedures. Multi-layered implementations of ML, also called Deep Learning (DL), have unfolded new horizons in the field of clinical diagnostics. DL achieves reliable accuracy with big datasets, but the reverse is the case with small datasets. This paper proposes a novel method that deals with the issue of low data dimensionality. Inspired by regression analysis, the proposed method classifies the data in three stages. In the first stage, feature representations are converted into probabilities using multiple regression techniques; the second stage takes up the probability conclusions from the previous stage; and the third stage produces the final classifications. Extensive experiments were carried out on the Cleveland heart disease dataset. The results show a significant improvement in classification accuracy, and the comparative results suggest that the prevailing statistical ML methods will not remain the disease prediction techniques of choice in the future.
Identification of the ice channel is a basic technology for developing intelligent ships in ice-covered waters, and it is important for ensuring the safety and economy of navigation. In the Arctic, merchant ships with low ice class often navigate in channels opened up by icebreakers. Navigating an ice channel depends, to a large extent, on the captain's maneuvering skills and experience; the ship may get stuck if steered into ice fields off the channel. Under these circumstances, it is very important to study how to identify the boundary lines of ice channels with a reliable method. In this paper, a two-stage ice channel identification method is developed based on image segmentation and corner point regression. The first stage employs an image segmentation method to extract channel regions. In the second stage, an intelligent corner regression network is proposed to extract the channel boundary lines from the channel region. A non-intelligent angle-based filtering and clustering method is proposed and compared with the corner point regression network. The training and evaluation of the segmentation method and corner regression network are carried out on synthetic and real ice channel datasets. The evaluation results show that the accuracy of the method using the corner point regression network in the second stage reaches 73.33% on the synthetic ice channel dataset and 70.66% on the real ice channel dataset, and the processing speed can reach up to 14.58 frames per second.
In this paper,we define the curve rλ=r+λd at a constant distance from the edge of regression on a curve r(s)with arc length parameter s in Galilean 3-space.Here,d is a non-isotropic or isotropic vector defined as a ...In this paper,we define the curve rλ=r+λd at a constant distance from the edge of regression on a curve r(s)with arc length parameter s in Galilean 3-space.Here,d is a non-isotropic or isotropic vector defined as a vector tightly fastened to Frenet trihedron of the curve r(s)in 3-dimensional Galilean space.We build the Frenet frame{Tλ,Nλ,Bλ}of the constructed curve rλwith respect to two types of the vector d and we indicate the properties related to the curvatures of the curve rλ.Also,for the curve rλ,we give the conditions to be a circular helix.Furthermore,we discuss ruled surfaces of type A generated via the curve rλand the vector D which is defined as tangent of the curve rλin 3-dimensional Galilean space.The constructed ruled surfaces also appear in two ways.The first is constructed with the curve rλ(s)=r(s)+λT(s)and the non-isotropic vector D.The second is formed by the curve rλ=r(s)+λ2N+λ3B and the non-isotropic vector D.We calculate the distribution parameters of the constructed ruled surfaces and we show that the ruled surfaces are developable.Finally,we provide examples and visuals to back up our research.展开更多
The development of prediction supports is a critical step in information systems engineering in this era defined by the knowledge economy, whose hub is big data. Currently, the lack of a predictive model, whether qualitative or quantitative, in a company's areas of intervention can handicap or weaken its competitive capacities, endangering its survival. For quantitative prediction, a variety of methods and tools are available, depending on the efficacy criteria. The multiple linear regression method is one of the methods used for this purpose. A linear regression model regresses an explained variable on one or more explanatory variables, where the function linking the explanatory variables to the explained variable has linear parameters. The purpose of this work is to demonstrate how to use multiple linear regression, which is one aspect of decision mathematics. Applying multiple linear regression to random data, which can be replaced by real data collected by or from organizations, provides decision makers with reliable data knowledge. As a result, machine learning methods can provide decision makers with relevant and trustworthy data. The main goal of this article is therefore to define the objective function whose influencing factors will be identified and optimized using the linear regression method.
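Multiple linear regression reduces to solving the normal equations (X'X) beta = X'y. A self-contained sketch with an intercept column and Gaussian elimination, on toy data rather than any organization's records:

```python
def ols_fit(X, y):
    # Build the normal equations (X'X) beta = X'y, with an intercept
    # column prepended, and solve by Gaussian elimination with pivoting.
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for i in range(p):
        piv = max(range(i, p), key=lambda k: abs(A[k][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for k in range(i + 1, p):
            f = A[k][i] / A[i][i]
            A[k] = [akj - f * aij for akj, aij in zip(A[k], A[i])]
            b[k] -= f * b[i]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta  # [intercept, coefficient_1, coefficient_2, ...]

X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]  # made-up explanatory data
y = [2 + 3 * a - c for a, c in X]             # exact plane y = 2 + 3*x1 - x2
beta = ols_fit(X, y)
```

Since the toy response lies exactly on a plane, the fitted coefficients recover (2, 3, -1) up to floating-point error; with noisy real data they would instead be the least-squares estimates.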
Sea fog is a disastrous weather phenomenon, posing a risk to the safety of maritime transportation. Dense sea fog reduces visibility at sea and has frequently caused ship collisions. This study used a geographically weighted regression (GWR) model to explore the spatial non-stationarity of near-miss collision risk, as detected by a vessel conflict ranking operator (VCRO) model from automatic identification system (AIS) data, under the influence of sea fog in the Bohai Sea. Sea fog was identified by a machine learning method derived from Himawari-8 satellite data. The spatial distributions of near-miss collision risk, sea fog, and the parameters of the GWR model were mapped. The results showed that sea fog and near-miss collision risk have specific spatial distribution patterns in the Bohai Sea: near-miss collision risk in the fog season is significantly higher than outside the fog season, especially in the northeast (the sea area near Yingkou Port and Bayuquan Port) and the southeast (the sea area near Yantai Port). GWR outputs further indicated a significant correlation between near-miss collision risk and sea fog in the fog season, with a higher R-squared (0.890 in the 2018 fog season) than outside the fog season (0.723 in the 2018 non-fog season). The GWR results revealed spatial non-stationarity in the relationship between near-miss collision risk and sea fog, with the significance of this relationship varying locally. Dividing the study area into specific navigation areas made it possible to verify that sea fog has a positive impact on near-miss collision risk.
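The core of GWR is a separate weighted least-squares fit at each calibration point, with observation weights decaying with distance, so the fog/risk coefficient can differ across the sea area. A minimal sketch with a Gaussian kernel and fabricated data, not the AIS/VCRO values:

```python
import math

def gwr_local_slope(u, v, pts, x, y, bandwidth=1.0):
    # Weighted least-squares slope of y on x, with Gaussian weights
    # that decay with distance from the calibration point (u, v).
    w = [math.exp(-((px - u) ** 2 + (py - v) ** 2) / (2 * bandwidth ** 2))
         for px, py in pts]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx  # local slope of risk on fog frequency

# Fabricated field: the fog effect on risk is strong in the east (px > 0),
# weak in the west (px < 0).
pts = [(px, 0.0) for px in (-2, -2, -2, -1, -1, 1, 1, 2, 2, 2)]
fog = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.6, 0.1, 0.5, 0.9]
risk = [f * (2.0 if px > 0 else 0.2) for (px, _), f in zip(pts, fog)]
east = gwr_local_slope(2.0, 0.0, pts, fog, risk)
west = gwr_local_slope(-2.0, 0.0, pts, fog, risk)
```

The two local slopes differ even though the data are pooled, which is exactly the spatial non-stationarity a global regression would average away.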
Evaluation of calligraphic copies is at the core of Chinese calligraphy appreciation and inheritance. However, previous aesthetic evaluation studies have often focused on photos and paintings, with few attempts on Chinese calligraphy. To solve this problem, a Siamese regression aesthetic fusion method, named SRAFE, is proposed for Chinese calligraphy, based on the combination of calligraphy aesthetics and deep learning. First, a dataset termed Evaluated Chinese Calligraphy Copies (E3C) is constructed for aesthetic evaluation. Second, 12 hand-crafted aesthetic features based on the shape, structure, and stroke of calligraphy are designed. Then, a Siamese regression network (SRN) is designed to extract the deep aesthetic representation of calligraphy. Finally, the SRAFE method is built by fusing the deep aesthetic features with the hand-crafted aesthetic features. Experimental results show that the scores given by SRAFE are similar to the aesthetic evaluation labels of E3C, proving the effectiveness of the authors' method.
In regression, despite both being aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike's Final Prediction Error (FPE) and the Generalized Cross-Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non-random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; and Ordinary and Generalized Ridge regression, the latter embedding smoothing-spline fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE which turns out to coincide with Akaike's FPE. Using a slight variation, we similarly obtain a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
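For a full-rank OLS fit, where the hat matrix has trace p, the two criteria take simple classical closed forms and agree to first order in p/n. A quick numerical check using those classical formulas (not the paper's generalized expressions):

```python
def fpe_gcv(rss, n, p):
    # Akaike's FPE and the GCV score for an OLS fit with p parameters,
    # where the hat matrix has trace p and sigma^2 is estimated by rss/n.
    fpe = (rss / n) * (n + p) / (n - p)
    gcv = (rss / n) / (1 - p / n) ** 2
    return fpe, gcv

fpe, gcv = fpe_gcv(rss=50.0, n=100, p=4)
# Both inflate the training error rss/n to estimate out-of-sample error;
# since (1 + x) * (1 - x) < 1 for x = p/n in (0, 1), FPE sits slightly
# below GCV, and the gap is of order (p/n)^2.
```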
Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. Employing the confusion matrix and error percentages, we selected the best-performing model, with prediction error rates of 22%, 23%, 20%, and 27%, respectively, for the LDA, QDA, logistic regression, and KNN models. The logistic regression model outperformed the other three statistical models in predicting the AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
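Of the four classifiers, KNN is the simplest to sketch end to end: predict by majority vote among the k nearest points, then read the error rate off the predictions. Toy two-feature data stand in for pollutant measurements; this is not the study's dataset:

```python
import random
from collections import Counter

def knn_predict(train, labels, x, k=5):
    # Majority vote among the k nearest training points (squared
    # Euclidean distance; the square root is not needed for ranking).
    nearest = sorted(range(len(train)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train[i], x)))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

random.seed(2)
# Hypothetical two-feature pollutant readings for two AQI categories.
train = ([[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(60)]
         + [[random.gauss(4, 1), random.gauss(4, 1)] for _ in range(60)])
labels = ["Good"] * 60 + ["Unhealthy"] * 60
test_pts = [[0.1, -0.2], [3.9, 4.2], [4.5, 3.6], [-0.5, 0.3]]
truth = ["Good", "Unhealthy", "Unhealthy", "Good"]
preds = [knn_predict(train, labels, x) for x in test_pts]
error_rate = sum(p != t for p, t in zip(preds, truth)) / len(truth)
```

The error rate computed this way is exactly the off-diagonal mass of the confusion matrix divided by the number of test cases, i.e. the percentage the study reports per model.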
China's low-carbon development path will make significant contributions to achieving global sustainable development goals. Owing to the diverse natural and economic conditions across different regions of China, the distribution of carbon emissions is imbalanced, so regional cooperation serves as an effective means of attaining low-carbon development. This study examined the pattern of carbon emissions and proposed a potential joint emission reduction strategy, utilizing industrial carbon emission intensity (ICEI) as a crucial factor. We used social network analysis and a Local Indicators of Spatial Association (LISA) space-time transition matrix to investigate the spatiotemporal connections and discrepancies of ICEI in the cities of the Pearl River Basin (PRB), China, from 2010 to 2020. The primary drivers of the ICEI were determined through geographical detectors and multi-scale geographically weighted regression. The results were as follows: 1) The overall ICEI in the Pearl River Basin is showing a downward trend, and there is a significant spatial imbalance. 2) There are numerous network connections between cities regarding the ICEI, but the network structure is relatively fragile and unstable. 3) Economically developed cities such as Guangzhou, Foshan, and Dongguan are at the center of the network and play an intermediary role. 4) Energy consumption, industrialization, per capita GDP, urbanization, science and technology, and productivity are found to be the most influential variables in the spatial differentiation of ICEI, and their combination increased the explanatory power of the geographic variation of ICEI. Finally, through the analysis of differences and connections in urban carbon emissions under different economic levels and ICEI, the study suggests joint carbon reduction strategies centered on carbon transfer, financial support, and technological assistance among cities.
Funding: Supported by the National Natural Science Foundation of China (Project No. 42375192), the China Meteorological Administration Climate Change Special Program (CMA-CCSP, Project No. QBZ202315), and the Vector Stiftung through the Young Investigator Group "Artificial Intelligence for Probabilistic Weather Forecasting."
Abstract: Despite the maturity of ensemble numerical weather prediction (NWP), the resulting forecasts are still, more often than not, under-dispersed. As such, forecast calibration tools have become popular. Among those tools, quantile regression (QR) is highly competitive in terms of both flexibility and predictive performance. Nevertheless, a long-standing problem of QR is quantile crossing, which greatly limits the interpretability of QR-calibrated forecasts. On this point, this study proposes a non-crossing quantile regression neural network (NCQRNN) for calibrating ensemble NWP forecasts into a set of reliable quantile forecasts without crossing. The overarching design principle of NCQRNN is to add, on top of the conventional QRNN structure, another hidden layer, which imposes a non-decreasing mapping between the combined output from nodes of the last hidden layer and the nodes of the output layer, through a triangular weight matrix with positive entries. The empirical part of the work considers a solar irradiance case study, in which four years of ensemble irradiance forecasts at seven locations, issued by the European Centre for Medium-Range Weather Forecasts, are calibrated via NCQRNN, as well as via an eclectic mix of benchmarking models, ranging from the naïve climatology to state-of-the-art deep-learning and other non-crossing models. Formal and stringent forecast verification suggests that the forecasts post-processed via NCQRNN attain the maximum sharpness subject to calibration, amongst all competitors. Furthermore, the proposed conception to resolve quantile crossing is remarkably simple yet general, and thus has broad applicability, as it can be integrated with many shallow- and deep-learning-based neural networks.
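The non-crossing idea above is small enough to sketch in a few lines of NumPy: pushing the layer's raw outputs through a positivity transform and then a lower-triangular matrix of ones amounts to a cumulative sum, so the output quantiles are non-decreasing by construction. This is a minimal sketch of the mechanism, not the authors' exact layer; the softplus transform and the nine quantile levels are assumptions for illustration.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

rng = np.random.default_rng(42)
n_q = 9  # number of quantile levels (assumed for illustration)

# Stand-in for the unconstrained outputs of the last ordinary hidden layer
z = rng.normal(size=n_q)

# Lower-triangular matrix of ones: multiplying positive increments by it
# is a cumulative sum, i.e. a non-decreasing map to the quantile vector
T = np.tril(np.ones((n_q, n_q)))
q = T @ softplus(z)

# Quantiles can never cross, whatever z happens to be
assert np.all(np.diff(q) >= 0)
```

Any triangular weight matrix with positive entries works the same way, which is why the construction plugs into both shallow and deep networks.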
Abstract: This article develops a procedure for screening variables, in ultra-high-dimensional settings, based on their predictive significance. This is achieved by ranking the variables according to the variance of their respective marginal regression functions (RV-SIS). We show that, under some mild technical conditions, the RV-SIS possesses a sure screening property, as defined by Fan and Lv (2008). Numerical comparisons suggest that RV-SIS has competitive performance compared to other screening procedures, and outperforms them in many different model settings.
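The ranking criterion can be sketched with a crude binned estimate of each marginal regression function E[Y | X_j]: a variable whose marginal regression function varies a lot scores high. This is a simplified stand-in (the paper uses smoothing estimators and formal conditions for the sure screening property); the bin count and toy data are assumptions.

```python
import numpy as np

def rv_sis_scores(X, y, n_bins=10):
    """Score each variable by the sample variance of a binned
    estimate of its marginal regression function E[Y | X_j]."""
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        # Quantile-based bins over X_j; average y within each bin
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))
        idx = np.clip(np.searchsorted(edges, X[:, j], side="right") - 1,
                      0, n_bins - 1)
        means = np.array([y[idx == b].mean() for b in range(n_bins)
                          if np.any(idx == b)])
        scores[j] = means.var()
    return scores

# Toy check: only the first of 20 variables is truly predictive
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)
scores = rv_sis_scores(X, y)
assert scores.argmax() == 0  # the predictive variable is ranked first
```

Screening then keeps the top-d variables by score before any model fitting, which is the step the sure screening property guarantees.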
Funding: supported by the Municipal Science and Technology Program of Wuwei City, China (WW2202RPZ037), and the Fundamental Research Funds for the Central Universities in China (Grant No. lzujbky-2018-69).
Abstract: Objective Previous studies on the association between lipid profiles and chronic kidney disease (CKD) have yielded inconsistent results and no defined thresholds for blood lipids. Methods A prospective cohort study including 32,351 subjects who completed baseline and follow-up surveys over 5 years was conducted. Restricted cubic splines and Cox models were used to examine the association between lipid profiles and CKD. A regression discontinuity design was used to determine the cutoff values of lipid profiles that were significantly associated with an increased risk of CKD. Results Over a median follow-up time of 2.2 (0.5, 4.2) years, 648 (2.00%) subjects developed CKD. The lipid profiles that were significantly and linearly related to CKD included total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), TC/HDL-C, and TG/HDL-C, whereas low-density lipoprotein cholesterol (LDL-C) and LDL-C/HDL-C were nonlinearly correlated with CKD. TC, TG, TC/HDL-C, and TG/HDL-C showed an upward jump at the cutoff value, increasing the risk of CKD by 0.90%, 1.50%, 2.30%, and 1.60%, respectively, whereas HDL-C showed a downward jump at the cutoff value, reducing this risk by 1.0%. Females and participants with dyslipidemia had a higher risk of CKD, and the cutoff values differed across population characteristics. Conclusion There was a significant association between lipid profiles and CKD in a prospective cohort from Northwest China, with TG, TC/HDL-C, and TG/HDL-C showing a stronger risk association. The specific cutoff values of lipid profiles may provide a clinical reference for screening or diagnosing CKD risk.
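The cutoff-detection step rests on a sharp regression discontinuity design: fit the outcome separately on each side of a candidate cutoff and take the gap between the two fits at the cutoff as the jump. The sketch below uses simulated data with a known jump; all numbers (cutoff, slopes, jump size) are hypothetical, not the study's.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated running variable (e.g., a lipid measure) and outcome risk
# with a known jump of 0.015 at a hypothetical cutoff
cutoff = 1.7
x = rng.uniform(0.5, 3.0, size=2000)
risk = (0.02 + 0.005 * x + 0.015 * (x >= cutoff)
        + rng.normal(0, 0.002, 2000))

# Sharp RDD estimate: a line on each side, evaluated at the cutoff
left = x < cutoff
b_left = np.polyfit(x[left], risk[left], 1)
b_right = np.polyfit(x[~left], risk[~left], 1)
jump = np.polyval(b_right, cutoff) - np.polyval(b_left, cutoff)
assert abs(jump - 0.015) < 0.005  # recovers the simulated jump
```

In practice the fits are usually restricted to a bandwidth around the cutoff; the global fit here keeps the sketch short.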
Funding: supported in part by the National Key Research and Development Program of China (2021YFC2902703), and the National Natural Science Foundation of China (62173078, 61773105, 61533007, 61873049, 61873053, 61703085, 61374147).
Abstract: Concentrate copper grade (CCG) is one of the important production indicators of copper flotation processes, and keeping the CCG at the set value is of great significance to the economic benefit of copper flotation industrial processes. This paper addresses the fluctuation problem of CCG through an operational optimization method. Firstly, a density-based affinity propagation algorithm is proposed so that more suitable working-condition categories can be obtained for the complex raw ore properties. Next, a Bayesian network (BN) is applied to explore the relationship between the operational variables and the CCG. Based on the analysis results of the BN, a weighted Gaussian process regression model is constructed to predict the CCG, so that a higher prediction accuracy can be obtained. To ensure that the predicted CCG is close to the set value, with a smaller magnitude of operational adjustments and a smaller uncertainty of the prediction results, an index-oriented adaptive differential evolution (IOADE) algorithm is proposed, whose convergence performance is superior to that of traditional differential evolution and adaptive differential evolution methods. Finally, the effectiveness and feasibility of the proposed methods are verified by experiments on a copper flotation industrial process.
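The prediction step builds on Gaussian process regression; a plain (unweighted) GP posterior mean with an RBF kernel can be written in a handful of numpy lines. The paper's variant adds BN-derived sample weights, which are omitted here; the 1-D toy data and kernel length-scale are assumptions.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(3)
X = np.linspace(0.0, 5.0, 30)
y = np.sin(X) + 0.01 * rng.normal(size=30)  # toy stand-in for CCG data

# Kernel matrix plus noise term; solve once for the weight vector alpha
K = rbf(X, X) + 1e-4 * np.eye(30)
alpha = np.linalg.solve(K, y)

# Posterior mean at a query point is a kernel-weighted sum
Xs = np.array([2.5])
mu = rbf(Xs, X) @ alpha
assert abs(mu[0] - np.sin(2.5)) < 0.1  # close to the noise-free function
```

A weighted variant would scale the noise term per sample (larger noise for less reliable working conditions), leaving the rest of the algebra unchanged.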
Abstract: This study aims to predict the undrained shear strength of remolded soil samples using non-linear regression analyses, fuzzy logic, and artificial neural network modeling. A total of 1306 undrained shear strength results from 230 different remolded soil test settings reported in 21 publications were collected, utilizing six different measurement devices. Although water content, plastic limit, and liquid limit were used as input parameters for fuzzy logic and artificial neural network modeling, the liquidity index or water content ratio was considered as an input parameter for the non-linear regression analyses. In the non-linear regression analyses, 12 different regression equations were derived for the prediction of undrained shear strength of remolded soil. Feed-forward backpropagation and the TANSIG transfer function were used for artificial neural network modeling, while the Mamdani inference system was preferred, with trapezoidal and triangular membership functions, for fuzzy logic modeling. The experimental results of 914 tests were used for training the artificial neural network models, 196 for validation, and 196 for testing. It was observed that the accuracy of the artificial neural network and fuzzy logic modeling was higher than that of the non-linear regression analyses. Furthermore, a simple and reliable regression equation was proposed for assessment of undrained shear strength values, with higher coefficients of determination.
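A common non-linear regression form in this setting is a power law in liquidity index, s_u = a·LI^b, which becomes ordinary least squares after a log-log transform. The sketch below fits synthetic data with known coefficients; the functional form is a standard choice and the numbers are not the paper's 12 equations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data following s_u = 2.0 * LI^(-1.5) with multiplicative noise
LI = rng.uniform(0.3, 1.5, size=200)
su = 2.0 * LI ** (-1.5) * np.exp(0.05 * rng.normal(size=200))

# log(s_u) = log(a) + b * log(LI): a straight line in log-log space
b, log_a = np.polyfit(np.log(LI), np.log(su), 1)
a = np.exp(log_a)
assert abs(a - 2.0) < 0.1 and abs(b + 1.5) < 0.1  # recovers a and b
```

The same log-transform trick turns several of the usual s_u–LI equation families into linear fits, which is why they can be compared on equal footing.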
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42101414) and the Natural Science Fund for Outstanding Young Scholars in Jilin Province (No. 20230508106RC).
Abstract: The burning of crop residues in fields is a significant global biomass burning activity, a key element of the terrestrial carbon cycle, and an important source of atmospheric trace gases and aerosols. Accurate estimation of cropland burned area is both crucial and challenging, especially for the small and fragmented burn scars found in China. Here we developed an automated burned-area mapping algorithm implemented with Sentinel-2 Multi Spectral Instrument (MSI) data, and tested its effectiveness on the Songnen Plain, Northeast China, as a case study using satellite imagery from 2020. We employed a logistic regression method to integrate multiple spectral bands into a synthetic indicator, and compared the results with manually interpreted burned-area reference maps and the Moderate-Resolution Imaging Spectroradiometer (MODIS) MCD64A1 burned-area product. The overall accuracy of the single-variable logistic regression was 77.38% to 86.90% and 73.47% to 97.14% for the 52TCQ and 51TYM cases, respectively. In comparison, the accuracy of the burned-area map was improved to 87.14% and 98.33% for the 52TCQ and 51TYM cases, respectively, by multiple-variable logistic regression of Sentinel-2 images; the balance of omission error and commission error was also improved. The integration of multiple spectral bands combined with a logistic regression method proves to be effective for burned-area detection, offering a highly automated process with an automatic threshold-determination mechanism. The method exhibits excellent extensibility and flexibility, taking the image tile as the operating unit. It is suitable for burned-area detection at a regional scale and can also be implemented with other satellite data.
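The core of the mapping step, combining several spectral variables into one burned-probability indicator via logistic regression, can be sketched with plain gradient descent on the log-loss. The three "spectral indices", coefficients, and pixel counts below are synthetic, not Sentinel-2 values.

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic spectral indices for 1000 pixels and a known burned/unburned rule
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.5, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ true_w)))
y = (rng.uniform(size=1000) < p_true).astype(float)

# Fit logistic regression by gradient descent on the log-loss
w = np.zeros(3)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)

# Pixels with probability >= 0.5 are labelled burned
pred = (1.0 / (1.0 + np.exp(-(X @ w))) >= 0.5).astype(float)
accuracy = (pred == y).mean()
assert accuracy > 0.8
```

The 0.5 threshold stands in for the paper's automatic threshold-determination mechanism, which would be tuned per tile in the full algorithm.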
Abstract: Efficient water quality monitoring, and ensuring the safety of drinking water by government agencies in areas where the resource is constantly depleted due to anthropogenic or natural factors, cannot be overemphasized. The above statement holds for West Texas, precisely Midland and Odessa. Two machine learning regression algorithms (Random Forest and XGBoost) were employed to develop models for the prediction of total dissolved solids (TDS) and sodium adsorption ratio (SAR) for efficient water quality monitoring of two vital aquifers: the Edwards-Trinity (Plateau) and Ogallala aquifers. These two aquifers have contributed immensely to providing water for different uses, ranging from domestic to agricultural and industrial. The data was obtained from the Texas Water Development Board (TWDB). The XGBoost and Random Forest models used in this study gave accurate predictions of the observed data (TDS and SAR) for both aquifers, with R² values consistently greater than 0.83. The Random Forest model gave a better prediction of TDS and SAR concentration, with an average R, MAE, RMSE, and MSE of 0.977, 0.015, 0.029, and 0.00, respectively. For XGBoost, an average R, MAE, RMSE, and MSE of 0.953, 0.016, 0.037, and 0.00, respectively, were achieved. The overall performance of the models was impressive. From this study, we can clearly see that Random Forest and XGBoost are appropriate for water quality prediction and monitoring in an area of high hydrocarbon activity like Midland, Odessa, and West Texas at large.
Abstract: In this paper, a logistic regression (LR) statistical analysis is presented for a set of variables used in experimental measurements in reversed-field pinch (RFP) machines, commonly known as the "slinky mode" (SM), observed to travel around the torus in the Madison Symmetric Torus (MST). The LR analysis is used with the modified Sine-Gordon dynamic equation model to predict, with high confidence, whether the slinky mode will lock or not lock, compared against the experimentally measured motion of the slinky mode. It is observed that, under certain conditions, the slinky mode "locks" at or near the intersection of poloidal and/or toroidal gaps in MST. A locked mode ceases to travel around the torus, while an unlocked mode keeps traveling without a change in energy, making it hard to determine an exact set of conditions to predict locking/unlocking behaviour. The significant key model parameters determined by the LR analysis are shown to improve the Sine-Gordon model's ability to determine the locking/unlocking of magnetohydrodynamic (MHD) modes. The LR analysis of measured variables provides high confidence in anticipating locking versus unlocking of the slinky mode, as demonstrated by comparisons between simulations and the experimentally measured motion of the slinky mode in MST.
Funding: National Social Science Fund Project "Research on the Operational Risks and Prevention of Government Procurement of Community Services Project System" (Project No. 21CSH018), and Research and Application of SDM Cigarette Supply Strategy Based on Consumer Data Analysis (Project No. 2023ASXM07).
Abstract: This study aims to analyze and predict the relationship between the average price per box in the cigarette market of City A and government procurement, providing a scientific basis and support for decision-making. By reviewing relevant theories and literature, qualitative prediction methods, regression prediction models, and other related theories were explored. Through the analysis of annual cigarette sales data and government procurement data in City A, a comprehensive understanding of the development of the tobacco industry and the economic trends of tobacco companies in the county was obtained. By predicting and analyzing the average price per box of cigarette sales across different years, corresponding prediction results were derived and compared with actual sales data. The prediction results indicate that the correlation coefficient between the average price per box of cigarette sales and government procurement is 0.982, implying that government procurement accounts for 96.4% of the variation in the average price per box of cigarettes. These findings offer an in-depth exploration of the relationship between the average price per box of cigarettes in City A and government procurement, providing a scientific foundation for corporate decision-making and market operations.
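The two reported figures are internally consistent: a correlation coefficient r = 0.982 implies a share of explained variance of r² ≈ 0.964, i.e., 96.4%. A quick arithmetic check, plus a toy version of the regression-prediction step itself (the procurement and price series below are made up, not the study's data):

```python
import numpy as np

# r = 0.982 implies r^2 = 0.964 (the "96.4% of changes" figure)
r = 0.982
assert abs(r ** 2 - 0.964) < 0.001

# Hypothetical illustration of the regression-prediction step
procurement = np.array([10.0, 12.0, 15.0, 18.0, 21.0])  # made-up series
price = 2.0 + 1.1 * procurement                         # made-up relation
slope, intercept = np.polyfit(procurement, price, 1)
forecast = slope * 24.0 + intercept                     # next-period forecast
assert abs(forecast - (2.0 + 1.1 * 24.0)) < 1e-6
```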
Funding: Project supported in part by the National Natural Science Foundation of China (Grant No. 12075168) and the Fund from the Science and Technology Commission of Shanghai Municipality (Grant No. 21JC1405600).
Abstract: Layered pavements usually exhibit complicated mechanical properties, owing to complex material properties under the external environment. In some cases, such as launching missiles or rockets, layered pavements are required to bear large impulse loads. However, traditional methods cannot non-destructively and quickly detect the internal structure of pavements. Thus, accurate and fast prediction of the mechanical properties of layered pavements is of great importance and necessity. In recent years, machine learning has shown great superiority in solving nonlinear problems. In this work, we present a method for predicting the maximum deflection and damage factor of layered pavements under instantaneous large impact, based on random forest regression with the deflection-basin parameters obtained from falling weight deflectometer testing. The coefficient of determination R² on the testing datasets is above 0.94 in the process of predicting the elastic moduli of structural layers and the mechanical responses, which indicates that the prediction results are highly consistent with finite element simulation results. This paper provides a novel method for fast and accurate prediction of pavement mechanical responses under instantaneous large impact load using partial structural parameters of pavements, and has application potential in non-destructive evaluation of pavement structure.
Abstract: Objective: To investigate the trend of mortality from COVID-19 before and after the national vaccination program, using joinpoint regression analysis, from 19 February 2020 to 5 September 2022. Methods: A joinpoint regression analysis of monthly collected data on confirmed COVID-19 deaths in Iran from February 19, 2020 to September 5, 2022 was performed. Results: After national vaccination in Iran, the trend of new monthly deaths due to COVID-19 was decreasing. The monthly percentage change from the beginning of the pandemic to the 19th month was 6.62% (95% CI: 1.1, 12.4), an increasing trend. From the 19th month to the end of the 31st month, the mortality trend was decreasing, with a monthly percentage change of -20.05% (95% CI: -30.3, -8.3) (P = 0.002). The average monthly percentage change was -5% (95% CI: -10.5, 0.9). Conclusions: Along with other health measures, such as quarantine, wearing a mask, hand washing, and social distancing, national vaccination significantly reduces the mortality rate of COVID-19.
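Joinpoint regression fits piecewise (log-)linear trends and locates the month at which the slope changes; with a single joinpoint this reduces to a small grid search over candidate break months. The sketch below uses synthetic log-death counts with a known break at month 19, not the Iranian data, and omits the permutation tests real joinpoint software runs.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(31)  # months since the start of the pandemic

# Synthetic log-deaths: rising to month 19, then declining
true_break = 19
log_d = np.where(t <= true_break,
                 3.0 + 0.06 * t,
                 3.0 + 0.06 * true_break - 0.2 * (t - true_break))
log_d = log_d + 0.02 * rng.normal(size=31)

def sse_with_break(k):
    """Total squared error of two separate line fits split at month k."""
    e = 0.0
    for seg in (t <= k, t > k):
        coef = np.polyfit(t[seg], log_d[seg], 1)
        e += np.sum((np.polyval(coef, t[seg]) - log_d[seg]) ** 2)
    return e

best_k = min(range(2, 29), key=sse_with_break)
assert abs(best_k - true_break) <= 1  # joinpoint found near month 19
```

On a log scale, a segment slope b corresponds to a monthly percentage change of (exp(b) - 1) x 100, which is how figures like -20.05% per month are reported.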
Abstract: Machine learning (ML) has changed clinical diagnostic procedures drastically. Especially in cardiovascular disease (CVD), the use of ML is indispensable for reducing human errors. Numerous studies have focused on disease prediction, but as predictions depend on multiple parameters, further investigation is required to upgrade clinical procedures. Multi-layered implementation of ML, also called deep learning (DL), has unfolded new horizons in the field of clinical diagnostics. DL achieves reliable accuracy with big datasets, but the reverse is the case with small datasets. This paper proposes a novel method that deals with the issue of low data dimensionality. Inspired by regression analysis, the proposed method classifies the data in three stages: in the first stage, feature representations are converted into probabilities using multiple regression techniques; the second stage takes up the probability conclusions from the previous stage; and the third stage produces the final classifications. Extensive experiments were carried out on the Cleveland heart disease dataset. The results show significant improvement in classification accuracy. It is evident from the comparative results that the prevailing statistical ML methods are not stagnant, and will remain in-demand disease-prediction techniques in the future.
Funding: financially supported by the National Key Research and Development Program (Grant No. 2022YFE0107000), the General Projects of the National Natural Science Foundation of China (Grant No. 52171259), and the High-Tech Ship Research Project of the Ministry of Industry and Information Technology (Grant No. [2021]342).
Abstract: Identification of the ice channel is a basic technology for developing intelligent ships in ice-covered waters, and is important to ensure the safety and economy of navigation. In the Arctic, merchant ships with low ice class often navigate in channels opened up by icebreakers. Navigation in the ice channel depends, to a large extent, on good maneuvering skills and abundant experience from the captain. The ship may get stuck if steered into ice fields off the channel. Under this circumstance, it is very important to study how to identify the boundary lines of ice channels with a reliable method. In this paper, a two-stage ice channel identification method is developed based on image segmentation and corner point regression. The first stage employs the image segmentation method to extract channel regions. In the second stage, an intelligent corner regression network is proposed to extract the channel boundary lines from the channel region. A non-intelligent angle-based filtering and clustering method is proposed and compared with the corner point regression network. The training and evaluation of the segmentation method and corner regression network are carried out on synthetic and real ice channel datasets. The evaluation results show that the accuracy of the method using the corner point regression network in the second stage reaches 73.33% on the synthetic ice channel dataset and 70.66% on the real ice channel dataset, and the processing speed can reach up to 14.58 frames per second.
Abstract: In this paper, we define the curve r_λ = r + λd at a constant distance from the edge of regression on a curve r(s), with arc length parameter s, in Galilean 3-space. Here, d is a non-isotropic or isotropic vector defined as a vector tightly fastened to the Frenet trihedron of the curve r(s) in 3-dimensional Galilean space. We build the Frenet frame {T_λ, N_λ, B_λ} of the constructed curve r_λ with respect to the two types of the vector d, and we indicate the properties related to the curvatures of the curve r_λ. Also, for the curve r_λ, we give the conditions to be a circular helix. Furthermore, we discuss ruled surfaces of type A generated via the curve r_λ and the vector D, which is defined as the tangent of the curve r_λ in 3-dimensional Galilean space. The constructed ruled surfaces appear in two ways. The first is constructed with the curve r_λ(s) = r(s) + λT(s) and the non-isotropic vector D. The second is formed by the curve r_λ = r(s) + λ₂N + λ₃B and the non-isotropic vector D. We calculate the distribution parameters of the constructed ruled surfaces and we show that the ruled surfaces are developable. Finally, we provide examples and visuals to back up our research.
Abstract: The development of prediction supports is a critical step in information systems engineering in this era defined by the knowledge economy, the hub of which is big data. Currently, the lack of a predictive model, whether qualitative or quantitative, suited to a company's areas of intervention can handicap or weaken its competitive capacity, endangering its survival. In terms of quantitative prediction, a variety of methods and/or tools are available, depending on the efficacy criteria. The multiple linear regression method is one of them. A linear regression model regresses an explained variable on one or more explanatory variables, with the function linking the explanatory variables to the explained variable being linear in its parameters. The purpose of this work is to demonstrate how to use multiple linear regression, which is one aspect of decision mathematics. Applying multiple linear regression to random data, which can be replaced by real data collected by or from organizations, provides decision makers with reliable data knowledge. As a result, machine learning methods can provide decision makers with relevant and trustworthy data. The main goal of this article is therefore to define the objective function on which the influencing factors for its optimization will be defined using the linear regression method.
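Multiple linear regression with k explanatory variables solves the least-squares problem β̂ = (XᵀX)⁻¹Xᵀy. On random data, as the article suggests, the whole pipeline fits in a few lines (the variable count, coefficients, and noise level below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in data: 100 observations, intercept + 3 explanatory variables
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + 0.01 * rng.normal(size=n)

# Ordinary least squares; lstsq is the numerically stable route to
# the normal-equation solution (X^T X)^{-1} X^T y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_true, atol=0.01)
```

With real organizational data, only the construction of X and y changes; the estimator and its diagnostics stay the same.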
Abstract: Sea fog is a disastrous weather phenomenon, posing a risk to the safety of maritime transportation. Dense sea fog reduces visibility at sea and has frequently caused ship collisions. This study used a geographically weighted regression (GWR) model to explore the spatial non-stationarity of near-miss collision risk, as detected by a vessel conflict ranking operator (VCRO) model from automatic identification system (AIS) data, under the influence of sea fog in the Bohai Sea. Sea fog was identified by a machine learning method derived from Himawari-8 satellite data. The spatial distributions of near-miss collision risk, sea fog, and the parameters of GWR were mapped. The results showed that sea fog and near-miss collision risk have specific spatial distribution patterns in the Bohai Sea: near-miss collision risk in the fog season is significantly higher than outside the fog season, especially in the northeast (the sea area near Yingkou Port and Bayuquan Port) and the southeast (the sea area near Yantai Port). GWR outputs further indicated a significant correlation between near-miss collision risk and sea fog in the fog season, with a higher R-squared (0.890 in the fog season of 2018) than outside the fog season (0.723 in the non-fog season of 2018). GWR results revealed spatial non-stationarity in the relationship between near-miss collision risk and sea fog, with the significance of the relationship varying locally. Dividing the specific navigation area made it possible to verify that sea fog has a positive impact on near-miss collision risk.
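GWR fits a separate weighted least-squares regression at each location, down-weighting distant observations with a spatial kernel, so the fitted coefficients are allowed to vary over space. A minimal sketch with a Gaussian kernel and synthetic coordinates; the study's bandwidth, variables, and data are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic locations, predictor (e.g., fog frequency), and response (risk)
# whose slope drifts across space - the non-stationarity GWR is built for
coords = rng.uniform(0, 10, size=(200, 2))
fog = rng.normal(size=200)
local_slope = 0.5 + 0.1 * coords[:, 0]  # the fog effect grows eastward
risk = local_slope * fog + 0.05 * rng.normal(size=200)

def gwr_slope_at(pt, bandwidth=2.0):
    """Weighted least squares at one location, Gaussian distance weights."""
    d2 = np.sum((coords - pt) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    X = np.column_stack([np.ones(200), fog])
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * risk))
    return beta[1]

# The estimated local slope should track the west-to-east drift
assert gwr_slope_at(np.array([9.0, 5.0])) > gwr_slope_at(np.array([1.0, 5.0]))
```

Mapping `gwr_slope_at` over a grid of locations is what produces the coefficient surfaces a GWR study reports.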
Abstract: Evaluation of calligraphic copies is the core of Chinese calligraphy appreciation and inheritance. However, previous aesthetic evaluation studies have often focused on photos and paintings, with few attempts on Chinese calligraphy. To solve this problem, a Siamese regression aesthetic fusion method, named SRAFE, is proposed for Chinese calligraphy, based on the combination of calligraphy aesthetics and deep learning. First, a dataset termed Evaluated Chinese Calligraphy Copies (E3C) is constructed for aesthetic evaluation. Second, 12 hand-crafted aesthetic features based on the shape, structure, and stroke of calligraphy are designed. Then, a Siamese regression network (SRN) is designed to extract the deep aesthetic representation of calligraphy. Finally, the SRAFE method is built by fusing the deep aesthetic features with the hand-crafted aesthetic features. Experimental results show that scores given by SRAFE are similar to the aesthetic evaluation labels of E3C, proving the effectiveness of the authors' method.
Abstract: In regression, despite both being aimed at estimating the Mean Squared Prediction Error (MSPE), Akaike's Final Prediction Error (FPE) and the Generalized Cross Validation (GCV) selection criteria are usually derived from two quite different perspectives. Here, settling on the most commonly accepted definition of the MSPE, as the expectation of the squared prediction error loss, we provide theoretical expressions for it, valid for any linear model (LM) fitter, be it under random or non-random designs. Specializing these MSPE expressions for each of them, we are able to derive closed formulas of the MSPE for some of the most popular LM fitters: Ordinary Least Squares (OLS), with or without a full column rank design matrix; and Ordinary and Generalized Ridge regression, the latter embedding smoothing-spline fitting. For each of these LM fitters, we then deduce a computable estimate of the MSPE, which turns out to coincide with Akaike's FPE. Using a slight variation, we similarly obtain a class of MSPE estimates coinciding with the classical GCV formula for those same LM fitters.
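For any linear fitter, ŷ = Hy for some hat matrix H, and the classical GCV score is (RSS/n) / (1 − tr(H)/n)². A numerical sketch for full-rank OLS, where tr(H) equals the number of fitted parameters (the toy design and coefficients are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 3

# Toy full-rank design: intercept plus two covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.2]) + 0.1 * rng.normal(size=n)

# OLS is a linear fitter: y_hat = H y, with H the hat (projection) matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y
rss = np.sum((y - y_hat) ** 2)

# For full-rank OLS, the "effective degrees of freedom" tr(H) equals p
assert abs(np.trace(H) - p) < 1e-8

# Classical GCV score: (RSS / n) / (1 - tr(H) / n)^2
gcv = (rss / n) / (1 - np.trace(H) / n) ** 2
assert gcv > rss / n  # the penalty inflates the raw training error
```

For ridge or smoothing splines, only H changes (and tr(H) becomes a non-integer effective degrees of freedom); the GCV formula itself is unchanged, which is the unification the article exploits.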
文摘Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), a widely adopted index by the US Environmental Protection Agency (EPA), serves as a crucial metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable regression model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We conducted four different regression analyses using a machine learning approach to determine the model with the best performance. By employing the confusion matrix and error percentages, we selected the best-performing model, which yielded prediction error rates of 22%, 23%, 20%, and 27%, respectively, for LDA, QDA, logistic regression, and KNN models. The logistic regression model outperformed the other three statistical models in predicting AQI. Understanding these models' performance can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders like environmental regulators, healthcare professionals, urban planners, and researchers.
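Of the four classifiers compared, KNN is the simplest to state from scratch: predict by majority vote among the k nearest training points. The sketch below uses made-up two-feature, two-class "AQI category" data, not the EPA data or the study's 27%-error model.

```python
import numpy as np

rng = np.random.default_rng(9)

# Synthetic pollutant features and a two-class AQI label (0 = good, 1 = bad)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def knn_predict(x, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    d = np.sum((X - x) ** 2, axis=1)
    nearest = np.argsort(d)[:k]
    return int(np.round(y[nearest].mean()))

# Points deep inside each class should be classified correctly
assert knn_predict(np.array([2.0, 2.0])) == 1
assert knn_predict(np.array([-2.0, -2.0])) == 0
```

Comparing such models by held-out error rates, as the study does with its confusion matrices, is what separates the 20% logistic regression error from KNN's 27%.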
Funding: Under the auspices of the Philosophy and Social Science Planning Project of Guizhou, China (No. 21GZZD59).
Abstract: China's low-carbon development path will make significant contributions to achieving global sustainable development goals. Due to the diverse natural and economic conditions across different regions in China, there exists an imbalance in the distribution of carbon emissions; regional cooperation therefore serves as an effective means to attain low-carbon development. This study examined the pattern of carbon emissions and proposed a potential joint emission reduction strategy by utilizing the industrial carbon emission intensity (ICEI) as a crucial factor. We utilized social network analysis and the Local Indicators of Spatial Association (LISA) space-time transition matrix to investigate the spatiotemporal connections and discrepancies of ICEI in the cities of the Pearl River Basin (PRB), China, from 2010 to 2020. The primary drivers of the ICEI were determined through geographical detectors and multi-scale geographically weighted regression. The results were as follows: 1) the overall ICEI in the Pearl River Basin is showing a downward trend, with a significant spatial imbalance. 2) There are numerous network connections between cities regarding the ICEI, but the network structure is relatively fragile and unstable. 3) Economically developed cities such as Guangzhou, Foshan, and Dongguan are at the center of the network while playing an intermediary role. 4) Energy consumption, industrialization, per capita GDP, urbanization, science and technology, and productivity are found to be the most influential variables in the spatial differentiation of ICEI, and their combination increased the explanatory power of the geographic variation of ICEI. Finally, through the analysis of differences and connections in urban carbon emissions under different economic levels and ICEI, the study suggests joint carbon reduction strategies centered on carbon transfer, financial support, and technological assistance among cities.