Abstract: Compositional data carry relative information and are central to machine learning and related fields. They are typically recorded as closed data, that is, parts that sum to a constant such as 100%. The linear regression model is the most widely used statistical technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. When estimating linear regression parameters, which are useful for tasks such as prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. Many datasets contain missing observations, and recovering them can be costly and time-consuming. The expectation-maximization (EM) algorithm has been suggested as a solution for such situations. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize it. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation (), in terms of Aitchison distances and covariance.
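The abstract above evaluates imputations by Aitchison distance but does not spell out the computation. The following is a minimal sketch of one common formulation (Euclidean distance between centered log-ratio transforms), not the study's own code; the function names and the two sample rows are hypothetical and chosen only for illustration.

```python
import math

def closure(parts, total=1.0):
    """Rescale strictly positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    cx, cy = clr(x), clr(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))

# Hypothetical rows: a "true" composition and an imputed one, both closed to 100%.
true_row = closure([20.0, 30.0, 50.0], total=100.0)
imputed_row = closure([25.0, 30.0, 45.0], total=100.0)
print(aitchison_distance(true_row, imputed_row))
```

Because the distance depends only on ratios between parts, rescaling a composition (for example from proportions to percentages) leaves it unchanged, which is why it suits closed data.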
Abstract: Data fusion is one of the attractive topics in sonar signal processing. This paper describes decision-level data fusion for a multi-sensor (multi-array) system. Following the discussion in Ref. [1], the optimum linear data fusion algorithm for N dependent observations is derived. It is proved that the estimation error of the fused result is not greater than that of any individual component. Expressions for the estimation error and the weight coefficients are presented. Numerical results and some examples are given. The effect of dependence among the observation data on the final estimation error is presented.
Abstract: Data fusion is one of the attractive topics in sonar signal processing. This paper describes decision-level data fusion for a multi-sensor (multi-array) system. The optimum linear data fusion algorithm for N independent observations is derived. It is proved that the estimation error of the optimum fused result is not greater than that of any individual component. Expressions for the estimation error and the weight coefficients are presented. Numerical results and some examples are given. The effect of the input signal-to-noise ratio on the data fusion is described.
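The abstract does not give the weight formula, so the sketch below assumes the textbook minimum-variance linear combination of independent, unbiased estimates (inverse-variance weighting). It may differ from the paper's derivation; the function name and the sample numbers are hypothetical.

```python
def fuse_independent(estimates, variances):
    """Inverse-variance weighted fusion of independent, unbiased estimates.

    Returns the fused estimate, the weights, and the fused variance.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]
    fused = sum(w * x for w, x in zip(weights, estimates))
    fused_variance = 1.0 / total_precision
    return fused, weights, fused_variance

# Hypothetical bearing estimates from three arrays and their error variances.
estimates = [10.2, 9.7, 10.5]
variances = [1.0, 0.5, 2.0]
fused, weights, fused_variance = fuse_independent(estimates, variances)
print(fused, weights, fused_variance)
```

Under these assumptions the fused variance is the reciprocal of the summed precisions, so it can never exceed the smallest individual variance, which matches the property stated in the abstract.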
Funding: Supported by the National Natural Science Foundation of China (70171008)
Abstract: A class of estimators of the mean survival time with interval-censored data is studied using the unbiased transformation method. The estimators are constructed from the observations so that they are unbiased, in the sense that every estimator in the class has the same expectation as the mean survival time. The estimators have good properties such as strong consistency (with rate O(n^(-1/2) (log log n)^(1/2))) and asymptotic normality. An application to linear regression is considered and simulation results are reported.
Abstract: This paper develops an empirical likelihood approach to estimating the coefficients of a linear model with interval-censored responses. By constructing an unbiased transformation of the interval-censored data, an empirical log-likelihood function with an asymptotic chi-squared (χ^2) distribution is derived. Confidence regions for the coefficients are constructed. Simulation results indicate that the method performs better than the normal approximation method in terms of coverage accuracy.
Abstract: This paper considers local linear regression estimators for the partially linear model with censored data. The estimators have some nice large-sample properties and are easy to implement. Through many simulation runs, the author also found that the estimators perform remarkably well in small samples.
Abstract: The experimental random error and the expected values at unobserved points of dynamic indexes were estimated by establishing linear regression equations that describe how the dynamic indexes change. Methods were presented for testing the significance of differences among treatments using dynamic points as indexes, without requiring replication at each observed dynamic point.
Funding: Supported by the National Natural Science Foundation of China (10571008), the Natural Science Foundation of Henan (092300410149), and the Core Teacher Foundation of Henan (2006141)
Abstract: In this article, a partially linear single-index model for longitudinal data is investigated. Generalized penalized spline least squares estimates of the unknown parameters are suggested. All parameters can be estimated simultaneously by the proposed method while the longitudinal structure of the data is taken into account. The existence, strong consistency and asymptotic normality of the estimators are proved under suitable conditions. A simulation study is conducted to investigate the finite-sample performance of the proposed method. Our approach can also be used to study the pure single-index model for longitudinal data.
Abstract: The present paper proposes a new spectrophotometric method based on a linear combination of multiwavelength data, obtained by selecting a set of properly weighted coefficients and combination methods. The resulting weighted combination absorbance is directly proportional only to the concentration of the analysed component and is independent of coexisting interferents. The accuracy of the analytical results is greatly improved for the analysis of light rare earths in the presence of heavy rare earths. The analytical error arising from the reagent blank and from the co-coloration of light and heavy rare earths is also overcome. Greatly improved linearity and additivity of absorbance are obtained.
Abstract: The features of the linear model were carefully studied under data perturbation and mean shift perturbation, and some important features were proved mathematically. The results show that the mean shift perturbation is equivalent to the data perturbation; that is, adding a parameter to an observation equation is equivalent to deleting that observation from the data set. The estimate of this parameter is in fact the predicted residual of that observation.
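The equivalence stated above can be checked numerically. The sketch below is an illustration on simulated data, not the paper's proof: it fits a mean shift model (one extra parameter attached to observation `i`) and compares it with the fit obtained after deleting that observation. The data, seed, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 12, 4                                   # sample size and index of the shifted case
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])           # design matrix: intercept and slope
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
y[i] += 3.0                                    # contaminate one observation

# Mean shift model: add an indicator column that is 1 only for observation i.
indicator = np.zeros(n)
indicator[i] = 1.0
X_shift = np.column_stack([X, indicator])
coef_shift, *_ = np.linalg.lstsq(X_shift, y, rcond=None)
beta_shift, gamma_hat = coef_shift[:2], coef_shift[2]

# Case deletion: drop observation i and refit the original model.
keep = np.arange(n) != i
beta_deleted, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

# Predicted residual: observed y_i minus its prediction from the deleted fit.
predicted_residual = y[i] - X[i] @ beta_deleted

print(np.allclose(beta_shift, beta_deleted))
print(np.isclose(gamma_hat, predicted_residual))
```

The extra parameter absorbs observation `i` exactly, so that observation no longer influences the regression coefficients, and the fitted shift equals the residual of `y[i]` against the prediction from the remaining data.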
Abstract: Objective: To analyze longitudinal binary data using generalized linear models, taking the correlation between repeated measures into account, and to give a general method for analyzing such data. Methods: Generalized estimating equations (GEE), proposed by Zeger and Liang, were used. For several covariance structures, one method was given for estimating the regression and correlation parameters. Results: The regression and correlation parameters were estimated simultaneously. A set of programs was completed and an example was given. Conclusion: Longitudinal data often occur in medical research and clinical trials. To handle the correlation between repeated measures, special methods are needed for this kind of data.
Abstract: MODIS (Moderate Resolution Imaging Spectroradiometer) is a key instrument aboard the Terra (EOS AM) and Aqua (EOS PM) satellites. Linear spectral mixture models are applied to MODIS data for the sub-pixel classification of land cover. Shaoxing county of Zhejiang Province, China, was chosen as the study site and early rice as the study crop. The land-cover proportions derived from MODIS pixels using linear spectral mixture models were compared with an unsupervised classification derived from TM data acquired on the same day. The comparison implies that MODIS data could be used as a satellite data source for estimating rice cultivation area, and possibly for rice growth monitoring and yield forecasting, at the regional scale.
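A linear spectral mixture model treats each pixel spectrum as a weighted sum of pure land-cover spectra (endmembers), with the weights being the sub-pixel proportions. The sketch below is a generic least-squares unmixing with a softly enforced sum-to-one constraint. The endmember values, the `unmix` helper, and the `weight` parameter are hypothetical and are not taken from the study.

```python
import numpy as np

# Hypothetical endmember reflectances: rows are spectral bands,
# columns are land-cover classes (for example water, rice, built-up).
endmembers = np.array([
    [0.05, 0.30, 0.10],
    [0.08, 0.35, 0.45],
    [0.04, 0.40, 0.20],
    [0.02, 0.45, 0.50],
])

def unmix(pixel, endmembers, weight=1e3):
    """Estimate cover fractions by least squares.

    The sum-to-one constraint is imposed approximately by appending a
    heavily weighted row of ones to the system.
    """
    n_classes = endmembers.shape[1]
    A = np.vstack([endmembers, weight * np.ones(n_classes)])
    b = np.append(pixel, weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions

# Simulate a noise-free mixed pixel from known fractions, then recover them.
true_fractions = np.array([0.2, 0.5, 0.3])
pixel = endmembers @ true_fractions
estimated = unmix(pixel, endmembers)
print(estimated)
```

This sketch does not enforce non-negativity; with noisy real spectra a constrained solver would usually be preferred.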
Funding: Project (12511109) supported by the Science and Technology Studies Foundation of Heilongjiang Educational Committee of 2011, China
Abstract: Robust guaranteed cost sampled-data control was studied for a class of uncertain nonlinear systems with time-varying delay. The parameter uncertainties are time-varying and norm-bounded, and appear in both the state and the input control matrices. By applying an input delay approach, the system was transformed into a continuous time-delay system. Attention was focused on the design of a robust guaranteed cost sampled-data control law which guarantees that the closed-loop system is asymptotically stable and that the quadratic performance index is less than a certain bound for all admissible uncertainties. By applying Lyapunov stability theory, theorems were derived that provide sufficient conditions, in the form of linear matrix inequalities (LMIs), for the existence of a robust guaranteed cost sampled-data control law. In particular, an optimal state-feedback guaranteed cost sampled-data control law that minimizes the guaranteed cost was given. The effectiveness of the proposed method was illustrated by a simulation example showing asymptotically stable curves of the system state under the initial condition x(0)=[0.679 6 0].
Funding: Project (60835005) supported by the National Natural Science Foundation of China
Abstract: High-dimensional data clustering, with the inherent sparsity of the data and the presence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method was proposed to address this problem. The basic idea was to search for the line manifold clusters hidden in datasets, and then fuse some of the line manifold clusters to construct higher-dimensional manifold clusters. The orthogonal distance and the tangent distance were considered together as the linear manifold distance metrics. Spatial neighbor information was fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster search. Experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method obtains high clustering accuracy for data sets of different sizes, manifold dimensions and noise ratios, which confirms its robustness to noise for high-dimensional data.
Abstract: A data processing method is described for determining the coefficients of the linearization equations for the 1050 anemometer (produced by Thermo-Systems Inc., TSI, USA) with sensors made of domestic hot wire, using the program presented in this paper. Calculation and testing indicate that the error resulting from this method is about 0.5% of full scale and less than TSI's. With this method, a calibration curve can be set up to a given accuracy according to the measurement range and the diameter of the hot wire.
Abstract: The explicit solution to the Cauchy problem for the linearized system of two-dimensional isentropic flow with axisymmetric initial data in gas dynamics is given.
Funding: Project (No. 02DZ15001) supported by Shanghai Science and Technology Development Funds, China
Abstract: Rectification of airborne linear images is an indispensable preprocessing step. This paper presents in detail a two-step rectification algorithm. The first step is to establish a direct georeferencing model using the data provided by the Positioning and Orientation System (POS) and to obtain the mathematical relationships between the image points and the ground reference points. The second step is to apply a polynomial distortion model and bilinear interpolation to obtain the final, precisely rectified images. This step requires a reference image and a selection of ground control points (GCPs). Experiments showed that the final rectified images are satisfactory and that the two-step rectification algorithm is very effective.
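The second step above resamples the image with bilinear interpolation. As a rough illustration of that resampling step only (not the paper's implementation), the sketch below samples a grayscale image stored as a list of rows at a fractional pixel position. The function name and the tiny test grid are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Sample `image` (a list of rows) at fractional column x and row y.

    Coordinates are clamped to the image bounds before interpolation.
    """
    height, width = len(image), len(image[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)                     # top-left neighbor
    x1 = min(x0 + 1, width - 1)                 # right neighbor, clamped
    y1 = min(y0 + 1, height - 1)                # bottom neighbor, clamped
    fx, fy = x - x0, y - y0                     # fractional offsets
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

# Hypothetical 2x2 image; the center is the average of the four corners.
grid = [[0.0, 10.0],
        [20.0, 30.0]]
center = bilinear_sample(grid, 0.5, 0.5)
print(center)
```

In a rectification pipeline, each output pixel would be mapped back to a fractional source position by the geometric model and then filled with a value sampled this way.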
Abstract: In the era of network technology, collected data are becoming larger and more complex. In this article, we focus on estimating linear regression parameters for symbolic interval data. We propose two approaches to estimating the regression parameters under two different data models and compare them with existing methods via simulations. Finally, we analyze two real datasets with the proposed methods for illustration.
Funding: Supported by the National Natural Science Foundation of China (No. 60874021, 60974016), the National Natural Science Foundation of Jiangsu Province (No. BK2007061), the Qing Lan Project from the Jiangsu Provincial Department for Education, and the National Natural Science Foundation of Nantong University (No. 08Z001)
Abstract: This paper addresses the stabilization problem for a class of nonlinear systems. It is assumed that the controller can only receive a transmitted sequence of finite coded signals via a limited digital communication channel. Both state-feedback and output-feedback coder-decoder-controller procedures are proposed. Stabilization conditions involving the size of the coding alphabet, the sampling period, the growth rate of the system state and the data packet dropout rate are obtained. Finally, an example is given to illustrate the design procedures and the effectiveness of the proposed results.