Funding: Supported by the National Natural Science Foundation of China (No. 61502475) and the Importation and Development of High-Caliber Talents Project of the Beijing Municipal Institutions (No. CIT&TCD201504039).
Abstract: The performance of conventional similarity measurement methods is seriously affected by the curse of dimensionality in high-dimensional data. The reason is that differences in sparse and noisy dimensions occupy a large proportion of the similarity, so the dissimilarities between any two samples become nearly indistinguishable. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding intervals. Only components in the same or adjacent intervals are used to calculate the similarity. To validate this method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two to three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of this method in different dimensions is [0, 1], which makes it suitable for similarity analysis after dimensionality reduction.
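The interval-mapping scheme described in this abstract can be sketched as follows. The function name, the fixed `[lo, hi]` data range, and the rule of averaging per-dimension similarities over all dimensions are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def lattice_similarity(x, y, n_intervals=10, lo=0.0, hi=1.0):
    """Sketch of an interval-subspace similarity: each dimension's range
    [lo, hi] is split into n_intervals bins, and only dimensions whose
    two components fall in the same or adjacent bins contribute."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    width = (hi - lo) / n_intervals
    # Map each component onto its interval index (clip keeps hi in the last bin).
    bx = np.clip(((x - lo) // width).astype(int), 0, n_intervals - 1)
    by = np.clip(((y - lo) // width).astype(int), 0, n_intervals - 1)
    close = np.abs(bx - by) <= 1          # same or adjacent interval
    if not close.any():
        return 0.0
    # Per-dimension similarity in [0, 1]; dividing by the total number of
    # dimensions keeps the result in [0, 1] regardless of dimensionality.
    per_dim = 1.0 - np.abs(x[close] - y[close]) / (hi - lo)
    return per_dim.sum() / x.size
```

Because distant components are dropped before the sum, differences in sparse or noisy dimensions cannot dominate the score, which is the intuition the abstract describes.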
Funding: Supported by the National Science and Technology Major Projects (2008ZX05025) and the Project of National Oil and Gas Resources Strategic Constituency Survey and Evaluation of the Ministry of Land and Resources, China (XQ-2007-05).
Abstract: Edge detection and enhancement techniques are commonly used to identify geological bodies from potential field data. We present a new edge recognition technique based on the normalized vertical derivative of the total horizontal derivative, which combines the functions of edge detection and edge enhancement. First, we calculate the total horizontal derivative (THDR) of the potential field data and then compute the nth-order vertical derivative of the THDR (VDRn). The peak values of the total horizontal derivative (PTHDR) are obtained from the nth-order vertical derivative using a threshold value greater than 0; this PTHDR can be used for edge detection. Second, the PTHDR values are divided by the total horizontal derivative and normalized by the maximum value. Finally, we use different types of numerical models to verify the effectiveness and reliability of the new edge recognition technique.
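A rough sketch of the pipeline this abstract describes, assuming a regularly gridded field and the standard flat-Earth FFT relation for vertical derivatives; the function names and the simple threshold-at-zero rule are illustrative, not the authors' exact implementation:

```python
import numpy as np

def thdr(field, dx=1.0, dy=1.0):
    """Total horizontal derivative of a gridded potential field."""
    gy, gx = np.gradient(field, dy, dx)   # spacings: axis 0 (rows), axis 1 (cols)
    return np.hypot(gx, gy)

def vertical_derivative(field, n=1, dx=1.0, dy=1.0):
    """nth-order vertical derivative via the spectral relation
    F[d^n f / dz^n] = |k|^n F[f] (standard potential-field assumption)."""
    ny, nx = field.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, dy)
    k = np.hypot(*np.meshgrid(kx, ky))
    return np.real(np.fft.ifft2(np.fft.fft2(field) * k ** n))

def pthdr_edge_map(field, n=1, dx=1.0, dy=1.0, eps=1e-12):
    """PTHDR edge map: threshold the VDRn of the THDR at zero, divide by
    the THDR, and normalize by the maximum value, as the abstract outlines."""
    t = thdr(field, dx, dy)
    vdr = vertical_derivative(t, n=n, dx=dx, dy=dy)
    pthdr = np.where(vdr > 0.0, vdr, 0.0)     # keep only positive peaks
    edge = pthdr / (t + eps)                  # balance strong and weak anomalies
    m = edge.max()
    return edge / m if m > 0 else edge
```

The division by the THDR is what equalizes edges from strong and weak sources; the final normalization maps the result into [0, 1].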
Abstract: Building detection in very high resolution (VHR) images is crucial for mapping and analysing urban environments. Since buildings are elevated objects, elevation data need to be integrated with images for reliable detection. This process requires two critical steps: optical-elevation data co-registration and aboveground elevation calculation. Both steps remain challenging to some extent. Therefore, this paper introduces optical-elevation data co-registration and normalization techniques for generating a dataset that facilitates elevation-based building detection. For accurate co-registration, a dense set of stereo-based elevations is generated and co-registered to the relevant image based on their corresponding image locations. To normalize these co-registered elevations, the bare-earth elevations are detected from classification information of terrain-level features after the image co-registration is achieved. The developed method was executed and validated: an overall detection quality of 80% was achieved, with 94% correct detection. Together, the developed techniques successfully facilitate the incorporation of stereo-based elevations for detecting buildings in VHR remote sensing images.
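The elevation-normalization step (subtracting a bare-earth surface from the co-registered elevations to get aboveground heights) can be illustrated with a minimal sketch. The planar bare-earth fit and the 2.5 m height threshold below are stand-in assumptions for the paper's actual terrain interpolation and classification:

```python
import numpy as np

def normalize_elevations(dsm, ground_mask):
    """Aboveground (normalized) elevation: subtract a bare-earth surface
    estimated from pixels classified as terrain-level features.
    Here a simple least-squares plane stands in for the interpolation."""
    yy, xx = np.nonzero(ground_mask)
    A = np.column_stack([xx, yy, np.ones_like(xx)])
    coeff, *_ = np.linalg.lstsq(A, dsm[yy, xx], rcond=None)
    gy, gx = np.mgrid[0:dsm.shape[0], 0:dsm.shape[1]]
    bare_earth = coeff[0] * gx + coeff[1] * gy + coeff[2]
    return dsm - bare_earth            # nDSM: height above the bare earth

def building_candidates(ndsm, min_height=2.5):
    """Pixels standing higher than min_height (metres) above the bare earth."""
    return ndsm > min_height
```

On a tilted terrain, the plane subtraction removes the terrain trend so that only elevated objects (buildings, trees) survive the height threshold.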
Abstract: The analysis of spatially correlated binary data observed on lattices is an interesting topic that attracts scholars from many scientific fields, such as epidemiology, medicine, agriculture, biology, geology and geography. To overcome the difficulties encountered when fitting the autologistic regression model to such data via Bayesian and/or Markov chain Monte Carlo (MCMC) techniques, the Gaussian latent variable model has been incorporated into the methodology. However, assuming a normal distribution for the latent random variable may not be realistic; an incorrect normality assumption can bias parameter estimates and affect the accuracy of results and inferences. This motivates more flexible prior distributions for the latent variable in spatial models. A review of the recent spatial statistics literature shows an increasing tendency toward models involving skew distributions, especially skew-normal ones. In this study, a skew-normal latent variable model was developed for the Bayesian analysis of spatially correlated binary data acquired on uncorrelated lattices. The proposed methodology was applied to inspect the spatial dependency and related factors of dental caries occurrence in a sample of students of Yasuj University of Medical Sciences, Yasuj, Iran. The results indicated that the skew-normal latent variable model was valid and provided a good fit to the caries data.
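A skew-normal latent variable can be simulated with the standard convolution representation z = δ|u₀| + √(1−δ²)u₁, where δ = α/√(1+α²). The generative check below (design matrix `X`, coefficients `beta`, latent-threshold link) is a hypothetical illustration of a skew-normal latent-variable binary model, not the paper's autologistic specification:

```python
import numpy as np

def rng_skew_normal(alpha, size, rng=None):
    """Draw skew-normal(0, 1, alpha) variates via the convolution
    representation z = delta*|u0| + sqrt(1 - delta^2)*u1."""
    if rng is None:
        rng = np.random.default_rng()
    delta = alpha / np.sqrt(1.0 + alpha ** 2)
    u0 = np.abs(rng.standard_normal(size))
    u1 = rng.standard_normal(size)
    return delta * u0 + np.sqrt(1.0 - delta ** 2) * u1

# Hypothetical generative check: binary responses driven by a
# skew-normal latent variable, y_i = 1{ x_i^T beta + z_i > 0 }.
rng = np.random.default_rng(42)
X = rng.standard_normal((500, 2))
beta = np.array([1.0, -0.5])
z = rng_skew_normal(alpha=3.0, size=500, rng=rng)
y = (X @ beta + z > 0).astype(int)
```

For positive α the latent variable is right-skewed, so the marginal probability of y = 1 is pulled above one half; a symmetric Gaussian latent variable cannot capture that asymmetry, which is the motivation the abstract gives.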
Funding: Supported by the Crohn's & Colitis Foundation Senior Research Award (No. 902766 to J.S.), the National Institute of Diabetes and Digestive and Kidney Diseases (No. R01DK105118-01 and R01DK114126 to J.S.), the United States Department of Defense Congressionally Directed Medical Research Programs (No. BC191198 to J.S.), and a VA Merit Award (BX-19-00 to J.S.).
Abstract: Metabolomics, as a research field and a set of techniques, studies the entire set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine. In particular, the integration of the microbiome and the metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated, and preprocessing/pretreatment and normalization procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, handling zero and/or missing values, detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
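The centering, scaling, and transformation steps this abstract enumerates can be sketched as follows; these are generic textbook formulations (samples in rows, metabolites in columns), not tied to any specific toolkit:

```python
import numpy as np

def log_transform(X, offset=1.0):
    """Log transformation; the offset guards against the zeros that are
    common in metabolomics intensity tables."""
    return np.log(X + offset)

def mean_center(X):
    """Center each metabolite (column) at zero."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Unit-variance (auto) scaling: center, then divide by the column
    standard deviation so every metabolite gets equal weight."""
    Xc = X - X.mean(axis=0)
    return Xc / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto scaling: divide by the square root of the standard deviation,
    which shrinks large fold changes less aggressively than autoscaling."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt(X.std(axis=0, ddof=1))
```

As the abstract notes, which of these to apply depends on the biological hypothesis and the downstream statistical method; autoscaling, for example, inflates the influence of low-abundance, noisy metabolites that Pareto scaling dampens.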