Funding: Supported by the National Natural Science Foundation of China (No. 61502475) and the Importation and Development of High-Caliber Talents Project of the Beijing Municipal Institutions (No. CIT&TCD201504039).
Abstract: The performance of conventional similarity measurement methods is seriously degraded by the curse of dimensionality in high-dimensional data. The reason is that the data differences in sparse and noisy dimensions account for a large proportion of the similarity, so that any two results appear dissimilar. A similarity measurement method for high-dimensional data based on a normalized net lattice subspace is proposed. The data range of each dimension is divided into several intervals, and the components in different dimensions are mapped onto the corresponding interval. Only components that fall in the same or adjacent intervals are used to calculate the similarity. To validate the method, three data types are used and seven common similarity measurement methods are compared. The experimental results indicate that the relative difference of the method increases with the dimensionality and is approximately two to three orders of magnitude higher than that of the conventional methods. In addition, the similarity range of the method in different dimensions is [0,1], which makes it suitable for similarity analysis after dimensionality reduction.
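The interval idea described above can be sketched in Python. This is a minimal illustration, not the paper's formula: the function name `lattice_similarity`, the min-max normalization with per-dimension bounds `lo` and `hi`, the default of 10 intervals, and the per-dimension score `1 - |x - y|` are all assumptions chosen so that the result stays in [0,1].

```python
import numpy as np

def lattice_similarity(x, y, lo, hi, n_intervals=10):
    """Similarity in [0, 1] that ignores dimensions whose components
    fall in non-adjacent intervals (hypothetical scoring rule)."""
    x, y, lo, hi = (np.asarray(a, dtype=float) for a in (x, y, lo, hi))
    # Normalize each dimension to [0, 1]; guard against zero-width ranges.
    span = np.where(hi > lo, hi - lo, 1.0)
    xn = np.clip((x - lo) / span, 0.0, 1.0)
    yn = np.clip((y - lo) / span, 0.0, 1.0)
    # Map each component to the index of its interval.
    ix = np.minimum((xn * n_intervals).astype(int), n_intervals - 1)
    iy = np.minimum((yn * n_intervals).astype(int), n_intervals - 1)
    # Keep only dimensions where both components share or neighbor an interval.
    near = np.abs(ix - iy) <= 1
    # Distant dimensions contribute 0; near ones contribute 1 - |difference|.
    return float(np.sum(1.0 - np.abs(xn - yn)[near]) / x.size)
```

Dividing by the total number of dimensions rather than the number of retained ones is a design choice of this sketch: it keeps the score comparable across dimensionalities, at the cost of penalizing vectors that disagree in many dimensions.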
Abstract: Building detection in very high resolution (VHR) images is crucial for mapping and analysing urban environments. Since buildings are elevated objects, elevation data need to be integrated with images for reliable detection. This process requires two critical steps: optical-elevation data co-registration and aboveground elevation calculation. Both steps remain challenging to some extent. Therefore, this paper introduces optical-elevation data co-registration and normalization techniques for generating a dataset that facilitates elevation-based building detection. To achieve accurate co-registration, a dense set of stereo-based elevations is generated and co-registered to the relevant image based on the corresponding image locations. To normalize these co-registered elevations, the bare-earth elevations are detected, after image co-registration, from classification information on some terrain-level features. The developed method was implemented and validated. An overall detection quality of 80% was achieved, with 94% correct detection. Together, the developed techniques successfully facilitate the incorporation of stereo-based elevations for detecting buildings in VHR remote sensing images.
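The elevation normalization step can be illustrated with a short Python sketch. It assumes the surface elevations and the detected bare-earth elevations are already co-registered on the same grid; the function name `aboveground_height` and the clipping of negative values to zero are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def aboveground_height(surface, bare_earth):
    """Normalized elevation: surface elevation minus bare-earth elevation.

    Both inputs are arrays on the same co-registered grid (same shape,
    same units). Negative differences are clipped to zero (an assumption).
    """
    surface = np.asarray(surface, dtype=float)
    bare_earth = np.asarray(bare_earth, dtype=float)
    return np.clip(surface - bare_earth, 0.0, None)
```

Pixels with a large normalized elevation are then candidates for elevated objects such as buildings; the threshold to apply is not given in the abstract.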
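Abstract: Existing image data augmentation methods generate large amounts of invalid and redundant data, which lowers the quality of the training data and weakens network generalization. To address this problem, an image data augmentation method is proposed: random affine transformation based on normal distribution (NRAff). The core of NRAff is a normal random affine transformation module that introduces a normal distribution into the random affine transformation, so that the magnitudes of the random affine transformations are normally distributed around the original image. By limiting the range of the distribution of the transformed images, invalid data are removed and more effective image data with normal-distribution characteristics are obtained. NRAff imitates the normal-distribution sampling mechanism of the biological visual perception system, so that the distribution of the generated images approaches the subjective perception of biological vision, emphasizes the normal-distribution characteristics of target perception, and lets the network learn invariant features from the transformed features. The method improves the consistency of the image data distribution, enables the network to learn more effective latent affine-invariant features, and improves resistance to overfitting. Experiments and comparisons with current state-of-the-art data augmentation methods on the image classification datasets CIFAR10, CIFAR100, SVHN, Fashion-MNIST and Imagenette show that the proposed method improves classification accuracy to varying degrees, which verifies the effectiveness and generality of NRAff.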
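A minimal Python sketch of the sampling idea: affine parameters are drawn from normal distributions centered on the identity transform, and draws outside a fixed range are rejected. The parameter set (rotation, isotropic scale, translation), the standard deviations, the two-sigma cutoff, and the names `truncated_normal` and `sample_affine` are assumptions for illustration; the abstract does not specify the NRAff module at this level of detail.

```python
import math
import numpy as np

def truncated_normal(rng, mean, std, n_sigma=2.0):
    """Draw from N(mean, std^2), rejecting draws beyond n_sigma * std."""
    while True:
        value = rng.normal(mean, std)
        if abs(value - mean) <= n_sigma * std:
            return value

def sample_affine(rng, rot_std_deg=10.0, scale_std=0.1, shift_std=0.05):
    """Return a 2x3 affine matrix whose parameters cluster around identity."""
    angle = math.radians(truncated_normal(rng, 0.0, rot_std_deg))
    scale = truncated_normal(rng, 1.0, scale_std)
    tx = truncated_normal(rng, 0.0, shift_std)  # shift as a fraction of width
    ty = truncated_normal(rng, 0.0, shift_std)  # shift as a fraction of height
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return np.array([
        [scale * cos_a, -scale * sin_a, tx],
        [scale * sin_a,  scale * cos_a, ty],
    ])

matrix = sample_affine(np.random.default_rng(0))
```

Compared with sampling uniformly over a range, this concentrates most generated images near the original and makes large distortions rare, which is the behavior the abstract attributes to NRAff.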
Abstract: The analysis of spatially correlated binary data observed on lattices is a topic that attracts scholars from many scientific fields, such as epidemiology, medicine, agriculture, biology, geology and geography. To overcome the difficulties encountered when fitting the autologistic regression model to such data via Bayesian and/or Markov chain Monte Carlo (MCMC) techniques, the Gaussian latent variable model has been introduced into the methodology. Assuming a normal distribution for the latent random variable may not be realistic, and wrong normality assumptions might cause bias in parameter estimates and affect the accuracy of results and inferences. Thus, more flexible prior distributions are needed for the latent variable in spatial models. A review of the recent literature in spatial statistics shows an increasing tendency to present models that involve skew distributions, especially skew-normal ones. In this study, skew-normal latent variable modeling was developed for the Bayesian analysis of spatially correlated binary data that were acquired on uncorrelated lattices. The proposed methodology was applied to inspect the spatial dependency and related factors of tooth caries occurrences in a sample of students of Yasuj University of Medical Sciences, Yasuj, Iran. The results indicated that the skew-normal latent variable model was valid and fitted the caries data adequately.
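For reference, the standard (Azzalini) skew-normal density with location \xi, scale \omega > 0 and shape \alpha is given below. The abstract does not state which parameterization the study uses, so this form is an assumption; \phi and \Phi denote the standard normal density and distribution function.

```latex
f(z \mid \xi, \omega, \alpha)
  = \frac{2}{\omega}\,
    \phi\!\left(\frac{z - \xi}{\omega}\right)
    \Phi\!\left(\alpha\,\frac{z - \xi}{\omega}\right),
  \qquad z \in \mathbb{R}.
```

Setting \alpha = 0 gives \Phi(0) = 1/2, so the density reduces to the normal density with mean \xi and standard deviation \omega. The Gaussian latent variable model is therefore a special case of the skew-normal one.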
Funding: Supported by the Crohn's & Colitis Foundation Senior Research Award (No. 902766 to J.S.), the National Institute of Diabetes and Digestive and Kidney Diseases (No. R01DK105118-01 and R01DK114126 to J.S.), the United States Department of Defense Congressionally Directed Medical Research Programs (No. BC191198 to J.S.), and VA Merit Award BX-19-00 to J.S.
Abstract: Metabolomics, as a research field and a set of techniques, studies the entire set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine in general. In particular, the integration of microbiome and metabolome has revealed the mechanism and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated. Preprocessing/pretreating and normalizing procedures on metabolomics data are usually required before statistical analysis. In this review article, we comprehensively review various methods that are used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values and detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical data analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
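As an illustration of the centering and scaling step, the Python sketch below applies three commonly used options to a samples-by-features matrix: mean centering, autoscaling (unit variance) and Pareto scaling. These are given as familiar examples; the abstract does not list which scaling methods the review covers, and the function name `scale_features` is hypothetical.

```python
import numpy as np

def scale_features(X, method="auto"):
    """Center and scale a samples-by-features matrix column by column.

    method: "center" (mean centering only), "auto" (divide by the standard
    deviation) or "pareto" (divide by the square root of the standard deviation).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant features unscaled
    if method == "center":
        return centered
    if method == "auto":
        return centered / sd
    if method == "pareto":
        return centered / np.sqrt(sd)
    raise ValueError(f"unknown method: {method}")
```

Autoscaling gives every metabolite equal weight, whereas Pareto scaling reduces the dominance of high-abundance metabolites without fully equalizing them.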