Funding: Funded by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Project Number: 18YJC870006).
Abstract: Learning from unlabeled data is a significant challenge that requires handling complicated relationships between nominal values and attributes. Recent research on learning value relations within and between attributes has shown significant improvements in tasks such as clustering and outlier detection. However, typical existing work relies on learning pairwise value relations and weakens or overlooks the direct couplings between multiple attributes. This paper thus proposes two novel and flexible multi-attribute couplings-based distance (MCD) metrics, which learn the multi-attribute couplings and their strengths in nominal data based on information-theoretic quantities (self-information, entropy, and mutual information) for measuring both numerical and nominal distances. MCD enables the application of numerical and nominal clustering methods to nominal data and quantifies the influence of involving and filtering multi-attribute couplings on distance learning and clustering performance. Substantial experiments support these conclusions on 15 data sets against seven state-of-the-art distance measures with various feature selection methods for both numerical and nominal clustering.
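For readers unfamiliar with the three information-theoretic quantities named in this abstract, the following is a minimal sketch of how self-information, entropy, and mutual information can be estimated from nominal attribute columns. It is not the MCD metric itself, whose definition is not given here; the function names and the toy columns are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def self_information(values, v):
    """Self-information -log2 p(v) of nominal value v, with p estimated by frequency."""
    return -log2(values.count(v) / len(values))


def entropy(values):
    """Shannon entropy (in bits) of a nominal attribute column."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two nominal attribute columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy columns (not data from the paper).
colour = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]   # fully determined by colour
size = ["small", "large", "small", "large"]      # independent of colour

coupled = mutual_information(colour, shape)
uncoupled = mutual_information(colour, size)
```

In this toy case `coupled` is one bit, because `shape` is fully determined by `colour`, while `uncoupled` is zero. A coupling-aware distance would presumably weight the first attribute pair more heavily, but how MCD does so is not specified in the abstract.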
Abstract: Aims: Measures of plot-to-plot phylogenetic dissimilarity and beta diversity provide a powerful tool for understanding the complex ecological and evolutionary mechanisms that drive community assembly. Methods: Here, we review the properties of some previously published dissimilarity measures that are based on minimum or average phylogenetic dissimilarity between species in different plots. Important Findings: We first show that some of these measures violate the basic condition that, for two identical plots, the measure takes the value zero. They also violate the condition that the dissimilarity between two identical plots should always be lower than that between two different plots. Such erratic behavior renders these measures unsuitable for measuring plot-to-plot phylogenetic dissimilarity. We next propose a new measure that satisfies these conditions, thus providing a more reasonable way of measuring phylogenetic dissimilarity.
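The first violated condition can be illustrated with a small sketch. It assumes one common average-based form, the mean phylogenetic distance over all species pairs drawn from the two plots; this is not necessarily one of the measures analysed in the paper. The distance values and plot composition are hypothetical.

```python
from itertools import product

# Hypothetical pairwise phylogenetic distances among three species.
DIST = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 10.0,
    frozenset(("B", "C")): 10.0,
}


def species_distance(i, j):
    """Phylogenetic distance between two species; zero for the same species."""
    return 0.0 if i == j else DIST[frozenset((i, j))]


def mean_pairwise_dissimilarity(plot_p, plot_q):
    """Average distance over all (species in P, species in Q) pairs."""
    pairs = list(product(plot_p, plot_q))
    return sum(species_distance(i, j) for i, j in pairs) / len(pairs)


plot = {"A", "C"}
same_plot_value = mean_pairwise_dissimilarity(plot, plot)
```

Here `same_plot_value` is 5.0 rather than 0, because the cross pairs (A, C) and (C, A) contribute to the average even though the two plots are identical.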
Abstract: A large number of studies have recently applied statistical and machine learning techniques to vibration-based damage detection. However, the global character inherent to the limited number of modal properties obtained from operational modal analysis may not be appropriate for early damage, which generally has a local character. The present paper aims to detect this type of damage by using static structural health monitoring (SHM) data and by assuming that early damage produces dead load redistribution. To achieve this objective, a data-driven strategy is proposed that combines advanced statistical and machine learning methods such as principal component analysis, symbolic data analysis and cluster analysis. The analysis showed that, under the noise levels measured on site, the proposed strategy is able to automatically detect stiffness reductions in stay cables of at least 1%.