Index structures that enable efficient similarity queries in high-dimensional spaces are crucial for many applications. This paper discusses the indexing problem for datasets composed of partially clustered data, which occur in many applications. Current index methods are inefficient with partially clustered datasets. The dynamic and adaptive index structure presented here, called a multi-cluster tree (MC-tree), consists of a set of height-balanced trees for indexing. This index structure improves querying efficiency in three ways: 1) Most bounding regions achieve uniform distributions, which results in fewer splits and less overlap than with a single indexing tree. 2) The clusters in the dataset are dynamically detected when the index is updated. 3) The query process does not involve a sequential scan. The MC-tree was shown to perform better than hierarchical and cluster-based indexes on partially clustered datasets.
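The idea of keeping one sub-index per detected cluster can be illustrated with a toy sketch. This is not the MC-tree algorithm from the paper: the class names, the `new_cluster_dist` threshold, and the rule for opening a new cluster are all assumptions made for illustration, and each cluster stores a flat list where the paper would use a height-balanced tree (so this toy still scans points inside the clusters it visits). What it does show is how per-cluster bounding regions let a nearest-neighbor query skip whole clusters.

```python
import math
from dataclasses import dataclass, field


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Cluster:
    # Hypothetical stand-in for one per-cluster tree: a fixed center,
    # a bounding radius, and a flat list of points.
    center: tuple
    radius: float = 0.0
    points: list = field(default_factory=list)

    def add(self, p):
        self.points.append(p)
        self.radius = max(self.radius, dist(self.center, p))


class MultiClusterIndex:
    def __init__(self, new_cluster_dist):
        # Assumed rule: a point farther than this from every existing
        # center starts a new cluster (a crude form of dynamic detection).
        self.new_cluster_dist = new_cluster_dist
        self.clusters = []

    def insert(self, p):
        p = tuple(p)
        best = min(self.clusters, key=lambda c: dist(c.center, p), default=None)
        if best is None or dist(best.center, p) > self.new_cluster_dist:
            best = Cluster(center=p)
            self.clusters.append(best)
        best.add(p)

    def nearest(self, q):
        """Return (point, distance) of the nearest stored point to q."""
        def lower_bound(c):
            # Triangle inequality: no point in c is closer to q than this.
            return max(0.0, dist(c.center, q) - c.radius)

        best_p, best_d = None, math.inf
        for c in sorted(self.clusters, key=lower_bound):
            if lower_bound(c) >= best_d:
                break  # every remaining cluster is at least this far away
            for p in c.points:
                d = dist(p, q)
                if d < best_d:
                    best_p, best_d = p, d
        return best_p, best_d


index = MultiClusterIndex(new_cluster_dist=2.0)
for point in [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8), (5.0, 5.0)]:
    index.insert(point)
print(index.nearest((9.0, 9.0)))
```

Because clusters are visited in order of their lower-bound distance, the search can stop as soon as the next cluster cannot contain a closer point.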
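A quantitative risk assessment (QRA) methodology applicable to railway tunnels is proposed. Scenarios are combined using the event tree method and then analyzed by Monte Carlo simulation (MCS). The method can effectively quantify the casualty risk that may arise in railway tunnels, and it can be used to evaluate and compare candidate infrastructure types or solutions from a safety perspective so that the tunnel system to be finally implemented can be selected.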
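The combination of an event tree with Monte Carlo simulation can be sketched as follows. This is only a minimal illustration and not the methodology of the paper: the two branch events, every probability, and the exponential consequence model are hypothetical placeholders chosen to make the example runnable.

```python
import random

# Hypothetical event tree: branch name -> probability that the branch is True.
# All probabilities and consequence values are illustrative placeholders.
DEFAULT_TREE = {"fire_detected": 0.9, "train_stops_in_tunnel": 0.3}


def sample_fatalities(tree, rng):
    """Walk the event tree once and draw a casualty count for that scenario."""
    detected = rng.random() < tree["fire_detected"]
    stopped = rng.random() < tree["train_stops_in_tunnel"]
    if not stopped:
        return 0  # assumed: a train that leaves the tunnel causes no casualties
    mean = 2.0 if detected else 10.0  # assumed mean consequence per scenario
    return int(rng.expovariate(1.0 / mean))


def mean_fatalities_per_event(tree, n_trials=20_000, seed=0):
    """Monte Carlo estimate of the expected casualties per initiating event."""
    rng = random.Random(seed)
    total = sum(sample_fatalities(tree, rng) for _ in range(n_trials))
    return total / n_trials


# Compare two hypothetical candidate designs that differ in detection reliability.
baseline = mean_fatalities_per_event(DEFAULT_TREE)
improved = mean_fatalities_per_event({**DEFAULT_TREE, "fire_detected": 0.99})
print(baseline, improved)
```

Running the same simulation with different branch probabilities is one simple way to compare candidate solutions on a common risk measure.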
Funding: Supported by the Chinese National Key Fundamental Research Program (No. G1998030414), the National Natural Science Foundation of China (No. 79990580), and the "985" Program of Tsinghua University