The problem of continuously monitoring multiple K-nearest neighbor (K-NN) queries over dynamic object and query datasets is valuable for many location-based applications. A practical method is to partition the data space into grid cells, index both the object table and the query table by this grid structure, and solve the problem by periodically joining each cell of objects with the queries whose influence regions intersect that cell. In the worst case, every object cell is accessed once. Object and query cache strategies are proposed to further reduce the I/O cost. With the object cache strategy, queries that remain static in the current processing cycle seldom incur I/O cost and can be answered quickly. The main I/O cost comes from moving queries; the query cache strategy restricts their search regions by using the current results of queries held in the main-memory buffer. Queries can share not only accesses to object pages but also their influence regions. A theoretical analysis of the expected I/O cost is presented, and in the experimental results the I/O cost is about 40% of that of the SEA-CNN method.
When a pattern recognition system is established, different training samples affect the classifier differently. Previous protein disulfide bond prediction methods selected their training samples randomly, which reduced the prediction accuracy of protein contact. To improve the effect of the training samples, this paper proposes a protein disulfide bond prediction method based on pattern selection and a Radial Basis Function (RBF) neural network. The attributes related to protein disulfide bonds are extracted and coded, and pattern selection is used to choose training samples from the coded samples in order to improve the precision of disulfide bond prediction. 200 proteins with disulfide bond structure from the PDB database are encoded with this encoding approach and taken as the pool of training samples. Samples are then chosen by pattern selection based on the nearest neighbor algorithm, and the corresponding prediction models are built with the RBF neural network. The simulation results indicate that this pattern selection method can improve the prediction accuracy of protein disulfide bonds.
Given two spatial datasets T and Q, aggregate nearest neighbor (ANN) search retrieves the segment(s) of one or more trajectories in T that have the minimum aggregate distance to the points in Q. With large numbers of trajectories, this process is very time-consuming because of consecutive page loads. An approximate method for finding the segments with minimum aggregate distance is proposed to improve the response time. The scalable and efficient trajectory index (SETI) structure is used to index large volumes of trajectories, with some refinements to the temporal index of SETI to improve the performance of the proposed method. The experiments were performed with different numbers of query points and different percentages of the dataset. They show that the proposed method, besides having acceptable precision, can reduce the computation time significantly. They also show that, among load time, ANN, and computing the convex hull and centroid, the main fraction of the search time is spent on ANN.
To ensure that the large-scale application of photovoltaic (PV) power generation does not affect the stability of the grid, accurate PV power generation forecasting is essential. A short-term PV power generation forecast method is proposed that combines K-means++, grey relational analysis (GRA), and support vector regression (SVR) based on feature selection (Hybrid Kmeans-GRA-SVR, HKGSVR). The historical power data were clustered with the multi-index K-means++ algorithm and divided into ideal and non-ideal weather. The GRA algorithm was used to match the similar day and the nearest neighbor similar day of the prediction day. Appropriate input features were then selected for each weather type to train the SVR model. Under ideal weather, the average values of MAE, RMSE and R² were 0.8101, 0.9608 kW and 99.66%, respectively, and the method reduced the average training time by 77.27% compared with the standard SVR model. Under non-ideal weather, the average values of MAE, RMSE and R² were 1.8337, 2.1379 kW and 98.47%, respectively, and the method reduced the average training time by 98.07% compared with the standard SVR model. The experimental results show that the prediction accuracy of the proposed model is significantly improved compared with the other five models, which verifies the effectiveness of the method.
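The three evaluation metrics named in the abstract above (MAE, RMSE, R²) follow standard definitions. A minimal sketch of those definitions is given here for reference; the function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (hypothetical measured vs. forecast power, kW).
measured = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 2.0, 3.0, 6.0]
scores = (mae(measured, forecast), rmse(measured, forecast), r2(measured, forecast))
```

MAE and RMSE carry the unit of the forecast quantity, while R² is dimensionless and is often reported as a percentage.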
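Underwater acoustic target signals are complex, samples are difficult to obtain, and the data carry a large amount of uncertain information. To address this, a new evidential K-nearest neighbor recognition algorithm (Evidence K Nearest Neighbor, EK-NN) is studied. First, within the training samples of each class of underwater acoustic targets, the K nearest neighbors of the target to be recognized are selected according to feature distance, and their basic belief assignment functions are constructed. The Dempster-Shafer (D-S) rule from evidence theory is then used to combine the neighbor evidence within each class. Finally, proportional conflict redistribution rule 5 (PCR5) is applied to fuse the combined evidence of all classes, and the class of the target is decided from the fusion result and the established classification rule. Using measured underwater acoustic target data, the new algorithm is compared with several other common underwater acoustic target recognition algorithms, and the results show that the new algorithm can effectively improve recognition accuracy.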
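The D-S combination step mentioned in the abstract above can be illustrated with Dempster's rule for two mass functions. The sketch below is a generic textbook version, not the paper's implementation: the dictionary representation, the function name `dempster_combine`, and the example masses are assumptions, and the PCR5 fusion step is not shown.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of class labels (a focal element)
    to its mass; the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            intersection = a & b
            if intersection:
                combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
            else:
                # Mass assigned to contradictory hypotheses.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; Dempster's rule is undefined")
    # Normalize by the non-conflicting mass.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

# Hypothetical evidence from two neighbors over the frame {"A", "B"}:
# each neighbor supports one class and leaves the rest on the whole frame.
frame = frozenset({"A", "B"})
neighbor_1 = {frozenset({"A"}): 0.6, frame: 0.4}
neighbor_2 = {frozenset({"B"}): 0.5, frame: 0.5}
fused = dempster_combine(neighbor_1, neighbor_2)
```

Dempster's rule discards the conflicting mass through normalization, whereas PCR5 redistributes it proportionally to the hypotheses involved in the conflict, which is presumably why the abstract uses PCR5 for the cross-class fusion.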
Funding: Project (No. ABA048) supported by the Natural Science Foundation of Hubei Province, China
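Abstract: To improve the estimation of link travel time from SCATS traffic data, the inconsistency of the acquisition time intervals in actual SCATS traffic data was analyzed and a virtual time series of SCATS traffic data was constructed. The principal factors extracted by factor analysis with a cumulative contribution rate above 85% were used as the elements of the traffic pattern feature vector. Euclidean distance was used as the similarity measure between the current traffic pattern feature vector and the historical traffic pattern feature vectors. The number of nearest neighbors of the current traffic pattern was chosen to minimize the link travel time estimation error. The reciprocals of the distances between traffic patterns were normalized to determine the travel time weights of the similar traffic patterns, and a link travel time estimation method based on SCATS traffic data was designed. A case study shows that, compared with multiple linear regression, the mean absolute error, mean absolute percentage error, and root mean square error of the link travel times estimated by the proposed method decreased on average by 9.68 s, 8.07%, and 4.5 s, respectively.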
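The weighting scheme described in the abstract above (Euclidean distance between traffic pattern feature vectors, then normalized reciprocal distances as travel time weights) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `estimate_travel_time`, the data layout, the handling of an exact match, and the example values are all hypothetical and not taken from the paper.

```python
import math

def estimate_travel_time(current, history, k):
    """Estimate travel time as an inverse-distance weighted mean over
    the k historical patterns nearest to the current pattern.

    current: feature vector of the current traffic pattern
    history: list of (feature_vector, travel_time_seconds) pairs
    k:       number of nearest neighbors to use
    """
    # Euclidean distance from the current pattern to each historical pattern.
    scored = sorted(
        ((math.dist(current, features), travel_time) for features, travel_time in history),
        key=lambda pair: pair[0],
    )[:k]

    # Assumption: an identical historical pattern is returned directly,
    # since its reciprocal distance would be undefined.
    for distance, travel_time in scored:
        if distance == 0.0:
            return travel_time

    # Normalize the reciprocal distances so that the weights sum to 1.
    inverse = [1.0 / distance for distance, _ in scored]
    total = sum(inverse)
    return sum(w / total * travel_time for w, (_, travel_time) in zip(inverse, scored))

# Hypothetical two-factor feature vectors with travel times in seconds.
history = [((1.0, 0.0), 100.0), ((2.0, 0.0), 130.0), ((10.0, 0.0), 500.0)]
estimate = estimate_travel_time((0.0, 0.0), history, k=2)
```

In the abstract, k is selected by minimizing the estimation error; here it is simply passed in as a parameter.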