With the increasing number of web services, it becomes a difficult task for an ordinary user to select an appropriate service. Hence, it is conventional that users in a digital community network take part in a collabo...With the increasing number of web services, it becomes a difficult task for an ordinary user to select an appropriate service. Hence, it is conventional that users in a digital community network take part in a collaborative mechanism for the purpose of service selection. The participation usually brings unnecessary burdens for users, such as giving opinions, storing service information. Extra communication overhead hinders the performance of the network. Thus, the community administrators are facing a problem of how to obtain an overall service selection result for the whole community readily and effectively. To address this problem, we present a k-median facility location agent model. The model analyzes the procedure of service selection through five entities and six types of messages. Two algorithms are elaborated in pursuit of a global optimization concerning connection costs between users and facilities where services are deployed. To evaluate our model, we conduct extensive simulations and present a detailed analysis of the simulation results.展开更多
In this paper, we study the problem of efficiently computing k-medians over high-dimensional and high speed data streams. The focus of this paper is on the issue of minimizing CPU time to handle high speed data stream...In this paper, we study the problem of efficiently computing k-medians over high-dimensional and high speed data streams. The focus of this paper is on the issue of minimizing CPU time to handle high speed data streams on top of the requirements of high accuracy and small memory. Our work is motivated by the following observation: the existing algorithms have similar approximation behaviors in practice, even though they make noticeably different worst case theoretical guarantees. The underlying reason is that in order to achieve high approximation level with the smallest possible memory, they need rather complex techniques to maintain a sketch, along time dimension, by using some existing off-line clustering algorithms. Those clustering algorithms cannot guarantee the optimal clustering result over data segments in a data stream but accumulate errors over segments, which makes most algorithms behave the same in terms of approximation level, in practice. We propose a new grid-based approach which divides the entire data set into cells (not along time dimension). We can achieve high approximation level based on a novel concept called (1 - ε)-dominant. We further extend the method to the data stream context, by leveraging a density-based heuristic and frequent item mining techniques over data streams. We only need to apply an existing clustering once to computing k-medians, on demand, which reduces CPU time significantly. We conducted extensive experimental studies, and show that our approaches outperform other well-known approaches.展开更多
We study the generalizedk-median version of the warehouse-retailer network design problem(kWRND).We formulate the k-WRND as a binary integer program and propose a 6-approximation randomized algorithm based on Lagrangi...We study the generalizedk-median version of the warehouse-retailer network design problem(kWRND).We formulate the k-WRND as a binary integer program and propose a 6-approximation randomized algorithm based on Lagrangian relaxation.展开更多
在处理雷达信号时,基于密度的空间聚类(Density-based spatial clustering of applications with noise,DBSCAN)分选算法依赖于参数或阈值的选取,影响分选的准确率。为此提出了一种改进的雷达信号脉冲分选算法,在DBSCAN聚类基础上结合了...在处理雷达信号时,基于密度的空间聚类(Density-based spatial clustering of applications with noise,DBSCAN)分选算法依赖于参数或阈值的选取,影响分选的准确率。为此提出了一种改进的雷达信号脉冲分选算法,在DBSCAN聚类基础上结合了K中位最近邻(K-median nearest neighbor,KMNN)算法,通过引入自衰减系数并设置阈值上限对参数值列表进行二次处理,可以自适应根据聚类结果与不同参数时的K值之间的关系确定最优的邻域半径和最少点个数,提高了分选的正确率。通过仿真实验验证了算法利用雷达脉冲描述字特征进行自适应分选的有效性。展开更多
基金This work is supported by Program for the Key Program of NSFC-Guangdong Union Foundation (U1135002), Major national S&T program (2011ZX03005-002), National Natural Science Foundation of China (60872041, 61072066), the Fundamental Research Funds for the Central Universities (JY10000903001, JY10000901034, K5051203010) and the GAD Pre-Research Foundation (9140A 15040210HK61 ).
文摘With the increasing number of web services, it becomes a difficult task for an ordinary user to select an appropriate service. Hence, it is conventional that users in a digital community network take part in a collaborative mechanism for the purpose of service selection. The participation usually brings unnecessary burdens for users, such as giving opinions, storing service information. Extra communication overhead hinders the performance of the network. Thus, the community administrators are facing a problem of how to obtain an overall service selection result for the whole community readily and effectively. To address this problem, we present a k-median facility location agent model. The model analyzes the procedure of service selection through five entities and six types of messages. Two algorithms are elaborated in pursuit of a global optimization concerning connection costs between users and facilities where services are deployed. To evaluate our model, we conduct extensive simulations and present a detailed analysis of the simulation results.
文摘In this paper, we study the problem of efficiently computing k-medians over high-dimensional and high speed data streams. The focus of this paper is on the issue of minimizing CPU time to handle high speed data streams on top of the requirements of high accuracy and small memory. Our work is motivated by the following observation: the existing algorithms have similar approximation behaviors in practice, even though they make noticeably different worst case theoretical guarantees. The underlying reason is that in order to achieve high approximation level with the smallest possible memory, they need rather complex techniques to maintain a sketch, along time dimension, by using some existing off-line clustering algorithms. Those clustering algorithms cannot guarantee the optimal clustering result over data segments in a data stream but accumulate errors over segments, which makes most algorithms behave the same in terms of approximation level, in practice. We propose a new grid-based approach which divides the entire data set into cells (not along time dimension). We can achieve high approximation level based on a novel concept called (1 - ε)-dominant. We further extend the method to the data stream context, by leveraging a density-based heuristic and frequent item mining techniques over data streams. We only need to apply an existing clustering once to computing k-medians, on demand, which reduces CPU time significantly. We conducted extensive experimental studies, and show that our approaches outperform other well-known approaches.
基金supported by National Basic Research Program of China(973 Program)(Grant No.2010CB732501)National Natural Science Foundation of China(Grant No.11071268)China Scholarship Council Scientific Research Common Program of Beijing Municipal Commission of Education(Grant No.KM201210005033)
文摘We study the generalizedk-median version of the warehouse-retailer network design problem(kWRND).We formulate the k-WRND as a binary integer program and propose a 6-approximation randomized algorithm based on Lagrangian relaxation.
文摘在处理雷达信号时,基于密度的空间聚类(Density-based spatial clustering of applications with noise,DBSCAN)分选算法依赖于参数或阈值的选取,影响分选的准确率。为此提出了一种改进的雷达信号脉冲分选算法,在DBSCAN聚类基础上结合了K中位最近邻(K-median nearest neighbor,KMNN)算法,通过引入自衰减系数并设置阈值上限对参数值列表进行二次处理,可以自适应根据聚类结果与不同参数时的K值之间的关系确定最优的邻域半径和最少点个数,提高了分选的正确率。通过仿真实验验证了算法利用雷达脉冲描述字特征进行自适应分选的有效性。