Abstract: A new admission control algorithm that takes the self-similar access characteristics of network traffic into account is proposed. The algorithm builds on a mathematical model of network traffic admission control that can effectively cope with the self-similar characteristics of network requests. It schedules a priority-based differentiated service queue while taking into account factors such as the access characteristics of requests and load information, which keeps admission control smooth. We design a non-linear self-adapting control algorithm by introducing an exponential admission function, thus overcoming the drawbacks of static threshold parameters. Simulation results show that the proposed scheme can effectively improve the resource utilization of the cluster while protecting high-priority services, and that it can also improve system stability and reliability. Key words: Web cluster; admission control; differential service; self-similar; self-adapting. CLC number: TP 393. Foundation item: Supported by the National Natural Science Foundation of China (10375024) and the Hunan Natural Science Foundation of China (03JJY4054). Biography: LIU An-feng (1971-), male, Ph.D. candidate, majoring in network computing, Web QoS.
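The abstract does not state the exponential admission function itself, so the following is only a minimal sketch of the general idea: the admission probability decays smoothly with load instead of switching at a static threshold, and lower-priority classes decay faster. The functional form, the `steepness` constant, and the function names are all assumptions made for illustration, not the authors' formulation.

```python
import math
import random


def admission_probability(load, priority, steepness=2.0):
    """Probability of admitting a request (hypothetical form).

    load      -- current cluster load, clamped to [0, 1]
    priority  -- 1 is the highest class; larger numbers are lower classes
    steepness -- assumed tuning constant controlling how fast admission decays
    """
    load = min(max(load, 0.0), 1.0)
    # Smooth, non-linear decay: no hard cut-off at a fixed load threshold.
    return math.exp(-steepness * priority * load)


def admit(load, priority, rng=random):
    """Randomly admit a request according to its admission probability."""
    return rng.random() < admission_probability(load, priority)
```

With this shape, an idle cluster admits every class, while under heavy load high-priority requests remain much more likely to be admitted than low-priority ones.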
Funding: Supported by the National Natural Science Foundation of China (60175015)
Abstract: Request distribution is a key technology for Web cluster servers. This paper presents a throughput-driven scheduling algorithm (TDSA). The algorithm uses the throughput of the cluster back-ends to evaluate their load and employs a neural network model to predict future load, so that the scheduling system is self-learning and adapts well to changes in load. Moreover, it separates static requests from dynamic requests to make full use of CPU resources, and takes the locality of requests into account to improve the cache hit ratio. Experimental results obtained with the WebBench™ testing tool show better performance for a Web cluster server with TDSA than with traditional scheduling algorithms.
Funding: Supported by the National Natural Science Foundation of China (90204008) and the Science Council of Wuhan (20001001004)
Abstract: There are two kinds of dispatching policies in a content-aware web server cluster: the segregation dispatching policy and the mixture dispatching policy. Traditional scheduling algorithms all adopt the mixture dispatching policy. They do not consider that serving dynamic requests tends to slow down the serving of static requests, or that different requests have different resource demands, so they cannot use the cluster's resources reasonably and effectively. This paper uses a stochastic reward net (SRN) to model and analyze the two dispatching policies, and uses the stochastic Petri net package (SPNP) to simulate the models. The simulation results and practical tests both show that the segregation dispatching policy is better than the mixture dispatching policy. The principle of the segregation dispatching policy can guide the design of efficient scheduling algorithms.
Funding: This work was supported by the National "863" Program of China (No. 2003AA148010) and the National Torch Project of China (No. 2001EB001233).
Abstract: Distributed architectures support increased load on popular web sites by dispatching client requests transparently among multiple servers in a cluster. After analyzing the packet single-rewriting technique and the client address hashing algorithm of ONE-IP, which preserve application sessions, this paper proposes an improved request dispatching algorithm that is simple, effective, and supports dynamic load balancing. In this algorithm, the dispatcher decides which server node will process a request by applying a hash function to the client IP address and comparing the result with the identifier subset assigned to each server; it adjusts the size of each subset according to the performance and current load of the server, so that the resources of all servers are used effectively. Simulation shows that the improved algorithm performs better than the original one.
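The dispatching idea described above can be sketched roughly as follows. The abstract gives no concrete hash function, identifier space, or resizing rule, so the 256-bucket space, the MD5-based hash, the contiguous ranges, and the names `bucket_of`, `assign_ranges`, and `dispatch` are assumptions made for illustration only.

```python
import hashlib
from bisect import bisect_right

BUCKETS = 256  # assumed size of the identifier space


def bucket_of(client_ip):
    """Map a client IP string to a stable identifier in [0, BUCKETS)."""
    digest = hashlib.md5(client_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % BUCKETS


def assign_ranges(weights):
    """Give each server a contiguous identifier range sized by its weight.

    weights maps server name -> non-negative weight (for example, capacity
    scaled down by current load); at least one weight must be positive.
    Returns (uppers, servers), where uppers[i] is the exclusive upper bound
    of the range owned by servers[i].
    """
    total = sum(weights.values())
    servers = list(weights)
    uppers = []
    cumulative = 0.0
    for name in servers:
        cumulative += weights[name]
        uppers.append(round(BUCKETS * cumulative / total))
    uppers[-1] = BUCKETS  # guard against rounding leaving a gap at the end
    return uppers, servers


def dispatch(client_ip, uppers, servers):
    """Pick the server whose identifier range contains the client's bucket."""
    return servers[bisect_right(uppers, bucket_of(client_ip))]
```

Because the hash depends only on the client address, a client keeps reaching the same server while the ranges are unchanged. Resizing the ranges rebalances load but remaps the clients whose buckets change owner, which is a trade-off any such scheme has to manage.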
Abstract: Because the K-means clustering algorithm usually converges to a local optimum and rarely reaches the global optimum, a new web clustering method based on the chaotic social evolutionary programming (CSEP) algorithm is presented. In this method, a cognitive agent inherits a paradigm during clustering and acquires a chaotic mutation operator in the betrayal step. Experiments show that the method not only increases the efficiency of web clustering but also improves its precision.
Funding: The National Natural Science Foundation of China (No. 61402333, 61402242) and the National Science Foundation of Tianjin (No. 15JCQNJC00400)
Abstract: An approach for analyzing the reliability and degradation process of a web server cluster (WSC) is proposed. The reliability process is modeled as a non-homogeneous Markov process (NHMH) composed of several non-homogeneous Poisson processes (NHPPs). The arrival rate of each NHPP corresponds to the system software failure rate, which is expressed using Cox's proportional hazards model (PHM) in terms of the cumulative and instantaneous load of the software. The cumulative load refers to the cumulative execution time of the software, and the instantaneous load denotes the rate at which user requests arrive at a server. The result of the reliability analysis is a time-varying reliability and degradation process over the WSC lifetime. Finally, an evaluation experiment shows the effectiveness of the proposed approach.
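For orientation, the standard form of a proportional hazards intensity with the two load covariates named above can be written as below. The abstract does not give the authors' notation, so the symbols are assumptions; the second formula is the usual NHPP expression for the probability of no failure over an interval.

```latex
% Failure intensity of one server under Cox's proportional hazards model.
% \lambda_0(t): baseline failure rate; z_c(t): cumulative load (cumulative
% execution time); z_i(t): instantaneous load (request arrival rate);
% \beta_c, \beta_i: regression coefficients. All symbol names are assumptions.
\[
  \lambda(t) = \lambda_0(t)\,\exp\bigl(\beta_c\, z_c(t) + \beta_i\, z_i(t)\bigr)
\]
% For an NHPP with intensity \lambda(t), the probability of no failure in (s, s+t]:
\[
  R(t \mid s) = \exp\!\left(-\int_{s}^{s+t} \lambda(u)\,\mathrm{d}u\right)
\]
```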
Abstract: Conceptual clustering is mainly used to address the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and targeting the organizational structure and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyzes the superior performance of the clustering algorithm in efficiency and clustering accuracy. Key words: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering. CLC number: TP 311. Foundation item: Supported by the National "863" Program of China (2002AA111010, 2003AA001032). Biography: WANG Yun-hua (1979-), male, Master candidate, research direction: knowledge engineering and data mining.
Abstract: We combine web usage mining with fuzzy clustering to define the concept of web fuzzy clustering, and then put forward and discuss in detail a web fuzzy clustering processing model. Web fuzzy clustering can be applied to clustering both web users and web pages. Finally, a case study is given, and its result demonstrates the feasibility of using web fuzzy clustering for web page clustering. Key words: web mining; web usage mining; web fuzzy clustering; WFCM. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (90104005). Biography: LIU Mao-fu (1977-), male, Ph.D. candidate, research direction: artificial intelligence, web mining, image mining.
Abstract: Optimal clustering of web documents is known to be a complicated combinatorial optimization problem, and it is hard to develop a generally applicable optimal algorithm. An accelerated simulated annealing algorithm is developed for automatic web document classification. The web document classification problem is addressed as the problem of best describing a match between a web query and a hypothesized web object. The normalized term frequency and inverse document frequency coefficient is used as a measure of the match. Test beds are generated online during the search by transforming model web sites. As a result, web sites can be clustered optimally in terms of the keyword vectors of the corresponding web documents.
Funding: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086); the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714); the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082); and the National Natural Science Foundation of Jiangsu (BK2006094).
Abstract: A new common phrase scoring method is proposed, based on term frequency-inverse document frequency (TFIDF) and the independence of the phrase. Combining the two properties helps identify more reasonable common phrases, which improves the accuracy of clustering. An equation for measuring the independence of a phrase is also proposed in this paper. The new algorithm, which improves the suffix tree clustering (STC) algorithm, is named improved suffix tree clustering (ISTC). To validate the proposed algorithm, a prototype system is implemented and used to cluster several groups of web search results obtained from the Google search engine. Experimental results show that the improved algorithm offers higher accuracy than traditional suffix tree clustering.
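As a rough illustration of the TFIDF half of the score only, the sketch below computes a per-document TF-IDF value for a multi-word phrase over tokenized search snippets. The independence measure is not reproduced because the abstract does not state its equation. The length-normalized term frequency, the natural-log IDF, and the name `phrase_tfidf` are assumptions.

```python
import math


def phrase_tfidf(phrase, documents):
    """Return one TF-IDF score per document for a phrase.

    phrase    -- tuple of tokens, e.g. ("web", "search")
    documents -- list of token lists (for example, tokenized result snippets)
    """
    n = len(phrase)

    def occurrences(doc):
        # Count positions where the phrase appears as a contiguous run.
        return sum(1 for i in range(len(doc) - n + 1)
                   if tuple(doc[i:i + n]) == phrase)

    counts = [occurrences(doc) for doc in documents]
    doc_freq = sum(1 for c in counts if c > 0)
    if doc_freq == 0:
        return [0.0] * len(documents)
    idf = math.log(len(documents) / doc_freq)
    # Term frequency is normalized by document length (an assumed choice).
    return [(c / len(doc) if doc else 0.0) * idf
            for c, doc in zip(counts, documents)]
```

A phrase that occurs in every document gets an IDF of zero under this definition, which matches the intuition that it cannot distinguish one cluster from another.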
Funding: Supported by the National Natural Science Foundation of China under Grant No. 61070111
Abstract: As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and have achieved promising results. However, most web pages have few tags (fewer than 10). This sparsity seriously limits the usage of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods, which indicates that tags could be a better resource for the clustering task.
Funding: Supported by the National Natural Science Foundation of China (61073063, 61173029, 61272182 and 61173030), the Ocean Public Welfare Scientific Research Project of the State Oceanic Administration of China (201105033), and the National Digital Ocean Key Laboratory Open Fund Projects (KLDO201306)
Abstract: To address the load imbalance and poor scalability of single-tier Web server clusters, an efficient load balancing approach is proposed for constructing an N-hierarchical (multi-tier) Web server cluster. In each layer, multiple load balancers receive user requests simultaneously, and different load balancing algorithms are used to construct a highly scalable Web cluster system. In addition, an improved load balancing algorithm is proposed, which dynamically calculates weights according to the utilization of server resources and distributes load to each server according to its load status. The experimental results show that the proposed approach can greatly decrease the load imbalance among the Web servers and reduce the response time of the entire Web cluster system.
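The abstract does not give the weight formula, so the following is only a minimal sketch of utilization-driven weighting in general: each server's weight is its remaining headroom under a weighted mix of CPU, memory, and network utilization, and requests are shared in proportion to those weights. The coefficients and the names `dynamic_weight` and `request_shares` are assumptions.

```python
def dynamic_weight(cpu, memory, network, coeffs=(0.5, 0.3, 0.2)):
    """Weight of a server from its resource utilization (each in [0, 1]).

    coeffs are assumed importance factors for CPU, memory, and network.
    """
    busy = coeffs[0] * cpu + coeffs[1] * memory + coeffs[2] * network
    return max(0.0, 1.0 - busy)  # remaining headroom, never negative


def request_shares(utilization):
    """Map server name -> fraction of new requests it should receive.

    utilization maps server name -> (cpu, memory, network) tuple.
    """
    weights = {name: dynamic_weight(*u) for name, u in utilization.items()}
    total = sum(weights.values())
    if total == 0.0:
        # Every server is saturated: fall back to an even split.
        return {name: 1.0 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}
```

Recomputing the shares from fresh utilization samples at a fixed interval would make the distribution follow the servers' load status over time.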