Request distribution is a key technology for Web cluster servers. This paper presents a throughput-driven scheduling algorithm (TDSA). The algorithm uses the throughput of the cluster back-ends to evaluate their load and employs a neural network model to predict future load, so that the scheduling system is self-learning and adapts well to changes in load. Moreover, it separates static requests from dynamic requests to make full use of CPU resources, and takes request locality into account to improve the cache hit ratio. Experimental results from the WebBench™ testing tool show better performance for a Web cluster server with TDSA than with traditional scheduling algorithms.
A new admission control algorithm that accounts for the self-similar access characteristics of network traffic is proposed. Building on a mathematical model of network traffic admission control that can effectively cope with the self-similarity of network requests, the algorithm schedules a priority-based differential service queue while taking into account factors such as the access characteristics of requests and load information, which keeps admission control smooth. We design a non-linear self-adapting control algorithm by introducing an exponential admission function, thereby overcoming the drawbacks of static threshold parameters. Simulation results show that the proposed scheme can effectively improve the resource utilization of the cluster while protecting high-priority services, and that it can also improve system stability and reliability. Key words: Web cluster; admission control; differential service; self-similar; self-adapting. CLC number: TP 393. Foundation item: Supported by the National Natural Science Foundation of China (10375024) and the Hunan Natural Science Foundation of China (03JJY4054). Biography: LIU An-feng (1971-), male, Ph.D. candidate, majoring in network computing, Web QoS.
Optimal clustering of web documents is known to be a complicated combinatorial optimization problem, and it is hard to develop a generally applicable optimal algorithm. An accelerated simulated annealing algorithm is developed for automatic web document classification. The web document classification problem is addressed as the problem of best describing a match between a web query and a hypothesized web object. The normalized term frequency and inverse document frequency coefficient is used as a measure of the match. Test beds are generated online during the search by transforming model web sites. As a result, web sites can be clustered optimally in terms of the keyword vectors of the corresponding web documents.
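The match measure named in this abstract can be illustrated with a minimal Python sketch. It assumes a plain TF-IDF weighting with L2 normalization and a dot product as the match score; the abstract does not give the exact coefficient, so the function names (`doc_frequencies`, `tfidf`, `match`) and the weighting formula are illustrative assumptions, and the simulated annealing search itself is not shown.

```python
import math
from collections import Counter

def doc_frequencies(docs):
    """Count, for each term, how many documents contain it."""
    return Counter(term for doc in docs for term in set(doc))

def tfidf(tokens, df, n_docs):
    """Return an L2-normalized {term: weight} vector (assumed weighting scheme)."""
    tf = Counter(tokens)
    vec = {
        t: (c / len(tokens)) * math.log(n_docs / df[t])
        for t, c in tf.items()
        if df.get(t, 0) > 0  # terms unseen in the collection carry no weight
    }
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

def match(query_vec, doc_vec):
    """Dot product of two normalized vectors, used here as the match score."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

docs = [
    ["web", "cluster", "server"],
    ["web", "document", "clustering"],
    ["fuzzy", "clustering"],
]
df = doc_frequencies(docs)
vectors = [tfidf(d, df, len(docs)) for d in docs]
query = tfidf(["cluster", "server"], df, len(docs))
scores = [match(query, v) for v in vectors]
```

With this toy collection, only the first document shares terms with the query, so it is the only one with a non-zero score.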
There are two kinds of dispatching policies in a content-aware web server cluster: the segregation dispatching policy and the mixture dispatching policy. Traditional scheduling algorithms all adopt the mixture dispatching policy. They do not consider that serving dynamic requests tends to slow down the serving of static requests, or that different requests have different resource demands, so they cannot use the cluster's resources reasonably and effectively. This paper uses a stochastic reward net (SRN) to model and analyze the two dispatching policies, and uses the stochastic Petri net package (SPNP) to simulate the models. The simulation results and practical tests both show that the segregation dispatching policy is better than the mixture dispatching policy. The principle of the segregation dispatching policy can guide the design of efficient scheduling algorithms.
In this paper, an improved algorithm named STC-I is proposed for Chinese Web page clustering. Based on the characteristics of the Chinese language, it adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when the two are used to cluster Chinese Web pages. Key words: clustering; suffix tree; Web mining. CLC number: TP 311. Foundation item: Supported by the National Information Industry Development Foundation of China. Biography: YANG Jian-wu (1973-), male, Ph.D., research direction: information retrieval and text mining.
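Crop diseases are among the major agricultural disasters in China; they seriously harm crop growth and development and threaten food security. To gain a macro-level view of how crop disease research is developing and to understand the research frontiers and application hotspots in crop disease monitoring and early warning, this study used bibliometric methods and the VOSviewer visualization software to analyze papers on crop disease monitoring and early warning indexed in the Web of Science Core Collection from 2003 to 2022, providing a theoretical reference for crop disease researchers who track research frontiers and choose research directions. The results show that: the number of publications in this field has risen steadily overall, indicating broad prospects for development; China has published the most papers in this field, but the quality of its research output needs further improvement; core authors have formed stable core research teams, and the most prolific authors come from teams represented by Huang Wenjiang, Zhang Jingcheng, Kang Zhensheng, and Varshney; results are mainly published in Frontiers in Plant Science, Plant Disease, and Computers and Electronics in Agriculture; the main publishing institutions are the USDA Agricultural Research Service, the Chinese Academy of Sciences, and the Chinese Academy of Agricultural Sciences; and disease-resistance gene breeding, PCR-based diagnosis of crop diseases, classification of crop diseases with convolutional neural networks and deep learning, and remote-sensing monitoring of crop vegetation indices have been the focus and hotspots of the field over the past 20 years. Overall, research on crop disease monitoring and early warning has strong application prospects but still faces considerable challenges; existing technical approaches need to be surpassed and multiple technologies integrated to move crop disease monitoring and early warning toward greater intelligence and precision.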
A single Web server can become a bottleneck that affects the availability and stability of a Web service. Ten years ago, adding Web servers to form a Web server cluster was proposed to resolve this problem. In recent years, the concept of cloud computing has developed rapidly and is becoming the future development trend of the IT industry. One characteristic of cloud computing is that it pools large amounts of computing resources to provide users with a unified service. In this paper, we propose a new cloud-based Web server cluster solution built on an existing cloud computing model, Twitter Storm. It involves a new way of handling Web requests from clients and some other new features compared with the traditional Web server cluster. Combined with cloud computing, it could be the new trend for Web server clusters; its feasibility is also described in the paper.
Conceptual clustering is mainly used to address the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and aimed at the institutional framework and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and also analyzes the superior performance of the clustering algorithm in efficiency and clustering accuracy. Key words: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering. CLC number: TP 311. Foundation item: Supported by the National "863" Program of China (2002AA111010, 2003AA001032). Biography: WANG Yun-hua (1979-), male, Master candidate, research direction: knowledge engineering and data mining.
Distributed architectures support increased load on popular web sites by dispatching client requests transparently among multiple servers in a cluster. After analyzing the packet single-rewriting technique and the client address hashing algorithm of ONE-IP, which can preserve application sessions, this paper proposes an improved request dispatching algorithm that is simple, effective, and supports dynamic load balancing. In this algorithm, the dispatcher determines which server node will process a request by applying a hash function to the client IP address and comparing the result with each node's assigned identifier subset; it adjusts the size of each subset according to the performance and current load of the server, so as to use all servers' resources effectively. Simulation shows that the improved algorithm performs better than the original one.
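The dispatching idea in this abstract (hash the client IP, look up which server owns the resulting identifier, and resize each server's identifier subset from its weight) can be sketched in Python as follows. This is only an illustration under stated assumptions: the identifier space size `NUM_IDS`, the CRC32 hash, the contiguous-range layout, and the function names are all hypothetical and not taken from the paper; how a weight is derived from a server's performance and current load is left to the caller.

```python
import ipaddress
import zlib

NUM_IDS = 256  # hypothetical size of the identifier space

def client_id(ip: str) -> int:
    """Hash a client IP address into the identifier space (CRC32 is an assumed choice)."""
    return zlib.crc32(ipaddress.ip_address(ip).packed) % NUM_IDS

def assign_subsets(weights: dict) -> dict:
    """Split the identifier space into contiguous ranges sized in proportion
    to each server's weight (weights must be positive)."""
    total = sum(weights.values())
    names = list(weights)
    subsets, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = NUM_IDS  # the last server takes whatever remains
        else:
            end = min(NUM_IDS, start + round(NUM_IDS * weights[name] / total))
        subsets[name] = range(start, end)
        start = end
    return subsets

def dispatch(ip: str, subsets: dict) -> str:
    """Return the server whose identifier subset contains the client's hash."""
    cid = client_id(ip)
    for name, ids in subsets.items():
        if cid in ids:
            return name
    raise LookupError(f"identifier {cid} is not assigned to any server")

# Equal weights give equal subsets; a heavier weight enlarges that server's subset.
balanced = assign_subsets({"a": 1.0, "b": 1.0})
skewed = assign_subsets({"a": 3.0, "b": 1.0})
target = dispatch("192.0.2.7", balanced)
```

Because the hash depends only on the client address, the same client keeps reaching the same server as long as the subsets are unchanged; resizing the subsets moves only the clients whose identifiers change owner.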
We combine web usage mining with fuzzy clustering to introduce the concept of web fuzzy clustering, and then put forward a web fuzzy clustering processing model, which is discussed in detail. Web fuzzy clustering can be used for clustering web users and web pages. Finally, a case study is given, and the result demonstrates the feasibility of using web fuzzy clustering for web page clustering. Key words: web mining; web usage mining; web fuzzy clustering; WFCM. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (90104005). Biography: LIU Mao-fu (1977-), male, Ph.D. candidate, research direction: artificial intelligence, web mining, image mining.
An approach for web server cluster (WSC) reliability and degradation process analysis is proposed. The reliability process is modeled as a non-homogeneous Markov process (NHMH) composed of several non-homogeneous Poisson processes (NHPPs). The arrival rate of each NHPP corresponds to the system software failure rate, which is expressed using Cox's proportional hazards model (PHM) in terms of the cumulative and instantaneous load of the software. The cumulative load refers to the cumulative execution time of the software, and the instantaneous load denotes the rate at which users' requests arrive at a server. The result of the reliability analysis is a time-varying reliability and degradation process over the WSC lifetime. Finally, an evaluation experiment shows the effectiveness of the proposed approach.
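For orientation, the general textbook form of a Cox proportional hazards failure rate with the two load covariates mentioned in this abstract can be written as below. The symbols (baseline rate \lambda_0, coefficients \beta_1 and \beta_2, covariates z_1 and z_2) are assumed notation for illustration; the paper's own parameterization is not given in the abstract.

```latex
% Assumed notation: z_1(t) = cumulative execution time (cumulative load),
%                   z_2(t) = request arrival rate (instantaneous load)
\lambda(t \mid z) = \lambda_0(t)\,\exp\bigl(\beta_1 z_1(t) + \beta_2 z_2(t)\bigr)

% For an NHPP with this intensity, the expected number of failures by time t is
m(t) = \int_0^t \lambda(s \mid z)\,ds

% and the probability of no failure in the interval (t, t+x] is
R(x \mid t) = \exp\bigl(-\,[\,m(t+x) - m(t)\,]\bigr)
```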
The content-ignorant clustering method has advantages in time and space complexity over content-based methods. In this paper, the authors introduce a unified expanding method for content-ignorant web page clustering that mines the "click-through" log and tries to solve the problem that this log is sparse. The relationship between two expanded nodes is also defined and optimized. Analysis and experiments show that the new method performs better than the standard content-ignorant method. The new method can also work without iterative clustering.
To alleviate the scalability problem caused by growing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on active ant colony clustering. First, an active movement strategy for direction selection and speed, different from the positive strategy employed by other ant colony clustering algorithms, is proposed to construct an active ant colony clustering algorithm, which avoids idle moves and the "flying over the plane" phenomenon and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.
A new method for fuzzy clustering of Web users based on analysis of user interest characteristics is proposed in this article. The method first defines fuzzy page categories according to the links on the index page of the site, then computes the fuzzy degree of cross pages by aggregating Web log data. After that, using the fuzzy comprehensive evaluation method, it constructs user interest vectors from page viewing times and hit frequency, and derives the fuzzy similarity matrix for the Web users from the interest vectors. Finally, it obtains the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method. Key words: Web log mining; fuzzy similarity matrix; fuzzy comprehensive evaluation; fuzzy clustering. CLC number: TP18; TP311; TP391. Foundation item: Supported by the Natural Science Foundation of Heilongjiang Province of China (F0304). Biography: ZHAN Li-qiang (1966-), male, Lecturer, Ph.D., research direction: theory and methods of data mining and database theory.
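The last two steps of this abstract (similarity matrix from interest vectors, then fuzzy clustering) can be sketched in Python under explicit assumptions. The abstract names neither the similarity measure nor the clustering procedure, so this sketch swaps in two common textbook choices: a min/max ratio similarity, and clustering by max-min transitive closure followed by a lambda-cut. The interest vectors are made-up toy data, and all function names are hypothetical.

```python
def similarity_matrix(vectors):
    """Fuzzy similarity r_ij = sum(min) / sum(max) over components (assumed measure)."""
    n = len(vectors)
    r = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            num = sum(min(a, b) for a, b in zip(vectors[i], vectors[j]))
            den = sum(max(a, b) for a, b in zip(vectors[i], vectors[j]))
            r[i][j] = r[j][i] = num / den if den else 1.0
    return r

def transitive_closure(r):
    """Repeat max-min composition of the matrix with itself until it stops changing."""
    n = len(r)
    while True:
        sq = [
            [max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        if sq == r:
            return r
        r = sq

def lambda_cut_clusters(closure, lam):
    """Group users whose closure similarity is at least lam."""
    clusters, seen = [], set()
    for i in range(len(closure)):
        if i in seen:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        seen.update(group)
        clusters.append(group)
    return clusters

# Toy interest vectors for three users over two page categories (made-up data).
interest = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
closure = transitive_closure(similarity_matrix(interest))
clusters = lambda_cut_clusters(closure, 0.5)
```

With the threshold at 0.5, the first two users, whose interests are concentrated on the same category, fall into one cluster and the third user forms its own; lowering the threshold merges clusters.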
Funding (throughput-driven scheduling algorithm, TDSA, paper): Supported by the National Natural Science Foundation of China (60175015).
Funding (segregation vs. mixture dispatching policy paper): Supported by the National Natural Science Foundation of China (90204008) and the Science Council of Wuhan (20001001004).
Funding (improved request dispatching algorithm paper): This work was supported by the National "863" Program of China (No. 2003AA148010) and the National Torch Project of China (No. 2001EB001233).
Funding (web server cluster reliability and degradation paper): The National Natural Science Foundation of China (No. 61402333, 61402242) and the National Science Foundation of Tianjin (No. 15JCQNJC00400).
Funding (incremental Web usage mining with active ant colony clustering paper): Supported by the Natural Science Foundation of Jiangsu Province (BK2005046).