This paper proposes a novel chronically evaluated highest instantaneous priority next processor scheduling algorithm. The currently existing algorithms like first come first serve, shortest job first, round-robin, sho...This paper proposes a novel chronically evaluated highest instantaneous priority next processor scheduling algorithm. The currently existing algorithms like first come first serve, shortest job first, round-robin, shortest remaining time first, highest response ratio next and varying response ratio priority algorithm have some problems associated with them. Some of them can lead to endless waiting or starvation and some of them like round-robin has problem of too many context switches and high waiting time associated with them. In the proposed algorithm, we have taken care of all such problems. As the novel algorithm is capable of achieving as good results as shortest remaining time first algorithm and also it will never lead to starvation.展开更多
To solve the problems of incomplete topic description and repetitive crawling of visited hyperlinks in traditional focused crawling methods,in this paper,we propose a novel focused crawler using an improved tabu searc...To solve the problems of incomplete topic description and repetitive crawling of visited hyperlinks in traditional focused crawling methods,in this paper,we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information(FCITS_OH),where a domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels.To avoid crawling visited hyperlinks and expand the search range,we present an improved tabu search(ITS)algorithm and the strategy of host information memory.In addition,a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks.Experimental results on both tourism and rainstorm disaster domains show that the proposed focused crawlers overmatch the traditional focused crawlers for different performance metrics.展开更多
At present,focused crawler is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks.For most current focused crawling technologies,there are some difficulties in obtaining high-...At present,focused crawler is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks.For most current focused crawling technologies,there are some difficulties in obtaining high-quality crawling results.The main difficulties are the establishment of topic benchmark models,the assessment of topic relevance of hyperlinks,and the design of crawling strategies.In this paper,we use domain ontology to build a topic benchmark model for a specific topic,and propose a novel multiple-filtering strategy based on local ontology and global ontology(MFSLG).A comprehensive priority evaluation method(CPEM)based on the web text and link structure is introduced to improve the computation precision of topic relevance for unvisited hyperlinks,and a simulated annealing(SA)method is used to avoid the focused crawler falling into local optima of the search.By incorporating SA into the focused crawler with MFSLG and CPEM for the first time,two novel focused crawler strategies based on ontology and SA(FCOSA),including FCOSA with only global ontology(FCOSA_G)and FCOSA with both local ontology and global ontology(FCOSA_LG),are proposed to obtain topic-relevant webpages about rainstorm disasters from the network.Experimental results show that the proposed crawlers outperform the other focused crawling strategies on different performance metric indices.展开更多
文摘This paper proposes a novel chronically evaluated highest instantaneous priority next processor scheduling algorithm. The currently existing algorithms like first come first serve, shortest job first, round-robin, shortest remaining time first, highest response ratio next and varying response ratio priority algorithm have some problems associated with them. Some of them can lead to endless waiting or starvation and some of them like round-robin has problem of too many context switches and high waiting time associated with them. In the proposed algorithm, we have taken care of all such problems. As the novel algorithm is capable of achieving as good results as shortest remaining time first algorithm and also it will never lead to starvation.
基金supported by the Guangdong Basic and Applied Basic Research Foundation of China(Nos.2021A1515011974 and 2023A1515011344)the Program of Science and Technology of Guangzhou,China(No.202002030238)。
文摘To solve the problems of incomplete topic description and repetitive crawling of visited hyperlinks in traditional focused crawling methods,in this paper,we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information(FCITS_OH),where a domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels.To avoid crawling visited hyperlinks and expand the search range,we present an improved tabu search(ITS)algorithm and the strategy of host information memory.In addition,a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks.Experimental results on both tourism and rainstorm disaster domains show that the proposed focused crawlers overmatch the traditional focused crawlers for different performance metrics.
基金supported by the Special Foundation of Guangzhou Key Laboratory of Multilingual Intelligent Processing,China(No.201905010008)the Program of Science and Technology of Guangzhou,China(No.202002030238)the Guangdong Basic and Applied Basic Research Foundation,China(No.2021A1515011974)。
文摘At present,focused crawler is a crucial method for obtaining effective domain knowledge from massive heterogeneous networks.For most current focused crawling technologies,there are some difficulties in obtaining high-quality crawling results.The main difficulties are the establishment of topic benchmark models,the assessment of topic relevance of hyperlinks,and the design of crawling strategies.In this paper,we use domain ontology to build a topic benchmark model for a specific topic,and propose a novel multiple-filtering strategy based on local ontology and global ontology(MFSLG).A comprehensive priority evaluation method(CPEM)based on the web text and link structure is introduced to improve the computation precision of topic relevance for unvisited hyperlinks,and a simulated annealing(SA)method is used to avoid the focused crawler falling into local optima of the search.By incorporating SA into the focused crawler with MFSLG and CPEM for the first time,two novel focused crawler strategies based on ontology and SA(FCOSA),including FCOSA with only global ontology(FCOSA_G)and FCOSA with both local ontology and global ontology(FCOSA_LG),are proposed to obtain topic-relevant webpages about rainstorm disasters from the network.Experimental results show that the proposed crawlers outperform the other focused crawling strategies on different performance metric indices.