Abstract: As data grows in size, search engines face new challenges in extracting more relevant content for users' searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user's requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. It addresses the issue of bias by proposing a new ranking algorithm based on PageRank (PR), one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The WPR model was used in three distinct trials to compare the rankings of documents and pages based on one or more user-preference criteria. The findings showed that using multiple criteria to rank final pages is better than using only one, and that some criteria had a greater impact on ranking results than others.
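The core idea of a weighted PageRank can be sketched as follows. This is a minimal illustration, not the authors' exact formulation: it assumes each link `(u, v)` carries a positive weight `w[(u, v)]`, and a page distributes its score over its out-links in proportion to those weights instead of uniformly.

```python
def weighted_pagerank(links, weights, d=0.85, iters=50):
    """links: dict page -> list of out-link targets.
    weights: dict (src, dst) -> positive link weight.
    Returns a dict page -> rank score."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Random-jump term, then weighted redistribution of rank mass.
        new = {p: (1 - d) / n for p in pages}
        for u, targets in links.items():
            total_w = sum(weights[(u, v)] for v in targets)
            if total_w == 0:
                continue
            for v in targets:
                new[v] += d * rank[u] * weights[(u, v)] / total_w
        rank = new
    return rank
```

On a tiny graph where page `a` links to `b` and `c` with weights 3 and 1, `b` ends up ranked above `c`, which is the behaviour the multi-criteria experiments exploit.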
Abstract: The basic idea behind a personalized web search is to deliver search results that are tailored to meet user needs, which is one of the growing concepts in web technologies. The personalized web search presented in this paper exploits implicit feedback on user satisfaction during the user's web browsing history to construct a user profile storing the web pages the user is highly interested in. A weight is assigned to each page stored in the user's profile; this weight reflects the user's interest in the page. We name this weight the relative rank of the page, since it depends on the user issuing the query. The ranking algorithm provided in this paper is therefore based on the principle that the rank assigned to a page is the sum of two rank values, R_rank and A_rank. A_rank is an absolute rank: it is fixed for all users issuing the same query, since it depends only on the link structure of the web and on the keywords of the query. Thus, it can be calculated by the PageRank algorithm suggested by Brin and Page in 1998 and used by the Google search engine. R_rank, the relative rank, is calculated by the methods given in this paper, which depend mainly on recording implicit measures of user satisfaction during the user's previous browsing history.
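The rank combination the abstract describes is simply additive, and can be sketched directly. The dictionary names (`a_rank`, `profile`) are illustrative; pages absent from the user's profile get R_rank = 0.

```python
def final_rank(page, a_rank, profile):
    """Total rank = A_rank (query-dependent, user-independent, e.g. from
    PageRank) + R_rank (the interest weight stored in the user's profile)."""
    return a_rank.get(page, 0.0) + profile.get(page, 0.0)

def personalized_order(results, a_rank, profile):
    """Re-order a result list by A_rank + R_rank, highest first."""
    return sorted(results, key=lambda p: final_rank(p, a_rank, profile),
                  reverse=True)
```

A page with a slightly lower absolute rank can thus overtake another for a user whose profile shows strong interest in it.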
Funding: The Natural Science Foundation of South-Central University for Nationalities (No. YZZ07006)
Abstract: In order to rank search results according to user preferences, a new personalized web page ranking algorithm called PWPR (personalized web page ranking), which adjusts the ranking scores of web pages in accordance with user preferences, is proposed. PWPR assigns initial weights based on user interests and creates virtual links and hubs according to those interests. By measuring user click streams, PWPR incrementally reflects users' preferences in the personalized ranking. To improve ranking accuracy, PWPR also takes collaborative filtering into consideration when a similar query is submitted by users who have similar interests. Detailed simulation results and comparison with other algorithms prove that the proposed PWPR can adaptively provide personalized ranking and information truly relevant to user preferences.
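The incremental click-stream step can be illustrated with an exponential-moving-average update. The update rule, the `alpha` smoothing factor, and the dwell-time cap here are all hypothetical stand-ins — the abstract does not give PWPR's exact formula — but they show the general shape of turning implicit clicks into evolving interest weights.

```python
def update_interest(profile, clicked_page, dwell_seconds,
                    alpha=0.1, max_dwell=300.0):
    """Raise the stored interest weight of a clicked page, using capped
    dwell time as an implicit satisfaction signal in [0, 1].
    Hypothetical update rule, not the paper's exact formulation."""
    signal = min(dwell_seconds, max_dwell) / max_dwell
    old = profile.get(clicked_page, 0.0)
    # Exponential moving average: old evidence decays, new clicks accumulate.
    profile[clicked_page] = (1 - alpha) * old + alpha * signal
    return profile
```

Repeated satisfied clicks on a page gradually raise its weight toward 1, which is the behaviour an incremental personalization scheme needs.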
Abstract: This paper proposes a watermarking algorithm for tamper-proofing of web pages. For a web page, it generates a watermark consisting of a sequence of Space and Tab characters. The watermark is then embedded into the web page after each word and each line. When a watermarked web page is tampered with, the extracted watermark can detect and locate the modifications to the page. In addition, the framework of a watermarked Web Server system is given. Compared with traditional digital signature methods, this watermarking method is more transparent in that there is no need to detach the watermark before displaying web pages. The experimental results show that the proposed scheme is an effective tool for tamper-proofing of web pages.
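The Space/Tab encoding idea can be sketched in a few lines: each watermark bit becomes a space (0) or a tab (1) appended to a line, invisible when the page is rendered. This is a minimal single-line sketch of the encoding only, not the paper's full per-word embedding and localization scheme.

```python
def embed_watermark(line, bits):
    """Append one whitespace character per watermark bit: space=0, tab=1."""
    return line + ''.join(' ' if b == 0 else '\t' for b in bits)

def extract_watermark(line, n_bits):
    """Read back the trailing n_bits whitespace characters as bits."""
    tail = line[-n_bits:]
    return [0 if c == ' ' else 1 for c in tail]
```

Because browsers collapse trailing whitespace, the rendered page is unchanged, which is the transparency advantage the abstract claims over detached signatures.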
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61272540), the National Basic Research Program of China (973 Program) (Grant No. 2013CB329604), the National High Technology Research and Development Program of China (Grant No. 2012AA011005), the Natural Science Foundation of Anhui Province, China (Grant Nos. 11040606M138 and 1208085MF101), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 2011JYXJ1498), and the Fundamental Research Funds for the Central Universities (Grant No. 2011HGQC1012)
Abstract: Cyber-crimes are growing rapidly, so it is important to obtain digital evidence from web pages. Usually, investigators can examine the browser history on the client side and data files on the server side, but both have shortcomings in real criminal investigation. To overcome these weaknesses, this paper designs a web page forensic scheme that snapshots pages from web servers with the help of a web spider. It also designs several steps to improve the trustworthiness of these pages. All the pages are stored in a local database so that they can be presented as reliable evidence in court.
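One standard building block for evidence trustworthiness is to store each snapshot with a timestamp and a cryptographic digest, so later tampering with the stored copy is detectable. This is a sketch of that single preservation step under that assumption; the paper's full trust chain involves more steps than shown here.

```python
import hashlib
import time

def snapshot_record(url, html):
    """Package a fetched page with a fetch timestamp and a SHA-256 digest."""
    return {
        'url': url,
        'fetched_at': time.time(),
        'content': html,
        'sha256': hashlib.sha256(html.encode('utf-8')).hexdigest(),
    }

def verify_record(record):
    """True iff the stored content still matches its recorded digest."""
    digest = hashlib.sha256(record['content'].encode('utf-8')).hexdigest()
    return digest == record['sha256']
```

A court-facing system would additionally sign the digest or anchor it with a trusted third party, but the digest check alone already detects any post-hoc edit of the dumped page.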
Abstract: As part of business translation, web page translation is an effective means of global communication. Accurate and adequate translation of web pages can enhance an enterprise's competitiveness. This paper focuses on the features of Chinese and Western web pages of garment enterprises and on translation strategy under the guidance of mass communication theory. Results illustrate that Chinese texts favor symmetrical phrases rich in cultural connotation, whereas English texts prefer plain language with fewer culture-loaded expressions and focus more on the feelings of the audience. As a result, an audience-oriented translation strategy is strongly recommended so as to maximize the communication effect of enterprise web pages.
Abstract: The usability of an interface is a fundamental issue to elucidate. Many researchers have argued that usability results and recommendations often lack empirical and experimental data. In this research, the usability of web pages is evaluated using several carefully selected statistical models. University web pages are chosen as subjects for this work for ease of comparison and of collecting data. A series of experiments was conducted to investigate the usability and design of university web pages. Prototype web pages were developed according to structured methodologies of web page design and usability. University web pages were evaluated together with the prototype web pages using a questionnaire designed according to Human-Computer Interaction (HCI) heuristics. Nine respondent (user) variables and 14 web page variables (items) were studied. Stringent statistical analysis was adopted to extract the required information from the data acquired, followed by careful interpretation of the statistical results. The analysis of variance (ANOVA) procedure showed significant differences among the university web pages regarding most of the 23 items studied. The Duncan Multiple Range Test (DMRT) showed that the prototype performed significantly better on most of the items. The correlation analysis showed significant positive and negative correlations between many items. The regression analysis revealed that the most significant factors (items) contributing to the best model of university web page design and usability were: multimedia in the web pages, the organisation and design of the web page icons, and graphics attractiveness. The results exposed some limitations of heuristics used in conventional interface system design and proposed some additional heuristics for web page design and usability.
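The one-way ANOVA used to compare questionnaire scores across sites reduces to a single F statistic: between-group variance over within-group variance. The computation is standard and can be written out directly; the groups here stand for per-site rating samples and are illustrative only.

```python
def one_way_anova_F(groups):
    """One-way ANOVA F statistic for k groups of numeric ratings.
    F = MS_between / MS_within; larger F = stronger evidence that
    group means differ (e.g. that sites differ on a usability item)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group sizes times squared mean offsets.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of each sample around its own mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

The F value is then compared against an F distribution with (k-1, n-k) degrees of freedom to decide significance, which is the step the reported ANOVA results rest on.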
Abstract: In this paper, we discuss several issues related to automated classification of web pages, especially text classification of web pages. We analyze feature selection and categorization algorithms for web pages and give some suggestions for web page categorization.
Abstract: Automatic web page classification has become inevitable for web directories due to the multitude of web pages on the World Wide Web. In this paper, an improved term weighting technique is proposed for automatic and effective classification of web pages. The web documents are represented as sets of features. The proposed method selects and extracts the most prominent features, reducing the high-dimensionality problem of the classifier. Proper selection of features from the large set improves the performance of the classifier. The proposed algorithm is implemented and tested on a benchmark dataset. The results show better performance than most existing term weighting techniques.
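For readers unfamiliar with term weighting, the classic TF-IDF baseline that such improved schemes are measured against looks like this. This is the standard textbook formulation, shown for context, not the paper's proposed technique.

```python
import math

def tfidf(docs):
    """TF-IDF weights for a list of tokenized documents.
    Weight of term t in doc d = (freq of t in d / len(d)) * log(N / df(t)),
    so terms appearing in every document get weight 0."""
    n = len(docs)
    df = {}
    for doc in docs:
        for t in set(doc):
            df[t] = df.get(t, 0) + 1
    out = []
    for doc in docs:
        tf = {}
        for t in doc:
            tf[t] = tf.get(t, 0) + 1
        out.append({t: (c / len(doc)) * math.log(n / df[t])
                    for t, c in tf.items()})
    return out
```

Terms common to all pages (like boilerplate navigation words) are zeroed out, which is exactly the property an improved weighting scheme tries to refine further for web documents.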
Funding: The National Natural Science Foundation of China (No. 60082003)
Abstract: Web pages contain more abundant content than pure text, such as hyperlinks, HTML tags, and metadata, so web page categorization differs from pure-text categorization. For Internet Chinese news pages, a practical algorithm for extracting subject concepts from a web page without a thesaurus is proposed. When these category-subject concepts are incorporated into a knowledge base, web pages can be classified by a hybrid algorithm, with an experimental corpus extracted from the Xinhua net. Experimental results show that categorization performance is improved by using web page features.
Funding: 2021 University-Level Undergraduate High-Quality Curriculum Construction Reform Project of Wuyi University: Web Design and Website Construction (Project number: 5071700304C8)
Abstract: "Web Design and Website Construction" is a core professional course for e-commerce majors. This article explores how to integrate ideological and political education into the teaching of this course from three aspects: the necessity of such education, its construction goals, and its implementation paths. This not only improves students' professional and technical skills, but also guides students to establish a correct outlook on life and correct values, and cultivates the comprehensive development of students' overall literacy.
Funding: Project supported by the National Natural Science Foundation of China (No. 61471314) and the Welfare Technology Research Project of Zhejiang Province, China (No. LGG18F010003)
Abstract: Precise web page classification can be achieved by evaluating features of web pages, and the structural features of web pages are effective complements to their textual features. Various classifiers have different characteristics, and multiple classifiers can be combined so that they complement one another. In this study, a web page classification method based on heterogeneous features and a combination of multiple classifiers is proposed. Instead of computing the frequency of HTML tags, we exploit the tree-like structure of HTML tags to characterize the structural features of a web page. Heterogeneous textual features and the proposed tree-like structural features are converted into vectors and fused. Confidence is proposed here as a criterion for comparing the classification results of different classifiers, calculated from the classification accuracy on a set of samples. Multiple classifiers are combined based on confidence under different decision strategies, such as voting, confidence comparison, and direct output, to give the final classification results. Experimental results demonstrate that on the Amazon dataset, 7-web-genres dataset, and DMOZ dataset, the accuracies are increased to 94.2%, 95.4%, and 95.7%, respectively. The fusion of textual features with the proposed structural features is a comprehensive approach, and the accuracy is higher than when only textual features are used. At the same time, the accuracy of web page classification is improved by combining multiple classifiers, and is higher than those of related web page classification algorithms.
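Two of the decision strategies named in the abstract, voting and confidence comparison, can be sketched as follows. The confidence values here are assumed to be per-classifier accuracies measured on a held-out sample set, as the abstract describes; the exact decision logic of the paper may differ.

```python
def combine_by_vote(predictions):
    """Majority vote across classifiers, per sample.
    predictions: list of label lists, one inner list per classifier."""
    n_samples = len(predictions[0])
    out = []
    for j in range(n_samples):
        votes = [p[j] for p in predictions]
        out.append(max(set(votes), key=votes.count))
    return out

def combine_by_confidence(predictions, confidences):
    """'Confidence comparison': take every label from the classifier whose
    held-out accuracy (confidence) is highest."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return predictions[best]
```

Voting helps when classifiers make independent errors; confidence comparison helps when one classifier clearly dominates on the validation samples.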
Abstract: The number of Internet users and the number of web pages being added to the WWW increase dramatically every day. It is therefore necessary to classify web pages into web directories automatically and efficiently. This helps search engines provide users with relevant results and quick retrieval. As web pages are represented by thousands of features, feature selection helps web page classifiers resolve this large-scale dimensionality problem. This paper proposes a new feature selection method using Ward's minimum variance measure. This measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark dataset, namely WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
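Ward's minimum variance criterion greedily merges the pair of clusters whose union least increases total within-cluster variance. A small pure-Python agglomerative sketch of that criterion (feature vectors and the target cluster count are illustrative; the paper's representative-selection step within each cluster is not shown):

```python
def ward_clusters(vectors, k):
    """Agglomerative clustering with Ward's minimum-variance merge cost,
    stopping when k clusters remain. vectors: list of equal-length
    feature vectors; returns clusters as lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]

    def centroid(c):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in c) / len(c) for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster variance caused by merging a and b.
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Features whose value profiles land in the same cluster are treated as redundant; keeping one representative per cluster is what shrinks the feature set before classification.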
Funding: Supported by the Open Research Program of the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education (K93-9-2014-04B), and the National Natural Science Foundation of China (61170322, 61572263, 61302157)
Abstract: To understand website complexity deeply, a web page complexity measurement system is developed. The system measures the complexity of a web page at two levels, transport level and content level, using a packet-trace-based approach rather than server or client logs, since packet traces surpass other sources in the amount of information they contain. Quantitative analyses show that different categories of web pages have different complexity characteristics. Experimental results show that a news web page usually loads many more elements, at more access levels, from many more web servers within diverse administrative domains, over many more concurrent transmission control protocol (TCP) flows. More than half of the education pages each involve only a few logical servers, and most elements of such a web page are fetched from only one or two logical servers. The number of content types for web game traffic after login is usually the smallest. The system can help web page designers design more efficient web pages, and help researchers and Internet users understand communication details.
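The content-level metrics the abstract compares (element count, server count, content-type mix) reduce to a simple aggregation once a page load has been reconstructed from the trace. The `(server, content_type)` record format below is a simplified stand-in for the system's actual trace reconstruction:

```python
def page_complexity(requests):
    """Summarize one page load from a list of (server, content_type)
    request records reconstructed from a packet trace."""
    servers = set()
    type_counts = {}
    for server, ctype in requests:
        servers.add(server)
        type_counts[ctype] = type_counts.get(ctype, 0) + 1
    return {'n_elements': len(requests),
            'n_servers': len(servers),
            'n_content_types': len(type_counts)}
```

Applied per category, these summaries reproduce the kind of contrast reported above: news pages yield high counts on all three metrics, education pages cluster on one or two servers.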
Abstract: Web page classification is an important application in many fields of Internet information retrieval, such as directory classification and vertical search. Methods based on query logs, a lightweight version of web page classification, can avoid crawling web content, making them relatively efficient, but the sparsity of user click data makes such logs difficult to use directly for constructing a classifier. To solve this problem, we explore the semantic relations among different queries through word embedding and propose three improved graph-structure classification algorithms. To reflect the semantic relevance between queries, we first map each user query into a low-dimensional space according to its query vector. Then, we calculate the uniform resource locator (URL) vector according to the relationship between the query and the URL. Finally, we use the improved label propagation algorithm (LPA) and the bipartite graph expansion algorithm to classify the unlabeled web pages. Experiments show that our methods achieve about a 20% greater increase in F1-value than other web page classification methods based on query logs.
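The LPA baseline underlying this approach can be sketched on a query-URL graph: labeled seed nodes keep their labels, and every other node repeatedly takes the majority label of its neighbours. This shows plain label propagation only; the paper's improvements (embedding-derived semantic edges, bipartite expansion) are not reproduced here.

```python
def label_propagation(edges, labels, iters=20):
    """Propagate labels over an undirected graph to a fixed point.
    edges: list of (u, v) pairs (e.g. query-URL click pairs);
    labels: dict node -> label for the seed nodes, which stay fixed."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)
    lab = dict(labels)
    for _ in range(iters):
        changed = False
        for node in nbrs:
            if node in labels:   # seed labels stay fixed
                continue
            votes = [lab[n] for n in nbrs[node] if n in lab]
            if votes:
                top = max(set(votes), key=votes.count)
                if lab.get(node) != top:
                    lab[node] = top
                    changed = True
        if not changed:
            break
    return lab
```

With two labeled queries, labels flow through shared URLs to unlabeled queries and pages, which is how sparse click data still yields page-level categories.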