Abstract: As data grows in size, search engines face new challenges in extracting more relevant content for users’ searches. As a result, a number of retrieval and ranking algorithms have been employed to ensure that the results are relevant to the user’s requirements. Unfortunately, most existing indexes and ranking algorithms crawl documents and web pages based on a limited set of criteria designed to meet user expectations, making it impossible to deliver exceptionally accurate results. This study therefore investigates and analyses how search engines work, as well as the elements that contribute to higher ranks. The paper addresses the issue of bias by proposing a new ranking algorithm based on the PageRank (PR) algorithm, one of the most widely used page ranking algorithms. We propose weighted PageRank (WPR) algorithms to test the relationship between these various measures. The Weighted PageRank (WPR) model was used in three distinct trials to compare the rankings of documents and pages based on one or more user preference criteria. The findings of using the Weighted PageRank model showed that ranking final pages on multiple criteria is better than using only one, and that some criteria had a greater impact on ranking results than others.
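The abstract does not spell out the weighting formula, so the following minimal Python sketch uses the widely cited weighted PageRank recurrence of Xing and Ghorbani, WPR(u) = (1 - d) + d * sum over in-linking v of WPR(v) * W_in(v,u) * W_out(v,u); the toy graph, damping factor d, and iteration count are illustrative assumptions, not the paper's exact setup.

    # Minimal weighted PageRank sketch (assumed formulation, not necessarily the paper's).
    # graph maps each page to the set of pages it links to.
    def weighted_pagerank(graph, d=0.85, iterations=50):
        in_links = {p: set() for p in graph}
        for p, outs in graph.items():
            for q in outs:
                in_links[q].add(p)
        rank = {p: 1.0 for p in graph}
        for _ in range(iterations):
            new_rank = {}
            for u in graph:
                total = 0.0
                for v in in_links[u]:
                    ref = graph[v]  # reference pages of v (the pages v links to)
                    # W_in: u's share of the in-links held by v's reference pages
                    w_in = len(in_links[u]) / max(sum(len(in_links[p]) for p in ref), 1)
                    # W_out: u's share of the out-links held by v's reference pages
                    w_out = len(graph[u]) / max(sum(len(graph[p]) for p in ref), 1)
                    total += rank[v] * w_in * w_out
                new_rank[u] = (1 - d) + d * total
            rank = new_rank
        return rank

    # Toy link graph: A and B link to C; C links back to A.
    print(weighted_pagerank({"A": {"C"}, "B": {"C"}, "C": {"A"}}))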
Abstract: The basic idea behind a personalized web search is to deliver search results that are tailored to meet user needs, which is one of the growing concepts in web technologies. The personalized web search presented in this paper is based on exploiting the implicit feedback of user satisfaction during her web browsing history to construct a user profile storing the web pages the user is highly interested in. A weight is assigned to each page stored in the user’s profile; this weight reflects the user’s interest in the page. We name this weight the relative rank of the page, since it depends on the user issuing the query. The ranking algorithm provided in this paper is therefore based on the principle that the rank assigned to a page is the sum of two rank values, R_rank and A_rank. A_rank is an absolute rank: it is fixed for all users issuing the same query, since it depends only on the link structure of the web and on the keywords of the query. It can thus be calculated by the PageRank algorithm suggested by Brin and Page in 1998 and used by the Google search engine. R_rank, the relative rank, is calculated by the methods given in this paper, which depend mainly on recording implicit measures of user satisfaction during her previous browsing history.
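As a hedged illustration of the rank combination just described (final rank = A_rank + R_rank), the sketch below uses assumed dictionary structures for the user profile and for the query's absolute scores; the paper's actual construction of R_rank from implicit feedback is not reproduced.

    # Hypothetical sketch: final rank = A_rank (query-dependent, same for all users)
    # + R_rank (per-user weight learned from browsing history).
    def personalized_rank(page, query_scores, user_profile):
        a_rank = query_scores.get(page, 0.0)   # absolute rank, e.g. from PageRank
        r_rank = user_profile.get(page, 0.0)   # relative rank from implicit feedback
        return a_rank + r_rank

    profile = {"example.com/tutorial": 0.6}    # weights from the user's profile
    scores = {"example.com/tutorial": 0.3, "example.com/news": 0.5}
    ranked = sorted(scores, key=lambda p: personalized_rank(p, scores, profile), reverse=True)
    print(ranked)  # the tutorial page outranks the news page for this user (0.9 vs 0.5)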
Funding: The Natural Science Foundation of South-Central University for Nationalities (No. YZZ07006)
Abstract: In order to rank search results according to user preferences, a new personalized web page ranking algorithm called PWPR (personalized web page ranking) is proposed, built on the idea of adjusting the ranking scores of web pages in accordance with user preferences. PWPR assigns the initial weights based on user interests and creates virtual links and hubs according to those interests. By measuring user click streams, PWPR incrementally reflects users’ preferences in the personalized ranking. To improve ranking accuracy, PWPR also takes collaborative filtering into consideration when a similar query is submitted by users who have similar interests. Detailed simulation results and comparison with other algorithms prove that the proposed PWPR can adaptively provide personalized ranking and information truly relevant to user preferences.
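A rough sketch of the click-stream feedback idea follows, with assumed names and an assumed update rule, since the abstract gives no formula: each click nudges the clicked page's interest weight upward, and the adjusted score blends the base rank with the learned weight.

    # Hypothetical incremental update of per-user interest weights from clicks.
    def record_click(weights, page, lr=0.1):
        # Move the page's weight a small step toward 1.0.
        weights[page] = weights.get(page, 0.0) + lr * (1.0 - weights.get(page, 0.0))

    def personalized_score(base_rank, weights, page, alpha=0.5):
        # Blend the query-independent rank with the user's learned interest.
        return (1 - alpha) * base_rank[page] + alpha * weights.get(page, 0.0)

    weights = {}
    record_click(weights, "p1")
    record_click(weights, "p1")
    base = {"p1": 0.4, "p2": 0.7}
    print(sorted(base, key=lambda p: personalized_score(base, weights, p), reverse=True))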
Abstract: The usability of an interface is a fundamental issue to elucidate. Many researchers have argued that usability results and recommendations often lack empirical and experimental data. In this research, the usability of web pages is evaluated using several carefully selected statistical models. University web pages were chosen as subjects for this work for ease of comparison and ease of collecting data. A series of experiments was conducted to investigate the usability and design of the university web pages. Prototype web pages were developed according to structured methodologies of web page design and usability. The university web pages were evaluated, together with the prototype web pages, using a questionnaire designed according to Human Computer Interaction (HCI) heuristics. Nine respondent (user) variables and 14 web page variables (items) were studied. Stringent statistical analysis was adopted to extract the required information from the data acquired, followed by an augmented interpretation of the statistical results. The analysis of variance (ANOVA) procedure showed that there were significant differences among the university web pages regarding most of the 23 items studied. The Duncan Multiple Range Test (DMRT) showed that the prototype performed significantly better on most of the items. The correlation analysis showed significant positive and negative correlations between many items. The regression analysis revealed that the most significant factors (items) contributing to the best model of university web page design and usability were: multimedia in the web pages, the organisation and design of the web page icons (alone), and graphics attractiveness. The results exposed some of the limitations of heuristics used in conventional interface systems design and proposed some additional heuristics for web page design and usability.
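As a hedged illustration of the ANOVA step (on made-up scores, not the authors' data), a one-way ANOVA comparing questionnaire ratings of three hypothetical university pages can be run with SciPy; a small p-value indicates significant differences among the pages.

    # Illustrative one-way ANOVA over invented usability scores for three web pages.
    from scipy.stats import f_oneway

    page_a = [4.1, 3.8, 4.4, 4.0, 3.9]   # questionnaire ratings on a 1-5 scale
    page_b = [3.2, 3.0, 3.5, 2.9, 3.1]
    page_c = [4.6, 4.8, 4.5, 4.7, 4.4]

    stat, p = f_oneway(page_a, page_b, page_c)
    print(f"F = {stat:.2f}, p = {p:.4f}")  # p < 0.05 -> significant differences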
Abstract: In this paper, we discuss several issues related to automated classification of web pages, especially text classification of web pages. We analyze feature selection and categorization algorithms for web pages and give some suggestions for web page categorization.
Funding: 2021 University-Level Undergraduate High-Quality Curriculum Construction Reform Project of Wuyi University: Web Design and Website Construction (Project number: 5071700304C8)
Abstract: “Web Design and Website Construction” is a core professional course for e-commerce majors. This article explores how to integrate ideological and political education into the teaching of the course from three aspects: the necessity of ideological and political education construction, the construction goals, and the implementation paths. This not only improves students’ professional and technical skills, but also guides students to establish a correct outlook on life and values, and cultivates the all-round development of students’ comprehensive literacy.
Abstract: Web design is a key course for computer majors in colleges and universities, characterized by its comprehensiveness and practicability. In order to ensure the quality of web design teaching, teachers need to formulate appropriate teaching strategies in combination with the teaching content and student characteristics. This paper summarizes the problems existing in the teaching of computer web design in colleges and universities and studies effective strategies and related aspects of such teaching, in the hope of providing guidance and information for the teachers concerned.
Abstract: This paper proposes a watermarking algorithm for tamper-proofing of web pages. For a web page, it generates a watermark consisting of a sequence of Space and Tab characters. The watermark is then embedded into the web page after each word and each line. When a watermarked web page is tampered with, the extracted watermark can detect and locate the modifications to the page. In addition, the framework of a watermarked Web Server system is given. Compared with traditional digital signature methods, this watermarking method is more transparent in that there is no need to detach the watermark before displaying web pages. The experimental results show that the proposed scheme is an effective tool for tamper-proofing of web pages.
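A minimal sketch of the whitespace-embedding idea: one bit is encoded per word as a trailing space (0) or tab (1). The payload and its cyclic repetition are illustrative assumptions, not the paper's exact scheme, and line-level embedding is omitted.

    # Hypothetical space/tab watermark: append one whitespace character per word.
    def embed(text, bits):
        words = text.split(" ")
        return "".join(w + ("\t" if bits[i % len(bits)] else " ")
                       for i, w in enumerate(words))

    def extract(marked):
        # One bit per word: read back each whitespace separator.
        return [1 if ch == "\t" else 0 for ch in marked if ch in (" ", "\t")]

    payload = [1, 0, 1]
    marked = embed("tamper proof web page", payload)
    print(extract(marked))  # [1, 0, 1, 1]: the payload repeated over four words

A tampered copy would yield a bit sequence that disagrees with the expected payload at the altered positions, which is what allows modifications to be located.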
Funding: Education Committee Foundation of Beijing (grant number: KM200610005022); Young Backbone Teacher Foundation of Beijing (grant number: 102KB00845)
Abstract: This paper describes the development of a new ECG tele-monitoring method and system based on an embedded web server. The system consists of ECG recorders with a network interface and the embedded web server, internet networks, and computers, and operates in browser/server (B/S) mode. The ECG recorder was designed around an ARM9 processor (S3C2410X) and an embedded operating system (Linux). Once the ECG recorder has been connected to the internet, medical experts can access the recorder’s server, monitor ECG signals, and diagnose patients by browsing the dynamic web pages in the embedded web server. The experimental results reveal that the designed system is stable, reliable, and suitable for use in real-time ECG tele-monitoring for both family and community health care.
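The concrete firmware is hardware-specific, but the browser/server idea can be suggested with a tiny stand-alone HTTP server that publishes the latest samples as a dynamic page; the handler name, port, and random sample source below are illustrative assumptions.

    # Minimal sketch of the B/S idea: a tiny server publishes recent ECG samples.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    import random

    def latest_ecg_samples(n=8):
        # Stand-in for reading the recorder's acquisition buffer.
        return [round(random.uniform(-0.5, 1.5), 2) for _ in range(n)]

    class ECGHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = "ECG samples (mV): " + ", ".join(map(str, latest_ecg_samples()))
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), ECGHandler).serve_forever()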
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61272540); the National Basic Research Program of China (973 Program) (Grant No. 2013CB329604); the National High Technology Research and Development Program of China (Grant No. 2012AA011005); the Natural Science Foundation of Anhui Province, China (Grant Nos. 11040606M138 and 1208085MF101); the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 2011JYXJ1498); the Fundamental Research Funds for the Central Universities (Grant No. 2011HGQC1012)
Abstract: Cyber-crimes are growing rapidly, so it is important to obtain digital evidence from the web page. Usually, investigators can examine the browser history on the client side and data files on the server side, but both have shortcomings in real criminal investigation. To overcome these weaknesses, this paper designs a web page forensic scheme that snapshots pages from web servers with the help of a web spider. It also designs several steps to improve the trustworthiness of these pages. All the pages are dumped into a local database so that they can be presented as reliable evidence in court.
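A small sketch of the snapshot step, under assumed details: fetch a page, timestamp it, and store a cryptographic digest alongside the bytes so later tampering with the stored copy is detectable. The paper's additional trust-building steps are not reproduced here.

    # Hypothetical page snapshot with an integrity digest for evidentiary storage.
    import hashlib, sqlite3, time, urllib.request

    def snapshot(url, db_path="evidence.db"):
        html = urllib.request.urlopen(url, timeout=10).read()
        digest = hashlib.sha256(html).hexdigest()   # fingerprint of the exact bytes
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS pages"
                    "(url TEXT, fetched REAL, sha256 TEXT, body BLOB)")
        con.execute("INSERT INTO pages VALUES (?, ?, ?, ?)",
                    (url, time.time(), digest, html))
        con.commit()
        con.close()
        return digest

    print(snapshot("https://example.com/"))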
Abstract: As part of business translation, web page translation is an effective way of global communication. Accurate and adequate translation of web pages can enhance the enterprises' competitiveness. This paper focuses on the features of Chinese and Western web pages of garment enterprises and on translation strategy under the guidance of mass communication theory. The results illustrate that Chinese texts favor symmetrical phrases rich in cultural connotation, whereas English texts prefer plain language with fewer culture-loaded expressions and focus more on the feelings of the audience. Consequently, an audience-oriented translation strategy is strongly recommended so as to maximize the communication effect of enterprise web pages.
Abstract: Automatic web page classification has become inevitable for web directories due to the multitude of web pages on the World Wide Web. In this paper an improved term weighting technique is proposed for automatic and effective classification of web pages. The web documents are represented as sets of features. The proposed method selects and extracts the most prominent features, reducing the high-dimensionality problem of the classifier. Proper selection of features from the large set improves the performance of the classifier. The proposed algorithm is implemented and tested on a benchmark dataset, and the results show better performance than most existing term weighting techniques.
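The improved weighting formula is not given in the abstract, so as a baseline illustration here is standard TF-IDF term weighting over a toy corpus; any improved scheme would replace the tf_idf() function.

    # Baseline TF-IDF weighting sketch (not the paper's improved scheme).
    import math

    docs = [["web", "page", "classification"],
            ["web", "directory", "search"],
            ["page", "rank", "search"]]

    def tf_idf(term, doc, corpus):
        tf = doc.count(term) / len(doc)
        df = sum(1 for d in corpus if term in d)
        idf = math.log(len(corpus) / df)   # rarer terms weigh more
        return tf * idf

    for term in ("web", "classification"):
        print(term, round(tf_idf(term, docs[0], docs), 3))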
Funding: The National Basic Research Program of China (973 Program) (No. 2007CB310806)
Abstract: This paper presents a twice-gathering interactive information system prototype for e-government, built for the condition that the Intranet and the Extranet are physically isolated. Users on the Extranet can gather links to the latest related information through client software; the links are previously collected by a web alert service on the Internet. Finally, through ferry-type transport devices, the information is transported to a storage device, synchronized with the web platform on the Intranet, and browsed by users on the Intranet. During information gathering on the Extranet and data synchronization on the Intranet, it is essential to avoid repeated gathering and copying by comparing the information fingerprints extracted from the web pages. This prototype uses HashTrie to store information fingerprints. In testing, the structure based on HashTrie is 2.28 times faster than Darts (double-array Trie), the fastest structure in the existing applied patent. Twelve existing high-speed hash functions serving HashTrie are also implemented. When the dictionary content is larger than 5×10^5 words, the PJWHash or SuperFastHash function can be adopted; when the dictionary content is 10^5 words, the CalcStrCR32 and ELFHash functions can be adopted.
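The HashTrie structure itself is specific to the paper; as a hedged stand-in, this sketch shows only the surrounding idea of skipping repeated gathering by comparing content fingerprints, using an ordinary Python set as the fingerprint store.

    # Illustrative fingerprint check to avoid gathering the same content twice.
    import hashlib

    seen = set()

    def should_gather(page_text):
        fp = hashlib.md5(page_text.encode("utf-8")).hexdigest()
        if fp in seen:
            return False       # already gathered: skip copying
        seen.add(fp)
        return True

    print(should_gather("breaking news body"))   # True, first encounter
    print(should_gather("breaking news body"))   # False, duplicate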
Funding: Sponsored by the Huo Ying-Dong Education Foundation of China (91101)
Abstract: A web page clustering algorithm called PageCluster, and an improved algorithm, ImPageCluster, which resolves overlapping, are proposed. These methods not only take the web structure and page hyperlinks into account, but also consider the importance of each page, described by an in-weight and an out-weight. Compared with traditional clustering methods, the experiments show that the proposed algorithms run faster and with improved accuracy.
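A small sketch of the page-importance idea, under assumed definitions: in-weight and out-weight are taken here as in-link and out-link counts normalized by the total number of links; the clustering step itself is omitted.

    # Hypothetical in-weight/out-weight computation from a hyperlink graph.
    def link_weights(graph):
        in_deg = {p: 0 for p in graph}
        for outs in graph.values():
            for q in outs:
                in_deg[q] += 1
        total = sum(len(v) for v in graph.values()) or 1
        return {p: {"in_weight": in_deg[p] / total,
                    "out_weight": len(graph[p]) / total} for p in graph}

    print(link_weights({"A": {"B", "C"}, "B": {"C"}, "C": set()}))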
Funding: Supported by the Fifteenth Project, Science and Technology Development Plan of Shaanxi Province of China (2000K08-G12)
Abstract: This paper provides a new algorithm: a result integration algorithm based on a matching strategy. The algorithm extracts the title and the abstract of Web pages, calculates the relevance between the query string and the Web pages, decides which Web pages to accept or reject, and sorts them in the user interface. The experimental results indicate clearly that the new algorithm improves the precision of the meta-search engine. This technique is very useful for meta-search engines.
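A rough sketch of the matching strategy under assumed scoring: relevance is the weighted fraction of query terms found in a page's extracted title and abstract, pages under a threshold are rejected, and the rest are sorted. The threshold and the title weight are illustrative.

    # Hypothetical query/page matching score over extracted title and abstract.
    def relevance(query, page, title_weight=2.0):
        terms = query.lower().split()
        title, abstract = page["title"].lower(), page["abstract"].lower()
        score = sum(title_weight * (t in title) + (t in abstract) for t in terms)
        return score / (len(terms) * (title_weight + 1))   # normalize to [0, 1]

    def integrate(query, pages, threshold=0.3):
        scored = [(relevance(query, p), p["title"]) for p in pages]
        return sorted([s for s in scored if s[0] >= threshold], reverse=True)

    pages = [{"title": "Meta search engines", "abstract": "result integration"},
             {"title": "Gardening tips", "abstract": "roses"}]
    print(integrate("meta search result", pages))  # only the relevant page survives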
Abstract: The paper deals with the popular news-talk radio station “Echo of Moscow”, one of the most interesting and successful stations in the Moscow FM range. It provides a thorough analysis of various past and present programs and web projects in comparison with the main trends in the multimedia sphere. Particular attention is paid to the “Echo of Moscow” web page as a multimedia portal densely packed with functions, such as its integration with social networks.
Abstract: Parking automation has been developed for locating available parking spaces inside a parking lot with new technologies, such as automatic collection, capacity, or empty-space detection, both to reduce search time and to avoid crowds of vehicles waiting to park. A simple detection system of available spaces was set up within the Faculty and Staff parking lot of the Engineering and Chemistry Departments at the Autonomous University of Chihuahua, Mexico (UACH). First, an analysis of vehicle capacity was carried out to establish the peak hours of vehicle entrances and compare them to the parking lot's capacity. Over one working week (6 days), it was determined that Tuesdays have the highest vehicle occupancy and Saturdays the least traffic. Second, the geometric design of the parking lot (divided into two areas because of the terrain topography: upper and bottom levels), its dimensions, and the location of parking spots and traffic-direction marks on the pavement were obtained from its blueprint, to visualize how the surveillance cameras could be oriented to cover the area. A detection system prototype was created for a specific area of the parking lot, using an artificial vision technique to perform the video extraction, while the digital image processing was developed in MATLAB. To complement and achieve the objective, Object-Oriented Programming (OOP) was used to obtain the count analysis as well as the availability/occupancy of each of the parking areas. This information is shown on a web page with a simple driver-friendly design, which lists each parking space number and its availability in blue with the word “Available” or, in the opposite case, the word “Taken” in pink.
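A toy sketch of the occupancy test under assumed details: each space is a rectangular region of interest in the camera frame, and a space counts as taken when enough pixels differ from an empty-lot reference image. The thresholds and coordinates are invented, and the original work used MATLAB rather than the Python/NumPy shown here.

    # Hypothetical occupancy check: compare each space's ROI with an empty reference.
    import numpy as np

    def occupancy(frame, reference, spaces, diff_threshold=30, fill_ratio=0.2):
        status = {}
        for name, (y0, y1, x0, x1) in spaces.items():
            diff = np.abs(frame[y0:y1, x0:x1].astype(int)
                          - reference[y0:y1, x0:x1].astype(int))
            changed = np.mean(diff > diff_threshold)   # fraction of changed pixels
            status[name] = "Taken" if changed > fill_ratio else "Available"
        return status

    reference = np.zeros((100, 200), dtype=np.uint8)   # empty-lot grayscale image
    frame = reference.copy()
    frame[10:40, 20:80] = 200                          # a "car" appears in space 1
    spaces = {"space_1": (10, 40, 20, 80), "space_2": (10, 40, 100, 160)}
    print(occupancy(frame, reference, spaces))  # space_1 Taken, space_2 Available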
Abstract: The number of Internet users and the number of web pages being added to the WWW increase dramatically every day. It is therefore necessary to classify web pages into web directories automatically and efficiently, which helps search engines provide users with relevant and quick retrieval results. As web pages are represented by thousands of features, feature selection helps web page classifiers resolve this large-scale dimensionality problem. This paper proposes a new feature selection method using Ward's minimum variance measure. The measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark dataset, namely WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
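A hedged sketch of the cluster-then-pick-representatives idea: feature columns are clustered with Ward's minimum variance linkage, and one representative per cluster is kept (here, the highest-variance member). The representative criterion and the cluster count are assumptions; the paper's exact selection rule may differ.

    # Illustrative Ward-linkage clustering of features; keep one per cluster.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    rng = np.random.default_rng(0)
    X = rng.random((50, 6))                     # 50 samples, 6 features
    X[:, 1] = X[:, 0] + 0.01 * rng.random(50)   # feature 1 nearly duplicates 0

    Z = linkage(X.T, method="ward")             # cluster the feature columns
    labels = fcluster(Z, t=4, criterion="maxclust")

    keep = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        keep.append(members[np.argmax(X[:, members].var(axis=0))])
    print(sorted(keep))   # indices of retained, non-redundant features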
Funding: Project supported by the National Natural Science Foundation of China (No. 61471314) and the Welfare Technology Research Project of Zhejiang Province, China (No. LGG18F010003)
Abstract: Precise web page classification can be achieved by evaluating features of web pages, and the structural features of web pages are effective complements to their textual features. Various classifiers have different characteristics, and multiple classifiers can be combined so that they complement one another. In this study, a web page classification method based on heterogeneous features and a combination of multiple classifiers is proposed. Rather than computing the frequency of HTML tags, we exploit the tree-like structure of HTML tags to characterize the structural features of a web page. Heterogeneous textual features and the proposed tree-like structural features are converted into vectors and fused. Confidence is proposed here as a criterion for comparing the classification results of different classifiers, calculated as the classification accuracy on a set of samples. Multiple classifiers are combined based on confidence with different decision strategies, such as voting, confidence comparison, and direct output, to give the final classification results. Experimental results demonstrate that on the Amazon dataset, the 7-web-genres dataset, and the DMOZ dataset, the accuracies are increased to 94.2%, 95.4%, and 95.7%, respectively. The fusion of the textual features with the proposed structural features is a comprehensive approach, and the accuracy is higher than when using only textual features. At the same time, the accuracy of web page classification is improved by combining multiple classifiers, and is higher than that of related web page classification algorithms.
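A minimal sketch of confidence-based combination, with assumed decision rules: each classifier's confidence is its accuracy on a held-out sample set, a clear majority vote wins, and otherwise the most confident classifier's output is taken directly. The paper's three strategies (voting, confidence comparison, direct output) are only approximated here.

    # Hypothetical combination of classifier outputs weighted by measured confidence.
    from collections import Counter

    def combine(predictions, confidence):
        # predictions: {name: label}; confidence: {name: held-out accuracy}
        best = max(predictions, key=lambda n: confidence[n])
        top_label, top_count = Counter(predictions.values()).most_common(1)[0]
        if top_count > len(predictions) / 2:
            return top_label             # clear majority wins
        return predictions[best]         # otherwise trust the most confident model

    preds = {"svm": "news", "nb": "shopping", "knn": "news"}
    conf = {"svm": 0.91, "nb": 0.88, "knn": 0.79}
    print(combine(preds, conf))          # 'news' by majority vote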
Funding: Supported by the Open Research Program of the Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education (K93-9-2014-04B) and the National Natural Science Foundation of China (61170322, 61572263, 61302157)
Abstract: To understand website complexity more deeply, a web page complexity measurement system was developed. The system measures the complexity of a web page at two levels, transport level and content level, using a packet-trace-based approach rather than server or client logs; packet traces surpass logs in the amount of information they contain. Quantitative analyses show that different categories of web pages have different complexity characteristics. Experimental results show that a news web page usually loads many more elements, at more access levels, from many more web servers within diverse administrative domains, over many more concurrent Transmission Control Protocol (TCP) flows. More than half of the education pages each involve only a few logical servers, and most elements of such a page are fetched from only one or two logical servers. The number of content types for web game traffic after login is usually the smallest. The system can help web page designers to design more efficient web pages, and help researchers and Internet users to understand communication details.
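A small sketch of the content-level counting implied above, under an assumed trace record format: given one (server, content_type, tcp_flow_id) tuple per fetched element of a page, count distinct servers, content types, and TCP flows. Parsing real packet traces (e.g., pcap files) is omitted.

    # Hypothetical per-page complexity summary from simplified trace records.
    def page_complexity(records):
        # records: list of (server, content_type, tcp_flow_id), one per element
        return {"elements": len(records),
                "servers": len({r[0] for r in records}),
                "content_types": len({r[1] for r in records}),
                "tcp_flows": len({r[2] for r in records})}

    trace = [("cdn1.example", "image/png", 1), ("cdn2.example", "text/html", 2),
             ("ads.example", "text/javascript", 3), ("cdn1.example", "image/png", 1)]
    print(page_complexity(trace))
    # {'elements': 4, 'servers': 3, 'content_types': 3, 'tcp_flows': 3}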