Abstract: Measles, an infectious disease caused by the measles virus, remains a significant public health concern worldwide due to its highly contagious nature and potential for severe complications [1]. In addition to symptoms such as high fever, cough, Koplik spots, and rash, measles can lead to serious complications including pneumonia and myocarditis, particularly in vulnerable populations such as young children [1,2].
Abstract: To improve the accuracy and completeness of mining data records from the web, the concepts of isomorphic page and directory page are introduced and three algorithms are proposed. Isomorphic web pages are a set of web pages that share a uniform structure and differ only in their main information. A web page that contains many links to isomorphic web pages is called a directory page. Algorithm 1 finds directory pages in a website using an adjacent-link similarity analysis method. It first sorts the links and then counts the links in each directory. If the count exceeds a given threshold, it finds the similar sub-page links in the directory and returns the results. A function for judging whether web pages are isomorphic is also proposed. Algorithm 2 mines data records from an isomorphic page using a noise information filter. It relies on the fact that the noise information is the same in two isomorphic pages and only the main information differs. Algorithm 3 mines data records from an entire website using spider (crawler) technology. Experiments show that the proposed algorithms mine data records more completely than existing algorithms. Mining data records from isomorphic pages is an efficient method.
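The abstract does not give the algorithms in detail, so the following is only a rough sketch of two of the ideas it describes: counting sorted links per directory against a threshold (in the spirit of Algorithm 1) and removing text that two isomorphic pages share (in the spirit of Algorithm 2). The function names, the URL-based notion of "directory", and the representation of a page as a list of text blocks are all assumptions made for illustration, not the authors' implementation.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def candidate_directories(links, threshold):
    """Group sorted links by (host, directory) and keep groups above a threshold.

    Assumption: "directory" means the directory part of the URL path.
    """
    groups = defaultdict(list)
    for link in sorted(links):
        parsed = urlparse(link)
        directory = posixpath.dirname(parsed.path)
        groups[(parsed.netloc, directory)].append(link)
    return {key: urls for key, urls in groups.items() if len(urls) > threshold}


def filter_noise(page_blocks, sibling_blocks):
    """Keep only the blocks of one page that do not appear in an isomorphic sibling.

    Assumption: each page is already split into a list of text blocks.
    Blocks shared by both pages are treated as noise (menus, footers).
    """
    shared = set(sibling_blocks)
    return [block for block in page_blocks if block not in shared]


# Hypothetical example data.
links = [
    "http://example.com/items/1.html",
    "http://example.com/items/2.html",
    "http://example.com/items/3.html",
    "http://example.com/about.html",
]
directories = candidate_directories(links, threshold=2)

page_a = ["Home | About | Contact", "Title: Blue Widget", "Price: 10", "Copyright Example Site"]
page_b = ["Home | About | Contact", "Title: Red Widget", "Price: 12", "Copyright Example Site"]
main_a = filter_noise(page_a, page_b)
```

In this toy data only the `/items` directory exceeds the threshold, and only the title and price blocks of `page_a` survive the filter. A real system would also need the isomorphism test the abstract mentions, which is not specified there and is not sketched here.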
Abstract: In a very short time, the web has become an enormously important tool for communicating ideas, conducting business, and entertainment. While navigating, web users leave various records of their actions. This vast amount of data can be a useful source of knowledge for predicting user behavior, and a refined method is required to carry out this task. Web usage mining (WUM) is designed for this purpose: a WUM system extracts knowledge from user behavior during web navigation. The extracted knowledge can be used to predict a user's future requests while the user is browsing the web. In this paper, we advance the online recommender system by using a Longest Common Subsequence (LCS) classification algorithm to classify users' navigation patterns. Classification with the proposed method can improve the accuracy of recommendations. We also propose an algorithm that uses the LCS method to understand user behavior in order to improve the design of a website.
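As an illustration of how LCS can compare navigation patterns, the sketch below computes the standard dynamic-programming LCS length and assigns a session to the reference pattern with the highest normalized LCS score. The normalization, the `classify_session` helper, and the sample patterns are assumptions for illustration; the abstract does not specify how the paper's classifier scores or selects patterns.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def classify_session(session, patterns):
    """Return the label of the pattern most similar to the session.

    Assumption: similarity is LCS length divided by the longer sequence length.
    """
    def score(pattern):
        longest = max(len(session), len(pattern))
        return lcs_length(session, pattern) / longest if longest else 0.0

    return max(patterns, key=lambda label: score(patterns[label]))


# Hypothetical reference navigation patterns and one observed session.
patterns = {
    "shopper": ["home", "catalog", "product", "cart", "checkout"],
    "reader": ["home", "blog", "article", "comments"],
}
session = ["home", "catalog", "product", "checkout"]
label = classify_session(session, patterns)
```

Here the session shares four pages in order with the "shopper" pattern and only one with the "reader" pattern, so it is assigned to "shopper".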
Funding: Supported by MOST under Grant No. MOST 103-2410-H-004-112
Abstract: We applied the decision tree algorithm to learn association rules between a webpage's category (pornographic or normal) and its critical features. Based on these rules, we proposed an efficient method of filtering pornographic webpages with the following major advantages: 1) a weighted window-based technique is proposed to estimate concept drift for the keywords recently found in pornographic webpages; 2) only the contexts of webpages are checked, without scanning pictures; 3) an incremental learning mechanism is designed to incrementally update the pornographic keyword database.
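The abstract does not define the weighted window technique, so the sketch below shows only one plausible reading: keywords are counted per time window, and older windows contribute less through a geometric decay, so that recently observed keywords dominate the score. The decay scheme, the `decay` parameter, and the function name are assumptions for illustration, not the authors' method.

```python
from collections import Counter


def weighted_keyword_scores(windows, decay=0.5):
    """Score keywords over time windows, weighting recent windows more heavily.

    windows: list of keyword lists, oldest first.
    Assumption: the newest window has weight 1 and each older window is
    multiplied by `decay` once per step back in time.
    """
    scores = Counter()
    newest = len(windows) - 1
    for index, keywords in enumerate(windows):
        weight = decay ** (newest - index)
        for keyword in keywords:
            scores[keyword] += weight
    return dict(scores)


# Hypothetical keyword observations in three consecutive windows.
windows = [["alpha", "alpha"], ["alpha", "beta"], ["beta", "beta"]]
scores = weighted_keyword_scores(windows, decay=0.5)
```

Both keywords occur three times in total, but "beta" scores higher because its occurrences are more recent. A drift in the keyword vocabulary would show up as such a shift in the weighted scores.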
Abstract: A large amount of data is available on the web that can be used for purposes such as product recommendation, price comparison, and demand forecasting for a particular product. Websites are designed for human understanding and not for machines. Therefore, techniques are required to extract data from web pages and make it machine-readable. Researchers have addressed the problem using two approaches, i.e., knowledge engineering and machine learning. State-of-the-art knowledge engineering approaches use the structure of documents, visual cues, clustering of attributes of data records, and text processing techniques to identify data records on a web page. Machine learning approaches use annotated pages to learn rules, which are then used to extract data from unseen web pages. The structure of web documents is continuously evolving, so new techniques are needed to handle the emerging requirements of web data extraction. In this paper, we present a novel, simple, and efficient technique to extract data from web pages using visual styles and the structure of documents. The proposed technique detects the Rich Data Region (RDR) using the query and correlative words of the query. The RDR is then divided into data records using style similarity. Noisy elements are removed using a Common Tag Sequence (CTS) and formatting entropy. The system is implemented in Java and run on a dataset of real-world working websites. The effectiveness of the results is evaluated using precision, recall, and F-measure and compared with five existing systems. The comparison of the proposed technique with existing systems shows encouraging results.
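The evaluation metrics named in the abstract can be computed as follows. This is a minimal sketch that treats extracted and expected data records as sets of identifiers and uses the balanced F-measure (F1); the record identifiers and the function name are hypothetical, and the abstract does not say which F-measure variant the authors used.

```python
def precision_recall_f1(extracted, expected):
    """Compute precision, recall, and F1 for a set of extracted records."""
    extracted, expected = set(extracted), set(expected)
    true_positives = len(extracted & expected)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical record identifiers: 3 of 4 extracted records are correct,
# and 3 of the 5 expected records were found.
extracted = ["r1", "r2", "r3", "r4"]
expected = ["r1", "r2", "r3", "r5", "r6"]
precision, recall, f1 = precision_recall_f1(extracted, expected)
```

In this example precision is 3/4 and recall is 3/5, and the F-measure is their harmonic mean.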
Abstract: Given the rapid growth of e-commerce and the importance of understanding online customer behavior, it is necessary to develop a universal model as a means to measure the relevant construct. Website design plays an important role in e-commerce because it directly affects online customers during the purchase process, and good website design helps keep online customers loyal. However, the effect of website design has not been clearly defined, and a suitable framework for evaluating the status of website design is lacking. The purpose of this study is to develop a comprehensive framework that can guide successful web design and to examine the impact of web design on e-loyalty. A total of 207 Taobao customers from China completed an online survey. All scales were analyzed for reliability, construct validity, and convergent validity. Multiple regression analysis was used to test the research hypotheses. The findings indicated that the website design factor should be revised and classified into visual design and information and navigation design. Website design had a significantly positive effect on e-loyalty. In order of relative importance, the predictors were information and navigation design, followed by visual design.
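Abstract: Speed and quality are the two major problems facing clustering algorithms. DBSCAN (density based spatial clustering of applications with noise) is a typical density-based clustering method, and clustering experiments on large databases have demonstrated its advantage in speed. A recursive density based clustering algorithm (RDBC) is proposed, which can intelligently and dynamically modify its density parameters. RDBC is an improved algorithm based on DBSCAN, and its computational complexity is the same as that of DBSCAN. Clustering experiments on Web documents show that RDBC not only retains the high speed of DBSCAN but also produces clustering results far better than those of DBSCAN.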
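For readers unfamiliar with the baseline, the following is a minimal pure-Python sketch of plain DBSCAN with fixed parameters. It is not RDBC: the recursive adjustment of density parameters described in the abstract is not specified there and is not shown. The names `eps` and `min_pts` follow common DBSCAN usage, and the brute-force neighbor search is a simplification that a real implementation would replace with a spatial index.

```python
from math import dist

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (including itself)."""
    return [j for j, other in enumerate(points) if dist(points[index], other) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical 2D data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these parameters the first three points form one cluster, the next three form a second, and the isolated point is labeled as noise. Because `eps` and `min_pts` are fixed for the whole dataset, clusters of very different densities are hard to capture in a single run, which is the limitation that dynamic parameter adjustment is meant to address.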