Funding: Supported by the National Natural Science Foundation of China (60472099) and the Ningbo Natural Science Foundation (2006A610017)
Abstract: Because data warehouses change frequently, incremental data can render previously mined knowledge invalid. To maintain discovered knowledge and patterns dynamically, this study presents IPARUC, a novel algorithm for updating global frequent patterns. IPARUC first applies a rapid clustering method to divide the database into n parts so that the data within each part are similar. Then, during insertion, the nodes of the tree are adjusted dynamically by "pruning and laying back" to keep them in descending order of frequency, so that node sharing approaches the optimum. Finally, the local frequent itemsets mined from each local dataset are merged into global frequent itemsets. The experimental results are encouraging: IPARUC is more effective and efficient than the two methods it was compared against. Furthermore, the algorithm has significant application potential in a prototype Web log analyzer for web usage mining, which can help discover useful knowledge effectively and even support managers in making decisions.
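The final merging step described in this abstract can be illustrated with a small sketch. This is not IPARUC itself (the clustering step and the tree with "pruning and laying back" are not reproduced); it is the generic two-phase partition idea, in which locally frequent itemsets serve as candidates and are then verified against the whole database. The function names, the `max_size` limit, and the toy partitions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Itemsets whose relative support inside one partition is >= min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_support * len(transactions)
    return {itemset for itemset, c in counts.items() if c >= threshold}


def global_frequent_itemsets(partitions, min_support, max_size=2):
    """Merge local results: candidates come from partitions, counts from all data."""
    # Phase 1: the union of locally frequent itemsets is the candidate set.
    candidates = set()
    for part in partitions:
        candidates |= local_frequent_itemsets(part, min_support, max_size)
    # Phase 2: count each candidate over the whole database.
    all_transactions = [set(t) for part in partitions for t in part]
    threshold = min_support * len(all_transactions)
    result = {}
    for itemset in candidates:
        support = sum(1 for t in all_transactions if t.issuperset(itemset))
        if support >= threshold:
            result[itemset] = support
    return result


# Toy data (hypothetical): two partitions of three transactions each.
partitions = [
    [["a", "b"], ["a", "b", "c"], ["a", "c"]],
    [["b", "d"], ["a", "b"], ["b", "c", "d"]],
]
frequent = global_frequent_itemsets(partitions, min_support=0.5)
```

The second pass is needed because an itemset that is frequent in one partition may fall below the threshold globally; conversely, any globally frequent itemset must be frequent in at least one partition, so no candidate is missed.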
Funding: Supported by the Huo Yingdong Education Foundation of China (91101)
Abstract: A semantic session analysis method for partitioning Web usage logs is presented. A semantic Web usage log preparation model enhances the usage logs with semantics. A Markov chain model based on ontology semantic measurement is used to identify which active session a request should belong to. A competitive method is applied to determine the end of each session. Compared with other algorithms, additional successful sessions are detected by semantic outlier analysis.
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK2005046)
Abstract: To alleviate the scalability problem caused by growing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on Active Ant Colony Clustering. First, an active movement strategy for direction selection and speed, different from the positive strategy employed by other ant colony clustering algorithms, is proposed to construct the Active Ant Colony Clustering algorithm. This strategy avoids idle moves and the "flying over the plane" phenomenon, and effectively improves the quality and speed of clustering on large datasets. Then, a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.
Abstract: Since the mining of web usage patterns emerged in the 1990s, it has developed rapidly because of its wide range of applications. By distinguishing user interests and identifying important pages, the mining of web usage patterns can help network education systems better meet personalized requirements.
Abstract: Recently, clustering techniques have been used to discover typical user profiles automatically. In general, it is challenging to design an effective similarity measure between session vectors, which are usually high-dimensional and sparse. Two approaches for mining typical user profiles, both based on matrix dimensionality reduction, are presented. In these approaches, non-negative matrix factorization is applied to reduce the dimensionality of the session-URL matrix, and the projected user-session vectors are clustered into typical user-session profiles using the spherical k-means algorithm. The results show that the two algorithms succeed in mining many typical user profiles from the user sessions.
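The clustering step named in this abstract, spherical k-means, can be sketched as follows. The sketch covers only that step: the non-negative matrix factorization is omitted, and the input rows stand in for already reduced session vectors. The deterministic initialization from the first k points, the iteration count, and the toy vectors are assumptions for illustration, not details taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit-normalized vectors by cosine similarity."""
    points = [normalize(v) for v in vectors]
    # Assumption: initialize centroids from the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment: for unit vectors the dot product equals cosine similarity.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update: each centroid is the re-normalized sum of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Toy reduced session vectors (hypothetical): two groups with different page interests.
sessions = [[5, 1, 0, 0], [0, 0, 3, 1], [4, 0, 0, 0], [0, 0, 1, 4]]
labels, centroids = spherical_kmeans(sessions, k=2)
```

Because every vector is projected onto the unit sphere, session length does not affect the assignment; only the direction of the vector, that is, the relative mix of pages, does.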
Abstract: We combine web usage mining with fuzzy clustering, introduce the concept of web fuzzy clustering, and put forward a web fuzzy clustering processing model, which is discussed in detail. Web fuzzy clustering can be used for clustering both web users and web pages. Finally, a case study is given, and the result demonstrates the feasibility of using web fuzzy clustering for web page clustering. Key words: web mining; web usage mining; web fuzzy clustering; WFCM. CLC number: TP 391. Foundation item: Supported by the National Natural Science Foundation of China (90104005). Biography: LIU Mao-fu (1977-), male, Ph.D. candidate; research direction: artificial intelligence, web mining, image mining.
Funding: Supported by the Foundation of Hubei Key Technology Research and Development (2005AA101C18) and the Natural Science Foundation of South-Central University for Nationalities (YZY06009)
Abstract: The task of clustering Web sessions is to group sessions by similarity, maximizing intra-group similarity while minimizing inter-group similarity. The first and foremost question in clustering Web sessions is how to measure the similarity between them; traditional measurements, however, have many shortcomings. This paper introduces a new method for measuring the similarity between Web pages that takes into account not only the URL but also the viewing time of the visited page. We then give, in detail, a new method for measuring the similarity of Web sessions using sequence alignment and the similarity of Web page accesses. Experiments show that the method is valid and efficient.
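A rough sketch of the idea in this abstract follows: a page-level similarity that combines URL and viewing time, fed into a global sequence alignment over two sessions. The abstract does not give the actual formulas, so everything concrete here is an assumption: the shared-path-prefix URL measure, the min/max time ratio, the `2 * sim - 1` match score, the gap penalty, and the normalization by the longer session.

```python
def url_similarity(u, v):
    """Fraction of leading path segments two URLs share (assumed measure)."""
    a = [s for s in u.strip("/").split("/") if s]
    b = [s for s in v.strip("/").split("/") if s]
    if not a and not b:
        return 1.0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def page_similarity(p, q):
    """p and q are (url, viewing_seconds) pairs; result lies in [0, 1]."""
    (u, tu), (v, tv) = p, q
    longest = max(tu, tv)
    time_sim = min(tu, tv) / longest if longest > 0 else 1.0
    return url_similarity(u, v) * time_sim


def session_similarity(s1, s2, gap=-0.5):
    """Needleman-Wunsch style global alignment score, scaled to [0, 1]."""
    n, m = len(s1), len(s2)
    if n == 0 or m == 0:
        return 0.0
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Map page similarity from [0, 1] to a match score in [-1, 1].
            match = score[i - 1][j - 1] + 2 * page_similarity(s1[i - 1], s2[j - 1]) - 1
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return max(0.0, score[n][m] / max(n, m))


# Toy sessions (hypothetical URLs and viewing times in seconds).
a = [("/shop/books", 30), ("/shop/books/python", 60)]
b = [("/shop/books", 20), ("/shop/books/java", 60)]
c = [("/about", 5)]
```

Here `session_similarity(a, a)` is exactly 1.0, `session_similarity(a, c)` is 0.0, and `session_similarity(a, b)` falls in between because the two sessions visit related pages in the same order with different viewing times.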
Funding: Supported by the Royal Thai Government Scholarship; Faculty of IT, Monash University, Resources Support
Abstract: In this era of a data-driven society, useful data (Big Data) is often unintentionally ignored because convenient tools are lacking and software is expensive. For example, web log files can be used to identify explicit information about browsing patterns when users access web sites. Some hidden information, however, cannot be derived directly from the log files, and external resources may be needed to discover more knowledge from browsing patterns. The purpose of this study is to investigate the application of web usage mining based on web log files. The outcome sets further directions for investigating what implicit information is embedded in log files and how it can be extracted efficiently and effectively. Further work involves combining the use of social media data to improve the quality of business decisions.
Funding: Supported in part by the Fundamental Research Funds for the Central Universities under Grant No. 2013RC0114 and the 111 Project of China under Grant No. B08004
Abstract: With increasingly complex website structures and continuously advancing web technologies, accurate recognition of user clicks from massive HTTP data, which is critical for web usage mining, becomes more difficult. In this paper, we propose a dependency graph model to describe the relationships between web requests. Based on this model, we design and implement a heuristic parallel algorithm that distinguishes user clicks with the assistance of cloud computing technology. We evaluate the proposed algorithm on a large real-world dataset collected from a mobile core network; it is 228.7 GB in size and covers more than three million users. The experimental results demonstrate that the proposed algorithm achieves higher accuracy than previous methods.
Funding: Supported by the National Natural Science Foundation of China (70471046) and the Doctoral Fund of the State Education Ministry (20040359010).
Abstract: With the advance of e-commerce, predicting a user's next request as he or she visits Web pages has become more important than before. Web usage mining, the process of applying data mining to discover user behavior patterns from Web log data, is well suited to this problem. As an important field of Web usage mining, mining user navigation patterns is the fundamental approach for generating recommendations. In this paper, we propose an ant colony approach for mining navigation patterns. We use ant theory as a metaphor to guide the user's choices in the Web site.
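One way to read the ant metaphor in this abstract is sketched below: each logged session is treated as an ant that deposits pheromone on the page-to-page links it follows, older trails evaporate, and the next request is predicted in proportion to the pheromone on outgoing links. The abstract does not specify the model, so the deposit amount, evaporation rate, function names, and toy sessions are all assumptions.

```python
from collections import defaultdict


def build_pheromone(sessions, deposit=1.0, evaporation=0.1):
    """Accumulate pheromone on (source, destination) page links."""
    pheromone = defaultdict(float)
    for session in sessions:
        # Evaporate existing trails before the next "ant" walks.
        for edge in list(pheromone):
            pheromone[edge] *= 1.0 - evaporation
        # Deposit pheromone on every consecutive pair of pages in the session.
        for src, dst in zip(session, session[1:]):
            pheromone[(src, dst)] += deposit
    return dict(pheromone)


def next_page_probabilities(pheromone, current):
    """Probability of each next page, proportional to pheromone on the link."""
    outgoing = {dst: tau for (src, dst), tau in pheromone.items() if src == current}
    total = sum(outgoing.values())
    if total == 0:
        return {}
    return {dst: tau / total for dst, tau in outgoing.items()}


# Toy navigation sessions (hypothetical page names), oldest first.
logged_sessions = [
    ["home", "catalog", "item"],
    ["home", "catalog", "cart"],
    ["home", "search"],
]
trails = build_pheromone(logged_sessions)
probs = next_page_probabilities(trails, "home")
recommended = max(probs, key=probs.get)
```

With these sessions the strongest trail out of "home" leads to "catalog", so that page would be recommended; the evaporation term means that recent sessions count more than old ones.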