This paper describes in detail the web data mining technology, analyzes the relationship between the data on the web site to the tourism electronic commerce (including the server log, tourism commodity database, user...This paper describes in detail the web data mining technology, analyzes the relationship between the data on the web site to the tourism electronic commerce (including the server log, tourism commodity database, user database, the shopping cart), access to relevant user preference information for tourism commodity. Based on these models, the paper presents recommended strategies for the site registered users, and has had the corresponding formulas for calculating the current user of certain items recommended values and the corresponding recommendation algorithm, and the system can get a recommendation for user.展开更多
The content-ignorant clustering method takes advantages in time complexity and space complexity than the content based methods.In this paper,the authors introduce a unified expanding method for content-ignorant web pa...The content-ignorant clustering method takes advantages in time complexity and space complexity than the content based methods.In this paper,the authors introduce a unified expanding method for content-ignorant web page clustering by mining the "click-through" log,which tries to solve the problem that the "click-through" log is sparse.The relationship between two nodes which have been expanded is also defined and optimized.Analysis and experiment show that the performance of the new method has improved,by the comparison with the standard content-ignorant method.The new method can also work without iterative clustering.展开更多
With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this pap...With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this paper,we propose a dependency graph model to describe the relationships between web requests.Based on this model,we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology.We evaluate the proposed algorithm with real massive data.The size of the dataset collected from a mobile core network is 228.7GB.It covers more than three million users.The experiment results demonstrate that the proposed algorithm can achieve higher accuracy than previous methods.展开更多
文摘This paper describes in detail the web data mining technology, analyzes the relationship between the data on the web site to the tourism electronic commerce (including the server log, tourism commodity database, user database, the shopping cart), access to relevant user preference information for tourism commodity. Based on these models, the paper presents recommended strategies for the site registered users, and has had the corresponding formulas for calculating the current user of certain items recommended values and the corresponding recommendation algorithm, and the system can get a recommendation for user.
文摘The content-ignorant clustering method takes advantages in time complexity and space complexity than the content based methods.In this paper,the authors introduce a unified expanding method for content-ignorant web page clustering by mining the "click-through" log,which tries to solve the problem that the "click-through" log is sparse.The relationship between two nodes which have been expanded is also defined and optimized.Analysis and experiment show that the performance of the new method has improved,by the comparison with the standard content-ignorant method.The new method can also work without iterative clustering.
基金supported in part by the Fundamental Research Funds for the Central Universities under Grant No.2013RC0114111 Project of China under Grant No.B08004
文摘With increasingly complex website structure and continuously advancing web technologies,accurate user clicks recognition from massive HTTP data,which is critical for web usage mining,becomes more difficult.In this paper,we propose a dependency graph model to describe the relationships between web requests.Based on this model,we design and implement a heuristic parallel algorithm to distinguish user clicks with the assistance of cloud computing technology.We evaluate the proposed algorithm with real massive data.The size of the dataset collected from a mobile core network is 228.7GB.It covers more than three million users.The experiment results demonstrate that the proposed algorithm can achieve higher accuracy than previous methods.