Abstract: While recommendation plays an increasingly critical role in our daily life, study, work, and entertainment, the recommendations we receive are often for irrelevant, duplicate, or uninteresting products and services. A critical reason for such bad recommendations lies in the intrinsic assumption, made in existing theories and systems, that recommended users and items are independent and identically distributed (IID). Another phenomenon is that, while tremendous efforts have been made to model specific aspects of users or items, the overall user and item characteristics and their non-IIDness have been overlooked. In this paper, the non-IID nature and characteristics of recommendation are discussed, followed by a non-IID theoretical framework, in order to build a deep and comprehensive understanding of the intrinsic nature of recommendation problems from the perspective of both couplings and heterogeneity. This work triggers a paradigm shift from IID to non-IID recommendation research and can hopefully deliver informed, relevant, personalized, and actionable recommendations. It creates exciting new directions and fundamental solutions to address various complexities, including cold-start, sparse data-based, cross-domain, group-based, and shilling attack-related issues.
Funding: Supported by Australian Research Council Discovery (DP130102691), the National Science Foundation of China (61302157), China National 863 Project (2012AA12A308), and China Pre-research Project of Nuclear Industry (FZ1402-08).
Abstract: Exploiting the label coupling relationship is a key challenge in multi-label classification (MLC) problems. Most previous work focused on pairwise label relations, in which generally only global statistical information is used to analyze the coupled label relationship. In this work, Bayesian and hypothesis testing methods are first applied to predict the label set size of each testing sample within its k nearest neighbor samples, which combines global and local statistical information. The Apriori algorithm is then used to mine the label coupling relationship among multiple labels rather than only pairs of labels, which exploits the label coupling relations more accurately and comprehensively. Experimental results on text, biology, and audio datasets show that, compared with the state-of-the-art algorithm, the proposed algorithm obtains better performance on 5 common criteria.
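As an illustration of the Apriori step mentioned in this abstract, the following is a minimal sketch of level-wise frequent label set mining. It is not the authors' implementation: the function name frequent_label_sets, the min_support parameter, and the representation of each training sample as a plain set of labels are assumptions made for this example, and the Bayesian and hypothesis-testing step for predicting label set size is not covered.

```python
from itertools import combinations


def frequent_label_sets(label_sets, min_support):
    """Apriori-style mining of label sets that co-occur in at least
    a min_support fraction of the samples (hypothetical helper)."""
    transactions = [frozenset(s) for s in label_sets]
    n = len(transactions)

    def support(candidate):
        # Fraction of samples whose label set contains the candidate.
        return sum(candidate <= t for t in transactions) / n

    # Level 1: frequent single labels.
    labels = {label for t in transactions for label in t}
    current = {frozenset([l]) for l in labels
               if support(frozenset([l])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Toy data: each entry is the label set of one training sample.
samples = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = frequent_label_sets(samples, min_support=0.5)
```

Frequent sets with more than two labels are what distinguishes this kind of mining from purely pairwise label statistics.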
Funding: Supported by Australian Research Council Discovery (DP130102691), the National Science Foundation of China (61302157), China National 863 Project (2012AA12A308), and China Pre-research Project of Nuclear Industry (FZ1402-08).
Abstract: In this paper, a novel coupled attribute similarity learning method based on multi-label categorical data (CASonMLCD) is proposed. The CASonMLCD method not only computes the correlations between different attributes and multi-label sets using information gain, which can be regarded as the degree of importance of each attribute in the attribute learning method, but also further analyzes the intra-coupled and inter-coupled interactions between an attribute value pair for different attributes and multiple labels. The CASonMLCD method is compared with the OF distance and Jaccard similarity, based on the MLKNN algorithm, according to 5 common evaluation criteria. The experimental results demonstrate that the CASonMLCD method mines the similarity relationship more accurately and comprehensively and obtains better performance than the compared methods.
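Two of the quantities named in this abstract can be sketched briefly: information gain of a categorical attribute with respect to the labels, and the Jaccard similarity used as a baseline. This is a rough sketch, not the CASonMLCD method itself. Treating each distinct label set as one class when computing entropy is an assumption made here for simplicity, and the function names are hypothetical; the intra-coupled and inter-coupled analysis is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of hashable values."""
    total = len(values)
    return -sum((c / total) * log2(c / total)
                for c in Counter(values).values())


def information_gain(attribute_values, label_sets):
    """Information gain of one categorical attribute with respect to the
    labels. Assumption: each distinct label set counts as one class."""
    classes = [frozenset(s) for s in label_sets]
    total = len(classes)
    # Group the class of each sample by its attribute value.
    groups = {}
    for value, cls in zip(attribute_values, classes):
        groups.setdefault(value, []).append(cls)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(classes) - conditional


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


# Toy data: one attribute column and the label set of each sample.
labels = [{"a"}, {"a"}, {"b"}, {"b"}]
gain_informative = information_gain(["x", "x", "y", "y"], labels)
gain_uninformative = information_gain(["x", "y", "x", "y"], labels)
```

An attribute whose values split the samples cleanly by label set gets a high gain, which matches the abstract's reading of information gain as a degree of importance for each attribute.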
Abstract: The adaptive local hyperplane (ALH) algorithm is a very recently proposed classifier, which has been shown to perform better than many other benchmarking classifiers, including support vector machine (SVM), K-nearest neighbor (KNN), linear discriminant analysis (LDA), and K-local hyperplane distance nearest neighbor (HKNN) algorithms. Although the ALH algorithm is well formulated and performs well in practice, its scalability to very large data sets is limited by the online distance computations associated with all training instances. In this paper, a novel algorithm called ALH-Fast, obtained by combining the classification tree algorithm with the ALH, is proposed to reduce the computational load of the ALH algorithm. Experimental results on two large data sets show that the ALH-Fast algorithm is both much faster and more accurate than the ALH algorithm.