Funding: Supported by the National Natural Science Foundation of China (Nos. 61872260 and 61592419) and the Natural Science Foundation of Shanxi Province (No. 201703D421013).
Abstract: Approximations based on random Fourier features have recently emerged as an efficient and elegant method for large-scale machine learning tasks. Unlike approaches using the Nyström method, which randomly samples the training examples, we make use of random Fourier features, whose basis functions (i.e., cosine and sine) are sampled from a distribution independent of the training sample set, to cluster preference data, which appears extensively in recommender systems. We propose a two-stage preference clustering framework: random Fourier features are first used to map the preference matrix into a feature matrix, and the traditional k-means approach is then applied to cluster the preference data in the transformed feature space. Compared with traditional preference clustering, our method avoids the memory bottleneck and greatly improves computational efficiency. Experiments on a movie data set containing 100,000 ratings show that the proposed method outperforms the Nyström method and k-means in clustering accuracy as well as in overall performance.
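To make the two-stage pipeline concrete, the following is a minimal sketch assuming an RBF (Gaussian) kernel: the preference matrix is mapped through a data-independent cosine/sine feature map and then clustered with ordinary k-means. All names, shapes, and hyper-parameters are illustrative and not taken from the paper.

```python
# A minimal sketch of the two-stage framework described above, assuming an RBF
# (Gaussian) kernel; variable names and hyper-parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def random_fourier_features(X, n_components=256, sigma=1.0, seed=0):
    """Map X (n_samples x n_features) into a random Fourier feature space.

    For the RBF kernel with bandwidth sigma, frequencies are drawn from
    N(0, sigma^-2 I) independently of the data, and the feature map is
    z(x) = sqrt(1/D) [cos(Wx), sin(Wx)].
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_components))
    P = X @ W
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(n_components)

# Stage 1: map a (toy) user-item rating matrix into the feature space.
R = np.random.rand(1000, 50)          # stand-in for the preference matrix
Z = random_fourier_features(R, n_components=256, sigma=2.0)

# Stage 2: ordinary k-means in the transformed space.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
```

Because the explicit feature map has a fixed dimension, memory grows linearly with the number of users, whereas exact kernel clustering would require the full pairwise Gram matrix, which is the memory bottleneck the abstract refers to.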
Funding: The author Feng Yin was funded by the Shenzhen Science and Technology Innovation Council (No. JCYJ20170307155957688) and by the National Natural Science Foundation of China Key Project (No. 61731018). The authors Feng Yin and Shuguang (Robert) Cui were funded by the Shenzhen Fundamental Research Funds under Grant (Key Lab) No. ZDSYS201707251409055 and Grant (Peacock) No. KQTD2015033114415450, and by the Guangdong Province "The Pearl River Talent Recruitment Program Innovative and Entrepreneurial Teams in 2017" - Data Driven Evolution of Future Intelligent Network Team. The associate editor coordinating the review of this paper and approving it for publication was X. Cheng.
Abstract: Driven by the needs of a plethora of machine learning applications, several attempts have been made to improve the performance of classifiers applied to imbalanced datasets. In this paper, we present a fast maximum entropy machine (MEM) combined with a synthetic minority over-sampling technique for handling binary classification problems with high imbalance ratios, large numbers of data samples, and medium/large numbers of features. A random Fourier feature representation of kernel functions and the primal estimated sub-gradient solver for support vector machines (PEGASOS) are applied to speed up the classic MEM. Experiments have been conducted using various real datasets (including two China Mobile datasets and several other standard test datasets) with various configurations. The results demonstrate that the proposed algorithm has extremely low complexity yet excellent overall classification performance (in terms of several widely used evaluation metrics) compared with the classic MEM and some other state-of-the-art methods. The proposed algorithm is particularly valuable in big data applications owing to its significantly low computational complexity.
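As a rough illustration of the ingredients named in the abstract, the sketch below combines SMOTE oversampling (from imbalanced-learn), a random Fourier feature map (scikit-learn's RBFSampler), and a hand-written PEGASOS-style stochastic sub-gradient loop. The hinge loss is used here as a stand-in for the MEM objective, and all data and hyper-parameters are illustrative, not those of the paper.

```python
# A rough sketch of the named ingredients, not the authors' implementation:
# SMOTE oversampling, a random Fourier feature map, and a PEGASOS-style
# stochastic sub-gradient loop (hinge loss used as a stand-in for the MEM
# objective; all hyper-parameters are illustrative).
import numpy as np
from imblearn.over_sampling import SMOTE               # synthetic minority over-sampling
from sklearn.kernel_approximation import RBFSampler    # random Fourier features

def pegasos(Z, y, lam=1e-4, n_iters=20000, seed=0):
    """Primal estimated sub-gradient solver: hinge loss + L2 regularizer."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(Z.shape[0])
        eta = 1.0 / (lam * t)                           # PEGASOS step-size schedule
        if y[i] * (Z[i] @ w) < 1.0:                     # margin violated
            w = (1.0 - eta * lam) * w + eta * y[i] * Z[i]
        else:                                           # only the regularizer acts
            w = (1.0 - eta * lam) * w
    return w

# Toy imbalanced data with labels in {-1, +1}.
X = np.random.randn(5000, 20)
y = np.where(X[:, 0] + 0.5 * np.random.randn(5000) > 1.5, 1, -1)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)   # re-balance the classes
Z = RBFSampler(gamma=0.5, n_components=300, random_state=0).fit_transform(X_bal)
w = pegasos(Z, y_bal)
predictions = np.sign(Z @ w)
```

Because the classifier is trained on the explicit random feature representation, each sub-gradient step costs only the feature dimension rather than the number of training samples, which is what keeps the overall complexity low for large datasets.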