Abstract
As Internet technology continues to develop, Sina Weibo, an open social networking platform, has attracted a very large base of active users. Because the user base is so large and self-reported personal information is not always truthful, labeling training samples with user gender is difficult. This paper adopts a multi-view tri-training method. Three different views are constructed, and three different classifiers are first trained on the small number of labeled samples in those views. The classifiers are then repeatedly trained against one another using the unlabeled samples. Finally, the three classifiers are combined into an ensemble that predicts user gender. Experiments on real user data show that, compared with a single-view classifier, the classifier trained with multi-view tri-training is more accurate and requires fewer labeled samples.
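The training loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features make up the three views, which base classifiers are used, or how they are combined, so everything below is an assumption made for illustration: `CentroidClassifier` is a stand-in nearest-centroid learner, `tri_train` and `predict` are hypothetical names, an unlabeled sample is pseudo-labeled for one classifier whenever the other two agree on it, and the ensemble is a plain majority vote. The error-rate checks used in the standard tri-training formulation are omitted for brevity.

```python
class CentroidClassifier:
    """Stand-in base learner (assumption): predicts the label of the nearest class centroid."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def tri_train(views_labeled, y, views_unlabeled, rounds=5):
    """Simplified multi-view tri-training (hypothetical helper).

    views_labeled[k][i]   -- features of labeled sample i in view k (k = 0, 1, 2)
    views_unlabeled[k][i] -- features of unlabeled sample i in view k
    """
    # Step 1: one classifier per view, trained on the labeled samples only.
    clfs = [CentroidClassifier().fit(views_labeled[k], y) for k in range(3)]
    n_unlabeled = len(views_unlabeled[0])

    # Step 2: repeatedly let each pair of classifiers teach the third one.
    for _ in range(rounds):
        updated = []
        for k in range(3):
            a, b = [j for j in range(3) if j != k]
            X_k, y_k = list(views_labeled[k]), list(y)
            for i in range(n_unlabeled):
                label_a = clfs[a].predict_one(views_unlabeled[a][i])
                label_b = clfs[b].predict_one(views_unlabeled[b][i])
                if label_a == label_b:  # the two peers agree: accept the pseudo-label
                    X_k.append(views_unlabeled[k][i])
                    y_k.append(label_a)
            updated.append(CentroidClassifier().fit(X_k, y_k))
        clfs = updated
    return clfs


def predict(clfs, sample_views):
    """Step 3: majority vote over the three per-view classifiers."""
    votes = [clf.predict_one(x) for clf, x in zip(clfs, sample_views)]
    return max(set(votes), key=votes.count)
```

In this sketch each classifier only ever sees its own view, so the views stay separate during training and are combined only at the voting step.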
Source
《计算机系统应用》
2018, No. 2, pp. 240-244 (5 pages)
Computer Systems & Applications