Abstract
Inferring whether a social tie exists between users from the location information of location-based service (LBS) users is an emerging problem in intelligence mining from location big data, and it can supply supporting information for group discovery and community detection. Based on the theory of spatio-temporal co-occurrence, the features of co-occurrence regions are grouped into four categories, and a random-forest-based method for inferring social ties between users is proposed. The method consists of a feature selection phase and a training and classification phase. First, because uncorrelated and redundant features in the feature space degrade inference performance, a feature selection algorithm based on the Fisher criterion and the χ² test is proposed to remove them. Second, random forests are used for classification, which overcomes the slow training and tendency to overfit of existing methods. Experiments on check-in data from location-based social network (LBSN) users show that the method infers social ties with relatively low computational cost and relatively high accuracy.
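The feature selection phase named in the abstract can be illustrated with a small sketch. The abstract does not give the exact algorithm, so everything below is an assumption made for illustration: the two-class Fisher score formula, the 2x2 χ² table built by splitting each feature at its median, the rule of keeping a feature only when both statistics pass a threshold, and the hypothetical names `fisher_score`, `chi2_statistic`, `select_features`, `co_occurrence_count`, and `noise`. The random forest classification phase is not shown.

```python
from statistics import mean, median, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score: (mean1 - mean0)^2 / (var1 + var0). Assumed form."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / denom


def chi2_statistic(values, labels):
    """Pearson chi-squared statistic for a 2x2 table:
    feature above/below its median versus the 0/1 label (assumed discretization)."""
    cut = median(values)
    table = [[0, 0], [0, 0]]
    for v, y in zip(values, labels):
        table[int(v > cut)][y] += 1
    n = len(values)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            row_total = table[i][0] + table[i][1]
            col_total = table[0][j] + table[1][j]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def select_features(columns, labels, min_fisher=1.0, min_chi2=3.841):
    """Keep features whose Fisher score and chi-squared statistic both pass.
    3.841 is the chi-squared critical value for 1 degree of freedom at the 5% level;
    min_fisher=1.0 is an arbitrary illustrative threshold."""
    return [
        name
        for name, vals in columns.items()
        if fisher_score(vals, labels) >= min_fisher
        and chi2_statistic(vals, labels) >= min_chi2
    ]


# Toy data (invented for illustration): 1 = user pair has a social tie, 0 = no tie.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "co_occurrence_count": [9, 8, 7, 9, 1, 2, 1, 3],  # separates the two classes
    "noise": [5, 1, 4, 2, 5, 1, 4, 2],                # same distribution in both classes
}

selected = select_features(columns, labels)
print(selected)
```

On this toy data only the feature that separates the two classes survives. In a fuller pipeline the retained columns would then be passed to a random forest classifier.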
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2016, Issue 12, pp. 218-222 (5 pages)
Computer Science
Funding
Supported by the National Defense Key Laboratory Fund