Abstract
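The trajectory data of base stations accessed by users during mobile internet use reflect the users' living habits and behavior patterns in both time and space. Temporal and spatial information are generated simultaneously and should not be considered separately. This paper therefore proposes a time-aware TFT-IDFT method, built on the traditional TF-IDF method, to extract semantic information from trajectory points, and then uses word2vec to treat trajectory data as documents for analysis. Spatio-temporal trajectory word vectors containing both location and semantic information are extracted, and a multi-class classification model is built on them to identify the age group each user belongs to. Experimental results show that the improved TFT-IDFT method is more reasonable for extracting trajectory semantics, and that the spatio-temporal trajectory word vectors built with it yield better age-group recognition when used in the classification model.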
The trajectory data generated by users' mobile access to base stations reflect their lifestyles and behavior patterns in both time and space. Because temporal and spatial information are produced simultaneously, this paper proposes a TFT-IDFT method to extract semantic information from trajectories. First, the word embedding method word2vec is applied to build trajectory word vectors that include users' location and semantic information. Then, classification methods are applied to these vectors to identify user age groups. The results show that TFT-IDFT is more applicable than TF-IDF for extracting semantic trajectories, and that word vectors based on this method perform better in the age classification task.
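For orientation, the traditional TF-IDF baseline that the abstract compares against can be sketched as follows, treating each user's trajectory as a "document" and each base station ID as a "word". This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `tf_idf`, the toy data, and the choice of an unsmoothed natural-log IDF are all hypothetical. The time-aware TFT-IDFT weighting is not defined in this abstract and is therefore not shown.

```python
import math
from collections import Counter


def tf_idf(trajectories):
    """Plain TF-IDF over trajectories.

    trajectories: dict mapping a user ID to the list of base station IDs
    that user accessed (hypothetical input format).
    Returns: dict mapping user ID -> {station ID: TF-IDF score}.
    """
    n_users = len(trajectories)

    # Document frequency: number of users who accessed each station.
    df = Counter()
    for stations in trajectories.values():
        df.update(set(stations))

    scores = {}
    for user, stations in trajectories.items():
        counts = Counter(stations)
        total = len(stations)
        scores[user] = {
            # TF = share of the user's visits; IDF = log(N / df), unsmoothed.
            s: (c / total) * math.log(n_users / df[s])
            for s, c in counts.items()
        }
    return scores


# Toy data (invented for illustration only).
toy = {
    "u1": ["A", "A", "B"],
    "u2": ["A", "C"],
    "u3": ["A", "C", "C"],
}
scores = tf_idf(toy)
```

In this toy example, station `A` is accessed by every user and so receives zero weight, while stations specific to fewer users are weighted up. The scores ignore when each visit happened, which is the limitation the abstract says the time-aware method is meant to address.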
Authors
吴浩
张威强
张朋柱
WU Hao; ZHANG Weiqiang; ZHANG Pengzhu (Antai College of Economics & Management, Shanghai Jiao Tong University, Shanghai 200030, China)
Source
《中文信息学报》
CSCD
Peking University Core Journal (北大核心)
2019, Issue 7, pp. 118-127 (10 pages)
Journal of Chinese Information Processing
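Funding
National Natural Science Foundation of China (91646205, 71421002)
Fundamental Research Funds for the Central Universities, Shanghai Jiao Tong University (16JCCS08)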