Abstract
After 40 years of rapid growth, China's tourism industry has entered a new stage of high-quality development. Meanwhile, as epidemic prevention and control becomes routine and the tourism market gradually recovers, the new “Internet + tourism” business model is developing rapidly, and massive volumes of Internet search data implicitly reflect people's tourism demand. This paper therefore applies Internet search data (IS) to forecasting tourism demand in Beijing. First, Python is used to crawl travel notes and guides from online travel websites, the NLPIR word segmentation system is used to extract high-frequency words, and these are combined with the six elements of tourism to build an initial keyword lexicon. Second, seven methods, including demand maps, related-term popularity recommendations from Baidu Index, and recommendations from the Beijing Tourism website, are used to expand the keyword set. Adaptive Lasso and other methods are then used to select the nine best predictor variables, and seasonal dummy variables are introduced. The selected search keywords are combined with the random forest (RF), extreme gradient boosting (XGBoost), and support vector regression (SVR) algorithms to model and train on Beijing's tourism demand. Finally, based on several forecasting performance metrics, the SVR model is identified as the best model. The results show that Internet search data are significantly correlated with tourism demand and are highly timely, and that the SVR model copes well with unexpected events and small samples, making it efficient and feasible for short-term tourism demand forecasting.
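Two steps mentioned in the abstract, introducing seasonal dummy variables and comparing models on several forecasting performance metrics, can be illustrated with a minimal Python sketch. The abstract does not name the metrics or the dummy encoding, so the choice of MAE, RMSE, and MAPE, the monthly one-hot encoding with January as the baseline, the function names, and the numbers below are all assumptions made for illustration; none of them are taken from the paper, and `model_a` and `model_b` are hypothetical placeholders rather than the paper's RF, XGBoost, or SVR results.

```python
import math


def seasonal_dummies(month, period=12):
    """One-hot encode a month (1..period), dropping month 1 as the baseline.

    Assumed encoding for illustration; the paper does not specify one.
    """
    if not 1 <= month <= period:
        raise ValueError("month must be between 1 and period")
    return [1 if month == m else 0 for m in range(2, period + 1)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


def rank_models(actual, forecasts):
    """Score each model's forecasts and return (names sorted by MAPE, scores)."""
    scores = {
        name: {
            "MAE": mae(actual, pred),
            "RMSE": rmse(actual, pred),
            "MAPE": mape(actual, pred),
        }
        for name, pred in forecasts.items()
    }
    ranking = sorted(scores, key=lambda name: scores[name]["MAPE"])
    return ranking, scores


# Made-up demand figures and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecasts = {
    "model_a": [110.0, 190.0, 400.0],
    "model_b": [90.0, 220.0, 380.0],
}
ranking, scores = rank_models(actual, forecasts)
print(ranking)
print(scores)
```

Ranking by a single metric is a simplification; when the metrics disagree, a comparison across all of them, as the abstract describes, is the more careful choice.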
Source
Hans Journal of Data Mining (《数据挖掘》)
2022, No. 2, pp. 133-151 (19 pages)