Abstract
With the rapid development of the Internet and aviation industries, more and more people buy air tickets online, and many passengers review their flights afterwards. Based on this text data, this paper studies the airline service features that affect passenger satisfaction, to help airlines improve the corresponding services and enhance the passenger travel experience. The paper uses Python web-crawling techniques to collect passenger reviews of China Eastern Airlines from the CAPSE website. First, the data are preprocessed; second, high-frequency words in the review text are counted; next, the LDA topic model is applied to obtain topic keywords and so mine, from the users' perspective, the service features passengers care about; then the TF-IDF method is used to transform the review text into a word-vector matrix based on those service features. Finally, using the correlation coefficient method and decision-tree-based feature importance analysis, the paper finds that the key factors affecting passenger satisfaction with airline service include flight punctuality, the level of cabin crew service, and the cabin environment.
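As a rough illustration of two of the steps named in the abstract (TF-IDF weighting over service-feature terms, followed by the correlation coefficient method), the following minimal sketch uses only the Python standard library. The toy reviews, the satisfaction scores, the feature vocabulary, and the function names are all hypothetical and are not taken from the paper; the exact TF-IDF variant the authors used is not stated, so a common smoothed form is assumed. The LDA and decision-tree steps are not covered here.

```python
import math
from collections import Counter

# Hypothetical toy data: tokenized reviews and 1-5 satisfaction scores.
reviews = [
    ["delay", "delay", "crew", "rude"],
    ["crew", "friendly", "cabin", "clean"],
    ["delay", "cabin", "dirty"],
    ["punctual", "crew", "friendly"],
]
scores = [1, 5, 2, 5]

# Hypothetical service-feature terms; in the paper these are derived
# from LDA topic keywords, which this sketch does not reproduce.
features = ["delay", "crew", "cabin"]


def tfidf_matrix(docs, vocab):
    """Return one row per document with a TF-IDF weight per vocab term.

    Assumed variant: tf = count / document length, idf = ln(N / df) + 1.
    """
    n_docs = len(docs)
    doc_freq = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {
        t: (math.log(n_docs / doc_freq[t]) + 1.0) if doc_freq[t] else 0.0
        for t in vocab
    }
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


matrix = tfidf_matrix(reviews, features)
correlations = {
    term: pearson([row[j] for row in matrix], scores)
    for j, term in enumerate(features)
}

# List features from most negatively to most positively correlated.
for term, r in sorted(correlations.items(), key=lambda kv: kv[1]):
    print(f"{term}: {r:+.3f}")
```

On real data, the size and sign of each coefficient would indicate how strongly mentions of a service feature move with the satisfaction score. With a handful of made-up reviews the numbers carry no meaning beyond showing the mechanics.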
Source
《数据挖掘》
2019, No. 3, pp. 88-95 (8 pages)
Hans Journal of Data Mining