Abstract
With the spread of the mobile Internet, people increasingly use social networks to share their lives and to express their thoughts and feelings. Twitter is a social platform with one of the world's largest user bases, and it has accumulated a wealth of user and text data. This paper analyzes Twitter text using the Sentiment140 dataset. It first performs exploratory data analysis on the dataset from several perspectives. It then compares different feature extraction methods and classification algorithms by F1 score, and from this comparison identifies the best feature extraction and classification parameters. The workflow and results can serve as a reference for text sentiment mining. They can also support text-based sentiment classification tasks and the diagnosis of mental illnesses such as depression.
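The abstract ranks feature extraction and classifier combinations by F1 score. The sketch below shows how such a ranking works. It assumes binary polarity labels: Sentiment140 encodes negative tweets as 0 and positive tweets as 4. The predictions are made up and are not the paper's results. The function name `f1_score_binary` and the candidate names `model_a` and `model_b` are hypothetical. F1 for the positive class is the harmonic mean of precision and recall, and the candidate with the highest F1 is selected.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 4) -> float:
    """F1 for one class: 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels only (0 = negative, 4 = positive); not data from the paper.
y_true = [4, 4, 4, 0, 0, 0]
# Hypothetical predictions from two feature-extractor/classifier pairs.
candidates = {
    "model_a": [4, 4, 0, 0, 0, 4],
    "model_b": [4, 4, 4, 0, 4, 0],
}
scores = {name: f1_score_binary(y_true, pred) for name, pred in candidates.items()}
best = max(scores, key=scores.get)  # model_b: F1 = 6/7 vs 2/3 for model_a
print(scores, best)
```

In a real comparison, each candidate's predictions would come from a held-out split. scikit-learn's `sklearn.metrics.f1_score` provides a more general implementation of the same metric, including macro and weighted averaging.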
Author
ZHANG Xinyan (张鑫衍), Yangtze River Delta Center for Medical Device Evaluation and Inspection of National Medical Products Administration, Shanghai 200135, China
Source
Modern Information Technology (《现代信息科技》)
2023, No. 15, pp. 157-161 (5 pages)
Keywords
Social network
Text mining
Natural language processing
Digital Therapeutics (DTx)
Mental illness screening