Abstract
Since 2011 we have been systematically collecting personal lifelog data through an app we developed. To date, 22 volunteers have taken part in the project, contributing more than 40,000 valid lifelog entries. Classifying this rich but disorganized data is worthwhile because it gives people a clearer, more ordered view of their lives. This paper proposes DTC-TextCNN, a text classification model for lifelogs. The model uses the LDA topic model to extract topic features from the text entries, and uses the DBSCAN algorithm to cluster the geographic locations from which entries were posted, yielding a set of location feature clusters. The extracted topic and location features are then concatenated with the entry text and fed into a TextCNN model for classification. Experimental results indicate that introducing geographic location as a feature helps the model capture the background and setting in which a text was written, supplying richer contextual information. Combining geographic and topic features compensates for the semantic ambiguity and missing global semantics of lifelog text and improves the model's understanding of text content. Tests on the Liu Lifelog dataset show that the model improves lifelog classification accuracy.
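The location-clustering and concatenation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the haversine distance, the `eps_km` and `min_pts` values, the sample coordinates, and the `[TOPIC_k]` / `[LOC_k]` token format are all assumptions made for illustration, and the topic id is taken as given rather than computed by an LDA model. The abstract does not say whether the features are joined at token level or embedding level; token level is assumed here.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def dbscan(points, eps_km, min_pts):
    """Plain DBSCAN: returns one label per point (0, 1, ... or -1 for noise)."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual formulation.
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def build_input(text, topic_id, loc_cluster):
    """Prepend hypothetical topic and location tokens to the entry text."""
    loc = f"[LOC_{loc_cluster}]" if loc_cluster >= 0 else "[LOC_NONE]"
    return f"[TOPIC_{topic_id}] {loc} {text}"


# Hypothetical posting locations: two dense groups and one isolated point.
points = [
    (39.9042, 116.4074), (39.9050, 116.4080), (39.9045, 116.4069),
    (31.2304, 121.4737), (31.2310, 121.4740),
    (22.5431, 114.0579),
]
labels = dbscan(points, eps_km=1.0, min_pts=2)
sample = build_input("lunch with colleagues", topic_id=3, loc_cluster=labels[0])
```

In a full pipeline, the strings returned by `build_input` would be tokenized and passed to the TextCNN classifier. A library implementation of DBSCAN and LDA would normally replace the hand-written parts.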
Source
Computer Science and Application (《计算机科学与应用》)
2024, Issue 2, pp. 480-488 (9 pages)