Abstract
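With the rapid development of the Internet era, online health communities have broken the limits of time and space and offer users abundant medical information and emotional help. They have become an important source of social support and attract wide attention and participation from users. Text mining of user content in online health communities can reveal users' participation behavior and thereby improve user management and information recommendation. Existing studies have mainly focused on English-language online health communities, and few have examined Chinese ones. Based on social support theory, this paper designs a Chinese user text mining process to study social support types and user participation in Chinese online health communities. Using Chinese text mining and machine learning methods, we study the Chinese diabetes community "甜蜜家园" (literally "Sweet Home"). We use the LDA (Latent Dirichlet Allocation) model for feature extraction to build low-dimensional text representation vectors, and apply binary classification to assign user texts to different social support types. Finally, based on the classification results, we cluster users with the K-means algorithm to identify user roles. Compared with traditional feature extraction methods, LDA-based feature extraction significantly reduces data dimensionality, improves the classification model, and raises classification accuracy and efficiency. The results show that the proposed Chinese user text mining process is effective in both text classification and user clustering.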
The emerging online health communities (OHCs) provide abundant medical information and emotional connection for users in today's rapidly developing Internet era, free from the limits of time and space. OHCs, which are regarded as one of the major sources of social support, have become increasingly popular among people with health issues in China. Text mining of user content in OHCs can reveal users' behavior, and hence can be used to optimize user management and information recommendation. Most studies have used English OHCs as their research objects, while few have focused on Chinese OHCs. Based on social support theory, we designed a Chinese content analysis process to reveal social support and user engagement in OHCs. Using a case study of an OHC for people with diabetes, we first extracted features with an LDA model to construct low-dimensional text representation vectors, and then used binary classification to divide users' posts and replies into different types of social support. Finally, we used the K-means algorithm to cluster the users based on the classification results and identify user roles. Compared with the traditional vector space model, LDA feature extraction not only significantly reduces the data dimension and the amount of human annotation data, but also improves classification accuracy and efficiency. Results showed that the process performed well in text classification and user clustering.
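The last step of the process, clustering users on the classification results, can be illustrated with a minimal sketch. It is not the authors' implementation: the support-type labels, the toy posts, the use of per-user label proportions as features, and the helper names `user_vectors` and `kmeans` are all assumptions made for illustration, and the K-means here is a plain standard-library version of Lloyd's algorithm.

```python
import random
from collections import Counter, defaultdict

# Hypothetical label set; the abstract does not list the support types.
SUPPORT_TYPES = ["info_seek", "info_give", "emotional"]


def user_vectors(labeled_posts):
    """Turn (user, support_type) pairs into per-user proportion vectors."""
    counts = defaultdict(Counter)
    for user, label in labeled_posts:
        counts[user][label] += 1
    vectors = {}
    for user, c in counts.items():
        total = sum(c.values())
        vectors[user] = [c[t] / total for t in SUPPORT_TYPES]
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Toy classifier output (invented data): one (user, label) pair per post.
posts = [
    ("u1", "info_seek"), ("u1", "info_seek"), ("u1", "emotional"),
    ("u2", "info_seek"), ("u2", "info_seek"),
    ("u3", "info_give"), ("u3", "info_give"), ("u3", "info_give"),
    ("u4", "info_give"), ("u4", "info_give"), ("u4", "emotional"),
]

vectors = user_vectors(posts)
users = list(vectors)
assignments, centroids = kmeans([vectors[u] for u in users], k=2)
roles = dict(zip(users, assignments))
print(roles)
```

On this toy data the users who mostly seek information (u1, u2) and those who mostly give it (u3, u4) fall into separate clusters; which cluster index each group receives depends on the random initialization. In practice a library implementation with several restarts would be the more usual choice.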
Source
《情报学报》
CSSCI
CSCD
PKU Core Journals (北大核心)
2017, No. 11, pp. 1183-1191 (9 pages)
Journal of the China Society for Scientific and Technical Information
Funding
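National Natural Science Foundation of China project "Research on the Evolution of User Behavior in Online Health Communities under Content-Relationship Interaction" (71573197)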