Abstract
When data mining techniques are used to classify the grade of college students' grants, the data samples are imbalanced. To address this problem, the Context-aware Feature Interaction Network (CFIN) was improved, and LT-CFIN, a grant-grade classification model for long-tailed distributions, was proposed. To verify the correlation between students' personality traits and their economic status, and to enrich the feature dimensions, students' personalities were quantified according to the Big Five personality theory and Cattell's Sixteen Personality Factor theory (16PF). The classification performance of LT-CFIN under the long-tailed distribution was evaluated on a student campus-card dataset. The AUC on the full dataset reached 98.28%, 3.24% to 4.81% higher than the compared models, and the F1 scores for the three grant grades reached 90.11%, 92.60%, and 95.00%, respectively. The experimental results show that LT-CFIN combined with students' personality traits can address the problem of imbalanced samples and effectively improve classification accuracy.
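The abstract does not say how LT-CFIN rebalances the long-tailed grant classes, so the sketch below is only an illustration of one common, generic technique: inverse-frequency class weighting inside a cross-entropy loss. It is not necessarily the method used in LT-CFIN. It also computes per-class F1, the metric reported for the three grant grades. The function names and the integer class labels (0, 1, 2) are hypothetical.

```python
from collections import Counter
import math


def inverse_frequency_weights(labels):
    """Hypothetical helper: weight class c by N / (K * n_c).

    Rare classes get larger weights, and the average weight per sample is 1.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w_y * log(p_y); probs[i] maps each class to its predicted probability."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


def per_class_f1(y_true, y_pred, classes):
    """One-vs-rest F1 for each class (0.0 when precision + recall is 0)."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


if __name__ == "__main__":
    # Toy long-tailed labels (hypothetical): 6 / 3 / 1 samples per grade.
    labels = [0] * 6 + [1] * 3 + [2]
    print(inverse_frequency_weights(labels))
    print(per_class_f1([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2], [0, 1, 2]))
```

With the toy labels, the rarest grade receives a weight of about 3.33 and the most frequent receives about 0.56. Misclassifying a rare-grade student therefore costs roughly six times as much as misclassifying a frequent-grade student. Per-class F1 shows whether the rare grades still benefit, which a single overall accuracy figure can hide.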
Authors
GUO Jiajun; YANG Bo; ZHU Jianlin; ZHU Lianmiao; YU Hui (College of Computer Science, South-Central University for Nationalities, Wuhan 430074, China)
Source
Journal of South-Central University for Nationalities (Natural Science Edition), 2022, No. 1, pp. 101-108 (8 pages)
Indexed in: CAS; Peking University Core Journals
Funding
National Natural Science Foundation of China (61976226)
Graduate Academic Innovation Fund of South-Central University for Nationalities (3212021sycxjj141)
Keywords
grant
grade classification
long-tailed distributions
quantification of personality