Abstract
Data-driven machine learning models usually reduce their generalization risk by greatly increasing the number of training samples, which makes them poorly suited to applications with limited samples. This paper introduces domain knowledge as an additional learning element on top of the data-driven model (DDM) and proposes a data-knowledge fusion learning paradigm, the data-driven model incorporated with knowledge (KDM), which can lower the generalization risk. First, the data-knowledge fusion learning problem is formulated, several modes of fusing data and knowledge are analyzed, and a general fusion learning model is established. Then, the form of the model's solutions is analyzed, and representations are given for evaluating its generalization capability in both the global and the local learning space of the problem. Finally, drawing on practical application cases, the paper discusses how KDM realizes data-knowledge fusion in regression analysis, pattern recognition, and dynamic programming tasks. Compared with a purely data-driven model, KDM makes the learning process more efficient and reduces the learner's generalization risk without increasing the number of training samples.
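The idea of combining data with domain knowledge can be illustrated with a minimal sketch. This is not the model from the paper: it assumes a one-parameter regression through the origin, and it assumes that the knowledge takes the form of an expected slope, added to the least-squares loss as a quadratic penalty. The names `fit_slope`, `prior_slope`, and `weight` are hypothetical.

```python
def fit_slope(xs, ys, prior_slope=0.0, weight=0.0):
    """Minimize sum((w*x - y)^2) + weight * (w - prior_slope)^2 over w.

    With weight == 0 this is ordinary least squares through the origin.
    A positive weight pulls the estimate toward prior_slope, the
    (assumed) domain knowledge.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Closed-form minimizer, obtained by setting the derivative to zero.
    return (sxy + weight * prior_slope) / (sxx + weight)


# Two noisy samples only (illustrative values, not data from the paper).
xs = [1.0, 2.0]
ys = [2.9, 5.0]

# Purely data-driven estimate.
data_only = fit_slope(xs, ys)

# Fused estimate: the same data plus an assumed prior slope of 2.0.
fused = fit_slope(xs, ys, prior_slope=2.0, weight=5.0)
```

With these sample values the fused estimate lies between the prior slope and the data-only estimate. How far it moves is controlled by `weight`, which here stands in for the degree of trust placed in the knowledge.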
Authors
尚宇炜
郭剑波
吴文传
盛万兴
马钊
SHANG Yuwei; GUO Jianbo; WU Wenchuan; SHENG Wanxing; MA Zhao (State Key Lab of Control and Simulation of Power Systems and Generation Equipments (Tsinghua University), Haidian District, Beijing 100084, China; China Electric Power Research Institute, Haidian District, Beijing 100192, China)
Source
《中国电机工程学报》
EI
CSCD
PKU Core (北大核心)
2019, No. 15, pp. 4406-4415 (10 pages)
Proceedings of the CSEE
Keywords
machine learning
generalization risk
data-driven
knowledge-guided