从三种伦理理论的视角看人工智能威胁问题及其对策 (cited by: 5)

AI Threat and Countermeasures from the Perspective of Three Ethical Theories

Abstract: Bostrom, Yudkowsky, Soares and others attribute the ethical risks of AI mainly to four factors: the orthogonality of goals and capabilities, the difficulty of value loading, the convergence of instrumental subgoals, and rapid capability gains. Starting from three mainstream ethical theories (deontology, utilitarianism and virtue ethics), we analyze several major technical problems in AI ethics and their representative solutions, and examine their strengths and weaknesses. From the deontological perspective, some subtle hidden details are always hard to anticipate in advance, exploitable loopholes may exist, and it is difficult to ensure that the semantics of ethical norms are sufficiently precise. The typical representative of utilitarianism in AI is reinforcement learning, whose framework cannot avoid the problems of goal orthogonality, value loading difficulty and instrumental convergence. Relying solely on virtue ethics not only fails to provide a criterion for right action; the semantic definitions of the virtues are also vague. Although each theory has its own difficulties, there is hope of integrating them into a comprehensive solution: constrain the agent's action space with deontological ethical norms, remedy the shortcomings of utilitarianism with virtue ethics, and load relatively reliable values by means of cooperative inverse reinforcement learning, so that the agent's behavior remains consistent with human behavior without losing too much intelligence, thereby minimizing ethical risk.

Authors: LI Xi (李熙); ZHOU Riqing (周日晴)

Affiliation: School of Public Administration, Central South University (中南大学公共管理学院)

Source: Journal of Jianghan University (Social Science Edition) (《江汉大学学报（社会科学版）》), 2019, No. 1, pp. 92-100, 126 (10 pages in total)

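Funding: National Social Science Fund of China project "Research on the Philosophical Foundations of Artificial General Intelligence" (17CZX020)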

Keywords: artificial intelligence; ethics; goal orthogonality; cooperative inverse reinforcement learning

Classification: B82-05 [Philosophy and Religion: Ethics]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]