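In recent years, large language models (LLMs) have been widely applied to a range of downstream tasks and have demonstrated outstanding text understanding, generation, and reasoning capabilities across many domains. However, jailbreak attacks are emerging as a new threat to LLMs. A jailbreak attack can bypass an LLM's safety mechanisms, weaken the effect of value alignment, and induce an aligned LLM to produce harmful output. The misuse, hijacking, and leakage that jailbreak attacks enable already pose a serious threat to LLM-based dialogue systems and applications. This paper systematically reviews recent research on jailbreak attacks and, based on the underlying attack principle, divides it into three categories: attacks based on manual design, attacks based on model generation, and attacks based on adversarial optimization. It summarizes in detail the basic principles, implementation methods, and conclusions of the relevant studies and comprehensively traces the development of LLM jailbreak attacks, providing a useful reference for subsequent research. It also briefly reviews existing safety measures, introducing from the two perspectives of internal defense and external defense the techniques that can mitigate jailbreak attacks and improve the safety of LLM-generated content, and it lists and compares the advantages and disadvantages of the different methods. Building on this work, it discusses open problems and frontier directions in the field of LLM jailbreak attacks and offers a research outlook in connection with multimodality, model editing, multi-agent systems, and related directions.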
Currently, most public higher learning institutions in Tanzania rely on traditional in-class examinations, which require students to register and present identification documents to verify their eligibility to sit an examination. This process, however, is prone to impersonation because of security vulnerabilities in the current student verification system: weak authentication, lack of encryption, and inadequate anti-counterfeiting measures. In addition, advanced printing technologies and online marketplaces that claim to produce convincing fake identification documents make such documents easy to obtain. The Improved Mechanism for Detecting Impersonations (IMDIs) system detects impersonation in in-class examinations by combining QR codes with dynamic question generation based on student profiles. It consists of a mobile verification app built with Flutter and a web system developed with Laravel using HTML, CSS, and JavaScript. The two components communicate through RESTful APIs to keep verification efficient and secure during examinations, and MySQL manages the database. The implemented IMDIs system was validated using the mobile application, which integrates a QR code scanner that captures the codes embedded in student identity cards and links them to a dynamic question generation (QG) model. The QG model uses natural language processing (NLP) algorithms and QG techniques to create dynamic profile questions. Results show that the IMDIs system could generate four challenging profile-based questions within two seconds, allowing one operator to verify 200 students in 33 minutes. The IMDIs system also tracks exam-eligible students, which aids exam attendance tracking, and integrates with a Short Message Service (SMS) to report impersonation incidents to a dedicated security officer in real time. In testing, the IMDIs system was found to be 98% secure and 100% convenient, with a 0% false rejection rate and a 2% false acceptance rate, demonstrating its security, reliability, and high performance.
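The throughput and error-rate figures reported above can be read with a little arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the trial counts (50 impostor attempts, 100 genuine attempts) are assumptions chosen only to reproduce the reported percentages, since the abstract does not state the sample sizes. It uses the usual definitions, where the false acceptance rate is the share of impostor attempts that were accepted and the false rejection rate is the share of genuine attempts that were rejected.

```python
def seconds_per_student(total_minutes: float, students: int) -> float:
    """Average verification time per student, in seconds."""
    return total_minutes * 60 / students


def error_rate(errors: int, attempts: int) -> float:
    """Fraction of attempts that ended in an error (0.0 if there were no attempts)."""
    return errors / attempts if attempts else 0.0


# Figures reported in the abstract: 200 students verified in 33 minutes by one operator.
per_student = seconds_per_student(33, 200)

# Hypothetical trial counts, assumed here only to match the reported percentages.
far = error_rate(errors=1, attempts=50)    # impostors wrongly accepted
frr = error_rate(errors=0, attempts=100)   # genuine students wrongly rejected

print(f"{per_student:.1f} s per student, FAR {far:.0%}, FRR {frr:.0%}")
```

Under these assumptions, the reported throughput corresponds to just under ten seconds per student, which includes scanning the QR code and answering the generated questions.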