The EU’s Artificial Intelligence Act (AI Act) imposes privacy-compliance requirements on AI systems. When providing services, AI systems must comply with privacy laws such as the GDPR. These laws give users the right to issue a Data Subject Access Request (DSAR). Responding to such a request requires database administrators to accurately identify all information related to an individual. However, manual compliance is challenging and error-prone, because database administrators must write the necessary queries by hand, which is time-consuming. The demand of AI systems for large amounts of data has driven the development of NoSQL databases, and their flexible schemas make identifying personal information even more difficult. This paper develops an automated tool that identifies personal information to help organizations respond to DSARs. The tool combines several techniques, including schema extraction from NoSQL databases and relationship identification from query logs. We describe the tool's algorithm, detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs that help developers accurately identify personal data. We evaluate the tool on three datasets covering different database designs, achieving F1 scores ranging from 0.77 to 1. Experimental results show that the tool successfully identifies information relevant to the data subject. The tool reduces manual effort and simplifies GDPR compliance, demonstrating practical value for improving the privacy of NoSQL databases and AI systems.
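The two ingredients named above, schema extraction and a relationship graph built from query logs, can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes each collection is a list of Python dicts, and that the query log has already been parsed into join tuples `(collection, field, other_collection, other_field)`, a hypothetical format loosely modeled on the `localField`/`foreignField` pairs of a MongoDB `$lookup` stage. The function names `extract_schema`, `build_relationship_graph` and `collect_subject_data` are hypothetical.

```python
from collections import defaultdict, deque


def extract_schema(documents):
    """Infer a flat schema for one collection: dotted field path -> set of value types seen."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:  # array elements share the array's field path
                walk(item, path)
        else:
            schema[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return dict(schema)


def build_relationship_graph(join_log):
    """Turn parsed join entries (collection, field, other_collection, other_field)
    into an undirected adjacency list keyed by collection name.
    The tuple format is a hypothetical, already-parsed query-log representation."""
    graph = defaultdict(list)
    for coll, field, other, other_field in join_log:
        graph[coll].append((field, other, other_field))
        graph[other].append((other_field, coll, field))
    return graph


def collect_subject_data(db, graph, start, key_field, subject_id):
    """Breadth-first walk over the relationship graph, starting from the data
    subject's own documents and gathering every document linked to them.
    Assumes scalar (hashable) join keys; each collection is visited once,
    via the first path that reaches it."""
    found = {start: [d for d in db.get(start, []) if d.get(key_field) == subject_id]}
    queue = deque([start])
    while queue:
        coll = queue.popleft()
        for field, other, other_field in graph.get(coll, []):
            if other in found:
                continue
            # Join-key values present in documents already attributed to the subject.
            values = {d.get(field) for d in found[coll]}
            values.discard(None)  # documents missing the field should not match
            found[other] = [d for d in db.get(other, []) if d.get(other_field) in values]
            queue.append(other)
    return found
```

For example, with the log entry `("orders", "user_id", "users", "_id")`, calling `collect_subject_data(db, graph, "users", "_id", 1)` returns user 1's own record plus every order whose `user_id` is 1. A real tool would also have to handle array-valued keys, multiple paths to the same collection, and fields that identify a person without being join keys.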
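The goal of the Natural Language to SQL (NL2SQL) task is to translate natural-language queries into Structured Query Language (SQL). Most existing models decompose the NL2SQL task into multiple subtasks and build a dedicated fully connected neural network decoder for each subtask. These approaches have shortcomings: their model design and architecture are relatively simple, and their ability to learn dependencies between different subtasks is limited. To address these problems, this paper introduces a multi-channel parallel LSTM model to the NL2SQL task and uses a sparsely connected layer to join the different subtask decoders, improving both the representational capacity of the network and the efficiency with which it uses computational resources. Evaluation on the WikiSQL dataset shows that the proposed model achieves better accuracy than the baseline models.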
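To make the subtask decomposition concrete, here is a minimal Python sketch built on WikiSQL's logical-form format (`sel`, `agg`, `conds`) and its aggregation and condition operator lists. The six subtask names and the helpers `to_subtask_targets` and `from_subtask_predictions` are hypothetical illustrations of the common SQLNet-style split, not the decoders described above; the LSTM channels and the sparsely connected layer are not modeled. The rendered SQL is for readability only: values are quoted with Python's `repr`, not escaped for a real database.

```python
# Operator vocabularies as used in the WikiSQL dataset.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def to_subtask_targets(query):
    """Split a WikiSQL-style query dict into one training target per subtask decoder
    (hypothetical subtask names)."""
    conds = query["conds"]
    return {
        "select_column": query["sel"],
        "select_agg": query["agg"],
        "where_num": len(conds),
        "where_columns": [c[0] for c in conds],
        "where_ops": [c[1] for c in conds],
        "where_values": [c[2] for c in conds],
    }


def from_subtask_predictions(pred, columns, table="table"):
    """Deterministically assemble the separate decoders' outputs into one SQL string."""
    col = columns[pred["select_column"]]
    agg = AGG_OPS[pred["select_agg"]]
    select = f"{agg}({col})" if agg else col
    sql = f"SELECT {select} FROM {table}"
    conds = [
        f"{columns[c]} {COND_OPS[o]} {v!r}"  # repr() quoting: illustrative, not SQL-safe
        for c, o, v in zip(pred["where_columns"], pred["where_ops"], pred["where_values"])
    ][: pred["where_num"]]
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql
```

Each dictionary entry corresponds to one prediction head. Because the heads are trained separately in this kind of design, nothing forces, say, the predicted `where_num` to agree with the predicted condition columns; that lack of cross-subtask dependency is the limitation the abstract targets.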
Funding: Supported by the National Natural Science Foundation of China (No. 62302242) and the China Postdoctoral Science Foundation (No. 2023M731802).