Abstract
Benchmark datasets underpin many data-dependent scientific studies and technology applications. Academia and industry have jointly released abundant benchmark datasets in many fields, yet some specialized research areas still lack high-quality data. This paper introduces two open-source benchmark datasets: the Insider Threat Dataset (ITD-2018) and the Indoor Crowd Movement Trajectory Dataset (ICMTD-2019). Both are produced by program-driven synthetic data generation and are built on well-defined scenarios and carefully designed behavior models, with rich data patterns and interwoven storylines embedded in them. They offer diverse data types, a moderate scale, and a certain degree of analytical difficulty, and both were used in the ChinaVis Data Challenge. This paper aims to further promote the two datasets in order to advance scientific research and technology applications in the relevant domains.
Authors
ZHAO Ying (赵颖)
ZHAO Xin (赵鑫)
YANG Kui (杨奎)
CHEN Siming (陈思明)
ZHANG Zhuo (张卓)
HUANG Xin (黄鑫)
Affiliations: School of Computer Science, Central South University, Changsha, Hunan 410083, China; School of Data Science, Fudan University, Shanghai 200433, China; Layer Visualization Department, Qi An Xin Technology Group Co., Ltd., Beijing 100015, China
Source
Journal of Terahertz Science and Electronic Information Technology (《太赫兹科学与电子信息学报》)
2022, No. 12, pp. 1257-1268 (12 pages)
Funding
National Natural Science Foundation of China (Grant No. 61872388).