A Theoretically Grounded Question Answering Data Set for Evaluating Machine Common Sense

Abstract: Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded Common Sense Reasoning (TG-CSR), modeled as a set of question answering instances, with each instance grounded in a semantic category of common sense, such as space, time, and emotions. The benchmark is few-shot, i.e., only a few training and validation examples are provided in the public release, to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.

Authors: Henrique Santos, Ke Shen, Alice M. Mulvehill, Mayank Kejriwal, Deborah L. McGuinness

Affiliations: Rensselaer Polytechnic Institute; Information Sciences Institute

Source: Data Intelligence (EI-indexed), 2024, Issue 1, pp. 1-28 (28 pages)

Funding: This work was funded under the DARPA Machine Common Sense (MCS) program under award number N660011924033. Further thanks to Yasaman Razeghi for supporting the evaluation of the benchmark.

Keywords: reasoning, intelligence, instance

Classification: F831 [Economics and Management: Finance]; TP391.1 [Automation and Computer Technology: Computer Application Technology]

