Abstract
Achieving machine common sense has been a longstanding problem within Artificial Intelligence. Thus far, benchmark data sets that are grounded in a theory of common sense and can be used to conduct rigorous, semantic evaluations of common sense reasoning (CSR) systems have been lacking. One expectation of the AI community is that neuro-symbolic reasoners can help bridge this gap towards more dependable systems with common sense. We propose a novel benchmark, called Theoretically Grounded Common Sense Reasoning (TG-CSR), modeled as a set of question answering instances, each grounded in a semantic category of common sense such as space, time, and emotions. The benchmark is few-shot; that is, only a few training and validation examples are provided in the public release, to avoid the possibility of overfitting. Results from recent evaluations suggest that TG-CSR is challenging even for state-of-the-art statistical models. Due to its semantic rigor, this benchmark can be used to evaluate the common sense reasoning capabilities of neuro-symbolic systems.
Funding
This work was funded under the DARPA Machine Common Sense (MCS) program under award number N660011924033. We also thank Yasaman Razeghi for supporting the evaluation of the benchmark.