Funding: Project supported by the Chinese Academy of Engineering, the National Natural Science Foundation of China (No. L1522023), the National Basic Research Program (973) of China (No. 2015CB351703), and the National Key Research and Development Plan (Nos. 2016YFB1001004 and 2016YFB1000903).
Abstract: The long-term goal of artificial intelligence (AI) is to make machines learn and think like human beings. Given the high levels of uncertainty and vulnerability in human life and the open-ended nature of the problems humans face, machines cannot completely replace humans, no matter how intelligent they become. It is therefore necessary to introduce human cognitive capabilities, or human-like cognitive models, into AI systems to develop a new form of AI: hybrid-augmented intelligence. This form of AI or machine intelligence is a feasible and important path of development. Hybrid-augmented intelligence can be divided into two basic models: human-in-the-loop augmented intelligence with human-computer collaboration, and cognitive-computing-based augmented intelligence, in which a cognitive model is embedded in a machine learning system. This survey describes a basic framework for human-computer collaborative hybrid-augmented intelligence and the basic elements of hybrid-augmented intelligence based on cognitive computing. These elements include intuitive reasoning, causal models, and the evolution of memory and knowledge; particular attention is paid to the role and basic principles of intuitive reasoning in complex problem solving, and to a cognitive learning framework for visual scene understanding based on memory and reasoning. Several typical applications of hybrid-augmented intelligence in related fields are also presented.
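As a concrete illustration of the human-in-the-loop model described above, the following Python sketch shows one minimal way human-computer collaboration can be organized: the machine decides autonomously when confident, defers to a human otherwise, and stores the human's correction for later learning. All names, the threshold, and the toy model here are hypothetical placeholders; this is only a sketch of the idea, not the survey's framework.

# Minimal human-in-the-loop sketch (illustrative; all names hypothetical).
from dataclasses import dataclass, field

@dataclass
class HumanInTheLoop:
    threshold: float = 0.8                            # defer to the human below this confidence
    corrections: list = field(default_factory=list)   # accumulated human feedback

    def predict(self, x):
        """Toy model: returns (label, confidence); a real model would go here."""
        return ("obstacle", 0.6)

    def ask_human(self, x):
        """Stand-in for a human operator or annotator."""
        return "pedestrian"

    def decide(self, x):
        label, conf = self.predict(x)
        if conf >= self.threshold:
            return label                               # machine decides autonomously
        human_label = self.ask_human(x)                # human resolves the uncertain case
        self.corrections.append((x, human_label))     # feed back into training/memory
        return human_label

loop = HumanInTheLoop()
print(loop.decide("ambiguous sensor frame"))           # -> "pedestrian" (deferred to human)

The design point is the confidence gate: the human is consulted only on uncertain cases, so human effort is spent where the machine is weakest, and each correction becomes material for improving the model.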
Funding: Supported by the National Key Program Project of China (No. 2016YFB1001004) and the National Natural Science Foundation of China (Nos. 91320301 and 61273252).
Abstract: Most state-of-the-art robotic cars' perception systems are quite different from the way a human driver understands traffic environments. First, humans assimilate information from the traffic scene mainly through visual perception, while machine perception of traffic environments must fuse information from several different kinds of sensors to meet safety-critical requirements. Second, a robotic car requires nearly 100% correct perception results for autonomous driving, whereas an experienced human driver copes well with dynamic traffic environments in which machine perception can easily produce noisy results. In this paper, we propose a vision-centered multi-sensor fusion framework for traffic environment perception in autonomous driving, which fuses camera, LIDAR, and GIS information consistently via both geometric and semantic constraints for efficient self-localization and obstacle perception. We also discuss robust machine vision algorithms that have been successfully integrated with the framework, addressing multiple levels of machine vision techniques, from collecting training data, efficiently processing sensor data, and extracting low-level features, to higher-level object and environment mapping. The proposed framework has been tested extensively in actual urban scenes with our self-developed robotic cars for eight years. The empirical results validate its robustness and efficiency.
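To make the geometric constraint concrete, the sketch below projects LIDAR points into the camera image with a rigid-body transform followed by a pinhole projection, which is the standard way camera and LIDAR measurements are brought into a common frame. The calibration values are invented for illustration and the LIDAR frame is assumed to be already axis-aligned with the camera; this is not the paper's implementation.

# Illustrative LIDAR-to-camera projection sketch (calibration values made up).
import numpy as np

K = np.array([[700.0, 0.0, 640.0],    # hypothetical camera intrinsics
              [0.0, 700.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # LIDAR-to-camera rotation (assumed aligned)
t = np.array([0.0, -0.1, 0.2])        # LIDAR-to-camera translation, meters

def project_lidar_to_image(points_lidar):
    """Project Nx3 LIDAR points to pixel coordinates; drop points behind the camera."""
    cam = points_lidar @ R.T + t                # rigid-body transform into camera frame
    cam = cam[cam[:, 2] > 0]                    # keep points in front of the camera
    pix = cam @ K.T                             # pinhole projection
    return pix[:, :2] / pix[:, 2:3]             # normalize by depth -> (u, v)

pts = np.array([[5.0, 0.0, 1.5], [10.0, -2.0, 1.5]])
print(project_lidar_to_image(pts))

Once LIDAR returns land in image coordinates, semantic constraints can be applied on top: a cluster of points projecting into a detected vehicle's bounding box supports that detection, while disagreement flags a noisy measurement.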
Funding: Supported by the National Key R&D Program of China (No. 2016YFB1001004), the National Natural Science Foundation of China (Nos. 61751308, 61603057, and 61773311), the China Postdoctoral Science Foundation (No. 2017M613152), and collaborative research with MSRA.
Abstract: Realizing autonomy has been a hot research topic for autonomous vehicles in recent years. For a long time, most efforts toward this goal have concentrated on understanding the scenes surrounding the ego-vehicle (the autonomous vehicle itself). By completing low-level vision tasks, such as detection, tracking, and segmentation of the surrounding traffic participants, e.g., pedestrians, cyclists, and vehicles, the scenes can be interpreted. However, for an autonomous vehicle, low-level vision tasks are largely insufficient for comprehensive scene understanding. What have the scene participants been doing, what are they doing now, and what will they do next? Answering this deeper question is what steers vehicles toward truly full automation, in the way human drivers reason. Motivated by this observation, this paper investigates the interpretation of traffic scenes in autonomous driving from an event-reasoning perspective. To this end, we review the most relevant literature and the state of the art on scene representation, event detection, and intention prediction in autonomous driving. In addition, we discuss the open challenges and problems in this field and endeavor to provide possible solutions.
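The toy example below illustrates the gap between low-level tracking and event reasoning: given a pedestrian's track history (a low-level vision output), it estimates velocity, extrapolates a future position, and attaches a symbolic event label answering the past, the ongoing, and the future. The thresholds, labels, and constant-velocity model are invented for illustration and stand in for the far richer intention-prediction models the survey reviews.

# Toy event-reasoning sketch on top of a track history (all values invented).
import numpy as np

def reason_about_track(track, horizon=2.0, dt=0.1):
    """track: Nx2 positions sampled every dt seconds -> (future_position, event_label)."""
    track = np.asarray(track, dtype=float)
    velocity = (track[-1] - track[0]) / (dt * (len(track) - 1))  # mean velocity over the past
    future = track[-1] + velocity * horizon                      # constant-velocity forecast
    speed = np.linalg.norm(velocity)
    if speed < 0.2:
        event = "stationary"                    # the past: barely moving
    elif abs(velocity[1]) > abs(velocity[0]):
        event = "crossing the road"             # the ongoing: mostly lateral motion
    else:
        event = "walking along the road"
    return future, event

history = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.6), (0.3, 0.9)]  # x along road, y lateral
print(reason_about_track(history))  # -> forecast position + "crossing the road"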
Funding: Supported by the National Basic Research Program (973) of China (No. 2015CB352302), the National Natural Science Foundation of China (Nos. 61625107, U1611461, U1509206, and 61402403), the Key Program of Zhejiang Province, China (No. 2015C01027), the Chinese Knowledge Center for Engineering Sciences and Technology, and the Fundamental Research Funds for the Central Universities, China.
Abstract: Question answering is an important problem that aims to deliver specific answers to questions posed by humans in natural language. How to efficiently identify the exact answer to a given question has become an active line of research. Previous approaches to factoid question answering typically focus on modeling the semantic relevance or syntactic relationship between a given question and its corresponding answer. Most of these models suffer when a question contains very little content that is indicative of the answer. In this paper, we devise an architecture named the temporality-enhanced knowledge memory network (TE-KMN) and apply the model to a factoid question answering dataset from a trivia competition called quiz bowl. Unlike most existing approaches, our model encodes not only the content of questions and answers, but also the temporal cues in a sequence of ordered sentences that gradually reveal the answer. Moreover, our model collaboratively uses external knowledge for a better understanding of a given question. The experimental results demonstrate that our method achieves better performance than several state-of-the-art methods.
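A minimal sketch of the idea behind this kind of model, under our own assumptions rather than the authors' TE-KMN architecture: encode the ordered clue sentences with a recurrent reader so that later sentences build on earlier ones (the temporal cue), then attend over an external-knowledge memory and score candidate answers from the combined representation. All dimensions and module names are invented.

# Minimal temporal-encoding + knowledge-memory sketch (not the TE-KMN implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTemporalMemoryReader(nn.Module):
    def __init__(self, dim=64, n_answers=100):
        super().__init__()
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # sequential sentence encoder
        self.scorer = nn.Linear(2 * dim, n_answers)        # question+knowledge -> answer logits

    def forward(self, sentences, memory):
        # sentences: (batch, n_sent, dim) sentence embeddings, in question order
        # memory:    (batch, n_facts, dim) external-knowledge embeddings
        _, h = self.encoder(sentences)                     # h: (1, batch, dim)
        q = h.squeeze(0)                                   # temporally aware question vector
        attn = F.softmax(memory @ q.unsqueeze(-1), dim=1)  # attention over knowledge facts
        k = (attn * memory).sum(dim=1)                     # retrieved knowledge summary
        return self.scorer(torch.cat([q, k], dim=-1))      # answer scores

model = TinyTemporalMemoryReader()
logits = model(torch.randn(2, 5, 64), torch.randn(2, 20, 64))
print(logits.shape)  # torch.Size([2, 100])

The recurrent reader is what makes the representation temporality-aware: scoring after each additional sentence lets the model commit to an answer earlier as the clues become more specific, which is exactly the incremental structure of quiz bowl questions.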