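Abstract
Large-scale scientific facilities and major scientific experiments have brought scientific discovery into the data-intensive fourth paradigm, and it is imperative to harness rapidly developing artificial intelligence (AI) techniques to promote intelligent scientific discovery. Machine learning, an important AI technique, has been widely applied across scientific fields. However, existing work only studies machine learning methods for specific tasks and has not abstracted a general research framework for intelligent scientific discovery. This paper first summarizes the machine learning methods commonly used in scientific discovery tasks and classifies scientific tasks into five machine learning problems. Second, it proposes a machine-learning-based research framework for intelligent scientific discovery which, as a typical example of “AI for Science”, describes an efficient mode of intelligent scientific discovery. Third, taking the discovery of transient events in time-domain astronomy as an example task, it shows experimentally that machine learning algorithms serve intelligent scientific discovery better only when properly combined with domain knowledge, which verifies the effectiveness of the framework. Finally, it concludes with a summary and outlook, in the hope of providing a reference for intelligent scientific discovery in other fields.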
Probing valuable scientific phenomena is essential for revealing the laws of the universe and verifying proposed scientific hypotheses. Because such phenomena are rare, researchers build large-scale scientific facilities or carry out large-scale scientific experiments to collect large volumes of data for analysis, an approach known as data-intensive scientific discovery. In this paradigm, relying solely on the expertise of scientists is no longer feasible, and scientific discovery needs more efficient methods. As a result, machine learning, a key artificial intelligence (AI) technique, plays an increasingly important role. In other words, “AI for Science” is booming. Scientific big data and scientific discovery tasks differ from general big data and tasks on the Internet. For example, scientific big data has a longer lifecycle, carries greater uncertainty, and is hard to acquire repeatedly. Scientific discovery tasks must be not only innovative but also rigorous. Because of these characteristics, many difficult problems recur when different machine learning methods are applied to scientific discovery. However, existing work focuses on specific machine learning algorithms for specific scientific discovery tasks, rather than giving a general research framework for AI-driven scientific discovery that addresses these common problems. In this paper, we first summarize the latest developments in intelligent scientific discovery in six scientific fields in which machine learning has been widely used. On the one hand, we analyze the methods frequently used in scientific discovery tasks from two perspectives, machine learning and deep learning. On the other hand, we classify scientific discovery tasks into five kinds of machine learning problems from two perspectives, basic science and applied science. Secondly, we propose a general research framework for intelligent scientific discovery as an example of “AI for Science”. It describes an efficient mode of applying machine learning to scientific discovery and helps scientists understand how to use machine learning efficiently in scientific tasks. Corresponding to the scientific discovery pipeline, the framework is composed of six components, each of which addresses several challenges that arise when scientific discovery meets machine learning. The six components are scientific data integration and sharing, scientific discovery task transformation, scientific data pre-processing, scientific discovery method, scientific discovery verification, and domain knowledge constraints. Thirdly, we verify the framework through a series of experiments. We choose time-domain astronomy as a typical scientific field of “Big Data + AI”. In this field, we aim to discover a kind of transient event called a stellar flare. To compare different discovery methods, we use seven machine learning methods and a classical method from time-domain astronomy. One of the most important conclusions is that machine learning is not omnipotent: only when combined with domain knowledge does it reach its full potential. Lastly, we summarize three challenges that remain to be solved and three lessons learned. Machine learning has both advantages and disadvantages for scientific discovery. Scientists should invest more effort in science-oriented machine learning, rather than only developing machine learning applications for scientific discovery.
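The point that machine learning benefits from domain knowledge can be illustrated with a small sketch. It is not the method or the data used in the paper: the light curve is synthetic, and the function names, thresholds, and the two constraints (a flare is a brightening, and it lasts several consecutive samples) are assumptions chosen purely for illustration. A generic robust outlier test flags instrumental glitches together with the flare, while the same test with the two constraints keeps only the flare.

```python
import math
import statistics

def synthetic_light_curve(n=200, flare_start=120):
    """Toy light curve (hypothetical data): small wiggle, one flare, two glitches."""
    # Deterministic small-amplitude wiggle standing in for photometric noise.
    flux = [1.0 + 0.01 * math.sin(12.9898 * i) for i in range(n)]
    for k in range(15):
        # Flare: sudden rise followed by an exponential decay.
        flux[flare_start + k] += 0.2 * math.exp(-k / 4.0)
    flux[40] += 0.3  # isolated one-sample spike (instrumental glitch)
    flux[80] -= 0.3  # isolated one-sample dip
    return flux

def robust_scale(flux):
    """Median and MAD-based estimate of the scatter."""
    med = statistics.median(flux)
    mad = statistics.median([abs(f - med) for f in flux])
    return med, 1.4826 * mad

def generic_outliers(flux, n_sigma=3.0):
    """Domain-agnostic test: any sample far from the median, in either direction."""
    med, sigma = robust_scale(flux)
    return [i for i, f in enumerate(flux) if abs(f - med) > n_sigma * sigma]

def flare_candidates(flux, n_sigma=3.0, min_run=3):
    """Same test plus assumed domain constraints: brightening only,
    sustained for at least min_run consecutive samples."""
    med, sigma = robust_scale(flux)
    runs, start = [], None
    for i, f in enumerate(flux + [med]):  # sentinel value closes a trailing run
        if f - med > n_sigma * sigma:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs

flux = synthetic_light_curve()
generic = generic_outliers(flux)   # contains the glitches at 40 and 80 as well
runs = flare_candidates(flux)      # list of (first_index, last_index) runs
print("generic outliers:", generic)
print("flare candidates:", runs)
```

In this toy setting the constrained detector discards the one-sample spike and the dip that the generic detector reports, which is the kind of effect a domain knowledge constraint is meant to have. Real flare searches would need detrending and validated thresholds.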
Authors
MENG Xiao-Feng (孟小峰)
HAO Xin-Li (郝新丽)
MA Chao-Hong (马超红)
YANG Chen (杨晨)
MAOLINIYAZI Ai-Shan (艾山·毛力尼亚孜)
WU Chao (吴潮)
WEI Jian-Yan (魏建彦)
Affiliations: School of Information, Renmin University of China, Beijing 100872; China National Clearing Center, Beijing 100048; Department of Computer Science and Technology, Tsinghua University, Beijing 100084; National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101
Source
Chinese Journal of Computers (《计算机学报》)
EI
CAS
CSCD
Peking University Core Journals (北大核心)
2023, No. 5, pp. 877-895 (19 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 62172423, 91846204, U1931133).
Keywords
scientific discovery
machine learning
scientific big data
transient event discovery
intelligent scientific discovery