摘要
Data science is a rapidly growing academic field with significant implications for all conventional scientific studies. However, most relevant studies have been limited to one or several facets of data science from a specific application domain perspective and less to discuss its theoretical framework. Data science is unique in that its research goals, perspectives, and body of knowledge are distinct from other sciences. The core theories of data science are the DIKW pyramid, data-intensive scientific discovery, data science life cycle, data wrangling or munging,big data analytics, data management, and governance, data products Dev Ops, and big data visualization. Six main trends characterize the recent theoretical studies on data science are:(1)the growing significance of Data Ops,(2) the rise of citizen data scientists,(3) enabling augmented data science,(4) integrating data warehouse with data lake,(5) diversity of domain-specific data science, and(6) implementing data stories as data products. Further development of data science should prioritize four ways to turn challenges into opportunities:(1) accelerating theoretical studies of data science,(2) the trade-off between explainability and performance,(3) achieving data ethics, privacy and trust, and(4) aligning academic curricula with industrial needs.