Artificial intelligence (AI) relies on data and algorithms. State-of-the-art (SOTA) AI algorithms have been developed to improve the performance of AI-oriented structures. However, model-centric approaches are limited by the absence of high-quality data. Data-centric AI is an emerging approach for solving machine learning (ML) problems. It is a collection of data manipulation techniques that allow ML practitioners to systematically improve the quality of the data used in an ML pipeline. However, data-centric AI approaches are not well documented, and researchers have conducted various experiments without a clear set of guidelines. This survey highlights six major data-centric AI aspects that researchers already use, intentionally or unintentionally, to improve the quality of AI systems: big data quality assessment, data preprocessing, transfer learning, semi-supervised learning, machine learning operations (MLOps), and the effect of adding more data. In addition, it highlights recent data-centric techniques adopted by ML practitioners. We address how adding data might harm datasets and how HoloClean can be used to restore and clean them. Finally, we discuss the causes of technical debt in AI. Technical debt builds up when software design and implementation decisions run into “or outright collide with” business goals and timelines. This survey lays the groundwork for future data-centric AI discussions by summarizing various data-centric approaches.
The purpose of this paper (presented online as a keynote lecture at the 25th Annual Indonesian Geotechnical Conference on 10 Nov 2021) is to broadly conceptualize the agenda for data-centric geotechnics, an emerging field that attempts to prepare geotechnical engineering for digital transformation. The agenda must include (1) development of methods that make sense of all real-world data (not selective input data for a physical model), (2) offering insights of significant value to critical real-world decisions for current or future practice (not decisions for an ideal world or decisions of minor concern to geotechnical engineers), and (3) sensitivity to the physical context of geotechnics (not abstract data-driven analysis connected to geotechnics in a peripheral way, i.e., engagement with the knowledge and experience base should be substantial). These three elements are termed “data centricity”, “fit for (and transform) practice”, and “geotechnical context” in the agenda. Given that knowledge of the site is central to any geotechnical engineering project, data-driven site characterization (DDSC) must constitute one key application domain in data-centric geotechnics, although other infrastructure lifecycle phases such as project conceptualization, design, construction, operation, and decommission/reuse would benefit from data-informed decision support as well. One part of DDSC that addresses numerical soil data in a site investigation report and soil property databases is pursued under Project DeepGeo. In principle, the source of data can also go beyond site investigation, and the type of data can go beyond numbers to include categorical data, text, audio, images, videos, and expert opinion. The purpose of Project DeepGeo is to produce a 3D stratigraphic map of the subsurface volume below a full-scale project site and to estimate relevant engineering properties at each spatial point based on actual site investigation data and other relevant Big Indirect Data (BID). Uncertainty quantification is necessary, as current real-world data is insufficient, incomplete, and/or not directly relevant to construct a deterministic map. The value of a deterministic map for decision support is debatable. The computational cost to do this for a 3D true-scale subsurface volume must be reasonable. Ultimately, geotechnical structures need to be part of a completely smart infrastructure that fits the circular economy and need to focus on delivering service to end-users and the community from project conceptualization to decommission/reuse, with full integration into the smart city and smart society. Although current geotechnical practice has been very successful in taking “calculated risk” informed by limited data, imperfect theories, prototype testing, and observations, among others, and in exercising judicious caution and engineering judgment, there is no clear pathway forward to leverage big data and digital technologies such as machine learning, BIM, and digital twins to meet more challenging needs such as sustainability and resilience engineering.
A significant obstacle in intelligent transportation systems (ITS) is the capacity to predict traffic flow. Recent advancements in deep neural networks have enabled the development of models to represent traffic flow accurately. However, accurately predicting traffic flow at the individual road level is extremely difficult due to the complex interplay of spatial and temporal factors. This paper proposes a technique for predicting short-term traffic flow data using an architecture that utilizes convolutional bidirectional long short-term memory (Conv-BiLSTM) with attention mechanisms. Prior studies neglected to include data pertaining to factors such as holidays, weather conditions, and vehicle types, which are interconnected and significantly impact the accuracy of forecast outcomes. In addition, this research incorporates recurring monthly periodic pattern data that significantly enhances the accuracy of forecast outcomes. The experimental findings demonstrate a performance improvement of 21.68% when incorporating the vehicle type feature.
Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. The four new missions of research data infrastructures are: (1) as a pioneer, to transcend disciplinary borders and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster high-end academic communication; (4) as a coordinator, to balance scientific openness with ethical needs.
This paper conducts a comprehensive review of existing research on Privacy by Design (PbD) and behavioral economics, explores their intersection, and examines how designers can leverage “nudges” to encourage users towards privacy-friendly choices. We analyze the limitations of rational choice in the context of privacy decision-making and identify key opportunities for integrating behavioral economics into PbD. We propose a user-centered design framework for integrating behavioral economics into PbD, which includes strategies for simplifying complex choices, making privacy visible, providing feedback and control, and testing and iterating. Our analysis highlights the need for a more nuanced understanding of user behavior and decision-making in the context of privacy, and demonstrates the potential of behavioral economics to inform the design of more effective PbD solutions.
Funding: the National Social Science Fund of China (Grant No. 22CTQ031) and the Special Project on Library Capacity Building of the Chinese Academy of Sciences (Grant No. E2290431).
Funding: Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 60533110; the National Natural Science Foundation of China under Grant No. 60473075; the National Grand Fundamental Research 973 Program of China under Grant No. 2006CB303000; the Program for New Century Excellent Talents in University of China under Grant No. NCET-05-0333; the Key Program of the Natural Science Foundation of Heilongjiang Province of China under Grant No. ZJG03-05; and the Heilongjiang Province Scientific and Technological Special Fund for Young Scholars of China under Grant No. QC06C033.