Funding: This work was supported by the National Key Research and Development Program of China (No. 2020AAA0106501), the National Natural Science Foundation of China (No. 62002029), and the Beijing Academy of Artificial Intelligence (BAAI).
Abstract: Computational Social Science (CSS), which aims to use computational methods to address social science problems, is a recently emerging and fast-developing field. The study of CSS is data-driven and benefits significantly from the availability of online user-generated content and social networks, which contain rich text and network data for investigation. However, these large-scale and multi-modal data also present researchers with a great challenge: how can data be represented effectively to mine the meanings we want in CSS? To explore the answer, we give a thorough review of data representations in CSS for both text and networks. Specifically, we summarize existing representations into two schemes, namely symbol-based and embedding-based representations, and introduce a series of typical methods for each scheme. Afterwards, we present the applications of these representations based on an investigation of more than 400 research articles from 6 top venues involved with CSS. From the statistics of these applications, we identify the strengths of each kind of representation and find that embedding-based representations have emerged and attracted increasing attention over the last decade. Finally, we discuss several key challenges and open issues for future directions. This survey aims to provide CSS researchers with a deeper understanding of data representations and better-informed ways to apply them.