
Large-Scale People Social Network Extraction and Analysis Based on Online Encyclopedia (基于在线百科的大规模人物社会网络抽取与分析)
Cited by: 6
Abstract: Online encyclopedia articles contain a vast amount of information about relationships between persons, from which large-scale social networks can be extracted to provide data support for research in digital humanities and social computing. Taking Baidu Encyclopedia (百度百科) as an example, this study makes a first exploration of large-scale social network extraction from a Chinese online encyclopedia and proposes a new method for extracting people social networks. The method uses learning to rank to combine multiple features when computing the weight of relationships between persons, and discovers time-space coupling relationships between persons by estimating each person's living time-space. In this way, a weighted cross-time-space people social network and a time-space coupling people network are extracted from Baidu Encyclopedia. Both networks show good small-world and scale-free properties and have a clear community structure. Finally, visual analysis demonstrates the application patterns and value of encyclopedia people networks in digital humanities research.

Social Network Extraction (SNE) is an emerging research field that focuses on the automatic extraction of hidden social networks from a wide variety of information sources. The articles of an online encyclopedia contain massive information about persons and their interpersonal relationships, from which a people social network can be extracted and used for research in digital humanities and social computing. The extracted people social network involves both real persons, who may span thousands of years, and virtual persons, who may come from a large number of literary works. However, most people social network extraction methods ignore the types and spatio-temporal characteristics of persons, and consider only text similarity or other related features to measure the degree of relevance between persons. This may restrict both the accuracy and the application fields of the extracted people social networks.

Taking Baidu Encyclopedia as an example, this study explored, for the first time, the automatic extraction of a large-scale people social network from a Chinese online encyclopedia. It proposed a new method of social network extraction, which distinguishes the types and spatio-temporal characteristics of extracted persons and measures the weight of interpersonal relationships more accurately based on multiple relevance features.

The method contains three phases: generating an initial people social network, computing the relationship strength between persons, and analyzing the spatio-temporal characteristics of persons. In the first phase, the articles on persons (hereinafter referred to as "person articles") were identified from Baidu Encyclopedia, and an initial undirected and unweighted people social network containing more than 0.54 million nodes and 2.22 million edges was generated based on the links between person articles. In the second phase, the calculation of relationship strength between persons in the initial network was treated as a ranking task. It was solved with a supervised learning to rank (L2R) method that combines five similarity features to measure the degree of relevance between persons. On this basis, the initial unweighted people network was transformed into a weighted network whose person nodes span time and space. In the third phase, the living time-space of each person in the people network was estimated. For a real person, the living time-space was estimated from the years (including reign titles) occurring in the article on that person, whereas for a virtual person, the living time-space was the one or more works depicting that person. In this way, a time-space coupling network, which contains about 0.45 million nodes and 1.70 million edges, was derived from the cross-time-space weighted people network.

The characteristics of the two extracted people social networks were investigated with social network analysis. The results showed that both networks are small-world and scale-free networks and have a clear community structure. Furthermore, three types of visual analysis were performed on the two people networks: point analysis was used to detect the persons related to a central person; chain analysis was used to discover the path between two persons (i.e., their direct or indirect relationships); and network analysis was used to reveal highly central persons and person communities in a specific historical or documental time-space. This indicates that large-scale social networks extracted from online encyclopedias have great value for supporting digital humanities research and for improving researchers' understanding of historical figures in reality and of virtual characters in literary and artistic works.
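The derivation of a time-space coupling network and the chain analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The persons, year ranges, and links are invented toy data, the function names are hypothetical, and the rule "keep an edge only when the two estimated year ranges overlap" is an assumed simplification that covers real persons only (the abstract states that virtual persons are placed by the works depicting them, which is not modeled here). Chain analysis is approximated by a breadth-first search for a shortest path.

```python
from collections import deque

# Hypothetical toy data: estimated (earliest_year, latest_year) per real person.
lifespans = {
    "A": (1000, 1060),
    "B": (1040, 1100),
    "C": (1500, 1570),
    "D": (1090, 1150),
}
# Hypothetical links between person articles, treated as undirected edges.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]


def build_network(edges):
    """Build an undirected, unweighted adjacency map from article links."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def overlaps(a, b):
    """Return True if two closed year intervals share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def coupled_network(edges, spans):
    """Keep only edges whose endpoints have overlapping year ranges
    (an assumed stand-in for the paper's time-space coupling test)."""
    kept = [(u, v) for u, v in edges if overlaps(spans[u], spans[v])]
    return build_network(kept)


def shortest_chain(graph, source, target):
    """Breadth-first search for one shortest path; None if unreachable."""
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        # Sort neighbours so the result is deterministic across runs.
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


full = build_network(links)
coupled = coupled_network(links, lifespans)
print(shortest_chain(coupled, "A", "D"))
```

In this toy data, person C lived centuries after the others, so both of C's edges are dropped and C no longer appears in the coupled network, while the chain from A to D through B survives.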
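Authors: LIN Zefei (林泽斐), OU Shiyan (欧石燕)
Source: Journal of Library Science in China (《中国图书馆学报》), indexed in CSSCI and the Peking University Core Journals list (北大核心), 2019, Issue 6, pp. 100-118 (19 pages)
Funding: One of the research outcomes of the National Social Science Fund of China key project "Research on the Semantic Publishing of Scholarly Literature Content Based on Linked Data and Its Applications" (No. 17ATQ001).
Keywords: Social network extraction; Social network analysis; People social network; Online encyclopedia; Digital humanities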