Abstract
Gender bias is a prominent topic in sociological research. In recent years, machine learning algorithms have been shown to learn bias from data, which has drawn wider attention to the issue, yet no corpus-based study has so far examined occupational gender bias in text data. Based on markedness theory, this paper uses the BCC and DCC corpora to examine unconscious gender bias toward 63 occupations from both synchronic and diachronic perspectives. First, a questionnaire surveys the gender tendency that respondents of different genders and age groups associate with each of the 63 occupations; the results show a significant positive correlation with the degree of occupational gender bias measured across multiple domains of the BCC corpus. Second, from a synchronic perspective, using texts from different domains of the BCC corpus and 2018 newspaper texts from 31 provincial-level administrative units of China (excluding Hong Kong, Macao, and Taiwan) in the DCC corpus, the paper finds that for most occupations gender bias against women rises gradually from spoken to written registers, and that occupational gender bias differs across regions. Finally, from a diachronic perspective, a statistical analysis of DCC newspaper texts from 2005 to 2018 shows that unconscious occupational gender bias weakens overall over time.
Authors
朱述承
苏祺
刘鹏远
ZHU Shucheng; SU Qi; LIU Pengyuan (College of Information Science, Beijing Language and Culture University, Beijing 100083, China; School of Foreign Languages, Peking University, Beijing 100871, China; Key Laboratory of Computational Linguistics (Peking University), Ministry of Education, Beijing 100871, China; National Language Resources Monitoring and Research Center Print Media Language Branch, Beijing Language and Culture University, Beijing 100083, China)
Source
《中文信息学报》
CSCD
Peking University Core Journals (北大核心)
2021, No. 5, pp. 130-140 (11 pages)
Journal of Chinese Information Processing
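Funding
Humanities and Social Sciences Research Planning Fund of the Ministry of Education (18YJA740030)
Beijing Language and Culture University school-level project (Fundamental Research Funds for the Central Universities) (19YJ040003)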
Keywords
corpus
occupation
gender
unconscious bias
markedness theory