Abstract
Taking the compilation of book catalog data from the Republic of China period as an example, this paper discusses how character frequency statistics can be applied when cleaning up the text of a bibliographic database. First, a single-character index table, keyed on Chinese character glyph forms, is created inside the database for the catalog fields. Second, the frequency distribution of the characters actually used in the bibliographic text is calculated. Third, on that basis, variant character forms are merged and collated. Finally, the glyph forms in the bibliographic text are unified through the index linkage. Database-supported character frequency statistics can serve as an effective method for cleaning up bibliographic data text.
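The workflow summarized in the abstract (single-character index, frequency count, variant merging, unification through the index) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an in-memory SQLite database, and the table names (`book`, `char_index`), the sample titles, and the one-entry variant mapping are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample titles (made up for illustration, not the paper's data).
SAMPLE_TITLES = [
    (1, "社會羣學"),
    (2, "群學肄言"),
    (3, "社會學"),
]

# Illustrative variant-to-standard mapping; a real mapping would be
# compiled by hand after inspecting the frequency table.
VARIANT_MAP = {"羣": "群"}


def build_char_index(conn):
    """Store the titles and create one index row per character occurrence."""
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE char_index (book_id INTEGER, pos INTEGER, ch TEXT)")
    conn.executemany("INSERT INTO book VALUES (?, ?)", SAMPLE_TITLES)
    rows = [
        (book_id, pos, ch)
        for book_id, title in SAMPLE_TITLES
        for pos, ch in enumerate(title)
    ]
    conn.executemany("INSERT INTO char_index VALUES (?, ?, ?)", rows)


def char_frequencies(conn):
    """Return a dict mapping each character to its number of occurrences."""
    return dict(conn.execute("SELECT ch, COUNT(*) FROM char_index GROUP BY ch"))


def unify_variants(conn):
    """Replace variants in the index, then rebuild every title from the index."""
    for variant, standard in VARIANT_MAP.items():
        conn.execute("UPDATE char_index SET ch = ? WHERE ch = ?", (standard, variant))
    book_ids = [row[0] for row in conn.execute("SELECT DISTINCT book_id FROM char_index")]
    for book_id in book_ids:
        chars = conn.execute(
            "SELECT ch FROM char_index WHERE book_id = ? ORDER BY pos", (book_id,)
        )
        title = "".join(row[0] for row in chars)
        conn.execute("UPDATE book SET title = ? WHERE id = ?", (title, book_id))
```

In use, one would call `build_char_index`, inspect the output of `char_frequencies` to spot low-frequency variant forms, fill in `VARIANT_MAP` accordingly, and then call `unify_variants`. Keeping the position of each character in the index is what allows the corrected text to be written back to the original records.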
Source
《中国索引》
2016, No. 1, pp. 88-99 (12 pages)
Journal of the China Society of Indexers
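Funding
This paper is a research outcome of the Ministry of Education Humanities and Social Sciences Research General Planning Project "民国时期图书目录资料库" (Database of Book Catalogs of the Republic of China Period), Project No. 10YJA870012.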
Keywords
Republic of China-era documents (民国文献); Bibliographic Data of Books of the Republic of China
Bibliographic data (书目数据)
Chinese character frequency statistics (字频统计)