Abstract
Digitizing ancient books does not remove the burden of preserving the texts; the important change it brings lies in how the books are used. Full-text search is one of the most valued techniques in the digitization of ancient books, but simple string matching produces too much "noise", so the texts need to be annotated and indexed. The extensibility and ease of interchange of XML make it the markup language of choice. This paper mainly discusses how to store and retrieve these XML-annotated texts in a computer, that is, how to build an XML database. A basic approach is to add an XML mapping layer on top of the currently dominant relational database systems (RDBMS), so that they can meet the storage and query needs of XML data. Introducing XML techniques into a relational database improves its flexibility, while the mature management mechanisms of the relational database in turn strengthen the XML database.
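As a rough illustration of the mapping-layer idea, the following minimal sketch shreds an XML-annotated passage into a single relational "edge" table and then compares a plain substring search with a tag-aware query. Everything here is an assumption made for illustration: SQLite stands in for the relational database, the table name `node`, the functions `shred`, `substring_search` and `find_passages`, the element names `person` and `place`, and the sample sentences are all hypothetical, and the edge-table mapping is only one common approach, not necessarily the one used in the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical annotated fragment, invented for illustration only.
SAMPLE = (
    "<text>"
    "<p><person>长安君</person>为质于齐。</p>"
    "<p>遂都<place>长安</place>。</p>"
    "</text>"
)


def shred(conn, xml_string):
    """Map an XML document onto one relational 'edge' table (one row per element)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, content TEXT)"
    )

    def walk(elem, parent_id):
        # Store the element's full text, including the text of nested elements.
        cur = conn.execute(
            "INSERT INTO node (parent_id, tag, content) VALUES (?, ?, ?)",
            (parent_id, elem.tag, "".join(elem.itertext())),
        )
        for child in elem:
            walk(child, cur.lastrowid)

    walk(ET.fromstring(xml_string), None)
    conn.commit()


def substring_search(conn, needle):
    """Plain string matching over paragraphs: ignores the markup entirely."""
    rows = conn.execute(
        "SELECT content FROM node WHERE tag = 'p' AND instr(content, ?) > 0 "
        "ORDER BY id",
        (needle,),
    )
    return [row[0] for row in rows]


def find_passages(conn, tag, value):
    """Tag-aware query: parent passages containing an element <tag> equal to value."""
    rows = conn.execute(
        "SELECT p.content FROM node AS c JOIN node AS p ON c.parent_id = p.id "
        "WHERE c.tag = ? AND c.content = ? ORDER BY p.id",
        (tag, value),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
shred(conn, SAMPLE)
noisy_hits = substring_search(conn, "长安")        # matches both paragraphs
place_hits = find_passages(conn, "place", "长安")  # matches only the place name
```

In this sketch the substring search returns both paragraphs, because the string 长安 also occurs inside the personal name 长安君, whereas the tag-aware query returns only the paragraph in which 长安 is marked as a place. The relational engine still handles storage, transactions and query execution; only the mapping between elements and rows is added on top.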
Source
《农业图书情报学刊》
2010, No. 2, pp. 92-96 (5 pages)
Journal of Library and Information Sciences in Agriculture