This paper briefly introduces the main ideas of a sustainably developed OCR system based on open architecture techniques, and then describes the construction of an optical character recognition (OCR) center built on computer clusters. The center's purpose is to dynamically improve the recognition precision of the digitized texts of a million volumes of books produced by the China-US Million Books Digital Library (CADAL) Project. The experience of this center should provide a useful reference for other digital library projects.
The Bibliotheca Alexandrina (BA) has been developing and putting to use a workflow for turning printed books into digital books as its contribution to the building of a Universal Digital Library. This workflow is a process consisting of multiple phases, namely scanning, image processing, OCR, digital archiving, document encoding, and publishing. Over the past couple of years, the BA has defined procedures and special techniques for scanning, processing, OCR, and publishing, especially of Arabic books. This workflow has been automated, allowing the governance of the different phases and making possible the production of 18,000 books so far. The BA has also designed and implemented a framework for the encoding of digital books that allows publishing, as well as a software system for managing the creation, maintenance, and publishing of the overall digital repository.
Google’s announcement that it intended to digitize all the books in several major research libraries was met with mixed reactions. John Wilkin at the University of Michigan declared, “This is the day the world changes,” while Rory Litwin said in Library Juice that the move would “commercialize the great research libraries with a handshake, suddenly and epochally.” The four directors of the Universal Library and Million Book Project have received many questions about the comparative aspects of our work and Google Print. My purpose is to compare the two, talking about their genesis, the realities of collections and logistics, and the worries that arise from these realities.
Funding: Project supported by the China-US Million Books Digital Library Project