With the development of big data, all sectors of society have begun to adopt it to serve their own enterprises and departments, and university digital libraries are no exception. The most cumbersome task in managing a university library is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords, then calculates semantic similarity following the keyword calculation process for literature retrieval. A fast-matching method determines the weight of each keyword to ensure efficient and accurate document retrieval in digital libraries, completing the design of a document retrieval method for university digital libraries based on Hadoop technology.
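The abstract does not spell out its keyword weighting or similarity formulas, so the following is only a minimal sketch of the general idea: weight keywords, then rank documents by similarity to the query. It assumes plain relative term frequency as the weight and lexical cosine similarity as the score, which is a simpler stand-in for the semantic similarity and fast-matching steps the article describes. It runs on a single machine rather than on Hadoop, and the function names are hypothetical.

```python
import math
from collections import Counter


def keyword_weights(tokens):
    """Weight each keyword by its relative frequency (an assumed weighting)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Return document ids ordered from most to least similar to the query."""
    query_vec = keyword_weights(query.lower().split())
    scored = [
        (cosine_similarity(query_vec, keyword_weights(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

In a Hadoop setting, the per-document weighting would naturally map to the map phase and the ranking to the reduce phase, but that split is an assumption here, not something the abstract states.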
Text extraction is an important initial step in digitizing historical documents. In this paper, we present a text extraction method for historical Tibetan document images based on block projections. Text extraction is treated as a text area detection and localization problem. The images are divided equally into blocks, and the blocks are filtered according to the categories of their connected components and their corner point density. By analyzing the projections of the filtered blocks, the approximate text areas are located and the text regions are extracted. Experiments on a dataset of historical Tibetan documents demonstrate the effectiveness of the proposed method.
Funding: Supported by the Innovation Platform Construction of Qinghai Province (No. 2016-ZJ-Y04) and the Basic Research Program of Qinghai Province (No. 2016-ZJ-740).
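The block-projection idea in the Tibetan text extraction abstract can be illustrated with a small sketch. It assumes a binarized image given as a list of rows (1 = ink, 0 = background) and filters blocks by ink density only. That is a deliberate simplification: the paper filters by connected-component categories and corner point density, neither of which is reproduced here. The function names and the `min_density` threshold are hypothetical.

```python
def block_ink_density(image, block_size):
    """Split a binary image into square blocks and return each block's ink ratio."""
    rows, cols = len(image), len(image[0])
    grid = []
    for top in range(0, rows, block_size):
        grid_row = []
        for left in range(0, cols, block_size):
            block = [row[left:left + block_size] for row in image[top:top + block_size]]
            area = sum(len(r) for r in block)
            grid_row.append(sum(map(sum, block)) / area)
        grid.append(grid_row)
    return grid


def text_bands(grid, min_density):
    """Project the kept blocks onto the vertical axis and return runs of
    consecutive block rows that contain at least one kept block."""
    profile = [sum(1 for d in row if d >= min_density) for row in grid]
    bands, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            bands.append((start, i - 1))
            start = None
    if start is not None:
        bands.append((start, len(profile) - 1))
    return bands
```

Each returned pair is an inclusive range of block-row indices; multiplying by `block_size` gives approximate pixel bounds of a candidate text area. The same projection applied along the horizontal axis would bound the area on the other side.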