17 articles found
Semantic Document Layout Analysis of Handwritten Manuscripts
1
Author: Emad Sami Jaha. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 2805-2831 (27 pages)
A document layout can be more informative than merely a document’s visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Keywords: semantic characteristics; semantic labeling; document layout analysis; semantic document layout analysis; handwritten manuscripts; clustering; retrieval; image processing; computer vision; machine learning
Analysis of Adverse Reactions in the Treatment of COVID-19 with Three Chinese Patent Medicines and Three Herbal Formulas
2
Authors: Li Qiao, Wang Aili, Wu Di, Chen Yuwen. Asian Journal of Social Pharmacy, 2023, Issue 1, pp. 8-16 (9 pages)
Objective: To explore the rules and characteristics of the adverse drug reactions (ADRs) of three Chinese patent medicines and three herbal formulas for the treatment of COVID-19, and to provide a reference for clinical safe medication. Methods: The cases and ADR reports of the three Chinese patent medicines and three herbal formulas in the PubMed, Web of Science, Springer Link, CNKI, Wanfang and VIP databases were retrieved from December 2019 to May 2021. Then we extracted and analyzed the effective information included in the literature. Results and Conclusion: According to the pre-developed retrieval plan, a total of 136 documents were obtained, and a total of 6 documents finally met the inclusion criteria. A total of 553 patients used the three Chinese patent medicines and three herbal formulas, and there were 133 cases of adverse reactions. The adverse reactions of patients taking the three Chinese patent medicines and three herbal formulas can all be explained under the theory of traditional Chinese medicine, and the adverse reactions can be eliminated by adding or subtracting the flavor of the medicine or stopping the medicine.
Keywords: three Chinese patent medicines and three herbal formulas; adverse drug reaction; document analysis
Pre-training transformer with dual-branch context content module for table detection in document images
3
Authors: Yongzhi LI, Pengle ZHANG, Meng SUN, Jin HUANG, Ruhan HE. Virtual Reality & Intelligent Hardware (《虚拟现实与智能硬件(中英文)》) (EI), 2024, Issue 5, pp. 408-420 (13 pages)
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/YongZ-Lee/TD-DCCAM
Keywords: table detection; document image analysis; transformer; dilated convolution; deformable convolution; feature fusion
Radon CLF: A Novel Approach for Skew Detection Using Radon Transform
4
Authors: Yuhang Chen, Mahdi Bahaghighat, Aghil Esmaeili Kelishomi, Jingyi Du. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 10, pp. 675-697 (23 pages)
In the digital world, a wide range of handwritten and printed documents should be converted to digital format using a variety of tools, including mobile phones and scanners. Unfortunately, this is not an optimal procedure, and the entire document image might be degraded. Imperfect conversion effects due to noise, motion blur, and skew distortion can significantly affect the accuracy and effectiveness of document image segmentation and analysis in Optical Character Recognition (OCR) systems. In Document Image Analysis Systems (DIAS), skew estimation of images is a crucial step. In this paper, a novel, fast, and reliable skew detection algorithm based on the Radon Transform and Curve Length Fitness Function (CLF), the so-called Radon CLF, is proposed. The Radon CLF model aims to take advantage of the properties of Radon spaces. The Radon CLF explores the dominating angle more effectively for a 1D signal than it does for a 2D input image due to an innovative fitness function formulation for a projected signal of the Radon space. Several significant performance indicators, including Mean Square Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Measure (SSIM), Accuracy, and run-time, were taken into consideration when assessing the performance of our model. In addition, a new dataset named DSI5000 was constructed to assess the accuracy of the CLF model. Both the two-dimensional image signal and the Radon space have been used in our simulations to compare the noise effect. The obtained results show that the proposed method is more effective than other approaches already in use, with an accuracy of roughly 99.87% and a run-time of 0.048 s. The introduced model is far more accurate and time-efficient than current approaches in detecting image skew.
Keywords: document image analysis; skew detection; Radon transform; pattern recognition
Hash-Indexing Block-Based Deduplication Algorithm for Reducing Storage in the Cloud
5
Authors: D. Viji, S. Revathy. Computer Systems Science & Engineering (SCIE, EI), 2023, Issue 7, pp. 27-42 (16 pages)
Cloud storage is essential for managing user data that is stored in and retrieved from distributed data centres. The storage service is offered as a paid service, charged according to the size of the data stored. Because the massive amount of data stored in the data centre contains similar information and file structures kept in multiple copies, duplication increases storage consumption. The potential deduplication system does not reduce data efficiently because it is inaccurate in finding similar data. This complicates the system and increases storage consumption and cost. To resolve this problem, this paper proposes an efficient storage reduction method called Hash-Indexing Block-based Deduplication (HIBD), based on Segmented Bind Linkage (SBL) methods, for reducing storage in a cloud environment. Initially, preprocessing is done using the sparse augmentation technique. Further, the preprocessed files are segmented into blocks to build the hash index. The content of each block is compared with other files through Semantic Content Source Deduplication (SCSD), which identifies similar content between files. Based on the content presence count, the Distance Vector Weightage Correlation (DVWC) estimates the document similarity weight, and related files are grouped into a cluster. Finally, the segmented bind linkage compares the documents to find duplicate content in the cluster using the similarity weight based on the coefficient match case. This implementation helps identify data redundancy efficiently and reduces the service cost in distributed cloud storage.
Keywords: cloud computing; deduplication; hash indexing; relational content analysis; document clustering; cloud storage; record linkage
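Editorial illustration: the abstract above mentions segmenting files into blocks and building a hash index. The sketch below shows only that generic idea (fixed-size blocks, one stored copy per distinct SHA-256 digest). It is not the HIBD, SCSD, DVWC, or SBL procedure from the paper; the function names `deduplicate` and `restore` and the block size are assumptions made for this example.

```python
import hashlib


def deduplicate(files, block_size=4):
    """Split each file into fixed-size blocks and keep each distinct block once.

    `files` maps a file name to its bytes. Returns (index, recipes):
    `index` maps a SHA-256 hex digest to the block bytes, and `recipes`
    maps each file name to the ordered digests needed to rebuild it.
    """
    index = {}
    recipes = {}
    for name, data in files.items():
        digests = []
        for start in range(0, len(data), block_size):
            block = data[start:start + block_size]
            digest = hashlib.sha256(block).hexdigest()
            index.setdefault(digest, block)  # store only the first copy
            digests.append(digest)
        recipes[name] = digests
    return index, recipes


def restore(name, index, recipes):
    """Rebuild a file by concatenating its blocks in recipe order."""
    return b"".join(index[d] for d in recipes[name])


if __name__ == "__main__":
    files = {"a.txt": b"AAAABBBBCCCC", "b.txt": b"AAAABBBBDDDD"}
    index, recipes = deduplicate(files)
    stored = sum(len(block) for block in index.values())
    total = sum(len(data) for data in files.values())
    print(f"stored {stored} of {total} bytes")
```

Fixed-size blocks keep the sketch short; a production system would more likely use content-defined chunking so that an insertion near the start of a file does not shift every later block boundary.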
Assessment of Stone Material Deterioration of the Exposed Surfaces of the Step Pyramid in Saqqara
6
Authors: Agnese Kukela, Valdis Seglins. Journal of Earth Science and Engineering, 2013, Issue 4, pp. 238-244 (7 pages)
The Step Pyramid of Djoser at Saqqara, Egypt is one of the oldest stone monuments in the world and, along with other historical monuments of this area, is included in the World Heritage List of UNESCO (United Nations Educational, Scientific, and Cultural Organization). In a way, this monument was an experimental construction and served as a prototype for other pyramids built afterwards in Ancient Egypt. Innovative materials, mortar, construction and engineering solutions were introduced and approbated during the construction process of the Step Pyramid. Therefore, the reconstruction of this monument as close as possible to its original state is an extremely difficult task. The preservation of this pyramid for future generations is a challenge to the specialists of various scientific fields. The current study focuses on systematic assessment of the exposed surfaces of the pyramid's facades, identifying various stone material weathering types and their intensities, as well as major deformations of the structure, further integrated into the geospatial model of the pyramid. The results of this study make it possible to determine the most endangered areas of the pyramid's facades and calculate the volume of necessary reconstruction work.
Keywords: geospatial model; photo documentation analysis; weathering intensity; conservation strategy; stone monument
APPCorp: a corpus for Android privacy policy document structure analysis (cited by: 1)
7
Authors: Shuang LIU, Fan ZHANG, Baiyang ZHAO, Renjie GUO, Tao CHEN, Meishan ZHANG. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, Issue 3, pp. 1-10 (10 pages)
With the increasing popularity of mobile devices and the wide adoption of mobile Apps, concern about privacy issues is increasing. The privacy policy is identified as a proper medium to indicate the legal terms, such as the general data protection regulation (GDPR), and to bind legal agreement between service providers and users. However, privacy policies are usually long and vague for end users to read and understand. It is thus important to be able to automatically analyze the document structures of privacy policies to assist user understanding. In this work we create a manually labelled corpus containing 231 privacy policies (of more than 566,000 words and 7,748 annotated paragraphs). We benchmark our data corpus with 3 document classification models and achieve an F1-score of more than 82%.
Keywords: privacy policy; GDPR; document structure analysis; representation learning; graph neural network
The Origin Review and A Case Study of Plain Language in Federal Documents
8
Author: Zhu Huizi (朱慧子). Overseas English (《海外英语》), 2017, Issue 19, pp. 217-219 (3 pages)
Plain Language has made a great difference nowadays. As it turns out, Plain Language works effectively to express ideas clearly, concisely and systematically. However, it is necessary for contemporary practitioners to review the origin and development of the Plain Language Movement and to examine whether it has thoroughly implemented Plain Language policies in every federal document. Examining a contemporary federal document against the Guidelines for Document Designers reveals existing problems for further development.
Keywords: Plain Language; federal documents; development history; document analysis
Document Analysis by Crosscount Approach
9
Authors: Wang Haiqin (王海琴), Dai Ruwei (戴汝为). Journal of Computer Science & Technology (SCIE, EI, CSCD), 1998, Issue 1, pp. 32-40 (9 pages)
In this paper a new feature called crosscount for document analysis is introduced. The feature crosscount is a function of white line segment with its start on the edge of document images. It reflects not only the contour of image, but also the periodicity of white lines (background) and text lines in the document images. In complex printed-page layouts, there are different blocks such as textual, graphical, tabular, and so on. Of these blocks, textual ones have the most obvious periodicity with their homogeneous white lines arranged regularly. The important property of textual blocks can be extracted by crosscount functions. Here the document layouts are classified into three classes on the basis of their physical structures. Then the definition and properties of the crosscount function are described. According to the classification of document layouts, the application of this new feature to different types of document image analysis and understanding is discussed.
Keywords: crosscount; projection; run-length; document analysis; document understanding; skew detection; skew correction
Visual Similarity Based Document Layout Analysis
10
Authors: Wen Di (文迪), Ding Xiaoqing (丁晓青). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2006, Issue 3, pp. 459-464, F0003 (7 pages)
In this paper, a visual similarity based document layout analysis (DLA) scheme is proposed, which by using a clustering strategy can adaptively deal with documents in different languages, with different layout structures and skew angles. Aiming at a robust and adaptive DLA approach, the authors first manage to find a set of representative filters and statistics to characterize typical texture patterns in document images, which is done through a visual similarity testing process. Texture features are then extracted from these filters and passed into a dynamic clustering procedure, which is called visual similarity clustering. Finally, text contents are located from the clustered results. Benefiting from this scheme, the algorithm demonstrates strong robustness and adaptability in a wide variety of documents, which previous traditional DLA approaches do not possess.
Keywords: document layout analysis; texture analysis; dynamic clustering
AUTOMATIC PATENT DOCUMENT SUMMARIZATION FOR COLLABORATIVE KNOWLEDGE SYSTEMS AND SERVICES (cited by: 9)
11
Authors: Amy J.C. TRAPPEY, Charles V. TRAPPEY, Chun-Yi WU. Journal of Systems Science and Systems Engineering (SCIE, EI, CSCD), 2009, Issue 1, pp. 71-94 (24 pages)
Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, reduces new product development time, increases market success, and reduces potential patent infringement. Thus, it is beneficial to automatically and systematically extract information from patent documents in order to improve knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using a combined ontology based and TF-IDF concept clustering approach. The ontology captures the general knowledge and core meaning of patents in a given domain. Then, the proposed methodology extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of key terms. Patents from the International Patent Classification (IPC) codes B25C, B25D, B25F (categories for power hand tools) and B24B, C09G and H01L (categories for chemical mechanical polishing) are used as case studies to evaluate the compression ratio, retention ratio, and classification accuracy of the summarization results. The evaluation uses statistics to represent the summary generation and its compression ratio, the ontology based keyword extraction retention ratio, and the summary classification accuracy. The results show that the ontology based approach yields about the same compression ratio as previous non-ontology based research but yields on average an 11% improvement for the retention ratio and a 14% improvement for classification accuracy.
Keywords: semantic knowledge service; key phrase extraction; document summarization; text mining; patent document analysis
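Editorial illustration: the abstract above names TF-IDF as one ingredient of its summarization approach. The sketch below shows plain TF-IDF key-term ranking only, using relative term frequency and a natural-log inverse document frequency. It omits the paper's ontology and concept clustering entirely; the function name `tfidf_key_terms`, the whitespace tokenizer, and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_key_terms(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document.

    TF is the term count divided by the document length; IDF is
    log(N / document frequency). Ties are broken alphabetically.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    docs = [
        "drill motor drill housing",
        "polishing slurry pad",
        "motor pad",
    ]
    for doc, terms in zip(docs, tfidf_key_terms(docs, top_k=2)):
        print(doc, "->", terms)
```

A term that occurs in every document gets an IDF of zero under this definition, which is why such terms drop to the bottom of each ranking.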
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (cited by: 11)
12
Authors: Ming Liu, Bo Lang, Zepeng Gu, Ahmed Zeeshan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, Issue 6, pp. 619-632 (14 pages)
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model for incorporating a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
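Editorial illustration: the abstract above describes comparing two papers through the distance between their semantic profiles, with concepts measured by word vectors. The sketch below is one simple way such a comparison could look: each profile maps an element name to a vector, and the score is the mean cosine similarity over the elements both profiles share. The averaging rule, the element names, and the toy two-dimensional vectors are assumptions for this example, not the paper's actual formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def profile_similarity(p, q):
    """Mean cosine similarity over the profile elements present in both p and q."""
    shared = sorted(set(p) & set(q))
    if not shared:
        return 0.0
    return sum(cosine(p[key], q[key]) for key in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical profiles; real vectors would come from a word-embedding model.
    paper_a = {"purpose": [1.0, 0.0], "method": [0.0, 1.0]}
    paper_b = {"purpose": [1.0, 0.0], "method": [1.0, 0.0]}
    print(profile_similarity(paper_a, paper_b))
```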
Arabic Bank Check Processing: State of the Art
13
Authors: Irfan Ahmad, Sabri A. Mahmoud. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2013, Issue 2, pp. 285-299 (15 pages)
In this paper, we present a general model for Arabic bank check processing indicating the major phases of a check processing system. We then survey the available databases for Arabic bank check processing research. The state of the art in the different phases of Arabic bank check processing is surveyed (i.e., pre-processing, check analysis and segmentation, feature extraction, and legal and courtesy amounts recognition). The open issues for future research are stated and areas that need improvements are presented. To the best of our knowledge, it is the first survey of Arabic bank check processing.
Keywords: handwriting analysis; document analysis; text processing; feature evaluation and selection; pattern analysis
Extended Approach to Water Flow Algorithm for Text Line Segmentation
14
Author: Darko Brodić. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, Issue 1, pp. 187-194 (8 pages)
This paper proposes a new approach to the water flow algorithm for text line segmentation. In the basic method the hypothetical water flows under a few specified angles which have been defined by the water flow angle as a parameter. It is applied to the document image frame from left to right and vice versa. As a result, the unwetted and wetted areas are established. These areas separate text from non-text elements in each text line, respectively. Hence, they represent the control areas that are of major importance for text line segmentation. Primarily, the extended approach means extraction of the connected components by bounding boxes over text. In this way, each connected component is mutually separated. Hence, the water flow angle, which defines the unwetted areas, is determined adaptively. By choosing an appropriate water flow angle, the unwetted areas are lengthened, which leads to better text line segmentation. Results of this approach are encouraging due to the text line segmentation improvement, which is the most challenging step in document image processing.
Keywords: document image analysis; text segmentation; region growing; smearing method; water flow algorithm
Self-identification of electronically scanned signatures (ESS) and digitally constructed signatures (DCS)
15
Authors: Zuzanna Kazmierczyk, Ian J. Turner. Forensic Sciences Research (CSCD), 2022, Issue 2, pp. 261-264 (4 pages)
The use of electronic signatures as a form of identification is increasingly common, yet they have been shown to lack the dynamic features found in online signatures. In this study, handwritten signatures were scanned to produce electronically scanned signatures (ESS) which were then digitally altered to produce digitally constructed signatures (DCS). The ESS and DCS were presented back to participants to identify which were genuine. Only 1% of participants correctly identified all signatures, with a mean score of 57.6% identifications. The lack of self-recognition of ESS raises questions on their reliability and usefulness as a means of personal identification.
Keywords: forensic sciences; signature; handwriting; questioned document analysis; electronic signature; digital signature; simulation
Knowledge-driven decision analytics for commercial banking
16
Authors: K.S. Law, Fu-Lai Chung. Journal of Management Analytics (EI), 2020, Issue 2, pp. 209-230 (22 pages)
Although the corporate relationship manager seems to be the key enabler in commercial banking, the personal relationship sales model is not a sustainable model for the paradigm shift in digital financial markets. In this research, we propose a knowledge-driven decision analytics approach to improve the decision process. However, most of the corporate client documents processed in banks are not well-structured and the traditional analysis approach does not consider the document structure, which carries rich semantic information. We propose a document structure-based text representation approach that incorporates auxiliary information in the predictive analytics of unstructured data to improve the performance in the document classification task. The proposed approach significantly outperforms the traditional whole document approach, which does not take the document structure into consideration. With the proposed approach, knowledge can be effectively and efficiently used for business decisions and planning to improve the competitive advantage and substantiality of banks.
Keywords: document classification; information retrieval; informatics; document structure analysis; auxiliary information
Purposeful and Ethical Early Childhood Teacher: The Underlying Values Guiding Finnish Early Childhood Education
17
Authors: Anitta Melasalmi, Tarja-Ritta Hurme, Inkeri Ruokonen. ECNU Review of Education, 2022, Issue 4, pp. 601-623 (23 pages)
Purpose: The new Finnish National Core Curriculum for Early Childhood Education and Care (2018) strongly highlights pedagogical knowledge and practice, demanding teachers to develop their pedagogical thinking, evaluation, judgment, and operating culture. Since ethics is viewed as a vital characteristic of the teaching profession, our objective is to make these complex ethical issues more visible to be subject to democratic discussion and change. Design/Approach/Methods: The framework comprises a broad theory base of codes of ethics and professional codes of ethics of teaching. The research materials were the national curricula of early childhood education and care (ECEC) and pre-primary education. The eight-step qualitative analysis process was applied to identify and shed light on the codes of ethics laying the foundations for the purposeful and ethical early childhood education (ECE) teacher. Findings: The results indicate that through both theoretical lenses, the Finnish ECEC curricula comprise several ethical codes. For the future purposeful ECE teachers as ethical professionals, the results raise questions for further discussion. Particularly, issues related to the ethics of care, intellectual freedom, inquiry stance, professional competence, and diversity may further enhance our ECEC curricula.
Keywords: document analysis; early childhood education and care; early childhood teacher education; ethical values