In the information age, electronic documents (e-documents) have become a popular alternative to paper documents due to their lower costs, higher dissemination rates, and ease of knowledge sharing. However, digital copyright infringements occur frequently due to the ease of copying, which not only infringes on the rights of creators but also weakens their creative enthusiasm. Therefore, it is crucial to establish an e-document sharing system that enforces copyright protection. However, existing centralized systems have notable vulnerabilities, and the plagiarism detection algorithms they use cannot fully detect the context, semantics, style, and other factors of the text. Digital watermark technology is only used as a means of infringement tracing. This paper proposes a decentralized framework for e-document sharing based on decentralized autonomous organization (DAO) and non-fungible token (NFT) in blockchain. The use of blockchain as a distributed credit base resolves the vulnerabilities inherent in traditional centralized systems. The e-document evaluation and plagiarism detection mechanisms based on the DAO model effectively address challenges in comprehensive text information checks, thereby promoting the enhancement of e-document quality. The mechanism for protecting and circulating e-document copyrights using NFT technology ensures effective safeguarding of users' e-document copyrights and facilitates e-document sharing. Moreover, recognizing the security issues within the DAO governance mechanism, we introduce an innovative optimization solution. Through experimentation, we validate the enhanced security of the optimized governance mechanism, reducing manipulation risks by up to 51%. Additionally, by utilizing evolutionary game analysis to deduce the equilibrium strategies of the framework, we discovered that adjusting the reward and penalty parameters of the incentive mechanism motivates creators to generate superior quality and unique e-documents, while evaluators are more likely to engage in assessments.
Purpose: Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignment of review articles in Web of Science, Scopus, and publishers' websites on a large scale. Design/methodology/approach: 27,616 papers from 160 journals from 10 review journal series indexed in SCI are analyzed. The document types of these papers, as labeled on journals' websites and as assigned by WoS and Scopus, are retrieved and compared to determine the assignment accuracy and identify the possible reasons for wrong assignments. For the document types labeled on the websites, we further differentiate them into explicit reviews and implicit reviews based on whether the website directly indicates that the paper is a review or not. Findings: Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assignment accuracy of WoS and Scopus for implicit reviews dropped significantly, especially for Scopus. Research limitations: The document types we used as the gold standard were based on the journal websites' labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. Whether this conclusion can be extended to review articles published in non-review journals and to the current situation is not very clear. Practical implications: This study provides a reference for the accuracy of document type assignment of review articles in WoS and Scopus, and the identified pattern for assigning implicit reviews may be helpful for better labeling on websites, WoS, and Scopus. Originality/value: This study investigated the assignment accuracy of the document type of reviews and identified some patterns of wrong assignments.
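The precision and recall figures reported in the study above can be illustrated with a minimal sketch. It assumes each paper has a gold document type (from the journal website) and a database-assigned type; the function name `precision_recall`, the paper ids, and the toy labels are hypothetical and are not taken from the study.

```python
def precision_recall(gold, assigned, label="review"):
    """Precision and recall of `label` assignments against gold labels.

    gold, assigned: dicts mapping a paper id to its document type.
    """
    # True positives: gold says `label` and the database agrees.
    tp = sum(1 for pid, g in gold.items()
             if g == label and assigned.get(pid) == label)
    # False positives: database says `label` but gold does not.
    fp = sum(1 for pid, a in assigned.items()
             if a == label and gold.get(pid) != label)
    # False negatives: gold says `label` but the database does not.
    fn = sum(1 for pid, g in gold.items()
             if g == label and assigned.get(pid) != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, made up for illustration only.
gold = {"p1": "review", "p2": "review", "p3": "review",
        "p4": "review", "p5": "article"}
assigned = {"p1": "review", "p2": "review", "p3": "review",
            "p4": "article", "p5": "article"}
precision, recall = precision_recall(gold, assigned)
print(precision, recall)
```

In this toy case every paper the database calls a review really is one (high precision), but one genuine review is labeled as an article (lower recall), which mirrors the direction of the gap the abstract describes.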
The Gannet Optimization Algorithm (GOA) and the Whale Optimization Algorithm (WOA) demonstrate strong performance; however, there remains room for improvement in convergence and practical applications. This study introduces a hybrid optimization algorithm, named the adaptive inertia weight whale optimization algorithm and gannet optimization algorithm (AIWGOA), which addresses challenges in enhancing handwritten documents. The hybrid strategy integrates the strengths of both algorithms, significantly enhancing their capabilities, whereas the adaptive parameter strategy mitigates the need for manual parameter setting. By amalgamating the hybrid strategy and parameter-adaptive approach, the Gannet Optimization Algorithm was refined to yield the AIWGOA. Through a performance analysis of the CEC2013 benchmark, the AIWGOA demonstrates notable advantages across various metrics. Subsequently, an evaluation index was employed to assess the enhanced handwritten documents and images, affirming the superior practical application of the AIWGOA compared with other algorithms.
As digital technologies have advanced more rapidly, the number of paper documents converted into a digital format has increased exponentially. To respond to the urgent need to categorize the growing number of digitized documents, the classification of digitized documents in real time has been identified as the primary goal of our study. Paper classification is the first stage in automating document control and efficient knowledge discovery with little or no human involvement. Artificial intelligence methods such as deep learning are now combined with segmentation to study and interpret those traits, which was not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes may be predicted. The segmentation process divides the input image into separate segments for a more thorough image study. This study proposes a deep learning-enabled framework for automated document classification, which can be implemented in higher education. To further this goal, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. Subsequently, a deep learning model based on Conv2D layers is proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine-learning techniques. The results demonstrate that the proposed deep learning model performs well in document categorization, overtaking the other machine learning models by reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. The achieved results indicate that the proposed deep model is acceptable to use in practice as an assistant to an office worker.
Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Research on the use of electronic health records (EHR) presents contradictory results regarding the time spent documenting. Some research supports the use of electronic records as a tool to speed up documentation, while other research has found it to be time consuming. The purpose of this quantitative retrospective before-after project was to measure the impact of using the laboratory value flowsheet within the EHR on documentation time. The research question was: “Does the use of a laboratory value flowsheet in the EHR impact documentation time by primary care providers (PCPs)?” The theoretical framework utilized in this project was the Donabedian Model. The population in this research was the two PCPs in a small primary care clinic in the northwest of Puerto Rico. The sample was composed of all the encounters during the months of October 2019 and December 2019. The data were obtained through data mining and analyzed using SPSS 27. The evaluative outcome of this project is that documentation time decreased after implementation of the laboratory value flowsheet in the EHR. However, the number of patients per day also increased, which affects the number of patients seen per day, week, and month. The implications for clinical practice include the use of templates to improve workflow and documentation as well as to decrease documentation time while increasing the number of patients seen per day.
With the widespread use of Chinese globally, the number of Chinese learners has been increasing, leading to various grammatical errors among beginners. Additionally, as domestic efforts to develop industrial information grow, electronic documents have also proliferated. When dealing with numerous electronic documents and texts written by Chinese beginners, manually written texts often contain hidden grammatical errors, posing a significant challenge to traditional manual proofreading. Correcting these grammatical errors is crucial to ensure fluency and readability. However, certain special types of text grammar or logical errors can have a huge impact, and manually proofreading a large number of texts individually is clearly impractical. Consequently, research on text error correction techniques has garnered significant attention in recent years. The advent and advancement of deep learning have paved the way for sequence-to-sequence learning methods to be extensively applied to the task of text error correction. This paper presents a comprehensive analysis of Chinese text grammar error correction technology, elaborates on its current research status, discusses existing problems, proposes preliminary solutions, and conducts experiments using judicial documents as an example. The aim is to provide a feasible research approach for Chinese text error correction technology.
A document layout can be more informative than merely a document's visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Cross-document relation extraction (RE), as an extension of information extraction, requires integrating information from multiple documents retrieved from open domains with a large number of irrelevant or confusing noisy texts. Previous studies focus on the attention mechanism to construct the connection between different text features through semantic similarity. However, similarity-based methods cannot distinguish valid information from highly similar retrieved documents well. How to design an effective algorithm to implement aggregated reasoning over confusing information with similar features remains an open issue. To address this problem, we design a novel local-to-global causal reasoning (LGCR) network for cross-document RE, which enables efficient distinguishing, filtering, and global reasoning on complex information from a causal perspective. Specifically, we propose a local causal estimation algorithm to estimate the causal effect, which is the first attempt to use causal reasoning independent of feature similarity to distinguish between confusing and valid information in cross-document RE. Furthermore, based on the causal effect, we propose a causality-guided global reasoning algorithm to filter the confusing information and achieve global reasoning. Experimental results under the closed and the open settings of the large-scale dataset CodRED demonstrate that our LGCR network significantly outperforms the state-of-the-art methods and validate the effectiveness of causal reasoning in confusing information processing.
The covers of booklets and books in folk documents primarily serve to protect the pages. Owing to long-term storage limitations, a considerable number of book covers have suffered varying degrees of damage. Following the principles of restoration, a comparative analysis and restoration of folk document covers were conducted, selecting four different types of carriers from the Taihang Mountain Documents, ranging from the Qing dynasty to the Republican Era. These carriers included hemp, mulberry bark, and machine-made paper, as well as blue cotton cloth. Each cover type was matched with an appropriate restoration paper, and different methods were employed during the restoration process. Through restoration, the previously damaged document covers can continue to fulfill their role in protecting the books, thereby extending the lifespan of these four folk documents.
Desertification is increasingly serious in Xinjiang, and the construction of water conservancy is a precondition for the development of agriculture. The main project for the development of agriculture and water conservancy in Xinjiang was the building of Karez, which played a vital role in the development of Xinjiang agriculture in the Qing Dynasty. Karez is recorded many times in historical documents of the Qing Dynasty, such as Lin Zexu's Diary, Tao Baolian's Diary, Xinjiang Atlas, and Zuo Zongtang's Memorial to the Emperor, which describe the situation and historical origin of Karez. Karez made a significant contribution to the development of agriculture in the Qing Dynasty. It increased the cultivated land in Xinjiang at that time, and increased the types and yields of crops. It was conducive to the stability and development of Xinjiang's economy. To this day, Karez is still an important water source for agricultural irrigation in Xinjiang.
Traditional human rights theory tends to hold that human rights should be aimed at defending against public authority and that the legal issue of human rights is a matter of public law. However, the development of human rights concepts and practices is not confined to this. A textual search shows that the term “human rights” appears widely in China's civil judicial documents. Among the 3,412 civil judicial documents we examined, the concept of “human rights” penetrates all kinds of disputes in lawsuits, ranging from property rights, contracts, labor, and torts to marital property, and is embedded in both the claims of the parties concerned and the reasoning of judges. Human rights have become the discourse and yardstick for understanding and evaluating social behavior. The widespread use of the term “human rights” in civil judicial documents reflects at least three concepts related to human rights: first, the rights to subsistence and development are the primary basic human rights; second, the judicial protection of human rights is a bottom-line guarantee; third, the protection of human rights aims to achieve equal rights. Today, judges quote the theory of human rights in judicial judgments from time to time, evidencing that human rights have a practical function in judicial adjudication. In practice, this is mainly manifested in declaring righteous values and strengthening arguments with the values and ideas related to human rights, using the provisions concerning human rights in the Constitution to interpret constitutionality, and using the principles of human rights to interpret ambiguous rules and rank the importance of different rights.
This paper explores the potential of applying online collaborative documents to foster critical thinking skills in EFL college-level classrooms. Considering the limitations of traditional teacher-centered approaches and the need for innovative methods, the study examines the integration of online collaborative tools, using Tencent Docs as an example. The discussion highlights the importance of critical thinking in the academic and professional spheres and introduces the concept of online collaborative documents for enhancing this cognitive skill. Through a detailed exploration, the paper presents a model of employing collaborative documents within a college English class, demonstrating how students collaboratively study an article. The paper then discusses the pros and cons of employing this technology in the classroom. The conclusion emphasizes the transformative potential of integrating technology into pedagogy and its role in creating a dynamic learning environment. The paper underscores the importance of striking a balance between technology and traditional methods, foreseeing avenues for further research and development.
A fully automated paper document sorting robot was developed in this project. This robot classifies documents efficiently and accurately. The objective of this project was to improve the efficiency of classifying or sorting paper documents, reduce costs, and save time. The robot can classify documents according to user-defined rules, such as keywords, dates, serial numbers, bar codes, and the meaning of paragraphs. Since it can classify or sort documents intelligently, it can complete large-scale document classification quickly. The robot is constructed using an aluminum profile to create a box-type truss gantry structure frame. It was built on the LubanCat 4 motherboard and controlled through Python programming. The manipulator is driven by a stepper motor. The camera module is combined with an artificial intelligence algorithm to recognize paper in real time, and the text is recognized after taking pictures of the paper. The sorting function is performed by several sensors. In addition, a web-based human-computer interaction platform was developed using the Flask web framework in Python. Users can access this platform in a variety of ways, allowing them to easily and swiftly configure parameters and send operational instructions to perform various functions.
In this paper, the research achievements and progress on Yunnan tea germplasm resources over the past sixty years are systematically reviewed from the following aspects: exploration, collection, conservation, protection, identification, evaluation, and shared utilization. The current problems and suggestions for the subsequent development of tea germplasm resources in Yunnan are also discussed, including superior and rare germplasm collection, tea genetic diversity research, biotechnology utilization in tea germplasm innovation, super gene exploration and function, the construction of a utilization platform, and the biological basis of species and population conservation.
A rough set based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as document representation of different document sizes, document feature selection and document feature encoding. In the Rough-CC4, the documents are described by the equivalent classes of the approximate words. By this method, the dimensions representing the documents can be reduced, which can solve the precision problems caused by the different document sizes and also blur the differences caused by the approximate words. In the Rough-CC4, a binary encoding method is introduced, through which the importance of documents relative to each equivalent class is encoded. By this encoding method, the precision of the Rough-CC4 is improved greatly and the space complexity of the Rough-CC4 is reduced. The Rough-CC4 can be used in automatic classification of documents.
The major problem of most current approaches to information models is that individual words provide unreliable evidence about the content of texts. When the document is short, e.g. when only the abstract is available, the word-use variability problem has a substantial impact on Information Retrieval (IR) performance. To solve the problem, a new approach to short document retrieval named the Reference Document Model (RDM) is put forward in this letter. RDM obtains the statistical semantics of the query and the document by pseudo feedback from reference documents for both. The contributions of this model are three-fold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over the classical probabilistic models on the task of ad hoc retrieval on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better the RDM performance.
Achieving a good recognition rate for degraded document images is difficult, as degraded document images suffer from low contrast, bleed-through, and nonuniform illumination effects. Unlike the existing baseline thresholding techniques that use fixed thresholds and windows, the proposed method introduces a concept for obtaining dynamic windows according to the image content to achieve better binarization. To enhance a low-contrast image, we propose a new mean histogram stretching method for suppressing noisy pixels in the background and, simultaneously, increasing pixel contrast at edges or near edges, which results in an enhanced image. For the enhanced image, we propose a new method for deriving adaptive local thresholds for dynamic windows. The dynamic window is derived by exploiting the advantage of Otsu thresholding. To assess the performance of the proposed method, we have used standard databases, namely, those of the document image binarization contest (DIBCO), for experimentation. The comparative study on well-known existing methods indicates that the proposed method outperforms the existing methods in terms of quality and recognition rate.
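For readers unfamiliar with the Otsu thresholding mentioned in the abstract above, the following is a minimal sketch of the standard global Otsu method only. It is not the paper's dynamic-window or adaptive local thresholding scheme, whose details the abstract does not give. The function name `otsu_threshold` and the toy pixel values are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of integer gray values in [0, levels).
    Pixels with value <= t form one class, the rest form the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the class with values <= t
    sum0 = 0  # sum of gray values of that class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break             # every pixel is already in the first class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


# Toy "image": dark ink pixels and bright paper pixels (made-up values).
pixels = [10, 12, 11, 200, 198, 202]
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, binary)
```

A single global threshold like this is what tends to fail under nonuniform illumination, which is the motivation the abstract gives for computing thresholds over content-dependent windows instead.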
The eXtensible Markup Language (XML) is a new metalanguage intended to replace HTML and has many advantages. Traditional engineering documents take too many forms to be managed conveniently and have no dynamic correlation functions. This paper introduces a new method that uses XML to store and manage engineering documents, unifying their format and realizing dynamic correlations among them.
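The idea of storing engineering documents in one XML format and linking them to each other can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give a schema, so the element and attribute names (`engineeringDocuments`, `document`, `relatedTo`, `ref`) and the sample documents are hypothetical. The code uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Element and attribute names are hypothetical, chosen for illustration.
root = ET.Element("engineeringDocuments")

drawing = ET.SubElement(root, "document", id="DRW-001", type="drawing")
ET.SubElement(drawing, "title").text = "Gearbox housing"

spec = ET.SubElement(root, "document", id="SPEC-007", type="specification")
ET.SubElement(spec, "title").text = "Housing material specification"
# A cross-reference expresses a correlation between two documents.
ET.SubElement(spec, "relatedTo", ref="DRW-001")

# Serialize to one uniform text format, then read it back.
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Resolve correlations at read time by looking up each referenced id.
by_id = {d.get("id"): d for d in parsed.findall("document")}
links = [
    (d.get("id"), r.get("ref"), by_id[r.get("ref")].findtext("title"))
    for d in parsed.findall("document")
    for r in d.findall("relatedTo")
]
print(links)
```

Because the link stores only an id, the referenced document's title is looked up when the file is read, so a change to the drawing's title is reflected wherever it is referenced.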
Funding: This work is supported by the National Key Research and Development Program (2022YFB2702300), the National Natural Science Foundation of China (Grant No. 62172115), the Innovation Fund Program of the Engineering Research Center for Integration and Application of Digital Learning Technology of Ministry of Education (Grant No. 1331005), the Guangdong Higher Education Innovation Group (2020KCXTD007), and the Guangzhou Fundamental Research Plan of Municipal-School Jointly Funded Projects (No. 202102010445).
文摘In the information age,electronic documents(e-documents)have become a popular alternative to paper documents due to their lower costs,higher dissemination rates,and ease of knowledge sharing.However,digital copyright infringements occur frequently due to the ease of copying,which not only infringes on the rights of creators but also weakens their creative enthusiasm.Therefore,it is crucial to establish an e-document sharing system that enforces copyright protection.However,the existing centralized system has outstanding vulnerabilities,and the plagiarism detection algorithm used cannot fully detect the context,semantics,style,and other factors of the text.Digital watermark technology is only used as a means of infringement tracing.This paper proposes a decentralized framework for e-document sharing based on decentralized autonomous organization(DAO)and non-fungible token(NFT)in blockchain.The use of blockchain as a distributed credit base resolves the vulnerabilities inherent in traditional centralized systems.The e-document evaluation and plagiarism detection mechanisms based on the DAO model effectively address challenges in comprehensive text information checks,thereby promoting the enhancement of e-document quality.The mechanism for protecting and circulating e-document copyrights using NFT technology ensures effective safeguarding of users’e-document copyrights and facilitates e-document sharing.Moreover,recognizing the security issues within the DAO governance mechanism,we introduce an innovative optimization solution.Through experimentation,we validate the enhanced security of the optimized governance mechanism,reducing manipulation risks by up to 51%.Additionally,by utilizing evolutionary game analysis to deduce the equilibrium strategies of the framework,we discovered that adjusting the reward and penalty parameters of the incentive mechanism motivates creators to generate superior quality and unique e-documents,while evaluators are more likely to engage 
in assessments.
文摘Purpose:Accurately assigning the document type of review articles in citation index databases like Web of Science(WoS)and Scopus is important.This study aims to investigate the document type assignation of review articles in Web of Science,Scopus and Publisher’s websites on a large scale.Design/methodology/approach:27,616 papers from 160 journals from 10 review journal series indexed in SCI are analyzed.The document types of these papers labeled on journals’websites,and assigned by WoS and Scopus are retrieved and compared to determine the assigning accuracy and identify the possible reasons for wrongly assigning.For the document type labeled on the website,we further differentiate them into explicit review and implicit review based on whether the website directly indicates it is a review or not.Findings:Overall,WoS and Scopus performed similarly,with an average precision of about 99% and recall of about 80%.However,there were some differences between WoS and Scopus across different journal series and within the same journal series.The assigning accuracy of WoS and Scopus for implicit reviews dropped significantly,especially for Scopus.Research limitations:The document types we used as the gold standard were based on the journal websites’labeling which were not manually validated one by one.We only studied the labeling performance for review articles published during 2017-2018 in review journals.Whether this conclusion can be extended to review articles published in non-review journals and most current situation is not very clear.Practical implications:This study provides a reference for the accuracy of document type assigning of review articles in WoS and Scopus,and the identified pattern for assigning implicit reviews may be helpful to better labeling on websites,WoS and Scopus.Originality/value:This study investigated the assigning accuracy of document type of reviews and identified the some patterns of wrong assignments.
Abstract: The Gannet Optimization Algorithm (GOA) and the Whale Optimization Algorithm (WOA) demonstrate strong performance; however, there remains room for improvement in convergence and practical applications. This study introduces a hybrid optimization algorithm, named the adaptive inertia weight whale optimization algorithm and gannet optimization algorithm (AIWGOA), which addresses challenges in enhancing handwritten documents. The hybrid strategy integrates the strengths of both algorithms, significantly enhancing their capabilities, whereas the adaptive parameter strategy mitigates the need for manual parameter setting. By combining the hybrid strategy and the parameter-adaptive approach, the Gannet Optimization Algorithm was refined to yield the AIWGOA. In a performance analysis on the CEC2013 benchmark, the AIWGOA demonstrates notable advantages across various metrics. Subsequently, an evaluation index was employed to assess the enhanced handwritten documents and images, affirming the superior practical applicability of the AIWGOA compared with other algorithms.
Abstract: As digital technologies have advanced more rapidly, the number of paper documents converted into a digital format has increased exponentially. To respond to the urgent need to categorize the growing number of digitized documents, the classification of digitized documents in real time has been identified as the primary goal of our study. Paper classification is the first stage in automating document control and efficient knowledge discovery with little or no human involvement. Artificial intelligence methods such as deep learning are now combined with segmentation to study and interpret document traits in ways that were not conceivable ten years ago. Deep learning aids in comprehending input patterns so that object classes may be predicted. The segmentation process divides the input image into separate segments for a more thorough image study. This study proposes a deep learning-enabled framework for automated document classification, which can be implemented in higher education. To further this goal, a dataset was developed that includes seven categories: Diplomas, Personal documents, Journal of Accounting of higher education diplomas, Service letters, Orders, Production orders, and Student orders. Subsequently, a deep learning model based on Conv2D layers is proposed for the document classification process. In the final part of this research, the proposed model is evaluated and compared with other machine-learning techniques. The results demonstrate that the proposed deep learning model performs well in document categorization, overtaking the other machine learning models by reaching 94.84%, 94.79%, 94.62%, 94.43%, and 94.07% in accuracy, precision, recall, F-score, and AUC-ROC, respectively. The achieved results indicate that the proposed deep model is acceptable for use in practice as an assistant to an office worker.
Abstract: Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods: Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/Yong Z-Lee/TD-DCCAM.
Abstract: Research on the use of electronic health records (EHRs) presents contradictory results regarding the time spent documenting. Some research supports the use of electronic records as a tool to speed up documentation, while other research has found that it is time consuming. The purpose of this quantitative retrospective before-after project was to measure the impact of using the laboratory value flowsheet within the EHR on documentation time. The research question was: “Does the use of a laboratory value flowsheet in the EHR impact documentation time by primary care providers (PCPs)?” The theoretical framework utilized in this project was the Donabedian Model. The population in this research was the two PCPs in a small primary care clinic in the northwest of Puerto Rico. The sample was composed of all the encounters during the months of October 2019 and December 2019. The data were obtained through data mining and analyzed using SPSS 27. The evaluative outcome of this project is that documentation time decreased after implementation of the laboratory value flowsheet in the EHR. However, the number of patients per day also increased, which affects the number of patients seen per day, week, and month. The implications for clinical practice include the use of templates to improve workflow and documentation, decreasing documentation time while also increasing the number of patients seen per day.
Abstract: With the widespread use of Chinese globally, the number of Chinese learners has been increasing, leading to various grammatical errors among beginners. Additionally, as domestic efforts to develop industrial information grow, electronic documents have also proliferated. Among the numerous electronic documents and texts written by Chinese beginners, manually written texts often contain hidden grammatical errors, posing a significant challenge to traditional manual proofreading. Correcting these grammatical errors is crucial to ensure fluency and readability. However, certain special types of grammatical or logical errors can have a huge impact, and manually proofreading a large number of texts individually is clearly impractical. Consequently, research on text error correction techniques has garnered significant attention in recent years. The advent and advancement of deep learning have paved the way for sequence-to-sequence learning methods to be extensively applied to the task of text error correction. This paper presents a comprehensive analysis of Chinese text grammar error correction technology, elaborates on its current research status, discusses existing problems, proposes preliminary solutions, and conducts experiments using judicial documents as an example. The aim is to provide a feasible research approach for Chinese text error correction technology.
Funding: This research was supported and funded by KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can be more informative than merely a document’s visual and structural appearance. Thus, document layout analysis (DLA) is considered a necessary prerequisite for advanced processing and detailed document image analysis to be further used in several applications and for different objectives. This research extends the traditional approaches of DLA and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics, which can be effectively utilized in dozens of practical applications for various purposes, in a way bridging the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. This approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts comprising complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of using the capabilities of the proposed approach in automating and facilitating many functional, real-life tasks such as effort estimation and pricing of transcription or typing of such complex manuscripts.
Funding: Supported in part by the National Key Research and Development Program of China (2022ZD0116405), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27030300), and the Key Research Program of the Chinese Academy of Sciences (ZDBS-SSW-JSC006).
Abstract: Cross-document relation extraction (RE), as an extension of information extraction, requires integrating information from multiple documents retrieved from open domains with a large number of irrelevant or confusing noisy texts. Previous studies focus on the attention mechanism to construct the connection between different text features through semantic similarity. However, similarity-based methods cannot distinguish valid information from highly similar retrieved documents well. How to design an effective algorithm to implement aggregated reasoning over confusing information with similar features remains an open issue. To address this problem, we design a novel local-to-global causal reasoning (LGCR) network for cross-document RE, which enables efficient distinguishing, filtering, and global reasoning on complex information from a causal perspective. Specifically, we propose a local causal estimation algorithm to estimate the causal effect, which is the first attempt to use causal reasoning independent of feature similarity to distinguish between confusing and valid information in cross-document RE. Furthermore, based on the causal effect, we propose a causality-guided global reasoning algorithm to filter the confusing information and achieve global reasoning. Experimental results under the closed and the open settings of the large-scale dataset CodRED demonstrate that our LGCR network significantly outperforms the state-of-the-art methods and validate the effectiveness of causal reasoning in confusing information processing.
Funding: This work is a research result of the 2022 Hebei Province Social Science Development Research Project: “Research on the Sustainability of Paper Protection of Revolutionary Literature Seen in Taihang Mountain Documents” (Project No.: 20220303015).
Abstract: The covers of booklets and books in folk documents primarily serve to protect the pages. Owing to long-term storage limitations, a considerable number of book covers have suffered varying degrees of damage. Following the principles of restoration, a comparative analysis and restoration of folk document covers were conducted, selecting four different types of carriers from the Taihang Mountain Documents, ranging from the Qing dynasty to the Republican Era. These carriers included hemp, mulberry bark, and machine-made paper, as well as blue cotton cloth. Each cover type was matched with an appropriate restoration paper, and different methods were employed during the restoration process. Through restoration, the previously damaged document covers can continue to fulfill their role in protecting the books, thereby extending the lifespan of these four folk documents.
Abstract: Desertification is increasingly serious in Xinjiang, and the construction of water conservancy is a precondition for the development of agriculture. The main project for the development of agriculture and water conservancy in Xinjiang was the building of Karez, which played a vital role in the development of Xinjiang agriculture in the Qing Dynasty. Karez was recorded many times in historical documents of the Qing Dynasty, such as Lin Zexu's Diary, Tao Baolian's Diary, Xinjiang Atlas, and Zuo Zongtang's Memorial to the Emperor, which describe the situation and historical origin of Karez. Karez made a significant contribution to the development of agriculture in the Qing Dynasty. It increased the cultivated land in Xinjiang at that time, and increased the types and yields of crops. It was conducive to the stability and development of Xinjiang's economy. To this day, Karez is still an important water source for agricultural irrigation in Xinjiang.
Abstract: Traditional human rights theory tends to hold that human rights should be aimed at defending against public authority and that the legal issue of human rights is a matter of public law. However, the development of human rights concepts and practices is not confined to this. A textual search shows that the term “human rights” exists widely in China’s civil judicial documents. Among the 3,412 civil judicial documents we researched, the concept of “human rights” penetrates all kinds of disputes in lawsuits, ranging from property rights, contracts, labor, and torts to marital property, and is embedded in both the claims of the parties concerned and the reasoning of judges. Human rights have become the discourse and yardstick for understanding and evaluating social behavior. The widespread use of the term “human rights” in civil judicial documents reflects at least three concepts related to human rights: first, the rights to subsistence and development are the primary basic human rights; second, the judicial protection of human rights is a bottom-line guarantee; third, the protection of human rights aims to achieve equal rights. Today, judges quote the theory of human rights in judicial judgments from time to time, evidencing that human rights have a practical function in judicial adjudication activities. In practice, this is mainly manifested in declaring righteous values and strengthening arguments with the values and ideas related to human rights, using the provisions concerning human rights in the Constitution to interpret constitutionality, and using the principles of human rights to interpret blurred rules and rank the importance of different rights.
Abstract: This paper explores the potential of applying online collaborative documents to foster critical thinking skills in EFL college-level classrooms. Considering the limitations of traditional teacher-centered approaches and the need for innovative methods, the study examines the integration of online collaborative tools, using Tencent Docs as an example. The discussion highlights the importance of critical thinking in the academic and professional spheres and introduces the concept of online collaborative documents for enhancing this cognitive skill. Through a detailed exploration, the paper presents a model of employing collaborative documents within a college English class, demonstrating how students collaboratively study an article. Then, the paper discusses the pros and cons of employing this technology in the classroom. The conclusion emphasizes the transformative potential of integrating technology into pedagogy and its role in creating a dynamic learning environment. The paper underscores the importance of striking a balance between technology and traditional methods, foreseeing avenues for further research and development.
Funding: Supported by the Guangdong University Scientific Research Young Innovative Talents Project (Natural Science) under Grant 2021KQNCX240 and the Zhanjiang Preschool Education College 2023 College Students Innovation and Entrepreneurship Training Program under Grant 2023ZYDC02.
Abstract: A fully automated paper document sorting robot was developed in this project. This robot classifies documents efficiently and accurately. The objective of this project was to improve the efficiency of classifying or sorting paper documents, reduce costs, and save time. The robot can classify documents according to user-defined rules, such as keywords, dates, serial numbers, bar codes, and the meaning of paragraphs. Since it can classify or sort documents intelligently, it can complete large-scale document classification quickly. The robot is constructed using aluminum profiles to create a box-type truss gantry structure frame. It was built on the LubanCat 4 motherboard and is controlled through Python programming. A stepper motor drives the manipulator. The camera module is combined with an artificial intelligence algorithm to recognize paper in real time, and the text is recognized after taking pictures of the paper. The sorting function is performed by several sensors. In addition, a web-based human-computer interaction platform was developed using the Flask web framework in Python. Users can access this platform in a variety of ways, allowing them to easily and swiftly configure parameters and send operational instructions to perform various functions.
Funding: Supported by the Project of the National Natural Science Foundation of China (31160175), the Project of the Tea Research Institute of Yunnan Academy of Agricultural Sciences (2009A0937), and the National Modern Agriculture Technology System Projects in Tea Industry (nycytx-23).
Abstract: In this paper, the research achievements and progress on Yunnan tea germplasm resources over the past sixty years are systematically reviewed from the following aspects: exploration, collection, conservation, protection, identification, evaluation, and shared utilization. In addition, the current problems and suggestions for the subsequent development of tea germplasm resources in Yunnan are discussed, including superior and rare germplasm collection, tea genetic diversity research, biotechnology utilization in tea germplasm innovation, super gene exploration and function, the construction of a utilization platform, and the biological basis of species and population conservation.
Funding: The National Natural Science Foundation of China (No. 60503020, 60373066, 60403016, 60425206), the Natural Science Foundation of Jiangsu Higher Education Institutions (No. 04KJB520096), and the Doctoral Foundation of Nanjing University of Posts and Telecommunication (No. 0302).
Abstract: A rough set based corner classification neural network, the Rough-CC4, is presented to solve document classification problems such as document representation for different document sizes, document feature selection, and document feature encoding. In the Rough-CC4, the documents are described by the equivalence classes of the approximate words. By this method, the dimensions representing the documents can be reduced, which can solve the precision problems caused by the different document sizes and also blur the differences caused by the approximate words. In the Rough-CC4, a binary encoding method is introduced, through which the importance of documents relative to each equivalence class is encoded. By this encoding method, the precision of the Rough-CC4 is improved greatly and the space complexity of the Rough-CC4 is reduced. The Rough-CC4 can be used in automatic classification of documents.
Funding: Supported by the Funds of Heilongjiang Outstanding Young Teacher (1151G037).
Abstract: The major problem with most current approaches to information models is that individual words provide unreliable evidence about the content of the texts. When the document is short, e.g. only the abstract is available, the word-use variability problem has a substantial impact on Information Retrieval (IR) performance. To solve the problem, a new approach to short document retrieval named the Reference Document Model (RDM) is put forward in this letter. RDM obtains the statistical semantics of the query and the document by pseudo feedback from reference documents, applied to both the query and the document. The contributions of this model are three-fold: (1) pseudo feedback for both the query and the document; (2) building the query model and the document model from reference documents; (3) flexible indexing units, which can be any linguistic elements such as documents, paragraphs, sentences, n-grams, terms, or characters. For short document retrieval, RDM achieves significant improvements over the classical probabilistic models on the task of ad hoc retrieval on Text REtrieval Conference (TREC) test sets. Results also show that the shorter the document, the better the RDM performance.
Funding: Funded by the Ministry of Higher Education, Malaysia, which provided facilities and financial support under the Long Research Grant Scheme LRGS-1-2019-UKM-UKM-2-7.
Abstract: Achieving a good recognition rate for degraded document images is difficult, as degraded document images suffer from low contrast, bleed-through, and nonuniform illumination effects. Unlike the existing baseline thresholding techniques that use fixed thresholds and windows, the proposed method introduces a concept for obtaining dynamic windows according to the image content to achieve better binarization. To enhance a low-contrast image, we propose a new mean histogram stretching method for suppressing noisy pixels in the background and, simultaneously, increasing pixel contrast at edges or near edges, which results in an enhanced image. For the enhanced image, we propose a new method for deriving adaptive local thresholds for dynamic windows. The dynamic window is derived by exploiting the advantage of Otsu thresholding. To assess the performance of the proposed method, we have used standard databases, namely, the document image binarization contest (DIBCO) datasets, for experimentation. The comparative study against well-known existing methods indicates that the proposed method outperforms them in terms of quality and recognition rate.
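For orientation, the classical global Otsu threshold that the abstract builds on can be sketched as follows. This is only the textbook baseline (one fixed threshold for the whole image, chosen to maximize between-class variance), not the authors' dynamic-window method; the function names and the toy pixel values are hypothetical:

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of integer gray levels in [0, levels).
    Pixels <= t are treated as one class, pixels > t as the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_low = 0       # number of pixels with gray level <= t
    sum_low = 0          # sum of gray levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue     # no pixels in the low class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break        # every pixel is already in the low class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        var_between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]


# Toy example: three dark pixels and three bright pixels.
sample = [10, 10, 12, 200, 202, 205]
t = otsu_threshold(sample)
print(t, binarize(sample, t))
```

A single global threshold like this is what tends to fail under nonuniform illumination, which is the motivation the abstract gives for local thresholds over content-dependent windows.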
Abstract: The eXtensible Markup Language (XML) is a new meta-language intended to replace HTML and has many advantages. Traditional engineering documents have too many forms of expression to be managed conveniently and have no dynamic correlation functions. This paper introduces a new method that uses XML to store and manage engineering documents, unifying the format of engineering documents and realizing their dynamic correlations.
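As a rough illustration of the idea, an engineering document can be stored as XML with ID-based references that are resolved when the document is read. The abstract does not give the paper's schema, so every element and attribute name below (`engineeringDocument`, `part`, `reference`, `target`) is a hypothetical assumption, and the sketch uses only Python's standard `xml.etree.ElementTree` module:

```python
import xml.etree.ElementTree as ET

# Build a small document in one uniform XML format (hypothetical schema).
doc = ET.Element("engineeringDocument", id="DOC-001")
ET.SubElement(doc, "title").text = "Pump housing specification"
part = ET.SubElement(doc, "part", id="P-100")
ET.SubElement(part, "material").text = "cast iron"
# A reference that points at the part by ID rather than copying its data.
ET.SubElement(doc, "reference", target="P-100")

# Serialize to text, as it might be stored, then parse it back.
xml_text = ET.tostring(doc, encoding="unicode")
parsed = ET.fromstring(xml_text)

# Resolve references on read, so a change to the part shows up
# everywhere it is referenced.
targets = {p.get("id"): p for p in parsed.iter("part")}
for ref in parsed.iter("reference"):
    linked = targets[ref.get("target")]
    print(ref.get("target"), "->", linked.find("material").text)
```

Resolving links at read time instead of duplicating values is one simple way to approximate the "dynamic correlation" between document parts that the abstract describes.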