Funding: This research was supported and funded by the KAU Scientific Endowment, King Abdulaziz University, Jeddah, Saudi Arabia.
Abstract: A document layout can convey more than a document's visual and structural appearance. Document layout analysis (DLA) is therefore considered a necessary prerequisite for advanced processing and detailed document image analysis across many applications and objectives. This research extends traditional DLA approaches and introduces the concept of semantic document layout analysis (SDLA) by proposing a novel framework for semantic layout analysis and characterization of handwritten manuscripts. The proposed SDLA approach enables the derivation of implicit information and semantic characteristics that can be used in dozens of practical applications, helping to bridge the semantic gap and providing more understandable high-level document image analysis and more invariant characterization via absolute and relative labeling. The approach is validated and evaluated on a large dataset of Arabic handwritten manuscripts with complex layouts. The experimental work shows promising results in terms of accurate and effective semantic characteristic-based clustering and retrieval of handwritten manuscripts. It also indicates the expected efficacy of the proposed approach in automating and facilitating many functional, real-life tasks, such as effort estimation and pricing for the transcription or typing of such complex manuscripts.
Abstract: Objective: To explore the patterns and characteristics of the adverse drug reactions (ADRs) of three Chinese patent medicines and three herbal formulas for the treatment of COVID-19, and to provide a reference for safe clinical medication. Methods: Cases and ADR reports for the three Chinese patent medicines and three herbal formulas were retrieved from the PubMed, Web of Science, Springer Link, CNKI, Wanfang, and VIP databases for the period from December 2019 to May 2021. The relevant information in the literature was then extracted and analyzed. Results and Conclusion: Following the pre-developed retrieval plan, a total of 136 documents were obtained, of which 6 met the inclusion criteria. A total of 553 patients used the three Chinese patent medicines and three herbal formulas, and there were 133 cases of adverse reactions. The adverse reactions of patients taking the three Chinese patent medicines and three herbal formulas can all be explained under the theory of traditional Chinese medicine, and they can be eliminated by adjusting the ingredients of the formula or by stopping the medicine.
Abstract: Plain Language has made a great difference in recent years. It has proved effective for expressing ideas clearly, concisely, and systematically. However, contemporary practitioners still need to review the origin and development of the Plain Language Movement and to examine whether Plain Language policies have been thoroughly implemented in every federal document. Examining a contemporary federal document against the Guidelines for Document Designers reveals existing problems that call for further development.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 61802275 and U1836214) and the Innovation Fund of Tianjin University (2020XRG-0022).
Abstract: With the increasing popularity of mobile devices and the wide adoption of mobile apps, privacy has become a growing concern. The privacy policy is regarded as a proper medium for stating legal terms, such as those of the General Data Protection Regulation (GDPR), and for binding a legal agreement between service providers and users. However, privacy policies are usually too long and vague for end users to read and understand. It is thus important to be able to analyze the document structure of privacy policies automatically to assist user understanding. In this work we create a manually labelled corpus containing 231 privacy policies (more than 566,000 words and 7,748 annotated paragraphs). We benchmark the corpus with 3 document classification models and achieve an F1-score of more than 82%.
Abstract: In the digital world, a wide range of handwritten and printed documents must be converted to digital format using a variety of tools, including mobile phones and scanners. Unfortunately, this procedure is not optimal, and the entire document image may be degraded. Imperfect conversion caused by noise, motion blur, and skew distortion can significantly reduce the accuracy and effectiveness of document image segmentation and analysis in Optical Character Recognition (OCR) systems. In Document Image Analysis Systems (DIAS), skew estimation is a crucial step. This paper proposes a novel, fast, and reliable skew detection algorithm based on the Radon Transform and a Curve Length Fitness Function (CLF), called Radon CLF. The Radon CLF model aims to take advantage of the properties of Radon spaces. It explores the dominant angle more effectively for a 1D signal than for a 2D input image, owing to an innovative fitness function formulated for a projected signal of the Radon space. Several performance indicators were considered when assessing the model, including Mean Square Error (MSE), Mean Absolute Error (MAE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Measure (SSIM), accuracy, and run-time. In addition, a new dataset named DSI5000 was constructed to assess the accuracy of the CLF model. Both the two-dimensional image signal and the Radon space were used in the simulations to compare the effect of noise. The results show that the proposed method is more effective than existing approaches, with an accuracy of roughly 99.87% and a run-time of 0.048 s. The introduced model is far more accurate and time-efficient than current approaches in detecting image skew.
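The general idea of scoring 1D projections to find a dominant angle can be sketched as follows. This is only a minimal illustration, not the Radon CLF algorithm itself: the abstract does not give the fitness formula, so the `curve_length` score (polyline length of the projection profile), the helper names, and the synthetic test page are all assumptions made for this example. Foreground pixels are projected along each candidate angle, and the angle whose profile has the longest curve (the sharpest peaks and valleys) is taken as the skew.

```python
import math


def projection_profile(points, angle_deg):
    """Histogram of foreground points projected along lines at angle_deg."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Signed offset of each point from a line through the origin at angle theta.
    offsets = [y * c - x * s for x, y in points]
    lo = min(offsets)
    bins = [round(o - lo) for o in offsets]  # one-pixel-wide bins
    profile = [0] * (max(bins) + 1)
    for b in bins:
        profile[b] += 1
    return profile


def curve_length(profile):
    """Assumed fitness: length of the polyline through the profile values."""
    return sum(math.hypot(1.0, b - a) for a, b in zip(profile, profile[1:]))


def estimate_skew(points, candidates):
    """Return the candidate angle whose projection profile has the longest curve."""
    return max(candidates, key=lambda ang: curve_length(projection_profile(points, ang)))


def synthetic_lines(skew_deg, n_lines=5, spacing=10, width=200):
    """Hypothetical test page: horizontal text lines rotated by skew_deg."""
    a = math.radians(skew_deg)
    pts = []
    for k in range(n_lines):
        y = k * spacing
        for x in range(width):
            pts.append((x * math.cos(a) - y * math.sin(a),
                        x * math.sin(a) + y * math.cos(a)))
    return pts


candidates = [i * 0.5 for i in range(-20, 21)]  # -10 to 10 degrees in 0.5 steps
estimated = estimate_skew(synthetic_lines(3.0), candidates)
print(estimated)
```

When the candidate angle matches the true skew, every text line collapses into a single histogram bin, so the profile alternates between tall spikes and empty gaps and its curve length peaks. A real implementation would typically compute the projections with a Radon transform routine on the image rather than on an explicit point list.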
Abstract: Cloud storage is essential for storing user data in, and retrieving it from, distributed data centres. Storage is offered as a paid service, charged according to the amount of data stored. Because the massive amount of data held in a data centre contains similar information and file structures kept in multiple copies, duplication increases the storage space required. Existing deduplication systems do not reduce data efficiently because they identify similar data inaccurately, which complicates storage management and raises storage consumption and cost. To resolve this problem, this paper proposes an efficient storage reduction method called Hash-Indexing Block-based Deduplication (HIBD), based on Segmented Bind Linkage (SBL) methods, for reducing storage in a cloud environment. First, preprocessing is performed using a sparse augmentation technique. The preprocessed files are then segmented into blocks to build a hash index. The blocks of content are compared with other files through Semantic Content Source Deduplication (SCSD), which identifies similar content shared between files. Based on the content presence count, the Distance Vector Weightage Correlation (DVWC) estimates the document similarity weight, and related files are grouped into a cluster. Finally, the segmented bind linkage compares the documents in the cluster to find duplicate content, using a similarity weight based on the coefficient match case. This implementation helps identify data redundancy efficiently and reduces the service cost of distributed cloud storage.
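The basic step of segmenting files into blocks and indexing them by hash can be illustrated with a generic block-level deduplication sketch. It does not implement HIBD, SBL, SCSD, or DVWC, whose details the abstract does not specify; the `BlockStore` class, the fixed block size, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib


class BlockStore:
    """Hypothetical fixed-size block store that keeps each distinct block once."""

    def __init__(self, block_size=8):
        self.block_size = block_size
        self.blocks = {}   # hash index: digest -> block bytes
        self.recipes = {}  # file name -> ordered list of block digests

    def put(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        digests = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.recipes[name] = digests

    def get(self, name):
        """Rebuild a file from its recipe of block digests."""
        return b"".join(self.blocks[d] for d in self.recipes[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(b) for b in self.blocks.values())


store = BlockStore(block_size=8)
file_a = b"AAAAAAAABBBBBBBBCCCCCCCC"
file_b = b"AAAAAAAABBBBBBBBDDDDDDDD"
store.put("a.txt", file_a)
store.put("b.txt", file_b)

logical_bytes = len(file_a) + len(file_b)
print(logical_bytes, store.stored_bytes())
```

Here the two files share their first two blocks, so only four distinct blocks are kept for six logical ones. Exact hash matching finds only identical blocks; detecting merely similar content, as the abstract describes, would need an additional similarity measure on top of this index.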
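Funding: Supported by the Social Science Foundation of Shaanxi Province of China (2018P03) and the Humanities and Social Sciences Research Youth Fund Project of the Ministry of Education of China (13YJCZH251).
Abstract: In the K-means clustering algorithm, each data point is assigned to exactly one cluster. Clustering quality depends heavily on the initial cluster centroids: different initializations can yield different results, and local adjustment cannot rescue a clustering from a poor local optimum. An outlier in a cluster also severely distorts the cluster mean. Moreover, K-means is suitable only for clusters with convex shapes. We therefore propose a novel clustering algorithm, CARDBK, where "centroid all rank distance" (CARD) means that all centroids are ranked by their distance from a point and "BK" stands for "batch K-means". In CARDBK, a point modifies not only the cluster centroid nearest to it but also multiple neighboring cluster centroids, and the degree of a point's influence on a centroid depends on the distance between the point and the other, nearer centroids. When tested on many different data sets using the performance indexes entropy, purity, F1 value, Rand index, and normalized mutual information (NMI), the experimental results show that CARDBK outperforms other algorithms. The algorithm also proved more stable, linearly scalable, and faster.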
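The sensitivity to initialization that motivates this work can be reproduced with a small sketch of the standard K-means (Lloyd) iteration. This is the baseline the abstract criticizes, not CARDBK, whose update rule is not given here; the function names and the toy data set are assumptions made for this example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Standard K-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Three well-separated pairs of points on a line.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]

# One initial centroid per pair: converges to the intended clustering.
_, good_inertia = kmeans(points, [(0.5, 0), (10.5, 0), (20.5, 0)])

# Two initial centroids in the first pair: stuck in a poor local optimum.
_, bad_inertia = kmeans(points, [(0, 0), (1, 0), (15.5, 0)])

print(good_inertia, bad_inertia)
```

With the second initialization, each point updates only its single nearest centroid, so the third centroid stays midway between the last two pairs and the iteration cannot recover. Letting a point influence several nearby centroids, as the abstract proposes, is one way to loosen this.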
Abstract: The use of electronic signatures as a form of identification is increasingly common, yet they have been shown to lack the dynamic features found in online signatures. In this study, handwritten signatures were scanned to produce electronically scanned signatures (ESS), which were then digitally altered to produce digitally constructed signatures (DCS). The ESS and DCS were presented back to participants, who were asked to identify which were genuine. Only 1% of participants correctly identified all signatures, with a mean identification score of 57.6%. The lack of self-recognition of ESS raises questions about their reliability and usefulness as a means of personal identification.
Abstract: Purpose: The new Finnish National Core Curriculum for Early Childhood Education and Care (2018) strongly highlights pedagogical knowledge and practice, requiring teachers to develop their pedagogical thinking, evaluation, judgment, and operating culture. Since ethics is viewed as a vital characteristic of the teaching profession, our objective is to make these complex ethical issues more visible so that they can be subject to democratic discussion and change. Design/Approach/Methods: The framework comprises a broad theory base of codes of ethics and professional codes of ethics of teaching. The research materials were the national curricula of early childhood education and care (ECEC) and pre-primary education. An eight-step qualitative analysis process was applied to identify and shed light on the codes of ethics that lay the foundations for purposeful and ethical early childhood education (ECE) teachers. Findings: The results indicate that, through both theoretical lenses, the Finnish ECEC curricula comprise several ethical codes. For future purposeful ECE teachers as ethical professionals, the results raise questions for further discussion. In particular, issues related to the ethics of care, intellectual freedom, inquiry stance, professional competence, and diversity may further enhance our ECEC curricula.
Abstract: Although the corporate relationship manager seems to be the key enabler in commercial banking, the personal relationship sales model is not sustainable given the paradigm shift in digital financial markets. In this research, we propose a knowledge-driven decision analytics approach to improve the decision process. However, most of the corporate client documents processed in banks are not well structured, and the traditional analysis approach does not consider the document structure, which carries rich semantic information. We propose a document structure-based text representation approach that incorporates auxiliary information into the predictive analytics of unstructured data to improve performance on the document classification task. The proposed approach significantly outperforms the traditional whole-document approach, which does not take the document structure into account. With the proposed approach, knowledge can be used effectively and efficiently for business decisions and planning to improve the competitive advantage and substantiality of banks.