To address the problem that mathematical expressions in the unstructured text fields of documents are difficult to extract automatically, quickly, and effectively, a method based on the Hidden Markov Model (HMM) is proposed. First, the method trains an HMM using the symbol-combination features of mathematical expressions. Next, preprocessing steps such as removing tags and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence that serves as input to the HMM, which determines which parts of the text are mathematical expressions and extracts them. Experimental results show that the proposed method can effectively extract mathematical expressions from the text fields of documents, with relatively high accuracy and recall.
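The abstract does not give the model's hidden states, its feature set, or its trained parameters. As a minimal sketch, assuming two hypothetical hidden states (`TEXT` and `MATH`), a hypothetical five-class token feature (`WORD`, `VAR`, `NUM`, `OP`, `PUNCT`), and hand-set probabilities standing in for trained ones, the final decoding step can be illustrated with the Viterbi algorithm. Preprocessing here only strips markup tags; word filtering is omitted.

```python
import math
import re

# Hypothetical hidden states: prose tokens vs. tokens inside a math expression.
STATES = ("TEXT", "MATH")

# Hand-set (not trained) probabilities, for illustration only.
START = {"TEXT": 0.8, "MATH": 0.2}
TRANS = {
    "TEXT": {"TEXT": 0.9, "MATH": 0.1},
    "MATH": {"TEXT": 0.2, "MATH": 0.8},
}
EMIT = {
    "TEXT": {"WORD": 0.7, "VAR": 0.1, "NUM": 0.05, "OP": 0.05, "PUNCT": 0.1},
    "MATH": {"WORD": 0.05, "VAR": 0.35, "NUM": 0.2, "OP": 0.35, "PUNCT": 0.05},
}
OPERATORS = set("+-*/=^<>()")


def token_class(tok: str) -> str:
    """Map a token to a coarse observation symbol (an assumed feature set)."""
    if re.fullmatch(r"[A-Za-z]{2,}", tok):
        return "WORD"  # multi-letter word, usually prose
    if re.fullmatch(r"[A-Za-z]", tok):
        return "VAR"  # single letter, often a variable
    if re.fullmatch(r"\d+(?:\.\d+)?", tok):
        return "NUM"
    if tok in OPERATORS:
        return "OP"
    return "PUNCT"


def viterbi(obs: list[str]) -> list[str]:
    """Return the most likely hidden-state sequence for the observations."""
    if not obs:
        return []
    # Log-probabilities avoid underflow on long sequences.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    backptrs = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = scores[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        scores.append(row)
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def extract_expressions(text: str) -> list[str]:
    """Strip markup tags, decode token labels, and join runs of MATH tokens."""
    clean = re.sub(r"<[^>]+>", " ", text)
    tokens = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?|\S", clean)
    labels = viterbi([token_class(t) for t in tokens])
    exprs, current = [], []
    for tok, label in zip(tokens, labels):
        if label == "MATH":
            current.append(tok)
        elif current:
            exprs.append(" ".join(current))
            current = []
    if current:
        exprs.append(" ".join(current))
    return exprs


print(extract_expressions("<p>Let x + y = 10 hold.</p>"))  # ['x + y = 10']
```

With these toy parameters the transition probabilities make the decoder prefer staying in one state, so a run of variables, operators, and numbers is labelled `MATH` as a block even though each token alone is ambiguous. Real parameters would be estimated from labelled training text.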
This paper presents a new method for text detection, localization, and binarization in natural scenes. Several morphological steps are used to detect the approximate position of text, including English, Chinese, and Japanese characters. Next, the bounding boxes are processed by a new “Expand, Break and Merge” (EBM) method to obtain precise text areas. Finally, the text is binarized by a hybrid method based on Otsu and Niblack thresholding. The approach can extract different kinds of text from complex natural scenes, is insensitive to noise, distortion, and text orientation, and performs well on text of various sizes.
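The abstract names the two thresholding ingredients but not how they are combined. The sketch below implements standard Otsu (one global threshold maximizing between-class variance) and standard Niblack (local threshold T(x, y) = mean + k * std over a window, with the commonly used k = -0.2), then joins them with a hypothetical AND rule. It assumes dark text on a light background and an 8-bit grayscale image stored as a list of rows; `hybrid_binarize` and its combination rule are illustrative assumptions, not the paper's method.

```python
def otsu_threshold(img: list[list[int]]) -> int:
    """Global Otsu threshold for an 8-bit grayscale image (list of rows)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]  # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def niblack_mask(img: list[list[int]], window: int = 15, k: float = -0.2) -> list[list[int]]:
    """Niblack local thresholding: T(x, y) = mean + k * std over a window."""
    h, w, r = len(img), len(img[0]), window // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
            # Dark text assumed: a pixel at or below the local threshold is text.
            mask[y][x] = 1 if img[y][x] <= mean + k * std else 0
    return mask


def hybrid_binarize(img: list[list[int]], window: int = 15, k: float = -0.2) -> list[list[int]]:
    """Hypothetical combination: a pixel is text only if Otsu and Niblack agree."""
    t = otsu_threshold(img)
    local = niblack_mask(img, window, k)
    return [[1 if local[y][x] and img[y][x] <= t else 0
             for x in range(len(img[0]))]
            for y in range(len(img))]


# Usage: a 6x6 light image with a dark 2x2 "character" in the middle.
img = [[200] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        img[y][x] = 20
for row in hybrid_binarize(img, window=3):
    print(row)
```

One motivation for some kind of combination shows up even in this toy case: in a perfectly flat region the local standard deviation is zero, so Niblack alone marks plain background as text, while the global Otsu gate rejects those pixels.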
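To address the semantic association between query expressions and target documents in existing mathematical expression retrieval systems, an Ontology-based mathematical expression retrieval method is proposed, building on a serialized feature extraction method for parsing LaTeX expressions. An Ontology is used to link mathematical expressions with their concepts and to build a semantic ontology library of mathematical expressions, so that users can retrieve documents semantically related to mathematical expressions by entering keywords, concepts, phrases, or mathematical terms. Experimental results show that, by using ontology concepts to expand the query result set, the Ontology-based retrieval method improves recall, precision, and expansion rate to some extent.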
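The abstract does not describe the ontology schema or the matching procedure. As a minimal sketch, assuming a toy ontology (the concept names, `related` links, and attached expressions below are all hypothetical), a simplified LaTeX tokenizer standing in for serialized feature extraction, and exact matching on token sequences, concept-based query expansion can be illustrated like this:

```python
import re

# Toy ontology (hypothetical concepts and links): each concept lists related
# concepts and the LaTeX expressions attached to it.
ONTOLOGY = {
    "pythagorean theorem": {"related": ["right triangle"],
                            "expressions": [r"a^2 + b^2 = c^2"]},
    "right triangle": {"related": ["pythagorean theorem", "trigonometry"],
                       "expressions": []},
    "trigonometry": {"related": ["right triangle"],
                     "expressions": [r"\sin^2 x + \cos^2 x = 1"]},
}


def latex_tokens(expr: str) -> tuple[str, ...]:
    """Serialize LaTeX into a token sequence: commands, letters/digits, symbols.

    Grouping braces are dropped so a^{2} and a^2 give the same sequence.
    """
    toks = re.findall(r"\\[A-Za-z]+|[A-Za-z0-9]|[^\sA-Za-z0-9]", expr)
    return tuple(t for t in toks if t not in {"{", "}"})


def expand_concepts(term: str, hops: int) -> set[str]:
    """Breadth-first expansion over 'related' links, up to `hops` steps."""
    seen, frontier = {term}, [term]
    for _ in range(hops):
        nxt = []
        for concept in frontier:
            for rel in ONTOLOGY.get(concept, {}).get("related", []):
                if rel not in seen:
                    seen.add(rel)
                    nxt.append(rel)
        frontier = nxt
    return seen


def search(query: str, docs: dict[str, list[str]], hops: int = 1) -> list[str]:
    """Return ids of documents containing an expression tied to an expanded concept."""
    term = query.lower().strip()
    if term not in ONTOLOGY:
        return []
    targets = {latex_tokens(e)
               for c in expand_concepts(term, hops)
               for e in ONTOLOGY[c]["expressions"]}
    return sorted(doc_id for doc_id, exprs in docs.items()
                  if any(latex_tokens(e) in targets for e in exprs))


docs = {
    "d1": [r"a^{2}+b^{2}=c^{2}"],
    "d2": [r"\sin^2 x+\cos^2 x=1"],
    "d3": [r"e^{i\pi}+1=0"],
}
print(search("Pythagorean theorem", docs, hops=0))  # ['d1']
print(search("Pythagorean theorem", docs, hops=2))  # ['d1', 'd2']
```

Widening `hops` trades precision for recall: two hops reach "trigonometry" through "right triangle" and pull in `d2`, which is the kind of result-set expansion the abstract describes, though the real system's expansion and ranking rules are not specified there.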