4 articles found
Document Analysis by Crosscount Approach
1
Authors: 王海琴, 戴汝为. Journal of Computer Science & Technology (SCIE, EI, CSCD), 1998, No. 1, pp. 32-40 (9 pages)
In this paper a new feature called crosscount for document analysis is introduced. The feature crosscount is a function of white line segment with its start on the edge of document images. It reflects not only the contour of the image, but also the periodicity of white lines (background) and text lines in the document images. In complex printed-page layouts, there are different blocks such as textual, graphical, tabular, and so on. Of these blocks, textual ones have the most obvious periodicity with their homogeneous white lines arranged regularly. This important property of textual blocks can be extracted by crosscount functions. Here the document layouts are classified into three classes on the basis of their physical structures. Then the definition and properties of the crosscount function are described. According to the classification of document layouts, the application of this new feature to the analysis and understanding of different types of document images is discussed.
Keywords: crosscount; projection; run-length; document analysis; document understanding; skew detection; skew correction
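The abstract above does not give the formal definition of the crosscount function, so the following is only an illustrative sketch of a related idea, not the authors' method. It assumes a binarized page (1 = ink, 0 = white background) and measures, for each pixel row, the length of the white run that starts on the left edge. The function name `left_edge_white_runs` and the toy page are hypothetical.

```python
def left_edge_white_runs(rows):
    """For each pixel row, count white pixels from the left edge up to the first ink pixel.

    rows: list of rows, each a list of 0 (white) or 1 (ink).
    A fully white row yields the row width.
    """
    runs = []
    for row in rows:
        run = 0
        for pixel in row:
            if pixel:  # first ink pixel ends the white run
                break
            run += 1
        runs.append(run)
    return runs


# Toy page: two "text lines" (rows 1 and 3) separated by white rows.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
runs = left_edge_white_runs(page)
print(runs)  # [5, 1, 5, 2, 5]
```

In this toy profile the short runs trace the left contour of the text, while the full-width runs recur at the white gaps between lines, which is the kind of periodicity the abstract attributes to textual blocks.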
USEVis:Visual analytics of attention-based neural embedding in information retrieval
2
Authors: Xiaonan Ji, Yamei Tu, Wenbin He, Junpeng Wang, Han-Wei Shen, Po-Yin Yen. Visual Informatics (EI), 2021, No. 2, pp. 1-12 (12 pages)
Neural attention-based encoders, which effectively attend sentence tokens to their associated context without being restricted by long-term distance or dependency, have demonstrated outstanding performance in embedding sentences into meaningful representations (embeddings). The Universal Sentence Encoder (USE) is one of the most well-recognized deep neural network (DNN) based solutions, which is facilitated with an attention-driven transformer architecture and has been pre-trained on a large number of sentences from the Internet. Although USE has been widely used in many downstream applications, including information retrieval (IR), interpreting its complicated internal working mechanism remains challenging. In this work, we present a visual analytics solution towards addressing this challenge. Specifically, focused on semantics and syntactics (concepts and relations) that are critical to domain clinical IR, we designed and developed a visual analytics system, i.e., USEVis. The system investigates the power of USE in effectively extracting sentences' semantics and syntactics through exploring and interpreting how linguistic properties are captured by attentions. Furthermore, by thoroughly examining and comparing the inherent patterns of these attentions, we are able to exploit attentions to retrieve sentences/documents that have similar semantics or are closely related to a given clinical problem in IR. By collaborating with domain experts, we demonstrate use cases with inspiring findings to validate the contribution of our work and the effectiveness of our system.
Keywords: interactive visual system; neural embedding; attention mechanism; document understanding; information retrieval; clinical decision-making
Measuring Similarity of Academic Articles with Semantic Profile and Joint Word Embedding (cited by: 11)
3
Authors: Ming Liu, Bo Lang, Zepeng Gu, Ahmed Zeeshan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2017, No. 6, pp. 619-632 (14 pages)
Long-document semantic measurement has great significance in many applications such as semantic search, plagiarism detection, and automatic technical surveys. However, research efforts have mainly focused on the semantic similarity of short texts. Document-level semantic measurement remains an open issue due to problems such as the omission of background knowledge and topic transition. In this paper, we propose a novel semantic matching method for long documents in the academic domain. To accurately represent the general meaning of an academic article, we construct a semantic profile in which key semantic elements such as the research purpose, methodology, and domain are included and enriched. As such, we can obtain the overall semantic similarity of two papers by computing the distance between their profiles. The distances between the concepts of two different semantic profiles are measured by word vectors. To improve the semantic representation quality of word vectors, we propose a joint word-embedding model for incorporating a domain-specific semantic relation constraint into the traditional context constraint. Our experimental results demonstrate that, in the measurement of document semantic similarity, our approach achieves substantial improvement over state-of-the-art methods, and our joint word-embedding model produces significantly better word representations than traditional word-embedding models.
Keywords: document semantic similarity; text understanding; semantic enrichment; word embedding; scientific literature analysis
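The abstract above describes comparing two papers through semantic profiles whose concepts are measured with word vectors, but it does not state the exact formula. The sketch below is therefore a generic stand-in under stated assumptions, not the authors' method: a profile is a dictionary from element name (for example "purpose" or "domain") to a list of concept words, concepts are matched by best cosine similarity, and element scores are averaged. The names `cosine`, `directed_match`, `profile_similarity`, and the two-dimensional toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def directed_match(concepts_a, concepts_b, vectors):
    """Mean, over concepts in A, of the best cosine match among concepts in B."""
    vecs_a = [vectors[c] for c in concepts_a if c in vectors]
    vecs_b = [vectors[c] for c in concepts_b if c in vectors]
    if not vecs_a or not vecs_b:
        return 0.0
    return sum(max(cosine(u, v) for v in vecs_b) for u in vecs_a) / len(vecs_a)


def profile_similarity(profile_a, profile_b, vectors):
    """Average the symmetric concept match over elements present in both profiles."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        return 0.0
    total = 0.0
    for element in shared:
        a, b = profile_a[element], profile_b[element]
        total += (directed_match(a, b, vectors) + directed_match(b, a, vectors)) / 2
    return total / len(shared)


# Toy word vectors and profiles (assumed values, for illustration only).
word_vectors = {
    "search": (1.0, 0.0),
    "retrieval": (0.8, 0.6),
    "protein": (0.0, 1.0),
}
paper_a = {"purpose": ["search"], "domain": ["protein"]}
paper_b = {"purpose": ["retrieval"], "domain": ["protein"]}

score = profile_similarity(paper_a, paper_b, word_vectors)
print(round(score, 3))  # 0.9
```

The sketch returns a similarity rather than a distance; one minus the score would serve as a distance in the sense used by the abstract. Words missing from the vector table are skipped here, which is a simplification.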
Guidelines for Creating a Rule-Based Knowledge Learning System and Their Application to a Chinese Business Card Layout Analysis
4
Authors: 潘武模, 王庆人. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2001, No. 1, pp. 47-56 (10 pages)
Rule selection has long been a problem of great challenge that has to be solved when developing a rule-based knowledge learning system. Many methods have been proposed to evaluate the eligibility of a single rule based on some criteria. However, in a knowledge learning system there is usually a set of rules. These rules are not independent, but interactive. They tend to affect each other and form a rule system. In such a case, it is no longer reasonable to isolate each rule from the others for evaluation. The best rule according to a certain criterion is not always the best one for the whole system. Furthermore, the real-world data from which people want to create their learning system are often ill-defined and inconsistent. In this case, the completeness and consistency criteria for rule selection are no longer essential. In this paper, some ideas about how to solve the rule-selection problem in a systematic way are proposed. These ideas have been applied in the design of a Chinese business card layout analysis system and gave a good result on the training data set of 425 images. The implementation of the system and the result are presented in this paper.
Keywords: rule-based system; knowledge learning; layout analysis; document image understanding; business card