Aim: To explore and analyze the feasibility of establishing a complex-intervention program in Traditional Chinese Medicine (TCM) based on text mining and interviewing methods. Methods: Following the Medical Research Council (MRC) framework, constructing a TCM complex-intervention program by text mining and interviewing involves four steps: 1) establishment of an interview framework via normalized extraction from ancient documents and effectiveness-oriented collection of modern periodical literature; 2) materialization of the interview outline through focus group interviews; 3) preliminary construction of the complex-intervention program through semi-structured interviews; 4) evaluation of the curative effect of the complex intervention. Conclusions: It is feasible and meaningful to establish a TCM complex-intervention program based on text mining and interviewing methods.
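As a loose illustration of step 1, the sketch below counts how often candidate intervention terms occur across a document corpus; the term list, file names, and frequency-ranking heuristic are hypothetical, not the authors' actual text-mining procedure.

```python
# A minimal sketch of the text-mining step: ranking candidate intervention
# terms by frequency across documents to seed the interview framework.
# The term list and file names below are hypothetical examples.
from collections import Counter

candidate_terms = ["acupuncture", "moxibustion", "herbal decoction"]
documents = ["ancient_text_1.txt", "modern_article_1.txt"]  # hypothetical

counts = Counter()
for path in documents:
    text = open(path, encoding="utf-8").read().lower()
    for term in candidate_terms:
        counts[term] += text.count(term)

# Terms ranked by frequency suggest topics for the focus group interviews.
for term, n in counts.most_common():
    print(f"{term}: {n}")
```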
We often encounter documents with text printed on a complex color background. The readability of textual content in such documents is poor because of the complexity of the background and the mixing of foreground text colors with background colors. Automatic segmentation of the foreground text in such document images is essential for smooth reading of the document contents, whether by human or by machine. In this paper we propose a novel approach to extract the foreground text from color document images with complex backgrounds. The proposed approach is a hybrid one that combines connected component analysis with texture feature analysis of potential text regions. It uses the Canny edge detector to detect all possible text edge pixels, and connected component analysis is performed on these edge pixels to identify candidate text regions. Because of background complexity, a non-text region may also be identified as a text region; this problem is overcome by analyzing the texture features of the potential text region corresponding to each connected component. An unsupervised local thresholding is devised to perform foreground segmentation in the detected text regions. Finally, noisy text regions are identified and reprocessed to further enhance the quality of the retrieved foreground. The proposed approach can handle document images with varying backgrounds of multiple colors and textures, and foreground text in any color, font, size, and orientation. Experimental results show that the proposed algorithm detects on average 97.12% of the text regions in the source documents. Readability of the extracted foreground text is demonstrated through optical character recognition (OCR) when the text is in English. The proposed approach is compared with some existing methods of foreground separation in document images, and experimental results show that it performs better.
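A rough Python sketch of this edge-based hybrid pipeline, using OpenCV, is given below; the Canny thresholds, the minimum-area filter, and the variance-based texture test are illustrative stand-ins for the paper's actual texture feature analysis and parameters.

```python
# Minimal sketch of the hybrid pipeline described above, using OpenCV.
# Thresholds and the texture test are illustrative assumptions, not the
# authors' exact method.
import cv2
import numpy as np

def extract_text_regions(image_path, min_area=50, var_threshold=200.0):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 1) Detect candidate text edge pixels.
    edges = cv2.Canny(gray, 100, 200)

    # 2) Group edge pixels into connected components (candidate regions).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges)

    masks = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue
        patch = gray[y:y + h, x:x + w]

        # 3) Crude texture check: text patches tend to have high
        #    gray-level variance; plain background does not.
        if patch.var() < var_threshold:
            continue

        # 4) Unsupervised local thresholding inside the region.
        local = cv2.adaptiveThreshold(patch, 255,
                                      cv2.ADAPTIVE_THRESH_MEAN_C,
                                      cv2.THRESH_BINARY, 11, 2)
        masks.append(((x, y, w, h), local))
    return masks
```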
In this paper, we analyze the complexity and entropy of different data compression algorithms: LZW, Huffman, fixed-length code (FLC), and Huffman applied after fixed-length coding (HFLC). We test these algorithms on files of various sizes and conclude that LZW performs best across all compression scales tested, especially on large files, followed by Huffman, HFLC, and FLC, in that order. Data compression remains an important research topic with many applications. We therefore suggest continuing research in this field, either by combining two techniques to obtain a better one, or by using another source mapping (Hamming), such as embedding a linear array into a hypercube, together with established techniques such as Huffman coding.
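The kind of per-file analysis described here can be sketched by comparing a file's empirical entropy with the average code length achieved by Huffman coding; the snippet below is a minimal illustration (the test file name is hypothetical), not the authors' test harness.

```python
# Minimal sketch comparing empirical entropy with Huffman code length,
# the per-file measurement underlying the comparison above.
import heapq
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def huffman_code_lengths(data: bytes) -> dict:
    counts = Counter(data)
    # Heap of (frequency, tiebreak, {symbol: code_length}) entries.
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees deepens every symbol in both by one bit.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

data = open("sample.txt", "rb").read()  # hypothetical test file
lengths = huffman_code_lengths(data)
counts = Counter(data)
avg_len = sum(counts[s] * l for s, l in lengths.items()) / len(data)
print(f"entropy={entropy_bits_per_symbol(data):.3f} bits/symbol, "
      f"Huffman average={avg_len:.3f} bits/symbol")
```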
This paper explores the predictive effects and mechanisms of syntactic complexity, as a multidimensional construct, in quantitative text-leveling tools. Fifty English-major pre-service teachers at a normal university developed 700 English reading-comprehension assessment passages spanning primary-school Grade 3 through junior-secondary Grade 3, using the Lexile Analyzer as a reference. The research team selected the eight coarse- and fine-grained syntactic complexity indices used in Jin et al. (2020) and extracted them from each adapted text with the L2 Syntactic Complexity Analyzer (L2SCA, Lu 2010). Focusing on syntactic complexity as a multidimensional construct, the study examines how syntactic complexity can be used to adapt reading texts and control their difficulty. One-way ANOVA results show that the syntactic complexity indices differ significantly across texts of different difficulty levels, and path analysis reveals the mechanisms by which the sub-dimensions of syntactic complexity predict passage difficulty. The paper discusses how specific sentence constituents can be manipulated to fine-tune the syntactic complexity, and hence the difficulty, of a text, offering practical implications for reading-text adaptation by frontline teachers in China.
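As a simplified illustration of what coarse-grained syntactic complexity indices measure, the sketch below computes two regex-based proxies, mean length of sentence and clauses per sentence; L2SCA itself derives its indices from syntactic parses, so these heuristics are illustrative assumptions only.

```python
# Rough proxies for two coarse-grained syntactic complexity indices.
# Real analyzers (e.g., L2SCA) work on parse trees; these heuristics
# merely illustrate the constructs.
import re

SUBORDINATORS = {"because", "although", "when", "while", "if",
                 "that", "which", "who", "since", "unless"}

def complexity_indices(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # Approximate clause count as main clauses plus subordinator occurrences.
    clauses = len(sentences) + sum(t in SUBORDINATORS for t in tokens)
    return {
        "mean_length_of_sentence": len(tokens) / max(len(sentences), 1),
        "clauses_per_sentence": clauses / max(len(sentences), 1),
    }

print(complexity_indices("I stayed home because it rained. The sun came out."))
# {'mean_length_of_sentence': 5.0, 'clauses_per_sentence': 1.5}
```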
Current knowledge base question answering (KBQA) techniques cannot handle complex questions effectively, as they struggle to understand their complex semantics. Decomposing a complex question first and then integrating the parts is an effective way to parse such semantics. However, during question decomposition, entity misjudgment or missing topic entities often occur, so that the resulting sub-questions fail to match the original complex question. To address this problem, we propose a question-decomposition-based semantic parsing method that incorporates fact text. The processing of a complex question is divided into three stages, decomposition, extraction, and parsing: the complex question is first decomposed into simple sub-questions, key information is then extracted from the question sentences, and finally a structured query is generated. In addition, we construct a fact-text corpus by converting knowledge-base triples into sentences expressed in natural language, and use an attention mechanism to acquire richer knowledge. Experiments on the ComplexWebQuestions dataset show that the proposed model outperforms the baseline models.
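The fact-text construction step can be pictured as follows: each knowledge-base triple is verbalized into a sentence that a text encoder can attend over. The template and example triples in this sketch are hypothetical; the paper's actual verbalization scheme may differ.

```python
# Minimal sketch of the triple-to-sentence step used to build a fact-text
# corpus. The crude template and example triples are hypothetical.
def triple_to_sentence(subj: str, rel: str, obj: str) -> str:
    # Relation names like "people.person.place_of_birth" become phrases.
    phrase = rel.split(".")[-1].replace("_", " ")
    return f"{subj} {phrase} is {obj}."

triples = [
    ("Barack Obama", "people.person.place_of_birth", "Honolulu"),
    ("Honolulu", "location.location.containedby", "Hawaii"),
]
fact_texts = [triple_to_sentence(*t) for t in triples]
print(fact_texts)
# ['Barack Obama place of birth is Honolulu.', 'Honolulu containedby is Hawaii.']
```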
With the advent of the information age, searching through large amounts of related knowledge to find the information one needs has become cumbersome. Text reasoning is a basic and important component of multi-hop question answering tasks. This paper studies the completeness, uniformity, and speed of computational intelligence for reasoning over data. Multi-hop reasoning arose to meet this need, but it is still in its infancy: its search breadth, process complexity, response speed, and comprehensiveness of information remain far from sufficient for multi-hop question answering. This paper compares traditional information retrieval with computational intelligence on text, using corpus relevancy and other computational measures. The study finds that, on multi-hop question-answering reasoning, traditional retrieval methods lag behind the intelligent methods by about 35% on reasoning-data measures, indicating that computational intelligence is more complete, more unified, and faster than traditional retrieval. The paper also introduces the relevant aspects of text reasoning, describes the workings of a multi-hop question answering system, and closes with discussion and outlook.
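The following sketch illustrates the traditional-retrieval side of such a comparison: ranking passages for a multi-hop question by TF-IDF cosine similarity. The corpus and question are invented, and the 35% figure above comes from the paper's own measurements, not from this example.

```python
# Illustrative TF-IDF retrieval baseline for a multi-hop question,
# the kind of traditional method the paper contrasts with computational
# intelligence. Corpus and question are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Pierre Curie shared the 1903 Nobel Prize.",
]
question = "In which country was Marie Curie born?"

vec = TfidfVectorizer()
doc_matrix = vec.fit_transform(corpus)
scores = cosine_similarity(vec.transform([question]), doc_matrix)[0]

# A single retrieval hop ranks only the first passage highly; answering
# the question needs a second hop through "Warsaw is the capital of Poland."
for score, passage in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {passage}")
```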