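Abstract: This paper introduces the basic concepts and methods of biomedical natural language processing (Natural Language Processing for Biology, BioNLP), set against today's vast and heterogeneous body of English-language biological text and the cultural differences between Chinese and English texts. It summarizes the important aspects of BioNLP in mining information from the biomedical literature. Through research examples, it analyzes common methods that take the word, the sentence, and the discourse as the unit of linguistic analysis and points out their limitations. It closes with an outlook on research trends in biomedical computational linguistics.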
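Abstract: Documents written in natural language are an important part of a software project. Helping developers locate information efficiently and accurately within a large volume of documents is an important research problem in the field of software reuse. This paper proposes a semantic search method for software documents based on code structure knowledge. The method parses a code structure graph from the project's source code and uses it as domain-specific knowledge to help the machine understand the semantics of natural language text. This semantic information is combined with information retrieval techniques to achieve semantic retrieval of software documents. Experiments on a StackOverflow question-and-answer document dataset show that, compared with several text retrieval methods, the method improves mean average precision (MAP) by at least 13.77%.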
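The abstract above reports its gain in mean average precision (MAP). As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition. The function names and the toy document ids are illustrative assumptions and are not taken from the paper; the sketch also assumes each ranked list contains no duplicate ids.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list.

    Precision is sampled at every rank where a relevant document appears,
    and the samples are averaged over the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Two hypothetical queries with made-up document ids.
    queries = [
        (["a", "b", "c"], {"a", "c"}),
        (["x", "y"], {"y"}),
    ]
    print(mean_average_precision(queries))
```

A retrieval method that moves relevant documents closer to the top of each list raises the per-query precision samples and therefore the MAP.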
Abstract: Software tools are developed for the computer realization of syntactic, semantic, and morphological models of natural language texts, using rule-based programming. The tools are efficient for a language such as Georgian, which has free word order and a highly developed morphological structure. For instance, a Georgian verb has several thousand verb forms. Expressing the rules of morphological analysis with a finite automaton is very difficult and would also be inefficient. Some problems of full morphological analysis of Georgian words cannot be resolved by a finite automaton. Splitting some Georgian verb forms into morphemes requires a non-deterministic search algorithm, which needs a great deal of backtracking. To minimize backtracking, the constraints that exist among morphemes must be stated and verified as early as possible, so that false search directions are avoided. The software tool for syntactic analysis provides means to reduce rules that have the same members in a different order. The authors used this tool for semantic analysis as well. Thus, the proposed software tools offer many means to construct an efficient parser and to test and correct it. The authors carried out morphological and syntactic analysis of Georgian texts with these tools. This paper describes the software tools and their application to the Georgian language.
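The idea of checking morpheme constraints as early as possible to cut down backtracking can be illustrated with a small Python sketch. This is not the authors' tool: the toy lexicon, the slot labels, and the single ordering constraint are hypothetical and do not describe real Georgian morphology.

```python
from typing import Dict, List, Optional, Tuple

Morpheme = Tuple[str, str]  # (surface form, slot label)

# Hypothetical toy lexicon: surface string -> slots it may fill.
LEXICON: Dict[str, List[str]] = {
    "a": ["prefix", "root"],
    "ab": ["root"],
    "b": ["root", "suffix"],
    "c": ["suffix"],
    "bc": ["suffix"],
}

# Hypothetical constraint: slots must appear in strictly increasing order.
SLOT_ORDER = {"prefix": 0, "root": 1, "suffix": 2}


def allowed(so_far: List[Morpheme], candidate: Morpheme) -> bool:
    """Check the constraint between the last chosen morpheme and a candidate."""
    if not so_far:
        return True
    return SLOT_ORDER[so_far[-1][1]] < SLOT_ORDER[candidate[1]]


def segment(
    word: str, start: int = 0, so_far: Optional[List[Morpheme]] = None
) -> List[List[Morpheme]]:
    """Return every segmentation of `word` that satisfies the constraints."""
    if so_far is None:
        so_far = []
    if start == len(word):
        # A complete analysis must contain a root.
        has_root = any(slot == "root" for _, slot in so_far)
        return [list(so_far)] if has_root else []
    results: List[List[Morpheme]] = []
    for end in range(start + 1, len(word) + 1):
        piece = word[start:end]
        for slot in LEXICON.get(piece, []):
            candidate = (piece, slot)
            # Verify the constraint before recursing, so a bad branch
            # is abandoned immediately instead of being explored.
            if not allowed(so_far, candidate):
                continue
            so_far.append(candidate)
            results.extend(segment(word, end, so_far))
            so_far.pop()  # backtrack
    return results


if __name__ == "__main__":
    for analysis in segment("abc"):
        print(analysis)
```

Because `allowed` runs before each recursive call, a branch such as a suffix followed by a root is dropped at the first offending morpheme rather than after the whole word has been split.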
Funding: The National Hi-Tech Research and Development Program (863) of China (No. 2002AA119050)
Abstract: This paper discusses the recognition of proper names in Chinese text. It presents an automatic approach, based on association analysis, for extracting rules from a corpus. The method uses association analysis to discover rules relevant to external evidence, without additional manual effort. These rules can be used to recognize proper nouns in Chinese texts. Experimental results show that the method is practical in some applications. Moreover, the method is language independent.
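The abstract does not spell out how rules are mined, so the following Python sketch is only one plausible reading of "association analysis over external evidence": count how often a context word precedes a proper name and keep the context words whose support and confidence pass a threshold. The sample format, the thresholds, and the toy tokens are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (context word preceding a candidate, whether the candidate is a proper name)
Sample = Tuple[str, bool]
# (context word, support count, confidence)
Rule = Tuple[str, int, float]


def mine_rules(
    samples: Iterable[Sample],
    min_support: int = 2,
    min_confidence: float = 0.8,
) -> List[Rule]:
    """Keep context words that reliably precede proper names.

    support    = number of times the context word precedes a proper name
    confidence = support / total occurrences of the context word
    """
    seen: Counter = Counter()
    positive: Counter = Counter()
    for context, is_name in samples:
        seen[context] += 1
        if is_name:
            positive[context] += 1
    rules: List[Rule] = []
    for context, support in positive.items():
        confidence = support / seen[context]
        if support >= min_support and confidence >= min_confidence:
            rules.append((context, support, confidence))
    # Most confident rules first, ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))


if __name__ == "__main__":
    # Hypothetical labeled occurrences, not data from the paper.
    corpus = [
        ("Mr.", True), ("Mr.", True),
        ("the", False), ("the", True),
        ("Dr.", True),
    ]
    print(mine_rules(corpus))
```

Nothing in the counting depends on the language of the tokens, which is consistent with the abstract's claim that the method is language independent.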