A new method of automatic Chinese term extraction based on the Patricia (PAT) tree is proposed. Mutual information is computed by prefix searching in a PAT tree built from the domain corpus, to estimate the internal associative strength between the Chinese characters in a string. Compared with methods that work on the domain corpus directly, this greatly speeds up term candidate extraction. Banks of common collocation suffixes and prefixes are constructed, and part-of-speech (POS) composition rules for terms are summarized, to improve the precision of term extraction. Experimental results show an F-measure of 74.97%.
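The mutual-information step can be illustrated with a minimal Python sketch. It is not the paper's implementation: a plain substring counter stands in for PAT-tree prefix search, the toy corpus is invented, and the names `prefix_counts` and `mutual_information` are hypothetical. Probabilities are approximated as counts divided by the number of character positions.

```python
import math
from collections import Counter


def prefix_counts(corpus, max_len):
    """Count every substring of length 1..max_len.

    Assumption: this dictionary is a simple stand-in for the prefix
    search that a PAT tree over the corpus would provide.
    """
    counts = Counter()
    for start in range(len(corpus)):
        for length in range(1, max_len + 1):
            if start + length <= len(corpus):
                counts[corpus[start:start + length]] += 1
    return counts


def mutual_information(counts, total, left, right):
    """Pointwise mutual information log2(p(xy) / (p(x) * p(y))).

    Returns -inf when the pair (or either part) never occurs.
    """
    joint = counts[left + right]
    if joint == 0 or counts[left] == 0 or counts[right] == 0:
        return float("-inf")
    return math.log2(joint * total / (counts[left] * counts[right]))


# Toy corpus (invented for illustration): 14 characters.
corpus = "信息抽取和信息检索都依赖信息"
counts = prefix_counts(corpus, max_len=2)
total = len(corpus)

mi_seen = mutual_information(counts, total, "信", "息")    # pair occurs 3 times
mi_unseen = mutual_information(counts, total, "取", "信")  # pair never occurs
```

On a corpus this small the scores are not meaningful for ranking; the sketch only shows how the frequencies of a string and of its parts combine into an association score.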
In the context of interdisciplinary research, using computer technology to mine keywords in cultural texts and carry out semantic analysis can deepen the understanding of those texts and provide quantitative support and evidence for humanistic studies. Taking the novel A Dream of Red Mansions as the corpus, sentiment terms were automatically extracted and classified, and the resulting large-scale set of sentiment terms was analyzed in detail. A Bidirectional Encoder Representations from Transformers (BERT) pretraining and fine-tuning model was used to construct the sentiment classifier for A Dream of Red Mansions. The sentiment terms are divided into eight sentiment categories, and the relevant people in each sentence are extracted according to specific rules. An attempt is also made to visualize the sentimental interactions between the Twelve Girls of Jinling and Jia Baoyu as the story develops. The overall F1 score of the BERT-based sentiment classifier reached 84.89%, and the best score for a single sentiment category reached 91.15%. Experimental results show that the classifier can satisfactorily classify the text of A Dream of Red Mansions, and that the text classification and interaction analysis results are consistent with the interpretation of A Dream of Red Mansions by literature experts.
In the present era of Big Data, the demand for efficient information processing techniques for different applications is expanding steadily. One such application is the automatic creation of ontologies. Such an ontology is often helpful for answering queries about the underlying domain. The present work proposes a scheme for designing an ontology for the agriculture domain. The proposed scheme works in two steps. In the first step, it uses domain-dependent regular expressions and natural language processing techniques to automatically extract vocabulary pertaining to the agriculture domain. In the second step, semantic relationships between the extracted terms and phrases are identified. A rule-based reasoning algorithm, RelExOnt, is proposed for this task. Human evaluation of the term extraction output yields precision and recall of 75.7% and 60%, respectively. The relation extraction algorithm, RelExOnt, achieves an average precision of 86.89%.
Health records of traditional Chinese medicine contain valuable clinical information which can be used to improve disease treatment and to support medical research. In this paper, we present a practical iterative method for extracting terms from the records. The method is based on a set of extraction rules, the Mesh, and the likelihood ratio technique, and achieved a precision rate of 88.18% and a recall rate of 94.21%.
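The abstract does not say which likelihood ratio statistic is used. One common form in term extraction is the log-likelihood ratio (G²) over a 2x2 contingency table of co-occurrence counts, sketched below in Python. That this matches the paper's statistic is an assumption, and the function name `log_likelihood_ratio` and the example counts are hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 = 2 * sum(O * ln(O / E)) over a 2x2 contingency table.

    k11: count of word A followed by word B
    k12: count of A followed by something other than B
    k21: count of something other than A followed by B
    k22: count of pairs containing neither A first nor B second
    Assumes the table is not all zeros.
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Invented counts: an independent table and a perfectly associated one.
score_independent = log_likelihood_ratio(10, 10, 10, 10)
score_associated = log_likelihood_ratio(10, 0, 0, 10)
```

A higher score means the two words co-occur more often than independence would predict, which is why such a statistic can be used to rank term candidates.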
Funding: supported by the Fundamental Research Funds for the Central Universities (2019XD-A03-3) and the Beijing Key Lab of Network System and Network Culture (NSNC-202 A09).