Funding: Supported by Shanghai International Studies University (Grant No. 2011114061).
Abstract:
Purpose: The thrust of this paper is to present a method for improving the accuracy of automatic indexing of Chinese-English mixed documents.
Design/methodology/approach: Based on the inherent characteristics of Chinese-English mixed texts and on cybernetics theory, we proposed an integrated control method for indexing documents. It consists of "feed-forward control", "in-progress control" and "feedback control", and aims to improve the accuracy of automatic indexing of Chinese-English mixed documents. An experiment was conducted to investigate the effect of the proposed method.
Findings: This method distinguishes between Chinese and English documents in terms of grammatical structure and word-formation rules. When it was applied in the three phases of automatic indexing of Chinese-English mixed documents, the results were encouraging: precision increased from 88.54% to 97.10% and recall from 97.37% to 99.47%.
Research limitations: The indexing method is relatively complicated, and the whole indexing process requires substantial human intervention. Because pattern matching is based on a brute-force (BF) approach, indexing efficiency is reduced to some extent.
Practical implications: The research is of both theoretical significance and practical value for improving the accuracy of automatic indexing of multilingual documents (not only Chinese-English mixed documents). The proposed method will benefit not only the indexing of life science documents but also the indexing of documents in other subject areas.
Originality/value: So far, few studies have been published on methods for increasing the accuracy of multilingual automatic indexing. This study will provide insights into the automatic indexing of multilingual documents, especially Chinese-English mixed documents.
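The abstract does not spell out the three control stages, so what follows is only a hypothetical Python sketch of two ingredients it names: brute-force (BF) pattern matching of index terms, and evaluation by precision and recall. The vocabulary, the sample sentence, the gold term set and the matching rules are all assumptions for illustration, not the authors' method. English terms are matched case-insensitively and only on word boundaries. Chinese terms, which are written without spaces between words, are matched as plain substrings.

```python
# Hypothetical sketch, not the authors' implementation: brute-force (BF)
# matching of controlled-vocabulary terms in Chinese-English mixed text,
# followed by precision/recall of the assigned terms against a gold set.


def is_cjk(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def is_word_char(ch: str) -> bool:
    """ASCII letters and digits count as part of an English word; CJK characters do not."""
    return ch.isascii() and ch.isalnum()


def bf_find(text: str, pattern: str) -> list[int]:
    """Naive brute-force search: try every start position, compare char by char."""
    hits = []
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            hits.append(i)
    return hits


def match_terms(text: str, vocabulary: list[str]) -> set[str]:
    """Return the vocabulary terms found in the text under the assumed rules."""
    lowered = text.lower()
    found = set()
    for term in vocabulary:
        if any(is_cjk(c) for c in term):
            # Assumed rule for Chinese terms: any substring occurrence counts.
            if bf_find(text, term):
                found.add(term)
            continue
        # Assumed rule for English terms: case-insensitive, whole words only,
        # so "gene" is not assigned just because "genetic" occurs.
        t = term.lower()
        for i in bf_find(lowered, t):
            end = i + len(t)
            before_ok = i == 0 or not is_word_char(lowered[i - 1])
            after_ok = end == len(lowered) or not is_word_char(lowered[end])
            if before_ok and after_ok:
                found.add(term)
                break
    return found


def precision_recall(assigned: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = correct / assigned; recall = correct / relevant."""
    correct = len(assigned & relevant)
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    vocabulary = ["gene", "蛋白质", "PCR", "基因"]  # hypothetical index terms
    doc = "本文利用PCR技术研究了genetic变异与蛋白质表达的关系。"  # hypothetical document
    gold = {"PCR", "蛋白质", "基因"}  # hypothetical human-assigned terms
    assigned = match_terms(doc, vocabulary)
    print(sorted(assigned))
    print(precision_recall(assigned, gold))
```

On the sample sentence, "PCR" and "蛋白质" are assigned. "gene" is rejected because it occurs only inside "genetic", so precision is 1.0 and recall is 2/3 against the hypothetical gold set. The inner loop of `bf_find` may compare up to m characters at each of the n − m + 1 start positions. That gives O(nm) work per term, repeated for every vocabulary entry, which is consistent with the efficiency cost the authors attribute to BF matching. Multi-pattern algorithms such as Aho-Corasick avoid this repeated rescanning.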
Funding: Supported by the National Major Science and Technology Project (2014ZX10004001), the National Key Research and Development Program of China (2016YFC1202404), the Program for Changjiang Scholars and Innovative Research Team in University (IRT13007), and the CAMS Innovation Fund for Medical Sciences (2016-I2M-1014).
Abstract:
Emerging zoonotic diseases have received tremendous interest in recent years, as they pose a significant threat to human health, animal welfare, and economic stability. A high proportion of zoonoses originate from wildlife reservoirs. Rodents are the most numerous, widespread, and diverse group of mammals on Earth and are reservoirs for many zoonotic viruses responsible for significant morbidity and mortality. A better understanding of virome diversity in rodents would therefore be valuable to researchers and professionals in the field. To this end, we developed the DRodVir database (http://www.mgc.ac.cn/DRodVir/), a comprehensive, up-to-date, and well-curated repository of rodent-associated animal viruses. The database currently covers 7690 sequences from 5491 rodent-associated mammal viruses of 26 viral families, detected in 194 rodent species in 93 countries worldwide. In addition to virus sequences, the database provides detailed information on related samples and host rodents, as well as a set of online analytical tools for text query, BLAST search and phylogenetic reconstruction. The DRodVir database will help virologists better understand the virome diversity of rodents. Moreover, it will be a valuable tool for epidemiologists and zoologists to monitor and track current and future zoonotic diseases. As an example of data application, we further compared the current status of rodent-associated viruses with that of bat-associated viruses, highlighting the need to include additional host species and geographic regions in future investigations. This will help us better understand virome diversity in these two major reservoirs of emerging zoonotic infectious diseases.