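Sentence Alignment for Biomedicine Texts Based on Gaussian Mixture Model

Original Chinese title: 基于高斯混合模型的生物医学领域双语句子对齐 (literally, "Bilingual Sentence Alignment in the Biomedical Domain Based on a Gaussian Mixture Model"). Cited by: 3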


Abstract: A bilingual lexicon of biomedical terms plays a very important role in biomedical cross-language information retrieval systems, and bilingual sentence alignment is the first step in building such a lexicon. To build a bilingual lexicon for the biomedical domain, this paper brings a classification view and transfer learning to the Chinese-English sentence alignment task: alignment is treated as a multi-class classification task, the anchor information contained in bilingual biomedical abstracts is taken into account, and a Gaussian mixture model performs the classification. During model training, transfer learning is applied by using the noise-free bilingual New Concept English (《新概念英语》) corpus to train the model's sentence-length features, which are then transferred to the alignment model. Experiments show that this considerably improves sentence alignment accuracy on the test corpus.
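The classification view described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class labels (`"1-1"`, `"1-2"`, `"2-1"`), the single length-ratio feature, the two-component mixtures, and the toy training values are all assumptions made for illustration, and anchor information and the transfer step are left out. Each alignment class gets its own one-dimensional Gaussian mixture fitted by EM, and a candidate is assigned to the class with the highest prior-weighted mixture density.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(values, k=2, iterations=50, min_var=1e-4):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns a list of (weight, mean, variance) tuples.
    """
    data = sorted(values)
    n = len(data)
    # Deterministic initialisation: spread the means over the sorted data.
    means = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, min_var)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against underflow
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return list(zip(weights, means, variances))

def mixture_density(x, components):
    """Density of x under a fitted mixture."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def train(samples, k=2):
    """samples: list of (length_ratio, label). Returns {label: (prior, mixture)}."""
    by_label = {}
    for ratio, label in samples:
        by_label.setdefault(label, []).append(ratio)
    total = len(samples)
    return {label: (len(vals) / total, fit_gmm_1d(vals, k))
            for label, vals in by_label.items()}

def classify(x, model):
    """Pick the class with the largest prior-weighted mixture density at x."""
    return max(model, key=lambda lab: model[lab][0] * mixture_density(x, model[lab][1]))

# Toy, invented length ratios for three hypothetical alignment classes.
toy = (
    [(r, "1-1") for r in (0.9, 0.95, 1.0, 1.05, 1.1, 1.0, 0.98, 1.02)]
    + [(r, "1-2") for r in (0.4, 0.45, 0.5, 0.55, 0.6, 0.5, 0.48, 0.52)]
    + [(r, "2-1") for r in (1.8, 1.9, 2.0, 2.1, 2.2, 2.0, 1.95, 2.05)]
)
model = train(toy)
print(classify(1.0, model), classify(0.5, model), classify(2.0, model))
```

In a setup like the one the abstract describes, the length-feature mixtures would be estimated on a clean parallel corpus and reused when aligning the noisier biomedical abstracts; the toy list above merely stands in for such training data.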

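Authors: Chen Xiang (陈相), Lin Hongfei (林鸿飞), Yang Zhihao (杨志豪)

Affiliation: Information Retrieval Laboratory, Dalian University of Technology

Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list (北大核心), 2010, No. 4, pp. 68-73 (6 pages)

Funding: National Natural Science Foundation of China (60373095, 60673039); National 863 High-Tech Program (2006AA01Z151); Scientific Research Foundation for Returned Overseas Scholars, Ministry of Education (教外司留[2007]118号); National Social Science Foundation of China (08BTQ025)

Keywords: computer application; Chinese information processing; sentence alignment; Gaussian mixture model; transfer learning; anchor information

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]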



