A Computational Model for Chinese Syntactic Structure Induction Based on Sentence Alignment (基于句子对齐的汉语句法结构推导的计算模型)

Cited by: 2


Abstract: This paper proposes an unsupervised method for inducing Chinese syntactic structure based on sentence similarity. First, sentence pairs from a Chinese sentence corpus are aligned, which partitions each pair into alternating identical and differing segments. Next, under either a similarity-priority or a difference-priority strategy, the corresponding aligned segments are selected as constituent candidates, and candidates whose overlapping segments may cause boundary friction are disambiguated. Finally, the syntactic structure tree is derived by reducing sentence constituents step by step. To mitigate word sparseness during alignment, some words that follow clear regular patterns are replaced by classes in advance. The induced structures are evaluated with three kinds of basic sentence unit: words, part-of-speech (POS) tags, and words combined with POS tags. For all three units, difference-priority reduction yields F-scores above 46%, outperforming similarity-priority reduction in every case; the best result, 49.52%, exceeds previously reported results.
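The alignment step described in the abstract can be illustrated with a minimal sketch. It assumes sentences are already segmented into tokens and uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the paper's actual alignment algorithm, its word-class replacement, and its boundary-friction disambiguation are not reproduced here. The function names `align_segments` and `candidate_spans` and the two example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_segments(sent_a, sent_b):
    """Partition two token lists into alternating 'same' and 'diff' segments.

    Assumption: difflib's matching is only a stand-in for the paper's aligner.
    Each segment is (kind, (start_a, end_a), (start_b, end_b)), end exclusive.
    """
    matcher = SequenceMatcher(a=sent_a, b=sent_b, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        kind = "same" if tag == "equal" else "diff"
        segments.append((kind, (i1, i2), (j1, j2)))
    return segments


def candidate_spans(segments, priority="diff"):
    """Collect constituent candidate spans for each sentence.

    priority="diff" mimics the difference-priority strategy,
    priority="same" the similarity-priority strategy.
    """
    spans_a, spans_b = [], []
    for kind, (i1, i2), (j1, j2) in segments:
        if kind != priority:
            continue
        if i2 > i1:  # skip empty spans produced by pure insertions
            spans_a.append((i1, i2))
        if j2 > j1:  # skip empty spans produced by pure deletions
            spans_b.append((j1, j2))
    return spans_a, spans_b


# Hypothetical pre-segmented sentences: "I like red apples" / "I like bananas".
sent_a = ["我", "喜欢", "红", "苹果"]
sent_b = ["我", "喜欢", "香蕉"]

segs = align_segments(sent_a, sent_b)
diff_a, diff_b = candidate_spans(segs, priority="diff")
same_a, same_b = candidate_spans(segs, priority="same")
```

Under these assumptions, the differing segments ("红 苹果" versus "香蕉") become candidates under difference priority, while the shared prefix "我 喜欢" becomes the candidate under similarity priority. A full system would repeat this over all sentence pairs and then resolve overlapping candidates before reducing them into a tree.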

Authors: WANG Houfeng (王厚峰), WANG Bo (王波)

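Affiliation: Institute of Computational Linguistics, School of Electronics Engineering and Computer Science, Peking University (北京大学信息科学技术学院计算语言学研究所)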

Source: Journal of Software (软件学报; indexed in EI, CSCD, and the Peking University Core Journals list), 2007, No. 3, pp. 538-546 (9 pages)

Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60473138 and 60675035.

Keywords: sentence alignment; unsupervised learning; boundary friction; similarity priority; difference priority; Chinese syntactic structure induction

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]


