A Semi-automatic Method Based on Statistic for Mandarin Semantic Structures Extraction in Specific Domains (Cited by: 1)

Abstract: This paper proposes a new semi-automatic method for extracting semantic structures from unlabelled corpora in specific domains. The approach is statistical in nature. The extracted structures can be used for shallow parsing and semantic labeling. By iteratively extracting new words and clustering words, we obtain an initial semantic lexicon that groups words with the same semantic meaning into a class. A bootstrapping algorithm is then adopted to extract semantic structures. The semantic structures are then used to extract new

Authors: Xiong Ying (熊英), Zhu Jie (朱杰), Sun Jing (孙静)

Affiliation: Dept. of Electronic Eng.

Source: Journal of Shanghai Jiaotong University (Science) (上海交通大学学报（英文版）), EI-indexed, 2004, No. 4, pp. 25-29 (5 pages)

Funding: Foundation Research Program, Science & Technology Committee of Shanghai Municipality (No. 01JC14033)

Keywords: semantic structure; language model; semi-automatic extraction; semantic grouping; NLU

and augment the semantic lexicon. The resulting semantic structures can be interpreted by humans and are amenable to hand-editing for refinement. In this experiment, the semi-automatically extracted structures S_SA provide a recall rate of 84.

Classification: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]
