Research on key technologies of unstructured information extraction (cited by: 10)
Abstract: Guided by the Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM), this paper addresses the difficulties of Chinese named entity recognition and takes full account of the role that expert knowledge plays in it. Depending on the entity type, statistical and rule-based methods are combined in flexibly varying ways. A range of techniques is applied to information extraction tasks, including machine learning, discourse analysis and understanding, syntactic parsing, graph algorithms and graph mining, computing with words, and fast full-text retrieval. The paper explores how to extract from text not only relations within simple clauses but also entity relations that span sentences and paragraphs.
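The abstract does not spell out how statistics and rules are combined. The sketch below is a minimal illustration under our own assumptions, not the authors' method: for Chinese person names only, a rule proposes candidates (a known surname followed by a one- or two-character given name), and a statistical score filters them. The surname set `SURNAMES`, the character probabilities in `GIVEN_NAME_PROB`, `DEFAULT_PROB`, the threshold, and the function names are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy resources, for illustration only.
SURNAMES = {"王", "李", "张", "刘", "陈"}
# Hypothetical probability that a character occurs inside a given name.
GIVEN_NAME_PROB: Dict[str, float] = {
    "明": 0.6, "华": 0.7, "伟": 0.8, "丽": 0.7, "强": 0.6,
    "在": 0.01, "的": 0.005, "是": 0.01, "了": 0.005,
}
DEFAULT_PROB = 0.05  # hypothetical back-off for unseen characters


def name_score(given: str) -> float:
    """Statistical part: product of per-character given-name probabilities."""
    score = 1.0
    for ch in given:
        score *= GIVEN_NAME_PROB.get(ch, DEFAULT_PROB)
    return score


def find_person_names(text: str, threshold: float = 0.3) -> List[Tuple[int, str]]:
    """Rule part proposes surname + 1-2 character given name;
    the statistical score decides whether a candidate is kept.
    Returns (start offset, name) pairs."""
    results: List[Tuple[int, str]] = []
    i = 0
    while i < len(text):
        if text[i] in SURNAMES:
            best: Optional[str] = None
            for length in (2, 1):  # try the longer given name first
                given = text[i + 1 : i + 1 + length]
                if len(given) == length and name_score(given) >= threshold:
                    best = text[i : i + 1 + length]
                    break
            if best is not None:
                results.append((i, best))
                i += len(best)  # skip past the accepted name
                continue
        i += 1
    return results
```

With these toy values, `find_person_names("王明华在北京见了李伟。")` returns `[(0, '王明华'), (8, '李伟')]`, while `find_person_names("陈述的是事实")` returns `[]`: the surname rule alone would propose a candidate at 陈, but the following characters score below the threshold. Other entity types (place or organization names, for example) would plausibly need a different rule/statistics balance, which this sketch does not attempt.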
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University core journal, 2009, No. 14, pp. 1-6 and 21 (7 pages)
Funding: National Natural Science Foundation of China (No. 60675030); Natural Science Foundation of Jiangxi Province (No. 0511035, No. 2007GZS0358)
Keywords: information extraction; inner cognitive mechanism; named entity recognition; coreference resolution; machine learning