Abstract
Guided by the Knowledge Discovery Theory based on Inner Cognitive Mechanism (KDTICM), this paper addresses the difficulties of Chinese named entity recognition, taking full account of the role that expert knowledge plays in it. Depending on the entity type, statistical and rule-based methods are combined flexibly. A range of techniques is applied to the information extraction task, including machine learning, discourse analysis and understanding, syntactic parsing, graph algorithms and graph mining, computing with words, and fast full-text retrieval. The paper aims to extract from text not only the relations within simple clauses but also the entity relations that span sentences and paragraphs.
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core (北大核心)
2009, Issue 14, pp. 1-6, 21 (7 pages)
Funding
National Natural Science Foundation of China, No. 60675030
Natural Science Foundation of Jiangxi Province, No. 0511035
No. 2007GZS0358
Keywords
information extraction
inner cognitive mechanism
named entity recognition
coreference resolution (anaphora resolution)
machine learning