Abstract
Random noise alters both the structure and the semantics of semi-structured data, which hinders extracting such data and automatically converting it into structured form. Based on an analysis of how noise affects semi-structured text data, this paper proposes an automatic segmentation approach and method that draws on knowledge of the syntactic template structure by which the data is organized. The method can handle a range of noise and improve the accuracy of automatic segmentation for noisy semi-structured text data, and it may serve as a reference for similar problems.
Source
《微型机与应用》
2015, No. 17, pp. 89-91, 95 (4 pages in total)
Microcomputer & Its Applications
Funding
National Natural Science Foundation of China (61363019)
Qinghai Province Innovation Capability Promotion Program (2014-ZJ-718, 2014-ZJ-941Q)
Keywords
semi-structured data
segmentation
template knowledge
noise