Abstract
Based on an analysis of formula features, this paper proposes a method for extracting mathematical formulas that combines Parzen windows with the Bayes classification rule. Isolated formulas are extracted from the printed document with an improved Parzen window method, and embedded formulas are extracted from text lines with the Bayes classification rule. Experiments show that the method adapts well to Chinese documents and achieves a high extraction success rate.
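The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give the formula features, the kernel, or the improvement made to the Parzen window method. The Gaussian kernel, the single 1-D feature, the sample values, the priors, and the function names `parzen_density` and `bayes_classify` are all assumptions made for illustration.

```python
import math


def parzen_density(x, samples, h):
    """Parzen-window estimate of p(x) using a Gaussian kernel of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def bayes_classify(x, class_samples, priors, h):
    """Bayes decision rule: choose the class maximising prior * likelihood."""
    scores = {
        label: priors[label] * parzen_density(x, samples, h)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)


# Hypothetical 1-D feature values (for example a normalised line height).
# These numbers are invented for illustration and do not come from the paper.
training = {
    "text": [0.9, 1.0, 1.1, 1.0, 0.95],
    "formula": [1.8, 2.1, 2.4, 1.9, 2.2],
}
priors = {"text": 0.8, "formula": 0.2}  # assumed class priors

label_a = bayes_classify(1.05, training, priors, h=0.2)
label_b = bayes_classify(2.0, training, priors, h=0.2)
```

In this sketch the Parzen estimate supplies the class-conditional densities and the Bayes rule turns them into a decision. The window width `h` controls how smooth the estimate is and would need tuning on real data.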
Source
Computer Engineering (《计算机工程》)
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2006, No. 19, pp. 211-213 (3 pages)
Computer Engineering
Funding
Supported by the Natural Science Foundation of Hebei Province (Grant No. F2004000132)