
Tabular Cell Classification Based on Neural Networks and Multiple Features
Abstract: Spreadsheets are ubiquitous in the era of big data, and they come with varied structures and rich semantics. A key step in automatically understanding the logical structure of a spreadsheet is to classify its cells, that is, to distinguish header cells from content cells. To classify tabular cells, this paper first extracts six kinds of features covering the structure, style, and semantics of spreadsheets. It then encodes and fuses these diverse features using deep learning methods, and finally builds a U-Net neural network model to learn the relationship between the features and cell types. Experimental results support the rationale behind the feature selection and the model structure design, and demonstrate the effectiveness of the proposed method.
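The abstract does not say which six features are used or how the network is configured, so the following is only a minimal, stdlib-only sketch of the overall pipeline under stated assumptions. The `Cell` record, the six features (relative row position, merged-range flag, bold, fill, numeric, non-empty), plain concatenation as the "fusion" step, and the one-level U-Net-style pass with hand-picked weights are all hypothetical illustrations. The paper instead encodes and fuses features with learned deep-learning components and trains its U-Net.

```python
from dataclasses import dataclass

LABELS = ("header", "content")

@dataclass
class Cell:
    # Hypothetical cell record (assumption); real spreadsheets expose richer metadata.
    value: str
    bold: bool = False
    filled: bool = False
    merged: bool = False

def cell_features(cell, row, n_rows):
    """Six illustrative features (assumed, not the paper's): 2 structure, 2 style, 2 semantics."""
    text = cell.value.strip()
    try:
        float(text.replace(",", ""))
        numeric = 1.0
    except ValueError:
        numeric = 0.0
    return [
        row / max(n_rows - 1, 1),     # structure: relative row position
        1.0 if cell.merged else 0.0,  # structure: part of a merged range
        1.0 if cell.bold else 0.0,    # style: bold font
        1.0 if cell.filled else 0.0,  # style: background fill
        numeric,                      # semantics: parses as a number
        1.0 if text else 0.0,         # semantics: non-empty text
    ]

def build_grid(table):
    """Fuse per-cell feature vectors into an H x W x C grid (plain concatenation, an assumption)."""
    n = len(table)
    return [[cell_features(c, r, n) for c in row] for r, row in enumerate(table)]

def pad_even(grid):
    """Zero-pad H and W to even sizes so one 2x2 pooling step divides evenly."""
    ch = len(grid[0][0])
    rows = [row + [[0.0] * ch] * (len(row) % 2) for row in grid]
    if len(rows) % 2:
        rows.append([[0.0] * ch] * len(rows[0]))
    return rows

def max_pool2(grid):
    """Channel-wise 2x2 max pooling (contracting step)."""
    ch = len(grid[0][0])
    return [[[max(grid[r + dr][c + dc][k] for dr in (0, 1) for dc in (0, 1))
              for k in range(ch)]
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]

def upsample2(grid):
    """Nearest-neighbour 2x upsampling (expanding step)."""
    return [[grid[r // 2][c // 2] for c in range(2 * len(grid[0]))]
            for r in range(2 * len(grid))]

def tiny_unet(grid, weights, bias):
    """One-level encoder/decoder with a skip connection and a per-cell linear head."""
    coarse = upsample2(max_pool2(grid))
    out = []
    for fine_row, coarse_row in zip(grid, coarse):
        out_row = []
        for fine, ctx in zip(fine_row, coarse_row):
            v = fine + ctx  # skip connection: concatenate local and pooled-context channels
            out_row.append([sum(w * x for w, x in zip(ws, v)) + b
                            for ws, b in zip(weights, bias)])
        out.append(out_row)
    return out

def classify(table, weights, bias):
    """Label each original cell with the argmax class; padding cells are cropped away."""
    h, w = len(table), len(table[0])
    logits = tiny_unet(pad_even(build_grid(table)), weights, bias)
    return [[LABELS[max(range(len(v)), key=v.__getitem__)] for v in row[:w]]
            for row in logits[:h]]

table = [
    [Cell("Region", bold=True, filled=True), Cell("Sales", bold=True, filled=True)],
    [Cell("North"), Cell("1,200")],
    [Cell("South"), Cell("950")],
]
# Hand-picked stand-in weights over 12 inputs (6 local + 6 context); a real model learns these.
header_w = [0.0, 0.0, 1.0] + [0.0] * 9  # responds only to the local "bold" channel
content_w = [0.0] * 12
print(classify(table, [header_w, content_w], [0.0, 0.5]))
# prints [['header', 'header'], ['content', 'content'], ['content', 'content']]
```

One reason a U-Net shape suits per-cell classification is that its expanding path restores the input's H x W grid, so every cell receives its own class scores, while the skip connection keeps each cell's local cues alongside the pooled context of its neighbours.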
Authors: PENG Ying (彭滢), WU Jie (吴杰), QI Weigang (齐伟钢), Westone Information Industry Inc., Chengdu, Sichuan 610041, China
Source: Communications Technology (《通信技术》), 2022, Issue 9, pp. 1146-1152 (7 pages)
Keywords: spreadsheet; tabular cell classification; deep learning; feature fusion