Rich-text document styling restoration via reinforcement learning (Cited by: 1)


Abstract: Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, are widespread on the Web. However, since most of these documents are published only for reading, the styling information inside them is usually missing, which makes them difficult or even burdensome to display and edit across different formats and platforms. In this study, we formulate the task of document styling restoration as an optimization problem, which aims to identify the styling settings of the document elements (e.g., lines, table cells, text) such that rendering with the output styling settings yields a document in which each element holds (nearly) the same position as the corresponding element in the original document. Considering that each styling setting is a decision, this problem can be transformed into a multi-step decision-making task over all the document elements and then solved by reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is leveraged to explore the different styling settings, and the policy function is learned under the supervision of the delayed rewards. As a case study, we restore the styling information inside tables, where structural and functional data in documents are usually presented. Experiments show that our best reinforcement method successfully restores the styling in 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between inference time and restoration success rate, and argue that although the reinforcement methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, this model has been applied in a PDF parser to support cross-format display.

Authors: Hongwei LI, Yingpeng HU, Yixuan CAO, Ganbin ZHOU, Ping LUO

Affiliations: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS); University of Chinese Academy of Sciences; Search Product Center

Source: Frontiers of Computer Science (中国计算机科学前沿（英文版）), indexed in SCIE, EI, CSCD; 2021, Issue 4, pp. 93-103 (11 pages)

Funding: This work was supported by the National Key Research and Development Program of China (2017YFB1002104), the National Natural Science Foundation of China (Grant No. U1811461), and the Innovation Program of Institute of Computing Technology, CAS.

Keywords: styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

