
Application of deep reinforcement learning in automatic IMRT planning for rectal cancer (cited by: 4)
Abstract  Objective: The optimization of intensity-modulated radiotherapy (IMRT) planning is often time-consuming, and plan quality depends on the planner's experience and the available planning time. This study discusses and implements an unsupervised automatic IMRT optimization procedure that simulates human operation throughout the optimization process. Methods: Based on a deep reinforcement learning (DRL) framework, an optimization adjustment policy network (OAPN) was proposed to automate treatment planning optimization. The Eclipse Scripting Application Programming Interface (ESAPI) of the Varian Eclipse 15.6 TPS was used to realize interaction between OAPN and the TPS. Taking the dose-volume histogram (DVH) as input, OAPN learned an adjustment strategy for the objective parameters in the TPS through reinforcement learning, so as to gradually improve plans and obtain high-quality results. Eighteen previously treated rectal cancer cases were selected from the clinical database: five were used for OAPN training, and the remaining 13 for evaluating the feasibility and effectiveness of the trained network. A third-party scoring tool was used to evaluate plan quality. Results: The average score of the 13 test plans generated with uniform initial optimization objective parameters (OOPs) was 45.53±4.58 (upper limit: 110). After OAPN adjusted the OOPs, the average plan score was 88.67±6.74. Conclusion: OAPN realizes data interaction with the TPS through ESAPI and forms an action-value strategy through DRL. After training, OAPN can efficiently adjust OOPs and obtain high-quality plans.
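The workflow described in the abstract, in which an agent observes the current plan state, adjusts optimization objective parameters, and uses the resulting change in plan score as a reward, can be illustrated with a minimal sketch. This toy example is not the paper's OAPN: it replaces the deep network and DVH input with tabular Q-learning over a single objective parameter, and mocks the TPS plus the third-party scoring tool with a simple `score_plan` function (all names here are hypothetical, not from the paper or ESAPI).

```python
import random

# Hypothetical stand-in for the TPS + scoring tool: the plan score
# (capped at 110, matching the paper's scale) improves as the objective
# parameter approaches an optimum unknown to the agent.
def score_plan(objective: float) -> float:
    optimum = 30.0  # hypothetical best objective value
    return max(0.0, 110.0 - 2.0 * abs(objective - optimum))

ACTIONS = (-5.0, -1.0, 1.0, 5.0)  # candidate adjustments to the objective

class ToyOAPNAgent:
    """Tabular Q-learning agent mimicking the OAPN idea: observe the
    current objective value, pick an adjustment, receive the change in
    plan score as reward."""
    def __init__(self, lr=0.5, gamma=0.9, eps=0.2):
        self.q = {}  # (discretized state, action index) -> action value
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def _key(self, state, a):
        return (round(state), a)

    def act(self, state):
        if random.random() < self.eps:  # epsilon-greedy exploration
            return random.randrange(len(ACTIONS))
        vals = [self.q.get(self._key(state, a), 0.0) for a in range(len(ACTIONS))]
        return vals.index(max(vals))

    def update(self, s, a, r, s2):
        best_next = max(self.q.get(self._key(s2, b), 0.0) for b in range(len(ACTIONS)))
        k = self._key(s, a)
        self.q[k] = self.q.get(k, 0.0) + self.lr * (r + self.gamma * best_next - self.q.get(k, 0.0))

def optimize(agent, start, steps=20, train=True):
    """One 'planning episode': iteratively adjust the objective parameter."""
    obj, score = start, score_plan(start)
    for _ in range(steps):
        a = agent.act(obj)
        new_obj = min(100.0, max(0.0, obj + ACTIONS[a]))
        new_score = score_plan(new_obj)
        if train:
            agent.update(obj, a, new_score - score, new_obj)
        obj, score = new_obj, new_score
    return score

random.seed(0)
agent = ToyOAPNAgent()
for _ in range(300):  # training episodes from varied starting objectives
    optimize(agent, start=random.uniform(20.0, 80.0))
agent.eps = 0.0  # greedy evaluation after training
final_score = optimize(agent, start=60.0, train=False)
```

After training, the greedy policy should steer the objective toward the optimum, so `final_score` exceeds the untuned starting score. In the actual study this role is played by a deep network reading DVH features and acting on multiple OOPs through ESAPI.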
Authors: WANG Hanlin; LIU Jiacheng; WANG Qingying; YUE Haizhen; DU Yi; ZHANG Yibao; WANG Ruoxi; WU Hao (Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education) / Department of Radiotherapy, Peking University Cancer Hospital & Institute, Beijing 100142, China)
Source: Chinese Journal of Medical Physics (《中国医学物理学杂志》, CSCD), 2022, No. 1, pp. 1-8
Funding: National Key R&D Program of China (2019YFF01014405); Beijing Hospitals Authority Incubating Program (PX2019042); Beijing Natural Science Foundation (1202009); National Natural Science Foundation of China (12005007); Fundamental Research Funds for the Central Universities / Peking University Clinical Medicine Plus X Youth Project (PKU2020LCXQ019)
Keywords: rectal cancer; automatic optimization; deep reinforcement learning; Eclipse scripting application programming interface (ESAPI); optimization adjustment policy network (OAPN)