Abstract
Electronic picture books are convenient to read and easy to obtain, but the clutter of their reading information and the difficulty of extracting it make reading harder for children. This study introduces a convolutional neural network (CNN) into a browser/server (B/S) architecture to extract information. To reduce the reading difficulties caused by sensitive words, it optimizes the TF-IDF algorithm with word weights improved by information entropy, and on this basis completes the design of an automated acquisition system. Experimental results show that the improved TF-IDF algorithm effectively avoids overfitting, the highest information-recognition accuracy is 92.14%, the AUC values for single-character and phrase retrieval are 0.958 and 0.971, respectively, and the system latency is below 1.7 s. The automated information recognition system effectively ensures the completeness, relevance, and interactivity of reading information and greatly improves reading efficiency and quality.
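The abstract does not give the exact entropy-improved weighting formula, so the following is only a minimal sketch of one common way to combine TF-IDF with information entropy, assuming the weight is `tf * idf * (1 - H_norm(t))`, where `H_norm(t)` is the normalized entropy of term `t`'s frequency distribution across documents. The function name `entropy_tfidf` and the smoothed IDF variant are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def entropy_tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical weighting (an assumption, not the paper's formula):
        weight(t, d) = tf(t, d) * idf(t) * (1 - H(t) / log(N))
    where H(t) is the entropy of t's frequency distribution over the N documents.
    Terms spread evenly over all documents get a factor near 0;
    terms concentrated in few documents keep a factor near 1.
    """
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = set().union(*counts)

    # Document frequency and total corpus frequency of each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    total = {t: sum(c[t] for c in counts) for t in vocab}

    # Maximum possible entropy over N documents, used for normalization.
    max_entropy = math.log(n_docs) if n_docs > 1 else 1.0

    concentration = {}
    for t in vocab:
        h = 0.0
        for c in counts:
            if c[t]:
                p = c[t] / total[t]
                h -= p * math.log(p)
        concentration[t] = 1.0 - h / max_entropy

    weights = []
    for c in counts:
        length = sum(c.values())
        doc_w = {}
        for t, f in c.items():
            tf = f / length
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1.0  # smoothed IDF
            doc_w[t] = tf * idf * concentration[t]
        weights.append(doc_w)
    return weights


if __name__ == "__main__":
    corpus = [["cat", "dog", "cat"], ["dog", "fish"], ["dog", "bird"]]
    for i, w in enumerate(entropy_tfidf(corpus)):
        print(i, {t: round(v, 3) for t, v in sorted(w.items())})
```

In this toy corpus, "dog" occurs once in every document, so its entropy factor is essentially zero and it is suppressed, while "cat" is concentrated in one document and keeps its full TF-IDF weight. How the original system uses such weights to handle sensitive words is not specified in the abstract.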
Authors
AI Xueyin (艾雪银)
SHANG Siqi (尚思琪)
AI Xueyin; SHANG Siqi (Xianyang Vocational and Technical College, Xi’an 721000, China; Zhaolun Zhongguan Coinage Site Museum, Xi’an 710000, China)
Source
Automation & Instrumentation (《自动化与仪器仪表》)
2023, Issue 10, pp. 99-103 (5 pages)
Funding
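"Optimization Paths and Development Strategies for Preschool Education Programs in Higher Vocational Colleges Serving the Rural Revitalization Strategy" (2023JYB03)
"Research on the Development and Practice of Puppet Animation in School-Based Primary School Art Curricula" (CXD2101)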