Abstract
Tree-structured text configurations are widely used in distributed data processing software for TT&C (Tracking, Telemetry and Command), and their correctness is critical to data processing. To check and correct such configurations automatically, the Levenshtein Distance (LD) algorithm is introduced and the edit operations defined on strings are extended to multiway trees. On this basis, an edit distance between multiway trees (the tree Levenshtein distance) is defined, a method for measuring the similarity between multiway trees is established, and an automatic proofreading flow for text configurations based on fuzzy matching is designed. This resolves the distorted recall and the misjudgments that the polysemy of characters causes under exact matching. In the experiments, recall and precision reached 87.5% and 100% respectively, effectively improving the reliability of automatic proofreading of tree-structured text configurations.
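The idea of carrying string edit operations over to multiway trees can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual definition of the tree distance or of the similarity measure, so everything below is an assumption made for illustration: the `Node` type, the cost model (relabel cost equal to the normalised Levenshtein distance of the two labels, subtree insertion or deletion cost equal to the subtree size), the top-down alignment of ordered children, and the normalisation used in `similarity`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical configuration node: a text label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance between two strings, unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = cur
    return prev[-1]


def label_cost(a: str, b: str) -> float:
    """Assumed relabel cost in [0, 1]: distance divided by the longer length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_distance(s: Node, t: Node) -> float:
    """Top-down edit distance between ordered trees (an assumed cost model).

    The child lists are aligned with the same dynamic programme as the
    string case; deleting or inserting a child costs the size of its subtree.
    """
    m, n = len(s.children), len(t.children)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(s.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(t.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(s.children[i - 1]),   # delete subtree
                d[i][j - 1] + size(t.children[j - 1]),   # insert subtree
                d[i - 1][j - 1]
                + tree_distance(s.children[i - 1], t.children[j - 1]),
            )
    return label_cost(s.label, t.label) + d[m][n]


def similarity(s: Node, t: Node) -> float:
    """Assumed similarity score: 1.0 for identical trees, lower otherwise."""
    return 1.0 - tree_distance(s, t) / (size(s) + size(t))


# Hypothetical example: a template and a configuration with one misspelt key.
template = Node("station", [Node("antenna"), Node("receiver")])
config = Node("station", [Node("antena"), Node("receiver")])
score = similarity(template, config)
```

Under this cost model a misspelt key lowers the score only slightly, whereas a missing subtree lowers it by an amount proportional to its size. A fuzzy-matching check could then compare `score` against a chosen threshold instead of requiring exact equality; the threshold itself would have to be tuned and is not taken from the paper.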
Source
《飞行器测控学报》
CSCD
2015, No. 4, pp. 389-394 (6 pages)
Journal of Spacecraft TT&C Technology
Funding
Supported by the Shanghai Aerospace Science and Technology Innovation Fund (SAST201251)
Keywords
similarity between strings
tree Levenshtein distance
fuzzy matching
text proofreading