Abstract
With the development of artificial intelligence, intelligent question-answering (QA) systems have become a research hotspot and attract growing attention from researchers. Unlike QA systems for mainstream languages such as Chinese and English, Tibetan QA systems lack the large volume of structured data needed to support a rich and comprehensive knowledge base. This study focuses on QA data resources drawn from elementary school Tibetan language textbooks and constructs a high-quality Tibetan QA dataset through rule-based filtering, manual correction, and annotation of question intent and similarity. Automatic evaluation and experimental validation show that the questions and answers in the dataset are well correlated in terms of knowledge. Manual evaluation on a three-point scale shows that 98% of the samples conform to the cognitive level of elementary school students and to the rules of Tibetan grammar, and that the question-answer pairs are fluent and highly relevant to each other. Intent classification was performed using BERT with and without fusion of extracted words, and similarity was computed with TF-IDF + BERT. Classification accuracy was about 75% and 76%, respectively, and similarity accuracy was about 76%, which supports the effectiveness of the constructed QA corpus for the elementary school Tibetan language curriculum.
Authors
Qieyang Zhuoma (切羊卓玛)
Shi Haiqiang (石海强)
Kuntharrgyal Khysru (更太加)
Wei Jianguo (魏建国)
(Key Laboratory of Artificial Intelligence Application Technology, State Ethnic Affairs Commission, Qinghai Minzu University, Xining 810007, China; Henan County Middle School for Nationalities, Qinghai Province, Henan 811500, China; Qinghai Two Bombs and One Satellite Cadre College, Xihai Town 810299, China; Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin 300072, China)
Source
Qinghai Science and Technology (《青海科技》)
2023, No. 4, pp. 164-174 (11 pages)
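Funding
National Natural Science Foundation of China (62261045)
Qinghai Province Key R&D and Transformation Program (2022-QY-218)
Innovation project "Construction of an Intelligent Question-Answering Corpus for the Elementary School Tibetan Language Curriculum" (09M2022001)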
Keywords
Elementary school Tibetan language curriculum
Intelligent question answering
Semantic association
QA corpus
Intent classification