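Improving synonym mining usually requires an existing synonym lexicon for the target domain. Such lexicons are extremely scarce, which hinders model optimization. To address the difficulty of continuously improving in-domain synonym mining when no domain synonym lexicon is available, a synonym mining model based on active learning and continual learning (SYN-AC) is proposed. First, expert-labeled data are obtained through active learning, and a new loss function is designed to fine-tune the model on the labeled data. Second, to reduce time and space costs, continual learning is adopted so that the model keeps improving its synonym mining performance while training only on the currently labeled group of data, rather than being fine-tuned again on all labeled data each time. Three datasets are used to simulate the expert labeling process. Experimental results show that on two of them SYN-AC improves the F1 score by 9.34 and 2.75 percentage points, respectively, over the best-performing BERT (Bidirectional Encoder Representations from Transformers) model, which verifies that SYN-AC can effectively improve synonym mining.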
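The labeling workflow described in this abstract can be illustrated with a minimal sketch. It is not the SYN-AC implementation: the abstract does not say how candidates are chosen for expert labeling or what the new loss function looks like, so the sketch swaps in plain uncertainty sampling (picking the candidate pairs whose predicted synonym probability is closest to 0.5), and the names `select_most_uncertain`, `labeling_rounds`, and `oracle` are hypothetical. Each yielded group stands for "the currently labeled group" on which the model alone would be fine-tuned in that round.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Pair = Tuple[str, str]  # a candidate synonym pair (hypothetical representation)


def select_most_uncertain(probs: Dict[Pair, float], k: int) -> List[Pair]:
    """Return the k pairs whose predicted synonym probability is closest to 0.5.

    Assumption: uncertainty sampling stands in for the selection strategy,
    which the abstract does not specify.
    """
    ranked = sorted(probs, key=lambda pair: abs(probs[pair] - 0.5))
    return ranked[:k]


def labeling_rounds(
    probs: Dict[Pair, float],
    oracle: Callable[[Pair], bool],
    k: int,
    rounds: int,
) -> Iterator[List[Tuple[Pair, bool]]]:
    """Yield one newly labeled group per round.

    In a continual-learning setup, each group would be the only data used
    for that round's fine-tuning. A real system would also re-score the
    remaining candidates with the updated model; this sketch keeps the
    initial scores fixed for simplicity.
    """
    remaining = dict(probs)
    for _ in range(rounds):
        batch = select_most_uncertain(remaining, k)
        if not batch:
            return
        # The oracle simulates the expert who labels each selected pair.
        yield [(pair, oracle(pair)) for pair in batch]
        for pair in batch:
            del remaining[pair]
```

Here `oracle` plays the role of the simulated expert: in an experiment like the one the abstract describes, it would look up the gold label in an existing dataset.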
Folk documents are important sources for the study of the history of Chinese vocabulary and of Chinese lexicology. This study starts from three groups of synonym clusters found in Huizhou contract documents, which represent the conceptual fields of "delay" (推托), "regret" (怨悔), and "upward" (向上). The members of each conceptual field can be further divided into several subgroups distinguished by differential sememes. The academic value of studying synonym clusters in the vocabulary of Huizhou documents lies in providing a basis for further exploring the richness of synonym-cluster material in folk documents, the feasibility of systematically distinguishing the differences among synonyms, and the systematic nature of the Chinese vocabulary.