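Abstract: This paper presents a term extraction method based on the longest common substring (LCS) algorithm. Domain documents are first split at punctuation marks; all longest common substrings of the resulting sentence fragments are then extracted as the candidate term set; finally, the candidates are screened by rules that include stop-word filtering, checking against a domain word list, and filtering of nested substrings of terms, which yields the final term set. An experiment on term extraction in the preschool education domain shows that the algorithm extracts Chinese domain terms effectively: the average precision is 84.2%, and extraction works especially well for two-word terms of 4 to 6 characters, where precision approaches 100%.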
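The first two steps of the pipeline (splitting at punctuation, then collecting longest common substrings of fragment pairs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the punctuation set `PUNCT`, the `min_len` cutoff, and the pairwise comparison of all fragments are assumptions, and only the stop-word filter is shown (the domain-word and nested-substring filters are omitted).

```python
import re
from itertools import combinations

# Assumed punctuation set used to split documents into fragments.
PUNCT = r"[,。;:、!?,.;:!?\s]+"


def longest_common_substrings(a: str, b: str) -> set:
    """Return every longest common substring of a and b (empty set if none)."""
    best = 0
    found = set()
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # cur[j] = length of the common suffix of a[:i] and b[:j]
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
                    found = {a[i - best:i]}
                elif cur[j] == best:
                    found.add(a[i - best:i])
        prev = cur
    return found


def candidate_terms(text: str, stopwords=frozenset(), min_len: int = 2) -> set:
    """Split text at punctuation and collect LCS candidates from fragment pairs."""
    segments = [s for s in re.split(PUNCT, text) if s]
    candidates = set()
    for a, b in combinations(segments, 2):
        for sub in longest_common_substrings(a, b):
            # Hypothetical filter: drop very short strings and stop words.
            if len(sub) >= min_len and sub not in stopwords:
                candidates.add(sub)
    return candidates
```

Comparing every pair of fragments is quadratic in the number of fragments, so a real system would likely need indexing or sampling for large corpora.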
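Abstract: Similarity queries over data streams are widely used in fields such as smart homes and environmental monitoring. Few existing studies use LCSS (longest common subsequence) as the similarity measure. This paper studies LCSS-based similarity queries over data streams. The NAIVE algorithm computes the value of the measure with basic dynamic programming and obtains the query result by comparing that value with the similarity threshold. Its drawback is that the result is available only after every cell of the dynamic programming matrix has been computed. To address this, the paper proposes a query processing algorithm based on a PS (possible solution)-CC (column critical) domain optimization strategy. The algorithm delimits the PS domain and the CC domain of the dynamic programming matrix on each window and exploits both the properties of the cells in these two domains and the characteristics of similarity queries, so that the query result can be obtained without the final value of the measure and the computation of many matrix cells is skipped. Experiments demonstrate the effectiveness of the algorithm; compared with similar algorithms, it performs better on queries that require higher-precision results.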
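To make the contrast concrete, the sketch below shows a NAIVE-style check (full dynamic programming, then a threshold comparison) next to a simple row-wise early exit. The early exit is a generic bound chosen for illustration and is not the PS-CC strategy, whose details the abstract does not give. The matching tolerance `eps`, the use of an absolute length threshold, and all function names are assumptions.

```python
def lcss_length(x, y, eps=0.0):
    """Length of the longest common subsequence; values match if |xi - yj| <= eps."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            if abs(xi - yj) <= eps:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def naive_similar(x, y, threshold, eps=0.0):
    """NAIVE-style query: fill the whole matrix, then compare with the threshold."""
    return lcss_length(x, y, eps) >= threshold


def early_exit_similar(x, y, threshold, eps=0.0):
    """Same answer as naive_similar, but stops as soon as the outcome is decided."""
    if threshold <= 0:
        return True
    prev = [0] * (len(y) + 1)
    for i, xi in enumerate(x, 1):
        cur = [0]
        for j, yj in enumerate(y, 1):
            if abs(xi - yj) <= eps:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        best = cur[-1]
        if best >= threshold:
            return True  # the value can only grow in later rows
        if best + (len(x) - i) < threshold:
            return False  # each remaining row adds at most one match
        prev = cur
    return False
```

Both functions return the same Boolean for any input; the second simply avoids filling rows that can no longer change the answer.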
Funding: Supported by the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 111-2221-E-390-012 and MOST 111-2622-E-390-001.
Abstract: Our previous work generated new programs with the code transformation model GPT-2 and verified the generated code using the simhash (SH) and longest common subsequence (LCS) algorithms. However, the overall code transformation process proved time-consuming. The objective of this study is therefore to speed up the code transformation process significantly. This paper proposes deep learning approaches that modify SH into a variational simhash (VSH) algorithm and replace LCS with a piecewise longest common subsequence (PLCS) algorithm to speed up the verification process in the test phase. Besides the code transformation model GPT-2, this study also introduces Microsoft MASS and Facebook BART for a comparative analysis of their performance. In addition, the explainable AI technique of local interpretable model-agnostic explanations (LIME) is used to interpret the decision-making of the AI models. The experimental results show that VSH can reduce the number of qualified programs by 22.11%, and PLCS can reduce the execution time of selected pocket programs by 32.39%. As a result, the proposed approaches speed up the entire code transformation process by 1.38 times on average compared with our previous work.
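For orientation, the two baseline checks named in the abstract, simhash and LCS, can be sketched in their textbook forms. This is not the VSH or PLCS variant proposed in the paper, whose details the abstract does not describe. The tokenization, the use of MD5 as the per-token hash, the 64-bit fingerprint width, and the ratio definition are all assumptions made for illustration.

```python
import hashlib


def simhash(tokens, bits=64):
    """Textbook simhash: sign of the per-bit vote over token hashes.

    bits is assumed to be a multiple of 8 and at most 128 (MD5 digest size).
    """
    weights = [0] * bits
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for k in range(bits):
            weights[k] += 1 if (h >> k) & 1 else -1
    return sum(1 << k for k in range(bits) if weights[k] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for ai in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ai == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_ratio(a, b):
    """LCS length normalized by the longer sequence (an assumed definition)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0
```

A small Hamming distance between fingerprints is a cheap first screen, while the LCS ratio is the costlier quadratic-time comparison, which is consistent with the abstract's motivation for accelerating the LCS step.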