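Abstract: Chinese text in the electric power domain contains many domain-specific terms, and general-purpose word segmentation performs poorly on it. To address this, a Chinese word segmentation (CWS) method for the power domain based on an improved BERT (Bidirectional Encoder Representation from Transformers) is proposed. First, two dictionaries are constructed, one covering general words and one covering domain words, and a dual-dictionary matching and fusion mechanism is designed to integrate word features directly into the BERT model, so that the model exploits external knowledge more effectively. Second, the DEEPNORM method is introduced to improve the model's feature-extraction capability, and the Bayesian information criterion (BIC) is used to determine the optimal model depth, allowing BERT to be stably deepened to 40 layers. Finally, the standard self-attention layers in BERT are replaced with ProbSparse self-attention layers, and the particle swarm optimization (PSO) algorithm is used to find the optimal sampling factor, reducing model complexity while keeping performance unchanged. Segmentation performance was tested on a manually annotated dataset of power-domain patent texts. Experimental results show that the proposed method achieves an F1 score of 92.87% on this dataset, which is 14.70, 9.89, and 3.60 percentage points higher than the hidden Markov model (HMM), the multi-criteria segmentation model METASEG (pre-training model with META learning for Chinese word SEGmentation), and the lexicon-enhanced BERT (LEBERT) model, respectively. These results confirm that the proposed method effectively improves segmentation quality for Chinese text in the power domain.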
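The abstract reports a word-level F1 score but does not spell out how it is computed. A common CWS convention, assumed here rather than taken from the paper, is to treat each word as a character span `(start, end)`: a predicted word counts as correct only when its exact span also appears in the gold segmentation. The helper names `to_spans` and `segmentation_prf` and the example sentence are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import List, Set, Tuple


# Hypothetical helper: map a segmented sentence to the set of
# (start, end) character offsets covered by its words.
def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


# Hypothetical helper: corpus-level word precision, recall and F1.
# A predicted word is correct only if its exact span is in the gold segmentation.
def segmentation_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        if "".join(g) != "".join(p):
            raise ValueError("gold and predicted segmentations cover different text")
        g_spans, p_spans = to_spans(g), to_spans(p)
        correct += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented power-domain example: the model wrongly splits 变压器 ("transformer").
    gold = [["变压器", "绕组", "温度", "升高"]]
    pred = [["变压", "器", "绕组", "温度", "升高"]]
    p, r, f = segmentation_prf(gold, pred)
    print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")  # P=0.6000 R=0.7500 F1=0.6667
```

Splitting a single domain term costs both precision (two wrong predicted words) and recall (one missed gold word), which is why out-of-vocabulary domain terms hurt F1 so directly.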
Funding: Supported by the National Natural Science Foundation of China (90814006).
Abstract: The Xiaoqinling District is an important gold-producing area in China, ranking second only to Jiaodong in gold deposits. The uplift history of the Wenyu granitic pluton and of the deposit wall rocks, together with the mineralization depth and the preservation position of the gold ore bodies, is significant for ore exploration. Fission-track (FT) analysis of zircons and apatites from granitic rocks of the Wenyu pluton shows that apatite FT (AFT) data modeling indicates rapid cooling at 20°C/Ma from 138 to 120 Ma, following emplacement at 138 Ma. Thermal-evolution and inversion curves suggest a second phase of rapid cooling and uplift, from 45 to 35 Ma and at 35 Ma, respectively, with a cooling rate of 6.7°C/Ma and a denudation of ~4.3 km. The most recent cooling phase began at <4 Ma, with an average cooling rate of ~11.3°C/Ma and a denudation of 1.3 km. The total exhumation of 5.6 km and uplift of 7.3 km are similar to the estimates derived from fluid inclusions in the Dongtongyu and Wenyu gold deposits. ⁴⁰Ar/³⁹Ar dating of sericite from the fault planes of the Xunmadao-Xiaohe and Taiyao faults reveals two uplift events of the ore-hosting metamorphic complex. Intense uplift of the Huashan and Wenyu granitic plutons occurred at 77 and 45 Ma, respectively. These data are valuable for understanding the uplift process and the preservation of gold ore bodies in the Xiaoqinling area, and for further studies on the tectonic evolution of the Taihua Complex and the Qinling-Dabie Orogen.
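The phase figures in the abstract can be cross-checked with simple arithmetic: under a constant average cooling rate, the temperature drop over a phase is rate × duration, and the reported 5.6 km total exhumation matches the sum of the 4.3 km and 1.3 km phase denudations. The sketch below only does this bookkeeping; it is not the authors' AFT thermal modeling. The `CoolingPhase` class, the phase names, and the choice to treat "<4 Ma" as 4 Ma to the present (so that phase's cooling is an upper bound) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CoolingPhase:
    # Hypothetical container for one cooling phase reported in the abstract.
    name: str
    start_ma: float  # older bound of the phase (Ma)
    end_ma: float  # younger bound of the phase (Ma)
    rate_c_per_ma: float  # average cooling rate (degC/Ma)
    denudation_km: Optional[float] = None  # reported denudation, if given

    def cooling_c(self) -> float:
        # Temperature drop if the average rate held over the whole phase.
        return self.rate_c_per_ma * (self.start_ma - self.end_ma)


PHASES = [
    CoolingPhase("post-emplacement", 138.0, 120.0, 20.0),
    CoolingPhase("second phase", 45.0, 35.0, 6.7, 4.3),
    # Assumption: "<4 Ma" is treated as 4 Ma to the present, so the
    # cooling computed for this phase is an upper bound.
    CoolingPhase("latest phase", 4.0, 0.0, 11.3, 1.3),
]


def total_denudation_km(phases: Iterable[CoolingPhase]) -> float:
    # Sum of the denudation values reported per phase.
    return sum(p.denudation_km for p in phases if p.denudation_km is not None)


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase.name}: ~{phase.cooling_c():.1f} degC of cooling")
    print(f"total denudation: {total_denudation_km(PHASES):.1f} km")  # 5.6 km
```

Running it gives about 360, 67, and at most 45.2 degC of cooling for the three phases; the 7.3 km uplift figure is not a sum of the listed phases and is therefore not reproduced here.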