Randomized-weights neural networks learn quickly and generalize well with a single hidden layer. The input weights of the hidden layer are generated randomly, so the hidden-layer outputs, computed through a chosen activation function, carry some randomness. The output weights are computed with the pseudo-inverse. Mutual information quantitatively measures the mutual dependence between two variables on the basis of probability theory. In this paper, the hidden-layer outputs that are closely related to the prediction variable are selected with a simple mutual-information-based feature selection method. The hidden nodes with high mutual information values are retained as a new hidden layer, which reduces the size of the hidden layer. The output weights of the new hidden layer are then learned with the pseudo-inverse method. The proposed method is compared with the original randomized algorithms on the concrete compressive strength benchmark dataset.
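As a rough illustration of the node-selection step described in the abstract above, the following Python sketch scores each randomly weighted hidden node by a histogram estimate of its mutual information with the target and keeps the highest-scoring nodes. The function names, the sigmoid activation, the uniform weight range, and the equal-width binning are assumptions made for illustration; the abstract does not specify them. Refitting the output weights on the retained nodes with a pseudo-inverse is not shown.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats; binning is an assumption."""
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant inputs
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def hidden_outputs(X, n_hidden, rng):
    """Sigmoid outputs of a hidden layer with random input weights and biases.

    Returns one list of outputs per hidden node (hypothetical layout).
    """
    n_features = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    return [
        [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(W[j], row)) + b[j])))
         for row in X]
        for j in range(n_hidden)
    ]


def select_nodes(H, y, keep):
    """Return the indices of the `keep` nodes with the highest MI, plus all scores."""
    scores = [mutual_information(h, y) for h in H]
    ranked = sorted(range(len(H)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep]), scores
```

A call such as `select_nodes(hidden_outputs(X, 10, random.Random(0)), y, 4)` returns the indices of four retained nodes; the number of nodes to keep is a free choice in this sketch.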
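Operating-state evaluation means judging, given that the process is producing normally, how good or poor the operating state of the production process is. For complex industrial processes in which quantitative and qualitative information coexist, this paper proposes a random-forest-based method for evaluating the operating state of industrial processes. To address redundancy among the decision trees in a random forest, the trees of a conventional random forest are grouped on the basis of mutual information, and the best tree in each group is selected to form a new random forest. In addition, to strengthen the influence of trees with high evaluation accuracy on the final result and weaken that of trees with low accuracy, a weighted voting mechanism replaces the conventional majority vote, yielding a mutual information weighted random forest (MIWRF) algorithm. For online evaluation, the probability that the online data belong to each grade is computed and combined with the proposed online evaluation strategy to determine the operating-state grade of the current sample. To verify its effectiveness, the method is applied to a hydrometallurgical leaching process. The experimental results show that, compared with the conventional random forest algorithm, MIWRF reduces model complexity while improving the accuracy of operating-state evaluation.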
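The grouping and weighted-voting ideas in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: trees are represented by their label predictions on a validation set, grouping is a greedy pass with a hypothetical mutual-information threshold, and each kept tree is weighted by its validation accuracy. The abstract does not give the actual grouping rule or weighting formula of MIWRF.

```python
import math
from collections import Counter, defaultdict


def discrete_mi(a, b):
    """Plug-in mutual information (nats) between two label sequences."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum(
        (c / n) * math.log(c * n / (pa[x] * pb[y]))
        for (x, y), c in joint.items()
    )


def accuracy(pred, truth):
    """Fraction of positions where the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def prune_by_mi(tree_preds, truth, threshold):
    """Group redundant trees and keep the most accurate tree of each group.

    A tree joins the first group whose founding tree shares at least
    `threshold` nats of MI with it (assumed greedy rule, not from the paper).
    """
    groups = []  # each group is a list of tree indices
    for i, pred in enumerate(tree_preds):
        for g in groups:
            if discrete_mi(pred, tree_preds[g[0]]) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return [max(g, key=lambda i: accuracy(tree_preds[i], truth)) for g in groups]


def weighted_vote(sample_preds, weights):
    """Return the label with the largest total weight across the kept trees."""
    totals = defaultdict(float)
    for label, w in zip(sample_preds, weights):
        totals[label] += w
    return max(totals, key=totals.get)
```

With weights taken from validation accuracy, a single accurate tree can outvote two weak trees that agree with each other, which is the behavior a plain majority vote cannot give.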
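[Background] Keeping up with domain terminology helps track the direction in which a field is developing and reveals its core knowledge and research hotspots. [Objective] To improve the accuracy of domain term extraction, a method based on deep learning and statistical information is proposed. [Methods] First, Chinese patent texts in the domain are represented with character embeddings, using BERT (Bidirectional Encoder Representations from Transformers) to obtain character-level vector representations as the model input. Next, a BiLSTM-CRF (Bidirectional Long Short Term Memory-Conditional Random Field) deep learning model extracts semantic features from the serialized text and produces a label sequence for domain terms. Finally, the mutual information and the left and right entropy of compound-structure terms are computed together and combined with a domain knowledge base to correct the extraction results. [Results] In experiments on the domain of lithium extraction from salt lakes ("盐湖提锂"), the BERT-BiLSTM-CRF model reaches an accuracy of 77.33% for term extraction, and correcting the extraction results raises the accuracy by a further 3.68%, showing that the method is effective for domain term extraction.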
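The two statistics used in the correction step, mutual information and left and right entropy, can be sketched in Python for a two-token candidate term. This is a minimal illustration on pre-tokenized text: pointwise mutual information stands in for the mutual information score, and the function names and toy token lists are hypothetical. The abstract does not state the exact formulas or how the scores are combined with the knowledge base.

```python
import math
from collections import Counter


def pmi(corpus, left, right):
    """Pointwise mutual information (nats) of the adjacent pair (left, right)."""
    unigrams = Counter(tok for sent in corpus for tok in sent)
    bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    p_pair = bigrams[(left, right)] / n_bi
    if p_pair == 0:
        return float("-inf")  # the pair never occurs together
    return math.log(p_pair / ((unigrams[left] / n_uni) * (unigrams[right] / n_uni)))


def neighbor_entropy(corpus, left, right):
    """Entropy of the tokens seen immediately left and right of the pair.

    High values on both sides suggest the pair is used freely in many
    contexts, which is commonly read as evidence of a complete term.
    """
    lefts, rights = Counter(), Counter()
    for sent in corpus:
        for i in range(len(sent) - 1):
            if sent[i] == left and sent[i + 1] == right:
                if i > 0:
                    lefts[sent[i - 1]] += 1
                if i + 2 < len(sent):
                    rights[sent[i + 2]] += 1

    def entropy(counts):
        total = sum(counts.values())
        if not total:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counts.values())

    return entropy(lefts), entropy(rights)
```

A candidate with high PMI but low entropy on one side is often a fragment of a longer term, so thresholds on both statistics are a common way to filter or extend candidates.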