Abstract
To reduce the reliance on manual labor for managing credit information documents in construction credit management, this paper proposes a text classification method, based on natural language processing (NLP), for information on the bad-credit behavior of construction enterprises. First, word vector representations of the text are obtained with a Skip-Gram word vector model trained on the labeled data together with a large amount of unlabeled data. Second, a bidirectional long short-term memory network (BiLSTM) incorporating an attention mechanism is used for feature extraction and text classification on the labeled data. The results show that, in small-sample training, training the word vector model on a larger corpus effectively improves the performance of the text classification model; that the BiLSTM-Attention model outperforms the comparison models; and that the NLP-based method enables fast, automatic classification of information on the bad-credit behavior of construction enterprises.
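The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `skipgram_pairs` and `attention_pool`, the window size, the dot-product scoring against a `query` vector, and the sample tokens are all assumptions made for illustration. The first function builds the (center, context) pairs a Skip-Gram model is trained on; the second shows one common form of attention pooling over per-token hidden states such as those a BiLSTM would produce.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Build (center, context) training pairs as used by a Skip-Gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def attention_pool(hidden_states, query):
    """Softmax-weighted sum of per-token hidden states.

    Assumption: scores are dot products with a query vector; the paper's
    exact attention formulation is not given in the abstract.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # Hypothetical tokenized record, for illustration only.
    tokens = ["contractor", "delayed", "wage", "payment"]
    print(skipgram_pairs(tokens, window=1))

    # Hypothetical 2-dimensional hidden states for three tokens.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = attention_pool(states, query=[0.5, 0.5])
    print(weights, pooled)
```

In a full pipeline the pooled vector would feed a classification layer, and the word vectors would come from a trained embedding model rather than hand-written numbers.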
Authors
张振森
任宇轩
曹吉昌
ZHANG Zhensen; REN Yuxuan; CAO Jichang (School of Management Engineering, Qingdao University of Technology, Qingdao 266525)
Source
《九江学院学报(自然科学版)》
CAS
2024, Issue 3, pp. 99-105, 109 (8 pages in total)
Journal of Jiujiang University: Natural Science Edition
Funding
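National Natural Science Foundation of China (No. 72001121)
One of the research outcomes of a project commissioned by the Ministry of Housing and Urban-Rural Development (No. JXXTH-2023-103).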