Abstract
Research on darknet markets (DNMs) plays a vital role for cybersecurity practitioners. At the same time, named entity recognition (NER) on DNM text is challenging because of the inherent characteristics of that text. This paper proposes DNER, a named entity recognition system for darknet market text. DNER uses a convolutional neural network (CNN) to learn the morphological features of words from character embeddings and combines them with word embeddings, so that the system learns from both word-level and character-level features. It applies a bidirectional long short-term memory network (BiLSTM) to NER on darknet market text and adopts a conditional random field (CRF) model to constrain the sequence labels. In addition, the corpus is tagged with part-of-speech labels. Finally, DNER (the CNN-BiLSTM-CRF model) is compared with other baseline NER models on the darknet market corpus. The experimental results show that DNER achieves 98.59% precision, 93.82% recall, and an F1 score of 96.15% on the DNM corpus.
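As a quick consistency check on the reported figures, the sketch below recomputes F1 from the stated precision and recall, assuming the standard definition of F1 as their harmonic mean. The function name `f1_score` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, in percent.
precision, recall = 98.59, 93.82
f1 = f1_score(precision, recall)
print(f"{f1:.2f}")  # prints 96.15
```

Under that assumption, the recomputed value agrees with the 96.15% F1 score stated in the abstract.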
Authors
范晓霞
周安民
郑荣锋
李孟铭
Fan Xiaoxia; Zhou Anmin; Zheng Rongfeng; Li Mengming (School of Cyber Science and Engineering, Sichuan University, Chengdu 610065; College of Electronics and Information Engineering, Sichuan University, Chengdu 610065)
Source
《信息安全研究》
2021, No. 1, pp. 37-43 (7 pages)
Journal of Information Security Research
Keywords
darknet market
named entity recognition
bidirectional long short-term memory
convolutional neural network
conditional random field