Journal Articles
1,462 articles found
Chinese word segmentation with local and global context representation learning (Cited by 2)
1
Authors: 李岩, Zhang Yinghua, Huang Xiaoping, Yin Xucheng, Hao Hongwei. High Technology Letters, EI CAS, 2015, No. 1, pp. 71-77 (7 pages)
A local and global context representation learning model for Chinese characters is designed and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semantics of local context and global context to learn the representation of Chinese characters. Then, a Chinese word segmentation model is built with a neural network, and the segmentation model is trained with the character representations as its input features. Finally, experimental results show that the Chinese character representations effectively capture semantic information: characters with similar semantics cluster together in the visualized space. Moreover, the proposed Chinese word segmentation model achieves a considerable improvement in precision, recall and F-measure.
Keywords: local and global context representation learning; Chinese character representation; Chinese word segmentation
An Improved Unsupervised Approach to Word Segmentation
2
Authors: WANG Hanshi, HAN Xuhong, LIU Lizhen, SONG Wei, YUAN Mudan. China Communications, SCIE CSCD, 2015, No. 7, pp. 82-95 (14 pages)
ESA is an unsupervised approach to word segmentation previously proposed by Wang, an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose ExESA, an extension of ESA. In ExESA, the original approach is extended to a 2-pass process, and the ratio of different word lengths is introduced as a third type of information combined with cohesion and separation. A maximum strategy is adopted to determine the best segmentation of a character sequence in the Selection phase. Besides, in Adjustment, ExESA re-evaluates separation information and individual information to overcome overestimated frequencies. Additionally, a smoothing algorithm is applied to alleviate data sparseness. The experimental results show that ExESA further improves performance and saves time by properly exploiting more information from unannotated corpora. Moreover, the parameters of ExESA can be predicted by a set of empirical formulae or combined with the minimum description length principle.
Keywords: word segmentation; character sequence; smoothing algorithm; maximum strategy
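The cohesion statistic that ESA-style unsupervised segmenters weigh against separation can be sketched as a minimum pointwise-mutual-information score over character n-grams from unannotated text. This is an illustrative sketch, not the paper's ExESA implementation; the function name and scoring rule are assumptions:

```python
from collections import Counter
from math import log

def cohesion_scores(corpus, max_len=4):
    """Score candidate character n-grams by internal cohesion:
    the minimum PMI over all binary splits of the n-gram."""
    counts = Counter()
    total = 0
    for sent in corpus:
        for n in range(1, max_len + 1):
            for i in range(len(sent) - n + 1):
                counts[sent[i:i + n]] += 1
        total += len(sent)

    def p(s):
        return counts[s] / total

    scores = {}
    for gram in counts:
        if len(gram) < 2:
            continue
        # a string is cohesive only if every split point has high PMI
        scores[gram] = min(
            log(p(gram) / (p(gram[:i]) * p(gram[i:])))
            for i in range(1, len(gram))
        )
    return scores
```

Candidate strings whose every binary split scores high (e.g. 北京) are likely words; the Selection and Adjustment phases would then pick and re-weight segmentations using such evidence.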
Applying rough sets in word segmentation disambiguation based on maximum entropy model
3
Authors: 姜维, 王晓龙, 关毅, 梁国华. Journal of Harbin Institute of Technology (New Series), EI CAS, 2006, No. 1, pp. 94-98 (5 pages)
To solve the complicated feature extraction and long-distance dependency problems in Word Segmentation Disambiguation (WSD), this paper proposes to apply rough sets in WSD based on the Maximum Entropy model. Firstly, rough set theory is applied to extract complicated features and long-distance features, even from noisy or inconsistent corpora. Secondly, these features are added to the Maximum Entropy model, so that feature weights can be assigned according to the performance of the whole disambiguation model. Finally, a semantic lexicon is adopted to build class-based rough set features to overcome data sparseness. Experiments indicated that our method outperformed previous models and achieved top rank in WSD in the 863 Evaluation in 2003. This system ranked first and second respectively in the MSR and PKU open tests of the Second International Chinese Word Segmentation Bakeoff held in 2005.
Keywords: word segmentation; feature extraction; rough sets; maximum entropy
Design and Implementation of a New Chinese Word Segmentation Dictionary for the Personalized Mobile Search
4
Authors: Zhongmin Wang, Jingna Qi, Yan He. Communications and Network, 2013, No. 1, pp. 81-85 (5 pages)
Chinese word segmentation is the basis of natural language processing. The dictionary mechanism significantly influences the efficiency of word segmentation and the understanding of the user's intention implied in the user's query. As traditional dictionary mechanisms cannot meet the present needs of personalized mobile search, this paper presents a new dictionary mechanism that contains word classification information. The paper further puts forward an approach for improving the traditional word bank structure and proposes an improved FMM segmentation algorithm. The results show that the new dictionary mechanism significantly increases query efficiency and better meets the user's individual requirements.
Keywords: Chinese word segmentation; dictionary mechanism; natural language processing; personalized search; word classification information
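The paper proposes an improved FMM (forward maximum matching) algorithm; the baseline FMM procedure that such dictionary mechanisms build on can be sketched as follows (a minimal sketch assuming a plain set-of-words dictionary, not the paper's classified word bank):

```python
def fmm_segment(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position greedily take the
    longest dictionary word, falling back to a single character."""
    result, i = [], 0
    while i < len(text):
        for n in range(min(max_word_len, len(text) - i), 0, -1):
            word = text[i:i + n]
            if n == 1 or word in dictionary:
                result.append(word)
                i += n
                break
    return result
```

The dictionary's lookup structure dominates the cost of the inner loop, which is why the paper focuses on the word bank organization rather than the matching loop itself.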
Improvement in Accuracy of Word Segmentation of a Web-Based Japanese-to-Braille Translation Program for Medical Information
5
Authors: Tsuyoshi Oda, Aki Sugano, Masashi Shimbo, Kenji Miura, Mika Ohta, Masako Matsuura, Mineko Ikegami, Tetsuya Watanabe, Shinichi Kita, Akihiro Ichinose, Eiichi Maeda, Yuji Matsumoto, Yutaka Takaoka. 《通讯和计算机(中英文版)》, 2013, No. 1, pp. 82-89 (8 pages)
Keywords: medical information; translation program; Web; Braille; word segmentation; accuracy; natural language processing; proper nouns
Remove Redundancy Samples for SVM in A Chinese Word Segmentation Task
6
Authors: Feiliang Ren, Tianshun Yao. 《通讯和计算机(中英文版)》, 2006, No. 5, pp. 103-107 (5 pages)
Keywords: text processing; variable-parameter systems; software development; data processing
Chinese to Braille Translation Based on Braille Word Segmentation Using Statistical Model (Cited by 2)
7
Authors: 王向东, 杨阳, 张金超, 姜文斌, 刘宏, 钱跃良. Journal of Shanghai Jiaotong University (Science), EI, 2017, No. 1, pp. 82-86 (5 pages)
Automatic translation of Chinese text to Chinese Braille is important for blind people in China to acquire information using computers or smart phones. In this paper, a novel scheme of Chinese-Braille translation is proposed. Under the scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is carried out with the statistical model directly, without a separate Chinese word segmentation stage. This method avoids hand-crafting rules concerning syntactic and semantic information, using the statistical model to learn such rules implicitly and automatically. To further improve performance, an algorithm for fusing the results of Chinese word segmentation and Braille word segmentation is also proposed. Our results show that the proposed method achieves an accuracy of 92.81% for Braille word segmentation and considerably outperforms current approaches using the segmentation-merging scheme.
Keywords: Chinese Braille; word segmentation; perceptron algorithm
Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node (Cited by 2)
8
Authors: Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. Journal of Computer Science & Technology, SCIE EI CSCD, 2020, No. 5, pp. 1115-1126 (12 pages)
Semi-Markov conditional random fields (Semi-CRFs) have been successfully utilized in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable with CRF-based models under similar computational complexity. Specifically, we first adopt a bi-directional long short-term memory (BiLSTM) network on the character level to model context information, and then use a simple but effective fusion layer to represent segment information. Besides, to model arbitrarily long segments within linear time complexity, we also propose a new model named Semi-CRF-Relay. The direct modeling of segments makes combination with word features easy, and CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of our proposed methods. The source code and pre-trained embeddings of this paper are available at https://github.com/fastnlp/fastNLP/.
Keywords: semi-Markov conditional random field (Semi-CRF); Chinese word segmentation; bi-directional long short-term memory; deep learning
Word Segmentation Based on Database Semantics in NChiql (Cited by 2)
9
Authors: 孟小峰, 刘爽, 王珊. Journal of Computer Science & Technology, SCIE EI CSCD, 2000, No. 4, pp. 346-354 (9 pages)
In this paper a novel word-segmentation algorithm is presented to delimit words in Chinese natural language queries in the NChiql system, a Chinese natural language query interface to databases. Although there is a sizable literature on Chinese segmentation, existing methods cannot satisfy the particular requirements of this system. The novel word-segmentation algorithm is based on database semantics, namely a Semantic Conceptual Model (SCM) for specific domain knowledge. Based on SCM, the segmenter labels words with database semantics directly, which eases disambiguation and translation (from natural language to database query) in NChiql.
Keywords: database query; natural language processing; word segmentation; disambiguation
Construction of Word Segmentation Model Based on HMM+BI-LSTM
10
Authors: Hang Zhang, Bin Wen. 《国际计算机前沿大会会议论文集》, 2020, No. 2, pp. 47-61 (15 pages)
Chinese word segmentation plays an important role in search engines, artificial intelligence, machine translation and so on. There are currently three main kinds of word segmentation algorithms: dictionary-based, statistics-based, and understanding-based. However, few studies combine all three methods, or even two of them. Therefore, a Chinese word segmentation model is proposed that combines a statistical algorithm with an understanding-based algorithm: it couples Hidden Markov Model (HMM) segmentation with BiLSTM segmentation to improve accuracy. The main method is to collect lexical statistics on the outputs of the two segmenters, choose the better result on the basis of those statistics, and combine them into the final segmentation. This combined model was evaluated on the MSRA corpus provided by Bakeoff. Experiments show that the accuracy of its segmentation results is 12.52% higher than that of the traditional HMM model and 0.19% higher than that of the BiLSTM model.
Keywords: Chinese word segmentation; HMM; BiLSTM; sequence tagging
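The combination step, choosing between the HMM and BiLSTM outputs by lexical statistics, is only described at a high level in the abstract. One plausible sketch, with an assumed frequency-weighted score (the function name and scoring rule are illustrative, not the paper's exact rule):

```python
def fuse_segmentations(seg_a, seg_b, word_counts):
    """Pick, per sentence, the candidate segmentation whose words are
    better attested in a word-frequency table (ties favour seg_a)."""
    def score(words):
        # frequent, multi-character words earn more credit per token
        return sum(word_counts.get(w, 0) * len(w) for w in words) / len(words)
    return seg_a if score(seg_a) >= score(seg_b) else seg_b
```

A real system would derive `word_counts` from the training corpus, so the fusion consistently prefers the segmenter whose output matches attested vocabulary.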
Effective Analysis of Chinese Word-Segmentation Accuracy
11
Author: MA Weiyin. 《现代电子技术》, 2007, No. 4, pp. 108-110 (3 pages)
Automatic word segmentation is widely used for ambiguity resolution when processing large-scale real text, but during unknown-word detection in Chinese word segmentation, many detected word candidates are invalid. These false unknown-word candidates deteriorate the overall segmentation accuracy, as they affect the segmentation accuracy of known words. In this paper, we propose several methods for reducing the difficulty and improving the accuracy of word segmentation of written Chinese, such as full segmentation of a sentence, processing of reduplicated words and idioms, and statistical identification of unknown words. A simulation shows the feasibility of our proposed methods in improving the accuracy of Chinese word segmentation.
Keywords: Chinese information processing; Chinese character processing; automatic segmentation; efficiency analysis
Research on Automatic Summary Generation for Long Documents Based on Word2Vec and TextRank Algorithms (Cited by 1)
12
Authors: 朱玉婷, 刘乐, 辛晓乐, 陈珑慧, 康亮河. 《现代信息科技》, 2023, No. 4, pp. 36-38, 42 (4 pages)
In recent years, extracting key information from large volumes of information has become a pressing problem. For long Chinese patent documents, a summarization algorithm combining Word2Vec and TextRank is proposed. First, the Python Jieba library is used to segment the Chinese patent documents, and a stop-word dictionary removes meaningless words. Next, the Word2Vec algorithm performs feature extraction, and WordCloud visualizes the extracted keywords. Finally, the TextRank algorithm computes inter-sentence similarity to generate candidate summary sentences, and the patent document's summary is produced according to the candidates' weights. Experiments show that the patent summaries generated with Word2Vec and TextRank are of high quality and strong coverage.
Keywords: Jieba word segmentation; keyword extraction; Word2Vec algorithm; TextRank algorithm
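The TextRank stage, ranking sentences by iterating over a similarity graph, can be sketched in plain Python. The damping factor 0.85 and the token-overlap similarity follow the original TextRank formulation; the function names and the choice of similarity are assumptions, not this paper's exact configuration:

```python
from math import log

def textrank_sentences(sentences, sim, d=0.85, iters=50):
    """Rank sentences by power iteration over a similarity-weighted
    graph; sim(a, b) must return a non-negative weight."""
    n = len(sentences)
    w = [[sim(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in w]  # guard isolated nodes
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                    for j in range(n))
                  for i in range(n)]
    return sorted(range(n), key=lambda i: -scores[i])

def overlap_sim(a, b):
    """Token-overlap similarity from the original TextRank paper."""
    a, b = set(a), set(b)
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (log(len(a)) + log(len(b)))
```

The top-ranked indices become the candidate summary sentences; in the paper's pipeline the token lists would come from Jieba after stop-word removal.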
Word Segmentation in Incidental Word Learning During Reading: Different Roles of the Positional Probabilities of Initial and Final Morphemes (Cited by 1)
13
Authors: 梁菲菲, 冯琳琳, 刘瑛, 李馨, 白学军. 《心理学报》, CSCD, PKU Core, 2024, No. 3, pp. 281-294 (14 pages)
Through two parallel experiments, this study examined how the positional-probability information of initial and final morphemes shapes word segmentation as novel words are learned repeatedly. Using an incidental-word-learning-during-reading paradigm with two-character pseudowords as novel words, Experiment 1 manipulated the positional probability of the initial morpheme while holding the final morpheme constant; Experiment 2 manipulated the positional probability of the final morpheme while holding the initial morpheme constant. Eye movements of university students during reading were recorded with an eye tracker. The results showed: (1) the word-segmentation effect of both initial- and final-morpheme positional probability gradually diminished as the number of learning encounters with the novel words increased, showing a "familiarity effect". (2) For the initial morpheme, the familiarity effect appeared in two relatively late eye-movement measures, regression-path duration and total fixation count, whereas for the final morpheme it began with gaze duration, continued through regression-path duration, and persisted into total fixation time. The results indicate that the positional-probability information of both morphemes contributes to word segmentation during incidental word learning in reading, but the initial morpheme acts over a longer and more stable time course, supporting the view that the initial morpheme holds an advantage in two-character word processing.
Keywords: morpheme positional probability; word segmentation; incidental word learning during reading; Chinese reading
Research on Word Segmentation of Livestock and Poultry Disease Texts Based on a BERT-BiLSTM-CRF Model (Cited by 2)
14
Authors: 余礼根, 郭晓利, 赵红涛, 杨淦, 张俊, 李奇峰. 《农业机械学报》, EI CAS CSCD, PKU Core, 2024, No. 2, pp. 287-294 (8 pages)
To address the scarcity of corpora for livestock and poultry disease texts and the large numbers of out-of-vocabulary disease names and phrases they contain, a BERT-BiLSTM-CRF segmentation model combined with dictionary matching is proposed. Taking sheep diseases as the study object, a dataset of common disease texts was built and combined with the general-purpose PKU corpus. A pre-trained BERT (Bidirectional Encoder Representations from Transformers) language model produces vector representations of the text; a bidirectional long short-term memory network (BiLSTM) captures contextual semantic features; and a conditional random field (CRF) outputs the globally optimal label sequence. On this basis, a domain dictionary of livestock and poultry diseases is matched after the CRF layer to correct the segmentation, reducing ambiguous splits caused by disease names and phrases and further improving accuracy. Experiments show that the dictionary-matched BERT-BiLSTM-CRF model reaches an F1 of 96.38% on the common sheep-disease dataset, improvements of 11.01, 10.62, 8.3 and 0.72 percentage points over the jieba segmenter, a BiLSTM-Softmax model, a BiLSTM-CRF model, and the same model without dictionary matching, respectively, verifying the method's effectiveness. Compared with a single corpus, the mixed corpus combining PKU and the sheep-disease dataset accurately segments both specialized disease terminology and common words in disease texts, with F1 above 95% on both the general corpus and the disease dataset, showing good generalization. The method is applicable to word segmentation of livestock and poultry disease texts.
Keywords: livestock and poultry diseases; text segmentation; pre-trained language model; bidirectional long short-term memory network; conditional random field
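The dictionary-matching correction applied after the CRF layer is described only at a high level. A minimal sketch of one such post-pass, re-merging adjacent tokens whose concatenation is a known domain term, assuming a simple set-based dictionary (the function name and merging rule are illustrative):

```python
def dictionary_merge(tokens, domain_dict, max_span=3):
    """Post-correction pass: re-merge runs of adjacent tokens whose
    concatenation is a known domain term, longest match first."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_span, len(tokens) - i), 1, -1):
            cand = "".join(tokens[i:i + n])
            if cand in domain_dict:
                out.append(cand)
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

Such a pass only ever merges, never splits, so it can repair under-segmented disease names without disturbing tokens the tagger already got right.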
Optimized Design and Research of the Water Resources Smart Search in a Xinchuang (IT Application Innovation) Environment
15
Authors: 付静, 杨柳, 那泽琛. 《水利信息化》, 2024, No. 3, pp. 1-7 (7 pages)
To improve the data collection efficiency, personalized recommendation, coordinated operation and maintenance, and Xinchuang-environment compatibility of the water resources smart search, and to provide the public with convenient and efficient water information retrieval services, this study, in line with the requirement that core technology be independently controllable, applies intelligent processing techniques such as data collection and search-recommendation models to further optimize the search algorithm and strengthen data collection. Meanwhile, by optimizing the design of the smart search platform, adapting it to the Xinchuang environment, and adopting an integrated, scenario-based and interactive data presentation, the platform's support, statistical analysis, search and aggregation capabilities are enhanced, realizing precise recommendation and collaborative operation and maintenance. This optimized design and research can serve as a reference for intelligent information retrieval services in the water resources sector.
Keywords: water resources smart search; Xinchuang; optimized design; data collection; intelligent recommendation; collaborative operation and maintenance; word segmentation
Design and Implementation of an Online Medical Community Analysis System
16
Authors: 张霞, 邵芊芊, 顾加成. 《无线互联科技》, 2024, No. 3, pp. 38-40, 44 (4 pages)
As an important product of "Internet + healthcare", online medical communities have developed rapidly, generating large volumes of medical question-and-answer information rich in medical knowledge and patient concerns. This paper therefore builds the architecture of an online medical community analysis system and, using web crawling, data cleaning and storage, text segmentation, and data visualization, designs and implements an analysis system for doctor-patient Q&A data. Through data analysis with line charts, pie charts and generated word clouds, the system derives useful knowledge such as the symptoms of different diseases and the drugs commonly used to treat them, facilitating patient diagnosis and treatment and giving doctors a basis for understanding patient concerns.
Keywords: online medical community; text segmentation; word cloud analysis
Research on Scenario Element Extraction and Scenario Construction Methods for Gas Explosions Based on Historical Accident Cases
17
Authors: 国汉君, 赵伟, 宋亚楠, 郭小芳, 赵志虎, 周爱桃, 王凯. 《矿业安全与环保》, CAS, PKU Core, 2024, No. 3, pp. 43-49 (7 pages)
To investigate in depth the development patterns of gas explosion accidents in coal mines, a method based on Chinese word segmentation is proposed for analyzing and extracting the scenario elements of gas explosion accidents. A total of 733 gas explosion accident reports from 1978-2020 were collected from the China coal mine safety net and other sources. After data preprocessing to remove incomplete reports, 255 accident reports were selected for element analysis and extraction. Accident level, accident course, accident causes and related content were organized and stored to form a text corpus for mining. Keywords of gas explosion scenarios were extracted with the Jieba segmentation algorithm, and their weights were computed with the TF-IDF algorithm. The scenarios were divided into 4 dimensions (accident body, hazard-causing body, hazard-bearing body, and hazard-resisting body) and 24 elements, providing a reference for subsequent scenario representation of gas explosion accidents and for combinations of possible future accidents.
Keywords: safety engineering; gas explosion; scenario analysis; Chinese word segmentation; Jieba segmentation
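The TF-IDF weighting applied to the Jieba-extracted keywords can be sketched as follows (unsmoothed IDF; production implementations often add smoothing, and the exact variant the study used is not stated):

```python
from collections import Counter
from math import log

def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents:
    term frequency within each document times log inverse document
    frequency across the collection."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

Terms that appear in every report (e.g. generic words like 事故) get zero weight, which is exactly why TF-IDF surfaces scenario-specific keywords.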
A Joint Model for Thai Word Segmentation and Part-of-Speech Tagging Based on a Local Transformer
18
Authors: 朱叶芬, 线岩团, 余正涛, 相艳. 《智能系统学报》, CSCD, PKU Core, 2024, No. 2, pp. 401-410 (10 pages)
Thai word segmentation and part-of-speech (POS) tagging are highly correlated tasks, and previous work has shown that learning them jointly can effectively improve model performance. This paper therefore proposes a joint segmentation and POS tagging model tailored to Thai spelling and word formation. Exploiting the fact that Thai characters form syllables and syllables form words, a local Transformer network learns segmentation features from syllable sequences. Considering the association between POS and syllables such as roots and affixes, the syllable features used for segmentation are fused into the word-sequence features, alleviating the problem of missing POS-tagging features for unknown words. On this basis, the model uses a linear classification layer to predict segmentation labels and a linear-chain conditional random field to model dependencies in the POS sequence. Experiments on the Thai dataset LST20 show that the model reaches 96.33% segmentation F1, 97.06% micro-averaged POS F1 and 85.98% macro-averaged POS F1, improvements of 0.33%, 0.44% and 0.12% respectively over the baseline model.
Keywords: Thai word segmentation; part-of-speech tagging; joint learning; local Transformer; word-formation characteristics; syllable features; linear-chain conditional random field; joint model
Chinese Medical Entity Recognition Based on Attention Enhancement and Feature Fusion
19
Authors: 王晋涛, 秦昂, 张元, 陈一飞, 王廷凤, 谢承霖, 邹刚. 《计算机工程》, CAS CSCD, PKU Core, 2024, No. 7, pp. 324-332 (9 pages)
Character-based Chinese medical named entity recognition models suffer from single-form embeddings, difficult boundary identification, and underused semantic information. An effective remedy is to inject lexical features into the lower layers of BERT, exploiting word-granularity semantics while reducing the impact of segmentation errors. However, injecting lexical information also introduces low-relevance words and noise, which scatters the attention of attention-based BERT models; moreover, character and word granularity alone can hardly mine the deeper semantics of Chinese characters. This paper therefore proposes a Chinese medical entity recognition model based on attention enhancement and feature fusion. The character-word attention score matrix is sparsified so that the model's attention concentrates on highly relevant words, effectively reducing interference from noise words in the context. Meanwhile, features of character pronunciation and strokes are extracted with convolutional neural networks (CNNs) and fused in an iterative attention feature-fusion module; the result is concatenated with the BERT output and fed into a BiLSTM model to further mine the deep semantic information carried by the characters. A large medical corpus was collected by web crawling to train a medical word-embedding lexicon, and the model was validated on the CCKS2017 and CCKS2019 datasets, reaching F1 values of 94.90% and 89.37% respectively, outperforming current mainstream entity recognition models.
Keywords: entity recognition; Chinese word segmentation; attention sparsification; feature fusion; medical word-embedding lexicon
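The sparsification of the character-word attention score matrix is not specified precisely in the abstract; one common realization, keeping only the top-k raw scores per row before the softmax, can be sketched as follows (illustrative, with an assumed k):

```python
from math import exp

def topk_softmax(scores, k=2):
    """Row-wise sparse attention: keep the k largest raw scores per
    row, mask the rest, then softmax over the survivors."""
    rows = []
    for row in scores:
        keep = sorted(range(len(row)), key=lambda j: -row[j])[:k]
        exps = [exp(row[j]) if j in keep else 0.0 for j in range(len(row))]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows
```

Masked positions receive exactly zero weight, so low-relevance injected words cannot dilute the attention distribution the way they would under a dense softmax.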
Research on LLM-Based Multi-Granularity Password Analysis
20
Authors: 洪萌, 邱卫东, 王杨德. 《网络与信息安全学报》, 2024, No. 1, pp. 112-122 (11 pages)
Password-based authentication is a common identity authentication mechanism. However, large-scale password leaks occur from time to time, showing that passwords still face risks of being guessed or stolen. Since passwords can be viewed as a special natural language, research applying natural language processing techniques to password analysis has emerged in recent years, but little work has explored, on large language models (LLMs), how the tokenization granularity of password text affects analysis performance. This paper therefore proposes an LLM-based multi-granularity password analysis framework. It follows the pre-training paradigm, autonomously learning prior knowledge of password distributions from large unlabeled datasets, and consists of three modules: a synchronization network, a backbone network and a tail network. The synchronization network module implements password tokenization at three granularities (char-level, template-level and chunk-level) and extracts feature knowledge such as character distributions, structures and chunk composition; the backbone network module builds a general password model to learn how passwords are composed; and the tail network module generates candidate passwords for guessing analysis against target password sets. Extensive experiments on eight password datasets, including Tianya and Twitter, analyze and summarize the framework's performance under multi-granularity tokenization in different language environments. The results show that in Chinese-user scenarios the framework performs nearly identically with char-level and chunk-level tokenization, both significantly better than with template-level tokenization, while in English-user scenarios chunk-level tokenization performs best.
Keywords: large language model; password analysis; natural language processing; word segmentation
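Of the three granularities, template-level tokenization is the easiest to sketch: a password is collapsed into typed runs of letters (L), digits (D) and symbols (S). Char-level is trivial and chunk-level requires a learned vocabulary, so only the template level is shown; the function below is an illustrative sketch, not the paper's implementation:

```python
import re

def template_tokenize(password):
    """Template-level segmentation: emit one token per maximal run of
    letters (L), digits (D) or other symbols (S), tagged with length."""
    tokens = []
    for run in re.finditer(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+", password):
        s = run.group()
        kind = "L" if s[0].isalpha() else "D" if s[0].isdigit() else "S"
        tokens.append(f"{kind}{len(s)}")
    return tokens
```

A template like L4 D4 S1 deliberately discards which letters and digits were used, which explains why the paper finds it loses to char- and chunk-level tokenization in Chinese-user scenarios.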