Journal articles
895 articles found
Chinese word segmentation with local and global context representation learning (cited by 2)
1
Authors: 李岩, Zhang Yinghua, Huang Xiaoping, Yin Xucheng, Hao Hongwei. 《High Technology Letters》. Indexed: EI, CAS. 2015, No. 1, pp. 71-77 (7 pages).
A local and global context representation learning model for Chinese characters is designed, and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semantics of the local context and the global context to learn representations of Chinese characters. Then, a Chinese word segmentation model is built with a neural network and trained with the character representations as its input features. Finally, experimental results show that the Chinese character representations effectively capture semantic information: characters with similar semantics cluster together in the visualization space. Moreover, the proposed Chinese word segmentation model achieves a clear improvement in precision, recall, and F-measure.
Keywords: local and global context representation learning; Chinese character representation; Chinese word segmentation
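The abstract does not say how the neural segmenter turns character representations into word boundaries. A common formulation, assumed here rather than taken from the paper, tags each character as B (begin), M (middle), E (end) or S (single-character word) and has the network predict one tag per character. A minimal sketch of the conversion in both directions:

```python
def to_bmes(words):
    """Convert a segmented sentence (list of words) into per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def from_bmes(chars, tags):
    """Rebuild the word list from characters and their predicted BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("S", "E"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a malformed tag sequence that stops mid-word
        words.append(current)
    return words


print(to_bmes(["我", "喜欢", "北京大学"]))  # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
```

In this formulation the segmenter only ever classifies characters, so any per-character representation (such as the learned context representations above) can be used directly as input features.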
Word Segmentation for Chinese Judicial Documents (cited by 1)
2
Authors: Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang. 《国际计算机前沿大会会议论文集》 (proceedings of the International Computer Frontier Conference). 2019, No. 1, pp. 476-478 (3 pages).
Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods have problems when applied to Chinese judicial documents: (1) they rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. In this paper, a word segmentation method is proposed for Chinese judicial documents. The method consists of two steps: (1) automatically generating labeled data in the form of legal dictionaries, and (2) applying a hybrid multilayer neural network that incorporates these legal dictionaries to perform word segmentation. Experiments on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
Keywords: Chinese word segmentation; knowledge discovery; judicial documents
Design and Implementation of a New Chinese Word Segmentation Dictionary for the Personalized Mobile Search
3
Authors: Zhongmin Wang, Jingna Qi, Yan He. 《Communications and Network》. 2013, No. 1, pp. 81-85 (5 pages).
Chinese word segmentation is the basis of natural language processing. The dictionary mechanism significantly influences both the efficiency of word segmentation and the understanding of the user's intention implied in a query. Because traditional dictionary mechanisms cannot meet the needs of personalized mobile search, this paper presents a new dictionary mechanism that contains word classification information. It further puts forward an approach for improving the traditional word bank structure and proposes an improved forward maximum matching (FMM) segmentation algorithm. The results show that the new dictionary mechanism significantly increases query efficiency and better meets users' individual requirements.
Keywords: Chinese word segmentation; dictionary mechanism; natural language processing; personalized search; word classification information
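The paper's improved FMM and its word-bank structure are not described in the abstract. For reference, the sketch below shows only the textbook forward maximum matching baseline that such work builds on, with a plain Python set standing in for the dictionary (an assumption for illustration):

```python
def fmm_segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position take the longest dictionary word."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


lexicon = {"研究", "研究生", "生命", "起源"}
print(fmm_segment("研究生命起源", lexicon))  # ['研究生', '命', '起源']
```

The example shows the classic weakness of greedy forward matching: 研究生命起源 ("research the origin of life") is split as 研究生 / 命 / 起源. Reverse or bidirectional matching is the usual remedy, and the dictionary lookup inside the inner loop is the step a better word-bank structure would speed up.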
Effective Analysis of Chinese Word-Segmentation Accuracy
4
Author: MA Weiyin. 《现代电子技术》 (Modern Electronics Technique). 2007, No. 4, pp. 108-110 (3 pages).
Automatic word segmentation is widely used for ambiguity resolution when processing large-scale real text. However, during unknown-word detection in Chinese word segmentation, many of the detected word candidates are invalid. These false unknown-word candidates degrade overall segmentation accuracy because they also affect the segmentation of known words. In this paper, we propose several methods for reducing the difficulty and improving the accuracy of segmenting written Chinese, including full segmentation of a sentence, handling of reduplicated words and idioms, and statistical identification of unknown words. A simulation shows that the proposed methods are feasible for improving the accuracy of Chinese word segmentation.
Keywords: Chinese information processing; Chinese character processing; automatic segmentation; efficiency analysis
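Full segmentation enumerates every candidate split of a sentence so that a later statistical or rule-based step can choose among them. The abstract does not give the paper's procedure; the sketch below assumes a simple dictionary, always allows single characters, and memoizes on the start position:

```python
from functools import lru_cache


def full_segmentations(text, dictionary):
    """Enumerate every way to split `text` into dictionary words.

    Single characters are always allowed, so every sentence has at least one split.
    """
    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return [[]]
        results = []
        for j in range(i + 1, len(text) + 1):
            piece = text[i:j]
            if j == i + 1 or piece in dictionary:
                for rest in split_from(j):
                    results.append([piece] + rest)  # build a new list; cached lists stay intact
        return results

    return split_from(0)


lexicon = {"研究", "研究生", "生命", "起源"}
for seg in full_segmentations("研究生命", lexicon):
    print("/".join(seg))
```

The number of segmentations grows exponentially with sentence length, which is why practical systems usually keep them as a word lattice (a graph of candidate words) and score paths through it instead of materializing a list.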
Remove Redundancy Samples for SVM in A Chinese Word Segmentation Task
5
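Authors: Feiliang Ren, Tianshun Yao. 《通讯和计算机(中英文版)》 (Communication and Computer, Chinese-English edition). 2006, No. 5, pp. 103-107 (5 pages).
Keywords: text processing; variable-parameter systems; software development; data processing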
A New Word Detection Method for Chinese Based on Local Context Information (cited by 1)
6
Authors: 曾华琳, 周昌乐, 郑旭玲. 《Journal of Donghua University (English Edition)》. Indexed: EI, CAS. 2010, No. 2, pp. 189-192 (4 pages).
Finding out-of-vocabulary words is an urgent and difficult task in Chinese word segmentation. To avoid the defects caused by offline training in the traditional method, this paper proposes an improved prediction by partial matching (PPM) segmentation algorithm for Chinese words based on extracting local context information: the context information of the test text is added to the local PPM statistical model to guide the detection of new words. The algorithm focuses on online segmentation and new word detection, achieves good results in both closed and open tests, and to a certain extent outperforms some well-known Chinese segmentation systems.
Keywords: new word detection; improved PPM model; context information; Chinese word segmentation
Text Mining Based on the Korean Word Segmentation System in the Context of Big Data
7
Authors: Yongmin Quan, Na Niu, Hongyi Li, Zhezhi Jin. 《信息工程期刊(中英文版)》 (Journal of Information Engineering, Chinese-English edition). 2018, No. 1, pp. 1-7 (7 pages).
Text mining is a form of text data analysis that discovers concepts and the relationships underlying them in unstructured text, extracting from large text databases patterns or associations that had not previously been recognized; some information retrieval and text processing systems can find relationships between words and paragraphs. This article first describes the data sources and briefly introduces the related platforms and functional components. Second, it explains the Chinese word segmentation system and the Korean word segmentation system. Finally, taking news, documents, and materials on the Korean Peninsula, together with various public-opinion data from the web, as the base data for the research, it presents word-frequency graphs and word clouds that show the results of text mining with the Chinese and Korean word segmentation systems.
Keywords: big data platform; Chinese word segmentation system; Korean word segmentation system; text mining
Feature study for improving Chinese overlapping ambiguity resolution based on SVM (cited by 1)
8
Authors: 熊英, 朱杰. 《Journal of Southeast University (English Edition)》. Indexed: EI, CAS. 2007, No. 2, pp. 179-184 (6 pages).
To improve Chinese overlapping ambiguity resolution based on a support vector machine (SVM), statistical features for representing the feature vectors are studied. First, four statistical parameters (mutual information, accessor variety, two-character word frequency, and single-character word frequency) are each used to describe the feature vectors. Then other parameters are added as complementary features to the best-performing parameters to further improve classification performance. Experimental results show that features represented by mutual information, single-character word frequency, and accessor variety obtain the best result, an accuracy of 94.39%. Compared with a commonly used word probability model, accuracy improves by 6.62%. These comparative results confirm that classification performance can be improved through feature selection and representation.
Keywords: support vector machine; Chinese overlapping ambiguity; Chinese word segmentation; word probability model
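The abstract names the features but not their formulas. Under the definitions commonly used in the segmentation literature (assumed here), pointwise mutual information compares how often two characters occur together with how often they would co-occur by chance, and accessor variety counts the distinct characters that can appear immediately before and after a string, taking the smaller of the two counts. A sketch over a list of raw sentences, ignoring sentence boundaries as accessors:

```python
import math
from collections import Counter


def pmi(corpus, pair):
    """Pointwise mutual information of a two-character string, estimated from raw text.

    p(x) and p(xy) are relative frequencies of characters and of adjacent character pairs.
    """
    chars = Counter(ch for s in corpus for ch in s)
    bigrams = Counter(s[i:i + 2] for s in corpus for i in range(len(s) - 1))
    if bigrams[pair] == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_xy = bigrams[pair] / sum(bigrams.values())
    p_x = chars[pair[0]] / sum(chars.values())
    p_y = chars[pair[1]] / sum(chars.values())
    return math.log2(p_xy / (p_x * p_y))


def accessor_variety(corpus, s):
    """min(#distinct left neighbours, #distinct right neighbours) of string s."""
    left, right = set(), set()
    for sent in corpus:
        start = sent.find(s)
        while start != -1:
            if start > 0:
                left.add(sent[start - 1])
            end = start + len(s)
            if end < len(sent):
                right.add(sent[end])
            start = sent.find(s, start + 1)
    return min(len(left), len(right))


corpus = ["我喜欢中国", "中国很大", "在中国工作"]
print(round(pmi(corpus, "中国"), 2), accessor_variety(corpus, "中国"))
```

High PMI suggests the two characters "stick together", while high accessor variety suggests the string is used as an independent unit in many contexts; both are therefore plausible cues for deciding where an overlapping ambiguous string should be split.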
Apriori and N-gram Based Chinese Text Feature Extraction Method (cited by 4)
9
Authors: 王晔, 黄上腾. 《Journal of Shanghai Jiaotong University (Science)》. Indexed: EI. 2004, No. 4, pp. 11-14, 20 (5 pages).
Feature extraction, which means extracting the representative words from a text, is an important issue in the text mining field. This paper presents a new Apriori- and N-gram-based Chinese text feature extraction method and analyzes its correctness and performance. The method solves the problem that existing extraction methods cannot find frequent words of arbitrary length in Chinese texts. The experimental results show that the method is feasible.
Keywords: Apriori algorithm; N-gram; Chinese word segmentation; feature extraction
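The abstract does not spell out how Apriori and N-grams are combined. One plausible reading, assumed here, treats character n-grams like itemsets and relies on Apriori's downward-closure property: a string can only be frequent if its shorter substrings are, so candidates of length n+1 are grown only from frequent strings of length n. This finds frequent strings of arbitrary length without fixing n in advance. `min_support` (a total occurrence-count threshold) is a hypothetical parameter:

```python
from collections import Counter


def frequent_strings(texts, min_support):
    """Level-wise (Apriori-style) mining of frequent character n-grams of any length.

    An (n+1)-gram is counted only if both of its n-gram substrings (prefix and suffix)
    were frequent at the previous level.
    """
    def count(n, keep):
        counts = Counter()
        for t in texts:
            for i in range(len(t) - n + 1):
                g = t[i:i + n]
                if keep(g):
                    counts[g] += 1
        return {g: c for g, c in counts.items() if c >= min_support}

    result = {}
    level = count(1, lambda g: True)
    n = 1
    while level:  # stop when no string of the current length is frequent
        result.update(level)
        n += 1
        prev = level
        level = count(n, lambda g: g[:-1] in prev and g[1:] in prev)
    return result


texts = ["自然语言处理", "自然语言理解", "语言处理技术"]
freq = frequent_strings(texts, min_support=2)
print(sorted((g for g in freq if len(g) >= 4), key=len))
```

Pruning with the prefix/suffix check is sound here because every occurrence of a longer string also contains an occurrence of each of its substrings, so a substring's count can never be lower than the longer string's.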
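Word segmentation in incidental vocabulary learning during reading: The different roles of first- and last-morpheme positional probability (cited by 1)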
10
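Authors: 梁菲菲, 冯琳琳, 刘瑛, 李馨, 白学军. 《心理学报》 (Acta Psychologica Sinica). Indexed: CSSCI, CSCD, Peking University core journal. 2024, No. 3, pp. 281-294 (14 pages).
Through two parallel experiments, this study investigated how the role of the positional probability information of the first and last morphemes in word segmentation changes as new words are learned repeatedly. Using the incidental vocabulary-learning-during-reading paradigm with two-character pseudowords as new words, Experiment 1 manipulated the positional probability of the first morpheme (high vs. low) while keeping the last morpheme identical, and Experiment 2 manipulated the positional probability of the last morpheme while keeping the first morpheme identical. An eye tracker recorded college students' eye movements during reading. The results showed that: (1) the segmentation effect of first- and last-morpheme positional probability information gradually decreased as the number of times a new word was encountered during reading increased, showing a "familiarity effect"; (2) the familiarity effect of first-morpheme positional probability appeared in two relatively late eye-movement measures, regression-path duration and total fixation count, whereas that of last-morpheme positional probability began with gaze duration, extended to regression-path duration, and persisted into total fixation duration. These results indicate that the positional probability information of both the first and the last morpheme contributes to word segmentation during incidental vocabulary learning in reading, but the influence of the first morpheme lasts longer and is more stable, supporting the view that the first morpheme has an advantage in two-character word processing.
Keywords: morpheme positional probability; word segmentation; incidental vocabulary learning during reading; Chinese reading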
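For readers less familiar with the measure: a character's positional probability is often computed from a segmented lexicon as the share of the words containing that character in which it occupies a given position. The sketch below assumes that type-count definition over two-character words; the study itself may have used token (frequency-weighted) counts from a lexical database, which the abstract does not specify:

```python
from collections import Counter


def positional_probabilities(two_char_words):
    """For each character, return (P(first), P(last)): the share of the two-character
    words containing it in which it is the first / last character (type counts)."""
    first, last, total = Counter(), Counter(), Counter()
    for w in two_char_words:
        first[w[0]] += 1
        last[w[1]] += 1
        for ch in set(w):  # count each word once per distinct character
            total[ch] += 1
    return {ch: (first[ch] / total[ch], last[ch] / total[ch]) for ch in total}


probs = positional_probabilities(["学生", "学习", "同学", "生活"])
print(probs["学"])  # 学 starts two of the three words that contain it
```

Under this definition a character with a high first-position probability is a good cue that a word begins at it, which is the kind of cue the experiments manipulate.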
Chinese to Braille Translation Based on Braille Word Segmentation Using Statistical Model (cited by 2)
11
Authors: 王向东, 杨阳, 张金超, 姜文斌, 刘宏, 钱跃良. 《Journal of Shanghai Jiaotong University (Science)》. Indexed: EI. 2017, No. 1, pp. 82-86 (5 pages).
Automatic translation of Chinese text into Chinese Braille is important for blind people in China to acquire information using computers or smartphones. In this paper, a novel Chinese-Braille translation scheme is proposed. Under this scheme, a Braille word segmentation model based on statistical machine learning is trained on a Braille corpus, and Braille word segmentation is carried out directly with the statistical model, without a Chinese word segmentation stage. This avoids hand-crafting rules based on syntactic and semantic information; instead, the statistical model learns such rules implicitly and automatically. To further improve performance, an algorithm that fuses the results of Chinese word segmentation and Braille word segmentation is also proposed. The results show that the proposed method achieves an accuracy of 92.81% for Braille word segmentation and considerably outperforms current approaches using the segmentation-merging scheme.
Keywords: Chinese Braille word segmentation; perceptron algorithm. CLC number: TP391.1; document code: A
Exploration and application of a digital detection strategy for public rent-free base station sites based on POI data
12
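Authors: 张悦, 柯俊生, 张姣, 易卓锋, 李慧. 《长江信息通信》 (Changjiang Information & Communications). 2024, No. 1, pp. 218-220 (3 pages).
With 5G networks being deployed at scale and fixed costs growing rapidly, this article proposes an intelligent detection method for public rent-free sites, in order to save base-station site rental fees and support effective site-selection strategies. The method uses a Chinese word segmentation algorithm to mine keywords for public rent-free sites and builds a public rent-free keyword library; it then brings in extensive POI (point of interest) data, base-station data, and contract data for matching and cross-computation, so as to detect systematically and intelligently whether existing sites meet the conditions for rent-free status. The keyword library helps pinpoint existing leased property sites in the live network, which are then pursued through rent-free negotiations. This removes the subjectivity and uncertainty of traditional manual judgment and gives base-station siting decision-makers a reliable reference, thereby lowering base-station rental costs and improving the economic benefits of 5G networks.
Keywords: public rent-free sites; Chinese word segmentation; public rent-free keyword library; POI data; base station site selection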
Chinese Word Segmentation via BiLSTM+Semi-CRF with Relay Node (cited by 2)
13
Authors: Nuo Qun, Hang Yan, Xi-Peng Qiu, Xuan-Jing Huang. 《Journal of Computer Science & Technology》. Indexed: SCIE, EI, CSCD. 2020, No. 5, pp. 1115-1126 (12 pages).
Semi-Markov conditional random fields (Semi-CRFs) have been successfully utilized in many segmentation problems, including Chinese word segmentation (CWS). The advantage of Semi-CRF lies in its inherent ability to exploit properties of segments instead of individual elements of sequences. Despite this theoretical advantage, Semi-CRF is still not the best choice for CWS because its computational complexity is quadratic in the sentence length. In this paper, we propose a simple yet effective framework that helps Semi-CRF achieve performance comparable to CRF-based models under similar computational complexity. Specifically, we first adopt a character-level bi-directional long short-term memory (BiLSTM) network to model context information, and then use a simple but effective fusion layer to represent segment information. In addition, to model arbitrarily long segments in linear time, we propose a new model named Semi-CRF-Relay. Because segments are modeled directly, word features are easy to incorporate, and CWS performance can be enhanced merely by adding publicly available pre-trained word embeddings. Experiments on four popular CWS datasets show the effectiveness of the proposed methods. The source code and pre-trained embeddings of this paper are available at https://github.com/fastnlp/fastNLP/.
Keywords: Semi-Markov conditional random field (Semi-CRF); Chinese word segmentation; bi-directional long short-term memory; deep learning
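What makes a Semi-CRF a segment-level model is its decoding recursion: the best score ending at position j maximizes, over the length k of the last segment, the best score ending at j-k plus a score for that segment as a whole. The sketch below shows only that recursion, with a fixed maximum segment length `max_len` and a caller-supplied `segment_score` (both assumptions for illustration; in the paper the segment scores would come from the BiLSTM and fusion layer, and label transitions and the Relay variant are omitted here). It makes O(n * max_len) score calls, so letting segments be as long as the sentence makes it quadratic in sentence length, as the abstract notes.

```python
def semi_markov_decode(n, segment_score, max_len):
    """Best segmentation of positions 0..n-1 under a segment-level (semi-Markov) model.

    segment_score(i, j) scores the segment covering positions i..j-1; a segmentation's
    total score is the sum of its segment scores.
    """
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for k in range(1, min(max_len, j) + 1):  # k = length of the last segment
            cand = best[j - k] + segment_score(j - k, j)
            if cand > best[j]:
                best[j], back[j] = cand, j - k
    # Follow back-pointers from the end to recover segment boundaries.
    segments, j = [], n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return best[n], segments[::-1]


text = "北京大学生"
lexicon = {"北京", "大学", "学生", "北京大学"}


def toy_score(i, j):
    """Hypothetical scorer: reward long dictionary words, penalize unknown multi-char spans."""
    seg = text[i:j]
    if seg in lexicon:
        return float(len(seg) ** 2)
    return 0.0 if len(seg) == 1 else -10.0


score, segs = semi_markov_decode(len(text), toy_score, max_len=4)
print(score, [text[i:j] for i, j in segs])
```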
Research on scenario-element extraction and scenario construction for gas explosions based on historical accident cases
14
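Authors: 国汉君, 赵伟, 宋亚楠, 郭小芳, 赵志虎, 周爱桃, 王凯. 《矿业安全与环保》 (Mining Safety & Environmental Protection). Indexed: CAS, Peking University core journal. 2024, No. 3, pp. 43-49 (7 pages).
To investigate in depth how coal-mine gas explosion accidents develop, a method based on Chinese word segmentation is proposed for analyzing and extracting the scenario elements of gas explosion accidents. A total of 733 gas explosion accident reports from 1978 to 2020 were collected through channels such as the coal mine safety website (煤矿安全网). After data preprocessing that removed incomplete reports, 255 reports were selected for element analysis and extraction. Content such as accident grade, accident process, and accident causes was organized and stored to form a text corpus for mining. Scenario keywords were extracted with the Jieba segmentation algorithm and weighted with the TF-IDF algorithm, and the scenarios were divided into four dimensions (accident body, hazard-causing body, hazard-bearing body, and disaster-resisting body) comprising 24 elements. This provides a reference for subsequent scenario representation of gas explosion accidents and for analyzing possible future combinations of accident factors.
Keywords: safety engineering; gas explosion; scenario analysis; Chinese word segmentation; Jieba segmentation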
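In the pipeline described above, each report is first segmented (for example with `jieba.lcut`, which returns a list of words) and each word is then weighted by TF-IDF. The abstract does not state which TF-IDF variant was used; the sketch below assumes the plain tf × log(N / df) form and takes already-segmented documents so that it runs without jieba installed:

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=5):
    """Rank the terms of each pre-segmented document by TF-IDF.

    tf = term count / document length; idf = log(N / df), where df is the number of
    documents containing the term. Ties are broken alphabetically for stable output.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    ranked = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:top_k])
    return ranked


reports = [
    ["瓦斯", "爆炸", "通风", "瓦斯"],
    ["瓦斯", "顶板", "事故"],
    ["爆炸", "事故", "瓦斯"],
]
for keywords in tfidf_keywords(reports, top_k=2):
    print(keywords)
```

Note that a word occurring in every report (瓦斯 here) gets an IDF of zero, which is why corpus-specific terms such as 通风 ("ventilation") rise to the top; this is the behaviour that makes TF-IDF useful for picking out scenario-specific keywords.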
Chinese medical entity recognition based on attention enhancement and feature fusion
15
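Authors: 王晋涛, 秦昂, 张元, 陈一飞, 王廷凤, 谢承霖, 邹刚. 《计算机工程》 (Computer Engineering). Indexed: CAS, CSCD, Peking University core journal. 2024, No. 7, pp. 324-332 (9 pages).
Character-based named entity recognition models for the Chinese medical domain suffer from a single form of embedding, difficulty in recognizing entity boundaries, and insufficient use of semantic information. A very effective remedy is to inject lexical features into the lower layers of BERT, exploiting word-level semantic information while reducing the impact of segmentation errors. However, injecting lexical information also introduces weakly relevant words and noise, which disperses the attention of the attention-based BERT model. Moreover, character- and word-level granularity alone cannot fully mine the deep semantic information of Chinese characters. To address these problems, a Chinese medical entity recognition model based on attention enhancement and feature fusion is proposed. It sparsifies the character-word attention score matrix so that the model focuses on highly relevant words, effectively reducing interference from noisy words in the context. In addition, features of character pronunciation and strokes are extracted with convolutional neural networks (CNNs) and fused by an iterative attentional feature fusion module; the fused features are then concatenated with the output features of BERT and fed into a BiLSTM model to further mine the deep semantic information contained in the characters. A large medical corpus was collected by web crawling and other means to train a medical-domain word vector library, and the model was validated on the CCKS2017 and CCKS2019 datasets. Experimental results show that the model reaches F1 scores of 94.90% and 89.37% respectively, outperforming current mainstream entity recognition models.
Keywords: entity recognition; Chinese word segmentation; attention sparsification; feature fusion; medical word vector library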
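The abstract does not say how the character-word attention scores are sparsified. Top-k masking is one common choice and is assumed here: for each query, only the k highest scores survive and the rest are masked out before the softmax, so noisy low-relevance words receive exactly zero weight. A single-row sketch in plain Python (`k` is a hypothetical hyperparameter, k ≥ 1):

```python
import math


def sparse_attention(scores, k):
    """Keep the k largest attention scores in a row, mask the rest, then softmax.

    Masked positions get weight 0.0, so the attention mass is shared only among
    the k most relevant candidates.
    """
    if k >= len(scores):
        kept = set(range(len(scores)))
    else:
        kept = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    m = max(scores[i] for i in kept)  # subtract the max for numerical stability
    exps = [math.exp(s - m) if i in kept else 0.0 for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]


weights = sparse_attention([2.0, 0.1, 1.0, -1.0], k=2)
print([round(w, 3) for w in weights])
```

Compared with a plain softmax, which always assigns some weight to every injected word, masking before normalization is what lets irrelevant lexicon matches drop out entirely.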
Research and application of a data governance system based on natural language processing technology (cited by 1)
16
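Authors: 孔庆波, 李文科. 《微型电脑应用》 (Microcomputer Applications). 2024, No. 2, pp. 122-125 (4 pages).
In natural language processing, the long computation time and limited learning ability of Chinese word segmentation models are problems that currently trouble the research community. To address them, a SACNN+CRF model is proposed that combines the strengths of the self-attention mechanism, convolutional neural networks, and CRF to perform Chinese word segmentation. Optimal-parameter tests show that the best number of hidden units and the best number of iterations for the SACNN+CRF model are 100 and 200 respectively. Compared with the BiLSTM+CRF model, the SACNN+CRF model improves the MAE, RMSE, and MAPE metrics by 32.98%, 41.89%, and 36.58% respectively. The proposed SACNN+CRF model runs efficiently and is of considerable practical value for Chinese word segmentation tasks.
Keywords: Chinese word segmentation; self-attention; convolutional neural network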
Research on Chinese named entity recognition technology based on deep learning
17
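Authors: 武文静, 岳杰, 王佳丽, 刘枫. 《河北建筑工程学院学报》 (Journal of Hebei Institute of Architecture and Civil Engineering). Indexed: CAS. 2024, No. 3, pp. 210-215 (6 pages).
Named entity recognition (NER) is a fundamental, low-level task in NLP. Traditional rule-based and statistical methods fall short in the accuracy of feature extraction and in model scalability, and Chinese named entity recognition has improved greatly through the use of neural network models. In addition to extracting text features and recognizing entities with the pre-trained BERT model and related public datasets, this work also fuses an additional manually annotated dataset of place-name and organization entities to improve the model's accuracy in understanding word meaning. Experimental results show that the model's entity recognition ability improves.
Keywords: natural language processing; Chinese named entity recognition; deep learning; Chinese word segmentation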
Design and optimization of word clouds using the Jieba and Wordcloud libraries (cited by 20)
18
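Author: 徐博龙. 《福建电脑》 (Fujian Computer). 2019, No. 6, pp. 25-28 (4 pages).
Word segmentation is an important application of Python, and many tools implement it, such as jieba, SnowNLP, THULAC, and NLPIR. Word clouds are designed and implemented on top of word segmentation: they highlight the key points of the whole text, reveal key concepts, and can use different display forms to present information to readers in an interesting, efficient, and novel way. Taking Chinese word segmentation as an example, this paper describes in detail how to design and optimize word clouds with the jieba and wordcloud libraries.
Keywords: Python; Chinese word segmentation; word cloud; Jieba; wordcloud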
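A word cloud is driven by word frequencies computed after segmentation, so much of the practical tuning lies in cleaning the word list before counting. The sketch assumes the text has already been segmented (e.g. by `jieba.lcut`) and that a stop-word set is available; dropping single characters is a common heuristic, not necessarily the paper's choice:

```python
from collections import Counter


def word_frequencies(words, stopwords, min_len=2, top_n=100):
    """Count segmented words for a word cloud, dropping stop words and words
    shorter than min_len characters, and keep the top_n most frequent."""
    counts = Counter(w for w in words if len(w) >= min_len and w not in stopwords)
    return dict(counts.most_common(top_n))


words = ["数据", "分析", "的", "数据", "可视化", "我们", "数据", "分析"]
freqs = word_frequencies(words, stopwords={"我们"})
print(freqs)
```

If the `wordcloud` package is installed, the resulting dict can be passed to `WordCloud(font_path=...).generate_from_frequencies(freqs)`. Chinese text needs a font with CJK glyphs, and the font path depends on your system.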
Sentiment analysis of student teaching evaluations based on natural language processing
19
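Authors: 高云, 刘寰, 周建慧, 郭艳萍. 《山西大同大学学报(自然科学版)》 (Journal of Shanxi Datong University, Natural Science Edition). 2024, No. 5, pp. 49-55 (7 pages).
Analyzing the sentiment contained in students' teaching evaluations plays a crucial role in improving classroom teaching. This paper analyzes the sentiment of student evaluations with a "Chinese word segmentation + token + LSTM model" NLP pipeline. A vocabulary and a stop-word list are set up, and the dataset is segmented into words. A numeric dictionary is built from the resulting segmented word lists, the word lists are converted into lists of numbers, and finally the number lists are converted into vectors to form the dataset. An LSTM model is built and trained on the resulting training set, and evaluation of the trained model shows that it is reliable. Predictions on selected typical and complex samples then yield sentiment analysis results. The experiments show that the pipeline produces correct results for both typical and complex teaching evaluations.
Keywords: natural language processing; teaching evaluation information; sentiment analysis; Chinese word segmentation; LSTM model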
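The "numeric dictionary" step is the usual vocabulary mapping that precedes an embedding layer and an LSTM. The reserved ids (0 for padding, 1 for unknown words), the frequency ordering, and the fixed `max_len` below are assumptions for illustration rather than details from the paper:

```python
from collections import Counter


def build_vocab(tokenized_texts, min_count=1):
    """Map each word to an integer id, most frequent words first.
    Id 0 is reserved for padding and id 1 for unknown words."""
    counts = Counter(w for text in tokenized_texts for w in text)
    vocab = {"<pad>": 0, "<unk>": 1}
    for w, c in counts.most_common():
        if c >= min_count:
            vocab[w] = len(vocab)
    return vocab


def encode(tokenized_texts, vocab, max_len):
    """Convert word lists into fixed-length id lists (truncate, or right-pad with 0)."""
    rows = []
    for text in tokenized_texts:
        ids = [vocab.get(w, vocab["<unk>"]) for w in text][:max_len]
        rows.append(ids + [vocab["<pad>"]] * (max_len - len(ids)))
    return rows


train = [["老师", "讲解", "清楚"], ["讲解", "太快"]]
vocab = build_vocab(train)
print(encode([["讲解", "很好"]], vocab, max_len=4))  # 很好 is unseen, so it maps to <unk>
```

Fixed-length rows let a batch of evaluations be stacked into one integer matrix for the embedding layer; a real setup would usually also mask the padding positions inside the LSTM.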
Effects of word segmentation on Chinese reading by Tibetan-Chinese readers: The moderating role of language proficiency
20
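Authors: 高蕾, 李天贽, 窦浩天, 陈雯月, 陈成. 《辽宁师范大学学报(社会科学版)》 (Journal of Liaoning Normal University, Social Science Edition). 2024, No. 4, pp. 60-68 (9 pages).
Using an EyeLink 1000 Plus eye tracker, with 40 Tibetan college students as participants and Chinese sentences as reading material, this study examined the effect of word segmentation on Chinese reading by Tibetan-Chinese readers and the moderating role of language proficiency. Word segmentation was marked with spaces under four presentation conditions: normal sentences, spaces between characters, spaces between words, and spaces between nonwords; readers' Chinese proficiency was also manipulated. The results show that inter-word spacing facilitates Tibetan-Chinese readers' reading of Chinese to some extent, and that eye-movement patterns during Chinese reading differ among Tibetan-Chinese readers with different levels of Chinese proficiency.
Keywords: word segmentation; Tibetan-Chinese readers; Chinese reading; eye movements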