Journal Articles
1,474 articles found
1. Chinese word segmentation with local and global context representation learning (Cited by 2)
Authors: Li Yan, Zhang Yinghua, Huang Xiaoping, Yin Xucheng, Hao Hongwei. 《High Technology Letters》 (EI, CAS), 2015, No. 1, pp. 71-77.
Abstract: A local and global context representation learning model for Chinese characters is designed and a Chinese word segmentation method based on character representations is proposed in this paper. First, the proposed Chinese character learning model uses the semantics of local context and global context to learn the representation of Chinese characters. Then, a Chinese word segmentation model is built with a neural network, and the segmentation model is trained with the character representations as its input features. Finally, experimental results show that the Chinese character representations can effectively learn semantic information: characters with similar semantics cluster together in the visualized space. Moreover, the proposed Chinese word segmentation model also achieves a notable improvement in precision, recall and F-measure.
Keywords: local and global context representation learning; Chinese character representation; Chinese word segmentation
2. Word Segmentation for Chinese Judicial Documents (Cited by 1)
Authors: Linxia Yao, Jidong Ge, Chuanyi Li, Yuan Yao, Zhenhao Li, Jin Zeng, Bin Luo, Victor Chang. 《国际计算机前沿大会会议论文集》, 2019, No. 1, pp. 476-478.
Abstract: Word segmentation is an integral step in many knowledge discovery applications. However, existing word segmentation methods have problems when applied to Chinese judicial documents: (1) existing methods rely on large-scale labeled data, which is typically unavailable for judicial documents, and (2) judicial documents have their own language features and writing formats. In this paper, a word segmentation method is proposed for Chinese judicial documents. The proposed method consists of two steps: (1) automatically generating labeled data as legal dictionaries, and (2) applying a hybrid multilayer neural network to do word segmentation incorporating the legal dictionaries. Experiments conducted on a dataset of Chinese judicial documents show that the proposed model achieves better results than existing methods.
Keywords: Chinese word segmentation; knowledge discovery; judicial documents
3. Design and Implementation of a New Chinese Word Segmentation Dictionary for the Personalized Mobile Search
Authors: Zhongmin Wang, Jingna Qi, Yan He. 《Communications and Network》, 2013, No. 1, pp. 81-85.
Abstract: Chinese word segmentation is the basis of natural language processing. The dictionary mechanism significantly influences the efficiency of word segmentation and the understanding of the user's intention implied in the user's query. As traditional dictionary mechanisms cannot meet the present needs of personalized mobile search, this paper presents a new dictionary mechanism which contains word classification information. The paper furthermore puts forward an approach for improving the traditional word bank structure, and proposes an improved FMM segmentation algorithm. The results show that the new dictionary mechanism significantly increases query efficiency and better meets the user's individual requirements.
Keywords: Chinese word segmentation; dictionary mechanism; natural language processing; personalized search; word classification information
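For context, the forward maximum matching (FMM) algorithm that the improved segmenter above builds on can be sketched in a few lines. This is a minimal illustration of plain FMM, not the paper's improved algorithm; the toy dictionary and maximum word length are assumptions.

```python
# Minimal forward maximum matching (FMM): at each position, greedily
# take the longest dictionary word; fall back to a single character.
def fmm_segment(text, dictionary, max_len=4):
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

dictionary = {"自然语言", "处理", "中文", "分词", "基础"}  # toy lexicon
print(fmm_segment("中文分词是自然语言处理的基础", dictionary))
# ['中文', '分词', '是', '自然语言', '处理', '的', '基础']
```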
4. An Improved Unsupervised Approach to Word Segmentation
Authors: Wang Hanshi, Han Xuhong, Liu Lizhen, Song Wei, Yuan Mudan. 《China Communications》 (SCIE, CSCD), 2015, No. 7, pp. 82-95.
Abstract: ESA is an unsupervised approach to word segmentation previously proposed by Wang, which is an iterative process consisting of three phases: Evaluation, Selection and Adjustment. In this article, we propose Ex ESA, the extension of ESA. In Ex ESA, the original approach is extended to a 2-pass process and the ratio of different word lengths is introduced as a third type of information combined with cohesion and separation. A maximum strategy is adopted to determine the best segmentation of a character sequence in the Selection phase. Besides, in Adjustment, Ex ESA re-evaluates separation information and individual information to overcome the overestimation of frequencies. Additionally, a smoothing algorithm is applied to alleviate sparseness. The experiment results show that Ex ESA can further improve the performance and is time-saving by properly utilizing more information from un-annotated corpora. Moreover, the parameters of Ex ESA can be predicted by a set of empirical formulae or combined with the minimum description length principle.
Keywords: word segmentation; character sequence; smoothing algorithm; maximum strategy
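The "cohesion" and "separation" statistics the abstract relies on are computed from un-annotated text. As one common illustration of cohesion (not the ESA/Ex ESA scoring itself), the pointwise mutual information (PMI) of adjacent characters can be estimated as follows; the toy corpus is invented.

```python
# PMI of adjacent character pairs: high PMI suggests the pair coheres
# into a word; low PMI suggests a likely word boundary.
import math
from collections import Counter

def bigram_pmi(corpus):
    chars = Counter()
    bigrams = Counter()
    for sent in corpus:
        chars.update(sent)
        bigrams.update(sent[i:i + 2] for i in range(len(sent) - 1))
    n_chars = sum(chars.values())
    n_bi = sum(bigrams.values())
    return {
        bg: math.log((cnt / n_bi) /
                     ((chars[bg[0]] / n_chars) * (chars[bg[1]] / n_chars)))
        for bg, cnt in bigrams.items()
    }

corpus = ["中文分词很难", "分词方法很多", "中文方法"]
pmi = bigram_pmi(corpus)
# Within-word pairs like "分词" should outscore cross-boundary pairs like "词很".
print(sorted(pmi.items(), key=lambda kv: -kv[1])[:3])
```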
5. Applying rough sets in word segmentation disambiguation based on maximum entropy model
Authors: Jiang Wei, Wang Xiaolong, Guan Yi, Liang Guohua. 《Journal of Harbin Institute of Technology (New Series)》 (EI, CAS), 2006, No. 1, pp. 94-98.
Abstract: To solve the complicated feature extraction and long-distance dependency problems in Word Segmentation Disambiguation (WSD), this paper proposes to apply rough sets in WSD based on the Maximum Entropy model. Firstly, rough set theory is applied to extract the complicated features and long-distance features, even from noisy or inconsistent corpora. Secondly, these features are added into the Maximum Entropy model, and consequently, the feature weights can be assigned according to the performance of the whole disambiguation model. Finally, a semantic lexicon is adopted to build class-based rough set features to overcome data sparseness. The experiment indicated that our method performed better than previous models, which got top rank in WSD in the 863 Evaluation in 2003. This system ranked first and second respectively in the MSR and PKU open tests in the Second International Chinese Word Segmentation Bakeoff held in 2005.
Keywords: word segmentation; feature extraction; rough sets; maximum entropy
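Maximum Entropy modeling of this kind is equivalent to multinomial logistic regression over binary context features. Below is a minimal sketch that treats segmentation disambiguation as boundary classification with scikit-learn; the features and toy data are illustrative assumptions, not the paper's feature set.

```python
# Each sample: context features around a candidate boundary; label 1 = split.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    ({"left": "中文", "right": "分词"}, 1),
    ({"left": "分", "right": "词"}, 0),
    ({"left": "分词", "right": "方法"}, 1),
    ({"left": "方", "right": "法"}, 0),
]
vec = DictVectorizer()
X = vec.fit_transform([feats for feats, _ in train])
y = [label for _, label in train]

# LBFGS-trained logistic regression plays the role of the MaxEnt learner:
# it assigns a weight to every binary feature, as the abstract describes.
model = LogisticRegression(max_iter=1000).fit(X, y)
test = vec.transform([{"left": "中文", "right": "方法"}])
print(model.predict_proba(test))  # [P(no-split), P(split)]
```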
6. Text Mining Based on the Korean Word Segmentation System in the Context of Big Data
Authors: Yongmin Quan, Na Niu, Hongyi Li, Zhezhi Jin. 《信息工程期刊(中英文版)》, 2018, No. 1, pp. 1-7.
Abstract: Text mining analyzes text data to discover relationships between concepts and underlying concepts in unstructured text; it extracts previously unrealized patterns or associations from large text databases, and some information retrieval and text processing systems can find relationships between words and paragraphs. This article first describes the data sources and gives a brief introduction to the related platforms and functional components. Secondly, it explains the Chinese word segmentation and Korean word segmentation systems. Finally, it takes news, documents and materials about the Korean Peninsula, as well as various public opinion data from the network, as the basic data for the research. Examples of word frequency graphs and word cloud graphs are presented to show the results of text mining through the Chinese and Korean word segmentation systems.
Keywords: big data platform; Chinese word segmentation system; Korean word segmentation system; text mining
7. Effective Analysis of Chinese Word-Segmentation Accuracy
Authors: Ma Weiyin. 《现代电子技术》, 2007, No. 4, pp. 108-110.
Abstract: Automatic word segmentation is widely used for ambiguity cancellation when processing large-scale real text, but during unknown word detection in Chinese word segmentation, many detected word candidates are invalid. These false unknown word candidates deteriorate the overall segmentation accuracy, as they affect the segmentation accuracy of known words. In this paper, we propose several methods for reducing the difficulties and improving the accuracy of the word segmentation of written Chinese, such as full segmentation of a sentence, processing of duplicative words and idioms, and statistical identification of unknown words. A simulation shows the feasibility of our proposed methods in improving the accuracy of word segmentation of Chinese.
Keywords: Chinese information processing; Chinese character processing; automatic segmentation; efficiency analysis
8. Remove Redundancy Samples for SVM in A Chinese Word Segmentation Task
Authors: Feiliang Ren, Tianshun Yao. 《通讯和计算机(中英文版)》, 2006, No. 5, pp. 103-107.
Keywords: text processing; variable-parameter systems; software development; data processing
9. Improvement in Accuracy of Word Segmentation of a Web-Based Japanese-to-Braille Translation Program for Medical Information
Authors: Tsuyoshi Oda, Aki Sugano, Masashi Shimbo, Kenji Miura, Mika Ohta, Masako Matsuura, Mineko Ikegami, Tetsuya Watanabe, Shinichi Kita, Akihiro Ichinose, Eiichi Maeda, Yuji Matsumoto, Yutaka Takaoka. 《通讯和计算机(中英文版)》, 2013, No. 1, pp. 82-89.
Keywords: medical information; translation program; Web; Braille; word segmentation; accuracy; natural language processing; proper nouns
10. Research on Clustering Chinese Words Using word2vec (Cited by 29)
Authors: Zheng Wenchao, Xu Peng. 《软件》, 2013, No. 12, pp. 160-162.
Abstract: Text clustering plays an important role in data mining and machine learning, and after years of development the technique has produced a series of theoretical results. Building on previous work, this paper explores a new method for clustering Chinese text. It first presents a Chinese word segmentation algorithm to split Chinese text into individual words, then applies the Word2Vec toolkit with a deep neural network algorithm to convert the processed corpus into word vectors. Finally, the cosine distance between word vectors is defined as word similarity, and the obtained word vectors are clustered with the K-means algorithm, so that the words in the corpus semantically closest to an input word can be returned. The method was applied to web news data crawled for 2012, and the experiments achieved good results.
Keywords: data mining; clustering; word segmentation; word vectors; neural networks
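A minimal sketch of the pipeline this abstract describes, using jieba, gensim and scikit-learn; the toy corpus and hyperparameters are illustrative assumptions, not the paper's settings.

```python
import jieba
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

raw_docs = ["文本聚类在数据挖掘中发挥着重要作用",
            "机器学习方法广泛用于数据挖掘",
            "神经网络可以学习词向量"]

# Step 1: Chinese word segmentation.
sentences = [jieba.lcut(doc) for doc in raw_docs]

# Step 2: train word vectors (tiny settings for the toy corpus).
model = Word2Vec(sentences, vector_size=50, min_count=1, window=3, epochs=50)

# Step 3: cluster the vocabulary vectors with K-means.
words = list(model.wv.index_to_key)
kmeans = KMeans(n_clusters=2, n_init=10).fit(model.wv[words])
for word, label in zip(words, kmeans.labels_):
    print(word, label)

# Step 4: nearest neighbours by cosine similarity.
query = words[0]  # the most frequent token, guaranteed in-vocabulary
print(query, model.wv.most_similar(query, topn=3))
```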
11. Several Tips for Editing Long Documents in Word 2007 (Cited by 2)
Authors: Chu Yefeng, Zhou Haitao. 《长春大学学报》, 2013, No. 2, pp. 241-245.
Abstract: Word 2007 is widely used in office work, and editing long documents with it involves many considerations. This paper briefly introduces several practical tips for editing long documents that can make the work much more efficient, mainly covering section breaks, creating styles, extracting a table of contents, double-sided printing, and several special typesetting techniques.
Keywords: Word 2007; typesetting; section breaks; styles; printing
12. Design and Optimization of Word Clouds Using the Jieba and Wordcloud Libraries (Cited by 20)
Authors: Xu Bolong. 《福建电脑》, 2019, No. 6, pp. 25-28.
Abstract: Word segmentation is an important application in Python, and many tools implement it, such as jieba, SnowNLP, THULAC and NLPIR. A word cloud is designed and implemented on top of word segmentation: it highlights the key points of a whole body of information, reveals key concepts, and can present them to readers in interesting, efficient and novel visual forms. Taking Chinese word segmentation as an example, this paper describes in detail how to design and optimize word clouds using the jieba and wordcloud libraries.
Keywords: Python; Chinese word segmentation; word cloud; jieba; wordcloud
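A minimal sketch of the jieba + wordcloud pipeline the paper covers: segment Chinese text, count word frequencies, render a word cloud. The input text, stopword list and font path are assumptions; rendering Chinese glyphs requires a CJK-capable font file.

```python
import jieba
from collections import Counter
from wordcloud import WordCloud

text = "分词是Python中的一项重要应用，词云在分词的基础上设计并实现。"
stopwords = {"是", "的", "中", "在", "并", "一项"}

# Segment and filter: keep words of length >= 2 that are not stopwords.
words = [w for w in jieba.lcut(text) if len(w) >= 2 and w not in stopwords]
freq = Counter(words)

wc = WordCloud(font_path="SimHei.ttf",  # assumed CJK font file
               width=800, height=600, background_color="white")
wc.generate_from_frequencies(freq)
wc.to_file("wordcloud.png")
```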
13. Tips for Editing and Typesetting Graduation Theses in Word (Cited by 4)
Authors: Liu Min. 《电脑学习》, 2009, No. 2, pp. 112-113.
Abstract: This paper introduces common tips for editing and typesetting graduation theses with Word.
Keywords: Word; typesetting; styles; section breaks; table of contents
14. Rotation, Translation and Scale Invariant Sign Word Recognition Using Deep Learning (Cited by 2)
Authors: Abu Saleh Musa Miah, Jungpil Shin, Md. Al Mehedi Hasan, Md Abdur Rahim, Yuichi Okuyama. 《Computer Systems Science & Engineering》 (SCIE, EI), 2023, No. 3, pp. 2521-2536.
Abstract: Communication between people with disabilities and people who do not understand sign language is a growing social need and can be a tedious task. One of the main functions of sign language is to communicate with each other through hand gestures. Recognition of hand gestures has become an important challenge for the recognition of sign language. There are many existing models that can produce good accuracy, but if the models are tested with rotated or translated images, they may face difficulties in maintaining good performance accuracy. To resolve these challenges of hand gesture recognition, we propose a Rotation, Translation and Scale-invariant sign word recognition system using a convolutional neural network (CNN). We have followed three steps in our work: rotated, translated and scaled (RTS) version dataset generation, gesture segmentation, and sign word classification. Firstly, we enlarged a benchmark dataset of 20 sign words by applying different amounts of rotation, translation and scaling to the original images to create the RTS version dataset. Then we applied the gesture segmentation technique. The segmentation consists of three levels: (i) Otsu thresholding with YCbCr, (ii) morphological analysis (dilation through opening morphology) and (iii) the watershed algorithm. Finally, our designed CNN model was trained to classify the hand gesture as well as the sign word. Our model was evaluated using the twenty sign word dataset, the five sign word dataset and the RTS versions of these datasets. We achieved 99.30% accuracy on the twenty sign word dataset, 99.10% on its RTS version, 100% on the five sign word dataset, and 98.00% on its RTS version. Furthermore, our model achieves results competitive with state-of-the-art methods in sign word recognition.
Keywords: sign word recognition; convolutional neural network (CNN); rotation, translation and scaling (RTS); Otsu segmentation
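A hedged OpenCV sketch of the three-level gesture segmentation the abstract outlines: Otsu thresholding in YCbCr space, morphological opening/dilation, then the watershed algorithm. The file names, the choice of the Cr channel, and the kernel sizes are illustrative assumptions, not the paper's exact settings.

```python
import cv2
import numpy as np

img = cv2.imread("hand_gesture.jpg")  # assumed input image

# Level 1: Otsu thresholding on the Cr channel of the YCbCr image,
# which tends to separate skin tones well.
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
_, mask = cv2.threshold(ycrcb[:, :, 1], 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Level 2: morphological analysis -- opening removes speckle noise,
# dilation recovers the sure-background region.
kernel = np.ones((5, 5), np.uint8)
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
sure_bg = cv2.dilate(opened, kernel, iterations=3)

# Sure foreground from the distance transform; unknown = bg - fg.
dist = cv2.distanceTransform(opened, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg, sure_fg)

# Level 3: watershed, seeded with connected-component markers.
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1          # background becomes 1, regions 2..n
markers[unknown == 255] = 0    # unknown region gets label 0
markers = cv2.watershed(img, markers)
img[markers == -1] = (0, 0, 255)  # draw segmentation boundaries in red
cv2.imwrite("segmented.png", img)
```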
15. Chinese Word Boundary Ambiguity and Unknown Word Resolution Using Unsupervised Methods (Cited by 1)
Authors: Fu Guohong. 《High Technology Letters》 (EI, CAS), 2000, No. 2, pp. 29-39.
Abstract: An unsupervised framework to partially resolve four issues, namely ambiguity, unknown words, knowledge acquisition and efficient algorithms, in developing a robust Chinese segmentation system is described. It first proposes a statistical segmentation model integrating the simplified character juncture model (SCJM) with word formation power. The advantage of this model is that it can employ the affinity of characters inside or outside a word and word formation power simultaneously to process disambiguation, and all the parameters can be estimated in an unsupervised way. After investigating the differences between the real and theoretical size of the segmentation space, we apply the A* algorithm to perform segmentation without exhaustively searching all the potential segmentations. Finally, an unsupervised version of Chinese word formation patterns to detect unknown words is presented. Experiments show that the proposed methods are efficient.
Keywords: word segmentation; character juncture; word formation pattern
16. A New Word Detection Method for Chinese Based on Local Context Information (Cited by 1)
Authors: Zeng Hualin, Zhou Changle, Zheng Xuling. 《Journal of Donghua University (English Edition)》 (EI, CAS), 2010, No. 2, pp. 189-192.
Abstract: Finding out-of-vocabulary words is an urgent and difficult task in Chinese word segmentation. To avoid the defects caused by offline training in traditional methods, the paper proposes an improved prediction by partial match (PPM) segmenting algorithm for Chinese words based on extracting local context information, which adds the context information of the testing text into the local PPM statistical model so as to guide the detection of new words. The algorithm focuses on the process of online segmentation and new word detection, achieves good results in both closed and open tests, and outperforms some well-known Chinese segmentation systems to a certain extent.
Keywords: new word detection; improved PPM model; context information; Chinese word segmentation
17. Research on User Review Analysis Based on TF-IDF and Word2vec (Cited by 4)
Authors: Liu Yutao, Shi Li, Liu Shihan. 《成都航空职业技术学院学报》, 2022, No. 4, pp. 89-92.
Abstract: Taking computer products as the experimental subject, this paper crawls review data with a web crawler and applies word segmentation to the user reviews. The processed results are then analyzed with both TF-IDF and Word2vec to compute the high-frequency words in the reviews and their correlations, in order to understand users' points of concern about this kind of product and other related issues, and finally to offer guiding suggestions for manufacturers and e-commerce platforms.
Keywords: user reviews; Chinese word segmentation; TF-IDF; Word2vec
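A minimal sketch of the TF-IDF half of this analysis: segment the reviews with jieba, then rank terms by TF-IDF weight with scikit-learn to surface what reviewers talk about most. The toy reviews are invented for illustration.

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = ["电脑屏幕很清晰，运行速度快",
           "键盘手感不错，屏幕颜色鲜艳",
           "速度快，散热也不错"]

# Pre-segment so the vectorizer can tokenize on whitespace.
docs = [" ".join(jieba.lcut(r)) for r in reviews]

vec = TfidfVectorizer(token_pattern=r"(?u)\S+")  # keep single-char CJK tokens
tfidf = vec.fit_transform(docs)

# Average TF-IDF weight per term across all reviews, highest first.
weights = tfidf.mean(axis=0).A1
terms = vec.get_feature_names_out()
for term, w in sorted(zip(terms, weights), key=lambda t: -t[1])[:5]:
    print(term, round(w, 3))
```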
18. Word Segmentation in Incidental Vocabulary Learning During Reading: The Different Roles of the Positional Probabilities of Initial and Final Morphemes (Cited by 2)
Authors: Liang Feifei, Feng Linlin, Liu Ying, Li Xin, Bai Xuejun. 《心理学报》 (CSSCI, CSCD, PKU Core), 2024, No. 3, pp. 281-294.
Abstract: Through two parallel experiments, this study examined how the positional probability of initial and final morphemes contributes to word segmentation as new words are learned repeatedly during reading. Using an incidental vocabulary learning paradigm with two-character pseudowords as new words, Experiment 1 manipulated the positional probability of the initial morpheme (high vs. low) while holding the final morpheme constant, and Experiment 2 manipulated the positional probability of the final morpheme while holding the initial morpheme constant. Eye movements of university students were recorded with an eye tracker during reading. The results showed that: (1) the word segmentation effect of the positional probability of both initial and final morphemes gradually diminished as the number of exposures to the new words increased, showing a familiarity effect; (2) the familiarity effect of the initial morpheme's positional probability appeared in two relatively late eye-movement measures, regression path duration and total fixation count, whereas that of the final morpheme's positional probability emerged from gaze duration, through regression path duration, and persisted into total fixation time. The results indicate that the positional probabilities of both initial and final morphemes contribute to word segmentation in incidental vocabulary learning during reading, but the initial morpheme acts over a longer and more stable time course, supporting the view that the initial morpheme has an advantage in the processing of two-character Chinese words.
Keywords: morpheme positional probability; word segmentation; incidental vocabulary learning during reading; Chinese reading
19. Research on Automatic Summarization of Long Documents Based on Word2Vec and the TextRank Algorithm (Cited by 1)
Authors: Zhu Yuting, Liu Le, Xin Xiaole, Chen Longhui, Kang Lianghe. 《现代信息科技》, 2023, No. 4, pp. 36-38, 42.
Abstract: In recent years, extracting key information from large volumes of text has become a pressing problem. For long Chinese patent documents, this paper proposes a summary generation algorithm combining Word2Vec and TextRank. First, the Python jieba library is used to segment the Chinese patent documents, and a stopword dictionary removes meaningless words. Next, Word2Vec is used for feature extraction, and WordCloud visualizes the extracted keywords. Finally, the TextRank algorithm computes the similarity between sentences to generate candidate summary sentences, and the summary of the patent document is generated according to the candidates' weights. Experiments show that the patent summaries generated with Word2Vec and TextRank are of high quality and generalize well.
Keywords: jieba word segmentation; keyword extraction; Word2Vec algorithm; TextRank algorithm
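A minimal sketch of the extractive step described above, under the assumption that each sentence is embedded as the average of its Word2Vec word vectors: build a cosine-similarity graph over sentences and run PageRank (the core of TextRank) via networkx, then keep the top-ranked sentences. The toy corpus and parameters are illustrative.

```python
import jieba
import numpy as np
import networkx as nx
from gensim.models import Word2Vec

sentences = ["专利文本包含大量关键信息",
             "关键信息的提取是一个重要问题",
             "本文结合词向量与图排序算法生成摘要",
             "实验表明生成的摘要质量较高"]
tokenized = [jieba.lcut(s) for s in sentences]

# Train tiny word vectors on the toy corpus itself.
w2v = Word2Vec(tokenized, vector_size=50, min_count=1, epochs=100)

def sent_vec(tokens):
    """Sentence embedding: mean of its word vectors."""
    return np.mean([w2v.wv[t] for t in tokens], axis=0)

vecs = [sent_vec(t) for t in tokenized]

# Cosine-similarity adjacency matrix (clamped to >= 0 for PageRank).
n = len(vecs)
sim = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            cos = np.dot(vecs[i], vecs[j]) / (
                np.linalg.norm(vecs[i]) * np.linalg.norm(vecs[j]))
            sim[i, j] = max(0.0, cos)

scores = nx.pagerank(nx.from_numpy_array(sim))
top = sorted(scores, key=scores.get, reverse=True)[:2]
print([sentences[i] for i in sorted(top)])  # 2-sentence summary, in order
```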
20. Research on Word Segmentation of Livestock and Poultry Disease Texts Based on a BERT-BiLSTM-CRF Model (Cited by 2)
Authors: Yu Ligen, Guo Xiaoli, Zhao Hongtao, Yang Gan, Zhang Jun, Li Qifeng. 《农业机械学报》 (EI, CAS, CSCD, PKU Core), 2024, No. 2, pp. 287-294.
Abstract: To address the scarcity of text corpora on livestock and poultry diseases and the large number of out-of-vocabulary words such as disease names and phrases in such texts, a BERT-BiLSTM-CRF word segmentation model combined with dictionary matching is proposed. Taking sheep diseases as the research object, a text dataset of common diseases was constructed and combined with the general-purpose PKU corpus. A BERT (Bidirectional Encoder Representations from Transformers) pre-trained language model vectorizes the text; a bidirectional long short-term memory network (BiLSTM) captures contextual semantic features; and a conditional random field (CRF) outputs the globally optimal label sequence. On this basis, a domain dictionary for livestock and poultry diseases is applied after the CRF layer to correct the segmentation, reducing the ambiguous splits caused by disease names and phrases and further improving segmentation accuracy. Experimental results show that the dictionary-matching BERT-BiLSTM-CRF model achieves an F1 value of 96.38% on the common sheep disease text dataset, improvements of 11.01, 10.62, 8.3 and 0.72 percentage points over the jieba segmenter, the BiLSTM-Softmax model, the BiLSTM-CRF model, and the proposed model without dictionary matching, respectively, verifying the effectiveness of the method. Compared with a single corpus, the mixed corpus combining the general PKU corpus and the sheep disease dataset can accurately segment both specialized disease terminology and the common words in disease texts, reaching F1 values above 95% on both the general corpus and the disease dataset, and shows good model generalization. The method is applicable to word segmentation of livestock and poultry disease texts.
Keywords: livestock and poultry diseases; text segmentation; pre-trained language model; bidirectional long short-term memory network; conditional random field
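A minimal PyTorch skeleton of the BERT-BiLSTM-CRF architecture the abstract describes, framed as 4-tag (B/M/E/S) sequence labeling for segmentation. It assumes the Hugging Face transformers package and the pytorch-crf package; the model name bert-base-chinese, the hidden sizes, and the tag set are illustrative assumptions, and the paper's dictionary-matching correction after the CRF layer is omitted here.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pip install pytorch-crf

class BertBiLstmCrf(nn.Module):
    def __init__(self, bert_name="bert-base-chinese",
                 lstm_hidden=256, num_tags=4):  # tags: B, M, E, S
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # BERT contextual vectors -> BiLSTM features -> per-tag emissions.
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        feats, _ = self.lstm(hidden)
        emissions = self.fc(feats)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative CRF log-likelihood as the loss.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: globally optimal tag sequence via Viterbi decoding.
        return self.crf.decode(emissions, mask=mask)
```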