16 articles found
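1. Building a Segmented Corpus of the Complete Song Ci Based on Statistical Word Extraction and Metrical Patterns (cited 11 times)
Authors: 苏劲松, 周昌乐, 李翼鸿. 《中文信息学报》 (Journal of Chinese Information Processing). CSCD, Peking University Core. 2007, No. 2, pp. 52-57 (6 pages).
Building a segmented corpus of the Complete Song Ci (全宋词) is the foundation for computer-based research on Song ci poetry. This paper sets out its own view of how a "word" should be defined in Song ci. Drawing on the respective strengths of statistical word extraction and of segmentation based on poetic metrical patterns, it proposes a new method for building a segmented corpus of the Complete Song Ci. We first use statistical word extraction to obtain two-character words with strong internal cohesion and combine them with related resources to build a lexicon. On this basis, we segment the Complete Song Ci according to a set of rules that take the metrical characteristics of Song ci into account. Experiments show that the method performs well.
Keywords: computer application; Chinese information processing; corpus; statistics; metrical patterns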
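Entry 1 extracts two-character words with strong internal cohesion by statistical means, but the abstract does not name the statistic. The sketch below assumes pointwise mutual information (PMI) over adjacent characters within each line, with a hypothetical `min_count` cutoff. It illustrates the general idea and is not the authors' procedure.

```python
import math
from collections import Counter

def bigram_pmi(lines, min_count=2):
    """Rank adjacent two-character candidates by pointwise mutual information.

    PMI(xy) = log2(P(xy) / (P(x) * P(y))), estimated from raw counts.
    Bigrams are taken within a line only, so they never span line breaks.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for line in lines:
        char_counts.update(line)
        pair_counts.update(line[i:i + 2] for i in range(len(line) - 1))
    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for pair, count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_pairs
        p_x = char_counts[pair[0]] / n_chars
        p_y = char_counts[pair[1]] / n_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    # Highest PMI first: the most cohesive candidate words.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The top-ranked pairs would then be checked against a lexicon and the metrical rules that the abstract describes.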
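2. Design and Implementation of an ElasticSearch-Based Recommender System for Science and Technology Services
Authors: 刘勇, 刘菲, 蒙杰. 《甘肃科技》 (Gansu Science and Technology). 2024, No. 3, pp. 59-64 (6 pages).
A recommender system is a closed-loop system that matches users with information through algorithmic strategies, enabling efficient information filtering and personalized recommendation. This system is built on ElasticSearch big-data analysis and mining technology. It uses TF-IDF, LUCentere, and hot-word statistical discovery algorithms, and Python programs import data into Elasticsearch asynchronously in batches. Relevance scores are computed from indicators such as the previous day's increases in visits, likes, and favorites, together with publication time, and documents are ranked by these scores to recommend content that matches user preferences. The research aims to improve tokenization accuracy when Elasticsearch indexes data, shorten search response time, and give users a better experience. After a broad literature review and repeated experimental testing of the system's algorithms, the authors effectively addressed the "cold start" problem. Experimental results show a significant improvement in user experience, and positive user feedback raised retention and click-through conversion rates. The study offers useful experience for designing and implementing recommender-system algorithms and is of practical value for information retrieval and user-experience improvement.
Keywords: big-data analysis and mining; word statistics; intelligent recommendation; ElasticSearch; TF-IDF; LUCentere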
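Entry 2 ranks documents by combining text relevance with the previous day's gains in visits, likes, and favorites and with publication time. The sketch below is a plain-Python illustration under stated assumptions. Relevance is a basic TF-IDF sum, and the `stats` keys and the weights `w_text`, `w_pop`, and `w_age` are hypothetical. It does not touch the Elasticsearch API, whose default similarity since version 5.0 is BM25 rather than classic TF-IDF.

```python
import math
from collections import Counter

def tf_idf_scores(docs, query_terms):
    """Score each tokenized document by summing TF-IDF weights of the query terms.

    tf = term count / document length, idf = ln(N / df); terms absent from
    every document contribute nothing.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if df[term]:
                score += (tf[term] / len(doc)) * math.log(n_docs / df[term])
        scores.append(score)
    return scores


def rank(docs, query_terms, stats, w_text=1.0, w_pop=0.1, w_age=0.05):
    """Return document indices, best first, blending relevance, popularity and age.

    stats[i] holds hypothetical keys: 'views', 'likes', 'favorites' (yesterday's
    increases) and 'age_days' (days since publication). Weights are illustrative.
    """
    text = tf_idf_scores(docs, query_terms)
    scored = []
    for i, s in enumerate(stats):
        popularity = math.log1p(s["views"] + s["likes"] + s["favorites"])
        score = w_text * text[i] + w_pop * popularity - w_age * s["age_days"]
        scored.append((score, i))
    return [i for _, i in sorted(scored, reverse=True)]
```

The `log1p` damping is one way to keep a single viral day from swamping text relevance. The right balance would have to be tuned on real click data.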
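3. Analysis of Main Transformer Defect Records Based on Regular-Expression Matching and Word-Cloud Statistics (cited once)
Authors: 刘丽, 张云云, 黄道友, 张征凯. 《现代工业经济和信息化》 (Modern Industrial Economy and Informationization). 2019, No. 2, pp. 118-120 (3 pages).
Using transformer-related data from Anhui Province, the authors wrote regular expressions to extract the defective body or component, where on it the defect occurred, and the severity of the defect. Word-cloud analysis and word-frequency statistics were then used to analyze the defective bodies and components, the defect locations, and the defect severity.
Keywords: main transformer; defect content; regular-expression matching; statistics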
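Entry 3 uses regular expressions to pull the defective part and the defect severity out of free-text records and then counts them. The patterns below are hypothetical. The part names (本体 body, 套管 bushing, 冷却器 cooler, 分接开关 tap changer, 呼吸器 breather) and the severity levels (危急 critical, 严重 serious, 一般 general) are illustrative and are not the vocabulary used in the paper.

```python
import re
from collections import Counter

# Hypothetical patterns; the paper's actual expressions are not published here.
PART_RE = re.compile(r"(本体|套管|冷却器|分接开关|呼吸器)")
SEVERITY_RE = re.compile(r"(危急|严重|一般)缺陷")

def parse_defect(text):
    """Return (part, severity) from one defect record; a missing field is None."""
    part = PART_RE.search(text)
    severity = SEVERITY_RE.search(text)
    return (part.group(1) if part else None,
            severity.group(1) if severity else None)

def tally(records):
    """Frequency counts of parts and severity levels across all records."""
    parts, levels = Counter(), Counter()
    for text in records:
        part, level = parse_defect(text)
        if part:
            parts[part] += 1
        if level:
            levels[level] += 1
    return parts, levels
```

The two counters are exactly the kind of frequency table a word-cloud tool takes as input.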
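4. A Preliminary Computer-Based Study of Characters and Words in Hong Lou Meng (Dream of the Red Chamber) (cited 4 times)
Authors: 李瑞芳, 孙军波, 常诗珧. 《电脑知识与技术》 (Computer Knowledge and Technology). 2009, No. 1X, pp. 753-755 (3 pages).
Research on Hong Lou Meng (Redology) has been modernized, and computer technology has been applied successfully to analyzing the novel. The authors wrote a character- and word-counting program in Java and used SPSS to draw line charts of the counted characters and words, which were then compared and analyzed. The results reflect a change in the author's character and word usage between the first 80 chapters and the last 40, that is, a shift in language and writing style. The study offers reference value for Redology research.
Keywords: Java programming language; SPSS; Hong Lou Meng; statistics; word statistics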
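Entry 4 compares character and word usage between the first 80 and the last 40 chapters. Because the two parts differ in length, raw counts have to be normalized before they can be compared. The sketch below assumes both parts are already available as plain strings and compares single characters per 10,000 characters. The choice of target characters is left to the analyst, and punctuation is not stripped here.

```python
from collections import Counter

def rate_per_10k(text, targets):
    """Occurrences of each target character per 10,000 characters of `text`."""
    counts = Counter(text)
    total = len(text)
    if total == 0:
        return {t: 0.0 for t in targets}
    return {t: counts[t] * 10000 / total for t in targets}

def compare_parts(part_a, part_b, targets):
    """For each target: (rate in part_a, rate in part_b, rate_b - rate_a)."""
    a = rate_per_10k(part_a, targets)
    b = rate_per_10k(part_b, targets)
    return {t: (a[t], b[t], b[t] - a[t]) for t in targets}
```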
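5. Building a Tourist Sentiment Analysis Model from the Perspective of Online Review Data Mining: A Case Study of the Seven Star Crags (七星岩) Scenic Area in Zhaoqing
Authors: 郭栩东, 胡绿, 李茂强, 王怡. 《科技创新与应用》 (Technology Innovation and Application). 2024, No. 13, pp. 9-13 (5 pages).
The arrival of the Internet era has profoundly changed the global economy, and the Internet is now an inseparable part of daily life. The tourism industry has changed with it. With the rise of travel platforms, online reviews have become commonplace and serve as an important reference for travel spending. Taking the Seven Star Crags scenic area in Zhaoqing as its case, this paper collects online review data and filters it, computes high-frequency word statistics, and visualizes the results. It then applies sentiment analysis to the data to build a tourist sentiment analysis model. A series of empirical analyses indicates three findings. First, tourists' sentiment is most sensitive to destination factors. Second, management and landscape are key to a scenic area's competitiveness. Third, spending is shaped by tourists' attention and in turn has a significant effect on their sentiment. The paper closes with recommendations on resource management, economic development models and services, and food offerings.
Keywords: online reviews; tourist sentiment; sentiment analysis model; data mining; high-frequency word statistics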
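6. Research Hotspots in Angiogenesis in China Based on Co-Word Analysis (cited 3 times)
Authors: 崔鹤蓉, 赵雅, 张文曦, 李磊, 陈红珊, 王鹏龙, 苏进, 雷海民. 《中国医药导报》 (China Medical Herald). CAS. 2020, No. 5, pp. 184-187 (4 pages).
Objective: To perform co-word analysis on the keywords of angiogenesis literature and identify the research hotspots of the field. Methods: CNKI journals indexed as "EI source journals", "core journals", "CSSCI", or "CSCD" were used as the data source, with the subject area limited to "Medicine and Health Sciences". The search used "血管生成" (angiogenesis) as the keyword, with search type "Subject" and a time range of 1992 to June 2019. Bibliometric analysis of the retrieved literature was performed with Pajek, SPSS 22.0, and other software: high-frequency words were counted, a co-word matrix was built, cluster analysis was run on the matrix, and a co-word network was generated. Results: The search returned 2,586 core-journal papers on angiogenesis. The co-word network shows that keywords concentrate on five main aspects: intervention methods, research models and techniques, mechanisms, evaluation indicators, and clinical diseases. Conclusion: Among the hotspots of angiogenesis research in China, work on inhibiting angiogenesis focuses mainly on tumors. Work on promoting angiogenesis focuses mainly on diseases such as endometriosis and myocardial ischemia. The main experimental model is the chick chorioallantoic membrane. These findings provide a reference for related clinical and scientific work.
Keywords: analysis; high-frequency word statistics; angiogenesis; research hotspots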
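Entry 6 counts high-frequency keywords and builds a co-word matrix before clustering. A minimal sketch follows, assuming each paper is given as a list of keywords. Each keyword is counted once per paper and the `top_n` most frequent are kept. Each off-diagonal cell then counts the papers in which two keywords appear together. The clustering and network steps done in Pajek and SPSS are not reproduced.

```python
from collections import Counter
from itertools import combinations

def co_word_matrix(keyword_lists, top_n=10):
    """Symmetric co-occurrence matrix for the top_n most frequent keywords.

    dict.fromkeys de-duplicates each paper's keywords while keeping their order,
    so ties in frequency are broken by first appearance.
    """
    freq = Counter(kw for kws in keyword_lists for kw in dict.fromkeys(kws))
    top = [kw for kw, _ in freq.most_common(top_n)]
    index = {kw: i for i, kw in enumerate(top)}
    matrix = [[0] * len(top) for _ in top]
    for kws in keyword_lists:
        present = sorted({kw for kw in kws if kw in index}, key=index.get)
        for a, b in combinations(present, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    for kw, i in index.items():
        matrix[i][i] = freq[kw]  # diagonal: each keyword's own paper count
    return top, matrix
```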
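7. Analysis of Job Types in the Internet Industry Based on Data Mining (cited 3 times)
Authors: 詹翠芬, 周燕. 《计算机产品与流通》 (Computer Products and Circulation). 2018, No. 7, pp. 136-138 (3 pages).
This paper scrapes web job advertisements and preprocesses them with Python by segmenting the text and building a keyword vector space model. Job types are then identified with K-means clustering. On this basis, the paper carries out a correlation analysis of each job type and uses Python word clouds to compile statistics on skill keywords.
Keywords: text mining; cluster analysis; statistics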
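Entry 7 clusters job advertisements with K-means after mapping them into a keyword vector space. The sketch below is plain Lloyd's K-means on hypothetical keyword-count vectors. It initializes centroids with the first `k` points only so that the example is reproducible. Real use would prefer random or k-means++ initialization.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's K-means on equal-length lists of numbers; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # deterministic init for the example
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, with a hypothetical vocabulary such as (java, python, sql, photoshop, figma), developer-style and designer-style advertisements would fall into separate clusters.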
8. Comparative Analysis of Modal Auxiliary Verbs in English and in Chinese
Author: ZHANG Hong-yan. 《Sino-US English Teaching》. 2015, No. 2, pp. 128-136 (9 pages).
This study compares modal auxiliary verbs in English and Chinese both qualitatively and quantitatively, analyzing the modals of each language statistically by form and meaning. The data consist of 50 pieces of Chinese prose with their English translations (corpus A) and 50 pieces of English prose with their Chinese translations (corpus B). In total there are 200 articles, representing a type of discourse rich in modal auxiliary verbs in both languages. The major findings are as follows: (1) three criteria, namely inversion, negation, and the use of pro-forms, can be used to define both English and Chinese auxiliaries; (2) the modals of both languages can be analyzed within the same semantic categories: volition, probability, and necessity; (3) Chinese epistemic modals can have inversion patterns; (4) the negative forms of Chinese modals are more complex than those of English modals; and (5) the statistical analysis shows that, in both languages, modals in the probability category are used much more often than those in the other two categories, volition and necessity, and that deontic modals are used much less often to express necessity.
Keywords: modal auxiliary verb; volition; probability; necessity; epistemic; deontic
9. A Fuzzy Neural Network Model of Linguistic Dynamic Systems Based on Computing with Words
Authors: 蔡国榕, 李绍滋, 陈水利, 吴云东. 《Journal of Donghua University (English Edition)》. EI, CAS. 2010, No. 6, pp. 813-818 (6 pages).
Linguistic dynamic systems (LDS) are dynamic processes that use computing with words (CW) to model and analyze complex systems. This paper proposes a fuzzy neural network (FNN) structure for LDS and trains the FNN with an improved nonlinear particle swarm optimization. Experimental results on logistics formulation demonstrate the feasibility and efficiency of the FNN model.
Keywords: linguistic dynamic systems (LDS); computing with words (CW); fuzzy neural network (FNN); particle swarm optimization (PSO)
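Entry 9 trains the fuzzy neural network with an "improved nonlinear" particle swarm optimization, but the abstract gives no details of the improvement. The sketch below is a basic global-best PSO with a quadratically decaying inertia weight. That schedule is one common nonlinear choice and is an assumption here. In the paper's setting, `objective` would be the network's training error over its parameter vector. The example instead minimizes a simple sphere function.

```python
import random

def pso(objective, dim, n_particles=20, iterations=200, bounds=(-5.0, 5.0),
        c1=1.5, c2=1.5, w_start=0.9, w_end=0.4, seed=0):
    """Minimize `objective` with a global-best particle swarm; returns (best, value).

    The inertia weight decays quadratically from w_start to w_end.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for t in range(iterations):
        w = w_end + (w_start - w_end) * (1 - t / iterations) ** 2
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

A high early inertia favors exploration, and a low late inertia lets the swarm settle. A nonlinear schedule only changes how quickly that shift happens.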
10. Improved hidden Markov model for speech recognition and POS tagging (cited 4 times)
Author: 袁里驰. 《Journal of Central South University》. SCIE, EI, CAS. 2012, No. 2, pp. 511-516 (6 pages).
To overcome defects of the classical hidden Markov model (HMM), a new statistical model, the Markov family model (MFM), was proposed and applied to speech recognition and natural language processing. Speaker-independent continuous speech recognition experiments and part-of-speech tagging experiments show that the Markov family model outperforms the hidden Markov model. Compared with the HMM baseline system, precision in part-of-speech tagging rises from 94.642% to 96.214%, and the word error rate in speech recognition falls by 11.9%.
Keywords: hidden Markov model; Markov family model; speech recognition; part-of-speech tagging
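The abstract of entry 10 does not define the Markov family model, so the sketch below covers only the classical HMM baseline it is compared against: Viterbi decoding of a part-of-speech sequence. The tag set and probabilities in any example are hypothetical toy values.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence under a first-order HMM, computed in log space.

    start_p[t], trans_p[prev][t] and emit_p[t][word] are probabilities given as
    dicts; unseen events fall back to `floor` so log() never receives zero.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[t] = (log probability of the best path ending in tag t, that path)
    best = {t: (lp(start_p, t) + lp(emit_p[t], words[0]), [t]) for t in tags}
    for word in words[1:]:
        step = {}
        for t in tags:
            score, path = max(
                (best[prev][0] + lp(trans_p[prev], t), best[prev][1]) for prev in tags
            )
            step[t] = (score + lp(emit_p[t], word), path + [t])
        best = step
    return max(best.values())[1]
```

Working in log space avoids numerical underflow on long sentences, where multiplying many small probabilities would round to zero.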
11. Graph-based Lexicalized Reordering Models for Statistical Machine Translation
Authors: SU Jinsong, LIU Yang, LIU Qun, DONG Huailin. 《China Communications》. SCIE, CSCD. 2014, No. 5, pp. 71-82 (12 pages).
Lexicalized reordering models are very important components of phrase-based translation systems. Conventional methods learn these models from a word-aligned bilingual corpus by examining the reordering relationships between adjacent phrases, but they ignore the effect of the number of adjacent bilingual phrases. In this paper, we propose a method that takes the number of adjacent phrases into account for better estimation of reordering models. Instead of only checking whether some phrase is adjacent to a given phrase, our method first represents all phrase segmentations of a parallel sentence with a compact structure called a reordering graph. It then quantifies the effect of the number of adjacent phrases in a forward-backward fashion and finally incorporates it into the estimation of reordering models. Experimental results on the NIST Chinese-English and WMT French-Spanish data sets show that our approach significantly outperforms the baseline method.
Keywords: natural language processing; statistical machine translation; lexicalized reordering model; reordering graph
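Entry 11 quantifies, in a forward-backward fashion, how often phrases occur across all phrase segmentations encoded in a reordering graph. The sketch below is a simplified, monolingual illustration of that counting idea and is not the paper's bilingual reordering graph. Nodes are word positions, and each span in `spans` is an edge covering words i to j-1. A span's count is the number of complete segmentations that use it.

```python
def segmentation_counts(n, spans):
    """Return (total segmentations, {span: segmentations using that span}).

    alpha[j] counts segmentations of the prefix ending at position j,
    beta[i] counts segmentations of the suffix starting at position i,
    so span (i, j) occurs in alpha[i] * beta[j] complete segmentations.
    Every span must satisfy 0 <= i < j <= n.
    """
    alpha = [0] * (n + 1)
    beta = [0] * (n + 1)
    alpha[0] = 1
    beta[n] = 1
    for j in range(1, n + 1):  # forward pass, left to right
        alpha[j] = sum(alpha[i] for (i, end) in spans if end == j)
    for i in range(n - 1, -1, -1):  # backward pass, right to left
        beta[i] = sum(beta[j] for (start, j) in spans if start == i)
    return alpha[n], {span: alpha[span[0]] * beta[span[1]] for span in spans}
```

Under the same convention, the number of segmentations in which span (i, j) is immediately followed by span (j, k) is alpha[i] * beta[k]. That quantity is closer to the adjacency information the abstract refers to.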
12. Acquiring synonymous attribute phrases for named entities via online encyclopedia
Authors: 伍大勇, Zhao Shiqi, Liu Ting. 《High Technology Letters》. EI, CAS. 2013, No. 4, pp. 398-405 (8 pages).
In this work, an approach is proposed to acquire synonymous attribute phrases of named entities (NEs) from an online encyclopedia. Synonymous attribute phrases are phrases that express the same attribute of a class of NEs with different surface forms. The proposed approach consists of three stages. First, entries related to a given NE class are automatically selected from an online encyclopedia. Second, attribute phrases are extracted based on phrase-frequency statistics. Third, synonymous attributes are identified pairwise through a classification framework that combines multiple features. The approach is applied to Baidu Baike, a Chinese online encyclopedia, for four different NE classes. Experimental results show an average precision of 74% and an average F-value of 65% across the four classes. In particular, thousands of synonymous attribute phrase pairs are acquired for each class, which demonstrates the effectiveness of the approach.
Keywords: attribute phrase; named entity (NE); synonymous attribute; online encyclopedia
13. Vari-gram language model based on word clustering
Author: 袁里驰. 《Journal of Central South University》. SCIE, EI, CAS. 2012, No. 4, pp. 1057-1062 (6 pages).
Category-based statistical language models are an important way to address data sparseness, but they face two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well with little computation; and (2) class-based methods lose the predictive ability needed to adapt to text from different domains. To solve these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is built on it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218. In addition, an absolute weighted difference method is presented and used to construct a vari-gram language model with good predictive ability. Compared with the category-based model, the vari-gram model reduces perplexity from 234.65 to 219.14 on Chinese corpora and from 195.56 to 184.25 on English corpora.
Keywords: word similarity; word clustering; statistical language model; vari-gram language model
14. Hybrid Features for an Arabic Word Recognition System
Authors: Mehmmood A. Abd, Sarab Al Rubeaai, George Paschos. 《Computer Technology and Application》. 2012, No. 10, pp. 685-691 (7 pages).
This research proposes and implements an Arabic Sub-Words Recognition System (ASWR). The system combines statistical and structural features to give a complete description of each pattern and to improve the recognition rate. Support Vector Machines (SVMs) are used as a promising pattern recognition tool. In addition, the problems of dots and holes are handled in a way completely different from previous approaches. The system proceeds in several phases: (1) image acquisition; (2) binarisation; (3) morphological processing; (4) feature extraction, covering statistical features (moment invariants) and structural features (dot number, dot position, and number of holes); and (5) classification with multi-class SVMs using a one-against-all technique. Tested on different sets of words and sub-words, the system achieved a recognition rate of nearly 98.90%. Comparative results with neural networks (NNs) are also presented.
Keywords: Arabic word recognition; support vector machines; classification; feature extraction; neural networks; morphological
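Entry 14 uses moment invariants as statistical features but does not say which ones. The sketch below assumes Hu's invariants and computes the first two from the normalized central moments of a binary image given as rows of 0/1. The structural features (dots, holes) and the SVM stage are not shown.

```python
def hu_first_two(image):
    """First two Hu moment invariants (phi1, phi2) of a binary image.

    eta_pq = mu_pq / mu00 ** (1 + (p + q) / 2), with central moments mu_pq;
    phi1 = eta20 + eta02 and phi2 = (eta20 - eta02)**2 + 4 * eta11**2.
    """
    pixels = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    m00 = len(pixels)  # for a binary image, mu00 equals the pixel count
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    xbar = sum(x for x, _ in pixels) / m00
    ybar = sum(y for _, y in pixels) / m00

    def eta(p, q):
        mu = sum((x - xbar) ** p * (y - ybar) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2
```

Both values stay unchanged when a glyph is shifted or rotated by 90 degrees. This is why such moments suit handwriting, where position and orientation vary.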
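15. The Trajectory, Characteristics, and Future Direction of Higher Vocational Education Research in China over the Past Decade: A Bibliometric Analysis of Papers in Core Higher-Education Research Journals and the CSSCI Database (cited 28 times)
Authors: 范笑仙, 汤建民. 《中国高教研究》 (China Higher Education Research). CSSCI, Peking University Core. 2010, No. 10, pp. 18-23 (6 pages).
Combining word-frequency statistics, knowledge mapping, and content analysis, this paper conducts a diachronic bibliometric analysis of research on higher vocational education in China over more than ten years. It examines the volume of papers, the research topics, and the author base. The analysis leads to four findings.
(1) The field has made some progress in recent years but remains at a fairly basic stage, still far from a mature discipline.
(2) Over the past decade the research has gone through roughly three stages. The third stage is organized around three centers: "vocational education" (职业教育), "higher vocational education" (高职教育), and "higher vocational colleges" (高职院校).
(3) Higher vocational education in China grew out of vocational education. Its vocational character is therefore intrinsic, while its "higher" character was acquired later from outside.
(4) Research on higher vocational education has begun to develop a character of its own. In form it increasingly resembles higher-education research and is moving in that direction.
Keywords: higher vocational education research; bibliometric analysis; statistics; word statistics; knowledge mapping method; topic structure pattern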
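16. A Visual Analysis of Hotspots and Frontiers in Second Language Acquisition Research, 2006-2011 (cited 7 times)
Author: 刘浩. 《当代外语研究》 (Contemporary Foreign Languages Studies). 2013, No. 3, pp. 29-33, 77-78 (5 pages).
This study uses word-frequency statistics to analyze the keywords of second language acquisition (SLA) research from 2006 to 2011 and summarizes the field's recent research hotspots and frontiers. It also uses information visualization to draw a knowledge map of the field for those years. The results show that SLA research in 2006-2011 had six main hotspots and frontiers:
(1) emphasis on acquisition at the syntactic level;
(2) emphasis on describing the linguistic features of interlanguage;
(3) emphasis on learning theories and individual learner differences;
(4) emphasis on interdisciplinary research;
(5) emphasis on classroom-based research, especially classroom feedback;
(6) a wide range of target languages, most often English, Spanish, and Chinese.
These findings can help SLA researchers keep track of the field's hotspots and frontiers and offer some guidance in exploring and choosing research directions.
Keywords: second language acquisition; statistics; analysis; visualization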