16 articles found
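A Feature Selection Method for Imbalanced Uyghur Text Classification  Cited by: 3
1
Authors 董瑞 周喜 《计算机工程与设计》 CSCD PKU Core 2013, No. 1, pp. 349-352 (4 pages)
To address the imbalanced-dataset problem in Uyghur text classification, an improved chi-square feature selection method is proposed. Texts are preprocessed according to the linguistic characteristics of Uyghur to reduce the dimensionality of the feature space; feature selection then combines chi-square with inverse document frequency to reduce the dimensionality further; a naive Bayes classifier performs the classification. Experiments on an imbalanced Uyghur corpus show that the proposed method outperforms chi-square and information gain feature selection on imbalanced datasets.
Keywords imbalance; text classification; Uyghur; feature selection; document frequency; chi-square; information gain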
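The entry above combines chi-square with inverse document frequency but does not state the formula. The following is a minimal Python sketch of the two standard statistics, assuming (purely for illustration) that the combined score is their product; the function names and the product rule are hypothetical and are not taken from the paper.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def idf(n_docs, doc_freq):
    """Inverse document frequency with a natural logarithm."""
    if doc_freq == 0:
        return 0.0
    return math.log(n_docs / doc_freq)


def chi_idf_score(a, b, c, d):
    """Assumed combination: chi-square weighted by IDF (illustrative only)."""
    n_docs = a + b + c + d
    return chi_square(a, b, c, d) * idf(n_docs, a + b)
```

Weighting by IDF lowers the score of terms that occur in most documents, which is one plausible reason to pair it with chi-square on an imbalanced corpus.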
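On Personal Digital Libraries  Cited by: 15
2
Author 刘杰 《甘肃科技》 2004, No. 4, pp. 61-63 (3 pages)
Discusses what a personal digital library is, why building one is necessary, and several aspects that need to be considered.
Keywords definition of personal digital library; necessity of personal digital library; personal digital library; document collection principles; document management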
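A Feature Selection Method for Spam Filtering Based on Differential Contribution  Cited by: 10
3
Authors 张文良 黄亚楼 倪维健 《计算机工程》 CAS CSCD PKU Core 2007, No. 8, pp. 80-82 (3 pages)
Spam filtering is essentially a two-class text classification problem, and feature selection is an important part of it. In view of the particular nature of spam filtering, two traditional feature selection methods, document frequency and mutual information, are improved using the idea of "differential contribution", yielding new feature selection methods for spam filtering. Experimental results show that feature selection based on differential contribution effectively improves the precision of spam filtering.
Keywords spam filtering; feature selection; document frequency; mutual information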
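Research on Tibetan Stop Word Selection and Automatic Processing Methods  Cited by: 8
4
Authors 珠杰 李天瑞 《中文信息学报》 CSCD PKU Core 2015, No. 2, pp. 125-132 (8 pages)
Stop word handling is a key preprocessing step in text mining. Building on existing stop word processing techniques, this paper studies statistics-based methods for selecting Tibetan stop words, experimentally analyzes how term frequency, document frequency, entropy, and similar measures perform in selecting Tibetan stop words, and proposes a selection method that combines Tibetan function words, special verbs, and automatic processing. Experimental results show that the method can produce a reasonably sound Tibetan stop word list.
Keywords Tibetan stop words; word frequency statistics; document frequency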
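A New Unsupervised Feature Selection Method for Text  Cited by: 2
5
Authors 何中市 徐浙君 《重庆大学学报(自然科学版)》 EI CAS CSCD PKU Core 2007, No. 6, pp. 77-79, 83 (4 pages)
Combining document frequency (DF) and feature similarity (FS), a new unsupervised feature selection method, DFFS, is proposed. The method first uses document frequency to filter out 90% of the features and then uses feature similarity to remove as many redundant features as possible. Using K-means, the clustering performance of DFFS is compared with that of three other common feature selection methods (DF, TC, TS). Experiment 1: when the number of features is reduced from 6000 to 1047, the clustering performance of DF drops sharply while that of DFFS improves; even when the number is reduced further to 350, the performance of DFFS does not decline. Experiment 2: when 10% to 2% of the features are kept, DFFS outperforms the other three methods, most clearly when only 2% of the features are kept.
Keywords natural language processing; feature selection; document frequency; term weight; term entropy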
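The first stage described in the entry above, filtering features by document frequency, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: documents are already tokenized into lists of terms, the function names are hypothetical, and the feature-similarity stage of DFFS is not shown because the abstract does not define it.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents it appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts a term at most once
    return df


def keep_top_by_df(docs, keep_ratio=0.10):
    """Keep the fraction of terms with the highest document frequency.

    keep_ratio=0.10 corresponds to filtering out 90% of the features.
    Ties are broken alphabetically so the result is deterministic.
    """
    df = document_frequency(docs)
    k = max(1, round(len(df) * keep_ratio))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For example, `keep_top_by_df([["a", "b", "a"], ["a", "c"], ["b", "a"]], keep_ratio=0.34)` keeps only the term `"a"`, which occurs in all three documents.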
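Text Classification with Feature Word Selection Based on Word Distribution Balance Evaluation  Cited by: 1
6
Authors 陈键 胡学刚 《安徽科技学院学报》 2009, No. 2, pp. 38-40 (3 pages)
Studies text classification techniques. The document frequency method for evaluating feature words is introduced first; a feature word selection method based on evaluating the balance of word distribution is then proposed; finally, a support vector machine text classification algorithm based on this evaluation is analyzed and its advantages are demonstrated experimentally.
Keywords text classification; support vector machine; document frequency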
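Research on Web Page Feature Extraction Techniques  Cited by: 3
7
Author 于洪波 《山东理工大学学报(自然科学版)》 CAS 2011, No. 2, pp. 107-110 (4 pages)
Feature extraction is a necessary step in web page classification and plays an important role; the performance of the extraction algorithm directly affects classification quality. After analyzing and comparing several extraction methods, an extraction method combining DF and TF is designed and implemented, with part-of-speech-based extraction added to the process; finally, experiments verify the feasibility of the method.
Keywords document frequency; feature frequency; feature extraction; web page classification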
CIT/XML Security Platform Syntax and Processing  Cited by: 1
8
Authors 安南 张申生 《Journal of Southeast University(English Edition)》 EI CAS 2002, No. 2, pp. 108-113 (6 pages)
Today companies and organizations use the Web as the main means of information dissemination at both the internal and external level. Information dissemination often takes the form of XML documents that are made available at Web servers, or that are actively broadcast by Web servers to interested clients. These documents often contain information at different degrees of sensitivity, so a strong XML security platform and mechanism is needed. In this paper we develop the CIT/XML security platform and take a close look at the syntax and processing of the CIT/digital signature model, the CIT/encryption model, and the CIT/smart card crypto and SPKI interface security models. Security services such as authentication, integrity, and confidentiality are provided for XML and non-XML documents exchanged among various servers.
Keywords electronic commerce security digital certificates smart card digital commerce AUTHENTICATION SPKI XML
A method for publishing relational schema into DTD
9
Authors 梁作鹏 王晓玲 徐立臻 董逸生 《Journal of Southeast University(English Edition)》 EI CAS 2003, No. 2, pp. 117-120 (4 pages)
This paper focuses on exporting relational data into extensible markup language (XML). First, the characteristics of both relational schemas represented by E-R diagrams and XML document type definitions (DTDs) are analyzed. Second, the corresponding mapping rules are proposed. Finally, an algorithm based on edge tables is presented. There are two key points in the algorithm. One is that the edge table is used to store the information of the relational dictionary, which makes the algorithm efficient. The other is that structural information can be obtained from the resulting DTDs, and other applications can optimize their query processes using that structural information.
Keywords XML DTD relational database XML schema
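Application of an Improved Feature Selection Method in a Text Classification System
10
Authors 李长虹 李堂秋 《学术问题研究》 2005, No. 1, pp. 94-98 (5 pages)
This paper introduces the background of text classification and the shortcomings of traditional feature selection based on the vector space model, and proposes a text classification model that combines different feature selection methods. The model first analyzes the text and represents it in vector space form. After preprocessing, keywords are extracted according to certain rules, with recognition of noun phrases added to the extraction. For feature selection, document frequency and mutual information are combined and improved. Experimental results show that classification with the new method achieves somewhat higher accuracy.
Keywords text classification; feature selection; document frequency; mutual information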
Study on Tourism Electronic Platform based on Large Data Background
11
Author Hengyuan Xie 《International Journal of Technology Management》 2014, No. 7, pp. 41-43 (3 pages)
Large-scale data poses a great challenge to data storage, management, and analysis. This article analyzes the basic concepts of large data and makes a simple comparison of its main uses. The paper then puts forward a platform with regional characteristics based on an electronic business information publishing system. Finally, it gives the general model and the realization of the platform structure, key technology, and process. The platform uses the conversion technology of the StrutsCX framework on the J2EE platform and XSLT template parsing of the XML document tree to generate feature sites and automate their construction for the user; it can quickly set up a tourism industry application component in a plug-in manner.
Keywords Large data data analysis cloud computing Platform architecture of tourism E-commerce StrutsCX XML
Audiovisual Archiving in Lithuanian Central State Archive
12
Author Jole Stimbiryte 《Journalism and Mass Communication》 2014, No. 2, pp. 86-100 (15 pages)
Lithuanian Central State Archive is the biggest one within the state archival service and the only state archive where audiovisual documents are stored. There are more than 800,000 units of audiovisual documents in the archive. The main laws regulating the activity of Lithuanian Central State Archive and related to audiovisual archiving are the Law on Documents and Archives of Lithuanian Republic, the Law of Cinema of Lithuanian Republic, and the Law on Copyright and Related Rights of Lithuanian Republic. There are four big collections of audiovisual documents in the Lithuanian Central State Archive: films, photo documents, sound recordings, and video recordings. The Archive's specialists have extensive experience in the physical treatment and preservation of analogue audiovisual documents. Lithuanian Central State Archive digitizes audiovisual documents, seeking a balance between long-term preservation and present-day access. From May 2010 to April 2013, Lithuanian Central State Archive implemented the project "Lithuanian documentaries on the Internet". During the project the Archive digitized and transferred to the Internet 1,000 titles of Lithuanian documentaries created in the period 1919-1961. Lithuanian Central State Archive wants to popularize its collections, so it participates in various international projects.
Keywords audiovisual archiving preservation of analogues DIGITIZATION projects
The Development of a Geographic Information System (GIS) to Document Research in an Everglades Physical Model
13
Authors S. Aich T.W. Dreschel E.A. Cline F.H. Sklar 《Journal of Environmental Science and Engineering》 2011, No. 3, pp. 289-302 (14 pages)
The Loxahatchee Impoundment Landscape Assessment (LILA) facility is a unique physical model of the Everglades ecosystem. LILA has a closed-loop water delivery system and consists of four 0.08 square kilometer (~8 ha) macrocosms, created to be replicates of one another and of the Everglades landscape. Built in 2003, LILA's purpose is to provide scientists with an opportunity to design and implement research concerning Everglades restoration techniques in an accessible, controlled and replicated Everglades environment. Key Everglades habitats were sculpted within LILA: tree islands, ridges, sloughs and alligator holes. Water levels and flows in each macrocosm are controlled independently, so that researchers can study the effects of hydrology on Everglades landscape and ecology. Studies have focused upon measuring survival and growth of native trees planted on the tree islands; measuring surface water and ground water movement and chemistry; studying wading bird feeding and the movement of prey species (crayfish); and measuring erosion and accretion on tree islands and ridges. We developed a Geographic Information System (GIS) data set to identify, characterize, and spatially reference the features of LILA and document research activities. This development included mapping the boundaries of the landscape features, creating a theoretical Digital Elevation Model (DEM) and describing the research projects being carried out. The creation of this GIS data set enhances the ability to schedule and coordinate research, assists scientists in the visualization and spatial representation of their research, and provides a resource for the storage, analysis and synthesis of valuable scientific information.
Keywords CERP EVERGLADES everglades forever act GIS LILA ridge and slough tree island.
An Interactive Model for Analyzing the Development of the Communication Discipline: Israel as a Case Study
14
Authors Anat First Hanna Adoni 《Journalism and Mass Communication》 2015, No. 7, pp. 324-340 (17 pages)
Our paper presents an interactive four-dimensional model for studying the long- and short-term development of the communication discipline with Israel serving as a case study: institutional-contextual, institutional-in-field, intellectual-contextual, and intellectual-in-field. Our empirical analysis utilized personal interviews, archive documents, and statistical data. Four main processes were discerned: transition from integration to alienation between institutions of higher learning and the larger political and ideological context; a shift from Hebrew University Institute of Communication's institutional monopoly to a multiplicity of increasingly competitive communication schools/departments; transition from intellectual hegemony to limited intellectual diversity; and gradually improving status for the communication field among social science disciplines. Our case-study analysis validated the interactive relationship among the model's conceptual dimensions, calling for future cross-national comparisons.
Keywords development of communication discipline interactive model conceptual dimensions limited intellectual diversity communication field in Israel
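Solutions for Data Center Storage and Disaster Recovery Construction
15
《科技浪潮》 2009, No. 5, pp. 19-20 (2 pages)
A data center, as the name suggests, is a center where data is stored and managed centrally. A data center gathers many types of data, which differ greatly in form and organization and also in importance. The question is how to manage the data in a data center.
Keywords data center; disaster recovery system; centralized storage; management; Fibre Channel disk array; storage system; network storage; backup software; backup; document count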
Numeric-Based XML Labeling Schema by Generalized Dynamic Method
16
Authors 倪叶峰 范远超 谈昕澄 崔锦 王晓玲 《Journal of Shanghai Jiaotong University(Science)》 EI 2012, No. 2, pp. 203-208 (6 pages)
Most efficient indexes and query techniques over XML (extensible markup language) data are based on a certain labeling scheme, which can quickly determine the ancestor-descendant and parent-child relationships between two nodes. The current basic labeling schemes, such as the containment scheme and the prefix scheme, cannot avoid relabeling when XML documents are updated. After analyzing the essence of existing dynamic XML labels such as compact dynamic binary string (CDBS) and vector encoding, this paper gives a common unifying framework for the numeric-based generalized dynamic label, which can be implemented as a variety of dynamic labels according to different user-defined value comparison methods. This paper also proposes a novel dynamic labeling scheme called the radical sign label. Extensive experiments show that the radical sign label performs well for initialization, insertion and query operations, especially for skewed insertion, where its storage cost is lower than that of former methods.
Keywords dynamic labeling scheme compact dynamic binary string (CDBS) vector encoding
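For readers unfamiliar with the containment scheme that the entry above names as a baseline, here is a minimal Python sketch of the standard interval idea. It is not the paper's radical sign label; the tree representation (a dict mapping each node to its list of children) and the function names are assumptions made for illustration.

```python
def label_tree(tree, root):
    """Assign each node a (start, end) interval by depth-first traversal.

    tree maps a node name to the list of its children; leaves may be absent.
    A descendant's interval always lies strictly inside its ancestor's.
    """
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in tree.get(node, []):
            visit(child)
        counter += 1
        labels[node] = (start, counter)

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """True if node a is a proper ancestor of node d."""
    start_a, end_a = labels[a]
    start_d, end_d = labels[d]
    return start_a < start_d and end_d < end_a
```

Because the intervals are consecutive integers, inserting a new node generally leaves no free values between existing labels, which is the relabeling problem the abstract refers to.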