255 articles found
Parallel naive Bayes algorithm for large-scale Chinese text classification based on Spark (Cited by: 21)
1
Authors: LIU Peng, ZHAO Hui-han, TENG Jia-yu, YANG Yan-yan, LIU Ya-feng, ZHU Zong-wei. Journal of Central South University (SCIE, EI, CAS, CSCD), 2019, No. 1, pp. 1-12 (12 pages).
The sharp increase in the amount of Chinese text data on the Internet has significantly prolonged the time needed to classify these data. To solve this problem, this paper proposes and implements a parallel naive Bayes algorithm (PNBA) for Chinese text classification based on Spark, a parallel in-memory computing platform for big data. The algorithm parallelizes the entire training and prediction process of the naive Bayes classifier, mainly by adopting the programming model of resilient distributed datasets (RDD). For comparison, a PNBA based on Hadoop is also implemented. The test results show that, in the same computing environment and for the same text sets, the Spark PNBA is clearly superior to the Hadoop PNBA in key indicators such as speedup ratio and scalability. Spark-based parallel algorithms can therefore better meet the requirements of large-scale Chinese text data mining.
Keywords: Chinese text classification; naive Bayes; Spark; Hadoop; resilient distributed dataset; parallelization
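The abstract of entry 1 does not give the algorithm's details, so the following is only a minimal sketch of the general idea behind parallel naive Bayes training: each partition counts its own documents (a map step) and the partial counts are then added together (a reduce step). It uses plain Python rather than Spark so that it runs anywhere; the function names `count_partition` and `merge` and the toy partitions are assumptions made for illustration, not the authors' code.

```python
from collections import Counter
from functools import reduce


def count_partition(docs):
    """Map step: count labels and (label, word) pairs inside one partition."""
    word_counts, label_counts = Counter(), Counter()
    for label, words in docs:
        label_counts[label] += 1
        for w in words:
            word_counts[(label, w)] += 1
    return word_counts, label_counts


def merge(a, b):
    """Reduce step: add the counters produced by two partitions."""
    return a[0] + b[0], a[1] + b[1]


# Two hypothetical partitions of (label, tokens) pairs.
partitions = [
    [("sports", ["ball", "team"]), ("tech", ["chip"])],
    [("sports", ["team", "goal"]), ("tech", ["chip", "code"])],
]

# Each partition is counted independently; the results are then combined.
word_counts, label_counts = reduce(merge, map(count_partition, partitions))
```

Because the merge is a plain addition of counters, the partitions can be processed in any order, which is the property a distributed engine relies on.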
Decision Tree and Naive Bayes Algorithm for Classification and Generation of Actionable Knowledge for Direct Marketing
2
Authors: Masud Karim, Rashedur M. Rahman. Journal of Software Engineering and Applications, 2013, No. 4, pp. 196-206 (11 pages).
Many companies, such as credit card, insurance, banking, and retail businesses, rely on direct marketing. Data mining can help these institutions set marketing goals, and data mining techniques show good prospects for identifying target audiences and improving the likelihood of response. In this work we investigate two data mining techniques: the Naive Bayes and the C4.5 decision tree algorithms. The goal is to predict whether a client will subscribe to a term deposit. We also make a comparative study of the performance of the two algorithms. Publicly available UCI data is used to train and test the algorithms. In addition, we extract actionable knowledge from the decision tree to support interesting and important decisions in the business area.
Keywords: CRM; actionable knowledge; data mining; C4.5; naive Bayes; ROC; classification
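C4.5, named in entry 2, chooses split attributes by gain ratio. The abstract does not show the computation, so here is a minimal sketch for a single categorical attribute; the function names and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the labels that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical attribute that separates the two outcomes perfectly.
perfect = gain_ratio(["yes", "yes", "no", "no"], ["sub", "sub", "none", "none"])
# Hypothetical attribute that carries no information about the outcome.
useless = gain_ratio(["a", "b", "a", "b"], ["sub", "sub", "none", "none"])
```

Dividing by the split information penalizes attributes with many distinct values, which plain information gain tends to favor.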
DDoS Attack Detection Using Heuristics Clustering Algorithm and Naive Bayes Classification
3
Authors: Sharmila Bista, Roshan Chitrakar. Journal of Information Security, 2018, No. 1, pp. 33-44 (12 pages).
In recent times, among the multitude of attacks present in network systems, DDoS attacks have emerged as those with the most devastating effects. The main objective of this paper is to propose a system that effectively detects DDoS attacks appearing in any networked system using the clustering technique of data mining followed by classification. The method uses a Heuristics Clustering Algorithm (HCA) to cluster the available data and Naïve Bayes (NB) classification to classify the data and detect attacks in the system based on network attributes of the data packets. The clustering algorithm is based on an unsupervised learning technique and sometimes fails to detect some attack instances and a few normal instances; classification is therefore used along with clustering to overcome this problem and enhance accuracy. Naïve Bayes classifiers rest on very strong independence assumptions and have a fairly simple construction for deriving the conditional probability of each relationship. A series of experiments is performed using “The CAIDA UCSD DDoS Attack 2007 Dataset” and the “DARPA 2000 Dataset”, and the efficiency of the proposed system is tested on the following performance parameters: Accuracy, Detection Rate and False Positive Rate. The results show that the proposed system achieves enhanced accuracy and detection rate with a low false positive rate.
Keywords: DDoS attacks; heuristic clustering algorithm; naive Bayes classification; CAIDA UCSD; DARPA 2000
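The three performance parameters named in entry 3 can be computed from a confusion matrix. The sketch below uses the usual definitions (attack = positive class); the counts in the example are invented for illustration and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return (accuracy, detection rate, false positive rate).

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal traffic flagged          tn: normal traffic passed
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    detection_rate = tp / (tp + fn)        # also called recall or TPR
    false_positive_rate = fp / (fp + tn)
    return accuracy, detection_rate, false_positive_rate


# Hypothetical counts: 100 attack instances and 100 normal instances.
accuracy, detection_rate, fpr = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```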
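Anomaly detection of database user behavior based on K-means and naive Bayes (Cited by: 8)
4
Authors: 王旭仁, 冯安然, 何发镁, 马慧珍, 杨杰. 《计算机应用研究》 (Application Research of Computers; CSCD, PKU Core), 2020, No. 4, pp. 1128-1131 (4 pages).
To address database leaks caused by anomalous database user behavior, this paper proposes a database user anomaly detection method based on the K-means and naive Bayes algorithms. First, users are grouped by K-means clustering over the query statements and query results recorded in historical database audit logs; a naive Bayes classifier is then used to build the user anomaly detection model. Compared with a model built with naive Bayes classification alone, the method simplifies the representation of user behavior profiles during data preprocessing, which lowers computational redundancy and cuts training time by 81%; using the user groups obtained by K-means clustering raises detection precision by 7.06% and the F1 score by 3.33%. Experiments show that the proposed method greatly reduces training time and achieves good detection results.
Keywords: database; user behavior; anomaly detection; K-means clustering; naive Bayes classification algorithm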
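CLIF_NB: a text classification learning method based on naive Bayes (Cited by: 1)
5
Authors: 刘丽珍, 宋瀚涛, 陆玉昌. 《小型微型计算机系统》 (Journal of Chinese Computer Systems; CSCD, PKU Core), 2005, No. 9, pp. 1575-1577 (3 pages).
Because the conditional independence assumption of the naive Bayes method is often violated in practice, this paper proposes the CLIF-NB text classification learning method. It uses mutual information theory to compute the maximum correlation probability between feature attributes, replaces linearly inseparable attributes with combinations of variable sets to relax the restriction of the conditional independence assumption, and learns a series of classifiers to reduce classification errors on the training set, which together yield a CLIF-NB classifier with higher classification accuracy.
Keywords: text classification; naive Bayes; conditional independence assumption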
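Entry 5 relies on mutual information to find correlated feature attributes, but the abstract gives no formula. As a hedged illustration only, the sketch below estimates the standard mutual information between two discrete features from their observed values; how CLIF-NB actually uses the quantity is not stated in the source, and the function name is an assumption.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete features."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Two identical binary features share one full bit of information.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# Two features that vary independently share none.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A value near zero is consistent with the independence assumption for that pair of features, while a large value marks a pair that a method of this kind might treat jointly.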
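A naive Bayes-based Uyghur text classification algorithm and its performance analysis (Cited by: 7)
6
Authors: 艾海麦提江.阿布来提, 吐尔地.托合提, 艾斯卡尔.艾木都拉. 《计算机应用与软件》 (Computer Applications and Software; CSCD, PKU Core), 2012, No. 12, pp. 27-29 (3 pages).
Against the background of research on automatic classification of large-scale Uyghur web text, a Uyghur text classification system with a modular structure is designed. After an in-depth survey, the naive Bayes algorithm is chosen as the classification engine, and the system is implemented in C#. In preprocessing, stemming is introduced in line with the morphological characteristics of Uyghur, which greatly reduces the feature dimensionality. Classification results are reported on a fairly large corpus of more than 3,000 texts in 10 categories; the χ² statistic is then used to select different numbers of features, and classification results are reported for each. The results show that only 1%-3% of the features in the preprocessed Uyghur feature space are optimal, so it is possible to further identify the best features or to reduce the dimensionality of the feature space.
Keywords: Uyghur; text classification; naive Bayes; stemming; stop words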
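Entry 6 selects features with the χ² statistic. The abstract does not spell out the formula, so the sketch below uses the common 2x2 contingency form for one term and one category; the argument names and the example counts are assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical counts over 100 documents.
informative = chi_square(30, 10, 20, 40)
# A term spread evenly over both groups scores zero.
uninformative = chi_square(25, 25, 25, 25)
```

Ranking all terms by this score and keeping only the top few percent is one way to arrive at a reduced feature set of the size the abstract mentions.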
Mobile SMS Spam Filtering for Nepali Text Using Naive Bayesian and Support Vector Machine (Cited by: 2)
7
Authors: Tej Bahadur Shahi, Abhimanu Yadav. International Journal of Intelligence Science, 2014, No. 1, pp. 24-28 (5 pages).
Spam is a universal problem with which everyone is familiar. A number of approaches are used for spam filtering. The most common technique is content-based filtering, which uses the actual text of a message to determine whether it is spam. The content is very dynamic, and it is challenging to represent all of its information in a mathematical classification model; for instance, the characteristics that a content-based filter uses to identify spam messages change constantly over time. The Naïve Bayes method represents the changing nature of messages using probability theory, and the support vector machine (SVM) represents it using different features. Both methods are effective in different domains, but the case of Nepali SMS or text classification has not yet been considered, so it is of interest to measure how both perform on Nepali text classification. In this paper, Naïve Bayes and SVM-based classification techniques are implemented to classify Nepali SMS as spam or non-spam. An empirical analysis over various text cases evaluates the accuracy of the classification methods used in this study: SVM is found to be 87.15% accurate and Naïve Bayes 92.74% accurate.
Keywords: SMS spam filtering; classification; support vector machine; naive Bayes; preprocessing; feature extraction; Nepali SMS datasets
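For readers unfamiliar with the Naïve Bayes side of entry 7, here is a minimal sketch of a multinomial naive Bayes spam filter with add-one (Laplace) smoothing. It is not the authors' implementation: the training messages are invented English tokens, and the preprocessing and feature extraction steps for Nepali text that the paper lists in its keywords are not shown.

```python
from collections import Counter, defaultdict
from math import log


def train(docs):
    """docs: list of (label, tokens). Returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Add-one smoothing avoids log(0) for words unseen in this class.
            score += log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
model = train([
    ("spam", ["win", "prize", "free"]),
    ("spam", ["free", "offer"]),
    ("ham", ["meeting", "tomorrow"]),
    ("ham", ["lunch", "tomorrow"]),
])
```

Working in log space keeps the product of many small probabilities from underflowing on longer messages.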
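Research on P2P platform reviews based on naive Bayes (Cited by: 1)
8
Author: 曾政多. 《现代计算机》 (Modern Computer), 2019, No. 20, pp. 10-13 (4 pages).
With the growth of online finance products such as Alipay's Yu'e Bao and Tencent's Licaitong, investors' enthusiasm for online investment has increased year by year, and many high-yield P2P online lending platforms have appeared. Because the sector has grown too quickly and platform quality is uneven, many related problems have arisen. To study this phenomenon, the paper starts from user reviews on the platforms, since review content reflects not only how much attention the public pays to financial platforms but also the sentiments and attitudes the public expresses. Based on a naive Bayes classifier and the SnowNLP library for Python, the paper processes an existing dataset, builds a model, and applies data mining and analysis to study user opinions in the reviews, offering advice to P2P investors and a reference for the supervision and risk prediction of P2P platforms.
Keywords: naive Bayes classification; Python; sentiment analysis of Chinese reviews; P2P platform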
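An improved Bayes-based spam filtering model (Cited by: 2)
9
Author: 龚伟. 《微计算机信息》 (Microcomputer Information; PKU Core), 2007, No. 3, pp. 104-106 (3 pages).
The article first analyzes how spam arises and introduces several common spam filtering techniques. Starting from the theoretical basis of naive Bayes, and addressing the shortcomings of spam filtering systems currently used in important commercial fields, it then designs a new model that applies a multi-level mail policy. Comparative experiments show that the new model improves the recall and precision of the spam filtering system to some extent.
Keywords: spam; filtering; real-time blacklist; naive Bayes; mail grading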
Automatically Constructing an Effective Domain Ontology for Document Classification (Cited by: 2)
10
Author: Yi-Hsing Chang. Computer Technology and Application, 2011, No. 3, pp. 182-189 (8 pages).
This paper proposes an effective, automatically constructed domain ontology. The main idea is to use Formal Concept Analysis to establish the domain ontology automatically. The ontology then serves as the basis for a Naive Bayes classifier in order to verify its effectiveness for document classification. A set of 1752 documents divided into 10 categories is used to assess the ontology, with 1252 training documents and 500 testing documents. The F1-measure is the assessment criterion, and three results are obtained. The average recall of the Naive Bayes classifier is 0.94, so its recall on the automatically constructed ontology is excellent. The average precision is 0.81, so its precision is good. The average F1-measure over the 10 categories is 0.86, so the classifier is effective in terms of F1-measure. The automatically constructed domain ontology can thus indeed serve as the document categories and support effective document classification.
Keywords: naive Bayes classifier; ontology; formal concept analysis; document classification
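Entry 10 reports average recall, precision, and F1-measure. The sketch below shows the standard F1 formula and a macro average over categories; the function names are assumptions. Note that applying the formula to the averaged precision (0.81) and recall (0.94) gives about 0.87, which need not equal the reported 0.86, because an average of per-category F1 values is generally different from the F1 of the averages.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_category):
    """Average of per-category F1 values; per_category is a list of (p, r)."""
    return sum(f1_score(p, r) for p, r in per_category) / len(per_category)


# F1 computed from the averaged precision and recall quoted in the abstract.
f1_of_averages = f1_score(0.81, 0.94)
# Hypothetical two-category example for the macro average.
example_macro = macro_f1([(1.0, 1.0), (0.5, 0.5)])
```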
Automatic Classification of Swedish Metadata Using Dewey Decimal Classification: A Comparison of Approaches (Cited by: 1)
11
Authors: Koraljka Golub, Johan Hagelback, Anders Ardo. Journal of Data and Information Science (CSCD), 2020, No. 1, pp. 18-38 (21 pages).
Purpose: With more and more digital collections of various information resources becoming available, the challenge of assigning subject index terms and classes from quality knowledge organization systems is also increasing. While the ultimate purpose is to understand the value of automatically produced Dewey Decimal Classification (DDC) classes for Swedish digital collections, the paper aims to evaluate the performance of six machine learning algorithms as well as a string-matching algorithm based on characteristics of DDC. Design/methodology/approach: State-of-the-art machine learning algorithms require at least 1,000 training examples per class. The complete data set at the time of research involved 143,838 records, which had to be reduced to the top three hierarchical levels of DDC in order to provide sufficient training data (totaling 802 classes in the training and testing sample, out of 14,413 classes at all levels). Findings: Evaluation shows that Support Vector Machine with linear kernel outperforms other machine learning algorithms as well as the string-matching algorithm on average; the string-matching algorithm outperforms machine learning for specific classes when characteristics of DDC are most suitable for the task. Word embeddings combined with different types of neural networks (simple linear network, standard neural network, 1D convolutional neural network, and recurrent neural network) produced worse results than Support Vector Machine, but reach close results, with the benefit of a smaller representation size. Impact of features in machine learning shows that using keywords or combining titles and keywords gives better results than using only titles as input. Stemming only marginally improves the results. Removing stop-words reduced accuracy in most cases, while removing less frequent words increased it marginally. The greatest impact is produced by the number of training examples: 81.90% accuracy on the training set is achieved when at least 1,000 records per class are available in the training set, and 66.13% when too few records (often less than 100 per class) on which to train are available; these figures hold only for the top 3 hierarchical levels (803 instead of 14,413 classes). Research limitations: Having to reduce the number of hierarchical levels to the top three levels of DDC because of the lack of training data for all classes skews the results so that they work in experimental conditions but barely for end users in operational retrieval systems. Practical implications: In conclusion, for operative information retrieval systems applying purely automatic DDC does not work, either using machine learning (because of the lack of training data for the large number of DDC classes) or using the string-matching algorithm (because DDC characteristics perform well for automatic classification only in a small number of classes). Over time, more training examples may become available, and DDC may be enriched with synonyms in order to enhance the accuracy of automatic classification, which may also benefit information retrieval performance based on DDC. In order for quality information services to reach the objective of highest possible precision and recall, automatic classification should never be implemented on its own; instead, machine-aided indexing that combines the efficiency of automatic suggestions with the quality of human decisions at the final stage should be the way for the future. Originality/value: The study explored machine learning on a large classification system of over 14,000 classes which is used in operational information retrieval systems. Due to the lack of sufficient training data across the entire set of classes, an approach complementing machine learning, that of string matching, was applied. This combination should be explored further since it provides the potential for real-life applications with large target classification systems.
Keywords: LIBRIS; Dewey Decimal Classification; automatic classification; machine learning; Support Vector Machine; multinomial naive Bayes; simple linear network; standard neural network; 1D convolutional neural network; recurrent neural network; word embeddings; string matching
Classification of epilepsy using computational intelligence techniques (Cited by: 3)
12
Authors: Khurram I. Qazi, H.K. Lam, Bo Xiao, Gaoxiang Ouyang, Xunhe Yin. CAAI Transactions on Intelligence Technology, 2016, No. 2, pp. 137-149 (13 pages).
This paper deals with a real-life application of epilepsy classification, where three phases of absence seizure, namely pre-seizure, seizure and seizure-free, are classified using real clinical data. Artificial neural network (ANN) and support vector machines (SVMs) combined with supervised learning algorithms, and k-means clustering (k-MC) combined with unsupervised techniques, are employed to classify the three seizure phases. Different techniques to combine binary SVMs, namely One Vs One (OvO), One Vs All (OvA) and Binary Decision Tree (BDT), are employed for multiclass classification. Comparisons are performed with two traditional classification methods, namely k-Nearest Neighbour (k-NN) and the Naive Bayes classifier. It is concluded that SVM-based classifiers outperform the traditional ones in terms of recognition accuracy and robustness when the original clinical data is distorted with noise. Furthermore, the SVM-based classifier with OvO provides the highest recognition accuracy, whereas the ANN-based classifier overtakes it by demonstrating maximum accuracy in the presence of noise.
Keywords: absence seizure; discrete wavelet transform; epilepsy classification; feature extraction; k-means clustering; k-nearest neighbours; naive Bayes; neural networks; support vector machines
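Entry 12 combines binary SVMs with One Vs One (OvO) voting. The sketch below illustrates only the voting step: every pair of classes has its own binary classifier, and the class with the most votes wins. The binary classifiers here are hypothetical threshold rules on a single number, standing in for trained SVMs, and the thresholds are invented.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(classes, pairwise, x):
    """Majority vote over all pairwise classifiers.

    pairwise[(a, b)] is a function that returns either a or b for input x.
    Ties are broken in favor of the class that received its first vote earliest.
    """
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


classes = ["pre-seizure", "seizure", "seizure-free"]

# Hypothetical stand-ins for trained binary classifiers.
pairwise = {
    ("pre-seizure", "seizure"): lambda x: "seizure" if x > 2.0 else "pre-seizure",
    ("pre-seizure", "seizure-free"): lambda x: "pre-seizure" if x > 1.0 else "seizure-free",
    ("seizure", "seizure-free"): lambda x: "seizure" if x > 1.5 else "seizure-free",
}
```

With three classes OvO needs three binary classifiers; in general it needs one per pair of classes, whereas One Vs All needs only one per class.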
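Automatic classification of Chinese web pages based on incremental Bayes
13
Authors: 高洁, 赵俊荣. 《电脑知识与技术》 (Computer Knowledge and Technology), 2006, No. 5, pp. 45-46, 68 (3 pages).
This paper proposes an incremental Bayes automatic classification algorithm based on unlabeled Chinese web pages; experimental results show that the algorithm is feasible and effective.
Keywords: Chinese web page classification; incremental learning; naive Bayes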
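Classification of real and fake news based on Naïve Bayes and TF-IDF
14
Authors: 蔡扬, 付小斌. 《电脑知识与技术》 (Computer Knowledge and Technology), 2018, No. 2, pp. 184-186 (3 pages).
In an era of information explosion, a flood of news fills our daily lives, and this mass of news shapes people's judgments about events in society. Misleading fake news has a negative effect on society, so this paper proposes a method for classifying real and fake news. The paper combines the TF-IDF algorithm with the naive Bayes algorithm: terms in the news are weighted, the naive Bayes classifier is then redefined, and the news is classified. Finally, several groups of experiments were run, and the average over these groups is taken as the final result.
Keywords: real and fake news; TF-IDF; naive Bayes; classification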
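An improved naive Bayes method for intelligent triage (Cited by: 1)
15
Authors: 鲍琪琪, 孙超仁. 《现代医院》 (Modern Hospital), 2024, No. 3, pp. 424-427 (4 pages).
When the naive Bayesian model (NBM) is applied to intelligent outpatient triage, it cannot effectively account for the fact that different types of symptoms span different ranges of medical specialties. To address this, an improved naive Bayes classification algorithm is proposed that introduces an IDF factor to give each symptom type a corresponding weight. First, diagnostics-related corpora are collected from authoritative medical literature as the training dataset. Then the prior and class-conditional probabilities are computed with the naive Bayes method, and IDF factors for the different symptoms are generated in training. Finally, at classification time the IDF factors are applied to the symptom combination to balance the importance of different symptom types. In a comparative experiment on triage accuracy, the improved algorithm raises recall by about 11%, clearly higher than the naive Bayes classification method.
Keywords: intelligent triage; naive Bayes; IDF; multi-class classification; supervised learning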
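The abstract of entry 15 says that an IDF factor weights each symptom but does not give the formula. The sketch below shows one plausible reading, stated here as an assumption: each symptom's log class-conditional probability is multiplied by an IDF computed over departments, so that symptoms seen in few departments count more. The department names, symptoms, and function names are invented for illustration and this is not the authors' algorithm.

```python
from collections import Counter, defaultdict
from math import log


def train_idf_nb(records):
    """records: list of (department, symptoms)."""
    dept_counts = Counter()
    symptom_counts = defaultdict(Counter)
    depts_with_symptom = defaultdict(set)
    vocab = set()
    for dept, symptoms in records:
        dept_counts[dept] += 1
        symptom_counts[dept].update(symptoms)
        vocab.update(symptoms)
        for s in symptoms:
            depts_with_symptom[s].add(dept)
    n_depts = len(dept_counts)
    # Assumed IDF form: symptoms seen in fewer departments get larger weights.
    idf = {s: log(n_depts / len(depts_with_symptom[s])) + 1.0 for s in vocab}
    return dept_counts, symptom_counts, vocab, idf


def triage(model, symptoms):
    """Return the department with the highest IDF-weighted log score."""
    dept_counts, symptom_counts, vocab, idf = model
    total = sum(dept_counts.values())
    scores = {}
    for dept, n in dept_counts.items():
        denom = sum(symptom_counts[dept].values()) + len(vocab)
        score = log(n / total)  # log prior
        for s in symptoms:
            if s in vocab:  # symptoms never seen in training are skipped
                score += idf[s] * log((symptom_counts[dept][s] + 1) / denom)
        scores[dept] = score
    return max(scores, key=scores.get)


# Hypothetical training records.
model = train_idf_nb([
    ("cardiology", ["chest pain", "fever"]),
    ("cardiology", ["chest pain", "palpitations"]),
    ("respiratory", ["cough", "fever"]),
    ("respiratory", ["cough", "fever"]),
])
```

In this toy data "fever" appears in both departments and receives the smallest weight, so the more specific symptom dominates the decision.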
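Naive Bayes classification of bacteriophage virion proteins based on principal component analysis
16
Authors: 徐思蓉, 叶仁玉, 冷婷. 《皖西学院学报》 (Journal of West Anhui University), 2024, No. 2, pp. 44-48 (5 pages).
Classification of bacteriophage virion proteins is one of the hot topics in bioinformatics. To address the feature independence assumption in naive Bayes classification and the problem of feature extraction for virion proteins, a hybrid feature extraction method combining pseudo amino acid composition (PAAC) and k-spaced amino acid composition (CKSAAP) is proposed, and a principal component analysis naive Bayes classification model (PNBC) is applied to bacteriophage virion protein classification. Empirical analysis shows that, compared with naive Bayes and support vector machine models, the principal component analysis naive Bayes model reaches a classification accuracy of 80% and performs best.
Keywords: principal component analysis; naive Bayes; bacteriophage; protein classification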
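An automatic Weibo spam recognition system based on the naive Bayes algorithm
17
Author: 崔凯雯. 《移动信息》 (Mobile Information), 2024, No. 6, pp. 291-294 (4 pages).
The Bayes algorithm uses mathematical probability to compute likelihoods and is widely used in classifiers; it assumes that all events are mutually independent, which reduces the difficulty of the algorithm. This paper designs and implements an automatic Weibo spam recognition system based on the naive Bayes algorithm. The system is developed in Java with the MyEclipse 8.6 tool. A crawler first collects the content of Weibo comment sections and saves it in txt format for later training; the MMAnalyzer algorithm then performs Chinese word segmentation and text features are extracted; finally, a naive Bayes classifier performs the classification. Experimental results show that the naive Bayes classifier is simple to design, easy to use, and fairly accurate, making it a promising entry-level classifier.
Keywords: naive Bayes algorithm; classifier; Chinese word segmentation; text classification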
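A naive Bayes classification algorithm based on the Gauss distribution and Gram-Schmidt orthogonalization (Cited by: 4)
18
Authors: 黄小杰, 刘芝秀, 邓梓杨, 刘红军, 吴春. 《南昌大学学报(理科版)》 (Journal of Nanchang University (Natural Science); CAS, PKU Core), 2023, No. 3, pp. 213-217 (5 pages).
The naive Bayes classification algorithm is a simple and practical classification method. Its assumption of conditional independence between attributes has been studied extensively, with efforts to eliminate redundant attributes and reduce correlation between attributes so that naive Bayes can be applied to a set of new attributes. However, the independence of the new attributes is hard to measure, so such improvements lack theoretical support, and their effectiveness is mostly backed by experiments on data. This paper defines Gauss-distributed data and proposes a naive Bayes algorithm improved by Gram-Schmidt orthogonalization, so that it can be conveniently applied to classifying Gauss-distributed data. Unlike earlier approaches that explicitly construct a new attribute set or an attribute transformation matrix, the method directly orthogonalizes the sample data of the attributes, and the independence of the abstract new attributes corresponding to the orthogonalized data is proved. This shows that, for classifying Gauss-distributed data, the conditional independence assumption of the original naive Bayes algorithm is no obstacle to its use: the constraint is satisfied after Gram-Schmidt orthogonalization.
Keywords: Gauss-distributed data; Gram-Schmidt orthogonalization; naive Bayes; classification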
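Entry 18 orthogonalizes the sample data of the attributes directly. As a hedged sketch of what that can look like, the code below centers each attribute column and applies Gram-Schmidt so that the resulting columns have zero inner product, which for centered data means zero sample covariance. Whether the paper centers the data, normalizes the columns, or works per class is not stated in the abstract; those choices, the function names, and the sample values are assumptions. The columns are assumed to be linearly independent and non-constant.

```python
def center(col):
    """Subtract the mean so that inner products correspond to covariances."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(columns):
    """Orthogonalize centered attribute columns (no normalization)."""
    basis = []
    for col in map(center, columns):
        for b in basis:
            # Remove the component of col that lies along b.
            coef = dot(col, b) / dot(b, b)
            col = [c - coef * x for c, x in zip(col, b)]
        basis.append(col)
    return basis


# Two hypothetical, strongly correlated attributes measured on four samples.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 5.0, 9.0]
q1, q2 = gram_schmidt([x1, x2])
```

For jointly Gaussian attributes, zero covariance implies independence, which is the property that makes the per-attribute factorization of naive Bayes legitimate on the transformed data.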
Network-based naive Bayes model for social network
19
Authors: Danyang Huang, Guoyu Guan, Jing Zhou, Hansheng Wang. Science China Mathematics (SCIE, CSCD), 2018, No. 4, pp. 627-640 (14 pages).
Naive Bayes (NB) is one of the most popular classification methods. It is particularly useful when the dimension of the predictor is high and data are generated independently. Meanwhile, social network data are becoming increasingly accessible, due to the fast development of various social network services and websites. By contrast, data generated by a social network are most likely to be dependent, and the dependency is mainly determined by the social network relationships. How to extend the classical NB method to social network data therefore becomes a problem of great interest. To this end, we propose a network-based naive Bayes (NNB) method, which generalizes the classical NB model to social network data. The key advantage of the NNB method is that it takes the network relationships into consideration. Its computational efficiency makes the NNB method feasible even in large-scale social networks. The statistical properties of the NNB model are investigated theoretically. Simulation studies have been conducted to demonstrate its finite sample performance. A real data example is also analyzed for illustration purposes.
Keywords: classification; naive Bayes; Sina Weibo; social network data
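Research on feature word selection methods for text classification (Cited by: 1)
20
Authors: 李鹏飞, 王辉, Marius.Petrescu, 王浩畅. 《计算机与数字工程》 (Computer & Digital Engineering), 2023, No. 12, pp. 2895-2900 (6 pages).
Feature word selection for text classification is one of the most basic and most important topics in natural language processing. Its main purpose is to extract feature words from a text to represent its information, turning relatively unstructured text into structured information that a computer can recognize and process. The paper applies two classification methods, naive Bayes and fastText, to study feature word extraction and classification. Experimental results show that fastText performs best in classification accuracy and efficiency, whereas naive Bayes performs best when the correlation between sample attributes is small.
Keywords: feature word selection; text classification; naive Bayes; fastText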