A feature extraction algorithm based on domain knowledge (cited by: 2)
Abstract: Feature extraction is one of the most important steps in online public opinion analysis, and a good feature extraction algorithm can greatly improve both the efficiency and the accuracy of that analysis. Analyzing and monitoring online public opinion about tourism makes it possible to detect unexpected incidents in Yunnan tourism promptly and to report them to the relevant departments so that they can respond quickly and appropriately, which benefits the development of Yunnan's tourism industry. This paper analyzes the shortcomings of traditional feature extraction algorithms, namely low accuracy and poor runtime efficiency, and applies domain ontology knowledge to feature extraction for tourism public opinion analysis. Drawing on information gathered through investigation and consultation with experts, it builds a domain ontology for online tourism public opinion and uses that ontology to optimize feature extraction and the computation of feature-word weights. Repeated experiments on large data sets show that the optimized method significantly improves the accuracy and runtime efficiency of feature extraction, indicating that domain knowledge benefits the analysis of online tourism public opinion.
Source: Journal of Yunnan Minzu University: Natural Sciences Edition (《云南民族大学学报(自然科学版)》), CAS, 2017, No. 3, pp. 252-257 (6 pages)
Funding: Yunnan Provincial University Business Intelligence Science and Technology Innovation Team (42212217010)
Keywords: public opinion on the tourism network; domain ontology; feature extraction; weights
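The abstract does not give the paper's weighting formula, so the following is only a minimal sketch of the general idea of letting a domain ontology influence feature-word weights. It assumes a plain TF-IDF baseline and a hypothetical lookup table that maps ontology terms to a boost factor; the function name `ontology_weighted_tfidf`, the sample terms, and the boost values are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def ontology_weighted_tfidf(docs, ontology_boost):
    """Return one {term: weight} dict per tokenized document.

    docs: list of token lists (already segmented text).
    ontology_boost: hypothetical mapping from domain-ontology terms to a
        multiplier > 1; terms outside the ontology get a multiplier of 1.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        if not doc:
            weights.append({})
            continue
        tf = Counter(doc)
        doc_weights = {}
        for term, count in tf.items():
            # Smoothed IDF so a term present in every document keeps weight > 0.
            idf = math.log(n_docs / df[term]) + 1.0
            boost = ontology_boost.get(term, 1.0)  # assumed domain boost
            doc_weights[term] = (count / len(doc)) * idf * boost
        weights.append(doc_weights)
    return weights


if __name__ == "__main__":
    # Toy, made-up documents and ontology terms for illustration only.
    sample_docs = [
        ["tour_guide", "price", "hotel"],
        ["hotel", "price", "weather"],
    ]
    sample_ontology = {"tour_guide": 2.0}
    for doc_weights in ontology_weighted_tfidf(sample_docs, sample_ontology):
        print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```

In this sketch, terms found in the ontology are simply scaled up, so domain concepts rank above generic words with similar frequency. A real system would likely derive the boost from ontology structure (for example concept depth or relations) rather than from a fixed table.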