
A comparative study of semantic and syntactic networks as knowledge sources for genre classification (cited by: 2)

Comparison study of using semantic and syntactic network characteristics to do text clustering
Abstract: Dependency syntactic networks and dependency semantic networks were built from syntactic and semantic treebanks covering six genres. Their global features were then compared: number of edges, number of nodes, average node degree, clustering coefficient, average shortest path length, network centralization, diameter, the exponent of the power-law degree distribution, and the coefficient of determination of the power-law fit to the degree distribution. With these global features as variables, several clustering methods were applied to the syntactic and semantic networks of the six genres. The results show that, although both kinds of network are built on linguistic principles, dependency syntactic networks and dependency semantic networks differ markedly. Their parameters do not carry the same meaning, and clustering experiments based on those parameters give different results. Combinations of the main semantic-network parameters yield relatively reasonable clusterings but do not separate written from spoken genres well, whereas combinations of the main syntactic-network parameters distinguish texts of different genres well and yield fairly reasonable text clusterings.
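The global features listed in the abstract can be illustrated with a minimal sketch. It assumes an undirected, unweighted network whose nodes are word types and whose edges are head-dependent pairs; the abstract does not state whether the authors used directed or weighted networks. It also assumes the power-law exponent is estimated by a log-log least-squares fit, which is one common choice and not necessarily the one used in the paper. The function names and the toy edge list are hypothetical.

```python
import math
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from (head, dependent) word pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def power_law_fit(adj):
    """Least-squares fit of log P(k) against log k; returns (exponent, R^2)."""
    counts = {}
    for nb in adj.values():
        counts[len(nb)] = counts.get(len(nb), 0) + 1
    if len(counts) < 2:
        return float("nan"), float("nan")  # a line needs two distinct degrees
    n = len(adj)
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return -slope, r2

def global_features(adj):
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    degrees = [len(nb) for nb in adj.values()]
    # Average local clustering coefficient (nodes with degree < 2 contribute 0).
    total_c = 0.0
    for nb in adj.values():
        k = len(nb)
        if k >= 2:
            links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
            total_c += 2.0 * links / (k * (k - 1))
    # Average shortest path length and diameter over reachable pairs.
    path_sum = pair_count = diameter = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                path_sum += d
                pair_count += 1
                diameter = max(diameter, d)
    # Freeman degree centralization (0 for regular graphs, 1 for a star).
    kmax = max(degrees)
    central = sum(kmax - k for k in degrees) / ((n - 1) * (n - 2)) if n > 2 else 0.0
    gamma, r2 = power_law_fit(adj)
    return {
        "nodes": n, "edges": m, "avg_degree": 2.0 * m / n,
        "clustering": total_c / n,
        "avg_path_length": path_sum / pair_count if pair_count else 0.0,
        "diameter": diameter, "centralization": central,
        "power_law_exponent": gamma, "r_squared": r2,
    }

# Hypothetical toy data: dependency pairs from two short sentences.
toy_edges = [("chased", "cat"), ("cat", "the"), ("chased", "mouse"),
             ("mouse", "the"), ("chased", "dog"), ("dog", "the")]
features = global_features(build_graph(toy_edges))
print(features)
```

On a real treebank the edge list would come from the annotated head-dependent relations, and the all-pairs breadth-first search would be the main cost; a graph library would be the more practical choice at that scale.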
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, 2014, No. 2, pp. 10-14, 43 (6 pages)
Funding: Major Project of the National Social Science Fund of China (No. 11&ZD188)
Keywords: genre; text classification / text clustering; network features
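The clustering step described in the abstract, which takes each network's global features as variables, can be sketched as follows. The abstract says only that different clustering methods were tried, so the method here (z-score standardization followed by average-linkage agglomerative clustering on Euclidean distance) is an assumption chosen for illustration. The feature values and network names are hypothetical and are not taken from the paper.

```python
import math

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        _, a, b = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical feature table: one row per network,
# columns are (average degree, clustering coefficient).
names = ["net_A", "net_B", "net_C", "net_D"]
rows = [[2.0, 0.10], [2.1, 0.12], [5.0, 0.40], [5.2, 0.38]]

groups = average_linkage(zscore_columns(rows), k=2)
labeled = [[names[i] for i in group] for group in groups]
print(labeled)
```

Standardizing first matters because features such as node count and clustering coefficient live on very different scales; without it, the largest-valued feature would dominate the distance.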
