Abstract
Generalized topic structure (GTS) is a structural form that objectively exists in Chinese discourse. Based on the idea of the finite-state machine (FSM), we design a computational model for recognizing GTS, preliminarily test its validity on a fairly large corpus, and analyze its space and time complexity. The model is characterized by recurrence-based control, output synchronized with input in units of punctuation clauses (P-clauses), no long-distance backtracking, limited backfilling, limited storage, and preserved word order. These are the same principles that humans follow when cognitively processing topic-comment information in text, so the model can be regarded as a mechanical model of that cognitive process.
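Several of the characteristics listed in the abstract (clause-by-clause control, one output per input clause, limited storage, preserved word order) can be illustrated with a toy sketch. This is not the authors' model: the input format, the `shared` count, and the function name `topic_sufficient_sentences` are all hypothetical, and backfilling is not modelled. The sketch assumes each punctuation clause arrives with a number saying how many leading tokens of the previous output it reuses as its topic.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input format (an assumption, not taken from the paper):
# (shared, tokens), where `shared` is the number of leading tokens of the
# previous output sentence that this clause reuses as its topic.
Clause = Tuple[int, List[str]]


def topic_sufficient_sentences(clauses: Iterable[Clause]) -> Iterator[List[str]]:
    """Yield one completed sentence per input clause, in input order."""
    previous: List[str] = []  # the only state carried between steps
    for shared, tokens in clauses:
        if not 0 <= shared <= len(previous):
            raise ValueError("shared prefix is longer than the previous sentence")
        # Reuse a prefix of the previous output, then append the new clause.
        # Tokens are never reordered, so word order is preserved.
        current = previous[:shared] + tokens
        yield current  # output is produced as soon as the clause is read
        previous = current


if __name__ == "__main__":
    demo = [
        (0, ["the model", "reads", "one clause"]),
        (1, ["emits", "one sentence"]),
        (2, ["per step"]),
    ]
    for sentence in topic_sufficient_sentences(demo):
        print(" ".join(sentence))
```

Because only the previous output is kept, memory use in this sketch is bounded by the longest sentence seen so far, and no earlier clause is ever revisited.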
Authors
LU Dawei (卢达威)
SONG Rou (宋柔)
SHANG Ying (尚英)
Department of Chinese Language and Literature, Peking University, Beijing 100871; School of Information Science, Beijing Language and Culture University, Beijing 100083; School of Chinese Studies, Beijing Language and Culture University, Beijing 100083, China
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
PKU Core (北大核心)
2018, No. 7, pp. 1264-1274 (11 pages)
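Funding
Youth Project of Humanities and Social Sciences Research, Ministry of Education of China (16YJC740050)
China Postdoctoral Science Foundation (2016M600838)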
Keywords
generalized topic structure
cognition
computational model
punctuation clause
topic sufficient sentence