Abstract
To test whether Zipf's law from linguistics holds for protein sequences, 17,357 sequences (1.7357 × 10^4) annotated with secondary structure were extracted from the DSSP protein secondary structure database. Each segment of consecutive amino acid residues sharing the same secondary-structure code is defined as a word. The results show that the word frequency distribution approximately follows Zipf's law with an exponent of 0.981.
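The word definition and the exponent estimate described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how the exponent was fitted, so the least-squares fit of log frequency against log rank is an assumption, as are the function names `split_into_words` and `zipf_exponent` and the toy sequence. Parsing of real DSSP files is left out.

```python
from collections import Counter
from itertools import groupby
import math


def split_into_words(residues, structure):
    """Cut a residue string into segments whose residues share one structure code.

    `residues` and `structure` are equal-length strings: one amino acid letter
    and one secondary-structure code per position (hypothetical input format).
    """
    if len(residues) != len(structure):
        raise ValueError("residues and structure must have the same length")
    words = []
    pos = 0
    for _code, run in groupby(structure):
        length = sum(1 for _ in run)  # length of this run of identical codes
        words.append(residues[pos:pos + length])
        pos += length
    return words


def zipf_exponent(counts):
    """Estimate the Zipf exponent as the negated least-squares slope of
    log(frequency) versus log(rank). Assumed fitting method, not from the paper."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Toy example with made-up data: three segments (codes H, E, T).
toy_words = split_into_words("MKVLAAGIVG", "HHHHEEEETT")
toy_counts = Counter(toy_words)
```

In a real analysis the counts would be accumulated over all sequences before calling `zipf_exponent`; a value near 1 corresponds to the classical Zipf distribution, which is the regime the abstract reports (0.981).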
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
CAS
CSCD
PKU Core (北大核心)
2005, No. 4, pp. 366-368 (3 pages)
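Funding
Supported by the Beijing Natural Science Foundation (4052005) and the Science and Technology Development Fund of the Beijing Municipal Education Commission (km200310005013).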