Abstract: The Zipfian distribution and Benford’s law are derived from a basic probabilistic argument. It is argued that Zipf’s law is suited to calculating the rank probabilities of identical, indistinguishable objects, whereas Benford’s distribution is suited to calculating the rank probabilities of distinguishable objects. For example, in the distribution of words in long texts, all the words in a given rank are identical; therefore, the rank distribution is Zipfian. In logarithmic tables, objects with identical first digits are distinguishable because many different digits occur in the second, third, and later places; therefore, the distribution follows Benford’s law. The Pareto 20-80 rule is shown to be an outcome of Benford’s distribution: when the number of ranks is about 10, the total probability of the 20% highest-probability ranks equals that of the remaining 80% lower-probability ranks. It is further argued that all these distributions, including the central limit theorem, are outcomes of Planck’s law and result from the quantization of energy. This argument may be considered a physical origin of probability.
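As a rough numerical illustration of the Pareto claim, the sketch below computes Benford rank probabilities and the probability mass held by the top 20% of ranks. It assumes an N-rank generalization P(n) = log_{N+1}(1 + 1/n), which reduces to the classic first-digit law log10(1 + 1/d) when N = 9; this form and the helper names are illustrative assumptions, not necessarily the derivation in the paper. Normalized Zipf probabilities are included for comparison.

```python
import math

def benford_probs(n_ranks: int) -> list[float]:
    """Rank probabilities P(n) = log_{N+1}(1 + 1/n) for n = 1..N.

    Assumption: N-rank generalization used for illustration; N = 9 gives
    the classic first-digit Benford law. The terms telescope to sum to 1.
    """
    base = n_ranks + 1
    return [math.log(1 + 1 / n, base) for n in range(1, n_ranks + 1)]

def zipf_probs(n_ranks: int) -> list[float]:
    """Zipf rank probabilities P(n) = (1/n) / H_N, normalized over N ranks."""
    harmonic = sum(1 / n for n in range(1, n_ranks + 1))
    return [1 / (n * harmonic) for n in range(1, n_ranks + 1)]

def top_share(probs: list[float], fraction: float = 0.2) -> float:
    """Total probability of the top `fraction` of ranks (rounded to whole ranks)."""
    k = max(1, round(fraction * len(probs)))
    return sum(sorted(probs, reverse=True)[:k])

for n_ranks in (9, 10):
    b = benford_probs(n_ranks)
    print(f"N={n_ranks}: sum={sum(b):.3f}, "
          f"top-20% Benford share={top_share(b):.3f}, "
          f"top-20% Zipf share={top_share(zipf_probs(n_ranks)):.3f}")
```

For N = 9 the two highest ranks hold log10(2) + log10(1.5) = log10(3), roughly 0.477 of the total, which is close to the one-half split the abstract describes.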
Funding: This work was supported by the National Natural Science Foundation of China (No. 20173023 and No. 90203012) and the Specialized Research Fund for the Doctoral Program of Higher Education of China.
Abstract: Zipf's approach from linguistics is used to analyze the frequency and correlation statistics of the 16 nearest-neighboring nucleotide pairs (AA, AC, AG, …, TT) in 12 human chromosomes (Y, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, and 12). Two statistical features of nearest-neighboring nucleotides in the human genome are found: (i) the frequency distribution is a linear function, and (ii) the correlation distribution is an inverse function. The coefficients of both the linear and the inverse function depend on the GC content. This work proposes the correlation distribution of nearest-neighboring nucleotides for the first time and extends the set of descriptors for nearest-neighboring nucleotides.
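The frequency side of this analysis can be sketched as follows: count overlapping nearest-neighbor pairs, rank them by frequency in Zipf's manner, and compute the GC content. The abstract does not define its correlation measure, so that part is not reproduced; the function names and the toy sequence are hypothetical, and skipping non-ACGT symbols (such as 'N' gaps) is an assumed convention.

```python
from collections import Counter
from itertools import product

# The 16 nearest-neighboring nucleotide pairs AA, AC, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

def dinucleotide_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping dinucleotide in `seq`.

    Pairs containing non-ACGT symbols (e.g. 'N' gaps) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 2] for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    total = sum(counts.values())
    return {d: counts[d] / total if total else 0.0 for d in DINUCLEOTIDES}

def gc_content(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases of `seq`."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0

def rank_frequency(freqs: dict[str, float]) -> list[tuple[int, str, float]]:
    """Zipf-style rank table: (rank, dinucleotide, frequency), most frequent first."""
    ordered = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [(r, d, f) for r, (d, f) in enumerate(ordered, start=1)]

# Toy sequence for illustration only (not real chromosome data).
toy = "ATGCGCGATATTTAGCNNACGT"
for rank, pair, freq in rank_frequency(dinucleotide_frequencies(toy))[:5]:
    print(rank, pair, round(freq, 3))
print("GC content:", round(gc_content(toy), 3))
```

On real chromosomes, one would run this per chromosome and relate the fitted rank-frequency coefficients to each chromosome's GC content.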
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 60435020).
Abstract: To perform statistical analysis on the large number of genomic and proteomic sequences available for different organisms, n-grams were extracted from the whole-genome protein sequences of 20 organisms. Their linguistic features were analyzed with two tests developed for natural languages and symbolic sequences: Zipf's power law and Shannon entropy. Natural and artificial genome proteins were compared, and several statistical features of n-grams were discovered. The results show that: the n-grams of whole-genome protein sequences approximately follow Zipf's law when n is larger than 4; the Shannon n-gram entropy of natural genome proteins is lower than that of artificial proteins; a simple uni-gram model can distinguish different organisms; and there are organism-specific usages of "phrases" in protein sequences. It is suggested that further detailed n-gram analysis of whole-genome protein sequences will result in a powerful model for mapping the relationship among protein sequence, structure, and function.
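The two tests named above can be sketched on toy data: extract overlapping n-grams, list their rank-frequency pairs for a Zipf plot, and compute the Shannon n-gram entropy in bits. This is a minimal illustration under assumed conventions (overlapping n-grams, base-2 logarithms); the helper names and sequences are hypothetical, and the paper's construction of "artificial" proteins is not specified in the abstract, so it is not reproduced.

```python
import math
from collections import Counter

def ngram_counts(sequences: list[str], n: int) -> Counter:
    """Count overlapping n-grams of amino-acid letters across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    return counts

def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the empirical n-gram distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rank_frequency(counts: Counter) -> list[tuple[int, int]]:
    """(rank, count) pairs, most frequent first, for a Zipf log-log plot."""
    return [(r, c) for r, (_, c) in enumerate(counts.most_common(), start=1)]

# Toy protein fragments for illustration only (not whole-genome data).
proteins = ["MKTAYIAKQRQISFVKSHFSRQ", "MKTLLILAVVAAALA"]
for n in (1, 2, 3):
    counts = ngram_counts(proteins, n)
    print(n, len(counts), round(shannon_entropy(counts), 3), rank_frequency(counts)[:3])
```

A uni-gram profile (n = 1) per organism is the kind of simple model the abstract reports as sufficient to distinguish organisms.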
Funding: Supported by the Government of Perm Krai, Research Project No. C-26/628 dated 05/04/2021.
Abstract: This study considers a set of well-treatment techniques aimed at enhancing oil recovery. These techniques apply elastic waves of various types (dilation waves, vibro-waves, or other acoustically induced effects). In this context, a new technique is proposed to predict the effectiveness of elastic-wave well treatment using a rank distribution according to Zipf’s law. Analysis of the results of elastic-wave well treatments reveals that groups of wells exploiting different geological deposits can differ in the slope coefficients and free terms (intercepts) of their rank distributions. As the slope coefficient increases, the average increase in the well oil production rate after treatment becomes larger. Accordingly, an equation is obtained for estimating the slope coefficient in Zipf’s equation from the frequency of the elastic wave. The results demonstrate the applicability of Zipf’s law to analyzing the technological efficiency of elastic-wave well-treatment methods.
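The abstract does not give the exact form of the Zipf equation used. A common choice, assumed here, is the power law y_r = A·r^(-s), which in log-log form reads ln y_r = c - s·ln r, with s the slope coefficient and c = ln A the free term. The sketch fits it by ordinary least squares on synthetic, noise-free data; the function name is hypothetical, and the paper's equation linking s to wave frequency is not reproduced.

```python
import math

def fit_zipf(values: list[float]) -> tuple[float, float]:
    """Fit the rank distribution ln(y_r) = c - s * ln(r) by ordinary least squares.

    `values` are per-well effects (e.g. oil-rate increases after treatment) and
    must all be positive, since they are log-transformed. They are sorted in
    descending order so that rank r = 1 is the largest value.
    Returns (slope coefficient s, free term c).
    """
    ys = sorted(values, reverse=True)
    xs = [math.log(r) for r in range(1, len(ys) + 1)]
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_y) for x, ly in zip(xs, logs))
    slope = -sxy / sxx  # s > 0 for a decreasing rank distribution
    free_term = mean_y + slope * mean_x
    return slope, free_term

# Synthetic, noise-free data from y = 20 * r**-0.8 (not field data), given unsorted.
synthetic = [20 * r ** -0.8 for r in range(10, 0, -1)]
s, c = fit_zipf(synthetic)
print(f"slope s = {s:.3f}, A = exp(c) = {math.exp(c):.3f}")
```

Fitting each group of wells separately would give the per-deposit slope coefficients and free terms that the abstract compares.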
Abstract: When P indistinguishable balls are randomly distributed among L distinguishable boxes in a dense system, our natural intuition tells us that the box with the average number of balls, P/L, has the highest probability and that no box is empty; in reality, however, the empty box always has the highest probability. This contrasts with sparse systems (e.g., the energy distribution in a gas), in which the average value has the highest probability. Here we show that postulating that all possible configurations of balls in the boxes are equally probable yields a realistic “long-tail” distribution. When applied to sparse systems, this formalism converges to distributions in which the average is preferred. We calculate several distributions that follow from this postulate and obtain most of the known distributions in nature, namely Zipf’s law, Benford’s law, particle energy distributions, and more. Further generalization of this approach yields not only much better predictions for elections, polls, market-share distributions among competing companies, and so forth, but also a compelling probabilistic explanation for Planck’s famous empirical finding that the energy of a photon is hν.
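The claim that the empty box is the most probable can be checked directly under the stated postulate that every configuration is equally likely. With P indistinguishable balls in L boxes there are C(P + L - 1, L - 1) configurations (stars-and-bars counting); fixing n balls in one box leaves C(P - n + L - 2, L - 2) configurations for the rest. The sketch compares this closed form with brute-force enumeration; it is standard combinatorics offered as an illustration, not necessarily the paper's derivation, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

def occupancy_probability(n: int, balls: int, boxes: int) -> float:
    """P(a given box holds exactly n balls) when all configurations of
    `balls` indistinguishable balls in `boxes` distinguishable boxes are
    equally probable (requires boxes >= 2 and 0 <= n <= balls)."""
    total = comb(balls + boxes - 1, boxes - 1)
    return comb(balls - n + boxes - 2, boxes - 2) / total

def brute_force(balls: int, boxes: int) -> dict[int, float]:
    """Enumerate every configuration (a multiset of box indices, one per
    ball) and tally how many balls box 0 holds."""
    configs = list(combinations_with_replacement(range(boxes), balls))
    tally = Counter(cfg.count(0) for cfg in configs)
    return {n: tally[n] / len(configs) for n in range(balls + 1)}

P, L = 10, 4  # average occupancy P/L = 2.5
exact = [occupancy_probability(n, P, L) for n in range(P + 1)]
print([round(p, 3) for p in exact])  # largest value is at n = 0 (empty box)
```

For L >= 3 the probability falls monotonically with n, so the empty box is always the most probable occupancy, matching the abstract's "long tail".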
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 41925003 and 42130402) and the Beijing Municipal Social Science Foundation (Grant No. 20JCB073).
Abstract: Inter-city mobility is one of the most important issues in the UN Sustainable Development Goals, as it is essential for accessing the regional labour market, goods, and services, and for constraining the spread of infectious diseases. Although the gravity model has proved effective in describing mobility among settlements, knowledge is still insufficient for regions where dozens of megacities interact closely and over 100 million people reside. In addition, existing knowledge is limited to overall population mobility, while differences in inter-city travel for different purposes remain unexplored at such a large geographic scale. We revisited the gravity laws of inter-city mobility using 2.12 billion trip chains recorded in the trajectories of 40.48 million mobile phone users in the Jing-Jin-Ji Region, which contains China’s capital, Beijing. First, unlike previous studies, we found that non-commuting rather than commuting is the dominant type of inter-city mobility (89.3%). Non-commuting travellers have a travel distance 42.3% longer than that of commuting travellers. Second, we developed more accurate gravity models for the spatial distribution of inter-city commuting and non-commuting travel. We also found that inter-city mobility has a hierarchical structure, as the distribution of inter-city travel volume follows Zipf’s law. In particular, the hierarchy of non-commuting travel volume among the cities is closer to an ideal Zipf distribution than that of commuting travel. Our findings contribute new knowledge on basic inter-city mobility laws and have significant applications for regional policies on human mobility.
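The abstract does not state the functional form of the improved gravity models. As an assumed baseline, the simplest gravity model predicts the travel volume between cities i and j as T_ij = k·P_i·P_j / d_ij^γ. With the population exponents fixed at 1, the distance-decay exponent γ and the constant k can be estimated by regressing ln(T_ij / (P_i·P_j)) on ln d_ij. The sketch below does this on synthetic city pairs; the function name and all numbers are hypothetical.

```python
import math

def fit_gravity(flows: list[tuple[float, float, float, float]]) -> tuple[float, float]:
    """Fit T_ij = k * P_i * P_j / d_ij**gamma by least squares on logs.

    Each tuple is (T_ij, P_i, P_j, d_ij); all values must be positive and
    at least two distinct distances are needed. Taking logs gives
    ln(T_ij / (P_i * P_j)) = ln k - gamma * ln d_ij, a one-predictor
    linear regression. Returns (k, gamma).
    """
    xs = [math.log(d) for _, _, _, d in flows]
    ys = [math.log(t / (pi * pj)) for t, pi, pj, _ in flows]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = -sxy / sxx
    k = math.exp(mean_y + gamma * mean_x)
    return k, gamma

# Synthetic city pairs (P_i, P_j, d_ij in km) generated with k = 0.002, gamma = 1.5.
pairs = [(1.2e6, 3.5e5, 50.0), (8.0e5, 2.0e6, 120.0),
         (2.2e7, 1.1e6, 300.0), (4.0e5, 6.0e5, 80.0)]
flows = [(0.002 * pi * pj / d ** 1.5, pi, pj, d) for pi, pj, d in pairs]
k, gamma = fit_gravity(flows)
print(round(k, 6), round(gamma, 3))
```

Fitting commuting and non-commuting flows separately would let one compare their distance-decay exponents, in line with the abstract's finding that non-commuting trips are longer.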