This study distinguishes between word and lexicon: a word is an individual lexical item, whereas the lexicon is the aggregate of words. Past discussions of differences among lexicons could only compare individual words, and there was no way to present an overall view of lexical characteristics. Even when the words collected in dictionaries of different historical periods were compared, it was difficult to see the lexical differences between stages. This study examines the Old Chinese Digital Resources, the Pre-Modern Chinese Digital Archive and the Balanced Modern Chinese Digital Resources of Academia Sinica; the 300 Tang Poems; the 300 Song Lyrics; the 1998 People's Daily news releases, as word-segmented and tagged at Peking University; and the 1991-2002 digital news reports of the News Agency of Taiwan. Because these texts contain a vast number of words, a small number of significant lexical characteristics had to be extracted to capture the distinct nature of each lexicon. The crucial point of distinction is how words are used, not whether particular words exist. Word usage appears in word streams, that is, in texts, so the lexical attribute discussed here is called the dynamic attribute. The occurrences of words in the text streams can be tabulated for their frequency and for their percentage of occurrence relative to the entire body of texts. With words ranked in descending order of frequency, the cumulative percentage of occurrences of the 15 highest-frequency words is also tabulated. This cumulative percentage can be regarded as the concentration level of high-frequency words in use, and it clearly differentiates the types of texts in Old Chinese, Pre-Modern Chinese, Modern Chinese, poetic writing and press releases. This lexical attribute is thus quantitative in nature and may be of use in future research.
Essays on Linguistics
word and lexicon
lexical dynamic attribute
cumulative frequency percentage
concentration level of high frequency words
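The tabulation described in the abstract (frequency, percentage of all occurrences, and the cumulative percentage of the 15 highest-frequency words) can be sketched as follows. This is a minimal illustration, not the study's own procedure: it assumes the text has already been word-segmented into a list of tokens (as in the Peking University release), and the function name `concentration_level` and its `top_n` parameter are hypothetical. Words with equal frequency are ordered by first appearance, which is how Python's `Counter.most_common` breaks ties; the study may have ranked ties differently.

```python
from collections import Counter


def concentration_level(tokens, top_n=15):
    """Hypothetical helper: rank words by frequency and return
    (table, level), where each table row is
    (word, count, percentage of all tokens, cumulative percentage)
    for the top_n words, and level is the cumulative percentage
    of the top_n words (the 'concentration level')."""
    if not tokens:
        raise ValueError("token list is empty")
    counts = Counter(tokens)
    total = len(tokens)  # every word occurrence in the text stream
    table = []
    cum_count = 0
    # most_common returns words in descending order of frequency
    for word, count in counts.most_common(top_n):
        cum_count += count
        pct = 100.0 * count / total
        # computed from the running count to avoid summing rounded floats
        cum_pct = 100.0 * cum_count / total
        table.append((word, count, pct, cum_pct))
    level = table[-1][3]
    return table, level


if __name__ == "__main__":
    # Toy, pre-segmented token stream (illustrative only, not corpus data)
    tokens = "之 也 而 之 其 之 也".split()
    table, level = concentration_level(tokens, top_n=2)
    for word, count, pct, cum_pct in table:
        print(f"{word}\t{count}\t{pct:.2f}%\t{cum_pct:.2f}%")
    print(f"concentration level (top 2): {level:.2f}%")
```

Computing each cumulative value from the running count, rather than adding up per-word percentages, keeps the last row exactly 100% when `top_n` covers the whole vocabulary. A higher concentration level means a few very frequent words account for a larger share of the text.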