Abstract: This paper uses contrastive interlanguage analysis (CIA) to examine how Chinese senior high school students use the booster 'very' in writing. The writing corpora of Grade Three students in Chinese senior high schools, 04MET and 05MET, and the native speakers' writing corpora BROWN K&L are analyzed and compared with the help of tools such as MCO and AntConc. The findings reveal tendencies in how Chinese senior high school students use the booster 'very'. Based on this corpus analysis, the paper draws three pedagogical implications for English teaching and learning.
Abstract: Though collocations have drawn much attention in the field of language acquisition, the difficulties learners have with them have not been investigated in much detail. This paper reports on a corpus-based exploratory study that analyzes the mistakes learners made when they produced English collocations. The study shows that not only beginners but also advanced learners have difficulty choosing the right collocates, and that the difficulties are more or less the same across proficiency levels. L1 influence on the production of L2 collocations exists at every stage of learning, though it varies with the learners' L2 competence.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61572394 and 61272098), the Shenzhen Fundamental Research Plan (Nos. JCYJ20120615101127404 and JSGG20140519141854753), and the National Key Technologies R&D Program of China (No. 2011BAH04B03)
Abstract: Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Because real-world proprietary datasets are too large to be copied onto a test storage system, and most data cannot be shared due to privacy issues, a benchmark needs to generate data synthetically. To ensure that the result is accurate, the generated content must be based on a characterization of the real-world data properties that influence storage system performance during the execution of a benchmark. The existing approach, called SDGen, cannot guarantee that the benchmark result is accurate in storage systems that have built-in word-based compressors. The reason is that SDGen characterizes the properties that influence compression performance only at the byte level, and no properties are characterized at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds the word corpus by segmenting real-world text datasets, and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation. The Monte Carlo approach is used to generate synthetic data. The running time of TextGen depends only on the expected data size, which means that its time complexity is O(n). To evaluate TextGen, an experiment was performed on four real-world datasets. The experimental results show that, compared with SDGen, the compression performance and compression ratio of the datasets generated by TextGen deviate less from those of the real-world datasets when end-tagged dense code, a representative word-based compressor, is evaluated.
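The pipeline described in this abstract (segment text, count words, fit a lognormal by maximum likelihood, then generate by Monte Carlo sampling) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the regular-expression tokenizer, and in particular the way the fitted lognormal drives sampling (one random weight per vocabulary word) are assumptions made for illustration only.

```python
import itertools
import math
import random
import re
from collections import Counter


def word_frequencies(text):
    """Segment text into lowercase words and count each one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def fit_lognormal_mle(counts):
    """Closed-form MLE for a lognormal: mean and std of the log counts."""
    logs = [math.log(c) for c in counts]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def generate_text(vocab, mu, sigma, target_bytes, seed=0):
    """Monte Carlo generation: draw words until target_bytes is reached.

    Assumption: each vocabulary word gets a weight sampled from the
    fitted lognormal; the paper's actual sampling scheme may differ.
    """
    rng = random.Random(seed)
    weights = [rng.lognormvariate(mu, sigma) for _ in vocab]
    # Precompute cumulative weights once so each draw is a binary search.
    cum_weights = list(itertools.accumulate(weights))
    out, size = [], 0
    while size < target_bytes:
        word = rng.choices(vocab, cum_weights=cum_weights, k=1)[0]
        out.append(word)
        size += len(word) + 1  # +1 for the separating space
    return " ".join(out)


# Toy input standing in for a real-world text dataset.
sample = "the cat sat on the mat and the dog sat on the log"
freq = word_frequencies(sample)
mu, sigma = fit_lognormal_mle(list(freq.values()))
synthetic = generate_text(sorted(freq), mu, sigma, target_bytes=200)
```

Because the vocabulary weights are prepared once, the generation loop does work proportional to the number of words emitted, which is in the spirit of the abstract's claim that running time depends only on the expected data size.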