Funding: Supported by the Science Foundation of Hengyang Normal University of China (09A36).
Abstract: [Objective] To examine a grammar model based on lexical substring extraction for RNA secondary structure prediction. [Method] By introducing the cloud model into the stochastic grammar model, a machine learning algorithm suitable for the lexicalized stochastic grammar model was proposed. The word-grid model was used to extract lexical substrings by dividing the RNA sequence, and a cloud classifier was used to find the maximum-probability secondary structure type for each lemma, which was then marked with that type. The lemma information was then introduced into the stochastic grammar training process as prior information, realizing prediction of RNA secondary structure, and the method was tested experimentally. [Result] The experimental results showed that the prediction accuracy and search speed of the stochastic grammar cloud model were significantly better than those of the simple stochastic grammar model. [Conclusion] This study lays a foundation for the wide application of stochastic grammar models to RNA secondary structure prediction.
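As an illustration of the lemma-labeling step described above, the following is a minimal Python sketch, not the paper's algorithm: simple k-mer windows stand in for the word-grid extraction, dot-bracket symbols stand in for the secondary structure types, and the cloud classifier is reduced to a plain maximum-probability lookup. All names here are hypothetical.

```python
from collections import defaultdict

def lexical_substrings(rna, k=3):
    """Slide a window over the sequence to extract k-mer 'lemmas'
    (a stand-in for the paper's word-grid extraction)."""
    return [rna[i:i + k] for i in range(len(rna) - k + 1)]

def train_lemma_priors(labelled_seqs, k=3):
    """Count how often each lemma starts at a position carrying each
    structure label, normalized to probabilities -- the prior
    information that would be fed into grammar training."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq, labels in labelled_seqs:  # labels: one structure symbol per base
        for i, lemma in enumerate(lexical_substrings(seq, k)):
            counts[lemma][labels[i]] += 1
    return {lemma: {t: n / sum(c.values()) for t, n in c.items()}
            for lemma, c in counts.items()}

def classify(lemma, priors):
    """Mark a lemma with its maximum-probability structure type."""
    return max(priors[lemma], key=priors[lemma].get)

# Toy example with dot-bracket labels: '(' / ')' = paired, '.' = unpaired.
priors = train_lemma_priors([("GCGCUUAAGCGC", "((((....))))")])
print(classify("CUU", priors))  # -> '(' in this toy corpus
```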
Funding: Supported by the National Natural Science Foundation of China under Grants No. 61173100 and No. 61173101, and by the Fundamental Research Funds for the Central Universities under Grant No. DUT10RW202.
Abstract: A new joint decoding strategy that combines character-based and word-based conditional random field models is proposed. In this segmentation framework, fragments are used to generate candidate Out-of-Vocabulary words (OOVs). After the initial segmentation, the segmentation fragments are divided into two classes, "combination" (combining several fragments into one unknown word) and "segregation" (segregating a fragment into several known words), so that more OOVs can be recalled. Moreover, to suit the characteristics of cross-domain segmentation, context information is used to guide Chinese Word Segmentation (CWS). The method is shown to be effective through several experiments on test data from the SIGHAN Bakeoff 2007 and Bakeoff 2010: OOV recall improves, and the overall segmentation performance is good.
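To make the fragment idea concrete, here is a minimal Python sketch under the assumption (mine, not the paper's) that a fragment is simply a maximal region where the two decoders' word boundaries disagree; the paper's joint CRF decoding and fragment classification are considerably richer.

```python
def spans(words):
    """Turn a segmentation (list of words) into (start, end) character spans."""
    out, pos = [], 0
    for w in words:
        out.append((pos, pos + len(w)))
        pos += len(w)
    return out

def fragments(sentence, char_seg, word_seg):
    """Collect the maximal regions where the character-based and word-based
    segmentations disagree; these are the candidate-OOV fragments that
    would later be classified as 'combination' or 'segregation'."""
    agreed = sorted(set(spans(char_seg)) & set(spans(word_seg)))
    frags, prev = [], 0
    for start, end in agreed:
        if start > prev:
            frags.append(sentence[prev:start])
        prev = end
    if prev < len(sentence):
        frags.append(sentence[prev:])
    return frags

# Hypothetical outputs of the two decoders for the same sentence:
s = "研究生命科学"
print(fragments(s, ["研究", "生", "命", "科学"], ["研究生", "命", "科学"]))
# -> ['研究生'] : a disagreement region, hence a candidate OOV
```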
Funding: Supported by the Natural Science Foundation of Guangxi Province (No. 60961002).
Abstract: This paper studies the application of keyword spotting to network monitoring, with the emphasis on keyword spotting itself. The monitoring system is divided into two modules: network monitoring and keyword spotting. For network monitoring, a method based on ARP spoofing is adopted to monitor users' data and obtain the original audio streams. For keyword spotting, the extraction of PLP features (one of the main sets of characteristic parameters) is studied, and an improved feature parameter, PMCC, is put forward. Meanwhile, to detect syllables accurately, the paper compares the double-threshold method with the variance-of-frequency-band method, and uses the latter for endpoint detection. Finally, the keyword recognition module is built with HMMs, and recognition results are compared in the Matlab environment. The experimental results yield a better solution for applying keyword recognition technology to network monitoring.
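For readers unfamiliar with the endpoint detection step, the following Python sketch shows the classic energy-based double-threshold method that the paper compares against the variance-of-frequency-band method (ultimately preferring the latter). Frame sizes and threshold ratios are illustrative assumptions.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Frame the signal and return the energy of each frame."""
    n_frames = (len(signal) - frame_len) // hop + 1
    return np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2.0)
                     for i in range(n_frames)])

def double_threshold_endpoints(energy, low_ratio=0.1, high_ratio=0.4):
    """A region must cross the high threshold to count as speech; its
    boundaries are then extended outward until the energy falls below
    the low threshold."""
    hi_t, lo_t = high_ratio * energy.max(), low_ratio * energy.max()
    above = np.flatnonzero(energy > hi_t)
    if above.size == 0:
        return None  # no speech detected
    start, end = int(above[0]), int(above[-1])
    while start > 0 and energy[start - 1] > lo_t:
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > lo_t:
        end += 1
    return start, end  # frame indices of the detected endpoints

# Example: silence, then a noisy burst, then silence.
sig = np.concatenate([np.zeros(2048), np.random.randn(4096), np.zeros(2048)])
print(double_threshold_endpoints(short_time_energy(sig)))
```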
Abstract: As teachers of EFL (English as a Foreign Language) in the Kingdom of Saudi Arabia, we are sometimes flabbergasted by the errors that learners make in their English scripts. Apparently, there was no pattern to these errors. However, when the researcher observed the frequency of similar errors over several years of her experience as EFL faculty for Saudi learners, the theory of Lexical Relations appeared to offer an answer. Therein also lies the genesis of the current study. In a positive development, the study did bring to light certain factors that directly play a role in Saudi EFL learners' English errors. These are presented in this paper in the hope that, with the diagnosis in hand, it will be easier for EFL teachers to find a solution to the dearth of English proficiency among Saudi EFL learners.
Abstract: This article presents an effective approach to polysemous-word acquisition based on cognitive linguistics. Although it may be true that most vocabulary is acquired through incidental learning, acquiring words by inferring from context is not necessarily the most effective or efficient method in instructional settings. Results of a series of vocabulary tests involving Chinese learners of English show that providing a core sense leads to better guessing and better long-term retention of the figurative senses of polysemous words. Some practical implications are suggested for reference.
Abstract: Guessing strategy is a traditional and effective way for EFL students to improve their reading. Almost all readers apply this method, to a greater or lesser extent, when reading different kinds of materials, partly because readers simply do not have the time to look up every new word in the dictionary. Linguistic developments in recent years make it possible to reconsider this strategy within the framework of cognitive grammar, and a number of theories have provided further evidence of its effectiveness. Theories dealing with notions such as schemata and prototypes place the strategy in a broader context. Schema theory tells us that the context of a given word is not the only source and basis of guessing; other factors include background knowledge and the given word itself. Prototype theory, on the other hand, shows how readers guess the meaning of a familiar word form that carries a completely new part of speech or meaning. Even though cognitive linguistics has shown the effectiveness of a guessing strategy in reading, its applicability needs to be reconsidered: some materials may not be suitable for a guessing strategy.
Abstract: The present study employs a word association test to investigate the nature of Chinese English learners' mental lexicon by comparing the association responses of native speakers and Chinese English learners. The results show significant differences in the structure of the mental lexicon between the two groups. Compared with L1 mental lexicons, Chinese English learners show poorer concentricity of association and weaker association strength; their associations depend more on word forms, and they have not established systematic, stable networks between words. The semantic network in their mental lexicon is underdeveloped. These results have implications for L2 vocabulary teaching and learning.
Abstract: With the purpose of describing the levels of English language proficiency expected at each stage of our school's Diploma and BA programs, we attempted to compare the level of our courses with national standards as embodied in TEM4 and TEM8, and with international standards as embodied in examinations such as Cambridge ESOL and in descriptions such as the Common European Framework, via one quantifiable parameter: vocabulary range. This is justified because vocabulary range offers an approximate but useful guide to the level of a course or a testing system. We hypothesized that the language competence reached at different levels of our program matches various standard proficiency examinations. Paul Nation's Range software was used both in its standard form, with his three BASEWRD files, and in an adapted form with the authors' own BASEWRD files extrapolated from various levels of our textbook series. This enabled us to compare the vocabulary range of our courses with that of both national and international examinations wherever word lists are available or recoverable. The results supported the hypotheses.
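The core statistic behind this comparison is simple enough to sketch. Below is a minimal, hypothetical Python version of the cumulative coverage figure that word-list tools such as Range report; real BASEWRD files group headwords together with their inflected and derived family members, which this simplification ignores.

```python
def coverage_by_level(text_words, level_lists):
    """Cumulative percentage of running words covered by each word list."""
    total = len(text_words)
    known, results = set(), []
    for level in level_lists:                # each level is a set of words
        known |= {w.lower() for w in level}
        covered = sum(1 for w in text_words if w.lower() in known)
        results.append(100.0 * covered / total)
    return results

# Toy example with two tiny stand-in word lists:
text = "the cat sat on the mat pondering epistemology".split()
print(coverage_by_level(text, [{"the", "cat", "sat", "on"}, {"mat", "pondering"}]))
# -> [62.5, 87.5]
```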
Abstract: This paper presents a study of the effect of exposure to the meanings of unknown vocabulary on reading comprehension in the target language (in this case, English). The subjects were 30 second-year Chinese college students. The results show that exposing subjects to the meanings of unknown words in a passage exerts a significant effect on their reading comprehension in the target language.
Abstract: This exploratory study probes the quantitative and qualitative lexical development of freshmen and sophomores at the University of Science and Technology of China, one of the top universities in China. An investigation of lexical size and depth was administered to 76 freshmen who had just registered at the university and 104 sophomores who had finished the college English course. The vocabulary size test was developed on the basis of Paul Nation's Vocabulary Levels Test, whereas the vocabulary depth test was based on the lexical competency framework (Nation 1999), measuring three types of word knowledge: spelling, meaning, and word-class knowledge of six target words. The results suggest that (1) the freshmen had a start-up vocabulary size of about 3,800 words, and the sophomores knew about 5,000 words; (2) both groups had little trouble with spelling, but their grammatical and meaning knowledge was limited; (3) meaning reception was much better than meaning production, and the reception-production gap widened over the given learning session; (4) correlations between vocabulary size and the word knowledge types were relatively significant and changed with the subjects' L2 proficiency, and the vocabulary size test was not a good indicator of depth of word knowledge.
Funding: Supported in part by a research grant from the Jiangsu Provincial Education Bureau (2014SJD118).
Abstract: This study explores the influence of word class on L1 and L2 word association. The participants were 26 L1 English speakers and 28 advanced EFL learners who completed an English word association test involving three types of stimuli: nouns, verbs, and adjectives. Response words were classified into paradigmatic, syntagmatic, encyclopedic, and form-based categories. The results show that: 1) the L2 mental lexicon largely resembled that of L1 English speakers in that both were dominated by paradigmatic association, but L2 syntagmatic association was clearly weaker than L1 across the three word classes; 2) verbs and adjectives showed a greater potential than nouns to elicit syntagmatic responses in both L1 and L2 association; 3) compared with verbs and adjectives, nouns were more paradigmatically challenging for L2 learners.
Abstract: This paper demonstrates a new framework of vocabulary learning processes (VLPs), with full justification of the learning steps involved, in response to the inefficient self-directed vocabulary learning (SDVL) of college students in mainland China. Based on a review of established frameworks for second language acquisition and vocabulary learning, a more systematic and comprehensive framework of VLPs is constructed, with six new VLPs explored and specified sequentially in a cycle. It aims to help engage learners' mental effort in a rational way and thereby achieve long-term word retention and good word transfer. In addition, the newly designed VLPs are embodied in authentic learning material for learners' self-study of vocabulary in particular domains. The efficacy of the material is tested in an empirical study to validate the effectiveness of these VLPs.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61572394 and 61272098), the Shenzhen Fundamental Research Plan (Nos. JCYJ20120615101127404 and JSGG20140519141854753), and the National Key Technologies R&D Program of China (No. 2011BAH04B03).
Abstract: Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Because real-world proprietary datasets are too large to be copied onto a test storage system, and most data cannot be shared due to privacy issues, a benchmark needs to generate data synthetically. To ensure that the result is accurate, the generated data content must be based on a characterization of the real-world data properties that influence storage system performance during the execution of a benchmark. The existing approach, SDGen, cannot guarantee accurate benchmark results in storage systems that have built-in word-based compressors, because SDGen characterizes the properties that influence compression performance only at the byte level; no properties are characterized at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds a word corpus by segmenting real-world text datasets and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation, and a Monte Carlo approach is used to generate the synthetic data. The running time of TextGen depends only on the expected data size, which means its time complexity is O(n). To evaluate TextGen, an experiment was performed on four real-world datasets. The results show that, compared with SDGen, the compression performance and compression ratio of the datasets generated by TextGen deviate less from real-world datasets when end-tagged dense code, a representative word-based compressor, is evaluated.
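To make the generation pipeline concrete, here is a minimal Python sketch of the steps named in the abstract. It is an interpretation, not the authors' code: whitespace splitting stands in for TextGen's segmenter, and drawing each word's sampling weight from the fitted lognormal is an assumption about how the fitted distribution is used.

```python
import math
import random
from collections import Counter

def fit_lognormal(freqs):
    """MLE for a lognormal over word frequencies: the mean and standard
    deviation of the log-frequencies."""
    logs = [math.log(f) for f in freqs]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma

def generate(corpus_text, expected_bytes):
    """Monte Carlo generation: emit one word per iteration until the
    expected size is reached, so the running time depends only on the
    expected data size, i.e. O(n)."""
    counts = Counter(corpus_text.split())  # whitespace split stands in for segmentation
    words = list(counts)
    mu, sigma = fit_lognormal(counts.values())
    # Resample each word's weight from the fitted lognormal distribution.
    weights = [random.lognormvariate(mu, sigma) for _ in words]
    out, size = [], 0
    while size < expected_bytes:
        w = random.choices(words, weights=weights)[0]
        out.append(w)
        size += len(w.encode()) + 1        # the word plus a separator byte
    return " ".join(out)

print(generate("to be or not to be that is the question", 40))
```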