Funding: the Key Project of the National Natural Science Foundation of China (No. 60435020) and the High Technology Research and Development Programme of China (No. 2002AA117010-09).
Abstract: This paper presents two algorithms that derive the cohesion structure of a text, in the form of lexical chains, from two language resources, HowNet and TongYiCiCiLin. It describes research connecting the cohesion structure of a text to the derivation of its summary. A novel model of automatic text summarization is devised, based on the data that lexical chains provide about the original text. Moreover, the construction rules for lexical chains are modified according to the characteristics of the knowledge database so that they are better suited to Chinese summarization. Evaluation results show that high-quality indicative summaries are produced from Chinese texts.
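The general idea of chain-based summarization can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the `TOY_CLASSES` table is a hypothetical stand-in for a lookup in HowNet or TongYiCiCiLin, and the scoring rule (each chain member adds the length of its chain to its sentence) is an assumption chosen for simplicity.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: word -> semantic class code.
# A real system would look these up in HowNet or TongYiCiCiLin.
TOY_CLASSES = {
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle",
    "road": "route", "highway": "route",
}

def build_chains(sentences):
    """Group word occurrences that share a semantic class into one chain."""
    chains = defaultdict(list)  # class code -> [(sentence index, word), ...]
    for idx, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            word = word.strip(".,;:!?")
            if word in TOY_CLASSES:
                chains[TOY_CLASSES[word]].append((idx, word))
    return dict(chains)

def summarize(sentences, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    chains = build_chains(sentences)
    scores = [0] * len(sentences)
    for members in chains.values():
        for idx, _ in members:
            scores[idx] += len(members)  # members of longer chains count more
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in sorted(ranked[:n])]

text = [
    "The car left the highway.",
    "A truck and a bus followed the car.",
    "It rained all day.",
]
print(summarize(text, n=1))
```

Here the second sentence is selected because it holds three members of the longest chain. A real system would also need word segmentation and sense disambiguation for Chinese, which this sketch leaves out.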
Abstract: Webpage keyword extraction is important for automatic webpage summarization, retrieval, automatic question answering, character relation extraction, and similar tasks. In this paper, an environment vector is constructed for each word from its lexical chain, word context, word frequency, and webpage attribute weights, according to the characteristics of keywords. From these vectors a multi-factor table of words is built, and keyword extraction is then treated as a two-class problem over that table: keyword and non-keyword. The words are then classified with a support vector machine (SVM); this method can extract keywords that are unregistered words and can eliminate semantic ambiguities. Experimental results show that the method achieves higher precision and recall than the simple tf/idf algorithm.
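For reference, the tf/idf baseline that the abstract compares against can be sketched as below. This is a minimal, assumed formulation (relative term frequency times the natural log of inverse document frequency, whitespace tokenization); the abstract does not specify which variant was used, and the function name `tfidf_keywords` and the sample `pages` are hypothetical. It does not implement the SVM-based method.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of one document by tf * idf over a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tokens = tokenized[doc_index]
    tf = Counter(tokens)
    scores = {
        word: (count / len(tokens)) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]

pages = [
    "lexical chain keyword extraction for webpage keyword ranking",
    "webpage summary and retrieval",
    "question answering for webpage retrieval",
]
print(tfidf_keywords(pages, 0, top_k=1))
```

A word that occurs in every page, such as "webpage" here, gets a score of zero. That reliance on frequency alone is the kind of limitation that additional factors (context, lexical chains, page attributes) are meant to address.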
Abstract: Semantic lexical chains have been regarded as important to textual cohesion, although the classification of these chains has traditionally been limited to repetition, synonymy, hyponymy, and collocation. Work on the automatic extraction of lexical chains has found that contextual synonyms cannot be recognized or extracted automatically. This study used data-based techniques to extract contextually co-occurring lexical chains through thematic lexical items. It found that these contextually co-occurring lexical chains can include both the semantic lexical chains and contextual synonyms. It also found that, when the collocates of the co-occurring lexical items are extracted, these collocates form secondary lexical chains that contribute to textual cohesion. The vertical lexical chains made of contextually co-occurring lexical items and the horizontal chains made of collocational lexical items work together to make the text a coherent whole.