Purpose: This study aims to build an automatic survey generation tool, named CitationAS, based on citation content as represented by the set of citing sentences in the original articles.Design/methodology/approach: ...Purpose: This study aims to build an automatic survey generation tool, named CitationAS, based on citation content as represented by the set of citing sentences in the original articles.Design/methodology/approach: Firstly, we apply LDA to analyse topic distribution of citation content. Secondly, in CitationAS, we use bisecting K-means, Lingo and STC to cluster retrieved citation content. Then Word2Vec, Word Net and combination of them are applied to generate cluster labels. Next, we employ TF-IDF, MMR, as well as considering sentence location information, to extract important sentences, which are used to generate surveys. Finally, we adopt manual evaluation for the generated surveys.Findings: In experiments, we choose 20 high-frequency phrases as search terms. Results show that Lingo-Word2Vec, STC-Word Net and bisecting K-means-Word2Vec have better clustering effects. In 5 points evaluation system, survey quality scores obtained by designing methods are close to 3, indicating surveys are within acceptable limits. When considering sentence location information, survey quality will be improved. Combination of Lingo, Word2Vec, TF-IDF or MMR can acquire higher survey quality.Research limitations: The manual evaluation method may have a certain subjectivity. We use a simple linear function to combine Word2Vec and Word Net that may not bring out their strengths. The generated surveys may not contain some newly created knowledge of some articles which may concentrate on sentences with no citing.Practical implications: CitationAS tool can automatically generate a comprehensive, detailed and accurate survey according to user’s search terms. It can also help researchers learn about research status in a certain field.Originality/value: Citaiton AS tool is of practicability. It merges cluster labels from semantic level to improve clustering results. The tool also considers sentence location information when calculating sentence score by TF-IDF and MMR.展开更多
基金supported by Major Projects of National Social Science Fund (No. 17ZDA291)Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201704)Qing Lan Project
文摘Purpose: This study aims to build an automatic survey generation tool, named CitationAS, based on citation content as represented by the set of citing sentences in the original articles.Design/methodology/approach: Firstly, we apply LDA to analyse topic distribution of citation content. Secondly, in CitationAS, we use bisecting K-means, Lingo and STC to cluster retrieved citation content. Then Word2Vec, Word Net and combination of them are applied to generate cluster labels. Next, we employ TF-IDF, MMR, as well as considering sentence location information, to extract important sentences, which are used to generate surveys. Finally, we adopt manual evaluation for the generated surveys.Findings: In experiments, we choose 20 high-frequency phrases as search terms. Results show that Lingo-Word2Vec, STC-Word Net and bisecting K-means-Word2Vec have better clustering effects. In 5 points evaluation system, survey quality scores obtained by designing methods are close to 3, indicating surveys are within acceptable limits. When considering sentence location information, survey quality will be improved. Combination of Lingo, Word2Vec, TF-IDF or MMR can acquire higher survey quality.Research limitations: The manual evaluation method may have a certain subjectivity. We use a simple linear function to combine Word2Vec and Word Net that may not bring out their strengths. The generated surveys may not contain some newly created knowledge of some articles which may concentrate on sentences with no citing.Practical implications: CitationAS tool can automatically generate a comprehensive, detailed and accurate survey according to user’s search terms. It can also help researchers learn about research status in a certain field.Originality/value: Citaiton AS tool is of practicability. It merges cluster labels from semantic level to improve clustering results. The tool also considers sentence location information when calculating sentence score by TF-IDF and MMR.