Abstract
[Purpose/Significance] This study aims to help users of question-and-answer communities, which hold massive amounts of text, locate information that meets their needs efficiently and with high quality. [Method/Process] The paper proposes a question-and-answer text summarization model based on topic features. The model combines the Word2Vec and SLDA algorithms to represent the semantic features of question-and-answer text at multiple levels. Then, following the idea of graph-based ranking, it combines the MRR redundancy control algorithm with sentence feature tags to adjust sentence weights and efficiently select summary content that fits the question tags. [Result/Conclusion] The model is validated on question-and-answer text from multiple questions in the Zhihu question-and-answer community, and the results show that it is highly feasible and effective. However, the empirical analysis uses only 500 answer texts; future work can expand the data volume for more thorough validation.
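The ranking-then-filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sentence vectors are already available (for example, built from Word2Vec and topic features), it reads the redundancy control step as a greedy selection in the style of maximal marginal relevance, and it leaves out the sentence feature tag weighting entirely. The function names `rank_sentences` and `select_summary` and the parameters `damping` and `trade_off` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sentences(vectors, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph.

    Assumption: `vectors` holds one precomputed embedding per sentence.
    Returns the scores and the similarity matrix used as edge weights.
    """
    n = len(vectors)
    sim = [
        [0.0 if i == j else max(cosine(vectors[i], vectors[j]), 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores, sim


def select_summary(scores, sim, k=2, trade_off=0.5):
    """Greedily pick k sentences, trading graph score against redundancy.

    Assumption: redundancy is the highest similarity to any sentence
    already selected; `trade_off` = 1.0 disables the penalty.
    """
    top = max(scores)
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def marginal(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return trade_off * scores[i] / top - (1 - trade_off) * redundancy
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With two near-duplicate sentences and one distinct sentence, the redundancy penalty pushes the second pick toward the distinct sentence, whereas ranking by graph score alone would return both near-duplicates.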
Authors
刘梦豪
熊回香
王妞妞
贺宇航
Liu Menghao; Xiong Huixiang; Wang Niuniu; He Yuhang (School of Information Management, Central China Normal University, Wuhan 430079, China; Undergraduate School, Central China Normal University, Wuhan 430079, China)
Source
《现代情报》
CSSCI
2023, Issue 8, pp. 114-124, 177 (12 pages in total)
Journal of Modern Information
Funding
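Key Project of the National Social Science Fund of China, "Research on Data-Driven Online Health Resource Mining and Smart Services" (Project No. 22ATQ004)
2022 Central China Normal University Fundamental Research Funds (Humanities and Social Sciences), Interdisciplinary Research Project, "Research on Individual Health Management Based on Quantified-Self Technology" (Project No. CCNU22JC033)
Central China Normal University Graduate Education Innovation Funding Project, "Research on Academic Community Discovery and Knowledge Growth Point Detection from the Perspective of Interdisciplinary Research Collaboration" (Project No. 2022CXZZ106).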