
Multi-granularity Web document annotation based on a statistical topic model (cited by: 1)

Published English title: Annotating Web document in multi-granularity way by statistical topical model
Abstract: Existing semantic annotation techniques for Web documents produce incomplete annotations. To address this weakness, the latent Dirichlet allocation (LDA) model is applied to add semantic annotations to Web documents. Because Web documents show clear domain characteristics, domain information is embedded in the conventional LDA model, giving the Domain-enabled LDA model, which improves the completeness of the annotation results and avoids forcing a topic assignment onto words. Associations are also established between a document's latent topics and the concepts of the ontology for the document's domain, so that the semantics expressed by the ontology concepts interpret the latent topics accurately, make the document's semantics explicit, and support document retrieval. Since the LDA model can assign a latent topic to each word, the notion of multi-granularity semantic annotation is proposed. Experiments on the 20news-group and WebKB datasets demonstrate the effectiveness of the Domain-enabled LDA model and indicate that multi-granularity annotation helps in handling different types of queries.
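Authors: Yuan Liu (袁柳), Zhang Longbo (张龙波)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2010, Issue A12, pp. 3401-3406 (6 pages)
Funding: National Natural Science Foundation of China (Grant No. 60873196)
Keywords: statistical topical model; ontology; semantic annotation; concept; information retrieval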
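
The idea in the abstract that per-word topic assignments allow annotation at several granularities can be illustrated with a minimal sketch. It does not implement the Domain-enabled LDA model. It assumes the per-word topic ids have already been inferred by some LDA sampler, and the three levels (word, paragraph, document), the sample data, the `topic_to_concept` table, and the function names are all hypothetical choices made for illustration, not details taken from the paper.

```python
from collections import Counter

# Hypothetical input: a document split into paragraphs, each word paired
# with a topic id that an LDA sampler is assumed to have produced already.
paragraphs = [
    [("ontology", 0), ("concept", 0), ("query", 1)],
    [("query", 1), ("retrieval", 1), ("index", 1)],
]

# Hypothetical stand-in for the topic-to-ontology-concept association;
# the concept names are invented for this example.
topic_to_concept = {0: "KnowledgeRepresentation", 1: "InformationRetrieval"}


def dominant_topic(assignments):
    """Return the most frequent topic id among (word, topic) pairs."""
    counts = Counter(topic for _, topic in assignments)
    return counts.most_common(1)[0][0]


def annotate(paragraphs, topic_to_concept):
    """Build word-, paragraph-, and document-level concept annotations."""
    # Finest granularity: each word carries the concept of its own topic.
    word_level = [
        [(word, topic_to_concept[topic]) for word, topic in para]
        for para in paragraphs
    ]
    # Middle granularity: each paragraph gets its dominant topic's concept.
    paragraph_level = [topic_to_concept[dominant_topic(p)] for p in paragraphs]
    # Coarsest granularity: one concept for the whole document.
    all_words = [pair for para in paragraphs for pair in para]
    document_level = topic_to_concept[dominant_topic(all_words)]
    return {
        "word": word_level,
        "paragraph": paragraph_level,
        "document": document_level,
    }


annotations = annotate(paragraphs, topic_to_concept)
print(annotations["paragraph"])
print(annotations["document"])
```

Under these assumptions, a broad query could be matched against the document-level annotation and a narrow query against the paragraph- or word-level annotations, which is one way to read the abstract's claim that multi-granularity annotation helps with different types of queries.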
