Research on a DOM-Based Model for Review Discovery and Extraction (Cited by: 5)

Reviews discovery and opinions extraction model based on DOM
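Abstract: As the Internet has developed, static web-page text has been joined by large amounts of dynamic text, such as BBS comments and e-commerce reviews, and automatically mining this review information has become increasingly important. This paper proposes a novel algorithm for discovering and extracting review information. It uses DOM (Document Object Model) techniques to segment a page into blocks and combines this with iterative calculation of information entropy to discover and extract review blocks automatically. English abstract: With the development of the Internet, more and more commercial websites have appeared. These websites have become information platforms on which users post their reviews, and these reviews are increasingly important. This paper puts forward a novel algorithm that iteratively segments a page using the DOM and calculates information entropy to automatically discover and extract the reviews.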
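The abstract does not give the paper's exact entropy definition or iteration rule. The following is therefore an illustrative sketch under stated assumptions, not the author's algorithm. It builds a simplified DOM tree with Python's standard `html.parser` and treats every element as a candidate block. Each block is scored by the Shannon entropy H = -Σ p_i log2 p_i of how its text is split among its child blocks, where p_i is child i's share of the text.

The assumed intuition is that a review region consists of several comparable sibling posts, so its text is spread evenly and its entropy is high. An article body dominated by one paragraph scores low. The names `find_review_block`, `Node` and `TreeBuilder` and the `min_children` threshold are hypothetical.

```python
import math
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they are not pushed as open nodes.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal DOM-like node: tag, attributes, children and directly owned text."""
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree (hypothetical helper) from raw HTML."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore unmatched end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def text_length(node):
    """Total number of text characters in the subtree rooted at node."""
    return len(node.text) + sum(text_length(c) for c in node.children)


def entropy(node):
    """Shannon entropy (bits) of the distribution of text among node's children."""
    lengths = [text_length(c) for c in node.children]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in lengths if n > 0)


def find_review_block(html, min_children=3):
    """Return (node, entropy) for the block whose children share text most evenly."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best, best_h = None, -1.0
    stack = [builder.root]
    while stack:  # iterative depth-first traversal over every block
        node = stack.pop()
        if len(node.children) >= min_children:
            h = entropy(node)
            if h > best_h:
                best, best_h = node, h
        stack.extend(node.children)
    return best, best_h


page = """
<html><body>
  <div id="nav"><a>Home</a><a>Shop</a></div>
  <div id="article"><p>A long product description that dominates this block of the page.</p><p>x</p><p>y</p></div>
  <div id="reviews">
    <div>Great phone, battery lasts two days.</div>
    <div>Screen is too dim outdoors.</div>
    <div>Fast shipping, works as described.</div>
    <div>Camera quality is excellent.</div>
  </div>
</body></html>
"""
node, h = find_review_block(page)
print(node.attrs.get("id"))  # prints: reviews
```

In this toy page, the `reviews` block has four children of similar length, and its entropy is close to the 2-bit maximum for four children. The `article` block is dominated by one paragraph and scores well below 1 bit. A real system would also need the paper's segmentation rules and stopping criteria, which are not given here.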
Author: 李姜 (Li Jiang)
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD-indexed, Peking University Core Journal, 2007, Issue 9, pp. 2150-2153 (4 pages)
Funding: National Natural Science Foundation of China (Grant No. 50376029)
Keywords: review extraction; Document Object Model (DOM); information entropy; page segmentation; iterative calculation
