1 FU YAN, YANG DONG-QING, TANG SHI-WEI. Using XPath to discover informative content blocks of Web pages[C]//3rd International Conference on Semantics, Knowledge and Grid. Xi'an: IEEE Press, 2007: 450-453.
2 GUPTA S, KAISER G, NEISTADT D, et al. DOM-based content extraction of HTML documents [C]. Proceedings of the 12th World Wide Web Conference. New York, USA: [s. n.], 2003.
3 PELLEG D, BARAS D. K-means with large and noisy constraint sets [C]. Proceedings of the 18th European Conference on Machine Learning. Warsaw, Poland: [s. n.], 2007.
4 EMBLEY DW, JIANG YS, NG YK. Record-Boundary Discovery in Web Documents[A]. SIGMOD'99 Proceedings[C]. 1999.
5 EMBLEY DW, LI X. Record Location and Reconfiguration in Unstructured Multiple-Record Web Documents[A]. WebDB'00 Proceedings[C]. 2000.
6 LIM SJ, NG YK. Extracting Structures of HTML Documents Using a High-Level Stack Machine[M]. Information Networking in Asia, Gordon and Breach Science Publishers, Newark, New Jersey, 2001.
7 LIM SJ, NG YK, YANG XC. Integrating HTML Tables Using Semantic Hierarchies and Meta-Data Sets[A]. International Database Engineering and Applications Symposium (IDEAS'02)[C]. Edmonton, Canada, 2002.
8 LIM SJ, NG YK. A Heuristic Approach for Converting HTML Documents to XML Documents[A]. Proceedings of the Sixth International Conference on Rules and Objects in Databases (DOOD 2000)[C]. London, England, 2000: 1182-1196.
9 LIN SH, HO JM. Discovering Informative Content Blocks from Web Documents[A]. KDD 2002[C]. 2002: 588-593.
10 YU SP, CAI D, WEN JR, et al. Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation[EB/OL]. http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=632, 2002-12.