

Annotation Specification of Phrase in Search Engine Logs
Abstract Corpus annotation is a fundamental task in corpus construction. Based on Sogou logs, this paper exploits the structured nature of XML documents to recast corpus annotation as the rewriting of node attributes. According to the characteristics of the corpus, it develops an annotation specification and a set of execution principles for phrase corpora, intended to support building a phrase dictionary for search engines, and describes the tag set and the specification in detail. Using this specification, 145,645 query strings have been annotated, with high annotation quality.
Authors Shu Yan (舒燕), Lü Xueqiang (吕学强)
Source Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2013, No. 2, pp. 47-51 (5 pages)
Funding National Social Science Fund of China project (09CYY021)
Keywords corpus annotation; Sogou logs; phrase dictionary; annotation specification
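
The abstract describes annotation as filling in node attributes in an XML file. The record gives no schema, so the following is only a minimal sketch of that idea under stated assumptions: the element name `query`, the attributes `id`, `text` and `label`, the sample query strings, and the tags `NP` and `VP` are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element names, attribute names and sample
# queries are assumptions for illustration, not the paper's schema.
RAW = """<queries>
  <query id="1" text="北京天气预报" label=""/>
  <query id="2" text="免费下载音乐" label=""/>
  <query id="3" text="在线翻译" label=""/>
</queries>"""


def annotate(xml_text, labels):
    """Return the parsed tree with each query's `label` attribute
    overwritten by the annotator's decision, where one exists."""
    root = ET.fromstring(xml_text)
    for node in root.iter("query"):
        qid = node.get("id")
        if qid in labels:
            # Annotation is just an attribute rewrite on the node.
            node.set("label", labels[qid])
    return root


# Hypothetical annotator decisions keyed by query id.
root = annotate(RAW, {"1": "NP", "2": "VP"})
annotated = {q.get("id"): q.get("label") for q in root.iter("query")}
```

Keeping the annotation in an attribute leaves the query text untouched, so unlabeled entries (here, query 3) remain easy to find by checking for an empty `label`.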









