The Method Research of Web-based Attribute Extraction for Training Classification Model (original title: 基于Web属性抽取训练分类模型的方法研究)

Cited by: 3

Abstract: General-purpose search engines return overwhelming volumes of information, answer queries inaccurately, and lack depth. To address these problems, this paper proposes Web-based product attribute extraction as a new search engine service model. Web-based product attribute extraction is in essence an automatic classification problem: within a given classification scheme, the system decides automatically, according to the relevant product templates, whether a candidate is an attribute or not. The key to this task is finding effective feature values and determining the relevant classification rules. The classification algorithm is finally evaluated using the P, R, and F metrics.
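The abstract names the P, R, and F metrics without defining them. The sketch below assumes they denote precision, recall, and the balanced F-measure, as is conventional for classification tasks; the record itself does not confirm this. The function name `precision_recall_f` and the sample labels are hypothetical and are not taken from the paper.

```python
def precision_recall_f(gold, predicted):
    """Return (P, R, F) for binary labels, where True means 'is an attribute'.

    Assumption: P, R, F are precision, recall, and the balanced F-measure.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Hypothetical labels for five candidate attributes (not data from the paper).
gold = [True, True, False, True, False]
predicted = [True, False, False, True, True]

p, r, f = precision_recall_f(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

With these made-up labels there are two true positives, one false positive, and one false negative, so all three metrics come out to two thirds.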

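Author: Wu Yueping (吴月萍)

Affiliation: School of Computer and Information, Shanghai Polytechnic University (上海第二工业大学计算机与信息学院)

Source: Journal of Shanghai Polytechnic University (《上海第二工业大学学报》), 2008, No. 1, pp. 29-34 (6 pages)

Keywords: attribute extraction; classification rule; feature value; maximum entropy

Classification code: TP393.09 [Automation and Computer Technology: Computer Application Technology]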
