Computing Text Similarity in the VSM with Semantic Clips as Feature Terms (cited by: 2)

Method of Text Semantic Similarity Computing Based on VSM Using the Skeleton Semantic Clip

Abstract: This paper defines the concept of the skeleton semantic clip. Mutual information serves as the reference measure of how strongly two words are related. With the help of dependency relations and basic syntax, two words whose relatedness meets a threshold are combined into a skeleton semantic clip. Skeleton semantic clips are used as feature terms, the vector space model (VSM) represents the semantics of a text, the occurrence frequency of each clip is its weight, and the cosine measure gives the semantic similarity between texts. The method is applied to the automatic grading of subjective exam questions, and experiments confirm that its results are satisfactorily accurate.
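The pipeline in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact mutual-information formula, the threshold value, or the dependency parser, so the sketch assumes pointwise mutual information estimated from raw counts and uses adjacent word pairs as a stand-in for dependency relations. All function names are hypothetical.

```python
import math
from collections import Counter


def count_corpus(sentences):
    """Count single words and adjacent word pairs over tokenized sentences."""
    word_counts, pair_counts = Counter(), Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))
    return word_counts, pair_counts


def pointwise_mi(pair_count, count_a, count_b, total_pairs, total_words):
    """PMI = log2(P(a, b) / (P(a) * P(b))), estimated from raw counts."""
    p_ab = pair_count / total_pairs
    p_a = count_a / total_words
    p_b = count_b / total_words
    return math.log2(p_ab / (p_a * p_b))


def extract_clips(tokens, word_counts, pair_counts, threshold):
    """Keep adjacent word pairs whose PMI meets the threshold.

    Assumption: adjacency replaces the dependency and syntax checks
    that the paper uses to decide which word pairs may form a clip.
    """
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    clips = []
    for a, b in zip(tokens, tokens[1:]):
        n_ab = pair_counts.get((a, b), 0)
        if n_ab == 0:
            continue  # pair never seen in the corpus: PMI undefined, skip
        mi = pointwise_mi(n_ab, word_counts[a], word_counts[b],
                          total_pairs, total_words)
        if mi >= threshold:
            clips.append((a, b))
    return clips


def cosine_similarity(clips_x, clips_y):
    """Cosine of two clip-frequency vectors (weight = occurrence count)."""
    fx, fy = Counter(clips_x), Counter(clips_y)
    dot = sum(fx[c] * fy[c] for c in fx.keys() & fy.keys())
    norm_x = math.sqrt(sum(v * v for v in fx.values()))
    norm_y = math.sqrt(sum(v * v for v in fy.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a text with no clips is treated as dissimilar to all
    return dot / (norm_x * norm_y)
```

In a grading setting, one would extract clips from the reference answer and from the student answer with the same corpus counts and threshold, then map the cosine value to a score. How that mapping is done is not stated in the abstract.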

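Author: Pan Guoqing (潘国清)

Affiliation: School of Computer Science and Engineering, Southeast University

Source: Computer & Digital Engineering (《计算机与数字工程》), 2007, No. 10, pp. 24-25, 34 (3 pages)

Keywords: vector space model, relevancy, skeleton semantic clip, mutual information, semantic similarity of sentences

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]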
