Latent semantic information measurement of corpus orientation

Original Chinese title: 基于语料库的潜语义信息度量 (literally, "Corpus-based latent semantic information measurement")


Abstract: This paper defines, for keywords, an information measure tied to a topic or its semantics. A topic-based corpus is collected first, and a latent semantic vector space model of that corpus is built; the information measure of each keyword is then defined through this model. With this measure, the amount of topic information contained in any document can be computed, and the document's degree of membership in the topic can be defined. A threshold on the membership degree then decides whether a document belongs to the topic class. Experiments show that an information measure linked to a topic or its semantics overcomes the shortcomings of word-matching search and achieves semantic-matching search.
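The pipeline in the abstract (topic corpus, latent semantic model, keyword measure, document membership, threshold) can be sketched as follows. The abstract does not give the paper's formulas, so every definition here is an assumption made for illustration: the toy term-document counts, the rank k, the keyword weight (length of the term's vector in the truncated SVD space, normalized to sum to 1), the membership formula, and the threshold value are all hypothetical and are not the authors' definitions.

```python
import numpy as np

# Hypothetical toy corpus for one topic: rows are keywords, columns are documents.
VOCAB = ["latent", "semantic", "corpus", "vector"]
TERM_DOC = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
])


def keyword_weights(term_doc, k):
    """Assumed keyword measure: length of each term's rank-k latent vector."""
    u, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    term_vectors = u[:, :k] * s[:k]           # one row per keyword
    norms = np.linalg.norm(term_vectors, axis=1)
    return norms / norms.sum()                # normalize so the weights sum to 1


def membership(tokens, vocab, weights):
    """Assumed membership degree in [0, 1]: topic information per token,
    scaled by the largest keyword weight. Unknown words contribute nothing."""
    if not tokens:
        return 0.0
    index = {term: i for i, term in enumerate(vocab)}
    info = sum(weights[index[t]] for t in tokens if t in index)
    return float(info) / (len(tokens) * float(weights.max()))


THRESHOLD = 0.5  # hypothetical cut-off; the abstract does not state a value

weights = keyword_weights(TERM_DOC, k=2)
on_topic = membership(["latent", "semantic", "vector", "model"], VOCAB, weights)
off_topic = membership(["football", "match", "score"], VOCAB, weights)

for name, value in [("on_topic", on_topic), ("off_topic", off_topic)]:
    print(name, round(value, 3), "in topic class:", value >= THRESHOLD)
```

In this sketch a document that shares no keywords with the topic corpus scores exactly zero, while a document is credited in proportion to how strongly its words load on the corpus's dominant latent dimensions, not merely for matching a query word. That is one plausible reading of the "semantic matching" the abstract describes.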

Authors: Jiang Kaizhong (江开忠), Li Lu (李路), Wang Zhaozong (王昭宗)

Affiliation: College of Fundamental Studies, Shanghai University of Engineering Science (上海工程技术大学基础教学学院)

Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 9, pp. 2450-2453, 2467 (5 pages)

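Funding: Shanghai Municipal Science and Technology Commission key science and technology project (055115001); Shanghai University of Engineering Science undergraduate innovation project (cx082100)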

Keywords: latent semantics; information measurement; metric distribution; membership degree

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]




