Research on Web Page Classification and Its Application in Uyghur Information Retrieval (cited by: 2)

The Research of Automated Text classification in the Uyghur Information Retrieval

Abstract: This paper studies web page classification in Uyghur information retrieval. It discusses the preprocessing of Uyghur text, the extraction of feature phrases from documents, and the construction of an information retrieval model. It proposes a retrieval method that incorporates web page classification and phrase extraction, using a KNN-based web page classification method. The authors report that their experiments show the design suits the characteristics of the Uyghur language and improves the query precision of the retrieval system, so that the returned results better match users' retrieval needs.
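The abstract names KNN-based classification but gives no implementation details. As a rough illustration only, the sketch below shows one common way a KNN text classifier can be built: TF-IDF weighting, cosine similarity, and a majority vote among the k nearest training documents. The weighting scheme, the similarity measure, the function names, and the toy English tokens are all assumptions made for this example; they are not taken from the paper, and real Uyghur pages would first need the preprocessing and phrase extraction the abstract mentions.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (assumed weighting)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query_tokens, train_vectors, train_labels, idf, k=3):
    """Label a query by majority vote among its k most similar training docs."""
    # Terms unseen in training carry no weight and are dropped.
    q = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(q, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with placeholder tokens (hypothetical data).
train_docs = [
    ["football", "match", "goal", "team"],
    ["team", "league", "goal", "score"],
    ["market", "stock", "price", "bank"],
    ["bank", "loan", "price", "interest"],
]
train_labels = ["sports", "sports", "finance", "finance"]
train_vectors, idf = tf_idf_vectors(train_docs)

print(knn_classify(["goal", "team", "score"], train_vectors, train_labels, idf))
```

In a retrieval setting, a predicted category like this could be used to restrict or re-rank search results, which is one plausible reading of how classification would raise query precision.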

Authors: 海丽且木·艾沙, 维尼拉·木沙江

Affiliation: School of Information Science and Engineering, Xinjiang University

Source: Computer Knowledge and Technology (《电脑知识与技术》), 2011, No. 1, pp. 192-193 (2 pages)

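Funding: National Natural Science Foundation of China (61063022); Key Project of the Scientific Research Program of Higher Education Institutions of the Xinjiang Uyghur Autonomous Region (XJEDU2006113)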

Keywords: Uyghur web pages; web page preprocessing; web page classification

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]



