The fourth international conference on Web information systems and applications (WISA 2007) has received 409 submissions and has accepted 37 papers for publication in this issue. The papers cover broad research area...The fourth international conference on Web information systems and applications (WISA 2007) has received 409 submissions and has accepted 37 papers for publication in this issue. The papers cover broad research areas, including Web mining and data warehouse, Deep Web and Web integration, P2P networks, text processing and information retrieval, as well as Web Services and Web infrastructure. After briefly introducing the WISA conference, the survey outlines the current activities and future trends concerning Web information systems and applications based on the papers accepted for publication.展开更多
Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text...Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text documents toextract case reasoning and related data. This sort of case processing helps professionals and researchers to refer the previous case with more accuracy in reducedtime. The rapid development of judicial ontologies seems to deliver interestingproblem solving to legal knowledge formalization. Mining context informationthrough ontologies from corpora is a challenging and interesting field. Thisresearch paper presents a three tier contextual text mining framework throughontologies for judicial corpora. This framework comprises on the judicial corpus,text mining processing resources and ontologies for mining contextual text fromcorpora to make text and data mining more reliable and fast. A top-down ontologyconstruction approach has been adopted in this paper. The judicial corpus hasbeen selected with a sufficient dataset to process and evaluate the results.The experimental results and evaluations show significant improvements incomparison with the available techniques.展开更多
In this paper, a new context free stemmer is proposed which consists of the combination of traditional rule based system with string similarity approach. This algorithm can be called as hybrid algorithm. It is languag...In this paper, a new context free stemmer is proposed which consists of the combination of traditional rule based system with string similarity approach. This algorithm can be called as hybrid algorithm. It is language dependent algorithm. Context free stemmer means that stemmer which stems the word that is not based on the context i.e., for every context such rule is applied. After stripping the words using traditional context free rule based approach, it may over stem or under stem the inflected words which are overcome by applying string similarity function of dynamic programming. For measuring the string similarity function, edit distance is used. The stripped inflected word is compared with the words stored in a text database available. That word having minimum distance is taken as the substitution of the stripped inflected word which leads to the stem of it. The concept of traditional rule based system and corpus based approach is heavily used in this approach. This algorithm is tested for Nepali Language which is based on Devanagari Script. The approach has given better result in comparison to traditional rule based system particularly for Nepali Language only. The total accuracy of this hybrid algorithm is 70.10% whereas the total accuracy of traditional rule based system is 68.43%.展开更多
Beginning with the characteristics of information resources on the Internet,this article discusses the characteristics of information retrieval on the Web and the development trend of the information retrieval based o...Beginning with the characteristics of information resources on the Internet,this article discusses the characteristics of information retrieval on the Web and the development trend of the information retrieval based on the Internet.The relationship and differences among traditional information retrieval,data mining and text mining are emphasized.展开更多
针对自然语言处理中查询主题漂移和词不匹配问题,提出基于CSC(Copulas-based Support and Confidence)框架的关联模式挖掘与规则扩展算法,并将基于统计学分析的关联模式与具有上下文语义信息的词向量融合,提出关联模式挖掘与词向量学习...针对自然语言处理中查询主题漂移和词不匹配问题,提出基于CSC(Copulas-based Support and Confidence)框架的关联模式挖掘与规则扩展算法,并将基于统计学分析的关联模式与具有上下文语义信息的词向量融合,提出关联模式挖掘与词向量学习融合的伪相关反馈查询扩展模型.该模型对伪相关反馈文档集挖掘规则扩展词,对初检文档集进行词嵌入学习训练得到词向量,计算规则扩展词与原查询的向量相似度,提取向量相似度不低于阈值的规则扩展词作为最终扩展词.实验结果表明,所提扩展模型能有效地减少查询主题漂移和词不匹配问题,提高检索性能,与现有基于关联模式的和基于词向量的查询扩展方法比较,MAP(Mean Average Precision)平均增幅最大可达17.52%,对短查询更有效.所提挖掘方法可用于其他文本挖掘任务和推荐系统,以提高其性能.展开更多
文摘The fourth international conference on Web information systems and applications (WISA 2007) has received 409 submissions and has accepted 37 papers for publication in this issue. The papers cover broad research areas, including Web mining and data warehouse, Deep Web and Web integration, P2P networks, text processing and information retrieval, as well as Web Services and Web infrastructure. After briefly introducing the WISA conference, the survey outlines the current activities and future trends concerning Web information systems and applications based on the papers accepted for publication.
文摘Digitalization has changed the way of information processing, and newtechniques of legal data processing are evolving. Text mining helps to analyze andsearch different court cases available in the form of digital text documents toextract case reasoning and related data. This sort of case processing helps professionals and researchers to refer the previous case with more accuracy in reducedtime. The rapid development of judicial ontologies seems to deliver interestingproblem solving to legal knowledge formalization. Mining context informationthrough ontologies from corpora is a challenging and interesting field. Thisresearch paper presents a three tier contextual text mining framework throughontologies for judicial corpora. This framework comprises on the judicial corpus,text mining processing resources and ontologies for mining contextual text fromcorpora to make text and data mining more reliable and fast. A top-down ontologyconstruction approach has been adopted in this paper. The judicial corpus hasbeen selected with a sufficient dataset to process and evaluate the results.The experimental results and evaluations show significant improvements incomparison with the available techniques.
文摘In this paper, a new context free stemmer is proposed which consists of the combination of traditional rule based system with string similarity approach. This algorithm can be called as hybrid algorithm. It is language dependent algorithm. Context free stemmer means that stemmer which stems the word that is not based on the context i.e., for every context such rule is applied. After stripping the words using traditional context free rule based approach, it may over stem or under stem the inflected words which are overcome by applying string similarity function of dynamic programming. For measuring the string similarity function, edit distance is used. The stripped inflected word is compared with the words stored in a text database available. That word having minimum distance is taken as the substitution of the stripped inflected word which leads to the stem of it. The concept of traditional rule based system and corpus based approach is heavily used in this approach. This algorithm is tested for Nepali Language which is based on Devanagari Script. The approach has given better result in comparison to traditional rule based system particularly for Nepali Language only. The total accuracy of this hybrid algorithm is 70.10% whereas the total accuracy of traditional rule based system is 68.43%.
文摘Beginning with the characteristics of information resources on the Internet,this article discusses the characteristics of information retrieval on the Web and the development trend of the information retrieval based on the Internet.The relationship and differences among traditional information retrieval,data mining and text mining are emphasized.
文摘针对自然语言处理中查询主题漂移和词不匹配问题,提出基于CSC(Copulas-based Support and Confidence)框架的关联模式挖掘与规则扩展算法,并将基于统计学分析的关联模式与具有上下文语义信息的词向量融合,提出关联模式挖掘与词向量学习融合的伪相关反馈查询扩展模型.该模型对伪相关反馈文档集挖掘规则扩展词,对初检文档集进行词嵌入学习训练得到词向量,计算规则扩展词与原查询的向量相似度,提取向量相似度不低于阈值的规则扩展词作为最终扩展词.实验结果表明,所提扩展模型能有效地减少查询主题漂移和词不匹配问题,提高检索性能,与现有基于关联模式的和基于词向量的查询扩展方法比较,MAP(Mean Average Precision)平均增幅最大可达17.52%,对短查询更有效.所提挖掘方法可用于其他文本挖掘任务和推荐系统,以提高其性能.