We applied a decision tree algorithm to learn association rules between a webpage's category (pornographic or normal) and its critical features. Based on these rules, we proposed an efficient method for filtering pornographic webpages with the following major advantages: 1) a weighted window-based technique estimates concept drift in the keywords recently found in pornographic webpages; 2) only the textual contexts of webpages are checked, without scanning pictures; 3) an incremental learning mechanism updates the pornographic keyword database incrementally.
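The abstract does not give the weighting formula, so the following is only a minimal sketch of how a weighted window over recent keyword batches could track drift. The class name `WeightedKeywordWindow`, the exponential `decay` weighting, and the `threshold` cut-off are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter, deque


class WeightedKeywordWindow:
    """Hypothetical sketch: score keywords over the most recent batches.

    Newer batches receive a higher weight, so keywords that stop
    appearing fade out and newly frequent keywords rise (concept drift).
    """

    def __init__(self, window_size=3, decay=0.5):
        self.decay = decay  # assumed weighting: decay ** age of the batch
        self.batches = deque(maxlen=window_size)  # oldest batch drops out

    def add_batch(self, keywords):
        """Incrementally add the keywords seen in one new batch of pages."""
        self.batches.append(Counter(keywords))

    def scores(self):
        """Return the recency-weighted count of every keyword in the window."""
        totals = {}
        newest = len(self.batches) - 1
        for index, counts in enumerate(self.batches):
            weight = self.decay ** (newest - index)
            for word, count in counts.items():
                totals[word] = totals.get(word, 0.0) + weight * count
        return totals

    def active_keywords(self, threshold=1.0):
        """Keywords whose weighted score reaches the (assumed) threshold."""
        return {w for w, s in self.scores().items() if s >= threshold}
```

Calling `add_batch` once per new batch of pages keeps the keyword set current without reprocessing old pages, which is one simple reading of the incremental update described above.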
A new classification algorithm for web mining is proposed, based on a general classification algorithm for data mining, in order to implement personalized information services. A tree-building method that detects a class threshold is used to construct the decision tree according to the concept of user expectation, so that classification rules can be found at different layers. Compared with the traditional C4.5 algorithm, the overfitting ("excessive adaptation") of C4.5 is reduced, so that the classification results have not only much higher accuracy but also statistical meaning.
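The abstract does not define the class threshold precisely. One plausible reading is sketched below: a node stops splitting once its majority class reaches a user-chosen proportion, which limits tree depth and so limits overfitting. The function names, the `class_threshold` parameter, and the use of information gain on categorical features are assumptions for illustration, not the published algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def build_tree(rows, labels, features, class_threshold=0.9):
    """Build a tree, stopping early when the majority class is common enough."""
    majority, count = Counter(labels).most_common(1)[0]
    # Assumed stopping rule: the node is "pure enough" for the user.
    if count / len(labels) >= class_threshold or not features:
        return majority

    def gain(feature):
        remainder = 0.0
        for value in set(row[feature] for row in rows):
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs],
                                     remaining, class_threshold)
    return {"feature": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree
```

Lowering `class_threshold` yields shallower trees with more general rules; setting it to 1.0 splits until nodes are pure or features run out.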
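A generalized suffix tree based concept generation algorithm (GSTCG) is proposed. First, the attribute sequences of all objects in the formal context, together with their suffixes, are built into a generalized suffix tree, and candidate concepts are generated from this tree. Second, candidate concepts with the same object set are merged, and the candidates are then extended according to rules. Finally, redundant candidate concepts are removed to obtain all formal concepts. Experiments on two kinds of artificial datasets with different parameters show that GSTCG and NextClosure produce the same number of concepts on every context, and that GSTCG has better time performance.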
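To make "all formal concepts" concrete, here is a brute-force enumeration for a tiny formal context. It is neither GSTCG nor NextClosure, only an exponential-time baseline that could be used to check concept counts on small inputs. The function name `formal_concepts` and the dict-of-sets representation of the context are assumptions for illustration.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a small formal context.

    context: dict mapping each object to its set of attributes.
    Brute force over object subsets; only suitable for tiny contexts.
    """
    objects = list(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for group in combinations(objects, size):
            # Intent: attributes shared by every object in the group.
            intent = set(all_attributes)
            for obj in group:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts
```

Because distinct object subsets can close to the same pair, the result is collected in a set, which plays the same role as removing redundant candidates in the description above.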
Funding: supported by MOST under Grant No. MOST 103-2410-H-004-112