Conceptual clustering is mainly used for solving the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and aiming at the institutional framework and characteristic of Web the...Conceptual clustering is mainly used for solving the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and aiming at the institutional framework and characteristic of Web theme information, this paper proposes and implements dynamic conceptual clustering algorithm and merging algorithm for Web documents, and also analyses the super performance of the clustering algorithm in efficiency and clustering accuracy. Key words conceptual clustering - clustering center - dynamic conceptual clustering - theme - web documents clustering CLC number TP 311 Foundation item: Supported by the National “863” Program of China (2002AA111010, 2003AA001032)Biography: WANG Yun-hua(1979-), male, Master candidate, research direction: knowledge engineering and data mining.展开更多
The paper proposes a novel method for subtopics segmentation of Web document. An effective retrieval results may be obtained by using subtopics segmentation. The proposed method can segment hierarchically subtopics an...The paper proposes a novel method for subtopics segmentation of Web document. An effective retrieval results may be obtained by using subtopics segmentation. The proposed method can segment hierarchically subtopics and identify the boundary of each subtopic. Based on the term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs, passages. In the real-world sample experiment, the macro-averaged precision and recall reach 73.4 % and 82.5 %, and the micro-averaged precision and recall reach 72.9% and 83. 1%. Moreover, this method is equally efficient to other Asian languages such as Japanese and Korean, as well as other western languages.展开更多
As high quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and achieved promising results. However, most web pages have few tags (less than 10). This spa...As high quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and achieved promising results. However, most web pages have few tags (less than 10). This sparsity seriously limits the usage of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for those topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods., which indicates that tags could be a better resource for the clustering task.展开更多
While XML web services become recognized as a solution to business-to-business transactions, there are many problems that should be solved. For example, it is not easy to manipulate business documents of existing stan...While XML web services become recognized as a solution to business-to-business transactions, there are many problems that should be solved. For example, it is not easy to manipulate business documents of existing standards such as RosettaNet and UN/EDIFACT EDI, traditionally regarded as an important resource for managing B2B relationships. As a starting point for the complete implementation of B2B web services, this paper deals with how to support B2B business documents in XML web services. In the first phase, basic requirements for driving XML web services by business documents are introduced. As a solution, this paper presents how to express B2B business documents in WSDL, a core standard for XML web services. This kind of approach facilitates the reuse of existing business documents and enhances interoperability between implemented web services. Furthermore, it suggests how to link with other conceptual modeling frameworks such as ebXML/UMM, built on a rich heritage of electronic business experience.展开更多
文摘Conceptual clustering is mainly used for solving the deficiency and incompleteness of domain knowledge. Based on conceptual clustering technology and aiming at the institutional framework and characteristic of Web theme information, this paper proposes and implements dynamic conceptual clustering algorithm and merging algorithm for Web documents, and also analyses the super performance of the clustering algorithm in efficiency and clustering accuracy. Key words conceptual clustering - clustering center - dynamic conceptual clustering - theme - web documents clustering CLC number TP 311 Foundation item: Supported by the National “863” Program of China (2002AA111010, 2003AA001032)Biography: WANG Yun-hua(1979-), male, Master candidate, research direction: knowledge engineering and data mining.
基金Supported by the National High Tech-nology Research and Development Program of China(2002AA119050)
文摘The paper proposes a novel method for subtopics segmentation of Web document. An effective retrieval results may be obtained by using subtopics segmentation. The proposed method can segment hierarchically subtopics and identify the boundary of each subtopic. Based on the term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs, passages. In the real-world sample experiment, the macro-averaged precision and recall reach 73.4 % and 82.5 %, and the micro-averaged precision and recall reach 72.9% and 83. 1%. Moreover, this method is equally efficient to other Asian languages such as Japanese and Korean, as well as other western languages.
基金supported by the National Natural Science Foundation of China under Grant No. 61070111
文摘As high quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and achieved promising results. However, most web pages have few tags (less than 10). This sparsity seriously limits the usage of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for those topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods., which indicates that tags could be a better resource for the clustering task.
文摘While XML web services become recognized as a solution to business-to-business transactions, there are many problems that should be solved. For example, it is not easy to manipulate business documents of existing standards such as RosettaNet and UN/EDIFACT EDI, traditionally regarded as an important resource for managing B2B relationships. As a starting point for the complete implementation of B2B web services, this paper deals with how to support B2B business documents in XML web services. In the first phase, basic requirements for driving XML web services by business documents are introduced. As a solution, this paper presents how to express B2B business documents in WSDL, a core standard for XML web services. This kind of approach facilitates the reuse of existing business documents and enhances interoperability between implemented web services. Furthermore, it suggests how to link with other conceptual modeling frameworks such as ebXML/UMM, built on a rich heritage of electronic business experience.