Sentence similarity computation plays an important role in machine question-answering systems, machine-translation systems, information retrieval, and automatic abstracting systems. This article first summarizes several methods for calculating similarity between sentences and then proposes a new method that jointly considers critical words, semantic information, sentential form, and sentence length. On this basis, an automatic abstracting system based on the LexRank algorithm is implemented. We made several improvements in both sentence weight computation and redundancy resolution. The system described in this article can handle single- or multi-document summarization in both English and Chinese. In evaluations on two corpora, our system produced better summaries to a certain degree. We also show that our system is quite insensitive to noise in the data that may result from imperfect topical clustering of documents. Finally, existing problems and development trends in automatic summarization technology are discussed.
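As a rough illustration of the LexRank idea the abstract builds on, the sketch below ranks sentences by running a damped random walk over a thresholded sentence-similarity graph. It uses plain bag-of-words cosine similarity as a stand-in; the article's combined similarity measure and its redundancy-resolution step are not reproduced here. The function names and the `threshold`, `damping`, and `iters` parameters are assumptions chosen for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Score sentences by centrality in a thresholded similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Connect two sentences when their similarity reaches the threshold.
    adj = [[1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise so each row is a probability distribution.
    for i in range(n):
        row_sum = sum(adj[i])
        adj[i] = [v / row_sum for v in adj[i]] if row_sum else [1.0 / n] * n
    # Power iteration with damping, starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1.0 - damping) / n
                  + damping * sum(adj[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

A summary would then be formed by taking the highest-scoring sentences; any similarity function that returns a value in [0, 1] could replace `cosine` in this sketch.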
The collaborative platform for clustering applications for governments consists of six large-scale systems: the clustering government Internet portal system, the clustering public-mailbox collaboration system, the clustering government affairs portal system, the clustering emergency information collaboration system, the clustering office automation collaboration system, and the clustering message collaboration system. This paper elaborates the application and technology architectures of the collaborative platform and expounds its key technologies, which include the large-scale integration and collaborative application of many government applications, a business-model-driven software development platform based on SOA, SSO, and a trans-departmental, cross-level multi-engine clustering protocol. Based on the "clustering application" design, and to maximize the utilization of the hardware, software, and administrative resources of the provincial government collaborative platform, rural districts and counties can build their own platforms on top of the provincial platform. The platform has been running for over two years, which shows that planning of the construction and maintenance of urban and rural e-government has been achieved, greatly reducing costs and improving government functions.
Previous work on the one-class collaborative filtering (OCCF) problem can be roughly categorized into pointwise methods, pairwise methods, and content-based methods. A fundamental assumption of these approaches is that all missing values in the user-item rating matrix are negative. However, this assumption may not hold, because the missing values may contain both negative and positive examples. For example, a user who fails to give positive feedback about an item does not necessarily dislike it; the user may simply be unfamiliar with it. Meanwhile, content-based methods, e.g. collaborative topic regression (CTR), usually require textual content information about the items, so their applicability is largely limited when the text information is not available. In this paper, we propose to apply the latent Dirichlet allocation (LDA) model to OCCF to address the above-mentioned problems. The basic idea of this approach is that items are regarded as words, users are regarded as documents, and the user-item feedback matrix constitutes the corpus. Our model drops the strong assumption that missing values are all negative and utilizes only the observed data to predict a user's interest. Additionally, the proposed model does not need content information about the items. Experimental results indicate that the proposed method outperforms previous methods on various ranking-oriented evaluation metrics. We further combine this method with a matrix-factorization-based method to tackle the multi-class collaborative filtering (MCCF) problem, which also achieves better performance in predicting user ratings.
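The mapping described above (items as words, users as documents, observed feedback as the corpus) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the inference procedure, so collapsed Gibbs sampling is assumed here, and the function name `lda_occf`, the hyperparameters `alpha` and `beta`, and the sweep count are hypothetical choices. Only observed (user, item) pairs enter the model; missing entries are never treated as negatives.

```python
import random


def lda_occf(feedback, n_items, n_topics=2, alpha=0.1, beta=0.01,
             sweeps=200, seed=0):
    """Fit LDA on observed feedback; return p(item | user) for every user."""
    rng = random.Random(seed)
    users = list(feedback)
    # One latent topic per observed (user, item) pair, initialised at random.
    z = {u: [rng.randrange(n_topics) for _ in feedback[u]] for u in users}
    n_uk = {u: [0] * n_topics for u in users}          # user-topic counts
    n_ki = [[0] * n_items for _ in range(n_topics)]    # topic-item counts
    n_k = [0] * n_topics                               # topic totals
    for u in users:
        for item, k in zip(feedback[u], z[u]):
            n_uk[u][k] += 1
            n_ki[k][item] += 1
            n_k[k] += 1
    for _ in range(sweeps):
        for u in users:
            for pos, item in enumerate(feedback[u]):
                # Remove the current assignment from the counts.
                k = z[u][pos]
                n_uk[u][k] -= 1
                n_ki[k][item] -= 1
                n_k[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [(n_uk[u][t] + alpha) * (n_ki[t][item] + beta)
                           / (n_k[t] + n_items * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[u][pos] = k
                n_uk[u][k] += 1
                n_ki[k][item] += 1
                n_k[k] += 1
    scores = {}
    for u in users:
        total = len(feedback[u]) + n_topics * alpha
        theta = [(n_uk[u][k] + alpha) / total for k in range(n_topics)]
        # p(item | user) = sum over topics of theta[user][k] * phi[k][item].
        scores[u] = [sum(theta[k] * (n_ki[k][i] + beta)
                         / (n_k[k] + n_items * beta)
                         for k in range(n_topics))
                     for i in range(n_items)]
    return scores
```

Here `feedback` maps each user to the list of item indices with observed positive feedback, for example `{"u1": [0, 1], "u2": [0, 1, 2]}`. Ranking a user's unseen items by the returned probabilities gives a recommendation list.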
Funding: We thank Weike Pan for providing his code for the GBPR algorithm [1], which enabled us to evaluate the algorithm more efficiently and more fairly. This work was supported by the National Natural Science Foundation of China (NSFC) (Grant Nos. 61370126, 61672081, 71540028, 61571052, 61602237), the National High-tech R&D Program of China (2015AA016004), the Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016001), the Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2013ZX-19), the Fund of Beijing Social Science (14JGC103), the Statistics Research Project of National Bureau (2013LY055), and the Fund of Beijing Wuzi University, China (GJB20141002).