Funding: National Natural Science Foundation of China, Grant Nos. 60572084 and 60621062.
Abstract: In a question answering (QA) system, the fundamental problem is how to measure the distance between a question and an answer, and hence how to rank candidate answers. We demonstrate that such a distance can be defined with mathematical precision. Not only is such a definition possible, it is provably better than any other feasible definition. Moreover, it can be conveniently and fruitfully applied to construct a QA system. We have built such a system, QUANTA. Extensive experiments are conducted to justify the new theory.
Abstract: Data processing of small samples is an important and valuable research problem in electronic equipment testing. Because the probability distribution of small samples is difficult to determine, it is hard to use traditional probability theory to process the samples and assess the degree of uncertainty. Using grey relational theory and norm theory, this article proposes the grey distance information approach, which is based on the grey distance information quantity of a sample and the average grey distance information quantity of the samples. The definitions of these two quantities are introduced, together with their characteristics and algorithms. Related problems are also addressed, including the algorithm for the estimated value, the standard deviation, and the acceptance and rejection criteria for the samples and the estimated results. Moreover, the information whitening ratio is introduced to select the weight algorithm and to compare different samples. Several examples demonstrate the application of the proposed approach. The examples show that the approach, which does not require the probability distribution of small samples, is feasible and effective.
Funding: Supported by the National Natural Science Foundation of China (61662009), the Education Reform Project in Guizhou Province (SJJG201404), and the Natural Science Foundation of Guizhou Province Education Department (KY(2015)367).
Abstract: The configuration of an information system's security policy is directly related to information asset risk, and the configuration required by classified security protection ensures the optimal and minimum policy at the corresponding security level. Based on a random survey of the information assets of multiple departments, this paper proposes the relative deviation distance of security policy configuration as a risk measure parameter, building on the distance of information-state transition (DIT) theory. By quantitatively analyzing the information asset weight, the deviation degree, and the DIT, we establish an evaluation model for the information system. An example analysis shows that this method evaluates information system risk effectively, intuitively, and reliably, avoids the threat caused by subjective measurement, and performs better than existing solutions. Scientific analysis of security risk for the information system is thus feasible both in theory and in practice.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, and the IRCI Project from IDRC, Canada.
Abstract: Fast-changing knowledge on the Internet can be acquired more efficiently with the help of automatic document summarization and updating techniques. This paper describes a novel approach to multi-document update summarization. The best summary is defined as the one with the minimum information distance to the entire document set. The best update summary has the minimum conditional information distance to a document cluster, given that a prior document cluster has already been read. Experiments on the DUC/TAC 2007 to 2009 datasets (http://duc.nist.gov/, http://www.nist.gov/tac/) show that our method correlates closely with the human summaries and outperforms other programs such as LexRank in many categories under the ROUGE evaluation criterion.
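The selection criterion in this abstract (choose the summary with the minimum information distance to the document set) can be illustrated with a small sketch. Information distance is defined via Kolmogorov complexity and is not computable, so the sketch below substitutes the normalized compression distance (NCD) computed with zlib. That substitution is an assumption made here for illustration; the abstract does not say how the authors approximate the distance. The function names `compressed_size`, `ncd`, and `best_summary` are hypothetical, and the conditional distance used for update summaries is not covered.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: used here as a computable stand-in for information
    distance; it is not taken from the paper.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def best_summary(candidates: list[str], documents: list[str]) -> str:
    """Return the candidate with the smallest NCD to the whole document set."""
    corpus = " ".join(documents)
    return min(candidates, key=lambda candidate: ncd(candidate, corpus))
```

On very short texts a general-purpose compressor gives only a rough estimate, so this sketch is meant to show the shape of the criterion rather than to reproduce the reported results.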
Funding: Supported mainly by Canada's IDRC Research Chair in Information Technology Program under Grant No. 104519006; also supported by the National Natural Science Foundation of China under Grant No. 60973104, the National Basic Research 973 Program of China under Grant No. 2007CB311003, NSERC Grant OGP0046506, the Canada Research Chair Program, MITACS, an NSERC Collaborative Grant, and Ontario's Premier's Discovery Award.
Abstract: Multiword Expressions (MWEs) appear frequently and ungrammatically in natural languages. Identifying MWEs in free text is a very challenging problem. This paper proposes a knowledge-free, unsupervised, and language-independent Multiword Expression Distance (MED). The new metric is derived from an accepted physical principle, measures the distance from an n-gram to its semantics, and outperforms other state-of-the-art methods on MWEs in two applications: question answering and named entity extraction.
Funding: Co-supported by the Grant for State Key Program for Basic Research of China (No. 2013CB329405), the National Natural Science Foundation of China (Nos. 61104214, 61203222), the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61221063), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20120201120036), the China Postdoctoral Science Foundation (No. 20100481337), the China Postdoctoral Science Foundation Special Fund (No. 201104670), and the Fundamental Research Funds for the Central Universities.
Abstract: Belief functions theory is an important tool in the field of information fusion. However, when the cardinality of the frame of discernment becomes large, the high computational cost of evidence combination becomes the bottleneck of belief functions theory in real applications. Basic probability assignment (BPA) approximations, which reduce the complexity of the BPAs, are commonly used to reduce the computational cost of evidence combination. In this paper, both the cardinalities and the mass assignment values of focal elements are used as the criteria for reduction. The two criteria are combined through rank-level fusion. Experiments and related analyses are provided to illustrate and justify the proposed BPA approximation approach.
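To make the idea of combining the two reduction criteria concrete, here is a minimal sketch of one possible rank-level fusion. It rests on several assumptions that the abstract does not state: focal elements are ranked separately by mass (larger is better) and by cardinality (smaller is better), the two ranks are fused by simple summation, the `keep` best-ranked focal elements are retained, and the retained masses are renormalized to sum to 1. The function name `approximate_bpa` and its parameters are hypothetical, and the paper's actual fusion rule and mass redistribution may differ.

```python
def approximate_bpa(bpa: dict, keep: int) -> dict:
    """Reduce a BPA to `keep` focal elements using fused ranks.

    bpa maps each focal element (a frozenset) to its mass.
    Assumptions (not from the paper): rank-sum fusion and renormalization.
    """
    focal = list(bpa)
    # Rank 0 is the most worth keeping under each criterion.
    by_mass = sorted(focal, key=lambda a: -bpa[a])  # larger mass first
    by_card = sorted(focal, key=len)                # smaller cardinality first
    mass_rank = {a: r for r, a in enumerate(by_mass)}
    card_rank = {a: r for r, a in enumerate(by_card)}
    # Fuse the two rankings by summing ranks; break ties by larger mass.
    fused = sorted(focal, key=lambda a: (mass_rank[a] + card_rank[a], -bpa[a]))
    kept = fused[:keep]
    total = sum(bpa[a] for a in kept)
    return {a: bpa[a] / total for a in kept}


example = {
    frozenset("a"): 0.5,
    frozenset("b"): 0.25,
    frozenset("ab"): 0.15,
    frozenset("abc"): 0.1,
}
reduced = approximate_bpa(example, keep=2)
```

In this example the two singletons rank best under both criteria, so the reduced BPA keeps only `{a}` and `{b}` and rescales their masses.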