The research aims to improve the performance of image recognition methods based on a description in the form of a set of keypoint descriptors. The main focus is on increasing the speed of establishing the relevance of object and etalon descriptions while maintaining the required level of classification efficiency. The class to be recognized is represented by an infinite set of images obtained from the etalon by applying arbitrary geometric transformations. It is proposed to reduce the descriptions in the etalon database by selecting the most significant descriptor components according to an information-content criterion. The informativeness of an etalon descriptor is estimated by the difference between the closest distances to its own and to other descriptions. The developed method determines the relevance of the full description of the recognized object to the reduced descriptions of the etalons. Several practical classifier models with different options for establishing the correspondence between object and etalon descriptors are considered. Results of experimental modeling of the proposed methods on a database of museum jewelry images are presented. The test sample is formed from images both inside and outside the etalon database, with geometric transformations of scale and rotation applied in the field of view. The practical problem of setting the vote-count threshold on which the classification decision is based has been researched. Modeling revealed that descriptions can be reduced tenfold with full preservation of classification accuracy, while a twentyfold reduction leads to slightly decreased accuracy. The speed of the analysis increases in proportion to the degree of reduction. Reduction by the informativeness criterion confirmed that the most significant subset of features for classification can be obtained while guaranteeing a decent level of accuracy.
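The informativeness rule described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation; it assumes Euclidean distances between descriptors, and the function names and the per-etalon keep ratio are ours:

```python
import numpy as np

def informativeness(descriptors, labels):
    """Score each etalon descriptor by the gap between its nearest
    inter-etalon distance and its nearest intra-etalon distance
    (a larger gap means a more discriminative descriptor)."""
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    D = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(D, np.inf)              # ignore self-distance
    scores = np.empty(len(descriptors))
    for i, lab in enumerate(labels):
        d_own = D[i, labels == lab].min()    # closest within own etalon
        d_other = D[i, labels != lab].min()  # closest among other etalons
        scores[i] = d_other - d_own
    return scores

def reduce_description(descriptors, labels, keep_ratio=0.1):
    """Keep only the top keep_ratio most informative descriptors per etalon."""
    scores = informativeness(descriptors, labels)
    kept = []
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        k = max(1, int(len(idx) * keep_ratio))
        kept.extend(idx[np.argsort(scores[idx])[::-1][:k]])
    return np.sort(np.array(kept))
```

A keep_ratio of 0.1 corresponds to the tenfold reduction reported in the experiments.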
It is common for datasets to contain both categorical and continuous variables. However, many feature screening methods designed for high-dimensional classification assume that the variables are continuous, which limits the applicability of existing methods in this more complex scenario. To address this issue, we propose a model-free feature screening approach for ultra-high-dimensional multi-classification that can handle both categorical and continuous variables. The proposed method uses the Maximal Information Coefficient to assess the predictive power of the variables. Under certain regularity conditions, we prove that the screening procedure possesses the sure screening and ranking consistency properties. To validate the approach, we conduct simulation studies and real data analyses that demonstrate its performance in finite samples. In summary, the proposed method offers an effective way to screen features in ultra-high-dimensional datasets with a mixture of categorical and continuous covariates.
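As a rough illustration of the screening idea (not the paper's procedure), the sketch below ranks covariates by a binned mutual-information score standing in for the Maximal Information Coefficient; with a true MIC estimator one would swap in a dedicated library:

```python
import numpy as np

def dependence_score(x, y, bins=8):
    """Binned mutual information between covariate x and class label y,
    used here as a crude stand-in for the Maximal Information Coefficient.
    Integer-coded categorical covariates can be passed directly."""
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    xb = np.digitize(x, edges)
    joint = np.zeros((xb.max() + 1, int(y.max()) + 1))
    for a, b in zip(xb, y.astype(int)):
        joint[a, b] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def screen_features(X, y, d):
    """Rank the columns of X by dependence with y and keep the top d."""
    scores = np.array([dependence_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]
```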
Ontologies have been used for several years in the life sciences to formally represent concepts and reason about knowledge bases in domains such as the semantic web, information retrieval and artificial intelligence. Exploring these domains for correspondences of semantic content requires calculating a measure of semantic similarity between concepts. Semantic similarity is a measure, based on similarity of meaning, of the closeness between two concepts belonging to one or more ontologies; it is also a quantitative measure of information, calculated from the properties of concepts and their relationships. This study proposes a method for finding the similarity between concepts in two different ontologies based on features, information content and structure. More specifically, it proposes a hybrid method that uses two existing measures to find the similarity between two concepts from different ontologies based on information content and the set of common superconcepts, i.e., the set of common parent concepts. We simulated our method on datasets. The results show that our measure provides similarity values that are better than those reported in the literature.
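A toy sketch of such a hybrid measure, blending the information content of the most informative shared ancestor with the overlap of the superconcept sets. The 50/50 weighting, the normalization by maximum IC, and the parent-map representation are our assumptions, not the paper's two measures:

```python
def superconcepts(concept, parents):
    """Set of all superconcepts of `concept` (itself included),
    given a map from each concept to its list of parent concepts."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def hybrid_similarity(c1, c2, parents, ic):
    """Blend the normalized information content of the most informative
    common ancestor (Resnik-style) with the Jaccard overlap of the two
    superconcept sets."""
    s1, s2 = superconcepts(c1, parents), superconcepts(c2, parents)
    common = s1 & s2
    if not common:
        return 0.0
    resnik = max(ic[c] for c in common) / max(ic.values())
    jaccard = len(common) / len(s1 | s2)
    return 0.5 * resnik + 0.5 * jaccard
```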
Converting customer needs into specific forms and providing consumers with services are crucial in product design. Owing to the development of modern technology, conversion is no longer the difficulty; various measures can be applied for product realization, which increases the complexity of analysis and evaluation in the design process. The focus of the design process has thus shifted from problem solving to minimizing the total amount of information content. This paper presents a New Hybrid Axiomatic Design (AD) Methodology based on iteratively matching and merging design parameters that meet the independence axiom and attribute constraints, by applying trimming technology, the ideal final result, and technology evolution theory. The proposed method minimizes the total amount of information content and improves design quality. Finally, a case study of a rehabilitation robot design for hemiplegic patients is presented. The results indicate that the iterative matching and merging of related attributes can minimize the total amount of information content, reduce cost, and improve design efficiency. Additionally, evolutionary technology prediction can ensure product novelty and improve market competitiveness. The methodology provides an excellent way to design a new (or improved) product.
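In axiomatic design, the information content of a design is conventionally I = sum of log2(1/p) over the functional requirements, where p is the probability that the corresponding design parameter satisfies its requirement. A minimal helper for comparing candidate designs (the probabilities below are illustrative):

```python
import math

def information_content(success_probs):
    """Axiomatic-design information content I = sum(log2(1/p)) over the
    functional requirements; p = 1 contributes zero bits."""
    return sum(math.log2(1.0 / p) for p in success_probs)
```

The candidate with the smaller total information content is preferred under the information axiom.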
[Objective] This study aimed to improve the accuracy of remote sensing classification for the Dongting Lake wetland. [Method] Based on TM data and ground GIS information for Dongting Lake, a decision tree classification method was established using an expert classification knowledge base. The images of the Dongting Lake wetland were classified by decision tree layers into water area, mudflat, protection forest beach, Carem spp. beach, Phragmites beach, Carex beach and other water bodies. [Result] The accuracy of the decision tree classification reached 80.29%, much higher than that of the traditional method, and the total Kappa coefficient was 0.8839, indicating that the accuracy of this method can meet the requirements of practical applications. In addition, the knowledge-based classification results could resolve some classification errors. [Conclusion] Compared with the traditional method, rule-based decision tree classification can classify images using multiple conditions, which reduces data processing time and improves classification accuracy.
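An expert-knowledge decision tree of this kind can be encoded as ordered rules. The thresholds and the choice of indices (NDWI, NDVI, elevation) below are invented for illustration and are not taken from the study's knowledge base:

```python
def classify_pixel(ndwi, ndvi, elevation):
    """Ordered expert rules emulating a layered decision tree.
    All thresholds are hypothetical, for illustration only."""
    if ndwi > 0.3:                    # strong water signal
        return "water area"
    if ndvi < 0.1:                    # bare, wet sediment
        return "mudflat"
    if elevation > 35:                # higher ground with trees
        return "protection forest beach"
    if ndvi > 0.6:                    # dense tall vegetation
        return "Phragmites beach"
    return "Carex beach"
```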
Because of the developed economy and lush vegetation of southern China, remote sensing land surface classification there faces the following obstacles: 1) diverse surface composition types; 2) undulating terrain; 3) small, fragmented land parcels; 4) indistinguishable shadows of surface objects. A top priority is to clarify how big data concepts (data mining technology) and various new technologies and methods can move complex-surface remote sensing information extraction toward automation, refinement and intelligence. To achieve these research objectives, the paper takes Chinese Gaofen-2 satellite data as the data source and complex-surface remote sensing information extraction technology as the research object, and intelligently analyzes the remote sensing information of complex surfaces after completing data collection and preprocessing. The specific extraction methods are: 1) extraction of fractal texture features based on Brownian motion; 2) extraction of color features; 3) extraction of vegetation indices; 4) construction of feature vectors and the corresponding classification. Fractal texture, color, vegetation and spectral features of the remote sensing images are combined into a single feature vector; the higher-dimensional combined vector increases the separability of land cover types, is more conducive to their classification, and thus improves the classification accuracy of the images. The approach is suitable for remote sensing information extraction of complex surfaces in southern China and can be extended to other complex surface areas in the future.
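The combination feature vector can be sketched as a concatenation of the per-group features. The per-group z-scoring here is our choice to keep scales comparable, not a detail given in the text:

```python
import numpy as np

def combined_feature_vector(fractal, color, vegetation, spectral):
    """Concatenate the four per-pixel feature groups into one vector,
    z-scoring each group so no group dominates purely by scale."""
    def z(v):
        v = np.asarray(v, dtype=float)
        s = v.std()
        return (v - v.mean()) / s if s > 0 else v - v.mean()
    return np.concatenate([z(fractal), z(color), z(vegetation), z(spectral)])
```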
In most passive tracking systems, only the target's kinematic information is used in measurement-to-track association, which results in tracking errors in a multitarget environment where the targets are too close to each other. To enhance tracking accuracy, target signal classification information (TSCI) should be used to improve the data association. The TSCI is integrated into the data association process using the JPDA (joint probabilistic data association) algorithm. Using TSCI in the data association improves discrimination by yielding purer tracks and preserving continuity. To verify the validity of this application of TSCI, two simulation experiments are conducted on an air target-tracking problem, one using the TSCI and the other not. The comparison shows that the use of TSCI can effectively improve tracking accuracy.
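One crude way to fold signal-class information into association, shown for a single track: scale each measurement's kinematic likelihood by the probability that its declared class matches the track's class. This is an illustrative simplification of the JPDA formulation, with an assumed classifier confidence parameter:

```python
def association_weights(kin_likelihoods, meas_classes, track_class, conf=0.9):
    """Single-track association weights: each measurement's kinematic
    likelihood is scaled by the probability that its declared signal
    class matches the track's class (`conf` is the assumed classifier
    accuracy), then the weights are normalized to sum to 1."""
    raw = [kl * (conf if mc == track_class else 1.0 - conf)
           for kl, mc in zip(kin_likelihoods, meas_classes)]
    total = sum(raw)
    return [w / total for w in raw]
```

With equal kinematic likelihoods, the class match alone resolves the ambiguity; in a full JPDA this factor would enter every joint association event.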
Web information extraction is viewed as a classification process, and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through competition of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far fewer annotated samples are needed than in rule-based methods, so the method is highly portable. Experiments show that the method performs well and is superior to a DOM-based method in information extraction.
Keywords: information extraction; competing classification; feature extraction; wrapper induction. CLC number: TP 311. Foundation item: Supported by the National Natural Science Foundation of China (60303024). Biography: LI Xiang-yang (1974-), male, Ph.D. candidate; research directions: information extraction, natural language processing.
Element contents of tree rings and of soils near tree roots, collected from Deodar cedar (Cedrus deodara (Roxb.) G. Don) and Masson pine (Pinus massoniana Lamb.), were determined to study the relationship between the angular distribution of element contents in tree rings and environmental information. The chemical composition and properties of soils are highly complicated, which leads to a non-uniform distribution of element contents in tree rings. A statistical multi-variable regression method was used to derive the tree-centered distribution of element contents in the environment (soil), C′(Z, θj), from the distribution of element contents in tree rings, C(Z, θi), which depends on the plane azimuth angle θi, i.e., C = C(Z, θi), where Z is the atomic number of the element. A satisfactory result was obtained, though this study is only a preliminary one.
The framework of a text classification system is presented, and the high dimensionality of the feature space in text classification is studied. Mutual information is a widely used information-theoretic measure of the stochastic dependency of discrete random variables. It is used here as a criterion to reduce the high dimensionality of feature vectors in Web text classification. Feature selections or conversions, including linear and non-linear feature conversions, are performed by maximizing mutual information. Entropy is used and extended to identify suitable features for pattern recognition systems. This lays a favorable foundation for text classification mining.
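A minimal sketch of mutual-information term selection over a toy corpus (binary term presence versus class label); the paper's linear and non-linear feature conversions are not reproduced here:

```python
import math

def mutual_information(docs, labels, term):
    """MI (bits) between presence of `term` and the class label,
    over a corpus given as a list of token lists."""
    n = len(docs)
    mi = 0.0
    for present in (True, False):
        for c in set(labels):
            joint = sum(1 for d, l in zip(docs, labels)
                        if (term in d) == present and l == c) / n
            pt = sum(1 for d in docs if (term in d) == present) / n
            pc = labels.count(c) / n
            if joint > 0:
                mi += joint * math.log2(joint / (pt * pc))
    return mi

def select_terms(docs, labels, k):
    """Keep the k vocabulary terms with the highest MI with the label."""
    vocab = {t for d in docs for t in d}
    return sorted(vocab, key=lambda t: mutual_information(docs, labels, t),
                  reverse=True)[:k]
```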
At present, network information audit systems are mostly based on text information filtering, but providers of harmful information embed it directly into images or image files in order to evade monitoring. This paper implements an information audit system based on image content filtering. Taking pornographic program identification as an example, the system can monitor video containing abnormal human-body information by matching texture characteristics (contrast, energy, correlation and entropy measures, among others) against those defined in advance.
Big data is becoming increasingly important because of the enormous growth in information generation and storage in recent years, which poses a challenge to data mining techniques and data management. Based on the geometric explosion of information characteristic of the big data era, this paper studies possible approaches to balancing the maximum value and the privacy of information, and proposes a Nine-Cells information matrix with hierarchical classification. Furthermore, the paper uses rough set theory to proceed from the two dimensions of value and privacy, establishes an information classification method, and puts forward countermeasures for information security. Taking spam messages as an example, massive spam messages can be classified and a targeted hierarchical management strategy put forward. The paper proposes a personal information index system, an information management platform, and possible solutions to protect information security and utilize information value in the age of big data.
Information embodied in machine component classification codes has an internal relation with the probability distribution of the code symbols. This paper presents a model that treats codes as an information source, based on Shannon's information theory. Using information entropy, it preserves the mathematical form and quantitatively measures the information amount of a symbol and of a bit in the machine component classification coding system. It also derives the maximum information amount and the corresponding coding scheme when the number of symbol categories is fixed. Examples show how to evaluate the information amount of component codes and how to optimize a coding system.
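For a fixed code position, the per-symbol information amount is the Shannon entropy of the symbol distribution, which is maximal (log2 k for k symbols) when the symbols are equiprobable, matching the optimal-coding condition above. A minimal sketch:

```python
import math

def symbol_entropy(symbol_counts):
    """Shannon entropy (bits per symbol) of one code position, from
    observed symbol frequencies; zero counts are skipped."""
    total = sum(symbol_counts)
    probs = [c / total for c in symbol_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```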
The relationship between the importance of a criterion and the criterion aggregation function is discussed; a criterion's weight and combinational weights between criteria are defined; and a multi-criteria classification method with incomplete certain information and a polynomial aggregation function is proposed. First, a linear program is constructed from the classification of a reference alternative set (assignment examples) and incomplete certain information on the criteria's weights. The coefficients of the polynomial aggregation function and the category thresholds are then obtained by solving the linear program, the consistency index of the alternatives is obtained, and the classification of the alternatives is achieved. Both certain and uncertain criteria values of the categories are discussed in the method. Finally, an example shows the feasibility and availability of the method.
This paper derives the variance of the information content and develops a statistical inference method for it. We describe the relations between information content and sensitivity, specificity, efficiency, and prevalence rate. If sensitivity, specificity and efficiency are fixed, the closer the prevalence rate is to 0.5, the greater the information content. If prevalence rate and efficiency are fixed, the closer the sensitivity and specificity are to each other, the greater the information content. We compare the power of the information content method, the efficiency test, Youden's index test and the kappa coefficient method. The information content method has higher power than the other methods under most conditions, and it is especially sensitive to the difference between two sensitivities. We conclude that the information content method has more virtues than the other methods discussed in this paper.
To address the poor performance of text classification when the traditional mutual information (MI) formula is used, a feature selection algorithm based on improved mutual information is proposed. Building on traditional improved-MI methods that enhance the MI value of negative features and account for feature frequency, the algorithm introduces the concepts of concentration degree and dispersion degree; formulas embodying these two concepts are constructed, and the improved mutual information is implemented on their basis. The feature selection algorithm is applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results show that the improved-MI feature selection method greatly outperforms traditional MI feature selection and also outperforms information gain. By introducing concentration degree and dispersion degree, the method substantially improves the performance of the text classification system.
A database stores data in order to provide the user with information. However, how a database may achieve this is not always clear. The main reason for this seems that we, who are in the database community, have not fully understood and therefore clearly defined the notion of "the information that data in a database carry", in other words, "the information content of data". As a result, databases' capability is limited in terms of answering queries, especially, when users explore information beyond the scope of data stored in a database, the database normally cannot provide it. The underlying reason of the problem is that queries are answered based on a direct match between a query and data (up to aggregations of the data). We observe that this is because the information that data carry is seen as exactly the data per se. To tackle this problem, we propose the notion of information content inclusion relation, and show that it formulates the intuitive notion of the "information content of data" and then show how this notion may be used for the derivation of information from data in a database.
Evaluating government openness is important in monitoring government performance and promoting government transparency; it is therefore necessary to develop an evaluation system for the information openness of local governments. To select evaluation indicators, we conducted a content analysis of current evaluation systems constructed by researchers and local governments and of the materials of a case study of a local government. The resulting evaluation system comprises 5 first-tier indicators, 30 second-tier indicators and 69 third-tier indicators. The Delphi Method and the Analytic Hierarchy Process (AHP) are then adopted to determine the weight of each indicator. Finally, the practicability of the system is tested through an evaluation of the local government of Tianjin Binhai New Area, which has been undergoing administrative reform and attempting to reinvent itself over the past 5 years.
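AHP weights are commonly obtained as the principal eigenvector of a pairwise-comparison matrix. A minimal power-iteration sketch; the two-indicator comparison matrix in the test is hypothetical, and a full AHP would also check the consistency ratio:

```python
import numpy as np

def ahp_weights(pairwise):
    """Indicator weights from an AHP pairwise-comparison matrix,
    computed as its principal eigenvector via power iteration."""
    A = np.asarray(pairwise, dtype=float)
    w = np.ones(len(A)) / len(A)
    for _ in range(100):
        w = A @ w
        w /= w.sum()          # renormalize so the weights sum to 1
    return w
```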
Funding (image recognition via keypoint descriptors): This research was funded by Prince Sattam bin Abdulaziz University (Project Number PSAU/2023/01/25387).
Funding (hybrid axiomatic design methodology): Supported by the Research Startup Fund Project of Fujian University of Technology (Grant No. GY-Z20089), the Science Foundation for Young Scholars of Fujian Province of China (Grant No. 2018J05099), and the Education and Scientific Research Projects of Young Teachers in Fujian Province of China (Grant No. JAT160313).
Fund: the Youth Science and Technology Foundation of the University of Electronic Science and Technology of China (JX0622).
Abstract: In most passive tracking systems, only the target kinematic information is used in measurement-to-track association, which results in tracking errors in a multitarget environment where the targets are too close to each other. To enhance tracking accuracy, the target signal classification information (TSCI) should be used to improve data association. The TSCI is integrated into the data association process using JPDA (joint probabilistic data association). Using the TSCI in data association improves discrimination by yielding purer tracks and preserving continuity. To verify the validity of applying the TSCI, two simulation experiments are carried out on an air target-tracking problem, one using the TSCI and the other not. The comparison shows that the use of the TSCI can effectively improve tracking accuracy.
Abstract: Web information extraction is viewed as a classification process, and a competing classification method is presented to extract Web information directly through classification. Web fragments are represented with three general features, and the similarities between fragments are then defined on the basis of these features. Through the competition of fragments for different slots in information templates, the method classifies fragments into slot classes and filters out noise information. Far fewer annotated samples are needed than in rule-based methods, so the method has strong portability. Experiments show that the method performs well and is superior to the DOM-based method in information extraction. Keywords: information extraction, competing classification, feature extraction, wrapper induction. CLC number: TP 311. Foundation item: Supported by the National Natural Science Foundation of China (60303024). Biography: LI Xiang-yang (1974-), male, Ph.D. candidate; research directions: information extraction, natural language processing.
Abstract: The element contents of tree rings and of soils near tree roots, collected from Deodar cedar (Cedrus deodara (Roxb.) G. Don) and Masson pine (Pinus massoniana Lamb.), were determined to study the relationship between the angular distribution of element contents in tree rings and environmental information. The chemical composition and properties of soils are very complicated, which leads to a non-uniform distribution of element contents in tree rings. A statistical multi-variable regression method was used to obtain the tree-centered distribution of element contents in the environment (soil), C'(Z, θj), from the distribution of element contents in the tree rings, C(Z, θi), which depends on the plane azimuth angle θi, i.e., C = C(Z, θi), where Z is the atomic number of the element. A satisfactory result was obtained, though this study is only a preliminary one.
Abstract: The framework of a text classification system is presented, and the high dimensionality of the feature space in text classification is studied. Mutual information is a widely used information-theoretic measure of the stochastic dependency between discrete random variables. This measure is used as a criterion to reduce the high dimensionality of feature vectors in Web text classification. Feature selection or conversion is performed using maximum mutual information, including linear and non-linear feature conversions. Entropy is used and extended to find suitable features for pattern recognition systems. This lays a favorable foundation for text classification mining.
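Mutual information between a feature and the class label, as used here for dimensionality reduction, can be sketched directly from empirical counts. A minimal illustration; the term-presence feature and two-class labels below are toy data, not from the paper:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts substituted
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy data: a binary term-presence feature vs. a document class label.
term_present = [1, 1, 1, 0, 0, 0, 1, 0]
doc_class = ['sport', 'sport', 'sport', 'news', 'news', 'news', 'sport', 'news']
score = mutual_information(term_present, doc_class)   # 1.0 bit: perfectly informative
```

Features are then ranked by this score and the low-scoring ones dropped, which is the dimensionality reduction the abstract describes.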
Fund: Supported by the Hunan Provincial Natural Science Foundation of China (03JJY3103)
Abstract: At present, network information audit systems are almost all based on text information filtering, but providers of harmful information embed it directly into images or image files in order to evade monitoring. This paper implements an information audit system based on image content filtering. Taking pornographic program identification as an example, the system can monitor video containing abnormal human body information by matching texture features against those defined in advance, including contrast, energy, correlation and entropy measures.
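Contrast, energy and entropy of the kind named in this abstract are classically derived from a gray-level co-occurrence matrix (GLCM). A minimal sketch under that assumption (correlation is omitted for brevity but follows the same pattern); the quantized image patch is illustrative:

```python
import numpy as np

def glcm(img, levels=4):
    """Normalized gray-level co-occurrence matrix for horizontally adjacent pixels."""
    m = np.zeros((levels, levels))
    for row in img:
        for a, b in zip(row[:-1], row[1:]):
            m[a, b] += 1
    return m / m.sum()

def texture_features(p):
    """Contrast, energy, and entropy of a normalized GLCM p."""
    i, j = np.indices(p.shape)
    contrast = np.sum((i - j) ** 2 * p)   # penalizes gray-level jumps
    energy = np.sum(p ** 2)               # high for uniform textures
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))   # high for disordered textures
    return contrast, energy, entropy

# Toy 4x4 patch already quantized to 4 gray levels.
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
c, e, h = texture_features(glcm(patch))
```

A filter of the kind described would compare such feature vectors from incoming video frames against predefined reference ranges.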
Abstract: Big data is becoming increasingly important because of the enormous growth of information generation and storage in recent years, which has become a challenge for data mining and data management. Based on the geometric explosion of information in the era of big data, this paper studies possible approaches to balancing the maximum value of information against its privacy, and proposes a Nine-Cells information matrix with hierarchical classification. Furthermore, the paper uses rough set theory to proceed from the two dimensions of value and privacy, establishes an information classification method, and puts forward countermeasures for information security. Taking spam messages as an example, massive spam messages can be classified and a targeted hierarchical management strategy put forward. This paper proposes a personal information index system, an information management platform and possible solutions to protect information security and exploit information value in the age of big data.
Fund: Projects supported by the Hi-Tech Research and Development Program (863) of China (No. 2004AA84ts03) and the Science and Technology Committee of Zhejiang Province (No. 2004C31018), China
Abstract: The information embodied in machine component classification codes has an internal relation with the probability distribution of the code symbols. This paper presents a model that treats codes as an information source, based on Shannon's information theory. Using information entropy, it preserves the mathematical form and quantitatively measures the information amount of a symbol and of a bit in a machine component classification coding system. It also obtains the maximum information amount, and the corresponding coding scheme, when the category of symbols is fixed. Examples show how to evaluate the information amount of component codes and how to optimize a coding system.
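The Shannon entropy of one symbol position across a set of component codes, as this abstract describes, is straightforward to compute from symbol frequencies. A minimal sketch; the 4-digit codes below are hypothetical:

```python
import math
from collections import Counter

def symbol_entropy(codes, position):
    """Shannon entropy (bits) of the symbol at one position across component codes."""
    symbols = [code[position] for code in codes]
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

# Hypothetical 4-digit component classification codes.
codes = ["1120", "1230", "2120", "3121", "1122", "2230"]
h0 = symbol_entropy(codes, 0)   # information carried by the first code digit

# For a position using k distinct symbols, entropy is maximized at log2(k)
# when the symbols are equiprobable -- the optimal coding scheme the
# abstract refers to when the category of symbols is fixed.
```

Positions whose entropy is far below the log2(k) ceiling are carrying redundant structure, which is where a coding system can be optimized.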
Fund: This project was supported by the Social Science Foundation of Hunan (05YB74)
Abstract: The relationship between the importance of a criterion and the criterion aggregation function is discussed; criterion weights and combinational weights between criteria are defined; and a multi-criteria classification method with incomplete certain information and a polynomial aggregation function is proposed. First, a linear program is constructed from the classification of a reference alternative set (assignment examples) and incomplete certain information on criterion weights. The coefficients of the polynomial aggregation function and the thresholds of the categories are then obtained by solving the linear program, the consistency index of the alternatives is computed, and the classification of the alternatives is achieved. Both certain and uncertain criterion values of the categories are handled by the method. Finally, an example shows the feasibility and availability of the method.
Abstract: This paper derives the variance of the information content and develops a statistical inference method for it. We describe the relations between information content and sensitivity, specificity, efficiency and prevalence rate. If sensitivity, specificity and efficiency are fixed, the closer the prevalence rate is to 0.5, the greater the information content. If the prevalence rate and efficiency are fixed, the closer the sensitivity and specificity are to each other, the greater the information content. We compare the power of the information content method, the efficiency test, Youden's index test and the kappa coefficient method. The information content method has higher power than the other methods under most conditions, and it is especially sensitive to the difference between two sensitivities. We conclude that the information content method has more virtues than the other methods discussed in this paper.
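If "information content" here is read as the mutual information between true disease status and test result (an assumption, since the abstract does not give the formula), the stated relation to prevalence can be checked numerically: with sensitivity and specificity fixed, the quantity peaks as prevalence approaches 0.5. A minimal sketch:

```python
import math

def h(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

def info_content(se, sp, prev):
    """Mutual information (bits) between disease status D and test result T,
    derived from sensitivity se, specificity sp, and prevalence prev.
    (Interpretation of 'information content' is an assumption, not the paper's formula.)"""
    p_pos = prev * se + (1 - prev) * (1 - sp)   # P(T positive)
    h_t = h([p_pos, 1 - p_pos])                 # H(T)
    h_t_given_d = prev * h([se, 1 - se]) + (1 - prev) * h([sp, 1 - sp])  # H(T|D)
    return h_t - h_t_given_d                    # I(D;T) = H(T) - H(T|D)

# With se = sp = 0.9 fixed, information is larger at prevalence 0.5 than at 0.1.
mid = info_content(0.9, 0.9, 0.5)
edge = info_content(0.9, 0.9, 0.1)
```

Under this reading the monotone effect of prevalence matches the abstract's first claim; the symmetry claim about sensitivity and specificity can be probed the same way.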
Fund: Sponsored by the National Natural Science Foundation of China (Grant Nos. 60773070, 60736044)
Abstract: To address the poor performance of text classification when the traditional mutual information (MI) formula is used, a feature selection algorithm based on improved mutual information is proposed. The improved algorithm builds on traditional MI improvements that enhance the MI value of negative features and of feature frequency, and introduces the concepts of concentration degree and dispersion degree. Formulas embodying concentration degree and dispersion degree are constructed, and the improved mutual information is implemented on this basis. The feature selection algorithm based on improved mutual information is applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results show that the improved method greatly enhances performance compared with traditional mutual information feature selection and also outperforms information gain. By introducing concentration degree and dispersion degree, the improved mutual information feature selection method greatly improves the performance of the text classification system.
Abstract: A database stores data in order to provide the user with information; however, how a database achieves this is not always clear. The main reason seems to be that we in the database community have not fully understood, and therefore have not clearly defined, the notion of "the information that data in a database carry", in other words, "the information content of data". As a result, databases are limited in the queries they can answer: when users seek information beyond the scope of the stored data, the database normally cannot provide it. The underlying cause is that queries are answered by a direct match between a query and the data (up to aggregations of the data), because the information that data carry is taken to be exactly the data themselves. To tackle this problem, we propose the notion of an information content inclusion relation, show that it formalizes the intuitive notion of the "information content of data", and then show how this notion may be used to derive information from data in a database.
Fund: Jointly supported by the Foundation for Humanities and Social Sciences of the Chinese Ministry of Education (Grant No. 10YJA870021) and the Center for Asia Research of Nankai University (Grant No. AS0917)
Abstract: Evaluating government openness is important for monitoring government performance and promoting government transparency, so it is necessary to develop an evaluation system for the information openness of local governments. To select evaluation indicators, we conducted a content analysis of current evaluation systems constructed by researchers and local governments and of the materials of a case study on a local government. The evaluation system is composed of 5 first-tier indicators, 30 second-tier indicators and 69 third-tier indicators. The Delphi method and the Analytic Hierarchy Process (AHP) are then adopted to determine the weight of each indicator. Finally, the practicability of the system is tested through an evaluation of the local government of Tianjin Binhai New Area, which has been undergoing administrative reform and attempting to reinvent itself over the past 5 years.
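The AHP weighting step this abstract mentions derives indicator weights from a pairwise-comparison matrix via its principal eigenvector, with a consistency check on the judgments. A minimal sketch; the 3x3 comparison matrix is hypothetical, not the paper's actual 5 first-tier indicators:

```python
import numpy as np

def ahp_weights(pairwise):
    """Indicator weights from an AHP pairwise-comparison matrix:
    principal eigenvector, normalized to sum to 1."""
    a = np.asarray(pairwise, dtype=float)
    vals, vecs = np.linalg.eig(a)
    k = np.argmax(vals.real)
    w = np.abs(vecs[:, k].real)
    return w / w.sum(), vals[k].real

def consistency_ratio(lam_max, n):
    """CR = CI / RI; values below 0.1 are conventionally acceptable."""
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}   # standard Saaty RI values
    ci = (lam_max - n) / (n - 1)
    return ci / random_index[n]

# Hypothetical pairwise comparisons of three indicators on Saaty's 1-9 scale:
# indicator 1 is moderately more important than 2, strongly more than 3.
A = [[1,     3,   5],
     [1 / 3, 1,   2],
     [1 / 5, 1 / 2, 1]]
w, lam = ahp_weights(A)
cr = consistency_ratio(lam, 3)   # should be well below 0.1 for these judgments
```

In a full Delphi-plus-AHP workflow, expert panels supply the pairwise judgments and matrices with CR above 0.1 are sent back for revision.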