Abstract: In light of the rapid growth and development of social media, it has become a focus of interest in many scientific fields, which seek to extract useful information, or knowledge, from it, such as information about people's behaviors and interactions for sentiment analysis or for understanding the behavior of users and groups. This extracted knowledge plays an important role in decision-making, in creating and improving marketing objectives and competitive advantage, in monitoring political and economic events, and in development across many fields. Extracting this knowledge therefore requires analyzing the vast amount of data found within social media using the most popular data mining techniques and applications for social media sites.
Funding: Natural Science Foundation of Inner Mongolia (Project No. 2023LHMS07016); the Fundamental Research Fund for the directly affiliated universities of Inner Mongolia (Project No. 2022JBQN056).
Abstract: In the face of the human resource shortage in intelligent manufacturing (or smart manufacturing), the training of industrial engineers in this field is of great significance. In academia, the positive link between learning transfer and knowledge innovation is recognized by most scholars, while the learner's attitude toward big data decision-making, as a cognitive perception, affects learning transfer from the learner's familiar engineering paradigm to the intelligent manufacturing paradigm. Thus, learning transfer can be regarded as a result of the learner's attitude, and it becomes the intermediary state between that attitude and knowledge innovation. This paper reviews prior research on knowledge transfer and develops hypotheses on the relationships between learner acceptance attitude, knowledge transfer, and knowledge innovation.
Funding: Supported by the National Science and Technology Innovation 2030 of China Next-Generation Artificial Intelligence Major Project (2018AAA0101800); the National Natural Science Foundation of China (52375482); and the Regional Innovation Cooperation Project of Sichuan Province (2023YFQ0019).
Abstract: Quality management is a constant and significant concern in enterprises. Effective determination of correct solutions for comprehensive problems helps avoid increased backtesting costs. This study proposes an intelligent quality control method for manufacturing processes based on a human–cyber–physical (HCP) knowledge graph, a systematic method that encompasses the following elements: data management and classification based on HCP ternary data, HCP ontology construction, knowledge extraction for constructing an HCP knowledge graph, and comprehensive application of quality control based on HCP knowledge. The proposed method implements case retrieval, automatic analysis, and assisted decision making based on an HCP knowledge graph, enabling quality monitoring, inspection, diagnosis, and maintenance strategies for quality control. In practical applications, the proposed modular and hierarchical HCP ontology exhibits significant superiority in terms of shareability and reusability of the acquired knowledge. Moreover, the HCP knowledge graph deeply integrates the provided HCP data and effectively supports comprehensive decision making. The proposed method was implemented in cases involving an automotive production line and a gear manufacturing process, and its effectiveness was verified by the deployed application system. Furthermore, the proposed method can be extended to other manufacturing process quality control tasks.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62102311, 62202377, 62272385); in part by the Natural Science Basic Research Program of Shaanxi (Nos. 2022JQ-600, 2022JM-353, 2023-JC-QN-0327); in part by the Shaanxi Distinguished Youth Project (No. 2022JC-47); in part by the Scientific Research Program Funded by Shaanxi Provincial Education Department (No. 22JK0560); in part by the Distinguished Youth Talents of Shaanxi Universities; and in part by the Youth Innovation Team of Shaanxi Universities.
Abstract: With widespread data collection and processing, privacy-preserving machine learning has become increasingly important in addressing privacy risks related to individuals. The support vector machine (SVM) is one of the most elementary learning models of machine learning, and privacy issues surrounding SVM classifier training have attracted increasing attention. In this paper, we investigate differential-privacy-compliant federated machine learning with dimensionality reduction, called FedDPDR-DPML, which greatly improves data utility while providing strong privacy guarantees. Considering that in distributed learning scenarios multiple participants usually hold unbalanced or small amounts of data, FedDPDR-DPML enables multiple participants to collaboratively learn a global model based on weighted model averaging and knowledge aggregation, and the server then distributes the global model to each participant to improve local data utility. For high-dimensional data, we adopt differential privacy in both the principal component analysis (PCA)-based dimensionality reduction phase and the SVM classifier training phase, which improves model accuracy while achieving strict differential privacy protection. In addition, we train differential privacy (DP)-compliant SVM classifiers by adding noise to the objective function itself, leading to better data utility. Extensive experiments on three high-dimensional datasets demonstrate that FedDPDR-DPML can achieve high accuracy while ensuring strong privacy protection.
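Two mechanisms the abstract names, weighted model averaging across participants and SVM training with noise added on the objective side, can be made concrete with a small sketch. The following is a minimal illustration under assumed names and parameters; the noise scale, weights, and update rule are illustrative, not the paper's exact FedDPDR-DPML algorithm:

```python
# Minimal sketch of two building blocks: weighted model averaging across
# participants, and a noisy subgradient step mimicking objective
# perturbation. All parameter values are illustrative assumptions.
import numpy as np

def weighted_average(models, sizes):
    """Server step: aggregate local models, weighted by local data size."""
    weights = np.asarray(sizes, dtype=float) / np.sum(sizes)
    return np.average(np.stack(models), axis=0, weights=weights)

def noisy_svm_step(w, X, y, rng, lam=0.01, lr=0.1, noise_scale=0.05):
    """One subgradient step on the regularized hinge loss, perturbed with
    Gaussian noise to mimic objective-perturbation-style privacy."""
    margins = y * (X @ w)
    mask = margins < 1.0                       # margin-violating points
    grad = lam * w
    if mask.any():
        grad = grad - (y[mask, None] * X[mask]).mean(axis=0)
    grad = grad + rng.normal(0.0, noise_scale, size=w.shape)  # privacy noise
    return w - lr * grad

rng = np.random.default_rng(42)
local_models, sizes = [], [200, 50, 30]        # unbalanced participants
for n in sizes:
    X = rng.normal(size=(n, 5))
    y = np.where(X[:, 0] + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)
    w = np.zeros(5)
    for _ in range(100):                       # local training rounds
        w = noisy_svm_step(w, X, y, rng)
    local_models.append(w)

print("global model:", np.round(weighted_average(local_models, sizes), 3))
```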
Funding: Supported by the National Natural Science Foundation of China (60173058, 70372024).
Abstract: With the explosive growth of available data, there is an urgent need for continuous data mining that markedly reduces manual interaction. A novel model for data mining in an evolving environment is proposed. First, some valid mining task schedules are generated; then autonomous local mining is executed periodically; finally, previous results are merged and refined. The framework based on this model creates a communication mechanism to incorporate domain knowledge into the continuous process through an ontology service. The local and merge mining are transparent to the end user and to heterogeneous data sources through the ontology. Experiments suggest that the framework is useful in guiding the continuous mining process.
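The generate-schedules, mine-locally, merge-and-refine loop can be sketched in a few lines. This toy (frequent-item counting with an assumed support threshold) only illustrates the control flow, not the paper's framework:

```python
# A minimal sketch of the loop the abstract describes: periodic autonomous
# local mining whose results are merged with previous ones and refined.
# The task, support threshold, and batches are illustrative assumptions.
from collections import Counter

def local_mine(batch):
    """Autonomous local step: mine item frequencies from one data batch."""
    return Counter(item for record in batch for item in record)

def merge_and_refine(history, local_counts, min_support=3):
    """Merge the new local result into the history, then report the
    refined view of items meeting the support threshold."""
    history.update(local_counts)
    return {k: v for k, v in history.items() if v >= min_support}

batches = [                                  # stand-in for an evolving source
    [{"a", "b"}, {"a", "c"}],
    [{"a", "b"}, {"b", "c"}],
    [{"a"}, {"a", "b", "c"}],
]
history = Counter()
for period, batch in enumerate(batches, start=1):  # one run per schedule tick
    frequent = merge_and_refine(history, local_mine(batch))
    print(f"period {period}: frequent items so far = {frequent}")
```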
Funding: Grants from the Fundamental Research Funds for the Central Universities (Grant No. 2572018BH02) and the Special Funds for Scientific Research in the Forestry Public Welfare Industry (Grant No. 201504307-03).
Abstract: Using the advantages of web crawlers in data collection and distributed storage technologies, we accessed a wealth of forestry-related data. Combining mature big data technologies, Hadoop's distributed system was selected to solve the storage problem of massive forestry big data, and the memory-based Spark computing framework was used to realize real-time, fast processing of the data. Forestry data contain a wealth of information, and mining this information is of great significance for guiding the development of forestry. We conduct co-word and cluster analyses on the keywords of forestry data, extract the rules hidden in the data, analyze research hotspots more accurately, and grasp the evolution trend of subject topics, which plays an important role in promoting research and development in the field. The co-word analysis and clustering algorithms have important practical significance for identifying the topic structure, research hotspots, and development trends in forestry research. The distributed storage framework and parallel computing have greatly improved the performance of the data mining algorithms. Therefore, a forestry big data mining system built on big data technology has important practical significance for promoting the development of intelligent forestry.
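The core of the co-word analysis step is counting how often keyword pairs co-occur across article records. A minimal sketch, with toy records standing in for the crawled forestry data that the paper processes at scale on Spark:

```python
# Count undirected keyword-pair co-occurrences; pairs above a threshold
# form the edges of a co-word network that clustering can then partition
# into research hotspots. Records and threshold are illustrative.
from collections import Counter
from itertools import combinations

records = [  # each record: the keyword list of one forestry article
    ["forest carbon", "remote sensing", "biomass"],
    ["remote sensing", "biomass", "deep learning"],
    ["forest carbon", "biomass", "soil"],
]

pair_counts = Counter()
for keywords in records:
    # sort so ("a", "b") and ("b", "a") count as the same undirected pair
    for a, b in combinations(sorted(set(keywords)), 2):
        pair_counts[(a, b)] += 1

for (a, b), n in pair_counts.most_common():
    if n >= 2:                      # keep only pairs co-occurring twice+
        print(f"{a} -- {b}: {n}")
```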
Funding: The Natural Science Foundation of Chongqing (CSTC2005BB2190).
Abstract: In order to realize the intelligent management of data mining (DM) domain knowledge, this paper presents an architecture for DM knowledge management based on ontology. Using an ontology database, this architecture can realize intelligent knowledge retrieval and automatic accomplishment of DM tasks by means of ontology services. Its key features include: ① describing DM ontology and metadata using the Web Ontology Language (OWL); ② an ontology reasoning function: based on the existing concepts and relations, the hidden knowledge in the ontology can be obtained using a reasoning engine. This paper mainly focuses on the construction of the DM ontology and its reasoning based on OWL DL(s).
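As a rough illustration of features ① and ②, the sketch below uses rdflib (an assumed stand-in for the paper's ontology database) to state a tiny DM task taxonomy in OWL and derive a hidden subclass fact by transitive reasoning; the class names are illustrative:

```python
# A tiny OWL ontology stated and queried with rdflib; the DM task
# taxonomy is an illustrative assumption, not the paper's ontology.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

DM = Namespace("http://example.org/dm#")       # hypothetical namespace
g = Graph()
g.bind("dm", DM)

# Explicit facts: a three-level task hierarchy.
for cls in (DM.DMTask, DM.PredictiveTask, DM.Classification):
    g.add((cls, RDF.type, OWL.Class))
g.add((DM.PredictiveTask, RDFS.subClassOf, DM.DMTask))
g.add((DM.Classification, RDFS.subClassOf, DM.PredictiveTask))

# Reasoning in miniature: transitivity of subClassOf yields the hidden
# fact that Classification is also a DMTask.
for ancestor in g.transitive_objects(DM.Classification, RDFS.subClassOf):
    if ancestor != DM.Classification:
        print("Classification is a subclass of", ancestor.split("#")[-1])
```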
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 71704016, 71331008, 71402010); the Natural Science Foundation of Hunan Province (Grant No. 2017JJ2267); the Educational Economy and Financial Research Base of Hunan Province (Grant No. 13JCJA2); and the Project of China Scholarship Council for Overseas Studies (201508430121, 201208430233).
Abstract: In the big data environment, enterprises must constantly assimilate big data knowledge and private knowledge through multiple knowledge transfers to maintain their competitive advantage. The optimal time of knowledge transfer is one of the most important aspects of improving knowledge transfer efficiency. Based on an analysis of the complex characteristics of knowledge transfer in the big data environment, multiple knowledge transfers can be divided into two categories: the simultaneous transfer of various types of knowledge, and multiple knowledge transfers at different time points. Taking into consideration influential factors such as the knowledge type, knowledge structure, knowledge absorptive capacity, knowledge update rate, discount rate, market share, profit contributions of each type of knowledge, transfer costs, and product life cycle, time optimization models of multiple knowledge transfers in the big data environment are presented by maximizing the total discounted expected profits (DEPs) of an enterprise. Simulation experiments have been performed to verify the validity of the models, and the models can help enterprises determine the optimal time of multiple knowledge transfers in the big data environment.
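The abstract does not give the models' exact form, but the ingredients it lists suggest an objective of roughly the following shape, shown here only as an illustrative sketch: transfer times are chosen to maximize discounted profit flows net of discounted transfer costs.

```latex
% Illustrative shape only, not the paper's actual model: choose transfer
% times t_1,...,t_n over the product life cycle [0, T]
\max_{t_1,\dots,t_n}\; \Pi
  = \sum_{i=1}^{n} \int_{t_i}^{T} e^{-\rho t}\, \pi_i\bigl(s_i(t)\bigr)\, dt
  \;-\; \sum_{i=1}^{n} e^{-\rho t_i}\, c_i
```

Here \(\rho\) is the discount rate, \(\pi_i\) the profit contribution of knowledge type \(i\) given its market share path \(s_i(t)\) (itself shaped by absorptive capacity and the knowledge update rate), and \(c_i\) the transfer cost; simultaneous transfer corresponds to forcing all \(t_i\) to be equal.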
Funding: Supported by NSFC (Grant No. 71373032); the Natural Science Foundation of Hunan Province (Grant No. 12JJ4073); the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 11C0029); the Educational Economy and Financial Research Base of Hunan Province (Grant No. 13JCJA2); and the Project of China Scholarship Council for Overseas Studies (201208430233, 201508430121).
Abstract: A decision model of knowledge transfer is presented on the basis of the characteristics of knowledge transfer in a big data environment. This model can determine the weight of knowledge transferred from another enterprise or from a big data provider. Numerous simulation experiments are implemented to test the efficiency of the optimization model. The simulation results show that when the weight of knowledge from the big data knowledge provider increases, the total discounted expected profit increases and the transfer cost is reduced. The calculated results are in accordance with the actual economic situation. The optimization model can provide useful decision support for enterprises in a big data environment.
Funding: Supported by the National Natural Science Foundation of China (No. 60573075), the National High Technology Research and Development Program of China (No. 2003AA133060), and the Natural Science Foundation of Shanxi Province (No. 200601104).
Abstract: To improve the efficiency of attribute reduction, we present an attribute reduction algorithm based on background knowledge and information entropy that makes use of background knowledge from research fields. Given known background knowledge, the algorithm can not only greatly improve the efficiency of attribute reduction, but also avoid the defect that information entropy is biased toward attributes with many values. The experimental results verify that the algorithm is effective. Finally, the algorithm produces better results when applied to the classification of star spectra data.
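The bias the abstract mentions, entropy-based measures favoring attributes with many distinct values, is easy to demonstrate. A minimal sketch on a hypothetical decision table follows; the gain-ratio column is shown only as a familiar correction for comparison, since the paper's own remedy is background knowledge:

```python
# Information gain vs. gain ratio on a toy decision table. A unique "id"
# attribute gets maximal information gain despite being useless, which is
# exactly the many-valued-attribute bias. Table values are illustrative.
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def info_gain(rows, attr, decision="class"):
    base = entropy([r[decision] for r in rows])
    cond = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[decision] for r in rows if r[attr] == v]
        cond += len(subset) / len(rows) * entropy(subset)
    return base - cond

def gain_ratio(rows, attr, decision="class"):
    split = entropy([r[attr] for r in rows])
    return info_gain(rows, attr, decision) / split if split else 0.0

rows = [  # hypothetical star-spectra-style decision table
    {"temp": "hot",  "line_width": "broad",  "id": 1, "class": "A"},
    {"temp": "hot",  "line_width": "narrow", "id": 2, "class": "A"},
    {"temp": "cool", "line_width": "broad",  "id": 3, "class": "B"},
    {"temp": "cool", "line_width": "narrow", "id": 4, "class": "B"},
]
for attr in ("temp", "line_width", "id"):
    print(attr, round(info_gain(rows, attr), 3), round(gain_ratio(rows, attr), 3))
```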
Funding: This research was funded by the National Natural Science Foundation of China (Grant Number 71704016), the Key Scientific Research Fund of Hunan Provincial Education Department of China (Grant Number 19A006), and the Enterprise Strategic Management and Investment Decision Research Base of Hunan Province (Grant Number 19qyzd03).
Abstract: Big data knowledge, such as customer demands and consumer preferences, is among the crucial external knowledge that firms need for new product development in the big data environment. Prior research has focused on the profit of big data knowledge providers rather than the profit and pricing schemes of knowledge recipients. This research addresses this theoretical gap and uses theoretical and numerical analysis to compare the profitability of two pricing schemes commonly used by knowledge recipients: subscription pricing and pay-per-use pricing. We find that: (1) the subscription price of big data knowledge has no effect on the optimal time of the knowledge transaction within the same pricing scheme, but the usage ratio of the big data knowledge does affect it, and the smaller the usage ratio, the earlier the knowledge transaction is conducted; (2) big data knowledge with a higher update rate brings greater profit to the firm under both the subscription and pay-per-use pricing schemes; (3) a knowledge recipient will choose the knowledge that brings a higher market share growth rate regardless of the pricing scheme it adopts, and firms can choose more efficient knowledge under the pay-per-use pricing scheme by adjusting the usage ratio of knowledge according to their economic conditions. The model and findings in this paper can help knowledge recipient firms select the optimal pricing method and enhance future new product development performance.
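A toy numerical comparison makes the subscription versus pay-per-use trade-off concrete. Every functional form and parameter value below is an illustrative assumption, not the paper's model:

```python
# Compare discounted profits under a flat subscription fee vs. a fee
# proportional to the knowledge usage ratio. All numbers are toy values.
import numpy as np

T, rho = 24, 0.01                  # months in horizon, monthly discount rate
t = np.arange(T)
discount = np.exp(-rho * t)

growth = 0.08                      # market-share growth the knowledge brings
revenue = 100 * (1 + growth) ** t  # monthly revenue enabled by the knowledge
usage_ratio = 0.5                  # fraction of the knowledge actually used

subscription_fee = 40.0            # flat monthly fee, independent of usage
pay_per_use_rate = 70.0            # monthly fee scaled by the usage ratio

profit_subscription = np.sum(discount * (revenue - subscription_fee))
profit_pay_per_use = np.sum(discount * (revenue - pay_per_use_rate * usage_ratio))

print(f"subscription: {profit_subscription:10.1f}")
print(f"pay-per-use:  {profit_pay_per_use:10.1f}")
# Pay-per-use wins here because usage_ratio < subscription_fee / pay_per_use_rate;
# raising the usage ratio above that threshold flips the ranking.
```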
Abstract: The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS) in Beijing from June 19 to June 22, 2016. The Conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in the area of data science and knowledge discovery. The whole event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The Conference was organized by the Journal of Data and Information Science (JDIS) to bring the Journal to the attention of an international and local audience.
Abstract: This study was an attempt to examine the relationship between Iranian EFL (English as a Foreign Language) learners' depth of vocabulary knowledge and their lexical inferencing strategy use and success. The participants in this study were 32 intermediate senior students of Shiraz Azad University majoring in English translation. The Word Associates Test (WAT) (Read, 1993) was used to measure their depth of vocabulary knowledge, and a lexical inferencing strategy questionnaire drawn from Schmitt's (1997) vocabulary learning strategies, as well as a reading passage (Haastrup, 1999, cited in Nassaji, 2004), were used to determine their lexical inferencing strategy use and success. Results indicate that: (1) those who have stronger depth of vocabulary knowledge use certain strategies more frequently than those who have weaker depth of vocabulary knowledge; (2) the students with stronger depth of vocabulary knowledge make more effective use of certain types of lexical inferencing strategies. These findings support the importance of the depth dimension of vocabulary knowledge and the role it plays in guessing the meaning of unknown words.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 71704016, 71331008); the Natural Science Foundation of Hunan Province (Grant No. 2017JJ2267); Key Projects of the Chinese Ministry of Education (17JZD022); and the Project of China Scholarship Council for Overseas Studies (201208430233, 201508430121), which are gratefully acknowledged.
Abstract: With market competition becoming fiercer, enterprises must update their products by constantly assimilating new big data knowledge and private knowledge to maintain their market shares at different time points in the big data environment. Typically, successive knowledge transfers influence one another if the time interval between them is not too long, so it is necessary to study the problem of continuous knowledge transfer in the big data environment. Based on research on one-time knowledge transfer, a model of continuous knowledge transfer is presented that can consider the interaction between knowledge transfers and determine the optimal knowledge transfer time at different time points in the big data environment. Simulation experiments were performed by adjusting several parameters. The experimental results verified the model's validity and supported conclusions regarding its practical application value. The results can inform more effective decisions for enterprises that must carry out continuous knowledge transfer in the big data environment.
Abstract: Scholarly communication of knowledge is predominantly document-based in digital repositories, and researchers find it tedious to automatically capture and process the semantics among related articles. Despite the present digital era of big data, there is a lack of visual representations of the knowledge present in scholarly articles, and a time-saving approach for a literature search and visual navigation is warranted. The majority of knowledge display tools cannot cope with current big data trends and pose limitations in meeting the requirements of automatic knowledge representation, storage, and dynamic visualization. To address this limitation, the main aim of this paper is to model the visualization of unstructured data and explore the feasibility of achieving visual navigation for researchers to gain insight into the knowledge hidden in scientific articles of digital repositories. Contemporary topics of research and practice, including modifiable risk factors leading to a dramatic increase in Alzheimer's disease and other forms of dementia, warrant deeper insight into the evidence-based knowledge available in the literature. The goal is to provide researchers with a visual-based easy traversal through a digital repository of research articles. This paper takes the first step in proposing a novel integrated model using knowledge maps and next-generation graph datastores to achieve a semantic visualization with domain-specific knowledge, such as dementia risk factors. The model facilitates a deep conceptual understanding of the literature by automatically establishing visual relationships among the extracted knowledge from the big data resources of research articles. It also serves as an automated tool for visual navigation through the knowledge repository for faster identification of dementia risk factors reported in scholarly articles. Further, it facilitates semantic visualization and domain-specific knowledge discovery from a large digital repository and their associations. In this study, the implementation of the proposed model in the Neo4j graph data repository, along with the results achieved, is presented as a proof of concept. Using scholarly research articles on dementia risk factors as a case study, automatic knowledge extraction, storage, intelligent search, and visual navigation are illustrated. The implementation of contextual knowledge and its relationships for visual exploration by researchers shows promising results in the knowledge discovery of dementia risk factors. Overall, this study demonstrates the significance of semantic visualization with the effective use of knowledge maps and paves the way for extending visual modeling capabilities in the future.
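As a flavor of the proof-of-concept, the sketch below uses the official neo4j Python driver to store one extracted finding as graph relationships and query risk factors back for navigation. The URI, credentials, labels, relationship types, and DOIs are illustrative assumptions, not the paper's schema:

```python
# Store extracted dementia-risk-factor knowledge as a graph and query it.
# Connection details and the graph schema are illustrative assumptions.
from neo4j import GraphDatabase

URI, AUTH = "bolt://localhost:7687", ("neo4j", "password")  # hypothetical

def add_finding(tx, factor, disease, doi):
    # MERGE keeps nodes unique, so repeated extraction runs stay idempotent.
    tx.run(
        "MERGE (f:RiskFactor {name: $factor}) "
        "MERGE (d:Disease {name: $disease}) "
        "MERGE (a:Article {doi: $doi}) "
        "MERGE (f)-[:INCREASES_RISK_OF]->(d) "
        "MERGE (a)-[:REPORTS]->(f)",
        factor=factor, disease=disease, doi=doi,
    )

def risk_factors_of(tx, disease):
    result = tx.run(
        "MATCH (f:RiskFactor)-[:INCREASES_RISK_OF]->(:Disease {name: $d}) "
        "RETURN f.name AS factor", d=disease)
    return [record["factor"] for record in result]

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        session.execute_write(add_finding, "hypertension", "dementia", "10.1000/xyz123")
        session.execute_write(add_finding, "smoking", "dementia", "10.1000/abc456")
        print(session.execute_read(risk_factors_of, "dementia"))
```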
Abstract: It is common in industrial construction projects for data to be collected and discarded without being analyzed to extract useful knowledge. A proposed integrated methodology based on a five-step Knowledge Discovery in Data (KDD) model was developed to address this issue. The framework transfers existing multidimensional historical data from completed projects into useful knowledge for future projects. The model starts by understanding the problem domain, industrial construction projects. The second step is analyzing the problem data and its multiple dimensions. The target dataset is the labour resources data generated while managing industrial construction projects. The next step is developing the data collection model and a prototype data warehouse. The data warehouse stores collected data in a ready-for-mining format and produces dynamic On-Line Analytical Processing (OLAP) reports and graphs. Data was collected from a large western-Canadian structural steel fabricator to prove the applicability of the developed methodology. The proposed framework was applied to three different case studies to validate its applicability to real project data.
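The OLAP layer can be illustrated with a small pivot over labour-hours records, here in pandas with toy data; the dimensions (project, trade, phase) are assumed for illustration rather than taken from the paper's warehouse design:

```python
# A "slice and dice" view of labour-hours records: total hours by trade
# (rows) and phase (columns), the kind of dynamic report a data warehouse
# serves to future estimators. Columns and records are illustrative.
import pandas as pd

records = pd.DataFrame([
    {"project": "P1", "trade": "fitter", "phase": "fabrication", "hours": 120},
    {"project": "P1", "trade": "welder", "phase": "fabrication", "hours": 200},
    {"project": "P2", "trade": "fitter", "phase": "erection",    "hours":  80},
    {"project": "P2", "trade": "welder", "phase": "fabrication", "hours": 150},
])

report = records.pivot_table(index="trade", columns="phase",
                             values="hours", aggfunc="sum",
                             fill_value=0, margins=True)
print(report)
```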
Abstract: In order to compete in the global manufacturing market, agility is the only possible solution for responding to fragmented market segments and frequently changing customer requirements. However, manufacturing agility can only be attained through the deployment of knowledge. Embedding knowledge into a CAD system to form a knowledge-intensive CAD (KIC) system is one way to enhance the design capability of a manufacturing company. The most difficult phase in developing a KIC system is capitalizing a huge amount of legacy data to form a knowledge database. In the past, such a capitalization process could only be done manually or semi-automatically. In this paper, a five-step model for automatic design knowledge capitalization through the use of data mining is proposed, and details of how to select, verify, and benchmark the performance of an appropriate data mining algorithm for a specific design task are also discussed. A case study concerning the design of a plastic toaster casing is used as an illustration of the proposed methodology, and it was found that the average absolute error of the predictions for the most appropriate algorithm is within 17%.
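The select-verify-benchmark step might look like the following sketch: candidate algorithms are cross-validated on a design-prediction task and ranked by average absolute percentage error. The synthetic data and candidate set are illustrative assumptions:

```python
# Benchmark candidate mining algorithms for a design-prediction task by
# cross-validated mean absolute percentage error. Data is synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(200, 4))          # e.g. casing design parameters
y = 1.0 + 3 * X[:, 0] + X[:, 1] ** 2 + 0.05 * rng.normal(size=200)

candidates = {
    "linear": LinearRegression(),
    "tree":   DecisionTreeRegressor(max_depth=5, random_state=0),
    "knn":    KNeighborsRegressor(n_neighbors=5),
}
for name, model in candidates.items():
    # sklearn reports negated MAPE; flip the sign for readability
    scores = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_absolute_percentage_error")
    print(f"{name}: mean abs. percentage error = {scores.mean():.1%}")
```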
Abstract: In this paper, we propose a flexible knowledge representation framework which utilizes symbolic regression for learning and mathematical expressions for representing the knowledge captured from data. In this approach, learning algorithms are used to generate new insights which can be added to domain knowledge bases that in turn support symbolic regression. This generalizes well-known regression analysis to fulfill supervised classification. The approach aims to produce a learning model which best separates the class members of a labeled training set. The class boundaries are given by a separation surface which is represented by the level set of a model function, and the separation boundary is defined by the respective equation. In our symbolic approach, the learned knowledge model is represented by mathematical formulas and composed of an optimum set of expressions from a given superset. We show that this property gives human experts options to gain additional insights into the application domain. Furthermore, the representation in terms of mathematical formulas (e.g., the analytical model and its first and second derivatives) adds value to the classifier and enables it to answer questions which sub-symbolic classifier approaches cannot. The symbolic representation of the models enables interpretation by human experts. Existing and previously known expert knowledge can be added to the developed knowledge representation framework or used as constraints. Additionally, the knowledge acquisition framework can be repeated several times; in each step, new insights from the search process can be added to the knowledge base to improve the overall performance of the proposed learning algorithms.
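The level-set idea can be shown directly: if f is the learned model function, the equation f(x) = 0 defines the separation surface and sign(f(x)) classifies. The closed-form f below merely stands in for an expression that symbolic regression might find; its form and coefficients are illustrative assumptions:

```python
# Classification by the zero level set of a symbolic model function.
# The formula stands in for a symbolic-regression result; it is assumed.
import numpy as np

def f(x1, x2):
    """Symbolic model: interpretable and differentiable by hand."""
    return x1**2 + 0.5 * x2 - 1.0

def classify(x1, x2):
    return np.where(f(x1, x2) >= 0.0, 1, -1)   # the surface f = 0 separates

# Because f is a formula, experts can inspect its derivatives analytically:
# df/dx1 = 2*x1 and df/dx2 = 0.5 describe the boundary's local orientation.
xs = np.array([0.2, 1.5, 0.9])
ys = np.array([0.5, 0.1, 0.4])
print(classify(xs, ys))      # labels given by which side of the surface
print(f(xs, ys))             # signed score relative to the boundary
```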
Abstract: In this paper, we propose a rule management system for data cleaning that is based on knowledge. This system combines features of both rule-based systems and rule-based data cleaning frameworks. The advantages of our system are threefold. First, it proposes a strong and unified rule form based on first-order structure that permits the representation and management of all types of rules and their quality via certain characteristics. Second, it increases the quality of the rules, which in turn conditions the quality of data cleaning. Third, it uses an appropriate knowledge acquisition process, which is the weakest task in current rule- and knowledge-based systems. As several research works have shown that data cleaning is driven by domain knowledge rather than by data, we have identified and analyzed the properties that distinguish knowledge and rules from data in order to better determine the main components of the proposed system. To illustrate our system, we also present a first experiment with a case study in the health sector, where we demonstrate how the system is useful for the improvement of data quality. The autonomy, extensibility, and platform-independence of the proposed rule management system facilitate its incorporation in any system concerned with data quality management.
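A minimal sketch of what a managed cleaning rule might look like, with a condition, a repair action, and one quality characteristic attached; the rule form shown is a simplification of the paper's first-order structure, and the fields and records are illustrative:

```python
# A managed cleaning rule: condition over a record, repair action, and a
# quality characteristic. A simplification; all names are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CleaningRule:
    name: str
    condition: Callable[[dict], bool]   # when the rule fires
    action: Callable[[dict], dict]      # how it repairs the record
    confidence: float = 1.0             # an example quality characteristic

    def apply(self, record: dict) -> dict:
        return self.action(dict(record)) if self.condition(record) else record

# Domain knowledge, not the data itself, motivates the rule:
# ages above 130 are implausible in a health-sector dataset.
def flag_age(r):
    r["age"] = None                     # flag for review rather than guess
    return r

age_rule = CleaningRule(
    name="implausible_age",
    condition=lambda r: (r.get("age") or 0) > 130,
    action=flag_age,
    confidence=0.99,
)

patients = [{"id": 1, "age": 44}, {"id": 2, "age": 431}]
print([age_rule.apply(p) for p in patients])
```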
Abstract: Important Dates: Submission due November 15, 2005; notification of acceptance December 30, 2005; camera-ready copy due January 10, 2006. Workshop Scope: Intelligence and Security Informatics (ISI) can be broadly defined as the study of the development and use of advanced information technologies and systems for national and international security-related applications. The First and Second Symposiums on ISI were held in Tucson, Arizona, in 2003 and 2004, respectively. In 2005, the IEEE International Conference on ISI was held in Atlanta, Georgia. These ISI conferences have brought together academic researchers, law enforcement and intelligence experts, and information technology consultants and practitioners to discuss their research and practice related to various ISI topics, including ISI data management, data and text mining for ISI applications, terrorism informatics, deception detection, terrorist and criminal social network analysis, crime analysis, monitoring and surveillance, policy studies and evaluation, and information assurance, among others. We continue this stream of ISI conferences by organizing the Workshop on Intelligence and Security Informatics (WISI'06) in conjunction with the Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD'06). WISI'06 will provide a stimulating forum for ISI researchers in Pacific Asia and other regions of the world to exchange ideas and report research progress. The workshop also welcomes contributions dealing with ISI challenges specific to the Pacific Asian region.