The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library (NSL), Chinese Academy of Sciences (CAS) in Beijing from June 19 to June 22, 2016. The Conference was opened by NSL Director Xiangyang Huang, who placed the event within the goals of the Library and lauded the spirit of international collaboration in the area of data science and knowledge discovery. The event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The Conference was organized by the Journal of Data and Information Science (JDIS) to bring the Journal to the attention of an international and local audience.
Important dates: submission due November 15, 2005; notification of acceptance December 30, 2005; camera-ready copy due January 10, 2006. Workshop scope: Intelligence and Security Informatics (ISI) can be broadly defined as the study of the development and use of advanced information technologies and systems for national and international security-related applications. The First and Second Symposiums on ISI were held in Tucson, Arizona, in 2003 and 2004, respectively. In 2005, the IEEE International Conference on ISI was held in Atlanta, Georgia. These ISI conferences have brought together academic researchers, law enforcement and intelligence experts, and information technology consultants and practitioners to discuss their research and practice related to various ISI topics, including ISI data management, data and text mining for ISI applications, terrorism informatics, deception detection, terrorist and criminal social network analysis, crime analysis, monitoring and surveillance, policy studies and evaluation, and information assurance, among others. We continue this stream of ISI conferences by organizing the Workshop on Intelligence and Security Informatics (WISI'06) in conjunction with the Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD'06). WISI'06 will provide a stimulating forum for ISI researchers in Pacific Asia and other regions of the world to exchange ideas and report research progress. The workshop also welcomes contributions dealing with ISI challenges specific to the Pacific Asian region.
It is common in industrial construction projects for data to be collected and discarded without being analyzed to extract useful knowledge. An integrated methodology based on a five-step Knowledge Discovery in Data (KDD) model was developed to address this issue. The framework transforms existing multidimensional historical data from completed projects into useful knowledge for future projects. The model starts by understanding the problem domain, industrial construction projects. The second step is analyzing the problem data and its multiple dimensions. The target dataset is the labour resources data generated while managing industrial construction projects. The next step is developing the data collection model and a prototype data warehouse. The data warehouse stores collected data in a ready-for-mining format and produces dynamic On-Line Analytical Processing (OLAP) reports and graphs. Data was collected from a large western-Canadian structural steel fabricator to prove the applicability of the developed methodology. The proposed framework was applied to three different case studies to validate its applicability to real project data.
A large number of ontologies have been introduced by the biomedical community in recent years. Knowledge discovery for entity identification from ontologies has become an important research area, and it is always interesting to discover how associations are established to connect concepts in a single ontology or across multiple ontologies. However, due to the exponential growth of biomedical big data and their complicated associations, it has become very challenging to detect key associations among entities in an efficient and dynamic manner. Therefore, there exists a gap between the increasing need for association detection and the large volume of biomedical ontologies. In this paper, to bridge this gap, we present a knowledge discovery framework, the BioBroker, for grouping entities to facilitate the process of biomedical knowledge discovery in an intelligent way. Specifically, we developed an innovative knowledge discovery algorithm that combines a graph clustering method and an indexing technique to discover knowledge patterns over a set of interlinked data sources in an efficient way. We have demonstrated the capabilities of the BioBroker for query execution with a use case study on a subset of the Bio2RDF life science linked data.
There are both associations and differences between structured and unstructured data mining. How to unite them in a single theoretical framework that can guide research on knowledge discovery and data mining has become an urgent problem. Based on an analysis of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which brings structured data mining and complex-type data mining together. UMKDSS can provide theoretical guidance for complex-type data mining. Finally, an application example of UMKDSS is given.
Recent developments in database technology have seen a wide variety of data being stored in huge collections. This variety makes analyzing a generic database a strenuous task in knowledge discovery. One approach is to summarize large datasets in such a way that the resulting summary dataset is of manageable size. The histogram has received significant attention as a summarization/representative object for large databases, but it suffers from computational and space complexity. In this paper, we propose transforming the histogram object into a Piecewise Linear Regression (PLR) line object and suggest that PLR objects can be less computation- and storage-intensive than histograms. In addition, to carry out cluster analysis, we propose a distance measure for computing the distance between PLR lines. A case study is presented based on real data from the online education system LMS. It demonstrates that PLR is a powerful knowledge representative for very large databases.
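The idea of replacing a histogram with a PLR object, then comparing PLR objects by a distance, can be sketched as follows. This is only an illustrative sketch: the abstract does not give the paper's fitting procedure or its distance measure, so the equal-width segments, the least-squares fit via `numpy.polyfit`, the mean-absolute-gap distance, and the names `histogram_to_plr` and `plr_distance` are all assumptions made here.

```python
import numpy as np

def histogram_to_plr(values, bins=20, segments=4, value_range=(0.0, 1.0)):
    """Summarize a sample as a piecewise linear fit to its relative-frequency histogram.

    Returns one (x_start, x_end, slope, intercept) tuple per segment.
    Segment layout and fitting method are assumptions, not the paper's method.
    """
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    centers = (edges[:-1] + edges[1:]) / 2
    freq = counts / max(counts.sum(), 1)  # relative frequencies; guard against empty input
    plr = []
    # Split the bin indices into contiguous groups and fit one line per group.
    for idx in np.array_split(np.arange(bins), segments):
        slope, intercept = np.polyfit(centers[idx], freq[idx], 1)
        plr.append((centers[idx][0], centers[idx][-1], slope, intercept))
    return plr

def plr_distance(plr_a, plr_b, samples=10):
    """Mean absolute gap between two PLR objects that share segment boundaries (assumed measure)."""
    total = 0.0
    for (x0, x1, sa, ia), (_, _, sb, ib) in zip(plr_a, plr_b):
        xs = np.linspace(x0, x1, samples)
        total += np.mean(np.abs((sa * xs + ia) - (sb * xs + ib)))
    return total / len(plr_a)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
plr_left = histogram_to_plr(rng.beta(2, 5, 1000))   # mass concentrated near 0
plr_right = histogram_to_plr(rng.beta(5, 2, 1000))  # mass concentrated near 1
print(plr_distance(plr_left, plr_left), plr_distance(plr_left, plr_right))
```

A pairwise distance of this kind is what a clustering routine would consume; how much storage a PLR object saves relative to the histogram depends on the ratio of segments to bins.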
Knowledge discovery, as an increasingly adopted information technology in biomedical science, has shown great promise in the field of Traditional Chinese Medicine (TCM). In this paper, we present a multidimensional table that is well suited for organizing and analyzing the data in ancient Chinese books on Materia Medica. Moreover, we demonstrate its capability to facilitate further mining work in TCM through two illustrative studies that discover meaningful patterns in the three-dimensional table of Shennong's Classic of Materia Medica. This work may provide an appropriate data model for the development of knowledge discovery in TCM.
Structural choice is a significant decision with an important influence on structural function, social economics, structural reliability and construction cost. A Case-Based Reasoning (CBR) system, whose retrieval part is built on a KDD subsystem, is put forward to support this decision for a large-scale engineering project. A typical CBR system consists of four parts: case representation, case retrieval, evaluation, and adaptation. A case library is a set of parameterized, excellent and successful structures. For a structural choice, the key point is that the system must be able to detect the pattern classes hidden in the case library and classify the input parameters into the proper classes. This is done using a KDD data mining algorithm based on Self-Organizing Feature Maps (SOFM), which makes the whole system more adaptive, self-organizing, self-learning and open.
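The retrieval step described above, finding pattern classes in a case library and assigning a new input to one of them, can be illustrated with a small self-organizing map. This is a generic textbook SOM written for illustration, not the system described in the abstract; the grid size, learning-rate and neighbourhood schedules, the function names, and the two-feature synthetic "case library" are all assumptions.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)

def train_som(data, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM with linearly decaying learning rate and neighbourhood width."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used for the neighbourhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = t / steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 1e-3
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = ((coords - bmu) ** 2).sum(axis=-1)
            h = np.exp(-grid_dist2 / (2 * sigma**2))  # Gaussian neighbourhood
            weights += lr * h[..., None] * (x - weights)
            t += 1
    return weights

# Synthetic two-parameter "cases" forming two groups (illustrative only).
rng = np.random.default_rng(1)
cases = np.vstack([
    rng.normal(0.1, 0.02, (20, 2)),
    rng.normal(0.9, 0.02, (20, 2)),
])
som = train_som(cases)
unit_a = best_matching_unit(som, np.array([0.1, 0.1]))
unit_b = best_matching_unit(som, np.array([0.9, 0.9]))
print(unit_a, unit_b)
```

In a CBR setting, the cases mapped to the same unit as the query (or to neighbouring units) would be the candidates passed on to evaluation and adaptation.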
This paper proposes the principle of comprehensive knowledge discovery. Unlike most current knowledge discovery methods, comprehensive knowledge discovery considers both the spatial relations and the attributes of spatial entities or objects. We introduce the theory of the spatial knowledge expression system and several concepts, including comprehensive knowledge discovery and the spatial union information table (SUIT). In theory, SUIT records all information contained in the studied objects, but in practice, because of the complexity and variety of spatial relations, only the factors of interest are selected. To discover comprehensive knowledge from spatial databases, an efficient comprehensive knowledge discovery algorithm called the recycled algorithm (RAR) is suggested.
With massive amounts of data stored in databases, mining information and knowledge in databases has become an important issue in recent research. Researchers in many different fields have shown great interest in data mining and knowledge discovery in databases. Several emerging applications in information-providing services, such as data warehousing and on-line services over the Internet, also call for various data mining and knowledge discovery techniques to better understand user behavior, to improve the service provided, and to increase business opportunities. In response to this demand, this article provides a comprehensive survey of recently developed data mining and knowledge discovery techniques and introduces some real application systems. In conclusion, it also lists some problems and challenges for further research.
Tsinghua Science and Technology was founded in 1996 and has been published since then. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science and other information technology fields. It is indexed by Ei and other abstracting and indexing services. Since 2013, the journal has been committed to open access in the IEEE Xplore Digital Library.
It is important for telecom companies to make sense of the large amount of data they have accumulated over the years. This paper reviews the concepts and techniques of knowledge discovery in databases (KDD), and surveys applications of this technology in the telecommunications sector worldwide. It also discusses some possible applications of this technology in China, and reports a preliminary result from a first attempt to apply KDD techniques to telephone traffic volume prediction. It concludes that KDD is a promising technology that can help to enhance the competitiveness of China's telecom companies in the face of looming competition in a liberalized market.
An integrated solution for the discovery of literature information knowledge is proposed. The analytic model of literature information and the discovery of literature information knowledge are illustrated. A practical illustrative example of the discovery of literature information knowledge is given.
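To examine the current state, hotspots and frontiers of big data and artificial intelligence research in geology, 3,600 Chinese-language and 1,803 English-language papers published from 2000 to 2022 were collected from the core journals of the China National Knowledge Infrastructure (CNKI) and the Web of Science (WoS) Core Collection. Using the community-structure analysis software CiteSpace, visual analyses were carried out on co-authorship, countries, research institutions, keyword clusters, and the spatiotemporal distribution of keywords. Papers published in 2021 and 2022 in top international geology journals (comprehensive impact factor above 10) were also surveyed for a frontier analysis. The results show that global cumulative publication output in this field has surged over the past 10 years. Research is dominated by Asian countries, represented by China, and by European and American countries, represented by the United States. The two groups differ little in cumulative output, but the betweenness centrality of papers from European and American countries is generally higher. Chinese research institutions collaborate mostly with one another and less with institutions abroad, whereas the opposite holds for foreign institutions. Research hotspots are studies that apply machine learning methods, knowledge graph construction and similar techniques to geological hazard prevention, seismic interpretation, oil and gas exploration, and solid mineral resource prediction. Research frontiers, pursued by means of deep learning, ensemble learning and intelligent platform construction, include major geological events in Earth's evolution, global climate change, polar and marine geology, digital geological modeling and quantitative analysis, earthquake prediction, and precise assessment of geological hazard susceptibility.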
Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool for understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper applied an improved association rule learning method to analyze semistructured text in the book Shennong's Classic of Materia Medica. The text was first annotated and transformed into well-structured multidimensional data. Subsequently, an Apriori algorithm was employed to produce association rules after a sensitivity analysis of its parameters. From the 120 confirmed rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were derived and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provides new knowledge about CHPT that should be helpful for its modern research.
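For readers unfamiliar with Apriori, the rule-mining step mentioned above can be sketched in plain Python. This is the textbook algorithm, not the improved variant the paper describes, and the five annotated records, the item labels, and the support and confidence thresholds are invented toy values for illustration, not data from Shennong's Classic of Materia Medica.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    current = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = supp / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, confidence))
    return rules

# Toy annotated records (hypothetical), one set of property/efficacy tags per herb.
records = [
    {"qi:warm", "flavor:pungent", "efficacy:dispel_cold"},
    {"qi:warm", "flavor:pungent", "efficacy:dispel_cold"},
    {"qi:cold", "flavor:bitter", "efficacy:clear_heat"},
    {"qi:cold", "flavor:bitter", "efficacy:clear_heat"},
    {"qi:warm", "flavor:sweet", "efficacy:tonify"},
]
frequent = apriori(records, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), supp, conf)
```

The support and confidence thresholds are the parameters a sensitivity analysis would vary: lowering either one admits more rules, at the cost of weaker ones.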
Funding for the comprehensive knowledge discovery (SUIT) paper listed above: China's National Surveying Technical Fund (No. 20007).