Since its launch in 2011, the Materials Genome Initiative (MGI) has drawn the attention of researchers from academia, government, and industry worldwide. As one of the three tools of the MGI, the use of materials data has, for the first time, emerged as an extremely significant approach in materials discovery. Data science has been applied in different disciplines as an interdisciplinary field to extract knowledge from data. The concept of materials data science has been utilized to demonstrate its application in materials science. To explore its potential as an active research branch in the big data era, a three-tier system has been put forward to define the infrastructure for the classification, curation, and knowledge extraction of materials data.
To comprehensively understand the Arctic and Antarctic upper atmosphere, it is often crucial to analyze various data obtained from many regions. Infrastructure that promotes such interdisciplinary studies of the upper atmosphere has been developed by a Japanese inter-university project called the Inter-university Upper atmosphere Global Observation Network (IUGONET). The objective of this paper is to describe the infrastructure and tools developed by IUGONET. We focus on the data analysis software, which is written in Interactive Data Language (IDL) and is a plug-in for the THEMIS Data Analysis Software suite (TDAS), a set of IDL libraries used to visualize and analyze satellite- and ground-based data. We present plots of upper atmospheric data provided by IUGONET as examples of applications, and verify the usefulness of the software in the study of polar science. We discuss IUGONET's new and unique developments, i.e., an executable file of TDAS that can run on the IDL Virtual Machine, IDL routines to retrieve metadata from the IUGONET database, and an archive of 3-D simulation data that uses the Common Data Format (CDF) so that it can easily be used with TDAS.
Standards and specifications are the premise of the integrated reorganization of science specimen data, and data integration is the core of the reorganization. ETL [1], the abbreviation of extract, transform, and load [2], is very suitable for data integration. Kettle is a kind of ETL software, and in this paper it is introduced into the integrated reorganization of science specimen data. Multi-source and heterogeneous specimen data were integrated using Kettle, and good results were achieved, proving the effectiveness of Kettle in the integrated reorganization of science specimen data. The application has practical significance, and the method can be referenced when reorganizing other resource data.
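The extract-transform-load pattern that Kettle implements can be illustrated in a few lines. The sketch below is a hypothetical in-memory example (the field names and sources are invented for illustration), not the Kettle pipeline described in the paper:

```python
# Minimal ETL sketch: merge two heterogeneous specimen "sources"
# into one target schema. All field names here are hypothetical.

def extract(sources):
    # Pull raw records from each source (here: in-memory lists).
    for source in sources:
        yield from source

def transform(record):
    # Normalize heterogeneous schemas into one target layout.
    return {
        "name": record.get("name") or record.get("specimen_name"),
        "collected": record.get("date") or record.get("collection_date"),
    }

def load(records, target):
    # Append cleaned rows to the target store.
    target.extend(records)

source_a = [{"name": "Ginkgo biloba", "date": "1998-07-12"}]
source_b = [{"specimen_name": "Panthera tigris", "collection_date": "2003-04-02"}]

warehouse = []
load((transform(r) for r in extract([source_a, source_b])), warehouse)
print(warehouse[0]["name"])  # Ginkgo biloba
```

A real pipeline would read from databases or files and validate each record, but the three-stage separation stays the same.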
In the present digital era, data science techniques exploit artificial intelligence (AI) to help those who start and run small and medium-sized enterprises (SMEs) make an impact and develop their businesses. Data science integrates the conventions of econometrics with the technological elements of data science. It makes use of machine learning (ML) and predictive and prescriptive analytics to effectively understand financial data and solve related problems. Smart technologies enable SMEs to get smarter with their processes and offer efficient operations. At the same time, there is a need to develop an effective tool that can assist small to medium-sized enterprises in forecasting business failure and financial crisis. AI has become a familiar tool for many businesses because it concentrates on the design of intelligent decision-making tools to solve particular real-time problems. With this motivation, this paper presents a new AI-based optimal functional link neural network (FLNN) financial crisis prediction (FCP) model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. At the initial stage, the financial data of the enterprises are collected and preprocessed to enhance data quality. Besides, a novel chaotic grasshopper optimization algorithm (CGOA) based feature selection technique is applied for the optimal selection of features. Moreover, a functional link neural network (FLNN) model is employed for the classification of the feature-reduced data. Finally, the efficiency of the FLNN model is improved by the use of the cat swarm optimizer (CSO) algorithm. A detailed experimental validation process takes place on the Polish dataset to ensure the performance of the presented model. The experimental studies demonstrated that the CGOA-FLNN-CSO model accomplished maximum prediction accuracies of 98.830%, 92.100%, and 95.220% on the applied Polish dataset Years I-III, respectively.
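A functional link neural network expands each input feature through fixed basis functions before applying a single-layer model. The abstract does not specify the expansion used, so the trigonometric basis below is an assumption, shown only to illustrate the FLNN idea:

```python
import math

def functional_expansion(x, order=2):
    """Expand a feature vector with trigonometric basis functions,
    as commonly done in functional link neural networks (FLNN).
    Each x_i maps to [x_i, sin(k*pi*x_i), cos(k*pi*x_i)] for k = 1..order."""
    expanded = []
    for xi in x:
        expanded.append(xi)
        for k in range(1, order + 1):
            expanded.append(math.sin(k * math.pi * xi))
            expanded.append(math.cos(k * math.pi * xi))
    return expanded

features = [0.5, 0.1]
phi = functional_expansion(features)
print(len(phi))  # 10 expanded features: 2 * (1 + 2*order)
```

The expanded vector then feeds a single linear layer, which is what makes the FLNN cheap to train compared with a multilayer network.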
The increasing dependence on data highlights the need for a detailed understanding of its behavior, encompassing the challenges involved in processing and evaluating it. However, current research lacks a comprehensive structure for measuring the worth of data elements, hindering effective navigation of the changing digital environment. This paper aims to fill this research gap by introducing the innovative concept of "data components." It proposes a graph-theoretic representation model that presents a clear mathematical definition and demonstrates the superiority of data components over traditional processing methods. Additionally, the paper introduces an information measurement model that provides a way to calculate the information entropy of data components and establish their increased informational value. The paper also assesses the value of information, suggesting a pricing mechanism based on its significance. In conclusion, this paper establishes a robust framework for understanding and quantifying the value of implicit information in data, laying the groundwork for future research and practical applications.
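The information-entropy measurement described above can be illustrated with the standard Shannon formula. Mapping a "data component" to a simple list of observed values, as below, is a deliberate simplification and not the paper's formal model:

```python
import math
from collections import Counter

def shannon_entropy(values):
    # H = -sum(p_i * log2(p_i)) over the empirical value distribution.
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniformly distributed component carries more information
# (2 bits here) than a constant one (0 bits).
print(shannon_entropy(["a", "b", "c", "d"]))  # 2.0
print(abs(shannon_entropy(["a", "a", "a", "a"])))  # 0.0
```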
There has long been discussion about the distinctions among library science, information science, and informatics, and how these areas differ from and overlap with computer science. Today the emerging term data science generates excitement and questions about how it relates to and differs from these other areas of study.
With the ongoing advancements in sensor networks and data acquisition technologies across various systems like manufacturing, aviation, and healthcare, data-driven vibration control (DDVC) has attracted broad interest from both the industrial and academic communities. Input shaping (IS), as a simple and effective feedforward method, is in great demand in DDVC. It convolves the desired input command with an impulse sequence, without requiring parametric dynamics or the closed-loop system structure, thereby suppressing the residual vibration. Based on a thorough investigation into state-of-the-art DDVC methods, this survey makes the following efforts: 1) introducing IS theory and typical input shapers; 2) categorizing recent progress in DDVC methods; 3) summarizing commonly adopted metrics for DDVC; and 4) discussing engineering applications and future trends of DDVC. By doing so, this study provides a systematic and comprehensive overview of existing DDVC methods from design to optimization perspectives, aiming to promote future research on this emerging and vital issue.
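The simplest of the typical input shapers mentioned above is the two-impulse zero-vibration (ZV) shaper. The sketch below derives it from a known natural frequency and damping ratio; note that this uses exactly the parametric knowledge that strictly data-driven variants try to avoid:

```python
import math

def zv_shaper(wn, zeta):
    """Two-impulse zero-vibration (ZV) shaper for a second-order mode
    with natural frequency wn [rad/s] and damping ratio zeta (0 <= zeta < 1)."""
    K = math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta**2))
    Td = 2.0 * math.pi / (wn * math.sqrt(1.0 - zeta**2))  # damped period
    amplitudes = [1.0 / (1.0 + K), K / (1.0 + K)]
    times = [0.0, Td / 2.0]  # second impulse half a damped period later
    return amplitudes, times

A, t = zv_shaper(wn=10.0, zeta=0.05)
# Impulse amplitudes sum to one, so the shaped command
# reaches the same final value as the unshaped one.
print(round(sum(A), 6))  # 1.0
```

Convolving a reference command with these impulses delays it by half a damped period but cancels the residual vibration of that mode.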
Science data are very important resources for innovative research in all scientific disciplines. The Ministry of Science and Technology (MOST) of China has launched a comprehensive platform program for supporting scientific innovation, and the agricultural science database construction and sharing project is one of the activities under this program supported by MOST. This paper briefly describes the achievements of the Agricultural Science Data Center Project.
The Energization and Radiation in Geospace (ERG) mission seeks to explore the dynamics of the radiation belts in the Earth's inner magnetosphere with a space-borne probe (the ERG satellite) in coordination with related ground observations and simulation/modeling studies. For this mission, the Science Center of the ERG project (ERG-SC) will provide a useful data analysis platform based on the THEMIS Data Analysis Software suite (TDAS), which has been widely used by researchers in many conjunction studies of the Time History of Events and Macroscale Interactions during Substorms (THEMIS) spacecraft and ground data. To import SuperDARN data into this highly useful platform, ERG-SC, in close collaboration with SuperDARN groups, developed a Common Data Format (CDF) design suitable for fitacf data and has prepared an open database of SuperDARN data archived in CDF. ERG-SC has also been developing programs written in Interactive Data Language (IDL) to load fitacf CDF files and to generate various kinds of plots: not only range-time-intensity-type plots but also two-dimensional map plots that can be superposed with other data, such as all-sky images of THEMIS-GBO and orbital footprints of various satellites. The CDF-TDAS scheme developed by ERG-SC will make it easier for researchers who are not familiar with SuperDARN data to access and analyze them, thereby facilitating collaborative studies with satellite data, such as the inner magnetosphere data provided by the ERG (Japan)-RBSP (USA)-THEMIS (USA) fleet.
This paper reviews literature pertaining to the development of data science as a discipline, current issues with data bias and ethics, and the role that the discipline of information science may play in addressing these concerns. Information science research and researchers have much to offer for data science, owing to their background as transdisciplinary scholars who apply human-centered and social-behavioral perspectives to issues within natural science disciplines. Information science researchers have already contributed to a humanistic approach to data ethics within the literature, and an emphasis on data science within information schools all but ensures that this literature will continue to grow in coming decades. This review article serves as a reference for the history, current progress, and potential future directions of data ethics research within the corpus of information science literature.
Abstract: This article discusses the current status and development strategies of computer science and technology in the context of big data. First, it explains the relationship between big data and computer science and technology, focusing on the current application status of computer science and technology in big data, including data storage, data processing, and data analysis. It then proposes development strategies for big data processing. Computer science and technology play a vital role in big data processing by providing strong technical support.
Abstract: Objective: To analyze the current status, hotspots, and future directions of myopia research over the past 10 years using bibliometric methods. Methods: Research and review articles on myopia published between January 1, 2013 and December 31, 2022 were retrieved from the Web of Science Core Collection. VOSviewer was used for co-occurrence analysis of countries, research institutions, and authors, and CiteSpace was used for cluster analysis of keywords and co-cited references. Results: A total of 9,745 articles were included, involving 123 countries or regions, 7,150 institutions, and 29,343 authors. The analysis shows that the global number of publications in the field of myopia has an overall upward trend; China published the most articles, while research from the United States received the most total citations. Keyword analysis showed that early myopia research focused mainly on refractive surgery, diagnosis and treatment of complications, genetic studies, and epidemiological characteristics, whereas in recent years the focus has shifted rapidly to myopia prevention and control. Cluster analysis of co-cited references revealed multiple cluster modules, such as #0 school-aged children, #1 small incision lenticule extraction, #2 myopia control, #3 refractive error, and #4 contact lenses. Research frontiers focus mainly on myopia management techniques, myopia and retinal and choroidal vasculature, and applications of artificial intelligence in myopia. Conclusion: Over the past decade, myopia research has spanned ophthalmology, molecular biology, genetics, optometry, epidemiology, and other disciplines. Future work should further explore the etiology and pathogenesis of myopia, early identification and screening, management techniques, and AI-assisted diagnosis to develop more effective and safer strategies for myopia prevention and control.
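The co-occurrence analysis performed with VOSviewer ultimately reduces to counting how often pairs of items (countries, institutions, authors) appear together in the same record. An illustrative sketch with invented author lists:

```python
from itertools import combinations
from collections import Counter

def cooccurrence(records):
    # Count how often each unordered pair of items appears together
    # in a record; these counts become the link weights of the map.
    pairs = Counter()
    for items in records:
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical author lists from three papers.
papers = [["Li", "Wang", "Chen"], ["Li", "Wang"], ["Chen", "Zhao"]]
links = cooccurrence(papers)
print(links[("Li", "Wang")])  # 2
```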
Funding: National Key R&D Program of China (No. 2021YFC2100100); Shanghai Science and Technology Project (No. 21JC1403400, 23JC1402300).
Abstract: Leveraging big data analytics and advanced algorithms to accelerate and optimize the process of molecular and materials design, synthesis, and application has revolutionized the field of molecular and materials science, allowing researchers to gain a deeper understanding of material properties and behaviors and leading to the development of new materials that are more efficient and reliable. However, the difficulty of constructing large-scale datasets of new molecules/materials, due to the high cost of data acquisition and annotation, limits the development of conventional machine learning (ML) approaches. Knowledge-reusing transfer learning (TL) methods are expected to break this dilemma. The application of TL lowers the data requirements for model training, which makes TL stand out in research addressing data quality issues. In this review, we summarize recent progress in TL related to molecules and materials. We focus on the application of TL methods for the discovery of advanced molecules/materials, particularly the construction of TL frameworks for different systems and how TL can enhance the performance of models. In addition, the challenges of TL are also discussed.
Abstract: I provide some science and reflections from my experiences working in geophysics, along with connections to computational and data sciences, including recent developments in machine learning. I highlight several individuals and groups who have influenced me, both through direct collaborations and through ideas and insights that I have learned from. While my reflections are rooted in geophysics, they should also be relevant to other computational science and engineering fields. I also provide some thoughts for young applied scientists and engineers.
Funding: Supported by Kyungpook National University Research Fund, 2020.
Abstract: The rise or fall of the stock markets directly affects investors' interest and loyalty. Therefore, it is necessary to measure the performance of stocks in the market in advance to prevent our assets from suffering significant losses. In our proposed study, six supervised machine learning (ML) strategies and deep learning (DL) models with long short-term memory (LSTM) were deployed for thorough analysis and measurement of the performance of technology stocks. Under discussion are Apple Inc. (AAPL), Microsoft Corporation (MSFT), Broadcom Inc., Taiwan Semiconductor Manufacturing Company Limited (TSM), NVIDIA Corporation (NVDA), and Avigilon Corporation (AVGO). The datasets were taken from the Yahoo Finance API from 06-05-2005 to 06-05-2022 (seventeen years), with 4280 samples. As already noted, multiple studies have been performed to resolve this problem using linear regression, support vector machines, deep long short-term memory (LSTM), and many other models. In this research, the Hidden Markov Model (HMM) outperformed the other employed machine learning ensembles, tree-based models, the ARIMA (Auto Regressive Integrated Moving Average) model, and long short-term memory, with a robust mean accuracy score of 99.98. Other statistical analyses and measurements for the machine learning ensemble algorithms, the long short-term memory model, and ARIMA were also carried out for further investigation of the performance of advanced models for forecasting time series data. Thus, the proposed research found the best model to be HMM, with LSTM the second-best model, performing well in all aspects. The developed model is highly recommended and helpful for early measurement of technology stock performance for investment or withdrawal decisions based on future stock rises or falls, for creating smart environments.
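A full hidden Markov model is beyond a short sketch, but the underlying idea of treating daily price movements as state transitions can be shown with a plain (non-hidden) Markov chain over up/down states. The prices below are synthetic, not the Yahoo Finance data used in the study:

```python
from collections import Counter, defaultdict

def fit_transitions(states):
    # Estimate P(next | current) from an observed state sequence.
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

# Synthetic closing prices mapped to daily up/down states.
prices = [100, 101, 103, 102, 104, 105, 106, 104, 107]
states = ["U" if b > a else "D" for a, b in zip(prices, prices[1:])]
model = fit_transitions(states)

# Most likely move after an "up" day under this toy model.
prediction = max(model["U"], key=model["U"].get)
print(prediction)  # U
```

An HMM adds a layer of unobserved regimes that emit the observed movements, but the transition-probability estimation is the same in spirit.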
Funding: The Estuary wetland wildlife survey project of the Greater Bay Area of China (Science and Technology Planning Projects of Guangdong Province, 2021B1212110002).
Abstract: The potential of citizen science projects in research has been increasingly acknowledged, but the substantial engagement of these projects is restricted by the quality of citizen science data. Based on the largest emerging citizen science project in the country, the Birdreport Online Database (BOD), we examined the biases of birdwatching data from the Greater Bay Area of China. The results show that sampling effort is disparate among land cover types due to contributors' preference for urban and suburban areas, indicating that environments suitable for species existence could be underrepresented in the BOD data. We tested the contributors' skill in species identification via a questionnaire targeting citizen birders in the Greater Bay Area. The questionnaire shows that most citizen birdwatchers could correctly identify the common species widely distributed in Southern China and the less common species with conspicuous morphological characteristics, but failed to identify species from Alaudidae, Caprimulgidae, Emberizidae, Phylloscopidae, Scolopacidae, and Scotocercidae. With a study example, we demonstrate that spatially clustered birdwatching visits can cause underestimation of species richness in insufficiently sampled areas, and that the result of species richness mapping is sensitive to the contributors' skill in identifying bird species. Our results address how avian research can be influenced by the reliability of citizen science data in a region of generally high accessibility, and highlight the necessity of pre-analysis scrutiny of data reliability with regard to research aims at all spatial and temporal scales. To improve data quality, we suggest equipping the data collection framework of BOD with a flexible filter for bird abundance and with questionnaires that collect information related to contributors' bird identification skill. Statistical modelling approaches are encouraged for correcting the bias in sampling effort.
Funding: Supported by the National Science Foundation (Grant No. 1815526).
Abstract: 1 Key concepts underpinning geo-data science: geoinformatics and geomathematics. Computers have been used for data collection, management, analysis, and transmission in geoscience for about 70 years, since the 1950s (Merriam, 2001; 2004). The term geoinformatics is widely used to describe such activities. In real-world practice, researchers in both geography and geoscience use the term geoinformatics.
Abstract: Due to the recent explosion of big data, our society has been rapidly going through digital transformation and entering a new world with numerous eye-opening developments. These new trends impact society and future jobs, and thus student careers. At the heart of this digital transformation is data science, the discipline that makes sense of big data. With many rapidly emerging digital challenges ahead of us, this article discusses perspectives on iSchools' opportunities and suggestions in data science education. We argue that iSchools should empower their students with "information computing" disciplines, which we define as the ability to solve problems and create values, information, and knowledge using tools in application domains. As specific approaches to enforcing information computing disciplines in data science education, we suggest the three foci of user-based, tool-based, and application-based. These three foci will serve to differentiate the data science education of iSchools from that of computer science or business schools. We present a layered Data Science Education Framework (DSEF) with building blocks that include the three pillars of data science (people, technology, and data), computational thinking, data-driven paradigms, and data science lifecycles. Data science courses built on top of this framework should thus be executed with user-based, tool-based, and application-based approaches. This framework will help our students think about data science problems from the big-picture perspective and foster appropriate problem-solving skills in conjunction with broad perspectives of data science lifecycles. We hope the DSEF discussed in this article will help fellow iSchools in their design of new data science curricula.
Abstract: Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts of big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The "utilitarian nature" and "historical and traditional views" of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research-worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, there is little attention directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.
Funding: Project supported by the National Key R&D Program of China (Grant No. 2016YFB0700503), the National High Technology Research and Development Program of China (Grant No. 2015AA03420), the Beijing Municipal Science and Technology Project, China (Grant No. D161100002416001), the National Natural Science Foundation of China (Grant No. 51172018), and Kennametal Inc.
Abstract: Since its launch in 2011, the Materials Genome Initiative (MGI) has drawn the attention of researchers from academia, government, and industry worldwide. As one of the three tools of the MGI, the use of materials data has, for the first time, emerged as an extremely significant approach in materials discovery. Data science has been applied in different disciplines as an interdisciplinary field to extract knowledge from data. The concept of materials data science has been utilized to demonstrate its application in materials science. To explore its potential as an active research branch in the big data era, a three-tier system has been put forward to define the infrastructure for the classification, curation, and knowledge extraction of materials data.
Funding: Supported by the Special Educational Research Budget (Research Promotion) [FY2009] and the Special Budget (Project) [FY2010 and later years] from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, and by the GRENE Arctic Climate Change Research Project, Japan.
Abstract: To comprehensively understand the Arctic and Antarctic upper atmosphere, it is often crucial to analyze various data obtained from many regions. Infrastructure that promotes such interdisciplinary studies on the upper atmosphere has been developed by a Japanese inter-university project called the Inter-university Upper atmosphere Global Observation Network (IUGONET). The objective of this paper is to describe the infrastructure and tools developed by IUGONET. We focus on the data analysis software. It is written in Interactive Data Language (IDL) and is a plug-in for the THEMIS Data Analysis Software suite (TDAS), a set of IDL libraries used to visualize and analyze satellite- and ground-based data. We present plots of upper atmospheric data provided by IUGONET as examples of applications, and verify the usefulness of the software in the study of polar science. We discuss IUGONET's new and unique developments, i.e., an executable file of TDAS that can run on the IDL Virtual Machine, IDL routines to retrieve metadata from the IUGONET database, and an archive of 3-D simulation data that uses the Common Data Format so that it can easily be used with TDAS.
Abstract: Standards and specifications are the premise of the integrated reorganization of science specimen data, and data integration is the core of the reorganization. ETL [1], the abbreviation of extract, transform, and load [2], is very well suited to data integration. Kettle is a kind of ETL software, and in this paper it is applied to the integrated reorganization of science specimen data. Multi-source and heterogeneous specimen data are integrated using Kettle, and good results have been achieved, proving the effectiveness of Kettle in the integrated reorganization of science specimen data. The application has practical significance, and the method can be referenced when reorganizing other resource data.
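Kettle itself is a graphical ETL tool; purely as an illustration of the extract-transform-load pattern it implements, the three stages can be sketched in a few lines of Python. The specimen records, field names, and date normalization below are invented for the example:

```python
import csv
import io
import sqlite3

# Extract: read specimen records (an in-memory CSV stands in for one of
# several heterogeneous source formats).
raw = io.StringIO(
    "id,name,collected\n"
    "1, Panthera tigris ,2003/05/14\n"
    "2,Ailuropoda melanoleuca,2010-07-02\n"
)
rows = list(csv.DictReader(raw))

# Transform: normalize whitespace and unify the date format.
clean = [
    {"id": int(r["id"]), "name": r["name"].strip(),
     "collected": r["collected"].replace("/", "-")}
    for r in rows
]

# Load: write the unified records into a target database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE specimen (id INTEGER, name TEXT, collected TEXT)")
db.executemany("INSERT INTO specimen VALUES (:id, :name, :collected)", clean)
print(db.execute("SELECT name, collected FROM specimen ORDER BY id").fetchall())
# -> [('Panthera tigris', '2003-05-14'), ('Ailuropoda melanoleuca', '2010-07-02')]
```

In Kettle the same pipeline would be drawn as input, transformation, and output steps rather than written in code.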
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (RGP 1/147/42), www.kku.edu.sa. This research was also funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-Track Path of the Research Funding Program.
Abstract: In the present digital era, data science techniques exploit artificial intelligence (AI) to help those who start and run small and medium-sized enterprises (SMEs) make an impact and develop their businesses. Data science integrates the conventions of econometrics with the technological elements of data science. It makes use of machine learning (ML) and predictive and prescriptive analytics to effectively understand financial data and solve related problems. Smart technologies enable SMEs to get smarter with their processes and offer efficient operations. At the same time, there is a need for an effective tool that can assist small to medium-sized enterprises in forecasting business failure and financial crisis. AI has become a familiar tool for several businesses because it concentrates on the design of intelligent decision-making tools to solve particular real-time problems. With this motivation, this paper presents a new AI-based optimal functional link neural network (FLNN) based financial crisis prediction (FCP) model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. At the initial stage, the financial data of the enterprises are collected and preprocessed to enhance data quality. Besides, a novel chaotic grasshopper optimization algorithm (CGOA) based feature selection technique is applied for the optimal selection of features. Moreover, a functional link neural network (FLNN) model is employed for the classification of the feature-reduced data. Finally, the efficiency of the FLNN model is improved by the use of the cat swarm optimizer (CSO) algorithm. A detailed experimental validation process takes place on the Polish dataset to confirm the performance of the presented model. The experimental studies demonstrated that the CGOA-FLNN-CSO model accomplished maximum prediction accuracies of 98.830%, 92.100%, and 95.220% on the applied Polish dataset Years I-III, respectively.
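The core FLNN idea is to expand each input feature through a fixed functional basis so that a single trainable linear layer can fit nonlinear boundaries. The sketch below illustrates only that idea, not the paper's CGOA/CSO-tuned pipeline; the trigonometric basis, toy labels, and training settings are invented assumptions:

```python
import numpy as np

def expand(x):
    """Trigonometric functional expansion: map each feature x_i to
    [x_i, sin(pi*x_i), cos(pi*x_i)], so a linear layer over the expanded
    features can represent nonlinear decision boundaries."""
    return np.hstack([x, np.sin(np.pi * x), np.cos(np.pi * x)])

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.45).astype(float)  # toy nonlinear labels

Phi = expand(X)
w = np.zeros(Phi.shape[1])
b = 0.0
for _ in range(3000):  # plain logistic-regression training on the expansion
    p = 1.0 / (1.0 + np.exp(-(Phi @ w + b)))
    g = p - y
    w -= 0.1 * Phi.T @ g / len(y)
    b -= 0.1 * g.mean()

acc = float(((Phi @ w + b > 0) == (y > 0.5)).mean())
print(acc)
```

In the paper's model, the expanded-feature weights would instead be tuned by the CSO algorithm after CGOA-based feature selection.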
Funding: Supported by the EU H2020 Research and Innovation Program under the Marie Sklodowska-Curie Grant Agreement (Project DEEP, Grant number: 101109045), the National Key R&D Program of China (Grant number 2018YFB1800804), the National Natural Science Foundation of China (Nos. NSFC 61925105 and 62171257), the Tsinghua University-China Mobile Communications Group Co., Ltd. Joint Institute, and the Fundamental Research Funds for the Central Universities, China (No. FRF-NP-20-03).
Abstract: The increasing dependence on data highlights the need for a detailed understanding of its behavior, encompassing the challenges involved in processing and evaluating it. However, current research lacks a comprehensive structure for measuring the worth of data elements, hindering effective navigation of the changing digital environment. This paper aims to fill this research gap by introducing the innovative concept of "data components." It proposes a graph-theoretic representation model that presents a clear mathematical definition and demonstrates the superiority of data components over traditional processing methods. Additionally, the paper introduces an information measurement model that provides a way to calculate the information entropy of data components and establish their increased informational value. The paper also assesses the value of information, suggesting a pricing mechanism based on its significance. In conclusion, this paper establishes a robust framework for understanding and quantifying the value of implicit information in data, laying the groundwork for future research and practical applications.
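The paper's entropy measure for data components is not reproduced here; as a hedged sketch, the underlying quantity is the Shannon entropy of a value distribution, which can be computed directly from empirical frequencies:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`:
    H = sum over distinct values of (c/n) * log2(n/c)."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("aaaa"))  # 0.0 -> a constant column carries no information
print(shannon_entropy("aabb"))  # 1.0 -> one bit of information per value
print(shannon_entropy("abcd"))  # 2.0 -> two bits; more diversity, more information
```

A pricing mechanism of the kind the paper suggests could then weight a data component by such an information score, though the exact weighting is the paper's contribution, not shown here.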
Abstract: There has long been discussion about the distinctions among library science, information science, and informatics, and how these areas differ from and overlap with computer science. Today the emerging term data science generates excitement and raises questions about how it relates to and differs from these other areas of study.
Funding: Supported by the National Natural Science Foundation of China (62272078).
Abstract: With the ongoing advancements in sensor networks and data acquisition technologies across various systems such as manufacturing, aviation, and healthcare, data-driven vibration control (DDVC) has attracted broad interest from both the industrial and academic communities. Input shaping (IS), as a simple and effective feedforward method, is in great demand in DDVC methods. It convolves the desired input command with an impulse sequence, without requiring parametric dynamics or the closed-loop system structure, thereby suppressing the residual vibration separately. Based on a thorough investigation into the state-of-the-art DDVC methods, this survey has made the following efforts: 1) introducing IS theory and typical input shapers; 2) categorizing recent progress of DDVC methods; 3) summarizing commonly adopted metrics for DDVC; and 4) discussing the engineering applications and future trends of DDVC. By doing so, this study provides a systematic and comprehensive overview of existing DDVC methods from designing to optimizing perspectives, aiming to promote future research on this emerging and vital issue.
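As a hedged sketch of the textbook Zero Vibration (ZV) shaper, one of the typical input shapers such a survey introduces, the convolution of a command with a two-impulse sequence looks like the following; the mode parameters (omega_n, zeta) and step command are invented for the example:

```python
import math

def zv_shaper(omega_n, zeta):
    """Zero Vibration (ZV) shaper: two impulses whose amplitudes sum to 1,
    spaced half a damped period apart, cancel residual vibration of a
    second-order mode with natural frequency omega_n and damping ratio zeta."""
    K = math.exp(-zeta * math.pi / math.sqrt(1 - zeta ** 2))
    Td = 2 * math.pi / (omega_n * math.sqrt(1 - zeta ** 2))  # damped period
    amps = [1 / (1 + K), K / (1 + K)]
    times = [0.0, Td / 2]
    return amps, times

def shape(command, amps, times, dt):
    """Convolve a sampled command with the impulse sequence."""
    n = len(command)
    shaped = [0.0] * n
    for a, t in zip(amps, times):
        shift = round(t / dt)
        for i in range(n - shift):
            shaped[i + shift] += a * command[i]
    return shaped

amps, times = zv_shaper(omega_n=10.0, zeta=0.05)
step = [1.0] * 100                       # unit step command
shaped = shape(step, amps, times, dt=0.01)
print(round(sum(amps), 6))               # 1.0 -> unity gain preserves the setpoint
print(round(shaped[-1], 6))              # 1.0 once both impulses have arrived
```

Data-driven variants estimate omega_n and zeta (or the impulse sequence itself) from measured response data rather than from a parametric model, which is the distinction the survey focuses on.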
Funding: Supported by the Ministry of Science and Technology "National Science and Technology Platform Program" (2005DKA31800).
Abstract: Science data are very important resources for innovative research in all scientific disciplines. The Ministry of Science and Technology (MOST) of China has launched a comprehensive platform program for supporting scientific innovation, and the agricultural science database construction and sharing project is one of the activities under this program supported by MOST. This paper briefly describes the achievements of the Agricultural Science Data Center Project.
Abstract: The Energization and Radiation in Geospace (ERG) mission seeks to explore the dynamics of the radiation belts in the Earth's inner magnetosphere with a space-borne probe (the ERG satellite) in coordination with related ground observations and simulation/modeling studies. For this mission, the Science Center of the ERG project (ERG-SC) will provide a useful data analysis platform based on the THEMIS Data Analysis Software suite (TDAS), which has been widely used by researchers in many conjunction studies of Time History of Events and Macroscale Interactions during Substorms (THEMIS) spacecraft and ground data. To import SuperDARN data to this highly useful platform, ERG-SC, in close collaboration with SuperDARN groups, developed a Common Data Format (CDF) design suitable for fitacf data and has prepared an open database of SuperDARN data archived in CDF. ERG-SC has also been developing programs written in Interactive Data Language (IDL) to load fitacf CDF files and to generate various kinds of plots: not only range-time-intensity-type plots but also two-dimensional map plots that can be superposed with other data, such as all-sky images of THEMIS-GBO and orbital footprints of various satellites. The CDF-TDAS scheme developed by ERG-SC will make it easier for researchers who are not familiar with SuperDARN data to access and analyze them, thereby facilitating collaborative studies with satellite data, such as the inner magnetosphere data provided by the ERG (Japan)-RBSP (USA)-THEMIS (USA) fleet.
Abstract: This paper reviews literature pertaining to the development of data science as a discipline, current issues with data bias and ethics, and the role that the discipline of information science may play in addressing these concerns. Information science research and researchers have much to offer data science, owing to their background as transdisciplinary scholars who apply human-centered and social-behavioral perspectives to issues within natural science disciplines. Information science researchers have already contributed to a humanistic approach to data ethics within the literature, and an emphasis on data science within information schools all but ensures that this literature will continue to grow in coming decades. This review article serves as a reference for the history, current progress, and potential future directions of data ethics research within the corpus of information science literature.