Data mining is the process of extracting hidden, unknown, but potentially useful information from massive data. Big Data has a profound impact on scientific discovery and value creation. Data mining (DM) with Big Data has been widely used throughout the lifecycle of electronic products, spanning from the design and production stages to the service stage. A comprehensive study of DM with Big Data, together with a survey of its application across the stages of this lifecycle, will help researchers conduct solid research. Recently, big data has become a buzzword, which has pushed researchers to extend existing data mining techniques to cope with the evolving nature of data and to develop new analytical procedures. In this paper, we develop an empirical evaluation method based on the principle of Design of Experiments. We apply this method to evaluate data mining tools and machine learning algorithms towards building big data analytics for telecommunication monitoring data. Two case studies are conducted to provide insights into the relations between the requirements of data analysis and the choice of a tool or algorithm in the context of data analysis workflows.
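To make the Design-of-Experiments idea concrete, here is a minimal sketch of a full factorial design over tool, algorithm, and data-volume factors, with replicated runs and main-effect summaries. The factor levels and the `run_workflow` stub are illustrative assumptions, not the paper's actual benchmark setup.

```python
# Minimal full-factorial Design-of-Experiments sketch for comparing data
# mining tools and algorithms; all factor levels are illustrative only.
import random
from itertools import product
from statistics import mean

tools = ["toolA", "toolB"]                       # hypothetical tools
algorithms = ["decision_tree", "kmeans", "naive_bayes"]
data_sizes = [10_000, 100_000, 1_000_000]        # rows of input data
replicates = 3                                   # repeat runs to average out noise

def run_workflow(tool, algorithm, n_rows):
    # Placeholder: stands in for executing one real analysis workflow and
    # measuring a response such as runtime or accuracy; here we simulate it.
    return n_rows / 50_000 + random.gauss(0, 0.1)

# Full factorial design: every combination of factor levels, replicated.
results = {}
for tool, algo, size in product(tools, algorithms, data_sizes):
    scores = [run_workflow(tool, algo, size) for _ in range(replicates)]
    results[(tool, algo, size)] = mean(scores)

# Main effect of each tool, averaged over the other two factors.
for tool in tools:
    effect = mean(v for k, v in results.items() if k[0] == tool)
    print(f"{tool}: mean response {effect:.3f}")
```

Averaging the response over the remaining factors gives each factor's main effect, which is the kind of comparison a DoE-based evaluation rests on.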
Managing large amounts of data is becoming part of everyday life in most organizations. Handling, analyzing, searching, and making predictions from big data is becoming the norm for organizations of many kinds. Big data lays the foundation for greater benefits and higher value to be extracted from it. Yet along with its countless benefits, big data brings many challenges to fulfilling its expectations. Among the problems haunting big data repositories is what is termed dirty data. This paper focuses on dirty data encountered while working on an organization's live information system. The author was responsible for studying and analyzing a faltering information system and for planning and carrying out the required solutions and fixes. The importance of this work lies in the high level of dirty data observed in the system. Accordingly, the paper concentrates on how the team suffered from dirty data and how it was dealt with.
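As a hedged sketch of the profiling step such a clean-up usually begins with, the snippet below flags missing values, duplicated keys, domain violations, and format violations. The schema, sample rows, and validity rules are invented for illustration and are not the system described in the paper.

```python
# Dirty-data profiling sketch with pandas; columns and rules are
# illustrative assumptions, not the paper's actual schema.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "email": ["a@x.com", "bad-email", "b@x.com", "", "c@x.com"],
    "age": [34, -5, 41, 230, 28],
})

report = {
    # Missing values per column.
    "missing": df.isna().sum().to_dict(),
    # Duplicated key values (second and later occurrences).
    "duplicate_ids": int(df["customer_id"].duplicated().sum()),
    # Domain violations: ages outside a plausible human range.
    "bad_age": int((~df["age"].between(0, 120)).sum()),
    # Format violations: strings that do not look like an email address.
    "bad_email": int((~df["email"].str.contains(r"^\S+@\S+\.\S+$", na=True)).sum()),
}
print(report)
```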
The increasing volume of data in the environmental sciences calls for analysis and interpretation. Among the challenges generated by this “data deluge”, the development of efficient strategies for knowledge discovery is an important issue. Here, statistical tools and tools from computational intelligence are applied to analyze large data sets from meteorology and climate science. Our approach yields a geographical mapping of the statistical properties that can easily be interpreted by meteorologists. The data analysis comprises two main steps of knowledge extraction, applied successively in order to reduce the complexity of the original data set. The goal is to identify a much smaller subset of climate variables that may still describe, or even predict, the probability of occurrence of an extreme event. The first step applies a class comparison technique: p-value estimation. The second step builds a decision tree (DT) from the available data and the p-value analysis. The DT is used as a predictive model that identifies the climate variables most statistically significant for precipitation intensity. The methodology is employed to study the climatic causes of an extreme precipitation event that occurred in the states of Alagoas and Pernambuco (Brazil) in June 2010.
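A minimal sketch of this two-step pipeline, run on synthetic stand-in data rather than the meteorological set: each candidate variable is screened by a class-comparison p-value (the abstract does not name the test; a Mann-Whitney U test stands in here), and a decision tree is then fit on the significant variables only.

```python
# Two-step knowledge-extraction sketch: p-value screening followed by a
# decision tree, on synthetic stand-in data for the climate variables.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, n_vars = 500, 20
X = rng.normal(size=(n, n_vars))   # candidate climate variables
# Synthetic "extreme event" flag driven by variables 3 and 7.
y = (X[:, 3] + 0.8 * X[:, 7] + rng.normal(size=n) > 1.5).astype(int)

# Step 1: class comparison via p-value estimation, contrasting each
# variable's distribution on extreme vs. non-extreme days.
pvals = np.array([
    mannwhitneyu(X[y == 1, j], X[y == 0, j]).pvalue for j in range(n_vars)
])
significant = np.where(pvals < 0.01)[0]
print("significant variables:", significant)

# Step 2: a decision tree restricted to the significant variables acts as
# the predictive model for the occurrence of the extreme event.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X[:, significant], y)
print("training accuracy:", tree.score(X[:, significant], y))
```

Restricting the tree to the screened variables is what reduces the complexity of the original data set while keeping the model interpretable for meteorologists.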
In today's era of digitization and intelligence, artificial intelligence (AI) has become a key engine of technological innovation, so summarizing the latest trends and future directions of AI research carries both academic and practical significance. To this end, we collected the research output published between 2021 and 2023 in the CCF-A class international conferences and journals in the AI field recommended by the China Computer Federation (CCF), and applied a bibliometric methodology to analyze research hotspots through keywords: identifying hotspots from high-frequency keywords, identifying research trends from newly emerging keywords, and identifying high-impact research from citation-weighted keywords. This makes it possible to map the mainstream directions of AI research and to uncover the interconnections and cross-fertilization among its major research directions. In addition, by analyzing the hotspots associated with papers on current research foci such as large language models (LLM), AI-driven scientific research (AI for Science), and visual generation, we trace the evolution of technical paths and methodologies and present the scientific theories and application prospects behind these technical innovations, thereby further revealing the latest trends and development prospects of AI research.
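The three keyword analyses can be illustrated with a short sketch; the paper records and fields below are invented examples, not data from the surveyed CCF-A corpus.

```python
# Bibliometric keyword-analysis sketch: raw frequency, citation-weighted
# frequency, and newly emerging keywords. Records are invented examples.
from collections import Counter

papers = [
    {"keywords": ["large language model", "reasoning"], "citations": 120, "year": 2023},
    {"keywords": ["AI for Science", "protein"], "citations": 45, "year": 2022},
    {"keywords": ["large language model", "alignment"], "citations": 200, "year": 2023},
    {"keywords": ["visual generation", "diffusion"], "citations": 80, "year": 2022},
]

# Research hotspots: raw keyword frequency across the corpus.
freq = Counter(k for p in papers for k in p["keywords"])

# High-impact research: each keyword weighted by the citations of the
# papers that carry it.
weighted = Counter()
for p in papers:
    for k in p["keywords"]:
        weighted[k] += p["citations"]

# Emerging trends: keywords that appear only in the latest year.
new_kw = ({k for p in papers if p["year"] == 2023 for k in p["keywords"]}
          - {k for p in papers if p["year"] < 2023 for k in p["keywords"]})

print(freq.most_common(3), weighted.most_common(3), new_kw)
```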
The epidemic characteristics of Omicron (e.g. large-scale transmission) are significantly different from those of the initial variants of COVID-19. The data generated by large-scale transmission are important for predicting the trend of these epidemic characteristics. However, the results of current prediction models are inaccurate since they are not closely combined with the actual situation of Omicron transmission. In consequence, these inaccurate results have negative impacts on manufacturing and the service industry, for example, the production of masks and the recovery of the tourism industry. The authors study the epidemic characteristics in two ways, namely investigation and prediction. First, a large amount of data is collected by utilising the Baidu index and by conducting a questionnaire survey concerning the epidemic characteristics. Second, the β-SEIDR model is established, in which the population is classified into Susceptible, Exposed, Infected, Dead and β-Recovered persons, to intelligently predict the epidemic characteristics of COVID-19. Note that β-Recovered means that Recovered persons may become Susceptible again with probability β. The simulation results show that the model can accurately predict the epidemic characteristics.
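A minimal numerical sketch of a β-SEIDR model as described, in which the key departure from a standard SEIR model is a flow from Recovered back to Susceptible at rate β. All parameter values are illustrative assumptions, not fitted Omicron estimates.

```python
# Numerical sketch of a beta-SEIDR model: SEIR plus a Dead compartment
# and reinfection (Recovered -> Susceptible at rate beta). Parameter
# values are illustrative assumptions only.
import numpy as np
from scipy.integrate import odeint

N = 1_000_000                  # total population
lam, sigma = 0.6, 1 / 3        # transmission rate, incubation rate
gamma, mu = 1 / 7, 0.001       # recovery rate, death rate
beta = 0.01                    # rate at which Recovered become Susceptible

def seidr(state, t):
    S, E, I, D, R = state
    dS = -lam * S * I / N + beta * R   # reinfection feeds S from R
    dE = lam * S * I / N - sigma * E
    dI = sigma * E - (gamma + mu) * I
    dD = mu * I
    dR = gamma * I - beta * R
    return [dS, dE, dI, dD, dR]

t = np.linspace(0, 180, 181)           # simulate 180 days
y0 = [N - 100, 0, 100, 0, 0]           # seed with 100 infected persons
S, E, I, D, R = odeint(seidr, y0, t).T
print(f"peak infections: {I.max():,.0f} on day {t[I.argmax()]:.0f}")
```

Setting beta to zero recovers an ordinary SEIDR model without reinfection, which is the contrast the β-Recovered compartment is meant to capture.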
Funding: Key discipline construction project for traditional Chinese Medicine in Guangdong province, Grant/Award Number: 20220104; The construction project of inheritance studio of national famous and old traditional Chinese Medicine experts, Grant/Award Number: 140000020132.