Funding: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) Program (IITP-2020-0-01816), supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation). This research was also supported by a National Research Foundation (NRF) of Korea grant funded by the Korean Government (MSIT) (No. 2021R1A4A3022102).
Abstract: Given the increasing number of countries reporting degraded air quality, effective air quality monitoring has become a critical issue worldwide. However, current air quality observatory systems are often prohibitively expensive, leaving many regions within a country without observatories. As a result, not every region receives the same level of air quality information: some locations must rely on observatories located far from their region, even when those are the closest available options. To address this challenge, we propose a novel approach that leverages machine learning and deep learning techniques to forecast fine dust concentrations. Specifically, we incorporate continuous location features, in the form of latitude and longitude values, into our models. Using a comprehensive dataset comprising weather conditions, air quality measurements, and location properties, we trained several machine learning models, including Random Forest Regression, XGBoost Regression, AdaBoost Regression, and a deep learning model, Long Short-Term Memory (LSTM). Our experimental results show that the LSTM model outperforms the other models, achieving the best score with a root mean squared error of 23.48 when predicting fine dust (PM10) concentrations on an hourly basis. We also found that incorporating location properties, such as longitude and latitude values, improves the overall quality of the regression models. Finally, we discuss the implications and contributions of our research. By adopting our approach, the cost of relying solely on existing observatories can be substantially reduced, paving the way for economically efficient fine dust observation systems and for more widespread, accurate air quality monitoring across regions.
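The regression setup described above can be sketched as follows. This is a minimal illustration using scikit-learn's Random Forest Regression (standing in for the full set of models in the paper) on synthetic data; the feature columns, coordinate ranges, and the synthetic PM10 target are assumptions for demonstration, not the paper's actual dataset.

```python
# Sketch: regression on weather features plus continuous location
# features (latitude, longitude), evaluated with RMSE as in the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Illustrative feature matrix: weather conditions + location properties.
X = np.column_stack([
    rng.normal(15, 10, n),     # temperature (deg C)      -- assumed
    rng.uniform(20, 90, n),    # relative humidity (%)    -- assumed
    rng.uniform(0, 10, n),     # wind speed (m/s)         -- assumed
    rng.uniform(33, 38, n),    # latitude                 -- assumed
    rng.uniform(126, 129, n),  # longitude                -- assumed
])
# Synthetic hourly PM10 target, loosely driven by the features.
y = 50 + 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"RMSE: {rmse:.2f}")
```

Because latitude and longitude enter the model as ordinary continuous columns, the same trained model can produce predictions for coordinates that have no observatory, which is the cost-saving mechanism the abstract refers to.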
Funding: This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00358, AI·Big Data-based Cyber Security Orchestration and Automated Response Technology Development).
Abstract: In the field of natural language processing (NLP), the advancement of neural machine translation has paved the way for cross-lingual research. Yet most studies in NLP have evaluated the proposed language models on well-refined datasets. We investigate whether a machine translation approach is suitable for multilingual analysis of unrefined datasets, in particular, chat messages on Twitch. To address this question, we collected a dataset of 7,066,854 and 3,365,569 chat messages from English and Korean streams, respectively. We employed several machine learning classifiers and neural networks with two different types of embedding: word-sequence embedding and the final layer of a pre-trained language model. The results indicate that the accuracy difference between English and English-to-Korean was relatively high, ranging from 3% to 12%, whereas for the Korean data (Korean and Korean-to-English) it ranged from 0% to 2%. These results imply that translation from a low-resource language (e.g., Korean) into a high-resource language (e.g., English) yields higher performance than translation in the opposite direction. We also discuss several implications and limitations of the presented results; for instance, they suggest the feasibility of translating from resource-poor languages so that the analysis tools of resource-rich languages can be applied in further analysis.
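The classify-after-translation idea above can be sketched with a simple text classifier. This is an illustrative toy, not the paper's pipeline: the handful of "translated" chat messages and their labels are invented for demonstration, and a TF-IDF bag-of-words model stands in for the word-sequence and pre-trained-language-model embeddings the paper compares.

```python
# Sketch: train a classifier on (hypothetically machine-translated)
# chat messages, as one would after translating Korean Twitch chat
# into English to reuse English-language tooling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy "translated" messages with sentiment-style labels (assumed data).
messages = [
    "great play that was amazing", "nice shot well done",
    "love this game so much",
    "what a terrible decision", "this stream is so boring",
    "awful lag again today",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative (assumed)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(messages, labels)
pred = clf.predict(["that was an amazing play"])
print(pred)
```

In the paper's setting, the translation direction matters: the same downstream classifier performed noticeably worse on English chat translated into Korean than on Korean chat translated into English, consistent with the resource asymmetry between the two languages.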