Funding: Supported by the National Key R&D Program of China (No. 2018YFB1003905), the National Natural Science Foundation of China (Grant No. 61971032), and the Fundamental Research Funds for the Central Universities (No. FRF-TP-18-008A3).
Abstract: On-site programming big data refers to the massive data generated during software development; it is produced in real time, is complex, and is difficult to process. Data cleaning is therefore essential for on-site programming big data. Duplicate data detection is an important step in data cleaning that saves storage resources and improves data consistency. To address the shortcomings of the traditional Sorted Neighborhood Method (SNM) and the difficulty of detecting duplicates in high-dimensional data, an optimized algorithm based on random forests with a dynamic, adaptive window size is proposed. The algorithm's efficiency is improved by refining the key-selection method, reducing the dimensionality of the data set, and using an adaptive, variable-size sliding window. Experimental results show that the improved SNM algorithm performs better and achieves higher accuracy.
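As a rough illustration of the sliding-window idea in the abstract above, the following Python sketch sorts records by a key and compares each record only with its neighbors inside a window whose size changes as matches are found. The function names, the string-similarity measure, the threshold, and the grow/shrink rule are all assumptions made for illustration; the abstract does not specify the paper's window-adaptation rule, and the random-forest key selection and dimension reduction are not shown.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def snm_duplicates(records, key, threshold=0.85, min_window=2, max_window=6):
    """Sorted Neighborhood Method with a simple adaptive window.

    records: list of strings; key: function that builds the sort key.
    A window of size w compares a record with the next w - 1 records.
    Hypothetical adaptation rule: grow the window by one after a match,
    shrink it by one otherwise (not the rule from the paper).
    """
    # Sort record indices by key so likely duplicates end up adjacent.
    ordered = sorted(range(len(records)), key=lambda i: key(records[i]))
    window = min_window
    pairs = set()
    for pos, i in enumerate(ordered):
        found = False
        for j in ordered[pos + 1 : pos + window]:
            if similarity(records[i], records[j]) >= threshold:
                pairs.add((min(i, j), max(i, j)))
                found = True
        if found:
            window = min(window + 1, max_window)
        else:
            window = max(window - 1, min_window)
    return pairs


if __name__ == "__main__":
    sample = [
        "john smith, 12 main st",
        "jon smith, 12 main st",
        "alice wong, 5 oak ave",
        "john smith, 12 main st.",
    ]
    # Sort key: the first three characters of each record (an assumption).
    print(sorted(snm_duplicates(sample, key=lambda r: r[:3])))
```

The quality of the sort key dominates the result: records that do not sort close together are never compared, which is why the abstract emphasizes key selection alongside the window size.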
Abstract: The primary focus of this paper is to design a progressive restoration plan for an enterprise data center environment following a partial or full disruption. Repairing and restoring disrupted components in an enterprise data center requires a significant amount of time and human effort. Following a major disruption, the recovery process involves multiple stages, and during each stage the partially recovered infrastructure can provide limited services to users at a degraded service level. However, how quickly and efficiently an enterprise infrastructure can be recovered depends on how the recovery mechanism restores the disrupted components, considering the inter-dependencies between services along with the limitations of expert human operators. The overall problem turns out to be NP-hard and rather complex, and we devise an efficient meta-heuristic to solve it. Using real-world examples, we show that the proposed meta-heuristic provides very accurate results while running 600-2800 times faster than the optimal solution obtained from a general-purpose mathematical solver [1].
Funding: Supported by the 2018-2020 Higher Education Talent Training Quality and Teaching Reform Project of Sichuan Province (Grant No. JG2018-46), the Science and Technology Planning Program of Sichuan University and Luzhou (Grant No. 2017CDLZG30), and the Postdoctoral Science Fund of Sichuan University (Grant No. 2019SCU12058).
Abstract: The transition from traditional learning to practice-oriented programming learning causes learners discomfort. When learners encounter programming difficulties, this discomfort quickly breeds negative emotions, which lead them to lose interest in programming or even give up. Emotion plays a crucial role in learning. Educational psychology research shows that positive emotion can promote learning performance, increase interest in learning, and cultivate creative thinking. Accurate recognition and interpretation of programming learners' emotions makes it possible to give them timely feedback and to adjust teaching strategies accurately and individually, which is of considerable significance for improving the effectiveness of programming learning and education. Existing sensor-free emotion prediction methods are based on keyboard dynamics, mouse interaction data, and interaction logs, respectively. However, none of these three approaches considers the temporal characteristics of emotion, resulting in low recognition accuracy. For the first time, this paper proposes an emotion prediction model based on time series and context information. We then establish a Bi-recurrent neural network that obtains the time-sequence characteristics of the data automatically, and we explore the application of deep learning to academic emotion prediction. The results show that the classification ability of this model is much better than that of the original LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), and RNN (Recurrent Neural Network), and that the model generalizes better.
Funding: Supported by the ADB Loan for Flood Management Project in the Yellow River Basin (Grant No. YH-SW-XH-02).
Abstract: This study evaluated the application of the European flood forecasting operational real time system (EFFORTS) to the Yellow River. An automatic data pre-processing program was developed to provide real-time hydrometeorological data. Various GIS layers were collected and developed to meet the demands of the distributed hydrological model in EFFORTS. The San-Hua Basin (from the Sanmenxia Reservoir to the Huayuankou Hydrological Station), the most geographically important area of the Yellow River, was chosen as the study area. The model parameters were calibrated and validated using more than ten years of historical hydrometeorological data from the study area. The analysis indicates that EFFORTS improves work efficiency, extends the flood forecasting lead time, and attains an acceptable level of forecasting accuracy in the San-Hua Basin, with a mean deterministic coefficient at Huayuankou Station, the basin outlet, of 0.90 in calibration and 0.96 in validation. The analysis also shows that simulation accuracy is better for the southern part of the San-Hua Basin than for the northern part. This implies that, along with the characteristics of the basin and the runoff generation mechanisms of the hydrological model, the hydrometeorological data play an important role in simulating hydrological behavior.
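The abstract above reports accuracy as a deterministic coefficient but does not give its formula. The sketch below assumes the form commonly used in flood forecasting, which matches the Nash-Sutcliffe efficiency: one minus the ratio of the squared simulation error to the variance of the observations about their mean. The function name and the sample discharge values are hypothetical and are not taken from the study.

```python
def deterministic_coefficient(observed, simulated):
    """Return 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2).

    Assumed definition (Nash-Sutcliffe form); 1.0 means a perfect fit and
    0.0 means the simulation is no better than the observed mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - residual / variance


if __name__ == "__main__":
    # Made-up discharge values in m^3/s, for illustration only.
    obs = [100.0, 300.0, 500.0, 300.0]
    sim = [120.0, 280.0, 480.0, 320.0]
    print(deterministic_coefficient(obs, sim))
```

Under this assumed definition, the reported values of 0.90 and 0.96 would mean that the squared simulation error is about 10% and 4% of the observed variance, respectively.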
Abstract: For this special section on software systems, several research leaders in software systems, as guest editors for this special section, discuss important issues that will shape this field's future directions. The essays included in this roundtable article cover research opportunities and challenges for emerging software systems such as data processing programs (Xiangyu Zhang) and online services (Dongmei Zhang), with new directions of technologies such as unifications in software testing (Yves Le Traon), data-driven and evidence-based software engineering (Qing Wang), and dynamic analysis of multiple traces (Lu Zhang). Tao Xie, Leading Editor of the Special Section on Software Systems.