Abstract: A research study collected intensive longitudinal data from cancer patients on a daily basis as well as non-intensive longitudinal survey data on a monthly basis. Although the daily data require separate analysis, they can also be used to generate predictors of monthly outcomes. This work addresses alternatives for generating daily-data predictors of monthly outcomes. Analyses are reported of depression, measured by the Patient Health Questionnaire 8, as the monthly survey outcome. Daily measures include the number of opioid medications taken, the number of pain flares, least pain levels, and worst pain levels. Predictors are averages of recent non-missing values of each daily measure recorded on or prior to the survey dates for depression. Weights for recent non-missing values are based on the number of days between measurement of a recent value and a survey date. Five alternative averages are considered: averages with unit weights, averages with reciprocal weights, weighted averages with reciprocal weights, averages with exponential weights, and weighted averages with exponential weights. Adaptive regression methods based on likelihood cross-validation (LCV) scores are used to generate fractional polynomial models for possible nonlinear dependence of depression on each average. For all four daily measures, the best LCV score over averages of all types is generated by the average of recent non-missing values with reciprocal weights. The generated models are nonlinear and monotonic. The results indicate that an appropriate choice is to assume three recent non-missing values and use the average with reciprocal weights of those three values.
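The abstract does not give the exact weight formulas, so the sketch below assumes reciprocal weights of the form 1/(lag+1) and exponential weights of the form exp(-rate·lag), and interprets an "average with weights" as dividing by the number of values and a "weighted average" as dividing by the sum of the weights. A minimal Python sketch under those assumptions:

```python
import math

def recent_averages(values, lags, k=3, rate=1.0):
    """Compute the five alternative averages over the k most recent
    non-missing daily values recorded on or before a survey date.

    values -- recent non-missing daily measurements (most recent first)
    lags   -- days between each measurement and the survey date (>= 0)
    rate   -- decay rate for the exponential weights (assumed form)
    """
    v, d = values[:k], lags[:k]
    n = len(v)
    rec = [1.0 / (di + 1) for di in d]        # reciprocal weights (assumed: +1 avoids 1/0)
    exp = [math.exp(-rate * di) for di in d]  # exponential weights (assumed form)
    return {
        "unit": sum(v) / n,                                    # average with unit weights
        "reciprocal": sum(w * x for w, x in zip(rec, v)) / n,  # average with reciprocal weights
        "weighted_reciprocal": sum(w * x for w, x in zip(rec, v)) / sum(rec),
        "exponential": sum(w * x for w, x in zip(exp, v)) / n,
        "weighted_exponential": sum(w * x for w, x in zip(exp, v)) / sum(exp),
    }

# Example: three recent worst-pain values measured 0, 2, and 5 days before a survey.
print(recent_averages([7.0, 5.0, 6.0], [0, 2, 5]))
```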
Abstract: Macroseismic intensity data play an important role in seismic hazard analysis as well as in the development of reliable earthquake loss models. This paper presents a physics-based model for predicting macroseismic intensity attenuation, based on 560 intensity data points recorded in Iran over the period 1975-2013. The model accounts for both the geometric spreading and the energy absorption of seismic waves. The proposed relation is easy to implement and describes intensity simply as a function of moment magnitude, source-to-site distance, and focal depth. The prediction capability of the model is assessed by means of residual analysis. Predictions are compared with those of intensity prediction models for Italy, Turkey, Iran, and central Asia. The results indicate a higher attenuation rate for the study area at distances of less than 70 km.
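The abstract specifies only the ingredients of the relation (geometric spreading, energy absorption, moment magnitude, distance, focal depth), not its fitted form. A common functional form built from those ingredients is sketched below; the coefficients c1-c4 are illustrative placeholders, not the values estimated in the paper.

```python
import math

# Generic macroseismic intensity attenuation of the kind the abstract
# describes: geometric spreading enters through ln(R), anelastic energy
# absorption through a linear R term. Coefficients c1..c4 are placeholders.
def predicted_intensity(mw, epicentral_dist_km, focal_depth_km,
                        c1=1.0, c2=1.5, c3=1.2, c4=0.002):
    # Slant source-to-site distance combining epicentral distance and depth.
    r = math.sqrt(epicentral_dist_km ** 2 + focal_depth_km ** 2)
    return c1 + c2 * mw - c3 * math.log(r) - c4 * r

# Example: an Mw 6.5 event observed 50 km away, focal depth 10 km.
print(round(predicted_intensity(6.5, 50.0, 10.0), 1))
```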
Abstract: To compete in the global manufacturing market, agility is the only viable response to fragmented market segments and frequently changing customer requirements. Manufacturing agility, however, can only be attained through the deployment of knowledge. Embedding knowledge in a CAD system to form a knowledge-intensive CAD (KIC) system is one way to enhance the design capability of a manufacturing company. The most difficult phase in developing a KIC system is capitalizing a huge amount of legacy data to form a knowledge database. In the past, this capitalization process could be done only manually or semi-automatically. In this paper, a five-step model for automatic design knowledge capitalization through data mining is proposed, and details of how to select, verify, and benchmark the performance of an appropriate data mining algorithm for a specific design task are also discussed. A case study concerning the design of a plastic toaster casing illustrates the proposed methodology; the average absolute error of the predictions for the most appropriate algorithm was within 17%.
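As an illustration of the benchmarking step, the sketch below compares candidate regression algorithms on hypothetical legacy design data using cross-validated average absolute percentage error, analogous to the ~17% figure cited. The paper does not prescribe a library or these particular algorithms; scikit-learn and the three candidates here are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical legacy data: design parameters -> a target design dimension.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(200, 4))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(max_depth=5),
    "knn": KNeighborsRegressor(n_neighbors=5),
}
for name, model in candidates.items():
    pred = cross_val_predict(model, X, y, cv=5)
    # Average absolute error of the predictions, expressed as a percentage.
    mape = np.mean(np.abs((pred - y) / y)) * 100
    print(f"{name}: {mape:.1f}%")
```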
Abstract: Data-intensive science is a reality in large scientific organizations such as the Max Planck Society, but because of the inefficiency of our data practices when it comes to integrating data from different sources, many projects cannot be carried out and many researchers are excluded. Since surveys show that about 80% of the time in data-intensive projects is wasted, we must conclude that we are not fit for the challenges that will come with billions of smart devices producing continuous streams of data: our methods do not scale. Experts worldwide are therefore looking for strategies and methods with a potential for the future. The first steps have been taken, since there is now wide agreement, from the Research Data Alliance to the FAIR principles, that data should be associated with persistent identifiers (PIDs) and metadata (MD). In fact, after 20 years of experience we can claim that trustworthy PID systems are already in broad use. It is argued, however, that assigning PIDs is just the first step. If we agree to assign PIDs and also use the PID to store important relationships, such as pointers to the locations where the bit sequences or different metadata can be accessed, we are close to defining Digital Objects (DOs), which could indeed point toward a solution to some of the basic problems in data management and processing. In addition to standardizing the way we assign PIDs, metadata, and other state information, we could also define a Digital Object Access Protocol as a universal exchange protocol for DOs stored in repositories using different data models and data organizations. We could further associate a type with each DO and a set of operations allowed to work on its content, which would pave the way toward the automatic processing that has been identified as the major step for scalability in data science and the data industry. A globally connected group of experts is now working on establishing testbeds for a DO-based data infrastructure.
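To make the Digital Object idea concrete, the sketch below shows one possible shape for a DO record: a PID that resolves to pointers for the bit sequences and the metadata, plus a type and a set of permitted operations. The field names and example values are illustrative assumptions, not taken from any DO specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# One possible shape for a Digital Object record as described in the
# abstract: the PID stores relationships (where bit sequences and metadata
# live), a type, and the operations allowed on the content.
@dataclass
class DigitalObject:
    pid: str                            # persistent identifier (e.g. a Handle)
    bit_locations: List[str]            # where the bit sequences can be accessed
    metadata_locations: Dict[str, str]  # e.g. {"descriptive": url, "provenance": url}
    do_type: str                        # type governing permitted operations
    operations: List[str] = field(default_factory=list)

# Hypothetical instance for a time-series dataset.
do = DigitalObject(
    pid="prefix/abc123",
    bit_locations=["https://repo.example.org/objects/abc123"],
    metadata_locations={"descriptive": "https://repo.example.org/md/abc123"},
    do_type="timeseries",
    operations=["read", "subset", "convert"],
)
print(do.pid, do.operations)
```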
Funding: The NEXTGenIO (Next Generation I/O for the Exascale) project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 671951.
Abstract: Byte-addressable persistent memory (B-APM) presents a new opportunity to bridge the performance gap between main memory and storage. In this paper, we present usage scenarios for this new technology based on the capabilities of Intel's DCPMM. We outline some of the basic performance characteristics of DCPMM and explain how it can be configured and used to address the needs of memory- and I/O-intensive applications in the HPC (high-performance computing) and data-intensive domains. Two decision trees are presented to advise on the configuration options for B-APM; their use is illustrated with two examples. We show that the flexibility of the technology has the potential to be truly disruptive, not only because of the performance improvements it can deliver, but also because it allows systems to cater for a wider range of applications on homogeneous hardware.
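The paper's two decision trees are not reproduced in the abstract; the sketch below illustrates the kind of configuration logic involved, using DCPMM's Memory Mode and App Direct mode. The questions, thresholds, and recommendations are simplified assumptions, not the paper's actual trees.

```python
# A simplified sketch of a B-APM configuration decision of the kind the
# paper's decision trees formalize. Illustrative only.
def suggest_dcpmm_config(needs_persistence: bool,
                         working_set_exceeds_dram: bool,
                         app_is_pmem_aware: bool) -> str:
    if needs_persistence:
        if app_is_pmem_aware:
            # Application manages persistence itself via direct access.
            return "App Direct mode with direct access (DAX) from the application"
        # Persistence without code changes: expose DCPMM as fast local storage.
        return "App Direct mode exposed as a fast local filesystem"
    if working_set_exceeds_dram:
        # Large volatile memory: DCPMM as main memory, DRAM as a cache.
        return "Memory Mode"
    return "Conventional DRAM only; DCPMM offers little benefit here"

print(suggest_dcpmm_config(needs_persistence=True,
                           working_set_exceeds_dram=False,
                           app_is_pmem_aware=False))
```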
Funding: Supported by the National Natural Science Foundation of China (51507084), the NUPTSF (NY214203), and the Natural Science Foundation for Colleges and Universities in Jiangsu Province (14KJB120009).
Abstract: Cloud computing has emerged as a new computing paradigm that can provide elastic services to users around the world. It offers good opportunities to solve large-scale scientific problems with less effort. Application deployment remains an important issue in clouds. Appropriate scheduling mechanisms can shorten the total completion time of an application and thereby improve the quality of service (QoS) for cloud users. Unlike current scheduling algorithms, which mostly focus on single-task allocation, we propose a deadline-based scheduling approach for data-intensive applications in clouds. It does not simply treat the total completion time of an application as the sum of all its subtasks' completion times. Not only is the computation capacity of each virtual machine (VM) considered, but the communication delay and data access latencies are also taken into account. Simulations show that our proposed approach has a decided advantage over two other algorithms.
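The abstract implies a cost model in which a subtask's finish time on a VM combines computation time, communication delay, and data access latency. The sketch below illustrates one such model together with a deadline check; the structure, names, and parameters are assumptions rather than the paper's exact formulation.

```python
# Estimated finish time of a subtask on a VM: ready time plus communication
# delay, computation time, and data access latency (assumed additive model).
def estimated_finish(workload_mi, vm_mips, vm_ready_at,
                     comm_delay_s, data_bytes, io_bandwidth_bps):
    compute = workload_mi / vm_mips              # computation time
    data_access = data_bytes / io_bandwidth_bps  # data access latency
    return vm_ready_at + comm_delay_s + compute + data_access

def pick_vm(subtask, vms, deadline):
    """Choose the VM with the earliest estimated finish; report deadline misses."""
    def finish(vm):
        return estimated_finish(subtask["mi"], vm["mips"], vm["ready_at"],
                                vm["delay"], subtask["bytes"], vm["io_bw"])
    best = min(vms, key=finish)
    t = finish(best)
    return (best, t) if t <= deadline else (None, t)

# Example: two hypothetical VMs competing for one data-intensive subtask.
task = {"mi": 4000.0, "bytes": 2e8}
vms = [{"mips": 1000.0, "ready_at": 0.0, "delay": 0.5, "io_bw": 1e8},
       {"mips": 2000.0, "ready_at": 1.0, "delay": 0.2, "io_bw": 5e7}]
print(pick_vm(task, vms, deadline=8.0))
```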