The study investigated user experience, display complexity, display type (tables versus graphs), and task difficulty as variables affecting the user's ability to navigate through complex visual data. A total of 64 participants took part: 39 undergraduate students (novice users) and 25 graduate students (intermediate-level users). The experiment used a 2 × 2 × 2 × 3 mixed design with two between-subject variables (display complexity, user experience) and two within-subject variables (display format, question difficulty). The results indicated that response times were faster for graphs than for tables, especially when the questions were difficult. The intermediate users seemed to adopt more extensive search strategies than novices, as revealed by an analysis of the number of changes they made to the display before answering questions. It was concluded that designers of data displays should consider the (a) type of display, (b) difficulty of the task, and (c) expertise level of the user to obtain optimal levels of performance.
The security of Federated Learning (FL) / Distributed Machine Learning (DML) is gravely threatened by data poisoning attacks, which destroy the usability of the model by contaminating training samples; such attacks are therefore called causative availability indiscriminate attacks. Because existing data sanitization methods are hard to apply to real-time applications due to their tedious process and heavy computation, we propose a new supervised batch detection method for poisoned data that can quickly sanitize the training dataset before local model training. We design a training dataset generation method that helps to enhance accuracy and uses data complexity features to train a detection model, which is then used in an efficient batch hierarchical detection process. Our model accumulates knowledge about poisoned data and can be expanded by retraining to adapt to new attacks. Being neither attack-specific nor scenario-specific, our method is applicable to FL/DML as well as other online or offline scenarios.
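A minimal sketch of the kind of complexity-feature-based batch detector described above, assuming simple centroid-distance features, label-flipping poison, and a random-forest detector (none of which are taken from the paper's actual feature set or batch hierarchical procedure):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def complexity_features(X, y):
    """Toy per-sample complexity features: distance to the sample's own class
    centroid, distance to the nearest other-class centroid, and their gap."""
    centroids = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
    feats = []
    for xi, yi in zip(X, y):
        d_own = np.linalg.norm(xi - centroids[yi])
        d_other = min(np.linalg.norm(xi - c) for k, c in centroids.items() if k != yi)
        feats.append([d_own, d_other, d_own - d_other])
    return np.asarray(feats)

rng = np.random.default_rng(0)

# Generate a small clean set and a label-flipped (poisoned) copy to train the detector.
X_clean = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y_clean = np.r_[np.zeros(200, int), np.ones(200, int)]
X_poison, y_poison = X_clean.copy(), 1 - y_clean          # flipped labels

F = np.vstack([complexity_features(X_clean, y_clean),
               complexity_features(X_poison, y_poison)])
is_poison = np.r_[np.zeros(400, int), np.ones(400, int)]

detector = RandomForestClassifier(n_estimators=100, random_state=0).fit(F, is_poison)

def sanitize_batch(X, y):
    """Keep only the samples the detector does not flag as poisoned."""
    keep = detector.predict(complexity_features(X, y)) == 0
    return X[keep], y[keep]
```

In a federated setting, each client would run `sanitize_batch` on its local batch before training; retraining the detector on newly generated poison examples is how the knowledge base would be extended.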
This article is about orthogonal frequency-division multiplexing with quadrature amplitude modulation combined with code division multiplexing access for complex data transmission. It presents a method that uses two interfering subsets in order to improve the performance of the transmission scheme. The idea is to spread some data coherently across two different codes belonging to the two different subsets involved in complex orthogonal frequency-division multiplexing with quadrature amplitude modulation and code division multiplexing access. This improves the useful signal level at the receiving side and therefore improves the decoding process, especially at low signal-to-noise ratio. However, the procedure introduces some interference with the other codes, creating additional noise that is noticeable at high signal-to-noise ratio.
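A toy illustration of spreading one symbol coherently over two codes drawn from two subsets. It uses ideal orthogonal Walsh–Hadamard codes and a 4/4 subset split purely for simplicity, whereas the scheme above works with interfering subsets, so the extra inter-code interference discussed in the abstract does not appear in this idealized version:

```python
import numpy as np
from scipy.linalg import hadamard

H = hadamard(8)                       # 8 length-8 spreading codes
subset_a, subset_b = H[:4], H[4:]     # assumed split into two subsets

symbol = 1 + 1j                       # one QAM symbol
c1, c2 = subset_a[1], subset_b[2]     # one code picked from each subset

# Send the same symbol on both codes so their contributions add coherently.
chips = symbol * c1 + symbol * c2

# Despread with both codes and combine: the useful amplitude is doubled
# before averaging, which is what helps the decoder at low SNR.
recovered = (chips @ c1 + chips @ c2) / (2 * len(c1))
print(recovered)                      # ~ (1+1j)
```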
The increasing richness of data encourages a comprehensive understanding of economic and financial activities, where variables of interest may include not only scalar (point-like) indicators, but also functional (curve-like) and compositional (pie-like) ones. In many research topics, the variables are also chronologically collected across individuals, which falls into the paradigm of longitudinal analysis. The complicated nature of the data, however, increases the difficulty of modeling these variables under the classic longitudinal framework. In this study, we investigate the linear mixed-effects model (LMM) for such complex data. Different types of variables are first consistently represented using the corresponding basis expansions so that the classic LMM can then be conducted on them, which generalizes the theoretical framework of LMM to complex data analysis. A number of simulation studies indicate the feasibility and effectiveness of the proposed model. We further illustrate its practical utility in a real data study on the Chinese stock market and show that the proposed method can enhance the performance and interpretability of the regression for complex data with diversified characteristics.
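A small sketch of the basis-expansion-plus-LMM idea: a curve-like covariate is reduced to a few basis coefficients, which then enter an ordinary linear mixed-effects model with a subject-level random intercept. The Fourier basis, the number of basis functions, and the simulated data are illustrative assumptions, not the paper's specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_obs, n_basis, grid = 30, 5, 3, np.linspace(0, 1, 50)

rows = []
for subj in range(n_subj):
    u = rng.normal(scale=0.5)                                # random intercept
    for t in range(n_obs):
        curve = rng.normal(size=grid.size).cumsum() / 10     # functional covariate
        # Project the observed curve onto a small Fourier basis.
        basis = np.column_stack([np.ones_like(grid)] +
                                [np.sin(2 * np.pi * k * grid) for k in range(1, n_basis)])
        coef, *_ = np.linalg.lstsq(basis, curve, rcond=None)
        y = 1.0 + 0.8 * coef[0] - 0.5 * coef[1] + u + rng.normal(scale=0.3)
        rows.append(dict(subj=subj, y=y, b0=coef[0], b1=coef[1], b2=coef[2]))

df = pd.DataFrame(rows)
# Classic LMM on the basis coefficients, as in the basis-expansion approach.
fit = smf.mixedlm("y ~ b0 + b1 + b2", df, groups="subj").fit()
print(fit.summary())
```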
On November 13, 2016, an Mw 7.8 earthquake struck Kaikoura in the South Island of New Zealand. By means of back-projection of array recordings, ASTF analysis of global seismic recordings, and joint inversion of global seismic data and co-seismic InSAR data, we investigated the complexity of the earthquake source. The results show that the 2016 Mw 7.8 Kaikoura earthquake ruptured for about 100 s unilaterally from south to northeast (~N28°–33°E), producing a rupture area about 160 km long and about 50 km wide and releasing a scalar moment of 1.01×10²¹ N·m. In particular, the rupture area consisted of two slip asperities, one close to the initial rupture point with a maximal slip value of ~6.9 m, and the other far away to the northeast with a maximal slip value of ~9.3 m. The first asperity slipped for about 65 s, and the second one started 40 s after the first had initiated; the two slipped simultaneously for about 25 s. Furthermore, the first had a nearly pure thrust slip while the second had both thrust and strike slip. Interestingly, the rupture velocity was not constant, and the whole process may be divided into 5 stages in which the velocities were estimated to be 1.4 km/s, 0 km/s, 2.1 km/s, 0 km/s and 1.1 km/s, respectively. The high-frequency sources were distributed nearly along the lower edge of the rupture area, the high-frequency radiation mainly occurred when the asperities started to rupture, and it seemed that no high-frequency energy was radiated when the rupture was about to stop.
Complex engineered systems are often difficult to analyze and design due to the tangled interdependencies among their subsystems and components. Conventional design methods often need exact modeling or accurate structure decomposition, which limits their practical application. The rapid expansion of data makes utilizing data to guide and improve system design indispensable in practical engineering. In this paper, a data-driven uncertainty evaluation approach is proposed to support the design of complex engineered systems. The core of the approach is a data-mining-based uncertainty evaluation method that predicts the uncertainty level of a specific system design by analyzing association relations along different system attributes and synthesizing the information entropy of the covered attribute areas, from which a quantitative measure of system uncertainty can be obtained. Monte Carlo simulation is introduced to obtain the uncertainty extrema, and the possible data distributions under different situations are discussed in detail. The uncertainty values can be normalized using the simulation results and then used to evaluate different system designs. A prototype system is established, and two case studies have been carried out. The case of an inverted pendulum system validates the effectiveness of the proposed method, and the case of an oil sump design shows the practicability when two or more design plans need to be compared. This research can be used to evaluate the uncertainty of complex engineered systems relying entirely on data, and is ideally suited for plan selection and performance analysis in system design.
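A rough sketch of the entropy-plus-Monte-Carlo part of the approach: score the data associated with a candidate design by the Shannon entropy of the attribute areas it covers, and normalize the score against Monte Carlo extrema so that different designs become comparable. The binning scheme and the sampling bounds are assumptions, and the association-rule analysis is omitted:

```python
import numpy as np

def attribute_entropy(samples, bins=10):
    """Shannon entropy of the discretized attribute area covered by the data."""
    hist, _ = np.histogramdd(samples, bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
design_data = rng.normal(size=(500, 2))            # data associated with one design

# Monte Carlo extrema over many random data sets of the same size and bounds.
scores = [attribute_entropy(rng.uniform(-3, 3, size=(500, 2))) for _ in range(200)]
lo, hi = min(scores), max(scores)

score = attribute_entropy(design_data)
normalized = (score - lo) / (hi - lo)              # comparable across designs
print(score, normalized)
```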
In studies of HIV, interval-censored data occur naturally: HIV infection time is not usually known exactly, only that it occurred before the survey, within some time interval, or had not occurred at the time of the survey. Infections are often clustered within geographical areas such as enumerator areas (EAs), thus inducing unobserved frailty. In this paper we consider an approach for estimating parameters when infection time is unknown and assumed correlated within an EA, where the dependency is modeled as frailties, assuming a normal distribution for the frailties and a Weibull distribution for the baseline hazards. The data were from a household-based population survey that used a multi-stage stratified sample design to randomly select 23,275 interviewed individuals from 10,584 households, of whom 15,851 interviewed individuals were further tested for HIV (crude prevalence = 9.1%). A further test conducted among those who tested HIV positive found 181 (12.5%) recently infected. Results show a high degree of heterogeneity in the distribution of HIV between EAs, translating to a modest correlation of 0.198. Intervention strategies should target geographical areas that contribute disproportionately to the HIV epidemic. Further research needs to identify such hot-spot areas and understand what factors make them prone to HIV.
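A compact sketch of the likelihood building block implied above: the interval-censored contribution of one enumerator area under a Weibull baseline hazard with a shared normal frailty, with the frailty integrated out by Gauss–Hermite quadrature. The parameterization and quadrature order are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(20)          # weight function exp(-z^2 / 2)
norm_w = weights / np.sqrt(2 * np.pi)    # weights for a standard-normal expectation

def cum_hazard(t, shape, rate):
    """Weibull cumulative baseline hazard H0(t) = rate * t**shape."""
    return rate * np.asarray(t, dtype=float) ** shape

def cluster_loglik(intervals, X, beta, shape, rate, sigma):
    """Log-likelihood of one enumerator area.
    intervals: (n, 2) array of (L, R); R = np.inf means right-censored.
    X: (n, p) covariates; frailty b ~ N(0, sigma^2) shared within the area."""
    L, R = intervals[:, 0], intervals[:, 1]
    eta = X @ beta
    like = 0.0
    for z, w in zip(nodes, norm_w):
        lin = np.exp(eta + sigma * z)
        S_L = np.exp(-cum_hazard(L, shape, rate) * lin)
        S_R = np.where(np.isinf(R), 0.0, np.exp(-cum_hazard(R, shape, rate) * lin))
        like += w * np.prod(S_L - S_R)   # S(L) - S(R) per subject, product over the EA
    return np.log(like)

iv = np.array([[0.0, 2.0],      # infected some time in (0, 2]
               [1.0, np.inf]])  # not yet infected by time 1 (right-censored)
X = np.array([[1.0], [0.0]])
print(cluster_loglik(iv, X, beta=np.array([0.3]), shape=1.2, rate=0.1, sigma=0.5))
```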
Baddeleyite is an important mineral geochronometer. It is valued in U-Pb (ID-TIMS) geochronology more than zircon because of its magmatic origin, while zircon can be metamorphic, hydrothermal or occur as xenocrysts. Detailed mineralogical (BSE, KL, etc.) research of baddeleyite started in the Fennoscandian Shield in the 1990s. The mineral was first extracted from the Paleozoic Kovdor deposit, the second-biggest baddeleyite deposit in the world after Phalaborwa (2.1 Ga), South Africa, and was successfully introduced into the U-Pb systematics. This study provides new U-Pb and LA-ICP-MS data on Archean Ti-Mgt and BIF deposits, Paleoproterozoic layered PGE intrusions with Pt-Pd and Cu-Ni reefs, and Paleozoic complex deposits (baddeleyite, apatite, foscorite ores, etc.) in the NE Fennoscandian Shield. Data on REE concentrations in baddeleyite and on the closure temperature of the U-Pb system are also provided. It is shown that baddeleyite plays an important role in the geological history of the Earth, in particular in the break-up of supercontinents.
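For orientation, the single-ratio 206Pb/238U model age underlying U-Pb dating can be written in a few lines; the measured ratio used below is invented for illustration and is not data from this study:

```python
import math

LAMBDA_238 = 1.55125e-10          # 238U decay constant, 1/yr (Jaffey et al., 1971)

def pb206_u238_age_ma(ratio_206pb_238u):
    """Model age in Ma from a radiogenic 206Pb*/238U ratio."""
    return math.log(1.0 + ratio_206pb_238u) / LAMBDA_238 / 1e6

print(round(pb206_u238_age_ma(0.37), 1))   # ~2029 Ma, broadly Paleoproterozoic
```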
Complex survey designs often involve unequal selection probabilities of clusters or units within clusters. When estimating models for complex survey data, scaled weights are incorporated into the likelihood, producing a pseudo-likelihood. In a 3-level weighted analysis for a binary outcome, we implemented two methods for scaling the sampling weights in the National Health Survey of Pakistan (NHSP). For the NHSP with health care utilization as a binary outcome, we found age, gender, household (HH) goods, urban/rural status, community development index, province and marital status to be significant predictors of health care utilization (p-value < 0.05). The variance of the random intercepts using scaling method 1 is estimated as 0.0961 (standard error 0.0339) at the PSU level and 0.2726 (standard error 0.0995) at the household level. Both estimates are significantly different from zero (p-value < 0.05) and indicate considerable heterogeneity in health care utilization with respect to households and PSUs. The results of the NHSP data analysis showed that all three analyses, weighted (two scaling methods) and un-weighted, converged to almost identical results with few exceptions. This may have occurred because of the large number of 3rd- and 2nd-level clusters and the relatively small ICC. We performed a simulation study to assess the effect of varying prevalence and intra-class correlation coefficients (ICCs) on the bias of fixed-effect parameters and variance components of a multilevel pseudo maximum likelihood (weighted) analysis. The simulation results showed that the performance of the scaled weighted estimators is satisfactory for both scaling methods. Incorporating simulation into the analysis of complex multilevel surveys allows the integrity of the results to be tested and is recommended as good practice.
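For reference, two commonly used within-cluster weight-scaling rules (cf. Pfeffermann et al., 1998) look as follows; whether these are exactly the two methods applied to the NHSP is an assumption made for illustration:

```python
import numpy as np

def scale_method_1(w):
    """Scale so the weights sum to the cluster sample size n_j."""
    w = np.asarray(w, dtype=float)
    return w * len(w) / w.sum()

def scale_method_2(w):
    """Scale so the weights sum to the 'effective' cluster sample size."""
    w = np.asarray(w, dtype=float)
    return w * w.sum() / (w ** 2).sum()

w = np.array([1.5, 2.0, 4.0, 0.5])        # raw conditional weights within one PSU
print(scale_method_1(w).sum())             # 4.0  (= n_j)
print(scale_method_2(w).sum())             # sum(w)^2 / sum(w^2) ≈ 2.84
```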
In this paper, we analyze the complexity and entropy of different data compression algorithms: LZW, Huffman, fixed-length code (FLC), and Huffman after using fixed-length code (HFLC). We test these algorithms on files of different sizes and conclude that LZW is the best one at all compression scales we tested, especially on large files, followed by Huffman, HFLC, and FLC, respectively. Data compression is still an important research topic with many applications. We therefore suggest continuing research in this field, trying to combine two techniques to obtain a better one, or using another source mapping (Hamming), such as embedding a linear array into a hypercube, together with good techniques such as Huffman coding, to reach good results.
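A small illustration of the entropy side of such an analysis: the zeroth-order Shannon entropy of a byte stream bounds what symbol-by-symbol codes like Huffman or FLC can achieve, while dictionary coders can do better on repetitive data. zlib (DEFLATE) stands in for LZW here only because the Python standard library has no LZW implementation:

```python
import math
import zlib
from collections import Counter

def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Zeroth-order entropy H = -sum p * log2(p) over the byte distribution."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

data = b"abracadabra " * 400
print(shannon_entropy_bits_per_byte(data))         # symbol-by-symbol lower bound
print(8 * len(zlib.compress(data)) / len(data))    # bits per byte actually achieved
```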
We deal with the problem of pinning sampled-data synchronization for a complex network with probabilistic time-varying coupling delay. The sampling period considered here is assumed to be less than a given bound. Without using the Kronecker product, a new synchronization error system is constructed by using the property of the random variable and the input delay approach. Based on the Lyapunov theory, a delay-dependent pinning sampled-data synchronization criterion is derived in terms of linear matrix inequalities (LMIs) that can be solved effectively by using the MATLAB LMI toolbox. Numerical examples are provided to demonstrate the effectiveness of the proposed scheme.
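As a minimal illustration of the LMI machinery such a criterion is checked with (not the delay-dependent criterion itself), a Lyapunov-type LMI can be tested for feasibility with a convex solver; cvxpy replaces the MATLAB LMI toolbox here purely for illustration, and the error-system matrix is an assumed example:

```python
import numpy as np
import cvxpy as cp

A = np.array([[-2.0, 1.0],
              [ 0.0, -1.5]])           # assumed stable synchronization-error matrix

P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),                      # P positive definite
               A.T @ P + P @ A << -eps * np.eye(2)]       # Lyapunov LMI
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status, P.value)             # 'optimal' means the LMI is feasible
```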