Funding: This project was supported by the China Postdoctoral Science Foundation (2005037506) and the National Natural Science Foundation of China (70472029).
Abstract: In e-commerce, multidimensional data analysis based on Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram that considers the marketing analysis of TV products involving one retailer and several suppliers.
Funding: Supported by the National Natural Science Foundation of China (No. 72371115) and the Natural Science Foundation of Jilin, China (No. 20230101184JC).
Abstract: Purpose: Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance. Additionally, the interpretability of these models presents a persistent challenge. Design/methodology/approach: This paper proposes two innovative dimensionality reduction models based on integer programming (DRMBIP). These models assess compactness through the correlation of each indicator with its class center, while separation is evaluated by the correlation between different class centers. In contrast to DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to optimally balance compactness and separation. Findings: Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings reveal that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness. It also maintains compatibility with other models. Additionally, DRMBIP-v finds the optimal result, showing exceptional separation. Visualization of the results reveals that all classes have high compactness. Research limitations: DRMBIP-p requires the correlation threshold parameter as input, which plays a pivotal role in the effectiveness of the final dimensionality reduction results. In DRMBIP-v, changing the threshold parameter into a variable potentially emphasizes either separation or compactness. This necessitates a manual adjustment to the overflow component within the objective function. Practical implications: The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, this paper demonstrates its potential to assist data miners with the reduction of data dimensions. Originality/value: To our knowledge, this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering. It not only has applications in life expectancy, but also has obvious advantages in data mining work that requires precise class centers.
Funding: Supported by the Agence Nationale de la Recherche (ANR) (contract “ANR-17-EURE-0002”), by the Region of Bourgogne Franche-Comté CADRAN Project, and by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme, grant agreement n. 835294.
Abstract: This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, the “multiple” settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response [GRR], Optimized Unary Encoding [OUE], and Symmetric Unary Encoding [SUE]) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then applying SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
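As a rough illustration of the GRR building block named in the abstract above, the following Python sketch perturbs one categorical value per user and then debiases the aggregated counts. It covers only a single round of standard GRR; the memoization step of L-GRR, the unary-encoding protocols, and the attribute sampling of ALLOMFREE are not reproduced here. The function names `grr_perturb` and `grr_estimate` are hypothetical and chosen for this example only.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Report the true value with probability p, otherwise a uniformly
    chosen different value from the domain (standard GRR)."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimate: (observed share - q) / (p - q)."""
    k = len(domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = 1.0 / (math.exp(epsilon) + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    domain = ["a", "b", "c", "d"]
    true_values = ["a"] * 6000 + ["b"] * 2500 + ["c"] * 1000 + ["d"] * 500
    reports = [grr_perturb(v, domain, 2.0, rng) for v in true_values]
    for value, freq in grr_estimate(reports, domain, 2.0).items():
        print(value, round(freq, 3))
```

Because p + (k - 1)q = 1, the estimates always sum to one, although individual estimates can fall slightly below zero for rare values. The variance of the estimate grows with the domain size k, which is consistent with the abstract's remark that GRR is preferable for attributes with small domains.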
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2011AA010101); the National Natural Science Foundation of China (Nos. 61103197, 61073009, and 61240029); the Science and Technology Key Project of Jilin Province (No. 2011ZDGG007); the Youth Foundation of Jilin Province of China (No. 201101035); the Fundamental Research Funds for the Central Universities of China (No. 200903179); the China Postdoctoral Science Foundation (No. 2011M500611); the 2011 Industrial Technology Research and Development Special Project of Jilin Province (No. 2011006-9); the 2012 National College Students' Innovative Training Program of China; and the European Union Framework Program MONICA Project under Grant Agreement Number PIRSES-GA-2011-295222.
Abstract: The Internet of Things (IoT) implies a worldwide network of interconnected, uniquely addressable objects that communicate via standard protocols. The prevalence of IoT is bound to generate large amounts of multisource, heterogeneous, dynamic, and sparse data. However, IoT offers inconsequential practical benefits without the ability to integrate, fuse, and glean useful information from such massive amounts of data. Accordingly, to prepare for the imminent invasion of things, data fusion can be used to manipulate and manage such data in order to improve process efficiency and provide advanced intelligence. To reach an acceptable quality of intelligence, diverse and voluminous data have to be combined and fused. Therefore, it is imperative to improve the computational efficiency of fusing and mining multidimensional data. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. The basic concept involves the partitioning of dimensions (attributes), i.e., a big data set with higher dimensions can be transformed into a certain number of relatively smaller data subsets that can be easily processed. Then, based on the partitioning of dimensions, the discernibility matrices of all data subsets in rough set theory are computed to obtain their core attribute sets. Furthermore, a global core attribute set can be determined. Finally, attribute reduction and rule extraction methods are used to obtain the fusion results. The correctness and effectiveness of this algorithm are illustrated by proving a few theorems and by simulation.
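The core-attribute step mentioned in the abstract above can be sketched in Python as follows. This is a minimal illustration of the textbook rough-set construction on a single small decision table, assuming that the core consists of the attributes that appear as singleton entries of the discernibility matrix. The paper's dimension partitioning and its rule for combining subset cores into a global core are not described in the abstract and are not reproduced. The function names and the sample attributes (`temp`, `humidity`, `motion`) are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(rows, decisions, names):
    """For every pair of objects with different decisions, record the set
    of condition attributes on which the two objects differ."""
    entries = {}
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(
                names[a] for a in range(len(names)) if rows[i][a] != rows[j][a]
            )
            if diff:  # skip inconsistent pairs that cannot be discerned
                entries[(i, j)] = diff
    return entries


def core_attributes(entries):
    """An attribute is in the core if it is the only attribute that
    discerns some pair of objects (a singleton matrix entry)."""
    return {next(iter(e)) for e in entries.values() if len(e) == 1}


if __name__ == "__main__":
    names = ["temp", "humidity", "motion"]  # hypothetical sensor attributes
    rows = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
    decisions = ["yes", "no", "no", "yes"]
    matrix = discernibility_matrix(rows, decisions, names)
    print(sorted(core_attributes(matrix)))
```

In this toy table, objects 0 and 1 differ only in `humidity` yet have different decisions, so `humidity` cannot be dropped without losing the ability to tell them apart.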
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFB2804300), the Creative Research Group Project of NSFC (No. 61821003), the Innovation Fund of the Wuhan National Laboratory for Optoelectronics, the Program for HUST Academic Frontier Youth Team, and the Innovation Project of Optics Valley Laboratory.
Abstract: The ongoing quest for higher data storage density has led to a plethora of innovations in the field of optical data storage. This review paper provides a comprehensive overview of recent advancements in next-generation optical data storage, offering insights into various technological roadmaps. We pay particular attention to multidimensional and superresolution approaches, each of which uniquely addresses the challenge of dense storage. The multidimensional approach exploits multiple parameters of light, allowing for the storage of multiple bits of information within a single voxel while still adhering to diffraction limitation. Alternatively, superresolution approaches leverage the photoexcitation and photoinhibition properties of materials to create diffraction-unlimited data voxels. We conclude by summarizing the immense opportunities these approaches present, while also outlining the formidable challenges they face in the transition to industrial applications.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos.: 81530095 and 81673591), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No.: XDA12020348), the National Standardization of Traditional Chinese Medicine Project (Grant No.: ZYBZH-K-LN-01), and the Science and Technology Commission Foundation of Shanghai (Grant No.: 15DZ0502800).
Abstract: Comprehensive characterization of metabolites and metabolic profiles in plasma has considerable significance in determining the efficacy and safety of traditional Chinese medicine (TCM) in vivo. However, this process is usually hindered by the insufficient characteristic fragments of metabolites, ubiquitous matrix interference, and complicated screening and identification procedures for metabolites. In this study, an effective strategy was established to systematically characterize the metabolites, deduce the metabolic pathways, and describe the metabolic profiles of bufadienolides isolated from Venenum Bufonis in vivo. The strategy was divided into five steps. First, the blank and test plasma samples were injected into an ultra-high performance liquid chromatography/linear trap quadrupole-orbitrap-mass spectrometry (MS) system in the full scan mode five times in succession to screen for valid matrix compounds and metabolites. Second, an extension-mass defect filter model was established to obtain the targeted precursor ions of the list of bufadienolide metabolites, which removed approximately 39% of the interfering ions. Third, an acquisition model was developed and used to trigger more tandem MS (MS/MS) fragments of precursor ions based on the targeted ion list. The acquisition mode enhanced the acquisition capability approximately fourfold compared with the regular data-dependent acquisition mode. Fourth, the acquired data were imported into Compound Discoverer software for identification of metabolites with metabolic network prediction. The main in vivo metabolic pathways of bufadienolides were elucidated. A total of 147 metabolites were characterized, and the main biotransformation reactions of bufadienolides were hydroxylation, dihydroxylation, and isomerization. Finally, the main prototype bufadienolides in plasma at different time points were determined using LC-MS/MS, and the metabolic profiles were clearly identified. This strategy could be widely used to elucidate the metabolic profiles of TCM preparations or Chinese patent medicines in vivo and provide critical data for rational drug use.
Abstract: Scatterplots and scatterplot matrix methods have been widely used for showing statistical graphics and for exposing patterns in multivariate data. A recent technique, called Linkable Scatterplots, offers an interesting approach to interactive visual exploration: it provides a set of necessary plot panels on demand, together with interaction, linking, and brushing. This article presents a controlled study with a mixed-model design to evaluate the effectiveness and user experience of visual exploration when using Sequential-Scatterplots, in which a single plot is shown at a time; Multiple-Scatterplots, in which the number of plots can be specified and shown; and Simultaneous-Scatterplots, in which all plots are shown as a scatterplot matrix. Results from the study demonstrated higher accuracy using the Multiple-Scatterplots visualization, particularly in comparison with Simultaneous-Scatterplots. While the time taken to complete tasks was longer with the Multiple-Scatterplots technique than with the simpler Sequential-Scatterplots, Multiple-Scatterplots was more accurate. Moreover, the Multiple-Scatterplots technique was the most highly preferred and positively experienced technique in this study. Overall, the results support the strength of Multiple-Scatterplots and highlight its potential as an effective data visualization technique for exploring multivariate data.
Funding: Support from the Data for Better Health Project of Peking University-Master Kong; YW from the National Natural Science Foundation of China (62132017); DW from the Deutsche Forschungsgemeinschaft (DFG), Project-ID 251654672-TRR 161.
Abstract: We present angle-uniform parallel coordinates, a data-independent technique that deforms the image plane of parallel coordinates so that the angles of linear relationships between two variables are linearly mapped along the horizontal axis of the parallel coordinates plot. Despite being a common method for visualizing multidimensional data, parallel coordinates are ineffective for revealing positive correlations, since the associated parallel coordinates points of such structures may be located at infinity in the image plane and the asymmetric encoding of negative and positive correlations may lead to unreliable estimations. To address this issue, we introduce a transformation that bounds all points horizontally using an angle-uniform mapping and shrinks them vertically in a structure-preserving fashion; polygonal lines become smooth curves and a symmetric representation of data correlations is achieved. We further propose a combined subsampling and density visualization approach to reduce visual clutter caused by overdrawing. Our method enables accurate visual pattern interpretation of data correlations, and its data-independent nature makes it applicable to all multidimensional datasets. The usefulness of our method is demonstrated using examples of synthetic and real-world datasets.
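The problem described in the abstract above follows from the standard point-line duality of parallel coordinates, which the Python sketch below makes concrete. With two axes placed at x = 0 and x = d, all data points on the line x2 = m*x1 + c map to polylines that meet at the point (d/(1-m), c/(1-m)); this point lies between the axes only for negative slopes and escapes to infinity as m approaches 1. The second function, `angle_uniform_x`, is an illustrative assumption only: it places the slope angle atan(m) linearly on a unit interval to show what a bounded, symmetric horizontal encoding looks like, and it is not the transformation defined in the paper.

```python
import math


def dual_point(m, c, d=1.0):
    """Dual point of the line x2 = m*x1 + c in parallel coordinates with
    axes at x = 0 and x = d. Returns None when m == 1 (point at infinity)."""
    if math.isclose(m, 1.0):
        return None
    return (d / (1.0 - m), c / (1.0 - m))


def angle_uniform_x(m):
    """Illustrative assumption, not the paper's formula: map the slope
    angle atan(m) in (-pi/2, pi/2) linearly onto (0, 1)."""
    theta = math.atan(m)
    return (theta + math.pi / 2) / math.pi


if __name__ == "__main__":
    for slope in (-1.0, 0.5, 0.9, 0.99, 1.0, 2.0):
        print(slope, dual_point(slope, 0.2), round(angle_uniform_x(slope), 3))
```

Under the plain duality, slopes 0.9 and 0.99 land at x = 10 and x = 100 (for d = 1), far outside the plot, whereas the angle-based position stays bounded and treats m and -m symmetrically about the midpoint.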
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1A2B2007153) and by the Hankuk University of Foreign Studies Research Fund.
Abstract: Audit logs are different from other software logs in that they record the most primitive events (i.e., system calls) in modern operating systems. Audit logs contain a detailed trace of an operating system, and thus have received great attention from security experts and system administrators. However, the complexity and size of audit logs, which increase in real time, have hindered analysts from understanding and analyzing them. In this paper, we present a novel visual analytics system, LongLine, which enables interactive visual analyses of large-scale audit logs. LongLine lowers the interpretation barrier of audit logs by employing human-understandable representations (e.g., file paths and commands) instead of abstract indicators of operating systems (e.g., file descriptors), as well as by revealing the temporal patterns of the logs in a multi-scale fashion with meaningful granularities of time in mind (e.g., hourly, daily, and weekly). LongLine also streamlines comparative analysis between interesting subsets of logs, which is essential in detecting anomalous behaviors of systems. In addition, LongLine allows analysts to monitor the system state in a streaming fashion, keeping the latency between log creation and visualization under one minute. Finally, we evaluate our system through a case study and a scenario analysis with security experts.
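The multi-scale temporal view mentioned in the abstract above can be illustrated with a small Python sketch that counts events per hour, day, or ISO week. This is only a sketch of the general aggregation idea under simple assumptions (events are already parsed into `datetime` objects); it is not LongLine's implementation, and the function names `bucket` and `count_events` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def bucket(ts, granularity):
    """Return a label for the time bucket that contains the timestamp."""
    if granularity == "hourly":
        return ts.strftime("%Y-%m-%d %H:00")
    if granularity == "daily":
        return ts.strftime("%Y-%m-%d")
    if granularity == "weekly":
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def count_events(timestamps, granularity):
    """Count events per bucket at the requested granularity."""
    return Counter(bucket(t, granularity) for t in timestamps)


if __name__ == "__main__":
    events = [
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 40),
        datetime(2024, 1, 2, 9, 0),
        datetime(2024, 1, 8, 0, 0),
    ]
    for level in ("hourly", "daily", "weekly"):
        print(level, dict(count_events(events, level)))
```

The same event list yields a different profile at each granularity, which is the property a multi-scale display relies on when an analyst zooms between hourly, daily, and weekly views.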