In e-commerce, multidimensional data analysis based on Web data requires integrating various data sources, such as XML data and relational data, at the conceptual level. A conceptual data description approach to the multidimensional data model, the UML galaxy diagram, is presented in order to conduct multidimensional data analysis for multiple subjects. The approach is illustrated using a case of a 2_roots UML galaxy diagram for the marketing analysis of TV products involving one retailer and several suppliers.
The data structures and semantics of traditional data models cannot effectively represent a data warehouse, so it is difficult for them to support online analytical processing (OLAP). This paper proposes a new multidimensional data model based on partial orderings and mappings. The model fully expresses the complex data structures and semantics of a data warehouse and provides an algebra of OLAP operations at its core, supporting sequences of complex aggregation operations across hierarchy levels, which effectively supports OLAP applications. The model also supports the concept of aggregation function constraints and provides a constraint mechanism for hierarchy aggregation functions.
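The hierarchy-with-mappings view described above can be made concrete: each hierarchy level maps members to their parents at the next coarser level, and an OLAP roll-up aggregates measures along that mapping. A minimal illustrative sketch in Python (the names `roll_up`, `facts`, and `mapping` are ours, not the paper's):

```python
from collections import defaultdict

def roll_up(facts, mapping):
    """Aggregate (member -> measure) facts one level up a dimension
    hierarchy, following the member -> parent mapping that encodes
    the partial ordering between levels."""
    totals = defaultdict(float)
    for member, value in facts.items():
        totals[mapping[member]] += value
    return dict(totals)

# Example: roll city-level sales up to the country level.
sales = {'shanghai': 3.0, 'beijing': 2.0, 'paris': 4.0}
city_to_country = {'shanghai': 'china', 'beijing': 'china', 'paris': 'france'}
print(roll_up(sales, city_to_country))
```

Composing several such mappings yields the multi-level aggregation sequences the model supports.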
This paper investigates the problem of collecting multidimensional data throughout time (i.e., longitudinal studies) for the fundamental task of frequency estimation under Local Differential Privacy (LDP) guarantees. Contrary to frequency estimation of a single attribute, the multidimensional aspect demands particular attention to the privacy budget. Besides, when collecting user statistics longitudinally, privacy progressively degrades. Indeed, the "multiple" settings in combination (i.e., many attributes and several collections throughout time) impose several challenges, for which this paper proposes the first solution for frequency estimates under LDP. To tackle these issues, we extend the analysis of three state-of-the-art LDP protocols (Generalized Randomized Response, GRR; Optimized Unary Encoding, OUE; and Symmetric Unary Encoding, SUE) for both longitudinal and multidimensional data collections. While the known literature uses OUE and SUE for two rounds of sanitization (a.k.a. memoization), i.e., L-OUE and L-SUE, respectively, we analytically and experimentally show that starting with OUE and then using SUE provides higher data utility (i.e., L-OSUE). Also, for attributes with small domain sizes, we propose Longitudinal GRR (L-GRR), which provides higher utility than the other protocols based on unary encoding. Last, we also propose a new solution named Adaptive LDP for LOngitudinal and Multidimensional FREquency Estimates (ALLOMFREE), which randomly samples a single attribute to be sent with the whole privacy budget and adaptively selects the optimal protocol, i.e., either L-GRR or L-OSUE. As shown in the results, ALLOMFREE consistently and considerably outperforms the state-of-the-art L-SUE and L-OUE protocols in the quality of the frequency estimates.
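For context, the GRR protocol mentioned above can be sketched in a few lines: a user reports the true value with probability p = e^ε / (e^ε + k − 1) over a domain of size k, and a uniformly random other value otherwise; the collector then inverts the perturbation to obtain unbiased frequency estimates. A minimal single-round sketch (function names are ours; the longitudinal variants add a second round of sanitization):

```python
import math
import random
from collections import Counter

def grr_perturb(value, domain, epsilon):
    """Generalized Randomized Response: report the true value with
    probability p = e^eps / (e^eps + k - 1), otherwise a uniformly
    random *other* value from the domain."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p:
        return value
    return random.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Unbiased frequency estimates from the perturbed reports:
    f_hat(v) = (count(v)/n - q) / (p - q), with q = (1 - p)/(k - 1)."""
    k, n = len(domain), len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    q = (1 - p) / (k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

With a large ε the reports are almost always truthful and the estimates closely match the empirical frequencies; as ε shrinks, the variance of the estimates grows.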
Recently, the expertise accumulated in the field of geovisualization has found application in the visualization of abstract multidimensional data, through so-called spatialization methods. Spatialization methods aim at visualizing multidimensional data in low-dimensional representational spaces by making use of spatial metaphors and applying dimension-reduction techniques. Spatial metaphors are able to provide a metaphoric framework for the visualization of information at different levels of granularity. The present paper investigates how the issue of granularity is handled in representative examples of spatialization methods. Furthermore, this paper introduces the prototyping tool Geo-Scape, which provides an interactive spatialization environment for representing and exploring multidimensional data at different levels of granularity, making use of a kernel density estimation technique and of the landscape "smoothness" metaphor. A demonstration scenario is presented next to show how Geo-Scape helps to discover knowledge in a large set of data, by grouping items into meaningful clusters on the basis of a similarity measure and organizing them at different levels of granularity.
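A kernel density estimate like the one underlying the landscape metaphor is straightforward: each spatialized 2D point contributes a Gaussian bump, and the bandwidth controls the granularity of the resulting terrain (larger bandwidth, coarser and smoother landscape). An illustrative sketch; the names and API are ours, not Geo-Scape's:

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a Gaussian kernel density estimator over 2D points.
    The returned density(x, y) is the 'height' of the landscape,
    smoothed according to the chosen bandwidth."""
    norm = 1.0 / (len(points) * 2 * math.pi * bandwidth ** 2)
    def density(x, y):
        return norm * sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
            for px, py in points)
    return density

# A tight cluster near the origin and one isolated point.
density = gaussian_kde_2d([(0, 0), (0.2, 0.1), (5, 5)], bandwidth=0.5)
```

Evaluating the estimator on a grid and rendering the heights as terrain yields the "smoothness" landscape; re-running with a different bandwidth changes the level of granularity.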
Purpose: Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance. Additionally, the interpretability of such models presents a persistent challenge. Design/methodology/approach: This paper proposes two innovative dimensionality reduction models based on integer programming (DRMBIP). These models assess compactness through the correlation of each indicator with its class center, while separation is evaluated by the correlation between different class centers. In contrast to DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to optimally balance compactness and separation. Findings: This study, using data from the Global Health Observatory (GHO), investigates 141 indicators that influence life expectancy. The findings reveal that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness, and it remains compatible with other models. Additionally, DRMBIP-v finds the optimal result, showing exceptional separation. Visualization of the results reveals that all classes have high compactness. Research limitations: DRMBIP-p requires the input of a correlation threshold parameter, which plays a pivotal role in the effectiveness of the final dimensionality reduction results. In DRMBIP-v, changing the threshold parameter into a variable potentially emphasizes either separation or compactness, which necessitates a manual adjustment of the overflow component within the objective function. Practical implications: The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, it demonstrates the potential to assist data miners in reducing data dimensions. Originality/value: To our knowledge, this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering. It not only has applications in life expectancy but also offers clear advantages in data mining work that requires precise class centers.
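The compactness notion described above, the correlation of each indicator with its class center, can be sketched directly; the function names and data layout below are illustrative, not the paper's formulation:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length indicator vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def compactness(indicators, labels, centers):
    """Mean correlation of each indicator vector with the center
    indicator of its assigned class: higher is more compact."""
    return sum(pearson(v, centers[labels[i]])
               for i, v in enumerate(indicators)) / len(indicators)
```

In the integer programming models, terms of this form enter the objective, while separation penalizes correlation between distinct class centers.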
Dimensionality reduction is often used to project time series data from multidimensional to two-dimensional space to generate visual representations of the temporal evolution. In this context, we address the problem of multidimensional time series visualization by presenting a new method to show and handle projection errors introduced by dimensionality reduction techniques on multidimensional temporal data. For visualization, subsequent time instances are rendered as dots that are connected by lines or curves to indicate the temporal dependencies. However, inevitable projection artifacts may lead to poor visualization quality and misinterpretation of the temporal information. Wrongly projected data points, inaccurate variations in the distances between projected time instances, and intersections of connecting lines could lead to wrong assumptions about the original data. We adapt local and global quality metrics to measure the visual quality along the projected time series, and we introduce a model to assess the projection error at intersecting lines. These serve as a basis for our new uncertainty visualization techniques that use different visual encodings and interactions to indicate, communicate, and work with the visualization uncertainty from projection errors and artifacts along the timeline of data points, their connections, and intersections. Our approach is agnostic to the projection method and works for linear and non-linear dimensionality reduction methods alike.
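One simple local quality measure in the spirit of those the paper adapts compares normalized pairwise distances before and after projection; per-point discrepancies then flag poorly projected time instances. This is a generic stress-style sketch under our own assumptions, not necessarily the paper's exact metric:

```python
import math

def pointwise_stress(high, low):
    """For each time instance, the mean absolute discrepancy between
    its pairwise distances in the original space and in the 2D
    projection (both normalized by the maximum distance). Values near
    zero mean the instance is projected faithfully."""
    def normalized_dists(pts):
        n = len(pts)
        d = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
        m = max(max(row) for row in d) or 1.0
        return [[v / m for v in row] for row in d]
    dh, dl = normalized_dists(high), normalized_dists(low)
    n = len(high)
    return [sum(abs(dh[i][j] - dl[i][j]) for j in range(n) if j != i) / (n - 1)
            for i in range(n)]
```

Mapping such per-point scores to color or opacity along the projected timeline is one way to encode the uncertainty the paper discusses.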
The Internet of Things (IoT) implies a worldwide network of interconnected objects, uniquely addressable via standard communication protocols. The prevalence of IoT is bound to generate large amounts of multisource, heterogeneous, dynamic, and sparse data. However, IoT offers little practical benefit without the ability to integrate, fuse, and glean useful information from such massive amounts of data. Accordingly, to prepare us for the imminent invasion of things, a tool called data fusion can be used to manipulate and manage such data in order to improve process efficiency and provide advanced intelligence. To achieve an acceptable quality of intelligence, diverse and voluminous data have to be combined and fused. Therefore, it is imperative to improve the computational efficiency of fusing and mining multidimensional data. In this paper, we propose an efficient multidimensional fusion algorithm for IoT data based on partitioning. The basic concept involves the partitioning of dimensions (attributes): a big data set with higher dimensionality can be transformed into a certain number of relatively smaller data subsets that can be easily processed. Then, based on the partitioning of dimensions, the discernibility matrices of all data subsets in rough set theory are computed to obtain their core attribute sets. Furthermore, a global core attribute set can be determined. Finally, attribute reduction and rule extraction methods are used to obtain the fusion results. The correctness and effectiveness of this algorithm are illustrated by proving several theorems and by simulation.
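The core-attribute step can be illustrated with a simplified discernibility criterion from rough set theory: an attribute belongs to the core if some pair of objects with different decision values is discerned by that attribute alone. The sketch below, including the per-block union, is our simplification of the idea, not the paper's actual implementation:

```python
from itertools import combinations

def core_attributes(table, attrs, decision):
    """Simplified discernibility-matrix criterion: an attribute is in
    the core if it is the *only* attribute (among attrs) distinguishing
    some pair of objects whose decision values differ."""
    core = set()
    for x, y in combinations(table, 2):
        if x[decision] == y[decision]:
            continue
        discerning = [a for a in attrs if x[a] != y[a]]
        if len(discerning) == 1:
            core.add(discerning[0])
    return core

def partitioned_core(table, attr_blocks, decision):
    """Partition the attribute set into blocks, compute each block's
    core separately, and union them into a global core candidate,
    mirroring the divide-and-conquer idea described above."""
    global_core = set()
    for block in attr_blocks:
        global_core |= core_attributes(table, block, decision)
    return global_core
```

Because each block is small, the pairwise discernibility scan stays cheap even when the full attribute set is large, which is the point of the partitioning.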
The ongoing quest for higher data storage density has led to a plethora of innovations in the field of optical data storage. This review paper provides a comprehensive overview of recent advancements in next-generation optical data storage, offering insights into various technological roadmaps. We pay particular attention to multidimensional and superresolution approaches, each of which uniquely addresses the challenge of dense storage. The multidimensional approach exploits multiple parameters of light, allowing multiple bits of information to be stored within a single voxel while still adhering to the diffraction limit. Alternatively, superresolution approaches leverage the photoexcitation and photoinhibition properties of materials to create diffraction-unlimited data voxels. We conclude by summarizing the immense opportunities these approaches present, while also outlining the formidable challenges they face in the transition to industrial applications.
Comprehensive characterization of metabolites and metabolic profiles in plasma has considerable significance in determining the efficacy and safety of traditional Chinese medicine (TCM) in vivo. However, this process is usually hindered by the insufficient characteristic fragments of metabolites, ubiquitous matrix interference, and complicated screening and identification procedures for metabolites. In this study, an effective strategy was established to systematically characterize the metabolites, deduce the metabolic pathways, and describe the metabolic profiles of bufadienolides isolated from Venenum Bufonis in vivo. The strategy was divided into five steps. First, the blank and test plasma samples were injected into an ultra-high performance liquid chromatography/linear trap quadrupole-Orbitrap mass spectrometry (MS) system in full scan mode five consecutive times to screen for valid matrix compounds and metabolites. Second, an extension-mass defect filter model was established to obtain the targeted precursor ions on the list of bufadienolide metabolites, which filtered out approximately 39% of the interfering ions. Third, an acquisition model was developed and used to trigger more tandem MS (MS/MS) fragments of precursor ions based on the targeted ion list. This acquisition mode enhanced the acquisition capability to approximately four times that of the regular data-dependent acquisition mode. Fourth, the acquired data were imported into Compound Discoverer software for identification of metabolites with metabolic network prediction. The main in vivo metabolic pathways of bufadienolides were elucidated. A total of 147 metabolites were characterized, and the main biotransformation reactions of bufadienolides were hydroxylation, dihydroxylation, and isomerization. Finally, the main prototype bufadienolides in plasma at different time points were determined using LC-MS/MS, and the metabolic profiles were clearly identified. This strategy could be widely used to elucidate the metabolic profiles of TCM preparations or Chinese patent medicines in vivo and provide critical data for rational drug use.
People's attitudes towards public events or products may change over time, rather than staying in the same state. Understanding how sentiments change over time is an interesting and important problem with many applications. Given a certain public event or product, a user's sentiments expressed in a microblog stream can be regarded as a vector. In this paper, we define a novel problem of sentiment evolution analysis and develop a simple yet effective method to detect user-level sentiment evolution for public events. We first propose a multidimensional sentiment model with a hierarchical structure to model users' complicated sentiments. Based on this model, we use the FP-growth algorithm to mine frequent sentiment patterns and perform sentiment evolution analysis by Kullback-Leibler divergence. Moreover, we develop an improved Affinity Propagation algorithm to detect why people change their sentiments. Experimental evaluations on real data sets show that sentiment evolution can be detected effectively using the method proposed in this article.
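The Kullback-Leibler divergence used for the evolution analysis can be computed directly on two sentiment distributions, for example a user's sentiment histograms in two time windows; a large divergence signals a sentiment shift. A minimal sketch with additive smoothing (the smoothing is our choice, not necessarily the paper's):

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) over a shared sentiment vocabulary; the small eps
    avoids log(0) for sentiments unseen in one of the two windows."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0) + eps) *
               math.log((p.get(k, 0) + eps) / (q.get(k, 0) + eps))
               for k in keys)

# Hypothetical sentiment distributions of one user in two time windows.
window_a = {'joy': 0.7, 'anger': 0.3}
window_b = {'joy': 0.2, 'anger': 0.8}
```

Comparing `kl_divergence(window_a, window_b)` against a threshold is one straightforward way to flag that a user's sentiment has evolved between the two windows.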
Scatterplots and scatterplot matrices have been popular methods for showing statistical graphics and for exposing patterns in multivariate data. A recent technique, called Linkable Scatterplots, offers an interesting approach to interactive visual exploration: it provides a set of necessary plot panels on demand, together with interaction, linking, and brushing. This article presents a controlled study with a mixed-model design to evaluate effectiveness and user experience in visual exploration when using Sequential-Scatterplots, where a single plot is shown at a time; Multiple-Scatterplots, where a specified number of plots is shown; and Simultaneous-Scatterplots, where all plots are shown as a scatterplot matrix. Results from the study demonstrate higher accuracy with the Multiple-Scatterplots visualization, particularly in comparison with the Simultaneous-Scatterplots. While the time taken to complete tasks was longer with the Multiple-Scatterplots technique than with the simpler Sequential-Scatterplots, Multiple-Scatterplots is inherently more accurate. Moreover, the Multiple-Scatterplots technique was the most highly preferred and most positively experienced technique in this study. Overall, the results support the strength of Multiple-Scatterplots and highlight its potential as an effective data visualization technique for exploring multivariate data.
We present angle-uniform parallel coordinates, a data-independent technique that deforms the image plane of parallel coordinates so that the angles of linear relationships between two variables are linearly mapped along the horizontal axis of the parallel coordinates plot. Despite being a common method for visualizing multidimensional data, parallel coordinates are ineffective for revealing positive correlations, since the associated parallel coordinates points of such structures may be located at infinity in the image plane, and the asymmetric encoding of negative and positive correlations may lead to unreliable estimations. To address this issue, we introduce a transformation that bounds all points horizontally using an angle-uniform mapping and shrinks them vertically in a structure-preserving fashion; polygonal lines become smooth curves, and a symmetric representation of data correlations is achieved. We further propose a combined subsampling and density visualization approach to reduce visual clutter caused by overdrawing. Our method enables accurate visual pattern interpretation of data correlations, and its data-independent nature makes it applicable to all multidimensional datasets. The usefulness of our method is demonstrated using examples of synthetic and real-world datasets.
Audit logs differ from other software logs in that they record the most primitive events (i.e., system calls) in modern operating systems. Audit logs contain a detailed trace of an operating system and thus have received great attention from security experts and system administrators. However, the complexity and size of audit logs, which grow in real time, have hindered analysts from understanding and analyzing them. In this paper, we present a novel visual analytics system, LongLine, which enables interactive visual analysis of large-scale audit logs. LongLine lowers the interpretation barrier of audit logs by employing human-understandable representations (e.g., file paths and commands) instead of abstract indicators of operating systems (e.g., file descriptors), as well as by revealing the temporal patterns of the logs in a multi-scale fashion with meaningful granularities of time in mind (e.g., hourly, daily, and weekly). LongLine also streamlines comparative analysis between interesting subsets of logs, which is essential in detecting anomalous system behaviors. In addition, LongLine allows analysts to monitor the system state in a streaming fashion, keeping the latency between log creation and visualization under one minute. Finally, we evaluate our system through a case study and a scenario analysis with security experts.
A hierarchical clustering algorithm has been applied to identify X-ray diffraction (XRD) patterns from high-throughput characterization of combinatorial materials chips. As data quality is usually correlated with acquisition time, it is important to study hierarchical clustering performance as a function of data quality in order to optimize the efficiency of high-throughput experiments. This work investigated the effects of signal-to-noise ratio on the performance of hierarchical clustering using 29 distance metrics for XRD patterns from an Fe-Co-Ni ternary combinatorial materials chip. It is found that the clustering accuracies, evaluated by the F1 score, fluctuate only slightly as the signal-to-noise ratio varies from 15.5 dB to 22.3 dB under the experimental conditions. This suggests that although it may take 40-50 s to collect a visually high-quality diffraction pattern, the measurement time could be reduced to as low as 4 s without substantial loss in phase identification accuracy by hierarchical clustering. Among the 29 distance metrics, Pearson χ² shows the highest mean F1 score of 0.77 and the lowest standard deviation of 0.008. This shows that the distance matrices calculated by Pearson χ² are mainly controlled by the XRD peak shifting characteristics, and they are visualized by metric multidimensional scaling.
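As an illustration of the pipeline above, a symmetric Pearson χ²-style distance and a minimal single-linkage agglomerative clustering can be sketched as follows; the paper's actual implementation, χ² variant, and linkage choice may differ:

```python
def pearson_chi2(p, q):
    """Symmetric chi-squared-style distance between two nonnegative
    intensity vectors (e.g., XRD patterns sampled on the same grid)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def agglomerative(patterns, k, dist):
    """Minimal single-linkage agglomerative clustering: repeatedly
    merge the two closest clusters until k clusters remain.
    Returns clusters as lists of pattern indices."""
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(patterns[a], patterns[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

Running this on noisy patterns at varying signal-to-noise ratios and scoring the resulting partitions with the F1 measure reproduces, in miniature, the kind of robustness study the paper describes.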
Funding: This project was supported by the China Postdoctoral Science Foundation (2005037506) and the National Natural Science Foundation of China (70472029).
Funding: Supported by the Agence Nationale de la Recherche (ANR) (contract "ANR-17-EURE-0002"), by the Région Bourgogne-Franche-Comté CADRAN Project, and by the European Research Council (ERC) project HYPATIA under the European Union's Horizon 2020 research and innovation programme, grant agreement no. 835294.
Funding: Supported by the National Natural Science Foundation of China (No. 72371115) and the Natural Science Foundation of Jilin, China (No. 20230101184JC).
Funding: Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy, EXC-2075 (390740016).
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2011AA010101), the National Natural Science Foundation of China (Nos. 61103197, 61073009, and 61240029), the Science and Technology Key Project of Jilin Province (No. 2011ZDGG007), the Youth Foundation of Jilin Province of China (No. 201101035), the Fundamental Research Funds for the Central Universities of China (No. 200903179), the China Postdoctoral Science Foundation (No. 2011M500611), the 2011 Industrial Technology Research and Development Special Project of Jilin Province (No. 2011006-9), the 2012 National College Students' Innovative Training Program of China, and the European Union Framework Programme MONICA Project under Grant Agreement Number PIRSES-GA-2011-295222.
Funding: supported by the National Key Research and Development Program of China (No. 2022YFB2804300); the Creative Research Group Project of NSFC (No. 61821003); the Innovation Fund of the Wuhan National Laboratory for Optoelectronics; the Program for HUST Academic Frontier Youth Team; and the Innovation Project of Optics Valley Laboratory.
Abstract: The ongoing quest for higher data storage density has led to a plethora of innovations in the field of optical data storage. This review paper provides a comprehensive overview of recent advancements in next-generation optical data storage, offering insights into various technological roadmaps. We pay particular attention to multidimensional and superresolution approaches, each of which uniquely addresses the challenge of dense storage. The multidimensional approach exploits multiple parameters of light, allowing multiple bits of information to be stored within a single voxel while still adhering to the diffraction limit. Alternatively, superresolution approaches leverage the photoexcitation and photoinhibition properties of materials to create diffraction-unlimited data voxels. We conclude by summarizing the immense opportunities these approaches present, while also outlining the formidable challenges they face in the transition to industrial applications.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 81530095 and 81673591); the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA12020348); the National Standardization of Traditional Chinese Medicine Project (Grant No. ZYBZH-K-LN-01); and the Science and Technology Commission Foundation of Shanghai (Grant No. 15DZ0502800)
Abstract: Comprehensive characterization of metabolites and metabolic profiles in plasma is of considerable significance in determining the efficacy and safety of traditional Chinese medicine (TCM) in vivo. However, this process is usually hindered by insufficient characteristic fragments of metabolites, ubiquitous matrix interference, and complicated screening and identification procedures. In this study, an effective strategy was established to systematically characterize the metabolites, deduce the metabolic pathways, and describe the metabolic profiles of bufadienolides isolated from Venenum Bufonis in vivo. The strategy was divided into five steps. First, the blank and test plasma samples were injected into an ultra-high performance liquid chromatography/linear trap quadrupole-orbitrap-mass spectrometry (MS) system in full scan mode five consecutive times to screen for valid matrix compounds and metabolites. Second, an extension-mass defect filter model was established to obtain the targeted precursor ions from the list of bufadienolide metabolites, which removed approximately 39% of the interfering ions. Third, an acquisition model was developed and used to trigger more tandem MS (MS/MS) fragments of precursor ions based on the targeted ion list; this acquisition mode enhanced the acquisition capability approximately fourfold compared with the regular data-dependent acquisition mode. Fourth, the acquired data were imported into Compound Discoverer software for metabolite identification with metabolic network prediction, and the main in vivo metabolic pathways of bufadienolides were elucidated. A total of 147 metabolites were characterized, and the main biotransformation reactions of bufadienolides were hydroxylation, dihydroxylation, and isomerization. Finally, the main prototype bufadienolides in plasma at different time points were determined using LC-MS/MS, and the metabolic profiles were clearly identified. This strategy could be widely used to elucidate the metabolic profiles of TCM preparations or Chinese patent medicines in vivo and provide critical data for rational drug use.
Funding: The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped improve the quality of this paper. The research was supported in part by the National Basic Research Program of China (973 Program, Nos. 2013CB329601 and 2013CB329604), the National Natural Science Foundation of China (Nos. 91124002, 61372191, 61472433, 61202362, and 11301302), and the China Postdoctoral Science Foundation (No. 2013M542560). All opinions, findings, conclusions, and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.
Abstract: People's attitudes towards public events or products may change over time rather than staying in the same state. Understanding how sentiments change over time is an interesting and important problem with many applications. Given a certain public event or product, a user's sentiments expressed in a microblog stream can be regarded as a vector. In this paper, we define the novel problem of sentiment evolution analysis and develop a simple yet effective method to detect user-level sentiment evolution for public events. We first propose a multidimensional sentiment model with a hierarchical structure to model users' complicated sentiments. Based on this model, we use the FP-growth algorithm to mine frequent sentiment patterns and perform sentiment evolution analysis via the Kullback-Leibler divergence. Moreover, we develop an improved Affinity Propagation algorithm to detect why people change their sentiments. Experimental evaluations on real data sets show that sentiment evolution can be detected effectively using the proposed method.
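The evolution-detection step above compares a user's sentiment distribution across time windows with the Kullback-Leibler divergence. The following is a minimal sketch, not the paper's code: the smoothing constant, distribution values, and `kl_divergence` name are illustrative assumptions.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) between two discrete sentiment
    distributions; eps smoothing avoids log(0) for unseen sentiment classes."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    zp, zq = sum(p), sum(q)  # renormalize after smoothing
    return sum((pi / zp) * math.log((pi / zp) / (qi / zq))
               for pi, qi in zip(p, q))

# Sentiment distributions over (positive, neutral, negative) in two windows.
before = [0.6, 0.3, 0.1]
after  = [0.2, 0.3, 0.5]
shift = kl_divergence(before, after)
print(shift > 0)  # a large divergence flags a sentiment evolution point
```

A threshold on this divergence (chosen empirically) would then decide whether a user's sentiment has genuinely evolved between the two windows.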
Abstract: Scatterplots and scatterplot matrices have been widely used for presenting statistical graphics and for exposing patterns in multivariate data. A recent technique, called Linkable Scatterplots, offers an interesting approach to interactive visual exploration in which a set of necessary plot panels is provided on demand, together with interaction, linking, and brushing. This article presents a controlled study with a mixed-model design to evaluate effectiveness and user experience during visual exploration when using Sequential-Scatterplots, where a single plot is shown at a time; Multiple-Scatterplots, where a specified number of plots can be shown; and Simultaneous-Scatterplots, where all plots are shown as a scatterplot matrix. Results from the study demonstrated higher accuracy with the Multiple-Scatterplots visualization, particularly in comparison with the Simultaneous-Scatterplots. While the time taken to complete tasks was longer with the Multiple-Scatterplots technique than with the simpler Sequential-Scatterplots, Multiple-Scatterplots is inherently more accurate. Moreover, the Multiple-Scatterplots technique was the most highly preferred and most positively experienced technique in this study. Overall, the results support the strength of Multiple-Scatterplots and highlight its potential as an effective data visualization technique for exploring multivariate data.
Funding: support from the Data for Better Health Project of Peking University-Master Kong; YW from the National Natural Science Foundation of China (62132017); and DW from the Deutsche Forschungsgemeinschaft (DFG) Project-ID 251654672-TRR 161.
Abstract: We present angle-uniform parallel coordinates, a data-independent technique that deforms the image plane of parallel coordinates so that the angles of linear relationships between two variables are linearly mapped along the horizontal axis of the parallel coordinates plot. Despite being a common method for visualizing multidimensional data, parallel coordinates are ineffective for revealing positive correlations, since the associated parallel coordinates points of such structures may be located at infinity in the image plane, and the asymmetric encoding of negative and positive correlations may lead to unreliable estimations. To address this issue, we introduce a transformation that bounds all points horizontally using an angle-uniform mapping and shrinks them vertically in a structure-preserving fashion; polygonal lines become smooth curves, and a symmetric representation of data correlations is achieved. We further propose a combined subsampling and density visualization approach to reduce visual clutter caused by overdrawing. Our method enables accurate visual pattern interpretation of data correlations, and its data-independent nature makes it applicable to all multidimensional datasets. The usefulness of our method is demonstrated using examples of synthetic and real-world datasets.
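The horizontal part of the mapping described above can be sketched as follows. This is an assumption-laden illustration, not the authors' formula: the function name `angle_uniform_x` and the specific normalization are hypothetical, built only on the stated idea that angles of linear relationships are mapped linearly along the horizontal axis.

```python
import math

def angle_uniform_x(slope, width=1.0):
    """Map the slope m of a linear relationship y = m*x + b to a horizontal
    position so that angles atan(m) in (-pi/2, pi/2) spread linearly over
    [0, width].  In standard parallel coordinates the dual point of such a
    line lies at x = d/(1 - m), which diverges as m -> 1 (positive
    correlation); the angle-based mapping keeps every correlation bounded."""
    angle = math.atan(slope)                      # in (-pi/2, pi/2)
    return width * (angle + math.pi / 2) / math.pi

print(round(angle_uniform_x(0.0), 3))  # slope 0 lands at the centre, 0.5
print(round(angle_uniform_x(1.0), 3))  # perfect positive correlation stays finite, 0.75
```

The point of the sketch is the boundedness: every slope, including m = 1 where the classical dual point escapes to infinity, receives a finite horizontal coordinate, and positive and negative correlations are encoded symmetrically about the centre.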
Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1A2B2007153) and by the Hankuk University of Foreign Studies Research Fund.
Abstract: Audit logs differ from other software logs in that they record the most primitive events (i.e., system calls) in modern operating systems. Audit logs contain a detailed trace of an operating system and have thus received great attention from security experts and system administrators. However, the complexity and size of audit logs, which grow in real time, have hindered analysts from understanding and analyzing them. In this paper, we present a novel visual analytics system, LongLine, which enables interactive visual analyses of large-scale audit logs. LongLine lowers the interpretation barrier of audit logs by employing human-understandable representations (e.g., file paths and commands) instead of abstract operating-system indicators (e.g., file descriptors), as well as by revealing the temporal patterns of the logs in a multi-scale fashion with meaningful granularities of time in mind (e.g., hourly, daily, and weekly). LongLine also streamlines comparative analysis between interesting subsets of logs, which is essential in detecting anomalous system behaviors. In addition, LongLine allows analysts to monitor the system state in a streaming fashion, keeping the latency between log creation and visualization under one minute. Finally, we evaluate our system through a case study and a scenario analysis with security experts.
Funding: funded by the National Key Research and Development Program of China (Grant Nos. 2021YFB370-2102 and 2017YFB0701900).
Abstract: A hierarchical clustering algorithm has been applied to identify X-ray diffraction (XRD) patterns from high-throughput characterization of combinatorial materials chips. As data quality is usually correlated with acquisition time, it is important to study hierarchical clustering performance as a function of data quality in order to optimize the efficiency of high-throughput experiments. This work investigated the effects of signal-to-noise ratio on the performance of hierarchical clustering using 29 distance metrics for the XRD patterns from a Fe−Co−Ni ternary combinatorial materials chip. It is found that the clustering accuracies, evaluated by the F1 score, fluctuate only slightly as the signal-to-noise ratio varies from 15.5 to 22.3 dB under the experimental conditions. This suggests that although it may take 40-50 s to collect a visually high-quality diffraction pattern, the measurement time could be reduced to as low as 4 s without substantial loss in phase identification accuracy by hierarchical clustering. Among the 29 distance metrics, Pearson χ² shows the highest mean F1 score of 0.77 and the lowest standard deviation of 0.008. The distance matrices calculated with Pearson χ² are mainly controlled by the XRD peak-shifting characteristics and can be visualized by metric multidimensional scaling.
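The pairing of the Pearson χ² distance with hierarchical clustering described above can be illustrated compactly. This is a stdlib-only sketch under stated assumptions, not the paper's pipeline: the synthetic patterns, single-linkage strategy, and function names are hypothetical, and the Pearson χ² distance is taken in its common form d(x, y) = Σᵢ (xᵢ − yᵢ)² / (xᵢ + yᵢ).

```python
def pearson_chi2(x, y):
    """Pearson chi-squared distance between two diffraction patterns,
    d(x, y) = sum_i (x_i - y_i)^2 / (x_i + y_i), skipping empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)

def single_linkage(patterns, n_clusters):
    """Minimal agglomerative (single-linkage) clustering: repeatedly merge
    the two clusters whose closest members are nearest, until n_clusters
    remain.  O(n^3) brute force, fine for a small illustration."""
    clusters = [[i] for i in range(len(patterns))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(pearson_chi2(patterns[a], patterns[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

# Two synthetic "phases": a peak near bin 2 vs. a peak near bin 7.
patterns = [
    [0, 1, 9, 1, 0, 0, 0, 0],
    [0, 2, 8, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 9, 1],
    [0, 0, 0, 0, 0, 2, 8, 2],
]
print(sorted(sorted(c) for c in single_linkage(patterns, 2)))  # [[0, 1], [2, 3]]
```

Because the χ² denominator weights each bin by total intensity, small shifts in strong peaks dominate the distance, which is consistent with the paper's observation that this metric is mainly controlled by peak-shifting characteristics.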