Abstract: Together with the big data movement, many organizations collect their own big data and build distinctive applications. To provide smart services on top of big data, massive variable data should be well linked and organized to form a Data Ocean, which emphasizes the deep exploration of the relationships among unstructured data to support smart services. Currently, almost all of these applications have to deal with unstructured data by integrating various analysis and search techniques on massive storage and processing infrastructure at the application level, which greatly increases the difficulty and cost of application development. This paper presents D-Ocean, an unstructured data management system for the data ocean environment. D-Ocean has an open and scalable architecture, which consists of a core platform, pluggable components and auxiliary tools. It exploits a unified storage framework to store data in different kinds of data stores, integrates batch and incremental processing mechanisms to process unstructured data, and provides a combined search engine to conduct compound queries. Furthermore, a process model called RAISE is proposed to support the whole process of Repository, Analysis, Index, Search and Environment modeling, which can greatly simplify application development. Experiments and production use cases demonstrate the efficiency and usability of D-Ocean.
Abstract: To efficiently mine threat intelligence from the vast array of open-source cybersecurity analysis reports on the web, we have developed the Parallel Deep Forest-based Multi-Label Classification (PDFMLC) algorithm. Initially, open-source cybersecurity analysis reports are collected and converted into a standardized text format. Subsequently, five tactics category labels are annotated, creating a multi-label dataset for tactics classification. Addressing the low execution efficiency and limited scalability of the sequential deep forest algorithm, our PDFMLC algorithm employs broadcast variables and the Lempel-Ziv-Welch (LZW) algorithm, significantly enhancing its acceleration ratio. Furthermore, the PDFMLC algorithm incorporates label mutual information from the established dataset as input features. This captures latent label associations, significantly improving classification accuracy. Finally, we present the PDFMLC-based Threat Intelligence Mining (PDFMLC-TIM) method. Experimental results demonstrate that the PDFMLC algorithm exhibits exceptional node scalability and execution efficiency. In addition, the PDFMLC-TIM method proficiently conducts text classification on cybersecurity analysis reports, extracting tactics entities to construct comprehensive threat intelligence. As a result, threat intelligence formatted in STIX 2.1 is successfully produced.
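
The abstract above names the Lempel-Ziv-Welch (LZW) algorithm but does not say how PDFMLC applies it. The following is only a minimal sketch of textbook LZW compression and decompression, not the authors' implementation. The function names `lzw_compress` and `lzw_decompress` are hypothetical, and the sketch assumes the input contains only characters with code points below 256.

```python
from typing import Dict, List


def lzw_compress(text: str) -> List[int]:
    """Textbook LZW: emit one integer code per longest known prefix.

    Assumption: every character of `text` has a code point below 256.
    """
    dictionary: Dict[str, int] = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    codes: List[int] = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the current phrase while it is still known.
            current = candidate
        else:
            # Emit the longest known phrase and learn the new one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> str:
    """Rebuild the text by regrowing the same dictionary on the fly."""
    if not codes:
        return ""
    dictionary: Dict[int, str] = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the phrase being defined now.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        pieces.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(pieces)
```

Repetitive input compresses into fewer codes than it has characters, which is presumably why a scheme of this kind helps when data must be shipped between worker nodes; that motivation is an inference, not a statement from the abstract.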
Funding: This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HI21C1831), and the Soonchunhyang University Research Fund.
Abstract: Currently, relational database management systems (RDBMSs) face different challenges in application development due to the massive growth of unstructured and semi-structured data. This has introduced new DBMS categories, known as not only structured query language (NoSQL) DBMSs, which do not adhere to the relational model. The migration from relational databases to NoSQL databases is challenging due to the data complexity. This study aims to enhance the storage performance of RDBMSs in handling a variety of data. The paper presents two approaches. The first approach proposes a convenient representation for unstructured data storage. Several extensive experiments were conducted to assess the efficiency of this approach, which could result in substantial improvements in RDBMS storage. The second approach proposes using the JavaScript Object Notation (JSON) format to represent multivalued attributes and many-to-many (M:N) relationships in relational databases, creating a flexible schema for storing semi-structured data. The results indicate that the proposed approaches outperform similar approaches and improve data storage performance, which helps preserve software stability in huge organizations by improving existing software packages whose replacement may be highly costly.
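
The second approach in the abstract above, JSON for multivalued attributes and M:N relationships, can be illustrated with a small sketch. This is not the paper's schema: the `student` table, its columns, and the helper names `make_db`, `add_student` and `get_student` are hypothetical, and SQLite stands in for whichever RDBMS the authors used. The JSON is serialized with Python's `json` module, so no database-side JSON functions are assumed.

```python
import json
import sqlite3
from typing import Dict, List


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE student (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            phones     TEXT NOT NULL,  -- JSON array: multivalued attribute
            course_ids TEXT NOT NULL   -- JSON array: one side of an M:N link
        )
        """
    )
    return conn


def add_student(conn: sqlite3.Connection, name: str,
                phones: List[str], course_ids: List[int]) -> int:
    """Store list-valued fields as JSON text instead of extra tables."""
    cur = conn.execute(
        "INSERT INTO student (name, phones, course_ids) VALUES (?, ?, ?)",
        (name, json.dumps(phones), json.dumps(course_ids)),
    )
    return cur.lastrowid


def get_student(conn: sqlite3.Connection, student_id: int) -> Dict[str, object]:
    """Load one row and decode its JSON columns back into Python lists."""
    row = conn.execute(
        "SELECT name, phones, course_ids FROM student WHERE id = ?",
        (student_id,),
    ).fetchone()
    if row is None:
        raise KeyError(student_id)
    name, phones, course_ids = row
    return {
        "name": name,
        "phones": json.loads(phones),
        "course_ids": json.loads(course_ids),
    }
```

In a classic normalized design the phone numbers and enrolments would each need a separate table plus joins. Folding them into JSON columns trades some query power and referential integrity for a more flexible schema, which appears to be the trade-off the abstract describes.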
Abstract: Structured and unstructured data mining are related but also differ. How to unite them in a single theoretical framework that can guide research on knowledge discovery and data mining has become an urgent problem. Based on an analysis of existing research results, the united model of knowledge discovery state space (UMKDSS) is presented, which associates structured data mining with complex-type data mining. UMKDSS can provide theoretical guidance for complex-type data mining. Finally, an application example of UMKDSS is given.
Abstract: Element-partition-based methods for visualization of 3D unstructured grid data are presented. First, partition schemes for common elements, including curvilinear tetrahedra, pentahedra, hexahedra, etc., are given, so that complex elements can be divided into several rectilinear tetrahedra and the visualization processes can be simplified. Then, a slice method for cloud maps and an iso-surface method based on the partition schemes are described.
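
To make the idea of element partition concrete, the sketch below splits a straight-edged hexahedron into six tetrahedra that share one body diagonal. This is one common decomposition and is not necessarily the scheme used in the paper, which also covers curvilinear elements. The vertex ordering (bottom face 0-1-2-3, top face 4-5-6-7 directly above it) and the names `HEX_TO_TETS`, `split_hexahedron` and `tet_volume` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Tet = Tuple[Point, Point, Point, Point]

# Six tetrahedra fanned around the body diagonal from vertex 0 to vertex 6.
# Assumed ordering: 0-1-2-3 is the bottom face, 4-5-6-7 the top face,
# with vertex i + 4 directly above vertex i.
HEX_TO_TETS = [
    (0, 1, 2, 6),
    (0, 2, 3, 6),
    (0, 3, 7, 6),
    (0, 7, 4, 6),
    (0, 4, 5, 6),
    (0, 5, 1, 6),
]

UNIT_CUBE: List[Point] = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.0, 1.0, 1.0),
]


def split_hexahedron(vertices: Sequence[Point]) -> List[Tet]:
    """Return the six tetrahedra (as vertex tuples) of one hexahedron."""
    if len(vertices) != 8:
        raise ValueError("a hexahedron needs exactly 8 vertices")
    return [tuple(vertices[i] for i in tet) for tet in HEX_TO_TETS]


def tet_volume(tet: Tet) -> float:
    """Volume of a tetrahedron: |det[b - a, c - a, d - a]| / 6."""
    a, b, c, d = tet
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )
    return abs(det) / 6.0
```

A quick sanity check is that the tetrahedron volumes add up to the volume of the original cell, for example `sum(tet_volume(t) for t in split_hexahedron(UNIT_CUBE))` for the unit cube. Once every cell is a tetrahedron, slicing and iso-surface extraction only need to handle one element type.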
Abstract: Big data refer to the massive amounts and varieties of information in structured and unstructured form, generated by social networking sites, biomedical equipment, financial companies, the internet and websites, scientific sensors, agriculture engineering sources, and so on. This huge amount of data cannot be processed using traditional data processing systems and technologies. Big data analytics is a process of examining information and patterns from huge data. Hence, the process needs a system architecture for data collection, transmission, storage, processing and analysis, and visualization mechanisms. In this paper, we review the background and futuristic aspects of big data. We first introduce the history, background and related technologies of big data. We focus on big data system architecture, and on the phases and classes of big data analytics. Then we present an open source big data framework to address some of the big data challenges. Finally, we discuss different applications of big data with some examples.
Abstract: Purpose: Patient treatment trajectory data are used to predict the outcome of treatment for a particular disease. Existing methodologies have not considered how to determine the evolving disease in the patient and the changes in health due to treatment. Hence, deep learning models for trajectory data mining can be employed for disease prediction with high accuracy and lower computation cost. Design/methodology/approach: Multifocus deep neural network classifiers are used to detect the novel disease class and the comorbidity class, so that changes in the genome pattern of the patient trajectory data can be identified in the layers of the architecture. A classifier is employed to learn the extracted feature set with activation and weight functions, and the results are then merged on many aspects to classify the undetermined sequence of diseases as a new variant. The performance of disease progression learning relies on the precision of the constituent classifiers, which usually have larger generalization benefits than the optimized classifiers. Findings: The deep learning architecture uses a weight function and a bias function on the input layers, together with max pooling. The outcome of the input layer is applied to the hidden layer to generate the multifocus characteristics of the disease, and the multifocus-characterized disease is processed in an activation function using the ReLU function along with hyperparameter tuning, which produces the effective outcome in the output layer of a fully connected network. Experimental results using cross-validation show that the proposed model outperforms existing methodologies in terms of computation time and accuracy. Originality/value: The proposed evolving classifier is represented as a robust architecture that uses an objective function to map the data sequence into a class distribution of the evolving disease class for the patient trajectory. Then, the generative output layer of the proposed model produces the progression outcome of the disease for the particular patient trajectory. The model tries to produce accurate prognosis outcomes by employing a data conditional probability function. The originality of the work defines 70% and comparisons of the previous methods the method of values are accurate and increased analysis of the predictions.
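
The building blocks named in the abstract above (weights and biases, max pooling, ReLU activation) can be shown in a few lines of plain Python. This is a generic illustration of those operations only. It is not the multifocus architecture from the paper, and the function names `dense`, `relu` and `max_pool_1d` are hypothetical.

```python
from typing import List, Sequence


def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """ReLU activation: negative values become zero, others pass through."""
    return [max(0.0, v) for v in values]


def max_pool_1d(values: Sequence[float], window: int) -> List[float]:
    """Non-overlapping 1D max pooling; a trailing partial window is dropped."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]
```

Chaining them as `max_pool_1d(relu(dense(x, W, b)), 2)` gives one tiny forward pass. A real model would stack many such layers and learn `W` and `b` from data.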
Funding: Supported by the National Natural Science Foundation of China [grant number 72202185], the Fundamental Research Funds for the Central Universities [grant number SWU2109521], the Chongqing Social Science Planning Project [grant number 2020BS60], and the Innovation Research 2035 Pilot Plan of Southwest University [grant number SWUPilotPlan026].
Abstract: Despite streamers having earned widespread attention, no studies have explored the relationship between streamers and customer engagement from a vocal–visual perspective in the livestreaming commerce context. Drawing on the elaboration likelihood model, we examine how streamers' speech rate and facial attractiveness influence customer engagement using 434 pieces of unstructured livestreaming video data extracted from Taobao. The findings show that speech rate is positively related to customer engagement behaviors. Facial attractiveness has a significant positive effect on the number of comments and viewers obtained, but it has no impact on the number of likes received in a livestream. Speech rate and facial attractiveness demonstrate a significant interaction effect, increasing customer engagement behaviors. Additionally, the numbers of comments and viewers obtained are positively related to sales performance. These results offer new insights into the vital role of streamers and provide practical implications for improving customer engagement in livestreaming commerce.