Abstract: Extracting entity-relationship triples is essential to building a knowledge graph (KG). Most entity-relationship extraction algorithms are data-driven, especially the currently popular deep learning algorithms. Obtaining a large number of accurate triples is therefore key both to building a good KG and to training a good entity-relationship extraction algorithm. Because of business requirements, the application field of this KG is fixed, and the experts' opinions must also be satisfied. Considering these factors, we adopt the top-down method, which first determines the data schema and then fills in the specific data according to that schema. The data schema is the top-level design of a KG; determining it according to the characteristics of the KG is equivalent to determining the scope of data collection and the mode of data organization. This method is generally suitable for constructing domain KGs. This article proposes a fast and efficient method for extracting the triples of a top-down KG in social media, using the structured data in the information box on the right side of the related encyclopedia webpage. In addition, based on the obtained triples, a data labeling method is proposed to obtain sufficiently high-quality training data for various natural language processing (NLP) information extraction algorithms.
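The top-down idea in this abstract (fix the schema first, then fill it from infobox fields, then reuse the triples to label text) can be sketched as follows. This is a minimal illustration, not the authors' method: the `SCHEMA` mapping, the field names, and the function names are hypothetical, and the labeling step assumes simple string matching of head and tail entities, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schema: infobox field name -> relation name allowed in the KG.
SCHEMA: Dict[str, str] = {
    "Founder": "founded_by",
    "Headquarters": "headquartered_in",
}


def infobox_to_triples(entity: str, infobox: Dict[str, str]) -> List[Triple]:
    """Keep only infobox fields that the schema defines (top-down filling)."""
    return [
        (entity, SCHEMA[field], value.strip())
        for field, value in infobox.items()
        if field in SCHEMA and value.strip()
    ]


def label_sentences(
    sentences: List[str], triples: List[Triple]
) -> List[Tuple[str, Triple]]:
    """Tag a sentence with a triple when it mentions both head and tail.

    Assumption: plain substring matching stands in for the labeling method.
    """
    labeled = []
    for sentence in sentences:
        for head, relation, tail in triples:
            if head in sentence and tail in sentence:
                labeled.append((sentence, (head, relation, tail)))
    return labeled
```

Fields outside the schema (for example a "Website" entry) are dropped, which is how the schema bounds the scope of data collection.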
Funding: Supported by the Big Earth Science Engineering Program (CASEarth) of the Chinese Academy of Sciences [XDA19090200 and XDA19040501].
Abstract: Big Earth Data refers to the multidimensional integration and association of scientific data, including geography, resources, environment, ecology, and biology. An effective data classification system and label management strategy are important foundations for the long-term management of data resources. The objective of this study was to construct a classification system and realize multidimensional semantic data label management for the Big Earth Data Science Engineering Program (CASEarth). The study constructed two classification and coding systems that can be mapped to each other, namely the geosphere-level classification and the Sustainable Development Goals (SDGs) indicator classification. The technique is based on natural language processing and solves problems of subject-word segmentation, weight calculation, and dynamic matching. A prototype system for classification and label management was built on more than 1,100 existing CASEarth datasets. We expect the study to provide methodology and technical support for user-oriented classification and label management services for Big Earth Data.
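The three steps named in this abstract (subject-word segmentation, weight calculation, and matching) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the category names, the terms and their weights, and the `classify` function are hypothetical, and the regular-expression tokenizer stands in for whatever segmentation the study actually used.

```python
import re
from typing import Dict, Optional

# Hypothetical category vocabularies: subject word -> weight.
CATEGORY_TERMS: Dict[str, Dict[str, float]] = {
    "hydrosphere": {"river": 1.0, "lake": 1.0, "runoff": 2.0},
    "biosphere": {"forest": 1.0, "species": 2.0, "vegetation": 1.5},
}


def classify(description: str) -> Optional[str]:
    """Return the category with the highest summed term weight, or None."""
    # Segmentation: lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", description.lower())
    # Weight calculation: sum the weights of matched subject words per category.
    scores = {
        category: sum(terms.get(token, 0.0) for token in tokens)
        for category, terms in CATEGORY_TERMS.items()
    }
    # Matching: pick the best-scoring category, if any term matched at all.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A dataset description with no matching subject words returns `None`, so it can be routed to manual labeling rather than forced into a category.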
Funding: Supported in part by the National Key R&D Program of China (No. 2018AAA0101700), the National Natural Science Foundation of China (No. 51805192), and the State Key Laboratory of Digital Manufacturing Equipment and Technology of Huazhong University of Science and Technology (No. DMETKF2020029).
Abstract: Fault diagnosis plays an increasingly vital role in guaranteeing machine reliability in industrial enterprises. Among the available solutions, deep learning (DL) methods have gained popularity for their ability to extract features from raw historical data. However, the performance of DL relies on a large amount of labeled data, which is costly to obtain in practice because data are usually labeled by hand. To achieve good performance with limited labeled data, this research proposes a threshold-control generative adversarial network (TCGAN) method. First, the 1D vibration signals are converted into 2D images, which serve as the input to TCGAN. Second, TCGAN generates pseudo data whose distribution is similar to that of the limited labeled data. With pseudo data generation, the training dataset is enlarged, and the additional labeled data further improve the performance of TCGAN on fault diagnosis. Third, to mitigate the instability of the generated data, a threshold control is introduced to adjust the relationship between the discriminator and the generator dynamically and automatically. The proposed TCGAN is validated on datasets from Case Western Reserve University and a self-priming centrifugal pump. The prediction accuracies with limited labeled data reach 99.96% and 99.898%, which are even better than those of other methods tested on the fully labeled datasets.
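The first step above, converting a 1D vibration signal into a 2D image, can be sketched as below. The abstract does not say how the conversion is done, so this assumes one simple option: take a segment of `size * size` samples, rescale it to 0-255, and reshape it row by row. The function name and the rescaling choice are hypothetical.

```python
from typing import List, Sequence


def signal_to_image(signal: Sequence[float], size: int) -> List[List[int]]:
    """Reshape the first size*size samples into a size x size grayscale grid."""
    needed = size * size
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(signal)}")
    segment = list(signal[:needed])
    low, high = min(segment), max(segment)
    span = (high - low) or 1.0  # avoid division by zero on a flat segment
    # Min-max scale each sample to an integer gray level in [0, 255].
    pixels = [round(255 * (x - low) / span) for x in segment]
    # Fill the image row by row.
    return [pixels[row * size:(row + 1) * size] for row in range(size)]
```

For example, `signal_to_image([0.0, 1.0, 2.0, 3.0], 2)` yields `[[0, 85], [170, 255]]`; longer signals would be cut into consecutive segments to produce one image each.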