Abstract: The quantity of data continues to increase, and the data are heterogeneous: they come from multiple sources (structured, semi-structured and unstructured) and vary in quality. It is therefore very likely that data are manipulated without knowledge of their structure and semantics, since the metadata may be insufficient or entirely absent. Data anomalies may be due to poor semantic descriptions, or to the absence of any description. In this paper, we propose an approach to better understand the semantics and the structure of the data. Our approach helps to automatically correct both intra-column and inter-column anomalies. We aim to improve data quality by processing null values and the semantic dependencies between columns.
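As a rough illustration of what repairing null values through an inter-column dependency can look like, the sketch below learns a majority mapping from one column to another and uses it to fill nulls and replace conflicting values. This is not the authors' method: the function names (`majority_mapping`, `repair`), the majority-vote rule and the sample rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_mapping(rows, key, dependent):
    """For each value of `key`, return the most common non-null value of `dependent`."""
    counts = defaultdict(Counter)
    for row in rows:
        k, v = row.get(key), row.get(dependent)
        if k is not None and v is not None:
            counts[k][v] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


def repair(rows, key, dependent):
    """Return copies of the rows with nulls filled and conflicting values replaced."""
    mapping = majority_mapping(rows, key, dependent)
    repaired = []
    for row in rows:
        fixed = dict(row)
        expected = mapping.get(row.get(key))
        # Only touch the row when the dependency gives us an expected value.
        if expected is not None and fixed.get(dependent) != expected:
            fixed[dependent] = expected
        repaired.append(fixed)
    return repaired


# Hypothetical sample data: "country" is assumed to depend on "city".
rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": None},      # null value to fill
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": "Spain"},   # violates the dependency
    {"city": "Rome", "country": "Italy"},
]
repaired = repair(rows, "city", "country")
```

Majority voting is only one possible repair policy; it assumes that the dependency really holds and that errors are rarer than correct values.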
Funding: This work was supported by the Jiangsu Innovation Program for Graduate Education (KYLX_0285), the National Natural Science Foundation of China (No. 61602241), the Natural Science Foundation of Jiangsu Province (No. BK20150758), and the pre-study fund of PLA University of Science and Technology.
Abstract: In the era of big data, the conflict between data mining and data privacy protection is growing. Traditional information security focuses on protecting attribute values without considering semantic association. Data privacy in big data is mainly a matter of using data effectively without exposing users' sensitive information. Once semantic association is taken into account, reasonable access control for privacy protection is required. Semi-structured and self-descriptive XML (eXtensible Markup Language) has become a common form of data organization for database management in big data environments. Based on the semantic integration nature of XML data, this paper proposes a data access control model for individual users. Through the semantic dependency between data and a bottom-up integration process, the global visual range of the inverted XML structure is realized. Experimental results show that the model effectively protects privacy and has high access efficiency.
Abstract: Word sense disambiguation (WSD), identifying the specific sense of a target word given its context, is a fundamental task in natural language processing. Recently, researchers have shown promising results using long short-term memory (LSTM), which is better able to capture sequential and syntactic features of text. However, this method neglects the dependencies among instances, such as the semantic similarity of their contexts. To solve this problem, we propose a novel WSD model that introduces a cache-like memory module to capture the semantic dependencies among instances. Extensive evaluations on standard datasets demonstrate the superiority of the proposed model over various baselines.
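The idea of a cache-like memory over instances can be pictured with a toy lookup: store context vectors of previously seen instances together with their senses, then score the senses for a new context by similarity to the stored entries. The sketch below is only an illustration under that assumption, not the model from the abstract; the class `SenseCache`, the cosine-similarity scoring, the capacity and the two-dimensional vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SenseCache:
    """Fixed-capacity memory of (context vector, sense) pairs, oldest evicted first."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.entries = []

    def add(self, vector, sense):
        self.entries.append((vector, sense))
        if len(self.entries) > self.capacity:
            self.entries.pop(0)

    def scores(self, vector):
        """Sum the non-negative similarities to stored contexts, grouped by sense."""
        totals = {}
        for stored, sense in self.entries:
            sim = max(cosine(vector, stored), 0.0)
            totals[sense] = totals.get(sense, 0.0) + sim
        return totals

    def predict(self, vector):
        totals = self.scores(vector)
        return max(totals, key=totals.get) if totals else None


# Hypothetical 2-d context vectors for the ambiguous word "bank".
cache = SenseCache(capacity=3)
cache.add([1.0, 0.0], "bank/finance")
cache.add([0.9, 0.1], "bank/finance")
cache.add([0.0, 1.0], "bank/river")
finance_like = cache.predict([0.8, 0.2])
river_like = cache.predict([0.1, 0.9])
```

In a neural model the context vectors would come from an encoder such as an LSTM, and the memory scores would typically be combined with the encoder's own prediction rather than used alone.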
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFB1313602.
Abstract: Offline handwritten formula recognition is a challenging task due to the variety of handwritten symbols and two-dimensional formula structures. Recently, deep neural network recognizers based on the encoder-decoder framework have achieved great improvements on this task. However, one shortcoming of existing work is unsatisfactory recognition performance for formulas with long LaTeX strings. Moreover, the lack of sufficient training data limits the capability of these recognizers. In this paper, we design a multimodal dependence attention (MDA) module to help the model learn visual and semantic dependencies among symbols in the same formula, improving recognition performance on formulas with long LaTeX strings. To alleviate overfitting and further improve recognition performance, we also propose a new dataset, the Handwritten Formula Image Dataset (HFID), which contains 25620 handwritten formula images collected from real life. We conduct extensive experiments to demonstrate the effectiveness of the proposed MDA module and HFID dataset, and achieve state-of-the-art expression accuracy of 63.79% and 65.24% on CROHME 2014 and CROHME 2016, respectively.
Funding: The National Science and Technology Major Project of the Ministry of Science and Technology of China (Secret 501).
Abstract: Relation extraction has been widely used to find semantic relations between entities in plain text. Dependency trees provide deeper semantic information for relation extraction. However, existing dependency-tree-based models adopt pruning strategies that are either too aggressive or too conservative, leading to insufficient semantic information or excessive noise in relation extraction models. To overcome this issue, we propose the Neural Attentional Relation Extraction Model with Dual Dependency Trees (DDT-REM), which takes advantage of both the syntactic dependency tree and the semantic dependency tree to capture syntactic and semantic features, respectively. Specifically, we first propose a novel representation learning method to capture dependency relations from both syntax and semantics. Second, for the syntactic dependency tree, we propose a local-global attention mechanism to address semantic deficits. We design an extension of graph convolutional networks (GCNs) to perform relation extraction, which effectively improves extraction accuracy. We conduct experimental studies on three real-world datasets. Compared with traditional methods, our method improves the F1 score by 0.3, 0.1 and 1.6 on the three datasets, respectively.
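For readers unfamiliar with GCNs over dependency trees, the following sketch shows one standard graph convolution layer, ReLU(Â H W) with Â = D^{-1/2}(A + I)D^{-1/2}, applied to a tiny tree. It illustrates the base technique only, not the extension or the attention mechanism that the abstract proposes; the toy sentence, the feature values and the identity weight matrix are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def normalize_adjacency(edges, n):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph convolution layer: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(v, 0.0) for v in row] for row in z]


# Toy dependency tree for "She founded Acme": founded(1) -- She(0), founded(1) -- Acme(2).
edges = [(0, 1), (1, 2)]
a_hat = normalize_adjacency(edges, 3)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d token features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights: output shows pure neighbour mixing
out = gcn_layer(a_hat, h, w)
```

After one layer each token's representation mixes its own features with those of its direct neighbours in the tree, which is why the choice of which edges to keep (the pruning strategy) matters for the information that reaches the entity tokens.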