Word embedding has been widely used in word sense disambiguation (WSD) and many other tasks in recent years because it represents the semantics of words well. However, most existing word embedding methods represent each word as a single vector, without considering homonymy and polysemy, which limits their performance. To address this problem, an effective topical word embedding (TWE)-based WSD method, named TWE-WSD, is proposed, which integrates Latent Dirichlet Allocation (LDA) and word embedding. Instead of generating a single word vector (WV) for each word, TWE-WSD generates a topical WV for each word under each topic. Effective integration strategies are designed to obtain high-quality contextual vectors. Extensive experiments on the SemEval-2013 and SemEval-2015 English all-words tasks show that TWE-WSD outperforms other state-of-the-art WSD methods, especially on nouns.
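The idea of one vector per (word, topic) pair can be illustrated with a minimal sketch. The abstract does not specify the integration strategies, so the topic-probability weighting below is an assumption made purely for illustration, and all names (`topical_vectors`, `sense_vectors`, `disambiguate`) and the two-dimensional toy vectors are hypothetical. In practice the topic distribution would come from an LDA model and the vectors from a trained embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def context_vector(words, topic_probs, topical_vectors, dim):
    """Sum the topical vectors of the context words, weighting topic k by
    topic_probs[k]. This weighting is an assumed strategy, not the paper's."""
    total = [0.0] * dim
    for word in words:
        for k, p in enumerate(topic_probs):
            vec = topical_vectors.get((word, k))
            if vec is not None:
                for i in range(dim):
                    total[i] += p * vec[i]
    return total

def disambiguate(words, topic_probs, topical_vectors, sense_vectors, dim):
    """Return the sense label whose vector is closest (by cosine) to the context vector."""
    ctx = context_vector(words, topic_probs, topical_vectors, dim)
    return max(sense_vectors, key=lambda s: cosine(ctx, sense_vectors[s]))

# Hypothetical toy data: topic 0 is finance-like, topic 1 is river-like.
topical_vectors = {
    ("bank", 0): [1.0, 0.0], ("bank", 1): [0.0, 1.0],
    ("money", 0): [1.0, 0.0], ("money", 1): [0.8, 0.2],
    ("water", 0): [0.2, 0.8], ("water", 1): [0.0, 1.0],
}
sense_vectors = {"bank/finance": [1.0, 0.0], "bank/river": [0.0, 1.0]}

finance_sense = disambiguate(["money", "bank"], [0.9, 0.1],
                             topical_vectors, sense_vectors, dim=2)
river_sense = disambiguate(["water", "bank"], [0.1, 0.9],
                           topical_vectors, sense_vectors, dim=2)
```

With a single vector per word, "bank" would contribute the same direction in both contexts; keeping a separate vector per topic lets the topic distribution of the context pull the contextual vector toward the matching sense.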
The T-overlap query is the basis of set similarity queries and has been applied in many important fields. Most existing approaches employ a pruning-and-verification framework and therefore suffer from low efficiency. Modern GPUs offer much higher parallelism and memory bandwidth than CPUs and can be used to accelerate T-overlap queries. In this paper, we use hash segmentation to divide inverted lists into segments and design an efficient GPU inverted index, called GHSII, based on it. Based on GHSII, a new segmentation-parallel T-overlap algorithm, GSPS, is proposed. GSPS scans the segments one segment at a time and uses shared memory to reduce the number of accesses to device memory. Furthermore, an optimized algorithm called GSPS-TLLO, which uses a heuristic query order, is proposed to solve the problem of load imbalance. Experiments on two real datasets show that GSPS-TLLO outperforms state-of-the-art GPU parallel T-overlap algorithms.
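A T-overlap query returns every set in a collection that shares at least T elements with the query set. The sketch below is a sequential CPU illustration of answering such a query over an inverted index whose lists are split into hash segments. It is not the GHSII or GSPS implementation: the hash function (`set_id % num_segments`), the function names, and the per-segment counter are assumptions chosen for clarity. On a GPU, each segment could be handled by a separate thread block, with the small per-segment counter playing the role of shared memory.

```python
from collections import defaultdict

def build_segmented_index(sets, num_segments):
    """Build token -> list of segments; segment s of a token's inverted list
    holds the ids of sets containing the token with set_id % num_segments == s."""
    index = defaultdict(lambda: [[] for _ in range(num_segments)])
    for set_id, tokens in enumerate(sets):
        for tok in set(tokens):  # deduplicate tokens within a set
            index[tok][set_id % num_segments].append(set_id)
    return index

def t_overlap(index, query, t, num_segments):
    """Return sorted ids of sets sharing at least t tokens with the query.
    Segments are processed one at a time, each with its own small counter."""
    query_tokens = set(query)
    results = []
    for seg in range(num_segments):
        counts = defaultdict(int)  # only ids hashed to this segment appear here
        for tok in query_tokens:
            if tok in index:  # membership test does not create a new entry
                for set_id in index[tok][seg]:
                    counts[set_id] += 1
        results.extend(sid for sid, c in counts.items() if c >= t)
    return sorted(results)

# Hypothetical toy collection.
collection = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["x", "y"],
    ["a", "c", "d", "e"],
]
index = build_segmented_index(collection, num_segments=2)
matches_t2 = t_overlap(index, ["a", "c", "d"], t=2, num_segments=2)
matches_t3 = t_overlap(index, ["a", "c", "d"], t=3, num_segments=2)
```

Because every set id falls into exactly one segment, the segments can be counted independently and their results concatenated, which is what makes the segmentation a natural unit of parallel work.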
Funding: National Natural Science Foundation of China, Grant/Award Number: 61562054; The Fund of China Scholarship Council, Grant/Award Number: 201908530036; Talents Introduction Project of Guangxi University for Nationalities, Grant/Award Number: 2014MDQD020.