摘要
Distant supervision has the ability to generate a huge amount training data.Recently,the multi-instance multi-label learning is imported to distant supervision to combat noisy data and improve the performance of relation extraction.But multi-instance multi-label learning only uses hidden variables when inference relation between entities,which could not make full use of training data.Besides,traditional lexical and syntactic features are defective reflecting domain knowledge and global information of sentence,which limits the system’s performance.This paper presents a novel approach for multi-instance multilabel learning,which takes the idea of fuzzy classification.We use cluster center as train-data and in this way we can adequately utilize sentencelevel features.Meanwhile,we extend feature set by paragraph vector,which carries semantic information of sentences.We conduct an extensive empirical study to verify our contributions.The result shows our method is superior to the state-of-the-art distant supervised baseline.
出处
《国际计算机前沿大会会议论文集》
2015年第B12期45-47,共3页
International Conference of Pioneering Computer Scientists, Engineers and Educators(ICPCSEE)