Funding: This study was funded by the International Science and Technology Cooperation Program of the Science and Technology Department of Shaanxi Province, China (No. 2021KW-16) and the Science and Technology Project in Xi'an (No. 2019218114GXRC017CG018-GXYD17.11). The thesis work was supported by the special fund construction project of Key Disciplines in Ordinary Colleges and Universities in Shaanxi Province. The authors thank the anonymous reviewers for their helpful comments and suggestions.
Abstract: Text event mining, an indispensable part of text mining, has attracted extensive attention from researchers. This paper proposes UKGE-MS, a method for modeling an event knowledge graph based on neighborhood mutual information and sparse representation. UKGE-MS improves the ability of existing text mining techniques to understand and discover high-dimensional unlabeled information, and it addresses a weakness of traditional unsupervised feature selection methods, which select features only from a global perspective and ignore the local connections among samples. First, to account for the local information of samples when evaluating feature correlation, a feature clustering algorithm based on average neighborhood mutual information is proposed, which yields feature clusters with a certain degree of event correlation. Second, an unsupervised feature selection method based on the high-order correlation of multi-dimensional statistical data is designed by combining the dimension reduction advantage of the locally linear embedding algorithm with the feature selection ability of sparse representation, so as to enhance the generalization ability of the selected features. Finally, the event knowledge graph is constructed by means of sparse representation and the l1 norm. Extensive experiments are carried out on five real datasets and on synthetic datasets, and UKGE-MS is compared with five corresponding algorithms. The experimental results show that UKGE-MS outperforms the traditional methods in event clustering and feature selection, and has some advantages over other methods in text event recognition and discovery.
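The abstract does not give the details of the final graph construction step. As a rough illustration of how sparse representation with an l1 penalty can produce graph edges, the sketch below reconstructs each item's vector from all the others and keeps the nonzero coefficients as edge weights. The function names (`soft_threshold`, `sparse_code`, `sparse_graph`), the ISTA solver, and the parameter `lam` are assumptions made for this example and are not taken from UKGE-MS.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (shrinks values toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.1, n_iter=500):
    """Minimise 0.5 * ||x - D w||^2 + lam * ||w||_1 with plain ISTA."""
    w = np.zeros(D.shape[1])
    lipschitz = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    if lipschitz == 0.0:
        return w
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        w = soft_threshold(w - grad / lipschitz, lam / lipschitz)
    return w

def sparse_graph(X, lam=0.1, tol=1e-6):
    """Hypothetical helper: X has one column per item (for example, per event).

    Returns a matrix A where A[i, j] is the absolute weight of item j in the
    sparse reconstruction of item i; zeros mean "no edge".
    """
    n = X.shape[1]
    A = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        w = sparse_code(X[:, others], X[:, i], lam)
        A[i, others] = np.abs(w)
    A[A < tol] = 0.0
    return A
```

With this formulation the l1 penalty drives most coefficients to exactly zero, so each item is linked only to the few items that best reconstruct it; a larger `lam` gives a sparser graph.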
Funding: Supported by the National Key Research and Development Program of China (No. 2019YFA0707201) and the ISTIC Research Foundation Project (No. ZD2020-10).
Abstract: Scientific literature often uses abbreviated terms in English for brevity. Machine translation (MT) systems can help researchers share knowledge across languages. However, current MT systems may translate the same abbreviated term into different target terms in different sentences. MT systems translate an abbreviated term in one of two ways: by translating its full name, or by using the abbreviated term directly. Abbreviated terms may be ambiguous and polysemous, and without context information MT systems have no explicit strategy for deciding which way to use. To obtain consistent translations of abbreviated terms in scientific literature, this paper proposes a translation model for abbreviated terms that integrates context information. The context information includes the positions of the abbreviated term and the domain attributes of the scientific literature. The first occurrence of an abbreviated term is translated as its full name, while later occurrences of the same term appear in abbreviated form in the translated text. Experiments on Chinese-to-English translation show the effectiveness of the proposed translation model.
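The position rule described in the abstract (full name at the first occurrence, abbreviated form afterwards) can be illustrated with a small post-processing sketch. This is only an assumption about how such a rule might be applied to already translated text; it is not the paper's translation model, it ignores the domain attributes, and the function name `render_abbreviations` and the `full_names` mapping are hypothetical.

```python
import re

def render_abbreviations(sentences, full_names):
    """Expand each abbreviation at its first occurrence only.

    sentences: list of target-language sentences in document order.
    full_names: hypothetical mapping from abbreviation to its full name.
    """
    if not full_names:
        return list(sentences)
    seen = set()
    # Try longer abbreviations first so that overlapping keys match correctly.
    keys = sorted(full_names, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    def replace(match):
        abbr = match.group(1)
        if abbr in seen:
            return abbr            # later occurrences keep the short form
        seen.add(abbr)
        return full_names[abbr]    # first occurrence uses the full name

    return [pattern.sub(replace, s) for s in sentences]
```

For example, calling `render_abbreviations(["MT is useful.", "MT systems vary."], {"MT": "machine translation"})` expands only the first "MT" and leaves the second one abbreviated.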