Funding: Microsoft Research Asia Internet Services in Academic Research Fund (No. FY07-RES-OPP-116); Tianjin Technological Development Program Project (No. 06YFGZGX05900)
Abstract: To describe thread information in question and answer (QnA) web forums exactly, a QnA knowledge presentation model for the English language is proposed, followed by a complete solution for the QnA knowledge system covering data gathering, platform building and application design. Using a pre-defined dictionary and grammatical analysis, the model brings semantic information, grammatical information and knowledge confidence into information retrieval (IR) methods in the form of statement sets and term sets with semantic links. Theoretical analysis shows that the statement model can present QnA knowledge exactly, is not limited by the original QnA patterns, and adapts to various query demands; the semantic links between terms assist the statement model in deducing new knowledge from existing knowledge. The model combines IR and natural language processing (NLP) features, which strengthens its knowledge presentation ability. Many knowledge-based applications built upon this model can be improved and achieve better performance.
Funding: Supported by the National Natural Science Foundation for Young Scientists of China (71503188); Key Research Institute of Humanities & Social Science in Hubei Higher Education Institutions (DSS20160105); Subsidy Fund of Grain Science and Technology Innovation and Scientific and Technological Achievements Transformation Set by Grain Bureau of Hubei Province; Wuhan Social Science Consortium Project
Abstract: Using the method of analogy, this paper builds a social information field, an information field force model and an information diffusion dynamics model to study the dynamic mechanism and the laws governing how misconduct information moves among nodes in a web forum. It also constructs a complex network simulation model of misconduct information diffusion in web forums to study the diffusion intensity of misconduct information and its influencing factors. The conclusion is that, under the force of the field, information flows from high-potential nodes to low-potential nodes, and resistance is generated inside and outside the diffusion channel during this process. In the complex network of the web forum, the diffusion intensity of misconduct information increases as the likelihood of reconnection among broken nodes becomes higher. The main factor determining the diffusion intensity is the average shortest path. Diffusion intensity also increases as the interaction frequency rises.
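The two quantities this abstract relies on, potential-driven flow and the average shortest path, can be illustrated with a minimal sketch. This is not the paper's model: the linear flow rule, the `resistance` and `rate` parameters, and both function names are assumptions made only for illustration.

```python
from collections import defaultdict, deque


def diffusion_step(potential, edges, resistance=1.0, rate=0.1):
    """One synchronous update of node potentials (assumed linear flow rule).

    Information moves along each edge from the higher-potential node to the
    lower-potential node; a larger resistance weakens the flow.
    """
    delta = defaultdict(float)
    for u, v in edges:
        # Positive flow goes u -> v; a negative value means v -> u.
        flow = rate * (potential[u] - potential[v]) / resistance
        delta[u] -= flow
        delta[v] += flow
    return {node: potential[node] + delta[node] for node in potential}


def average_shortest_path(nodes, edges):
    """Mean hop distance over all reachable ordered node pairs (BFS per node)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0
```

In this sketch, calling `diffusion_step` repeatedly moves the potentials of connected nodes toward each other, and `average_shortest_path` gives the network statistic that the abstract names as the main factor behind diffusion intensity.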
Funding: Microsoft Research Asia Internet Services in Academic Research Fund (No. FY07-RES-OPP-116); the Science and Technology Development Program of Tianjin (No. 06YFGZGX05900)
Abstract: To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a pre-defined dictionary and grammatical analysis, the question classifier brings both semantic and grammatical information into information retrieval and machine learning methods in the form of various training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun. The QA query results are then re-ranked using the question class information. Experiments show that the classifier accurately classifies questions in real-world web data sets and that the QA results are clearly improved after re-ranking. This shows that, with both semantic and grammatical information, applications such as QA built upon real-world web data sets can be improved and achieve better performance.
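The re-ranking step can be sketched as follows. The paper's classifier is a trained model over several grammatical and semantic features; this sketch substitutes a hypothetical lookup on the question word alone, and the class labels, the `Candidate` type and the additive `bonus` are all assumptions rather than the authors' design.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    score: float       # score from the underlying retrieval step
    answer_type: str   # hypothetical answer-type label, e.g. "PERSON"


# Hypothetical mapping from question word to question class.
QUESTION_WORD_CLASS = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}


def classify_question(question):
    """Stand-in classifier: uses only the first known question word."""
    for word in question.lower().rstrip("?").split():
        if word in QUESTION_WORD_CLASS:
            return QUESTION_WORD_CLASS[word]
    return "OTHER"


def rerank(question, candidates, bonus=0.5):
    """Boost candidates whose answer type matches the question class."""
    question_class = classify_question(question)

    def adjusted(candidate):
        match = candidate.answer_type == question_class
        return candidate.score + (bonus if match else 0.0)

    return sorted(candidates, key=adjusted, reverse=True)
```

With this rule, a lower-scored candidate of the expected answer type can overtake a higher-scored candidate of the wrong type, which is the effect that re-ranking by question class aims for.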
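Abstract: The Internet is full of user-generated documents, such as forum posts. How to monitor this disorganized content is one of the key concerns of security departments, and topic detection and tracking (TDT) is one effective means of monitoring. However, web forum posts are characterized by short replies and rapid topic shifts, which makes forum-oriented topic detection and tracking exceptionally difficult. To address these characteristics, three TDT models are presented: first, a baseline model; second, to alleviate "topic drift", an improved model that represents a topic as a seed vector plus a subsequent vector; third, the improved model combined with the latest named entity (NE) weight adjustment strategy. Given the irregular format of forum posts and the processing speed required of a TDT system, a feature extraction method is proposed. Finally, experimental results of the TDT models on a real data set are given, confirming the effectiveness of the models and the feature extraction method.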
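The seed-vector and subsequent-vector idea can be illustrated with a minimal sketch. The abstract does not say how the two vectors are combined or updated, so the weighted sum of cosine similarities, the `seed_weight` value and the `Topic` class are assumptions made only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Topic:
    def __init__(self, seed_tokens):
        self.seed = Counter(seed_tokens)  # fixed: anchors the original topic
        self.subsequent = Counter()       # grows as later posts are absorbed

    def similarity(self, tokens, seed_weight=0.7):
        """Weighted sum of similarity to the seed and to later posts."""
        post = Counter(tokens)
        return (seed_weight * cosine(self.seed, post)
                + (1 - seed_weight) * cosine(self.subsequent, post))

    def absorb(self, tokens):
        """Add a matched post to the subsequent vector only."""
        self.subsequent.update(tokens)
```

Because `absorb` never modifies the seed, off-topic replies that slip into the topic can only influence the smaller, subsequent share of the similarity score, which limits drift in this sketch.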