
Cross-document event detection algorithm (cited by: 1)

The detection algorithm for cross-document event
Abstract: To detect specific events in a large document collection, this paper proposes a model and an algorithm for cross-document event detection. First, the information elements (subject, time, place, and topic) are extracted from each document. A word co-occurrence network is then built from these elements, and the event to be detected is described by a 4W vector. Taking the reverse view, the algorithm runs a constrained depth-first search over the co-occurrence network to find cycles of fixed length. Finally, in the event detection process (EDP), it checks whether the nodes of each cycle contain the information elements of the event to be detected, which decides detection, and it traces the cycle nodes back to the documents related to that event. Experiments show that the algorithm detects events in a document corpus and, compared with other algorithms, achieves high precision and high recall at the same time.
Author: 冯戈利 (Feng Geli)
Source: Machine Design and Manufacturing Engineering (《机械设计与制造工程》), 2015, No. 1, pp. 6-10 (5 pages)
Keywords: event detection; cross-document; word co-occurrence network; acyclic; depth-first search
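
The pipeline outlined in the abstract (extract elements, build a co-occurrence network, search for fixed-length cycles, map cycle nodes back to documents) can be illustrated with a small sketch. This is not the paper's implementation. The abstract does not state the search constraint, the cycle length, or the document-retrieval rule, so the sketch assumes a cycle of four nodes with one node per element kind, and it returns documents that contain at least `min_shared` nodes of a matching cycle. All names (`build_cooccurrence_graph`, `find_cycles`, `detect_event`, `min_shared`) and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

KINDS = ("who", "when", "where", "what")  # assumed 4W element kinds


def build_cooccurrence_graph(docs):
    """Link every pair of elements that occur in the same document.

    docs maps a document id to a dict {kind: term}; a node is a (kind, term) pair.
    """
    adj = defaultdict(set)        # node -> neighbouring nodes
    node_docs = defaultdict(set)  # node -> ids of documents that contain it
    for doc_id, elements in docs.items():
        nodes = list(elements.items())
        for node in nodes:
            node_docs[node].add(doc_id)
        for a, b in combinations(nodes, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj, node_docs


def find_cycles(adj, length):
    """Depth-first search for simple cycles with exactly `length` nodes.

    Assumed constraint: every node in a cycle has a different element kind.
    Each cycle is returned as a frozenset of its nodes.
    """
    cycles = set()

    def dfs(start, node, path, kinds):
        if len(path) == length:
            if start in adj[node]:  # the last edge closes the cycle
                cycles.add(frozenset(path))
            return
        for nxt in adj[node]:
            # Skip visited nodes, repeated kinds, and nodes ordered before
            # the start node (so each cycle is grown from its smallest node).
            if nxt in path or nxt[0] in kinds or nxt < start:
                continue
            dfs(start, nxt, path + [nxt], kinds | {nxt[0]})

    for start in list(adj):
        dfs(start, start, [start], {start[0]})
    return cycles


def detect_event(docs, query, min_shared=2):
    """Return ids of documents related to the event described by `query`."""
    adj, node_docs = build_cooccurrence_graph(docs)
    wanted = set(query.items())
    related = set()
    for cycle in find_cycles(adj, length=len(KINDS)):
        if wanted <= cycle:  # the cycle contains every queried element
            counts = Counter(d for node in cycle for d in node_docs[node])
            related |= {d for d, c in counts.items() if c >= min_shared}
    return related


# Made-up sample: no single document holds all four elements of the event.
DOCS = {
    "d1": {"who": "Acme", "when": "2014-05-01", "where": "Nanjing"},
    "d2": {"who": "Acme", "where": "Nanjing", "what": "merger"},
    "d3": {"when": "2014-05-01", "what": "merger"},
    "d4": {"who": "Beta", "when": "2014-06-02", "where": "Suzhou", "what": "strike"},
}
QUERY = {"who": "Acme", "when": "2014-05-01", "where": "Nanjing", "what": "merger"}

if __name__ == "__main__":
    print(sorted(detect_event(DOCS, QUERY)))
```

In this sample the event is only recoverable across documents: the four queried elements form a cycle whose edges come from d1, d2, and d3 together, which is the situation the cross-document setting targets. The `min_shared` threshold is one possible way to trade precision against recall when mapping nodes back to documents.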


