Funding: Funded by the Planning Project of the National Language Committee in the "12th Five-Year Plan" (No. YB125-49), the Foundation for Key Program of the Ministry of Education, China (No. 212167), and the Fundamental Research Funds for the Central Universities (No. SWJTU12CX096).
Abstract: How to quickly and accurately detect new topics from massive online data has become a central problem in public opinion monitoring in cyberspace. This paper presents a new event detection method for current new event detection systems, based on a sorted subtopic matching algorithm, and constructs the overall design framework. The subtopics contained in old topics (or news stories) are sorted in descending order of their importance to the topic (or news story), forming a sorted subtopic sequence. During subtopic matching, a subtopic scoring matrix is used to determine whether a new story reports a new event. Experimental results show that the sorted subtopic matching model improves the accuracy and effectiveness of the new event detection system in cyberspace.
Funding: Project (No. 2008BAH26B00) supported by the National Key Technology R&D Program of China.
Abstract: Online monitoring of temporally sequenced news streams for interesting patterns and trends has gained popularity in the last decade. In this paper, we study a particular news stream monitoring task: timely detection of bursty events that have happened recently and discovery of their evolutionary patterns along the timeline. Here, a news stream is represented as feature streams of tens of thousands of features (i.e., keywords; each news story consists of a set of keywords). A bursty event is therefore composed of a group of bursty features, which show bursty rises in frequency as the related event emerges. We give a formal definition of this problem and present a solution with the following steps: (1) applying an online multi-resolution burst detection method to identify bursty features with different bursty durations within a recent time period; (2) clustering bursty features to form bursty events and associating each event with a power value that reflects its bursty level; (3) applying an information retrieval method based on cosine similarity to discover the event's evolution (i.e., highly related bursty events in history) along the timeline. We extensively evaluate the proposed methods on the Reuters Corpus Volume 1. Experimental results show that our methods can detect bursty events in a timely way and effectively discover their evolution. The power values used in our model not only measure an event's bursty level or relative importance at a given time point but also show the relative strengths of events along the same evolution.
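Step (3) of the abstract above relies on cosine similarity between bursty events. The following is a minimal sketch only, assuming each event is represented as a dictionary that maps a keyword to a weight; that representation, the function names, and the `threshold` value are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def related_history(event, history, threshold=0.5):
    """Return (index, score) pairs for past events scoring >= threshold, best first.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    scored = [(i, cosine_similarity(event, past)) for i, past in enumerate(history)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Calling `related_history(current_event, past_events)` would list the historical events most similar to the current one, which is one plausible way to trace an event's evolution along the timeline.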
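Abstract: Various media produce a large volume of news reports every day, so an automated analysis method is needed to present news to users in a more clearly organized form. Most existing work partitions news into flat topics; however, a topic is not merely a collection of news stories but is composed of a series of interrelated events. Because the events within a topic are often very similar to one another, event detection within a topic has poor precision. To overcome this problem, an event detection and relationship discovery method based on event term committees is proposed: the core terms of each event are mined first, and these core terms are then used for event detection and relationship discovery. Experimental results on two datasets from the Linguistic Data Consortium (LDC) show that the proposed method significantly improves on existing methods.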
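Abstract: New event detection (NED) aims to detect, from one or more news sources, the first story that reports a news topic. Preliminary experiments show that, when new event detection is performed on news stories of different categories, different types of terms tend to have different degrees of sensitivity, whereas traditional methods usually treat all terms equally. This work focuses on how to set the weights of different terms in a new event detection model. It proposes using statistical methods to optimize the weight parameters of terms with different parts of speech for different news categories, and it proposes a method that dynamically updates term weights using information from existing news clusters while computing similarity between news stories (rather than between a story and a news cluster), so as to exploit the advantages of both forms of comparison. Experiments on the LDC public datasets TDT2 and TDT3 show that both improvements are clearly effective and that performance is significantly better than that of comparable systems.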
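The general idea of weighting terms by part of speech and comparing each incoming story against earlier stories can be sketched as below. This is an illustration under stated assumptions, not the authors' system: the `POS_WEIGHTS` values, the `threshold`, and the simple "new if no earlier story is similar enough" rule are hypothetical stand-ins, and the paper's statistical optimization and cluster-based weight updates are not reproduced.

```python
import math

# Hypothetical part-of-speech weights; the abstract says such parameters are
# tuned statistically per news category, which is not done here.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.6, "OTHER": 0.3}


def weighted_vector(tokens):
    """Build a term -> weight vector from (term, pos) pairs using POS_WEIGHTS."""
    vec = {}
    for term, pos in tokens:
        vec[term] = vec.get(term, 0.0) + POS_WEIGHTS.get(pos, POS_WEIGHTS["OTHER"])
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_new_events(stories, threshold=0.3):
    """Flag a story as a new event if no earlier story reaches the threshold.

    Stories are compared with earlier stories directly (story to story),
    not with cluster centroids. `threshold` is an assumed value.
    """
    seen, flags = [], []
    for tokens in stories:
        vec = weighted_vector(tokens)
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags
```

Given a stream in which the second story shares its main noun with the first and the third shares nothing with either, `detect_new_events` flags the first and third stories as new events.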