\section{Background \& Related Work}
\label{sec:Related_Work}
In this section, we review the related work along the following lines.
\noindent\textbf{Temporal Information Extraction:\\}
Time is inherently associated with events [Setzer and Gaizauskas 2000]. A temporal expression \cite{uzzaman2012tempeval} is a term or phrase that refers to a precise time point or an interval. One of the earliest works on temporal information extraction (TIE) distilled facts from text while inducing as much temporal information as possible \cite{ling2010temporal}. In addition to recognizing temporal relations between times and events, TIE performed global inference, enforcing transitivity to bound the start and end times of each event \cite{ling2010temporal}. The authors used the notion of a temporal element to unify times and events. Recently, considerable attention has been given to recognizing and normalizing temporal expressions. The TempEval \cite{uzzaman2012tempeval} competitions are held with the specific goal of evaluating temporal taggers on different document collections. Existing systems include HeidelTime \cite{strotgen2013multilingual}, SUTime \cite{chang2012sutime}, and the Tarsqi toolkit \cite{verhagen2008temporal}. While all of these systems mainly target explicit and relative temporal expressions, a recent work by Kuzey et al. \cite{kuzey2016time} proposes to automatically normalize free-text temporal expressions, or \textit{temponyms}.\\
\noindent\textbf{Temporal Information Retrieval in News Event Detection and Analysis: \\}
The initial effort to automatically detect and track events was introduced by Allan et al. \cite{allan1998topic} through the Topic Detection and Tracking (TDT) initiative. The goal of the TDT project, one of the first significant research initiatives related to automatic news management, was to explore techniques that identify the occurrence of new events and follow them over time. Swan and Allan \cite{swan1999extracting} and Swan and Jensen \cite{swan2000timemines} proposed using a classical hypothesis test to discover time-dependent features that identify important topics in text documents. Makkonen and Ahonen-Myka \cite{makkonen2003topic} suggested an alternative solution that compares documents through a temporal similarity measure. Kumaran and Allan \cite{kumaran2004text}, on the other hand, detected new stories by measuring the degree of overlap of a story with those that occurred in the past. Shaparenko et al. \cite{shaparenko2005identifying} correlated topic events with the text used in a document collection to provide an overview of how topics evolve over time. The underlying assumption is that as events change, the text used in documents changes as well.\\
\noindent\textbf{Focus Time Extraction: \\}
Extraction of the focus time of an event is a relatively niche area of research, and so far only a handful of studies have addressed this topic. Two of the most promising works are presented below.
Jatowt et al. \cite{jatowt2013estimating} calculate association scores between words and individual years and then use these scores to determine document-year associations. The word-time associations are derived from an external knowledge base containing references to the past annotated with absolute dates, with large datasets of news articles used as training data. Finally, taking into account the discriminative capability of words in the temporal space (i.e.\ temporal entropy), the focus time of a document is determined under the assumption that \textit{``if a document $d$ contains many words that are strongly associated with a time point $t$, then $d$ has a strong association with $t$''}.
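As an illustrative sketch rather than the exact scoring function of \cite{jatowt2013estimating}, this assumption can be formalized by aggregating word-time association scores over the document, weighting each word by how temporally focused it is:
\begin{equation}
\mathrm{FT}(d) = \arg\max_{t} \sum_{w \in d} \mathrm{te}(w)\,\mathrm{assoc}(w, t),
\end{equation}
where $\mathrm{assoc}(w, t)$ is the association score of word $w$ with time point $t$ learned from the knowledge base and the news corpus, and $\mathrm{te}(w)$ is a weight derived from the temporal entropy of $w$, so that temporally discriminative words contribute more.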
Spitz et al. \cite{spitz2016terms} go a step further and build a co-occurrence graph over named entities with the help of state-of-the-art NER taggers. This graph-based model relies on co-occurrences of named entities belonging to the classes locations, organizations, actors, and dates, and places them in the context of surrounding terms. For a given entity, adjacent nodes in the graph are ranked by the importance of their co-occurrence, i.e.\ the most relevant entity is the one with the highest co-occurrence weight. The model provides a comprehensive framework for browsing and summarizing event data \cite{spitz2016terms}. However, its improvement over the previous approach in determining the focus time of an event is not significant; we discuss the results in more detail in the experiments section.\\
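As a rough formalization (again a sketch, not the exact edge weighting of \cite{spitz2016terms}), the date most relevant to an entity $e$ can be seen as the date node adjacent to $e$ with the heaviest connecting edge,
\begin{equation}
\hat{t}(e) = \arg\max_{t \in N(e)} w(e, t),
\end{equation}
where $N(e)$ is the set of date nodes adjacent to $e$ in the co-occurrence graph and $w(e, t)$ aggregates the sentence-level co-occurrences of $e$ and $t$ over the corpus.\\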
\noindent\textbf{Distributed Representation: \\}
In distributional semantics, LSA uncovers the topic distribution of a set of documents by matrix decomposition (sketched below), whereas LDA computes the same by placing a Dirichlet prior over the latent topics and using probabilistic modelling. When it comes to word-level semantics, Bengio et al. \cite{bengio2003neural} use feedforward neural networks with a fixed-length context to compute vector representations of individual words in the corpus. Later, Schwenk \cite{schwenk2005training} showed that neural network based models provide significant improvements in speech recognition over good baseline systems on several tasks.
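For reference, the matrix decomposition underlying LSA is the truncated singular value decomposition of the term--document matrix $X$,
\begin{equation}
X \approx U_k \Sigma_k V_k^{\top},
\end{equation}
where $\Sigma_k$ contains the $k$ largest singular values and the rows of $U_k\Sigma_k$ and $V_k\Sigma_k$ give $k$-dimensional representations of terms and documents, respectively.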
A major deficiency of Bengio's \cite{bengio2003neural} approach is that the context length is fixed and must be specified ad hoc before training. Mikolov et al. \cite{mikolov2013distributed} use an architecture that is usually called a simple recurrent neural network or Elman network \cite{elman1990finding}. This is probably the simplest possible version of a recurrent neural network, and it is very easy to implement and train. The representations produced by Mikolov et al. \cite{mikolov2013distributed} capture the contextual meaning of individual words. \\
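In its standard form, such a network maintains a hidden (context) state that is updated at each time step from the current input and the previous hidden state,
\begin{equation}
\mathbf{h}_t = f(\mathbf{W}_{xh}\mathbf{x}_t + \mathbf{W}_{hh}\mathbf{h}_{t-1}), \qquad \mathbf{y}_t = g(\mathbf{W}_{hy}\mathbf{h}_t),
\end{equation}
where $\mathbf{x}_t$ is the representation of the input word at time $t$, $f$ is typically a sigmoid nonlinearity, $g$ is a softmax over the vocabulary, and the weight-matrix names are generic placeholders; the recurrence removes the need to fix the context length in advance.\\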
\noindent\textbf{Continuous Bag of Words (CBOW):} In its simplest form this is a bigram-like model in which a target word is predicted from a single context word. The size of the hidden layer determines the dimensionality of the learned word vectors, and the output layer uses a softmax to predict the probability of the target word given the context word. The context can be extended to two or more words, in which case the input is a multi-word context of $n$ words and the output is a single target word \cite{mikolov2013distributed}.\\
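Following the standard word2vec formulation, with a context of $n$ words the hidden representation is the average of the input vectors of the context words, and the target word is predicted with a softmax:
\begin{equation}
\mathbf{h} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{v}_{w_i}, \qquad p(w_O \mid w_1,\dots,w_n) = \frac{\exp({\mathbf{v}'_{w_O}}^{\top}\mathbf{h})}{\sum_{w=1}^{V} \exp({\mathbf{v}'_{w}}^{\top}\mathbf{h})},
\end{equation}
where $\mathbf{v}_w$ and $\mathbf{v}'_w$ are the input and output vectors of word $w$ and $V$ is the vocabulary size \cite{mikolov2013distributed}.\\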
\noindent\textbf{Skip-Gram:} This is the mirror image of the CBOW architecture: it uses the target word as the input and predicts the surrounding context words as the output \cite{mikolov2013distributed}.\\
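Concretely, for a training sequence $w_1,\dots,w_T$ and a context window of size $c$, the skip-gram model maximizes the average log-probability of the surrounding words,
\begin{equation}
\frac{1}{T}\sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \neq 0} \log p(w_{t+j} \mid w_t),
\end{equation}
where $p(w_{t+j} \mid w_t)$ is defined by a softmax analogous to the CBOW case and is approximated in practice with hierarchical softmax or negative sampling \cite{mikolov2013distributed}.\\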
%\noindent\textbf{Vector Aggregation}
\noindent\textbf{Timeline Summarization: \\}
Swan and Allan constructed timelines by extracting clusters of noun phrases and named entities \cite{swan2000automatic}. They later built a system that provides timelines consisting of one sentence per date, taking usefulness and novelty into account \cite{allan2001temporal}. Chieu et al. built a similar sentence-level system based on interest and burstiness \cite{chieu2004query}. Yan et al. \cite{yan2011evolutionary} then introduced ETS, which is based on a balanced optimization framework solved via iterative substitution and generates trajectory timelines from massive data on the Internet: given a query-related news collection, ETS summarizes its evolution trajectory. Yan et al. \cite{yan2011timeline} later introduced a novel approach to this web mining problem, Evolutionary Trans-Temporal Summarization (ETTS), which takes a collection relevant to a news subject as input and automatically outputs a timeline consisting of individual component summaries.