\section{Experiments \& Results} \label{sec:Results}
\noindent\textbf{Benchmark Queries: \\}
The queries cover a wide range of topics, from natural disasters, accidents, and riots to assassinations. They describe events spanning the years 1969 to 2007, so the set is temporally diverse as well, and it is not biased toward any specific country or region. The collected results are therefore unbiased with respect to the query set.\\
\noindent\textbf{Configuration: \\}
We produced the distributed representations for the individual words present in the corpus with the following configuration: a dimension of 300 units, a context window size of 5 words, and the CBOW architecture. We retrieve the top 100 documents for each individual query.\\
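For concreteness, the following is a minimal sketch of this training setup, assuming the \texttt{gensim} library (version 4.x, where the dimensionality parameter is named \texttt{vector\_size}) and a hypothetical file \texttt{corpus.txt} containing one tokenized sentence per line; it illustrates the stated configuration rather than our exact pipeline.
\begin{verbatim}
# Minimal sketch of the embedding configuration described above.
# Assumes gensim >= 4.0; "corpus.txt" is a hypothetical placeholder
# for the tokenized corpus (one sentence per line).
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

sentences = LineSentence("corpus.txt")  # streams tokenized sentences
model = Word2Vec(
    sentences,
    vector_size=300,  # 300-dimensional word vectors
    window=5,         # context window of 5 words
    sg=0,             # sg=0 selects the CBOW architecture
)
model.save("word2vec_cbow_300.model")
\end{verbatim}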
\noindent\textbf{Evaluation Measure: \\}
We compare the quality of our methods in determining the focus time of an event query against state-of-the-art approaches. We use two standard measures: Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG).
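For reference, given a query set $Q$, where $\mathrm{rank}_i$ denotes the position of the first correct focus time in the ranked list returned for the $i$-th query, and $rel_j$ denotes the relevance of the result at position $j$, the two measures are defined as
\[
\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},
\quad
\mathrm{DCG@}k = \sum_{j=1}^{k}\frac{rel_j}{\log_2(j+1)},
\]
where $\mathrm{IDCG@}k$ is the $\mathrm{DCG@}k$ of the ideal ranking.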
\begin{table}[t]
\centering
\begin{tabular}{|l|c|c|}
\hline
\textbf{Method} & \textbf{MRR} & \textbf{NDCG} \\ \hline
Random & 0.15 & 0.21\\ \hline
LDA & 0.19 & 0.35\\ \hline
Adam\_Jatowt & 0.38 & 0.50\\ \hline
Andreas\_Spitz & 0.46 & 0.57\\ \hline
Early\_Fusion\_Mean & 0.61 & 0.69\\ \hline
Early\_Fusion\_Sum & 0.61 & 0.69\\ \hline
Early\_Fusion\_Local\_IDF & 0.50 & 0.63\\ \hline
Late\_Fusion\_Sum & 0.57 & 0.66\\ \hline
Late\_Fusion\_TF & 0.45 & 0.57\\ \hline
Late\_Fusion\_IDF & 0.43 & 0.54\\ \hline
Late\_Fusion\_TFIDF & 0.56 & 0.65\\ \hline
Late\_Fusion\_NER & 0.62 & 0.73\\ \hline
\end{tabular}
\captionof{table}{Overall Baseline Comparisons}
\label{Comparison}
\end{table}
To put more emphasis on the performance of our methods, we also compared them with the state-of-the-art approaches in terms of MRR on queries outside the coverage of the corpus we used, i.e., events before 1991 or after 2010.
\begin{table}[t]
\centering
\begin{tabular}{|l|c|}
\hline
\multicolumn{1}{|c|}{\textbf{Method}} & \textbf{MRR} \\ \hline
LDA & 0.0703 \\ \hline
Adam\_Jatowt & 0.0550 \\ \hline
Andreas\_Spitz & 0.1250 \\ \hline
Early\_Fusion\_Mean & 0.5715 \\ \hline
Early\_Fusion\_Sum & 0.5715 \\ \hline
Early\_Fusion\_Local\_IDF & 0.4321 \\ \hline
Late\_Fusion\_Sum & 0.5558 \\ \hline
Late\_Fusion\_TF & 0.4290 \\ \hline
Late\_Fusion\_IDF & 0.3700 \\ \hline
Late\_Fusion\_TFIDF & 0.4934 \\ \hline
Late\_Fusion\_NER & 0.5492 \\ \hline
\end{tabular}
\captionof{table}{Comparison with Baselines Outside the Corpus Coverage}
\label{Comparison Outside}
\end{table}
\subsection{Robustness Analysis}
The table below analyzes the win/loss record between the state-of-the-art approaches and our two best-performing methods, Early Fusion Mean and Late Fusion NER. Each cell reports the number of queries on which our method wins versus loses against the corresponding baseline.
\begin{table}[t]
\centering
\begin{tabular}{|c|c|c|}
\hline
& \textbf{Late\_Fusion\_NER} & \textbf{Early\_Fusion\_Mean} \\ \hline
\textbf{Adam\_Jatowt} & 59/17 & 61/18 \\ \hline
\textbf{Andreas\_Spitz} & 53/27 & 52/30 \\ \hline
\textbf{LDA} & 81/13 & 87/6 \\ \hline
\end{tabular}
\captionof{table}{Win/Loss Performance Analysis}
\label{WinLoss}
\end{table}
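The win/loss counts above can be reproduced by a simple per-query comparison; the sketch below assumes hypothetical dictionaries mapping each query to the reciprocal rank a method achieved on it.
\begin{verbatim}
# Sketch: tally per-query wins/losses between two methods.
# scores_a and scores_b are hypothetical dicts mapping each
# query id to that method's reciprocal rank on the query.
def win_loss(scores_a, scores_b):
    wins = losses = ties = 0
    for q in scores_a:
        if scores_a[q] > scores_b[q]:
            wins += 1
        elif scores_a[q] < scores_b[q]:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties

# e.g. win_loss(late_fusion_ner, lda) would yield 81 wins, 13 losses
\end{verbatim}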
\subsection{Gain/Loss Analysis}
The maximum gain of $+0.96$ is observed for the query:
\textit{``Lieutenant Colonel Oliver North and Vice Admiral John Poindexter are indicted on charges of conspiracy to defraud the United States.'' - 1988}.
The reason is that graph-based methods compute co-occurrence and other purely statistical measures blindly in order to relate a temporal unit to a word present in the corpus. Word2Vec, in contrast, learns distributed representations of words from the contexts in which they appear, so at the end of the vectorization process contextually similar terms lie close together in the semantic space. Therefore, even if the corpus contains only references to past events that happened in years outside its coverage, the terms occurring in the context of a particular query remain interrelated in semantic space (even if they are not statistically correlated in terms of co-occurrence frequency).
The maximum loss of $-0.94$ is observed for the query: \textit{``Columbine High School massacre: Two Littleton, Colorado teenagers, Eric Harris and Dylan Klebold, open fire on their teachers and classmates, killing 12 students and 1 teacher, and then themselves.'' - 1999}. In this query, names like `Eric Harris' and `Dylan Klebold' are quite ambiguous, so the overall context becomes cloudy when judging the focus time of the event, and it is hard to find contextual similarity among the terms of this event.
\subsection{Easy/Hard Queries}
The most difficult query to judge in the benchmark is \textit{``The Intergovernmental Panel on Climate Change releases its first assessment report, linking increases in carbon dioxide in the Earth's atmosphere, and resultant rise in global temperature, to human activities.'' - 1905}. Since the focus time of this query lies far beyond the coverage of the corpus, all of the methods, including the baselines and our own, failed to capture the context.
The easiest query to judge in the benchmark is \textit{``Chechen terrorists take 1,128 people hostage, mostly children, in a school in the Beslan school hostage crisis. The hostage-takers demand the release of Chechen rebels imprisoned in neighbouring Ingushetia and the independence of Chechnya from Russia.'' - 2004}. Since the content of this query is quite informative and also well within the coverage of the corpus, all of the methods returned the actual focus time at the top of the retrieved list.\\
\noindent\textbf{Comparison between Early Fusion and Late Fusion by an example: \\}
In the query \textit{``A peaceful student demonstration in Prague, Czechoslovakia, is severely beaten back by riot police. This sparks a revolution aimed at overthrowing the Communist government''}, words like \textit{`prague', `czechoslovakia', `communist', `government', `overthrowing'} are the important words that drive the final result toward a particular focus time, but in Early Fusion, words like \textit{`riot', `police', `beaten', `revolution'} add sufficient noise to the result. In Late Fusion, these noisy parts cannot make an impact, because such words are associated with many years (they occur in various contexts across several years). \\
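The contrast can be sketched as follows, assuming (hypothetically) that query words are represented by word vectors, that each candidate year has a precomputed vector, and that years are ranked by cosine similarity: early fusion averages the query word vectors into a single centroid before scoring, so noisy words shift the centroid, whereas late fusion scores each word separately and then aggregates, which dilutes words that are spread across many years.
\begin{verbatim}
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) /
                 (np.linalg.norm(u) * np.linalg.norm(v)))

def early_fusion_mean(query_vecs, year_vecs):
    # Fuse first: average the query word vectors,
    # then score each candidate year once.
    centroid = np.mean(query_vecs, axis=0)
    return {y: cosine(centroid, v) for y, v in year_vecs.items()}

def late_fusion_sum(query_vecs, year_vecs):
    # Score each word against each year first,
    # then aggregate the per-word scores.
    return {y: sum(cosine(w, v) for w in query_vecs)
            for y, v in year_vecs.items()}
\end{verbatim}
Here \texttt{query\_vecs} is a list of word vectors for the query terms and \texttt{year\_vecs} is a dictionary mapping each candidate year to its vector; both names are illustrative.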
\noindent\textbf{Late Fusion NER vs.\ other Late Fusion approaches:\\ }
If, in the same sentence, we find places like \textit{Montgomery County, Maryland} or \textit{Kuta, Bali}, an organization like the \textit{International Astronomical Union}, or a person like \textit{Laurent Kabila}, the result naturally improves, because the vectors of these entity words help establish the sense toward which the query is inclined. The result degrades when the places are very different from each other and the resultant vector does not point in a direction that makes sense for the query, e.g., in a query like \textit{``An Amtrak train en route from Washington, D.C. to Boston collides with Conrail engines at Chase, Maryland''}. In this example, \textit{Washington, Boston, Maryland} are all different places, so the resultant vector points in an ambiguous direction that does not favor the direction of the query.
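A minimal sketch of the entity-selection step is given below, assuming the \texttt{spaCy} library with its pretrained English pipeline; the selected terms would then be passed to a late-fusion scorer such as the one sketched above.
\begin{verbatim}
import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained English pipeline

def entity_terms(query):
    # Keep only named entities (places, organizations, persons)
    # as the terms that drive the focus-time decision.
    doc = nlp(query)
    return [token.text.lower()
            for ent in doc.ents
            if ent.label_ in {"GPE", "LOC", "ORG", "PERSON"}
            for token in ent]
\end{verbatim}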