Temp2Vec

Master's thesis project with the Max Planck Institute for Informatics, Germany

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Java 1.8
Apache Maven 3.3
IntelliJ IDEA 2016.3.4
MongoDB 3.2.1

Installing

  1. Install Java
  2. Install Maven
  3. Install IntelliJ IDEA
  4. Install MongoDB

Running the tests

Run BaseLine.AdamJatowt

Steps:
  1. Metadata Collection
    example command line arguments : "local metadata true"
  2. Jaccard Distance Calculation
    example command line arguments : "local jaccard"
  3. Context Association Calculation
    example command line arguments : "local conasso 1"
  4. Temporal Kurtosis Calculation
    example command line arguments : "local tempkurto 2"
  5. Focustime Calculation
    example command line arguments : "local focustime 3"
    In step 1:
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
    -> args[1] = "metadata", the algorithmic step
    -> args[2] = "true" to run on only the top 100,000 documents, or "false" to run on the whole Gigaword corpus
    In steps 2 to 5:
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
    -> args[1] = the algorithmic step ("jaccard", "conasso", "tempkurto" or "focustime")
    -> args[2] = the query number
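
The argument convention above can be sketched as a small validator. This is only an illustration of the layout described in this README: the class name BaselineArgs and its fields are hypothetical and are not taken from the repository, and the real entry point may check its arguments differently. The sketch assumes the third argument is optional for steps 2 to 5, since the "local jaccard" example passes none.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not a class from the Temp2Vec repository.
final class BaselineArgs {
    // Step names taken from the example command lines above.
    static final List<String> STEPS =
            Arrays.asList("metadata", "jaccard", "conasso", "tempkurto", "focustime");

    final String environment; // args[0]: "local" or "global"
    final String step;        // args[1]: one of STEPS
    final String option;      // args[2]: "true"/"false" for metadata, else a query number; may be null

    private BaselineArgs(String environment, String step, String option) {
        this.environment = environment;
        this.step = step;
        this.option = option;
    }

    static BaselineArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <local|global> <step> [option]");
        }
        String environment = args[0];
        if (!environment.equals("local") && !environment.equals("global")) {
            throw new IllegalArgumentException("unknown environment: " + environment);
        }
        String step = args[1];
        if (!STEPS.contains(step)) {
            throw new IllegalArgumentException("unknown step: " + step);
        }
        String option = args.length > 2 ? args[2] : null;
        if (step.equals("metadata")) {
            // Step 1 takes a flag: "true" for the top 100,000 documents, "false" for all of Gigaword.
            if (!"true".equals(option) && !"false".equals(option)) {
                throw new IllegalArgumentException("metadata expects true or false as args[2]");
            }
        } else if (option != null) {
            // Steps 2 to 5 take a query number; NumberFormatException is an IllegalArgumentException.
            Integer.parseInt(option);
        }
        return new BaselineArgs(environment, step, option);
    }

    public static void main(String[] args) {
        BaselineArgs parsed = parse(new String[] {"local", "conasso", "1"});
        System.out.println(parsed.environment + " " + parsed.step + " " + parsed.option);
    }
}
```

Validating the arguments up front like this makes a mistyped step name or a missing flag fail immediately, before any long-running corpus processing starts.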

Run BaseLine.AndreasSpitz

Steps:
  1. Generate annotations for building a LOAD graph
    BaseLine.AndreasSpitz.Annotation.Annotation_App
    example command line arguments : "local"
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
  2. Create a database in MongoDB. Insert the generated sentence annotations and NER annotations into two separate collections. Update the SystemSettings in the Settings package accordingly.
  3. Build the LOAD graph
    BaseLine.AndreasSpitz.Construction.ParallelExtractNetworkFromMongo
    -> Change the folder location in BaseLine.AndreasSpitz.SystemSettings to point to where the graph should be built
  4. Query the graph
    BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface
    -> Point this program to the location of your graph

Run ThesisWork

Steps:
  1. Generate vectors using Word2Vec from your corpus
    ThesisWork.WordVectorCreator.VectorCreator
    Example command line arguments : "local 300 part"
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the config.properties file to match your environment.
    -> args[1] = "300", the vector size you want
    -> args[2] = "part" to build the vectors from the top 100,000 documents retrieved by 100 queries, or "full" to run on the whole Gigaword corpus

  2. Query temporal information
    ThesisWork.PseudoRelevanceApp
    -> example command line arguments : "3 sumtfScore local LateFusion 300 0.2"
    -> args[0] = QueryNo
    -> args[1] = Craftsmanship sub-model
    -> args[2] = Environment ("local" for your local machine, "global" for the server)
    -> args[3] = Craftsmanship model
    -> args[4] = Vector Size
    -> args[5] = lambda, the fractional weight given to the global vector relative to the pseudo-relevant vector
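
One common way to apply such a weight is linear interpolation: combined = lambda * global + (1 - lambda) * pseudoRelevant. The sketch below shows that reading only. It is an assumption, since this README does not define how ThesisWork.PseudoRelevanceApp combines the two vectors, and the class name VectorFusion and its method are hypothetical.

```java
// Hypothetical illustration; the actual fusion used by ThesisWork is not documented here.
final class VectorFusion {

    // Assumed formula: lambda * global + (1 - lambda) * pseudoRelevant, per dimension.
    static double[] interpolate(double[] globalVector, double[] pseudoRelevantVector, double lambda) {
        if (globalVector.length != pseudoRelevantVector.length) {
            throw new IllegalArgumentException("vectors must have the same size");
        }
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must lie in [0, 1]");
        }
        double[] combined = new double[globalVector.length];
        for (int i = 0; i < combined.length; i++) {
            combined[i] = lambda * globalVector[i] + (1.0 - lambda) * pseudoRelevantVector[i];
        }
        return combined;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors; the README example uses vector size 300 and lambda 0.2.
        double[] global = {1.0, 0.0, 2.0};
        double[] pseudo = {0.0, 1.0, 2.0};
        double[] combined = interpolate(global, pseudo, 0.2);
        System.out.println(java.util.Arrays.toString(combined));
    }
}
```

Under this reading, lambda = 0.2 would let the pseudo-relevant vector dominate, with the global vector contributing one fifth of each component.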

Authors

Baseline:

Package : BaseLine.AdamJatowt
Author : Supratim Das
Literature : "Estimating Document Focus Time" by Adam Jatowt et al. CIKM'13

Baseline:

Package : BaseLine.AndreasSpitz
Author : Andreas Spitz (Maven Packaging & BaseLine.Annotation package by Supratim Das)
Literature : "Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events" by Andreas Spitz and Michael Gertz. SIGIR'16

Main Thesis Project:

Package : ThesisWork
Author : Supratim Das
Literature : Yet to be written ;)

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Acknowledgments