Master Thesis Project with Max Planck Institute for Informatik, Saarbrücken
Latest commit bea2e30 Jul 19, 2017 @sudass committed on GitHub Enterprise Update PseudoRelevanceApp.java
Temp2Vec

Master's thesis project at the Max Planck Institute for Informatics, Germany

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Java 1.8
Apache Maven 3.3
IntelliJ IDEA 2016.3.4
MongoDB 3.2.1

Installing

  1. Install Java
  2. Install Maven
  3. Install IntelliJ IDEA
  4. Install MongoDB

Running the tests

Run BaseLine.AdamJatowt

Steps:
  1. Metadata Collection
    example command line arguments : "local metadata true"
  2. Jaccard Distance Calculation
    example command line arguments : "local jaccard"
  3. Context Association Calculation
    example command line arguments : "local conasso 1"
  4. Temporal Kurtosis Calculation
    example command line arguments : "local tempkurto 2"
  5. Focustime Calculation
    example command line arguments : "local focustime 3"
    In step 1:
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
    -> args[1] = "metadata", the algorithmic step
    -> args[2] = "true" to run on only the top 100,000 documents, or "false" to run on the whole Gigaword corpus
    In steps 2 to 5:
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
    -> args[1] = the algorithmic step ("jaccard", "conasso", "tempkurto", or "focustime")
    -> args[2] = the query number
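
The argument layout described above can be checked before a long-running step is started. The following is a minimal sketch only: the class `BaselineArgs` and its fields are hypothetical and are not part of this repository. It assumes the step names and the "true/false" flag are exactly as listed in the examples above, and it treats the third argument as optional for steps 2 to 5 because the "local jaccard" example passes none.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the repository: validates the baseline arguments. */
class BaselineArgs {
    // Values taken from the example command lines in this README.
    static final List<String> ENVIRONMENTS = Arrays.asList("local", "global");
    static final List<String> STEPS =
            Arrays.asList("metadata", "jaccard", "conasso", "tempkurto", "focustime");

    final String environment; // args[0]
    final String step;        // args[1]
    final String extra;       // args[2]; null when it was not supplied

    BaselineArgs(String environment, String step, String extra) {
        this.environment = environment;
        this.step = step;
        this.extra = extra;
    }

    static BaselineArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("expected: <environment> <step> [extra]");
        }
        if (!ENVIRONMENTS.contains(args[0])) {
            throw new IllegalArgumentException("unknown environment: " + args[0]);
        }
        if (!STEPS.contains(args[1])) {
            throw new IllegalArgumentException("unknown step: " + args[1]);
        }
        String extra = args.length > 2 ? args[2] : null;
        // Step 1 needs the top-100,000-documents flag as its third argument.
        if (args[1].equals("metadata") && !"true".equals(extra) && !"false".equals(extra)) {
            throw new IllegalArgumentException("metadata step needs \"true\" or \"false\"");
        }
        return new BaselineArgs(args[0], args[1], extra);
    }

    public static void main(String[] args) {
        BaselineArgs parsed = parse("local metadata true".split(" "));
        System.out.println(parsed.environment + " " + parsed.step + " " + parsed.extra);
    }
}
```

Failing fast on a misspelled step or a missing flag avoids starting a run over the corpus with the wrong settings.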

Run BaseLine.AndreasSpitz

Steps:
  1. Generate annotations for building a LOAD graph
    BaseLine.AndreasSpitz.Annotation.Annotation_App
    example command line arguments : "local"
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to match your environment.
  2. Create a database in MongoDB. Insert the generated sentence annotations and NER annotations into two separate collections, then update SystemSettings in the Settings package.
  3. Build the LOAD graph
    BaseLine.AndreasSpitz.Construction.ParallelExtractNetworkFromMongo
    -> Change the folder location in BaseLine.AndreasSpitz.SystemSettings to point to where the graph should be built
  4. Query the graph
    BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface
    -> Point this program to the location of your graph

Run ThesisWork

Steps:
  1. Generate vectors using Word2Vec from your corpus
    ThesisWork.WordVectorCreator.VectorCreator
    Example command line arguments : "local 300 part"
    -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the config.properties file to match your environment.
    -> args[1] = "300", the vector size you want
    -> args[2] = "part" to create the vectors from the top 100,000 documents retrieved by 100 queries, or "full" to run on the whole Gigaword corpus

  2. Query temporal information
    ThesisWork.PseudoRelevanceApp
    -> example command line arguments : "3 sumtfScore local LateFusion 300 0.2"
    -> args[0] = query number
    -> args[1] = Craftsmanship sub model (e.g. "sumtfScore")
    -> args[2] = environment ("local" to run on your local machine, "global" to run on the server)
    -> args[3] = Craftsmanship model (e.g. "LateFusion")
    -> args[4] = vector size
    -> args[5] = lambda (weight of the global vector relative to the pseudo-relevant vector)
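
To illustrate what a weight such as lambda could mean, the sketch below mixes a global vector with a pseudo-relevant vector by linear interpolation. This is an assumption made for illustration: the README does not state the formula, the class `LambdaFusion` and its method are hypothetical, and the actual ThesisWork code may combine the two sources differently (for example by fusing scores rather than vectors).

```java
/** Hypothetical sketch, not code from the repository. */
class LambdaFusion {

    /**
     * Assumed formula: fused[i] = lambda * global[i] + (1 - lambda) * pseudoRelevant[i].
     * With lambda = 0.2 the global vector contributes 20% and the pseudo-relevant vector 80%.
     */
    static double[] interpolate(double[] globalVector, double[] pseudoRelevantVector, double lambda) {
        if (globalVector.length != pseudoRelevantVector.length) {
            throw new IllegalArgumentException("vectors must have the same size");
        }
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must lie in [0, 1]");
        }
        double[] fused = new double[globalVector.length];
        for (int i = 0; i < fused.length; i++) {
            fused[i] = lambda * globalVector[i] + (1.0 - lambda) * pseudoRelevantVector[i];
        }
        return fused;
    }

    public static void main(String[] args) {
        // Toy 2-dimensional vectors; real vectors would have the configured size, e.g. 300.
        double[] fused = interpolate(new double[]{1.0, 0.0}, new double[]{0.0, 1.0}, 0.2);
        System.out.println(java.util.Arrays.toString(fused));
    }
}
```

Under this reading, lambda = 0 ignores the global vector entirely and lambda = 1 ignores the pseudo-relevant vector.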

Authors

Baseline:

Package : BaseLine.AdamJatowt
Author : Supratim Das
Literature : "Estimating Document Focus Time" by Adam Jatowt et al. CIKM'13

Baseline:

Package : BaseLine.AndreasSpitz
Author : Andreas Spitz (Maven Packaging & BaseLine.Annotation package by Supratim Das)
Literature : "Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events" by Andreas Spitz et al. SIGIR'16

Main Thesis Project:

Package : ThesisWork
Author : Supratim Das
Literature : Yet to be written ;)

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments