Temp2Vec
Master's thesis project with the Max Planck Institute for Informatics, Germany
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Prerequisites
Java 1.8
Apache Maven 3.3
IntelliJ IDEA 2016.3.4
MongoDB 3.2.1
Installing
- Install Java
- Install Maven
- Install IntelliJ IDEA
- Install MongoDB
Running the tests
Run BaseLine.AdamJatowt
- Metadata Collection
example command line arguments : "local metadata true"
- Jaccard Distance Calculation
example command line arguments : "local jaccard"
- Context Association Calculation
example command line arguments : "local conasso 1"
- Temporal Kurtosis Calculation
example command line arguments : "local tempkurto 2"
- Focus Time Calculation
example command line arguments : "local focustime 3"
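The Jaccard distance step is not described further here. As a rough illustration only, the sketch below computes the standard Jaccard distance between two sets, 1 - |A intersect B| / |A union B|. The class name `JaccardDistance`, the method `distance` and the example inputs (hypothetical sets of document IDs in which two terms occur) are assumptions for illustration and are not taken from the code in BaseLine.AdamJatowt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of Jaccard distance; not the BaseLine.AdamJatowt implementation.
public class JaccardDistance {

    /**
     * Jaccard distance between two sets: 1 - |A intersect B| / |A union B|.
     * Two empty sets are treated as identical (distance 0).
     */
    public static <T> double distance(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: IDs of documents in which two terms occur.
        Set<String> docsWithTermA = new HashSet<>(Arrays.asList("d1", "d2", "d3"));
        Set<String> docsWithTermB = new HashSet<>(Arrays.asList("d2", "d3", "d4"));
        System.out.println(distance(docsWithTermA, docsWithTermB)); // prints 0.5
    }
}
```

The two sets share two of four distinct elements, so their distance is 0.5. Identical sets give 0.0 and disjoint sets give 1.0.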
In step 1
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Change the properties file accordingly.
-> args[1] = "metadata", the algorithmic step
-> args[2] = "true" or "false": "true" to run on only the top 100,000 documents, "false" to run on the whole Gigaword corpus
In steps 2 to 5
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Change the properties file accordingly.
-> args[1] = the algorithmic step ("jaccard", "conasso", "tempkurto" or "focustime")
-> args[2] = the query number
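Kurtosis summarises how concentrated (peaked and heavy-tailed) a distribution is. The sketch below shows the idea behind a temporal kurtosis score. It assumes a term is represented by hypothetical co-occurrence counts per year and computes the Pearson kurtosis m4 / m2^2 of the normalised distribution. This README does not say whether BaseLine.AdamJatowt uses this exact definition or excess kurtosis (minus 3). `TemporalKurtosis` and `kurtosis` are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; the exact kurtosis variant used by the project is an assumption.
public class TemporalKurtosis {

    /**
     * Pearson kurtosis m4 / m2^2 of a distribution over years, where
     * counts.get(year) is a (hypothetical) weight of a term for that year.
     * Returns NaN when the distribution has no mass or zero variance.
     */
    public static double kurtosis(Map<Integer, Double> counts) {
        double total = 0.0;
        for (double c : counts.values()) {
            total += c;
        }
        if (total <= 0.0) {
            return Double.NaN;
        }
        // Probability-weighted mean year.
        double mean = 0.0;
        for (Map.Entry<Integer, Double> e : counts.entrySet()) {
            mean += e.getKey() * (e.getValue() / total);
        }
        // Second and fourth central moments.
        double m2 = 0.0;
        double m4 = 0.0;
        for (Map.Entry<Integer, Double> e : counts.entrySet()) {
            double p = e.getValue() / total;
            double d = e.getKey() - mean;
            m2 += p * d * d;
            m4 += p * d * d * d * d;
        }
        if (m2 == 0.0) {
            return Double.NaN;
        }
        return m4 / (m2 * m2);
    }

    public static void main(String[] args) {
        Map<Integer, Double> counts = new TreeMap<>();
        counts.put(2000, 1.0);
        counts.put(2001, 1.0);
        System.out.println(kurtosis(counts)); // prints 1.0
    }
}
```

A distribution with most of its mass on one year and a little on distant years scores much higher than an even spread.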
Run BaseLine.AndreasSpitz
- Generate annotations for building a LOAD graph
BaseLine.AndreasSpitz.Annotation.Annotation_App
example command line arguments : "local"
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Change the properties file accordingly.
- Create a database in MongoDB. Insert the generated sentence annotations and NER annotations into two separate collections. Then update the SystemSettings in the Settings package.
- Build the LOAD graph
BaseLine.AndreasSpitz.Construction.ParallelExtractNetworkFromMongo
-> Change the folder location in BaseLine.AndreasSpitz.SystemSettings to point to where you want to build the graph.
- Query the graph
BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface
-> Point this program to the location of your graph.
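In the cited SIGIR'16 paper, the LOAD graph connects locations, organizations, actors, dates and terms through weighted edges. The minimal in-memory sketch below illustrates one kind of query, which ranks a node's neighbors by edge weight. The class `LoadGraphQuery`, its methods, the node-label prefixes and the weights are all hypothetical. This is not the interface of BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration; not the project's GraphQueryInterface.
public class LoadGraphQuery {

    // node -> (neighbor -> accumulated edge weight)
    private final Map<String, Map<String, Double>> adjacency = new HashMap<>();

    /** Adds an undirected edge, summing weights if the edge already exists. */
    public void addEdge(String u, String v, double weight) {
        adjacency.computeIfAbsent(u, k -> new HashMap<>()).merge(v, weight, Double::sum);
        adjacency.computeIfAbsent(v, k -> new HashMap<>()).merge(u, weight, Double::sum);
    }

    /** Returns up to k neighbors of node, by descending weight (ties broken by label). */
    public List<String> topNeighbors(String node, int k) {
        Map<String, Double> neighbors =
                adjacency.getOrDefault(node, Collections.<String, Double>emptyMap());
        List<Map.Entry<String, Double>> entries = new ArrayList<>(neighbors.entrySet());
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        LoadGraphQuery g = new LoadGraphQuery();
        // Hypothetical nodes; the type prefixes are illustrative only.
        g.addEdge("LOC:Berlin", "DAT:1989-11-09", 3.0);
        g.addEdge("LOC:Berlin", "TER:wall", 1.5);
        g.addEdge("LOC:Berlin", "DAT:1989-11-09", 1.0); // accumulates to 4.0
        g.addEdge("LOC:Berlin", "ORG:UN", 0.5);
        System.out.println(g.topNeighbors("LOC:Berlin", 2)); // prints [DAT:1989-11-09, TER:wall]
    }
}
```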
Run ThesisWork
- Generate vectors using Word2Vec from your corpus
ThesisWork.WordVectorCreator.VectorCreator
example command line arguments : "local 300 part"
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Change the config.properties file accordingly.
-> args[1] = "300", the desired vector size.
-> args[2] = "part" to create the vectors from the top 100,000 documents retrieved by the 100 queries, or "full" to run on the entire Gigaword corpus.
- Query temporal information
ThesisWork.PseudoRelevanceApp
-> example command line arguments : "3 sumtfScore local LateFusion 300 0.2"
-> args[0] = QueryNo
-> args[1] = Craftsmanship sub-model
-> args[2] = Environment: "local" to run on your local machine, "global" to run on the server
-> args[3] = Craftsmanship model
-> args[4] = Vector size
-> args[5] = lambda, the weight given to the global vector, as a fraction, relative to the pseudo-relevant vector
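The description of args[5] suggests that lambda interpolates between the global vector and the pseudo-relevant vector. The sketch below assumes a simple linear combination, lambda * global + (1 - lambda) * pseudoRelevant. It shows how the example arguments could be parsed and the two vectors combined. `VectorInterpolation` and `combine` are hypothetical names. The linear rule is an assumption, not the confirmed behavior of ThesisWork.PseudoRelevanceApp.

```java
import java.util.Arrays;

// Hypothetical illustration; the linear interpolation rule is an assumption.
public class VectorInterpolation {

    /** Returns lambda * global + (1 - lambda) * pseudoRelevant, element by element. */
    public static double[] combine(double[] global, double[] pseudoRelevant, double lambda) {
        if (global.length != pseudoRelevant.length) {
            throw new IllegalArgumentException("vectors must have the same size");
        }
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        double[] combined = new double[global.length];
        for (int i = 0; i < global.length; i++) {
            combined[i] = lambda * global[i] + (1.0 - lambda) * pseudoRelevant[i];
        }
        return combined;
    }

    public static void main(String[] args) {
        // Mirrors the example arguments "3 sumtfScore local LateFusion 300 0.2".
        String[] example = {"3", "sumtfScore", "local", "LateFusion", "300", "0.2"};
        int vectorSize = Integer.parseInt(example[4]);
        double lambda = Double.parseDouble(example[5]);
        double[] global = new double[vectorSize];
        double[] pseudoRelevant = new double[vectorSize]; // all zeros here
        Arrays.fill(global, 1.0);
        double[] combined = combine(global, pseudoRelevant, lambda);
        System.out.println(combined[0]); // prints 0.2
    }
}
```

With lambda = 0.2, the combined vector keeps 20% of the global vector and 80% of the pseudo-relevant vector.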
Package : BaseLine.AdamJatowt
Author : Supratim Das
Literature : "Estimating Document Focus Time" by Adam Jatowt et al. CIKM'13
Package : BaseLine.AndreasSpitz
Author : Andreas Spitz (Maven Packaging & BaseLine.Annotation package by Supratim Das)
Literature : "Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events" by Andreas Spitz and Michael Gertz, SIGIR'16
Package : ThesisWork
Author : Supratim Das
Literature : Yet to be written ;)
License
This project is licensed under the MIT License - see the LICENSE.md file for details
Acknowledgments
- Thesis Supervisor : Dr. Klaus Berberich http://people.mpi-inf.mpg.de/~kberberi/
- Thesis Advisor : Arunav Mishra http://people.mpi-inf.mpg.de/~amishra/
- Thesis Advisor : Vinay Setty http://vbn.aau.dk/en/persons/vinay-jayarama-setty(b23ea8ea-1b2f-4d23-9908-693ac76d4be5).html