## Temp2Vec
Master's thesis project at the Max Planck Institute for Informatics, Germany </br>
### Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. </br>
### Prerequisites
Java 1.8 </br>
Apache Maven 3.3 </br>
IntelliJ IDEA 2016.3.4 </br>
MongoDB 3.2.1 </br>
### Installing
1. Install Java </br>
2. Install Maven </br>
3. Install IntelliJ IDEA </br>
4. Install MongoDB </br>
### Running the tests
#### Run BaseLine.AdamJatowt
##### Steps: </br>
1. Metadata Collection </br>
Example command-line arguments: "local metadata true" </br>
2. Jaccard Distance Calculation </br>
Example command-line arguments: "local jaccard" </br>
3. Context Association Calculation </br>
Example command-line arguments: "local conasso 1" </br>
4. Temporal Kurtosis Calculation </br>
Example command-line arguments: "local tempkurto 2" </br>
5. Focustime Calculation </br>
Example command-line arguments: "local focustime 3" </br>
In step 1: </br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to adjust the settings for each environment. </br>
-> args[1] = "metadata", the algorithmic step </br>
-> args[2] = "true" to run on only the top 100,000 documents, or "false" to run on the whole Gigaword corpus </br>
In steps 2 to 5: </br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to adjust the settings for each environment. </br>
-> args[1] = the algorithmic step ("jaccard", "conasso", "tempkurto" or "focustime") </br>
-> args[2] = the query number (the step 2 example omits it) </br>
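Step 2 computes Jaccard distances. This README does not say which sets the package compares, so the following is only an illustrative sketch: it assumes each document (or query result) is reduced to a set of terms and computes the Jaccard distance 1 - |A ∩ B| / |A ∪ B|. The class and method names are hypothetical, not taken from BaseLine.AdamJatowt, and the code uses only Java 8 APIs to match the prerequisites.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not the BaseLine.AdamJatowt implementation.
public class JaccardSketch {

    /**
     * Jaccard distance: 1 - |A intersect B| / |A union B|.
     * Two empty sets are treated as identical (distance 0).
     */
    static double jaccardDistance(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        // Copy before mutating so the caller's sets are left untouched.
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> doc1 = new HashSet<>(Arrays.asList("election", "senate", "vote"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("senate", "vote", "budget"));
        // 2 shared terms out of 4 distinct terms: 1 - 2/4
        System.out.println(jaccardDistance(doc1, doc2)); // prints 0.5
    }
}
```

Working on copies keeps the function side-effect free, which matters if the same term sets are reused across many pairwise comparisons.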
#### Run BaseLine.AndreasSpitz
##### Steps: </br>
1. Generate annotations for building a LOAD graph </br>
BaseLine.AndreasSpitz.Annotation.Annotation_App </br>
Example command-line arguments: "local" </br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to adjust the settings for each environment. </br>
2. Create a database in MongoDB and insert the generated sentence annotations and NER annotations into two separate collections. Update the SystemSettings in the Settings package to match. </br>
3. Build the LOAD graph </br>
BaseLine.AndreasSpitz.Construction.ParallelExtractNetworkFromMongo </br>
-> Set the folder location in BaseLine.AndreasSpitz.SystemSettings to the directory where the graph should be built </br>
4. Query the graph </br>
BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface </br>
-> Point this program to the location of your graph </br>
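The steps above treat the LOAD graph as a black box built by ParallelExtractNetworkFromMongo. For readers new to the idea, here is a much simplified, hypothetical stand-in, not Andreas Spitz's implementation. It assumes each document has been reduced to a list of entity mentions with sentence indices (for example, derived from the step 1 annotations), links entities that occur at most `window` sentences apart, and weights each co-occurrence by exp(-δ) for sentence distance δ. The real LOAD model distinguishes locations, organizations, actors, dates and terms, and its weighting may differ. Querying then reduces to ranking an entity's neighbours by edge weight.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified co-occurrence graph; not the BaseLine.AndreasSpitz code.
public class CooccurrenceGraphSketch {

    /** A hypothetical entity mention: entity name plus the index of its sentence. */
    static final class Mention {
        final String entity;
        final int sentence;

        Mention(String entity, int sentence) {
            this.entity = entity;
            this.sentence = sentence;
        }
    }

    /** Undirected weighted adjacency: entity -> (neighbour -> accumulated weight). */
    private final Map<String, Map<String, Double>> adjacency = new HashMap<>();

    /** Adds weight exp(-delta) for each pair of distinct entities at most `window` sentences apart. */
    void addDocument(List<Mention> mentions, int window) {
        for (int i = 0; i < mentions.size(); i++) {
            for (int j = i + 1; j < mentions.size(); j++) {
                Mention a = mentions.get(i);
                Mention b = mentions.get(j);
                int delta = Math.abs(a.sentence - b.sentence);
                if (delta > window || a.entity.equals(b.entity)) {
                    continue; // too far apart, or a self-loop
                }
                double w = Math.exp(-delta);
                addEdgeWeight(a.entity, b.entity, w);
                addEdgeWeight(b.entity, a.entity, w);
            }
        }
    }

    private void addEdgeWeight(String from, String to, double w) {
        adjacency.computeIfAbsent(from, k -> new HashMap<>()).merge(to, w, Double::sum);
    }

    /** Returns the neighbours of an entity, heaviest edge first. */
    List<String> neighbours(String entity) {
        Map<String, Double> edges = adjacency.getOrDefault(entity, new HashMap<>());
        List<String> result = new ArrayList<>(edges.keySet());
        result.sort((x, y) -> Double.compare(edges.get(y), edges.get(x)));
        return result;
    }

    /** Accumulated edge weight between two entities, or 0 if they are not linked. */
    double weight(String a, String b) {
        return adjacency.getOrDefault(a, new HashMap<>()).getOrDefault(b, 0.0);
    }

    public static void main(String[] args) {
        List<Mention> doc = new ArrayList<>();
        doc.add(new Mention("Berlin", 0));
        doc.add(new Mention("Bundestag", 0));
        doc.add(new Mention("2016-06-01", 1));
        doc.add(new Mention("Paris", 5)); // outside the window, so never linked to Berlin
        CooccurrenceGraphSketch graph = new CooccurrenceGraphSketch();
        graph.addDocument(doc, 2);
        System.out.println(graph.neighbours("Berlin")); // prints [Bundestag, 2016-06-01]
    }
}
```

Accumulating weights with `Map.merge` means calling `addDocument` once per document builds a corpus-level graph, which is the cross-document aggregation the LOAD idea relies on.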
#### Run ThesisWork
##### Steps: </br>
1. Generate word vectors from your corpus using Word2Vec </br>
ThesisWork.WordVectorCreator.VectorCreator </br>
Example command-line arguments: "local 300 part" </br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the config.properties file to adjust the settings for each environment. </br>
-> args[1] = the vector size (dimensionality), e.g. "300" </br>
-> args[2] = "part" to create the vectors from the top 100,000 documents retrieved by the 100 queries, or "full" to run on the whole Gigaword corpus </br>
2. Query temporal information </br>
ThesisWork.PseudoRelevanceApp </br>
-> Example command-line arguments: "3 sumtfScore local LateFusion 300 0.2" </br>
-> args[0] = query number </br>
-> args[1] = craftsmanship sub-model (e.g. "sumtfScore") </br>
-> args[2] = environment: "local" to run on your local machine, or "global" to run on the server </br>
-> args[3] = craftsmanship model (e.g. "LateFusion") </br>
-> args[4] = vector size (e.g. "300") </br>
-> args[5] = lambda, the weight given to the global vector relative to the pseudo-relevant vector, as a fraction (e.g. "0.2") </br>
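The README names a "LateFusion" model and a lambda weight but does not give the fusion formula. One plausible reading, sketched here purely as an assumption: a query vector and a candidate vector (for example, one representing a time unit) are compared by cosine similarity twice, once in the global Word2Vec space and once in a space built from pseudo-relevant documents, and the two scores are interpolated as λ · global + (1 - λ) · pseudo-relevant. With the example value "0.2", the global score would receive 20% of the weight. The class and method names are hypothetical.

```java
// Hypothetical sketch of score-level late fusion; not the ThesisWork implementation.
public class LateFusionSketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Assumed fusion rule: lambda * global score + (1 - lambda) * pseudo-relevance score. */
    static double lateFusion(double globalScore, double prfScore, double lambda) {
        if (lambda < 0 || lambda > 1) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        return lambda * globalScore + (1 - lambda) * prfScore;
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors standing in for 300-dimensional Word2Vec embeddings.
        double[] queryGlobal = {1, 0, 0};
        double[] candidateGlobal = {1, 1, 0};
        double[] queryPrf = {0, 1, 0};
        double[] candidatePrf = {0, 1, 0};
        double score = lateFusion(
                cosine(queryGlobal, candidateGlobal),
                cosine(queryPrf, candidatePrf),
                0.2);
        System.out.println(score);
    }
}
```

Fusing scores rather than averaging the vectors themselves keeps the two embedding spaces independent, which matters because vectors trained separately are not aligned dimension by dimension.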
### Authors </br>
#### Baseline: </br>
Package : BaseLine.AdamJatowt </br>
Author : Supratim Das </br>
Literature : "Estimating Document Focus Time" by Adam Jatowt et al. CIKM'13 </br>
#### Baseline: </br>
Package : BaseLine.AndreasSpitz </br>
Author : Andreas Spitz (Maven Packaging & BaseLine.Annotation package by Supratim Das) </br>
Literature : "Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events" by Andreas Spitz and Michael Gertz, SIGIR'16 </br>
#### Main Thesis Project: </br>
Package : ThesisWork </br>
Author : Supratim Das </br>
Literature : Yet to be written ;)
### License
This project is licensed under the MIT License - see the [LICENSE.md](LICENSE.md) file for details </br>
### Acknowledgments
* Thesis Supervisor : Dr. Klaus Berberich <http://people.mpi-inf.mpg.de/~kberberi/> </br>
* Thesis Advisor : Arunav Mishra <http://people.mpi-inf.mpg.de/~amishra/> </br>
* Thesis Advisor : Vinay Setty <http://vbn.aau.dk/en/persons/vinay-jayarama-setty(b23ea8ea-1b2f-4d23-9908-693ac76d4be5).html> </br>