## Temp2Vec
Master's thesis project with the Max Planck Institute for Informatics, Germany
### Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
### Prerequisites
* Java 1.8
* Apache Maven 3.3
* IntelliJ IDEA 2016.3.4
* MongoDB 3.2.1
### Installing
1. Install Java
2. Install Maven
3. Install IntelliJ IDEA
4. Install MongoDB
### Running the tests
#### Run BaseLine.AdamJatowt
##### Steps:
1. Metadata Collection<br>
   Example command line arguments: "local metadata true"
2. Jaccard Distance Calculation<br>
   Example command line arguments: "local jaccard"
3. Context Association Calculation<br>
   Example command line arguments: "local conasso 1"
4. Temporal Kurtosis Calculation<br>
   Example command line arguments: "local tempkurto 2"
5. Focustime Calculation<br>
   Example command line arguments: "local focustime 3"

In step 1:<br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to make the necessary changes.<br>
-> args[1] = "metadata", the algorithmic step<br>
-> args[2] = "true" to run on only the top 100,000 documents, or "false" to run on the whole Gigaword corpus<br>
In steps 2 to 5:<br>
-> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to make the necessary changes.<br>
-> args[1] = the algorithmic step ("jaccard", "conasso", "tempkurto" or "focustime")<br>
-> args[2] = the query number (omitted in the step 2 example)
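
The argument layout above can be sketched as a small parser. This is only an illustration, assuming the layout `<environment> <step> [option]` described in this README; `BaselineArgs` and its methods are hypothetical names and are not part of the project code.

```java
// Hypothetical helper, not part of the Temp2Vec code base. It only mirrors the
// argument layout described above: <environment> <step> [option].
final class BaselineArgs {
    final String environment; // args[0]: "local" or "global"
    final String step;        // args[1]: e.g. "metadata", "jaccard", "conasso"
    final String option;      // args[2]: "true"/"false" or a query number; null if absent

    private BaselineArgs(String environment, String step, String option) {
        this.environment = environment;
        this.step = step;
        this.option = option;
    }

    static BaselineArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("usage: <local|global> <step> [option]");
        }
        if (!args[0].equals("local") && !args[0].equals("global")) {
            throw new IllegalArgumentException("args[0] must be \"local\" or \"global\"");
        }
        String option = args.length > 2 ? args[2] : null;
        return new BaselineArgs(args[0], args[1], option);
    }

    // For the "metadata" step, args[2] selects the top 100,000 documents.
    // A missing option is treated as false here (an assumption).
    boolean topDocumentsOnly() {
        return Boolean.parseBoolean(option);
    }

    // For steps 2 to 5, args[2] is the query number.
    int queryNumber() {
        if (option == null) {
            throw new IllegalStateException("no query number was passed");
        }
        return Integer.parseInt(option);
    }
}
```

For example, `BaselineArgs.parse(new String[] {"local", "conasso", "1"}).queryNumber()` returns 1, matching the step 3 example above.
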
#### Run BaseLine.AndreasSpitz
##### Steps:
1. Generate annotations for building a LOAD graph<br>
   BaseLine.AndreasSpitz.Annotation.Annotation_App<br>
   Example command line arguments: "local"<br>
   -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the properties file to make the necessary changes.
2. Create a database in MongoDB. Insert the generated sentence annotations and NER annotations into two different collections. Change the SystemSettings in the Settings package.
3. Build the LOAD graph<br>
   BaseLine.AndreasSpitz.Construction.ParallelExtractNetworkFromMongo<br>
   -> Change the folder location in BaseLine.AndreasSpitz.SystemSettings to point to where you need to build the graph
4. Query the graph<br>
   BaseLine.AndreasSpitz.GraphQuery.GraphQueryInterface<br>
   -> Point this program to the location of your graph
#### Run ThesisWork
##### Steps:
1. Generate vectors using Word2Vec from your corpus<br>
   ThesisWork.WordVectorCreator.VectorCreator<br>
   Example command line arguments: "local 300 part"<br>
   -> args[0] = "local" to run on your local machine, or "global" to run on the server. Edit the config.properties file to make the necessary changes.<br>
   -> args[1] = "300", the vector size you want<br>
   -> args[2] = "part" to create the vectors from the top 100,000 documents retrieved by 100 queries, or "full" to run on the whole Gigaword corpus
2. Query temporal information<br>
   ThesisWork.PseudoRelevanceApp<br>
   Example command line arguments: "3 sumtfScore local LateFusion 300 0.2"<br>
   -> args[0] = query number<br>
   -> args[1] = Craftmanship sub model<br>
   -> args[2] = environment ("local" to run on your local machine, "global" to run on the server)<br>
   -> args[3] = Craftmanship model<br>
   -> args[4] = vector size<br>
   -> args[5] = lambda, the fractional weight given to the global vector relative to the pseudo-relevant vector
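
One plausible reading of the lambda argument is a linear interpolation between the two vectors, `lambda * global + (1 - lambda) * pseudoRelevant`. The sketch below illustrates that reading only; the formula, the class `VectorInterpolation` and the method `interpolate` are assumptions and may differ from how ThesisWork.PseudoRelevanceApp actually combines the vectors.

```java
// Hypothetical illustration, not taken from the ThesisWork package.
// Assumes lambda weights the global vector and (1 - lambda) weights the
// pseudo-relevant vector, component by component.
final class VectorInterpolation {
    static double[] interpolate(double[] globalVector,
                                double[] pseudoRelevantVector,
                                double lambda) {
        if (globalVector.length != pseudoRelevantVector.length) {
            throw new IllegalArgumentException("vectors must have the same size");
        }
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must lie in [0, 1]");
        }
        double[] combined = new double[globalVector.length];
        for (int i = 0; i < combined.length; i++) {
            combined[i] = lambda * globalVector[i]
                    + (1.0 - lambda) * pseudoRelevantVector[i];
        }
        return combined;
    }
}
```

Under this assumption, the example value 0.2 would put 20% of the weight on the global vector and 80% on the pseudo-relevant vector.
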
### Authors
#### Baseline:
Package: BaseLine.AdamJatowt<br>
Author: Supratim Das<br>
Literature: "Estimating Document Focus Time" by Adam Jatowt et al., CIKM'13
#### Baseline:
Package: BaseLine.AndreasSpitz<br>
Author: Andreas Spitz (Maven packaging and BaseLine.Annotation package by Supratim Das)<br>
Literature: "Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events" by Andreas Spitz and Michael Gertz, SIGIR'16
#### Main Thesis Project:
Package: ThesisWork<br>
Author: Supratim Das<br>
Literature: Yet to be written ;)
### License
This project is licensed under the MIT License. See the [LICENSE.md](LICENSE.md) file for details.
### Acknowledgments
* Thesis Supervisor: Dr. Klaus Berberich <http://people.mpi-inf.mpg.de/~kberberi/>
* Thesis Advisor: Arunav Mishra <http://people.mpi-inf.mpg.de/~amishra/>
* Thesis Advisor: Vinay Setty <http://vbn.aau.dk/en/persons/vinay-jayarama-setty(b23ea8ea-1b2f-4d23-9908-693ac76d4be5).html>