# ENTYFI: Entity Typing in Fictional Texts

Cuong Xuan Chu, Simon Razniewski, Gerhard Weikum. (WSDM 2020)
Project website: https://www.mpi-inf.mpg.de/yago-naga/entyfi

# Dependencies
- Python 2 for mention detection
  - cPickle
  - theano
- Python 3 with TensorFlow for fictional typing
  - tensorflow
  - sklearn
  - pandas
  - keras
- Python 3 for ultra-fine typing and ILP
  - PyTorch (ver 0.3.0)
  - Python 3
  - NumPy
  - TensorBoard
  - pickle
  - ast, pulp
- Pretrained word embeddings (GloVe): "wget http://nlp.stanford.edu/data/glove.840B.300d.zip"
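Because the modules run under different interpreters, it can help to confirm that each interpreter can import its packages before building. The following is a minimal sketch and not part of ENTYFI: the helper name `missing_modules` is hypothetical, and the import names you pass in (for example `torch` for PyTorch) are assumptions about how the listed packages are imported.

```python
import subprocess
import sys


def missing_modules(modules, interpreter=sys.executable):
    """Return the names in `modules` that `interpreter` cannot import.

    Each check runs `<interpreter> -c "import <name>"` in a subprocess, so the
    same helper works for a Python 2 or a Python 3 interpreter.
    """
    missing = []
    for name in modules:
        try:
            result = subprocess.run(
                [interpreter, "-c", "import " + name],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            importable = result.returncode == 0
        except OSError:
            # The interpreter itself could not be started (e.g. wrong path).
            importable = False
        if not importable:
            missing.append(name)
    return missing
```

For instance, `missing_modules(["cPickle", "theano"], interpreter="python2")` would check the mention-detection environment, where `"python2"` stands in for whatever interpreter path you use. An empty list means every import succeeded.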

# Required Data
Download the required data, which includes the background knowledge bases of all reference universes, the pretrained models for the fictional typing module, and the data for reference universe ranking.

All data can be found at: http://people.mpi-inf.mpg.de/~cxchu/entyfi/

# Configuration
Before running the typing pipeline, set the following paths:
- ultrafile/resources/constant.py
  - GLOVE_VEC=path to the pretrained word embeddings (GloVe)
- utils/Constants.java
  - PYTHON_TAGGER=path to Python 2 for mention detection
  - PYTHON_ULTRA=path to Python 3 for ultra-fine typing and ILP
  - PYTHON_GENERALTYPING=path to Python 3 with TensorFlow for fictional typing
- resources/wikia.properties
  - BASE_DIR=path to the background KBs of all universes (data-store in the downloaded data)
  - ATTENTION_MODEL=path to the pretrained model of the fictional typing module (attentionModel in the downloaded data)
  - TERMATRIX=path to the universe-term matrix for reference universe ranking (universe-termmatrix in the downloaded data)
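A wrong path in resources/wikia.properties may only surface once the pipeline is already running, so a quick check beforehand can save time. The sketch below is an illustration rather than part of ENTYFI. It assumes the file contains plain `KEY=VALUE` lines, which is only a subset of the Java properties format, and the helper names `read_properties` and `missing_paths` are hypothetical.

```python
import os


def read_properties(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Assumption: no escapes, line continuations, or ':' separators are used.
    """
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def missing_paths(props, keys=("BASE_DIR", "ATTENTION_MODEL", "TERMATRIX")):
    """Return the keys that are unset or that point to a non-existent path."""
    return [key for key in keys if not os.path.exists(props.get(key, ""))]
```

Calling `missing_paths(read_properties("resources/wikia.properties"))` returns the keys that still need attention. An empty list means all three paths exist.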

# How to Run

- Build: ./build.sh
- Run typing: ./run.sh heap-size typing.ENTYFI input-file output-file
  - For example: ./run.sh 10G typing.ENTYFI input-file output-file

Other parameters, such as the top-K reference universes or the top-K types returned by the ILP, can be set in the class typing/ENTYFI.java.
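When typing several files, the run command can be assembled programmatically. This is a minimal sketch that only mirrors the argument order of the command documented above. The function names are hypothetical, and it assumes the project has already been built and that the script is invoked from the repository root.

```python
import subprocess


def build_run_command(heap_size, input_file, output_file,
                      script="./run.sh", main_class="typing.ENTYFI"):
    """Return the argument list: script, heap size, main class, input, output."""
    return [script, heap_size, main_class, input_file, output_file]


def run_typing(heap_size, input_file, output_file):
    """Run the typing pipeline and raise CalledProcessError if it fails."""
    command = build_run_command(heap_size, input_file, output_file)
    subprocess.run(command, check=True)
```

For example, `run_typing("10G", "input-file", "output-file")` corresponds to the example command above.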

# Notes

- For mention detection, we use the technique from the following paper to improve efficiency: https://arxiv.org/abs/1603.01360