arXiv

Automating data extraction in systematic reviews of environmental agents

January 2019
Artur Nowak, Paweł Kunstman

Abstract

We describe our entry for the Systematic Review Information Extraction track of the 2018 Text Analysis Conference. Our solution is an end-to-end, deep learning, sequence tagging model based on the BiLSTM-CRF architecture. However, we use interleaved, alternating LSTM layers with highway connections instead of the more traditional approach, where the last hidden states of both directions are concatenated to create the input to the next layer. We also make extensive use of pre-trained word embeddings, namely GloVe and ELMo. Thanks to a number of regularization techniques, we were able to train a model with relatively large capacity (more than 31.3M trainable parameters) for the size of the training set (100 documents, fewer than 200K tokens). The system's official score was 60.9% (micro-F1) and it ranked first in Task 1. Additionally, after rectifying an obvious mistake in the submission format, the system scored 67.35%.
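For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed over extracted mentions. It is not the official track scorer: the exact-match criterion, the `(doc_id, start, end, label)` tuple layout, the function name `micro_f1`, and the example labels are all assumptions made for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over exact-match mentions.

    Each mention is a hypothetical (doc_id, start, end, label) tuple.
    Counts are pooled over all labels before computing precision and
    recall, which is what makes the average "micro".
    """
    gold_set = set(gold)
    pred_set = set(predicted)

    true_pos = len(gold_set & pred_set)    # predicted and correct
    false_pos = len(pred_set - gold_set)   # predicted but not in gold
    false_neg = len(gold_set - pred_set)   # in gold but missed

    precision = true_pos / (true_pos + false_pos) if pred_set else 0.0
    recall = true_pos / (true_pos + false_neg) if gold_set else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; labels are placeholders, not the track's label set.
gold = [
    ("doc1", 0, 4, "Species"),
    ("doc1", 10, 15, "Dose"),
    ("doc2", 3, 9, "Endpoint"),
]
predicted = [
    ("doc1", 0, 4, "Species"),    # exact match
    ("doc1", 10, 15, "Dose"),     # exact match
    ("doc2", 3, 8, "Endpoint"),   # wrong span boundary, counted as an error
]
score = micro_f1(gold, predicted)
```

Because counts are pooled before averaging, frequent labels dominate the score, which is worth keeping in mind when comparing systems on a small training set.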
