Parse a PDF, quickly
Irwin Williams
- You have Docker installed.
- You don’t have time to futz around with PDFs.
If you have a PDF you’d like to get tables out of, Tabula works a dream. The last time I used it successfully, I was on my Mac. When I jumped onto a Windows machine, I just couldn’t get it to run.
*a few googles later*
I found a Docker command that hosts a web API giving access to Tabula:
docker run -p 8080:8080 -e HOST=0.0.0.0 gavinkflam/tabula-api:1.0.0
The README for this container gives this example:
curl -X POST -H 'Content-Type: multipart/form-data' \
  -F 'file=@your-file.pdf' -F 'guess=true' -F 'pages=all' \
  http://localhost:8080/api/extract
Thankfully, because of WSL, I just popped over into bash and I had tables in a flash 😎
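If you end up doing this more than once, the curl call wraps up nicely in a small bash function. This is only a rough sketch, not something from the container's README: the function name `extract_tables` is made up, and the upload field name `file` and the default URL are assumptions you should check against your own setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send a PDF to a locally running tabula-api container.
# Assumptions: the upload field is named "file", and the API listens on
# localhost:8080 as in the docker command above.
extract_tables() {
  local pdf="$1"
  local url="${2:-http://localhost:8080/api/extract}"

  # Fail early, before curl is ever invoked, if the PDF is missing.
  if [ ! -f "$pdf" ]; then
    echo "extract_tables: no such file: $pdf" >&2
    return 1
  fi

  # -F implies a multipart/form-data POST, so curl sets the
  # Content-Type header (including the boundary) on its own.
  curl -sS \
    -F "file=@${pdf}" -F 'guess=true' -F 'pages=all' \
    "$url"
}
```

With the container running, something like `extract_tables report.pdf > tables.out` (where `report.pdf` is whatever PDF you have lying around) saves the response to a file. A second argument overrides the URL if you mapped the container to a different port.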
Feb 2020