This project is a small pipeline for exploring a corpus of text/PDF documents (e.g., the House Oversight Committee’s Jeffrey Epstein email release).

Unzip the contents locally, e.g.:

project-root/ ...