Abstract: Extracting selected textual data from PDF files is a challenging task because the structure of these documents varies widely. This project introduces a tool for efficient extraction of ...
This project implements a clean, modular pipeline for technical PDFs: ingestion → index → RAG → evaluation. It extracts text, tables, and images, builds a vector index, answers questions with grounded ...
A comprehensive PDF processing pipeline that extracts structured data from complex PDFs, including OCR text, tables, images, and rich context-aware metadata using Large Language Models (LLMs).