kotaemon/knowledgehub/loaders
Tuan Anh Nguyen Dang (Tadashi_Cin) 4704e2c11a Add new OCRReader with PDF+OCR text merging (#66)
This change speeds up OCR extraction by bypassing OCR for text that is irrelevant (not in a table).

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-13 17:43:02 +07:00
utils Add new OCRReader with PDF+OCR text merging (#66) 2023-11-13 17:43:02 +07:00
__init__.py Update Base interface of Index/Retrieval pipeline (#36) 2023-10-04 14:27:44 +07:00
base.py Add Huggingface embeddings and Cohere embeddings (#63) 2023-11-10 09:38:30 +07:00
excel_loader.py [AUR-432] Add layout-aware table parsing PDF reader (#27) 2023-09-26 15:52:44 +07:00
mathpix_loader.py [AUR-432] Add layout-aware table parsing PDF reader (#27) 2023-09-26 15:52:44 +07:00
ocr_loader.py Add new OCRReader with PDF+OCR text merging (#66) 2023-11-13 17:43:02 +07:00