kotaemon/knowledgehub
Tuan Anh Nguyen Dang (Tadashi_Cin) 4704e2c11a Add new OCRReader with PDF+OCR text merging (#66)
This change speeds up OCR extraction by bypassing OCR for text that is irrelevant (not in a table).

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-13 17:43:02 +07:00
base Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
chatbot Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
composite Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
contribs Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
docstores [AUR-338, AUR-406, AUR-407] Export pipeline to config for PromptUI. Construct PromptUI dynamically based on config. (#16) 2023-09-21 14:27:23 +07:00
documents Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
embeddings Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
llms Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
loaders Add new OCRReader with PDF+OCR text merging (#66) 2023-11-13 17:43:02 +07:00
parsers Update Base interface of Index/Retrieval pipeline (#36) 2023-10-04 14:27:44 +07:00
pipelines Add new OCRReader with PDF+OCR text merging (#66) 2023-11-13 17:43:02 +07:00
post_processing Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
prompt Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
vectorstores Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
__init__.py Simplify the BaseComponent interface (#64) 2023-11-13 15:10:18 +07:00
cli.py Provide ready binary for Mac and Linux to do sharing tunneling (#49) 2023-10-17 17:19:29 +07:00