kotaemon

Files

Tuan Anh Nguyen Dang (Tadashi_Cin) 4704e2c11a Add new OCRReader with PDF+OCR text merging (#66)

This change speeds up OCR extraction by skipping OCR for text that is irrelevant (not inside a table).

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>

2023-11-13 17:43:02 +07:00

7810d908b0ff4ce381dcab873196d133.jpg

Add new OCRReader with PDF+OCR text merging (#66)

2023-11-13 17:43:02 +07:00

dummy.pdf

[AUR-391, AUR-393] Add Document and DocumentReader base (#6)

2023-08-31 11:24:12 +07:00

dummy.xlsx

[AUR-432] Add layout-aware table parsing PDF reader (#27)

2023-09-26 15:52:44 +07:00

embedding_openai_batch.json

[AUR-389] Add base interface and embedding model (#17)

2023-09-14 14:08:58 +07:00

embedding_openai.json

[AUR-389] Add base interface and embedding model (#17)

2023-09-14 14:08:58 +07:00

fullocr_sample_output.json

Add new OCRReader with PDF+OCR text merging (#66)

2023-11-13 17:43:02 +07:00

policy.md

[AUR-432] Add layout-aware table parsing PDF reader (#27)

2023-09-26 15:52:44 +07:00

table.pdf

Add new OCRReader with PDF+OCR text merging (#66)

2023-11-13 17:43:02 +07:00