kotaemon

Author	SHA1	Message	Date
cin-klein	66e565649e	feat: integrate nano-graphrag (#433) * add nano graph-rag * ignore entities for relevant context reference * refactor and add local model as default nano-graphrag * feat: add kotaemon llm & embedding integration with nanographrag * fix: add env var for nano GraphRAG --------- Co-authored-by: Tadashi <tadashi@cinnamon.is>	2024-10-30 15:32:30 +07:00
Khoi-Nguyen Nguyen-Ngoc	19b386b51e	fix: pin `python-multipart` version to avoid yanking issues with micropip (#436)	2024-10-28 15:13:47 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	f2f192ed72	feat: add toggle dark mode button on main Chat UI (#423) * feat: add toggle dark mode button on main UI * docs: update docs	2024-10-22 18:48:18 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	764fe595f4	feat: add file grouping feature (#416) bump:patch	2024-10-21 12:47:18 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	e6fa1af404	feat: add mindmap visualization (#405) bump:minor	2024-10-17 14:35:28 +07:00
a652	4764b0e82a	fix: update adobe_loader (#399) bump:patch * Update adobe_loader.py fix:When initializing the Document, extra_info was not added to the metadata. * Update adobe_loader.py Change the method of checking whether extra_info is None.	2024-10-16 11:01:55 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	b113efc855	feat: add web URL loader & refine indexing logics (#397) * feat: add web URL loader & refine indexing logics * fix: comfort mypy	2024-10-15 22:42:24 +07:00
ronchengang	8188760f32	feat: allow to use customized GraphRAG settings.yaml (#387) bump:patch * allow to use customized GraphRAG settings.yaml * adjust import style * fix typo * Added GraphRAG original documentation reference. * feat: allow to use customized GraphRAG settings.yaml (#387) --------- Co-authored-by: Chen, Ron Gang <git@git.com>	2024-10-14 21:18:34 +07:00
Tadashi	f0f3b4b23e	docs: update README #none	2024-10-14 10:16:43 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	15c7916ad8	fix: improve GRAPHRAG key passing (#384) #none	2024-10-11 12:01:06 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	6da9db489f	fix: add optional graphrag toggle in dockerfile (#377) * fix: toggle graphrag install in Docker build * fix: update Dockerfile * fix: remove unused logics in chat_fn * fix: disable duckduckgo test due to API limit	2024-10-10 16:09:57 +07:00
Tadashi	3ff6af8acf	fix: optimize chat suggestion logic	2024-10-10 14:44:50 +07:00
ronchengang	ad34395d0b	update output path logic since GraphRAG has changed the storage config value in the latest release (#374) bump:patch Co-authored-by: Chen, Ron Gang <git@git.com>	2024-10-10 11:20:02 +07:00
KennyWu	49a083fd9f	feat: tweak the 'Chat suggestion' feature to tie it to conversations (#341) #none Signed-off-by: Kennywu <jdlow@live.cn>	2024-10-10 11:02:04 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	dfd00fe752	fix: vastly improve chat UI responsiveness by reordering Gradio events (#360) bump:patch	2024-10-04 17:15:49 +07:00
taprosoft	76ab3fdd90	fix: check empty Cohere key in rerank	2024-10-01 09:37:09 +00:00
Tadashi	a424a630f2	fix: pass .env.example to Docker and release package bump:patch	2024-10-01 14:49:57 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	7ac8f0329a	fix: improve Download UI bump:minor (#352) * fix: rerank test ui * fix: improve download all UI	2024-10-01 12:03:19 +07:00
KennyWu	53530e296f	feat: support TEI embedding service, configurable reranking model (#287) * feat: add support for TEI embedding service, allow reranking model to be configurable. Signed-off-by: Kennywu <jdlow@live.cn> * fix: add cohere default reranking model * fix: comfort pre-commit --------- Signed-off-by: Kennywu <jdlow@live.cn> Co-authored-by: wujiaye <wujiaye@bluemoon.com.cn> Co-authored-by: Tadashi <tadashi@cinnamon.is>	2024-09-30 22:00:00 +07:00
Mikhail Khludnev	2e3c17b256	fix: convert graphrag input path to str (#237) #none I noticed type cast error in pycharm debug (it intercept forking programs). Anyway, this change obeys types. Although, this conversions happens implicitly. Co-authored-by: Tadashi <tadashi@cinnamon.is>	2024-09-29 23:02:23 +07:00
Pedro Lima	aac6233412	feat: button to delete all files in index (#320) #none * button to delete all files in index * code formatting --------- Co-authored-by: Tadashi <tadashi@cinnamon.is>	2024-09-29 22:55:51 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	b7e81e61dd	fix: remove duplicated deps (#344) #none	2024-09-29 22:38:02 +07:00
Ben Dykstra	f7b6f313b5	fix: update setup instructions (#144) #none * activate directory to gitignore * add my custom env to gitignore, will have to change that * add unstructured to kotaemon pyproject.toml * add .env to gitignore * remove .env from tracking * make changes to the run_macos script, update readme with more detailed instructions * remove my personal changes from gitignore * remove line from run_macos script * remove option for not installing miniconda for non technical users, mark docker dependency as optional * docs: update demo URL * gitignore changes * merge .env.example * revert changes to run_macos.sh * unstructured to advanced dependencies * add link to unstructured system dependencies * remove api key * fix: skip tests when unstructured pdf not installed * chore: loosen unstructured package version in pyproject.toml * chore: correct syntax --------- Co-authored-by: Tadashi <tadashi@cinnamon.is> Co-authored-by: cin-albert <albert@cinnamon.is>	2024-09-29 22:26:02 +07:00
taprosoft	00df123309	fix: fix vectorstore tests #none	2024-09-27 04:18:11 +00:00
saidmukhamad	94cc3a96c2	fix: add langchain google dependency (#329) * add-gemini-deps * uncomment gemeni flow settings	2024-09-27 11:15:42 +07:00
Tadashi	79b309396b	fix: update default cohere embedding models bump:patch	2024-09-25 11:10:09 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	88d577b0cc	feat: add first setup screen for LLM & Embedding models (#314) (bump:minor) * fix: utf-8 txt reader * fix: revise vectorstore import and make it optional * feat: add cohere chat model with tool call support * fix: simplify citation pipeline * fix: improve citation logic * fix: improve decompose func call * fix: revise question rewrite prompt * fix: revise chat box default placeholder * fix: add key from ktem to cohere rerank * fix: conv name suggestion * fix: ignore default key cohere rerank * fix: improve test connection UI * fix: reorder requirements * feat: add first setup screen * fix: update requirements * fix: vectorstore tests * fix: update cohere version * fix: relax langchain core version * fix: add demo mode * fix: update flowsettings * fix: typo * fix: fix bool env passing	2024-09-22 16:32:23 +07:00
Khoi-Nguyen Nguyen-Ngoc	a865e2b095	feat: modify base dependencies + remove unnecessary packages in lite docker (#310) * feat: update base/adv dependencies * feat: update Dockerfile * ci: update free disk for docker build	2024-09-21 12:11:58 +07:00
Quang (Albert)	7762190d05	feat: add local theme (#288) * feat: add local theme instead of from hub * chore: add credit * fix: typo	2024-09-17 19:03:39 +07:00
Anush	e2bd78e9c4	feat: Qdrant vectorstore support (#260) * feat: Qdrant vectorstore support * chore: review changes * docs: Updated README.md	2024-09-16 04:17:36 +07:00
kan_cin	d3fd75297f	feat: add multi-stages docker and support platform arm (#274) * feat: add multi-stages docker and support platform arm * refactor: pre-commit * fix: raise ImportError (fastembed) instead of auto install * feat: add dependencies for local llm * feat: free disk * feat: update README * feat: update README * chore: fix typo --------- Co-authored-by: cin-niko <niko@cinnamon.is>	2024-09-12 20:25:03 +07:00
mst	73a476979e	fix: change column type to string for relation_type (#272) #none	2024-09-11 20:47:03 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	96d2086017	fix: add guidance parameters for LC wrapper models (#255) * fix: add docstring to LC wrapper models * fix: fix metadata passing with LC embedding wrapper	2024-09-09 14:15:34 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	b06c4777a3	fix: add PDFJS download to Windows setup (#249)	2024-09-08 21:22:01 +07:00
kan_cin	dbb6bb275f	feat: add test connection for edit spec (#239)	2024-09-08 10:55:13 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	069f0f3c83	feat: expose Cohere and HF embedding support on UI (#236)	2024-09-06 18:18:19 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	ef7e91fcae	fix: update requirements (#230)	2024-09-06 09:36:21 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	e2ed3564ce	fix: limit fastapi version (#229)	2024-09-06 09:23:26 +07:00
Tadashi	318895b287	fix: disable default install for anthropic	2024-09-05 23:18:53 +07:00
Tadashi	3267e6c654	fix: disable default install for google-genai package	2024-09-05 23:08:28 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	05245f501c	feat: add support for Gemini, Claude through Langchain (#225) (bump:patch)	2024-09-05 21:58:20 +07:00
ChengZi	772186b6e5	feat: support milvus vector db (#188) #none Signed-off-by: ChengZi <chen.zhang@zilliz.com>	2024-09-04 20:22:50 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	76f2652d2a	fix: re-enable tests and fix legacy test interface (#208) * fix: re-enable tests and fix legacy test interface * fix: skip llamacpp based on installed status * fix: minor fix	2024-09-04 12:37:39 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	607867d7e6	feat: add markdown file support (#202) * feat: add support for .md * fix: disable download all on private collection	2024-09-03 23:15:26 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	35b2927e5c	fix: update app version resolver in flowsettings (#180) (bump:patch)	2024-09-02 17:42:39 +07:00
kan_cin	041d229282	feat: add test connection feature (#166) * feat: add test connection feature * fix: typo	2024-09-01 08:22:36 +07:00
Quang (Albert)	4b2b334d2c	fix: refine kotaemon/pyproject.toml (#153)	2024-08-30 23:02:14 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	d880294153	fix: pwd change in setttings (#147)	2024-08-29 13:41:12 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	bb56ef4f8e	chore: update workflow (#124)	2024-08-26 09:52:16 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	2570e11501	feat: merge develop (#123) * Support hybrid vector retrieval * Enable figures and table reading in Azure DI * Retrieve with multi-modal * Fix mixing up table * Add txt loader * Add Anthropic Chat * Raising error when retrieving help file * Allow same filename for different people if private is True * Allow declaring extra LLM vendors * Show chunks on the File page * Allow elasticsearch to get more docs * Fix Cohere response (#86) * Fix Cohere response * Remove Adobe pdfservice from dependency kotaemon doesn't rely more pdfservice for its core functionality, and pdfservice uses very out-dated dependency that causes conflict. --------- Co-authored-by: trducng <trungduc1992@gmail.com> * Add confidence score (#87) * Save question answering data as a log file * Save the original information besides the rewritten info * Export Cohere relevance score as confidence score * Fix style check * Upgrade the confidence score appearance (#90) * Highlight the relevance score * Round relevance score. Get key from config instead of env * Cohere return all scores * Display relevance score for image * Remove columns and rows in Excel loader which contains all NaN (#91) * remove columns and rows which contains all NaN * back to multiple joiner options * Fix style --------- Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local> Co-authored-by: trducng <trungduc1992@gmail.com> * Track retriever state * Bump llama-index version 0.10 * feat/save-azuredi-mhtml-to-markdown (#93) * feat/save-azuredi-mhtml-to-markdown * fix: replace os.path to pathlib change theflow.settings * refactor: base on pre-commit * chore: move the func of saving content markdown above removed_spans --------- Co-authored-by: jacky0218 <jacky0218@github.com> * fix: losing first chunk (#94) * fix: losing first chunk. * fix: update the method of preventing losing chunks --------- Co-authored-by: jacky0218 <jacky0218@github.com> * fix: adding the base64 image in markdown (#95) * feat: more chunk info on UI * fix: error when reindexing files * refactor: allow more information exception trace when using gpt4v * feat: add excel reader that treats each worksheet as a document * Persist loader information when indexing file * feat: allow hiding unneeded setting panels * feat: allow specific timezone when creating conversation * feat: add more confidence score (#96) * Allow a list of rerankers * Export llm reranking score instead of filter with boolean * Get logprobs from LLMs * Rename cohere reranking score * Call 2 rerankers at once * Run QA pipeline for each chunk to get qa_score * Display more relevance scores * Define another LLMScoring instead of editing the original one * Export logprobs instead of probs * Call LLMScoring * Get qa_score only in the final answer * feat: replace text length with token in file list * ui: show index name instead of id in the settings * feat(ai): restrict the vision temperature * fix(ui): remove the misleading message about non-retrieved evidences * feat(ui): show the reasoning name and description in the reasoning setting page * feat(ui): show version on the main windows * feat(ui): show default llm name in the setting page * fix(conf): append the result of doc in llm_scoring (#97) * fix: constraint maximum number of images * feat(ui): allow filter file by name in file list page * Fix exceeding token length error for OpenAI embeddings by chunking then averaging (#99) * Average embeddings in case the text exceeds max size * Add docstring * fix: Allow empty string when calling embedding * fix: update trulens LLM ranking score for retrieval confidence, improve citation (#98) * Round when displaying not by default * Add LLMTrulens reranking model * Use llmtrulensscoring in pipeline * fix: update UI display for trulen score --------- Co-authored-by: taprosoft <tadashi@cinnamon.is> * feat: add question decomposition & few-shot rewrite pipeline (#89) * Create few-shot query-rewriting. Run and display the result in info_panel * Fix style check * Put the functions to separate modules * Add zero-shot question decomposition * Fix fewshot rewriting * Add default few-shot examples * Fix decompose question * Fix importing rewriting pipelines * fix: update decompose logic in fullQA pipeline --------- Co-authored-by: taprosoft <tadashi@cinnamon.is> * fix: add encoding utf-8 when save temporal markdown in vectorIndex (#101) * fix: improve retrieval pipeline and relevant score display (#102) * fix: improve retrieval pipeline by extending first round top_k with multiplier * fix: minor fix * feat: improve UI default settings and add quick switch option for pipeline * fix: improve agent logics (#103) * fix: improve agent progres display * fix: update retrieval logic * fix: UI display * fix: less verbose debug log * feat: add warning message for low confidence * fix: LLM scoring enabled by default * fix: minor update logics * fix: hotfix image citation * feat: update docx loader for handle merged table cells + handle zip file upload (#104) * feat: update docx loader for handle merged table cells * feat: handle zip file * refactor: pre-commit * fix: escape text in download UI * feat: optimize vector store query db (#105) * feat: optimize vector store query db * feat: add file_id to chroma metadatas * feat: remove unnecessary logs and update migrate script * feat: iterate through file index * fix: remove unused code --------- Co-authored-by: taprosoft <tadashi@cinnamon.is> * fix: add openai embedidng exponential back-off * fix: update import download_loader * refactor: codespell * fix: update some default settings * fix: update installation instruction * fix: default chunk length in simple QA * feat: add share converstation feature and enable retrieval history (#108) * feat: add share converstation feature and enable retrieval history * fix: update share conversation UI --------- Co-authored-by: taprosoft <tadashi@cinnamon.is> * fix: allow exponential backoff for failed OCR call (#109) * fix: update default prompt when no retrieval is used * fix: create embedding for long image chunks * fix: add exception handling for additional table retriever * fix: clean conversation & file selection UI * fix: elastic search with empty doc_ids * feat: add thumbnail PDF reader for quick multimodal QA * feat: add thumbnail handling logic in indexing * fix: UI text update * fix: PDF thumb loader page number logic * feat: add quick indexing pipeline and update UI * feat: add conv name suggestion * fix: minor UI change * feat: citation in thread * fix: add conv name suggestion in regen * chore: add assets for usage doc * chore: update usage doc * feat: pdf viewer (#110) * feat: update pdfviewer * feat: update missing files * fix: update rendering logic of infor panel * fix: improve thumbnail retrieval logic * fix: update PDF evidence rendering logic * fix: remove pdfjs built dist * fix: reduce thumbnail evidence count * chore: update gitignore * fix: add js event on chat msg select * fix: update css for viewer * fix: add env var for PDFJS prebuilt * fix: move language setting to reasoning utils --------- Co-authored-by: phv2312 <kat87yb@gmail.com> Co-authored-by: trducng <trungduc1992@gmail.com> * feat: graph rag (#116) * fix: reload server when add/delete index * fix: rework indexing pipeline to be able to disable vectorstore and splitter if needed * feat: add graphRAG index with plot view * fix: update requirement for graphRAG and lighten unnecessary packages * feat: add knowledge network index (#118) * feat: add Knowledge Network index * fix: update reader mode setting for knet * fix: update init knet * fix: update collection name to index pipeline * fix: missing req --------- Co-authored-by: jeff52415 <jeff.yang@cinnamon.is> * fix: update info panel return for graphrag * fix: retriever setting graphrag * feat: local llm settings (#122) * feat: expose context length as reasoning setting to better fit local models * fix: update context length setting for agents * fix: rework threadpool llm call * fix: fix improve indexing logic * fix: fix improve UI * feat: add lancedb * fix: improve lancedb logic * feat: add lancedb vectorstore * fix: lighten requirement * fix: improve lanceDB vs * fix: improve UI * fix: openai retry * fix: update reqs * fix: update launch command * feat: update Dockerfile * feat: add plot history * fix: update default config * fix: remove verbose print * fix: update default setting * fix: update gradio plot return * fix: default gradio tmp * fix: improve lancedb docstore * fix: fix question decompose pipeline * feat: add multimodal reader in UI * fix: udpate docs * fix: update default settings & docker build * fix: update app startup * chore: update documentation * chore: update README * chore: update README --------- Co-authored-by: trducng <trungduc1992@gmail.com> * chore: update README * chore: update README --------- Co-authored-by: trducng <trungduc1992@gmail.com> Co-authored-by: cin-ace <ace@cinnamon.is> Co-authored-by: Linh Nguyen <70562198+linhnguyen-cinnamon@users.noreply.github.com> Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local> Co-authored-by: cin-jacky <101088014+jacky0218@users.noreply.github.com> Co-authored-by: jacky0218 <jacky0218@github.com> Co-authored-by: kan_cin <kan@cinnamon.is> Co-authored-by: phv2312 <kat87yb@gmail.com> Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>	2024-08-26 08:50:37 +07:00


177 Commits