kotaemon

Author	SHA1	Message	Date
trducng	7fc54d52e4	Improve ocr loader error message	2024-02-06 12:21:12 +07:00
trducng	1a4fd7c33f	Update default settings to conform Langchain's Azure implementation	2024-02-05 18:04:36 +07:00
trducng	771f074c0e	Add utf-8 encoding in Help Page for rendering on Windows	2024-02-05 16:42:40 +07:00
trducng	bff55230ba	Reduce the default chunk size in the reasoning pipeline to fit LLM capability	2024-02-03 09:38:50 +07:00
trducng	107bc7580e	Enable HTML upload	2024-02-02 11:37:57 +07:00
Duc Nguyen (john)	65852b7d71	Add docx + html reader (#139)	2024-01-31 19:21:30 +07:00
ian_Cin	116919b346	Update docs (#106)	2024-01-30 18:50:17 +07:00
trducng	cbe40fac99	Show retrieved but non-evidence docs. Support language changing	2024-01-29 11:16:07 +07:00
trducng	50b5d936f5	Optionally allow database migration with Alembic	2024-01-28 19:54:15 +07:00
trducng	04635b77f6	Make the database table customizable	2024-01-28 07:54:38 +07:00
trducng	6ae9634399	Enable .doc file	2024-01-27 23:45:19 +07:00
trducng	23c0331bab	Enable pptx support	2024-01-27 23:08:06 +07:00
trducng	80ec214107	Fix loaders' file_path and other metadata	2024-01-27 22:52:46 +07:00
trducng	c6637ca56e	Relate the retrievers to the indexer	2024-01-27 16:39:40 +07:00
trducng	9b586466ff	Add the tutorial to mkdocs	2024-01-26 15:40:04 +00:00
Duc Nguyen (john)	22c646e5c4	Add documentation about adding reasoning and indexing pipelines to the application (#138)	2024-01-26 22:31:52 +07:00
trducng	757aabca4d	Add app title, favicon. More natural chat	2024-01-25 22:40:32 +07:00
Duc Nguyen (john)	513e86f490	Add dedicated information panel to the UI (#137) * Allow streaming to the chatbot and the information panel without threading * Highlight evidence in a simple manner	2024-01-25 19:07:53 +07:00
Duc Nguyen (john)	ebc61400d8	Provide a developer mode when running ktem (#135) Implement and utilize `on_app_created` to support the developer mode.	2024-01-23 11:46:59 +07:00
Duc Nguyen (john)	2dd531114f	Make ktem official (#134) * Move kotaemon and ktem into same folder * Update docs * Update CI * Resolve mypy, isorts * Re-allow test pdf files	2024-01-23 10:54:18 +07:00
Duc Nguyen (john)	9c5b707010	Customize application settings (#132) * Allow customizing the base application * Make the core llms and embeddings customizable * Make the settings, reasoning and index customizable * Import from langchain_openai	2024-01-21 14:36:07 +07:00
Duc Nguyen (john)	5a9d6f75be	Migrate the MVP into kotaemon (#108) - Migrate the MVP into kotaemon. - Preliminary include the pipeline within chatbot interface. - Organize MVP as an application. Todo: - Add an info panel to view the planning of agents -> Fix streaming agents' output. Resolve: #60 Resolve: #61 Resolve: #62	2024-01-10 15:28:09 +07:00
ian_Cin	230328c62f	Best docs Cinnamon will probably ever have (#105)	2023-12-20 11:30:25 +07:00
Duc Nguyen (john)	0e30dcbb06	Create Langchain LLM converter to quickly supply it to Langchain's chain (#102) * Create Langchain LLM converter to quickly supply it to Langchain's chain * Clean up	2023-12-11 14:55:56 +07:00
Duc Nguyen (john)	da0ac1d69f	Change template to private attribute and simplify imports (#101) --------- Co-authored-by: ian <ian@cinnamon.is>	2023-12-08 18:10:34 +07:00
Duc Nguyen (john)	1f927d3391	Upgrade promptui to conform to Gradio V4 (#98)	2023-12-07 15:24:07 +07:00
ian_Cin	797df5a69c	refractor agents (#100) * refractor agents * minor cosmetic, add terminal ui for cli * pump to 0.3.4 * Add temporary path * fix unclose files in tests --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-12-06 17:06:29 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	d9e925eb75	Add UnstructuredReader with support for various legacy files (.doc, .xls) (#99)	2023-12-05 16:19:13 +07:00
Duc Nguyen (john)	37c744b616	Add file-based document store and vector store (#96) * Modify docstore and vectorstore objects to be reconstructable * Simplify the file docstore * Use the simple file docstore and vector store in MVP	2023-12-04 17:46:00 +07:00
Duc Nguyen (john)	0ce3a8832f	Provide type hints for pass-through Langchain and Llama-index objects (#95)	2023-12-04 10:59:13 +07:00
Duc Nguyen (john)	e34b1e4c6d	Refactor the index component and update the MVP insurance accordingly (#90) Refactor the `kotaemon/pipelines` module to `kotaemon/indices`. Create the VectorIndex. Note: currently I place `qa` to be inside `kotaemon/indices` since at the moment we only have `qa` in RAG. At the same time, I think `qa` can be an independent module in `kotaemon/qa`. Since this can be changed later, I still go at the 1st option for now to observe if we can change it later.	2023-11-30 18:35:07 +07:00
Nguyen Trung Duc (john)	8e3a1d193f	Refactor agents and tools (#91) * Move tools to agents * Move agents to dedicate place * Remove subclassing BaseAgent from BaseTool	2023-11-30 09:52:08 +07:00
ian_Cin	4256030b4f	Adopt pyproject.toml (#89) * ditching setup.py in favour of pyproject.toml; bump to 0.3.2 * bump to 0.3.3	2023-11-29 14:58:35 +07:00
ian_Cin	8e0779a22d	Enforce all IO objects to be subclassed from Document (#88) * enforce Document as IO * Separate rerankers, splitters and extractors (#85) * partially refractor importing * add text to embedding outputs --------- Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>	2023-11-27 16:35:09 +07:00
Nguyen Trung Duc (john)	2186c5558f	Separate rerankers, splitters and extractors (#85)	2023-11-27 14:25:54 +07:00
ian_Cin	0dede9c82d	Subclass chat messages from Document (#86)	2023-11-27 10:38:19 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	3ac277cc0b	Update Elastics store delete() (#84)	2023-11-21 15:29:00 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	9a96a9b876	Add Elasticsearch Docstore (#83) * add Elasticsearch Docstore * update missing requirements * add docstore * [ignore cache] update default param * update docstring	2023-11-21 11:59:20 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	8bb7ad91e0	Add Langchain Agent wrapper with OpenAI Function / Self-ask agent support (#82) * update Param() type hint in MVP * update default embedding endpoint * update Langchain agent wrapper * update langchain agent	2023-11-20 16:26:08 +07:00
Nguyen Trung Duc (john)	0a3fc4b228	Correct the use of abstractmethod (#80) * Correct abstractmethod usage * Update interface * Specify minimal llama-index version [ignore cache] * Update examples	2023-11-20 11:18:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	98509f886c	Update splitters + metadata extractor interface to conform with new LlamaIndex design (#81) * change splitter to general doc parsers class to fit new llama-index desing * moving interface of splitter	2023-11-20 10:09:30 +07:00
Nguyen Trung Duc (john)	98c76c4700	Refactor excel Loader (#79)	2023-11-16 11:30:11 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	cc1e75b3c6	Add Citation pipeline (#78) * add rerankers in retrieving pipeline * update example MVP pipeline * add citation pipeline and function call interface * change return type of QA and AgentPipeline to Document	2023-11-16 11:24:35 +07:00
Nguyen Trung Duc (john)	f8b8d86d4e	Move LLM-related components into LLM module (#74) * Move splitter into indexing module * Rename post_processing module to parsers * Migrate LLM-specific composite pipelines into llms module This change moves the `splitters` module into `indexing` module. The `indexing` module will be created soon, to house `indexing`-related components. This change renames `post_processing` module into `parsers` module. Post-processing is a generic term which provides very little information. In the future, we will add other extractors into the `parser` module, like Metadata extractor... This change migrates the composite elements into `llms` module. These elements heavily assume that the internal nodes are llm-specific. As a result, migrating these elements into `llms` module will make them more discoverable, and simplify code base structure.	2023-11-15 16:26:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	9945afdf6f	Add Reranker implementation and integration in Retrieving pipeline (#77) * Add base Reranker * Add LLM Reranker * Add Cohere Reranker * Add integration of Rerankers in Retrieving pipeline	2023-11-15 16:03:51 +07:00
Nguyen Trung Duc (john)	b52f312d8e	Use new Langchain's dedicated Azure OpenAI embedding class (#76) * Use new Langchain's dedicated Azure OpenAI embedding class * Update test	2023-11-15 14:46:32 +07:00
Nguyen Trung Duc (john)	b159897ac6	Combine docstores and vectorstores within a storages component (#72)	2023-11-14 17:50:57 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	640962e916	Update retrieving + agent pipeline (#71)	2023-11-14 16:40:13 +07:00
Nguyen Trung Duc (john)	693ed39de4	Move prompts into LLMs module (#70) Since the only usage of prompt is within LLMs, it is reasonable to keep it within the LLM module. This way, it would be easier to discover module, and make the code base less complicated. Changes: * Move prompt components into llms * Bump version 0.3.1 * Make pip install dependencies in eager mode --------- Co-authored-by: ian <ian@cinnamon.is>	2023-11-14 16:00:10 +07:00
Nguyen Trung Duc (john)	8532138842	Move Document and other interface into base/schema (#69)	2023-11-14 11:51:10 +07:00

290 Commits