kotaemon

Author	SHA1	Message	Date
Tuan Anh Nguyen Dang (Tadashi_Cin)	cc1e75b3c6	Add Citation pipeline (#78 ) * add rerankers in retrieving pipeline * update example MVP pipeline * add citation pipeline and function call interface * change return type of QA and AgentPipeline to Document	2023-11-16 11:24:35 +07:00
Nguyen Trung Duc (john)	f8b8d86d4e	Move LLM-related components into LLM module (#74 ) * Move splitter into indexing module * Rename post_processing module to parsers * Migrate LLM-specific composite pipelines into llms module This change moves the `splitters` module into the `indexing` module. The `indexing` module will be created soon, to house `indexing`-related components. This change renames the `post_processing` module to `parsers`. Post-processing is a generic term which provides very little information. In the future, we will add other extractors into the `parsers` module, like a metadata extractor... This change migrates the composite elements into the `llms` module. These elements heavily assume that the internal nodes are LLM-specific. As a result, migrating these elements into the `llms` module will make them more discoverable and simplify the code base structure.	2023-11-15 16:26:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	9945afdf6f	Add Reranker implementation and integration in Retrieving pipeline (#77 ) * Add base Reranker * Add LLM Reranker * Add Cohere Reranker * Add integration of Rerankers in Retrieving pipeline	2023-11-15 16:03:51 +07:00
Nguyen Trung Duc (john)	b159897ac6	Combine docstores and vectorstores within a storages component (#72 )	2023-11-14 17:50:57 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	640962e916	Update retrieving + agent pipeline (#71 )	2023-11-14 16:40:13 +07:00
Nguyen Trung Duc (john)	693ed39de4	Move prompts into LLMs module (#70 ) Since the only usage of prompt is within LLMs, it is reasonable to keep it within the LLM module. This way, it would be easier to discover module, and make the code base less complicated. Changes: * Move prompt components into llms * Bump version 0.3.1 * Make pip install dependencies in eager mode --------- Co-authored-by: ian <ian@cinnamon.is>	2023-11-14 16:00:10 +07:00
Nguyen Trung Duc (john)	8532138842	Move Document and other interface into base/schema (#69 )	2023-11-14 11:51:10 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	4704e2c11a	Add new OCRReader with PDF+OCR text merging (#66 ) This change speeds up OCR extraction by allowing bypassing OCR for texts that are irrelevant (not in table). --------- Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>	2023-11-13 17:43:02 +07:00
Nguyen Trung Duc (john)	d79b3744cb	Simplify the `BaseComponent` interface (#64 ) This change removes `BaseComponent`'s: - run_raw - run_batch_raw - run_document - run_batch_document - is_document - is_batch Each component is expected to support multiple types of inputs and a single type of output. Since we want the component to work out-of-the-box with both standardized and customized use cases, supporting multiple types of inputs is expected. At the same time, to reduce the complexity of understanding how to use a component, we restrict a component to only have a single output type. To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and to decide the common output interface for those components. Tests are updated accordingly. Commit changes: * Add kwargs to vector store's query * Simplify the BaseComponent * Update tests * Remove support for Python 3.8 and 3.9 * Bump version 0.3.0 * Fix GitHub PR caching still using the old environment after bumping version --------- Co-authored-by: ian <ian@cinnamon.is>	2023-11-13 15:10:18 +07:00
ian_Cin	6095526dc7	Add Huggingface embeddings and Cohere embeddings (#63 ) * Add huggingface embeddings and cohere embeddings * Update openai interface and the mock for newer OpenAI SDK --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-11-10 09:38:30 +07:00
Nguyen Trung Duc (john)	9035e25666	Upgrade the declarative pipeline for cleaner interface (#51 )	2023-10-24 11:12:22 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	79cc60e6a2	[AUR-429] Add MVP pipeline with Ingestion and QA stage (#39 ) * add base Tool * minor update test_tool * update test dependency * update test dependency * Fix namespace conflict * update test * add base Agent Interface, add ReWoo Agent * minor update * update test * fix typo * remove unneeded print * update rewoo agent * add LLMTool * update BaseAgent type * add ReAct agent * add ReAct agent * minor update * minor update * minor update * minor update * update base reader with BaseComponent * add splitter * update agent and tool * update vectorstores * update load/save for indexing and retrieving pipeline * update test_agent for more use-cases * add missing dependency for test * update test case for in memory vectorstore * add TextSplitter to BaseComponent * update type hint basetool * add insurance mvp pipeline * update requirements * Remove redundant plugins param * Mock GoogleSearch --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-10-05 12:31:33 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	56bc41b673	Update Base interface of Index/Retrieval pipeline (#36 ) * add base Tool * minor update test_tool * update test dependency * update test dependency * Fix namespace conflict * update test * add base Agent Interface, add ReWoo Agent * minor update * update test * fix typo * remove unneeded print * update rewoo agent * add LLMTool * update BaseAgent type * add ReAct agent * add ReAct agent * minor update * minor update * minor update * minor update * update base reader with BaseComponent * add splitter * update agent and tool * update vectorstores * update load/save for indexing and retrieving pipeline * update test_agent for more use-cases * add missing dependency for test * update test case for in memory vectorstore * add TextSplitter to BaseComponent * update type hint basetool --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-10-04 14:27:44 +07:00
Nguyen Trung Duc (john)	49ed3f6994	[AUR-405] Auto-generate markdown documentation from pipeline (#33 ) * Create a script to auto-generate markdown docs from pipeline * Clean up documentation for Chain-of-Thought	2023-10-04 10:54:24 +07:00
Nguyen Trung Duc (john)	6ab1854532	feat: Add chain-of-thought (#37 ) * Add chain-of-thought * Use BasePromptComponent * Add terminate callback for the chain-of-thought	2023-10-04 02:16:33 +07:00
ian_Cin	d83c22aa4e	[AUR-395, AUR-415] Adopt Example1 Injury pipeline; add .flow() for enabling bottom-up pipeline execution (#32 ) * add example1/injury pipeline example * add dotenv * update various api	2023-10-02 16:24:56 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	3cceec63ef	[AUR-431] Add ReAct Agent (#34 ) * add base Tool * minor update test_tool * update test dependency * update test dependency * Fix namespace conflict * update test * add base Agent Interface, add ReWoo Agent * minor update * update test * fix typo * remove unneeded print * update rewoo agent * add LLMTool * update BaseAgent type * add ReAct agent * add ReAct agent * minor update * minor update * minor update * minor update * update docstring * fix max_iteration --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-10-02 11:29:12 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	91048770fa	[AUR-431, AUR-435] Add Agent Interface and ReWOO Agent implementation (#31 ) * add base Tool * minor update test_tool * update test dependency * update test dependency * Fix namespace conflict * update test * add base Agent Interface, add ReWoo Agent * minor update * update test * fix typo * remove unneeded print * update rewoo agent --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-10-01 11:53:08 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	f9fc02a32a	[AUR-363, AUR-433, AUR-434] Add Base Tool interface with Wikipedia/Google tools (#30 ) * add base Tool * minor update test_tool * update test dependency * update test dependency * Fix namespace conflict * update test --------- Co-authored-by: trducng <trungduc1992@gmail.com>	2023-09-29 10:18:49 +07:00
Nguyen Trung Duc (john)	c6dd01e820	[AUR-338, AUR-406, AUR-407] Export pipeline to config for PromptUI. Construct PromptUI dynamically based on config. (#16 ) From pipeline > config > UI. Provide example project for promptui - Pipeline to config: `kotaemon.contribs.promptui.config.export_pipeline_to_config`. The config follows the schema specified in this document: https://cinnamon-ai.atlassian.net/wiki/spaces/ATM/pages/2748711193/Technical+Detail. Note: this implementation excludes the logs, which will be handled in AUR-408. - Config to UI: `kotaemon.contribs.promptui.build_from_yaml` - Example project is located at `examples/promptui/`	2023-09-21 14:27:23 +07:00
ian_Cin	b794051653	[AUR-421] base output post-processor that works using regex. (#20 )	2023-09-19 19:54:44 +07:00
Nguyen Trung Duc (john)	620b2b03ca	[AUR-392, AUR-413, AUR-414] Define base vector store, and make use of ChromaVectorStore from llama_index. Indexing and retrieving vectors with vector store (#18 ) Design the base interface of vector store, and apply it to the Chroma Vector Store (wrapped around llama_index's implementation). Provide the pipelines to populate and retrieve from vector store.	2023-09-14 14:18:20 +07:00
trducng	043209fda7	Initiate repository	2023-08-16 14:56:48 +07:00

23 Commits