Refactor the `kotaemon/pipelines` module to `kotaemon/indices`. Create the VectorIndex.
Note: for now I place `qa` inside `kotaemon/indices`, since at the moment `qa` only appears in RAG. At the same time, `qa` could arguably be an independent module at `kotaemon/qa`. Since this can be changed later, I am going with the first option for now and will observe whether it needs to change.
* enforce Document as IO
* Separate rerankers, splitters and extractors (#85)
* partially refactor importing
* add text to embedding outputs
---------
Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
* add rerankers in retrieving pipeline
* update example MVP pipeline
* add citation pipeline and function call interface
* change return type of QA and AgentPipeline to Document
* Move splitter into indexing module
* Rename post_processing module to parsers
* Migrate LLM-specific composite pipelines into llms module
This change moves the `splitters` module into the `indexing` module. The `indexing` module will be created soon to house indexing-related components.
This change renames the `post_processing` module to `parsers`. "Post-processing" is a generic term that provides very little information. In the future, we will add other extractors to the `parsers` module, such as a metadata extractor.
This change migrates the composite elements into the `llms` module. These elements heavily assume that the internal nodes are LLM-specific. As a result, migrating them into the `llms` module makes them more discoverable and simplifies the code base structure.
Since the only usage of prompts is within LLMs, it is reasonable to keep them within the LLM module. This makes the module easier to discover and keeps the code base less complicated.
Changes:
* Move prompt components into llms
* Bump version 0.3.1
* Make pip install dependencies in eager mode
---------
Co-authored-by: ian <ian@cinnamon.is>
This change removes the following methods from `BaseComponent`:
- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch
Each component is expected to support multiple types of inputs and a single type of output. Since we want components to work out-of-the-box in both standardized and customized use cases, supporting multiple input types is expected. At the same time, to reduce the complexity of understanding how to use a component, we restrict each component to a single output type.
To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and decide on a common output interface for those components.
Tests are updated accordingly.
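As a toy illustration of this convention (the `Document` and component below are simplified stand-ins, not the actual kotaemon classes), a component accepts several input types but commits to a single output type:

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Simplified stand-in for kotaemon's Document IO type."""

    text: str


class UpperCaseComponent:
    """Hypothetical component: accepts str or Document, always returns Document."""

    def run(self, inp) -> Document:
        # Multiple input types are accepted...
        text = inp.text if isinstance(inp, Document) else inp
        # ...but the output type is always Document
        return Document(text=text.upper())
```

Callers can then rely on the output type without having to inspect what was passed in.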
Commit changes:
* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after bumping the version
---------
Co-authored-by: ian <ian@cinnamon.is>
From pipeline > config > UI. Provide an example project for promptui.
- Pipeline to config: `kotaemon.contribs.promptui.config.export_pipeline_to_config`. The config follows the schema specified in this document: https://cinnamon-ai.atlassian.net/wiki/spaces/ATM/pages/2748711193/Technical+Detail. Note: this implementation excludes the logs, which will be handled in AUR-408.
- Config to UI: `kotaemon.contribs.promptui.build_from_yaml`
- Example project is located at `examples/promptui/`
This change provides the base interface of an embedding and wraps Langchain's OpenAI embedding. Usage is as follows:
```python
from kotaemon.embeddings import AzureOpenAIEmbeddings

model = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment="embedding-deployment",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
)
output = model("Hello world")
```
- Use cases related to LLM call: https://cinnamon-ai.atlassian.net/browse/AUR-388?focusedCommentId=34873
- Sample usages: `test_llms_chat_models.py` and `test_llms_completion_models.py`:
```python
from kotaemon.llms.chats.openai import AzureChatOpenAI

model = AzureChatOpenAI(
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
    openai_api_version="2023-03-15-preview",
    deployment_name="gpt35turbo",
    temperature=0,
    request_timeout=60,
)
output = model("hello world")
```
For the LLM-call component, I decided to wrap Langchain's LLM models and Chat models, and set the interface as follows:
- Completion LLM component:
```python
class CompletionLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run text completion: str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run text completion in batch: list[str] in -> list[LLMInterface] out
        ...

    # run_document and run_batch_document just reuse run_raw and
    # run_batch_raw, due to unclear use cases
```
- Chat LLM component:
```python
class ChatLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run chat completion (no chat history): str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run chat completion in batch mode (no chat history):
        # list[str] in -> list[LLMInterface] out
        ...

    def run_document(self, text: list[BaseMessage]) -> LLMInterface:
        # Run chat completion (with chat history):
        # list[langchain's BaseMessage] in -> LLMInterface out
        ...

    def run_batch_document(self, text: list[list[BaseMessage]]) -> list[LLMInterface]:
        # Run chat completion in batch mode (with chat history):
        # list[list[langchain's BaseMessage]] in -> list[LLMInterface] out
        ...
```
- The LLMInterface is as follows:
```python
@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)
```
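As a self-contained sketch of how the raw and document entry points differ (the `LLMInterface` dataclass is repeated to keep the snippet runnable; the toy chat model and `BaseMessage` stand-in are hypothetical, not the real langchain/kotaemon types):

```python
from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)


@dataclass
class BaseMessage:
    """Stand-in for langchain's BaseMessage."""

    role: str
    content: str


class EchoChatLLM:
    """Toy ChatLLM that echoes its input instead of calling a real model."""

    def run_raw(self, text: str) -> LLMInterface:
        # No chat history: a bare string comes in
        return LLMInterface(text=[f"echo: {text}"])

    def run_document(self, messages: list[BaseMessage]) -> LLMInterface:
        # With chat history: respond to the last message in the list
        return LLMInterface(text=[f"echo: {messages[-1].content}"])
```

Both entry points return the same `LLMInterface`, so downstream components can consume either path uniformly.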