Commit Graph

13 Commits

Author SHA1 Message Date
ian_Cin
8e0779a22d Enforce all IO objects to be subclassed from Document (#88)
* enforce Document as IO

* Separate rerankers, splitters and extractors (#85)

* partially refractor importing

* add text to embedding outputs

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-27 16:35:09 +07:00
Nguyen Trung Duc (john)
b52f312d8e Use new Langchain's dedicated Azure OpenAI embedding class (#76)
* Use new Langchain's dedicated Azure OpenAI embedding class

* Update test
2023-11-15 14:46:32 +07:00
Nguyen Trung Duc (john)
8532138842 Move Document and other interface into base/schema (#69) 2023-11-14 11:51:10 +07:00
Nguyen Trung Duc (john)
d79b3744cb Simplify the BaseComponent inteface (#64)
This change remove `BaseComponent`'s:

- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch

Each component is expected to support multiple types of inputs and a single type of output. Since we want the component to work out-of-the-box with both standardized and customized use cases, supporting multiple types of inputs are expected. At the same time, to reduce the complexity of understanding how to use a component, we restrict a component to only have a single output type.

To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and to decide the common output interface for those components.

Tests are updated accordingly.

Commit changes:

* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix github PR caching still use old environment after bumping version

---------

Co-authored-by: ian <ian@cinnamon.is>
2023-11-13 15:10:18 +07:00
ian_Cin
6095526dc7 Add Huggingface embeddings and Cohere embeddings (#63)
* Add huggingface embeddings and cohere embeddings
* Update openai interface and the mock for newer OpenAI SDK

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-11-10 09:38:30 +07:00
Nguyen Trung Duc (john)
9035e25666 Upgrade the declarative pipeline for cleaner interface (#51) 2023-10-24 11:12:22 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
79cc60e6a2 [AUR-429] Add MVP pipeline with Ingestion and QA stage (#39)
* add base Tool

* minor update test_tool

* update test dependency

* update test dependency

* Fix namespace conflict

* update test

* add base Agent Interface, add ReWoo Agent

* minor update

* update test

* fix typo

* remove unneeded print

* update rewoo agent

* add LLMTool

* update BaseAgent type

* add ReAct agent

* add ReAct agent

* minor update

* minor update

* minor update

* minor update

* update base reader with BaseComponent

* add splitter

* update agent and tool

* update vectorstores

* update load/save for indexing and retrieving pipeline

* update test_agent for more use-cases

* add missing dependency for test

* update test case for in memory vectorstore

* add TextSplitter to BaseComponent

* update type hint basetool

* add insurance mvp pipeline

* update requirements

* Remove redundant plugins param

* Mock GoogleSearch

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-10-05 12:31:33 +07:00
Nguyen Trung Duc (john)
49ed3f6994 [AUR-405] Auto-generate markdown documentation from pipeline (#33)
* Create a script to auto-generate markdown docs from pipeline
* Clean up documentation for Chain-of-Thought
2023-10-04 10:54:24 +07:00
ian_Cin
b794051653 [AUR-421] base output post-processor that works using regex. (#20) 2023-09-19 19:54:44 +07:00
Nguyen Trung Duc (john)
2a3a23ecd7 [AUR-420] Provide document store base interface and an in-memory version (#21)
Document store handles storing and indexing Documents. It supports the following interfaces:

- add: add 1 or more documents into document store
- get: get a list of documents
- get_all: get all documents in a document store
- delete: delete 1 or more document
- save: persist a document store into disk
- load: load a document store from disk
2023-09-19 14:49:23 +07:00
Nguyen Trung Duc (john)
620b2b03ca [AUR-392, AUR-413, AUR-414] Define base vector store, and make use of ChromaVectorStore from llama_index. Indexing and retrieving vectors with vector store (#18)
Design the base interface of vector store, and apply it to the Chroma Vector Store (wrapped around llama_index's implementation). Provide the pipelines to populate and retrieve from vector store.
2023-09-14 14:18:20 +07:00
Nguyen Trung Duc (john)
c339912312 [AUR-389] Add base interface and embedding model (#17)
This change provides the base interface of an embedding, and wrap the Langchain's OpenAI embedding. Usage as follow:

```python
from kotaemon.embeddings import AzureOpenAIEmbeddings

model = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment="embedding-deployment",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
)
output = model("Hello world")
```
2023-09-14 14:08:58 +07:00
trducng
043209fda7 Initiate repository 2023-08-16 14:56:48 +07:00