* enforce Document as IO
* Separate rerankers, splitters and extractors (#85)
* partially refactor importing
* add text to embedding outputs
---------
Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
This change removes the following methods from `BaseComponent`:
- run_raw
- run_batch_raw
- run_document
- run_batch_document
- is_document
- is_batch
Each component is expected to support multiple input types and a single output type. Components should work out of the box in both standardized and customized use cases, so they need to accept multiple input types. At the same time, to make a component easier to understand and use, we restrict each component to a single output type.
To accommodate these changes, we also refactor some components to remove their run_raw, run_batch_raw... methods, and settle on a common output interface for those components.
Tests are updated accordingly.
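The "multiple input types, single output type" contract can be sketched as follows. This is a minimal illustration, not the library's code: `Document` here is a hypothetical stand-in dataclass, and `UppercaseComponent` and its `run` method are names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    text: str
    metadata: dict = field(default_factory=dict)


class UppercaseComponent:
    """Hypothetical component: accepts str or Document, always returns Document."""

    def run(self, item: Union[str, Document]) -> Document:
        # Normalize every accepted input type to a Document first.
        if isinstance(item, str):
            item = Document(text=item)
        elif not isinstance(item, Document):
            raise TypeError(f"Unsupported input type: {type(item).__name__}")
        # Single output type: the result is always a Document.
        return Document(text=item.text.upper(), metadata=dict(item.metadata))

    def __call__(self, item: Union[str, Document]) -> Document:
        return self.run(item)


component = UppercaseComponent()
from_str = component("hello")
from_doc = component(Document(text="hello", metadata={"source": "demo"}))
```

Because both calls return a `Document`, downstream code does not need to branch on the type of the result, which is the simplification this change is after.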
Commit changes:
* Add kwargs to vector store's query
* Simplify the BaseComponent
* Update tests
* Remove support for Python 3.8 and 3.9
* Bump version 0.3.0
* Fix GitHub PR caching still using the old environment after a version bump
---------
Co-authored-by: ian <ian@cinnamon.is>
The document store handles storing and indexing Documents. It supports the following interfaces:
- add: add 1 or more documents to the document store
- get: get a list of documents
- get_all: get all documents in a document store
- delete: delete 1 or more documents
- save: persist a document store to disk
- load: load a document store from disk
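The six operations above could look roughly like this in an in-memory implementation. Everything below is an assumption made for illustration: the `Document` dataclass, the `InMemoryDocumentStore` name, the method signatures, and the JSON file format are not taken from the library.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List, Union


@dataclass
class Document:
    """Hypothetical stand-in for the library's Document type."""

    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryDocumentStore:
    """Illustrative store keyed by document id; not the library's class."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def add(self, docs: Union[Document, List[Document]]) -> None:
        # Accept either a single document or a list of documents.
        if isinstance(docs, Document):
            docs = [docs]
        for doc in docs:
            self._docs[doc.doc_id] = doc

    def get(self, ids: Union[str, List[str]]) -> List[Document]:
        if isinstance(ids, str):
            ids = [ids]
        # Unknown ids are skipped rather than raising.
        return [self._docs[i] for i in ids if i in self._docs]

    def get_all(self) -> List[Document]:
        return list(self._docs.values())

    def delete(self, ids: Union[str, List[str]]) -> None:
        if isinstance(ids, str):
            ids = [ids]
        for i in ids:
            self._docs.pop(i, None)

    def save(self, path: Union[str, Path]) -> None:
        # Persist as a JSON list of plain dicts (format chosen for the sketch).
        payload = [asdict(doc) for doc in self._docs.values()]
        Path(path).write_text(json.dumps(payload))

    def load(self, path: Union[str, Path]) -> None:
        raw = json.loads(Path(path).read_text())
        self._docs = {item["doc_id"]: Document(**item) for item in raw}


store = InMemoryDocumentStore()
store.add([Document("1", "first"), Document("2", "second")])
store.add(Document("3", "third"))
store.delete("2")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.json"
    store.save(path)
    restored = InMemoryDocumentStore()
    restored.load(path)
```

Skipping unknown ids in `get` and `delete` is a design choice of this sketch; a real store might prefer to raise instead.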
Design the base interface of the vector store and apply it to the Chroma vector store (a wrapper around llama_index's implementation). Provide pipelines to populate the vector store and to retrieve from it.
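To make the idea of a base vector store interface concrete, here is a small in-memory sketch using cosine similarity. The class name, the `add` / `delete` / `query` signatures, and the `top_k` parameter are assumptions for illustration only; they are not the Chroma wrapper or the actual base class.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Illustrative vector store; method names are assumptions."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, embeddings: List[Sequence[float]], ids: List[str]) -> List[str]:
        if len(embeddings) != len(ids):
            raise ValueError("embeddings and ids must have the same length")
        for id_, vector in zip(ids, embeddings):
            self._vectors[id_] = list(vector)
        return list(ids)

    def delete(self, ids: List[str]) -> None:
        for id_ in ids:
            self._vectors.pop(id_, None)

    def query(
        self, embedding: Sequence[float], top_k: int = 1, **kwargs
    ) -> List[Tuple[str, float]]:
        # **kwargs is accepted so backend-specific options can be passed
        # through; this sketch ignores them.
        scored = [
            (id_, cosine_similarity(embedding, vector))
            for id_, vector in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.add([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], ids=["a", "b", "c"])
hits = store.query([1.0, 0.1], top_k=2)
```

A populate pipeline would call `add` with embeddings produced by an embedding model, and a retrieval pipeline would embed the query text and call `query`.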
This change provides the base interface for embeddings and wraps Langchain's OpenAI embedding. Usage is as follows:
```python
from kotaemon.embeddings import AzureOpenAIEmbeddings

# Placeholder endpoint, deployment name, and key; replace with real values.
model = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    deployment="embedding-deployment",
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
)

# Embed a single piece of text.
output = model("Hello world")
```