Update various docs (#4)

* rename cli tool

* remove redundant docs

* update docs

* update macos instructions

* add badges
ian_Cin 2024-03-29 19:47:03 +07:00 committed by GitHub
parent 556c48b259
commit a3bf728400
GPG Key ID: B5690EEEBB952194
23 changed files with 339 additions and 415 deletions

README.md

@ -1,41 +1,60 @@
# kotaemon

[Documentation](https://cinnamon.github.io/kotaemon/)

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-31013/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![built with Codeium](https://codeium.com/badges/main)](https://codeium.com)

Build and use local RAG-based Question Answering (QA) applications.

This repository would like to appeal to both end users who want to do QA on their documents and developers who want to build their own QA pipeline.

- For end users:
  - A local Question Answering UI for RAG-based QA.
  - Supports LLM API providers (OpenAI, AzureOpenAI, Cohere, etc.) and local LLMs (currently only the GGUF format is supported via `llama-cpp-python`).
  - Easy installation scripts, no environment setup required.
- For developers:
  - A framework for building your own RAG-based QA pipeline.
  - See your RAG pipeline in action with the provided UI (built with Gradio).
  - Share your pipeline so that others can use it.

This repository is under active development. Feedback, issues, and PRs are highly appreciated. Your input is valuable as it helps us persuade our business guys to support open source.

## Installation
### Manual installation

- Clone the repo

  ```shell
  git clone git@github.com:Cinnamon/kotaemon.git
  cd kotaemon
  ```

- Install the environment

  - Create a conda environment (python >= 3.10 is recommended)

    ```shell
    conda create -n kotaemon python=3.10
    conda activate kotaemon

    # install dependencies
    cd libs/kotaemon
    pip install -e ".[all]"
    ```

  - Or run the installer (one of the `scripts/run_*` scripts, depending on your OS); you will then have all the dependencies installed as a conda environment at `install_dir/env`.

    ```shell
    conda activate install_dir/env
    ```
@ -44,99 +63,26 @@ Here is the setup and update strategy:

- Pre-commit

  ```shell
  pre-commit install
  ```

- Test

  ```shell
  pytest tests
  ```
### From installation scripts

1. Clone the repository.
2. Navigate to the `scripts` folder and start an installer that matches your OS:
   - Linux: `run_linux.sh`
   - Windows: `run_windows.bat`
   - macOS: `run_macos.sh`
3. After the installation, the installer will ask to launch the ktem's UI; answer to continue.
4. If launched, the application will be available at `http://localhost:7860/`.
5. The conda environment is located in the `install_dir/env` folder.

Here is the setup and update strategy:

- **Run the `run_*` script**: This sets up the environment, including downloading Miniconda (in case Conda is not available on your machine) and installing the necessary dependencies in the `install_dir` folder.
- **Launch the UI**: To launch the ktem's UI after the initial setup or any changes, simply run the `run_*` script again.
- **Reinstall dependencies**: Simply delete the `install_dir/env` folder and run the `run_*` script again. The script will recreate the folder with fresh dependencies.

### Credential sharing

This repo uses [git-secret](https://sobolevn.me/git-secret/) to share credentials, which internally uses `gpg` to encrypt and decrypt secret files.

This repo also uses `python-dotenv` to manage credentials stored as environment variables. Please note that `python-dotenv` and these credentials are for development purposes only; they should not be used in the main source code (i.e. `kotaemon/` and `tests/`), but can be used in `examples/`.

#### Install git-secret

Please follow the [official guide](https://sobolevn.me/git-secret/installation) to install git-secret.
For Windows users, see [For Windows users](#for-windows-users).
For users who don't have sudo privilege to install packages, follow the `Manual Installation` in the [official guide](https://sobolevn.me/git-secret/installation) and set `PREFIX` to a path that you have access to. And please don't forget to add `PREFIX` to your `PATH`.
#### Gaining access
In order to gain access to the secret files, you must provide your gpg public key file to anyone who has access and ask them to add your key to the keyring. For a quick tutorial on generating your gpg key pair, you can refer to the `Using gpg` section on the [git-secret main page](https://sobolevn.me/git-secret/).
#### Decrypt the secret file
The credentials are encrypted in the `.env.secret` file. To print the decrypted content to stdout, run
```shell
git-secret cat [filename]
```
Or to get the decrypted `.env` file, run
```shell
git-secret reveal [filename]
```
#### For Windows users
git-secret is currently not available for Windows, thus the easiest way is to use it in WSL (please use the latest version of WSL2). From there you can:
- Use the `gpg` and `git-secret` in WSL.
This is the most straight-forward option since you would use WSL just like any other Unix environment. However, the downside is that you have to make WSL your main environment, which means WSL must have write permission on your repo. To achieve this, you must either:
- Clone and store your repo inside WSL's file system.
- Provide WSL with the necessary permissions on your Windows file system. This can be achieved by setting `automount` options for WSL. To do that, add this content to `/etc/wsl.conf` and then restart your subsystem.
```shell
[automount]
options = "metadata,umask=022,fmask=011"
```
This enables all permissions for the file owner.
- (Optional) use `git-secret` and `gpg` from WSL in Windows.
For those who use Windows as the main environment, having to switch back and forth between Windows and WSL will be inconvenient. You can instead stay within your Windows environment and apply some tricks to use `git-secret` from WSL.
- Install and setup `gpg` on WSL. Now in Windows you can invoke WSL's `gpg`
using `wsl gpg`.
- Install `git-secret` on WSL. Now in Windows you can invoke `git-secret` using `wsl git-secret`.
- Additionally, you can set up aliases in CMD to shorten the syntax. Please refer to [this SO answer](https://stackoverflow.com/a/65823225) for the instruction. Some recommended aliases are:
```bat
@echo off
:: Commands
DOSKEY ls=dir /B $*
DOSKEY ll=dir /a $*
DOSKEY git-secret=wsl git-secret $*
DOSKEY gs=wsl git-secret $*
DOSKEY gpg=wsl gpg $*
```
Now you can invoke `git-secret` in CMD using `git-secret` or `gs`.
- For PowerShell users, similar behaviour can be achieved using `Set-Alias` and `profile.ps1`. Please refer to [this SO thread](https://stackoverflow.com/questions/61081434/how-do-i-create-a-permanent-alias-file-in-powershell-core) as an example.
### Code base structure
- documents: define document
- loaders


@ -1,166 +0,0 @@
# Getting started
## Setup
- Create conda environment (suggest 3.10)
```shell
conda create -n kotaemon python=3.10
conda activate kotaemon
```
- Clone the repo
```shell
git clone git@github.com:Cinnamon/kotaemon.git
cd kotaemon
```
- Install all
```shell
cd libs/kotaemon
pip install -e ".[dev]"
```
- Pre-commit
```shell
pre-commit install
```
- Test
```shell
pytest tests
```
## Credential sharing
This repo uses [git-secret](https://sobolevn.me/git-secret/) to share credentials, which
internally uses `gpg` to encrypt and decrypt secret files.
This repo uses `python-dotenv` to manage credentials stored as environment variables.
Please note that the use of `python-dotenv` and credentials are for development
purposes only. Thus, it should not be used in the main source code (i.e. `kotaemon/` and `tests/`), but can be used in `examples/`.
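The `python-dotenv` flow described above can be sketched with a stdlib-only stand-in. This is illustrative only: `load_env_file` is a hypothetical helper mimicking what the real library's `dotenv.load_dotenv` does.

```python
import os

def load_env_file(path):
    """Illustrative sketch of python-dotenv's load_dotenv: read KEY=VALUE
    lines from a file and export them as environment variables, without
    overwriting values that are already set."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# usage: write a sample file and load it
with open(".env.example", "w") as f:
    f.write("# sample credentials\nMY_API_KEY=dummy-key\n")
load_env_file(".env.example")
```

In practice, install the real library (`pip install python-dotenv`) and call `dotenv.load_dotenv()` instead.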
### Install git-secret
Please follow the [official guide](https://sobolevn.me/git-secret/installation) to install git-secret.
For Windows users, see [For Windows users](#for-windows-users).
For users who don't have sudo privilege to install packages, follow the `Manual Installation` in the [official guide](https://sobolevn.me/git-secret/installation) and set `PREFIX` to a path that you have access to. And please don't forget to add `PREFIX` to your `PATH`.
### Gaining access to credentials

In order to gain access to the secret files, you must provide your gpg public key file to anyone who has access and ask them to add your key to the keyring. For a quick tutorial on generating your gpg key pair, you can refer to the `Using gpg` section on the [git-secret main page](https://sobolevn.me/git-secret/).
### Decrypt the secret file
The credentials are encrypted in the `.env.secret` file. To print the decrypted content to stdout, run
```shell
git-secret cat [filename]
```
Or to get the decrypted `.env` file, run
```shell
git-secret reveal [filename]
```
### For Windows users
git-secret is currently not available for Windows, thus the easiest way is to use it in WSL (please use the latest version of WSL2). From there you have 2 options:
1. Using the gpg of WSL.
This is the most straightforward option, since you would use WSL just like any other Unix environment. However, the downside is that you have to make WSL your main environment, which means WSL must have write permission on your repo. To achieve this, you must either:
- Clone and store your repo inside WSL's file system.
- Provide WSL with the necessary permissions on your Windows file system. This can be achieved by setting `automount` options for WSL. To do that, add this content to `/etc/wsl.conf` and then restart your subsystem.
```shell
[automount]
options = "metadata,umask=022,fmask=011"
```
This enables all permissions for the file owner.
2. Using the gpg of Windows but with git-secret from WSL.
For those who use Windows as the main environment, having to switch back and forth between Windows and WSL will be inconvenient. You can instead stay within your Windows environment and apply some tricks to use `git-secret` from WSL.
- Install and setup `gpg` on Windows.
- Install `git-secret` on WSL. Now in Windows, you can invoke `git-secret` using `wsl git-secret`.
- Alternatively you can setup alias in CMD to shorten the syntax. Please refer to [this SO answer](https://stackoverflow.com/a/65823225) for the instruction. Some recommended aliases are:
```bat
@echo off
:: Commands
DOSKEY ls=dir /B $*
DOSKEY ll=dir /a $*
DOSKEY git-secret=wsl git-secret $*
DOSKEY gs=wsl git-secret $*
```
Now you can invoke `git-secret` in CMD using `git-secret` or `gs`.
- For PowerShell users, similar behaviour can be achieved using `Set-Alias` and `profile.ps1`. Please refer to this [SO thread](https://stackoverflow.com/questions/61081434/how-do-i-create-a-permanent-alias-file-in-powershell-core) as an example.
# PR guideline
## Common conventions
- Review should be done as soon as possible (within 2 business days).
- PR title: [ticket] One-line description (example: [AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface).
- [Encouraged] Provide a quick description in the PR, so that:
- Reviewers can quickly understand the direction of the PR.
- It will be included in the commit message when the PR is merged.
## Environment caching on PR
- To speed up CI, environments are cached based on the version specified in `__init__.py`.
- Since dependency versions in `setup.py` are not pinned, you need to bump the version in order to use a new environment. That environment will then be cached and used by your subsequent commits within the PR, until you bump the version again.
- The new environment created during your PR is cached and will be available to others once the PR is merged.
- If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it.
- If your PR includes updated dependencies, the recommended workflow is:
- Doing development as usual.
- When you want to run the CI, push a commit with the message containing `[ignore cache]`.
- Once the PR is final, bump the version in `__init__.py` and push a final commit that does not contain `[ignore cache]`.
Examples: https://github.com/Cinnamon/kotaemon/pull/2
## Merge PR guideline
- Use the squash-and-merge option.
- The first line of the commit message is the PR title.
- The text area is the PR description.
![image](images/274787925-e2593010-d7ef-46e3-8719-6fcae0315b5d.png)
![image](images/274787941-bfe6a117-85cd-4dd4-b432-197c791a9901.png)
## Develop pipelines
- Nodes
- Params
- Run function
```python
from kotaemon.base import BaseComponent

# AzureOpenAILLM, BaseDocumentStore, and InMemoryDocstore are assumed to be
# imported from kotaemon as well


class Pipeline(BaseComponent):
    llm: AzureOpenAILLM
    doc_store: BaseDocumentStore

    def run(self, input1, input2) -> str:
        text = input1 + input2
        output = self.llm(text)
        self.doc_store.add(output)
        return output


pipeline = Pipeline(llm=AzureOpenAILLM(), doc_store=InMemoryDocstore())
output = pipeline("this is text1", "this is text2")
```


@ -0,0 +1,72 @@
# Package overview
The `kotaemon` library focuses on the AI building blocks needed to implement a RAG-based QA application. It consists of base interfaces, core components, and a list of utilities:

- Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly defining this interface, a pipeline of steps can be easily constructed and orchestrated.
- Core components: `kotaemon` implements (or wraps 3rd-party libraries like Langchain, llama-index, etc. when possible) components commonly used in kotaemon use cases, such as LLMs, vector stores, document stores, and retrievers. For a detailed list and description of these components, please refer to the [API Reference](/reference/nav/) section.
- List of utilities: `kotaemon` provides utilities and tools that are usually needed in client projects. For example, it provides a prompt engineering UI that AI developers can use to quickly create a prompt engineering tool for DMs and QALs, and a command to quickly spin up a project code base. For a full list and description of these utilities, please refer to the [Utilities](/development/utilities) section.
```mermaid
mindmap
root((kotaemon))
Base Interfaces
Document
LLMInterface
RetrievedDocument
BaseEmbeddings
BaseChat
BaseCompletion
...
Core Components
LLMs
AzureOpenAI
OpenAI
Embeddings
AzureOpenAI
OpenAI
HuggingFaceEmbedding
VectorStore
InMemoryVectorstore
ChromaVectorstore
Agent
Tool
DocumentStore
...
Utilities
Scaffold project
PromptUI
Documentation Support
```
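The base-interface idea above — every step, and the pipeline itself, shares one callable interface — can be illustrated with a minimal, self-contained sketch. The classes below are stand-ins, not kotaemon's actual API; the real base class is `kotaemon.base.BaseComponent`.

```python
class BaseComponent:
    """A component is callable; a pipeline is itself a component."""

    def run(self, *args):
        raise NotImplementedError

    def __call__(self, *args):
        return self.run(*args)


class Uppercase(BaseComponent):  # stand-in for e.g. an LLM step
    def run(self, text):
        return text.upper()


class Exclaim(BaseComponent):  # stand-in for a post-processing step
    def run(self, text):
        return text + "!"


class Pipeline(BaseComponent):
    """Composing components yields another component with the same interface."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, text):
        for step in self.steps:
            text = step(text)
        return text


pipeline = Pipeline(Uppercase(), Exclaim())
print(pipeline("hello"))  # → HELLO!
```

Because a pipeline exposes the same interface as its steps, pipelines can be nested inside larger pipelines, which is what makes orchestration straightforward.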
# Common conventions
- PR title: One-line description (example: Feat: Declare BaseComponent and decide LLM call interface).
- [Encouraged] Provide a quick description in the PR, so that:
- Reviewers can quickly understand the direction of the PR.
- It will be included in the commit message when the PR is merged.
# Environment caching on PR
- To speed up CI, environments are cached based on the version specified in `__init__.py`.
- Since dependency versions in `setup.py` are not pinned, you need to bump the version in order to use a new environment. That environment will then be cached and used by your subsequent commits within the PR, until you bump the version again.
- The new environment created during your PR is cached and will be available to others once the PR is merged.
- If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it.
- If your PR includes updated dependencies, the recommended workflow is:
- Doing development as usual.
- When you want to run the CI, push a commit with the message containing `[ignore cache]`.
- Once the PR is final, bump the version in `__init__.py` and push a final commit that does not contain `[ignore cache]`.
# Merge PR guideline
- Use the squash-and-merge option.
- The first line of the commit message is the PR title.
- The text area is the PR description.


@ -0,0 +1 @@
--8<-- "README.md"


@ -4,11 +4,13 @@ Utilities detail can be referred in the sub-pages of this section.
![chat-ui](images/271332562-ac8f9aac-d853-4571-a48b-d866a99eaf3e.png)

**_Important:_** despite the name "prompt engineering UI", this tool allows testers to test any kind of parameter that is exposed by developers. Prompt is one kind of param. There can be other types of params that testers can tweak (e.g. top_k, temperature...).

In the development process, developers typically build the pipeline. However, for use cases requiring expertise in prompt creation, non-technical members (testers, domain experts) can be more effective. To facilitate this, `kotaemon` offers a user-friendly prompt engineering UI that developers integrate into their pipelines. This enables non-technical members to adjust prompts and parameters, run experiments, and export results for optimization.
As of Sept 2023, there are 2 kinds of prompt engineering UI:
@ -19,22 +21,23 @@ As of Sept 2023, there are 2 kinds of prompt engineering UI:
For simple pipelines, the supported workflow looks as follows:

1. [tech] Build the pipeline
2. [tech] Export the pipeline to config: `$ kotaemon promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>`
3. [tech] Customize the config
4. [tech] Spin up the prompt engineering UI: `$ kotaemon promptui run <path/to/config/file.yml>`
5. [non-tech] Change params, run inference
6. [non-tech] Export to Excel
7. [non-tech] Select the set of params that achieves the best output

The prompt engineering UI is prominently involved from step 2 to step 7 (step 1 is normally done by the developers, while step 7 happens exclusively in the Excel file).
#### Step 2 - Export pipeline to config

Command:

```
$ kotaemon promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>
```

where:
@ -54,14 +57,14 @@ Declared as above, and `param1` will show up in the config YAML file, while `par
#### Step 3 - Customize the config

Developers can further edit the config file in this step to get the UI (step 4) best suited to their tasks. The exported config has this overall schema:

```
<module.path.pipelineclass1>:
  params:
    ... (Detailed param information to initiate a pipeline. This corresponds to the pipeline init parameters.)
  inputs:
    ... (Details of the pipeline's input, e.g. a text prompt. This corresponds to the params of the `run(...)` method.)
  outputs:
    ... (Details of the pipeline's output, e.g. prediction, accuracy... This is the output information we wish to see in the UI.)
  logs:
```
@ -141,7 +144,7 @@ logs:
Command:

```
$ kotaemon promptui run <path/to/config/file.yml>
```

This will generate a UI as follows:

New binary image files (content not shown): `docs/images/chat-tab.png`, `docs/images/file-index.png`, `docs/images/file-list.png`, `docs/images/file-upload.png`, `docs/images/info-panel.png`, among other images.


@ -1 +1,122 @@
# Getting Started with Kotaemon
This page is intended for end users who want to use the `kotaemon` tool for Question Answering on local documents.
## Download
Download and unzip the latest version of `kotaemon` by clicking this
[link](https://github.com/Cinnamon/kotaemon/archive/refs/heads/main.zip).
## Choose what model to use
The tool uses Large Language Models (LLMs) to perform various tasks in a QA pipeline. So, prior to running, you need to provide the application with access to the LLMs you want to use.

Please edit the `.env` file with the information needed to connect to the LLMs. You only need to provide at least one; however, it is recommended that you include all the LLMs that you have access to, as you will then be able to switch between them while using the application.
Currently, the following providers are supported:
### OpenAI
In the `.env` file, set the `OPENAI_API_KEY` variable with your OpenAI API key in order to enable access to OpenAI's models. Other variables can be modified as well; please feel free to edit them to fit your case. Otherwise, the default parameters should work for most people.
```shell
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=<your OpenAI API key here>
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
```
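As a sketch of how an application might consume these variables: the helper below is hypothetical (not kotaemon's actual code), and the defaults mirror the sample `.env` above.

```python
import os

def openai_config():
    """Collect OpenAI settings from the environment, falling back to the
    defaults shown in the sample .env above. Only the API key is required."""
    cfg = {
        "api_base": os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "chat_model": os.environ.get("OPENAI_CHAT_MODEL", "gpt-3.5-turbo"),
        "embeddings_model": os.environ.get(
            "OPENAI_EMBEDDINGS_MODEL", "text-embedding-ada-002"
        ),
    }
    if not cfg["api_key"]:
        raise RuntimeError("OPENAI_API_KEY is not set; please edit your .env file")
    return cfg

os.environ.setdefault("OPENAI_API_KEY", "sk-dummy")  # stand-in key for the demo
config = openai_config()
```

Failing fast with a clear message when the key is missing saves users from a confusing authentication error later in the pipeline.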
### Azure OpenAI
For OpenAI models served via the Azure platform, you need to provide your Azure endpoint and API key. You might also need to provide your deployments' names for the chat model and the embedding model, depending on how you set up your Azure deployment.
```shell
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002
```
### Local models
- Pros:
  - Privacy. Your documents will be stored and processed locally.
  - Choices. There is a wide range of LLMs in terms of size, domain, and language to choose from.
  - Cost. It's free.
- Cons:
  - Quality. Local models are much smaller and thus have lower generative quality than paid APIs.
  - Speed. Local models are deployed on your machine, so the processing speed is limited by your hardware.
#### Find and download an LLM

You can search for and download an LLM to run locally from the [Hugging Face Hub](https://huggingface.co/models). Currently, these model formats are supported:
- GGUF
You should choose a model whose size is less than your device's available memory, leaving about 2 GB free. For example, if you have 16 GB of RAM in total, of which 12 GB is available, then you should choose a model that takes up at most 10 GB of RAM. Bigger models tend to give better generations but also take more processing time.
Here are some recommendations and their size in memory:
- [Qwen1.5-1.8B-Chat-GGUF](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q8_0.gguf?download=true):
around 2 GB
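The sizing rule above can be expressed as a small helper. This is a hypothetical sketch (not part of the tool), and the model sizes in the example are illustrative.

```python
def ram_budget_gb(available_ram_gb, headroom_gb=2.0):
    """Largest model size that still leaves `headroom_gb` of RAM free."""
    return max(available_ram_gb - headroom_gb, 0.0)

def pick_model(models, available_ram_gb):
    """Pick the biggest model that fits the budget; `models` maps
    model name -> approximate size in GB when loaded."""
    budget = ram_budget_gb(available_ram_gb)
    fitting = {name: size for name, size in models.items() if size <= budget}
    # prefer the largest fitting model: bigger models tend to generate better
    return max(fitting, key=fitting.get) if fitting else None

# example sizes are illustrative, not measured
models = {"qwen1.5-1.8b-chat-q8_0": 2.0, "mid-7b-q4": 4.5, "big-13b-q4": 8.5}
print(pick_model(models, available_ram_gb=12))  # → big-13b-q4
```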
#### Enable local models
To add a local model to the model pool, set the `LOCAL_MODEL` variable in the `.env`
file to the path of the model file.
```shell
LOCAL_MODEL=<full path to your model file>
```
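Before launching, it can help to sanity-check this setting. Here is a sketch of such a check (`check_local_model` is a hypothetical helper, not the application's actual code):

```python
import os

def check_local_model(env):
    """Validate the LOCAL_MODEL setting; return (path, error) where
    exactly one of the two is None."""
    path = env.get("LOCAL_MODEL", "").strip()
    if not path:
        return None, "LOCAL_MODEL is not set"
    if not path.lower().endswith(".gguf"):
        return None, "only GGUF model files are currently supported"
    if not os.path.isfile(path):
        return None, f"model file not found: {path}"
    return path, None

# with a real model file this returns (path, None); this fabricated
# path fails the existence check instead
model, error = check_local_model({"LOCAL_MODEL": "/models/example.gguf"})
```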
Here is how to get the full path of your model file:
- On Windows 11: right-click the file and select `Copy as Path`.
## Installation
1. Navigate to the `scripts` folder and start an installer that matches your OS:
   - Windows: `run_windows.bat`. Just double-click the file.
   - macOS: `run_macos.sh`
     1. Right-click your file and select `Open with` and `Other`.
     2. Enable `All Applications` and choose `Terminal`.
     3. NOTE: If you always want to open that file with Terminal, check `Always Open With`.
     4. From now on, double-click your file and it should work.
   - Linux: `run_linux.sh`. If you are using Linux, you know how to run a bash script, right?
2. After the installation, the installer will ask to launch the ktem's UI; answer to continue.
3. If launched, the application will be available at `http://localhost:7860/`.
## Launch
To launch the app after initial setup or any changes, simply run the `run_*` script again.
A browser window will open and greet you with this screen:
![Chat tab](images/startup-chat-tab.png)
## Usage
For how to use the application, see [Usage](/usage). Have fun!
## Feedback
Feel free to create a bug report or a feature request, or join a discussion at https://github.com/Cinnamon/kotaemon/issues.


@ -1,84 +0,0 @@
## Introduction
The `kotaemon` library focuses on the AI building blocks used to implement Kotaemon. It can be used both in client projects and in product development. It consists of base interfaces, core components, and a list of utilities:
- Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly defining this interface, a pipeline of steps can be easily constructed and orchestrated.
- Core components: `kotaemon` implements (or wraps 3rd-party libraries
like Langchain, llama-index,... when possible) commonly used components in
kotaemon use cases. Some of these components are: LLM, vector store,
document store, retriever... For a detailed list and description of these
components, please refer to the [API Reference](/reference/nav/) section.
- List of utilities: `lib-knowledge` provides utilities and tools that are
usually needed in client project. For example, it provides a prompt
engineering UI for AI developers in a project to quickly create a prompt
engineering tool for DMs and QALs. It also provides a command to quickly spin
up a project code base. For a full list and description of these utilities,
please refer to the [Tutorial/Utilities](/ultilities) section.
```mermaid
mindmap
root((kotaemon))
Base Interfaces
Document
LLMInterface
RetrievedDocument
BaseEmbeddings
BaseChat
BaseCompletion
...
Core Components
LLMs
AzureOpenAI
OpenAI
Embeddings
AzureOpenAI
OpenAI
HuggingFaceEmbedding
VectorStore
InMemoryVectorstore
ChromaVectorstore
Agent
Tool
DocumentStore
...
Utilities
Scaffold project
PromptUI
Documentation Support
```
## Expected benefit
Before `kotaemon`:
- Starting everything from scratch.
- Knowledge and expertise is fragmented.
- Nothing to reuse.
- No way to collaborate between tech and non-tech experts.
`kotaemon` expects to completely revolutionize the way we are building LLM-related projects. It helps the company side-steps those issues by:
- Standardize the interface to (1) make building LLM pipeline clearer (2) more reasonable to integrate pipelines between different projects.
- Centralize LLM-related technical components into 1 place. Avoid fragmented technology development. Easy to find the LLM-related technology inside the company.
- Centralize bug fixes and improvements in 1 place.
- Reduce boilerplate code during project development.
- Lightning fast prototyping.
## Install
The kotaemon can be installed from source with:
```
pip install kotaemon@git+ssh://git@github.com/Cinnamon/kotaemon.git
```
or from Cinnamon's internal python package index:
```
pip install kotaemon --extra-index-url https://ian_devpi.promptui.dm.cinnamon.is/root/packages
```
## Example use cases
- Start a project from scratch: `kh start-project`
- Run prompt engineering UI tool: `kh promptui export`, then `kh promptui run`.


@ -1,24 +0,0 @@
Devpi server endpoint (subject to change): https://ian_devpi.promptui.dm.cinnamon.is/root/packages
Install devpi-client
```bash
pip install devpi-client
```
Login to the server
```bash
devpi use <server endpoint> # set server endpoint provided above
devpi login <user name> --password=<your password> # login
```
If you don't yet have an account, please contact Ian or John.
Upload your package
```bash
devpi use <package name>/dev # choose the index to upload your package
cd <your package directory which must contain a pyproject.toml/setup.py>
devpi upload
```

docs/usage.md

@ -0,0 +1,59 @@
# Basic Usage
## Chat tab
![chat tab](images/chat-tab.png)
The chat tab is divided into 3 columns:
- Left: Conversation settings
- Middle: Chat interface
- Right: Information panel
### Conversation settings
#### Conversation control
Create, rename, and delete conversations.
![conversation control](images/converstaion-control.png)
#### File index
Choose which files to retrieve references from. If no file is selected, all files will be used.
![file index](images/file-index.png)
### Chat interface
Interact with the chatbot.
![chat interface](images/chat-interface.png)
### Information panel
Supporting information such as the retrieved evidence and reference will be displayed
here.
![info panel](images/info-panel.png)
## File index tab
![file index tab](images/file-index-tab.png)
### File upload
In order for a file to be used as an index for retrieval, it must first be processed by the application. Do this by uploading your file in the UI and then selecting `Upload and Index`.
![file upload](images/file-upload.png)
The application will take some time to process the file and will show a message once it is done. Then you will be able to select it in the [File index section](#file-index) of the [Chat tab](#chat-tab).
### File list
This section shows the list of files that have been uploaded to the application and
allows users to delete them.
![file-list](images/file-list.png)


@ -73,7 +73,7 @@ dev = [
all = ["kotaemon[adv,dev]"] all = ["kotaemon[adv,dev]"]
[project.scripts] [project.scripts]
kh = "kotaemon.cli:main" kotaemon = "kotaemon.cli:main"
[project.urls] [project.urls]
Homepage = "https://github.com/Cinnamon/kotaemon/" Homepage = "https://github.com/Cinnamon/kotaemon/"


@ -224,8 +224,8 @@ def test_wrapper_agent_langchain(openai_completion, llm, mock_google_search):
side_effect=_openai_chat_completion_responses_react_langchain_tool, side_effect=_openai_chat_completion_responses_react_langchain_tool,
) )
def test_react_agent_with_langchain_tools(openai_completion, llm): def test_react_agent_with_langchain_tools(openai_completion, llm):
from langchain.tools import DuckDuckGoSearchRun, WikipediaQueryRun from langchain_community.tools import DuckDuckGoSearchRun, WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper from langchain_community.utilities import WikipediaAPIWrapper
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()) wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
search = DuckDuckGoSearchRun() search = DuckDuckGoSearchRun()


@ -6,8 +6,7 @@ edit_uri: edit/main/docs/
nav: nav:
- Getting Started: - Getting Started:
- Quick Start: index.md - Quick Start: index.md
- Overview: overview.md - Basic Usage: usage.md
- Contributing: contributing.md
- Application: - Application:
- Features: pages/app/features.md - Features: pages/app/features.md
- Index: - Index:
@ -20,15 +19,13 @@ nav:
- Customize flow logic: pages/app/customize-flows.md - Customize flow logic: pages/app/customize-flows.md
- Customize UI: pages/app/customize-ui.md - Customize UI: pages/app/customize-ui.md
- Functional description: pages/app/functional-description.md - Functional description: pages/app/functional-description.md
- Tutorial: - Development:
- Data & Data Structure Components: data-components.md - Contributing: development/contributing.md
- Creating a Component: create-a-component.md - Data & Data Structure Components: development/data-components.md
- Utilities: ultilities.md - Creating a Component: development/create-a-component.md
- Utilities: development/utilities.md
# generated using gen-files + literate-nav # generated using gen-files + literate-nav
- API Reference: reference/ - API Reference: reference/
- Use Cases: examples/
- Misc:
- Upload python package to private index: upload-package.md
markdown_extensions: markdown_extensions:
- admonition - admonition
@ -59,7 +56,6 @@ plugins:
- gen-files: - gen-files:
scripts: scripts:
- docs/scripts/generate_reference_docs.py - docs/scripts/generate_reference_docs.py
- docs/scripts/generate_examples_docs.py
- literate-nav: - literate-nav:
nav_file: NAV.md nav_file: NAV.md
- mkdocstrings: - mkdocstrings: