Update various docs (#4)

* rename cli tool

* remove redundant docs

* update docs

* update macos instructions

* add badges
ian_Cin 2024-03-29 19:47:03 +07:00 committed by GitHub
parent 556c48b259
commit a3bf728400
GPG Key ID: B5690EEEBB952194
23 changed files with 339 additions and 415 deletions

README.md

@ -1,41 +1,60 @@
# kotaemon

[Documentation](https://cinnamon.github.io/kotaemon/)

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/release/python-31013/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![built with Codeium](https://codeium.com/badges/main)](https://codeium.com)

Build and use local RAG-based Question Answering (QA) applications.

This repository would like to appeal to both end users who want to do QA on their documents and developers who want to build their own QA pipeline.

- For end users:
  - A local Question Answering UI for RAG-based QA.
  - Supports LLM API providers (OpenAI, AzureOpenAI, Cohere, etc.) and local LLMs (currently only the GGUF format is supported via `llama-cpp-python`).
  - Easy installation scripts, no environment setup required.
- For developers:
  - A framework for building your own RAG-based QA pipeline.
  - See your RAG pipeline in action with the provided UI (built with Gradio).
  - Share your pipeline so that others can use it.

This repository is under active development. Feedback, issues, and PRs are highly appreciated. Your input is valuable as it helps us persuade our business guys to support open source.

## Installation
### Manual installation

- Clone the repo

  ```shell
  git clone git@github.com:Cinnamon/kotaemon.git
  cd kotaemon
  ```

- Install the environment

  - Create a conda environment (python >= 3.10 is recommended)

    ```shell
    conda create -n kotaemon python=3.10
    conda activate kotaemon

    # install dependencies
    cd libs/kotaemon
    pip install -e ".[all]"
    ```

  - Or run the installer (one of the `scripts/run_*` scripts, depending on your OS); you will then have all the dependencies installed as a conda environment at `install_dir/env`.

    ```shell
    conda activate install_dir/env
    ```
@ -44,99 +63,26 @@ Here is the setup and update strategy:

- Pre-commit

  ```shell
  pre-commit install
  ```

- Test

  ```shell
  pytest tests
  ```
### From installation scripts

1. Clone the repository.
2. Navigate to the `scripts` folder and start an installer that matches your OS:
   - Linux: `run_linux.sh`
   - Windows: `run_windows.bat`
   - macOS: `run_macos.sh`
3. After the installation, the installer will ask to launch the ktem's UI; answer to continue.
4. If launched, the application will be available at `http://localhost:7860/`.
5. The conda environment is located in the `install_dir/env` folder.

Here is the setup and update strategy:

- **Run the `run_*` script**: This sets up the environment, including downloading Miniconda (in case Conda is not available on your machine) and installing the necessary dependencies in the `install_dir` folder.
- **Launch the UI**: To launch the ktem's UI after the initial setup or any changes, simply run the `run_*` script again.
- **Reinstall dependencies**: Simply delete the `install_dir/env` folder and run the `run_*` script again. The script will recreate the folder with fresh dependencies.

### Credential sharing

This repo uses [git-secret](https://sobolevn.me/git-secret/) to share credentials, which internally uses `gpg` to encrypt and decrypt secret files.

This repo also uses `python-dotenv` to manage credentials stored as environment variables. Please note that `python-dotenv` and these credentials are for development purposes only; they should not be used in the main source code (i.e. `kotaemon/` and `tests/`), but can be used in `examples/`.

#### Install git-secret

Please follow the [official guide](https://sobolevn.me/git-secret/installation) to install git-secret.
For Windows users, see [For Windows users](#for-windows-users).
For users who don't have sudo privilege to install packages, follow the `Manual Installation` in the [official guide](https://sobolevn.me/git-secret/installation) and set `PREFIX` to a path that you have access to. And please don't forget to add `PREFIX` to your `PATH`.
#### Gaining access
In order to gain access to the secret files, you must provide your gpg public key file to anyone who has access and ask them to add your key to the keyring. For a quick tutorial on generating your gpg key pair, you can refer to the `Using gpg` section on the [git-secret main page](https://sobolevn.me/git-secret/).
#### Decrypt the secret file
The credentials are encrypted in the `.env.secret` file. To print the decrypted content to stdout, run
```shell
git-secret cat [filename]
```
Or to get the decrypted `.env` file, run
```shell
git-secret reveal [filename]
```
#### For Windows users
git-secret is currently not available for Windows, thus the easiest way is to use it in WSL (please use the latest version of WSL2). From there you can:
- Use the `gpg` and `git-secret` in WSL.
This is the most straight-forward option since you would use WSL just like any other Unix environment. However, the downside is that you have to make WSL your main environment, which means WSL must have write permission on your repo. To achieve this, you must either:
- Clone and store your repo inside WSL's file system.
- Provide WSL with the necessary permissions on your Windows file system. This can be achieved by setting `automount` options for WSL. To do that, add this content to `/etc/wsl.conf` and then restart your subsystem.
```shell
[automount]
options = "metadata,umask=022,fmask=011"
```
This enables all permissions for the file owner.
- (Optional) use `git-secret` and `gpg` from WSL in Windows.
For those who use Windows as the main environment, having to switch back and forth between Windows and WSL will be inconvenient. You can instead stay within your Windows environment and apply some tricks to use `git-secret` from WSL.
- Install and setup `gpg` on WSL. Now in Windows you can invoke WSL's `gpg`
using `wsl gpg`.
- Install `git-secret` on WSL. Now in Windows you can invoke `git-secret` using `wsl git-secret`.
- Additionally, you can set up aliases in CMD to shorten the syntax. Please refer to [this SO answer](https://stackoverflow.com/a/65823225) for the instruction. Some recommended aliases are:
```bat
@echo off
:: Commands
DOSKEY ls=dir /B $*
DOSKEY ll=dir /a $*
DOSKEY git-secret=wsl git-secret $*
DOSKEY gs=wsl git-secret $*
DOSKEY gpg=wsl gpg $*
```
Now you can invoke `git-secret` in CMD using `git-secret` or `gs`.
- For PowerShell users, similar behaviour can be achieved using `Set-Alias` and `profile.ps1`. Please refer to [this SO thread](https://stackoverflow.com/questions/61081434/how-do-i-create-a-permanent-alias-file-in-powershell-core) as an example.
### Code base structure
- documents: define document
- loaders


@ -1,166 +0,0 @@
# Getting started
## Setup
- Create conda environment (suggest 3.10)
```shell
conda create -n kotaemon python=3.10
conda activate kotaemon
```
- Clone the repo
```shell
git clone git@github.com:Cinnamon/kotaemon.git
cd kotaemon
```
- Install all
```shell
cd libs/kotaemon
pip install -e ".[dev]"
```
- Pre-commit
```shell
pre-commit install
```
- Test
```shell
pytest tests
```
## Credential sharing
This repo uses [git-secret](https://sobolevn.me/git-secret/) to share credentials, which
internally uses `gpg` to encrypt and decrypt secret files.
This repo uses `python-dotenv` to manage credentials stored as environment variables.
Please note that the use of `python-dotenv` and credentials are for development
purposes only. Thus, it should not be used in the main source code (i.e. `kotaemon/` and `tests/`), but can be used in `examples/`.
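The `python-dotenv` flow described above can be sketched with a stdlib-only stand-in. This is illustrative only: `load_env_file` is a hypothetical helper mimicking what the real library's `dotenv.load_dotenv` does.

```python
import os

def load_env_file(path):
    """Illustrative sketch of python-dotenv's load_dotenv: read KEY=VALUE
    lines from a file and export them as environment variables, without
    overwriting values that are already set."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# usage: write a sample file and load it
with open(".env.example", "w") as f:
    f.write("# sample credentials\nMY_API_KEY=dummy-key\n")
load_env_file(".env.example")
```

In practice, install the real library (`pip install python-dotenv`) and call `dotenv.load_dotenv()` instead.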
### Install git-secret
Please follow the [official guide](https://sobolevn.me/git-secret/installation) to install git-secret.
For Windows users, see [For Windows users](#for-windows-users).
For users who don't have sudo privilege to install packages, follow the `Manual Installation` in the [official guide](https://sobolevn.me/git-secret/installation) and set `PREFIX` to a path that you have access to. And please don't forget to add `PREFIX` to your `PATH`.
### Gaining access to credentials

In order to gain access to the secret files, you must provide your gpg public key file to anyone who has access and ask them to add your key to the keyring. For a quick tutorial on generating your gpg key pair, you can refer to the `Using gpg` section on the [git-secret main page](https://sobolevn.me/git-secret/).
### Decrypt the secret file
The credentials are encrypted in the `.env.secret` file. To print the decrypted content to stdout, run
```shell
git-secret cat [filename]
```
Or to get the decrypted `.env` file, run
```shell
git-secret reveal [filename]
```
### For Windows users
git-secret is currently not available for Windows, thus the easiest way is to use it in WSL (please use the latest version of WSL2). From there you have 2 options:
1. Using the gpg of WSL.
This is the most straightforward option, since you would use WSL just like any other Unix environment. However, the downside is that you have to make WSL your main environment, which means WSL must have write permission on your repo. To achieve this, you must either:
- Clone and store your repo inside WSL's file system.
- Provide WSL with the necessary permissions on your Windows file system. This can be achieved by setting `automount` options for WSL. To do that, add this content to `/etc/wsl.conf` and then restart your subsystem.
```shell
[automount]
options = "metadata,umask=022,fmask=011"
```
This enables all permissions for the file owner.
2. Using the gpg of Windows but with git-secret from WSL.
For those who use Windows as the main environment, having to switch back and forth between Windows and WSL will be inconvenient. You can instead stay within your Windows environment and apply some tricks to use `git-secret` from WSL.
- Install and setup `gpg` on Windows.
- Install `git-secret` on WSL. Now in Windows, you can invoke `git-secret` using `wsl git-secret`.
- Alternatively you can setup alias in CMD to shorten the syntax. Please refer to [this SO answer](https://stackoverflow.com/a/65823225) for the instruction. Some recommended aliases are:
```bat
@echo off
:: Commands
DOSKEY ls=dir /B $*
DOSKEY ll=dir /a $*
DOSKEY git-secret=wsl git-secret $*
DOSKEY gs=wsl git-secret $*
```
Now you can invoke `git-secret` in CMD using `git-secret` or `gs`.
- For PowerShell users, similar behaviour can be achieved using `Set-Alias` and `profile.ps1`. Please refer to this [SO thread](https://stackoverflow.com/questions/61081434/how-do-i-create-a-permanent-alias-file-in-powershell-core) as an example.
# PR guideline
## Common conventions
- Review should be done as soon as possible (within 2 business days).
- PR title: [ticket] One-line description (example: [AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface).
- [Encouraged] Provide a quick description in the PR, so that:
- Reviewers can quickly understand the direction of the PR.
- It will be included in the commit message when the PR is merged.
## Environment caching on PR
- To speed up CI, environments are cached based on the version specified in `__init__.py`.
- Since dependency versions in `setup.py` are not pinned, you need to bump the version in order to use a new environment. That environment will then be cached and used by your subsequent commits within the PR, until you bump the version again.
- The new environment created during your PR is cached and will be available to others once the PR is merged.
- If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it.
- If your PR includes updated dependencies, the recommended workflow is:
- Doing development as usual.
- When you want to run the CI, push a commit with the message containing `[ignore cache]`.
- Once the PR is final, bump the version in `__init__.py` and push a final commit that does not contain `[ignore cache]`.
Examples: https://github.com/Cinnamon/kotaemon/pull/2
## Merge PR guideline
- Use the squash-and-merge option.
- The first line of the commit message is the PR title.
- The text area is the PR description.
![image](images/274787925-e2593010-d7ef-46e3-8719-6fcae0315b5d.png)
![image](images/274787941-bfe6a117-85cd-4dd4-b432-197c791a9901.png)
## Develop pipelines
- Nodes
- Params
- Run function
```python
from kotaemon.base import BaseComponent

# AzureOpenAILLM, BaseDocumentStore, and InMemoryDocstore are assumed to be
# imported from kotaemon as well


class Pipeline(BaseComponent):
    llm: AzureOpenAILLM
    doc_store: BaseDocumentStore

    def run(self, input1, input2) -> str:
        text = input1 + input2
        output = self.llm(text)
        self.doc_store.add(output)
        return output


pipeline = Pipeline(llm=AzureOpenAILLM(), doc_store=InMemoryDocstore())
output = pipeline("this is text1", "this is text2")
```


@ -0,0 +1,72 @@
# Package overview
The `kotaemon` library focuses on the AI building blocks needed to implement a RAG-based QA application. It consists of base interfaces, core components, and a list of utilities:

- Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly defining this interface, a pipeline of steps can be easily constructed and orchestrated.
- Core components: `kotaemon` implements (or wraps 3rd-party libraries like Langchain, llama-index, etc. when possible) components commonly used in kotaemon use cases, such as LLMs, vector stores, document stores, and retrievers. For a detailed list and description of these components, please refer to the [API Reference](/reference/nav/) section.
- List of utilities: `kotaemon` provides utilities and tools that are usually needed in client projects. For example, it provides a prompt engineering UI that AI developers can use to quickly create a prompt engineering tool for DMs and QALs, and a command to quickly spin up a project code base. For a full list and description of these utilities, please refer to the [Utilities](/development/utilities) section.
```mermaid
mindmap
root((kotaemon))
Base Interfaces
Document
LLMInterface
RetrievedDocument
BaseEmbeddings
BaseChat
BaseCompletion
...
Core Components
LLMs
AzureOpenAI
OpenAI
Embeddings
AzureOpenAI
OpenAI
HuggingFaceEmbedding
VectorStore
InMemoryVectorstore
ChromaVectorstore
Agent
Tool
DocumentStore
...
Utilities
Scaffold project
PromptUI
Documentation Support
```
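The base-interface idea above — every step, and the pipeline itself, shares one callable interface — can be illustrated with a minimal, self-contained sketch. The classes below are stand-ins, not kotaemon's actual API; the real base class is `kotaemon.base.BaseComponent`.

```python
class BaseComponent:
    """A component is callable; a pipeline is itself a component."""

    def run(self, *args):
        raise NotImplementedError

    def __call__(self, *args):
        return self.run(*args)


class Uppercase(BaseComponent):  # stand-in for e.g. an LLM step
    def run(self, text):
        return text.upper()


class Exclaim(BaseComponent):  # stand-in for a post-processing step
    def run(self, text):
        return text + "!"


class Pipeline(BaseComponent):
    """Composing components yields another component with the same interface."""

    def __init__(self, *steps):
        self.steps = steps

    def run(self, text):
        for step in self.steps:
            text = step(text)
        return text


pipeline = Pipeline(Uppercase(), Exclaim())
print(pipeline("hello"))  # → HELLO!
```

Because a pipeline exposes the same interface as its steps, pipelines can be nested inside larger pipelines, which is what makes orchestration straightforward.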
# Common conventions
- PR title: One-line description (example: Feat: Declare BaseComponent and decide LLM call interface).
- [Encouraged] Provide a quick description in the PR, so that:
- Reviewers can quickly understand the direction of the PR.
- It will be included in the commit message when the PR is merged.
# Environment caching on PR
- To speed up CI, environments are cached based on the version specified in `__init__.py`.
- Since dependency versions in `setup.py` are not pinned, you need to bump the version in order to use a new environment. That environment will then be cached and used by your subsequent commits within the PR, until you bump the version again.
- The new environment created during your PR is cached and will be available to others once the PR is merged.
- If you are experimenting with new dependencies and want a fresh environment every time, add `[ignore cache]` in your commit message. The CI will create a fresh environment to run your commit and then discard it.
- If your PR includes updated dependencies, the recommended workflow is:
- Doing development as usual.
- When you want to run the CI, push a commit with the message containing `[ignore cache]`.
- Once the PR is final, bump the version in `__init__.py` and push a final commit that does not contain `[ignore cache]`.
# Merge PR guideline
- Use the squash-and-merge option.
- The first line of the commit message is the PR title.
- The text area is the PR description.


@ -0,0 +1 @@
--8<-- "README.md"


@ -4,11 +4,13 @@ Utilities detail can be referred in the sub-pages of this section.
![chat-ui](images/271332562-ac8f9aac-d853-4571-a48b-d866a99eaf3e.png)

**_Important:_** despite the name "prompt engineering UI", this tool allows testers to test any kind of parameter that is exposed by developers. Prompt is one kind of param. There can be other types of params that testers can tweak (e.g. top_k, temperature...).

In the development process, developers typically build the pipeline. However, for use cases requiring expertise in prompt creation, non-technical members (testers, domain experts) can be more effective. To facilitate this, `kotaemon` offers a user-friendly prompt engineering UI that developers integrate into their pipelines. This enables non-technical members to adjust prompts and parameters, run experiments, and export results for optimization.
As of Sept 2023, there are 2 kinds of prompt engineering UI:
@ -19,22 +21,23 @@ As of Sept 2023, there are 2 kinds of prompt engineering UI:
For simple pipelines, the supported workflow looks as follows:

1. [tech] Build the pipeline
2. [tech] Export the pipeline to config: `$ kotaemon promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>`
3. [tech] Customize the config
4. [tech] Spin up the prompt engineering UI: `$ kotaemon promptui run <path/to/config/file.yml>`
5. [non-tech] Change params, run inference
6. [non-tech] Export to Excel
7. [non-tech] Select the set of params that achieves the best output

The prompt engineering UI is prominently involved from step 2 to step 7 (step 1 is normally done by the developers, while step 7 happens exclusively in the Excel file).
#### Step 2 - Export pipeline to config

Command:

```
$ kotaemon promptui export <module.path.pipelineclass> --output <path/to/config/file.yml>
```

where:
@ -54,14 +57,14 @@ Declared as above, and `param1` will show up in the config YAML file, while `par
#### Step 3 - Customize the config

Developers can further edit the config file in this step to get the UI (step 4) best suited to their tasks. The exported config has this overall schema:

```
<module.path.pipelineclass1>:
  params:
    ... (Detailed param information to initiate a pipeline. This corresponds to the pipeline init parameters.)
  inputs:
    ... (Details of the pipeline's input, e.g. a text prompt. This corresponds to the params of the `run(...)` method.)
  outputs:
    ... (Details of the pipeline's output, e.g. prediction, accuracy... This is the output information we wish to see in the UI.)
  logs:
```
@ -141,7 +144,7 @@ logs:
Command:

```
$ kotaemon promptui run <path/to/config/file.yml>
```

This will generate a UI as follows:

New binary image files (content not shown): `docs/images/chat-tab.png`, `docs/images/file-index.png`, `docs/images/file-list.png`, `docs/images/file-upload.png`, `docs/images/info-panel.png`, among other images.


@ -1 +1,122 @@
# Getting Started with Kotaemon
This page is intended for end users who want to use the `kotaemon` tool for Question Answering on local documents.
## Download
Download and unzip the latest version of `kotaemon` by clicking this
[link](https://github.com/Cinnamon/kotaemon/archive/refs/heads/main.zip).
## Choose what model to use
The tool uses Large Language Models (LLMs) to perform various tasks in a QA pipeline. So, prior to running, you need to provide the application with access to the LLMs you want to use.

Please edit the `.env` file with the information needed to connect to the LLMs. You only need to provide at least one; however, it is recommended that you include all the LLMs that you have access to, as you will then be able to switch between them while using the application.
Currently, the following providers are supported:
### OpenAI
In the `.env` file, set the `OPENAI_API_KEY` variable with your OpenAI API key in order to enable access to OpenAI's models. Other variables can be modified as well; please feel free to edit them to fit your case. Otherwise, the default parameters should work for most people.
```shell
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_API_KEY=<your OpenAI API key here>
OPENAI_CHAT_MODEL=gpt-3.5-turbo
OPENAI_EMBEDDINGS_MODEL=text-embedding-ada-002
```
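As a sketch of how an application might consume these variables: the helper below is hypothetical (not kotaemon's actual code), and the defaults mirror the sample `.env` above.

```python
import os

def openai_config():
    """Collect OpenAI settings from the environment, falling back to the
    defaults shown in the sample .env above. Only the API key is required."""
    cfg = {
        "api_base": os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "chat_model": os.environ.get("OPENAI_CHAT_MODEL", "gpt-3.5-turbo"),
        "embeddings_model": os.environ.get(
            "OPENAI_EMBEDDINGS_MODEL", "text-embedding-ada-002"
        ),
    }
    if not cfg["api_key"]:
        raise RuntimeError("OPENAI_API_KEY is not set; please edit your .env file")
    return cfg

os.environ.setdefault("OPENAI_API_KEY", "sk-dummy")  # stand-in key for the demo
config = openai_config()
```

Failing fast with a clear message when the key is missing saves users from a confusing authentication error later in the pipeline.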
### Azure OpenAI
For OpenAI models served via the Azure platform, you need to provide your Azure endpoint and API key. You might also need to provide your deployments' names for the chat model and the embedding model, depending on how you set up your Azure deployment.
```shell
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
OPENAI_API_VERSION=2024-02-15-preview
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-35-turbo
AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT=text-embedding-ada-002
```
### Local models
- Pros:
  - Privacy. Your documents will be stored and processed locally.
  - Choices. There is a wide range of LLMs in terms of size, domain, and language to choose from.
  - Cost. It's free.
- Cons:
  - Quality. Local models are much smaller and thus have lower generative quality than paid APIs.
  - Speed. Local models are deployed on your machine, so the processing speed is limited by your hardware.
#### Find and download an LLM

You can search for and download an LLM to run locally from the [Hugging Face Hub](https://huggingface.co/models). Currently, these model formats are supported:
- GGUF
You should choose a model whose size is less than your device's available memory, leaving about 2 GB free. For example, if you have 16 GB of RAM in total, of which 12 GB is available, then you should choose a model that takes up at most 10 GB of RAM. Bigger models tend to give better generations but also take more processing time.
Here are some recommendations and their size in memory:
- [Qwen1.5-1.8B-Chat-GGUF](https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q8_0.gguf?download=true):
around 2 GB
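The sizing rule above can be expressed as a small helper. This is a hypothetical sketch (not part of the tool), and the model sizes in the example are illustrative.

```python
def ram_budget_gb(available_ram_gb, headroom_gb=2.0):
    """Largest model size that still leaves `headroom_gb` of RAM free."""
    return max(available_ram_gb - headroom_gb, 0.0)

def pick_model(models, available_ram_gb):
    """Pick the biggest model that fits the budget; `models` maps
    model name -> approximate size in GB when loaded."""
    budget = ram_budget_gb(available_ram_gb)
    fitting = {name: size for name, size in models.items() if size <= budget}
    # prefer the largest fitting model: bigger models tend to generate better
    return max(fitting, key=fitting.get) if fitting else None

# example sizes are illustrative, not measured
models = {"qwen1.5-1.8b-chat-q8_0": 2.0, "mid-7b-q4": 4.5, "big-13b-q4": 8.5}
print(pick_model(models, available_ram_gb=12))  # → big-13b-q4
```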
#### Enable local models
To add a local model to the model pool, set the `LOCAL_MODEL` variable in the `.env`
file to the path of the model file.
```shell
LOCAL_MODEL=<full path to your model file>
```
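Before launching, it can help to sanity-check this setting. Here is a sketch of such a check (`check_local_model` is a hypothetical helper, not the application's actual code):

```python
import os

def check_local_model(env):
    """Validate the LOCAL_MODEL setting; return (path, error) where
    exactly one of the two is None."""
    path = env.get("LOCAL_MODEL", "").strip()
    if not path:
        return None, "LOCAL_MODEL is not set"
    if not path.lower().endswith(".gguf"):
        return None, "only GGUF model files are currently supported"
    if not os.path.isfile(path):
        return None, f"model file not found: {path}"
    return path, None

# with a real model file this returns (path, None); this fabricated
# path fails the existence check instead
model, error = check_local_model({"LOCAL_MODEL": "/models/example.gguf"})
```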
Here is how to get the full path of your model file:
- On Windows 11: right-click the file and select `Copy as Path`.
## Installation
1. Navigate to the `scripts` folder and start an installer that matches your OS:
   - Windows: `run_windows.bat`. Just double-click the file.
   - macOS: `run_macos.sh`
     1. Right-click your file and select `Open with` and `Other`.
     2. Enable `All Applications` and choose `Terminal`.
     3. NOTE: If you always want to open that file with Terminal, check `Always Open With`.
     4. From now on, double-click your file and it should work.
   - Linux: `run_linux.sh`. If you are using Linux, you know how to run a bash script, right?
2. After the installation, the installer will ask to launch the ktem's UI; answer to continue.
3. If launched, the application will be available at `http://localhost:7860/`.
## Launch
To launch the app after initial setup or any changes, simply run the `run_*` script again.
A browser window will open and greet you with this screen:
![Chat tab](images/startup-chat-tab.png)
## Usage
For how to use the application, see [Usage](/usage). Have fun!
## Feedback
Feel free to create a bug report or a feature request, or join a discussion at https://github.com/Cinnamon/kotaemon/issues.


@ -1,84 +0,0 @@
## Introduction
The `kotaemon` library focuses on the AI building blocks used to implement Kotaemon. It can be used both in client projects and in product development. It consists of base interfaces, core components, and a list of utilities:
- Base interfaces: `kotaemon` defines the base interface of a component in a pipeline. A pipeline is also a component. By clearly defining this interface, a pipeline of steps can be easily constructed and orchestrated.
- Core components: `kotaemon` implements (or wraps 3rd-party libraries
like Langchain, llama-index,... when possible) commonly used components in
kotaemon use cases. Some of these components are: LLM, vector store,
document store, retriever... For a detailed list and description of these
components, please refer to the [API Reference](/reference/nav/) section.
- List of utilities: `lib-knowledge` provides utilities and tools that are
usually needed in client project. For example, it provides a prompt
engineering UI for AI developers in a project to quickly create a prompt
engineering tool for DMs and QALs. It also provides a command to quickly spin
up a project code base. For a full list and description of these utilities,
please refer to the [Tutorial/Utilities](/ultilities) section.
```mermaid
mindmap
root((kotaemon))
Base Interfaces
Document
LLMInterface
RetrievedDocument
BaseEmbeddings
BaseChat
BaseCompletion
...
Core Components
LLMs
AzureOpenAI
OpenAI
Embeddings
AzureOpenAI
OpenAI
HuggingFaceEmbedding
VectorStore
InMemoryVectorstore
ChromaVectorstore
Agent
Tool
DocumentStore
...
Utilities
Scaffold project
PromptUI
Documentation Support
```
## Expected benefit
Before `kotaemon`:
- Starting everything from scratch.
- Knowledge and expertise is fragmented.
- Nothing to reuse.
- No way to collaborate between tech and non-tech experts.
`kotaemon` expects to completely revolutionize the way we are building LLM-related projects. It helps the company side-steps those issues by:
- Standardize the interface to (1) make building LLM pipeline clearer (2) more reasonable to integrate pipelines between different projects.
- Centralize LLM-related technical components into 1 place. Avoid fragmented technology development. Easy to find the LLM-related technology inside the company.
- Centralize bug fixes and improvements in 1 place.
- Reduce boilerplate code during project development.
- Lightning fast prototyping.
## Install
The kotaemon can be installed from source with:
```
pip install kotaemon@git+ssh://git@github.com/Cinnamon/kotaemon.git
```
or from Cinnamon's internal python package index:
```
pip install kotaemon --extra-index-url https://ian_devpi.promptui.dm.cinnamon.is/root/packages
```
## Example use cases
- Start a project from scratch: `kh start-project`
- Run prompt engineering UI tool: `kh promptui export`, then `kh promptui run`.


@ -1,24 +0,0 @@
Devpi server endpoint (subject to change): https://ian_devpi.promptui.dm.cinnamon.is/root/packages
Install devpi-client
```bash
pip install devpi-client
```
Login to the server
```bash
devpi use <server endpoint> # set server endpoint provided above
devpi login <user name> --password=<your password> # login
```
If you don't yet have an account, please contact Ian or John.
Upload your package
```bash
devpi use <package name>/dev # choose the index to upload your package
cd <your package directory which must contain a pyproject.toml/setup.py>
devpi upload
```

docs/usage.md

@ -0,0 +1,59 @@
# Basic Usage
## Chat tab
![chat tab](images/chat-tab.png)
The chat tab is divided into 3 columns:
- Left: Conversation settings
- Middle: Chat interface
- Right: Information panel
### Conversation settings
#### Conversation control
Create, rename, and delete conversations.
![conversation control](images/converstaion-control.png)
#### File index
Choose which files to retrieve references from. If no file is selected, all files will be used.
![file index](images/file-index.png)
### Chat interface
Interact with the chatbot.
![chat interface](images/chat-interface.png)
### Information panel
Supporting information such as the retrieved evidence and reference will be displayed
here.
![info panel](images/info-panel.png)
## File index tab
![file index tab](images/file-index-tab.png)
### File upload
In order for a file to be used as an index for retrieval, it must first be processed by the application. Do this by uploading your file in the UI and then selecting `Upload and Index`.
![file upload](images/file-upload.png)
The application will take some time to process the file and will show a message once it is done. Then you will be able to select it in the [File index section](#file-index) of the [Chat tab](#chat-tab).
### File list
This section shows the list of files that have been uploaded to the application and
allows users to delete them.
![file-list](images/file-list.png)


@ -73,7 +73,7 @@ dev = [
all = ["kotaemon[adv,dev]"] all = ["kotaemon[adv,dev]"]
[project.scripts] [project.scripts]
kh = "kotaemon.cli:main" kotaemon = "kotaemon.cli:main"
[project.urls] [project.urls]
Homepage = "https://github.com/Cinnamon/kotaemon/" Homepage = "https://github.com/Cinnamon/kotaemon/"


@ -224,8 +224,8 @@ def test_wrapper_agent_langchain(openai_completion, llm, mock_google_search):
side_effect=_openai_chat_completion_responses_react_langchain_tool, side_effect=_openai_chat_completion_responses_react_langchain_tool,
) )
def test_react_agent_with_langchain_tools(openai_completion, llm): def test_react_agent_with_langchain_tools(openai_completion, llm):
from langchain.tools import DuckDuckGoSearchRun, WikipediaQueryRun from langchain_community.tools import DuckDuckGoSearchRun, WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper from langchain_community.utilities import WikipediaAPIWrapper
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()) wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
search = DuckDuckGoSearchRun() search = DuckDuckGoSearchRun()


@ -6,8 +6,7 @@ edit_uri: edit/main/docs/
nav: nav:
- Getting Started: - Getting Started:
- Quick Start: index.md - Quick Start: index.md
- Overview: overview.md - Basic Usage: usage.md
- Contributing: contributing.md
- Application: - Application:
- Features: pages/app/features.md - Features: pages/app/features.md
- Index: - Index:
@ -20,15 +19,13 @@ nav:
- Customize flow logic: pages/app/customize-flows.md - Customize flow logic: pages/app/customize-flows.md
- Customize UI: pages/app/customize-ui.md - Customize UI: pages/app/customize-ui.md
- Functional description: pages/app/functional-description.md - Functional description: pages/app/functional-description.md
- Tutorial: - Development:
- Data & Data Structure Components: data-components.md - Contributing: development/contributing.md
- Creating a Component: create-a-component.md - Data & Data Structure Components: development/data-components.md
- Utilities: ultilities.md - Creating a Component: development/create-a-component.md
- Utilities: development/utilities.md
# generated using gen-files + literate-nav # generated using gen-files + literate-nav
- API Reference: reference/ - API Reference: reference/
- Use Cases: examples/
- Misc:
- Upload python package to private index: upload-package.md
markdown_extensions: markdown_extensions:
- admonition - admonition
@ -59,7 +56,6 @@ plugins:
- gen-files: - gen-files:
scripts: scripts:
- docs/scripts/generate_reference_docs.py - docs/scripts/generate_reference_docs.py
- docs/scripts/generate_examples_docs.py
- literate-nav: - literate-nav:
nav_file: NAV.md nav_file: NAV.md
- mkdocstrings: - mkdocstrings: