Commit Graph

196 Commits

Author SHA1 Message Date
Duc Nguyen (john)
e1cf970a3d Disable Gradio analytics and unnecessary font loading to avoid app hanging in private network (#145) 2024-02-20 22:02:28 +07:00
trducng
08cc99d8db Add docstring for database and OCR loader 2024-02-20 21:20:48 +07:00
Duc Nguyen (john)
767aaaa1ef Utilize llama.cpp for both completion and chat models (#141) 2024-02-20 18:17:48 +07:00
ian_Cin
a86c727869 add albert to git-secret (#140)
* add albert to git-secret

* update readme

* Limit llama-index version

* Langchain upgrade their wikipedia tool name

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2024-02-20 17:28:06 +07:00
trducng
89450ab661 Enable zip file upload in ktem 2024-02-20 02:59:46 +07:00
Duc Nguyen (john)
d36522129f refactor: replace llama-index based loader with a llama-index mixin loader (#142) 2024-02-20 02:33:28 +07:00
trducng
7fc54d52e4 Improve ocr loader error message 2024-02-06 12:21:12 +07:00
trducng
1a4fd7c33f Update default settings to conform to Langchain's Azure implementation 2024-02-05 18:04:36 +07:00
trducng
771f074c0e Add utf-8 encoding in Help Page for rendering on Windows 2024-02-05 16:42:40 +07:00
trducng
bff55230ba Reduce the default chunk size in the reasoning pipeline to fit LLM capability 2024-02-03 09:38:50 +07:00
trducng
107bc7580e Enable HTML upload 2024-02-02 11:37:57 +07:00
Duc Nguyen (john)
65852b7d71 Add docx + html reader (#139) 2024-01-31 19:21:30 +07:00
ian_Cin
116919b346 Update docs (#106) 2024-01-30 18:50:17 +07:00
trducng
cbe40fac99 Show retrieved but non-evidence docs. Support language changing 2024-01-29 11:16:07 +07:00
trducng
50b5d936f5 Optionally allow database migration with Alembic 2024-01-28 19:54:15 +07:00
trducng
04635b77f6 Make the database table customizable 2024-01-28 07:54:38 +07:00
trducng
6ae9634399 Enable .doc file 2024-01-27 23:45:19 +07:00
trducng
23c0331bab Enable pptx support 2024-01-27 23:08:06 +07:00
trducng
80ec214107 Fix loaders' file_path and other metadata 2024-01-27 22:52:46 +07:00
trducng
c6637ca56e Relate the retrievers to the indexer 2024-01-27 16:39:40 +07:00
trducng
9b586466ff Add the tutorial to mkdocs 2024-01-26 15:40:04 +00:00
Duc Nguyen (john)
22c646e5c4 Add documentation about adding reasoning and indexing pipelines to the application (#138) 2024-01-26 22:31:52 +07:00
trducng
757aabca4d Add app title, favicon. More natural chat 2024-01-25 22:40:32 +07:00
Duc Nguyen (john)
513e86f490 Add dedicated information panel to the UI (#137)
* Allow streaming to the chatbot and the information panel without threading
* Highlight evidence in a simple manner
2024-01-25 19:07:53 +07:00
Duc Nguyen (john)
ebc61400d8 Provide a developer mode when running ktem (#135)
Implement and utilize `on_app_created` to support the developer mode.
2024-01-23 11:46:59 +07:00
Duc Nguyen (john)
2dd531114f Make ktem official (#134)
* Move kotaemon and ktem into same folder

* Update docs

* Update CI

* Resolve mypy, isorts

* Re-allow test pdf files
2024-01-23 10:54:18 +07:00
Duc Nguyen (john)
9c5b707010 Customize application settings (#132)
* Allow customizing the base application

* Make the core llms and embeddings customizable

* Make the settings, reasoning and index customizable

* Import from langchain_openai
2024-01-21 14:36:07 +07:00
Duc Nguyen (john)
5a9d6f75be Migrate the MVP into kotaemon (#108)
- Migrate the MVP into kotaemon.
- Preliminarily include the pipeline within the chatbot interface.
- Organize MVP as an application.

Todo:

- Add an info panel to view the planning of agents -> Fix streaming agents' output.

Resolve: #60
Resolve: #61
Resolve: #62
2024-01-10 15:28:09 +07:00
ian_Cin
230328c62f Best docs Cinnamon will probably ever have (#105) 2023-12-20 11:30:25 +07:00
Duc Nguyen (john)
0e30dcbb06 Create Langchain LLM converter to quickly supply it to Langchain's chain (#102)
* Create Langchain LLM converter to quickly supply it to Langchain's chain

* Clean up
2023-12-11 14:55:56 +07:00
Duc Nguyen (john)
da0ac1d69f Change template to private attribute and simplify imports (#101)
---------

Co-authored-by: ian <ian@cinnamon.is>
2023-12-08 18:10:34 +07:00
Duc Nguyen (john)
1f927d3391 Upgrade promptui to conform to Gradio V4 (#98) 2023-12-07 15:24:07 +07:00
ian_Cin
797df5a69c refactor agents (#100)
* refactor agents

* minor cosmetic, add terminal ui for cli

* bump to 0.3.4

* Add temporary path

* fix unclosed files in tests

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
2023-12-06 17:06:29 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
d9e925eb75 Add UnstructuredReader with support for various legacy files (.doc, .xls) (#99) 2023-12-05 16:19:13 +07:00
Duc Nguyen (john)
37c744b616 Add file-based document store and vector store (#96)
* Modify docstore and vectorstore objects to be reconstructable
* Simplify the file docstore
* Use the simple file docstore and vector store in MVP
2023-12-04 17:46:00 +07:00
Duc Nguyen (john)
0ce3a8832f Provide type hints for pass-through Langchain and Llama-index objects (#95) 2023-12-04 10:59:13 +07:00
Duc Nguyen (john)
e34b1e4c6d Refactor the index component and update the MVP insurance accordingly (#90)
Refactor the `kotaemon/pipelines` module to `kotaemon/indices`. Create the VectorIndex.

Note: currently I place `qa` inside `kotaemon/indices`, since at the moment we only have `qa` in RAG. At the same time, I think `qa` could be an independent module in `kotaemon/qa`. Since this can be changed later, I am going with the first option for now to see whether it needs to change.
2023-11-30 18:35:07 +07:00
Nguyen Trung Duc (john)
8e3a1d193f Refactor agents and tools (#91)
* Move tools to agents

* Move agents to a dedicated place

* Remove subclassing BaseAgent from BaseTool
2023-11-30 09:52:08 +07:00
ian_Cin
4256030b4f Adopt pyproject.toml (#89)
* ditching setup.py in favour of pyproject.toml; bump to 0.3.2

* bump to 0.3.3
2023-11-29 14:58:35 +07:00
ian_Cin
8e0779a22d Enforce all IO objects to be subclassed from Document (#88)
* enforce Document as IO

* Separate rerankers, splitters and extractors (#85)

* partially refactor importing

* add text to embedding outputs

---------

Co-authored-by: Nguyen Trung Duc (john) <trungduc1992@gmail.com>
2023-11-27 16:35:09 +07:00
Nguyen Trung Duc (john)
2186c5558f Separate rerankers, splitters and extractors (#85) 2023-11-27 14:25:54 +07:00
ian_Cin
0dede9c82d Subclass chat messages from Document (#86) 2023-11-27 10:38:19 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
3ac277cc0b Update Elastics store delete() (#84) 2023-11-21 15:29:00 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
9a96a9b876 Add Elasticsearch Docstore (#83)
* add Elasticsearch Docstore

* update missing requirements

* add docstore

* [ignore cache] update default param

* update docstring
2023-11-21 11:59:20 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
8bb7ad91e0 Add Langchain Agent wrapper with OpenAI Function / Self-ask agent support (#82)
* update Param() type hint in MVP

* update default embedding endpoint

* update Langchain agent wrapper

* update langchain agent
2023-11-20 16:26:08 +07:00
Nguyen Trung Duc (john)
0a3fc4b228 Correct the use of abstractmethod (#80)
* Correct abstractmethod usage

* Update interface

* Specify minimal llama-index version [ignore cache]

* Update examples
2023-11-20 11:18:53 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
98509f886c Update splitters + metadata extractor interface to conform with new LlamaIndex design (#81)
* change splitter to general doc parsers class to fit new llama-index design
* moving interface of splitter
2023-11-20 10:09:30 +07:00
Nguyen Trung Duc (john)
98c76c4700 Refactor excel Loader (#79) 2023-11-16 11:30:11 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)
cc1e75b3c6 Add Citation pipeline (#78)
* add rerankers in retrieving pipeline

* update example MVP pipeline

* add citation pipeline and function call interface

* change return type of QA and AgentPipeline to Document
2023-11-16 11:24:35 +07:00
Nguyen Trung Duc (john)
f8b8d86d4e Move LLM-related components into LLM module (#74)
* Move splitter into indexing module
* Rename post_processing module to parsers
* Migrate LLM-specific composite pipelines into llms module

This change moves the `splitters` module into the `indexing` module. The `indexing` module will be created soon, to house `indexing`-related components.

This change renames the `post_processing` module to `parsers`. Post-processing is a generic term that provides very little information. In the future, we will add other extractors to the `parsers` module, such as a metadata extractor.

This change migrates the composite elements into the `llms` module. These elements heavily assume that the internal nodes are LLM-specific. As a result, migrating these elements into the `llms` module will make them more discoverable and simplify the code base structure.
2023-11-15 16:26:53 +07:00