kotaemon

Author	SHA1	Message	Date
ian	b6ac35029f	remove git secret	2024-03-28 16:04:12 +07:00
ian_Cin	df12dec732	Feat/local endpoint llm (#148) * serve local model in a different process from the app --------- Co-authored-by: albert <albert@cinnamon.is> Co-authored-by: trducng <trungduc1992@gmail.com>	2024-03-15 16:17:33 +07:00
Albert (Quang)	cc87aaa783	Add one-click installers for Linux, Windows, and MacOS (#146) * feat: Add installers for linux, windows, and macos * docs: Update README * pre-commit fix styles * Update installers and README * Remove env vars check and fix paths * Update installers: * Remove start.py and move install and launch part back to .sh/.bat * Add conda deactivate * Make messages more informative * Improve kotaemon based on insights from projects (#147) - Include static files in the package. - More reliable information panel. Faster & not breaking randomly. - Add directory upload. - Enable zip file to upload. - Allow setting endpoint for the OCR reader using environment variable. * feat: Add installers for linux, windows, and macos * docs: Update README * pre-commit fix styles * Update installers and README * Remove env vars check and fix paths * Update installers: * Remove start.py and move install and launch part back to .sh/.bat * Add conda deactivate * Make messages more informative * Make macOS installer runnable and improve Windows, Linux installers * Minor fix macos commands * installation should pause before exit * Update Windows installer: add a new label to exit function with error * put install_dir to .gitignore * chore: Add comments to clarify the 'end' labels --------- Co-authored-by: Duc Nguyen (john) <trungduc1992@gmail.com> Co-authored-by: ian <ian@cinnamon.is>	2024-03-06 10:59:30 +07:00
Duc Nguyen (john)	2dd531114f	Make ktem official (#134) * Move kotaemon and ktem into same folder * Update docs * Update CI * Resolve mypy, isorts * Re-allow test pdf files	2024-01-23 10:54:18 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	9945afdf6f	Add Reranker implementation and integration in Retrieving pipeline (#77) * Add base Reranker * Add LLM Reranker * Add Cohere Reranker * Add integration of Rerankers in Retrieving pipeline	2023-11-15 16:03:51 +07:00
ian_Cin	84f1fa8cbd	[AUR-395] Adopt Example1 disclaimer pipeline (#42) * Adopt Example1 disclaimer pipeline * Update Document class * Add composite components * Modify Extractor behaviours	2023-10-10 15:42:48 +07:00
Nguyen Trung Duc (john)	6ab1854532	feat: Add chain-of-thought (#37) * Add chain-of-thought * Use BasePromptComponent * Add terminate callback for the chain-of-thought	2023-10-04 02:16:33 +07:00
ian_Cin	d83c22aa4e	[AUR-395, AUR-415] Adopt Example1 Injury pipeline; add .flow() for enabling bottom-up pipeline execution (#32) * add example1/injury pipeline example * add dotenv * update various api	2023-10-02 16:24:56 +07:00
Tuan Anh Nguyen Dang (Tadashi_Cin)	6207f4332a	[AUR-430] Add test case for Chroma VectorStore save/load (#26) * add test case for Chroma save/load * minor name change * add delete_collection support for chroma * move save load to chroma --------- Co-authored-by: Nguyen Trung Duc (john) <john@cinnamon.is>	2023-09-26 10:58:41 +07:00
ian_Cin	5241edbc46	[AUR-361] Setup pre-commit, pytest, GitHub actions, ssh-secret (#3) Co-authored-by: trducng <trungduc1992@gmail.com>	2023-08-30 07:22:01 +07:00
Nguyen Trung Duc (john)	c3c25db48c	[AUR-385, AUR-388] Declare BaseComponent and decide LLM call interface (#2)

- Use cases related to LLM call: https://cinnamon-ai.atlassian.net/browse/AUR-388?focusedCommentId=34873
- Sample usages: `test_llms_chat_models.py` and `test_llms_completion_models.py`:

```python
from kotaemon.llms.chats.openai import AzureChatOpenAI

model = AzureChatOpenAI(
    openai_api_base="https://test.openai.azure.com/",
    openai_api_key="some-key",
    openai_api_version="2023-03-15-preview",
    deployment_name="gpt35turbo",
    temperature=0,
    request_timeout=60,
)
output = model("hello world")
```

For the LLM-call component, I decided to wrap Langchain's LLM models and Langchain's Chat models, and set the interface as follows:

- Completion LLM component:

```python
from __future__ import annotations


class CompletionLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run text completion: str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run text completion in batch: list[str] in -> list[LLMInterface] out
        ...

    # run_document and run_batch_document just reuse run_raw and run_batch_raw, due to unclear use case
```

- Chat LLM component:

```python
from __future__ import annotations

from langchain.schema import BaseMessage


class ChatLLM:
    def run_raw(self, text: str) -> LLMInterface:
        # Run chat completion (no chat history): str in -> LLMInterface out
        ...

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Run chat completion in batch mode (no chat history): list[str] in -> list[LLMInterface] out
        ...

    def run_document(self, text: list[BaseMessage]) -> LLMInterface:
        # Run chat completion (with chat history): list[langchain's BaseMessage] in -> LLMInterface out
        ...

    def run_batch_document(self, text: list[list[BaseMessage]]) -> list[LLMInterface]:
        # Run chat completion in batch mode (with chat history): list[list[langchain's BaseMessage]] in -> list[LLMInterface] out
        ...
```

- The LLMInterface is as follows:

```python
from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)
```	2023-08-29 15:47:12 +07:00
Nguyen Trung Duc (john)	e9d1d5c118	[AUR-401] Disable Haystack telemetry with monkey patching (#1)

Sample Haystack log when running a pipeline. Note: the `pipeline.classname` can leak company information.

```json
{
  "hardware.cpus": 16,
  "hardware.gpus": 0,
  "libraries.colab": false,
  "libraries.cuda": false,
  "libraries.haystack": "1.20.0rc0",
  "libraries.ipython": false,
  "libraries.pytest": false,
  "libraries.ray": false,
  "libraries.torch": false,
  "libraries.transformers": "4.31.0",
  "os.containerized": false,
  "os.family": "Linux",
  "os.machine": "x86_64",
  "os.version": "6.2.0-26-generic",
  "pipeline.classname": "TempPipeline",
  "pipeline.config_hash": "07a8eddd5a6e512c0d898c6d9f445ed9",
  "pipeline.nodes.PromptNode": 1,
  "pipeline.nodes.Shaper": 1,
  "pipeline.nodes.WebRetriever": 1,
  "pipeline.run_parameters.debug": false,
  "pipeline.run_parameters.documents": [0],
  "pipeline.run_parameters.file_paths": 0,
  "pipeline.run_parameters.labels": 0,
  "pipeline.run_parameters.meta": 1,
  "pipeline.run_parameters.params": false,
  "pipeline.run_parameters.queries": true,
  "pipeline.runs": 1,
  "pipeline.type": "Query",
  "python.version": "3.10.12"
}
```

Solution: Haystack telemetry uses the `telemetry` variable, the `posthog` library and the `HAYSTACK_TELEMETRY_ENABLED` environment variable. We set the environment variable to False and make sure the relevant objects are disabled.	2023-08-22 10:02:46 +07:00

12 Commits
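The completion-LLM interface described in commit c3c25db48c (`run_raw`, `run_batch_raw`, with the document variants reusing them, all returning `LLMInterface`) can be exercised without a real model. The following is a minimal sketch and is not kotaemon's implementation. Only the `LLMInterface` fields and the method names come from the commit message. `EchoCompletionLLM` is a hypothetical offline stand-in: it upper-cases the prompt and counts whitespace-separated words as "tokens".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LLMInterface:
    # Fields as listed in commit c3c25db48c; -1 means "not reported".
    text: list[str]
    completion_tokens: int = -1
    total_tokens: int = -1
    prompt_tokens: int = -1
    logits: list[list[float]] = field(default_factory=list)


class EchoCompletionLLM:
    """Hypothetical stand-in for a wrapped completion model (no network calls).

    It "completes" a prompt by upper-casing it and counts whitespace-separated
    words as tokens, so the interface can be tried without an API key.
    """

    def run_raw(self, text: str) -> LLMInterface:
        completion = text.upper()
        prompt_tokens = len(text.split())
        completion_tokens = len(completion.split())
        return LLMInterface(
            text=[completion],
            prompt_tokens=prompt_tokens,
            completion_tokens=completion_tokens,
            total_tokens=prompt_tokens + completion_tokens,
        )

    def run_batch_raw(self, text: list[str]) -> list[LLMInterface]:
        # Batch mode simply maps run_raw over the inputs.
        return [self.run_raw(t) for t in text]

    # As the commit notes, the document variants reuse the raw variants.
    def run_document(self, text: str) -> LLMInterface:
        return self.run_raw(text)

    def run_batch_document(self, text: list[str]) -> list[LLMInterface]:
        return self.run_batch_raw(text)


if __name__ == "__main__":
    llm = EchoCompletionLLM()
    out = llm.run_raw("hello world")
    print(out.text, out.total_tokens)
```

Because every method returns the same `LLMInterface` shape, callers can switch between a real wrapped model and a stub like this one without changing how they read `text` or the token counts.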