kotaemon/libs/ktem/ktem/reasoning/simple.py
Tuan Anh Nguyen Dang (Tadashi_Cin) 2570e11501
feat: merge develop (#123)
* Support hybrid vector retrieval

* Enable figures and table reading in Azure DI

* Retrieve with multi-modal

* Fix mixing up table

* Add txt loader

* Add Anthropic Chat

* Raise an error when retrieving the help file

* Allow same filename for different people if private is True

* Allow declaring extra LLM vendors

* Show chunks on the File page

* Allow elasticsearch to get more docs

* Fix Cohere response (#86)

* Fix Cohere response

* Remove Adobe pdfservice from dependency

kotaemon no longer relies on pdfservice for its core functionality,
and pdfservice uses a very outdated dependency that causes conflicts.

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* Add confidence score (#87)

* Save question answering data as a log file

* Save the original information besides the rewritten info

* Export Cohere relevance score as confidence score

* Fix style check

* Upgrade the confidence score appearance (#90)

* Highlight the relevance score

* Round relevance score. Get key from config instead of env

* Cohere return all scores

* Display relevance score for image

* Remove columns and rows in Excel loader which contain all NaN (#91)

* remove columns and rows which contain all NaN

* back to multiple joiner options

* Fix style

---------

Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: trducng <trungduc1992@gmail.com>

* Track retriever state

* Bump llama-index version 0.10

* feat/save-azuredi-mhtml-to-markdown (#93)

* feat/save-azuredi-mhtml-to-markdown

* fix: replace os.path with pathlib, change theflow.settings

* refactor: base on pre-commit

* chore: move the func of saving content markdown above removed_spans

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: losing first chunk (#94)

* fix: losing first chunk.

* fix: update the method of preventing losing chunks

---------

Co-authored-by: jacky0218 <jacky0218@github.com>

* fix: adding the base64 image in markdown (#95)

* feat: more chunk info on UI

* fix: error when reindexing files

* refactor: allow more information exception trace when using gpt4v

* feat: add excel reader that treats each worksheet as a document

* Persist loader information when indexing file

* feat: allow hiding unneeded setting panels

* feat: allow specific timezone when creating conversation

* feat: add more confidence score (#96)

* Allow a list of rerankers

* Export llm reranking score instead of filter with boolean

* Get logprobs from LLMs

* Rename cohere reranking score

* Call 2 rerankers at once

* Run QA pipeline for each chunk to get qa_score

* Display more relevance scores

* Define another LLMScoring instead of editing the original one

* Export logprobs instead of probs

* Call LLMScoring

* Get qa_score only in the final answer

* feat: replace text length with token in file list

* ui: show index name instead of id in the settings

* feat(ai): restrict the vision temperature

* fix(ui): remove the misleading message about non-retrieved evidences

* feat(ui): show the reasoning name and description in the reasoning setting page

* feat(ui): show version on the main windows

* feat(ui): show default llm name in the setting page

* fix(conf): append the result of doc in llm_scoring (#97)

* fix: constrain maximum number of images

* feat(ui): allow filter file by name in file list page

* Fix exceeding token length error for OpenAI embeddings by chunking then averaging (#99)

* Average embeddings in case the text exceeds max size

* Add docstring

* fix: Allow empty string when calling embedding

* fix: update trulens LLM ranking score for retrieval confidence, improve citation (#98)

* Round when displaying not by default

* Add LLMTrulens reranking model

* Use llmtrulensscoring in pipeline

* fix: update UI display for trulens score

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* feat: add question decomposition & few-shot rewrite pipeline (#89)

* Create few-shot query-rewriting. Run and display the result in info_panel

* Fix style check

* Put the functions to separate modules

* Add zero-shot question decomposition

* Fix fewshot rewriting

* Add default few-shot examples

* Fix decompose question

* Fix importing rewriting pipelines

* fix: update decompose logic in fullQA pipeline

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add utf-8 encoding when saving temporary markdown in vectorIndex (#101)

* fix: improve retrieval pipeline and relevant score display (#102)

* fix: improve retrieval pipeline by extending first round top_k with multiplier

* fix: minor fix

* feat: improve UI default settings and add quick switch option for pipeline

* fix: improve agent logics (#103)

* fix: improve agent progress display

* fix: update retrieval logic

* fix: UI display

* fix: less verbose debug log

* feat: add warning message for low confidence

* fix: LLM scoring enabled by default

* fix: minor update logics

* fix: hotfix image citation

* feat: update docx loader to handle merged table cells + handle zip file upload (#104)

* feat: update docx loader to handle merged table cells

* feat: handle zip file

* refactor: pre-commit

* fix: escape text in download UI

* feat: optimize vector store query db (#105)

* feat: optimize vector store query db

* feat: add file_id to chroma metadatas

* feat: remove unnecessary logs and update migrate script

* feat: iterate through file index

* fix: remove unused code

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: add openai embedding exponential back-off

* fix: update import download_loader

* refactor: codespell

* fix: update some default settings

* fix: update installation instruction

* fix: default chunk length in simple QA

* feat: add share conversation feature and enable retrieval history (#108)

* feat: add share conversation feature and enable retrieval history

* fix: update share conversation UI

---------

Co-authored-by: taprosoft <tadashi@cinnamon.is>

* fix: allow exponential backoff for failed OCR call (#109)

* fix: update default prompt when no retrieval is used

* fix: create embedding for long image chunks

* fix: add exception handling for additional table retriever

* fix: clean conversation & file selection UI

* fix: elastic search with empty doc_ids

* feat: add thumbnail PDF reader for quick multimodal QA

* feat: add thumbnail handling logic in indexing

* fix: UI text update

* fix: PDF thumb loader page number logic

* feat: add quick indexing pipeline and update UI

* feat: add conv name suggestion

* fix: minor UI change

* feat: citation in thread

* fix: add conv name suggestion in regen

* chore: add assets for usage doc

* chore: update usage doc

* feat: pdf viewer (#110)

* feat: update pdfviewer

* feat: update missing files

* fix: update rendering logic of infor panel

* fix: improve thumbnail retrieval logic

* fix: update PDF evidence rendering logic

* fix: remove pdfjs built dist

* fix: reduce thumbnail evidence count

* chore: update gitignore

* fix: add js event on chat msg select

* fix: update css for viewer

* fix: add env var for PDFJS prebuilt

* fix: move language setting to reasoning utils

---------

Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: trducng <trungduc1992@gmail.com>

* feat: graph rag (#116)

* fix: reload server when add/delete index

* fix: rework indexing pipeline to be able to disable vectorstore and splitter if needed

* feat: add graphRAG index with plot view

* fix: update requirement for graphRAG and lighten unnecessary packages

* feat: add knowledge network index (#118)

* feat: add Knowledge Network index

* fix: update reader mode setting for knet

* fix: update init knet

* fix: update collection name to index pipeline

* fix: missing req

---------

Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>

* fix: update info panel return for graphrag

* fix: retriever setting graphrag

* feat: local llm settings (#122)

* feat: expose context length as reasoning setting to better fit local models

* fix: update context length setting for agents

* fix: rework threadpool llm call

* fix: improve indexing logic

* fix: improve UI

* feat: add lancedb

* fix: improve lancedb logic

* feat: add lancedb vectorstore

* fix: lighten requirement

* fix: improve lanceDB vs

* fix: improve UI

* fix: openai retry

* fix: update reqs

* fix: update launch command

* feat: update Dockerfile

* feat: add plot history

* fix: update default config

* fix: remove verbose print

* fix: update default setting

* fix: update gradio plot return

* fix: default gradio tmp

* fix: improve lancedb docstore

* fix: fix question decompose pipeline

* feat: add multimodal reader in UI

* fix: update docs

* fix: update default settings & docker build

* fix: update app startup

* chore: update documentation

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>

* chore: update README

* chore: update README

---------

Co-authored-by: trducng <trungduc1992@gmail.com>
Co-authored-by: cin-ace <ace@cinnamon.is>
Co-authored-by: Linh Nguyen <70562198+linhnguyen-cinnamon@users.noreply.github.com>
Co-authored-by: linhnguyen-cinnamon <cinmc0019@CINMC0019-LinhNguyen.local>
Co-authored-by: cin-jacky <101088014+jacky0218@users.noreply.github.com>
Co-authored-by: jacky0218 <jacky0218@github.com>
Co-authored-by: kan_cin <kan@cinnamon.is>
Co-authored-by: phv2312 <kat87yb@gmail.com>
Co-authored-by: jeff52415 <jeff.yang@cinnamon.is>
2024-08-26 08:50:37 +07:00


import html
import logging
import threading
from collections import defaultdict
from difflib import SequenceMatcher
from functools import partial
from typing import Generator
import numpy as np
import tiktoken
from ktem.llms.manager import llms
from ktem.reasoning.prompt_optimization import (
DecomposeQuestionPipeline,
RewriteQuestionPipeline,
)
from ktem.utils.render import Render
from theflow.settings import settings as flowsettings
from kotaemon.base import (
AIMessage,
BaseComponent,
Document,
HumanMessage,
Node,
RetrievedDocument,
SystemMessage,
)
from kotaemon.indices.qa.citation import CitationPipeline
from kotaemon.indices.splitters import TokenSplitter
from kotaemon.llms import ChatLLM, PromptTemplate
from ..utils import SUPPORTED_LANGUAGE_MAP
from .base import BaseReasoning
logger = logging.getLogger(__name__)
EVIDENCE_MODE_TEXT = 0
EVIDENCE_MODE_TABLE = 1
EVIDENCE_MODE_CHATBOT = 2
EVIDENCE_MODE_FIGURE = 3
MAX_IMAGES = 10
def find_text(search_span, context):
sentence_list = search_span.split("\n")
matches = []
# don't search for small text
if len(search_span) > 5:
for sentence in sentence_list:
match = SequenceMatcher(
None, sentence, context, autojunk=False
).find_longest_match()
if match.size > len(sentence) * 0.35:
matches.append((match.b, match.b + match.size))
return matches
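# Usage sketch (illustrative only, not part of the pipeline): `find_text` returns
# (start, end) offsets into `context` for each sentence of `search_span` whose
# longest match covers more than 35% of the sentence; the UI uses these offsets
# to highlight cited passages. The strings below are made up for illustration.
def _example_find_text() -> list[str]:
    context = "The tower was completed in 1889 and stands roughly 330 metres tall."
    quote = "The tower was completed in 1889."
    return [context[start:end] for start, end in find_text(quote, context)]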
class PrepareEvidencePipeline(BaseComponent):
"""Prepare the evidence text from the list of retrieved documents
This step usually happens after `DocumentRetrievalPipeline`.
Args:
        trim_func: a callback function or a BaseComponent that splits a large
            chunk of text into smaller ones. The first chunk will be retained.
"""
max_context_length: int = 32000
trim_func: TokenSplitter | None = None
def run(self, docs: list[RetrievedDocument]) -> Document:
evidence = ""
images = []
table_found = 0
evidence_modes = []
evidence_trim_func = (
self.trim_func
if self.trim_func
else TokenSplitter(
chunk_size=self.max_context_length,
chunk_overlap=0,
separator=" ",
tokenizer=partial(
tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
allowed_special=set(),
disallowed_special="all",
),
)
)
for _id, retrieved_item in enumerate(docs):
retrieved_content = ""
page = retrieved_item.metadata.get("page_label", None)
source = filename = retrieved_item.metadata.get("file_name", "-")
if page:
source += f" (Page {page})"
if retrieved_item.metadata.get("type", "") == "table":
evidence_modes.append(EVIDENCE_MODE_TABLE)
if table_found < 5:
retrieved_content = retrieved_item.metadata.get(
"table_origin", retrieved_item.text
)
if retrieved_content not in evidence:
table_found += 1
evidence += (
f"<br><b>Table from {source}</b>\n"
+ retrieved_content
+ "\n<br>"
)
elif retrieved_item.metadata.get("type", "") == "chatbot":
evidence_modes.append(EVIDENCE_MODE_CHATBOT)
retrieved_content = retrieved_item.metadata["window"]
evidence += (
f"<br><b>Chatbot scenario from {filename} (Row {page})</b>\n"
+ retrieved_content
+ "\n<br>"
)
elif retrieved_item.metadata.get("type", "") == "image":
evidence_modes.append(EVIDENCE_MODE_FIGURE)
retrieved_content = retrieved_item.metadata.get("image_origin", "")
retrieved_caption = html.escape(retrieved_item.get_content())
evidence += (
f"<br><b>Figure from {source}</b>\n"
+ "<img width='85%' src='<src>' "
+ f"alt='{retrieved_caption}'/>"
+ "\n<br>"
)
images.append(retrieved_content)
else:
if "window" in retrieved_item.metadata:
retrieved_content = retrieved_item.metadata["window"]
else:
retrieved_content = retrieved_item.text
retrieved_content = retrieved_content.replace("\n", " ")
if retrieved_content not in evidence:
evidence += (
f"<br><b>Content from {source}: </b> "
+ retrieved_content
+ " \n<br>"
)
# resolve evidence mode
evidence_mode = EVIDENCE_MODE_TEXT
if EVIDENCE_MODE_FIGURE in evidence_modes:
evidence_mode = EVIDENCE_MODE_FIGURE
elif EVIDENCE_MODE_TABLE in evidence_modes:
evidence_mode = EVIDENCE_MODE_TABLE
        # trim the evidence to fit within max_context_length
print("len (original)", len(evidence))
if evidence:
texts = evidence_trim_func([Document(text=evidence)])
evidence = texts[0].text
print("len (trimmed)", len(evidence))
return Document(content=(evidence_mode, evidence, images))
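# Illustrative sketch (defined only, never executed at import time): packing a few
# retrieved chunks into a single evidence string. The RetrievedDocument constructor
# arguments shown here are assumptions for illustration; the pipeline itself only
# reads `.text` and `.metadata`. Note the trim step may download tiktoken data.
def _example_prepare_evidence():
    docs = [
        RetrievedDocument(
            text="Paris is the capital of France.",
            metadata={"file_name": "geo.txt", "page_label": "1"},
        ),
        RetrievedDocument(
            text="<table>...</table>",
            metadata={
                "file_name": "stats.xlsx",
                "type": "table",
                "table_origin": "<table>...</table>",
            },
        ),
    ]
    evidence_mode, evidence, images = PrepareEvidencePipeline()(docs).content
    # evidence_mode is EVIDENCE_MODE_TABLE here because a table chunk is present
    return evidence_mode, evidence, images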
DEFAULT_QA_TEXT_PROMPT = (
"Use the following pieces of context to answer the question at the end in detail with clear explanation. " # noqa: E501
"If you don't know the answer, just say that you don't know, don't try to "
"make up an answer. Give answer in "
"{lang}.\n\n"
"{context}\n"
"Question: {question}\n"
"Helpful Answer:"
)
DEFAULT_QA_TABLE_PROMPT = (
"Use the given context: texts, tables, and figures below to answer the question, "
"then provide answer with clear explanation."
"If you don't know the answer, just say that you don't know, "
"don't try to make up an answer. Give answer in {lang}.\n\n"
"Context:\n"
"{context}\n"
"Question: {question}\n"
"Helpful Answer:"
) # noqa
DEFAULT_QA_CHATBOT_PROMPT = (
"Pick the most suitable chatbot scenarios to answer the question at the end, "
"output the provided answer text. If you don't know the answer, "
"just say that you don't know. Keep the answer as concise as possible. "
"Give answer in {lang}.\n\n"
"Context:\n"
"{context}\n"
"Question: {question}\n"
"Answer:"
) # noqa
DEFAULT_QA_FIGURE_PROMPT = (
"Use the given context: texts, tables, and figures below to answer the question. "
"If you don't know the answer, just say that you don't know. "
"Give answer in {lang}.\n\n"
"Context: \n"
"{context}\n"
"Question: {question}\n"
"Answer: "
) # noqa
DEFAULT_REWRITE_PROMPT = (
"Given the following question, rephrase and expand it "
"to help you do better answering. Maintain all information "
"in the original question. Keep the question as concise as possible. "
"Give answer in {lang}\n"
"Original question: {question}\n"
"Rephrased question: "
) # noqa
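# Sketch of how the templates above are filled in (see
# AnswerWithContextPipeline.get_prompt below): `populate` substitutes the
# {context}, {question} and {lang} placeholders. The values here are placeholders
# chosen for illustration only.
def _example_populate_qa_prompt() -> str:
    return PromptTemplate(DEFAULT_QA_TEXT_PROMPT).populate(
        context="<b>Content from geo.txt:</b> Paris is the capital of France.",
        question="What is the capital of France?",
        lang="English",
    )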
CONTEXT_RELEVANT_WARNING_SCORE = 0.7
class AnswerWithContextPipeline(BaseComponent):
"""Answer the question based on the evidence
Args:
llm: the language model to generate the answer
citation_pipeline: generates citation from the evidence
qa_template: the prompt template for LLM to generate answer (refer to
evidence_mode)
qa_table_template: the prompt template for LLM to generate answer for table
(refer to evidence_mode)
qa_chatbot_template: the prompt template for LLM to generate answer for
pre-made scenarios (refer to evidence_mode)
        lang: the language of the answer. Currently supports English and Japanese
"""
llm: ChatLLM = Node(default_callback=lambda _: llms.get_default())
vlm_endpoint: str = getattr(flowsettings, "KH_VLM_ENDPOINT", "")
use_multimodal: bool = getattr(flowsettings, "KH_REASONINGS_USE_MULTIMODAL", True)
citation_pipeline: CitationPipeline = Node(
default_callback=lambda _: CitationPipeline(llm=llms.get_default())
)
qa_template: str = DEFAULT_QA_TEXT_PROMPT
qa_table_template: str = DEFAULT_QA_TABLE_PROMPT
qa_chatbot_template: str = DEFAULT_QA_CHATBOT_PROMPT
qa_figure_template: str = DEFAULT_QA_FIGURE_PROMPT
enable_citation: bool = False
system_prompt: str = ""
lang: str = "English" # support English and Japanese
n_last_interactions: int = 5
def get_prompt(self, question, evidence, evidence_mode: int):
"""Prepare the prompt and other information for LLM"""
if evidence_mode == EVIDENCE_MODE_TEXT:
prompt_template = PromptTemplate(self.qa_template)
elif evidence_mode == EVIDENCE_MODE_TABLE:
prompt_template = PromptTemplate(self.qa_table_template)
elif evidence_mode == EVIDENCE_MODE_FIGURE:
if self.use_multimodal:
prompt_template = PromptTemplate(self.qa_figure_template)
else:
prompt_template = PromptTemplate(self.qa_template)
else:
prompt_template = PromptTemplate(self.qa_chatbot_template)
prompt = prompt_template.populate(
context=evidence,
question=question,
lang=self.lang,
)
return prompt, evidence
def run(
self, question: str, evidence: str, evidence_mode: int = 0, **kwargs
) -> Document:
return self.invoke(question, evidence, evidence_mode, **kwargs)
def invoke(
self,
question: str,
evidence: str,
evidence_mode: int = 0,
images: list[str] = [],
**kwargs,
) -> Document:
raise NotImplementedError
async def ainvoke( # type: ignore
self,
question: str,
evidence: str,
evidence_mode: int = 0,
images: list[str] = [],
**kwargs,
) -> Document:
"""Answer the question based on the evidence
        In addition to the question and the evidence, this method also takes into
        account evidence_mode, which indicates what kind of evidence is provided.
The kind of evidence affects:
1. How the evidence is represented.
2. The prompt to generate the answer.
By default, the evidence_mode is 0, which means the evidence is plain text with
no particular semantic representation. The evidence_mode can be:
1. "table": There will be HTML markup telling that there is a table
within the evidence.
2. "chatbot": There will be HTML markup telling that there is a chatbot.
This chatbot is a scenario, extracted from an Excel file, where each
row corresponds to an interaction.
Args:
question: the original question posed by user
evidence: the text that contain relevant information to answer the question
(determined by retrieval pipeline)
            evidence_mode: the mode of evidence (0 text, 1 table, 2 chatbot, 3 figure)
"""
raise NotImplementedError
def stream( # type: ignore
self,
question: str,
evidence: str,
evidence_mode: int = 0,
images: list[str] = [],
**kwargs,
) -> Generator[Document, None, Document]:
history = kwargs.get("history", [])
print(f"Got {len(images)} images")
        # if evidence exists, use the QA prompt; otherwise ask the question directly
if evidence:
prompt, evidence = self.get_prompt(question, evidence, evidence_mode)
else:
prompt = question
# retrieve the citation
citation = None
def citation_call():
nonlocal citation
citation = self.citation_pipeline(context=evidence, question=question)
if evidence and self.enable_citation:
# execute function call in thread
citation_thread = threading.Thread(target=citation_call)
citation_thread.start()
else:
citation_thread = None
output = ""
logprobs = []
messages = []
if self.system_prompt:
messages.append(SystemMessage(content=self.system_prompt))
for human, ai in history[-self.n_last_interactions :]:
messages.append(HumanMessage(content=human))
messages.append(AIMessage(content=ai))
if self.use_multimodal and evidence_mode == EVIDENCE_MODE_FIGURE:
# create image message:
messages.append(
HumanMessage(
content=[
{"type": "text", "text": prompt},
]
+ [
{
"type": "image_url",
"image_url": {"url": image},
}
for image in images[:MAX_IMAGES]
],
)
)
else:
# append main prompt
messages.append(HumanMessage(content=prompt))
try:
# try streaming first
print("Trying LLM streaming")
for out_msg in self.llm.stream(messages):
output += out_msg.text
logprobs += out_msg.logprobs
yield Document(channel="chat", content=out_msg.text)
except NotImplementedError:
print("Streaming is not supported, falling back to normal processing")
output = self.llm(messages).text
yield Document(channel="chat", content=output)
if logprobs:
qa_score = np.exp(np.average(logprobs))
else:
qa_score = None
if citation_thread:
citation_thread.join()
answer = Document(
text=output,
metadata={"citation": citation, "qa_score": qa_score},
)
return answer
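# Consumption sketch (hypothetical caller, assumes a configured default LLM):
# `stream` is a generator that yields chat-channel chunks as they arrive and
# *returns* the final answer Document, which is why FullQAPipeline.stream below
# drives it with `yield from`.
def _example_consume_answer_stream(pipeline: AnswerWithContextPipeline):
    gen = pipeline.stream(
        question="What is the capital of France?",
        evidence="<b>Content from geo.txt:</b> Paris is the capital of France.",
        evidence_mode=EVIDENCE_MODE_TEXT,
    )
    chunks = []
    try:
        while True:
            chunks.append(next(gen).content)  # streamed chat tokens
    except StopIteration as stop:
        answer = stop.value  # final Document with citation and qa_score metadata
    return "".join(chunks), answer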
class AddQueryContextPipeline(BaseComponent):
n_last_interactions: int = 5
llm: ChatLLM = Node(default_callback=lambda _: llms.get_default())
def run(self, question: str, history: list) -> Document:
messages = [
SystemMessage(
content="Below is a history of the conversation so far, and a new "
"question asked by the user that needs to be answered by searching "
"in a knowledge base.\nYou have access to a Search index "
"with 100's of documents.\nGenerate a search query based on the "
"conversation and the new question.\nDo not include cited source "
"filenames and document names e.g info.txt or doc.pdf in the search "
"query terms.\nDo not include any text inside [] or <<>> in the "
"search query terms.\nDo not include any special characters like "
"'+'.\nIf the question is not in English, rewrite the query in "
"the language used in the question.\n If the question contains enough "
"information, return just the number 1\n If it's unnecessary to do "
"the searching, return just the number 0."
),
HumanMessage(content="How did crypto do last year?"),
AIMessage(
content="Summarize Cryptocurrency Market Dynamics from last year"
),
HumanMessage(content="What are my health plans?"),
AIMessage(content="Show available health plans"),
]
for human, ai in history[-self.n_last_interactions :]:
messages.append(HumanMessage(content=human))
messages.append(AIMessage(content=ai))
messages.append(HumanMessage(content=f"Generate search query for: {question}"))
resp = self.llm(messages).text
if resp == "0":
return Document(content="")
if resp == "1":
return Document(content=question)
return Document(content=resp)
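# Behaviour sketch (illustrative history and question): the LLM is asked to return
# "0" when no search is needed, "1" when the question is already a usable query,
# or a condensed standalone search query otherwise; `run` maps these to an empty
# string, the original question, or the rewritten query respectively.
def _example_add_query_context(pipeline: AddQueryContextPipeline) -> str:
    history = [("What plans are available?", "There are Standard and Premium plans.")]
    return pipeline.run("How much does the second one cost?", history).content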
class FullQAPipeline(BaseReasoning):
"""Question answering pipeline. Handle from question to answer"""
class Config:
allow_extra = True
# configuration parameters
trigger_context: int = 150
use_rewrite: bool = False
retrievers: list[BaseComponent]
evidence_pipeline: PrepareEvidencePipeline = PrepareEvidencePipeline.withx()
answering_pipeline: AnswerWithContextPipeline = AnswerWithContextPipeline.withx()
rewrite_pipeline: RewriteQuestionPipeline | None = None
add_query_context: AddQueryContextPipeline = AddQueryContextPipeline.withx()
def retrieve(
self, message: str, history: list
) -> tuple[list[RetrievedDocument], list[Document]]:
"""Retrieve the documents based on the message"""
# if len(message) < self.trigger_context:
# # prefer adding context for short user questions, avoid adding context for
# # long questions, as they are likely to contain enough information
# # plus, avoid the situation where the original message is already too long
# # for the model to handle
# query = self.add_query_context(message, history).content
# else:
# query = message
# print(f"Rewritten query: {query}")
query = None
if not query:
            # TODO: previously returned [], [] because we treated such messages as
            # small talk, e.g. "Hello", "I need help"...
query = message
docs, doc_ids = [], []
plot_docs = []
for idx, retriever in enumerate(self.retrievers):
retriever_node = self._prepare_child(retriever, f"retriever_{idx}")
retriever_docs = retriever_node(text=query)
retriever_docs_text = []
retriever_docs_plot = []
for doc in retriever_docs:
if doc.metadata.get("type", "") == "plot":
retriever_docs_plot.append(doc)
else:
retriever_docs_text.append(doc)
for doc in retriever_docs_text:
if doc.doc_id not in doc_ids:
docs.append(doc)
doc_ids.append(doc.doc_id)
plot_docs.extend(retriever_docs_plot)
info = [
Document(
channel="info",
content=Render.collapsible_with_header(doc, open_collapsible=True),
)
for doc in docs
] + [
Document(
channel="plot",
content=doc.metadata.get("data", ""),
)
for doc in plot_docs
]
return docs, info
def prepare_citations(self, answer, docs) -> tuple[list[Document], list[Document]]:
"""Prepare the citations to show on the UI"""
with_citation, without_citation = [], []
spans = defaultdict(list)
has_llm_score = any("llm_trulens_score" in doc.metadata for doc in docs)
if answer.metadata["citation"] and answer.metadata["citation"].answer:
for fact_with_evidence in answer.metadata["citation"].answer:
for quote in fact_with_evidence.substring_quote:
matched_excerpts = []
for doc in docs:
matches = find_text(quote, doc.text)
for start, end in matches:
if "|" not in doc.text[start:end]:
spans[doc.doc_id].append(
{
"start": start,
"end": end,
}
)
matched_excerpts.append(doc.text[start:end])
                    print("Matched citation:", quote, matched_excerpts)
id2docs = {doc.doc_id: doc for doc in docs}
not_detected = set(id2docs.keys()) - set(spans.keys())
# render highlight spans
for _id, ss in spans.items():
if not ss:
not_detected.add(_id)
continue
cur_doc = id2docs[_id]
highlight_text = ""
ss = sorted(ss, key=lambda x: x["start"])
text = cur_doc.text[: ss[0]["start"]]
for idx, span in enumerate(ss):
to_highlight = cur_doc.text[span["start"] : span["end"]]
if len(to_highlight) > len(highlight_text):
highlight_text = to_highlight
text += Render.highlight(to_highlight)
if idx < len(ss) - 1:
text += cur_doc.text[span["end"] : ss[idx + 1]["start"]]
text += cur_doc.text[ss[-1]["end"] :]
# add to display list
with_citation.append(
Document(
channel="info",
content=Render.collapsible_with_header_score(
cur_doc,
override_text=text,
highlight_text=highlight_text,
open_collapsible=True,
),
)
)
print("Got {} cited docs".format(len(with_citation)))
sorted_not_detected_items_with_scores = [
(id_, id2docs[id_].metadata.get("llm_trulens_score", 0.0))
for id_ in not_detected
]
sorted_not_detected_items_with_scores.sort(key=lambda x: x[1], reverse=True)
for id_, _ in sorted_not_detected_items_with_scores:
doc = id2docs[id_]
doc_score = doc.metadata.get("llm_trulens_score", 0.0)
is_open = not has_llm_score or (
doc_score > CONTEXT_RELEVANT_WARNING_SCORE and len(with_citation) == 0
)
without_citation.append(
Document(
channel="info",
content=Render.collapsible_with_header_score(
doc, open_collapsible=is_open
),
)
)
return with_citation, without_citation
def show_citations(self, answer, docs):
# show the evidence
with_citation, without_citation = self.prepare_citations(answer, docs)
if not with_citation and not without_citation:
yield Document(channel="info", content="<h5><b>No evidence found.</b></h5>")
else:
# clear the Info panel
max_llm_rerank_score = max(
doc.metadata.get("llm_trulens_score", 0.0) for doc in docs
)
has_llm_score = any("llm_trulens_score" in doc.metadata for doc in docs)
# clear previous info
yield Document(channel="info", content=None)
# yield warning message
if has_llm_score and max_llm_rerank_score < CONTEXT_RELEVANT_WARNING_SCORE:
yield Document(
channel="info",
content=(
"<h5>WARNING! Context relevance score is low. "
"Double check the model answer for correctness.</h5>"
),
)
# show QA score
qa_score = (
round(answer.metadata["qa_score"], 2)
if answer.metadata.get("qa_score")
else None
)
if qa_score:
yield Document(
channel="info",
content=f"<h5>Answer confidence: {qa_score}</h5>",
)
yield from with_citation
if without_citation:
yield from without_citation
async def ainvoke( # type: ignore
self, message: str, conv_id: str, history: list, **kwargs # type: ignore
) -> Document: # type: ignore
raise NotImplementedError
def stream( # type: ignore
self, message: str, conv_id: str, history: list, **kwargs # type: ignore
) -> Generator[Document, None, Document]:
if self.use_rewrite and self.rewrite_pipeline:
print("Chosen rewrite pipeline", self.rewrite_pipeline)
message = self.rewrite_pipeline(question=message).text
print("Rewrite result", message)
print(f"Retrievers {self.retrievers}")
# should populate the context
docs, infos = self.retrieve(message, history)
print(f"Got {len(docs)} retrieved documents")
yield from infos
evidence_mode, evidence, images = self.evidence_pipeline(docs).content
def generate_relevant_scores():
nonlocal docs
docs = self.retrievers[0].generate_relevant_scores(message, docs)
        # generate relevance scores using the first retriever
if evidence and self.retrievers:
scoring_thread = threading.Thread(target=generate_relevant_scores)
scoring_thread.start()
else:
scoring_thread = None
answer = yield from self.answering_pipeline.stream(
question=message,
history=history,
evidence=evidence,
evidence_mode=evidence_mode,
images=images,
conv_id=conv_id,
**kwargs,
)
# show the evidence
if scoring_thread:
scoring_thread.join()
yield from self.show_citations(answer, docs)
return answer
@classmethod
def get_pipeline(cls, settings, states, retrievers):
"""Get the reasoning pipeline
Args:
settings: the settings for the pipeline
retrievers: the retrievers to use
"""
max_context_length_setting = settings.get("reasoning.max_context_length", 32000)
pipeline = cls(
retrievers=retrievers,
rewrite_pipeline=RewriteQuestionPipeline(),
)
prefix = f"reasoning.options.{cls.get_info()['id']}"
llm_name = settings.get(f"{prefix}.llm", None)
llm = llms.get(llm_name, llms.get_default())
# prepare evidence pipeline configuration
evidence_pipeline = pipeline.evidence_pipeline
evidence_pipeline.max_context_length = max_context_length_setting
# answering pipeline configuration
answer_pipeline = pipeline.answering_pipeline
answer_pipeline.llm = llm
answer_pipeline.citation_pipeline.llm = llm
answer_pipeline.n_last_interactions = settings[f"{prefix}.n_last_interactions"]
answer_pipeline.enable_citation = settings[f"{prefix}.highlight_citation"]
answer_pipeline.system_prompt = settings[f"{prefix}.system_prompt"]
answer_pipeline.qa_template = settings[f"{prefix}.qa_prompt"]
answer_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
settings["reasoning.lang"], "English"
)
pipeline.add_query_context.llm = llm
pipeline.add_query_context.n_last_interactions = settings[
f"{prefix}.n_last_interactions"
]
pipeline.trigger_context = settings[f"{prefix}.trigger_context"]
pipeline.use_rewrite = states.get("app", {}).get("regen", False)
if pipeline.rewrite_pipeline:
pipeline.rewrite_pipeline.llm = llm
pipeline.rewrite_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
settings["reasoning.lang"], "English"
)
return pipeline
@classmethod
def get_user_settings(cls) -> dict:
from ktem.llms.manager import llms
llm = ""
choices = [("(default)", "")]
try:
choices += [(_, _) for _ in llms.options().keys()]
except Exception as e:
logger.exception(f"Failed to get LLM options: {e}")
return {
"llm": {
"name": "Language model",
"value": llm,
"component": "dropdown",
"choices": choices,
"special_type": "llm",
"info": (
"The language model to use for generating the answer. If None, "
"the application default language model will be used."
),
},
"highlight_citation": {
"name": "Highlight Citation",
"value": True,
"component": "checkbox",
},
"system_prompt": {
"name": "System Prompt",
"value": "This is a question answering system",
},
"qa_prompt": {
"name": "QA Prompt (contains {context}, {question}, {lang})",
"value": DEFAULT_QA_TEXT_PROMPT,
},
"n_last_interactions": {
"name": "Number of interactions to include",
"value": 5,
"component": "number",
"info": "The maximum number of chat interactions to include in the LLM",
},
"trigger_context": {
"name": "Maximum message length for context rewriting",
"value": 150,
"component": "number",
"info": (
"The maximum length of the message to trigger context addition. "
"Exceeding this length, the message will be used as is."
),
},
}
@classmethod
def get_info(cls) -> dict:
return {
"id": "simple",
"name": "Simple QA",
"description": (
"Simple RAG-based question answering pipeline. This pipeline can "
"perform both keyword search and similarity search to retrieve the "
"context. After that it includes that context to generate the answer."
),
}
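# Wiring sketch mirroring FullQAPipeline.get_pipeline above. The settings keys
# come from get_user_settings/get_info; the concrete values are illustrative and
# a default LLM must already be registered with the llms manager.
def _example_build_simple_pipeline(retrievers: list[BaseComponent]) -> FullQAPipeline:
    settings = {
        "reasoning.lang": "en",
        "reasoning.max_context_length": 32000,
        "reasoning.options.simple.llm": "",
        "reasoning.options.simple.highlight_citation": True,
        "reasoning.options.simple.system_prompt": "This is a question answering system",
        "reasoning.options.simple.qa_prompt": DEFAULT_QA_TEXT_PROMPT,
        "reasoning.options.simple.n_last_interactions": 5,
        "reasoning.options.simple.trigger_context": 150,
    }
    return FullQAPipeline.get_pipeline(settings, states={"app": {}}, retrievers=retrievers)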
class FullDecomposeQAPipeline(FullQAPipeline):
def answer_sub_questions(
self, messages: list, conv_id: str, history: list, **kwargs
):
output_str = ""
for idx, message in enumerate(messages):
yield Document(
channel="chat",
content=f"<br><b>Sub-question {idx + 1}</b>"
f"<br>{message}<br><b>Answer</b><br>",
)
# should populate the context
docs, infos = self.retrieve(message, history)
print(f"Got {len(docs)} retrieved documents")
yield from infos
evidence_mode, evidence, images = self.evidence_pipeline(docs).content
answer = yield from self.answering_pipeline.stream(
question=message,
history=history,
evidence=evidence,
evidence_mode=evidence_mode,
images=images,
conv_id=conv_id,
**kwargs,
)
output_str += (
f"Sub-question {idx + 1}-th: '{message}'\nAnswer: '{answer.text}'\n\n"
)
return output_str
def stream( # type: ignore
self, message: str, conv_id: str, history: list, **kwargs # type: ignore
) -> Generator[Document, None, Document]:
sub_question_answer_output = ""
if self.rewrite_pipeline:
print("Chosen rewrite pipeline", self.rewrite_pipeline)
result = self.rewrite_pipeline(question=message)
print("Rewrite result", result)
if isinstance(result, Document):
message = result.text
elif (
isinstance(result, list)
and len(result) > 0
and isinstance(result[0], Document)
):
yield Document(
channel="chat",
content="<h4>Sub questions and their answers</h4>",
)
sub_question_answer_output = yield from self.answer_sub_questions(
[r.text for r in result], conv_id, history, **kwargs
)
yield Document(
channel="chat",
content=f"<h4>Main question</h4>{message}<br><b>Answer</b><br>",
)
# should populate the context
docs, infos = self.retrieve(message, history)
print(f"Got {len(docs)} retrieved documents")
yield from infos
evidence_mode, evidence, images = self.evidence_pipeline(docs).content
answer = yield from self.answering_pipeline.stream(
question=message,
history=history,
evidence=evidence + "\n" + sub_question_answer_output,
evidence_mode=evidence_mode,
images=images,
conv_id=conv_id,
**kwargs,
)
# show the evidence
with_citation, without_citation = self.prepare_citations(answer, docs)
if not with_citation and not without_citation:
yield Document(channel="info", content="<h5><b>No evidence found.</b></h5>")
else:
yield Document(channel="info", content=None)
yield from with_citation
yield from without_citation
return answer
@classmethod
def get_user_settings(cls) -> dict:
user_settings = super().get_user_settings()
user_settings["decompose_prompt"] = {
"name": "Decompose Prompt",
"value": DecomposeQuestionPipeline.DECOMPOSE_SYSTEM_PROMPT_TEMPLATE,
}
return user_settings
@classmethod
def get_pipeline(cls, settings, states, retrievers):
"""Get the reasoning pipeline
Args:
settings: the settings for the pipeline
retrievers: the retrievers to use
"""
prefix = f"reasoning.options.{cls.get_info()['id']}"
pipeline = cls(
retrievers=retrievers,
rewrite_pipeline=DecomposeQuestionPipeline(
prompt_template=settings.get(f"{prefix}.decompose_prompt")
),
)
llm_name = settings.get(f"{prefix}.llm", None)
llm = llms.get(llm_name, llms.get_default())
# answering pipeline configuration
answer_pipeline = pipeline.answering_pipeline
answer_pipeline.llm = llm
answer_pipeline.citation_pipeline.llm = llm
answer_pipeline.n_last_interactions = settings[f"{prefix}.n_last_interactions"]
answer_pipeline.enable_citation = settings[f"{prefix}.highlight_citation"]
answer_pipeline.system_prompt = settings[f"{prefix}.system_prompt"]
answer_pipeline.qa_template = settings[f"{prefix}.qa_prompt"]
answer_pipeline.lang = SUPPORTED_LANGUAGE_MAP.get(
settings["reasoning.lang"], "English"
)
pipeline.add_query_context.llm = llm
pipeline.add_query_context.n_last_interactions = settings[
f"{prefix}.n_last_interactions"
]
pipeline.trigger_context = settings[f"{prefix}.trigger_context"]
pipeline.use_rewrite = states.get("app", {}).get("regen", False)
if pipeline.rewrite_pipeline:
pipeline.rewrite_pipeline.llm = llm
return pipeline
@classmethod
def get_info(cls) -> dict:
return {
"id": "complex",
"name": "Complex QA",
"description": (
"Use multi-step reasoning to decompose a complex question into "
"multiple sub-questions. This pipeline can "
"perform both keyword search and similarity search to retrieve the "
"context. After that it includes that context to generate the answer."
),
}
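# Confidence-score sketch: AnswerWithContextPipeline.stream reports
# exp(mean(logprobs)) as qa_score, i.e. the geometric mean of the per-token
# probabilities returned by the LLM. The log-probabilities below are made up.
def _example_qa_score(logprobs: tuple[float, ...] = (-0.05, -0.2, -0.1)) -> float:
    return float(np.exp(np.average(logprobs)))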