Relevant Table Schema Finder (LLM)
Module: neurosurfer.tools.sql.relevant_table_schema_retriever.RelevantTableSchemaFinderLLM
Pairs with: BaseTool • ToolSpec • Toolkit
Overview
RelevantTableSchemaFinderLLM uses an LLM to select the most relevant tables for a user’s question based on table summaries, then retrieves the corresponding schemas from your schema store. It’s typically used early in a workflow before generating SQL.
When to Use
- You have many tables and need to narrow down which are relevant to the question.
- You want to assemble a schema context to pass to SQLQueryGenerator.
Spec (Inputs & Returns)
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | ✓ | Natural-language user question. |
Returns: string — A human-readable message listing selected tables.
Extras: schema_context: string — The formatted schemas of selected tables for downstream tools.
Runtime Dependencies & Config
- Constructor: RelevantTableSchemaFinderLLM(llm: BaseModel, sql_schema_store: SQLSchemaStore, logger: logging.Logger | None = None)
- Prompt: RELEVENT_TABLES_PROMPT (note: the result must be a valid Python list literal)
- Top K: top_k = 6 (upper bound; the LLM can return fewer)
- Token trimming: uses RAGRetrieverAgent._trim_context_by_token_limit(...) to fit the summaries and adjust max_new_tokens.
- LLM call: stream=False (expects a one-shot list literal)
- Post-processing: eval(...) is applied to parse the returned list of table names.
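⚠️ Security note: Because eval is applied to the LLM response, make sure your LLM and prompts are trusted and controlled. The system prompt instructs the model to return only a Python list literal, with no extra text or newlines, but a prompt instruction alone does not guarantee compliant output.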
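If you control the parsing step (for example, in a subclass or a fork), one common way to reduce the risk is to swap eval for the standard library's ast.literal_eval, which only accepts literals. The sketch below is an assumption about how such a replacement could look; parse_table_list is a hypothetical helper and is not part of neurosurfer.

```python
import ast


def parse_table_list(raw: str, top_k: int = 6) -> list[str]:
    """Parse an LLM reply that should be a Python list literal of table names.

    Hypothetical helper, not part of neurosurfer. ast.literal_eval evaluates
    literals only, so calls or attribute access raise instead of executing.
    """
    try:
        value = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError, TypeError) as exc:
        raise ValueError(f"LLM reply is not a Python literal: {raw!r}") from exc

    # Reject anything that is not a flat list of strings.
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"Expected a list of strings, got: {value!r}")

    # top_k is an upper bound; shorter lists pass through unchanged.
    return value[:top_k]
```

Raising a single ValueError for every failure mode keeps the calling code simple: it only needs one except clause to decide whether to retry or fall back.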
Behavior
- Collects table summaries in the form "Table: <name>\nSummary: <summary>\n\n".
- Trims the context to the token budget if needed.
- Calls the LLM; expects output like ['Users', 'Orders'].
- Builds a message and schema context by fetching each schema via sql_schema_store.get_table_data(name).
- Returns ToolResponse(final_answer=False, observation=<message>, extras={"schema_context": ...}).
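The steps above can be sketched end to end with stand-ins. This is not the library's implementation: SketchResponse, find_relevant_schemas, ask_llm, the plain-dict store, the "Relevant tables:" message format, and the word-count token estimate are all assumptions made for illustration. The sketch also parses with ast.literal_eval instead of eval and silently drops table names the store does not know, neither of which the real tool is documented to do.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchResponse:
    """Stand-in for ToolResponse with only the three fields named above."""
    final_answer: bool
    observation: str
    extras: dict = field(default_factory=dict)


def find_relevant_schemas(
    query: str,
    summaries: dict[str, str],      # table name -> summary (assumed shape)
    schemas: dict[str, str],        # table name -> schema text (assumed shape)
    ask_llm: Callable[[str], str],  # hypothetical one-shot LLM call
    max_tokens: int = 2000,
    top_k: int = 6,
) -> SketchResponse:
    # 1. Collect summaries in the documented format.
    blocks = [f"Table: {n}\nSummary: {s}\n\n" for n, s in summaries.items()]

    # 2. Trim to a budget. Whitespace-separated words approximate tokens
    #    here; the real tool delegates to RAGRetrieverAgent's helper.
    kept, used = [], 0
    for block in blocks:
        cost = len(block.split())
        if used + cost > max_tokens:
            break
        kept.append(block)
        used += cost

    # 3. One-shot LLM call; the reply should be a list literal.
    reply = ask_llm(f"Question: {query}\n\n{''.join(kept)}")
    names = ast.literal_eval(reply.strip())[:top_k]

    # 4. Build the message and schema context from known tables only.
    names = [n for n in names if n in schemas]
    message = "Relevant tables: " + ", ".join(names)
    schema_context = "\n\n".join(f"Table: {n}\n{schemas[n]}" for n in names)

    # 5. Non-final response carrying the context in extras.
    return SketchResponse(False, message, {"schema_context": schema_context})
```

Because the response is not a final answer, an agent would normally read extras["schema_context"] and hand it to the next tool instead of showing the observation to the user as the result.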
Usage
finder = RelevantTableSchemaFinderLLM(llm=chat_llm, sql_schema_store=schema_store)
resp = finder(query="Show monthly revenue by region in 2024")
schema_ctx = resp.extras["schema_context"] # pass to SQLQueryGenerator
Error Handling & Notes
- If the LLM returns an invalid list, eval may fail; guard at the agent layer if necessary.
- Ensure sql_schema_store contains both summary and schema for each table used.
- Special token available: " [__RELEVANT_TABLES__] " (if you tag content downstream).
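A guard at the agent layer might look like the sketch below. call_finder_safely is a hypothetical wrapper, and the list of caught exceptions is an assumption about what eval typically raises when the reply is prose or a bare name instead of a list literal; adjust it to what you observe in practice.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def call_finder_safely(
    finder: Callable[..., Any],
    query: str,
    retries: int = 1,
) -> Optional[Any]:
    """Call the finder, retrying when parsing the LLM reply fails.

    Returns the finder's response, or None when every attempt failed so the
    agent can fall back (for example, by asking the user to name the tables).
    """
    for attempt in range(retries + 1):
        try:
            return finder(query=query)
        except (SyntaxError, NameError, ValueError, TypeError) as exc:
            # Assumed failure modes of eval() on a malformed reply.
            logger.warning(
                "Table selection failed (attempt %d): %s", attempt + 1, exc
            )
    return None
```

Returning None instead of re-raising keeps the decision about what to do next (retry with a different prompt, skip table selection, or stop) in the agent, which is where the note above suggests placing the guard.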