SQLAgent¶
Module: neurosurfer.agents.sql_agent
Overview¶
SQLAgent specializes the ReActAgent for relational databases. It boots with a SQL-focused Toolkit that knows how to:
- Summarize table schemas and cache them locally (
SQLSchemaStore) - Pick the most relevant tables for a question
- Generate a SQL statement
- Execute the statement safely via SQLAlchemy
- Format the result into natural language
Every step is streamed like the base ReAct agent (thoughts → action → observation → … → final). If your app suppresses special tokens, set skip_special_tokens=True in ReActConfig when constructing the agent (see below).
Constructor¶
SQLAgent.__init__¶
SQLAgent(
llm: BaseModel,
db_uri: str,
*,
storage_path: str | None = None,
sample_rows_in_table_info: int = 3,
logger: logging.Logger = logging.getLogger(__name__),
verbose: bool = True,
config: ReActConfig | None = None,
specific_instructions: str | None = None,
)
| Parameter | Type | Description |
|---|---|---|
llm | BaseModel | Model used for reasoning and SQL generation. |
db_uri | str | SQLAlchemy connection string (e.g. postgresql://user:pass@host/db). |
storage_path | str \| None | Optional location for persisting schema summaries. Defaults to the SQLSchemaStore default under the working directory. |
sample_rows_in_table_info | int | Number of example rows to capture when summarizing table schema. |
logger | logging.Logger | Logger for status messages. |
verbose | bool | When True, prints tool calls and observations as they happen. |
config | ReActConfig | None | Advanced configuration (retries, pruning, streaming markers, etc.). Defaults are used when None. |
specific_instructions | str \| None | Optional SQL-specific system addendum. If None, sensible SQL policies are used (discover → generate → execute → format). |
During initialization the agent:
- Creates a SQLAlchemy engine via
create_engine(db_uri) - Instantiates
SQLSchemaStore(handles schema discovery + caching) - Registers the following tools in its toolkit:
RelevantTableSchemaFinderLLM(table selection + schema fetch)SQLQueryGenerator(LLM prompt → SQL string)SQLExecutor(read-only execution via the engine)FinalAnswerFormatter(transforms rows to natural language)DBInsightsTool(high‑level database summaries and health checks)
Note: If your docs use a different path layout, adjust the links above to match your structure.
Usage¶
from neurosurfer.agents.sql_agent import SQLAgent
from neurosurfer.agents.react import ReActConfig
from neurosurfer.models.chat_models.openai import OpenAIModel
llm = OpenAIModel(model_name="gpt-4o-mini")
agent = SQLAgent(
llm=llm,
db_uri="sqlite:///examples/chinook.db",
sample_rows_in_table_info=5,
verbose=True,
config=ReActConfig(skip_special_tokens=False, allow_input_pruning=True, repair_with_llm=True),
)
for chunk in agent.run("List the top 5 artists by total sales."):
print(chunk, end="") # Streams thoughts + tool observations + final answer markers
If you set skip_special_tokens=True in ReActConfig, the agent will not emit <__final_answer__> markers; only the raw text streams. This is useful if your UI has its own finalization logic.
Methods¶
run(user_query: str, **kwargs) -> Generator[str, None, str]¶
Delegates to ReActAgent.run. Pass generation kwargs such as temperature or max_new_tokens. The generator yields formatted strings (thoughts, actions, tool observations) and finally returns the final answer string.
Runtime context (for example injecting a runtime filter) can be supplied via keyword arguments; they are forwarded to every tool invocation.
register_tool(tool: BaseTool) -> None¶
Adds a custom tool to the underlying toolkit and immediately updates the agent to include it in subsequent runs.
get_toolkit() -> Toolkit¶
Returns the preconfigured toolkit if you want to inspect or extend it.
train(summarize: bool = False, force: bool = False) -> Generator¶
Convenience wrapper around SQLSchemaStore.train. Use it to pre-compute or refresh schema summaries outside of a chat session.
SQLSchemaStore; typically summarize=True asks the LLM for narrative summaries of each table, while force=True bypasses existing cache entries. is_trained() -> bool¶
Returns True when at least one cached schema summary is available.
Built-in SQL tools¶
| Tool | Purpose | Typical Inputs | Typical Output |
|---|---|---|---|
RelevantTableSchemaFinderLLM | Selects relevant tables and returns concise schema context (optionally with sample rows). | question: str, optional knobs for limits | Schema text / JSON for downstream SQL generation |
SQLQueryGenerator | Produces a dialect-correct SQL query from the question + schema context. | question: str, schema: str | SQL string |
SQLExecutor | Executes a SQL statement via the configured SQLAlchemy engine. | sql: str, optional params | Rows (list of dicts/tuples) or error info |
FinalAnswerFormatter | Converts rows into a concise, human-readable answer. | rows: Any, question: str | Natural-language summary text |
DBInsightsTool | Provides quick DB-wide insights (table counts, schema stats, anomalies). | optional scope params | Descriptive text / small tables |
Each tool defines a
ToolSpecso the ReAct agent can validate inputs, prune unknown keys (if enabled), and self-repair Actions when needed.
Notes & Best Practices¶
- Read-only by default:
SQLExecutorwill execute whatever SQL you pass it. For production, use read-only credentials unless write operations are explicitly intended and guarded. - Schema cache location: supply
storage_pathif you need deterministic cache placement (e.g., containers, CI). - Streaming UI: distinguish tool observations (e.g.,
Observation:lines) from the final answer markers (unless suppressed). - Domain policy: the agent ships with SQL-specific guidance (discover → generate → execute → format). You can override/extend via
specific_instructions. - Extras between tools: tool
extrasare passed agent-side to the next tool call without going through the LLM. Use this to pass rich Python objects (e.g., compiled queries, parsed schemas) that aren’t easily serializable. - Retries & repair: Action parsing problems and tool failures are automatically repaired with bounded retries (see
ReActConfigandRetryPolicy).
Security Considerations¶
- SQL injection: The agent itself won’t construct queries unsafely if
SQLQueryGeneratoris given correct schema context, but validate inputs and consider parameterized SQL in your execution layer. - Privileges: Provide least-privilege DB credentials. Separate read and write roles where possible.
- Auditing: Log tool calls and queries in production (subject to privacy constraints).