SQLSchemaStore¶
Module: `neurosurfer.db.sql.sql_schema_store.SQLSchemaStore`
SQLSchemaStore extracts per-table schema strings from a live database and (optionally) asks an LLM to produce concise JSON summaries. It persists these to disk as a single JSON file you can reuse in agents/tools without re-querying the DB on every run.
What it stores¶
Each table is keyed by name and contains at least:
```json
{
  "table_name": "users",
  "summary": "What data the table stores and its role in the system.",
  "schema": "CREATE TABLE ... /* sample rows ... */"
}
```
When `summarize=True`, the `summary` is generated by your provided LLM according to a strict prompt; otherwise you can store the raw schema only (and extend it as needed).
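Because the store is persisted as a single flat JSON file, any consumer can read it back without the class at all. The sketch below uses a made-up entry and a temp-file path purely for illustration; only the entry shape comes from the description above:

```python
import json
import os
import tempfile

# Made-up example entry in the shape described above.
store_data = {
    "users": {
        "table_name": "users",
        "summary": "Registered user accounts.",
        "schema": "CREATE TABLE users (id INT PRIMARY KEY, email TEXT)",
    }
}

# The store persists one JSON file per database.
path = os.path.join(tempfile.gettempdir(), "sql_schema_store_demo.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(store_data, f, indent=2)

# Reading it back needs nothing beyond the stdlib.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["users"]["summary"])  # -> Registered user accounts.
```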
Initialize¶
```python
from neurosurfer.db.sql.sql_schema_store import SQLSchemaStore
from neurosurfer.models.chat_models.openai_like import OpenAILikeModel  # implements BaseModel

llm = OpenAILikeModel(model="gpt-4o-mini")  # optional if you won't summarize

store = SQLSchemaStore(
    db_uri="postgresql://user:pass@localhost:5432/analytics",
    llm=llm,                        # required if summarize=True
    sample_rows_in_table_info=3,    # how many rows to include in schema text
    storage_path="./",              # where to write sql_schema_store_<db>.json
)
```
Train (extract and optionally summarize)¶
train(...) is a generator that yields the number of processed tables (for progress bars).
```python
for n in store.train(summarize=True, force=True):
    # e.g., update a progress UI
    pass

print("Saved to:", store.storage_path)
print("Tables summarized:", store.get_tables_count())
```
- `summarize=False` → store raw schema strings (you can add summaries later).
- `summarize=True` → uses your `llm` to generate concise JSON summaries per table.
- `force=True` → clears any existing file and retrains all tables.
The built-in prompt enforces a strict JSON response:
```
System: You are a master of SQL...
User: Given the schema of {table_name} ... Output this JSON object: {"table_name": "...", "summary": "..."}
```
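Because the prompt demands strict JSON, downstream code may still want to parse the model reply defensively. This helper is hypothetical (not part of the library) and only checks the two keys named in the contract above:

```python
import json

def parse_summary_reply(reply_text: str) -> dict:
    """Parse an LLM reply expected to be {"table_name": ..., "summary": ...}.

    Hypothetical helper: raises ValueError when the strict JSON
    contract is not met, instead of letting bad data flow downstream.
    """
    obj = json.loads(reply_text)
    if not isinstance(obj, dict) or not {"table_name", "summary"} <= obj.keys():
        raise ValueError("reply does not match the expected JSON contract")
    return obj

parsed = parse_summary_reply('{"table_name": "users", "summary": "User accounts."}')
```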
Read & use¶
```python
data = store.get_table_data("users")        # dict with keys: table_name, summary, schema
count = store.get_tables_count()            # number of usable tables
tables = store.db.get_usable_table_names()  # from the underlying SQLDatabase
```
These summaries are ideal inputs for relevant-table selection, SQL generation, or DB insights tools. See the built-in tools under Tools → Built-in Tools → SQL.
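For instance, a relevant-table context block can be assembled directly from the stored entries. The helper and sample data below are illustrative only, not a library API:

```python
def build_table_context(store_data: dict, tables: list[str]) -> str:
    """Join stored summaries for the requested tables into one prompt block.

    Hypothetical helper: tables missing from the store are skipped silently.
    """
    lines = []
    for t in tables:
        entry = store_data.get(t)
        if entry:
            lines.append(f"- {t}: {entry['summary']}")
    return "\n".join(lines)

# Made-up store contents for illustration.
demo = {
    "users": {"summary": "Registered user accounts."},
    "orders": {"summary": "Customer order records."},
}
context = build_table_context(demo, ["users", "missing_table"])
```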
API reference¶
__init__(db_uri, llm=None, sample_rows_in_table_info=3, storage_path=None, logger=None)¶
- Creates an internal `SQLDatabase` with `view_support=True` and sets up a JSON file named `sql_schema_store_<db>.json` beneath `storage_path` (default `./`).
- If `llm` is provided, it must satisfy the `BaseModel` interface, with `.ask(...)` returning an OpenAI-like response dict.
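For tests or offline runs, a stand-in model can satisfy that contract. The response field names below are an assumption about the "OpenAI-like" shape, not taken from the library:

```python
import json

class StubModel:
    """Hypothetical stand-in for a BaseModel implementation.

    .ask(...) returns an OpenAI-like response dict; the exact field
    names here are an assumption, not confirmed by neurosurfer.
    """

    def ask(self, messages, **kwargs):
        return {
            "choices": [
                {
                    "message": {
                        "role": "assistant",
                        "content": '{"table_name": "users", "summary": "User accounts."}',
                    }
                }
            ]
        }

reply = StubModel().ask([{"role": "user", "content": "Summarize table users"}])
content = reply["choices"][0]["message"]["content"]
```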
train(summarize: bool = False, force: bool = False) -> Generator[int, None, None]¶
- Extracts schema strings for all usable tables.
- If `summarize=True`, calls `summarize_schema__(table, schema)` per table.
- Yields the count of processed tables for progress reporting.
- Writes the final store to `storage_path` when complete.
get_all_table_schemas() -> dict[str, str]¶
Returns a mapping {table_name: schema_string} by delegating to the underlying SQLDatabase (get_table_info([t])).
get_table_data(table: str) -> dict | None¶
Fetches the stored JSON object for a single table.
save_to_file() / load_from_file()¶
Persist and restore the store JSON (storage_path).
get_db_name() -> str | None¶
Extracts the DB name from the URI; used to build the default filename.
get_tables_count() -> int¶
Returns the number of usable tables from SQLDatabase.
Tips & caveats¶
- Ensure your DB user has metadata privileges (and read access for `LIMIT N` samples).
- You can postprocess or augment summaries (e.g., add row counts, PK/FK lists).
- Treat generated JSON carefully in downstream code; validate before use in agents.
- For very large schemas, consider chunking or filtering tables to keep context sizes manageable.
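One simple filtering approach is to whittle the table list down before training, e.g. by prefix and an explicit deny-list. This helper is a sketch, not a library API:

```python
def filter_tables(table_names, include_prefixes=(), exclude=()):
    """Keep only tables worth summarizing, to cap LLM context size.

    Hypothetical helper: an empty include_prefixes tuple means
    "keep everything not explicitly excluded".
    """
    keep = []
    for name in table_names:
        if name in exclude:
            continue
        if include_prefixes and not name.startswith(tuple(include_prefixes)):
            continue
        keep.append(name)
    return keep
```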