Skip to content

SQLDatabase

Module: neurosurfer.db.sql.sql_database.SQLDatabase

A production-grade SQLAlchemy wrapper used by Neurosurfer’s SQL agents/tools. It provides schema introspection, table/view filtering, sample rows, and metadata caching so agents can plan reliably without repeated, slow reflection calls.


Features

  • 🔌 SQLAlchemy-powered: works with Postgres, MySQL, SQLite, SQL Server (via ODBC), etc.
  • 🧭 Table/View discovery: include/ignore lists + optional view reflection.
  • 🔎 Schema text: emits CREATE TABLE DDL plus optional indexes and sample rows.
  • 🚀 Metadata cache: reflective metadata is cached to disk; TTL and force refresh supported.
  • 🧪 Safe query execution: convenience run(...) pattern (if present in your codebase) and low-level helpers demonstrated below.
  • 🧰 Utility: build_connection_string(...) for consistent DSNs.

Initialization

from neurosurfer.db.sql.sql_database import SQLDatabase

db = SQLDatabase(
    database_uri="postgresql://user:pass@localhost:5432/analytics",
    schema=None,                         # target schema (optional)
    ignore_tables=None,                  # e.g., ["migrations"]
    include_tables=None,                 # OR a whitelist, e.g., ["users", "orders"]
    sample_rows_in_table_info=3,         # rows per table to include under /* ... */
    indexes_in_table_info=False,         # include index info in DDL block
    custom_table_info=None,              # {table_name: "pre-rendered info string"}
    view_support=False,                  # include views in reflection
    max_string_length=300,               # formatting clamp for sample values
    lazy_table_reflection=False,         # when False, reflect eagerly
    metadata_cache_dir="~/.cache/neurosurfer/sqlmeta",
    force_refresh=False,
    cache_ttl_seconds=86400,             # expire cache after N seconds (None = never)
)

Parameter reference

Name Type Default Description
database_uri str | URL SQLAlchemy connection string.
schema str? None Schema namespace to inspect (dialect-dependent).
ignore_tables list[str]? None Exclude these tables from all operations.
include_tables list[str]? None Restrict operations to these tables only. Mutually exclusive with ignore_tables.
sample_rows_in_table_info int 3 Number of example rows to include per table in get_table_info().
indexes_in_table_info bool False Include index metadata in get_table_info() output.
custom_table_info dict? None Pre-rendered table strings; bypass reflection for those tables.
view_support bool False Reflect views in addition to tables.
max_string_length int 300 Clamp cell values when printing sample rows.
lazy_table_reflection bool False If False, reflect eagerly (or load from cache). If True, reflect on first use.
metadata_cache_dir str | Path? ~/.cache/sqlalchemy_metadata Where to store metadata cache files.
force_refresh bool False Ignore cache and rebuild metadata now.
cache_ttl_seconds int? None If set, reload metadata when cache age exceeds TTL.

Common tasks

1) List usable tables

db.get_usable_table_names()
# honors include/ignore and (optional) view_support

2) Get schema info (DDL + extras)

print(db.get_table_info())          # all usable tables
print(db.get_table_info(["users"])) # specific tables

Output contains CREATE TABLE ... followed by an optional /* ... */ block with indexes and N sample rows, e.g.:

CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  email TEXT NOT NULL,
  ...
)

/*
Table Indexes:
...

3 rows from users table:
id  email   ...
1   alice@example.com   ...
2   bob@example.com ...
3   ...
*/

3) Build DSN safely

from neurosurfer.db.sql.sql_database import SQLDatabase

dsn = SQLDatabase.build_connection_string(
    server="db.acme.com",
    database="analytics",
    username="svc_app",
    password="s3cr3t!",
    driver="mssql+pyodbc",
    odbc_driver="ODBC Driver 18 for SQL Server",
    port="1433",
)

Metadata caching

When lazy_table_reflection=False, SQLDatabase attempts to load previously reflected MetaData from a cache file whose name is derived from:

  • dialect (e.g., postgresql),
  • database identifier (never includes credentials),
  • schema, view_support, and the set of usable tables.

You control freshness and behavior with:

  • cache_ttl_seconds — consider cache stale after this many seconds,
  • force_refresh=True — ignore cache and rebuild now.

For sqlite:///:memory: the cache is skipped entirely.


Notes & gotchas

  • If both include_tables and ignore_tables are set → a ValueError is raised.
  • When asking for table_names that don’t exist → you’ll get a ValueError.
  • Sample rows use SELECT ... LIMIT N; some dialects may return ProgrammingError for empty tables (handled).
  • JSON/unknown-typed columns (NullType) are excluded from printed DDL for clarity.