Neurosurfer API (Backend)

Neurosurfer’s backend is a FastAPI server that exposes an OpenAI‑compatible surface for chat completions and a small set of ergonomic decorators to register your logic. You get streaming, JWT/cookie auth, a model registry, thread‑scoped RAG, and typed custom routes—without wiring FastAPI by hand.
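
Because the surface is OpenAI-compatible, a stock OpenAI SDK client can talk to it directly. A minimal sketch, assuming the server listens on http://localhost:8000 and that you already hold a JWT from the auth flow (both are assumptions, not documented defaults):

from openai import OpenAI

# Hypothetical host/port and token; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="<your-jwt>")

resp = client.chat.completions.create(
    model="qwen3",  # any id registered in the model registry
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)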


What you’ll find here

  • A compact request flow for /v1/chat/completions (sync/async, streaming or not)
  • Decorator‑driven APIs: @app.chat() and @app.endpoint(...)
  • Built‑in auth (bcrypt + JWT) that works for browsers (cookies) and API clients (bearer)
  • Integration points for RAG and model registries

For deeper topics, jump directly to the dedicated pages linked below.


Components

  • Configuration
    Centralized settings via Pydantic (models, CORS, DB paths, RAG, server flags). Start here to wire your environment.
    Documentation

  • Lifecycle Hooks
    Start/stop callbacks to load models, warm resources, initialize RAG, and clean up.
    Documentation

  • Chat Handlers
    Your entry point for /v1/chat/completions. Register once, stream or return a final message, and compose RAG/services inside.
    Documentation

  • Custom Endpoints
    Add typed REST routes with request/response models and dependencies—perfect for utilities, admin, and service façades.
    Documentation

  • Auth & Users
    bcrypt passwords, JWT tokens (header or HttpOnly cookie), and slim dependencies for required/optional auth.
    Documentation
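
As a taste of how these pieces compose on a single app object, here is a hedged sketch of a typed custom endpoint. The decorator name comes from @app.endpoint(...) above; the route, models, and keyword arguments are illustrative, so check the Custom Endpoints page for the real contract.

from pydantic import BaseModel

from neurosurfer.server.app import NeurosurferApp

app = NeurosurferApp()

class SummarizeRequest(BaseModel):
    text: str
    max_words: int = 50

class SummarizeResponse(BaseModel):
    summary: str

# Hypothetical signature, assumed to mirror FastAPI route decorators.
@app.endpoint("/v1/summarize", methods=["POST"], response_model=SummarizeResponse)
def summarize(body: SummarizeRequest) -> SummarizeResponse:
    # Naive truncation stands in for a real service call.
    words = body.text.split()[: body.max_words]
    return SummarizeResponse(summary=" ".join(words))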


CORS Configuration

By default, CORS is enabled for all origins; this is implemented by setting allow_origin_regex to .*. To restrict access, set cors_origins in the config. See Configuration for details.
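
A hedged sketch of locking CORS down, assuming the Pydantic settings are passed to the app at construction; the actual field names and wiring are on the Configuration page.

from neurosurfer.server.app import NeurosurferApp

# Hypothetical wiring; cors_origins is the documented setting, but how it
# reaches the app depends on your configuration setup.
app = NeurosurferApp(config={"cors_origins": ["https://app.example.com"]})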

Quick start

Minimal handler

from neurosurfer.server.app import NeurosurferApp

app = NeurosurferApp()

@app.chat()
def handle_chat(request, ctx):
    return f"You said: {request.messages[-1]['content']}"

With a model registry

app = NeurosurferApp()

app.model_registry.add(
    id="qwen3",
    family="Qwen",
    provider="Qwen",
    context_length=8192,
)

@app.chat()
async def handle_chat(request, ctx):
    # Stream an answer; generate_tokens is pseudo-code for your model call.
    for token in generate_tokens(request):
        yield {"choices": [{"delta": {"content": token}}]}

With RAG orchestration

from neurosurfer.server.services.rag_orchestrator import RAGOrchestrator

rag = RAGOrchestrator(embedder=your_embedder, persist_dir="./vector_store", top_k=8)

@app.chat()
def handle_chat(request, ctx):
    user_query = request.messages[-1]["content"]
    # rag.apply(...) may augment the query based on thread uploads
    # then your LLM answers with the augmented prompt
    return your_llm.generate(user_query)
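
Filling in the pseudo-steps, a hedged variant that actually invokes the orchestrator. The rag.apply keyword arguments, ctx.thread_id, and your_llm are all illustrative, so consult the RAG documentation for the real interface.

@app.chat()
def handle_chat(request, ctx):
    user_query = request.messages[-1]["content"]
    # Hypothetical call: augment the query with thread-scoped uploads.
    augmented = rag.apply(query=user_query, thread_id=ctx.thread_id)
    return your_llm.generate(augmented)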

Where next?