The compliance runtime for AI
Run any AI workload on sovereign infrastructure.
US cloud APIs mean jurisdictional risk. Self-hosting means operational burden. We federate spot GPU capacity across EU providers into a single, reliable compute pool — up to 75% cheaper than realtime inference.
How it works
OpenAI-compatible batch API. Submit jobs, pick your delivery window, get results via webhook. Our orchestration layer routes to optimal GPU capacity across our EU provider network.
Submit a batch job
from openai import OpenAI
client = OpenAI(
api_key="sf-...",
base_url="https://api.sference.eu/v1/"
)
batch = client.batches.create(
input_file_id=upload.id,
endpoint="/chat/completions",
completion_window="priority", # ~1hr, or "overnight"
metadata={"webhook": "https://you.com/hook"}
)Get results via webhook
// POST https://you.com/hook
{
"id": "batch_abc123",
"status": "completed",
"output_file_id": "file-xyz789",
"request_counts": { "completed": 4892, "failed": 0 },
"compliance": {
"data_residency": "EU (Slovenia)",
"model": "Qwen3.5-35B-A3B",
"audit_log": "https://dashboard.sference.eu/audit/batch_abc123"
}
}Full audit trail — configurable retention, exportable reports
Any model. Including yours.
Open-weight models from the Qwen, Mistral, and Llama families — or bring your own fine-tune. If it runs on vLLM/SGLang, we serve it. No model lock-in.
Trade latency for cost
Priority (~1hr) and overnight (~24hr) delivery windows. We use spot and preemptible GPU capacity — batch workloads are naturally interruptible and resumable. The structural cost advantage of non-realtime processing.
Sovereign infrastructure. No jurisdictional risk.
Every request processed on EU GPUs. No US CLOUD Act exposure. Full model and version transparency per request. DPA included. EU-only processing guaranteed architecturally, not by policy.
Compliance runtime built in
Full request traceability. Configurable retention. Exportable compliance reports. Your compliance dashboard serves both your team and your customer's compliance officer.
How we're different
Most batch APIs give you a 50% discount and a 24-hour window. Our architecture goes further.
We abstract across multiple EU GPU providers — different hardware generations, different pricing. Workloads route to the best available capacity. No single-vendor dependency.
Non-realtime processing lets us use preemptible and spot capacity at significant discounts. Batch workloads are naturally interruptible and resumable — if a spot instance is reclaimed, the orchestrator reschedules remaining chunks.
Without millisecond latency requirements, we cold-start models per batch job rather than keeping them resident in GPU memory. This enables BYOM — upload your fine-tuned weights, we load for the job, process, and release.
Each batch decomposes into chunks distributed across available GPUs. The orchestrator handles scheduling, fault tolerance, checkpoint resumption, and provider selection.
Full request traceability, configurable retention, exportable reports, transparent model provenance. Built into the infrastructure, not bolted on after the fact.
US batch APIs offer discounts but no EU sovereignty or BYOM. Inference platforms optimize for realtime only. EU datacenters sell raw GPU hours with no batch optimization or compliance tooling. Compliance platforms don't provide inference. We combine async batch, any model including BYOM, EU sovereignty, and compliance traceability — nobody else does all five.
Built for regulated verticals
For SaaS companies whose customers demand compliance. One integration brings thousands of end-users through your API — with audit trails their compliance teams can verify.
Batch KYC extraction, transaction classification, statement processing. Overnight processing with full audit trail for regulated financial data.
Contract corpus analysis, document review, embedding generation for legal RAG. Sovereign processing for sensitive legal data.
Medical record digitization, prescription extraction, clinical data processing. Full compliance traceability for patient data.
Claims processing, policy document analysis, underwriting data extraction. Structured output with configurable retention.
Model evals, synthetic data generation, fine-tuning data prep on sensitive datasets. Run thousands of evaluations in hours, not days.
Invoices, contracts, forms at scale. Any open-weight model or your own fine-tune. Cost-optimized batch processing with full governance.
Pricing
Pick a delivery window. We use spot and preemptible GPU capacity — the longer you can wait, the deeper the discount.
Prompt iteration and testing pipelines.
Background agents and production workflows.
Large batch jobs and bulk processing.
No credit card required. No minimum spend. Pay only for tokens used.
For your compliance team
The section your engineer can forward to their CTO — and their customer's compliance officer. Our compliance dashboard serves both layers: operational for your team, audit-ready for your customer.
Your customers keep asking where their data goes.
Now you have an answer. Sovereign infrastructure, full audit trail, exportable compliance reports, DPA included. Give your customer's compliance team a dashboard link — not a "we take security seriously" PDF. EU-only processing guaranteed architecturally, not by policy.
DORA is in enforcement. AI Act begins August 2026.
DORA already requires financial institutions to assess third-party AI risk. The EU AI Act's deployer obligations begin August 2026 — transparency and traceability for any AI touching regulated data. We build it into the infrastructure so you don't have to.
Bring your own model, keep compliance.
Fine-tuned on proprietary data? Run it on our infrastructure with the same compliance guarantees as any catalog model. Same audit trail, same dashboard, same exportable reports. Transparent model provenance — you know exactly what processed your data.
Built by engineers from
Founded Specto (acquired by Sentry). Director of Engineering at Sentry, leading teams processing billions of events/day. Former Tech Lead at Facebook.
Sr ML Engineering Manager at Adobe, leading 20K+ GPU AI Platform for Adobe Firefly. Former VP of Engineering & Product at Celtra.
CEO of Iryo (healthcare tech). Deep network in Slovenian and EU tech ecosystems.