The compliance runtime for AI

Run any AI workload on sovereign infrastructure.

$sference batch ./workload.jsonl --model qwen3.5-35b --window overnight

US cloud APIs mean jurisdictional risk. Self-hosting means operational burden. We federate spot GPU capacity across EU providers into a single, reliable compute pool — up to 75% cheaper than realtime inference.

How it works

OpenAI-compatible batch API. Submit jobs, pick your delivery window, get results via webhook. Our orchestration layer routes to optimal GPU capacity across our EU provider network.

1

Submit a batch job

Python
from openai import OpenAI

client = OpenAI(
    api_key="sf-...",
    base_url="https://api.sference.eu/v1/"
)

batch = client.batches.create(
    input_file_id=upload.id,
    endpoint="/chat/completions",
    completion_window="priority",  # ~1hr, or "overnight"
    metadata={"webhook": "https://you.com/hook"}
)
2

Get results via webhook

Response
// POST https://you.com/hook
{
  "id": "batch_abc123",
  "status": "completed",
  "output_file_id": "file-xyz789",
  "request_counts": { "completed": 4892, "failed": 0 },
  "compliance": {
    "data_residency": "EU (Slovenia)",
    "model": "Qwen3.5-35B-A3B",
    "audit_log": "https://dashboard.sference.eu/audit/batch_abc123"
  }
}
3

Full audit trail — configurable retention, exportable reports

Compliance DashboardLive
Export report
Batch jobs (30d)
2,847
Data residency
EU
Retention policy
30 days
Recent batches
14:23:01batch_abc123Qwen3.5-35B4,892completed
13:41:22batch_def456Llama-4-Scout12,100completed
12:08:55batch_ghi789custom-ft-v3891completed
11:52:30batch_jkl012Mistral-Small6,340running

Any model. Including yours.

Open-weight models from the Qwen, Mistral, and Llama families — or bring your own fine-tune. If it runs on vLLM/SGLang, we serve it. No model lock-in.

Trade latency for cost

Priority (~1hr) and overnight (~24hr) delivery windows. We use spot and preemptible GPU capacity — batch workloads are naturally interruptible and resumable. The structural cost advantage of non-realtime processing.

Sovereign infrastructure. No jurisdictional risk.

Every request processed on EU GPUs. No US CLOUD Act exposure. Full model and version transparency per request. DPA included. EU-only processing guaranteed architecturally, not by policy.

Compliance runtime built in

Full request traceability. Configurable retention. Exportable compliance reports. Your compliance dashboard serves both your team and your customer's compliance officer.

How we're different

Most batch APIs give you a 50% discount and a 24-hour window. Our architecture goes further.

Hardware-agnostic GPU federation

We abstract across multiple EU GPU providers — different hardware generations, different pricing. Workloads route to the best available capacity. No single-vendor dependency.

Spot instance economics

Non-realtime processing lets us use preemptible and spot capacity at significant discounts. Batch workloads are naturally interruptible and resumable — if a spot instance is reclaimed, the orchestrator reschedules remaining chunks.

On-demand model loading

Without millisecond latency requirements, we cold-start models per batch job rather than keeping them resident in GPU memory. This enables BYOM — upload your fine-tuned weights, we load for the job, process, and release.

Smart routing and chunking

Each batch decomposes into chunks distributed across available GPUs. The orchestrator handles scheduling, fault tolerance, checkpoint resumption, and provider selection.

Compliance built into the runtime

Full request traceability, configurable retention, exportable reports, transparent model provenance. Built into the infrastructure, not bolted on after the fact.

US batch APIs offer discounts but no EU sovereignty or BYOM. Inference platforms optimize for realtime only. EU datacenters sell raw GPU hours with no batch optimization or compliance tooling. Compliance platforms don't provide inference. We combine async batch, any model including BYOM, EU sovereignty, and compliance traceability — nobody else does all five.

Built for regulated verticals

For SaaS companies whose customers demand compliance. One integration brings thousands of end-users through your API — with audit trails their compliance teams can verify.

FinTech

Batch KYC extraction, transaction classification, statement processing. Overnight processing with full audit trail for regulated financial data.

LegalTech

Contract corpus analysis, document review, embedding generation for legal RAG. Sovereign processing for sensitive legal data.

HealthTech

Medical record digitization, prescription extraction, clinical data processing. Full compliance traceability for patient data.

InsureTech

Claims processing, policy document analysis, underwriting data extraction. Structured output with configurable retention.

AI/ML Teams

Model evals, synthetic data generation, fine-tuning data prep on sensitive datasets. Run thousands of evaluations in hours, not days.

Document Processing

Invoices, contracts, forms at scale. Any open-weight model or your own fine-tune. Cost-optimized batch processing with full governance.

Pricing

Pick a delivery window. We use spot and preemptible GPU capacity — the longer you can wait, the deeper the discount.

Dev Mode
Realtime
Full price

Prompt iteration and testing pipelines.

Priority
~1 hour
Up to 50% off

Background agents and production workflows.

Overnight
~24 hours
Up to 75% off

Large batch jobs and bulk processing.

No credit card required. No minimum spend. Pay only for tokens used.

For your compliance team

The section your engineer can forward to their CTO — and their customer's compliance officer. Our compliance dashboard serves both layers: operational for your team, audit-ready for your customer.

Your customers keep asking where their data goes.

Now you have an answer. Sovereign infrastructure, full audit trail, exportable compliance reports, DPA included. Give your customer's compliance team a dashboard link — not a "we take security seriously" PDF. EU-only processing guaranteed architecturally, not by policy.

Regulatory deadlines

DORA is in enforcement. AI Act begins August 2026.

DORA already requires financial institutions to assess third-party AI risk. The EU AI Act's deployer obligations begin August 2026 — transparency and traceability for any AI touching regulated data. We build it into the infrastructure so you don't have to.

Bring your own model, keep compliance.

Fine-tuned on proprietary data? Run it on our infrastructure with the same compliance guarantees as any catalog model. Same audit trail, same dashboard, same exportable reports. Transparent model provenance — you know exactly what processed your data.

Built by engineers from

Sentry·Adobe·Facebook·Celtra
Jernej Strasner
CEO

Founded Specto (acquired by Sentry). Director of Engineering at Sentry, leading teams processing billions of events/day. Former Tech Lead at Facebook.

Aleksander Pejcic
CTO

Sr ML Engineering Manager at Adobe, leading 20K+ GPU AI Platform for Adobe Firefly. Former VP of Engineering & Product at Celtra.

Benjamin Dobnikar
Business Development

CEO of Iryo (healthcare tech). Deep network in Slovenian and EU tech ecosystems.

AI Act ReadyGDPR CompliantEU Data ResidencyDORA Ready

FAQ