A — Z Reference

Plugsky Documentation

Everything you need to ship AI to production. API reference, SDKs for 6+ languages, every model, auth & quotas, deployment topologies, security controls, migration guide from OpenAI, billing, SLAs, and a real-world examples library.

Getting started

Five minutes from zero to your first model response. The Plugsky API is 100% OpenAI-compatible — change your base_url and your existing code, SDK, and prompts keep working.

Install

# Python (OpenAI SDK already works)
pip install opencode

# Node.js / TypeScript
npm i opencode

# Go
go get github.com/plugsky/opencode-go

# Or use raw HTTP — no SDK required

Your first API call

python
from openai import OpenAI

client = OpenAI(
    api_key="psk_live_…",                          # your Plugsky key
    base_url="https://api.plugsky.com/v1",         # the only line that changes
)

resp = client.chat.completions.create(
    model="plugsky-pro",
    messages=[{"role": "user", "content": "Say hello in 5 languages"}],
    max_tokens=200,
)
print(resp.choices[0].message.content)

That's it. Same request shape, same response shape, same streaming, same function-calling, same JSON mode. Full OpenAI-compatibility reference →

Drop-in. Your existing OpenAI Python, Node, Go, Java, .NET, and cURL clients work with no code changes beyond the base_url and API key. Tools like LangChain, LlamaIndex, Vercel AI SDK, and the OpenAI Playground are fully supported.

Core concepts

  • Model. The inference engine. plugsky-micro through plugsky-frontier, plus 13+ third-party models (opencode.ai, NVIDIA NIM, Mistral, Cohere, Stability).
  • Request. A single API call. Token-counted. Priced.
  • Thread / Run. Stateful multi-turn conversation (Assistants API).
  • Tool. A function you expose to the model for function-calling.
  • Knowledge base / Vector store. Indexed documents the model can retrieve from (RAG).
  • Endpoint / Region. Where the model runs. me-central-1 (UAE), eu-west-1, us-east-1, plus customer VPC and on-prem.

Authentication

Plugsky uses bearer-token API keys. Keys are project-scoped, role-scoped, and rotatable without downtime.

API keys

Generate keys in Dashboard → API keys. Each key carries a scope and a project. Three built-in roles:

Role | Can do | Cannot do
read | List models, get usage, view audit logs | Inference, key creation
infer | Chat / embeddings / image / audio / batch | Admin, billing, key creation
admin | All of the above + key mgmt + billing + RBAC | Org-level settings (root only)

Environment variables

bash
export PLUGSKY_API_KEY="psk_live_…"
export PLUGSKY_BASE_URL="https://api.plugsky.com/v1"
export PLUGSKY_PROJECT="prj_8x2…"   # optional, defaults to your first project
export PLUGSKY_REGION="me-central-1"
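The same variables can be read from Python. The following is a minimal sketch, not an official helper: `load_plugsky_config` and `auth_headers` are hypothetical names, and the fallback base URL is simply the one shown above. The project and region are left unset when the variables are missing, since the text only says that the project defaults to your first one.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.plugsky.com/v1"


def load_plugsky_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Plugsky settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("PLUGSKY_API_KEY")
    if not api_key:
        raise RuntimeError("PLUGSKY_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("PLUGSKY_BASE_URL", DEFAULT_BASE_URL),
        # Optional: per the docs, the first project is used when this is unset.
        "project": env.get("PLUGSKY_PROJECT"),
        "region": env.get("PLUGSKY_REGION"),
    }


def auth_headers(config: dict) -> dict:
    """Build the bearer-token header described in the Authentication section."""
    return {"Authorization": f"Bearer {config['api_key']}"}
```

With the OpenAI SDK, the first two values map directly onto the constructor: `OpenAI(api_key=cfg["api_key"], base_url=cfg["base_url"])`.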

Scopes & roles

Each key has a comma-separated scope list. Examples: chat:write,embeddings:write,files:read. Use the narrowest scope that works — production keys should never have admin.
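One way to keep keys narrow is to compare what a key grants with what a service actually needs. The sketch below assumes scopes are matched as exact strings (no wildcards or hierarchy), which the text does not confirm; the function names are illustrative only.

```python
from typing import Iterable, Set


def parse_scopes(scope_list: str) -> Set[str]:
    """Split a comma-separated scope string into a set, ignoring blanks."""
    return {s.strip() for s in scope_list.split(",") if s.strip()}


def missing_scopes(granted: str, required: Iterable[str]) -> Set[str]:
    """Scopes the service needs but the key does not grant."""
    return set(required) - parse_scopes(granted)


def excess_scopes(granted: str, required: Iterable[str]) -> Set[str]:
    """Scopes the key grants but the service never uses (candidates to drop)."""
    return parse_scopes(granted) - set(required)
```

For a chat-only service holding the example key above, `excess_scopes` would flag `embeddings:write` and `files:read` as removable.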

OAuth 2.0 (3rd-party apps)

Plugsky supports standard OAuth 2.0 authorization-code flow with PKCE for SaaS apps that want to offer "Sign in with Plugsky" or access their users' Plugsky workspaces. Read the full OAuth guide →

Rotate keys quarterly. Old keys stay valid for 24 hours after rotation so deployments can roll without a brownout. Rotate now →

API reference

Every endpoint, every parameter, every status code. Compatible with OpenAI's /v1/* namespace; Plugsky-specific extensions live under /v1/plugsky/*.

Chat completions

POST /v1/chat/completions
The primary inference endpoint. Supports streaming, function-calling, JSON mode, structured outputs, vision, and tool use.
Required: model, messages  ·  Optional: temperature, top_p, n, stream, stop, max_tokens, presence_penalty, frequency_penalty, tools, tool_choice, response_format, seed, user
python
from openai import OpenAI
client = OpenAI(api_key="psk_live_…", base_url="https://api.plugsky.com/v1")

stream = client.chat.completions.create(
    model="plugsky-pro",
    messages=[{"role": "user", "content": "Write a haiku about GCC summers"}],
    stream=True,
    temperature=0.7,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Legacy completions

POST /v1/completions
Pre-chat raw completion endpoint. Backwards-compatible. Prefer /v1/chat/completions for new integrations.

Embeddings

POST /v1/embeddings
Vector embeddings for semantic search, RAG, clustering, and recommendations. Models →
python
resp = client.embeddings.create(
    model="plugsky-embed",   # default embedding model listed in the Models table
    input=["Plugsky is a sovereign AI cloud", "GCC banks run on us"],
)
print(len(resp.data[0].embedding), "dimensions")  # 4096 for plugsky-embed per the Models table

Image generation

POST /v1/images/generations
DALL·E-compatible image generation. Models: plugsky-imagine-xl, plugsky-imagine-fast.

Audio (ASR / TTS)

POST /v1/audio/transcriptions
Speech-to-text. 99 languages, Arabic dialect support, real-time streaming. whisper-plugsky model.
POST /v1/audio/speech
Text-to-speech. 30+ voices, SSML, Arabic & English, real-time.

Moderation

POST /v1/moderations
Classify text for harmful content. Returns per-category scores (hate, violence, sexual, self-harm, PII). Plugsky extension: pii, prompt-injection.

Files

POST /v1/files
Upload files for fine-tuning, batch, or vector-store ingestion. Up to 512 MB per file. PDF, DOCX, MD, TXT, JSON, CSV, audio, video.

Batch API

POST /v1/batches
Process up to 50,000 requests asynchronously. 24-hour SLA, 50% cheaper than sync. Returns JSONL results.
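A batch starts as a JSONL input file. The sketch below assumes Plugsky accepts the same input line format as the OpenAI Batch API (`custom_id`, `method`, `url`, `body`); that follows from the compatibility claim but is not stated for this endpoint. `build_batch_jsonl` and the `req-N` ID scheme are illustrative.

```python
import json
from typing import Iterable, List

MAX_BATCH_REQUESTS = 50_000  # per-batch limit stated above


def build_batch_jsonl(prompts: Iterable[str], model: str = "plugsky-pro") -> str:
    """Render one chat-completion request per line in OpenAI batch format."""
    lines: List[str] = []
    for i, prompt in enumerate(prompts):
        if i >= MAX_BATCH_REQUESTS:
            raise ValueError(f"a batch holds at most {MAX_BATCH_REQUESTS} requests")
        request = {
            "custom_id": f"req-{i}",  # used to match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")
```

With the OpenAI SDK, the resulting file would typically be uploaded with `client.files.create(file=..., purpose="batch")` and submitted with `client.batches.create(input_file_id=..., endpoint="/v1/chat/completions", completion_window="24h")`.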

Fine-tuning

POST /v1/fine_tuning/jobs
Supervised fine-tuning (SFT) and DPO. Base models: plugsky-micro, plugsky-lite, plugsky-pro. Your data, your region, your keys.

Assistants / Agents

POST /v1/assistants
OpenAI Assistants-compatible stateful agents with tools (code interpreter, file search, function calling). 100% compatible with the OpenAI Assistants API.

Function calling / tools

Pass a tools array with JSON-Schema function definitions. The model returns a structured tool_calls payload you execute, then return the result. Streaming and parallel tool calls supported. Generate your function schema →
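The round trip described above can be sketched as follows. `get_weather` is a made-up example function with a stubbed return value, and `REGISTRY` and `run_tool_call` are illustrative names; the shapes of the `tools` entry and the `tool` reply message follow the OpenAI chat format that Plugsky states it is compatible with.

```python
import json
from typing import Any, Callable, Dict

# JSON-Schema function definition to pass in the `tools` array.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> Dict[str, Any]:
    # Stub for illustration; real code would call a weather service.
    return {"city": city, "temp_c": 41}


# Maps tool names the model may request to local callables.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(call_id: str, name: str, arguments_json: str) -> Dict[str, str]:
    """Execute one tool call and build the `tool` message to send back."""
    func = REGISTRY.get(name)
    if func is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        result = func(**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

In a real loop, each entry `tc` in `resp.choices[0].message.tool_calls` would be passed as `run_tool_call(tc.id, tc.function.name, tc.function.arguments)`, and the returned messages appended to the conversation (after the assistant message) before calling the endpoint again.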

Responses (stateful)

POST /v1/responses
OpenAI Responses-compatible stateful endpoint. Built-in RAG, web search, code interpreter, and computer use. Recommended for new agentic builds.

Plugsky extensions

GET /v1/plugsky/usage
Per-key, per-model, per-day usage. Returns the same shape used by the Dashboard.
POST /v1/plugsky/route
Smart routing — pass model="auto" and Plugsky picks the best model for the prompt, your cost target, and your latency target.

Models

31 first-party models across free, paid, and embedding tiers, all served behind one OpenAI-compatible endpoint, billed from one invoice, and governed by one set of policies. Use GET /v1/models for the live list. On the dashboard Models page, hover over the icon next to a model for its full details.

Free tier — instant, no API key required

Model | Context | Best for | Tier
plugsky-micro | 131K | Fast, cheap: classification, simple chat, intent detection | free
plugsky-lite | 32K | Support & chat automation, moderate complexity | free

Paid tier — balanced general agents

Model | Context | Best for | Tier
plugsky-plus | 32K | Balanced general agent: good quality at lower cost | paid
plugsky-pro | 65K | Coding & reasoning (default): strong general purpose | default
plugsky-max | 131K | Complex multi-step: deep reasoning | reasoning
plugsky-frontier | 131K | Frontier-tier: Mistral Large 3 675B (EU origin, 128K context) | reasoning

Specialized — reasoning, vision, code, long-context

Model | Context | Best for | Capabilities
plugsky-reasoning | 65K | Deep reasoning, math, code (NVIDIA Nemotron Super 120B) | 🧠 reasoning · 🔧 tools
plugsky-kimi | 131K | Long-context (256K), MoonshotAI Kimi K2.6 | 📄 long-context
plugsky-deepseek-pro | 65K | Reasoning + code (DeepSeek V4 Pro) | 🧠 reasoning · 💻 code
plugsky-deepseek-flash | 32K | Fast DeepSeek (V4 Flash) | ⚡ fast
plugsky-gpt-oss | 32K | Open-source GPT: gpt-oss-120B (NVIDIA) | 🧠 reasoning · 📄 long-context
plugsky-qwen-next | 131K | Alibaba Qwen3 Next 80B MoE (256K context) | 📄 long-context · ⚛ MoE
plugsky-coder | 131K | Best open coding model: Qwen3 Coder 480B MoE | 💻 code · 📄 long-context
plugsky-minimax | 32K | NVIDIA MiniMax-M3: strong multimodal + reasoning | 👁 vision · 🎬 video · 🧠 reasoning
plugsky-vision-fast | 32K | Multimodal fast: Llama 3.2 Vision 11B | 👁 vision
plugsky-llama4 | 131K | Meta Llama 4 Maverick 17B (128 experts MoE) | ⚛ MoE · 📄 long-context
plugsky-qwen-vl | 262K | Qwen 3.5 397B MoE: multimodal + 256K context + reasoning | 👁 vision · 🧠 reasoning · 📄 long-context
plugsky-longctx | 131K | Mistral Large 3 675B: European, 128K context | 📄 long-context · 🇪🇺 EU
plugsky-mistral-medium | 131K | Mistral Medium 3.5 128B: fast 128K context | 📄 long-context
plugsky-gemma-4 | 32K | Google Gemma 3 Nano 4B: fast + multimodal | 👁 vision · ⚡ fast
plugsky-nano | 1M | NVIDIA Nemotron 3 Nano 30B MoE: 1M context, fast | 📄 long-context · ⚛ MoE
plugsky-tiny | 131K | NVIDIA Nemotron Nano 9B v2: small, fast, low cost | ⚡ fast
plugsky-coder-fast | 32K | Fast coding: Llama 3.2 3B (newer than 3.1) | 💻 code · ⚡ fast
plugsky-phi | 131K | NVIDIA Nemotron Mini 4B: ultra-compact, very fast | ⚡ fast
plugsky-ultra | 1M | NVIDIA Nemotron 3 Nano Omni 30B: omni-modal + reasoning, 1M context | 👁 vision · 🧠 reasoning · 📄 long-context
plugsky-gemma3-nano-2b | 32K | Google Gemma 3 Nano 2B: ultra-compact, fast | ⚡ fast
plugsky-gemma3-nano-4b | 32K | Google Gemma 3 Nano 4B: small + fast + multimodal | ⚡ fast
plugsky-mistral-small | 131K | Mistral Small 4 119B: fast + 128K context | 📄 long-context

Embedding models — for RAG & semantic search

Model | Dimensions | Max tokens | Notes
plugsky-embed | 4096 | 8192 | Default. Best $/quality for RAG.
plugsky-embed-nim | 4096 | 8192 | NVIDIA NV-Embed v1: best general embeddings.
plugsky-embed-multilingual | 1024 | 8192 | BGE-M3: multilingual embeddings (100+ langs).
Smart routing & Model Fusion

Two ways to get cost savings automatically. Set model="plugsky-fusion" to use the dashboard's default chain (sequential, parallel, cost-saver — your choice). Or set model="auto" with a route_hint for Plugsky's classifier-based routing. Typical savings: 60-80% on production traffic. See Model Fusion in the dashboard for the full UI.

python
# Option 1: use your configured Fusion chain
resp = client.chat.completions.create(
    model="plugsky-fusion",   # runs the workspace's default chain
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.model)             # which model actually answered (e.g. "plugsky-micro")

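# Option 2: smart routing with cost hint
resp = client.chat.completions.create(
    model="auto",              # classifier picks the best model
    messages=[{"role": "user", "content": "Hello!"}],
    # The OpenAI SDK rejects unknown keyword arguments, so Plugsky-specific
    # fields go through extra_body, which is merged into the request JSON.
    extra_body={
        "route_hint": "cost",      # cost | quality | latency
        "max_cost_per_1m": 0.50,
    },
)
print(resp.model)             # the model Plugsky chose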

SDKs

The OpenAI SDK works as-is. We also ship idiomatic first-party SDKs that add Plugsky-specific features (smart routing, batch, regional pinning).

  • Python (v2.1.0). 3.8+. pip install opencode. Streaming, async, type hints.
  • Node.js / TypeScript (v1.4.0). 18+. npm i opencode. Browser, Bun, Deno, Cloudflare Workers.
  • Go (v0.9.0). 1.21+. go get github.com/plugsky/opencode-go. Context-aware.
  • Java / Kotlin (v1.1.0). JDK 11+. Maven & Gradle. Coroutines, Reactor, sync.
  • Rust (v0.6.0). 1.74+. cargo add opencode. tokio, async-std, sync.
  • cURL & raw HTTP (v1.0). Any HTTP client. OpenAPI 3.1 spec published at /v1/openapi.json.

Framework integrations

  • LangChain: ChatOpenAI(base_url="https://api.plugsky.com/v1")
  • LlamaIndex: OpenAI(base_url=…)
  • Vercel AI SDK: openai("…", { baseURL: "https://api.plugsky.com/v1" })
  • Haystack: OpenAIGenerator(api_base=…)
  • Semantic Kernel: OpenAIChatCompletion(…endpoint=…)
  • AutoGen: OpenAIWrapper(base_url=…)
  • OpenAI Playground: Custom base URL field. Tested daily.

Example: streaming with back-pressure

python
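import openai
client = openai.OpenAI(api_key="psk_live_…", base_url="https://api.plugsky.com/v1")

# Needs an openai SDK release that provides chat.completions.stream
# (older releases expose it as client.beta.chat.completions.stream).
with client.chat.completions.stream(
    model="plugsky-pro",
    messages=[{"role":"user","content":"Tell me a 500-word story about Plugsky"}],
) as stream:
    for event in stream:
        # The helper yields typed events; text arrives as "content.delta".
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)
    final = stream.get_final_completion()
    print("\n--- usage:", final.usage)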

Official software & installers

First-party desktop, CLI, and self-hosted apps — all open-source under MIT / Apache-2.0, branded Plugsky, and pre-configured to talk to the Plugsky API. Install with one command.

Plugsky CLI

AI coding agent for your terminal. Fork of opencode, rebranded 100% as Plugsky. MIT-licensed, supports all major platforms.

bash
# macOS / Linux
curl -fsSL https://plugsky.com/install | bash

# Windows (PowerShell)
irm https://plugsky.com/install | iex

# With API key
curl -fsSL https://plugsky.com/install | bash -s -- --api-key psk_live_…
plugsky

View on GitHub → · All 58 integrations →

Plugsky Desktop (Jan-based)

Native AI chat client for macOS, Linux, Windows. Built on the open-source Jan project (Apache-2.0). Branded Plugsky, pre-configured with plugsky-pro as the default model.

bash
curl -fsSL https://plugsky.com/install-desktop | bash
plugsky-desktop

Available for macOS (Apple Silicon + Intel), Linux (x64), Windows (x64).

Plugsky Web (Open WebUI-based)

Self-hosted AI chat UI in your browser. Built on the open-source Open WebUI project. Docker or pip install, both backends supported. The plugsky-fusion model is pre-listed.

bash
curl -fsSL https://plugsky.com/install-web | bash
plugsky-web start
# → opens http://localhost:8080

Latest releases (v0.1.0 — 2026-06-23)

All three are MIT-licensed. See NOTICE for full upstream attribution.

Operations

Rate limits & quotas

All plans include unlimited usage within fair-use rate limits — no per-token charges, no per-request charges, no overage fees. The only limit is the per-minute request rate (RPM) for your tier. Increase limits from the Dashboard or by emailing support@plugsky.com.

Plan | Monthly fee | Fair-use RPM | Concurrent | API keys | Seats
Trial | $5 / 7 days | 60 | 5 | 1 | 1
Starter | $20 / mo | 60 | 10 | 5 | 1
Builder | $60 / mo | 300 | 50 | 20 | 5
Scale | $120 / mo | 1,000 | 200 | 100 | 25
Enterprise | Annual contract | Custom (10K+) | Custom | Unlimited | Unlimited

Same flat rate on every model — no separate pricing for plugsky-frontier vs plugsky-micro. All 31 models are included on every paid plan. Hit a 429? The response includes a Retry-After header. The SDKs retry with exponential backoff automatically.

Retries & idempotency

All POST endpoints accept an Idempotency-Key header. Re-sending the same key returns the cached result for 24 hours. This makes your POSTs safe to retry without double-billing or double-creating resources.

bash
curl -X POST https://api.plugsky.com/v1/chat/completions \
  -H "Authorization: Bearer $PLUGSKY_API_KEY" \
  -H "Idempotency-Key: $(uuidgen)" \
  -H "Content-Type: application/json" \
  -d '{"model":"plugsky-pro","messages":[{"role":"user","content":"hello"}]}'
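A retry only benefits from idempotency if it re-sends the same key, so in application code it helps to derive the key from a stable identifier for the logical request rather than generating a new random one on every attempt. The sketch below uses a name-based UUID for that; `APP_NAMESPACE` is an arbitrary fixed value and the function names are illustrative.

```python
import uuid

# Arbitrary fixed namespace for this application (an assumption; any constant UUID works).
APP_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def idempotency_key(logical_request_id: str) -> str:
    """Derive the same key every time for the same logical request."""
    return str(uuid.uuid5(APP_NAMESPACE, logical_request_id))


def idempotent_headers(logical_request_id: str) -> dict:
    """Header dict to attach to the original POST and to every retry of it."""
    return {"Idempotency-Key": idempotency_key(logical_request_id)}
```

With the OpenAI SDK these headers can be passed per request through the `extra_headers` argument. The request body should stay identical across retries, since the error table lists 409 for a reused key with a different body.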

Errors & status codes

Code | Meaning | What to do
400 | Bad request: malformed JSON, invalid param | Validate locally before sending
401 | Invalid or missing API key | Check Authorization header
403 | Key lacks required scope | Check key role in Dashboard
404 | Model or resource not found | List /v1/models to see what's available
409 | Conflict (duplicate idempotency key with different body) | Generate a fresh key per logical request
429 | Rate limit hit | Honor Retry-After
500 | Internal error | Retry with backoff. Open a ticket if persistent.
503 | Upstream provider down | Smart-routed requests automatically failover
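For clients that do not use an SDK, the retry guidance in the table can be sketched as a small delay calculator. This is an illustration, not Plugsky's retry policy: treating 503 as retryable, the base delay, the cap, and the use of full jitter are all assumptions, and `Retry-After` is read as a number of seconds.

```python
import random
from typing import Mapping, Optional

RETRYABLE = {429, 500, 503}  # 503 is included here as an assumption


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 0.5, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None  # 400/401/403/404/409 need a fix, not a retry
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # honor the server's hint
        except ValueError:
            pass  # non-numeric value (e.g. an HTTP date): fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The `headers` lookup is case-sensitive for a plain dict; HTTP client libraries usually provide a case-insensitive mapping that can be passed in directly.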

Webhooks

Subscribe to 9 event types: batch.completed, fine_tuning.completed, invoice.paid, key.rotated, quota.warning, quota.exceeded, model.deprecated, usage.threshold, audit.alert. HMAC-SHA256 signed. Configure in Dashboard → Webhooks.
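Signature checking on the receiving side could look like the sketch below. The text only states that payloads are HMAC-SHA256 signed, so several details here are assumptions: that the signature covers the raw request body, that it is hex-encoded, and that the shared secret is a plain string. The header that carries the signature is not named in this section.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (encoding is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, body: bytes, signature: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Verification should run on the exact bytes received, before any JSON parsing or re-serialization, because even a whitespace change alters the digest.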

Logs & observability

Every request is logged with: timestamp, model, tokens, latency, status, key ID, project ID, region, request ID, optional user tag. Export to Datadog, Splunk, Grafana, New Relic, OpenTelemetry, or your SIEM.

Status & SLAs

Live status: /status. Public incident history. Uptime SLAs:

  • Builder / Scale: 99.9% monthly uptime, 10% credit on miss
  • Enterprise: 99.95% monthly uptime, 25% credit, 99.99% on multi-region deployments

Deployment topologies

Same API, six deployment models. Pick one, or combine them across teams.

  • Plugsky Cloud (Default). Multi-tenant SaaS. me-central-1 (UAE) primary, eu-west-1, us-east-1, ap-southeast-1. Fastest to start.
  • VPC deployment (Enterprise). Plugsky runs inside your AWS / GCP / Azure / OCI VPC. Your network, your peering, your KMS, your keys.
  • On-premises (Regulated). Helm chart or air-gap installer for your data center. GPU pool: H100, H200, MI300X, or CPU-only.
  • Air-gapped (Sovereign). No internet at all. Bundle ships on physical media. Monthly model refresh by courier.
  • Bring-your-own-cloud (Financial). Plugsky control plane runs in our cloud; inference runs in your cloud account. You pay your hyperscaler directly.
  • White-label (SaaS). Your brand on the dashboard, your domain, your colors. Resell AI under your own SKU.

Decision matrix

You need… | Use
Ship in 1 day, no compliance | Plugsky Cloud
Data stays in me-central-1 | Plugsky Cloud (region pinned)
No data leaves your AWS / Azure / GCP account | VPC deployment
SAMA / CBUAE / NSD audit trail | VPC deployment + customer-managed keys
Air-gap, no internet | On-prem or air-gapped
Resell AI under your brand | White-label

Security & compliance

Security model

  • Encryption in transit: TLS 1.3 only, HSTS, modern ciphers
  • Encryption at rest: AES-256-GCM, customer-managed keys available
  • Network isolation: per-tenant VPC, security groups, no shared kernel
  • Tenant isolation: logical (RBAC + scoped keys) or physical (your own cluster)
  • Secret hygiene: keys never logged, never returned in responses, hashed at rest with Argon2id
  • Pen tests: quarterly by HackerOne + an external firm. Reports under NDA.
  • Bug bounty: up to $25,000. security@plugsky.com

Data residency

Choose per-request or pin globally. Regions: me-central-1 (UAE — default GCC), sa-central-1 (Riyadh — Enterprise), eu-west-1 (Dublin), eu-central-1 (Frankfurt), us-east-1, us-west-2, ap-southeast-1 (Singapore). Data never leaves the pinned region. Deep dive →

Compliance & certifications

  • SOC 2 Type II — annually audited, report under NDA
  • ISO 27001 — InfoSec management
  • ISO 27701 — Privacy management
  • ISO 27017 / 27018 — Cloud & PII
  • GDPR — EU data protection
  • HIPAA — Healthcare (Enterprise + BAA)
  • PCI DSS — Card data safety (no inference on card data unless on-prem)
  • FedRAMP Moderate — In process, available on Enterprise
  • UAE PDPL — Federal Data Protection Law
  • DIFC DPL — Dubai International Financial Centre
  • SAMA CSF — Saudi Central Bank cyber framework
  • NSD — National Security Directive alignment (Enterprise on-prem)

DPA & legal

Standard Contractual Clauses (SCCs) baked into the DPA. Sub-processor list published and updated within 30 days of any change. Read the DPA →

Audit logs

Every key action — creation, rotation, scope change, deletion — is logged with actor, timestamp, IP, and request body hash. Exportable to your SIEM (Splunk, Sentinel, QRadar, Chronicle) via webhook or Kinesis/Firehose.

BYOK / HSM

Bring Your Own Key. Plugsky never sees your key — you import it into our HSM integration (AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault, Thales Luna, AWS CloudHSM). Key rotation, revocation, and audit all yours.

PII handling

Three modes: no-PII (strict filter, PII auto-redacted), detect-only (PII tagged but not modified), passthrough (your responsibility). Default is detect-only for inference, no-PII for embeddings. Run the residency checklist →

Billing

Plans

Flat monthly fee per workspace. All 31 models included on every paid plan — no per-token charges, no per-request charges, no overage fees, no surprise bills. Cancel or downgrade anytime.

Plan | Monthly fee | What's included | Best for
Trial | $5 / 7 days | All 31 models, 60 RPM, 1 seat, 1 API key, no card | First call, evaluation
Starter | $20 / mo | All models up to plugsky-pro, 60 RPM, 1 seat, 5 keys | Solo devs, side projects
Builder | $60 / mo | All models up to plugsky-max + vision, 300 RPM, 5 seats, 20 keys | Production teams
Scale | $120 / mo | All 31 models including frontier, 1,000 RPM, 25 seats, 100 keys, SSO | High-volume SaaS
Enterprise | Annual contract | Unlimited usage, 10K+ RPM, unlimited seats, on-prem, BYOK, 99.99% SLA, DPA, BAA, dedicated engineer | Banks, gov, regulated

Annual billing saves 20%. Enterprise plans are annual contracts priced to your deployment, security, and volume requirements — enterprise@plugsky.com.

Usage & metering

Unlimited usage on every plan. No per-token charges, no per-request charges, no overage fees. The only limit is the fair-use RPM for your tier (60, 300, 1,000, or 10K+ on Enterprise). Token counts are still returned in every response (usage.prompt_tokens, usage.completion_tokens) for observability — but they don't drive billing.

Invoices & taxes

Monthly billing on the 1st. PDF invoices emailed automatically. VAT-compliant for UAE (5%), KSA (15%), EU (reverse charge), and US (no sales tax on SaaS in most states). Wire transfer, ACH, SEPA, and major credit cards. Annual contracts: pay upfront, save 15%.

Quotas & limits

Hard $ caps at the project level prevent runaway spend. Soft warning alerts at 50%, 80%, 95%. Hard block at 100% (auto-reject 402). You can set overage_behavior=allow with a finance-approved key to allow overage up to 3× the cap with auto-billing.
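The cap rules above reduce to a small amount of arithmetic, sketched here for clarity. The state names and the choice to treat each threshold as inclusive are assumptions; only the percentages, the 402 block, and the 3× overage ceiling come from the text.

```python
WARN_THRESHOLDS = (0.50, 0.80, 0.95)  # soft alert levels from the text
OVERAGE_MULTIPLIER = 3.0              # overage allowed up to 3x the cap


def quota_state(spend: float, cap: float, allow_overage: bool = False) -> str:
    """Classify project spend against its hard cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    limit = cap * OVERAGE_MULTIPLIER if allow_overage else cap
    if spend >= limit:
        return "blocked"   # requests are rejected with 402
    if spend >= cap:
        return "overage"   # only reachable with overage_behavior=allow
    ratio = spend / cap
    crossed = [t for t in WARN_THRESHOLDS if ratio >= t]
    if crossed:
        return f"warn_{int(round(crossed[-1] * 100))}"
    return "ok"
```

For a project capped at $100, a spend of $85 falls in the `warn_80` band, and with overage enabled the hard block moves from $100 to $300.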

Migration guides

From OpenAI

  1. Generate a Plugsky key in the Dashboard
  2. In your code, change base_url to https://api.plugsky.com/v1
  3. Optionally map gpt-4o → plugsky-pro, gpt-4o-mini → plugsky-lite for cost savings
  4. Run your existing evals — should pass unchanged
  5. Switch DNS / cut over when ready
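The optional model mapping in step 3 (gpt-4o to plugsky-pro, gpt-4o-mini to plugsky-lite) can be applied in one place rather than edited across call sites. A minimal sketch, with `MODEL_MAP` and `migrate_request` as illustrative names; models not in the map are passed through unchanged.

```python
import copy

# Mapping suggested in step 3; extend it for other models you use.
MODEL_MAP = {
    "gpt-4o": "plugsky-pro",
    "gpt-4o-mini": "plugsky-lite",
}


def migrate_request(payload: dict) -> dict:
    """Return a copy of an OpenAI chat payload with the model name remapped."""
    migrated = copy.deepcopy(payload)  # never mutate the caller's payload
    if "model" in migrated:
        migrated["model"] = MODEL_MAP.get(migrated["model"], migrated["model"])
    return migrated
```

Keeping the mapping in a single table also makes it easy to run the evals from step 4 against both the original and the mapped model names.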

Need a per-language migration walkthrough? Full guide → or generate code for your stack →

From Anthropic

Plugsky exposes the Messages API at /v1/messages with full Anthropic compatibility. Just change base_url and your Claude SDK code works as-is. Map claude-3-5-sonnet → plugsky-pro for 40%+ savings, same quality tier.

From Azure OpenAI

Point your Azure SDK at https://api.plugsky.com/v1 (Azure SDK supports custom endpoints). Models keep their Azure names with the azure/ prefix. Existing content filters and Azure-specific features have Plugsky equivalents — see the full compatibility matrix →

From AWS Bedrock

Use the Bedrock SDK's endpoint_url parameter. The Converse API is supported at /v1/bedrock/converse for drop-in compatibility.

Reference

Glossary

Term | Definition
Token | The atomic unit of billing. ~4 chars in English. ~1.5 chars in Arabic.
Context window | Max tokens a model can see in a single call (input + output).
Embedding | A fixed-length vector representation of text. Used for semantic search.
RAG | Retrieval-Augmented Generation: retrieve relevant docs, stuff into prompt, generate.
Function calling | Model returns a structured tool call instead of free text. You execute, return result.
Fine-tuning | Continue training a base model on your data. SFT (supervised) or DPO (preference).
Distillation | Train a small model to mimic a large one. Cheaper, faster, similar quality.
Agent | Model + tools + memory + planning loop. Autonomous multi-step task execution.
Vector store | Indexed embeddings for fast similarity search. Plugsky includes one out of the box.
BYOK | Bring Your Own Key. You control the encryption keys. We can't read your data.
Sovereign | Hosted entirely inside one jurisdiction, with no foreign access. PDPL-compliant.
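The character-per-token ratios in the Token entry, together with the Context window definition, give a rough way to check whether a prompt will fit before sending it. This is a back-of-the-envelope sketch: the ratios are the approximate figures from the glossary, real tokenizers vary by model, and the function names are illustrative.

```python
import math

# Approximate characters per token, taken from the glossary above.
CHARS_PER_TOKEN = {"en": 4.0, "ar": 1.5}


def estimate_tokens(text: str, lang: str = "en") -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN[lang])


def fits_context(prompt: str, max_output_tokens: int,
                 context_window: int, lang: str = "en") -> bool:
    """Input and output share the context window, so both are counted."""
    return estimate_tokens(prompt, lang) + max_output_tokens <= context_window
```

The exact counts for a completed call are returned in `usage.prompt_tokens` and `usage.completion_tokens`, which are the figures to rely on once a response is available.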

Changelog

/changelog — full release history. Subscribe to RSS or the model.deprecated webhook.

Support