Getting started
Five minutes from zero to your first model response. The Plugsky API is 100% OpenAI-compatible — change your base_url and your existing code, SDK, and prompts keep working.
Install
# Python (OpenAI SDK already works)
pip install opencode
# Node.js / TypeScript
npm i opencode
# Go
go get github.com/plugsky/opencode-go
# Or use raw HTTP — no SDK required
Your first API call
from openai import OpenAI
client = OpenAI(
api_key="psk_live_…", # your Plugsky key
base_url="https://api.plugsky.com/v1", # the only line that changes
)
resp = client.chat.completions.create(
model="plugsky-pro",
messages=[{"role": "user", "content": "Say hello in 5 languages"}],
max_tokens=200,
)
print(resp.choices[0].message.content)
That's it. Same request shape, same response shape, same streaming, same function-calling, same JSON mode. Full OpenAI-compatibility reference →
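For the raw-HTTP route mentioned under Install, the same call can be sketched with only the Python standard library. This is a minimal sketch, assuming the /v1/chat/completions path and Bearer header shown elsewhere on this page; build_chat_request and send_chat_request are hypothetical helper names, not part of any Plugsky SDK.

```python
import json
import urllib.request

BASE_URL = "https://api.plugsky.com/v1"


def build_chat_request(api_key: str, prompt: str, model: str = "plugsky-pro") -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 200,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling send_chat_request(build_chat_request(key, "Say hello in 5 languages")) performs the same request as the SDK example above.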
Core concepts
- Model. The inference engine. plugsky-micro through plugsky-frontier, plus 13+ third-party (opencode.ai, NVIDIA NIM, Mistral, Cohere, Stability).
- Request. A single API call. Token-counted. Priced.
- Thread / Run. Stateful multi-turn conversation (Assistants API).
- Tool. A function you expose to the model for function-calling.
- Knowledge base / Vector store. Indexed documents the model can retrieve from (RAG).
- Endpoint / Region. Where the model runs. me-central-1 (UAE), eu-west-1, us-east-1, plus customer VPC and on-prem.
Authentication
Plugsky uses bearer-token API keys. Keys are project-scoped, role-scoped, and rotatable without downtime.
API keys
Generate keys in the Dashboard → API keys. Each key carries a scope and a project. Three built-in roles:
| Role | Can do | Cannot do |
|---|---|---|
| read | List models, get usage, view audit logs | Inference, key creation |
| infer | Chat / embeddings / image / audio / batch | Admin, billing, key creation |
| admin | All of the above + key mgmt + billing + RBAC | Org-level settings (root only) |
Environment variables
export PLUGSKY_API_KEY="psk_live_…"
export PLUGSKY_BASE_URL="https://api.plugsky.com/v1"
export PLUGSKY_PROJECT="prj_8x2…" # optional, defaults to your first project
export PLUGSKY_REGION="me-central-1"
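One way to consume these variables from Python is sketched below. load_plugsky_config is a hypothetical helper. The api_key and base_url values can be passed straight to the OpenAI client. How the project and region are transmitted to the API is not stated here, so the sketch only collects them.

```python
import os
from typing import Mapping, Optional


def load_plugsky_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Plugsky settings from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("PLUGSKY_API_KEY")
    if not api_key:
        raise RuntimeError("PLUGSKY_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("PLUGSKY_BASE_URL", "https://api.plugsky.com/v1"),
        # Optional: when unset, the platform defaults to your first project.
        "project": env.get("PLUGSKY_PROJECT"),
        "region": env.get("PLUGSKY_REGION"),
    }
```

Failing fast on a missing key keeps a misconfigured deployment from surfacing later as a 401.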
Scopes & roles
Each key has a comma-separated scope list. Examples: chat:write,embeddings:write,files:read. Use the narrowest scope that works — production keys should never have admin.
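A deployment check for "narrowest scope that works" might look like the sketch below. It assumes only the comma-separated format shown above; parse_scopes and missing_scopes are hypothetical helpers, and the platform enforces scopes server-side regardless.

```python
from typing import Set


def parse_scopes(scope_list: str) -> Set[str]:
    """Split a comma-separated scope list, ignoring blanks and stray spaces."""
    return {s.strip() for s in scope_list.split(",") if s.strip()}


def missing_scopes(granted: str, required: Set[str]) -> Set[str]:
    """Return the required scopes that the key's scope list does not grant."""
    return required - parse_scopes(granted)


def excess_scopes(granted: str, required: Set[str]) -> Set[str]:
    """Return granted scopes the workload does not need (candidates to drop)."""
    return parse_scopes(granted) - required
```

Running excess_scopes in CI is one way to notice when a production key carries more than the workload uses.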
OAuth 2.0 (3rd-party apps)
Plugsky supports standard OAuth 2.0 authorization-code flow with PKCE for SaaS apps that want to offer "Sign in with Plugsky" or access their users' Plugsky workspaces. Read the full OAuth guide →
API reference
Every endpoint, every parameter, every status code. Compatible with OpenAI's /v1/* namespace; Plugsky-specific extensions live under /v1/plugsky/*.
Chat completions
Required: model, messages
Optional: temperature, top_p, n, stream, stop, max_tokens, presence_penalty, frequency_penalty, tools, tool_choice, response_format, seed, user
from openai import OpenAI
client = OpenAI(api_key="psk_live_…", base_url="https://api.plugsky.com/v1")
stream = client.chat.completions.create(
model="plugsky-pro",
messages=[{"role": "user", "content": "Write a haiku about GCC summers"}],
stream=True,
temperature=0.7,
)
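for chunk in stream:
    # Some chunks carry no choices or no text delta; print only real content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")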
Legacy completions
Use /v1/chat/completions for new integrations.
Embeddings
resp = client.embeddings.create(
model="plugsky-embed-2",
input=["Plugsky is a sovereign AI cloud", "GCC banks run on us"],
)
print(len(resp.data[0].embedding), "dimensions") # 1536
Image generation
plugsky-imagine-xl, plugsky-imagine-fast.
Audio (ASR / TTS)
whisper-plugsky model.
Moderation
pii, prompt-injection.
Files
Batch API
Fine-tuning
plugsky-micro, plugsky-lite, plugsky-pro. Your data, your region, your keys.
Assistants / Agents
Function calling / tools
Pass a tools array with JSON-Schema function definitions. The model returns a structured tool_calls payload you execute, then return the result. Streaming and parallel tool calls supported. Generate your function schema →
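The round trip described above can be sketched as follows, assuming the OpenAI-style wire format for tools and tool_calls. get_weather is a hypothetical local function with canned output, and the calls are handled as plain dicts (the raw JSON shape); with the SDK they arrive as objects with the same fields.

```python
import json
from typing import Dict, List

# JSON-Schema function definition passed as tools=TOOLS in the request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city: str) -> dict:
    """Hypothetical tool implementation; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[Dict]) -> List[Dict]:
    """Execute each tool call and build the `tool` messages to send back."""
    results = []
    for call in tool_calls:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return results
```

Append the assistant message and the returned tool messages to the conversation, then call the model again to get its final answer. Parallel tool calls simply produce more than one entry in the list.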
Responses (stateful)
Plugsky extensions
Set model="auto" and Plugsky picks the best model for the prompt, your cost target, and your latency target.
Models
31 first-party models across free, paid, and embedding tiers — all served behind one OpenAI-compatible endpoint, billed from one invoice, governed by one set of policies. Use GET /v1/models for the live list. Hover the ⓘ icon in the dashboard Models page for full details on any model.
Free tier — instant, no API key required
| Model | Context | Best for | Tier |
|---|---|---|---|
| plugsky-micro | 131K | Fast, cheap — classification, simple chat, intent detection | free |
| plugsky-lite | 32K | Support & chat automation — moderate complexity | free |
Paid tier — balanced general agents
| Model | Context | Best for | Tier |
|---|---|---|---|
| plugsky-plus | 32K | Balanced general agent — good quality at lower cost | paid |
| plugsky-pro | 65K | Coding & reasoning (default) — strong general purpose | default |
| plugsky-max | 131K | Complex multi-step — deep reasoning | reasoning |
| plugsky-frontier | 131K | Frontier-tier — Mistral Large 3 675B (EU origin, 128K context) | reasoning |
Specialized — reasoning, vision, code, long-context
| Model | Context | Best for | Capabilities |
|---|---|---|---|
| plugsky-reasoning | 65K | Deep reasoning, math, code — NVIDIA Nemotron Super 120B | 🧠 reasoning · 🔧 tools |
| plugsky-kimi | 131K | Long-context (256K) — MoonshotAI Kimi K2.6 | 📄 long-context |
| plugsky-deepseek-pro | 65K | Reasoning + code — DeepSeek V4 Pro | 🧠 reasoning · 💻 code |
| plugsky-deepseek-flash | 32K | Fast DeepSeek — V4 Flash | ⚡ fast |
| plugsky-gpt-oss | 32K | Open-source GPT — gpt-oss-120B (NVIDIA) | 🧠 reasoning · 📄 long-context |
| plugsky-qwen-next | 131K | Alibaba Qwen3 Next 80B MoE (256K context) | 📄 long-context · ⚛ MoE |
| plugsky-coder | 131K | Best open coding model — Qwen3 Coder 480B MoE | 💻 code · 📄 long-context |
| plugsky-minimax | 32K | NVIDIA MiniMax-M3 — strong multimodal + reasoning | 👁 vision · 🎬 video · 🧠 reasoning |
| plugsky-vision-fast | 32K | Multimodal fast — Llama 3.2 Vision 11B | 👁 vision |
| plugsky-llama4 | 131K | Meta Llama 4 Maverick 17B (128 experts MoE) | ⚛ MoE · 📄 long-context |
| plugsky-qwen-vl | 262K | Qwen 3.5 397B MoE — multimodal + 256K context + reasoning | 👁 vision · 🧠 reasoning · 📄 long-context |
| plugsky-longctx | 131K | Mistral Large 3 675B — European, 128K context | 📄 long-context · 🇪🇺 EU |
| plugsky-mistral-medium | 131K | Mistral Medium 3.5 128B — fast 128K context | 📄 long-context |
| plugsky-gemma-4 | 32K | Google Gemma 3 Nano 4B — fast + multimodal | 👁 vision · ⚡ fast |
| plugsky-nano | 1M | NVIDIA Nemotron 3 Nano 30B MoE — 1M context, fast | 📄 long-context · ⚛ MoE |
| plugsky-tiny | 131K | NVIDIA Nemotron Nano 9B v2 — small, fast, low cost | ⚡ fast |
| plugsky-coder-fast | 32K | Fast coding — Llama 3.2 3B (newer than 3.1) | 💻 code · ⚡ fast |
| plugsky-phi | 131K | NVIDIA Nemotron Mini 4B — ultra-compact, very fast | ⚡ fast |
| plugsky-ultra | 1M | NVIDIA Nemotron 3 Nano Omni 30B — omni-modal + reasoning, 1M context | 👁 vision · 🧠 reasoning · 📄 long-context |
| plugsky-gemma3-nano-2b | 32K | Google Gemma 3 Nano 2B — ultra-compact, fast | ⚡ fast |
| plugsky-gemma3-nano-4b | 32K | Google Gemma 3 Nano 4B — small + fast + multimodal | ⚡ fast |
| plugsky-mistral-small | 131K | Mistral Small 4 119B — fast + 128K context | 📄 long-context |
Embedding models — for RAG & semantic search
| Model | Dimensions | Max tokens | Notes |
|---|---|---|---|
| plugsky-embed | 4096 | 8192 | Default. Best $/quality for RAG. |
| plugsky-embed-nim | 4096 | 8192 | NVIDIA NV-Embed v1 — best general embeddings. |
| plugsky-embed-multilingual | 1024 | 8192 | BGE-M3 — multilingual embeddings (100+ langs). |
Smart routing & Model Fusion
Two ways to get cost savings automatically. Set model="plugsky-fusion" to use the dashboard's default chain (sequential, parallel, cost-saver — your choice). Or set model="auto" with a route_hint for Plugsky's classifier-based routing. Typical savings: 60-80% on production traffic. See Model Fusion in the dashboard for the full UI.
# Option 1: use your configured Fusion chain
resp = client.chat.completions.create(
model="plugsky-fusion", # runs the workspace's default chain
messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.model) # which model actually answered (e.g. "plugsky-micro")
# Option 2: smart routing with cost hint
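resp = client.chat.completions.create(
    model="auto", # classifier picks the best model
    messages=[{"role": "user", "content": "Hello!"}],
    # The OpenAI SDK rejects unknown keyword arguments, so Plugsky-specific
    # fields go through extra_body, which merges them into the JSON request.
    extra_body={
        "route_hint": "cost", # cost | quality | latency
        "max_cost_per_1m": 0.50,
    },
)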
print(resp.model) # the model Plugsky chose
SDKs
The OpenAI SDK works as-is. We also ship idiomatic first-party SDKs that add Plugsky-specific features (smart routing, batch, regional pinning).
- pip install opencode. Streaming, async, type hints.
- npm i opencode. Browser, Bun, Deno, Cloudflare Workers.
- go get github.com/plugsky/opencode-go. Context-aware.
- cargo add opencode. tokio, async-std, sync.
- /v1/openapi.json.
Framework integrations
- LangChain — ChatOpenAI(base_url="https://api.plugsky.com/v1")
- LlamaIndex — OpenAI(base_url=…)
- Vercel AI SDK — openai("…", { baseURL: "https://api.plugsky.com/v1" })
- Haystack — OpenAIGenerator(api_base=…)
- Semantic Kernel — OpenAIChatCompletion(…endpoint=…)
- AutoGen — OpenAIWrapper(base_url=…)
- OpenAI Playground — Custom base URL field. Tested daily.
Example: streaming with back-pressure
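import openai
client = openai.OpenAI(api_key="psk_live_…", base_url="https://api.plugsky.com/v1")
# chat.completions.stream() needs a recent openai release; older 1.x versions
# expose the same helper as client.beta.chat.completions.stream().
with client.chat.completions.stream(
    model="plugsky-pro",
    messages=[{"role":"user","content":"Tell me a 500-word story about Plugsky"}],
) as stream:
    for event in stream:
        # The helper yields typed events; text arrives as "content.delta".
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)
    final = stream.get_final_completion()
print("\n--- usage:", final.usage)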
Official software & installers
First-party desktop, CLI, and self-hosted apps — all open-source under MIT / Apache-2.0, branded Plugsky, and pre-configured to talk to the Plugsky API. Install with one command.
Plugsky CLI
AI coding agent for your terminal. Fork of opencode, rebranded 100% as Plugsky. MIT-licensed, supports all major platforms.
# macOS / Linux
curl -fsSL https://plugsky.com/install | bash
# Windows (PowerShell)
irm https://plugsky.com/install | iex
# With API key
curl -fsSL https://plugsky.com/install | bash -s -- --api-key psk_live_…
plugsky
View on GitHub → · All 58 integrations →
Plugsky Desktop (Jan-based)
Native AI chat client for macOS, Linux, Windows. Built on the open-source Jan project (Apache-2.0). Branded Plugsky, pre-configured with plugsky-pro as the default model.
curl -fsSL https://plugsky.com/install-desktop | bash
plugsky-desktop
Available for macOS (Apple Silicon + Intel), Linux (x64), Windows (x64).
Plugsky Web (Open WebUI-based)
Self-hosted AI chat UI in your browser. Built on the open-source Open WebUI project. Docker or pip install, both backends supported. The plugsky-fusion model is pre-listed.
curl -fsSL https://plugsky.com/install-web | bash
plugsky-web start
# → opens http://localhost:8080
Latest releases (v0.1.0 — 2026-06-23)
curl -fsSL https://plugsky.com/install | bash
curl -fsSL https://plugsky.com/install-desktop | bash
curl -fsSL https://plugsky.com/install-web | bash
All three are MIT-licensed. See NOTICE for full upstream attribution.
Operations
Rate limits & quotas
All plans include unlimited usage within fair-use rate limits — no per-token charges, no per-request charges, no overage fees. The only limit is the per-minute request rate (RPM) for your tier. Increase limits from the Dashboard or by emailing support@plugsky.com.
| Plan | Monthly fee | Fair-use RPM | Concurrent | API keys | Seats |
|---|---|---|---|---|---|
| Trial | $5 / 7 days | 60 | 5 | 1 | 1 |
| Starter | $20 / mo | 60 | 10 | 5 | 1 |
| Builder | $60 / mo | 300 | 50 | 20 | 5 |
| Scale | $120 / mo | 1,000 | 200 | 100 | 25 |
| Enterprise | Annual contract | Custom (10K+) | Custom | Unlimited | Unlimited |
Same flat rate on every model — no separate pricing for plugsky-frontier vs plugsky-micro. All 31 models are included on every paid plan.
Hit a 429? The response includes a Retry-After header. Use the SDKs — they retry with exponential backoff automatically.
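If you call the API without an SDK, the retry behavior can be approximated as below. This is a sketch, not the SDKs' actual policy: the base delay, cap, jitter range, and attempt count are assumptions, and send stands for any hypothetical callable that returns (status, headers, body).

```python
import random
import time
from typing import Callable, Optional

RETRYABLE = {429, 500}  # statuses this page says to retry


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before retrying after 0-based attempt number `attempt`."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-provided delay wins
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # Exponential backoff with jitter so clients do not retry in lockstep.
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.0)


def call_with_retries(send: Callable, max_attempts: int = 5, sleep=time.sleep):
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Passing sleep as a parameter keeps the loop testable without real waiting.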
Retries & idempotency
All POST endpoints accept an Idempotency-Key header. Re-sending the same key returns the cached result for 24 hours. This makes your POSTs safe to retry without double-billing or double-creating resources.
curl -X POST https://api.plugsky.com/v1/chat/completions \
-H "Authorization: Bearer $PLUGSKY_API_KEY" \
-H "Idempotency-Key: $(uuidgen)" \
-H "Content-Type: application/json" \
-d '{"model":"plugsky-pro","messages":[{"role":"user","content":"hello"}]}'
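Note that the curl command generates a new key on every invocation. For retries to hit the cached result, the key has to be created once per logical request and reused on each attempt. A minimal sketch with hypothetical helper names:

```python
import uuid


def new_idempotency_key() -> str:
    """Create one key per logical request (not per HTTP attempt)."""
    return str(uuid.uuid4())


def idempotent_headers(api_key: str, idempotency_key: str) -> dict:
    """Headers for a POST that is safe to re-send with the same key."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key,
        "Content-Type": "application/json",
    }


def headers_for_attempts(api_key: str, attempts: int) -> list:
    """Every retry of the same logical request carries the same key."""
    key = new_idempotency_key()
    return [idempotent_headers(api_key, key) for _ in range(attempts)]
```

With the OpenAI Python SDK, the same header can be attached per call through extra_headers={"Idempotency-Key": key}. Keep the request body identical across attempts, since a reused key with a different body returns 409.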
Errors & status codes
| Code | Meaning | What to do |
|---|---|---|
| 400 | Bad request — malformed JSON, invalid param | Validate locally before sending |
| 401 | Invalid or missing API key | Check Authorization header |
| 403 | Key lacks required scope | Check key role in Dashboard |
| 404 | Model or resource not found | List /v1/models to see what's available |
| 409 | Conflict (duplicate idempotency key with different body) | Generate a fresh key per logical request |
| 429 | Rate limit hit | Honor Retry-After |
| 500 | Internal error | Retry with backoff. Open a ticket if persistent. |
| 503 | Upstream provider down | Smart-routed requests fail over automatically |
Webhooks
Subscribe to 9 event types: batch.completed, fine_tuning.completed, invoice.paid, key.rotated, quota.warning, quota.exceeded, model.deprecated, usage.threshold, audit.alert. HMAC-SHA256 signed. Configure in Dashboard → Webhooks.
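Verifying an HMAC-SHA256 signature on the receiving side typically looks like the sketch below. The encoding is an assumption: it treats the signature as the hex digest of the raw request body, and the header that carries it is not specified here. Check the Dashboard's webhook settings for the actual scheme.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed signature encoding)."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

Verify against the exact bytes received, before any JSON parsing, because re-serializing the body can change whitespace or key order and break the digest.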
Logs & observability
Every request is logged with: timestamp, model, tokens, latency, status, key ID, project ID, region, request ID, optional user tag. Export to Datadog, Splunk, Grafana, New Relic, OpenTelemetry, or your SIEM.
Status & SLAs
Live status: /status. Public incident history. Uptime SLAs:
- Builder / Scale: 99.9% monthly uptime, 10% credit on miss
- Enterprise: 99.95% monthly uptime, 25% credit, 99.99% on multi-region deployments
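To make the uptime percentages concrete, the sketch below converts them into an allowed-downtime budget. It assumes a 30-day month; how the billing month and downtime are actually measured is defined by the SLA terms, not by this arithmetic.

```python
def downtime_budget_minutes(uptime_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in `days` days at the given uptime target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


for target in (99.9, 99.95, 99.99):
    print(f"{target}% -> {downtime_budget_minutes(target):.2f} min per 30 days")
```

Over 30 days that works out to about 43.2 minutes at 99.9%, 21.6 minutes at 99.95%, and 4.32 minutes at 99.99%.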
Deployment topologies
Same API, four deployment models. Pick one, or combine them across teams.
Decision matrix
| You need… | Use |
|---|---|
| Ship in 1 day, no compliance | Plugsky Cloud |
| Data stays in me-central-1 | Plugsky Cloud (region pinned) |
| No data leaves your AWS / Azure / GCP account | VPC deployment |
| SAMA / CBUAE / NSD audit trail | VPC deployment + customer-managed keys |
| Air-gap, no internet | On-prem or air-gapped |
| Resell AI under your brand | White-label |
Security & compliance
Security model
- Encryption in transit: TLS 1.3 only, HSTS, modern ciphers
- Encryption at rest: AES-256-GCM, customer-managed keys available
- Network isolation: per-tenant VPC, security groups, no shared kernel
- Tenant isolation: logical (RBAC + scoped keys) or physical (your own cluster)
- Secret hygiene: keys never logged, never returned in responses, hashed at rest with Argon2id
- Pen tests: quarterly by HackerOne + an external firm. Reports under NDA.
- Bug bounty: up to $25,000. security@plugsky.com
Data residency
Choose per-request or pin globally. Regions: me-central-1 (UAE — default GCC), sa-central-1 (Riyadh — Enterprise), eu-west-1 (Dublin), eu-central-1 (Frankfurt), us-east-1, us-west-2, ap-southeast-1 (Singapore). Data never leaves the pinned region. Deep dive →
Compliance & certifications
- SOC 2 Type II — annually audited, report under NDA
- ISO 27001 — InfoSec management
- ISO 27701 — Privacy management
- ISO 27017 / 27018 — Cloud & PII
- GDPR — EU data protection
- HIPAA — Healthcare (Enterprise + BAA)
- PCI DSS — Card data safety (no inference on card data unless on-prem)
- FedRAMP Moderate — In process, available on Enterprise
- UAE PDPL — Federal Data Protection Law
- DIFC DPL — Dubai International Financial Centre
- SAMA CSF — Saudi Central Bank cyber framework
- NSD — National Security Directive alignment (Enterprise on-prem)
DPA & legal
Standard Contractual Clauses (SCCs) baked into the DPA. Sub-processor list published and updated within 30 days of any change. Read the DPA →
Audit logs
Every key action — creation, rotation, scope change, deletion — is logged with actor, timestamp, IP, and request body hash. Exportable to your SIEM (Splunk, Sentinel, QRadar, Chronicle) via webhook or Kinesis/Firehose.
BYOK / HSM
Bring Your Own Key. Plugsky never sees your key — you import it into our HSM integration (AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault, Thales Luna, AWS CloudHSM). Key rotation, revocation, and audit all yours.
PII handling
Three modes: no-PII (strict filter, PII auto-redacted), detect-only (PII tagged but not modified), passthrough (your responsibility). Default is detect-only for inference, no-PII for embeddings. Run the residency checklist →
Billing
Plans
Flat monthly fee per workspace. All 31 models included on every paid plan — no per-token charges, no per-request charges, no overage fees, no surprise bills. Cancel or downgrade anytime.
| Plan | Monthly fee | What's included | Best for |
|---|---|---|---|
| Trial | $5 / 7 days | All 31 models, 60 RPM, 1 seat, 1 API key, no card | First call, evaluation |
| Starter | $20 / mo | All models up to plugsky-pro, 60 RPM, 1 seat, 5 keys | Solo devs, side projects |
| Builder | $60 / mo | All models up to plugsky-max + vision, 300 RPM, 5 seats, 20 keys | Production teams |
| Scale | $120 / mo | All 31 models including frontier, 1,000 RPM, 25 seats, 100 keys, SSO | High-volume SaaS |
| Enterprise | Annual contract | Unlimited usage, 10K+ RPM, unlimited seats, on-prem, BYOK, 99.99% SLA, DPA, BAA, dedicated engineer | Banks, gov, regulated |
Annual billing saves 20%. Enterprise plans are annual contracts priced to your deployment, security, and volume requirements — enterprise@plugsky.com.
Usage & metering
Unlimited usage on every plan. No per-token charges, no per-request charges, no overage fees. The only limit is the fair-use RPM for your tier (60, 300, 1,000, or 10K+ on Enterprise). Token counts are still returned in every response (usage.prompt_tokens, usage.completion_tokens) for observability — but they don't drive billing.
Invoices & taxes
Monthly billing on the 1st. PDF invoices emailed automatically. VAT-compliant for UAE (5%), KSA (15%), EU (reverse charge), and US (no sales tax on SaaS in most states). Wire transfer, ACH, SEPA, and major credit cards. Annual contracts: pay upfront, save 15%.
Quotas & limits
Hard $ caps at the project level prevent runaway spend. Soft warning alerts at 50%, 80%, 95%. Hard block at 100% (auto-reject 402). You can set overage_behavior=allow with a finance-approved key to allow overage up to 3× the cap with auto-billing.
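The thresholds above can be read as a small state function. This is one interpretation, sketched under the assumption that overage_behavior=allow moves the hard block from 1× to 3× the cap; quota_state and its return labels are hypothetical.

```python
def quota_state(spend: float, cap: float, allow_overage: bool = False) -> str:
    """Classify project spend against its cap using the documented thresholds."""
    limit_multiple = 3 if allow_overage else 1
    if spend >= cap * limit_multiple:
        return "blocked"      # requests are rejected with 402
    if spend >= cap:
        return "overage"      # only reachable when overage is allowed
    for pct in (95, 80, 50):  # highest warning threshold first
        if spend * 100 >= pct * cap:
            return f"warn_{pct}"
    return "ok"
```

For a $100 cap, quota_state(80, 100) returns "warn_80", quota_state(100, 100) returns "blocked", and quota_state(150, 100, allow_overage=True) returns "overage".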
Migration guides
From OpenAI
- Generate a Plugsky key in the Dashboard
- In your code, change base_url to https://api.plugsky.com/v1
- Optionally map gpt-4o → plugsky-pro, gpt-4o-mini → plugsky-lite for cost savings
- Run your existing evals — should pass unchanged
- Switch DNS / cut over when ready
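The optional model mapping can live in one place so call sites stay unchanged. A minimal sketch using only the two mappings listed above; to_plugsky_model and migrate_request are hypothetical helpers, and passing unmapped names through untouched is an assumption.

```python
# The two mappings suggested in the steps above.
MODEL_MAP = {
    "gpt-4o": "plugsky-pro",
    "gpt-4o-mini": "plugsky-lite",
}


def to_plugsky_model(name: str) -> str:
    """Translate an OpenAI model name, leaving unknown names as they are."""
    return MODEL_MAP.get(name, name)


def migrate_request(kwargs: dict) -> dict:
    """Return a copy of chat-completion kwargs with the model name translated."""
    migrated = dict(kwargs)
    if "model" in migrated:
        migrated["model"] = to_plugsky_model(migrated["model"])
    return migrated
```

Wrapping existing calls as client.chat.completions.create(**migrate_request(kwargs)) lets you run evals with and without the mapping by toggling a single function.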
Need a per-language migration walkthrough? Full guide → or generate code for your stack →
From Anthropic
Plugsky exposes the Messages API at /v1/messages with full Anthropic compatibility. Just change base_url and your Claude SDK code works as-is. Map claude-3-5-sonnet → plugsky-pro for 40%+ savings, same quality tier.
From Azure OpenAI
Point your Azure SDK at https://api.plugsky.com/v1 (Azure SDK supports custom endpoints). Models keep their Azure names with the azure/ prefix. Existing content filters and Azure-specific features have Plugsky equivalents — see the full compatibility matrix →
From AWS Bedrock
Use the Bedrock SDK's endpoint_url parameter. The Converse API is supported at /v1/bedrock/converse for drop-in compatibility.
Reference
Glossary
| Term | Definition |
|---|---|
| Token | The atomic unit of billing. ~4 chars in English. ~1.5 chars in Arabic. |
| Context window | Max tokens a model can see in a single call (input + output). |
| Embedding | A fixed-length vector representation of text. Used for semantic search. |
| RAG | Retrieval-Augmented Generation: retrieve relevant docs, stuff into prompt, generate. |
| Function calling | Model returns a structured tool call instead of free text. You execute, return result. |
| Fine-tuning | Continue training a base model on your data. SFT (supervised) or DPO (preference). |
| Distillation | Train a small model to mimic a large one. Cheaper, faster, similar quality. |
| Agent | Model + tools + memory + planning loop. Autonomous multi-step task execution. |
| Vector store | Indexed embeddings for fast similarity search. Plugsky includes one out of the box. |
| BYOK | Bring Your Own Key. You control the encryption keys. We can't read your data. |
| Sovereign | Hosted entirely inside one jurisdiction, with no foreign access. PDPL-compliant. |
Changelog
/changelog — full release history. Subscribe to RSS or the model.deprecated webhook.
Support
- Email: support@plugsky.com (24/5 on Builder+, 24/7 on Scale+)
- Slack Connect: for Enterprise customers
- Office hours: weekly community call, Thu 4pm GST — join →
- Status: /status
- Security issues: security@plugsky.com (PGP key in /.well-known/security.txt)