← LinkedIn demo

AILANG×huggingface.co

AI portable generated 2026-05-14
agent-ready privacy portable

huggingface.co scored 5/10 on portable.

The radar shows AILANG-readiness across three commercial concerns. High means huggingface.co is already strong there; low means AILANG could meaningfully help.

Why portable scored 5/10
  • Page copy that names one specific LLM provider (e.g. "powered by Claude") without portability claims.
  • Body mentions two or more named AI providers (Claude, GPT, Gemini, Mistral, Llama, etc.) — already vendor-multi.
  • Body mentions self-hosted, on-prem, WASM, Docker, Kubernetes, or "deploy anywhere" — runtime portability claimed.
  • Body mentions "bring your own key", "BYOK", "any LLM", or "model-agnostic" — caller controls the model.

Full breakdown ↓ · View rubric ↗

AI/ML developers and enterprises looking to build and deploy AI applications using various models and providers.

Hugging Face Inference Providers offer a unified API for developers to access hundreds of machine learning models from various providers. It enables building AI applications with tasks like text generation, image creation, and search, reducing vendor lock-in and simplifying model deployment and management.
Inference Providers Machine Learning Models API AI Infrastructure Vendor Lock-in Serverless inference

What AILANG Parse sees on huggingface.co

Structural extraction — the same content an AI agent would consume from this page.

21 headings5 images11 lists1 tables53 links10 code samplesHTML parsing by AILANG Parse

5 sections — page skeleton

1 header 3 navs 1 main

21 headings

Inference Providers Inference Providers Partners Why Choose Inference Providers? Key Features Getting Started

5 images

Hugging Face's logoHugging Face's logoInference Playground thumbnail

11 list items

[ Models ](/models) [ Datasets ](/datasets) [ Spaces ](/spaces) [ Buckets new](/storage) [ Docs ](/docs) [ Enterprise ](/enterprise) [Pricing](/pricing) [Log In](/login) [Sign Up](/join) **Text Generation**: Use Large language models with tool-calling capabilities for chatbot… **Image and Video Generation**: Create custom images and videos, including support for Lo… **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and…
Show the full extract — what AILANG Parse pulled from this page
# Inference Providers · Hugging Face


*Header:*
[Image: Hugging Face's logo]

Hugging Face

[Hugging Face](/)

- [ Models ](/models)
- [ Datasets ](/datasets)
- [ Spaces ](/spaces)
- [ Buckets new](/storage)
- [ Docs ](/docs)
- [ Enterprise ](/enterprise)
- [Pricing](/pricing)
- [Log In](/login)
- [Sign Up](/join)

Inference Providers documentation

Inference Providers

# Inference Providers

🏡 View all docsAWS Trainium & InferentiaAccelerateArgillaAutoTrainBitsandbytesCLIChat UIDataset viewerDatasetsDeploying on AWSDiffusersDistilabelEvaluateGoogle CloudGoogle TPUsGradioHubHub Python LibraryHuggingface.jsInference Endpoints (dedicated)Inference ProvidersKernelsLeRobotLeaderboardsLightevalMicrosoft AzureOptimumPEFTReachy MiniSafetensorsSentence TransformersTRLTasksText Embeddings InferenceText Generation InferenceTokenizersTrackioTransformersTransformers.jsXetsmolagentstimm

Search documentation

main

EN

Get Started

[Inference Providers](/docs/inference-providers/en/index)

[Pricing and Billing](/docs/inference-providers/en/pricing)

[Hub integration](/docs/inference-providers/en/hub-integration)

[Security](/docs/inference-providers/en/security)

Guides

[Your First API Call](/docs/inference-providers/en/guides/first-api-call)

[Building Your First AI App](/docs/inference-providers/en/guides/building-first-app)

[Structured Outputs with LLMs](/docs/inference-providers/en/guides/structured-output)

[Function Calling](/docs/inference-providers/en/guides/function-calling)

[Responses API (beta)](/docs/inference-providers/en/guides/responses-api)

[How to use OpenAI gpt-oss](/docs/inference-providers/en/guides/gpt-oss)

[Build an Image Editor](/docs/inference-providers/en/guides/image-editor)

[Automating Code Review with GitHub Actions](/docs/inference-providers/en/guides/github-actions-code-review)

[Agentic Coding Environments with OpenEnv](/docs/inference-providers/en/guides/coding-environment)

[Evaluating Models with Inspect](/docs/inference-providers/en/guides/evaluation-inspect-ai)

Integrations

[Overview](/docs/inference-providers/en/integrations/index)

[Add Your Integration](/docs/inference-providers/en/integrations/adding-integration)

[Claude Code](/docs/inference-providers/en/integrations/claude-code)

[Hermes Agent](/docs/inference-providers/en/integrations/hermes-agent)

[NeMo Data Designer](/docs/inference-providers/en/integrations/datadesigner)

[MacWhisper](/docs/inference-providers/en/integrations/macwhisper)

[OpenCode](/docs/inference-providers/en/integrations/opencode)

[Pi](/docs/inference-providers/en/integrations/pi)

[Vision Agents](/docs/inference-providers/en/integrations/visionagents)

[VS Code with GitHub Copilot](/docs/inference-providers/en/integrations/vscode)

Inference Tasks

[Chat Completion](/docs/inference-providers/en/tasks/chat-completion)

[Feature Extraction](/docs/inference-providers/en/tasks/feature-extraction)

[Text to Image](/docs/inference-providers/en/tasks/text-to-image)

[Text to Video](/docs/inference-providers/en/tasks/text-to-video)

Other Tasks

Providers

[Cerebras](/docs/inference-providers/en/providers/cerebras)

[Cohere](/docs/inference-providers/en/providers/cohere)

[DeepInfra](/docs/inference-providers/en/providers/deepinfra)

[Fal AI](/docs/inference-providers/en/providers/fal-ai)

[Featherless AI](/docs/inference-providers/en/providers/featherless-ai)

[Fireworks](/docs/inference-providers/en/providers/fireworks-ai)

[Groq](/docs/inference-providers/en/providers/groq)

[Hyperbolic](/docs/inference-providers/en/providers/hyperbolic)

[HF Inference](/docs/inference-providers/en/providers/hf-inference)

[Novita](/docs/inference-providers/en/providers/novita)

[Nscale](/docs/inference-providers/en/providers/nscale)

[OVHcloud AI Endpoints](/docs/inference-providers/en/providers/ovhcloud)

[Public AI](/docs/inference-providers/en/providers/publicai)

[Replicate](/docs/inference-providers/en/providers/replicate)

[SambaNova](/docs/inference-providers/en/providers/sambanova)

[Scaleway](/docs/inference-providers/en/providers/scaleway)

[Together](/docs/inference-providers/en/providers/together)

[WaveSpeedAI](/docs/inference-providers/en/providers/wavespeed)

[Z.ai](/docs/inference-providers/en/providers/zai-org)

[Hub API](/docs/inference-providers/en/hub-api)

[Register as an Inference Provider](/docs/inference-providers/en/register-as-a-provider)

[Image: Hugging Face's logo]

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

[Sign Up](/join)

to get started

Copy page

# Inference Providers

[Image]

[Image]

Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models on your favorite providers.

## Partners

Our platform integrates with leading AI infrastructure providers, giving you access to their specialized capabilities through a single, consistent API. Here’s what each partner supports:

| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video | Speech to text |
| --- | --- | --- | --- | --- | --- | --- |
| [Cerebras](./providers/cerebras) | ✅ |  |  |  |  |  |
| [Cohere](./providers/cohere) | ✅ | ✅ |  |  |  |  |
| [DeepInfra](./providers/deepinfra) | ✅ | ✅ |  |  |  |  |
| [Fal AI](./providers/fal-ai) |  |  |  | ✅ | ✅ | ✅ |
| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ |  |  |  |  |
| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ |  |  |  |  |
| [Groq](./providers/groq) | ✅ | ✅ |  |  |  |  |
| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ |  | ✅ |
| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ |  |  |  |  |
| [Novita](./providers/novita) | ✅ | ✅ |  |  | ✅ |  |
| [Nscale](./providers/nscale) | ✅ | ✅ |  | ✅ |  |  |
| [OVHcloud AI Endpoints](./providers/ovhcloud) | ✅ | ✅ |  |  |  |  |
| [Public AI](./providers/publicai) | ✅ |  |  |  |  |  |
| [Replicate](./providers/replicate) |  |  |  | ✅ | ✅ | ✅ |
| [SambaNova](./providers/sambanova) | ✅ |  | ✅ |  |  |  |
| [Scaleway](./providers/scaleway) | ✅ |  | ✅ |  |  |  |
| [Together](./providers/together) | ✅ | ✅ |  | ✅ |  |  |
| [WaveSpeedAI](./providers/wavespeed) |  |  |  | ✅ | ✅ |  |
| [Z.ai](./providers/zai-org) | ✅ | ✅ |  |  |  |  |

## Why Choose Inference Providers?

When you build AI applications, it’s tough to manage multiple provider APIs, comparing model performance, and dealing with varying reliability. Inference Providers solves these challenges by offering:

**Instant Access to Cutting-Edge Models**: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you’ll find them here.

**Zero Vendor Lock-in**: Unlike being tied to a single provider’s model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more — all through one consistent interface.

**Production-Ready Performance**: Built for enterprise workloads with the reliability your applications demand.

Here’s what you can build:

- **Text Generation**: Use Large language models with tool-calling capabilities for chatbots, content generation, and code assistance
- **Image and Video Generation**: Create custom images and videos, including support for LoRAs and style customization
- **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and recommendation engines
- **Traditional ML Tasks**: Ready-to-use models for classification, NER, summarization, and speech recognition

⚡ **Get Started for Free**: Inference Providers includes a generous free tier, with additional credits for [PRO users](https://hf.co/subscribe/pro) and [Team & Enterprise organizations](https://huggingface.co/enterprise).

## Key Features

- **🎯 All-in-One API**: A single API for text generation, image generation, document embeddings, NER, summarization, image classification, and more.
- **🔀 Multi-Provider Support**: Easily run models from top-tier providers like fal, Replicate, Sambanova, Together AI, and others.
- **🚀 Scalable & Reliable**: Built for high availability and low-latency performance in production environments.
- **🔧 Developer-Friendly**: Simple requests, fast responses, and a consistent developer experience across Python and JavaScript clients.
- **👷 Easy to integrate**: Drop-in replacement for the OpenAI chat completions API.
- **💰 Cost-Effective**: No extra markup on provider rates.

## Getting Started

Inference Providers works with your existing development workflow. Whether you prefer Python, JavaScript, or direct HTTP calls, we provide native SDKs and OpenAI-compatible APIs to get you up and running quickly.

We’ll walk through a practical example using [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), a state-of-the-art open-weights conversational model.

### Inference Playground

Before diving into integration, explore models interactively with our [Inference Playground](https://huggingface.co/playground). Test different [chat completion models](http://huggingface.co/models?inference_provider=all&sort=trending&other=conversational) with your prompts and compare responses to find the perfect fit for your use case.

[Image: Inference Playground thumbnail]

[(link)](https://huggingface.co/playground)

### Authentication

You’ll need a Hugging Face token to authenticate your requests. Create one by visiting your [token settings](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained) and generating a `fine-grained` token with `Make calls to Inference Providers` permissions.

For complete token management details, see our [security tokens guide](https://huggingface.co/docs/hub/en/security-tokens).

### Quick Start - LLM

Let’s start with the most common use case: conversational AI using large language models. This section demonstrates how to perform chat completions using DeepSeek V3, showcasing the different ways you can integrate Inference Providers into your applications.

Whether you prefer our native clients, want OpenAI compatibility, or need direct HTTP access, we’ll show you how to get up and running with just a few lines of code.

#### Python

Here are three ways to integrate Inference Providers into your Python applications, from high-level convenience to low-level control:

huggingface_hub

openai

requests

For convenience, the `huggingface_hub` library provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that automatically handles provider selection and request routing.

In your terminal, install the Hugging Face Hub Python client and log in:

Copied

pip install huggingface_hub
hf auth login # get a read token from hf.co/settings/tokens

You can now use the client with a Python interpreter.

By default, our system automatically selects the fastest available provider for the specified model (equivalent to the `:fastest` policy — highest throughput in tokens per second).

You can change the provider selection policy by appending a policy suffix to the model id: `:cheapest` for the most cost-efficient provider (lowest price per output token), or `:preferred` to follow your preference order in [Inference Provider settings](https://hf.co/settings/inference-providers). For example, `openai/gpt-oss-120b:cheapest`.

You can also select the provider of your choice by appending the provider name to the model id (e.g. `"openai/gpt-oss-120b:sambanova"`).

Copied

import os
from huggingface_hub import InferenceClient

client = InferenceClient()

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {
 
Screenshot of huggingface.co

Couldn't render a preview for this site. Open the URL in a new tab ↗

Screenshot via thum.io

huggingface.co scored 5/10 on portable. AILANG opportunity is therefore 5/10. Here's where it would land first.

Same module, any LLM — picked at the CLI

Provider selection isn't a code edit — it's a flag on the run command. The exact same compiled .ail file talks to Anthropic, Google, OpenAI, OpenRouter or local Ollama depending on what you pass to `--ai`. Vendor lock-in becomes a shell-history concern.

# Same chat.ail, three vendors — no source change.
ailang run --ai claude-haiku-4-5  chat.ail
ailang run --ai gemini-2.5-flash chat.ail
ailang run --ai gpt-5.1-nano     chat.ail
# std/ai dispatches to each provider's native API.
→ AILANG docs

Structured output, portable across providers

callJson(prompt, schema) maps to each provider's native structured-output primitive — responseSchema for Gemini, response_format for OpenAI, forced-tool for Anthropic. Your schema, their plumbing.

let result = callJson(prompt, intentSchema);
-- same AILANG code, four different provider paths underneath.
→ AILANG docs

OpenRouter routing with replayable resolution

Reach SOTA open-source models through OpenRouter; the resolved model ID is logged so the eval is replayable months later, even if the upstream router has moved on.

call(prompt, model = "openrouter/meta-llama/llama-4-405b");
-- the eval harness pins the exact resolved model ID.
→ AILANG docs

How this page was made

func sketchSite(url: string<pii>, topic: Topic) -> Sketch
  ! {Net @limit=1, AI @limit=5, FS @limit=4, Process, Declassify}
SignalTopicResultPointsAILANG primitive
agent.json referencedagent-ready0/1ailang serve-api generates A2A agent cards automatically — bonus if you're an early adopter
openapi.json referencedagent-ready0/2ailang serve-api generates OpenAPI 3.1 from Hindley-Milner type signatures
MCP endpoint referencedagent-ready0/2ailang serve-api --mcp-http exposes typed functions as MCP tools
Public API docs linkedagent-ready2/2ailang serve-api hosts Swagger + ReDoc at /api/_meta/ by default
Webhooks documentedagent-ready0/2ailang serve-api handles webhooks as typed handler functions with effect-tracked side effects
Rate limits documentedagent-ready0/2Capability budgets — Net @limit=N is the symmetric server-side primitive for what agents see as rate limits
Streaming / SSE endpointagent-ready0/2std/stream — ssePost and Stream effect handle event-source endpoints with typed event types
Sandbox / test environment offeredagent-ready0/2ailang --ai-stub plus mock effect handlers — deterministic, capability-scoped fakes for any effect, including Net and AI
Authentication documentedagent-ready0/2std/jwt for verification, IFC labels (string / string) to keep credentials out of public sinks at the type level
Idempotency keys documentedagent-ready0/2Pure functions are idempotent by construction; requires/ensures contracts express idempotence as a static guarantee
AG-UI streaming protocolagent-ready0/1std/stream — the AG-UI event lifecycle (RUN_STARTED → TEXT_MESSAGE_CONTENT → TOOL_CALL_RESULT → RUN_FINISHED) is a textbook sum type. ADTs + exhaustive pattern matching make every event-type branch a compile error to skip.
HTTP 402 agent payments (x402 / pay-per-crawl)agent-ready0/1Net @endpoint-scoped capability budgets bound payment destinations; requires { amount <= budget } gates the payload; IFC labels keep the signed payment key out of public sinks. Same primitives cover x402 payload signing and Cloudflare's crawler-price negotiation.
AP2 Agent Payments Protocolagent-ready0/1Mandates ARE contracts. requires { intent.price <= mandate.maxPrice } + ensures { cart.total <= intent.price } is a one-to-one translation of an Intent/Cart Mandate into AILANG. Z3 can verify the bounds at compile time.
UTCP tool-calling protocolagent-ready0/1Typed function signatures are the manifest. ailang serve-api emits the same metadata as a UTCPManual (name, input/output schema, native endpoint) — direct-call discovery without a proxy server.
End-to-end encryption documentedprivacy0/2IFC labels (string) force decryption to flow through a typed boundary; the compiler refuses to publish sealed values without explicit declassification
Compliance certifications citedprivacy0/2requires/ensures contracts express machine-verifiable claims; capability budgets bound audit-trail effects; effect rows leave nothing un-declared
Data minimisation languageprivacy0/2Capability scoping — each Net call declares its endpoint in the effect row, so "doesn't sell" becomes a type-system-enforceable claim, not a marketing one
Third-party domains restrainedprivacy0/2Capability scoping — each Net call declares its endpoint in the effect row
Data residency / on-prem languageprivacy0/2Three-runtime deploy — same module runs in WASM (browser), Cloud Run, and native CLI
Single-vendor LLM languageportable2/2std/ai multi-provider — switch from Anthropic to Gemini to OpenAI without rewriting
Multiple AI providers citedportable2/2std/ai — one Step API across Anthropic, OpenAI, Gemini, OpenRouter, Ollama, and custom-package providers
Cross-runtime / deployment portabilityportable0/2Effect handlers as runtime adapters — same .ail runs as WASM in the browser, a Cloud Run container, and a native CLI; only the handlers change
BYO key / model-agnosticportable0/2AILANG WASM — the full interpreter ships as a browser bundle, so caller-held keys (BYOK), offline apps, and embedded demos all work client-side