AILANG × huggingface.co

AI/ML developers and enterprises looking to build and deploy AI applications using various models and providers.

Hugging Face Inference Providers offer a unified API for developers to access hundreds of machine learning models from various providers. It enables building AI applications with tasks like text generation, image creation, and search, reducing vendor lock-in and simplifying model deployment and management.

Inference Providers Machine Learning Models API AI Infrastructure Vendor Lock-in Serverless inference

What AILANG Parse sees on huggingface.co

Structural extraction — the same content an AI agent would consume from this page.

21 headings5 images11 lists1 tables53 links10 code samplesHTML parsing by AILANG Parse

5 sections — page skeleton

1 header 3 navs 1 main

21 headings

Inference Providers Inference Providers Partners Why Choose Inference Providers? Key Features Getting Started

5 images

11 list items

[ Models ](/models) [ Datasets ](/datasets) [ Spaces ](/spaces) [ Buckets new](/storage) [ Docs ](/docs) [ Enterprise ](/enterprise) [Pricing](/pricing) [Log In](/login) [Sign Up](/join) **Text Generation**: Use Large language models with tool-calling capabilities for chatbot… **Image and Video Generation**: Create custom images and videos, including support for Lo… **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and…

Show the full extract — what AILANG Parse pulled from this page

# Inference Providers · Hugging Face


*Header:*
[Image: Hugging Face's logo]

Hugging Face

[Hugging Face](/)

- [ Models ](/models)
- [ Datasets ](/datasets)
- [ Spaces ](/spaces)
- [ Buckets new](/storage)
- [ Docs ](/docs)
- [ Enterprise ](/enterprise)
- [Pricing](/pricing)
- [Log In](/login)
- [Sign Up](/join)

Inference Providers documentation

Inference Providers

# Inference Providers

🏡 View all docsAWS Trainium & InferentiaAccelerateArgillaAutoTrainBitsandbytesCLIChat UIDataset viewerDatasetsDeploying on AWSDiffusersDistilabelEvaluateGoogle CloudGoogle TPUsGradioHubHub Python LibraryHuggingface.jsInference Endpoints (dedicated)Inference ProvidersKernelsLeRobotLeaderboardsLightevalMicrosoft AzureOptimumPEFTReachy MiniSafetensorsSentence TransformersTRLTasksText Embeddings InferenceText Generation InferenceTokenizersTrackioTransformersTransformers.jsXetsmolagentstimm

Search documentation

main

EN

Get Started

[Inference Providers](/docs/inference-providers/en/index)

[Pricing and Billing](/docs/inference-providers/en/pricing)

[Hub integration](/docs/inference-providers/en/hub-integration)

[Security](/docs/inference-providers/en/security)

Guides

[Your First API Call](/docs/inference-providers/en/guides/first-api-call)

[Building Your First AI App](/docs/inference-providers/en/guides/building-first-app)

[Structured Outputs with LLMs](/docs/inference-providers/en/guides/structured-output)

[Function Calling](/docs/inference-providers/en/guides/function-calling)

[Responses API (beta)](/docs/inference-providers/en/guides/responses-api)

[How to use OpenAI gpt-oss](/docs/inference-providers/en/guides/gpt-oss)

[Build an Image Editor](/docs/inference-providers/en/guides/image-editor)

[Automating Code Review with GitHub Actions](/docs/inference-providers/en/guides/github-actions-code-review)

[Agentic Coding Environments with OpenEnv](/docs/inference-providers/en/guides/coding-environment)

[Evaluating Models with Inspect](/docs/inference-providers/en/guides/evaluation-inspect-ai)

Integrations

[Overview](/docs/inference-providers/en/integrations/index)

[Add Your Integration](/docs/inference-providers/en/integrations/adding-integration)

[Claude Code](/docs/inference-providers/en/integrations/claude-code)

[Hermes Agent](/docs/inference-providers/en/integrations/hermes-agent)

[NeMo Data Designer](/docs/inference-providers/en/integrations/datadesigner)

[MacWhisper](/docs/inference-providers/en/integrations/macwhisper)

[OpenCode](/docs/inference-providers/en/integrations/opencode)

[Pi](/docs/inference-providers/en/integrations/pi)

[Vision Agents](/docs/inference-providers/en/integrations/visionagents)

[VS Code with GitHub Copilot](/docs/inference-providers/en/integrations/vscode)

Inference Tasks

[Chat Completion](/docs/inference-providers/en/tasks/chat-completion)

[Feature Extraction](/docs/inference-providers/en/tasks/feature-extraction)

[Text to Image](/docs/inference-providers/en/tasks/text-to-image)

[Text to Video](/docs/inference-providers/en/tasks/text-to-video)

Other Tasks

Providers

[Cerebras](/docs/inference-providers/en/providers/cerebras)

[Cohere](/docs/inference-providers/en/providers/cohere)

[DeepInfra](/docs/inference-providers/en/providers/deepinfra)

[Fal AI](/docs/inference-providers/en/providers/fal-ai)

[Featherless AI](/docs/inference-providers/en/providers/featherless-ai)

[Fireworks](/docs/inference-providers/en/providers/fireworks-ai)

[Groq](/docs/inference-providers/en/providers/groq)

[Hyperbolic](/docs/inference-providers/en/providers/hyperbolic)

[HF Inference](/docs/inference-providers/en/providers/hf-inference)

[Novita](/docs/inference-providers/en/providers/novita)

[Nscale](/docs/inference-providers/en/providers/nscale)

[OVHcloud AI Endpoints](/docs/inference-providers/en/providers/ovhcloud)

[Public AI](/docs/inference-providers/en/providers/publicai)

[Replicate](/docs/inference-providers/en/providers/replicate)

[SambaNova](/docs/inference-providers/en/providers/sambanova)

[Scaleway](/docs/inference-providers/en/providers/scaleway)

[Together](/docs/inference-providers/en/providers/together)

[WaveSpeedAI](/docs/inference-providers/en/providers/wavespeed)

[Z.ai](/docs/inference-providers/en/providers/zai-org)

[Hub API](/docs/inference-providers/en/hub-api)

[Register as an Inference Provider](/docs/inference-providers/en/register-as-a-provider)

[Image: Hugging Face's logo]

Join the Hugging Face community

and get access to the augmented documentation experience

Collaborate on models, datasets and Spaces

Faster examples with accelerated inference

Switch between documentation themes

[Sign Up](/join)

to get started

Copy page

# Inference Providers

[Image]

[Image]

Hugging Face’s Inference Providers give developers access to hundreds of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to explore serverless inference of models on your favorite providers.

## Partners

Our platform integrates with leading AI infrastructure providers, giving you access to their specialized capabilities through a single, consistent API. Here’s what each partner supports:

| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to video | Speech to text |
| --- | --- | --- | --- | --- | --- | --- |
| [Cerebras](./providers/cerebras) | ✅ |  |  |  |  |  |
| [Cohere](./providers/cohere) | ✅ | ✅ |  |  |  |  |
| [DeepInfra](./providers/deepinfra) | ✅ | ✅ |  |  |  |  |
| [Fal AI](./providers/fal-ai) |  |  |  | ✅ | ✅ | ✅ |
| [Featherless AI](./providers/featherless-ai) | ✅ | ✅ |  |  |  |  |
| [Fireworks](./providers/fireworks-ai) | ✅ | ✅ |  |  |  |  |
| [Groq](./providers/groq) | ✅ | ✅ |  |  |  |  |
| [HF Inference](./providers/hf-inference) | ✅ | ✅ | ✅ | ✅ |  | ✅ |
| [Hyperbolic](./providers/hyperbolic) | ✅ | ✅ |  |  |  |  |
| [Novita](./providers/novita) | ✅ | ✅ |  |  | ✅ |  |
| [Nscale](./providers/nscale) | ✅ | ✅ |  | ✅ |  |  |
| [OVHcloud AI Endpoints](./providers/ovhcloud) | ✅ | ✅ |  |  |  |  |
| [Public AI](./providers/publicai) | ✅ |  |  |  |  |  |
| [Replicate](./providers/replicate) |  |  |  | ✅ | ✅ | ✅ |
| [SambaNova](./providers/sambanova) | ✅ |  | ✅ |  |  |  |
| [Scaleway](./providers/scaleway) | ✅ |  | ✅ |  |  |  |
| [Together](./providers/together) | ✅ | ✅ |  | ✅ |  |  |
| [WaveSpeedAI](./providers/wavespeed) |  |  |  | ✅ | ✅ |  |
| [Z.ai](./providers/zai-org) | ✅ | ✅ |  |  |  |  |

## Why Choose Inference Providers?

When you build AI applications, it’s tough to manage multiple provider APIs, comparing model performance, and dealing with varying reliability. Inference Providers solves these challenges by offering:

**Instant Access to Cutting-Edge Models**: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you’ll find them here.

**Zero Vendor Lock-in**: Unlike being tied to a single provider’s model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more — all through one consistent interface.

**Production-Ready Performance**: Built for enterprise workloads with the reliability your applications demand.

Here’s what you can build:

- **Text Generation**: Use Large language models with tool-calling capabilities for chatbots, content generation, and code assistance
- **Image and Video Generation**: Create custom images and videos, including support for LoRAs and style customization
- **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and recommendation engines
- **Traditional ML Tasks**: Ready-to-use models for classification, NER, summarization, and speech recognition

⚡ **Get Started for Free**: Inference Providers includes a generous free tier, with additional credits for [PRO users](https://hf.co/subscribe/pro) and [Team & Enterprise organizations](https://huggingface.co/enterprise).

## Key Features

- **🎯 All-in-One API**: A single API for text generation, image generation, document embeddings, NER, summarization, image classification, and more.
- **🔀 Multi-Provider Support**: Easily run models from top-tier providers like fal, Replicate, Sambanova, Together AI, and others.
- **🚀 Scalable & Reliable**: Built for high availability and low-latency performance in production environments.
- **🔧 Developer-Friendly**: Simple requests, fast responses, and a consistent developer experience across Python and JavaScript clients.
- **👷 Easy to integrate**: Drop-in replacement for the OpenAI chat completions API.
- **💰 Cost-Effective**: No extra markup on provider rates.

## Getting Started

Inference Providers works with your existing development workflow. Whether you prefer Python, JavaScript, or direct HTTP calls, we provide native SDKs and OpenAI-compatible APIs to get you up and running quickly.

We’ll walk through a practical example using [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b), a state-of-the-art open-weights conversational model.

### Inference Playground

Before diving into integration, explore models interactively with our [Inference Playground](https://huggingface.co/playground). Test different [chat completion models](http://huggingface.co/models?inference_provider=all&sort=trending&other=conversational) with your prompts and compare responses to find the perfect fit for your use case.

[Image: Inference Playground thumbnail]

[(link)](https://huggingface.co/playground)

### Authentication

You’ll need a Hugging Face token to authenticate your requests. Create one by visiting your [token settings](https://huggingface.co/settings/tokens/new?ownUserPermissions=inference.serverless.write&tokenType=fineGrained) and generating a `fine-grained` token with `Make calls to Inference Providers` permissions.

For complete token management details, see our [security tokens guide](https://huggingface.co/docs/hub/en/security-tokens).

### Quick Start - LLM

Let’s start with the most common use case: conversational AI using large language models. This section demonstrates how to perform chat completions using DeepSeek V3, showcasing the different ways you can integrate Inference Providers into your applications.

Whether you prefer our native clients, want OpenAI compatibility, or need direct HTTP access, we’ll show you how to get up and running with just a few lines of code.

#### Python

Here are three ways to integrate Inference Providers into your Python applications, from high-level convenience to low-level control:

huggingface_hub

openai

requests

For convenience, the `huggingface_hub` library provides an [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/guides/inference) that automatically handles provider selection and request routing.

In your terminal, install the Hugging Face Hub Python client and log in:

Copied

pip install huggingface_hub
hf auth login # get a read token from hf.co/settings/tokens

You can now use the client with a Python interpreter.

By default, our system automatically selects the fastest available provider for the specified model (equivalent to the `:fastest` policy — highest throughput in tokens per second).

You can change the provider selection policy by appending a policy suffix to the model id: `:cheapest` for the most cost-efficient provider (lowest price per output token), or `:preferred` to follow your preference order in [Inference Provider settings](https://hf.co/settings/inference-providers). For example, `openai/gpt-oss-120b:cheapest`.

You can also select the provider of your choice by appending the provider name to the model id (e.g. `"openai/gpt-oss-120b:sambanova"`).

Copied

import os
from huggingface_hub import InferenceClient

client = InferenceClient()

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {

huggingface.co scored 5/10 on portable. AILANG opportunity is therefore 5/10. Here's where it would land first.

Same module, any LLM — picked at the CLI

Provider selection isn't a code edit — it's a flag on the run command. The exact same compiled .ail file talks to Anthropic, Google, OpenAI, OpenRouter or local Ollama depending on what you pass to `--ai`. Vendor lock-in becomes a shell-history concern.

# Same chat.ail, three vendors — no source change.
ailang run --ai claude-haiku-4-5  chat.ail
ailang run --ai gemini-2.5-flash chat.ail
ailang run --ai gpt-5.1-nano     chat.ail
# std/ai dispatches to each provider's native API.

→ AILANG docs

Structured output, portable across providers

callJson(prompt, schema) maps to each provider's native structured-output primitive — responseSchema for Gemini, response_format for OpenAI, forced-tool for Anthropic. Your schema, their plumbing.

let result = callJson(prompt, intentSchema);
-- same AILANG code, four different provider paths underneath.

→ AILANG docs

OpenRouter routing with replayable resolution

Reach SOTA open-source models through OpenRouter; the resolved model ID is logged so the eval is replayable months later, even if the upstream router has moved on.

call(prompt, model = "openrouter/meta-llama/llama-4-405b");
-- the eval harness pins the exact resolved model ID.

→ AILANG docs

How this page was made

func sketchSite(url: string<pii>, topic: Topic) -> Sketch ! {Net @limit=1, AI @limit=5, FS @limit=4, Process, Declassify}

Read your site once. No second fetch, no crawl.

Up to five AI calls, capped. No runaway analysis.

Four file writes. The report, the topic index, the rubric breakdown, the queue ack. Nothing else touched.

Process invokes docparse for structured extraction. The capability is visible in the type.

Your URL crossed Declassify. No PII survives into this page.

Signal	Topic	Result	Points	AILANG primitive
agent.json referenced	agent-ready	✗	0/1	ailang serve-api generates A2A agent cards automatically — bonus if you're an early adopter
openapi.json referenced	agent-ready	✗	0/2	ailang serve-api generates OpenAPI 3.1 from Hindley-Milner type signatures
MCP endpoint referenced	agent-ready	✗	0/2	ailang serve-api --mcp-http exposes typed functions as MCP tools
Public API docs linked	agent-ready	✓	2/2	ailang serve-api hosts Swagger + ReDoc at /api/_meta/ by default
Webhooks documented	agent-ready	✗	0/2	ailang serve-api handles webhooks as typed handler functions with effect-tracked side effects
Rate limits documented	agent-ready	✗	0/2	Capability budgets — Net @limit=N is the symmetric server-side primitive for what agents see as rate limits
Streaming / SSE endpoint	agent-ready	✗	0/2	std/stream — ssePost and Stream effect handle event-source endpoints with typed event types
Sandbox / test environment offered	agent-ready	✗	0/2	ailang --ai-stub plus mock effect handlers — deterministic, capability-scoped fakes for any effect, including Net and AI
Authentication documented	agent-ready	✗	0/2	std/jwt for verification, IFC labels (string / string) to keep credentials out of public sinks at the type level
Idempotency keys documented	agent-ready	✗	0/2	Pure functions are idempotent by construction; requires/ensures contracts express idempotence as a static guarantee
AG-UI streaming protocol	agent-ready	✗	0/1	std/stream — the AG-UI event lifecycle (RUN_STARTED → TEXT_MESSAGE_CONTENT → TOOL_CALL_RESULT → RUN_FINISHED) is a textbook sum type. ADTs + exhaustive pattern matching make every event-type branch a compile error to skip.
HTTP 402 agent payments (x402 / pay-per-crawl)	agent-ready	✗	0/1	Net @endpoint-scoped capability budgets bound payment destinations; requires { amount <= budget } gates the payload; IFC labels keep the signed payment key out of public sinks. Same primitives cover x402 payload signing and Cloudflare's crawler-price negotiation.
AP2 Agent Payments Protocol	agent-ready	✗	0/1	Mandates ARE contracts. requires { intent.price <= mandate.maxPrice } + ensures { cart.total <= intent.price } is a one-to-one translation of an Intent/Cart Mandate into AILANG. Z3 can verify the bounds at compile time.
UTCP tool-calling protocol	agent-ready	✗	0/1	Typed function signatures are the manifest. ailang serve-api emits the same metadata as a UTCPManual (name, input/output schema, native endpoint) — direct-call discovery without a proxy server.
End-to-end encryption documented	privacy	✗	0/2	IFC labels (string) force decryption to flow through a typed boundary; the compiler refuses to publish sealed values without explicit declassification
Compliance certifications cited	privacy	✗	0/2	requires/ensures contracts express machine-verifiable claims; capability budgets bound audit-trail effects; effect rows leave nothing un-declared
Data minimisation language	privacy	✗	0/2	Capability scoping — each Net call declares its endpoint in the effect row, so "doesn't sell" becomes a type-system-enforceable claim, not a marketing one
Third-party domains restrained	privacy	✗	0/2	Capability scoping — each Net call declares its endpoint in the effect row
Data residency / on-prem language	privacy	✗	0/2	Three-runtime deploy — same module runs in WASM (browser), Cloud Run, and native CLI
Single-vendor LLM language	portable	○	2/2	std/ai multi-provider — switch from Anthropic to Gemini to OpenAI without rewriting
Multiple AI providers cited	portable	✓	2/2	std/ai — one Step API across Anthropic, OpenAI, Gemini, OpenRouter, Ollama, and custom-package providers
Cross-runtime / deployment portability	portable	✗	0/2	Effect handlers as runtime adapters — same .ail runs as WASM in the browser, a Cloud Run container, and a native CLI; only the handlers change
BYO key / model-agnostic	portable	✗	0/2	AILANG WASM — the full interpreter ships as a browser bundle, so caller-held keys (BYOK), offline apps, and embedded demos all work client-side

The rubric that scored this page is open-source AILANG code — every signal extractor is contract-verified. View rubric →

AILANG×huggingface.co

huggingface.co scored 5/10 on portable.

AI/ML developers and enterprises looking to build and deploy AI applications using various models and providers.

What AILANG Parse sees on huggingface.co

Same module, any LLM — picked at the CLI

Structured output, portable across providers

OpenRouter routing with replayable resolution

How this page was made