📄️ OpenAI
LiteLLM supports OpenAI Chat + Embedding calls.
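Both call types follow LiteLLM's OpenAI-format interface. A minimal sketch (model names are examples; a live `OPENAI_API_KEY` is assumed):

```python
def openai_chat(prompt: str) -> str:
    """One chat completion via LiteLLM; requires OPENAI_API_KEY in the env."""
    from litellm import completion  # pip install litellm
    response = completion(
        model="gpt-4o-mini",  # any OpenAI chat model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def openai_embed(text: str) -> list:
    """One embedding via LiteLLM's OpenAI-format embedding() call."""
    from litellm import embedding  # pip install litellm
    response = embedding(model="text-embedding-3-small", input=[text])
    return response.data[0]["embedding"]


# Both need a live key, e.g.: print(openai_chat("Hello!"))
```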
📄️ OpenAI (Text Completion)
LiteLLM supports OpenAI text completion models
📄️ OpenAI-Compatible Endpoints
To call models hosted behind an OpenAI-compatible proxy, make two changes:
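The two changes are small enough to sketch inline: prefix the model name with `openai/` and point `api_base` at your server (the URL and model name below are placeholders, not real endpoints):

```python
def call_openai_compatible(prompt: str,
                           api_base: str = "http://localhost:8000/v1"):
    """Call a self-hosted OpenAI-compatible server through LiteLLM.

    The two changes vs. a plain OpenAI call:
      1) prefix the model name with "openai/" so LiteLLM routes it
         through the OpenAI-compatible code path
      2) set api_base to your server's URL (placeholder above)
    """
    from litellm import completion  # pip install litellm
    return completion(
        model="openai/my-model",  # "my-model" is a placeholder
        api_base=api_base,
        messages=[{"role": "user", "content": prompt}],
    )


# Needs a running server: call_openai_compatible("ping")
```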
📄️ Azure OpenAI
API Keys, Params
📄️ Azure AI Studio
LiteLLM supports all models on Azure AI Studio
📄️ VertexAI [Anthropic, Gemini, Model Garden]
vertex_ai/ route
📄️ Gemini - Google AI Studio
Pre-requisites
📄️ Anthropic
LiteLLM supports all Anthropic models.
📄️ AWS Sagemaker
LiteLLM supports all SageMaker Hugging Face JumpStart models
📄️ AWS Bedrock
All Bedrock models (Anthropic, Meta, Mistral, Amazon, etc.) are supported
📄️ LiteLLM Proxy (LLM Gateway)
LiteLLM provides a self-hosted proxy server (AI Gateway) for calling all supported LLMs in the OpenAI format
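Because the gateway speaks the OpenAI format, any OpenAI-style client can talk to it over plain HTTP. A sketch, assuming a proxy already running locally (the `0.0.0.0:4000` address and model name are assumptions):

```python
import json
import urllib.request


def call_gateway(prompt: str, base_url: str = "http://0.0.0.0:4000") -> dict:
    """POST an OpenAI-format chat request to a running LiteLLM proxy.

    base_url and port are assumptions; start the proxy first, e.g.
    `litellm --model gpt-4o-mini`.
    """
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Needs a running proxy: call_gateway("Hello from the gateway")
```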
📄️ Mistral AI API
https://docs.mistral.ai/api/
📄️ Codestral API [Mistral AI]
Codestral is available in select code-completion plugins but can also be queried directly. See the documentation for more details.
📄️ Cohere
API KEYS
📄️ Anyscale
https://app.endpoints.anyscale.com/
📄️ Huggingface
LiteLLM supports the following types of Hugging Face models:
📄️ 🆕 Databricks
LiteLLM supports all models on Databricks
📄️ IBM watsonx.ai
LiteLLM supports all IBM watsonx.ai foundational models and embeddings.
📄️ Predibase
LiteLLM supports all models on Predibase
📄️ Nvidia NIM
https://docs.api.nvidia.com/nim/reference/
📄️ Cerebras
https://inference-docs.cerebras.ai/api-reference/chat-completions
📄️ Volcano Engine (Volcengine)
https://www.volcengine.com/docs/82379/1263482
📄️ Triton Inference Server
LiteLLM supports embedding models on Triton Inference Server
📄️ Ollama
LiteLLM supports all models from Ollama
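Local Ollama models are addressed with the `ollama/` model prefix. A sketch, assuming Ollama's default local URL and an already-pulled model:

```python
OLLAMA_BASE = "http://localhost:11434"  # Ollama's default local URL (assumption)


def ask_ollama(prompt: str, model: str = "ollama/llama3") -> str:
    """Chat with a local Ollama model via LiteLLM's "ollama/" prefix."""
    from litellm import completion  # pip install litellm
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        api_base=OLLAMA_BASE,
    )
    return response.choices[0].message.content


# Needs `ollama serve` running locally: print(ask_ollama("Why is the sky blue?"))
```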
📄️ Perplexity AI (pplx-api)
https://www.perplexity.ai
📄️ FriendliAI
https://suite.friendli.ai/
📄️ Groq
https://groq.com/
📄️ 🆕 Github
https://github.com/marketplace/models
📄️ Deepseek
https://deepseek.com/
📄️ Fireworks AI
https://fireworks.ai/
📄️ Clarifai
Anthropic, OpenAI, Mistral, Llama, and Gemini LLMs are supported on Clarifai.
📄️ VLLM
LiteLLM supports all models on vLLM.
📄️ Xinference [Xorbits Inference]
https://inference.readthedocs.io/en/latest/index.html
📄️ Cloudflare Workers AI
https://developers.cloudflare.com/workers-ai/models/text-generation/
📄️ DeepInfra
https://deepinfra.com/
📄️ AI21
LiteLLM supports the following AI21 models:
📄️ NLP Cloud
LiteLLM supports all LLMs on NLP Cloud.
📄️ Replicate
LiteLLM supports all models on Replicate
📄️ Together AI
LiteLLM supports all models on Together AI.
📄️ Voyage AI
https://docs.voyageai.com/embeddings/
📄️ Jina AI
https://jina.ai/embeddings/
📄️ Aleph Alpha
LiteLLM supports all models from Aleph Alpha.
📄️ Baseten
LiteLLM supports any Text Generation Inference (TGI) models on Baseten.
📄️ OpenRouter
LiteLLM supports all the text / chat / vision models from OpenRouter
📄️ PaLM API - Google
Warning: The PaLM API is being decommissioned by Google, with shutdown scheduled for October 2024. Please migrate to the Gemini API or Vertex AI API.
📄️ Sambanova
https://community.sambanova.ai/t/create-chat-completion-api/
📄️ Custom API Server (Custom Format)
Call your custom torch-serve / internal LLM APIs via LiteLLM
📄️ Petals
https://github.com/bigscience-workshop/petals