OpenAI Passthrough

Pass-through endpoints for /openai

Overview

Feature           Supported   Notes
Cost Tracking     No          Not supported
Logging           Yes         Works across all integrations
Streaming         Yes         Fully supported
Load Balancing    Yes         When using router models

When to use this?

  • For 90% of your use cases, you should use the native LiteLLM OpenAI Integration (/chat/completions, /embeddings, /completions, /images, /batches, etc.)
  • Use this passthrough to call less popular or newer OpenAI endpoints that LiteLLM doesn't fully support yet, such as /assistants, /threads, /vector_stores

Simply replace https://api.openai.com with LITELLM_PROXY_BASE_URL/openai
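
For example, creating a vector store through the passthrough only requires changing the client's base URL. A minimal sketch, assuming the proxy runs at http://0.0.0.0:4000 and sk-1234 is your LiteLLM proxy key (on older openai SDK versions the vector stores client lives under client.beta.vector_stores instead of client.vector_stores):

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",  # instead of https://api.openai.com
    api_key="sk-1234"  # your litellm proxy api key
)

# /vector_stores is forwarded to OpenAI via the passthrough
vector_store = client.vector_stores.create(name="support-docs")
print(vector_store.id)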

Quick Start

1. Setup config.yaml

model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_1  # reads from environment

  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_2  # reads from environment

Set OpenAI API keys in your environment:

export OPENAI_API_KEY_1="your-first-api-key"
export OPENAI_API_KEY_2="your-second-api-key"

2. Start Proxy

litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000

3. Use OpenAI SDK

Replace https://api.openai.com with your proxy URL:

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"  # your litellm proxy api key
)

# Use any OpenAI endpoint
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor.",
    model="gpt-4"  # uses router model from config.yaml
)

Load Balancing

Define multiple deployments with the same model_name for automatic load balancing:

model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_1

  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_2

The proxy automatically distributes requests across both API keys.
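
For example, each of the requests below targets the router model gpt-4; the proxy picks one of the two deployments, and therefore one of the two API keys, per request. A minimal sketch, assuming the proxy URL and key from the Quick Start:

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"
)

# Repeated requests to the same router model are spread across deployments
for i in range(4):
    assistant = client.beta.assistants.create(
        name=f"Math Tutor {i}",
        instructions="You are a math tutor.",
        model="gpt-4",  # router model defined twice in config.yaml
    )
    print(assistant.id)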

Specifying Router Models

For endpoints that don't have a model field in the request body (e.g., /files/delete), specify the router model using:

Option 1: Request Header

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234",
    default_headers={"X-LiteLLM-Target-Model": "gpt-4"}
)

# Delete a file using the specified router model
client.files.delete(file_id="file-abc123")

Option 2: Request Body

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"
)

# Upload file with target model
file = client.files.create(
    file=open("data.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": "gpt-4"}
)

# Or with a list (first model will be used)
file = client.files.create(
    file=open("data.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": ["gpt-4", "gpt-3.5-turbo"]}
)

Usage Examples

Assistants API

Create OpenAI Client

Make sure you do the following:

  • Point base_url to your LITELLM_PROXY_BASE_URL/openai
  • Use your LITELLM_API_KEY as the api_key

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",  # <your-proxy-url>/openai
    api_key="sk-anything"  # <your-proxy-api-key>
)

Create an Assistant

# Create an assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Help solve equations.",
    model="gpt-4o",
)

Create a Thread

# Create a thread
thread = client.beta.threads.create()

Add a Message to the Thread

# Add a message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Solve 3x + 11 = 14",
)

Run the Assistant

# Create a run to get the assistant's response
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Check run status
run_status = client.beta.threads.runs.retrieve(
    thread_id=thread.id,
    run_id=run.id
)
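
Runs complete asynchronously, so in practice you poll until the run leaves the queued/in_progress states before reading messages. A minimal sketch (the one-second interval is an arbitrary choice):

import time

# Poll until the run reaches a terminal status (e.g. "completed" or "failed")
while run_status.status in ("queued", "in_progress"):
    time.sleep(1)
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

print(run_status.status)  # e.g. "completed"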

Retrieve Messages

# List messages after the run completes
messages = client.beta.threads.messages.list(
    thread_id=thread.id
)
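
To read the assistant's reply, you can walk the returned messages, which are newest-first by default. This sketch assumes the reply's first content block is text:

# Print the assistant's most recent reply
for message in messages.data:
    if message.role == "assistant":
        print(message.content[0].text.value)
        break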

Delete the Assistant

# Delete the assistant when done
client.beta.assistants.delete(assistant.id)
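
If you no longer need the thread either, it can be cleaned up the same way:

# Delete the thread as well
client.beta.threads.delete(thread.id)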