OpenAI Passthrough

Pass-through endpoints for /openai

Overview

Feature           Supported   Notes
Cost Tracking     No          Not supported
Logging           Yes         Works across all integrations
Streaming         Yes         Fully supported
Load Balancing    Yes         When using router models

When to use this?

  • For 90% of your use cases, you should use the native LiteLLM OpenAI Integration (/chat/completions, /embeddings, /completions, /images, /batches, etc.)
  • Use this passthrough to call less popular or newer OpenAI endpoints that LiteLLM doesn't fully support yet, such as /assistants, /threads, /vector_stores

Simply replace https://api.openai.com with LITELLM_PROXY_BASE_URL/openai
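
For example, creating a vector store through the passthrough only requires changing the client's base URL. A minimal sketch, assuming the proxy runs at http://0.0.0.0:4000 and sk-1234 is your LiteLLM proxy key (on older openai SDK versions the vector stores client lives under client.beta.vector_stores instead of client.vector_stores):

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",  # instead of https://api.openai.com
    api_key="sk-1234"  # your litellm proxy api key
)

# /vector_stores is forwarded to OpenAI via the passthrough
vector_store = client.vector_stores.create(name="support-docs")
print(vector_store.id)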

Quick Start

1. Setup config.yaml

model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_1  # reads from environment

  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_2  # reads from environment

Set OpenAI API keys in your environment:

export OPENAI_API_KEY_1="your-first-api-key"
export OPENAI_API_KEY_2="your-second-api-key"

2. Start Proxy

litellm --config config.yaml

# RUNNING on http://0.0.0.0:4000

3. Use OpenAI SDK

Replace https://api.openai.com with your proxy URL:

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"  # your litellm proxy api key
)

# Use any OpenAI endpoint
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor.",
    model="gpt-4"  # uses router model from config.yaml
)

Load Balancing

Define multiple deployments with the same model_name for automatic load balancing:

model_list:
  # Deployment 1
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_1

  # Deployment 2
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY_2

The proxy automatically distributes requests across both API keys.
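
For example, each of the requests below targets the router model gpt-4; the proxy picks one of the two deployments, and therefore one of the two API keys, per request. A minimal sketch, assuming the proxy URL and key from the Quick Start:

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"
)

# Repeated requests to the same router model are spread across deployments
for i in range(4):
    assistant = client.beta.assistants.create(
        name=f"Math Tutor {i}",
        instructions="You are a math tutor.",
        model="gpt-4",  # router model defined twice in config.yaml
    )
    print(assistant.id)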

Specifying Router Models

For endpoints that don't have a model field in the request body (e.g., /files/delete), specify the router model using:

Option 1: Request Header

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234",
    default_headers={"X-LiteLLM-Target-Model": "gpt-4"}
)

# Delete a file using the specified router model
client.files.delete(file_id="file-abc123")

Option 2: Request Body

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",
    api_key="sk-1234"
)

# Upload file with target model
file = client.files.create(
    file=open("data.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": "gpt-4"}
)

# Or with a list (first model will be used)
file = client.files.create(
    file=open("data.jsonl", "rb"),
    purpose="batch",
    extra_body={"target_model_names": ["gpt-4", "gpt-3.5-turbo"]}
)

Usage Examples

Assistants API

Create OpenAI Client

Make sure you do the following:

  • Point base_url to your LITELLM_PROXY_BASE_URL/openai
  • Use your LITELLM_API_KEY as the api_key

import openai

client = openai.OpenAI(
    base_url="http://0.0.0.0:4000/openai",  # <your-proxy-url>/openai
    api_key="sk-anything"  # <your-proxy-api-key>
)

Create an Assistant

# Create an assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Help solve equations.",
    model="gpt-4o",
)

Create a Thread

# Create a thread
thread = client.beta.threads.create()

Add a Message to the Thread

# Add a message
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Solve 3x + 11 = 14",
)

Run the Assistant

# Create a run to get the assistant's response
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Check run status
run_status = client.beta.threads.runs.retrieve(
    thread_id=thread.id,
    run_id=run.id
)
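
Runs complete asynchronously, so in practice you poll until the run leaves the queued/in_progress states before reading messages. A minimal sketch (the one-second interval is an arbitrary choice):

import time

# Poll until the run reaches a terminal status (e.g. "completed" or "failed")
while run_status.status in ("queued", "in_progress"):
    time.sleep(1)
    run_status = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

print(run_status.status)  # e.g. "completed"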

Retrieve Messages

# List messages after the run completes
messages = client.beta.threads.messages.list(
    thread_id=thread.id
)
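
To read the assistant's reply, you can walk the returned messages, which are newest-first by default. This sketch assumes the reply's first content block is text:

# Print the assistant's most recent reply
for message in messages.data:
    if message.role == "assistant":
        print(message.content[0].text.value)
        break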

Delete the Assistant

# Delete the assistant when done
client.beta.assistants.delete(assistant.id)
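
If you no longer need the thread either, it can be cleaned up the same way:

# Delete the thread as well
client.beta.threads.delete(thread.id)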