/generateContent

Use LiteLLM to call Google AI's generateContent endpoints for text generation, multimodal interactions, and streaming responses.

Overview

| Feature | Supported | Notes |
|---------|-----------|-------|
| Cost Tracking | ✅ | |
| Logging | ✅ | works across all integrations |
| End-user Tracking | ✅ | |
| Streaming | ✅ | |
| Fallbacks | ✅ | between supported models |
| Load balancing | ✅ | between supported models |

Usage


LiteLLM Python SDK

Non-streaming example

Basic Text Generation
from litellm.google_genai import agenerate_content
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

# agenerate_content is async; call it from an async function or a running event loop
response = await agenerate_content(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=100,
)
print(response)
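
If you are not running inside an async context, a synchronous call works the same way. This is a minimal sketch, assuming litellm.google_genai also exports a synchronous generate_content counterpart to agenerate_content:

Synchronous Text Generation (sketch)
from litellm.google_genai import generate_content  # assumed sync counterpart of agenerate_content
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Hello, can you tell me a short joke?")
    ],
    role="user",
)

# Blocking call; no event loop required
response = generate_content(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=100,
)
print(response)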

Streaming example

Streaming Text Generation
from litellm.google_genai import agenerate_content_stream
from google.genai.types import ContentDict, PartDict
import os

# Set API key
os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

contents = ContentDict(
    parts=[
        PartDict(text="Write a long story about space exploration")
    ],
    role="user",
)

# agenerate_content_stream is async; call it from an async function or a running event loop
response = await agenerate_content_stream(
    contents=contents,
    model="gemini/gemini-2.0-flash",
    max_tokens=500,
)

# Iterate over streamed chunks as they arrive
async for chunk in response:
    print(chunk)
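
The endpoint also accepts multimodal input. The sketch below is an assumption-based example: it attaches image bytes through an inline_data part using the google-genai SDK's BlobDict type; the file path and the inline_data usage are illustrative, not taken from the original docs:

Multimodal Request (sketch)
import os

from litellm.google_genai import agenerate_content
from google.genai.types import BlobDict, ContentDict, PartDict

os.environ["GEMINI_API_KEY"] = "your-gemini-api-key"

# Read a local image and attach it as an inline_data part alongside a text part.
# "photo.jpg" is a placeholder path.
with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

contents = ContentDict(
    parts=[
        PartDict(text="Describe what is in this image."),
        PartDict(inline_data=BlobDict(mime_type="image/jpeg", data=image_bytes)),
    ],
    role="user",
)

# Run from an async context, same as the text-only examples above
response = await agenerate_content(
    contents=contents,
    model="gemini/gemini-2.0-flash",
)
print(response)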

LiteLLM Proxy Server

1. Setup config.yaml

model_list:
  - model_name: gemini-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY

2. Start proxy

litellm --config /path/to/config.yaml

3. Test it!
Google GenAI SDK with LiteLLM Proxy
from google.genai import Client
import os

# Configure Google GenAI SDK to use LiteLLM proxy
os.environ["GOOGLE_GEMINI_BASE_URL"] = "http://localhost:4000"
os.environ["GEMINI_API_KEY"] = "sk-1234"

client = Client()

# Requests are routed through the LiteLLM proxy using the model name from config.yaml
response = client.models.generate_content(
    model="gemini-flash",
    contents=[
        {
            "parts": [{"text": "Write a short story about AI"}],
            "role": "user"
        }
    ],
    config={"max_output_tokens": 100}
)
print(response)
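
You can also call the proxy over raw HTTP. This is a minimal sketch using the requests library; the /v1beta/models/{model}:generateContent path and the x-goog-api-key header are assumptions based on how the Google GenAI SDK builds requests against GOOGLE_GEMINI_BASE_URL, so adjust them to your deployment:

Raw HTTP Request to the Proxy (sketch)
import requests

# Assumed route: the path the Google GenAI SDK appends to GOOGLE_GEMINI_BASE_URL
url = "http://localhost:4000/v1beta/models/gemini-flash:generateContent"
headers = {
    "Content-Type": "application/json",
    "x-goog-api-key": "sk-1234",  # LiteLLM proxy key from the example above
}
payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Write a short story about AI"}],
        }
    ],
    "generationConfig": {"maxOutputTokens": 100},
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())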