List of Top Models, Vision & Audio APIs

Explore the Best AI Models

X.AI grok 2 1212 language model for text generation and understanding.

131K Context | $2.0/M Input tokens | $10.0/M Output tokens
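As a rough illustration of how the per-million-token prices in this list translate into a per-request cost, the sketch below uses the Grok 2 1212 rates quoted above ($2.0/M input, $10.0/M output). The function name `estimate_cost` and the token counts are hypothetical, chosen only for the example. It assumes billing is linear in token count, with no minimum charge or caching discount.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the estimated cost in dollars for one request.

    Assumes simple linear billing: tokens / 1,000,000 * price per million.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical request: 12,000 prompt tokens and 800 completion tokens
# at the listed $2.0/M input and $10.0/M output rates.
cost = estimate_cost(12_000, 800, input_price_per_m=2.0, output_price_per_m=10.0)
print(f"${cost:.4f}")  # prints "$0.0320"
```

Output tokens cost five times as much as input tokens at these rates, so long completions dominate the bill even when the prompt is much larger.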

X.AI grok imagine image pro model for generating high-quality images and visual content.

Formula: 0.07

Advanced image generation model with support for multiple resolutions and styles.

Formula: 0.02

X.AI grok 2 image 1212 model for generating high-quality images and visual content.

Formula: 0.07

Grok 4 with July 9th 2024 training cutoff, featuring enhanced reasoning capabilities.

256K Context | $3.0/M Input tokens | $15.0/M Output tokens

X.AI grok 4 1 fast reasoning language model for text generation and understanding.

2000K Context | $0.2/M Input tokens | $0.5/M Output tokens

X.AI grok 4 1 fast non reasoning language model for text generation and understanding.

2000K Context | $0.2/M Input tokens | $0.5/M Output tokens

X.AI grok 2 vision 1212 multimodal vision model for understanding and generating content from visual inputs.

33K Context | $2.0/M Input tokens | $10.0/M Output tokens

X.AI grok code fast 1 specialized model for code generation and understanding.

256K Context | $0.2/M Input tokens | $1.5/M Output tokens

Lightweight version of Grok 3 optimized for cost-effective applications.

131K Context | $0.3/M Input tokens | $0.5/M Output tokens

Grok 3 is a powerful language model with advanced capabilities for text generation and understanding.

131K Context | $3.0/M Input tokens | $15.0/M Output tokens

X.AI grok 4 fast non reasoning language model for text generation and understanding.

2000K Context | $0.2/M Input tokens | $0.5/M Output tokens

X.AI grok 4 fast reasoning language model for text generation and understanding.

2000K Context | $0.2/M Input tokens | $0.5/M Output tokens

Gemini 3 Pro is Google's flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window.

1050K Context | $0.5/M Input tokens | $3.0/M Output tokens

Flagship model for coding and agentic tasks across industries. Supports reasoning with effort levels: none, low, medium, high, xhigh.

400K Context | $1.8/M Input tokens | $14.0/M Output tokens

Vision-capable model for image understanding, OCR, captioning, and multimodal Q&A.

0K Context | No pricing info

Gemini 3 Pro is Google's flagship frontier model for high-precision multimodal reasoning, combining strong performance across text, image, video, audio, and code with a 1M-token context window.

1050K Context | $2.0/M Input tokens | $12.0/M Output tokens

Smaller, cheaper GPT-5 variant retaining strong quality with tool-use support for practical apps.

400K Context | $1.3/M Input tokens | $1.0/M Output tokens

A mini model built for Max coding & agentic workflows with just 10 billion activated parameters.

262K Context | $0.3/M Input tokens | $1.1/M Output tokens

Our fastest model with near-frontier intelligence.

200K Context | $1.0/M Input tokens | $5.0/M Output tokens
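The context and pricing figures in this list can be combined to shortlist models for a given workload. The following is a minimal sketch, not a definitive selection method. It copies four entries from the list, reads "K" as 1,000 tokens, and assumes the context window must hold input and output tokens together. Both readings are assumptions, and `ModelEntry`, `CATALOG`, and `rank_by_cost` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelEntry:
    name: str              # label taken from the listing text
    context_tokens: int    # "131K" read as 131,000 tokens (assumption)
    input_price: float     # dollars per million input tokens
    output_price: float    # dollars per million output tokens


# A handful of entries copied from the list above.
CATALOG = [
    ModelEntry("grok 2 1212", 131_000, 2.0, 10.0),
    ModelEntry("grok 4 1 fast reasoning", 2_000_000, 0.2, 0.5),
    ModelEntry("grok code fast 1", 256_000, 0.2, 1.5),
    ModelEntry("Grok 3", 131_000, 3.0, 15.0),
]


def rank_by_cost(catalog, input_tokens, output_tokens):
    """Return (name, cost) pairs for models whose context fits, cheapest first."""
    # Assumption: the window must hold prompt and completion together.
    needed = input_tokens + output_tokens
    fits = [m for m in catalog if m.context_tokens >= needed]
    costs = [
        (m.name,
         (input_tokens * m.input_price + output_tokens * m.output_price) / 1_000_000)
        for m in fits
    ]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical workload: a 200,000-token prompt with a 2,000-token reply.
for name, cost in rank_by_cost(CATALOG, input_tokens=200_000, output_tokens=2_000):
    print(f"{name}: ${cost:.3f}")
```

For this workload the two 131K-context entries are filtered out before any cost is computed, so only the larger-window models are ranked.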