Explore a wide range of AI models. We support models for natural language tasks, image generation, and domain-specific use cases in crypto and Web3. We also provide model fine-tuning service and private model deployment.
Each model is identified by a unique model_id, which you pass to the Heurist API/SDK.
For interactive testing and API exploration, visit our Chat Completion Endpoint documentation which includes a built-in API testing interface.
To estimate usage costs in credits, see the LLM Credits Table below.
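A minimal sketch of a chat completion request follows, assuming the endpoint is OpenAI-compatible as described in the Chat Completion Endpoint documentation. The HEURIST_BASE_URL and HEURIST_API_KEY environment variable names are placeholders of this example, not fixed names; substitute the values from the endpoint docs and your dashboard.

```python
# Hedged sketch: assumes an OpenAI-compatible chat completion endpoint.
# HEURIST_BASE_URL and HEURIST_API_KEY are placeholder env var names.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["HEURIST_BASE_URL"],  # gateway URL from the endpoint docs
    api_key=os.environ["HEURIST_API_KEY"],    # your Heurist API key
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",  # any model_id from the list below
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Mixture-of-Experts in one sentence."},
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)
```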
deepseek/deepseek-r1
: DeepSeek R1 is a groundbreaking open-source AI model that achieves performance comparable to OpenAI’s o1 model across math, code, and reasoning tasks, supporting self-verification and reflection, while being more cost-efficient than its competitors.
deepseek/deepseek-v3
: The 0324 release of DeepSeek V3, a powerful Mixture-of-Experts (MoE) language model with 685B total parameters, of which 37B are activated per token. It improves notably on the original DeepSeek-V3 release and achieves results comparable to Claude Sonnet 3.7.
deepseek/deepseek-r1-distill-llama-70b
: A distilled version of DeepSeek R1, using Llama 3 70B as the base model. It achieves results comparable to DeepSeek R1 while being significantly more cost-efficient and faster.
openai/gpt-oss-120b
: Open-weight 117B-parameter Mixture-of-Experts language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. Activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. Supports configurable reasoning depth, full chain-of-thought access, and native tool use including function calling and structured outputs (see the tool-use sketch after this list).
openai/gpt-oss-20b
: Open-weight 21B parameter model from OpenAI under Apache 2.0 license. Uses Mixture-of-Experts architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. Supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling and tool use.
meta-llama/llama-3.3-70b-instruct
: The latest Llama 3 model, outperforming many of the available open source and closed chat models on common industry benchmarks.
nvidia/llama-3.1-nemotron-70b-instruct
: A specialized version of Llama 3.1 70B tailored by NVIDIA for complex instruction-following tasks, delivering high-quality, human-like responses across a variety of applications.
NousResearch/Hermes-3-Llama-3.1-8B
: Flagship Hermes LLM trained by Nous Research, with advanced agentic capabilities, enhanced roleplaying, reasoning, multi-turn conversation, and long-context coherence. Uncensored.
asi1-mini
: ASI1-mini is the first Web3-native LLM, specifically built and optimized for supporting complex agentic workflows. Developed by Fetch.ai, it features adaptive reasoning and context-aware decision-making.
mistralai/mistral-small-24b-instruct
: 24B instruction-tuned Mistral Small 3 model optimized for low-latency agentic use with native function calling and JSON outputs. Strong reasoning for its size, multilingual, 32k context, Apache-2.0 licensed.
google/gemini-2.5-flash
: Hybrid reasoning model with controllable “thinking budgets” that balances speed, cost, and quality. Natively multimodal (text, images, audio, video) with a 1M-token context window—ideal for fast, production chat, summarization, and extraction.
google/gemini-2.5-pro
: Most capable Gemini for complex tasks and coding. Natively multimodal with long context and advanced reasoning; excels at video understanding, planning, and end-to-end code generation for interactive apps.
anthropic/claude-sonnet-4
: High-performance, hybrid reasoning model with strong coding and agentic tool use. 200k context (1M beta), controllable extended thinking, and reliable long-form generation for production assistants.
anthropic/claude-3.5-haiku
: Fast, cost-efficient Claude model for high-volume workloads. Low-latency multimodal understanding with solid instruction-following—well-suited to routing, extraction, and lightweight chat.
openai/gpt-5
: Next-generation unified model combining adaptive reasoning with native multimodality and long context. Designed for agentic workflows with built-in tool use, structured outputs, and persistent context/memory.
openai/gpt-5-mini
: Compact GPT‑5 variant balancing quality and latency for production. Multimodal + reasoning capabilities at lower cost—good default for assistants, batch processing, and RAG orchestration.
openai/gpt-5-nano
: Ultra-low-latency GPT‑5 tier for on-device or cost-sensitive tasks (classification, autocomplete, routing). Optimized for fast responses and structured outputs with minimal overhead.
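Several of the models above advertise native function calling. The sketch below shows a tool-use request through the same assumed OpenAI-compatible interface; the get_token_price tool and its schema are illustrative placeholders, and actual tool-calling support may vary per model.

```python
# Hedged sketch of function calling via the assumed OpenAI-compatible endpoint.
# The get_token_price tool is hypothetical and only illustrates the request shape.
import json
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["HEURIST_BASE_URL"],  # placeholder env var name
    api_key=os.environ["HEURIST_API_KEY"],    # placeholder env var name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_token_price",  # hypothetical tool for illustration
        "description": "Return the current USD price of a crypto token.",
        "parameters": {
            "type": "object",
            "properties": {"symbol": {"type": "string"}},
            "required": ["symbol"],
        },
    },
}]

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the price of ETH?"}],
    tools=tools,
)

# If the model decides to call the tool, inspect the requested call.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```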
Model | Input (credits per 1M tokens) | Output (credits per 1M tokens) |
---|---|---|
nvidia/llama-3.1-nemotron-70b-instruct | 15 | 30 |
meta-llama/llama-3.3-70b-instruct | 15 | 30 |
NousResearch/Hermes-3-Llama-3.1-8B | 10 | 10 |
deepseek/deepseek-r1 | 300 | 300 |
deepseek/deepseek-v3 | 100 | 100 |
deepseek/deepseek-r1-distill-llama-70b | 80 | 80 |
asi1-mini | 100 | 100 |
google/gemini-2.5-flash | 30 | 250 |
google/gemini-2.5-pro | 125 | 1000 |
anthropic/claude-sonnet-4 | 300 | 1500 |
anthropic/claude-3.5-haiku | 100 | 400 |
openai/gpt-oss-20b | 10 | 50 |
openai/gpt-oss-120b | 30 | 100 |
openai/gpt-5 | 150 | 1200 |
openai/gpt-5-mini | 25 | 200 |
openai/gpt-5-nano | 5 | 40 |
mistralai/mistral-small-24b-instruct | 30 | 30 |
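As a rough guide, and assuming charges scale linearly with token counts at the per-1M-token rates in the table above, a request's credit cost can be estimated as follows:

```python
# Back-of-the-envelope credit estimate, assuming linear per-token pricing
# at the per-1M-token rates listed in the credits table above.
RATES = {  # model_id: (input credits per 1M tokens, output credits per 1M tokens)
    "meta-llama/llama-3.3-70b-instruct": (15, 30),
    "deepseek/deepseek-r1": (300, 300),
    "openai/gpt-5-mini": (25, 200),
}

def estimate_credits(model_id: str, input_tokens: int, output_tokens: int) -> float:
    input_rate, output_rate = RATES[model_id]
    return input_tokens / 1_000_000 * input_rate + output_tokens / 1_000_000 * output_rate

# Example: 2,000 input tokens and 500 output tokens on Llama 3.3 70B:
# 2000/1e6 * 15 + 500/1e6 * 30 = 0.03 + 0.015 = 0.045 credits
print(estimate_credits("meta-llama/llama-3.3-70b-instruct", 2_000, 500))
```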
HeuristLogo
: Flux LoRA that can generate the logo of Heurist. Trigger words: Heuristai logo or hexagonal logo (see the request sketch after this list).
FLUX.1-dev
: State-of-the-art open source image generation model that excels at a variety of image styles.
Aurora
: SD1.5 checkpoint for anime girls.
AnimagineXL
: SDXL checkpoint for anime images. It can generate characters from well-known anime series.
CyberRealisticXL
: SDXL checkpoint for realistic portraits.
BrainDance
: SD1.5 checkpoint for cartoon, anime and watercolor styles.
YamersCartoonArcadia
: SD1.5 checkpoint for stylized 2D cartoon.
ArthemyComics
: SD1.5 checkpoint for fantasy cartoon images.
AAMXLAnimeMix
: SDXL checkpoint for anime art and hentai.
SDXLUnstableDiffusersV11
: SDXL checkpoint that enhances base SDXL capabilities in creating vibrant art, designs, and photo-realistic images.
SDXL
: General-purpose image generation model developed by Stability AI.
FLUX.1-kontext-pro
: Advanced AI model for intelligent image editing with context-aware capabilities. Excels at precise modifications, character consistency, and iterative editing workflows while maintaining visual quality across multiple edits.
FLUX.1-kontext-max
: Premium image editing model offering maximum performance with enhanced prompt adherence and superior typography generation. Designed for professional-grade editing tasks requiring the highest quality output.
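To illustrate how a model name and LoRA trigger words fit into a generation request, here is a hypothetical sketch. The endpoint URL and payload field names are placeholders rather than the actual Heurist image API schema; check the image generation docs for the real request format.

```python
# Illustrative sketch only: the endpoint URL and payload fields below are
# placeholders, not the actual Heurist image API schema.
import os
import requests

payload = {
    "model": "HeuristLogo",  # model name from the list above
    # Prompt includes the LoRA trigger words from the HeuristLogo entry.
    "prompt": "Heuristai logo, hexagonal logo, neon gradient, dark background",
    "width": 1024,
    "height": 1024,
}

resp = requests.post(
    "https://api.example.com/v1/images/generate",  # placeholder endpoint
    headers={"Authorization": f"Bearer {os.environ['HEURIST_API_KEY']}"},  # placeholder env var
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```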
BAAI/bge-large-en-v1.5
: A large-scale English text embedding model from BAAI. It converts text into a 1024-dimensional vector representation. The maximum input length is 512 tokens.
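A minimal sketch of an embeddings request, assuming the gateway exposes an OpenAI-compatible embeddings route (an assumption; verify against the embeddings docs). It also computes cosine similarity over the returned 1024-dimensional vectors.

```python
# Hedged sketch: assumes an OpenAI-compatible embeddings route.
# HEURIST_BASE_URL and HEURIST_API_KEY are placeholder env var names.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["HEURIST_BASE_URL"],  # placeholder
    api_key=os.environ["HEURIST_API_KEY"],    # placeholder
)

result = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input=[
        "What is a Mixture-of-Experts model?",
        "MoE models route each token to a subset of experts.",
    ],
)

# Each embedding is a 1024-dimensional list of floats.
vec_a, vec_b = (d.embedding for d in result.data)
cosine = sum(a * b for a, b in zip(vec_a, vec_b)) / (
    sum(a * a for a in vec_a) ** 0.5 * sum(b * b for b in vec_b) ** 0.5
)
print(len(vec_a), cosine)
```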