Supported Models

You can use the following model names as the model_id in the Heurist API/SDK. For interactive testing and API exploration, visit our Chat Completion Endpoint documentation, which includes a built-in API testing interface. To estimate usage costs in credits, see the LLM Credits Table below.
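
As a minimal sketch of what a request looks like, the snippet below sends a chat completion with one of the model IDs listed in this section. It assumes the gateway is OpenAI-compatible; the base URL and API key shown are placeholders, so verify both against the Chat Completion Endpoint documentation.

```python
# Minimal sketch of a chat completion call (assumes an OpenAI-compatible gateway;
# the base URL and API key below are placeholders -- see the Chat Completion Endpoint docs).
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.heurist.xyz",  # assumed gateway URL; verify in the docs
    api_key="your-heurist-api-key",              # placeholder credential
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # any model_id from the list below
    messages=[{"role": "user", "content": "Explain mixture-of-experts in one sentence."}],
)
print(response.choices[0].message.content)
```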

Large Language Models (LLMs)

  • deepseek/deepseek-r1: DeepSeek R1 is a groundbreaking open-source AI model that achieves performance comparable to OpenAI’s o1 model across math, code, and reasoning tasks, supporting self-verification and reflection, while being more cost-efficient than its competitors.
  • deepseek/deepseek-v3: The DeepSeek V3 0324 release is a powerful Mixture-of-Experts (MoE) language model with 685B total parameters, activating 37B parameters for each token. It demonstrates notable improvements over the original DeepSeek-V3 release and achieves results comparable to Claude Sonnet 3.7.
  • deepseek/deepseek-r1-distill-llama-70b: A distilled version of DeepSeek R1 that uses Llama 3 70B as the base model. It achieves results comparable to DeepSeek R1 while being much more cost-efficient and faster.
  • openai/gpt-oss-120b: Open-weight 117B-parameter Mixture-of-Experts language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. Activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. Supports configurable reasoning depth, full chain-of-thought access, and native tool use including function calling and structured outputs.
  • openai/gpt-oss-20b: Open-weight 21B parameter model from OpenAI under Apache 2.0 license. Uses Mixture-of-Experts architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. Supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling and tool use.
  • meta-llama/llama-3.3-70b-instruct: The latest Llama 3 model, outperforming many of the available open source and closed chat models on common industry benchmarks.
  • nvidia/llama-3.1-nemotron-70b-instruct: A specialized version of the Llama model tailored by NVIDIA for complex instruction-following tasks. It delivers high-quality, human-like responses across a variety of applications while leveraging NVIDIA AI technologies for performance and scalability.
  • NousResearch/Hermes-3-Llama-3.1-8B: Flagship Hermes LLM trained by Nous Research, with advanced agentic capabilities, enhanced roleplaying, reasoning, multi-turn conversation, and long-context coherence. Uncensored.
  • asi1-mini: ASI1-mini is the first Web3-native LLM, specifically built and optimized for supporting complex agentic workflows. Developed by Fetch.ai, it features adaptive reasoning and context-aware decision-making.
  • mistralai/mistral-small-24b-instruct: 24B instruction-tuned Mistral Small 3 model optimized for low-latency agentic use with native function calling and JSON outputs. Strong reasoning for its size, multilingual, 32k context, Apache-2.0 licensed.
  • google/gemini-2.5-flash: Hybrid reasoning model with controllable “thinking budgets” that balances speed, cost, and quality. Natively multimodal (text, images, audio, video) with a 1M-token context window—ideal for fast, production chat, summarization, and extraction.
  • google/gemini-2.5-pro: Most capable Gemini for complex tasks and coding. Natively multimodal with long context and advanced reasoning; excels at video understanding, planning, and end-to-end code generation for interactive apps.
  • anthropic/claude-sonnet-4: High-performance, hybrid reasoning model with strong coding and agentic tool use. 200k context (1M beta), controllable extended thinking, and reliable long-form generation for production assistants.
  • anthropic/claude-3.5-haiku: Fast, cost-efficient Claude model for high-volume workloads. Low-latency multimodal understanding with solid instruction-following—well-suited to routing, extraction, and lightweight chat.
  • openai/gpt-5: Next-generation unified model combining adaptive reasoning with native multimodality and long context. Designed for agentic workflows with built-in tool use, structured outputs, and persistent context/memory.
  • openai/gpt-5-mini: Compact GPT‑5 variant balancing quality and latency for production. Multimodal + reasoning capabilities at lower cost—good default for assistants, batch processing, and RAG orchestration.
  • openai/gpt-5-nano: Ultra-low-latency GPT‑5 tier for on-device or cost-sensitive tasks (classification, autocomplete, routing). Optimized for fast responses and structured outputs with minimal overhead.

LLM Credits Table

Pricing is in credits per 1M tokens.
Model | Input (per 1M tokens) | Output (per 1M tokens)
nvidia/llama-3.1-nemotron-70b-instruct | 15 | 30
meta-llama/llama-3.3-70b-instruct | 15 | 30
NousResearch/Hermes-3-Llama-3.1-8B | 10 | 10
deepseek/deepseek-r1 | 300 | 300
deepseek/deepseek-v3 | 100 | 100
deepseek/deepseek-r1-distill-llama-70b | 80 | 80
asi1-mini | 100 | 100
google/gemini-2.5-flash | 30 | 250
google/gemini-2.5-pro | 125 | 1000
anthropic/claude-sonnet-4 | 300 | 1500
anthropic/claude-3.5-haiku | 100 | 400
openai/gpt-oss-20b | 10 | 50
openai/gpt-oss-120b | 30 | 100
openai/gpt-5 | 150 | 1200
openai/gpt-5-mini | 25 | 200
openai/gpt-5-nano | 5 | 40
mistralai/mistral-small-24b-instruct | 30 | 30
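
To make the arithmetic concrete, the short sketch below estimates the credit cost of a single request from these rates; the model choice and token counts are made up for illustration.

```python
# Estimate the credit cost of one request using the rates in the table above.
# Rates are credits per 1M tokens; the token counts here are illustrative.
INPUT_RATE = 300    # deepseek/deepseek-r1, input
OUTPUT_RATE = 300   # deepseek/deepseek-r1, output

input_tokens = 1_200
output_tokens = 800

credits = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
print(f"Estimated cost: {credits:.2f} credits")  # Estimated cost: 0.60 credits
```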

Note on uncensored models:

The models marked as Uncensored are fine-tuned on specific datasets to eliminate censorship. For other models, it is possible to avoid censorship by using jailbreaking prompts, which work in most cases.

Image Generation Models

  • HeuristLogo: A Flux LoRA that can generate the Heurist logo. Trigger words: Heuristai logo or hexagonal logo (see the request sketch after this list).
  • FLUX.1-dev: State-of-the-art open source image generation model that excels at a variety of image styles.
  • Aurora: SD1.5 checkpoint for anime girls.
  • AnimagineXL: SDXL checkpoint for anime images. It can generate characters from well-known anime series.
  • CyberRealisticXL: SDXL checkpoint for realistic portraits.
  • BrainDance: SD1.5 checkpoint for cartoon, anime and watercolor styles.
  • YamersCartoonArcadia: SD1.5 checkpoint for stylized 2D cartoon.
  • ArthemyComics: SD1.5 checkpoint for fantasy cartoon images.
  • AAMXLAnimeMix: SDXL checkpoint for anime art and hentai.
  • SDXLUnstableDiffusersV11: SDXL checkpoint that enhances SDXL capabilities in creating vibrant arts, designs and photo-realistic images.
  • SDXL: General-purpose image generation model developed by Stability AI.
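
A hypothetical request sketch follows. The endpoint URL, payload field names, and header shown are assumptions for illustration only, not taken from the Heurist API docs; consult the image generation API/SDK reference for the real interface.

```python
# Hypothetical sketch of an image generation request. The endpoint URL and the
# payload field names below are placeholders, not the documented Heurist API.
import requests

payload = {
    "model_id": "HeuristLogo",  # any model from the list above
    "prompt": "Heuristai logo, hexagonal logo, neon gradient background",  # includes the LoRA trigger words
    "width": 1024,
    "height": 1024,
}

resp = requests.post(
    "https://example-heurist-image-endpoint/generate",  # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer your-heurist-api-key"},  # placeholder credential
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # assumed to contain a URL or identifier for the generated image
```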

Image Editing Models

  • FLUX.1-kontext-pro: Advanced AI model for intelligent image editing with context-aware capabilities. Excels at precise modifications, character consistency, and iterative editing workflows while maintaining visual quality across multiple edits.
  • FLUX.1-kontext-max: Premium image editing model offering maximum performance with enhanced prompt adherence and superior typography generation. Designed for professional-grade editing tasks requiring the highest quality output.

Interactive Web Interface

For hands-on experimentation with image and video models, visit Heurist Imagine where you can test and use our supported models directly in your browser. For testing and experimenting with language models, visit Pondera - a free interactive chatbot app where you can try out our LLMs with various configurations and settings.

Embedding Models

  • BAAI/bge-large-en-v1.5: A large-scale English text embedding model. It converts text into a 1024-dimensional vector representation. The maximum input length is 512 tokens.
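
As a minimal sketch, assuming the gateway exposes an OpenAI-compatible embeddings route (the base URL and route are assumptions; check the API documentation), generating an embedding could look like this:

```python
# Minimal sketch of generating an embedding with BAAI/bge-large-en-v1.5.
# Assumes an OpenAI-compatible embeddings endpoint; the base URL and API key
# below are placeholders -- verify both against the Heurist API docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.heurist.xyz",  # assumed gateway URL
    api_key="your-heurist-api-key",              # placeholder credential
)

result = client.embeddings.create(
    model="BAAI/bge-large-en-v1.5",
    input="Heurist is a decentralized network for AI model hosting and inference.",
)
vector = result.data[0].embedding
print(len(vector))  # expected: 1024
```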

Need More Models?

Any image generation models from Civitai and LLMs from HuggingFace can be supported upon request. If you're interested in hosting your models on Heurist, or if you want customized models that adapt to your specific use cases, please contact us at team@heurist.xyz.