
New AI inference models available now on Gcore

  • November 17, 2025
  • 2 min read

We’ve expanded our Application Catalog with a new set of high-performance models across embeddings, text-to-speech, multimodal LLMs, and safety. All models are live today via Everywhere Inference and Everywhere AI, and are ready to deploy in just 3 clicks with no infrastructure management and no setup overhead.

This update brings stronger retrieval accuracy, more expressive voice generation, real-time audio-native LLMs, and enterprise-grade safety controls. Whether you’re building search pipelines, conversational agents, IVR systems, or production-scale AI applications, these additions give you more flexibility to optimize for quality, latency, and cost.

Text embeddings (5 new models)

High-quality embeddings are the backbone of any AI that needs to find, rank, or understand information, including RAG, semantic search, personalization, recommendations, and clustering. This new set of embedding models dramatically improves retrieval precision, cross-lingual reach, and overall RAG quality.

  • Alibaba-NLP/gte-Qwen2-7B-instruct: High-quality instruction-tuned embeddings for retrieval, reranking, and semantic search across broad domains. Ideal for RAG pipelines that need strong generalization.
  • BAAI/bge-m3: Multilingual, multi-function embeddings built for search, clustering, and cross-lingual retrieval. A great fit for global applications and multi-language knowledge bases.
  • intfloat/e5-mistral-7b-instruct: E5-style instruction-following embeddings optimized for retrieval tasks, question-answer matching, and ranking. Strong performance on RAG evaluation benchmarks.
  • Qwen/Qwen3-Embedding-4B: A cost-efficient, versatile embedding model delivering balanced performance for large-scale retrieval workloads.
  • Qwen/Qwen3-Embedding-8B: A higher-capacity sibling offering premium embedding quality for challenging retrieval, reranking, and high-accuracy semantic search.
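
Once a model like BAAI/bge-m3 returns vectors for your query and documents, retrieval reduces to ranking by vector similarity. The sketch below shows the cosine-similarity ranking step with toy vectors standing in for real model output; the actual embedding call and response shape depend on your deployment's API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 3-dimensional vectors; a real embedding model returns hundreds of dimensions.
query = [0.9, 0.1, 0.0]
docs = [[0.1, 0.9, 0.0], [0.88, 0.12, 0.05], [0.0, 0.0, 1.0]]
print(rank_documents(query, docs))  # → [1, 0, 2]: the second document is closest
```

In a RAG pipeline, the top-ranked documents would then be passed as context to the generation model.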

Text-to-speech (2 new models)

Voice is becoming a first-class interface. These new TTS models make agents feel more natural, reduce robotic cadence, and improve clarity, especially in high-volume workflows like support, IVR, media generation, and automation.

  • microsoft/VibeVoice-1.5B: Neural TTS with natural prosody, expressive cadence, and fast synthesis, built for interactive applications where latency matters.
  • ResembleAI/chatterbox: Production-ready TTS capable of expressive, characterful speech. Ideal for agents, IVR, content workflows, and automated voice experiences.

Text + audio LLMs (2 new models)

These new multimodal LLMs accept both text and audio, enabling real-time voice agents, transcription intelligence, and interactive multimodal applications. They eliminate the need to stitch together separate ASR → LLM → TTS pipelines.

  • mistralai/Voxtral-Mini-3B-2507: A lightweight speech-and-text LLM for real-time voice agents. Handles both text and audio inputs/outputs and is optimized for low-latency scenarios.
  • mistralai/Voxtral-Small-24B: A mid-size Voxtral variant offering higher-quality multimodal reasoning and richer conversational speech. Suitable for advanced voice assistants, transcription workflows, and audio-aware applications.
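
A single request to an audio-native model carries both the prompt text and the audio, rather than chaining ASR, LLM, and TTS services. The sketch below builds such a payload in the common OpenAI-style multimodal chat shape (base64 audio in an `input_audio` content part); the exact field names are an assumption here, so check your deployment's API reference before relying on them.

```python
import base64
import json

def build_voice_request(model: str, audio_bytes: bytes, prompt: str) -> dict:
    """Build a chat payload mixing text and audio in one user message.

    Follows the widely used OpenAI-style content-parts convention; the
    ``input_audio`` field names are illustrative, not Gcore-specific.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                            "format": "wav",
                        },
                    },
                ],
            }
        ],
    }

# Build (but don't send) a request for the lightweight Voxtral variant.
payload = build_voice_request(
    "mistralai/Voxtral-Mini-3B-2507", b"\x00\x01", "Summarize this call."
)
print(json.dumps(payload)[:60])
```

The same payload shape works for the larger Voxtral-Small-24B; only the `model` field changes.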

Safety models (3 new models)

As enterprises deploy AI into production, safety is non-negotiable. These models offer high-quality classification, risk detection, and output transformation to help organizations stay compliant.

  • openai/gpt-oss-safeguard-120b: A high-capacity safety model supporting policy classification, risk detection, and output guidance. Built for enterprise-grade moderation systems.
  • openai/gpt-oss-safeguard-20b: A lighter, faster safeguard variant designed to power low-latency moderation pipelines without sacrificing accuracy.
  • Qwen/Qwen3Guard-Gen-8B: A guardrail model specialized in detecting unsafe content and transforming or steering outputs toward compliant responses.
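
In production, a safeguard model typically sits between the generation model and the user, classifying each response before it is returned. The sketch below shows that gating pattern with a stand-in classifier; in a real pipeline, `classify` would call a hosted safeguard model such as openai/gpt-oss-safeguard-20b and parse its verdict.

```python
from typing import Callable

def moderate(text: str, classify: Callable[[str], str]) -> str:
    """Gate model output through a safety classifier before returning it.

    ``classify`` stands in for a call to a safeguard model; here it is
    any callable returning a label such as "safe" or "unsafe".
    """
    if classify(text) == "safe":
        return text
    return "[blocked by safety policy]"

# Toy keyword classifier standing in for the hosted safeguard model.
blocklist = {"credential"}
def fake_classify(text: str) -> str:
    return "unsafe" if any(word in text for word in blocklist) else "safe"

print(moderate("Here is your order summary.", fake_classify))  # passes through
print(moderate("Here is the admin credential.", fake_classify))  # blocked
```

The low-latency 20B variant suits this inline gating role, while the 120B model fits batch moderation or policy-heavy review.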

Deploy the latest models in 3 clicks and 10 seconds

All models are available today via Gcore Everywhere AI and Gcore Everywhere Inference. Deploy publicly or privately, whichever fits your architecture.

You get:

  • Global low-latency routing
  • Predictable cost and usage visibility
  • Zero infrastructure management
  • Instant scaling to production workloads

Open the Gcore Customer Portal, choose a model, and deploy in just three clicks.

Deploy these new AI models today
