
New AI inference models available now on Gcore

  • November 17, 2025
  • 2 min read

We’ve expanded our Application Catalog with a new set of high-performance models across embeddings, text-to-speech, multimodal LLMs, and safety. All models are live today via Everywhere Inference and Everywhere AI, and are ready to deploy in just 3 clicks with no infrastructure management and no setup overhead.

This update brings stronger retrieval accuracy, more expressive voice generation, real-time audio-native LLMs, and enterprise-grade safety controls. Whether you’re building search pipelines, conversational agents, IVR systems, or production-scale AI applications, these additions give you more flexibility to optimize for quality, latency, and cost.

Text embeddings (5 new models)

High-quality embeddings are the backbone of any AI that needs to find, rank, or understand information, including RAG, semantic search, personalization, recommendations, and clustering. This new set of embedding models dramatically improves retrieval precision, cross-lingual reach, and overall RAG quality.

  • Alibaba-NLP/gte-Qwen2-7B-instruct: High-quality instruction-tuned embeddings for retrieval, reranking, and semantic search across broad domains. Ideal for RAG pipelines that need strong generalization.
  • BAAI/bge-m3: Multilingual, multi-function embeddings built for search, clustering, and cross-lingual retrieval. A great fit for global applications and multi-language knowledge bases.
  • intfloat/e5-mistral-7b-instruct: E5-style instruction-following embeddings optimized for retrieval tasks, question-answer matching, and ranking. Strong performance on RAG evaluation benchmarks.
  • Qwen/Qwen3-Embedding-4B: A cost-efficient, versatile embedding model delivering balanced performance for large-scale retrieval workloads.
  • Qwen/Qwen3-Embedding-8B: A higher-capacity sibling offering premium embedding quality for challenging retrieval, reranking, and high-accuracy semantic search.
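
Once a model like BAAI/bge-m3 returns vectors for your query and documents, retrieval reduces to ranking by vector similarity. The sketch below shows the cosine-similarity ranking step with toy vectors standing in for real model output; the actual embedding call and response shape depend on your deployment's API.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[int]:
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy 3-dimensional vectors; a real embedding model returns hundreds of dimensions.
query = [0.9, 0.1, 0.0]
docs = [[0.1, 0.9, 0.0], [0.88, 0.12, 0.05], [0.0, 0.0, 1.0]]
print(rank_documents(query, docs))  # → [1, 0, 2]: the second document is closest
```

In a RAG pipeline, the top-ranked documents would then be passed as context to the generation model.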

Text-to-speech (2 new models)

Voice is becoming a first-class interface. These new TTS models make agents feel more natural, reduce robotic cadence, and improve clarity, especially in high-volume workflows like support, IVR, media generation, and automation.

  • microsoft/VibeVoice-1.5B: Neural TTS with natural prosody, expressive cadence, and fast synthesis, built for interactive applications where latency matters.
  • ResembleAI/chatterbox: Production-ready TTS capable of expressive, characterful speech. Ideal for agents, IVR, content workflows, and automated voice experiences.

Text + audio LLMs (2 new models)

These new multimodal LLMs accept both text and audio, enabling real-time voice agents, transcription intelligence, and interactive multimodal applications. They eliminate the need to stitch together separate ASR → LLM → TTS pipelines.

  • mistralai/Voxtral-Mini-3B-2507: A lightweight speech-and-text LLM for real-time voice agents. Handles both text and audio inputs/outputs and is optimized for low-latency scenarios.
  • mistralai/Voxtral-Small-24B: A mid-size Voxtral variant offering higher-quality multimodal reasoning and richer conversational speech. Suitable for advanced voice assistants, transcription workflows, and audio-aware applications.
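
A single request to an audio-native model carries both the prompt text and the audio, rather than chaining ASR, LLM, and TTS services. The sketch below builds such a payload in the common OpenAI-style multimodal chat shape (base64 audio in an `input_audio` content part); the exact field names are an assumption here, so check your deployment's API reference before relying on them.

```python
import base64
import json

def build_voice_request(model: str, audio_bytes: bytes, prompt: str) -> dict:
    """Build a chat payload mixing text and audio in one user message.

    Follows the widely used OpenAI-style content-parts convention; the
    ``input_audio`` field names are illustrative, not Gcore-specific.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                            "format": "wav",
                        },
                    },
                ],
            }
        ],
    }

# Build (but don't send) a request for the lightweight Voxtral variant.
payload = build_voice_request(
    "mistralai/Voxtral-Mini-3B-2507", b"\x00\x01", "Summarize this call."
)
print(json.dumps(payload)[:60])
```

The same payload shape works for the larger Voxtral-Small-24B; only the `model` field changes.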

Safety models (3 new models)

As enterprises deploy AI into production, safety is non-negotiable. These models offer high-quality classification, risk detection, and output transformation to help organizations stay compliant.

  • openai/gpt-oss-safeguard-120b: A high-capacity safety model supporting policy classification, risk detection, and output guidance. Built for enterprise-grade moderation systems.
  • openai/gpt-oss-safeguard-20b: A lighter, faster safeguard variant designed to power low-latency moderation pipelines without sacrificing accuracy.
  • Qwen/Qwen3Guard-Gen-8B: A guardrail model specialized in detecting unsafe content and transforming or steering outputs toward compliant responses.
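
In production, a safeguard model typically sits between the generation model and the user, classifying each response before it is returned. The sketch below shows that gating pattern with a stand-in classifier; in a real pipeline, `classify` would call a hosted safeguard model such as openai/gpt-oss-safeguard-20b and parse its verdict.

```python
from typing import Callable

def moderate(text: str, classify: Callable[[str], str]) -> str:
    """Gate model output through a safety classifier before returning it.

    ``classify`` stands in for a call to a safeguard model; here it is
    any callable returning a label such as "safe" or "unsafe".
    """
    if classify(text) == "safe":
        return text
    return "[blocked by safety policy]"

# Toy keyword classifier standing in for the hosted safeguard model.
blocklist = {"credential"}
def fake_classify(text: str) -> str:
    return "unsafe" if any(word in text for word in blocklist) else "safe"

print(moderate("Here is your order summary.", fake_classify))  # passes through
print(moderate("Here is the admin credential.", fake_classify))  # blocked
```

The low-latency 20B variant suits this inline gating role, while the 120B model fits batch moderation or policy-heavy review.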

Deploy the latest models in 3 clicks and 10 seconds

All models are available today via Gcore Everywhere AI and Gcore Everywhere Inference. Deploy publicly or privately, whichever fits your architecture.

You get:

  • Global low-latency routing
  • Predictable cost and usage visibility
  • Zero infrastructure management
  • Instant scaling to production workloads

Open the Gcore Customer Portal, choose a model, and deploy in just three clicks.

Deploy these new AI models today
