Model Updated 2026-04
Multimodal
Definition
A multimodal model processes, and in some cases generates, more than one type of data: text, images, audio, and video.
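To make the definition concrete, here is a minimal sketch of a single input built from parts of different data types. The names `Part`, `modalities_in`, and `is_multimodal` are hypothetical and chosen for illustration; they are not taken from any vendor's API, and real model APIs structure their inputs differently.

```python
from dataclasses import dataclass
from typing import List, Set, Union

# Hypothetical names for illustration only; not any vendor's API.
MODALITIES = {"text", "image", "audio", "video"}


@dataclass(frozen=True)
class Part:
    """One piece of an input, tagged with its data type."""
    modality: str            # one of MODALITIES
    data: Union[str, bytes]  # text as str, media as raw bytes

    def __post_init__(self):
        # Reject data types outside the four named in the definition.
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")


def modalities_in(parts: List[Part]) -> Set[str]:
    """Return the distinct data types present in an input."""
    return {p.modality for p in parts}


def is_multimodal(parts: List[Part]) -> bool:
    """An input is multimodal if it mixes more than one data type."""
    return len(modalities_in(parts)) > 1


prompt = [
    Part("text", "What is shown in this picture?"),
    Part("image", b"\x89PNG..."),  # placeholder bytes, not a real image
]

print(sorted(modalities_in(prompt)))  # ['image', 'text']
print(is_multimodal(prompt))          # True
```

A text-only input such as `[Part("text", "hello")]` would return `False` from `is_multimodal`, which is the case a text-only LLM handles.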
See also in the glossary
LLM (Large Language Model)
An LLM is an AI model trained on billions of texts, capable of understanding and generating human language.
Generative AI
Generative AI refers to artificial intelligence systems capable of creating original content: text, images, video, audio, code.
Text-to-Image
Text-to-Image refers to generating images from text descriptions using generative AI models.
Text-to-Speech
Text-to-Speech converts written text into spoken voice using AI, with increasingly realistic results.
Frequently Asked Questions
Which LLMs are multimodal?
Examples include GPT-4o, Gemini 2.0, and Claude Opus. As of 2026, most major LLMs are multimodal.
Does multimodal mean the model does everything?
No. A multimodal model processes multiple input types but doesn't necessarily excel at each one.