# Vision Models

- [Overview](/guides/vision-models/vision-models.md): Introduction to running vision models on Clore.ai
- [Llama 3.2 Vision](/guides/vision-models/llama-vision.md): Run Meta's Llama 3.2 Vision for image understanding on Clore.ai
- [LLaVA](/guides/vision-models/llava-vision-language.md): Chat with images using LLaVA vision-language model on Clore.ai
- [Qwen2.5-VL](/guides/vision-models/qwen-vl.md): Run Qwen2.5-VL, a leading open vision-language model, for image, video, and document understanding on Clore.ai GPUs
- [Florence-2](/guides/vision-models/florence2.md): Microsoft Florence-2 for captioning, detection, and segmentation
- [SAM2 Video](/guides/vision-models/sam2-video.md): Track and segment objects in video with Meta's SAM2 on Clore.ai
- [GroundingDINO](/guides/vision-models/groundingdino.md): Detect any object using text descriptions with GroundingDINO
