
Instructions

Preview Caption

  • Upload a single image and generate a detailed caption
  • Try different models to compare results

Create Dataset

  • Upload multiple images to process them all at once
  • Each image is captioned and saved alongside a .txt file with the same base name
  • By default, each caption begins with [trigger] (you can change the trigger word)
  • Click "Process Images" to generate captions and create a downloadable dataset
  • Use the download button to get a ZIP file containing all images and caption files
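
The dataset layout described above can be sketched in Python as follows. This is a minimal illustration, not the app's actual code: the function name build_dataset_zip, the captions mapping, and the single space between the trigger word and the caption are all assumptions made for the example.

```python
import zipfile
from pathlib import Path


def build_dataset_zip(captions, zip_path, trigger="[trigger]"):
    """Write each image plus a matching .txt caption file into one ZIP.

    `captions` maps an image path to its caption text (hypothetical input;
    in the app the captions would come from the selected model).
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for image_path, caption in captions.items():
            image_path = Path(image_path)
            # Store the image under its bare file name.
            zf.write(image_path, arcname=image_path.name)
            # Assumed format: trigger word, one space, then the caption.
            text = f"{trigger} {caption}".strip()
            # The caption file shares the image's base name: cat.png -> cat.txt
            zf.writestr(image_path.with_suffix(".txt").name, text)
    return zip_path
```

Calling build_dataset_zip({"cat.png": "a cat on a sofa"}, "dataset.zip") would produce an archive holding cat.png and cat.txt, with the caption file starting with the trigger word.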

Models Available

  • Florence-2-Flux: Faster model with good-quality captions
  • Florence-2-Flux-Large: Larger model that gives detailed captions with better image understanding
  • florence-2-large-ft-moredetailed: Fine-tuned specifically for more detailed captions
  • Florence-2-large-PromptGen-v2.0: Memory-efficient model with high-quality detailed captions

MiaoshouAI/Florence-2-large-PromptGen-v2.0 Features

  • Higher-quality detailed captions
  • Memory-efficient (requires only ~1 GB of VRAM)
  • Fast generation without sacrificing quality
  • Supports multiple caption formats including detailed captions, tags, and analysis
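
Florence-2 models choose their output format through a task-prompt token passed in as the text input. The sketch below maps the three formats named above to such tokens. The token strings are an assumption based on the commonly documented Florence-2 and PromptGen task prompts, and the helper name task_prompt_for is hypothetical; check the model card before relying on them.

```python
# Assumed task-prompt tokens; verify against the model card.
TASK_PROMPTS = {
    "detailed": "<MORE_DETAILED_CAPTION>",
    "tags": "<GENERATE_TAGS>",
    "analysis": "<ANALYZE>",
}


def task_prompt_for(caption_format):
    """Return the task-prompt token for a caption format name."""
    try:
        return TASK_PROMPTS[caption_format.lower()]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(
            f"Unknown caption format {caption_format!r}; expected one of: {known}"
        )
```

The returned token would then be used as the prompt text when running the model on an image.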

Supported image formats: JPG, JPEG, PNG
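
If you prepare a folder of images before uploading, a small check like the one below can filter out unsupported files. It is a sketch only: the helper name is_supported_image is hypothetical, and treating extensions case-insensitively is an assumption about how the app behaves.

```python
from pathlib import Path

# Extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(path):
    """Return True if the file extension is one of the supported formats."""
    # Assumption: extension matching ignores case (photo.JPG is accepted).
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```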