Flux.2 Swift MLX

A native Swift implementation of Flux.2 image generation models, running locally on Apple Silicon Macs using MLX.

FluxForge Studio on the App Store · Website

Downloads

📦 Latest Release (v2.1.0) — Universal binaries for Apple Silicon

| Download | Description |
|---|---|
| Flux2App | Demo macOS app with T2I, I2I, chat (guide) |
| Flux2CLI | Image generation CLI (guide) |
| FluxEncodersCLI | Text encoders CLI (guide) |

Note: On first launch, macOS may block unsigned apps. Right-click → Open to bypass Gatekeeper.

Features

Image Generation (Flux2Core)

  • Native Swift: Pure Swift implementation, no Python dependencies at runtime
  • MLX Acceleration: Optimized for Apple Silicon (M1/M2/M3/M4) using MLX
  • Multiple Models: Dev (32B), Klein 4B, and Klein 9B variants
  • Quantized Models: On-the-fly quantization (qint8/int4) for all models — Dev fits in ~17GB at int4
  • Text-to-Image: Generate images from text prompts
  • Image-to-Image: Transform images with text prompts and configurable strength
  • Multi-Image Conditioning: Combine elements from up to 3 reference images
  • Prompt Upsampling: Enhance prompts with Mistral/Qwen3 before generation
  • LoRA Support: Load and apply LoRA adapters for style transfer
  • LoRA Training: Train your own LoRAs on Apple Silicon (guide)
  • LoRA Evaluation: Automated pipeline to evaluate training gap and recommend parameters (guide)
  • Image-to-Image Training: Train paired I2I LoRAs (e.g. style transfer, image restoration)
  • CLI Tool: Full-featured command-line interface (Flux2CLI)
  • macOS App: Demo SwiftUI application (Flux2App) with T2I, I2I, and chat

Text Encoders (FluxTextEncoders)

  • Mistral Small 3.2 (24B): Text encoder for FLUX.2 dev/pro
  • Qwen3 (4B/8B): Text encoder for FLUX.2 Klein
  • Qwen3.5-4B VLM: Native vision-language model for image analysis (~3GB, auto-downloaded)
  • FLUX.2 Image Description: VLM-powered image analysis optimized for FLUX.2 regeneration
  • Image Comparison: Score two images on scene and style fidelity (0-10)
  • Text Generation: Streaming text generation with configurable parameters
  • Interactive Chat: Multi-turn conversation with chat template support
  • Vision Analysis: Image understanding via Pixtral (Mistral) or Qwen3.5 vision encoders
  • FLUX.2 Embeddings: Extract embeddings compatible with FLUX.2 image generation
  • CLI Tool: Complete command-line interface (FluxEncodersCLI)

Requirements

  • macOS 15.0 (Sequoia) or later (built on macOS 26 Tahoe)
  • Apple Silicon Mac (M1/M2/M3/M4)
  • Xcode 16.0 or later

Memory requirements by model (with on-the-fly quantization):

| Model | int4 | qint8 | bf16 |
|---|---|---|---|
| Klein 4B | 16 GB | 16 GB | 24 GB |
| Klein 9B | 16 GB | 24 GB | 32 GB |
| Dev (32B) | 32 GB | 96 GB | 96 GB |

Installation

Download from the Releases page:

# CLI
unzip Flux2CLI-v2.1.0-macOS.zip
./Flux2CLI t2i "a cat" --model klein-4b
 
# App
unzip Flux2App-v2.1.0-macOS.zip
open Flux2App.app

Build from Source

git clone https://github.com/VincentGourbin/flux-2-swift-mlx.git
cd flux-2-swift-mlx

Build with Xcode (not swift build):

  1. Open the project in Xcode
  2. Select Flux2CLI or Flux2App scheme
  3. Build with Cmd+B (or Cmd+R to run)

Download Models

The models are downloaded automatically from HuggingFace on first run.

For Dev (32B):

  • Text Encoder: Mistral Small 3.2 (~25GB 8-bit)
  • Transformer: Flux.2 Dev (~33GB qint8, ~17GB int4)
  • VAE: Flux.2 VAE (~3GB)

For Klein 4B/9B:

  • Text Encoder: Qwen3-4B or Qwen3-8B (~4-8GB 8-bit)
  • Transformer: Klein 4B (~4-7GB) or Klein 9B (~5-17GB depending on quantization)
  • VAE: Flux.2 VAE (~3GB)

Models are cached in ~/Library/Caches/models/ by default (configurable via --models-dir or ModelRegistry.customModelsDirectory for sandboxed apps).
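For sandboxed apps, the cache location can be redirected before the first model load. A minimal sketch, assuming only the `ModelRegistry.customModelsDirectory` property named above (its exact type — a `URL` here — and the load-order requirement are assumptions):

```swift
import Foundation
import Flux2Core

// Redirect the model cache into the app's own container before any
// pipeline is created. `customModelsDirectory` is named in this README;
// that it accepts a URL is an assumption of this sketch.
let appSupport = FileManager.default.urls(
    for: .applicationSupportDirectory, in: .userDomainMask)[0]
ModelRegistry.customModelsDirectory = appSupport.appendingPathComponent("models")
```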

Usage

CLI

# Fast generation with Klein 4B (~26s, commercial OK)
flux2 t2i "a beaver building a dam" --model klein-4b
 
# Better quality with Klein 9B (~62s)
flux2 t2i "a beaver building a dam" --model klein-9b
 
# Maximum quality with Dev (~35min, requires 64GB+ RAM)
flux2 t2i "a beautiful sunset over mountains" --model dev
 
# With custom parameters
flux2 t2i "a red apple on a white table" \
  --width 512 \
  --height 512 \
  --steps 20 \
  --guidance 4.0 \
  --seed 42 \
  --output apple.png
 
# Image-to-Image with reference image
flux2 i2i "transform into a watercolor painting" \
  --images photo.jpg \
  --strength 0.7 \
  --steps 28 \
  --output watercolor.png
 
# Multi-image conditioning (combine elements)
flux2 i2i "a cat wearing this jacket" \
  --images cat.jpg \
  --images jacket.jpg \
  --steps 28 \
  --output cat_jacket.png

See CLI Documentation for all options.

As a Library

import Flux2Core
 
// Initialize pipeline
let pipeline = try await Flux2Pipeline()
 
// Generate image
let image = try await pipeline.generateTextToImage(
    prompt: "a beautiful sunset over mountains",
    height: 512,
    width: 512,
    steps: 20,
    guidance: 4.0
) { current, total in
    print("Step \(current)/\(total)")
}
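The progress callback above slots naturally into a SwiftUI view. A hedged sketch, assuming only the `Flux2Pipeline` API shown here plus standard SwiftUI (whether the callback is invoked off the main actor is an assumption, hence the explicit hop):

```swift
import SwiftUI
import Flux2Core

struct GeneratorView: View {
    @State private var progress: Double = 0

    var body: some View {
        ProgressView(value: progress)
            .task {
                do {
                    let pipeline = try await Flux2Pipeline()
                    _ = try await pipeline.generateTextToImage(
                        prompt: "a beautiful sunset over mountains",
                        height: 512, width: 512, steps: 20, guidance: 4.0
                    ) { current, total in
                        // Hop to the main actor before touching @State.
                        Task { @MainActor in
                            progress = Double(current) / Double(total)
                        }
                    }
                } catch {
                    print("Generation failed: \(error)")
                }
            }
    }
}
```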

Architecture

Flux.2 Dev is a ~32B parameter rectified flow transformer:

  • 8 Double-stream blocks: Joint attention between text and image
  • 48 Single-stream blocks: Combined text+image processing
  • 4D RoPE: Rotary position embeddings for T, H, W, L axes
  • SwiGLU FFN: Gated activation in feed-forward layers
  • AdaLN: Adaptive layer normalization with timestep conditioning

Text encoding uses Mistral Small 3.2 to generate 15360-dim embeddings.
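To illustrate the SwiGLU feed-forward block listed above, here is a minimal scalar sketch in plain Swift — not the library's MLX tensor code, and the weight names are generic placeholders:

```swift
import Foundation

// silu(x) = x * sigmoid(x), the gating activation in SwiGLU.
func silu(_ x: Double) -> Double { x / (1.0 + exp(-x)) }

// Dense matrix-vector product.
func matvec(_ m: [[Double]], _ v: [Double]) -> [Double] {
    m.map { row in zip(row, v).map(*).reduce(0, +) }
}

// SwiGLU FFN: out = W_down · (silu(W_gate·x) ⊙ (W_up·x)).
// Weight names are illustrative, not the library's parameter names.
func swigluFFN(_ x: [Double],
               wGate: [[Double]], wUp: [[Double]], wDown: [[Double]]) -> [Double] {
    let gated = zip(matvec(wGate, x).map(silu), matvec(wUp, x)).map(*)
    return matvec(wDown, gated)
}
```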

On-the-fly Quantization

All models support on-the-fly quantization to reduce transformer memory. No need to download separate variants — one bf16 model file serves all levels.

| Model | bf16 | qint8 (-47%) | int4 (-72%) |
|---|---|---|---|
| Klein 4B | 7.4 GB | 3.9 GB | 2.1 GB |
| Klein 9B | 17.3 GB | 9.2 GB | 4.9 GB |
| Dev (32B) | 61.5 GB | 32.7 GB | 17.3 GB |

# Klein 9B with qint8 (fits in 24 GB)
flux2 t2i "a cat" --model klein-9b --transformer-quant qint8
 
# Dev with int4 (fits in 32 GB)
flux2 t2i "a cat" --model dev --transformer-quant int4

See Quantization Benchmark for detailed measurements and visual comparison.
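The table's rough shape follows from bytes-per-parameter arithmetic; a back-of-envelope sketch (per-group quantization scales are why the measured files exceed the naive figures):

```swift
// Naive size ≈ parameters × bits per weight / 8, in GiB.
// Real checkpoints add per-group scale/zero-point overhead.
func naiveGiB(params: Double, bitsPerWeight: Double) -> Double {
    params * bitsPerWeight / 8.0 / 1_073_741_824.0
}
// e.g. a 32e9-parameter model at bf16 (16 bits) ≈ 59.6 GiB,
// the same ballpark as the 61.5 GB measured above.
```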

Documentation

Guides

| Guide | Description |
|---|---|
| CLI Documentation | Command-line interface — all commands and options |
| LoRA Guide | Loading and using LoRA adapters |
| LoRA Training Guide | Training parameters, DOP, gradient checkpointing, YAML config |
| LoRA Evaluation | Automated gap analysis and training parameter recommendations |
| VLM API | Qwen3.5 VLM — image analysis, comparison, LoRA training setup |
| Text Encoders | FluxTextEncoders library API and CLI |
| Custom Model Integration | Integrating custom MLX-compatible models into the framework |
| Flux2App Guide | Demo macOS application |

Examples and Benchmarks

| Example | Description |
|---|---|
| Examples Gallery | Overview of all examples with sample outputs |
| Model Comparison | Dev vs Klein 4B vs Klein 9B — performance, quality, when to use each |
| Quantization Benchmark | Measured memory, speed, and visual quality for bf16/qint8/int4 |
| Flux.2 Dev Examples | T2I, I2I, multi-image conditioning, VLM image interpretation |
| Flux.2 Klein 4B Examples | Fast T2I, multiple resolutions, quantization comparison |
| Flux.2 Klein 9B Examples | T2I, multiple resolutions, prompt upsampling |

LoRA Training

| Guide | Description |
|---|---|
| LoRA Evaluation Pipeline | New — Automated gap analysis: VLM describes reference, generates baseline, compares, recommends training params |
| Cat Toy (Subject LoRA) | Subject injection with DOP, trigger word sks (Klein 4B) |
| Tarot Style (Style LoRA) | Style transfer, trigger word rwaite, 32 training images (Klein 4B) |

Help Wanted — The LoRA evaluation parameter recommendations are based on initial heuristics and will be refined with user feedback. If you use evaluate-lora and train LoRAs, please share your results to help improve the recommendations!

Current Limitations

  • Dev Performance: Generation takes ~30 min for 1024x1024 images (use Klein for faster results)
  • Dev Memory: Requires 32GB+ with int4, 64GB+ with qint8 (Klein 4B works with 16GB)
  • LoRA Training: Supported on Klein 4B, Klein 9B, and Dev. Enable gradient_checkpointing: true for larger models to reduce memory by ~50%. Image-to-Image training doubles sequence length — gradient checkpointing is recommended.
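A hedged YAML sketch of the checkpointing switch; only `gradient_checkpointing: true` is named in this README — the other keys are hypothetical illustrations, so consult the LoRA Training Guide for the real schema:

```yaml
# Hypothetical training-config fragment; only `gradient_checkpointing`
# is taken from this README.
model: klein-9b
gradient_checkpointing: true   # trades recompute for ~50% activation memory
```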

Acknowledgments

License

MIT License - see LICENSE file.


Disclaimer: This is an independent implementation and is not affiliated with Black Forest Labs. Flux.2 model weights are subject to their own license terms.