Flux.2 Swift MLX
A native Swift implementation of Flux.2 image generation models, running locally on Apple Silicon Macs using MLX.
Downloads
📦 Latest Release (v2.1.0) — Universal binaries for Apple Silicon
| Download | Description |
|---|---|
| Flux2App | Demo macOS app with T2I, I2I, chat (guide) |
| Flux2CLI | Image generation CLI (guide) |
| FluxEncodersCLI | Text encoders CLI (guide) |
Note: On first launch, macOS may block unsigned apps. Right-click → Open to bypass Gatekeeper.
Features
Image Generation (Flux2Core)
- Native Swift: Pure Swift implementation, no Python dependencies at runtime
- MLX Acceleration: Optimized for Apple Silicon (M1/M2/M3/M4) using MLX
- Multiple Models: Dev (32B), Klein 4B, and Klein 9B variants
- Quantized Models: On-the-fly quantization (qint8/int4) for all models — Dev fits in ~17GB at int4
- Text-to-Image: Generate images from text prompts
- Image-to-Image: Transform images with text prompts and configurable strength
- Multi-Image Conditioning: Combine elements from up to 3 reference images
- Prompt Upsampling: Enhance prompts with Mistral/Qwen3 before generation
- LoRA Support: Load and apply LoRA adapters for style transfer
- LoRA Training: Train your own LoRAs on Apple Silicon (guide)
- LoRA Evaluation: Automated pipeline to evaluate training gap and recommend parameters (guide)
- Image-to-Image Training: Train paired I2I LoRAs (e.g. style transfer, image restoration)
- CLI Tool: Full-featured command-line interface (Flux2CLI)
- macOS App: Demo SwiftUI application (Flux2App) with T2I, I2I, and chat
Text Encoders (FluxTextEncoders)
- Mistral Small 3.2 (24B): Text encoder for FLUX.2 dev/pro
- Qwen3 (4B/8B): Text encoder for FLUX.2 Klein
- Qwen3.5-4B VLM: Native vision-language model for image analysis (~3GB, auto-downloaded)
- FLUX.2 Image Description: VLM-powered image analysis optimized for FLUX.2 regeneration
- Image Comparison: Score two images on scene and style fidelity (0-10)
- Text Generation: Streaming text generation with configurable parameters
- Interactive Chat: Multi-turn conversation with chat template support
- Vision Analysis: Image understanding via Pixtral (Mistral) or Qwen3.5 vision encoders
- FLUX.2 Embeddings: Extract embeddings compatible with FLUX.2 image generation
- CLI Tool: Complete command-line interface (FluxEncodersCLI)
Requirements
- macOS 15.0 (Sequoia) or later (built on macOS 26 Tahoe)
- Apple Silicon Mac (M1/M2/M3/M4)
- Xcode 16.0 or later
Memory requirements by model (with on-the-fly quantization):
| Model | int4 | qint8 | bf16 |
|---|---|---|---|
| Klein 4B | 16 GB | 16 GB | 24 GB |
| Klein 9B | 16 GB | 24 GB | 32 GB |
| Dev (32B) | 32 GB | 96 GB | 96 GB |
Installation
Pre-built Binaries (Recommended)
Download from the Releases page:
# CLI
unzip Flux2CLI-v2.1.0-macOS.zip
./Flux2CLI t2i "a cat" --model klein-4b
# App
unzip Flux2App-v2.1.0-macOS.zip
open Flux2App.app
Build from Source
git clone https://github.com/VincentGourbin/flux-2-swift-mlx.git
cd flux-2-swift-mlx
Build with Xcode (not swift build):
- Open the project in Xcode
- Select the Flux2CLI or Flux2App scheme
- Build with Cmd+B (or Cmd+R to run)
Download Models
Models are downloaded automatically from Hugging Face on first run.
For Dev (32B):
- Text Encoder: Mistral Small 3.2 (~25GB 8-bit)
- Transformer: Flux.2 Dev (~33GB qint8, ~17GB int4)
- VAE: Flux.2 VAE (~3GB)
For Klein 4B/9B:
- Text Encoder: Qwen3-4B or Qwen3-8B (~4-8GB 8-bit)
- Transformer: Klein 4B (~4-7GB) or Klein 9B (~5-17GB depending on quantization)
- VAE: Flux.2 VAE (~3GB)
Models are cached in ~/Library/Caches/models/ by default (configurable via --models-dir or ModelRegistry.customModelsDirectory for sandboxed apps).
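For sandboxed apps, the cache location can be set in code before any pipeline is created. A minimal sketch, assuming ModelRegistry.customModelsDirectory (the property name comes from this README) accepts a file URL; check the ModelRegistry API in Flux2Core for the exact type:

import Foundation
import Flux2Core

// Redirect the model cache from the default ~/Library/Caches/models/
// into the app's own Application Support container (sandbox-friendly).
let customDir = FileManager.default
    .urls(for: .applicationSupportDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("FluxModels")
ModelRegistry.customModelsDirectory = customDir

From the CLI, the equivalent is the --models-dir option.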
Usage
CLI
# Fast generation with Klein 4B (~26s, commercial OK)
flux2 t2i "a beaver building a dam" --model klein-4b
# Better quality with Klein 9B (~62s)
flux2 t2i "a beaver building a dam" --model klein-9b
# Maximum quality with Dev (~35min, requires 64GB+ RAM)
flux2 t2i "a beautiful sunset over mountains" --model dev
# With custom parameters
flux2 t2i "a red apple on a white table" \
--width 512 \
--height 512 \
--steps 20 \
--guidance 4.0 \
--seed 42 \
--output apple.png
# Image-to-Image with reference image
flux2 i2i "transform into a watercolor painting" \
--images photo.jpg \
--strength 0.7 \
--steps 28 \
--output watercolor.png
# Multi-image conditioning (combine elements)
flux2 i2i "a cat wearing this jacket" \
--images cat.jpg \
--images jacket.jpg \
--steps 28 \
--output cat_jacket.png
See CLI Documentation for all options.
As a Library
import Flux2Core
// Initialize pipeline
let pipeline = try await Flux2Pipeline()
// Generate image
let image = try await pipeline.generateTextToImage(
prompt: "a beautiful sunset over mountains",
height: 512,
width: 512,
steps: 20,
guidance: 4.0
) { current, total in
print("Step \(current)/\(total)")
}
Architecture
Flux.2 Dev is a ~32B parameter rectified flow transformer:
- 8 Double-stream blocks: Joint attention between text and image
- 48 Single-stream blocks: Combined text+image processing
- 4D RoPE: Rotary position embeddings for T, H, W, L axes
- SwiGLU FFN: Gated activation in feed-forward layers
- AdaLN: Adaptive layer normalization with timestep conditioning
Text encoding uses Mistral Small 3.2 to generate 15360-dim embeddings.
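As background, the rectified-flow objective underlying this class of model can be written as follows. This is the standard formulation (one common convention, with x_0 the image latent and x_1 Gaussian noise), not code or constants taken from this repository:

\[
x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1]
\]
\[
\mathcal{L} = \mathbb{E}_{t,\,x_0,\,x_1}\big[\,\lVert v_\theta(x_t, t) - (x_1 - x_0) \rVert^2\,\big]
\]

The transformer v_theta predicts the constant velocity along the straight path from data to noise; sampling integrates the ODE dx/dt = v_theta(x_t, t) from noise back to a clean latent in the configured number of steps.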
On-the-fly Quantization
All models support on-the-fly quantization to reduce transformer memory. No need to download separate variants — one bf16 model file serves all levels.
| Model | bf16 | qint8 (-47%) | int4 (-72%) |
|---|---|---|---|
| Klein 4B | 7.4 GB | 3.9 GB | 2.1 GB |
| Klein 9B | 17.3 GB | 9.2 GB | 4.9 GB |
| Dev (32B) | 61.5 GB | 32.7 GB | 17.3 GB |
# Klein 9B with qint8 (fits in 24 GB)
flux2 t2i "a cat" --model klein-9b --transformer-quant qint8
# Dev with int4 (fits in 32 GB)
flux2 t2i "a cat" --model dev --transformer-quant int4
See Quantization Benchmark for detailed measurements and visual comparison.
Documentation
Guides
| Guide | Description |
|---|---|
| CLI Documentation | Command-line interface — all commands and options |
| LoRA Guide | Loading and using LoRA adapters |
| LoRA Training Guide | Training parameters, DOP, gradient checkpointing, YAML config |
| LoRA Evaluation | Automated gap analysis and training parameter recommendations |
| VLM API | Qwen3.5 VLM — image analysis, comparison, LoRA training setup |
| Text Encoders | FluxTextEncoders library API and CLI |
| Custom Model Integration | Integrating custom MLX-compatible models into the framework |
| Flux2App Guide | Demo macOS application |
Examples and Benchmarks
| Example | Description |
|---|---|
| Examples Gallery | Overview of all examples with sample outputs |
| Model Comparison | Dev vs Klein 4B vs Klein 9B — performance, quality, when to use each |
| Quantization Benchmark | Measured memory, speed, and visual quality for bf16/qint8/int4 |
| Flux.2 Dev Examples | T2I, I2I, multi-image conditioning, VLM image interpretation |
| Flux.2 Klein 4B Examples | Fast T2I, multiple resolutions, quantization comparison |
| Flux.2 Klein 9B Examples | T2I, multiple resolutions, prompt upsampling |
LoRA Training
| Guide | Description |
|---|---|
| LoRA Evaluation Pipeline | New — Automated gap analysis: VLM describes reference, generates baseline, compares, recommends training params |
| Cat Toy (Subject LoRA) | Subject injection with DOP, trigger word sks (Klein 4B) |
| Tarot Style (Style LoRA) | Style transfer, trigger word rwaite, 32 training images (Klein 4B) |
Help Wanted — The LoRA evaluation parameter recommendations are based on initial heuristics and will be refined with user feedback. If you use evaluate-lora and train LoRAs, please share your results to help improve the recommendations!
Current Limitations
- Dev Performance: Generation takes ~30 min for 1024x1024 images (use Klein for faster results)
- Dev Memory: Requires 32GB+ with int4, 64GB+ with qint8 (Klein 4B works with 16GB)
- LoRA Training: Supported on Klein 4B, Klein 9B, and Dev. Enable gradient_checkpointing: true for larger models to reduce memory by ~50%. Image-to-Image training doubles the sequence length, so gradient checkpointing is recommended there as well.
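The gradient_checkpointing flag is set in the training YAML. Only that key is confirmed by this README; the surrounding keys below are illustrative assumptions, so check the LoRA Training Guide for the real schema:

# Illustrative training config sketch — only gradient_checkpointing
# is documented here; the other keys are assumptions.
model: klein-9b                # assumption: model selector key
rank: 16                       # assumption: LoRA rank
gradient_checkpointing: true   # ~50% activation-memory savings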
Acknowledgments
- Black Forest Labs for Flux.2
- Hugging Face Diffusers for reference implementation
- MLX team at Apple for the ML framework
License
MIT License - see LICENSE file.
Disclaimer: This is an independent implementation and is not affiliated with Black Forest Labs. Flux.2 model weights are subject to their own license terms.

