This post is a performance test of DGX Spark across a range of open neural networks.

Disclaimer: I am an enthusiast, not an industrial lab. This is informational material only. I give no guarantees and accept no responsibility for how you use or interpret these numbers. I simply test the neural networks that run successfully in my environment on DGX Spark.

Test configuration

Application: ComfyUI/Gradio demo
Workload: several generations per run (in most tests)
Runs: repeated; ComfyUI is restarted from scratch before each batch run
Cold start is counted once per run
Prompt, model, sampler, resolution, and parameters are identical across runs
Cooldown: the device cools down to idle between runs

Methodology

Launch ComfyUI/Gradio and verify that the graph and parameters are identical.
Let the device cool down to idle, then restart the application from scratch before each run, so that the cold-start cost is counted once per run.
Run the workload.
Record the timings from the console/stdout.
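
The timing collection step can be scripted. Below is a minimal sketch, assuming the console output has been captured as text and contains the `Prompt executed in N seconds` lines that appear in the results. The helper name `parse_timings` and the `sample_log` string are illustrative and not part of ComfyUI.

```python
import re
from statistics import mean

# Matches timing lines such as "Prompt executed in 5.96 seconds".
TIMING_RE = re.compile(r"Prompt executed in ([0-9]+(?:\.[0-9]+)?) seconds")

def parse_timings(log_text):
    """Return (cold, warm): the first timing and the list of the remaining ones."""
    times = [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]
    if not times:
        raise ValueError("no 'Prompt executed in' lines found")
    return times[0], times[1:]

# A few lines in the same shape as the logs in this post.
sample_log = """\
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
"""

cold, warm = parse_timings(sample_log)
print(f"cold start: {cold:.2f} s, warm mean: {mean(warm):.2f} s over {len(warm)} runs")
```

Treating the first timing as the cold start relies on the application having been restarted before the run, as described in the methodology.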

Results (summary)

Z Image Turbo

Prompt:

A breath-taking extreme close-up cinematic still of an action-hero squirrel with 
(reddish-colored fur:1.25) riding on the back of a fast-moving massive shark. 
The shark is half submerged into the water and has (cybernetic augmentations:1.2) to its body. 
Two rockets are strapped either side of the shark's body. 
The squirrel is laying low and clinging onto the big shark. 
The squirrel is wearing straps around its chest and (swimming goggles over its head:1.1) and has a wide open mouth 
and scared expression on its face. The water is deep blue and splashes around the shark. 
There's an impressive explosion with black plumes in the background. 
In the background is the blue sky with a gradient going from light blue to deep blue. 
The lighting is sunny. Motion blur and sense of speed. Cinematic movie poster shot. 
Extremely realistic and detailed textures. extremely artistic, high contrast with (deep blacks:0.2), 
perfectly framed composition, rule of thirds, golden ratio, perfectly balanced composition, eye-candy, 
highly detailed, best quality, award winning, featured

DGX Spark

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load ZImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 72907.03 MB usable, 11739.55 MB loaded, full load: True

Requested to load AutoencodingEngine
loaded completely; 68185.67 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds

RTX 4090, RunPod

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load ZImageTEModel_
loaded completely; 22478.49 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 14706.11 MB usable, 11739.55 MB loaded, full load: True

Requested to load AutoencodingEngine
loaded completely; 282.12 MB usable, 159.87 MB loaded, full load: True

Prompt executed in 26.28 seconds - first run
Prompt executed in 2.57 seconds
Prompt executed in 2.55 seconds
Prompt executed in 2.61 seconds
Prompt executed in 2.55 seconds
Prompt executed in 2.63 seconds
Prompt executed in 2.56 seconds
Prompt executed in 2.57 seconds
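
To separate the cold-start cost from steady-state speed, the timings above can be reduced to a warm mean and a cold-start overhead per device. This is a rough sketch using only the numbers from the two Z Image Turbo logs. Treating "first run minus warm mean" as the cold-start overhead is my simplification, not something ComfyUI reports.

```python
from statistics import mean

# Timings in seconds, copied from the Z Image Turbo logs above.
runs = {
    "DGX Spark": {"first": 46.04, "warm": [5.96, 5.96, 6.00, 5.96, 6.01, 6.01, 6.01]},
    "RTX 4090": {"first": 26.28, "warm": [2.57, 2.55, 2.61, 2.55, 2.63, 2.56, 2.57]},
}

summary = {}
for device, t in runs.items():
    warm_mean = mean(t["warm"])
    # Assumption: everything the first run costs beyond a warm run is load/warm-up time.
    cold_overhead = t["first"] - warm_mean
    summary[device] = {"warm_mean": warm_mean, "cold_overhead": cold_overhead}
    print(f"{device}: warm mean {warm_mean:.2f} s, cold-start overhead {cold_overhead:.2f} s")

ratio = summary["DGX Spark"]["warm_mean"] / summary["RTX 4090"]["warm_mean"]
print(f"DGX Spark / RTX 4090 warm time ratio: {ratio:.2f}x")
```

On these numbers, a warm generation on DGX Spark takes roughly 2.3 times as long as on the RTX 4090.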

Qwen Image Edit 2509, Lightning 4-step, cfg 1

Prompt:

Replace a squirrel with an action-hero sloth lying low and clinging on tightly, 
wearing straps around its chest and swimming goggles on its head, 
its mouth wide open with a scared expression, extremely realistic and detailed textures, 
highly detailed, best quality.

DGX Spark

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53450.61 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35513.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 80.77 seconds - first run
Prompt executed in 17.87 seconds
Prompt executed in 17.96 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.04 seconds
Prompt executed in 18.05 seconds

RTX 4090, RunPod

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 20297.69 MB usable, 242.03 MB loaded, full load: True

Requested to load QwenImageTEModel_
loaded completely; 22144.46 MB usable, 7910.29 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 22007.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 72.46 seconds - first run
Prompt executed in 7.14 seconds
Prompt executed in 8.63 seconds
Prompt executed in 8.50 seconds
Prompt executed in 8.47 seconds
Prompt executed in 8.49 seconds
Prompt executed in 8.47 seconds
Prompt executed in 8.46 seconds

Qwen Image Edit 2509, 20 steps, no LoRAs, cfg 2.5

DGX Spark

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53384.45 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35497.00 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 223.67 seconds - first run
Prompt executed in 172.16 seconds
Prompt executed in 171.87 seconds
Prompt executed in 171.94 seconds
Prompt executed in 172.47 seconds
Prompt executed in 172.18 seconds
Prompt executed in 171.67 seconds
Prompt executed in 171.74 seconds

RTX 4090, RunPod

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 20297.69 MB usable, 242.03 MB loaded, full load: True

Requested to load QwenImageTEModel_
loaded completely; 22144.46 MB usable, 7910.29 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 22007.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 104.81 seconds - first run
Prompt executed in 57.80 seconds
Prompt executed in 57.88 seconds
Prompt executed in 57.91 seconds
Prompt executed in 57.74 seconds
Prompt executed in 57.81 seconds
Prompt executed in 58.01 seconds
Prompt executed in 57.63 seconds
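
The two Qwen Image Edit 2509 configurations can be compared side by side. The sketch below uses the median rather than the mean, because the RTX 4090 Lightning series contains one noticeably faster run (7.14 s). The seconds-per-step figure is only a rough indicator: it divides the whole prompt time by the step count, so it also includes text encoding and VAE time.

```python
from statistics import median

# Warm timings in seconds from the two Qwen Image Edit 2509 sections above
# (the first timing of each series is left out as the cold start).
warm = {
    ("Lightning 4-step", "DGX Spark"): [17.87, 17.96, 18.02, 18.02, 18.02, 18.04, 18.05],
    ("Lightning 4-step", "RTX 4090"): [7.14, 8.63, 8.50, 8.47, 8.49, 8.47, 8.46],
    ("20 steps", "DGX Spark"): [172.16, 171.87, 171.94, 172.47, 172.18, 171.67, 171.74],
    ("20 steps", "RTX 4090"): [57.80, 57.88, 57.91, 57.74, 57.81, 58.01, 57.63],
}
steps = {"Lightning 4-step": 4, "20 steps": 20}

medians = {key: median(values) for key, values in warm.items()}

for workload, n_steps in steps.items():
    spark = medians[(workload, "DGX Spark")]
    rtx = medians[(workload, "RTX 4090")]
    print(f"{workload}: DGX Spark {spark:.2f} s ({spark / n_steps:.2f} s/step), "
          f"RTX 4090 {rtx:.2f} s ({rtx / n_steps:.2f} s/step), ratio {spark / rtx:.2f}x")
```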

Flux 2, image editing, Q8_0 GGUF

Prompt:

Replace a squirrel with an action-hero sloth, keep equipment the same

DGX Spark

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load AutoencoderKL
loaded completely; 109398.04 MB usable, 160.31 MB loaded, full load: True

Requested to load Flux2TEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 33080.59 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
gguf qtypes: F32 (128), Q8_0 (160), BF16 (11)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux2
loaded partially; 30328.59 MB usable, 29958.78 MB loaded, 3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0

Prompt executed in 580.03 seconds - first run
Prompt executed in 280.86 seconds
Prompt executed in 261.52 seconds
Prompt executed in 266.16 seconds
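
The `loaded partially` line in the log above shows that part of the Flux 2 model was offloaded. Here is a small sketch that extracts the figures from that line and estimates the offloaded share. It assumes that "loaded" plus "offloaded" adds up to the full model size, which is my reading of the log, not a documented guarantee.

```python
import re

# Log line copied from the DGX Spark Flux 2 run above.
line = ("loaded partially; 30328.59 MB usable, 29958.78 MB loaded, "
        "3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0")

# Collect every "<number> MB <label>" pair from the line.
pairs = re.findall(r"([0-9.]+) MB (usable|loaded|offloaded|buffer reserved)", line)
fields = {label: float(value) for value, label in pairs}

# Assumption: loaded + offloaded is the total size of the model weights.
total_mb = fields["loaded"] + fields["offloaded"]
offloaded_share = fields["offloaded"] / total_mb
print(f"model size: {total_mb:.2f} MB, offloaded: {offloaded_share:.1%}")
```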

Flux2 bf16 (60 GB version): crashes on both systems

I still need to resolve an issue where the model is not unloaded from memory after being prepared on the GPU.

InfiniTalk

Video generation timing report 1:

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

Generation start: 2025-12-07 14:36:08,764
Progress stages (8 iterations each):
Stage 1: 16:10 (121.35 s/iteration)
Stage 2: 13:18 (99.82 s/iteration)
Stage 3: 13:21 (100.13 s/iteration)
Stage 4: 13:21 (100.20 s/iteration)
Finished: 2025-12-07 15:37:26,601

Video generation timing report 2:

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

Generation start: 2025-12-07 16:02:31,707
Progress stages (8 iterations each):
Stage 1: 14:37 (109.65 s/iteration)
Stage 2: 13:45 (103.23 s/iteration)
Stage 3: 13:44 (103.12 s/iteration)
Stage 4: 13:40 (102.52 s/iteration)
Finished: 2025-12-07 17:01:54,140
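
The two reports give both wall-clock timestamps and per-stage durations, so the time spent outside the sampling loop can be estimated. This is a sketch that assumes the stage durations are in minutes:seconds, which matches 8 iterations at the listed seconds per iteration. The `reports` dictionary simply restates the figures above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp format used in the reports above

def mmss_to_seconds(text):
    """Convert a 'minutes:seconds' string such as '16:10' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

reports = {
    "video 1": {"start": "2025-12-07 14:36:08,764", "end": "2025-12-07 15:37:26,601",
                "stages": ["16:10", "13:18", "13:21", "13:21"]},
    "video 2": {"start": "2025-12-07 16:02:31,707", "end": "2025-12-07 17:01:54,140",
                "stages": ["14:37", "13:45", "13:44", "13:40"]},
}

results = {}
for name, r in reports.items():
    start = datetime.strptime(r["start"], FMT)
    end = datetime.strptime(r["end"], FMT)
    wall = (end - start).total_seconds()
    sampling = sum(mmss_to_seconds(s) for s in r["stages"])
    results[name] = (wall, sampling)
    print(f"{name}: wall clock {wall:.0f} s, sampling {sampling} s, other {wall - sampling:.0f} s")
```

The "other" figure lumps together everything that is not a sampling stage, such as model loading and decoding. The reports do not break it down further.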

Hunyuan3D 2.1

DGX Spark

3D + texturing. Default settings from the Gradio demo.

Test: rune stone

processing: 205.9s - first run
processing: 215.5s
processing: 220.2s
processing: 222.4s

Tencent magic hat:

processing | 204.4s - first run
processing | 190.9s
processing | 186.5s
processing | 184.5s

RTX 4090, RunPod

Error CUDA out of memory. Tried to allocate 3.38 GiB. GPU 0 has a total capacity of 23.64 GiB of which 340.81 MiB is free. Process 143587 has 23.30 GiB memory in use. Of the allocated memory 20.43 GiB is allocated by PyTorch, and 2.38 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
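
The error message suggests trying `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. The sketch below runs the arithmetic on the figures from that message to see whether defragmentation alone could plausibly cover the request. It assumes, optimistically, that all reserved-but-unallocated memory could be reused. It also shows one way to set the variable from Python, which should happen before `torch` is imported to be safe.

```python
import os

MIB_PER_GIB = 1024

# Figures taken from the CUDA out-of-memory message above.
requested_gib = 3.38
free_gib = 340.81 / MIB_PER_GIB
reserved_unallocated_gib = 2.38

# Best case: every reserved-but-unallocated byte becomes usable for this request.
reclaimable_gib = free_gib + reserved_unallocated_gib
shortfall_gib = requested_gib - reclaimable_gib
print(f"best-case available: {reclaimable_gib:.2f} GiB, "
      f"requested: {requested_gib:.2f} GiB, shortfall: {shortfall_gib:.2f} GiB")

# The setting named in the error message; set it before importing torch.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```

On these figures, the best case still falls about 0.67 GiB short, so the allocator setting by itself may not be enough for this workload on a 24 GB card.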