
This post reports DGX Spark performance tests for several open-source neural networks.

Disclaimer: I am an enthusiast, not an industry lab. This is an informational write‑up only. I make no guarantees and accept no responsibility for how you use or interpret these numbers. I only tested the neural networks that ran successfully in my environment on DGX Spark.

Test setup

  • App: ComfyUI/Gradio demos
  • Workload: multiple generations per run (mostly)
  • Runs: repeated runs; ComfyUI is restarted from scratch before every batch run
  • Cold start is included once per run
  • Identical prompt, model, sampler, resolution, and parameters across runs
  • Cooldown: the device is left to cool down to idle temps between runs

Methodology

  1. Launch ComfyUI/Gradio, verify identical workflow graph and parameters.
  2. Allow the device to cool down to idle, then restart the app from scratch before each run to include cold-start cost once per run.
  3. Execute the workload.
  4. Capture console/stdout timings.
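
The timing capture in the last step can be automated. Below is a minimal sketch, not the script used for this post: it assumes the console output has been saved as text and that each generation ends with a ComfyUI-style "Prompt executed in N seconds" line. The names `parse_timings`, `summarize`, and `SAMPLE_LOG` are made up for illustration; the sample lines are copied from the Z Image Turbo log further down.

```python
import re
from statistics import mean

# Matches lines such as "Prompt executed in 5.96 seconds".
TIMING_RE = re.compile(r"Prompt executed in (\d+(?:\.\d+)?) seconds")

# A few lines copied from the Z Image Turbo console output.
SAMPLE_LOG = """\
Requested to load AutoencodingEngine
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
"""


def parse_timings(log_text):
    """Return every 'Prompt executed in N seconds' value, in log order."""
    return [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]


def summarize(timings):
    """Split one run into its cold first generation and the warm remainder."""
    if len(timings) < 2:
        raise ValueError("need one cold and at least one warm generation")
    cold, warm = timings[0], timings[1:]
    warm_mean = mean(warm)
    return {
        "cold_s": cold,
        "warm_mean_s": warm_mean,
        "warm_min_s": min(warm),
        "warm_max_s": max(warm),
        # Extra time the first generation spends on model loading and warm-up.
        "cold_overhead_s": cold - warm_mean,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_timings(SAMPLE_LOG)).items():
        print(f"{key}: {value:.2f}")
```

Keeping the first generation separate matters because the cold start is included once per run; averaging it in with the warm generations would hide the steady-state time.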

Results (summary)

Z Image Turbo

ComfyUI setup – Z Image Turbo

Prompt:

A breath-taking extreme close-up cinematic still of an action-hero squirrel with 
(reddish-colored fur:1.25) riding on the back of a fast-moving massive shark. 
The shark is half submerged into the water and has (cybernetic augmentations:1.2) to its body. 
Two rockets are strapped either side of the shark's body. 
The squirrel is laying low and clinging onto the big shark. 
The squirrel is wearing straps around its chest and (swimming goggles over its head:1.1) and has a wide open mouth 
and scared expression on its face. The water is deep blue and splashes around the shark. 
There's an impressive explosion with black plumes in the background. 
In the background is the blue sky with a gradient going from light blue to deep blue. 
The lighting is sunny. Motion blur and sense of speed. Cinematic movie poster shot. 
Extremely realistic and detailed textures. extremely artistic, high contrast with (deep blacks:0.2), 
perfectly framed composition, rule of thirds, golden ratio, perfectly balanced composition, eye-candy, 
highly detailed, best quality, award winning, featured

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load ZImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 72907.03 MB usable, 11739.55 MB loaded, full load: True

Requested to load AutoencodingEngine
loaded completely; 68185.67 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
[Images: Z-Image results 1 to 8]

Qwen Image Edit 2509

Prompt:

Replace a squirrel with an action-hero sloth lying low and clinging on tightly, 
wearing straps around its chest and swimming goggles on its head, 
its mouth wide open with a scared expression, extremely realistic and detailed textures, 
highly detailed, best quality.

Qwen Image Edit 2509, Lightning 4-step, cfg 1

ComfyUI setup – Qwen Image Edit 2509 (Lightning 4-step)

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53450.61 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35513.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 80.77 seconds - first run
Prompt executed in 17.87 seconds
Prompt executed in 17.96 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.04 seconds
Prompt executed in 18.05 seconds
[Images: Qwen 4-step results 1 to 8]

Qwen Image Edit 2509, 20 steps, no LoRAs, cfg 2.5

ComfyUI setup – Qwen Image Edit 2509 (20 steps)

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53384.45 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35497.00 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 223.67 seconds - first run
Prompt executed in 172.16 seconds
Prompt executed in 171.87 seconds
Prompt executed in 171.94 seconds
Prompt executed in 172.47 seconds
Prompt executed in 172.18 seconds
Prompt executed in 171.67 seconds
Prompt executed in 171.74 seconds
[Images: Qwen 20-step results 1 to 8]
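
As a rough sanity check on the two Qwen Image Edit configurations, the warm timings can be converted to seconds per sampler step. This is a sketch under loud assumptions: it leaves out the first, slower generation of each run, it treats the whole "Prompt executed" time as sampling (text encoding and VAE decode are ignored), and the helper name `seconds_per_step` is made up for illustration.

```python
from statistics import mean

# Warm timings copied from the logs above (first generation of each run excluded).
lightning_warm = [17.87, 17.96, 18.02, 18.02, 18.02, 18.04, 18.05]      # 4 steps, cfg 1
full_warm = [172.16, 171.87, 171.94, 172.47, 172.18, 171.67, 171.74]  # 20 steps, cfg 2.5


def seconds_per_step(timings, steps):
    """Mean generation time divided by the number of sampler steps."""
    return mean(timings) / steps


lightning = seconds_per_step(lightning_warm, 4)
full = seconds_per_step(full_warm, 20)

print(f"Lightning 4-step: {lightning:.2f} s/step")
print(f"20 steps:         {full:.2f} s/step")
print(f"ratio:            {full / lightning:.2f}x")
```

The 20-step run comes out close to twice as expensive per step. One plausible explanation, which I did not verify separately, is that with cfg above 1 the sampler evaluates the model twice per step (conditional and unconditional), while at cfg 1 the second pass can be skipped.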

Flux 2, image edit, Q8_0 GGUF:

ComfyUI setup – Flux 2 GGUF Q8_0

Prompt:

Replace a squirrel with an action-hero sloth, keep equipment the same

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load AutoencoderKL
loaded completely; 109398.04 MB usable, 160.31 MB loaded, full load: True

Requested to load Flux2TEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 33080.59 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
gguf qtypes: F32 (128), Q8_0 (160), BF16 (11)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux2
loaded partially; 30328.59 MB usable, 29958.78 MB loaded, 3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0

Prompt executed in 580.03 seconds - first run
Prompt executed in 280.86 seconds
Prompt executed in 261.52 seconds
Prompt executed in 266.16 seconds
[Images: Flux2 GGUF results 1 to 4]

Flux 2 bf16 (60 GB version): crash.

Still to be solved: the model is not unloaded from memory after GPU preparation.

InfiniTalk:

Video Generation Timing Report 1:

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

  • Generation Start: 2025-12-07 14:36:08,764
  • Progress stages (8 iterations each):
  • Stage 1: 16:10 (121.35 s/iteration)
  • Stage 2: 13:18 (99.82 s/iteration)
  • Stage 3: 13:21 (100.13 s/iteration)
  • Stage 4: 13:21 (100.20 s/iteration)
  • Completion: 2025-12-07 15:37:26,601

Video Generation Timing Report 2:

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

  • Generation Start: 2025-12-07 16:02:31,707
  • Progress stages (8 iterations each):
  • Stage 1: 14:37 (109.65 s/iteration)
  • Stage 2: 13:45 (103.23 s/iteration)
  • Stage 3: 13:44 (103.12 s/iteration)
  • Stage 4: 13:40 (102.52 s/iteration)
  • Completion: 2025-12-07 17:01:54,140
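
The start and completion timestamps in the two reports give the total wall-clock time, which can be compared with the sum of the four stage times. A minimal sketch, assuming the timestamps use the `YYYY-MM-DD HH:MM:SS,mmm` layout shown above and the stage times are `MM:SS`; the function names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the reports, e.g. "2025-12-07 14:36:08,764".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(text):
    """Parse a report timestamp with comma-separated milliseconds."""
    return datetime.strptime(text, TS_FORMAT)


def mmss_to_seconds(text):
    """Convert an 'MM:SS' stage time to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def run_summary(start, end, stages):
    """Return (wall-clock seconds, summed stage seconds, remainder)."""
    wall = (parse_ts(end) - parse_ts(start)).total_seconds()
    sampling = sum(mmss_to_seconds(s) for s in stages)
    return wall, sampling, wall - sampling


# Values copied from the two timing reports above.
reports = {
    "Report 1": ("2025-12-07 14:36:08,764", "2025-12-07 15:37:26,601",
                 ["16:10", "13:18", "13:21", "13:21"]),
    "Report 2": ("2025-12-07 16:02:31,707", "2025-12-07 17:01:54,140",
                 ["14:37", "13:45", "13:44", "13:40"]),
}

for name, (start, end, stages) in reports.items():
    wall, sampling, other = run_summary(start, end, stages)
    print(f"{name}: wall {wall:.1f} s, stages {sampling} s, other {other:.1f} s")
```

The "other" figure is simply whatever falls outside the four stages between start and completion; the reports do not break it down further.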

Hunyuan3D 2.1

3D generation + texturing. Default settings from the Gradio demo.

Stone with a rune test

  • processing: 205.9s – First run
  • processing: 215.5s
  • processing: 220.2s
  • processing: 222.4s

Tencent magic hat:

  • processing | 204.4s – First run
  • processing | 190.9s
  • processing | 186.5s
  • processing | 184.5s