This post reports DGX Spark performance tests for several open-source neural networks.
Disclaimer: I am an enthusiast, not an industry lab. This is an informational write-up only. I make no guarantees and accept no responsibility for how you use or interpret these numbers. I only tested the neural networks that ran successfully in my environment on DGX Spark.
Test setup¶
- App: ComfyUI/Gradio demos
- Workload: multiple generations per run in most tests
- Runs: repeated runs; ComfyUI is restarted from scratch before every batch run
- Cold start: included once per run, as the first generation after a restart
- Identical prompt, model, sampler, resolution, and parameters across runs
- Cooldown: the device is left to cool down to idle temps between runs
Methodology¶
- Launch ComfyUI/Gradio, verify identical workflow graph and parameters.
- Allow the device to cool down to idle, then restart the app from scratch before each run to include cold-start cost once per run.
- Execute the workload.
- Capture the timings printed to the console (stdout).
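The last step can be scripted. Below is a minimal Python sketch, not the tooling used for this post. It assumes the console output has been saved as text and that every finished generation prints a line of the form `Prompt executed in N seconds`, as in the logs in the results section. The names `parse_timings` and `summarize` are made up for illustration, and the sample lines are the Z Image Turbo timings from this post.

```python
import re
from statistics import mean

# Matches the timing line printed after each finished prompt.
TIMING_RE = re.compile(r"Prompt executed in (\d+(?:\.\d+)?) seconds")


def parse_timings(log_text: str) -> list[float]:
    """Return every 'Prompt executed in N seconds' value, in log order."""
    return [float(m.group(1)) for m in TIMING_RE.finditer(log_text)]


def summarize(timings: list[float]) -> dict[str, float]:
    """Treat the first timing as the cold start and the rest as warm runs."""
    if len(timings) < 2:
        raise ValueError("need one cold run and at least one warm run")
    cold, warm = timings[0], timings[1:]
    warm_mean = mean(warm)
    return {
        "cold_s": cold,
        "warm_mean_s": warm_mean,
        "warm_min_s": min(warm),
        "warm_max_s": max(warm),
        # Assumption: the extra time of the first run is load/warm-up cost.
        "cold_overhead_s": cold - warm_mean,
    }


# Sample input: the Z Image Turbo timing lines from this post.
sample_log = """\
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
"""

summary = summarize(parse_timings(sample_log))
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Splitting the first run from the rest keeps the one-off model loading cost out of the warm-run average, which is the number that matters for repeated generations.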
Results (summary)¶
Z Image Turbo¶
Prompt:
A breath-taking extreme close-up cinematic still of an action-hero squirrel with
(reddish-colored fur:1.25) riding on the back of a fast-moving massive shark.
The shark is half submerged into the water and has (cybernetic augmentations:1.2) to its body.
Two rockets are strapped either side of the shark's body.
The squirrel is laying low and clinging onto the big shark.
The squirrel is wearing straps around its chest and (swimming goggles over its head:1.1) and has a wide open mouth
and scared expression on its face. The water is deep blue and splashes around the shark.
There's an impressive explosion with black plumes in the background.
In the background is the blue sky with a gradient going from light blue to deep blue.
The lighting is sunny. Motion blur and sense of speed. Cinematic movie poster shot.
Extremely realistic and detailed textures. extremely artistic, high contrast with (deep blacks:0.2),
perfectly framed composition, rule of thirds, golden ratio, perfectly balanced composition, eye-candy,
highly detailed, best quality, award winning, featured
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load ZImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 72907.03 MB usable, 11739.55 MB loaded, full load: True
Requested to load AutoencodingEngine
loaded completely; 68185.67 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Qwen Image Edit 2509¶
Prompt:
Replace a squirrel with an action-hero sloth lying low and clinging on tightly,
wearing straps around its chest and swimming goggles on its head,
its mouth wide open with a scared expression, extremely realistic and detailed textures,
highly detailed, best quality.
Qwen Image Edit 2509, Lightning 4-step, cfg 1¶
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load WanVAE
loaded completely; 53450.61 MB usable, 242.03 MB loaded, full load: True
Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35513.91 MB usable, 19483.95 MB loaded, full load: True
Prompt executed in 80.77 seconds - first run
Prompt executed in 17.87 seconds
Prompt executed in 17.96 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.04 seconds
Prompt executed in 18.05 seconds
Qwen Image Edit 2509, 20 steps, no LoRAs, cfg 2.5¶
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load WanVAE
loaded completely; 53384.45 MB usable, 242.03 MB loaded, full load: True
Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35497.00 MB usable, 19483.95 MB loaded, full load: True
Prompt executed in 223.67 seconds - first run
Prompt executed in 172.16 seconds
Prompt executed in 171.87 seconds
Prompt executed in 171.94 seconds
Prompt executed in 172.47 seconds
Prompt executed in 172.18 seconds
Prompt executed in 171.67 seconds
Prompt executed in 171.74 seconds
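To put the two Qwen Image Edit 2509 configurations side by side, here is a small worked calculation in Python. It only uses the warm-run timings listed above (the first run of each log is left out). The per-step figures are a rough upper bound, because each timing also includes fixed costs such as text encoding and VAE decoding that the logs do not break out.

```python
from statistics import mean

# Warm-run timings in seconds, copied from the two logs above.
lightning_4_step = [17.87, 17.96, 18.02, 18.02, 18.02, 18.04, 18.05]
full_20_step = [172.16, 171.87, 171.94, 172.47, 172.18, 171.67, 171.74]


def seconds_per_step(timings: list[float], steps: int) -> float:
    """Average wall time of one generation divided by its sampling steps."""
    return mean(timings) / steps


per_step_lightning = seconds_per_step(lightning_4_step, 4)
per_step_full = seconds_per_step(full_20_step, 20)
speedup = mean(full_20_step) / mean(lightning_4_step)
per_step_ratio = per_step_full / per_step_lightning

print(f"Lightning 4-step: {per_step_lightning:.2f} s/step")
print(f"20 steps, cfg 2.5: {per_step_full:.2f} s/step")
print(f"End-to-end speedup of Lightning: {speedup:.2f}x")
print(f"Per-step cost ratio: {per_step_ratio:.2f}x")
```

The per-step cost of the 20-step run comes out a little under twice that of the Lightning run. One plausible reading, which is my assumption and not something measured here, is that cfg 2.5 needs a second model pass per step for the negative conditioning, while cfg 1 does not.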
Flux 2, image edit, Q8_0 GGUF¶
Prompt:
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Requested to load AutoencoderKL
loaded completely; 109398.04 MB usable, 160.31 MB loaded, full load: True
Requested to load Flux2TEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 33080.59 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
gguf qtypes: F32 (128), Q8_0 (160), BF16 (11)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux2
loaded partially; 30328.59 MB usable, 29958.78 MB loaded, 3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0
Prompt executed in 580.03 seconds - first run
Prompt executed in 280.86 seconds
Prompt executed in 261.52 seconds
Prompt executed in 266.16 seconds
Flux 2 bf16 (60 GB version): crash¶
This run crashed. The open problem is that the model is not unloaded from memory after GPU preparation.
InfiniteTalk¶
Video Generation Timing Report 1:¶
infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4
- Generation Start: 2025-12-07 14:36:08,764
- Progress stages (8 iterations each):
  - Stage 1: 16:10 (121.35 s/iteration)
  - Stage 2: 13:18 (99.82 s/iteration)
  - Stage 3: 13:21 (100.13 s/iteration)
  - Stage 4: 13:21 (100.20 s/iteration)
- Completion: 2025-12-07 15:37:26,601
Video Generation Timing Report 2:¶
infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4
- Generation Start: 2025-12-07 16:02:31,707
- Progress stages (8 iterations each):
  - Stage 1: 14:37 (109.65 s/iteration)
  - Stage 2: 13:45 (103.23 s/iteration)
  - Stage 3: 13:44 (103.12 s/iteration)
  - Stage 4: 13:40 (102.52 s/iteration)
- Completion: 2025-12-07 17:01:54,140
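The two reports can be cross-checked with a few lines of Python. The sketch below takes the start and completion timestamps and the stage figures exactly as listed above, computes the total wall time, and compares it with the sum of the four stages. It also checks that each stage duration matches 8 iterations at the reported rate. The dictionary layout and the function names are my own, chosen only for this example.

```python
from datetime import datetime

# Timestamp layout used in the reports, e.g. "2025-12-07 14:36:08,764".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
ITERATIONS_PER_STAGE = 8

# Values copied from the reports: (stage duration "MM:SS", s/iteration).
REPORTS = {
    "report_1": {
        "start": "2025-12-07 14:36:08,764",
        "end": "2025-12-07 15:37:26,601",
        "stages": [("16:10", 121.35), ("13:18", 99.82),
                   ("13:21", 100.13), ("13:21", 100.20)],
    },
    "report_2": {
        "start": "2025-12-07 16:02:31,707",
        "end": "2025-12-07 17:01:54,140",
        "stages": [("14:37", 109.65), ("13:45", 103.23),
                   ("13:44", 103.12), ("13:40", 102.52)],
    },
}


def mmss_to_seconds(text: str) -> int:
    """Convert an 'MM:SS' duration to whole seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def analyze(report: dict) -> dict:
    start = datetime.strptime(report["start"], LOG_TIME_FORMAT)
    end = datetime.strptime(report["end"], LOG_TIME_FORMAT)
    wall_s = (end - start).total_seconds()
    stages_s = sum(mmss_to_seconds(d) for d, _ in report["stages"])
    # Largest gap between a stage's MM:SS value and 8 x its s/iteration rate.
    max_gap_s = max(
        abs(mmss_to_seconds(d) - ITERATIONS_PER_STAGE * rate)
        for d, rate in report["stages"]
    )
    return {
        "wall_s": wall_s,
        "stages_s": stages_s,
        "outside_stages_s": wall_s - stages_s,
        "max_gap_s": max_gap_s,
    }


results = {name: analyze(report) for name, report in REPORTS.items()}
for name, r in results.items():
    print(f"{name}: wall {r['wall_s']:.1f} s, stages {r['stages_s']} s, "
          f"outside stages {r['outside_stages_s']:.1f} s, "
          f"max stage/rate gap {r['max_gap_s']:.2f} s")
```

With these numbers the four stages account for about 56 minutes of each run of roughly one hour. The remainder is time spent outside the sampling loop, which the reports do not break down further.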
Hunyuan3D 2.1¶
3D generation plus texturing, with the default settings from the Gradio demo.
Stone with a rune test:
- processing: 205.9s – First run
- processing: 215.5s
- processing: 220.2s
- processing: 222.4s
Tencent magic hat:
- processing: 204.4s – First run
- processing: 190.9s
- processing: 186.5s
- processing: 184.5s