This post reports DGX Spark performance tests for several open-source neural networks.

Disclaimer: I am an enthusiast, not an industry lab. This is an informational write-up only. I make no guarantees and accept no responsibility for how you use or interpret these numbers. I only tested the neural networks that ran successfully in my environment on DGX Spark.

Test setup¶

App: ComfyUI/Gradio demos
Workload: multiple generations per run (mostly)
Runs: repeated runs; ComfyUI is restarted from scratch before every batch run
Cold start is included once per run
Identical prompt, model, sampler, resolution, and parameters across runs
Cooldown: the device is left to cool down to idle temps between runs

Methodology¶

Launch ComfyUI/Gradio, verify identical workflow graph and parameters.
Allow the device to cool down to idle, then restart the app from scratch before each run to include cold-start cost once per run.
Execute the workload.
Capture the console/stdout timings.
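
The timing capture in the last step can be scripted. Below is a minimal sketch, assuming the console output has been saved as text and that timings appear as `Prompt executed in N seconds` lines like the ones quoted in the results. The names `parse_prompt_times` and `summarize` are hypothetical helpers, not part of ComfyUI, and treating the first timing as the cold start is an assumption that matches the restart-before-each-run procedure described above.

```python
import re
from statistics import mean

# Matches console lines such as "Prompt executed in 5.96 seconds".
PROMPT_TIME_RE = re.compile(r"Prompt executed in (\d+(?:\.\d+)?) seconds")


def parse_prompt_times(log_text):
    """Return every per-prompt time (seconds) found in log_text, in log order."""
    return [float(m.group(1)) for m in PROMPT_TIME_RE.finditer(log_text)]


def summarize(times):
    """Split a run into its first (cold-start) time and the mean of the warm runs."""
    if not times:
        raise ValueError("no 'Prompt executed' lines found")
    cold, warm = times[0], times[1:]
    return {
        "cold_s": cold,
        "warm_mean_s": mean(warm) if warm else None,
        "warm_runs": len(warm),
    }


# Small sample in the same format as the logs in this post.
sample_log = """\
Prompt executed in 46.04 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
"""

summary = summarize(parse_prompt_times(sample_log))
print(summary)
```

Keeping the cold-start time separate from the warm mean avoids one slow first run dominating the average.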

Results (summary)¶

Z Image Turbo¶

Prompt:

A breath-taking extreme close-up cinematic still of an action-hero squirrel with 
(reddish-colored fur:1.25) riding on the back of a fast-moving massive shark. 
The shark is half submerged into the water and has (cybernetic augmentations:1.2) to its body. 
Two rockets are strapped either side of the shark's body. 
The squirrel is laying low and clinging onto the big shark. 
The squirrel is wearing straps around its chest and (swimming goggles over its head:1.1) and has a wide open mouth 
and scared expression on its face. The water is deep blue and splashes around the shark. 
There's an impressive explosion with black plumes in the background. 
In the background is the blue sky with a gradient going from light blue to deep blue. 
The lighting is sunny. Motion blur and sense of speed. Cinematic movie poster shot. 
Extremely realistic and detailed textures. extremely artistic, high contrast with (deep blacks:0.2), 
perfectly framed composition, rule of thirds, golden ratio, perfectly balanced composition, eye-candy, 
highly detailed, best quality, award winning, featured

DGX Spark¶

Result:

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load ZImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 72907.03 MB usable, 11739.55 MB loaded, full load: True

Requested to load AutoencodingEngine
loaded completely; 68185.67 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 46.04 seconds - first run
Prompt executed in 5.96 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.00 seconds
Prompt executed in 5.96 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds
Prompt executed in 6.01 seconds

RTX 4090, RunPod¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load ZImageTEModel_
loaded completely; 22478.49 MB usable, 7672.25 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.bfloat16, manual cast: None
model_type FLOW
unet missing: ['norm_final.weight']
Requested to load Lumina2
loaded completely; 14706.11 MB usable, 11739.55 MB loaded, full load: True

Requested to load AutoencodingEngine
loaded completely; 282.12 MB usable, 159.87 MB loaded, full load: True

Prompt executed in 26.28 seconds - first run
Prompt executed in 2.57 seconds
Prompt executed in 2.55 seconds
Prompt executed in 2.61 seconds
Prompt executed in 2.55 seconds
Prompt executed in 2.63 seconds
Prompt executed in 2.56 seconds
Prompt executed in 2.57 seconds
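
To put the two Z Image Turbo logs side by side, the warm runs can be averaged and divided. This is a rough sketch, assuming the first run of each log is the cold start and is excluded; `warm_ratio` is a hypothetical helper name, and the numbers are copied from the logs above.

```python
from statistics import mean

# Warm-run timings in seconds, copied from the logs above (first runs excluded).
dgx_spark = [5.96, 5.96, 6.00, 5.96, 6.01, 6.01, 6.01]
rtx_4090 = [2.57, 2.55, 2.61, 2.55, 2.63, 2.56, 2.57]


def warm_ratio(slower, faster):
    """Ratio of mean warm-run times: how many times longer `slower` takes."""
    return mean(slower) / mean(faster)


ratio = warm_ratio(dgx_spark, rtx_4090)
print(f"DGX Spark warm mean: {mean(dgx_spark):.2f} s")
print(f"RTX 4090 warm mean:  {mean(rtx_4090):.2f} s")
print(f"DGX Spark / RTX 4090: {ratio:.2f}x")
```

For these seven warm runs the ratio comes out at roughly 2.3x in favor of the RTX 4090. It describes this one workflow and prompt only.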

QWEN Image edit 2509, Lightning 4-step, cfg: 1¶

Prompt:

Replace a squirrel with an action-hero sloth lying low and clinging on tightly, 
wearing straps around its chest and swimming goggles on its head, 
its mouth wide open with a scared expression, extremely realistic and detailed textures, 
highly detailed, best quality.

DGX Spark¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53450.61 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35513.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 80.77 seconds - first run
Prompt executed in 17.87 seconds
Prompt executed in 17.96 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.02 seconds
Prompt executed in 18.04 seconds
Prompt executed in 18.05 seconds

RTX 4090, RunPod¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 20297.69 MB usable, 242.03 MB loaded, full load: True

Requested to load QwenImageTEModel_
loaded completely; 22144.46 MB usable, 7910.29 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 22007.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 72.46 seconds - first run
Prompt executed in 7.14 seconds
Prompt executed in 8.63 seconds
Prompt executed in 8.50 seconds
Prompt executed in 8.47 seconds
Prompt executed in 8.49 seconds
Prompt executed in 8.47 seconds
Prompt executed in 8.46 seconds

QWEN Image edit 2509, 20 steps, no LoRAs, cfg: 2.5¶

DGX Spark¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 53384.45 MB usable, 242.03 MB loaded, full load: True

Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load QwenImageTEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 7909.74 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 35497.00 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 223.67 seconds
Prompt executed in 172.16 seconds
Prompt executed in 171.87 seconds
Prompt executed in 171.94 seconds
Prompt executed in 172.47 seconds
Prompt executed in 172.18 seconds
Prompt executed in 171.67 seconds
Prompt executed in 171.74 seconds

RTX 4090, RunPod¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load WanVAE
loaded completely; 20297.69 MB usable, 242.03 MB loaded, full load: True

Requested to load QwenImageTEModel_
loaded completely; 22144.46 MB usable, 7910.29 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.bfloat16
model_type FLUX
Requested to load QwenImage
loaded completely; 22007.91 MB usable, 19483.95 MB loaded, full load: True

Prompt executed in 104.81 seconds - first run
Prompt executed in 57.80 seconds
Prompt executed in 57.88 seconds
Prompt executed in 57.91 seconds
Prompt executed in 57.74 seconds
Prompt executed in 57.81 seconds
Prompt executed in 58.01 seconds
Prompt executed in 57.63 seconds
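
The 4-step Lightning runs and the 20-step runs can be compared per sampling step. This is only a sketch under stated assumptions: the whole `Prompt executed` time is divided by the step count, so text encoding and VAE work are folded into the figure and it is an upper bound on the pure sampling cost. The first entry of each log is treated as the cold start and left out.

```python
from statistics import mean

# (device, configuration) -> (sampling steps, warm-run times in seconds).
# All timings are copied from the Qwen Image Edit 2509 logs in this post.
runs = {
    ("DGX Spark", "Lightning 4-step, cfg 1"): (
        4, [17.87, 17.96, 18.02, 18.02, 18.02, 18.04, 18.05]),
    ("DGX Spark", "20 steps, cfg 2.5"): (
        20, [172.16, 171.87, 171.94, 172.47, 172.18, 171.67, 171.74]),
    ("RTX 4090", "Lightning 4-step, cfg 1"): (
        4, [7.14, 8.63, 8.50, 8.47, 8.49, 8.47, 8.46]),
    ("RTX 4090", "20 steps, cfg 2.5"): (
        20, [57.80, 57.88, 57.91, 57.74, 57.81, 58.01, 57.63]),
}


def seconds_per_step(steps, times):
    """Mean whole-prompt time divided by the number of sampling steps."""
    return mean(times) / steps


per_step = {key: seconds_per_step(steps, times) for key, (steps, times) in runs.items()}

for (device, config), value in per_step.items():
    print(f"{device:10s} {config:26s} {value:.2f} s/step")
```

On both devices the cfg 2.5 runs cost more per step than the cfg 1 runs. That would be consistent with the sampler evaluating the model for both the positive and the negative conditioning when cfg is above 1, but this was not verified here.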

Flux 2, image edit, Q8_0 GGUF¶

Prompt:

Replace a squirrel with an action-hero sloth, keep equipment the same

DGX Spark¶

VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16

Requested to load AutoencoderKL
loaded completely; 109398.04 MB usable, 160.31 MB loaded, full load: True

Requested to load Flux2TEModel_
loaded completely; 95367431640625005117571072.00 MB usable, 33080.59 MB loaded, full load: True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
gguf qtypes: F32 (128), Q8_0 (160), BF16 (11)
model weight dtype torch.bfloat16, manual cast: None
model_type FLUX
Requested to load Flux2
loaded partially; 30328.59 MB usable, 29958.78 MB loaded, 3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0

Prompt executed in 580.03 seconds
Prompt executed in 280.86 seconds
Prompt executed in 261.52 seconds
Prompt executed in 266.16 seconds
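
This is the only log in the post where the diffusion model is reported as `loaded partially`. A small sketch for pulling the numbers out of such lines follows. It assumes the line format stays as quoted in these logs; `parse_load_line` is a hypothetical helper, not a ComfyUI function.

```python
import re

# Matches both "loaded completely; ..." and "loaded partially; ..." console lines.
LOAD_RE = re.compile(
    r"loaded (?P<mode>completely|partially); "
    r"(?P<usable>[\d.]+) MB usable, "
    r"(?P<loaded>[\d.]+) MB loaded"
    r"(?:, (?P<offloaded>[\d.]+) MB offloaded)?"
)


def parse_load_line(line):
    """Return load mode, sizes in MB, and the offloaded share, or None if no match."""
    m = LOAD_RE.search(line)
    if m is None:
        return None
    loaded = float(m.group("loaded"))
    offloaded = float(m.group("offloaded") or 0.0)
    total = loaded + offloaded
    return {
        "mode": m.group("mode"),
        "loaded_mb": loaded,
        "offloaded_mb": offloaded,
        "offloaded_fraction": offloaded / total if total else 0.0,
    }


flux2_line = (
    "loaded partially; 30328.59 MB usable, 29958.78 MB loaded, "
    "3854.25 MB offloaded, 344.25 MB buffer reserved, lowvram patches: 0"
)
info = parse_load_line(flux2_line)
print(f"{info['mode']}: {info['offloaded_fraction']:.1%} of the weights offloaded")
```

For the Flux2 line above, a little over a tenth of the weights are reported as offloaded, which may contribute to the long per-prompt times.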

Flux2 bf16 (60 GB version): crashes on both systems¶

I still need to solve a problem where the model is not unloaded from memory after GPU preparation.

InfiniTalk:¶

Video Generation Timing Report 1:¶

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

Generation Start: 2025-12-07 14:36:08,764
Progress stages (8 iterations each):
Stage 1: 16:10 (121.35 s/iteration)
Stage 2: 13:18 (99.82 s/iteration)
Stage 3: 13:21 (100.13 s/iteration)
Stage 4: 13:21 (100.20 s/iteration)
Completion: 2025-12-07 15:37:26,601

Video Generation Timing Report 2:¶

infinitetalk-14B_infinitetalk-480_1_1_A_woman_is_passionately_singing_into_a_professiona_20251207_153723.mp4

Generation Start: 2025-12-07 16:02:31,707
Progress stages (8 iterations each):
Stage 1: 14:37 (109.65 s/iteration)
Stage 2: 13:45 (103.23 s/iteration)
Stage 3: 13:44 (103.12 s/iteration)
Stage 4: 13:40 (102.52 s/iteration)
Completion: 2025-12-07 17:01:54,140
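
The two timing reports can be reduced to a total wall-clock time and the share spent in the four progress stages. This is a minimal sketch using only the timestamps and stage durations quoted above; the helper names are hypothetical, and the time outside the stages is simply the difference, with no claim about what it was spent on.

```python
from datetime import datetime

# Timestamp layout used in the reports, e.g. "2025-12-07 14:36:08,764".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Wall-clock seconds between two report timestamps."""
    delta = datetime.strptime(end, TS_FORMAT) - datetime.strptime(start, TS_FORMAT)
    return delta.total_seconds()


def stage_seconds(mm_ss):
    """Convert a 'MM:SS' stage duration to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


reports = {
    "Report 1": {
        "start": "2025-12-07 14:36:08,764",
        "end": "2025-12-07 15:37:26,601",
        "stages": ["16:10", "13:18", "13:21", "13:21"],
    },
    "Report 2": {
        "start": "2025-12-07 16:02:31,707",
        "end": "2025-12-07 17:01:54,140",
        "stages": ["14:37", "13:45", "13:44", "13:40"],
    },
}

totals = {}
for name, report in reports.items():
    total = elapsed_seconds(report["start"], report["end"])
    in_stages = sum(stage_seconds(s) for s in report["stages"])
    totals[name] = (total, in_stages)
    print(f"{name}: total {total:.0f} s, stages {in_stages} s, other {total - in_stages:.0f} s")
```

Both generations take about an hour end to end, with most of that time inside the four stages.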

Hunyuan3d 2.1¶

DGX Spark¶

3D + texturing. Default settings from the Gradio demo.

Stone with a rune test:

processing: 205.9s – First run
processing: 215.5s
processing: 220.2s
processing: 222.4s

Tencent magic hat:

processing | 204.4s – First run
processing | 190.9s
processing | 186.5s
processing | 184.5s

RTX 4090, RunPod¶

3D + texturing. Default settings from the Gradio demo.

Error
CUDA out of memory. Tried to allocate 3.38 GiB. GPU 0 has a total capacity of 23.64 GiB of which 340.81 MiB is free. 
Process 143587 has 23.30 GiB memory in use. Of the allocated memory 20.43 GiB is allocated by PyTorch, 
and 2.38 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting 
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 
See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
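
The error message suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. The sketch below shows one way to set it for a child process and checks the numbers from the message. The variable has to be in the environment before PyTorch initializes CUDA, which is why it is set for a freshly launched process here. `build_env` and `launch` are hypothetical helpers, and the script path passed to `launch` would be whatever starts the demo in your setup.

```python
import os
import subprocess
import sys


def build_env(base=None):
    """Copy an environment and add the allocator setting unless it is already set."""
    env = dict(os.environ if base is None else base)
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env


def launch(script_path):
    """Start a Python script (hypothetical demo entry point) with the adjusted environment."""
    return subprocess.run([sys.executable, script_path], env=build_env(), check=False)


# Figures copied from the error message above.
requested_gib = 3.38
free_gib = 340.81 / 1024          # 340.81 MiB reported as free
reserved_unallocated_gib = 2.38   # reserved by PyTorch but unallocated

# Rough best case: all free memory plus every reserved-but-unallocated byte is reusable.
best_case_gib = free_gib + reserved_unallocated_gib
fits = best_case_gib >= requested_gib
print(f"best case {best_case_gib:.2f} GiB vs requested {requested_gib:.2f} GiB -> fits: {fits}")
```

By this rough arithmetic the request would still not fit even with no fragmentation at all, so the setting alone seems unlikely to make the textured pipeline run on a 24 GB card. I have not tested that.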

Tencent magic hat, 3D only, no textures:

processing | 2.8/21.3s - First run
processing | 2.2/20.5s
processing | 2.0/20.4s
processing | 3.7/20.6s

Stone with a rune test, 3D only, no textures:

processing | 2.3/23.5s - First run
processing | 4.8/21.9s
processing | 3.8/21.2s
processing | 3.3/20.8s