Short name describing what triggered the graph break
dtype mismatch between tensor and its gradient
Values or code snippet captured at the break point
tensor dtype: {value.dtype}; grad dtype: {safe_grad(value).dtype}
Explanation of why the graph break was triggered
The tensor and its gradient have different dtypes. This can happen under FSDP, and it causes meta tensor creation to crash.
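The check described above can be sketched without any dependencies. This is only an illustration, not the actual Dynamo implementation: `StubTensor` and `grad_dtype_mismatch` are hypothetical names, the stub models nothing but a `dtype` and a `grad`, and dtypes are plain strings rather than real `torch.dtype` objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTensor:
    """Hypothetical stand-in for a tensor: only dtype and grad are modeled."""
    dtype: str
    grad: Optional["StubTensor"] = None


def grad_dtype_mismatch(value: StubTensor) -> Optional[str]:
    """Return a context string if value and its gradient disagree on dtype.

    Returns None when there is no gradient or when the dtypes match.
    """
    grad = value.grad
    if grad is None or grad.dtype == value.dtype:
        return None
    # Same layout as the context template in this entry.
    return f"tensor dtype: {value.dtype}; grad dtype: {grad.dtype}"


# Assumed scenario: a parameter held in half precision whose gradient
# is still in full precision.
param = StubTensor("torch.float16", grad=StubTensor("torch.float32"))
print(grad_dtype_mismatch(param))
```

With the assumed values, the printed line is the context string filled in with `torch.float16` for the tensor and `torch.float32` for the gradient.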
Hints on how to resolve the graph break