Question 1

Is FlashAttention an approximation of regular attention?

Accepted Answer

This is covered in the "Understand FlashAttention and Tiling" learning path on Droplet, in the Phase 1 lesson "Same math, completely different schedule". Start with daily 5-minute micro-lessons that build from fundamentals to hands-on application.
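A quick sanity check of the question itself: FlashAttention computes exact attention, not an approximation. Tiling only changes the order in which the same sums are accumulated, so the output matches standard attention up to floating-point rounding. The pure-Python sketch below assumes a single head with no masking, dropout or batching. The names `naive_attention` and `tiled_attention` are hypothetical and are not FlashAttention's API; the real kernels also tile Q and keep each tile in on-chip SRAM.

```python
import math
import random


def naive_attention(Q, K, V):
    """Reference: score every key for each query, softmax the whole row, weight V."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, tile=2):
    """Visit K/V one tile at a time, keeping only a running max, a running sum
    and an unnormalized output per query; a full row of scores is never stored."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        m = -math.inf            # largest score seen so far
        l = 0.0                  # sum of exp(score - m) over keys seen so far
        acc = [0.0] * len(V[0])  # sum of exp(score - m) * v over keys seen so far
        for start in range(0, len(K), tile):
            k_tile, v_tile = K[start:start + tile], V[start:start + tile]
            s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]
            m_new = max(m, max(s))
            corr = math.exp(m - m_new)  # exactly rescales old l and acc to the new max
            p = [math.exp(sj - m_new) for sj in s]
            l = l * corr + sum(p)
            acc = [a * corr + sum(pj * v[c] for pj, v in zip(p, v_tile))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / l for a in acc])  # normalize once, after the last tile
    return out


random.seed(0)
N, d = 7, 4
Q, K, V = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(N)] for _ in range(3))
ref = naive_attention(Q, K, V)
got = tiled_attention(Q, K, V, tile=3)
max_err = max(abs(a - b) for r1, r2 in zip(ref, got) for a, b in zip(r1, r2))
print(f"max abs difference: {max_err:.1e}")  # rounding-level, not approximation error
```

Changing `tile` only reorders additions, so the difference stays at rounding level for any tile size.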

Question 2

Why is standard attention memory-bound instead of compute-bound?

Accepted Answer

This is covered in Phase 1 of the "Understand FlashAttention and Tiling" learning path on Droplet, "Why attention bleeds memory before it bleeds math".
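A rough roofline estimate shows the core of the argument. Unfused attention writes the N×N score matrix S and the probability matrix P to GPU memory (HBM) and reads them back, so its memory traffic grows like N² while its FLOPs grow like N²·d. In fp16 that works out to roughly Nd / (2(N + d)) FLOPs per byte, about d/2 for long sequences, which is typically far below a data-center GPU's ridge point. This sketch assumes fp16 (2 bytes per element), one head, forward pass only, and ignores the few FLOPs of softmax itself. `attention_intensity` is a hypothetical helper, and the peak-FLOP/s and bandwidth figures are assumed placeholders to replace with your own hardware's datasheet values.

```python
def attention_intensity(n, d, bytes_per_elem=2):
    """Rough FLOPs per byte of HBM traffic for one attention head (forward only)."""
    flops = 2 * (2 * n * n * d)  # Q @ K^T and P @ V, 2 FLOPs per multiply-add
    # Unfused: read Q, K, V and write O (4 N d), plus write S, read S,
    # write P, read P (4 N^2).
    standard_bytes = bytes_per_elem * (4 * n * d + 4 * n * n)
    # Idealized fused kernel: only Q, K, V in and O out touch HBM. This is a
    # lower bound on traffic; a real kernel re-reads K and V per query tile.
    fused_bytes = bytes_per_elem * (4 * n * d)
    return flops / standard_bytes, flops / fused_bytes


# Hypothetical hardware numbers (assumed, order of magnitude only).
PEAK_FLOPS = 300e12  # FLOP/s
BANDWIDTH = 1.5e12   # bytes/s
ridge = PEAK_FLOPS / BANDWIDTH  # intensity where the roofline bends

standard, fused = attention_intensity(n=4096, d=64)
for name, ai in [("standard", standard), ("fused (ideal)", fused)]:
    bound = "memory-bound" if ai < ridge else "compute-bound"
    print(f"{name:14s} {ai:8.1f} FLOP/byte -> {bound} (ridge {ridge:.0f})")
```

With these assumed numbers the unfused version sits far left of the ridge, while the idealized fused count grows like N/2 and crosses it. A real FlashAttention kernel's intensity falls between the two because of the K/V re-reads.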

Question 3

What is online softmax and why does it stay exact across tiles?

Accepted Answer

This is covered in the "Understand FlashAttention and Tiling" learning path on Droplet, in the Phase 2 lesson "The running-max trick that streams softmax exactly".
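Online softmax keeps two numbers per row: a running max m and a running sum l of exp(x − m). When a new tile raises the max, the old sum is multiplied by exp(m_old − m_new). Because exp(x − m_old)·exp(m_old − m_new) = exp(x − m_new) exactly, merging tiles discards nothing, and the merge is associative, so any tiling gives the same normalizer up to rounding. Subtracting the max also keeps exp from overflowing. A minimal pure-Python sketch follows; `tile_stats`, `merge` and `streamed_softmax` are hypothetical names chosen for illustration.

```python
import math
from functools import reduce


def tile_stats(xs):
    """Softmax statistics for one tile: (max, sum of exp(x - max))."""
    m = max(xs)
    return m, sum(math.exp(x - m) for x in xs)


def merge(a, b):
    """Combine two tiles' statistics by rescaling both sums to the shared max.

    exp(x - m_a) * exp(m_a - m) == exp(x - m), so no information is lost.
    """
    (m_a, l_a), (m_b, l_b) = a, b
    m = max(m_a, m_b)
    return m, l_a * math.exp(m_a - m) + l_b * math.exp(m_b - m)


def streamed_softmax(xs, tile=3):
    """Fold per-tile stats into one (max, sum), then normalize.

    In attention this second pass is fused into the P @ V accumulation instead.
    """
    m, l = reduce(merge, (tile_stats(xs[i:i + tile]) for i in range(0, len(xs), tile)))
    return [math.exp(x - m) / l for x in xs]


xs = [1000.0, 3.0, -2.0, 999.0, 4.5, 0.0, 998.0]  # math.exp(1000.0) alone would overflow
m_all = max(xs)
e = [math.exp(x - m_all) for x in xs]
total = sum(e)
direct = [v / total for v in e]  # one-shot, max-subtracted softmax for comparison

# Grouping the same tiles differently yields the same statistics (associativity).
a, b, c = tile_stats(xs[:2]), tile_stats(xs[2:5]), tile_stats(xs[5:])
left, right = merge(merge(a, b), c), merge(a, merge(b, c))
print(left, right)
print(max(abs(p - q) for p, q in zip(streamed_softmax(xs), direct)))
```

Because tiles can be merged in any grouping, the tile size is a performance knob rather than an accuracy knob.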

Question 4

How does FlashAttention-2 differ from FlashAttention-1?

Accepted Answer

This is covered in the "Understand FlashAttention and Tiling" learning path on Droplet, in the Phase 3 lesson "FA2 flipped the loop and doubled the speedup".

Question 5

What changes in FlashAttention-3 on Hopper GPUs?

Accepted Answer

This is covered in the "Understand FlashAttention and Tiling" learning path on Droplet, in the Phase 3 lesson "FA3 stops waiting for memory and starts overlapping it".

⚡Understand FlashAttention and Tiling

Phase 1: Why attention bleeds memory before it bleeds math

Attention isn't slow — its memory traffic is

SRAM is the scratchpad nobody used

The N×N matrix that never had to exist

Same math, completely different schedule

Phase 2: Tiling and online softmax on graph paper

Cut the matrix until it fits in your scratchpad

The running-max trick that streams softmax exactly

Walk one Q tile through every K tile on paper

The backward pass that never stored the matrix

Causal masking that doesn't waste tiles

Phase 3: FA1 vs FA2 vs FA3 (same math, better schedule)

FA2 flipped the loop and doubled the speedup

FA3 stops waiting for memory and starts overlapping it

Your team turned on FA. Which version is running?

PyTorch SDPA, xFormers, and the FA family tree

Phase 4: Estimate FA speedup from a roofline you draw

Draw the roofline. Predict FA's win for your sequence.

Frequently asked questions

Related paths

🐍Python Decorators Introduction

🦀Rust Lifetimes Explained

☸️Kubernetes Core Concepts

📈Big O Intuition
