Broadcasting, the rule that makes arithmetic between tensors of different shapes possible, is simple. If you're confused, give me 180 seconds to give you the most straightforward explanation on the web.
Two things to know:
- Arithmetic between two tensors is always done elementwise and requires their shapes to match.
- Broadcasting resizes tensors of different shapes so that arithmetic can then be done.
Tensor arithmetic requires matching tensor shapes
// Good
[10, 11, 12] + [1, 1, 1] = [11, 12, 13]
// No good
[10, 11, 12] + [1, 1] = ?
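In NumPy that rule looks like this (a quick sketch; any array library behaves the same way):
import numpy as np

np.array([10, 11, 12]) + np.array([1, 1, 1])   # array([11, 12, 13])
np.array([10, 11, 12]) + np.array([1, 1])      # raises ValueError: operands could not be broadcast together with shapes (3,) (2,)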
A "broadcast" duplicates items to guarantee matching size, so arithmetic is possible
// Your input
[10, 11, 12] + [1] = [11, 12, 13]
// After broadcasting
[10, 11, 12] + [1, 1, 1] = [11, 12, 13]
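The same thing in NumPy, just to show those two lines really are interchangeable (a minimal sketch):
import numpy as np

a = np.array([10, 11, 12])
print(a + np.array([1]))         # [11 12 13]  -- broadcast happens under the hood
print(a + np.array([1, 1, 1]))   # [11 12 13]  -- what the broadcast expanded it to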
So how does this actually work?
Every tensor has a corresponding shape.
tensor = [[1, 2, 3],
          [4, 5, 6]]
shape = [2, 3]   // 2 rows, 3 columns
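In NumPy, that shape is just the array's .shape attribute (a quick sketch):
import numpy as np

tensor = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(tensor.shape)   # (2, 3)  -- 2 rows, 3 columns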
Anytime we do arithmetic between two tensors that don't have identical shapes, we see if we can use broadcasting to make their shapes match. The example below contains the shape vectors of the two tensors involved in the arithmetic operation.
When thinking about broadcasting, we don't need to think about what's in the tensor. Only about the shape of the tensor.
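Recent NumPy versions even expose this shape-only reasoning directly: np.broadcast_shapes takes shapes, not arrays, and tells you what the result shape would be (a small sketch):
import numpy as np

print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)
print(np.broadcast_shapes((2, 3), (2, 1)))  # (2, 3)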
An example
We start by right-aligning the two shapes, then compare them dimension by dimension, moving from right to left.
Step 1
shapeA = [4, 5, 3, 2]
shapeB =    [5, 1, 2]
                   ^
These are equal, so no change.
Step 2
shapeA = [4, 5, 3, 2] => [4, 5, 3, 2]
shapeB =    [5, 1, 2]       [5, 3, 2]
                ^               ^
One of the two dimensions has a size of 1, so we just duplicate that single item along this dimension until the two sizes match (1 becomes 3 here).
Step 3
shapeA = [4, 5, 3, 2]
shapeB =    [5, 3, 2]
             ^
The dimension sizes match, so no change. Continue leftwards.
Step 4
shapeA = [4, 5, 3, 2] => [4, 5, 3, 2] => [4, 5, 3, 2]
shapeB =    [5, 3, 2]    [1, 5, 3, 2]    [4, 5, 3, 2]
          ^               ^               ^
One of the tensors doesn't have as many dimensions as the other! So we wrap it in a new dimension of size 1, then broadcast that dimension as usual. Now that the shapes match, we can do the elementwise arithmetic as usual.
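If you prefer code to diagrams, here is a small sketch of the same rule. broadcast_shape is a hypothetical helper written just for this post (real libraries implement the same logic internally), but it mirrors the steps above: right-align, treat missing dimensions as size 1, and stretch any size-1 dimension to match.
def broadcast_shape(shape_a, shape_b):
    # Walk both shapes from the last (rightmost) dimension to the first.
    result = []
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        # A missing dimension counts as size 1 (the "wrap it in a new dimension" step).
        a = shape_a[-i] if i <= len(shape_a) else 1
        b = shape_b[-i] if i <= len(shape_b) else 1
        if a == b or a == 1 or b == 1:
            result.append(max(a, b))  # a size-1 dimension gets duplicated up to the other size
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((4, 5, 3, 2), (5, 1, 2)))  # (4, 5, 3, 2)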
What's being duplicated?
shapeA = [4, 5, 3, 2] => [4, 5, 3, 2] => [4, 5, 3, 2]
shapeB =    [5, 3, 2]    [1, 5, 3, 2]    [4, 5, 3, 2]
          ^               ^               ^
// We duplicate [5, 3, 2]
We are always duplicating the entirety of the data covered by the dimensions to the right of the one being broadcast. In this case we're duplicating everything in that [5, 3, 2] block 4 times.
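You can see that duplication in NumPy with np.broadcast_to (a sketch with made-up data):
import numpy as np

b = np.ones((5, 3, 2))                       # the whole (5, 3, 2) block...
expanded = np.broadcast_to(b, (4, 5, 3, 2))  # ...repeated 4 times along the new leading dimension
print(expanded.shape)                        # (4, 5, 3, 2)
(In practice NumPy doesn't physically copy anything here; broadcast_to returns a read-only view. But conceptually, that [5, 3, 2] block now appears 4 times.)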
Yes, we're often duplicating vectors and matrices
tensorA = [[1, 2], [3, 4], [5, 6]]
tensorB = [[0, 2]]
shapeA = [3, 2] => [3, 2]
shapeB = [1, 2] => [3, 2]
// tensorB gets broadcast to [[0, 2], [0, 2], [0, 2]]
tensorA - tensorB
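To run that last example for real, here's a minimal NumPy sketch (PyTorch behaves the same way):
import numpy as np

tensorA = np.array([[1, 2], [3, 4], [5, 6]])
tensorB = np.array([[0, 2]])
print(tensorA - tensorB)
# [[1 0]
#  [3 2]
#  [5 4]]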
Run it back
Again, broadcasting is simply an operation that resizes your tensors so that their shapes match and elementwise arithmetic can be done. That's it. Try it out in PyTorch / MATLAB / NumPy or whatever library you write code with.
@ me on twitter.com if you found this helpful or have lingering questions.