
How do I compute multiple per-sample gradients efficiently?

I am trying to compute multiple loss gradients efficiently (without a for loop) in PyTorch. Given:

import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Sequential(
            nn.Linear(input_size, 16, bias=False),
            nn.Linear(16, output_size, bias=False),
        )

    def forward(self, x):
        return self.linear(x)

device = "cpu"
input_size = 2
output_size = 2

x = torch.randn(10, 1, input_size).to(device)
y = torch.randn(10, 1, output_size).to(device)

model = NeuralNetwork().to(device)
loss_fn = nn.MSELoss()

def loss_grad(x, label):
    y = model(x)
    loss = loss_fn(y, label)
    grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True)
    return grads

The following works, but uses a for loop:

# inefficient but works
def compute_for():
    grads = [loss_grad(x[i], y[i]) for i in range(x.shape[0])]
    print(grads)

compute_for()

For efficiency, I tried using torch.vmap instead:

# potentially more efficient but doesn't work
def compute_vmap():
    grads = torch.vmap(loss_grad)(x, y)
    print(grads)

compute_vmap()

I was expecting it to compute the gradients of the losses w.r.t. the parameters for each element in x, y. Instead, I get an error:

RuntimeError: element 0 of tensors does not require grad 

As I understand it, this means that the per-sample slices vmap extracts from x are processed without autograd tracking, so the loss computed inside the vmapped function does not require grad and torch.autograd.grad has nothing to differentiate.
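This can be confirmed by printing from inside the vmapped function (a minimal diagnostic sketch reusing model and loss_fn from above):

def check_loss(x, label):
    out = model(x)
    loss = loss_fn(out, label)
    # prints False under vmap, True when the function is called directly
    print(loss.requires_grad)
    return loss

torch.vmap(check_loss)(x, y)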

How can I modify this code so that it computes all gradients? Or is there another method to do that?


1 Answer

Per-sample gradients can be computed by composing torch.func.grad with vmap, as shown in the PyTorch per-sample-gradients tutorial (https://pytorch.org/tutorials/intermediate/per_sample_grads.html):

from torch.func import functional_call, vmap, grad

def compute_loss(params, buffers, sample, target):
    # add a leading batch dimension of size 1, since the model expects batched input
    batch = sample.unsqueeze(0)
    targets = target.unsqueeze(0)
    # run the model as a pure function of (params, buffers)
    predictions = functional_call(model, (params, buffers), (batch,))
    loss = loss_fn(predictions, targets)
    return loss

params = {k: v.detach() for k, v in model.named_parameters()}
buffers = {k: v.detach() for k, v in model.named_buffers()}

# grad differentiates w.r.t. the first argument (params);
# vmap maps over the leading dimension of x and y only
ft_compute_grad = grad(compute_loss)
ft_compute_sample_grad = vmap(ft_compute_grad, in_dims=(None, None, 0, 0))

ft_per_sample_grads = ft_compute_sample_grad(params, buffers, x, y)
print(ft_per_sample_grads)
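Each entry of ft_per_sample_grads is keyed by parameter name and gains a leading batch dimension of size 10, one gradient per sample:

for name, g in ft_per_sample_grads.items():
    print(name, g.shape)
# linear.0.weight torch.Size([10, 16, 2])
# linear.1.weight torch.Size([10, 2, 16])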

These match the gradients computed individually for each pair (x[i], y[i]).
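This can be verified numerically against the loop-based version (a minimal sketch; it relies on model.named_parameters() iterating in the same order as the tuples returned by torch.autograd.grad, which it does):

loop_grads = [loss_grad(x[i], y[i]) for i in range(x.shape[0])]
for i, name in enumerate(params):
    per_sample = torch.stack([g[i] for g in loop_grads])
    assert torch.allclose(ft_per_sample_grads[name], per_sample, atol=1e-6)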
