Datasets / Dataloaders

Error received 0 items of ancdata

Occurs when setting num_workers > 0 in DataLoader (e.g. on Azure VMs).

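# Use the file-system sharing strategy; call this before creating any DataLoader
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')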

Source: https://discuss.pytorch.org/t/runtimeerror-received-0-items-of-ancdata/4999/2
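A minimal sketch of where the call could sit relative to DataLoader creation. The helper name make_loader and the toy TensorDataset are hypothetical and only for illustration; the strategy is set only if the platform reports it as supported.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2):
    """Build a DataLoader after switching the tensor sharing strategy.

    make_loader and the toy dataset below are illustrative placeholders.
    """
    # Set the strategy before any worker process is started.
    if "file_system" in mp.get_all_sharing_strategies():
        mp.set_sharing_strategy("file_system")
    # Toy dataset: 8 scalar samples, yielding 2 batches of 4.
    dataset = TensorDataset(torch.arange(8).float())
    return DataLoader(dataset, batch_size=4, num_workers=num_workers)
```

On platforms that start workers with spawn (Windows, macOS), iterate over the loader from inside an if __name__ == "__main__": guard.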

Error with shared memory on Kubernetes

Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.

Solution: mount a memory-backed emptyDir volume at /dev/shm in the Pod / Job spec:

spec:
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
  containers:
  - image: image-name  # specify your image name here
    volumeMounts:
      - mountPath: /dev/shm
        name: dshm
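To confirm from inside the container that the mount took effect, the size of /dev/shm can be inspected. The following is a rough sketch using only the standard library; the helper name shm_size_mib is hypothetical.

```python
import os
import shutil


def shm_size_mib(path="/dev/shm"):
    """Return the total size in MiB of the filesystem holding `path`.

    Returns None if `path` is not an existing directory.
    shm_size_mib is an illustrative helper, not part of any library.
    """
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


if __name__ == "__main__":
    print("/dev/shm size (MiB):", shm_size_mib())
```

A value around 64 MiB (the Docker default) suggests that the memory-backed volume is not mounted.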

General

Check if CUDA is enabled

import torch
torch.cuda.is_available()
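A common follow-up, sketched here as one possible pattern, is to pick the device from that check so the same script runs with or without a GPU. The variable names device and x are arbitrary.

```python
import torch

# Fall back to the CPU when no usable CUDA device is found.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate a tensor directly on the selected device.
x = torch.ones(2, 2, device=device)
print("Using device:", x.device)
```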

GPU does not work / GPU not available

  1. Check CUDA version (nvcc --version)
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2018 NVIDIA Corporation
    Built on Sat_Aug_25_21:08:01_CDT_2018
    Cuda compilation tools, release 10.0, V10.0.130
    
  2. Install PyTorch built for the matching CUDA version, e.g. for a conda installation: conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
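To compare the toolkit version reported by nvcc with the one PyTorch was built against, a small diagnostic such as the following may help. It is only a sketch; the helper name cuda_report is hypothetical.

```python
import torch


def cuda_report():
    """Collect version and device info for debugging GPU availability.

    cuda_report is an illustrative helper, not a PyTorch API.
    """
    return {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda is None, the installed package is a CPU-only build and needs to be reinstalled with CUDA support.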