Datasets / Dataloaders
Error received 0 items of ancdata
Occurs when setting num_workers > 0 in DataLoader (e.g. on Azure VMs).
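# Use the file_system sharing strategy (set it before any DataLoader is created)
import torch.multiprocessing
torch.multiprocessing.set_sharing_strategy('file_system')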
Source: https://discuss.pytorch.org/t/runtimeerror-received-0-items-of-ancdata/4999/2
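As a minimal sketch of where the call above fits in a script: the sharing strategy is set once at import time, before any worker process is started. The `make_loader` helper and the toy `TensorDataset` are hypothetical and only serve as illustration; they are not part of the original fix.

```python
import torch
import torch.multiprocessing as mp
from torch.utils.data import DataLoader, TensorDataset

# Set the strategy before any DataLoader worker process is started.
mp.set_sharing_strategy("file_system")


def make_loader(num_samples=8, num_workers=2):
    """Build a DataLoader over a toy dataset (hypothetical, for illustration only)."""
    dataset = TensorDataset(torch.arange(num_samples, dtype=torch.float32))
    return DataLoader(dataset, batch_size=4, num_workers=num_workers)
```

Iterating over `make_loader()` then uses worker processes with the `file_system` strategy. `mp.get_sharing_strategy()` can be used to confirm which strategy is active.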
Error with shared memory on Kubernetes
Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
Solution: mount an in-memory volume at /dev/shm in the Pod / Job:
spec:
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
  containers:
  - image: image-name  # specify your image name here
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
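To check whether the mount took effect, the size of /dev/shm can be inspected from inside the container. The following is a small sketch using only the standard library; the `shm_usage` helper name is an assumption made for this example.

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return (total, used, free) in bytes for the given mount, or None if it does not exist."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free


if __name__ == "__main__":
    usage = shm_usage()
    if usage is None:
        print("/dev/shm not found")
    else:
        total, used, free = usage
        print(f"/dev/shm: total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")
```

If the reported total is very small (Docker, for example, defaults to 64 MB), DataLoader workers are likely to run out of shared memory.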
General
Check if CUDA is enabled
import torch
torch.cuda.is_available()
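A slightly more detailed check can help narrow down why CUDA is unavailable. The sketch below collects a few values from PyTorch; the `describe_cuda` helper is a hypothetical name introduced for this example.

```python
import torch


def describe_cuda():
    """Collect basic CUDA information from the installed PyTorch build."""
    info = {
        "available": torch.cuda.is_available(),
        # CUDA version PyTorch was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "device_count": torch.cuda.device_count(),
    }
    if info["available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(describe_cuda())
```

If `built_with_cuda` is `None`, a CPU-only PyTorch build is installed, so the GPU cannot be used regardless of the driver setup.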
GPU does not work / GPU not available
- Check CUDA version (nvcc --version):
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2018 NVIDIA Corporation
  Built on Sat_Aug_25_21:08:01_CDT_2018
  Cuda compilation tools, release 10.0, V10.0.130
- Install PyTorch with the matching CUDA version, e.g. for a conda installation:
  conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
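The two steps above can be tied together by reading the release number out of the `nvcc --version` output, so that it can be used to pick the matching `cudatoolkit` version. This is a sketch using only the standard library; the helper names `parse_cuda_release` and `installed_cuda_release` are assumptions made for this example.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the toolkit release (e.g. '10.0') from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH and return the release string, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```

For the output shown above, `parse_cuda_release` returns `'10.0'`, which corresponds to `cudatoolkit=10.0` in the conda command.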