Debugging the TensorFlow / CUDA error on AWS – ImportError: libcublas.so.9.0: cannot open shared object file

Cause of error

This error is caused by a version mismatch between tensorflow-gpu and CUDA. Each tensorflow-gpu release depends on one specific CUDA version, and the import fails when that version's shared libraries (here libcublas.so.9.0) cannot be found.

Check our versions

Check the tensorflow-gpu version:

pip list | grep tensorflow-gpu

Our tensorflow-gpu version is 1.8.0.

Check the CUDA version:

ls -l /usr/local/cuda

/usr/local/cuda is a symlink, and on our box it points to cuda-8.0, so our CUDA version is 8.0.
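If you want the version number on its own, for example to compare it in a script, a minimal sketch is to strip it from the directory name the symlink points to. The helper name cuda_version_from_path is hypothetical and not part of the CUDA toolkit, and the sketch assumes the install directory follows the cuda-X.Y naming seen above.

```shell
# Hypothetical helper: print the version encoded in a CUDA directory name,
# e.g. /usr/local/cuda-8.0 -> 8.0. Assumes the cuda-X.Y naming convention.
cuda_version_from_path() {
  local name
  name="$(basename "$1")"                  # e.g. "cuda-8.0"
  case "$name" in
    cuda-[0-9]*)
      printf '%s\n' "${name#cuda-}"        # strip the "cuda-" prefix
      ;;
    *)
      echo "no CUDA version found in '$1'" >&2
      return 1
      ;;
  esac
}
```

On a Linux box you could call it as cuda_version_from_path "$(readlink -f /usr/local/cuda)", where readlink -f resolves the symlink to its real target.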

Investigate issue

Which CUDA versions are compatible with our TensorFlow version?

Let’s refer to the official TensorFlow page for version compatibility.

The required dependencies for our tensorflow-gpu version are:

  • tensorflow-gpu 1.8.0
  • Python 2.7
  • CUDA 9.0
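The lookup above can be sketched as a small shell function, so a script can compare the required CUDA version against the installed one. The helper name required_cuda is hypothetical, and the table deliberately holds only the two pairs mentioned in this post (1.8.0 with CUDA 9.0, 1.4.0 with CUDA 8.0); extend it from the official compatibility table for other releases.

```shell
# Hypothetical helper: map a tensorflow-gpu version to the CUDA version it needs.
# Only the two pairs discussed in this post are listed; anything else is
# reported as unknown rather than guessed.
required_cuda() {
  case "$1" in
    1.4.0) echo "8.0" ;;
    1.8.0) echo "9.0" ;;
    *)
      echo "unknown tensorflow-gpu version: $1" >&2
      return 1
      ;;
  esac
}
```

For our setup, required_cuda 1.8.0 prints 9.0 while the box has 8.0, which is exactly the mismatch behind the ImportError.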

So we have two options:

  1. Switch the system CUDA version to cuda-9.0 to match our tensorflow-gpu version 1.8.0.
  2. Downgrade tensorflow-gpu to version 1.4.0 to match our system CUDA version cuda-8.0.

Also check that the tensorflow-gpu version you choose supports your Python version.

Option 1. Upgrading/Downgrading system CUDA

AWS AMIs often come with multiple versions of CUDA preinstalled, so the version you need might already be on the box. In that case you only need to update the /usr/local/cuda symlink to point to the expected version. If it is not there, you will have to install the new CUDA version first.

# Look at the current cuda version
ls -l /usr/local/cuda

# Look at the required cuda version
ls /usr/local/cuda-9.0

# Remove the symlink to the current cuda version
sudo rm /usr/local/cuda

# Add a symlink to the new version (sudo is needed to write to /usr/local)
sudo ln -s /usr/local/cuda-9.0 /usr/local/cuda
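The remove-then-relink steps above leave a short window with no /usr/local/cuda at all, and they fail halfway if the target version is not installed. A minimal sketch of a safer variant checks for the target directory first and then replaces the link in one ln call. The helper name switch_cuda and its optional prefix argument are hypothetical, added here only for illustration.

```shell
# Hypothetical helper: point <prefix>/cuda at <prefix>/cuda-<version>.
# The prefix defaults to /usr/local; writing there needs root privileges.
switch_cuda() {
  local version="$1"
  local prefix="${2:-/usr/local}"
  local target="$prefix/cuda-$version"

  # Refuse to touch the existing link if the requested version is missing.
  if [ ! -d "$target" ]; then
    echo "CUDA $version is not installed at $target" >&2
    return 1
  fi

  # -s: symbolic link, -f: replace an existing link,
  # -n: treat the old link itself as the destination instead of following it
  ln -sfn "$target" "$prefix/cuda"
}
```

Since sudo cannot call a shell function directly, the one-line equivalent for our case is sudo ln -sfn /usr/local/cuda-9.0 /usr/local/cuda.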

Option 2. Upgrading/Downgrading tensorflow-gpu

pip uninstall tensorflow-gpu

pip install tensorflow-gpu==1.4.0

pip list | grep tensorflow

That’s all

Sometimes a notebook restart is required before the change takes effect. The most important thing is to match the expected version dependencies.
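As a quick sanity check after either option, you can confirm that the library named in the error message actually exists under the active CUDA directory. This is only a sketch: has_cublas is a hypothetical helper, and it assumes the library lives in lib64 under /usr/local/cuda, which is the usual layout for CUDA 8.0 and 9.0 installs but may differ on your machine.

```shell
# Hypothetical helper: succeed if libcublas.so.<version> exists under the
# given CUDA directory (default: /usr/local/cuda). Assumes the lib64 layout.
has_cublas() {
  local version="$1"
  local cuda_dir="${2:-/usr/local/cuda}"
  [ -e "$cuda_dir/lib64/libcublas.so.$version" ]
}
```

For tensorflow-gpu 1.8.0 you would run has_cublas 9.0 && echo "libcublas.so.9.0 found"; if nothing is printed, the symlink still points at a CUDA version that does not match.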

 
