Get a GPU from AWS
Let's create a GPU instance for our Deep Learning workloads. We need an AWS EC2 instance for this. Log in to the AWS web console, look up the EC2 service and click Launch Instance.
Look for the Deep Learning AMIs. These are preferable since they come with Python 3, Jupyter and a lot of other libraries pre-installed. Select an instance type that suits your workload (for Deep Learning that means a GPU instance, e.g. p2.xlarge).
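If you prefer the terminal, the same launch can be scripted with the AWS CLI. Here's a minimal sketch, assuming the CLI is installed and configured; the AMI ID, key-pair name and security group ID below are placeholders (look up the actual Deep Learning AMI ID for your region):

```bash
# Launch a GPU instance from a Deep Learning AMI (all IDs below are placeholders).
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type p2.xlarge \
    --key-name my-key-pair \
    --security-group-ids sg-0123456789abcdef0
```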
Security details
Jupyter notebook can be accessed in 2 ways:
- Approach 1 – Tunnel to the EC2 instance (recommended)
- Approach 2 – Expose the HTTP and TCP port to talk to Jupyter
For Approach 1 we just need the SSH rule in the security group. It is present by default, so no changes should be required.
Approach 2 is less recommended since it opens the instance up further to the internet. If you do go this route, open the HTTPS port 443 and a Custom TCP port 8888 in the security group.
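The two ingress rules for Approach 2 can also be added from the CLI. A sketch, with a placeholder security group ID; restricting the CIDR to your own IP is much safer than opening 0.0.0.0/0:

```bash
# Open HTTPS (443) and the Jupyter port (8888), but only for your own IP.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 --cidr 203.0.113.7/32
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 8888 --cidr 203.0.113.7/32
```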
Key-pair / pem file
Another important step is to make sure you have access to the key pair that was used to create the instance. If you do not have a key pair, you can generate one at the time of creation; AWS will download the private key for you if you decide to create one.
Note: If multiple teammates are working on this box, it's very important to share the key-pair (pem) file via LastPass or another password-sharing service. DO NOT use email / Slack for password/key-file exchanges.
Create instance
That's all for the configuration; create the instance. This will take 2-3 minutes. Get the IP address of the created box.
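The IP is shown in the console, but you can also fetch it from the CLI. A sketch with a placeholder instance ID:

```bash
# Print the public IP of the new instance (placeholder instance ID).
aws ec2 describe-instances \
    --instance-ids i-0123456789abcdef0 \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
```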
Log in to the box
We can now log in to the box:
```bash
# The downloaded key-pair (pem) file is more permissive.
# Fix permissions.
# chmod 400 <path to pem file>
chmod 400 ~/path/to/key.pem

# Get in the box
# ssh -i <path to pem file> -L <local box:port>:<remote box:port> ubuntu@<ec2 ip address>
ssh -i ~/path/to/key.pem -L localhost:9999:localhost:8888 ubuntu@11.11.11.11
```
Note: I recommend using port 9999 on the local side of the tunnel (localhost:9999:localhost:8888), as in the command above, since a local installation of Jupyter might already be occupying port 8888 and interfere with this setup.
Get Jupyter started
On the remote (EC2) box's terminal, start Jupyter. It's recommended to start Jupyter in a screen session so that the Jupyter server stays alive even after you close the terminal.
```bash
# Create a new screen session
screen -S jupyter_session

# Start Jupyter
jupyter notebook
```
This should give us a URL and a password (token) for our Jupyter notebooks: http://localhost:8888/?token=thisisalongsecrettoken
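Once Jupyter is up, you can detach from the screen session and safely close your terminal; the server keeps running:

```bash
# Detach from the session: press Ctrl-A, then D.

# Reattach later to check on the server:
screen -r jupyter_session

# List sessions if you forget the name:
screen -ls
```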
Get running
You should now be able to access Jupyter on your local box at http://localhost:9999 (use the token from the previous step).
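If you misplace the token later, you can ask Jupyter for it again on the remote box:

```bash
# Lists running notebook servers along with their URLs and tokens.
jupyter notebook list
```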
Where’s my data
You could download your data from S3 (Amazon's recommended storage) from your terminal or via Python (an S3 sketch follows the example below). I had my data on Google Drive, so I am using the handy utility lib googledrivedownloader to get my data onto the box. You will need to create a shareable link to the data in your G-Drive for this lib. The lib also creates the sub-directory and unzips the data for us.
```python
!pip3 install "googledrivedownloader==0.4"

from google_drive_downloader import GoogleDriveDownloader as gdd

# file_id - id of the sharable link of the file from Google Drive.
gdd.download_file_from_google_drive(
    file_id='1jgSl1thisisafakekeyRleIGpIO3Q',
    dest_path='./data/Flikr8k_text.zip',
    unzip=True)
```
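For the S3 route mentioned above, a terminal one-liner is enough, since the Deep Learning AMIs typically ship with the AWS CLI. A sketch with a hypothetical bucket and key:

```bash
# Copy a dataset from S3 to the box (bucket and key are hypothetical).
aws s3 cp s3://my-dataset-bucket/Flikr8k_text.zip ./data/
unzip ./data/Flikr8k_text.zip -d ./data/
```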
Cleanup is important $$$
Make sure you stop or terminate the instance when it's not in use, else Amazon keeps collecting $$$. On stopping the instance, the data stays on the attached storage and you will not be charged for compute. Please read the official docs for more info.
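Stopping or terminating also works from the CLI. A sketch with a placeholder instance ID:

```bash
# Stop the instance; data on the attached volume is preserved.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# Or terminate it entirely; data on the instance is deleted.
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```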
Important note: your notebooks and downloaded data will be deleted on terminating the instance, and you will have to go through all of the above steps again.
That's a wrap
In a future post we will take a look at Amazon SageMaker, which provides a more unified experience for Data Science notebooks.
That's all for this post. Hope it was useful. Get those models running.