
The Hunt for Cheap Deep Learning GPUs

Eliot Andres, July 12, 2020

Recently, I have been struggling to find cheap and reliable GPUs to train deep learning models. In this article, I will summarize the options you have to run deep learning computations on GPUs.

Not too long ago, you could rent a beefy GPU machine for 100€/month. Hetzner, a German server provider, was offering those specs:

It was fast and reliable. The good times. However, they discontinued this offering. Nowadays, if you want to get a GPU for deep learning, you have several options:

Use a cloud provider (GCP, AWS, Azure)
Use a cloud provider with preemptible machines
Rent a bare metal machine
Build your own

Foreword

Hetzner offered cheap and reliable servers and had a good reputation. Why did they stop? While there is no official reason, a change in NVIDIA's license is the likely cause. NVIDIA updated its license to ban the use of consumer GPUs (e.g. 1080, 2080 models) in data centers. As a result, most large server providers stopped offering cheap GPU servers.

Using a Cloud provider

Google Cloud, AWS and Azure all offer GPU machines. This is the most expensive option in our list. In theory, you can scale your cluster's size on demand. They offer GPUs for training (V100) and inference (T4).

My experience: some providers run unscheduled maintenance on your machine. To do so, they kill your instance and migrate it to another host, keeping the content of the disk. You get a 1-hour termination notice on GCP, more on the others. It is very inconvenient to start a large training run over the weekend, only to realize that your machine was killed on Friday evening. On top of that, some regions sometimes run out of GPUs, so attempts to create a machine fail. This does not happen often, but when it does, it is very annoying.
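One way to make use of the termination notice on GCP is to poll the instance metadata server from inside the training job and save a checkpoint when a maintenance event shows up. The sketch below is a minimal illustration, not what we run: it assumes the `maintenance-event` metadata key reports `NONE` when nothing is scheduled, and the helper names `fetch_maintenance_event` and `maintenance_pending` are made up for this example. Check the behaviour against your provider's documentation before relying on it.

```python
import urllib.request

# GCP metadata endpoint for maintenance events (only reachable from inside a VM).
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/maintenance-event"
)


def fetch_maintenance_event(timeout: float = 5.0) -> str:
    """Return the current maintenance-event value as plain text."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def maintenance_pending(event: str) -> bool:
    """True for any value other than "NONE" (assumed to mean nothing is scheduled)."""
    return event.strip().upper() != "NONE"
```

Calling `maintenance_pending(fetch_maintenance_event())` every few minutes from the training loop, and saving a checkpoint when it returns `True`, would at least keep a weekend run from losing all its progress.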

Pros:

Scaling on demand (limited by quota and availability)
Can pick any number of CPUs (useful for preprocessing-intensive jobs)

Cons:

Unscheduled maintenance is a pain (1 hour notice for GCP, ~24 hours for AWS, can happen once a week)
Expensive

Using preemptible instances

Most cloud providers offer preemptible machines at a significant discount (at least 50%, often more). In exchange, you accept that your machine can be killed at any moment. That is not very convenient when training models, even if you save checkpoints every epoch. Working around it takes a lot of engineering.
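To give an idea of the minimum engineering involved, here is a sketch of a training loop that resumes from its last checkpoint and writes checkpoints atomically, so a kill in the middle of a write does not corrupt the file. It is a toy under stated assumptions: the "model state" is just a step counter stored as JSON, the function names are hypothetical, and it assumes the provider's shutdown reaches your process as SIGTERM, which you should verify for your setup.

```python
import json
import os
import signal
import tempfile

stop_requested = False


def request_stop(signum, frame):
    """Signal handler: ask the loop to stop after the current step."""
    global stop_requested
    stop_requested = True


def install_handler():
    """Register the handler; call this once from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so the checkpoint is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(path, total_steps, save_every=100):
    """Run (or resume) training and return the last completed step."""
    state = load_checkpoint(path)
    while state["step"] < total_steps and not stop_requested:
        # ... one real training step would go here ...
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final save, also reached after a stop request
    return state["step"]
```

Restarting the same script on a fresh instance then picks up where the previous one stopped, provided the checkpoint lives on storage that survives the machine (a persistent disk or a bucket). A real job also needs to save optimizer state, data loader position and random seeds, which is where most of the work goes.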

My experience: my instances are sometimes killed in less than an hour, which makes them unusable. Try it out and see if it works for you (it might depend on the region).

Pros:

Cheaper
Scalable

Cons:

Machine can be killed at any moment

Renting a bare metal machine

Some providers are still offering consumer GPUs, officially not for deep learning. A Google search will yield plenty of them. You can also look here. Prices vary from provider to provider.

My experience: Reliability is not great. I made the mistake of using one of those servers as a production server. Then, it went down on a Saturday at 1 am. Here is the support's answer:

YMMV, and you must make your own trade-off between price and reliability.

Pros:

Cheap, plenty
No weekly scheduled maintenance

Cons:

Sometimes unreliable (YMMV)
Does not scale as quickly as a regular cloud provider (need to order the machine, sometimes need a monthly commitment)

Subrenting a server

I have never tried this, but vast.ai is a marketplace offering very affordable prices. Anyone can list a GPU there, so I am not sure how reliable it is.

Build your GPU server

If you have the time and the rack space, building your own GPU machine might be the cheapest option. Depending on how cheap you need to go, keep an eye out for used GPUs on eBay. Keep in mind that you will have to pay for electricity, and that having a noisy machine heating your office in the middle of summer is the best way to turn your colleagues into enemies.

Pros:

Cheapest option (depending on electricity cost)
Custom specs (useful if you need plenty of storage)

Cons:

Time consuming
Not convenient (noise, heat)
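Whether building is really the cheapest option depends on how long the machine runs and on your electricity price. A back-of-the-envelope way to check is to compute how many months of rental the hardware cost is worth once electricity is subtracted. The sketch below does only that; the function names are hypothetical, it ignores your time, resale value and hardware failures, and any numbers you plug in are your own estimates.

```python
def monthly_electricity_cost(power_watts, hours_per_month, price_per_kwh):
    """Electricity cost per month for a machine drawing a constant power."""
    return power_watts / 1000 * hours_per_month * price_per_kwh


def break_even_months(hardware_cost, rental_per_month, electricity_per_month):
    """Months after which owning beats renting, or None if it never does."""
    saving = rental_per_month - electricity_per_month
    if saving <= 0:
        return None
    return hardware_cost / saving
```

For example, with purely illustrative placeholders (not real prices), `break_even_months(3000, 400, monthly_electricity_cost(500, 720, 0.20))` compares a 3000 machine drawing 500 W around the clock against a rental at 400 per month.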

What we ended up doing at Photoroom

For training, we built our own machine (using 2080 Tis). For larger training runs, we use GCP with V100s and cross our fingers that there will not be any maintenance event. For inference, we use GCP's T4 GPUs in a managed instance group. This means that if GCP needs to kill a machine for maintenance, the group automatically spins up a new one.

Conclusion

Please keep in mind that I am not endorsing any of these options; pick one at your own risk. In the end, it's a trade-off between price, convenience, reliability and scalability. Also note that running inference on CPUs can be cheaper. A few helpful links:

Any ideas on how to improve this? Any comments? Reach out on Twitter.

Eliot Andres, CTO & Co-founder @ Photoroom