The Hunt for Cheap Deep Learning GPUs

July 12, 2020

Recently, I have been struggling to find cheap and reliable GPUs to train deep learning models. In this article, I will summarize the options you have to run deep learning computations on GPUs.

Not too long ago, you could rent a beefy GPU machine for 100€/month. Hetzner, a German server provider, was offering those specs:

It was fast and reliable. The good times. However, they discontinued this offering. Nowadays, if you want to get a GPU for deep learning, you have several options:

Use a cloud provider (GCP, AWS, Azure)
Use a cloud provider with preemptible machines
Rent a bare metal machine
Build your own

Foreword

Hetzner offered cheap and reliable servers. They had a good reputation. Why did they stop? While there is no official reason, the likely cause is a change in NVIDIA's license: NVIDIA updated it to ban the use of consumer GPUs (e.g. 1080, 2080 models) in data centers. As a result, most large server providers stopped offering cheap GPU servers.

Using a Cloud provider

Google Cloud, AWS and Azure all offer GPU machines. This is the most expensive option on our list. In theory, you can scale your cluster's size on demand. They offer GPUs for training (V100) and inference (T4).

My experience: some providers run unscheduled maintenance on your machine. This means they need to kill your instance to migrate it to another host (the content of the disk is kept). You get a 1 hour termination notice for GCP, more for the others. It is very inconvenient when you start a large training run over the weekend, only to realize that your machine was killed on Friday evening. On top of that, some regions sometimes run out of GPUs, in which case creating a machine simply fails. This does not happen often, but when it does it is very annoying.

Pros:

Scaling on demand (limited by quota and availability)
Can pick any number of CPUs (useful for preprocessing-intensive jobs)

Cons:

Unscheduled maintenance is a pain (1 hour notice for GCP, ~24 hours for AWS, can happen once a week)
Expensive

Using preemptible instances

Most cloud providers offer preemptible machines at a significant discount (at least 50%, often more). In exchange, you accept that your machine can be killed at any moment. This is not very convenient when training models, even if you save a checkpoint every epoch. Working around it takes a lot of engineering.
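To give an idea of what that engineering looks like, here is a minimal sketch of a save-and-resume loop in plain Python. Everything in it is an assumption made for illustration: the checkpoint path, the `train` function and the `weights` field are hypothetical stand-ins for a real model and optimizer state, and the `stop_after` argument only simulates a machine being killed.

```python
import json
import os
import tempfile

# Hypothetical path: in practice this should live on storage that survives the kill.
CHECKPOINT_PATH = "checkpoint.json"


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write to a temp file, then rename it, so a kill in the middle of a
    write never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic when both paths are on the same filesystem


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0, "weights": 0.0}


def train(total_epochs, path=CHECKPOINT_PATH, stop_after=None):
    """Toy training loop that resumes from the last checkpoint.

    `stop_after` simulates the machine being killed after that many
    epochs in the current run.
    """
    state = load_checkpoint(path)
    epochs_this_run = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_this_run >= stop_after:
            return state  # simulated preemption
        state["weights"] += 1.0  # placeholder for one epoch of real training
        state["epoch"] += 1
        save_checkpoint(state, path)
        epochs_this_run += 1
    return state
```

Calling `train(5, stop_after=2)` and then `train(5)` picks up at epoch 2 instead of starting over. A real setup also needs something to restart the machine and relaunch the script, which is where most of the effort goes.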

My experience: my instances are sometimes killed in less than an hour, making them unusable. Try it out and see if it works for you (it might depend on the region).

Pros:

Cheaper
Scalable

Cons:

Machine can be killed at any moment
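To judge whether the discount survives frequent kills, a back-of-the-envelope model can help. The sketch below is an assumption of mine, not a provider formula: it supposes that each kill costs half a checkpoint interval of lost work on average, plus a fixed restart overhead. All the numbers in the usage note are made up.

```python
import math


def effective_hourly_cost(on_demand_price, discount, hours_between_kills,
                          checkpoint_interval_hours, restart_overhead_hours):
    """Price paid per hour of *useful* training on a preemptible machine.

    Simplified model: a kill lands, on average, halfway between two
    checkpoints, and every restart wastes a fixed amount of time.
    """
    preemptible_price = on_demand_price * (1 - discount)
    lost_per_kill = checkpoint_interval_hours / 2 + restart_overhead_hours
    useful_fraction = 1 - lost_per_kill / hours_between_kills
    if useful_fraction <= 0:
        return math.inf  # the machine never gets any useful work done
    return preemptible_price / useful_fraction
```

With a hypothetical on-demand price of 2.0 per hour, a 50% discount, checkpoints every 30 minutes and 15 minutes to restart, `effective_hourly_cost(2.0, 0.5, 1.0, 0.5, 0.25)` comes out at the on-demand price: being killed every hour wipes out the discount entirely. With a kill once a day, the same call with `hours_between_kills=24.0` stays close to the discounted price.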

Renting a bare metal machine

Some providers are still offering consumer GPUs, officially not for deep learning. A Google search will yield plenty of them. You can also look here. Prices vary from provider to provider.

My experience: Reliability is not great. I made the mistake of using one of those servers as a production server. Then, it went down on a Saturday at 1 am. Here is the support's answer:

YMMV, and you must make your own trade-off between price and reliability.

Pros:

Cheap and plentiful
No weekly scheduled maintenance

Cons:

Sometimes unreliable (YMMV)
Does not scale as quickly as a regular cloud provider (need to order the machine, sometimes need a monthly commitment)

Subrenting a server

I have never tried this, but vast.ai is a marketplace offering very affordable prices. Anyone can list a GPU there, so I am not sure how reliable it is.

Build your GPU server

If you have the time and the rack space, building your own GPU machine might be the cheapest option. Depending on how cheap you need to go, keep an eye out for used GPUs on eBay. Keep in mind that you will have to pay for electricity, and that having a noisy machine heating your office in the middle of summer is the best way to turn your colleagues into enemies.

Pros:

Cheapest option (depending on electricity cost)
Custom specs (useful if you need plenty of storage)

Cons:

Time-consuming
Not convenient (noise, heat)
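Whether building really is the cheapest option depends on electricity and on how long you keep the machine. Here is a rough break-even sketch. The function and its parameters are hypothetical, and it ignores build time, maintenance and resale value.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def breakeven_months(hardware_cost, power_watts, electricity_price_per_kwh,
                     utilization, rental_price_per_month):
    """Months after which a self-built machine becomes cheaper than renting.

    `utilization` is the fraction of time the machine draws `power_watts`.
    Returns None if electricity alone costs as much as the rental.
    """
    electricity_per_month = (power_watts / 1000 * HOURS_PER_MONTH
                             * utilization * electricity_price_per_kwh)
    monthly_saving = rental_price_per_month - electricity_per_month
    if monthly_saving <= 0:
        return None  # renting is never more expensive than running your own
    return hardware_cost / monthly_saving
```

As an example with made-up numbers, `breakeven_months(3000, 500, 0.20, 1.0, 300)` compares a 3000 machine drawing 500 W around the clock at 0.20 per kWh against a 300 per month rental, and gives a break-even a little over 13 months.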

What we ended up doing at Photoroom

For training, we built our own machine (using 2080 Tis). For larger training runs, we use GCP with V100s and cross our fingers that there will not be any maintenance event. For inference, we use GCP's T4 GPUs in a managed instance group. This means that if GCP needs to kill a machine for maintenance, it automatically spins up a new one.

Conclusion

Please keep in mind that I am not endorsing any of those options; pick one at your own risk. In the end, it's a trade-off between price, convenience, reliability and scalability. Also note that running inference on CPUs can be cheaper. A few helpful links:

Any ideas on how to improve this? Any comments? Reach out on Twitter.

Eliot Andres, Co-founder @ Photoroom
