Artificial intelligence, through deep learning, is a well-established discipline. The algorithms are progressing, but they still depend on very physical machines, so let's see how to build a deep learning server!
What is a server? It's a machine that costs money to buy, consumes and costs electricity to run, and runs whatever software you want, on the operating system of your choice, to drive the components you have purchased.
First question to ask: what do we want? Deep learning can be done on just about any laptop. Getting results close to the state of the art on a dataset like MNIST is accessible even on processors from old phones. On CIFAR10, low- to mid-range laptops are sufficient.
When you want to leave the sandbox for the real world, where an image is rarely 32×32 pixels and texts have to be processed efficiently, it is time to ask yourself about buying a server.
From there, everything is imaginable, from the small homemade server at $1000 for students and beginners, through the small-business server at $5000, up to Nvidia DGX/HGX servers at €150,000 for CAC40 companies.
The PC: the basics
Since not everyone reading this will know it, here are the basics of a PC. Let's set aside the screen, mouse, keyboard, thermal paste, wifi card, etc. A PC needs:
• A power supply (PSU)
• A motherboard (MB)
• A processor (CPU)
• Memory (RAM)
• Storage (SSD / HDD)
• A graphics card (GPU)
During assembly, the object of the game is to put the squares in the square holes and the circles in the round ones. The motherboard has lots of "holes" (connectors/slots); you fit the components (and their cables) into them, fit the whole thing into the case, plug it in, power it on, and it works (see the many YouTube videos for the exact order of operations).
Some processors include an integrated graphics card; this does not exempt you from getting a dedicated (separate) graphics card.
Now that you know what you need, the right question is what to buy for each piece of this set. And this is where we come back to the first part: it all depends on your use.
I will walk you through the qualitative and limiting factors of each of these parts; then, prices fluctuating as they do, it will be up to you to find the cheapest option for your use at the time you read this.
The number of graphics cards
A machine with a single GPU over its entire life cycle can cost €1000; a machine that can take up to 4 GPUs, built from the same components as the first, will cost €2000 (without the other 3 GPUs, of course). It all depends on your ability to estimate your needs over 5 years.
First step, therefore: ask yourself how many GPUs you want, and will want. Are you a student? It will be one or two. A small business? Between 2 and 4. A large company? Between 4 and 8. This matters because going from 1 to 2 GPUs, with the same GPU model, doubles the computing power for many uses without having to buy a new machine.
More GPUs = more physical PCIe x16 slots needed on the motherboard = limits on the choice of CPU (not all of them can handle 4 GPUs) = a bigger power supply.
Understanding this is simple: each GPU needs a PCIe slot, the CPU must be able to communicate with each GPU through its PCIe lanes, and GPUs are very power-hungry.
Whatever happens, the objective of a deep learning build is to create the bottleneck on the GPU(s), i.e. to make computing time limited by the power of the GPU(s), not by the CPU, not by memory, not by disk reads, etc. If you have 4 GPUs but your processor was already running at 100% with two, congratulations: you have two more decorations in your case (and case modding is expensive).
For 2 GPUs, with 1 s of CPU ⇒ GPU1 work, 1 s of CPU ⇒ GPU2 work, and 1 s of parallel compute on (GPU1 + GPU2), you have 3 s of computation. With 4 GPUs you go to 4 × 1 s of CPU ⇒ GPU(i) work plus 1 s of parallel compute, i.e. 4 + 1 = 5 s. You have a CPU bottleneck: you need to change the code or buy a new CPU, not new GPUs. +67% computing time for +100% computing power.
For 2 GPUs, with 0.1 s of CPU ⇒ GPU(i) work per GPU and 2.8 s of parallel compute on (GPU1 + GPU2), you have 3 s of computation. With 4 GPUs you will have 3.2 s: +6.7% computing time for +100% computing power.
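The arithmetic above can be sketched as a tiny Python timing model (a toy model using the made-up timings from the examples, not a real profiler):

```python
def epoch_time(cpu_s_per_gpu, parallel_gpu_s, n_gpus):
    """Toy timing model from the text: the CPU prepares and transfers
    data for each GPU sequentially, then all GPUs compute in parallel."""
    return cpu_s_per_gpu * n_gpus + parallel_gpu_s

# CPU-bound case: 1 s of CPU work per GPU, 1 s of parallel GPU compute
print(epoch_time(1.0, 1.0, 2))  # 3.0 s
print(epoch_time(1.0, 1.0, 4))  # 5.0 s -> +67% time for +100% GPU power

# GPU-bound case: 0.1 s of CPU work per GPU, 2.8 s of parallel GPU compute
print(epoch_time(0.1, 2.8, 2))  # ~3.0 s
print(epoch_time(0.1, 2.8, 4))  # ~3.2 s -> only +6.7% more time
```

The point of the model is that the sequential CPU term grows with the number of GPUs while the parallel term does not, which is exactly what makes the CPU the bottleneck in the first case.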
The CPU ⇒ GPU step is a data preparation and transfer operation, but it can also include reading from disk (hence possible disk bottlenecks, etc.). Note that the worst case, +100% computing time for +100% computing power, is no better than simply leaving the 2-GPU config running longer.
And finally, having 100% more computing power available does not mean dividing the training time by two, since the extra power is parallel and the algorithms do not necessarily follow a linear law training_time = f(parallel_power).
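This non-linearity is well captured by Amdahl's law: if only a fraction p of the work parallelizes across GPUs, the achievable speedup is bounded. A minimal sketch (the 90% figure is an assumption chosen for illustration, not a measurement):

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup with n parallel units, when a fraction p
    of the total work is parallelizable (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even if 90% of an epoch parallelizes perfectly, going from 1 to
# 2 GPUs gives only ~1.82x, and 4 GPUs only ~3.08x, never 4x.
print(round(amdahl_speedup(0.9, 2), 2))  # 1.82
print(round(amdahl_speedup(0.9, 4), 2))  # 3.08
```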
Obviously, even if the build matters for keeping the bottleneck on the GPU and nowhere else, the algorithms used matter just as much.
The processor / motherboard pair
The basis of a build is determining its range, and this is framed by the processor/motherboard pair. A given processor is often compatible with only a very small number of motherboards.
The two processor manufacturers are Intel and AMD, and each has different ranges, from low-cost and mainstream up to the high-end server market.
AMD is the preferred manufacturer in terms of price/performance ratio. Fortunately, tools exist, in particular the PCPartPicker site, which will help you avoid CPU/MB incompatibilities.
Since we are interested in power, we start with the processor. It should have a bare minimum of 4 cores / 8 threads in 2020. You can go up to 16 cores / 32 threads, and more if you know you will need them.
Prices range from €100 to over €4000. The rule is simple: the cheaper it is, the less you can do with it.
Obviously, this choice must match the number of GPUs: 4 cores for 4 GPUs is a mismatch (CPU bottleneck), and 32 cores for 1 GPU is one too. It is not a hard rule, it all depends on the use, but outside of specific cases the general idea holds.
So set the threshold at what you want to do, and take the cheapest option that meets it. For optimized deep learning workloads, the processor is used little compared to the GPU; a processor for a deep learning server should not need to exceed €400 for 8 cores.
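One concrete reason core count matters: in PyTorch, data loading is parallelized across worker processes, and those workers are what keep the GPUs fed. A hedged sketch of a sizing heuristic (the "two workers per GPU, keep some cores free" rule is a common assumption, not a hard requirement):

```python
import os

def suggested_workers(n_gpus, workers_per_gpu=2, reserved_cores=2):
    """Heuristic: a couple of data-loading workers per GPU, capped by
    the cores left after reserving some for the main training process."""
    cores = os.cpu_count() or 1
    return max(1, min(n_gpus * workers_per_gpu, cores - reserved_cores))

print(suggested_workers(2))  # on an 8-core CPU: 4 workers for 2 GPUs
```

A value like this would then go into `DataLoader(..., num_workers=suggested_workers(n_gpus))`; if the GPUs still sit idle waiting for data, that is the CPU bottleneck described above.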
Some AMD Threadripper processors (server-oriented) have the advantage of being inexpensive: the TR 1920X is around €270 ± 30 with 12 cores, but its motherboard (and therefore its socket) is compatible neither with the new (very high-end) TR 3xxx Threadripper range nor with the mainstream, less power-hungry plain Ryzen (3/5/7/9) range.
The price of a processor must therefore be weighed against the prices of all the components it conditions. But the plain Ryzen line cannot handle 4 GPUs; that is where you make a choice.
• Qualitative factors of the CPU: Number of threads, Consumption in watt
• Limiting factors of the CPU: Number of supported PCIe (and therefore GPU)
The graphics card
The libraries used in deep learning are mainly PyTorch and TensorFlow, which are mainly based on CUDA; CUDA essentially runs only on NVIDIA GPUs, so unless you go looking for complicated paths, you will buy an NVIDIA GPU.
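Once the machine is assembled, you can check in two lines that PyTorch actually sees the GPU (assuming torch is installed; guarded here so the snippet degrades gracefully on a machine without it):

```python
# Check CUDA availability from PyTorch (cuda_ok is None if torch is absent).
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if cuda_ok else "no CUDA GPU"
except ImportError:
    cuda_ok, device_name = None, "torch not installed"

print(cuda_ok, device_name)
```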
There are three types of cooling. Either "blower": air is drawn in, passes over the GPU, and exits at the back of the case. Or open-air: fans push the hot air back into the case. Or watercooling.
For one GPU, or at most two mid-range GPUs, it does not matter. Beyond that, you must take blowers, otherwise the hot air recycled into the case will cause overheating. So here again you have to anticipate: once purchased, you will not "be able" to change.
Why aren’t there just blowers? Because they are mainly noisier, which is less pleasant for gamers, for example.
Four ranges, at first glance:
• Low end: GTX 1050 (Ti) / 1060 (Ti) / 1070 (Ti) / 16xx (S)
• Mid-range: RTX 3080 / RTX 3070 / RTX 2060 (S) / 2070 (S)
• Top of the range: RTX 3090 / RTX 2080 Ti
• Very high-end: Titan RTX / V100 / some Quadro etc.
What you need to look at primarily is the price, the memory, and the computing power.
2 GB is enough for small images, 4 GB will do a bit more, 8 GB for real uses (RTX 3070), ~11 GB (RTX 3080 / GTX 1080 Ti / RTX 2080 Ti / GTX Titan) for research uses, and more than 20 GB (RTX 3090 / Titan RTX) for specific uses.
Personally, in terms of value for money, I recommend an RTX with 8 to 11 GB for just about any purpose. Progress in deep learning scores comes more from qualitative algorithmic advances than from quantitative hardware gains.
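To get a feel for those VRAM numbers, you can estimate what a single input batch occupies (a back-of-the-envelope sketch; real memory use also includes weights, activations, gradients, and optimizer state, typically several times more):

```python
def tensor_mib(*shape, bytes_per_element=4):
    """Size in MiB of a float32 tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 2**20

# One batch of 32 RGB images at 224x224 (the classic ImageNet crop):
print(tensor_mib(32, 3, 224, 224))  # 18.375 MiB just for the input
```

The inputs themselves are small; it is the intermediate activations of deep networks, stored for backpropagation, that fill 8 or 11 GB quickly.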
The memory (RAM)
Non-graphics memory can be limiting depending on usage. Memory comes in generations (DDR3, DDR4, DDR5); DDR4 is currently the one used on recent machines. Take at least 8 GB, ideally 16 GB with room to upgrade to 32 GB (for students), 32 GB if you know you will use it (for small businesses), and beyond for special cases.
For example, it is possible (but rarely useful) to load an entire dataset into memory. With 30,000 images of 500×500×3 bytes, that gives 21 GB, so you would need 32 GB of RAM. Your algorithms would then have no latency related to reading from storage. But you are unlikely to have a bottleneck here if you are using an SSD.
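The 21 GB figure is easy to verify:

```python
# 30,000 uncompressed RGB images of 500x500 pixels, 1 byte per channel
n_images = 30_000
bytes_per_image = 500 * 500 * 3
total_gib = n_images * bytes_per_image / 2**30
print(round(total_gib, 1))  # 21.0 GiB -> hence 32 GB of RAM
```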
In general, each motherboard has a list of compatible RAM that is best consulted before purchase to avoid unpleasant surprises. The qualitative characteristics of RAM are mainly its speed (2133 MHz or more for DDR4) and its latency, but these figures matter little in deep learning. Do not pay €50 more for 4000 MHz with neon lights.
• RAM qualitative factors: Speed, latency
• RAM Limiting Factors: Amount of Memory
We’re in 2020, so you’re going to take an SSD. Here it’s very simple, a 2TB SSD costs around 300 € +/- 100 € (see more).
The main thing about the SSD is having enough space for at least one dataset (the one you are working on), plus your applications and your OS. ImageNet 2012 weighs 140 GB for 1.3M images. COCO 2017 is 50 GB for 125k images. Assuming you do not go too far off the beaten track, 500 GB to 1 TB of SSD will suffice; 2 TB for specific uses.
The specifications published by manufacturers are often extremely theoretical. Rely only on benchmarks found online, and on those benchmarks what interests you is mainly the read speed.
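You can also get a crude idea of sequential read speed yourself. A minimal sketch (note the big caveat: the OS page cache makes this optimistic unless you read a file much larger than RAM, so treat it as an upper bound, not a benchmark):

```python
import os
import tempfile
import time

def sequential_read_mb_s(size_mb=64, chunk=1 << 20):
    """Write a temporary file of size_mb MiB, then time a sequential read."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(os.urandom(chunk) * size_mb)  # size_mb MiB held in memory here
        path = f.name
    try:
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(chunk):  # read until EOF, chunk by chunk
                pass
        elapsed = time.perf_counter() - start
        return size_mb / elapsed
    finally:
        os.remove(path)

print(f"{sequential_read_mb_s():.0f} MB/s (cache-warm, optimistic)")
```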
In detail, there are several types of SSD. NVMe M.2 is the preferred set of specifications for format and connectivity: NVMe SSDs come as a small stick that plugs directly into a dedicated M.2 port on the motherboard.
SSDs are made of cells that can store 1 bit per cell, or 2, 3, 4, or even 5. Each density has a name: SLC (single-level cell) = 1 bit/cell, MLC = 2 bits/cell, TLC = 3 bits/cell, QLC = 4 bits/cell.
If some SSDs are much more expensive at the same capacity, it is because their cell architecture guarantees better read and, above all, write speeds.
In general, from most expensive to cheapest: SLC, then MLC, then TLC, then QLC. Since deep learning mainly cares about read speed, a QLC drive may well be enough.
• Qualitative factors of the SSD: Read speed then write speed
• Limiting factors of SSD: Size, format
Best Laptops For Deep Learning in 2021
Acer Predator Triton 500 PT515
- 10th Generation Intel Core i7-10750H 6-Core Processor (Up to 5.0GHz) with Windows 10 Home
- Overclockable NVIDIA GeForce RTX 2070 SUPER Graphics with Max-Q Design & 8 GB of dedicated GDDR6 VRAM
- 15.6″ Full HD (1920 x 1080) widescreen LED-backlit IPS display with NVIDIA G-SYNC technology | 300Hz Refresh Rate | 3ms Overdrive Response Time | 300nit Brightness | 100% sRGB
- 16 GB DDR4 2933MHz Memory | 512 GB PCIe NVMe SSD (2 x PCIe M.2 Slots with 1 Slot Open for Easy Upgrades)
- Per-key RGB Backlit Keyboard with Customizable Lighting | LAN: Killer Gaming Network E3100G | Wireless: Killer Double Shot Pro Wireless-AX 1650i 802.11ax Wi-Fi 6 | 4th Gen All-Metal AeroBlade 3D Fan
ASUS ROG Zephyrus S
- 15.6” Full HD high refresh rate 144Hz 3ms IPS-Type Display with slim 6.5mm Bezel
- NVIDIA GeForce GTX 1070 8GB GDDR5 *(with Max Q Technology)
- 8th-generation Intel Core i7-8750H (up to 3.9GHz) processor. Battery – 60WHrs, 4S1P, 4-cell Li-ion
- 0.62” thin, 4.6 lbs. ultraportable military-grade magnesium alloy body gaming Laptop with premium cover CNC-milled from Solid aluminum
- 1TB PCIe NVMe M.2 SSD; 24GB 2666MHz DDR4; Windows 10 Home
- ROG active Aerodynamic system (AAs) with upgraded 12V fans and anti-dust tunnels to preserve cooling performance and system stability
- Customizable 4-zone ASUS Aura RGB Gaming Keyboard
MSI GS65 Stealth-1668
- 15.6″ FHD, Anti-Glare Wide View Angle, 144Hz 3ms, 72% NTSC display | NVIDIA GeForce GTX 1660 Ti 6GB GDDR6
- Intel Core i7-9750H 2.6 – 4.5GHz | Intel 9560 Jefferson Peak (2×2 802.11ac) Wi-Fi
- 16 GB (8GB ×2) DDR4 2666MHz, 2 sockets; Max Memory 64 GB
- Killer Gaming Network E2500 | Dynaudio Speakers (2W ×2) | 720p HD Webcam
- USB 3.2 Gen2 ×3; Thunderbolt 3 ×1
Dell Inspiron 15 5000
- 7th Generation Intel Core i5-7300HQ Quad Core (6 MB Cache, up to 3.5 GHz)
- 8GB 2400MHz DDR4 up to 32 GB (additional memory sold separately)
- 1 TB 5400 rpm Hard Drive, No Optical Drive option
- 15.6-inch FHD (1920 x 1080) Anti-Glare LED-Backlit Display
- NVIDIA GeForce GTX 1050
HP OMEN | 15
- 15.6″ diagonal FHD, 144 Hz, IPS, anti-glare, micro-edge, WLED-backlit, 300 nits, 72% NTSC (1920 x 1080), AMD Ryzen 7 4800H (2.9 GHz base clock, up to 4.3 GHz max boost clock, 4 MB L2 cache, 8 cores)
- 1 TB PCIe NVMe M.2 SSD, 16 GB DDR4-3200 SDRAM (2 x 8 GB)
- 1 Super Speed USB Type-C 5Gbps signaling rate (DisplayPort 1.4, HP Sleep and Charge); 1 Super Speed USB Type-A 5Gbps signaling rate (HP Sleep and Charge); 2 Super Speed USB Type-A 5Gbps signaling rate; 1 Mini DisplayPort; 1 HDMI 2.0a; 1 RJ-45; 1 AC smart pin; 1 headphone/microphone combo, NVIDIA GeForce GTX 1660 Ti 6 GB Graphics
- Intel Wi-Fi 6 AX 200 (2×2) and Bluetooth 5 Combo, Integrated 10/100/1000 GbE LAN
- HP Wide Vision HD camera with integrated dual array digital microphone, Windows 10 Home
GIGABYTE Aero 15X
- 15.6″ 5 mm Thin Bezel FHD 144Hz Pantone X-Rite 1920×1080 IPS anti-glare display LCD
- Intel Core i7-8750H (2.2GHz-4.1GHz) | NVIDIA GeForce GTX 1070 GDDR5 8GB Max-Q | Supports NVIDIA Optimus Technology
- 16 GB DDR4 2666MHz Memory 512 GB M.2 NVME PCIe Gen3 x4 SSD Windows 10 Home
- 94.24Wh 10hrs Long Battery Life Gigabyte Fusion Per Key RGB Keyboard Dolby Atmos Gaming Thunderbolt3, SD Card Reader(UHS-II)
- 14 x 9.8 x 0.78″, 4.62 lb. | 2 years global warranty
- Maximum Memory: 32 GB
ASUS ROG Zephyrus GX501
- Powerful and efficient GeForce GTX 1080 8 GB with Max-Q design and 8th-Gen Intel Core i7-8750H (up to 3.9 GHz) Processor
- Ultra-thin and ultra-light gaming laptop with a thickness of only 0.7″ (with the lid closed) and weighing only 4.9lbs
- 144Hz 15.6″ Full HD IPS-Type AHVA G-SYNC Display with 3ms response time
- The Fastest SSD and RAM: featuring 512 GB PCIe SSD (Hyperdrive up to 3478 MB/s sequential read rate) and 16 GB DDR4 2666MHz; Windows 10 Home
- Quiet and cool featuring ROG Active Aerodynamic System which improves airflow by up to 40 percent and reduces temperatures by up to 20 percent compared to conventional cooling. (Actual cooling performance varies)