eRacks/AINSLEY 4U AI server, top-open angle view — eRacks/AINSLEY 4U AI server

Last week we wrote about the 2026 AI GPU landscape – the hardware story. This week we want to talk about the question that comes right after a buyer picks a GPU: what actually runs on this thing once it’s plugged in?

For a lot of our boutique competitors, the answer is “our proprietary OS, our proprietary management tool, and the AI runtime we picked for you.” That model trades convenience for lock-in. We chose differently. Every eRacks AI server – AILSA, AIDAN, AINSLEY, AISHA – ships with the same software stack: vanilla Ubuntu 26.04 LTS plus a curated set of open-source AI tools, all pre-installed, all standard packages, nothing custom-forked.

What’s actually on the box when it arrives

By default, every AI server ships with:

Ubuntu 26.04 LTS – standard server install, your team already knows it, supported until 2031.
Ollama – the easiest local LLM runtime. OpenAI-compatible API. Pull and run any open-weight model with one command (ollama pull llama4 for Llama 4 Scout, ollama pull qwen3, ollama pull deepseek-v3.2, etc.).
Open WebUI – polished self-hosted chat interface. Looks and feels like ChatGPT. Multi-user, works for the whole team out of the box.
vLLM – production-grade inference engine for high-throughput workloads. PagedAttention, continuous batching, the works.
PyTorch + CUDA + cuDNN – matched to your installed GPU. Training and custom model work ready to go.
Docker + NVIDIA Container Toolkit – containerized AI workloads work without setup.
Standard Linux dev tools – Python 3.12, git, build-essential, tmux, htop, the lot.

That’s the hardware-AI side. On the storage and platform side, you also get the standard eRacks Linux base: ZFS available, SSH hardened, automatic security updates configured, monitoring hooks ready. No surprises, no proprietary agents.

Plug in, open a browser, start chatting

The whole “first 10 minutes” experience looks like this:

Rack the server, connect ethernet, power on.
SSH in once with the credentials we ship in the envelope, set your own password.
Open http://<server-ip>:3000 in any browser on your network.
Open WebUI loads. Click “Sign up” (first user becomes admin), pick a model from the dropdown, start typing.

That’s it. The model list is pre-populated with whatever we sized your GPU for – Llama 4 Scout (17B active, 10M context) on the 48GB tier, Qwen 3 30B / DeepSeek-V3.2 distill on 32GB, Llama 3.1 8B + Mistral on 16GB. You can ollama pull any other model from the Ollama registry or Hugging Face the same day.

Why we chose this stack over building our own

There’s a temptation when you sell hardware to also sell a “platform” – a custom Linux fork, a branded management UI, a vendor-locked update channel. Some of our competitors do this. We don’t, for four reasons:

1. Your team already knows Ubuntu. Every Linux admin in your shop has used Ubuntu. Deploying our box is not a training exercise. Vanilla apt works. Standard systemd. No “did you check the wiki for this version of OurOS” support calls.

2. No vendor lock-in. If we go out of business tomorrow (we’ve been around since 1999, but still), your hardware keeps running on a fully supported open OS. You’re not stranded on an orphaned proprietary stack.

3. Updates are yours to control. When a new version of Ollama drops (which is every couple weeks), or when Meta drops Llama 4.5, or DeepSeek pushes V3.3, you can ollama pull it the same hour it lands. You don’t wait for us to vet it and ship a new firmware bundle.

4. The open-source ecosystem ships faster than any single vendor. Ollama, Open WebUI, vLLM, llama.cpp, the Hugging Face ecosystem – these tools improve weekly. Llama 4, Qwen 3.5, DeepSeek V3.2 all dropped in the last few months and were running on customers’ eRacks boxes within days of release. A vendor stack that re-bundles them is always a release behind. Vanilla Ubuntu lets you ride the open-source release cadence directly.

For teams that want different

That said: if you want a different OS, we’ll ship that too. Customers commonly ask for:

Debian 12 stable – rock-solid, smaller footprint, similar tooling.
Rocky Linux 9 / RHEL 9 – for teams standardized on enterprise RHEL.
Proxmox VE – if you want to virtualize the AI server alongside other workloads.
Bring your own image – we can boot your custom OS from USB before shipping.

And if you want a different inference stack:

llama.cpp + llama-server instead of Ollama (more control, smaller dependency footprint)
Text Generation Inference (Hugging Face’s TGI) for production deployments
SGLang for advanced structured-output workloads
LangChain / LlamaIndex stacks for RAG and agents
JupyterLab + a stack of ML tooling if your buyer is an ML researcher

Tell us what you want at order time and we’ll pre-install it. If you don’t want anything, we’ll ship the bare OS.

The point

The hardware decision (which GPU, how much VRAM, how many drives) is the visible part of buying an AI server. The software decision is the longer-term part – it’s what your team interacts with every day for the next 5-7 years. We think that decision should be yours, on a stack you can fork, audit, replace, and redeploy on commodity hardware if you ever change vendors.

We’ve been shipping open-source Linux servers since 1999. Same approach. New use case.

Browse the AI server lineup →

joe April 20th, 2026

Posted In: AI, Deep Learning, Linux, Open Source, Rackmount Servers, servers

Tags: eRacks, Intel Arc, LLM, Rackmount Servers

What “VRAM fits my model” actually means

As a rule of thumb for local inference:

Model weight size ≈ parameters × bytes per weight. A 7B-parameter model at 4-bit quantization needs roughly 3.5–4 GB. The same model in full FP16 precision needs ~14 GB.
Add 2–4 GB of working memory on top for KV cache, context window, and runtime overhead — more if you want long contexts.
If your model plus overhead doesn’t fit, you’ll spill to system RAM or disk, and your tokens-per-second drops by an order of magnitude.

So the VRAM tier you need is driven by what you want to run, not by marketing tier names. Here’s how the 2026 market actually lines up.

The seven tiers

Tier	VRAM	Price range	Models it runs comfortably	Example cards
Low-Profile (2U)	8–16 GB	$320–$450	3B–8B quantized, embeddings, small classifiers	RTX 5060 LP, Intel Arc Pro B50, nVidia RTX A1000/A2000 LP
Entry	16 GB	$480–$1,500	7B–13B full, 30B quantized	RTX 4060 Ti, RTX 5070 Ti, RTX 5080, AMD RX 9060 XT 16GB, AMD RX 9070
Workstation	20 GB single-slot	$1,280–$2,500	13B full, 34B quantized; quiet, ECC, space-efficient	nVidia RTX A4000 Ada (single-slot), AMD Radeon Pro W7800 32GB
Prosumer	24–32 GB	$2,000–$3,740	34B full, 70B quantized	RTX 3090 Ti refurb, AMD RX 7900 XTX, RTX 5090 (availability-dependent)
Server	48 GB	$1,299–$8,800	70B full, early 100B class	Intel Arc Pro B60 Dual 48GB, RTX 6000 Ada, NVIDIA L40S (passive), AMD Radeon Pro W7900
Flagship	96 GB	~$9,680	70B full comfortably, 120B quantized, long-context everything	RTX PRO 6000 Blackwell 96GB ECC
Datacenter	192 GB HBM3	$15k+ (by quote)	Serious training + 405B-class inference	AMD Instinct MI300X

The two surprise cards of 2026

If you only remember two things from this post, remember these:

Intel Arc Pro B50 ($399). A 16 GB low-profile card for under $400 didn’t exist twelve months ago. This card ships with both a standard and a low-profile bracket in a dual-slot form factor, slides into a 2U chassis without drama, and gets you enough VRAM for 7B-class models, embedding pipelines, and small classification workloads. As a starter card for a team dipping into local AI, nothing NVIDIA sells competes on $/GB at this form factor.

Intel Arc Pro B60 Dual 48GB ($1,299). This one is genuinely wild. Intel’s Project Battlematrix puts two Arc Pro B60 GPUs on a single PCIe card with 48 GB total VRAM — at roughly a fifth the price of an NVIDIA RTX 6000 Ada ($7,150) or a quarter the price of an L40S ($8,800). The software stack isn’t as mature as CUDA and your specific workload may or may not run well on Intel’s Battlematrix Linux drivers, but if your model runs, you’re getting 48 GB of VRAM for $1,299. For inference-bound 70B-quantized workloads where you don’t need peak training throughput, this is the best $/VRAM-GB in the market right now by a wide margin.

The AMD side

AMD’s RDNA 4 generation (RX 9060 XT, RX 9070, RX 9070 XT) turns out to be genuinely competitive for consumer-grade AI inference once you’re running on a framework that’s ROCm-aware — llama.cpp, Ollama, and vLLM all work. Performance-per-dollar on 16GB RDNA 4 cards is very close to the NVIDIA 50-series and sometimes ahead. For customers who don’t need CUDA and want to avoid NVIDIA’s pricing, this is a real path.

On the workstation side, AMD’s Radeon Pro W7800 (32 GB) and W7900 (48 GB) are direct replacements for NVIDIA’s RTX A5000/A6000 at roughly half the price, with ECC memory and workstation driver support. If you’re building a quiet single-user AI workstation, the W-series deserves a serious look.

At the top end, the AMD Instinct MI300X with 192 GB of HBM3 is the only single card that holds an entire 405B-class model in VRAM without any quantization tricks. It’s quote-only, it’s expensive, and the software story is still improving — but for the handful of customers for whom “does it fit” is more important than any other consideration, it’s currently the only game in town below $30k.

Which eRacks AI server for which tier?

We built our AI rackmount server line around this same VRAM-first thinking. Each model defaults to a different VRAM tier out of the box, and you can upgrade within the tier or jump tiers at configuration time:

eRacks/AILSA — 2U, from $5,995. Default tier: Low-Profile. The “affordable starter” for teams trying local AI for the first time. Upgrade chassis to 3U-GPU or 4U-GPU if you want to move up to full-height cards later.
eRacks/AIDAN — 2U full-height (up to 3 GPUs mounted sideways), from $9,995. Default tier: Entry 16GB. For 7B–13B models full-precision.
eRacks/AINSLEY — 4U, from $14,995. Default tier: Prosumer 24–32GB. For 34B full or 70B quantized, with room for up to 4 full-height GPUs.
eRacks/AISHA — 4U 8-GPU, from $19,995. Default tier: Workstation. Scales to the Server and Flagship tiers with up to 8 full-height GPUs — including the Intel Arc Pro B60 Dual for 48GB-per-card pricing unavailable anywhere else.

All four run Ubuntu Linux LTS Server out of the box, come with ECC-capable DDR5 RAM up to 512 GB, and ship with assembly, burn-in, and a 3-year warranty.

A note about prices

Our internal component costs tracked above — and therefore the baseline configuration prices you see on each product page — are mid-April 2026. The two forces moving them right now are (1) the AI-driven DDR5 memory supply crunch, which has roughly doubled ECC server RAM pricing since Q3 2025, and (2) the NAND flash shortage pushing SSD prices up. We’ll keep this post synced with our configurator. If you see a number here that doesn’t match what the configurator shows, trust the configurator — it’s the system of record.

Questions we haven’t answered yet

This post is the overview. Over the next few weeks we’ll be publishing deeper dives on:

Why we just bumped our RAM prices 3x — an honest look at the 2026 memory market
Arc Pro B60 Dual vs RTX 6000 Ada — real-world benchmarks on Llama 3 70B quantized
The eRacks AI server lineup in depth — AILSA, AIDAN, AINSLEY, AISHA side-by-side

Got a specific model you want to run and aren’t sure which tier fits? Drop us a line and we’ll build the configuration for you.

joe April 15th, 2026

Posted In: AI, Deep Learning, LLM, Local AI, New products, Open Source, Rackmount Servers, servers, Technology

Tags: AI, AMD Radeon, Blackwell, Deep Learning, eRacks, GPU, Inference, Intel Arc, Llama, LLM, Local AI, Machine Learning, Open Source, Rackmount Servers, RDNA 4, VRAM

One Comment

Last week we wrote about the 2026 AI GPU landscape – the hardware story. This week we want to talk…

UPDATE: as of Aug/Sep we've been playing with the new M.2 Intel Optane 16G SSD - Awesome! We'll be integrating…

Note: This version of Ubuntu switches to systemd as its init or "pid 1" mechanism. eMail or post with questions.…

Hi Problem solved. I tried Ubuntu 14.04 unsuccessfully. Got to the grub install and failed - the message was: grub…

Good afternoon, I'm hoping you may be able to help me sort out a problem. I've working through it and…

eRacks Systems Tech Blog

Open Source Experts Since 1999

Your AI Server, Your Software Stack: Why eRacks Doesn’t Ship a Custom AI OS