Local LLM for coding


🖥️ Dual-GPU LLM Workstation Build Spec

Target use case: Repo-aware coding assistant (30–34B models, fine-tuning, RAG) on local hardware.


1. Core Components

🔹 Motherboard

MSI MEG X670E ACE (Recommended)

  • PCIe layout: Dual PCIe 5.0 x16 slots, official x16 / x0 or x8 / x8 mode.
  • Spacing: 2 reinforced x16 slots with good clearance for dual 3-slot GPUs.
  • VRM: Premium 22+2 phases → stable for Ryzen 9 CPUs under sustained load.
  • Networking: Onboard 2.5 GbE + WiFi 6E.
  • Storage: 4× M.2 (PCIe Gen5 + Gen4), plenty for NVMe datasets.
  • BIOS: Stable lane bifurcation support, enthusiast-class firmware.

🔹 GPUs

2× NVIDIA GeForce RTX 5070 Ti 16 GB

  • Architecture: NVIDIA Blackwell (mid-high tier of the RTX 50 series).
  • VRAM: 16 GB GDDR7 per card → 32 GB effective with tensor parallelism (rough sizing math after this list).
  • Compute: Strong CUDA/Tensor core count, ideal for 30B class coder models in 4-bit quantization.
  • Cooling: Triple-fan 3-slot designs recommended (ASUS TUF / MSI Gaming Trio / Gigabyte Eagle).
  • Setup: Run via tensor parallel with vLLM/ExLlamaV2 for efficient sharding.
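
Rough math behind the “32 GB effective” claim, as a back-of-envelope sketch (the constants below are assumptions; real usage depends on the quant format and KV-cache settings):

# Fit check for a ~34B coder model at 4-bit across 2×16 GB (illustrative numbers)
params = 34e9                      # parameter count
weights_gb = params * 0.5 / 1e9    # ~4 bits (0.5 bytes) per weight ≈ 17 GB
kv_cache_gb = 6                    # rough fp16 KV cache for ~32k context on a GQA model
overhead_gb = 3                    # activations, CUDA context, fragmentation
print(weights_gb + kv_cache_gb + overhead_gb)  # ≈ 26 GB, inside the 32 GB pool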

🔹 CPU

AMD Ryzen 9 7950X / 9950X

  • 16 cores, 32 threads.
  • Plenty for preprocessing, vector search, and RAG pipelines.
  • PCIe 5.0 support to fully unlock X670E lane config.

🔹 Memory

128 GB DDR5 (2×64 GB)

  • DDR5-5600 or faster with an EXPO profile (e.g., a Crucial 2×64 GB DDR5-5600 kit).
  • Required for large context inference + QLoRA fine-tuning.

🔹 Storage

  • 2 TB NVMe Gen4 (Samsung 990 Pro / WD Black SN850X) for OS + models.
  • 4 TB NVMe Gen4/Gen5 for vector DB + repo snapshots.
  • Optional: HDD RAID (NAS) for long-term storage/backup.

🔹 PSU

1000–1200 W, ATX 3.0, 80 Plus Gold or better

  • Dual 16-pin (12VHPWR) connectors for GPUs.
  • Corsair RM1200x Shift / Seasonic Vertex GX-1200 recommended.

🔹 Case & Cooling

  • Roomy airflow case (Cooler Master TD500 Mesh V2 mid-tower, with extra fans).
  • 6-fan airflow, 3-slot GPU clearance.
  • CPU cooler: Thermalright Peerless Assassin 140.

2. Software Stack

Inference (multi-GPU)

vllm serve codellama/CodeLlama-34B-Instruct \
  --tensor-parallel-size 2 \
  --max-model-len 32768 \
  --dtype auto

  • vLLM: Handles tensor parallel across 2 GPUs.
  • ExLlamaV2: For GPTQ/EXL2 quantized models (efficient VRAM use).
  • llama.cpp: Optional GGUF loader with CPU offload (loading sketch below).
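
For the llama.cpp route, a minimal loading sketch via llama-cpp-python; the GGUF path is an assumption, and n_gpu_layers can be lowered if the file does not fit in VRAM:

from llama_cpp import Llama

llm = Llama(
    model_path="models/codellama-34b-instruct.Q4_K_M.gguf",  # assumed local GGUF file
    n_gpu_layers=-1,   # try to offload every layer; reduce this if VRAM runs out
    n_ctx=16384,       # context window, traded off against VRAM/RAM use
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that parses ISO 8601 dates."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])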

Fine-Tuning

  • QLoRA/PEFT for efficient repo-specific fine-tuning.
  • With 2×16 GB GPUs, LoRA adapters on 30B-class models become feasible (setup sketch after this list).
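
A minimal QLoRA setup sketch with transformers + peft + bitsandbytes, assuming the CodeLlama repo id below and a repo-specific instruction dataset prepared separately; the training loop itself (transformers Trainer or trl's SFTTrainer) is omitted because its API varies by version:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "codellama/CodeLlama-34b-Instruct-hf"  # assumed Hugging Face repo id

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # shards the 4-bit base model across both GPUs
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small adapter trains; the base stays frozen in 4-bit
# Train with transformers Trainer or trl's SFTTrainer on your repo-specific dataset from here.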

RAG (Repo-Aware Assistant)

  • Index the repo into a Chroma/Weaviate/Qdrant vector DB (end-to-end sketch after this list).
  • Chunk by function/class boundaries.
  • Serve via OpenAI-compatible API (vllm.entrypoints.openai.api_server).
  • Integrate with VS Code / JetBrains as local Copilot.
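
An end-to-end sketch of the loop, assuming a checkout at ./my_repo, Chroma's default embedding function, and the vLLM OpenAI-compatible server from the inference section running on localhost:8000; real indexing should chunk on function/class boundaries rather than whole files:

import pathlib
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="./repo_index")
collection = chroma.get_or_create_collection("repo")

# Naive indexing: one chunk per file (swap in AST-based function/class chunking for real use)
for i, path in enumerate(pathlib.Path("my_repo").rglob("*.py")):
    collection.add(ids=[f"chunk-{i}"], documents=[path.read_text()], metadatas=[{"path": str(path)}])

question = "Where is the retry logic for the HTTP client implemented?"
hits = collection.query(query_texts=[question], n_results=4)
context = "\n\n".join(hits["documents"][0])

llm = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
reply = llm.chat.completions.create(
    model="codellama/CodeLlama-34B-Instruct",  # must match the model name vLLM is serving
    messages=[
        {"role": "system", "content": "Answer using only the provided repository context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(reply.choices[0].message.content)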

3. What You Can Run

GPU setup         | Max model size (quantized) | Coding ability vs ChatGPT
1× 5070 Ti 16 GB  | 13B (GGUF/4-bit)           | “Junior dev who knows repo”
2× 5070 Ti 16 GB  | 30–34B (4-bit)             | “Mid/senior dev, repo-aware”
4× cards (future) | 70B class (QLoRA/RAG)      | “GPT-3.5-like, repo-aware”

4. Why This Build Works

  • Safe motherboard choice (MEG X670E ACE) guarantees x8/x8 dual GPU operation, avoiding lane quirks.
  • 2×5070 Ti 16 GB = sweet spot for 30B-class coder models → excellent balance between VRAM and compute.
  • 128 GB DDR5 ensures enough headroom for RAG + fine-tuning.
  • 2.5 GbE LAN is sufficient for serving LLM API to your dev machines.
  • Modular upgrade path → add GPUs later (4× config) or swap to larger VRAM SKUs.

Verdict:
This dual-5070 Ti build on the MSI MEG X670E ACE is a reliable workstation for repo-tuned coding LLMs. You’ll be able to run and fine-tune 30–34B models, integrate them with your IDE, and get a private assistant that, on questions about your own codebase, can outperform a general-purpose ChatGPT, while you still lean on ChatGPT for broad reasoning tasks.

