// Senior Staff Software Engineer

Rami AlGhanmi.

Distributed systems·AI infrastructure·Crusoe Energy

I design and operate the distributed systems that keep production running at scale — and the harder the operational problem, the more I want it. If I do my job right, you have no idea I exist.

01 What I work on — focus areas
01 · distributed systems

Distributed systems at scale

Architecting the data, compute, and control planes that run production — and keeping them running when they're under load.

02 · operations

Operating at production scale

On-call, incidents, capacity, observability — the operational discipline that turns a working system into a reliable one.

03 · devops

DevOps & GitOps

Self-service CI/CD, infrastructure-as-code, container delivery, and the deployment automation behind two decades of production systems.

04 · ai/hpc

AI & HPC infrastructure

Managed services for GPU-intensive training and inference. Kubernetes, Slurm, and heterogeneous compute at scale.

02 Selected work — last decade

Crusoe Energy

AI & HPC managed services

Building managed infrastructure for GPU-intensive AI training and inference workloads — Kubernetes, Slurm, and the operational tooling around them.

k8s · slurm · gpu-scheduling

Workday

Operational DataLake migration to AWS

Two-phase migration architecture (DataSync for transfer, EMR Spark for transformation) that decoupled the data copy from the transformation logic, allowing thorough validation and reuse downstream. Delivered without disrupting production workloads.

aws-datasync · emr-spark · hadoop · aws · k8s · devops

Workday

Unified observability across the fleet

Architected and operated the EKS-based telemetry platform that replaced a sprawl of per-system tooling — single pane of glass across every environment, lower capex, and the operational signal engineers actually trusted.

eks · telemetry · multi-cloud · aws · k8s · devops

Symantec

First cloud-native product to production

Led the production deployment of Symantec Endpoint Protection Cloud — the company's first SaaS product — and built the self-service CI/CD pipeline behind it from scratch.

ci/cd · saas · public-cloud · aws · openstack · k8s · devops
03 Trajectory — twenty years, five chapters
Years | Where | What
2025 – now | Crusoe Energy | AI & HPC infrastructure. Managed services for GPU workloads at scale. (Current)
2019 – 2024 | Workday | Distributed infrastructure, DevOps tooling, and fleet-wide observability. DataLake migration to AWS. Kubernetes platform for public-cloud delivery with zero-downtime deploys.
2014 – 2019 | Symantec | Cloud security. First SaaS product to production. Established in-house DevOps practice: self-service CI/CD, IaC, and microservice containerization with Docker & Kubernetes.
2008 – 2014 | USC · NASA JPL | MS & PhD coursework. Earth-science data systems at JPL. Built a git-based assignment-delivery and grading pipeline as TA (early DevOps instincts).
2004 – 2008 | KFUPM | BS, Computer Engineering. Hardware-software fundamentals.
04 About — in brief

Twenty years of building distributed systems that have to stay up — not because of a master plan, but because the harder problems kept being more interesting than the easier ones. Started in security at Symantec, moved into infrastructure engineering and cloud architecture at Workday, now working on AI/HPC at Crusoe.

I tend to be the person on the team who picks up the thing nobody else wants to own — the migration that has to be invisible, the system that has to work for ten different teams with conflicting needs, the launch that can't slip. Patient with detail, allergic to drama, comfortable on the bridge when production is on fire.

Stack

kubernetes · slurm · aws · terraform · argocd · ansible · linux · go · python · helm · prometheus · spark · hadoop

// elsewhere

Find me.

I don't post much, but this is where I am when I do. The fastest way to reach me is LinkedIn.