Infrastructure & Platform Lead (Sovereign AI)

  • Location: Remote from Australia
  • Industry: DevOps
  • Work Type: Permanent / Full Time
About
We’re hiring a hands‑on Technical Team Leader to help build and operate sovereign, on‑shore AI infrastructure supporting GPU, HPC, and high‑performance workloads.
This is not a people‑manager role. You will remain deeply technical and act as the primary escalation point for complex infrastructure and platform issues.
You’ll lead a small team and own the stability, scalability, and operational maturity of private cloud platforms running Kubernetes and OpenStack on bare metal.

What You’ll Do
  • Lead by example as the technical authority for the platform
  • Design, operate, and improve production Kubernetes (bare‑metal / private cloud)
  • Own reliability of OpenStack‑based infrastructure
  • Lead incident response, post‑incident reviews, and remediation
  • Improve monitoring, alerting, runbooks, and change practices
  • Communicate technical risk and incidents clearly to leadership
Hard Requirements
  • Deep, practical Linux experience (non‑negotiable)
  • Hands‑on bare‑metal or private cloud experience
  • Production Kubernetes in real environments
  • Comfortable leading during incidents
  • Strong communication and ownership mindset
Nice to Have (Trainable for the Right Person)
  • OpenStack in production
  • Kubernetes + OpenStack integration
  • SRE / operational maturity frameworks
  • GPU or HPC environments
Not a Fit If You
  • Are hyperscaler‑only (AWS/GCP/Azure with no hardware experience)
  • Are Microsoft‑heavy with limited Linux depth
  • Are no longer hands‑on
This role suits engineers who want serious responsibility and autonomy while staying close to real systems at scale.