
Senior DevOps Engineer
Surf AI is the agentic operations platform for enterprise security teams. We don't just surface risk, we close it. Our platform connects context across identity, cloud, HR, IT, and SaaS systems, and uses specialized AI agents to drive remediation end-to-end, with human oversight at every step.
We're backed by Accel, Cyberstarts, and Boldstart Ventures, and trusted by Fortune 500 enterprises already deploying Surf in production.
Our team is small and senior, with deep roots in identity, security, and enterprise infrastructure. We work at the intersection of agentic AI and applied security, and we take seriously what it means to build systems that act in real enterprise environments.
Who are we looking for?
We're looking for a Senior DevOps Engineer to join our DevOps team.
In this role, you’ll be responsible for designing, implementing, and scaling the infrastructure that powers our agentic platform. You’ll take a leading role in shaping and building our CI/CD pipelines, observability stack, cloud environments, and internal developer tools, all of which are critical to enabling fast, secure, and scalable development.
What you'll do
- Architect & Build: Design and implement scalable, secure, and observable infrastructure in AWS and Kubernetes.
- Platform Engineering: Evolve our CI/CD pipelines using GitHub Actions and GitOps workflows via ArgoCD.
- Infrastructure as Code: Maintain and expand our Terraform and Crossplane-based infrastructure.
- Observability: Own our Datadog stack (metrics, traces, dashboards, alerts) and promote SLOs and performance standards.
- Automation: Build robust tooling in Python and Bash to support developers, ML workflows (SageMaker), and data pipelines.
- Security & Reliability: Ensure high standards for container security, secrets management, and fault-tolerant systems.
Required Skills & Experience
- 4+ years of hands-on experience in DevOps or Infrastructure Engineering within a production environment.
- Strong expertise in AWS; equivalent experience with another major cloud platform is also acceptable.
- In-depth knowledge of Kubernetes operations and its ecosystem, including tools such as Helm and ArgoCD, as well as RBAC, Ingress, and network policies.
- Extensive experience with Terraform.
- Proven experience in building and managing CI/CD pipelines, ideally using GitHub Actions or similar tools.
- Experience building container images and applying image lifecycle best practices (including security, caching, and SBOMs).
- Comfortable with scripting and automation, particularly in Python and Bash.
- Hands-on experience with Datadog (or similar observability tools) for monitoring logs, metrics, and APM, and for building dashboards.
- Familiarity with GitOps practices and infrastructure lifecycle management using tools such as Crossplane.
Bonus Skills
- Experience supporting SageMaker, ML workflows, or data science teams.
- Familiarity with service mesh (e.g., Istio) or zero-trust networking.
- Cost optimization and security governance in AWS environments.
- Prior experience in a startup or zero-to-one infrastructure buildout.
Why Join Us?
If you want to work on foundational systems, ship AI into production, and help define how agentic security actually operates, this is an opportunity to do it early and with real ownership.