Job Application for Senior Site Reliability Engineer at MA Capital

Who We Are

MA Capital US LLC is a proprietary trading firm specializing in systematic and high-performing discretionary strategies across multiple asset classes. We leverage advanced technology, quantitative research, and sophisticated models to capitalize on opportunities in global markets. Our culture is built on innovation, efficiency, and transparency. We are committed to providing liquidity and supporting fair, efficient markets, while ensuring our traders and professionals have flexibility, tools, and continuous learning opportunities to succeed.

Position Overview

We’re seeking a Senior Site Reliability Engineer to support and evolve MA Capital’s production trading environment with a strong focus on Linux performance, automation, and reliability. This role is for a builder, not a maintainer: you won’t just tune a kernel; you’ll write the code that ensures every system is born optimized, scalable, and repeatable.

As part of our platform evolution, this role will help move the organization from manual tuning and reactive troubleshooting toward automated performance, deep observability, and engineered reliability. You will help ensure trading systems meet strict uptime and performance requirements by building code-driven solutions, while working closely with developers and traders to improve system stability, observability, and operational maturity in a highly latency-sensitive environment.

MA Capital employs individuals with a passion for innovation and solving hard problems. If you’re looking for a role where you can take ownership, raise the bar operationally, and directly influence the reliability of a high-performance trading platform, MA Capital is your home.

Key Responsibilities

Reliability & Production Ownership: Own the availability, stability, and performance of Linux-based trading systems (RedHat, Rocky, Ubuntu).
Incident Response: Lead and participate in incident response, on-call rotations, and post-incident reviews, producing clear and blameless post-mortems while implementing automation or process improvements to prevent repeat failures.
Operational Processes: Develop and maintain runbooks, documentation, and operational standards to ensure consistent, repeatable production support.
Production Readiness: Partner with developers and traders to ensure systems are designed and deployed with reliability, performance, and operational readiness in mind.
Linux Systems & Performance: Perform OS- and system-level tuning (CPU topology, IRQ affinity, memory, networking) to support deterministic, latency-sensitive workloads.
Performance Diagnostics: Diagnose complex performance issues using perf, ftrace, tcpdump, and eBPF.
Automation & Infrastructure: Treat infrastructure and system configuration as version-controlled, reproducible code using Ansible, Terraform, Python, and shell scripting, so that every system is built consistently and optimized from the start.
CI/CD & Deployment: Design and improve CI/CD pipelines that incorporate automated testing and performance validation prior to production release.
Core Services: Support and automate core infrastructure services including DNS, NFS, LDAP/Active Directory, and multicast networking.
Monitoring & Observability: Build and evolve monitoring, alerting, and logging for trading-critical systems, improving alert quality, response times, and operational visibility.
Reliability Engineering: Reduce operational toil through automation, tooling, and standardization, implementing automated remediation for known failure scenarios.
Process & Maturity Advancement: Identify gaps in operational practices and help drive the organization toward proactive reliability engineering through scalable processes and tooling.

Required Qualifications

Experience: 4–8+ years of experience in Site Reliability Engineering, Linux engineering, DevOps, or infrastructure-focused roles.
Production Systems: Hands-on experience supporting highly available, performance-sensitive systems in production environments.
Linux Expertise: Deep understanding of Linux internals, including scheduling, memory management, interrupts, filesystems, and storage behavior.
Networking: Strong knowledge of TCP/IP, UDP, multicast, and networked services.
Automation & Tooling: Proficiency with Ansible, Terraform, Python, shell scripting, YAML/JSON, and Git-based workflows.
Containers & Observability: Experience with Docker (or similar platforms) and familiarity with observability stacks such as Prometheus, Grafana, ELK, or comparable tooling.
Databases & Logging: Experience supporting production databases and integrating system and application logs with centralized logging or SIEM platforms.
Operational Rigor: Strong documentation skills and a solid understanding of incident management and on-call best practices.

Why Join Us?

Professional Development: Hands-on experience in a high-performance trading environment where your work directly impacts system reliability and execution.
Startup Environment: Agile and entrepreneurial culture that values ownership, accountability, and rapid iteration.
Efficient Infrastructure: Exposure to a highly optimized trading stack built for performance, resilience, and scalability.
Comprehensive Health Coverage: Medical, dental, and vision insurance.
401(k) Retirement Plan: Helping you and your family plan for a secure financial future.
