
Staff Site Reliability Engineer (SRE)

Full remote · Paris, France

About the job

Alma shapes the fintech landscape. We strive to serve and empower consumers and merchants by developing innovative solutions that redefine their purchase experience.

About the mission

  • Organize and prioritize SRE roadmaps to ensure that the infrastructure is aligned with customer needs (internal and external).
  • Lead cross-functional initiatives within the product teams.
  • Regularly interact with stakeholders and senior management, ensuring alignment and effective communication on key initiatives.
  • Promote automation and SRE best practices to optimize operational efficiency.
  • Develop and maintain backup and disaster recovery strategies to protect data and ensure business continuity.
  • Design, implement and maintain monitoring tools to track key system metrics, health indicators and our SLAs/SLOs.
  • Provide technical support and expertise to engineering teams for the resolution of application and infrastructure incidents.
  • Carry out in-depth analyses of incidents to identify their root causes and put corrective measures in place.
  • Maintain the platform in operational condition by implementing updates, security patches and continuous improvements.
  • Help optimize the platform's operating costs.
  • Support and guide SREs through knowledge-sharing and collaboration, fostering continuous improvement across the team.

About you 

  • At least 8 years of experience managing cloud infrastructure.
  • Experience in project management, enabling you to oversee and drive initiatives from planning to successful delivery.
  • Strong presentation and communication skills to collaborate with different teams and share problems and solutions effectively.
  • Deep knowledge of Google Cloud Platform or other cloud providers.
  • Good knowledge of networking.
  • Experience setting up and maintaining monitoring tools and analyzing metrics and malfunctions.
  • Hands-on experience with Infrastructure as Code.
  • Ability to solve problems methodically and work effectively under pressure during critical incidents.
  • Professional proficiency in English.

Our technical stack

  • Cloud providers: GCP, Cloudflare, AWS
  • Backend: Python + FastAPI and Flask
  • Frontend: React / TypeScript
  • Databases: PostgreSQL, Redis, BigQuery
  • Log and error management: Datadog, Sentry
  • CI/CD: GitHub Actions, Docker
  • Monitoring: Datadog
  • Infrastructure as Code: Terraform

About the recruitment process

  • Interview with Talent Acquisition (30-45 min)
  • Interview with Engineering Manager (45-60 min)
  • Take-home coding test, followed by a remote feedback session and a system design interview (90 min)
  • Team Fit interview (30 min)
