Senior DevOps Engineer
About AI71:
AI71 is an industry leader in artificial intelligence, delivering innovative solutions that empower developers, businesses and governments to solve complex challenges. AI71 builds secure, enterprise-ready applications powered by cutting-edge technology—tailored for knowledge workers and sector-specific needs. AI71 bridges the gap between advanced AI and real-world impact. Guided by a strong commitment to research and responsibility, we create transformative solutions that drive progress and empower communities.
The Role:
We are seeking a highly motivated and skilled Senior DevOps Engineer to join our team. You’ll play a crucial role in building, deploying, and maintaining scalable, reliable systems and infrastructure, working closely with development teams to ensure operational efficiency and smooth deployment pipelines.
What You'll Do:
- Design, implement, and maintain CI/CD pipelines to streamline development workflows.
- Build and manage scalable infrastructure for AI model deployment and lifecycle management.
- Automate infrastructure provisioning and management using tools like Terraform, Ansible, or CloudFormation.
- Optimize cloud-based and on-premises resources for scalability and cost efficiency.
- Manage and fine-tune queuing systems and real-time streaming architectures.
- Monitor and troubleshoot production systems to ensure uptime and performance.
- Implement logging, monitoring, and alerting solutions using tools such as Prometheus, Grafana, and the ELK stack.
- Set up comprehensive monitoring for both system metrics and ML model performance.
- Conduct root cause analyses and post-mortems to improve system reliability.
- Collaborate with development and QA teams to deploy new features into production seamlessly.
- Promote best practices in system architecture, security, and performance.
- Participate in a rotating on-call schedule for production system support.
- Ensure infrastructure meets security and compliance standards (e.g., SOC 2, ISO 27001).
- Securely manage secrets and credentials using tools like Vault or AWS Secrets Manager.
What You'll Bring:
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Proficiency in at least one scripting or programming language, such as Python, Bash, or Go.
- Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
- Strong skills in containerization and orchestration with Docker and Kubernetes.
- Experience using CI/CD tools such as Azure DevOps, Jenkins, GitLab CI/CD, or CircleCI.
- Knowledge of monitoring, observability, and alerting tools like Prometheus, Datadog, New Relic, Grafana, or PagerDuty.
- Understanding of networking fundamentals including DNS, load balancing, and firewalls.
- Familiarity with real-time streaming architectures for AI and data applications.
Great Pluses / Preferred Experience:
- Experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
- Understanding of service mesh technologies like Istio or Linkerd.
- Familiarity with database scaling and administration, including VectorDBs, SQL, and NoSQL systems.
- Previous experience in a high-traffic production environment.
Why AI71:
- Mission-Driven Work: Work on cutting-edge AI applications with a talented and passionate team, solving real-world challenges in critical sectors.
- Unparalleled Opportunity: Innovate at a company with unique access to world-leading models and resources.
- Career Growth: We offer competitive compensation, benefits, and significant career growth opportunities as a foundational member of the team.
- World-Class Environment: Enjoy a flexible working environment and the latest tools & technologies needed to do your best work.