Site Reliability Engineer

Bengaluru, India
About Groww:
We are a passionate group of people focused on making financial services accessible to every Indian through a multi-product platform. Each day, we help millions of customers take charge of their financial journey. Customer obsession is in our DNA. Every product, every design, every algorithm down to the tiniest detail is executed keeping the customers’ needs and convenience in mind. Our people are our greatest strength. Everyone at Groww is driven by ownership, customer-centricity, integrity and the passion to constantly challenge the status quo.
 
Are you as passionate about defying conventions and creating something extraordinary as we are? Let’s chat.
 
Our Vision
Every individual deserves the knowledge, tools, and confidence to make informed financial decisions. At Groww, we are making sure every Indian feels empowered to do so through a cutting-edge multi-product platform offering a variety of financial services. Our long-term vision is to become the trusted financial partner for millions of Indians.
 
Our Values
Our culture enables us to be what we are: India’s fastest-growing financial services company. It fosters an environment where collaboration, transparency, and open communication take center stage and hierarchies fade away. There is space for every individual to be themselves and feel motivated to bring their best to the table, as well as craft a promising career for themselves.
The values that form our foundation are:
  • Radical customer centricity
  • Ownership-driven culture
  • Keeping everything simple
  • Long-term thinking
  • Complete transparency

What you’ll do:
  • Monitor and troubleshoot issues related to system performance, availability, and security.
  • Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets to measure and improve service reliability.
  • Analyze and report on metrics and trace data using Grafana.
  • Participate in on-call rotation to provide 24/7 support for critical production systems.
  • Collaborate with development teams to ensure new features and services are designed with scalability and reliability in mind.
  • Help roll out new security and infrastructure features as they are released.
  • Proactively identify and resolve issues before they impact customers.
  • Manage app releases by automating the deployment process, ensuring proper version control, and managing the rollout to minimize the impact on users.
  • Coordinate between developers and operations to ensure smooth software releases and timely resolution of production issues.
  • Conduct Root Cause Analysis (RCA) of production incidents and develop plans to prevent future occurrences.
  • Review and optimize system performance, identify bottlenecks, and implement capacity planning and recovery strategies.
  • Evaluate and automate manual and repetitive tasks to reduce toil and improve system efficiency.
  • Use version control, issue tracking, and CI/CD tools such as Git, Jira, and Jenkins to streamline the software development process.
 
What we’re looking for:
  • 4-6 years of relevant work experience.
  • Bachelor's or Master's degree in Computer Science or a related field.
  • Strong understanding of Linux/Unix systems administration and networking.
  • Experience with cloud platforms such as GCP or AWS.
  • Strong programming skills in one or more languages such as Python, Java, or Go.
  • Experience with monitoring and alerting tools such as Grafana, Prometheus, or New Relic.
  • Experience with configuration management tools.
  • Strong problem-solving skills.
  • Strong communication and teamwork skills.
  • Experience with Docker, Kubernetes, and other container technologies is a plus.
