Data Engineer
Bengaluru
Role Overview
We are seeking a Data Engineer to join our growing team and play a critical role in developing and maintaining scalable data pipelines and infrastructure. You will work closely with data scientists, product engineers, and business teams to ensure that data flows seamlessly through the system and is ready for analysis, reporting, and model development.
This role is perfect for someone who enjoys building efficient data systems, optimizing data flow, and ensuring data quality at scale.
Responsibilities
- Data Pipeline Development: Design, develop, and maintain efficient, scalable data pipelines and ETL processes using technologies such as Apache Kafka, Apache Spark, and Apache Airflow, capable of handling large volumes of data for machine learning, analytics, and business intelligence.
- Real-time Data Flow: Implement real-time data streaming solutions and manage complex event processing using Apache Kafka, ensuring timely delivery and processing of data.
- Data Integration & Big Data: Integrate data from various sources, including third-party services, into a unified data warehouse or data lake, leveraging big data technologies such as Apache Hadoop and Spark for large-scale data processing.
- Data Workflow Orchestration: Design and orchestrate data workflows using Apache Airflow to automate and monitor batch and real-time data pipelines.
- Database Management: Maintain and optimize relational (SQL) and NoSQL (e.g., MongoDB, Cassandra) databases to ensure high availability, scalability, and performance.
- Data Modeling: Work with data scientists and analysts to design and implement data models that support various product features and insights, particularly in complex data environments.
- Performance & Scalability: Optimize data processes for scalability so they handle high-volume data flows reliably, keeping the system running smoothly.
- Cloud Infrastructure: Build and optimize cloud-based solutions using platforms such as AWS, GCP, or Azure to manage large-scale data processing and storage.
- Data Quality & Security: Implement data validation and quality checks, ensuring that the data is accurate, clean, and consistent across the system. Work closely with the security team to ensure secure data handling and compliance with relevant industry standards (e.g., HIPAA, GDPR).
- Collaboration & Cross-Functional Work: Collaborate with data scientists, engineers, and product teams to build a robust, data-driven platform that enhances decision-making and optimizes business processes.
Qualifications
- 3+ years of experience in data engineering, big data engineering, or related roles
- Strong proficiency in data processing frameworks such as Apache Kafka, Apache Spark, Airflow, and Hadoop
- Experience building and optimizing ETL pipelines and integrating data from diverse sources
- Strong programming skills in languages such as Python, Java, or Scala
- Expertise with big data technologies and platforms (e.g., Amazon Redshift, Google BigQuery, Apache Hive)
- Experience with data orchestration tools like Airflow and Apache NiFi
- Proficiency in SQL and experience with both relational and NoSQL databases (e.g., MongoDB, Cassandra)
- Familiarity with cloud platforms (AWS, GCP, or Azure) and data tools (e.g., Redshift, BigQuery)
- Strong knowledge of data quality practices, data governance, and security in regulated environments
- Excellent communication skills, with the ability to explain complex data processes and systems to non-technical stakeholders
- Bonus: Familiarity with machine learning systems, data pipelines for AI, or real-time analytics
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience