Data Engineering Pipeline Engineer – Role
ABOUT THE ROLE
We are looking for a skilled Data Engineering Pipeline Engineer to design, build, and maintain scalable data infrastructure that powers our analytics, machine learning, and business intelligence platforms. You will work across the full data lifecycle — from ingestion and transformation to storage and delivery — ensuring reliability, performance, and governance at every stage. This is a high-impact role where you will collaborate closely with data scientists, analysts, and platform engineers to ship robust pipelines that enable data-driven decisions across the organization.
KEY RESPONSIBILITIES
• Design, build, and maintain scalable batch and real-time data pipelines using tools such as Apache Spark, Kafka, Flink, or Airflow (a brief orchestration sketch follows this list)
• Develop and optimize ETL/ELT workflows to ingest data from diverse sources including APIs, databases, event streams, and flat files
• Architect and manage cloud-based data infrastructure on AWS, GCP, or Azure (e.g., S3, BigQuery, Redshift, Databricks, Snowflake)
• Implement data quality monitoring, alerting, and observability frameworks to ensure pipeline reliability and SLA compliance
• Collaborate with data scientists and ML engineers to support model training, feature engineering, and inference pipelines
• Partner with analytics engineers to maintain and evolve data warehouse models (dbt, dimensional modeling)
• Define and enforce data governance standards including cataloging, lineage tracking, and access control policies
• Optimize pipeline performance through profiling, query tuning, partitioning strategies, and cost management
• Document pipeline architecture, data contracts, and runbooks for operational clarity
• Participate in on-call rotation and incident response for critical data infrastructure
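For illustration only, here is a minimal sketch of the kind of orchestrated batch pipeline these responsibilities describe, written against Airflow's TaskFlow API (2.4+). The DAG name, S3 paths, and check logic are hypothetical placeholders, not a description of our actual stack.

from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def daily_orders_pipeline():
    @task
    def extract(ds=None):
        # Land the day's raw orders and return the staging location.
        # In a real pipeline this would call a source API or copy from a bucket.
        return f"s3://example-raw/orders/{ds}/"

    @task
    def transform(raw_path):
        # Clean and conform the raw records; in practice this step might
        # submit a Spark job or trigger a dbt run instead of transforming in-process.
        return raw_path.replace("example-raw", "example-curated")

    @task
    def quality_check(curated_path):
        # Minimal quality gate: fail the run (and alert on-call) if the curated
        # output is missing. Real checks would validate row counts, schema,
        # and freshness against the data contract.
        if not curated_path:
            raise ValueError("curated output path is empty")
        return curated_path

    @task
    def load(curated_path):
        # Publish the validated partition to the warehouse (e.g. a COPY or MERGE).
        print(f"loading {curated_path}")

    load(quality_check(transform(extract())))


daily_orders_pipeline()

The shape matters more than the specific operators: extract, transform, validate, and load as separate retryable tasks, with the quality gate blocking the load so bad data never reaches consumers.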
REQUIRED QUALIFICATIONS
• 4+ years of experience in data engineering, ETL development, or a related software engineering discipline
• Proficiency in Python and/or Scala for pipeline development; strong SQL skills across multiple dialects
• Hands-on experience with distributed processing frameworks such as Apache Spark, Beam, or Flink (a brief PySpark sketch follows this list)
• Experience with workflow orchestration tools such as Apache Airflow, Prefect, or Dagster
• Deep familiarity with cloud data platforms (AWS, GCP, or Azure) and managed services such as BigQuery, Redshift, or Synapse
• Experience designing and maintaining data warehouses or lakehouses (Snowflake, Databricks, Delta Lake, Iceberg)
• Strong understanding of data modeling concepts: normalization, star/snowflake schema, slowly changing dimensions
• Experience with streaming and event-driven architectures using Kafka, Kinesis, or Pub/Sub
• Familiarity with CI/CD practices and infrastructure-as-code tools (Terraform, Pulumi) for data platform deployments
• Excellent communication skills with the ability to translate business requirements into technical solutions
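As a companion to the qualifications above, here is a minimal PySpark sketch that deduplicates raw events and writes the result with date partitioning, one of the performance and cost levers mentioned earlier. The bucket paths and column names (order_id, event_ts) are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("orders_curated").getOrCreate()

# Read one day's raw order events from a hypothetical landing bucket.
raw = spark.read.json("s3://example-raw/orders/2024-01-01/")

# Keep only the latest record per order_id: a common pattern for
# at-least-once event streams that may deliver duplicates.
latest = Window.partitionBy("order_id").orderBy(F.col("event_ts").desc())

curated = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .withColumn("rn", F.row_number().over(latest))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

# Partitioning by event_date lets downstream queries prune to the dates
# they actually scan, which keeps both runtime and warehouse cost down.
curated.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-curated/orders/"
)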