Senior Data Engineer
Date: Mar 30, 2026
Location: MADRID, ES, 28037
Company: HOLCIM Group
SUMMARY OF THE JOB
We are seeking a seasoned Senior Data Engineer to design, build, and optimize our next-generation data platform. You will be responsible for architecting scalable data pipelines, managing large-scale distributed systems, and ensuring our data infrastructure in AWS and Databricks is robust and efficient. The ideal candidate is a Spark expert with a deep understanding of the AWS ecosystem and a passion for automation.
MAIN ACTIVITIES / RESPONSIBILITIES
- Pipeline Architecture: Design and implement complex batch and streaming ETL/ELT pipelines using Python, SQL, and Spark to process massive datasets (a minimal sketch of such a batch job follows this list).
- Cloud Infrastructure: Leverage AWS Data Analytics services to build scalable, secure, and cost-effective data solutions.
- Orchestration & DevOps: Manage and automate data workflows using Airflow, and use Docker and ECS for containerized application deployment.
- System Optimization: Monitor and tune the performance of distributed systems (Spark clusters) to ensure high availability and low latency.
- Infrastructure as Code: Use AWS CloudFormation or Terraform to manage data infrastructure, ensuring repeatable and version-controlled environments.
- Cost Optimization: Monitor and optimize AWS spend by selecting appropriate instance types (Spot vs. On-Demand) and refining data storage strategies.
- Security & Compliance: Implement IAM roles, bucket policies, and encryption (KMS) to ensure data is secure at rest and in transit.
- Collaboration: Work within an Agile framework to deliver iterative value, collaborating closely with Data Scientists and stakeholders to translate business needs into technical reality.
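For illustration only, below is a minimal sketch of the kind of batch pipeline described under Pipeline Architecture. The bucket names, paths, and column names are hypothetical, and the logic is simplified rather than a reference implementation.

# Hypothetical batch job: read raw JSON events from S3, clean them,
# and write the result back as partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-batch").getOrCreate()

# Hypothetical raw landing zone.
raw = spark.read.json("s3://example-raw-bucket/events/")

cleaned = (
    raw
    .filter(F.col("event_id").isNotNull())                    # drop malformed rows
    .withColumn("event_date", F.to_date("event_timestamp"))   # derive a partition column
    .dropDuplicates(["event_id"])                              # keep re-runs idempotent
)

(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")                                 # partitioned for Athena/Glue queries
    .parquet("s3://example-curated-bucket/events/")            # hypothetical curated zone
)

spark.stop()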
JOB DIMENSIONS
List of direct reports:
- Up to 2 direct reports and around 15 externals
Key interfaces, stakeholders and relationships:
- Internal:
  - GDS: product manager, application manager, data & analytics & AI team
  - Country business stakeholders
- External: 3rd-party vendors
PROFILE REQUIRED
- Experience: 4+ years of hands-on experience in active Big Data environments and 2+ years specializing in Data Analytics within AWS.
- Compute & Processing:
  - Amazon EMR: Architecting and managing Spark clusters for large-scale distributed processing.
  - AWS Glue: Developing serverless ETL jobs, managing the Data Catalog, and implementing Glue Crawlers.
- Storage & Warehousing:
  - Amazon S3: Implementing data lake best practices, including partitioning, compression (Parquet/Avro), and lifecycle policies.
  - Amazon Redshift: Designing star/snowflake schemas and optimizing query performance for high-volume data warehousing.
  - Amazon Athena: Performing ad-hoc SQL analysis directly on S3 data.
  - Experience with open table formats (Iceberg/Delta).
- Orchestration & Integration:
  - Amazon MWAA (Managed Workflows for Apache Airflow): Deploying and scaling Airflow environments.
  - AWS Lambda: Building event-driven data triggers and microservices.
- Streaming (advantage): Amazon Kinesis or MSK (Managed Streaming for Apache Kafka) for real-time data ingestion.
- Core Engineering: Expert-level proficiency in Spark, Python, and SQL.
- Infrastructure & Tooling: Proven experience with Airflow for orchestration and Docker/ECS for containerization.
- Good knowledge of Databricks and data mesh architectures, and a good understanding of how to implement and maintain Lakehouse data models (bronze/silver/gold layers) using Delta Lake for reliability, ACID transactions, time travel, and schema evolution (see the sketch after this list).
- Solid software engineering practices: Git, CI/CD for data pipelines, automated testing, code quality, and documentation.
- Communication: Excellent written and oral English communication skills, with the ability to explain complex technical concepts to non-technical audiences.
- Degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent practical experience.
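As a hedged illustration of the Lakehouse point above, the sketch below shows a bronze-to-silver Delta Lake step. Table paths, names, and columns are hypothetical, and the layer logic is deliberately simplified.

# Hypothetical bronze -> silver step in a medallion (Lakehouse) layout on Delta Lake.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Bronze: raw ingested records, appended as-is.
bronze = spark.read.format("delta").load("s3://example-lake/bronze/orders")

# Silver: de-duplicated, typed, and conformed records.
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)

(
    silver.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")   # leans on Delta schema evolution
    .save("s3://example-lake/silver/orders")
)

# Delta time travel: read an earlier version of the table if needed, e.g.
# spark.read.format("delta").option("versionAsOf", 0).load("s3://example-lake/silver/orders")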
PREFERRED “PLUS” QUALIFICATIONS
- Real-time Processing: Experience with streaming and distributed messaging applications such as Flink and Kafka.
- Core Tech: Java programming.
- ML Engineering: Experience industrialising ML use cases (moving models and pipelines into production).
- Data Visualization: Experience with QlikView or Qlik Sense to support BI initiatives.
- Agile: Experience working in a fast-paced Scrum or Kanban environment.
- Certifications: AWS Certified Data Engineer (Associate) or AWS Certified Solutions Architect; Databricks Certified Data Engineer (Associate/Professional).
- DevOps: Experience with OpenShift, GitHub Actions, or Jenkins for CI/CD of data workflows (a minimal test sketch follows this list).
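To illustrate the automated-testing and CI/CD expectations, here is a minimal sketch of a pytest unit test for a Spark transformation that could run in a GitHub Actions or Jenkins pipeline. The clean_events function and its columns are hypothetical examples, not part of an existing codebase.

# Hypothetical unit test for a pipeline transformation, runnable with pytest in CI.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def clean_events(df):
    """Drop rows without an event_id and remove duplicate events."""
    return df.filter(F.col("event_id").isNotNull()).dropDuplicates(["event_id"])


@pytest.fixture(scope="module")
def spark():
    # Local Spark session so the test runs without a cluster.
    session = SparkSession.builder.master("local[1]").appName("pipeline-tests").getOrCreate()
    yield session
    session.stop()


def test_clean_events_removes_nulls_and_duplicates(spark):
    df = spark.createDataFrame(
        [("e1", "click"), ("e1", "click"), (None, "view")],
        ["event_id", "event_type"],
    )
    result = clean_events(df)
    assert result.count() == 1
    assert result.first()["event_id"] == "e1"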