Top 12 ETL Developers for Data Pipelines

ETL (Extract, Transform, Load) has become the backbone of modern data infrastructure, powering everything from real-time analytics to machine learning workflows.
Behind this evolution is a global community of brilliant developers—visionary open-source contributors, startup founders still writing code, influential thought leaders, and engineers at the world’s top tech companies. The following list spotlights the most impactful ETL developers shaping the data landscape today through their tools, platforms, and ideas.
- Jeremiah Lowin
- Tyler Akidau
- Maxime Beauchemin
- Arvind Prabhakar
- Tristan Handy
- Joe Witt
- Michel Tricot
- George Fraser
- Douwe Maan
- Matt Casters
- Erik Bernhardsson
- Nick Schrock
Now, let’s delve deeper into their ETL-related achievements and contributions.
Jeremiah Lowin

Nationality: American
Jeremiah is the founder and CEO of Prefect, an open-source workflow orchestration tool often dubbed the “new Airflow”.
A former quantitative finance technologist, Jeremiah became a PMC member of Apache Airflow through his work automating dataflows in industry. In 2018, he founded Prefect to solve limitations he saw in Airflow’s scheduling and data flow handling: Prefect introduces a hybrid flow control system and focuses on data reliability. Under his leadership, Prefect’s “dataflow automation” platform has been adopted to run millions of pipelines and is backed by a robust open-source community. Prefect’s success (raising over $30M and seeing rapid enterprise uptake) is a testament to Jeremiah’s dual focus on community and product: he deliberately nurtured Prefect’s open-source community from zero to thousands of members. Jeremiah also openly shares his insights on building healthy OSS communities.
His blend of technical skill (he codes much of Prefect Core) and community leadership has made Jeremiah Lowin a key influencer in the ETL orchestration space.
- LinkedIn: Jeremiah Lowin
- X (Twitter): @jlowin
Tyler Akidau
Nationality: American
Tyler is one of the founding engineers of Apache Beam and the tech lead who shaped Google’s cloud data processing services (Cloud Dataflow).
At Google, Tyler spent over a decade building distributed systems for both batch and streaming ETL. He was the architect of the Dataflow Model for unified batch/stream processing and a co-creator of Google’s internal MillWheel stream processing engine. His work culminated in Apache Beam, the open-source programming model unifying batch and streaming pipelines, where Tyler is a PMC member. He is also widely known for his “Streaming 101” and “102” articles, which have educated tens of thousands of engineers on modern streaming ETL concepts. Now a software engineer at Snowflake, Tyler continues to influence streaming ETL practices. He advocates that batch and stream processing are “two sides of the same coin” and pushes the industry towards a seamless blend of the two.
Tyler Akidau’s thought leadership and engineering contributions (including the book Streaming Systems) have fundamentally advanced the state of the art for ETL on massive data streams.
- LinkedIn: Tyler Akidau
- X (Twitter): @takidau
Maxime Beauchemin
Nationality: French
Maxime is the original creator of Apache Airflow, the widely adopted workflow orchestration platform for ETL pipelines, and Apache Superset (an open-source BI platform).
A veteran data engineer, Maxime developed these tools while at Airbnb to streamline complex data workflows and democratize analytics. He previously cut his teeth in data warehousing at Ubisoft and Yahoo, then built analytics infrastructure at Facebook. In 2017 he founded Preset, a startup offering Superset as a managed service, and continues to champion open-source data engineering. Maxime’s work on Airflow has had a massive impact on the ETL ecosystem – Airflow became an Apache project and has tens of thousands of users orchestrating data pipelines globally.
His thought leadership (e.g. “The Rise of the Data Engineer”) and active coding involvement have cemented him as a top influencer in ETL technologies.
- LinkedIn: Maxime Beauchemin
- X (Twitter): @mistercrunch
- GitHub: mistercrunch
Arvind Prabhakar
Nationality: Indian
Arvind is an open-source luminary in data integration, known for creating Apache Flume and co-founding StreamSets (a pioneering DataOps platform).
As an early engineer at Cloudera, Arvind was the architect and PMC chair of Apache Flume, which in the early 2010s became a go-to framework for collecting and transferring log data at scale. In 2014, he teamed up with Girish Pancha to start StreamSets, aiming to bring continuous data ingestion (“always-on” ETL) to modern architectures. At StreamSets (acquired by Software AG in 2022), Arvind served as CTO and chief product officer, guiding development of its platform for building and monitoring smart data pipelines. He is an ASF member and has also contributed to projects like Sqoop and Storm.
Arvind’s career reflects a relentless focus on solving data movement challenges: from Flume’s durable log collection to StreamSets’ intelligent pipelines, he has consistently pushed the envelope in ETL system design. This breadth of impact over a decade cements his place among top ETL developers.
- LinkedIn: Arvind Prabhakar
- X (Twitter): @aprabhakar
Tristan Handy
Nationality: American
Tristan is the founder and CEO of dbt Labs, the company behind dbt (data build tool) – a popular open-source framework that has revolutionized the “T” in ELT by enabling analysts to own data transformations.
Tristan launched dbt in 2016 and grew it from a niche open-source project into a global movement in analytics engineering, now used by over 50,000 companies. Under his leadership, dbt Labs (formerly Fishtown Analytics) scaled to a unicorn startup valued at $4.2B. Tristan has over 20 years of data experience and has been a key voice in the modern data stack renaissance. He pioneered the concept of analytics engineering – applying software engineering best practices to ETL – through extensive blogging, community building, and the widely read dbt newsletter and podcast.
By enabling SQL-focused data practitioners to produce production-quality data pipelines, Tristan Handy’s work with dbt has had a profound impact on data engineering workflows.
- LinkedIn: Tristan Handy
- X (Twitter): @jthandy
- GitHub: jthandy
Joe Witt
Nationality: American
Joe is a co-creator of Apache NiFi, the powerful real-time data ingestion and ETL platform born out of the NSA.
Joe spent a decade at the NSA developing “Niagara Files”, which was open-sourced as NiFi in 2014. In 2015, he co-founded Onyara, a startup to productize NiFi, which was quickly acquired by Hortonworks as NiFi gained traction in industry. As Vice President of Engineering at Cloudera (after the Hortonworks merger), Joe continued to drive NiFi’s development and community. He remains a PMC member of the Apache NiFi project, guiding its roadmap. Under Joe’s stewardship, NiFi became a top choice for building secure, high-throughput data flows, from IoT sensor streams to enterprise ETL, valued for its easy UI and provenance features. Today, Joe is applying his dataflow expertise as co-founder and CEO of Datavolo, focusing on AI-driven data ingestion.
Joe Witt’s unique journey from intelligence agency innovator to open-source leader highlights his outsized influence on ETL system design.
- LinkedIn: Joe Witt
- X (Twitter): @joewitt26
Michel Tricot
Nationality: French
Michel Tricot is the co-founder and CEO of Airbyte, the open-source data integration platform that has quickly become a standard for ELT pipelines.
With 15+ years in data engineering, Michel previously led the data ingestion team at LiveRamp, where he managed hundreds of TBs of daily data syncing and realized the need for standardized connectors. In 2020, he launched Airbyte to “commoditize data integration” by providing a community-maintained library of connectors. Within 5 months of launch, Michel’s team raised a $5.2M seed round to accelerate Airbyte’s growth. Under Michel’s technical leadership, Airbyte’s open-source project then exploded in popularity, with the company going on to raise over $180M and cultivating a contributor community around its Singer-based connector protocol. Today, Airbyte is used by thousands of companies to extract and load data from APIs and databases into warehouses.
Michel Tricot’s vision of an open, extensible ETL ecosystem has influenced how organizations onboard data, making him a top ETL innovator.
- LinkedIn: Michel Tricot
- X (Twitter): @MichelTricot
- GitHub: michel-tricot
George Fraser
Nationality: American
George is the co-founder and CEO of Fivetran, a leading fully managed ETL service that helped pioneer the modern “ELT” approach to cloud data integration.
A neuroscientist turned entrepreneur, George started Fivetran in 2012 (with co-founder Taylor Brown) after observing how cumbersome data pipelines were in practice. He led Fivetran through Y Combinator and grew it into a global enterprise now valued at $5.6 billion. Under George’s leadership, Fivetran created a “data pipelines as a service” model that automatically extracts and loads data from dozens of SaaS sources, freeing engineers from maintenance. Today, thousands of companies rely on Fivetran’s connectors for their analytics pipelines. George remains hands-on with product strategy and often shares insights about data engineering best practices, including in articles for TechCrunch.
His work has streamlined ETL for the masses, positioning Fivetran as a “global leader in data movement” and George Fraser as a prominent figure in the ETL domain.
- LinkedIn: George Fraser
- GitHub: georgewfraser
Douwe Maan
Nationality: Dutch
Douwe Maan is the founder and CEO of Meltano, an open-source DataOps platform focused on ELT pipelines.
Formerly employee #10 at GitLab, Douwe spun out Meltano from an internal project into an independent startup in 2021. Meltano provides an integration framework that ties together Singer extractors and dbt transformations, reflecting Douwe’s vision for a modular, end-to-end ETL solution. While at GitLab, Douwe contributed to Meltano’s early development and served as its general manager, before leading its successful spin-off (with investment from GV, Alphabet’s venture capital arm). Under his leadership, Meltano has gained traction as a flexible, self-hosted alternative to proprietary ETL tools, embraced by data engineers seeking a single platform for the entire data lifecycle.
Douwe’s engineering background (he spent years as a developer and engineering manager at GitLab) and his decision to “follow his passion” to start a data engineering venture have made him an influential voice in the open-source ETL community.
- LinkedIn: Douwe Maan
- X (Twitter): @DouweM
- GitHub: DouweM
Matt Casters
Nationality: Belgian
Matt is the original creator of Kettle (Pentaho Data Integration), one of the first open-source ETL tools, and a co-founder of Apache Hop.
He began developing Kettle in 2001 and open-sourced it in 2005, ushering in a new era of affordable ETL for the BI community. As chief architect for Pentaho after it acquired Kettle, Matt led its evolution into an enterprise-grade ETL suite used by thousands of organizations. Today, he continues to innovate as an Apache Hop PMC member, reimagining Kettle’s legacy for modern data workloads. With over two decades in data integration, Matt is revered for making ETL accessible: his Pentaho Kettle project was among the first to challenge proprietary players. He has also contributed to Neo4j’s ETL capabilities as a solutions architect.
Matt Casters’ enduring impact on the ETL field, from early adoption of open-source data integration to mentoring a new generation via Apache Hop, firmly establishes him among the top ETL experts.
- LinkedIn: Matt Casters
- X (Twitter): @mattcasters
- GitHub: mattcasters
Erik Bernhardsson
Nationality: Swedish
Erik is the creator of Luigi, an open-source Python framework for building complex batch data pipelines.
He developed Luigi at Spotify in 2012 to manage the company’s large-scale music recommendation ETL workflows. Luigi (with 10k+ GitHub stars) became one of the first popular DAG schedulers, influencing later tools like Airflow. Beyond Luigi, Erik is also known for open-sourcing Annoy (vector search library) and for building Spotify’s first music recommendation system. After Spotify, he served as CTO at Better.com, and later founded Modal, a serverless data infrastructure startup. Erik’s blend of hands-on engineering and visionary leadership is evident: his blog posts on data engineering and tech culture have a wide following, and he’s even a former IOI gold medalist in programming.
By open-sourcing Luigi and advocating for functional data engineering practices, Erik Bernhardsson has significantly shaped how practitioners approach ETL pipeline construction.
- LinkedIn: Erik Bernhardsson
- X (Twitter): @bernhardsson
- GitHub: erikbern
Nick Schrock
Most ETL tools treat data pipelines as black boxes. Dagster opens the box and gives you knobs.
Nationality: American
Nick is the founder of Dagster Labs (formerly Elementl), the company behind Dagster, an open-source data orchestrator that brings software engineering discipline to ETL pipelines.
A former Facebook engineer, Nick co-created GraphQL in 2012 and led the Product Infrastructure team that built foundational tools such as React. Drawing on that experience, he launched Dagster to address the pain points of data engineering: Dagster introduces a new programming model for data workflows, emphasizing type-safe, testable, and reusable ETL components. Having started the company as its CEO, Nick remains deeply involved in the code and design of the platform. Under his guidance, Dagster has gained a strong following among data engineers for its innovative approach to pipeline definition and observability.
Now CTO at Dagster Labs, Nick is a leading ETL architect, known for shaping top developer frameworks and redefining data orchestration.
- LinkedIn: Nick Schrock
- X (Twitter): @schrockn
Wrap Up
These legends represent exceptional talent, making them extremely challenging to headhunt. However, there are thousands of other highly skilled IT professionals available to hire with our help. Contact us, and we will be happy to discuss your hiring needs.
Note: We’ve dedicated significant time and effort to creating and verifying this curated list of top talent. If you intend to share or make use of it in any way, we kindly ask that you include a backlink to the original source – EchoGlobal.