Is Data still relevant?

Yes, data remains central to business operations. Organizations rely on data engineering, analytics, and data science for decision-making, automation, and AI development.

How much do Data developers make?

Salaries differ by role and region. In the US, data engineers and developers typically earn between $90,000 and $140,000 per year. Freelance rates are often $40 to $120 per hour.

Who is a subject matter expert in Data?

Experts include professionals with experience in data architecture, ETL pipelines, big data frameworks like Spark, and database management. Many also specialize in machine learning and cloud data platforms.

Is it hard to find Data programmers?

Yes, competition for skilled data professionals is strong. While many developers have SQL knowledge, advanced skills in big data, streaming, or large-scale analytics are less common.

What are the best sites to hire Data freelancers?

Popular platforms include Upwork, Toptal, and Fiverr. Specialized communities such as Kaggle can also surface candidates. Stack Overflow Jobs, once a common choice, was discontinued in 2022.

Top 18 Data Engineers in the Field (2026)

Data engineering sits at the heart of modern analytics, bridging raw data and actionable insights.

The individuals below represent the best data engineers globally, excelling in open-source contributions, leadership at high-impact tech companies, influential blogging and community engagement, and competition accolades. They have built and maintained the platforms and tools that power data-driven organizations. Each profile includes background information and links to their active public profiles so you can follow their work.

Ali Ghodsi
Martin Kleppmann
Jay Kreps
Tristan Handy
Wes McKinney
DJ Patil
Dhruba Borthakur
Doug Cutting
Jordan Tigani
Frank McSherry
Neha Narkhede
Maxime Beauchemin
Zhamak Dehghani
Stephan Ewen
Reynold Xin
Aritra Ghosh
Holden Karau
Gwen Shapira

Now, let’s delve deeper into their remarkable careers and contributions.

Ali Ghodsi

Nationality: Iranian

Ali is the CEO and co-founder of Databricks, and was one of the original creators of Apache Spark. A former academic from Sweden (PhD in distributed computing), Ali helped turn Spark from a research project into an open-source powerhouse.

In 2013 he co-founded Databricks to commercialize Spark, and became CEO in 2016. Under his leadership, Databricks has become a leader in unified data analytics (recently valued over $60B) while remaining committed to open source (e.g., releasing Delta Lake and MLflow). Ali is known for his vision of the “Lakehouse” architecture that blends data lakes and warehouses. He’s an active voice in the data community, appearing in keynote talks and interviews about the future of AI and data.

LinkedIn: Ali Ghodsi
X (Twitter): @alighodsi

Martin Kleppmann

Nationality: British

Martin is a researcher in distributed systems at the University of Cambridge and author of the acclaimed book Designing Data-Intensive Applications. Martin’s book (published 2017) has become a “bible” for data engineers, distilling the principles behind databases, streams, and distributed algorithms.

Previously, Martin was an engineer in industry: he co-founded and sold two startups and worked on large-scale data infrastructure at LinkedIn. He also co-developed Apache Samza (a stream processing framework) during his time at LinkedIn, contributing to early adoption of stream processing. Now an academic, he focuses on local-first collaboration software and CRDTs, pushing the frontier of real-time collaborative data systems. Martin continues to engage the community through his blog, talks, and open-source projects, bridging theoretical advances with practical engineering.

LinkedIn: Martin Kleppmann
X (Twitter): @martinkl
Website/Blog: martin.kleppmann.com

Jay Kreps

Nationality: American

Jay is the co-founder and CEO of Confluent and one of the original creators of Apache Kafka.

At LinkedIn, Jay and his colleagues built Kafka to handle the company’s massive event streams, and it was open-sourced in 2011. Kafka’s publish/subscribe model has since become a standard for streaming data across thousands of organizations. In 2014 Jay left LinkedIn to co-found Confluent, bringing a managed Kafka platform to enterprises. He has overseen Confluent’s growth (now a public company) and the evolution of Kafka into an ecosystem. Jay also popularized the idea of the “log” as the heart of data systems in his book I Heart Logs. He frequently blogs and speaks about streaming architectures and has helped shape how modern companies think about real-time data.

LinkedIn: Jay Kreps
X (Twitter): @jaykreps

Tristan Handy

Nationality: American

Tristan is the Founder and CEO of dbt Labs, the company behind dbt (data build tool). Tristan launched dbt in 2016 (initially as an open-source project at Fishtown Analytics) to empower data analysts to adopt software engineering best practices in analytics – namely, writing modular SQL transformations with version control and testing.

dbt has since sparked the analytics engineering movement and is used by over 60,000 companies. Tristan has grown Fishtown into dbt Labs, a venture-backed firm that now offers dbt Cloud and has become a hub of the modern data stack. He’s also known for his thought leadership via the weekly “Analytics Engineering Roundup” newsletter and podcast, where thousands tune in to hear Tristan discuss data team practices. His mix of community-building and product vision has made dbt an indispensable tool in the data engineer’s toolbox.

LinkedIn: Tristan Handy
X (Twitter): @jthandy
GitHub: jthandy

Wes McKinney

Nationality: American

Wes is the original creator of pandas, the ubiquitous Python data analysis library, and a co-creator of Apache Arrow. His work fundamentally improved how data scientists and engineers handle data in Python.

Wes created pandas in 2008 to bring R-like DataFrames to Python, and later wrote Python for Data Analysis. He went on to found Ursa Labs and co-create Arrow, a cross-language in-memory data format that has become an industry standard (enabling zero-copy data sharing between systems). Wes is a co-founder of Voltron Data (which Ursa Labs merged into), where he has continued to develop Arrow and related tools. He’s also a Principal Architect at Posit (formerly RStudio) as of 2024, bridging the Python and R ecosystems. Wes is an active open-source advocate, often sharing insights on GitHub and X, and has received awards for his contributions to data science software.

LinkedIn: Wes McKinney
X (Twitter): @wesmckinn
GitHub: wesm
Website/Blog: wesmckinney.com

DJ Patil

Nationality: American

DJ Patil is often cited as one of the most influential data scientists in the world and was the first-ever U.S. Chief Data Scientist. With a background in mathematics, DJ helped coin the term “data scientist” during his tenure as Chief Scientist at LinkedIn in the late 2000s, where he led the development of LinkedIn’s early data products.

He also held senior data roles at eBay and PayPal. As U.S. Chief Data Scientist, DJ advocated for data-driven policymaking and worked on initiatives in health care, criminal justice, and education, demonstrating the social impact of data engineering. After leaving government, he entered venture capital and is currently a General Partner at GreatPoint Ventures, while advising startups and public organizations on data strategy.

DJ Patil remains a prominent public speaker on the power of data, and his blend of technical and leadership experience continues to inspire the next generation of data professionals.

LinkedIn: DJ Patil
X (Twitter): @dpatil

Dhruba Borthakur

Nationality: Indian

Dhruba is a co-founder and the former CTO of Rockset, a real-time analytics database startup, and a veteran engineer behind key big data storage technologies. At Yahoo, Dhruba was one of the founding engineers of the Hadoop Distributed File System (HDFS), which provided petabyte-scale storage to the early big data world.

Later at Facebook, he was the founding engineer of RocksDB, an embedded key-value storage engine, on Facebook’s database team. In 2016 he co-founded Rockset to build a cloud-native analytical database for fast SQL on semi-structured data. Rockset’s indexing technology owes much to Dhruba’s deep storage expertise. (Notably, Rockset was acquired by OpenAI in 2024, indicating the value of its technology.) Beyond these, Dhruba has contributed to Apache HBase and worked on the Haystack photo storage system at Facebook.

He frequently shares his knowledge in database conferences and continues to push the envelope of low-latency analytics in the cloud era.

LinkedIn: Dhruba Borthakur
X (Twitter): @dhruba_rocks

Doug Cutting

Nationality: American

Doug is the creator of Apache Hadoop and a legend in open-source big data. He also created Apache Lucene and co-created Apache Nutch (web crawler) in the early 2000s.

Hadoop, born from Doug’s work at Yahoo around 2005, implemented the ideas in Google’s MapReduce and Google File System papers, the latter as the Hadoop Distributed File System (HDFS), forming the backbone of the big data movement. In 2009 Doug joined Cloudera as Chief Architect, helping to bring Hadoop to the enterprise. He also served as Chairman of the Apache Software Foundation. Even after the Hadoop era, Doug continues to champion open data platforms.

His work enabled the era of distributed data lakes, and terms like “Hadoop ecosystem” exist largely thanks to him. In recent years he’s been an advocate for data privacy and open source governance.

LinkedIn: Doug Cutting
X (Twitter): @cutting
GitHub: cutting

Jordan Tigani

Nationality: American

Jordan is the co-founder and CEO of MotherDuck, a startup bringing the power of the open-source DuckDB project to the cloud. Prior to MotherDuck, Jordan was a founding engineer and longtime leader on Google BigQuery – he helped build BigQuery’s storage and metadata systems in its early 2010s launch, and later served as BigQuery’s Director of Engineering.

After Google, he was Chief Product Officer at SingleStore, another database company. In 2022, Jordan co-founded MotherDuck to integrate DuckDB’s in-process analytics with cloud scalability, aiming to provide fast analytics on smaller-scale data without complex infrastructure. Under his leadership, MotherDuck has gained buzz (raising about $100M at a reported $400M valuation). Jordan is also known for co-authoring the book Google BigQuery: The Definitive Guide and for his engaging conference talks on data architecture.

He brings a pragmatic perspective on when to leverage “big” data tech versus “duck-sized” data solutions.

LinkedIn: Jordan Tigani
X (Twitter): @jrdntgn

Frank McSherry

Nationality: American

Frank is the chief scientist and co-founder of Materialize, and a researcher famed for his work on streaming dataflow systems. While at Microsoft Research, Frank co-invented Timely Dataflow and Differential Dataflow, two innovative computational models for incremental computing.

These became the foundation for Materialize’s real-time SQL database, which can maintain complex query results continuously. Materialize’s engine is built on Timely Dataflow and Differential Dataflow, frameworks that Frank open-sourced. In academia, Frank is known for contributions to database theory and privacy, including co-inventing differential privacy. At Materialize, he serves as Chief Scientist, focusing on engineering the product’s core. Frank is respected for bringing rigorous computer science into practical systems; he often writes blog posts and speaks about how Materialize keeps query results incrementally up to date.

He also remains active in the Rust community and is an advocate for open science in software.

LinkedIn: Frank McSherry
GitHub: frankmcsherry

Neha Narkhede

Nationality: Indian

Neha is the co-founder and former CTO of Confluent, and a co-creator of Apache Kafka during her time at LinkedIn.

At LinkedIn, Neha was instrumental in developing Kafka as a reliable, high-throughput distributed messaging system that now handles trillions of events per day across industries. In 2014 she co-founded Confluent to build a streaming data platform around Kafka, and led its technology strategy as CTO. Neha has since moved on to co-found Oscilar (her second startup, focused on AI-driven risk management), where she is CEO. She has been recognized in Forbes’s Top 50 Women in Tech and MIT Tech Review’s Innovators Under 35.

Neha is also a sought-after speaker on streaming architectures and entrepreneurship, and serves as a board member for Confluent.

LinkedIn: Neha Narkhede
X (Twitter): @nehanarkhede
Website/Blog: nehanarkhede.com

Maxime Beauchemin

Nationality: French

Max is the original creator of Apache Airflow and Apache Superset, two widely used open-source data tools. At Airbnb circa 2014, Max created Airflow to automate complex data pipelines (now a top workflow orchestrator in ETL and data engineering).

He later created Superset as an open-source business intelligence and data visualization platform. In 2019, Max founded Preset, a startup providing a managed Superset platform, where he is CEO. With past data engineering stints at Facebook, Airbnb, and Lyft, Max has consistently built tools to fill gaps in the data ecosystem. He’s an open-source evangelist and active on social media, where he shares thoughts on data tool design.

Max’s contributions have saved countless data engineers from “reinventing the wheel” by providing production-ready frameworks.

LinkedIn: Maxime Beauchemin
X (Twitter): @mistercrunch
GitHub: mistercrunch

Zhamak Dehghani

The future of data is decentralized, domain-oriented, and product-driven.

Nationality: Iranian

Zhamak is best known as the creator of the “Data Mesh” paradigm – a decentralized approach to enterprise data architecture. She introduced the concept in 2019 while at ThoughtWorks, via influential articles that challenged traditional monolithic data lakes.

In 2022, Zhamak authored Data Mesh: Delivering Data-Driven Value at Scale, elaborating on treating data as a product and organizing teams around data domains. To further this vision, she founded Nextdata in 2023 and serves as CEO, aiming to productize data mesh principles. In April 2025, Zhamak’s company launched Nextdata OS, a platform for building “autonomous data products” that operationalize data mesh ideas. Zhamak is a frequent keynote speaker and thought leader in data management, advocating for federated governance and self-serve data infrastructure.

Her work is reshaping how large organizations manage analytical data at scale.

LinkedIn: Zhamak Dehghani
X (Twitter): @zhamakd

Stephan Ewen

Nationality: German

Stephan is a co-creator of Apache Flink and was the CTO of Ververica (formerly data Artisans), the company that commercialized Flink.

Stephan started Flink as part of his research at TU Berlin, and helped shape it into a powerful open-source stream processing engine known for high-throughput, low-latency processing. In 2014 he co-founded data Artisans in Berlin to bring Flink to industry; the company was later acquired by Alibaba and rebranded as Ververica. Stephan oversaw Flink’s evolution to handle real-time streaming at companies like Alibaba, Netflix, and Uber. In 2022, he left Ververica to start a new venture called Restate (focused on stateful event processing), where he is now Founder and CTO.

With over a decade of building streaming systems, Stephan remains a prominent voice in stream processing, often sharing his insights on topics like event-driven architecture and state management.

LinkedIn: Stephan Ewen
X (Twitter): @StephanEwen
GitHub: StephanEwen

Reynold Xin

Nationality: Chinese

Reynold is a co-founder and Chief Architect at Databricks, and one of the original developers of Apache Spark. At Databricks, Reynold has overseen major technical contributions to Spark – he led the creation of Spark SQL/DataFrames and the Project Tungsten engine for optimizing in-memory computation.

These efforts greatly improved Spark’s performance and usability, expanding it from batch jobs to a general analytics engine. Reynold remains deeply involved in Spark’s development (he’s a Spark PMC member) and in Databricks’ product strategy. He frequently shares new features at Databricks’ Data + AI Summits and on the Databricks blog, helping engineers understand advanced topics like adaptive query execution and Photon engine optimizations.

With his academic background (Berkeley AMPLab) and practical leadership, Reynold is a key driver of Spark’s continued evolution in the open-source community.

LinkedIn: Reynold Xin
X (Twitter): @rxin

Aritra Ghosh

Nationality: Indian

Aritra is an Azure‑savvy data engineering specialist and founder dedicated to helping enterprises accelerate their data modernization journeys, often leveraging large language models and AI‑driven solutions.

His expertise shows in his detailed Azure Data Engineering Cheat Sheet on LinkedIn, which offers professionals a practical, at‑a‑glance reference for Azure data engineering tools and workflows. With deep experience as a Solutions Architect (designing high‑performance, cloud‑native ETL and data processing platforms using Azure Data Factory, Databricks, Data Lake, and DevOps pipelines), Aritra brings hands‑on technical skill to enterprise transformation projects.

Through his strategic leadership at Vidyutva and insightful industry contributions, he continues to bridge the gap between cloud infrastructure and impactful AI‑powered data solutions.

LinkedIn: Aritra Ghosh

Holden Karau

Nationality: Canadian

Holden is an open-source engineer and author known for her contributions to Apache Spark. She became a Spark committer in the project’s early days and co-authored several influential books, including Learning Spark (2015) and High Performance Spark (2017).

Holden has worked at companies like IBM, Google, and Netflix on large-scale data platforms, often focusing on improving Spark’s usability. As a transgender woman in tech, Holden is also a champion for diversity and mentorship in the data community. She is a frequent speaker at conferences (known for her fun live-coding talks) and shares knowledge through blogs and YouTube. In recent years, Holden founded a startup exploring AI applications while continuing to contribute to Spark and related projects.

Her approachable teaching style and deep expertise have made her a beloved figure in the Spark community.

LinkedIn: Holden Karau
X (Twitter): @holdenkarau
GitHub: holdenk

Gwen Shapira

Nationality: American

Gwen is a data architect and developer advocate with deep expertise in streaming data systems, particularly Apache Kafka and its surrounding ecosystem.

She spent several years at Confluent as an architect, helping organizations design and operate large-scale streaming platforms. Gwen is also a co-author of Kafka: The Definitive Guide, a widely used reference for engineers building event-driven systems. Her work focuses on practical system design, data modeling for streams, and operational reliability.

Gwen is known for translating complex distributed systems concepts into clear, actionable guidance through talks, blogs, and training. She remains active in the data engineering community as a speaker and advisor, particularly around real-time analytics and streaming architectures.

LinkedIn: Gwen Shapira
X (Twitter): @gwenshap
GitHub: gwenshap

Wrap Up

These experts represent exceptional talent, making them extremely challenging to headhunt. However, there are thousands of other highly skilled IT professionals available to hire with our help. Contact us, and we will be happy to discuss your hiring needs.

Note: We’ve dedicated significant time and effort to creating and verifying this curated list of top talent. However, if you believe a correction or addition is needed, feel free to reach out. We’ll gladly review and update the page.