Big Data Experts: Top 15 to Hire

In the fast-evolving world of big data, a select group of industry professionals stands out for its impactful contributions.
Below we present leading big data experts active in recent years, each excelling through open-source innovation, startup leadership, influential blogging or speaking, major roles at tech companies, or prize-winning competition performances. These profiles highlight why they’re at the forefront of big data and include links to their public profiles for further insight:
- Hilary Mason
- Matei Zaharia
- Jay Kreps
- Neha Narkhede
- Doug Cutting
- DJ Patil
- Jeremy Howard
- Wes McKinney
- Shay Banon
- Maxime Beauchemin
- Kirk Borne
- Matthew Rocklin
- Zhamak Dehghani
- Abhishek Thakur
- Ali Ghodsi
Now, let’s delve into each expert’s background and why they are notable.
Hilary Mason

Nationality: American
Hilary is a leading voice in data science and big data analytics.
She was Chief Scientist at Bitly, where she applied machine learning to understand internet attention patterns in real-time. In 2014, Hilary co-founded Fast Forward Labs, an R&D startup focused on emerging AI and data technologies, later acquired by Cloudera. As CEO of Fast Forward Labs, she led research on practical machine learning innovations. Hilary is also known for her influential writing and speaking: she’s blogged insights on data strategy, appeared in publications like Fast Company and Scientific American, and has been honored in Forbes 40 under 40 in tech.
A prominent advocate for the data science community (co-founder of HackNY, DataGotham), Hilary Mason has helped shape how businesses realize value from big data through intelligent algorithms and intuitive storytelling.
- LinkedIn: Hilary Mason
- X (Twitter): @hmason
- Website: hilarymason.com
Matei Zaharia
Nationality: Romanian
Matei is the original creator of Apache Spark, a landmark open-source engine for large-scale data processing.
He co-founded Databricks to commercialize Spark, serving as its CTO while maintaining an active role in the development of Spark and related projects like MLflow and Delta Lake. An accomplished computer scientist (associate professor at UC Berkeley), Matei has received awards such as the ACM Doctoral Dissertation Award for his work on Spark. Under his leadership, Databricks has grown into one of the world’s most valuable big data companies (valued at $62 billion in 2025).
His blend of academic expertise and hands-on coding makes him a pivotal figure bridging research and industry in big data.
- LinkedIn: Matei Zaharia
- X (Twitter): @matei_zaharia
- GitHub: mateiz
Jay Kreps
“We were not going to make short-term decisions. We wanted to set the company up to execute over the longer term, and there’s a really significant opportunity in the data streaming space. [If we] don’t build something for that larger opportunity, then we’re going to miss out.”
Nationality: American
Jay is best known as a co-creator of Apache Kafka, the distributed streaming platform that has become a backbone for real-time data pipelines.
While at LinkedIn, he helped design Kafka to handle high-throughput event data, and later co-founded Confluent in 2014 to build a company around Kafka’s ecosystem. As Confluent’s CEO, Jay has guided its growth while remaining deeply technical – he’s authored influential essays (including on the Kappa architecture for streaming data) and continues to advocate for developer-friendly data infrastructure.
Under his leadership, Confluent has made Kafka enterprise-ready, and Jay’s vision for data streaming has shaped how modern organizations integrate and react to big data in real time.
- LinkedIn: Jay Kreps
- X (Twitter): @jaykreps
- Personal Blog: jaykreps.com
Neha Narkhede
Nationality: Indian
Neha played a key role in building the big data streaming revolution.
As a software engineer at LinkedIn, she co-created Apache Kafka in 2011 to handle the site’s massive data feed. In 2014, Neha co-founded Confluent and as CTO led its technology development, helping companies adopt Kafka for mission-critical use (from Goldman Sachs trading to Netflix recommendations). She has since been recognized as a top young technology innovator (MIT Technology Review’s Innovators Under 35) for “teaching companies to swim” in torrents of data.
Today, Neha continues to innovate as founder of a new startup (Oscilar) and as an investor, while remaining a prominent voice in streaming analytics and an inspiration for women in big data tech.
- LinkedIn: Neha Narkhede
- X (Twitter): @nehanarkhede
- Website: nehanarkhede.com
Doug Cutting
Nationality: American
Doug is often dubbed the “father of Hadoop” for pioneering the open-source framework that ushered in the era of Big Data.
He co-created Apache Hadoop (with Mike Cafarella) by implementing Google’s MapReduce paper in open source, enabling reliable, distributed processing of huge datasets. Before Hadoop, Doug created Apache Lucene (a popular search engine library) and co-created Nutch (a web crawler), key components that influenced search and big data indexing. In the last decade, Doug served as Chief Architect at Cloudera, guiding Hadoop’s enterprise adoption. He remains an Apache Software Foundation advocate and board member, championing open-source data ecosystems.
Doug’s contributions – from HDFS storage to the MapReduce processing paradigm – are foundational to today’s big data platforms.
- LinkedIn: Doug Cutting
- X (Twitter): @cutting
DJ Patil
Nationality: American
Dhanurjay “DJ” Patil is a pioneer of the data science profession in industry and government.
In 2008, he (along with Jeff Hammerbacher) famously coined the job title “Data Scientist” to describe their work applying big data at LinkedIn and Facebook. DJ went on to become the first Chief Data Scientist of the United States, appointed in 2015, where he led national initiatives on data-driven policymaking. In the private sector, he has held senior roles at eBay, PayPal, LinkedIn, and later served as Head of Data Products at RelateIQ (Salesforce). DJ is known for promoting the power of open data and data ethics, and has been a top influencer shaping data strategy across industries.
He continues to build bridges between tech and society – currently as a General Partner at a venture firm – and remains a sought-after advisor for companies aiming to leverage big data responsibly and at scale.
- LinkedIn: D.J. Patil
- X (Twitter): @dpatil
Jeremy Howard
Nationality: Australian
Jeremy is an Australian data scientist and entrepreneur who has achieved global recognition in both the big data competition arena and open-source education.
He was a top-ranked Kaggle competitor (winning various machine learning competitions) and later became President and Chief Scientist of Kaggle, helping grow the platform for data science contests. Jeremy co-founded fast.ai, a research lab and online learning platform making deep learning more accessible. Through fast.ai he has developed the popular fastai library and taught thousands of students practical machine learning. Previously, he founded analytics startup Optimal Decisions (acquired in 2011) and led data products at Singularity University.
Jeremy’s current focus is on democratizing AI; his influential courses and MOOC have enabled many to enter AI by leveraging big data. He’s a frequent keynote speaker and was recognized as a Young Global Leader by the World Economic Forum (WEF) for his contributions.
- LinkedIn: Jeremy Howard
- X (Twitter): @jeremyphoward
- Website: jeremy.fast.ai
Wes McKinney
Nationality: American
Wes is the software developer behind pandas, the Python data analysis library ubiquitous in data science.
He created pandas in 2008 to simplify working with tabular data in Python, and it has since become a cornerstone tool for data manipulation. In recent years, Wes led the design of Apache Arrow, an open-source columnar memory format accelerating big data interoperability. He co-founded Voltron Data (2020) to unify and advance the Arrow ecosystem across languages. Previously, Wes worked at Two Sigma and Cloudera, and authored the definitive book Python for Data Analysis. His work as pandas “BDFL” (Benevolent Dictator for Life) and Arrow champion has massively improved the productivity of data engineers and scientists dealing with large datasets.
Wes continues to innovate in high-performance computing, focusing on making big data processing faster and more accessible in the Python community.
- LinkedIn: Wes McKinney
- X (Twitter): @wesmckinn
- GitHub: wesm
- Website: wesmckinney.com
Shay Banon
Nationality: Israeli
Shay is the original author of Elasticsearch, the open-source distributed search and analytics engine that has become a de facto standard in big data search technology.
He developed Elasticsearch in 2010 (inspired by a project to help his wife search recipes) and open-sourced it, later co-founding Elastic to support and expand the stack (Elasticsearch, Kibana, Beats, Logstash). As CEO and now CTO of Elastic, Shay oversaw the company’s growth from a small project to a public company and a vibrant community, with Elastic’s products used for everything from enterprise log analysis to security data lakes. He has remained deeply involved in the technical roadmap, guiding features like real-time distributed indexing and search scalability.
Shay’s work has greatly democratized powerful search and querying of big data, enabling organizations worldwide to turn large datasets into actionable insights quickly.
- LinkedIn: Shay Banon
- X (Twitter): @kimchy
Maxime Beauchemin
Nationality: French
Maxime has built some of the most widely used open-source tools in data engineering.
He created Apache Airflow in 2014 while at Airbnb, to programmatically orchestrate complex data pipelines; Airflow is now a top project for ETL workflow scheduling. He also created Apache Superset, a popular open-source business intelligence and data visualization platform. Maxime later founded Preset (2019) to bring Superset to the cloud and continue evolving modern data exploration. With past stints at Facebook, Airbnb, and Lyft, Maxime has deep practical insight into scaling data systems. He’s an active blogger (“The Rise of the Data Engineer” is one of his noted essays) and speaks frequently on the state of data tooling.
By open-sourcing Airflow and Superset, Maxime empowered thousands of organizations to build reliable data pipelines and democratize data insights without expensive proprietary software.
- LinkedIn: Maxime Beauchemin
- X (Twitter): @mistercrunch
- Website: Medium
Kirk Borne
Nationality: American
Dr. Kirk Borne is a globally recognized big data evangelist who has consistently ranked among the top worldwide influencers in data science and AI since 2013.
An astrophysicist by training (he was a NASA researcher on the Hubble mission), Kirk transitioned to data science and helped pioneer the use of big data in astronomy. He served as Principal Data Scientist at Booz Allen Hamilton, advising large enterprises on data strategy, and is now Chief Science Officer at DataPrime. Kirk is extremely active on social media and blogging – with over 300k followers on X (Twitter), he shares insights on big data, machine learning, IoT, and data literacy daily. He also mentors at universities and speaks at dozens of conferences, spreading best practices in data management and analytics.
For his contributions to data advocacy and education, Kirk Borne has been consistently hailed as a leading voice making big data approachable and exciting for broad audiences.
- LinkedIn: Kirk Borne
- X (Twitter): @KirkDBorne
- Website: Data Leadership Group
- Blog: DataScienceCentral
Matthew Rocklin
Nationality: American
Matthew is the initial author of Dask, a flexible Python library for parallel computing that scales popular data science workflows to multi-core machines and clusters.
Created in the mid-2010s, Dask has become a vital tool to extend PyData (NumPy, pandas, scikit-learn) for “big data” scenarios by distributing work across clusters. Matthew led Dask’s development first at Anaconda and then at NVIDIA, and in 2020 he founded Coiled to provide cloud-hosted Dask solutions. At Coiled (where he is CEO), he continues to write code, focusing on improving Python’s scalability for big data analytics. Matthew is an open-source pragmatist – he actively maintains many Dask-related projects and engages the community through blog posts and talks. With a PhD in computer science, he brings scientific rigor to computing.
By enabling Python users to handle large datasets with familiar tools, Rocklin’s work significantly lowers barriers in big data computing for scientists and businesses alike.
- LinkedIn: Matthew Rocklin
- X (Twitter): @mrocklin
- GitHub: mrocklin
- Website: matthewrocklin.com
Zhamak Dehghani
Nationality: Iranian-American
Zhamak Dehghani is known for introducing the concept of Data Mesh in 2019, a paradigm shift in big data architecture.
While a technology director at ThoughtWorks, she identified challenges in monolithic data lakes and proposed Data Mesh as a decentralized, product-driven approach to make enterprise data more agile and scalable. Her thought leadership (through influential articles and a 2022 book “Data Mesh: Delivering Data-Driven Value at Scale”) has sparked a global movement among companies to reorganize how they manage analytics data. Zhamak recently founded Nextdata (2022) to build platforms supporting data mesh principles. With a background in software engineering and distributed systems, she remains a hands-on innovator.
Zhamak is a frequent keynote speaker and blogger on data architecture and has quickly become one of the most respected thought leaders in big data design, helping enterprises treat data as a product, not just an afterthought.
- LinkedIn: Zhamak Dehghani
- X (Twitter): @zhamakd
- Blog Post: Data Mesh Principles
Abhishek Thakur
Nationality: Indian
Abhishek Thakur is a superstar in the competitive data science world and an influential content creator.
He earned fame by becoming the world’s first Quadruple Grandmaster on Kaggle – achieving top-tier Grandmaster status in all four of Kaggle’s categories: competitions, kernels, discussions, and datasets. This reflects dozens of gold medals and winning solutions in machine learning contests. Abhishek has applied this expertise in industry as well, previously serving as Chief Data Scientist at Boost.ai and currently building AutoML tools at Hugging Face. He is the author of the popular book “Approaching (Almost) Any Machine Learning Problem” (2020), where he shares pragmatic advice from his competition experience.
Abhishek also runs a YouTube channel and blog where he breaks down complex ML topics for a wide audience. His achievements in global competitions and passion for knowledge-sharing have cemented him as a big data and ML influencer in the community.
- LinkedIn: Abhishek Thakur
- X (Twitter): @abhi1thakur
- Kaggle: abhishek
Ali Ghodsi
Nationality: Swedish-Iranian
Ali Ghodsi is the CEO and co-founder of Databricks, a leading big data and AI platform company valued at $62 billion in 2025.
With a PhD in distributed computing, Ali was an early contributor to the Apache Spark project that Databricks emerged from, working alongside its creators at UC Berkeley. He helped design aspects of Spark’s cluster management (he also co-created Apache Mesos in academia). Transitioning from academia to industry, Ali has led Databricks since 2016, overseeing its growth and the development of the Unified Data Analytics Platform (which integrates Spark with Delta Lake, MLflow, etc.). He is known for his product vision and for keeping Databricks closely tied to open-source innovation. Ali is also Executive Chairman of Anyscale (the company behind the Ray project), reflecting his continued passion for cutting-edge distributed systems.
His unique journey from researcher to CEO exemplifies how to turn big data research into impactful enterprise technology.
- LinkedIn: Ali Ghodsi
- X (Twitter): @alighodsi
Wrap Up
These legends represent exceptional talent, making them extremely challenging to headhunt. However, there are thousands of other highly skilled IT professionals available to hire with our help. Contact us, and we will be happy to discuss your hiring needs.
Note: We’ve dedicated significant time and effort to creating and verifying this curated list of top talent. If you intend to share or make use of it in any way, we kindly ask that you include a backlink to the original source – EchoGlobal.