20 Databricks Consultants You Can Trust


Databricks’ rise has been propelled by a global community of founders, engineers, and advocates who drive innovation in data engineering, machine learning, and lakehouse architecture.

Below is a curated list of the most outstanding Databricks experts worldwide, selected for their open-source contributions, entrepreneurial leadership, technical influence, key industry roles, and accolades in data competitions. Each has a proven track record in the Apache Spark and Databricks ecosystem, whether by writing seminal code, scaling startups, educating the community, or winning recognition for technical excellence.

  1. Luan Moreno
  2. Michael Armbrust
  3. Bill Chambers
  4. Jacek Laskowski
  5. Bartosz Konieczny
  6. Denny Lee
  7. Simon Whiteley
  8. Holden Karau
  9. Adi Polak
  10. Jules S. Damji
  11. Kent Yao
  12. Tathagata Das
  13. Scott Haines
  14. Derar Alhussein
  15. Nick Pentreath
  16. Rajaniesh Kaushikk
  17. Ben Wilson
  18. Terry McCann
  19. Prashanth Babu Velanati
  20. Hyukjin Kwon

Now, let’s delve deeper into their achievements and contributions.

Luan Moreno


Nationality: Brazilian

Luan is a data platform architect and educator who has become a leading voice in Brazil’s big data community. He is the CEO & Founder of One Way Solutions, a consultancy focused on cloud data engineering, and has been honored as both a Microsoft MVP and a Databricks MVP for his community work.

Luan has a talent for building large-scale solutions on Azure Databricks – for instance, he led a project to implement a nationwide IoT data ingestion system using Spark Streaming and Delta Lake on Azure for a utilities company. He frequently shares his knowledge through Portuguese-language content: he launched the “Engenharia de Dados Academy” YouTube series to teach Spark and Databricks fundamentals to a Brazilian audience. Additionally, Luan has written blog articles and delivered talks on topics like Data Lakehouse best practices and implementing MLOps with Databricks. He often collaborates with Microsoft and Databricks as a subject matter expert, co-presenting in webinars about migrating from legacy ETL to Spark-based pipelines.

Michael Armbrust

Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms that incorporates the lessons learned at Databricks from countless customer use cases.

Nationality: American

Michael is a Principal Engineer at Databricks and the mastermind behind Apache Spark SQL and Structured Streaming. He led the development of the Spark SQL engine and its Catalyst optimizer, making Spark a powerhouse for DataFrames and interactive querying.

Michael was also a co-creator of Delta Lake, bringing ACID transactions to the data lakehouse; he served as the initial architect for Delta’s core design. His contributions have enabled enterprises to build reliable data pipelines and streaming applications on Databricks. Michael frequently speaks at Data + AI Summits and industry conferences, sharing deep technical insights on streaming, query optimization, and lakehouse best practices.

With an academic background from UC Berkeley, he continues to innovate at Databricks, most recently working on Project Lightspeed to enhance Spark streaming performance. Michael’s work has directly influenced the big data community, and as a mentor he’s known to engage with open-source contributors on issues and pull requests.

Bill Chambers

Nationality: American

Bill is a distinguished data engineer and author who helped democratize Spark for developers. As an early Solutions Architect at Databricks (and later Product Manager), Bill co-authored Spark: The Definitive Guide with Matei Zaharia – one of the most widely read books on Apache Spark.

In Databricks’ formative years, he was the product manager for the first version of Delta Lake, guiding its development to add reliable data management to Spark. Bill’s passion for teaching showed early: he created SparkTutorials.net to educate others, and even developed a Python data science course for Berkeley’s MIDS program. After Databricks, he joined Anyscale as a founding PM, applying his expertise to the Ray distributed AI platform. Now an entrepreneur, Bill continues to advise startups at the intersection of data and AI. He is revered for his ability to bridge hands-on coding with clear explanations, and his contributions have empowered thousands of engineers to succeed with Spark and Databricks.

Jacek Laskowski


Nationality: Polish

Jacek is a freelance data engineering consultant and one of the foremost experts on Apache Spark’s internals. Based in Poland, he has spent years drilling into Spark, Delta Lake, and Databricks, authoring comprehensive online books such as The Internals of Apache Spark and Mastering Spark SQL.

Jacek’s writings – freely available as GitBooks and blog posts – are revered by the community for clarifying the inner workings of Spark’s engines and APIs. He is an avid instructor, running workshops and trainings to help developers effectively use Spark, Kafka, and the Databricks platform. As a Databricks MVP, Jacek also actively assists others on Stack Overflow and the Databricks forums, and his detailed answers have earned him a top reputation among data engineers.

Bartosz Konieczny

Nationality: Polish

Bartosz is a freelance data engineer and influential blogger behind WaitingForCode.com, a rich repository of knowledge on Spark, Delta Lake, and data engineering design patterns. A Polish-born consultant now in France, he has 15+ years of hands-on experience and has become an AWS Data Hero and Databricks MVP through his community contributions.

Bartosz has written hundreds of in-depth articles demystifying everything from Spark’s Catalyst optimizer to Delta Lake’s transaction log, helping engineers worldwide troubleshoot and optimize their pipelines. In 2023, he authored Data Engineering Design Patterns (O’Reilly) – a book capturing best practices for building scalable data systems. Bartosz is also active on GitHub, where he shares demo projects and code samples accompanying his blog posts.

Denny Lee

Nationality: American

Denny is a Senior Developer Advocate at Databricks and an Apache Spark veteran who co-founded the Delta Lake open-source project. With decades of data engineering experience, Denny helped bring reliable data lake capabilities to Spark, spearheading Delta Lake’s community adoption.

He is a Spark and MLflow contributor and currently a maintainer of Delta Lake, where his contributions span code, documentation, and community evangelism. Denny’s influence is felt through his prolific writing and speaking: he co-authored Delta Lake: The Definitive Guide (O’Reilly, 2023) and regularly publishes articles on performance tuning, streaming, and lakehouse best practices. A self-professed “data junkie,” he built some of the first petabyte-scale pipelines in the cloud and often shares war stories to help others avoid common pitfalls.

Denny also nurtures the next generation of data talent by answering questions on forums and featuring community contributors in Delta Lake spotlight blogs. His enthusiasm for technology and community has made him one of the most visible and approachable faces of Databricks.

Simon Whiteley


Nationality: British

Simon is a prominent Databricks Lakehouse evangelist in the UK and co-founder of Advancing Analytics, a data consultancy specializing in Azure and Databricks solutions. As CTO of Advancing Analytics, he has led cutting-edge projects implementing Delta Lake and MLflow for enterprises, while also nurturing a knowledge-sharing culture through his popular YouTube channel.

Simon’s Advancing Spark YouTube series and live streams have gained a global following for their practical demos of new Databricks features and honest discussions of data architecture best practices. A Microsoft Data Platform MVP and Databricks Beacon, he’s as comfortable discussing Power BI integrations as he is performance-tuning a Spark job.

Simon frequently presents at conferences like SQLBits and the Data + AI Summit, often delving into the nitty-gritty of the medallion architecture, Delta Live Tables, and real-world use cases of the lakehouse. His enthusiasm makes complex tech approachable for all audiences. Through consulting and content creation, Simon has helped countless organizations and engineers unlock the full potential of Databricks.

Holden Karau

It is possible to construct a Spark query that fails on gigabytes of data but, when refactored and adjusted… succeeds on the same system with terabytes of data.

Nationality: Canadian

Holden is an open-source powerhouse and Apache Spark committer known for her influential contributions and advocacy in the big data world. A self-described “transgender Canadian geek,” she has co-authored several major books on Spark, including Learning Spark and High Performance Spark, distilling complex distributed systems concepts for developers.

Holden’s journey spans engineering roles at IBM, Google, and Netflix, where she improved platforms for large-scale data processing. In the Spark project, she has contributed code across the stack and is a PMC member, often focusing on PySpark, testing, and making Spark more accessible. She’s also active in related projects like Apache Beam, Kubeflow, and Airflow, bridging various data tools. Holden is a sought-after speaker globally, and her workshops and talks are both entertaining and educational.

Currently, as a co-founder of a startup applying AI to healthcare paperwork, she exemplifies “coding while founding.” Holden is also a champion for diversity in tech, mentoring others and fostering an inclusive community. Her mix of deep technical skill and engaging personality has made her one of the most beloved figures in the Spark and Databricks ecosystem.

Adi Polak


Nationality: Israeli

Adi is a software engineer turned tech influencer who bridges the worlds of big data analytics and machine learning. Currently a Director at a real-time data streaming company (Confluent) and formerly a Cloud Developer Advocate at Microsoft, Adi has been a Databricks ambassador and MVP, sharing her expertise in Spark and MLOps across communities.

She authored the O’Reilly book Scaling Machine Learning with Spark, where she distills patterns for building ML pipelines on distributed data platforms. Adi is known for her energetic keynotes at the Data + AI Summit and other conferences, where she often covers how to operationalize AI on lakehouse architectures. In addition to her day job, she contributes to open source and academic efforts – for example, she has written about TensorFlow on Databricks and contributed samples for deep learning on Spark. With an MSc in computer science, Adi has a strong technical foundation which she leverages to advise startups and mentor women in tech.

Jules S. Damji

Nationality: American

Jules is a Lead Developer Advocate at Databricks and co-author of the definitive guidebook Learning Spark, 2nd Edition (O’Reilly). With over 20 years in the software industry, he has become a trusted mentor to data engineers and scientists adopting Apache Spark.

Jules focuses on “making big data simple” – whether by writing lucid blog posts on new Databricks features, creating hands-on tutorials, or speaking at meetups and summits. He has contributed to various open-source projects in the Databricks ecosystem, including examples and tooling for MLflow and Delta Lake. As an educator at heart, Jules frequently appears in Databricks webinar series interviewing fellow experts and spotlighting community use cases. His background spans roles at companies like Sun Microsystems and Microsoft, which gives him a broad perspective on technology evolution. Jules is also an adjunct instructor, having taught Spark courses to hundreds of students and professionals.

Kent Yao

Nationality: Chinese

Kent Yao is one of the most prolific Apache Spark contributors in Asia and a leader in the open-source community. Based in China, he is an Apache Software Foundation member, Apache Spark PMC member, and the PMC Chair of Apache Kyuubi – an OSS project enabling multi-tenant Spark SQL servers.

Kent has contributed hundreds of patches to Spark across SQL, core, and Python APIs, making him one of the most active individual Spark contributors by commit count. He is passionate about improving Spark’s usability and performance; for example, he helped develop Spark’s Kubernetes support and optimized PySpark’s pandas API (Koalas) for broader adoption. Kent is also known for fostering collaboration between the Chinese big data community and global projects – he often blogs in both Chinese and English and has organized local meetups and hackathons.

Tathagata Das


Nationality: Indian

Tathagata (TD) is the original creator of Apache Spark Streaming and a key architect of Structured Streaming at Databricks. As a Staff Software Engineer, he has been a core developer of Apache Spark since the project’s early days, focusing on real-time data processing and streaming APIs.

Tathagata designed Spark’s first streaming model (D-Streams) during his PhD at UC Berkeley, and later co-authored the research that evolved into Structured Streaming, enabling continuous applications with exactly-once guarantees. At Databricks, he also contributed to Delta Lake’s integration with streaming, making it possible to build reliable end-to-end pipelines on one platform. Tathagata is a member of the Spark PMC and even co-authored Learning Spark, 2nd Edition, reflecting his commitment to educating users. His innovations have earned him awards like the Facebook Fellowship during grad school, and he continues to push the envelope on stream processing at Databricks.

Scott Haines

Nationality: American

Scott is a seasoned data engineer and author who specializes in streaming analytics at scale. He is currently a Distinguished Software Engineer at Nike, where he leads the design of real-time data pipelines and personalization platforms. Scott is best known in the community as the author of Modern Data Engineering with Apache Spark (Apress, 2022), a hands-on guide to building mission-critical streaming applications.

In it, he shares lessons learned from his time at companies like Yahoo and Twilio, where he built large-scale analytics systems processing billions of events. Scott’s expertise spans Spark Streaming, Kafka, and NoSQL stores, often combining them in Lambda/Kappa architectures for low-latency analytics. He actively contributes to open source – for example, he’s provided feedback and minor patches to Spark’s streaming subsystem – and writes articles on his Medium blog about telemetry and data pipeline design. As a Databricks Champion, Scott enjoys helping others debug performance issues or tune their cluster configurations.

Derar Alhussein

Nationality: French

Derar is a data engineering educator and cloud consultant based in France, known for his impactful training materials on Apache Spark and Databricks. He wears many hats – Udemy instructor, O’Reilly author, and Databricks MVP – reflecting his dedication to teaching others.

Derar created one of the highest-rated online courses for Spark on Udemy, guiding thousands of students through hands-on exercises in data processing and analysis. He also authored an O’Reilly video series on developing end-to-end data pipelines on the lakehouse. With a background in software engineering, Derar has consulted for companies in EMEA, helping them migrate legacy ETL to modern Spark-based workflows on Azure Databricks. He’s recognized for breaking down complex topics like performance tuning, job scheduling, and Delta Lake optimization into digestible lessons.

Active in the Databricks Community, Derar often answers questions and writes blog posts – for instance, he has written about optimizing Spark join strategies and debugging out-of-memory errors. He was honored as a Databricks MVP in 2023 for these community contributions and was featured in Databricks’ “Meet the MVP” series. Approachable and passionate, Derar continues to lower the barrier to entry for big data technologies through education and mentorship.

Nick Pentreath


Nationality: South African

Nick is a leading expert at the intersection of machine learning and big data, renowned for his early work on Spark’s MLlib and his contributions to ML systems in industry. He co-founded Graphflow, a machine learning startup focused on recommendation systems, which was acquired after he built its core product.

Nick later became a Principal Engineer at IBM’s CODAIT group, where he worked on open-source AI projects and continued to contribute as a Spark PMC member and committer, particularly to Spark’s machine learning library. He authored Machine Learning with Spark (Packt, 2015), one of the first books to illustrate how to implement scalable ML pipelines on Apache Spark.

Nick has also competed in data science contests and was a top-ranked Kaggler in the early 2010s, bringing a competitive edge to his practical solutions. In recent years, he’s been active in the TensorFlow and Ray communities, bridging Spark with deep learning and reinforcement learning workflows.

Rajaniesh Kaushikk

Nationality: Indian

Rajaniesh is an enterprise architect and prolific content creator who champions Azure Databricks adoption in enterprise settings. With over 22 years of IT experience, he holds dual titles as a Microsoft MVP and Databricks MVP, reflecting his expertise in both Azure cloud and the Databricks platform.

Rajaniesh leads cloud solution architecture at a global firm, where he designs end-to-end data platforms combining Azure services, Power BI, and Databricks to drive business transformation. On the side, he runs a technical blog “Beyond the Horizon” sharing step-by-step guides on topics like Databricks Delta Live Tables, MLflow model serving, and integrating Databricks with Azure Synapse. He also produces YouTube videos and demo code on GitHub, offering tips and tricks for automation, debugging, and best practices. Rajaniesh’s community involvement is extensive: he’s a frequent speaker at Azure and data summits in India and the Middle East, and he mentors young architects through local meetup groups.

Ben Wilson

Nationality: American

Ben is a Principal Architect at Databricks and a machine learning engineering guru who has helped dozens of companies take ML projects from prototype to production. He is the creator of the Databricks Labs AutoML Toolkit, an open-source project that automates model development and tracking using MLflow.

Ben’s deep expertise in MLOps culminated in his book Machine Learning Engineering in Action (Manning, 2022), where he provides a framework for building reliable, scalable AI systems. Before Databricks, Ben worked as a data scientist and solutions architect in industries ranging from semiconductors to e-commerce, often leading teams to build data platforms from scratch. He is also a committer to MLflow and has contributed features to improve its integration with Databricks. As a frequent speaker, Ben covers topics like project planning for AI, monitoring drift in production, and enabling CI/CD for data pipelines.

Terry McCann


Nationality: British

Terry is a data engineering leader and entrepreneur who has significantly influenced the UK’s data community. He is the CEO and Principal Consultant of Advancing Analytics, a consultancy he founded to help businesses maximize value from data using Azure and Databricks tools.

Terry is also a Microsoft MVP in Data Platform, recognized for his expertise in Azure Synapse, Azure Databricks, and end-to-end analytics solutions. With a background that includes a Master’s in Data Science (Distinction), he combines academic rigor with real-world pragmatism. Terry has implemented modern lakehouse architectures for clients across finance, retail, and gaming, often featuring Databricks as the processing engine and Power BI for reporting. A passionate educator, he runs the “Data Science in Production” user group and has organized events like SQL Saturday in Exeter. Terry is a fixture at international conferences – his sessions at SQLBits and PASS Summit cover topics like “Engineering vs. Data Science in Databricks” and applying DevOps to machine learning.

Prashanth Babu Velanati

Nationality: Indian

Prashanth is a Lead Solutions Architect at Databricks in Europe, revered for his ability to design and evangelize advanced lakehouse solutions. He has over a decade of experience in big data and has become a go-to expert for Databricks’ field engagements, guiding Fortune 500 customers on their data strategies.

Prashanth is also a Databricks Certified Developer and recently co-authored Delta Lake: The Definitive Guide (O’Reilly, 2023) with Denny Lee and others. Within Databricks, Prashanth contributes to best practice guides and reference architectures. He actively shares knowledge on Databricks forums and Stack Overflow, answering complex questions around cluster sizing, multi-cloud deployments, and security setup. Prashanth’s background includes working at Hadoop-based startups, which gives him a unique comparative insight he uses to help clients modernize to Spark on the cloud.

Hyukjin Kwon

Nationality: South Korean

Hyukjin is a Staff Software Engineer at Databricks and the #1 contributor to Apache Spark open source, with a focus on PySpark and pandas API on Spark. Hailing from South Korea, Hyukjin has taken on a tech lead role for PySpark, ensuring that Python users get first-class support and performance when working with Spark.

He was a major force behind Koalas (pandas API for Spark), which later became PySpark’s pandas API, helping Python data scientists scale their code seamlessly. Beyond Spark, Hyukjin contributes to multiple projects and is known for thorough code reviews that uphold Spark’s quality. He engages actively with the community through JIRA, mailing lists, and talks – at PyData conferences he has shared strategies for scaling pandas workloads with PySpark. Hyukjin’s dedication earned him an Apache Software Foundation Membership, and he is often acknowledged by colleagues as a linchpin of Spark’s development.

Wrap Up

These legends represent exceptional talent, making them extremely challenging to headhunt. However, there are thousands of other highly skilled IT professionals available to hire with our help. Contact us, and we will be happy to discuss your hiring needs.

Note: We’ve dedicated significant time and effort to creating and verifying this curated list of top talent. However, if you believe a correction or addition is needed, feel free to reach out. We’ll gladly review and update the page.
