Top 18 Pandas Specialists for Data Analysis

Pandas has become a cornerstone of data analysis in Python, thanks to a global community of contributors, educators, and practitioners.

Below is an updated list of 18 top Pandas specialists around the world. This list spans core open-source contributors, startup founders who still code, influential bloggers/educators, industry experts using Pandas at scale, and competition champions. Each person is actively shaping the Pandas ecosystem in 2025, through code, teaching, or real-world applications.

  1. Matthew Rocklin
  2. Joris Van den Bossche
  3. Patrick Hoefler
  4. Tom Augspurger
  5. William (Will) Ayd
  6. Marc Garcia
  7. Ted Petrou
  8. Kevin Sheppard
  9. Chang She
  10. Matt Harrison
  11. Abhishek Thakur
  12. Gilberto Titericz (Giba)
  13. Jake VanderPlas
  14. Irv Lustig
  15. Kevin Markham
  16. Boris Paskhaver
  17. Devin Petersohn
  18. Phillip Cloud

Now, let’s delve into their remarkable journeys and contributions:

Matthew Rocklin

Nationality: American

Matthew Rocklin is not a Pandas core dev, but as the creator of Dask, he’s intimately tied to Pandas’ ecosystem. Matthew developed Dask to scale Python analytics, allowing Pandas operations on big data by distributing work.

He has a PhD in computer science and previously worked at Anaconda and NVIDIA’s RAPIDS team, optimizing Pandas for GPUs. In 2020, he founded Coiled Computing, a startup to bring Dask to enterprises. Matthew’s contributions include making Pandas release the GIL for better multithreading and coordinating Dask’s DataFrame API to match Pandas. He frequently writes and speaks about scaling Pandas (e.g., “Scaling Pandas with Dask” webinars). Under his leadership, Dask DataFrame has become a popular way to run Pandas-like analyses on large clusters. Matthew also co-authored academic papers on transparent scaling of Pandas workflows.

Joris Van den Bossche

Nationality: Belgian

Joris Van den Bossche is a long-time Pandas core developer and an advocate for the broader PyData ecosystem.

With a PhD in air quality research, Joris transitioned from science to software, contributing heavily to Pandas’ documentation, API design, and community outreach. He has led international tutorials on Pandas (PyData, EuroSciPy) and is also a maintainer of GeoPandas (bringing spatial data support to Pandas). In recent years, Joris has been working at Voltron Data to integrate Pandas with Apache Arrow and improve performance for next-generation data tools. Notably, he introduced the new nullable pd.NA and extension dtypes to unify missing-value handling in Pandas. Joris’s work connects Pandas with other libraries like scikit-learn (e.g., he co-developed the ColumnTransformer in scikit-learn).
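As a rough illustration of the missing-value handling mentioned above, the sketch below contrasts a default NumPy-backed column with the nullable Int64 extension dtype. The sample values are invented for the example, and it is only a minimal demonstration of the behaviour, not a description of how the feature is implemented internally.

```python
import pandas as pd

# Default behaviour: a missing value forces the integers to float64 (stored as NaN).
classic = pd.Series([1, 2, None])

# Nullable extension dtype: values stay integers and the gap is stored as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(classic.dtype)         # float64
print(nullable.dtype)        # Int64
print(nullable[2] is pd.NA)  # the missing entry is the pd.NA singleton
print(nullable.sum())        # missing values are skipped by default
```

The same pd.NA marker is shared by the other nullable dtypes (for example "boolean" and "string"), which is what makes missing-value handling consistent across column types.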

Patrick Hoefler

Nationality: German

Patrick Hoefler is a newer core contributor to Pandas who has quickly made a name for himself by tackling complex issues.

Based in Europe, Patrick started contributing around 2020 and is now a full core team member working on roadmap items like improved support for extension arrays and nullable dtypes. He also helps maintain the Dask integration. Patrick is notable for driving the “Arrow-backed” data types in Pandas 2.x and beyond, which boost performance for strings and categorical data. He actively engages in community outreach: for example, he leads PyLadies Berlin sprints to onboard new Pandas contributors. Patrick holds a master’s in Mathematics and is pursuing another in Software Engineering, bringing both theoretical and practical rigor. He previously worked at Coiled and is now a software engineer at Citadel, where he applies his Pandas expertise in high-performance computing environments.

Tom Augspurger

Nationality: American

Tom Augspurger is a core developer of Pandas and Dask who has significantly improved Pandas’ performance and interoperability.

With a background in economics, Tom started using Pandas during grad school and soon became a contributor. He worked at Anaconda and later Microsoft, where he built scalable data tools on the Azure Planetary Computer team. Tom has been instrumental in developing Pandas extension arrays, optimized groupby operations, and the Pandas 1.x to 2.x transition. He is also a maintainer of Dask, ensuring that Dask’s distributed dataframes align with Pandas’ API. In 2024, Tom joined NVIDIA to work on GPU-accelerated data science. He is known for his “Modern Pandas” blog series, where he taught idiomatic Pandas patterns to users, and for co-maintaining pandas-stubs (type hints for Pandas).

William (Will) Ayd

Nationality: American

Will Ayd has been a Pandas core maintainer since 2018 and is an author of the Pandas Cookbook.

With a decade-plus of experience in data consulting, Will has helped many companies implement Pandas in production. He focuses on making Pandas more accessible and efficient: for example, he has contributed to performance enhancements and documentation. Will’s book Pandas Cookbook (3rd Ed., 2024) covers practical recipes from basic to advanced Pandas, and he frequently blogs about Pandas and Arrow integration (e.g., speeding up Pandas with Apache Arrow). Will has also spoken at PyData conferences about optimization techniques in Pandas. By day, he works as a data architect, applying Pandas in retail analytics.

Marc Garcia

Nationality: Spanish

Marc Garcia is a Pandas core developer and a data engineer who has championed many improvements in Pandas.

Based in Spain, Marc has 15+ years of Python experience and over 8 years in data engineering. He became a core contributor around 2017 and led a global documentation sprint to improve Pandas’ API reference. Marc’s contributions include enhancements to Pandas’ API consistency and the Arrow integration (he gave talks like “Pandas 2.0 and the Arrow Revolution”). He has also contributed to the Ibis project (for pandas-like analytics on SQL engines) and worked on the Airspeed Velocity benchmarking tool to track Pandas performance. Marc has received a NumFOCUS sustainability award for his work on open-source Pandas. He has also organized PyData conferences and taught Pandas tutorials in Europe and Latin America.

Ted Petrou

Nationality: American

Ted Petrou is the author of Pandas Cookbook (2017) and founder of Dunder Data, a company dedicated to teaching Python data science.

With a master’s in statistics, Ted transitioned from being a professional poker player and math teacher to a data scientist and instructor. He wrote Pandas Cookbook and more recently Master Data Analysis with Python, which provide in-depth Pandas training through real-world examples. Ted has created popular libraries like dexplo and dexplot to extend Pandas’ data exploration and visualization capabilities. Based in Houston, he also founded the Houston Data Science Meetup to foster community learning. Ted is active on Medium and YouTube, sharing advanced Pandas techniques (e.g., method chaining, data cleaning with PyJanitor). He is recognized for his practical teaching style that often uncovers lesser-known Pandas features for power users.
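For readers unfamiliar with method chaining, one of the techniques mentioned above, here is a minimal sketch. The sales data and column names are hypothetical and invented purely for illustration; this is not taken from Ted’s own material.

```python
import pandas as pd

# Hypothetical sales records, invented for this illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "units": [10, 3, 7, 12],
        "price": [2.5, 4.0, 2.5, 4.0],
    }
)

# Each step returns a new object, so the whole pipeline reads top to bottom
# without intermediate variables.
summary = (
    sales
    .assign(revenue=lambda df: df["units"] * df["price"])  # derive a column
    .query("units > 5")                                     # keep larger orders
    .groupby("region", as_index=False)["revenue"]
    .sum()                                                  # total per region
    .sort_values("revenue", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

Wrapping the chain in parentheses lets each call sit on its own line, which keeps longer pipelines readable and easy to comment out step by step.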

Kevin Sheppard

Nationality: American

Dr. Kevin Sheppard is a Pandas maintainer who brings a rigorous econometrics background to the project.

An Associate Professor at Oxford University, Kevin is known for his contributions to statistical computing in Python (he created the ARCH and linearmodels libraries for financial econometrics). He joined the Pandas core team to improve time-series and numeric robustness. For example, Kevin helped enhance Pandas’ random number capabilities and statistical tests, and he maintains pandas-stats related add-ons. He also integrated his arch library with Pandas for easier time series analysis. Kevin has produced extensive teaching materials on Pandas for economics and finance students and is an advocate of Python in academic finance (he wrote a free e-book “Introduction to Python for Econometrics” that includes a new chapter on Pandas).

Chang She

Nationality: American

Chang She was the second major contributor to Pandas after Wes McKinney. A former financial quant, Chang joined Wes at AQR and co-founded the startup DataPad in 2013 to commercialize Pandas.

He was crucial in Pandas’ early growth, contributing to API design and implementation (especially in the 0.x series). Chang later managed engineering teams at Cloudera, focusing on big-data integration for Pandas. In recent years, he turned to entrepreneurship again, founding Eto Labs and then LanceDB, working on AI data infrastructure. While his current focus is a vector database, Chang’s legacy in Pandas persists – he laid much of the groundwork in its early code and mentored newer maintainers when the project was still young. He has spoken at Data Council and PyData events, reminiscing about Pandas’ creation and its design philosophy.

Matt Harrison

Nationality: American

Matt Harrison is a Python and data science author and trainer who has written multiple books covering Pandas.

He runs MetaSnake, a corporate training firm for Python and data science. Matt has over 20 years of Python experience and has taught Pandas to many engineering teams. His books include Learning the Pandas Library (2016) and chapters on Pandas in Python for Data Science Handbook (as technical reviewer). Matt is known for breaking down complex topics into digestible “cookbook” recipes and for promoting the use of Pandas in production systems. He is active on social media and often shares “Pandas one-liners” and time-saving tricks with his ~160k Twitter followers. Matt’s training sessions (at companies and conferences) focus on making analysts comfortable with Pandas, from basic data cleaning to time series and joins.

Abhishek Thakur

You don’t need to be the best coder to win machine learning competitions. You need to be the best problem solver.

Nationality: Indian

Abhishek Thakur is a data scientist celebrated as the world’s first Kaggle quadruple Grandmaster (across competitions, notebooks, discussions, datasets).

With such extensive competitive experience, Abhishek has used Pandas in hundreds of machine learning projects and shared numerous Pandas tips in his award-winning Kaggle notebooks. He authored the book Approaching (Almost) Any Machine Learning Problem, which includes practical Pandas workflows for feature engineering and data cleaning. Formerly at Hugging Face, Abhishek is now building his own startup (Arcee AI) focused on automated machine learning. On his YouTube channel and blog, he often demonstrates how to analyze datasets using Pandas as a first step in modeling. Abhishek’s Kaggle insights (like efficient DataFrame operations for big competition data) are highly valued by the community.

Gilberto Titericz (Giba)

Nationality: Brazilian

Gilberto “Giba” Titericz Jr. is a legend in competitive data science, known for being a former #1 Kaggle Grandmaster who held the top rank for over two years.

Hailing from Brazil, Giba has won dozens of Kaggle competitions, and Pandas has been one of his go-to tools for data preprocessing and feature engineering. He later transitioned into industry roles: he worked on data science at Airbnb and is now a data scientist on NVIDIA’s RAPIDS team, which develops GPU-accelerated Pandas-like libraries. At NVIDIA, Gilberto contributes to cuDF (a Pandas equivalent on GPUs) and helps bridge Kaggle-style workflows to GPU dataframes. He frequently shares his expertise in meetups and interviews, explaining how to optimize Pandas code for speed. His journey from electrical engineer to data scientist is an inspiration for self-taught practitioners.

Jake VanderPlas

Nationality: American

Jake VanderPlas is an astronomer-turned-data scientist who authored the Python Data Science Handbook, one of the most widely read introductions to Pandas (and other PyData tools).

As the former director of Open Software at UW’s eScience Institute, Jake focused on developing and promoting tools like Pandas for scientific research. He has contributed to Pandas in the form of documentation and feedback, and indirectly by creating Altair (a visualization library that works smoothly with Pandas DataFrames). Jake’s Handbook (first published in 2016) has a full chapter on data manipulation with Pandas, which has been freely available and instrumental in training new users. He’s also a frequent keynote speaker (PyCon 2017, etc.), where he often emphasizes the importance of accessible APIs – Pandas being a prime example. Currently, Jake works at Google on tools like JAX, but he remains an advocate of the Pandas paradigm for dataframes.

Irv Lustig

Nationality: American

Dr. Irv Lustig is an operations research (OR) veteran who has become a proponent of Pandas for analytics in the OR community. An INFORMS Fellow with decades of experience at IBM and Princeton Consultants, Irv specialized in optimization software.

In recent years, he recognized the power of Pandas for data wrangling in optimization projects (data-driven modeling). He created a popular Pandas cheat sheet for analytics practitioners, tailoring Pandas functionalities to those coming from Excel or R in the OR world. Irv gives webinars (e.g., with Gurobi) on “Data-First Optimization Development with Pandas,” teaching OR professionals how to preprocess and explore data with Pandas before feeding it into solvers. In 2024, he even delivered a talk titled “Pandas for Analytics Practitioners” at an INFORMS conference, bringing Pandas to a new audience. His emphasis is on using Pandas to structure real-world data for decision models efficiently.

Kevin Markham

Nationality: American

Kevin Markham is a data science educator best known as the founder of Data School.

He has taught Pandas to over a million students worldwide through his online courses and YouTube videos. Kevin’s engaging tutorials (like “Pandas Q&A” series on YouTube) and blog posts demystify common Pandas challenges (e.g., the notorious SettingWithCopyWarning). Formerly the lead data science instructor at General Assembly in Washington, D.C., Kevin launched Data School in 2014 to focus on accessible training for newcomers. He covers Pandas extensively in his curriculum, emphasizing real-world examples and clear explanations. Kevin is also a speaker at major conferences (PyCon, PyData) – his PyCon 2019 talk on “Pandas best practices” was highly rated. Through Data School’s newsletter and social media, he shares weekly Pandas tips, spreading best practices globally.
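To make the SettingWithCopyWarning mentioned above concrete, the sketch below shows the usual way to avoid it. The data is hypothetical, and this is a generic illustration rather than an excerpt from Kevin’s tutorials.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo"], "temp": [3, 24, 5]})

# Chained indexing such as df[df["city"] == "Oslo"]["temp"] = 0 assigns into an
# intermediate object, so the original frame may not change. Older pandas
# versions emit SettingWithCopyWarning for this pattern.

# A single .loc call selects rows and column in one step and updates df itself.
df.loc[df["city"] == "Oslo", "temp"] = 0

# When a separate subset is wanted, an explicit copy makes the intent clear.
lima = df[df["city"] == "Lima"].copy()
lima["temp"] = 25

print(df)
print(lima)
```

After running this, the Oslo rows in df are updated while the Lima row is untouched, because the change to lima was made on an independent copy.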

Boris Paskhaver

Nationality: American

Boris Paskhaver is a software engineer and instructor who authored Pandas in Action (Manning, 2021), an accessible introduction to Pandas for developers.

Based in New York, Boris has taught over 300,000 students via his Udemy courses on Pandas and Python. He transitioned from a web developer to a data instructor, recognizing a need for clear, example-driven Pandas materials. Pandas in Action covers the library from fundamentals to advanced use, and Boris often uses analogies to SQL to help learners grasp Pandas concepts. He is active on social media, sharing Pandas tips and promoting a visual understanding of DataFrame operations (he’s known for visual aids in his book). Additionally, Boris creates YouTube content and writes articles about data manipulation best practices. Beyond Pandas, he teaches general programming, but Pandas remains one of his core specialties.

Devin Petersohn

Nationality: American

Devin Petersohn is the creator of Modin, an open-source project that aims to scale Pandas effortlessly by distributing computations across cores or clusters.

Devin developed Modin during his PhD at UC Berkeley RISELab. Modin’s premise is that you can replace import pandas as pd with import modin.pandas as pd and leverage multiple CPUs or nodes – effectively a parallel Pandas. Devin’s research on Modin won accolades for advancing scalable data science. He co-founded Ponder, a startup to commercialize Modin’s technology, where he serves as CTO. Through Ponder and open source, Devin continues to enhance Modin’s compatibility with the latest Pandas API and integrate with backends like Ray or Dask. He frequently speaks at meetups and podcasts about “scaling Pandas,” sharing insights on decoupling Pandas API from single-machine limitations.
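The import swap described above can be sketched as follows. The fallback to plain pandas is an assumption added here so the snippet still runs where Modin is not installed, and the tiny DataFrame is invented for illustration.

```python
# Assumption: Modin is optional; plain pandas is used when it is not installed.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

# The rest of the code is written against the familiar Pandas API either way.
df = pd.DataFrame({"team": ["a", "b", "a"], "score": [1, 2, 3]})
totals = df.groupby("team")["score"].sum()
print(totals)
```

On data this small there is nothing to gain from parallelism; the point is only that the analysis code itself does not change when the import does.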

Phillip Cloud

Nationality: American

Phillip Cloud is a software engineer who has been deeply involved in improving Pandas’ performance and interoperability. Formerly a Pandas core contributor, Phillip worked closely with Wes McKinney on the Ibis project and on integrating Apache Arrow with Pandas.

He contributed to Arrow’s Python API (pyarrow) and helped develop zero-copy data exchanges between Pandas and Arrow, addressing many “10 Things I Hate About pandas” issues. Phillip also pioneered the use of alternative computing backends in Pandas – for example, exploring how to offload string operations to Arrow or how to leverage Numba for accelerating group-bys. He has since taken a leadership role in the Ibis framework (a Pandas-like API for databases) as its current lead maintainer. Phillip’s broad vision is to make Pandas-style analysis composable and backend-agnostic, as outlined in collaborative blogs with Wes McKinney. He’s now at Voltron Data, continuing to unite Pandas with modern analytics engines.

Wrap Up

These legends represent exceptional talent, making them extremely challenging to headhunt. However, there are thousands of other highly skilled IT professionals available to hire with our help. Contact us, and we will be happy to discuss your hiring needs.

Note: We’ve dedicated significant time and effort to creating and verifying this curated list of top talent. However, if you believe a correction or addition is needed, feel free to reach out. We’ll gladly review and update the page.
