Mastering the Google Cloud Certified Professional Data Engineer Exam 2024
Are you considering becoming a Google Cloud Certified Professional Data Engineer?
In this article, we’ll explore the recent updates to the exam and training materials for 2024, an overview of my experience taking the exam, and what it takes to prepare for and pass the exam on the first try!
Introduction
About Me
My path to passing the Google Certified Professional Data Engineer exam is a little different from most, so I’d like to share my perspective on the recent changes to this exam and how you can be successful in preparing for and passing it.
My experience in Google Cloud started in 2014 (2008 if you count Gmail and “Google Apps” aka Workspace). My first Google certification was the Professional Cloud Architect certification in 2021, and in 2022 I was one of the first to obtain the (then beta) Professional Cloud Database Engineer certification.
When the Generative AI wave started in 2023, I began preparing for the Professional Machine Learning Engineer certification, which I passed in May 2024. While my hands-on experience and the SkillsBoost learning path for the ML certification really helped in passing that exam, having more foundational knowledge in Data Engineering would have made it much easier. As a result, I decided to circle back to the Professional Data Engineer learning path and attempt the exam, after already obtaining the Database and Machine Learning certifications.
In hindsight, I would recommend that anyone considering the Database or Machine Learning certification START with the Data Engineer certification, which is the opposite of what I did ;)
Let’s explore how to get started preparing for the Google Cloud Professional Data Engineer certification!
Preparation Steps
Self-Assessment
It’s important to understand where you are starting from. As with any professional-level certification, you should have a number of years of hands-on experience with the tools. The exam guidelines state:
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using Google Cloud.
Independent of the Google-specific content, it’s important to have a fundamental understanding of core Data Engineering concepts. If you are just starting out, check out some of the foundational Data Engineering content on Coursera.
Get Familiar with the Exam Topics
Next, check out the Exam Topics, which are available from the Google Cloud Certification page: https://cloud.google.com/learn/certification/guides/data-engineer
- Section 1: Designing data processing systems (~22% of the exam)
- Section 2: Ingesting and processing the data (~25% of the exam)
- Section 3: Storing the data (~20% of the exam)
- Section 4: Preparing and using data for analysis (~15% of the exam)
- Section 5: Maintaining and automating data workloads (~18% of the exam)
These topics were last updated in late 2023. IMHO, the update recognizes that some of the former content is better covered in the Database and Machine Learning certification paths.
According to a recent Google Cloud Community post:
“The new Professional Data Engineer exam reflects updates to Google Cloud’s data storing, data sharing, and data governance solutions. The new exam has less emphasis on operationalizing machine learning models and more focus on the data engineer’s role in making data usable and valuable. As machine learning and AI continue to evolve, the critical skill of making data usable places data engineers as a critical path for enterprise success.”
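One way to use the approximate section weights listed above is to split your study time in proportion to them. The following is a minimal sketch, not an official method: the 60-hour budget and the function name `allocate_study_hours` are assumptions made up for illustration.

```python
# Approximate section weights from the exam guide, in percent.
SECTION_WEIGHTS = {
    "Designing data processing systems": 22,
    "Ingesting and processing the data": 25,
    "Storing the data": 20,
    "Preparing and using data for analysis": 15,
    "Maintaining and automating data workloads": 18,
}


def allocate_study_hours(total_hours: float, weights: dict[str, int]) -> dict[str, float]:
    """Split a study budget across sections in proportion to their exam weight."""
    total_weight = sum(weights.values())
    return {
        section: round(total_hours * weight / total_weight, 1)
        for section, weight in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical budget: 60 hours in total.
    plan = allocate_study_hours(60, SECTION_WEIGHTS)
    for section, hours in plan.items():
        print(f"{hours:5.1f} h  {section}")
```

Treat the result as a starting point only; if your self-assessment shows a weak area, it probably deserves more time than its weight alone suggests.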
Google Cloud SkillsBoost Training:
Google provides a comprehensive self-paced learning path through Google Cloud SkillsBoost — which includes both video lessons and hands-on labs to familiarize you with the topics and tools covered on the exam. The learning path is maintained by Google and aligned to the exam guide.
Professional Data Engineer Learning Path: https://www.cloudskillsboost.google/paths/16
Supplemental Reading and Practice Tests:
One of the best resources that I have found to study for the Google Cloud Professional Certification exams is the Official Study Guide published by Sybex. While the Professional Data Engineer Study Guide (1st edition) was published in 2020 (before the latest update to the exam), it is still very relevant and covers many of the key topics on the exam. Additionally, the study guide includes a number of practice questions.
Hands-on Experience:
Finally, applying what you have learned from the SkillsBoost learning path and Official Study Guide with hands-on access to the tools in Google Cloud gives you the practical experience required to be successful on the exam (and beyond).
To get started for free, check out https://cloud.google.com/blog/products/gcp/getting-started-with-google-cloud-for-free
Overview of Exam Topics
Data Pipelines & ETL
Expect a number of questions around various Google Cloud tools for building and maintaining data pipelines — including ETL. Questions may cover the following tools:
- Dataflow
- Dataproc
- Dataprep
Pub/Sub & Kafka (and connectors)
Familiarize yourself with how Pub/Sub and Kafka can be leveraged for streaming data and what connectors exist for the various Google Cloud databases.
BigQuery
BigQuery is probably the most important tool in a Data Engineer’s toolbox. Be sure to understand how BigQuery works, how to connect to BigQuery, how to secure your data, and ways to ingest data into BigQuery.
BigLake / BigQuery Omni
Similarly, know how BigLake and BigQuery Omni work and how they allow you to incorporate external data sources into BigQuery.
Dataplex
A recent addition to the learning path is a section for Dataplex. Know what Dataplex does and how it contributes to Data Governance and Data Lineage.
IAM Roles
Be sure to review IAM roles related to Cloud Storage, Databases, and Data Pipeline tools — as well as BigQuery.
SQL Proxy
While there was not as much of a focus on Cloud SQL, know how and where to leverage SQL Proxy (the Cloud SQL Auth Proxy) to provide secure access to Cloud SQL.
Cloud Composer / Airflow DAGs
Get familiar with Cloud Composer and Airflow as they relate to pipeline orchestration, as well as the associated DAGs (directed acyclic graphs).
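If the idea of a DAG is new to you, the sketch below may help. It uses only Python's standard-library `graphlib` module and is not the Airflow API; the task names are hypothetical. It shows the core idea an orchestrator relies on: each task declares what it depends on, and a valid run order is derived from those dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_to_warehouse": {"transform_join"},
    "refresh_reports": {"load_to_warehouse"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

The "acyclic" part matters: if two tasks depended on each other, no valid order would exist, which is why the sorter raises an error for a cycle.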
Hadoop
Relatedly, know what Apache Hadoop does and how it can be used in Google Cloud.
Cloud Storage
Perhaps the most versatile data storage option on Google Cloud, Cloud Storage may be mentioned in a number of different ways:
- Retention Policy
- IAM roles
- Storage Classes
- Region vs multi-region
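For the storage classes in the list above, a rough rule of thumb is to match the class to how often the data is read. The sketch below is an illustration only: the function name is hypothetical, it uses the documented minimum storage durations (30, 90, and 365 days) as thresholds, and it ignores pricing and retrieval fees, so check the current Cloud Storage documentation before relying on it.

```python
# Minimum storage durations in days per class, ordered from warmest to coldest.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def suggest_storage_class(days_between_accesses: int) -> str:
    """Pick the coldest class whose minimum storage duration fits the access gap.

    Simplified rule of thumb: real decisions should also weigh retrieval
    costs, early-deletion charges, and latency requirements.
    """
    suggestion = "STANDARD"
    for storage_class, min_days in MIN_STORAGE_DAYS.items():
        if days_between_accesses >= min_days:
            suggestion = storage_class
    return suggestion


if __name__ == "__main__":
    for gap in (7, 45, 120, 400):
        print(f"accessed every ~{gap} days -> {suggest_storage_class(gap)}")
```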
Missing Exam Topics
Some of the topics that were covered in the Official Study Guide (1st edition) but were not well represented on the exam were as follows:
Database
While Google-managed databases such as BigQuery, Cloud SQL, and Firestore were included, there was not the focus that I expected on things like Spanner and Bigtable. Even Cloud SQL was limited to SQL Proxy related questions. The de-emphasis of database questions is likely a reaction to the Database Engineer certification, where these types of questions are more relevant.
Machine Learning
Similar to the lack of focus on Database questions, Machine Learning was limited to questions around preparing data for Machine Learning as opposed to MLOps or Model Training. Again, this change is likely due to a better focus on ML in the Machine Learning certification.
Additional Tips and Recommendations
Once I got started, I found that building a habit of one hour per day to study, work through SkillsBoost training and labs, and practice test questions was the best way to prepare and stay on track.
Preparing for the Exam
I cannot stress enough the importance of a good night’s sleep and a positive mindset before the exam.
After the Exam
Once you have obtained your certification, consider actively participating in online forums and communities to connect with other aspiring data engineers to share your story!
Conclusion
In this article, I shared my experience preparing for and completing the Google Cloud Professional Data Engineer certification path — including a reflection on what to expect on the exam and some resources to help you prepare.
For those of you who have successfully obtained this certification, congratulations! And for those of you actively preparing, I hope that this information was helpful, and I wish you the best of luck in completing the exam!
Finally, please share in the comments any tips or resources that have helped you on your certification journey!