Disclaimer: These are original, AI-generated practice questions created by ProctorPulse for exam preparation purposes. They are not sourced from any official exam and are not affiliated with or endorsed by Google Cloud. Use them as a study aid alongside official preparation materials.
Question 1: (Select all that apply) A deployed machine learning model is experiencing concept drift, which is degrading its performance. Which strategies should be implemented to manage this issue effectively?
- A. Set up regular intervals for retraining the model based on accumulated new data. (Correct Answer)
- B. Increase the complexity of the model to reduce the impact of concept drift.
- C. Establish an alert system to detect changes in data distribution and trigger retraining accordingly. (Correct Answer)
- D. Rely on the original training data and adjust the prediction threshold as needed.
Explanation: To manage concept drift effectively in a production environment, it is important to implement strategies that ensure the model adapts to new patterns in data. Regularly retraining the model on new data (Option A) helps it learn from the latest trends. Additionally, setting up alerts to detect shifts in data distribution and trigger retraining (Option C) ensures timely updates to the model, maintaining its accuracy and relevance. Increasing model complexity (Option B) does not directly address concept drift, and relying solely on the original data (Option D) may fail to capture new data patterns.
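As a concrete illustration of combining options A and C, the alert-plus-retrain pattern can be sketched in Python. This is a minimal, hypothetical sketch: the Population Stability Index (PSI) is just one usable drift signal, and the 0.2 threshold is a common rule of thumb rather than a value from any official material.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a new sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) / division by zero.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

def should_retrain(baseline, incoming, threshold=0.2):
    """Alert-style trigger: retrain when drift exceeds the chosen threshold."""
    return psi(baseline, incoming) > threshold

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)      # feature values at training time
drifted = rng.normal(1.5, 1, 5000)     # same feature after a distribution shift

print(should_retrain(baseline, baseline[:2500]))  # stable window
print(should_retrain(baseline, drifted))          # shifted window
```

In practice this check would run on each new batch of production features, alongside the scheduled retraining from option A.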
Question 2: (Select all that apply) A company has developed an ML solution that processes user data. Due to new data privacy regulations, the company must ensure compliance. What actions should the company take to maintain compliance with these regulations?
- A. Conduct a privacy impact assessment to understand potential risks. (Correct Answer)
- B. Implement a data retention policy aligning with the new regulations. (Correct Answer)
- C. Encrypt all data at rest and in transit to enhance security. (Correct Answer)
- D. Disable all user data logging to eliminate privacy concerns.
Explanation: To ensure compliance with new data privacy regulations, a company should conduct a privacy impact assessment to evaluate and mitigate potential risks (A), implement a data retention policy that aligns with legal requirements (B), and ensure data security by encrypting data at rest and in transit (C). Simply disabling data logging (D) is not a practical or comprehensive compliance strategy as it may impact the functionality and auditability of the solution.
Question 3: An ML operations team experienced a production outage when their model serving infrastructure failed. While they successfully restored the trained model artifacts from backup storage, the system remained non-functional because critical dependencies were missing. To prevent similar incidents, what components should be included in a comprehensive ML system backup strategy?
- A. Model weights files, serialized model objects, and the hyperparameter configuration file used during the final training run
- B. Model artifacts, preprocessing transformation logic, feature engineering code, dependency specifications, and serving configuration files (Correct Answer)
- C. Trained model checkpoints, training dataset snapshots, and the original raw data sources used for initial model development
- D. Model binary files, the training script that generated the model, and archived logs from the most recent training job execution
Explanation: Maintaining ML solutions requires backing up all components necessary for complete system restoration, not just model files. A production ML system depends on: (1) model artifacts containing learned parameters, (2) preprocessing pipelines that transform incoming data to match training expectations, (3) feature engineering code that derives computed features, (4) dependency specifications ensuring correct library versions, and (5) serving configurations defining how the model processes requests. Option A focuses only on model-related files without preprocessing or serving components. Option C emphasizes training assets rather than deployment dependencies. Option D includes training artifacts but misses the preprocessing and feature engineering logic essential for inference. Complete backup procedures must capture the entire inference pipeline to enable rapid recovery from infrastructure failures.
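The five component categories from the correct answer can be enforced with a simple manifest completeness check. The category names and storage paths below are illustrative assumptions, not prescribed identifiers:

```python
# Hypothetical component categories drawn from option B; the names
# are illustrative, not an official backup schema.
REQUIRED_COMPONENTS = {
    "model_artifacts",       # learned weights / serialized model objects
    "preprocessing_logic",   # transforms applied to incoming data
    "feature_engineering",   # code deriving computed features
    "dependency_spec",       # pinned library versions
    "serving_config",        # how the model processes requests
}

def missing_components(manifest: dict) -> set:
    """Return the required backup categories absent from a manifest."""
    return REQUIRED_COMPONENTS - set(manifest)

# An incomplete backup like the one in the scenario:
backup = {
    "model_artifacts": "gs://backups/model-v3/weights.pkl",
    "dependency_spec": "gs://backups/model-v3/requirements.txt",
}
print(sorted(missing_components(backup)))
```

A check like this, run as part of the backup job, would have flagged the missing preprocessing and serving components before the outage.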
Question 4: An ML pipeline serving production predictions has accumulated technical debt: hardcoded date transformations assume UTC timezone, preprocessing functions are duplicated across three notebook files, and the pipeline uses a deprecated but functional API endpoint scheduled for removal in 6 months. Resource constraints allow addressing only one issue this quarter. How should you prioritize remediation efforts?
- A. Refactor the duplicated preprocessing functions into a shared module to reduce maintenance overhead and ensure consistency across all pipeline components
- B. Replace the deprecated API endpoint immediately since it has a known sunset date that creates a predictable operational risk (Correct Answer)
- C. Document the hardcoded timezone assumptions and create monitoring alerts for timezone-related prediction anomalies before addressing other technical debt
- D. Establish a comprehensive technical debt inventory with risk scoring across all three issues before committing resources to any single remediation effort
Explanation: When prioritizing technical debt in ML systems, the deprecated API endpoint with a 6-month sunset date presents the highest immediate risk. This issue has a known failure timeline and could cause complete service disruption if not addressed before removal. The duplicated preprocessing logic (option A) creates maintenance burden but doesn't threaten system availability. The timezone assumptions (option C) may cause data quality issues but the system currently functions, making monitoring a reasonable interim step rather than immediate priority. Creating an inventory (option D) is valuable for long-term planning but delays addressing the time-bound critical risk. In maintaining ML solutions, prioritize technical debt that poses concrete operational risks with defined timelines over debt that primarily affects development efficiency or data quality without imminent failure scenarios.
Question 5: How should you update the external library dependencies of a deployed machine learning model while minimizing disruption to the service?
- A. Create a Docker container with the updated dependencies and perform a canary release. (Correct Answer)
- B. Update the libraries directly on the production server during low-traffic hours.
- C. Use a virtual environment to test the updates and then apply them to the production system.
- D. Implement a serverless architecture to handle updates automatically.
Explanation: Using a Docker container to encapsulate your application along with its dependencies allows you to manage updates in a controlled environment. A canary release strategy involves deploying the update to a small subset of users first, reducing the risk of widespread disruption if any issues arise. This approach aligns with maintaining ML solutions by ensuring stability and performance through careful dependency management.
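The canary idea can be sketched as deterministic, hash-based traffic splitting. This is a simplified illustration under assumed names; in real deployments the split is usually handled by the serving platform (a load balancer or service mesh) rather than application code:

```python
import hashlib

def canary_route(user_id: str, canary_percent: int = 5) -> str:
    """Deterministically send a small, stable slice of users to the canary."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# The same user always lands in the same bucket, so a faulty update
# affects only a bounded, repeatable subset of traffic while it is
# monitored, and rollback means routing 0% to the canary.
routes = [canary_route(f"user-{i}") for i in range(1000)]
print(routes.count("canary"), "of", len(routes), "requests hit the canary")
```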
Question 6: A company needs to ensure rapid recovery of its machine learning model and associated data following a data center outage. Which strategy would best facilitate this recovery process?
- A. Store model artifacts and data in a geographically distributed cloud storage with versioning enabled. (Correct Answer)
- B. Utilize a single on-premises server with RAID configuration for storing model artifacts and data.
- C. Implement a nightly backup of model artifacts and data to an external hard drive stored onsite.
- D. Schedule weekly data export to a regional data center with no automated recovery procedures.
Explanation: To minimize downtime and ensure rapid recovery, it's crucial to store model artifacts and data in a geographically distributed cloud storage with versioning. This approach provides redundancy and ensures that the data is accessible even if one location is compromised. Additionally, versioning helps in recovering the specific version of the model and data needed. Other options, like using a single on-premises server or external hard drive, are less reliable due to single points of failure and slower recovery times.
Question 7: A production recommendation system deployed six months ago shows declining user engagement metrics, with click-through rates dropping from 8.2% to 5.7%. Analysis reveals that user interaction patterns have shifted toward mobile app usage and shorter browsing sessions, while the model was trained on desktop-heavy traffic data. What retraining trigger approach would most reliably identify this type of gradual performance degradation?
- A. Implement a statistical process control chart tracking prediction confidence scores with control limits set at ±2 standard deviations from the training baseline, triggering retraining when consecutive batches exceed thresholds
- B. Configure a sliding window comparison that monitors the distribution distance between incoming feature data and training data using KL divergence, triggering retraining when divergence exceeds a calibrated threshold (Correct Answer)
- C. Schedule calendar-based retraining every 90 days regardless of performance metrics, ensuring the model periodically incorporates recent data patterns into the training corpus
- D. Deploy a real-time alerting system that triggers immediate retraining whenever daily prediction latency increases by more than 15% compared to the previous week's average
Explanation: This scenario describes data drift—a gradual shift in the statistical properties of input features (desktop to mobile, longer to shorter sessions) that degrades model performance over time. Option B correctly identifies distribution monitoring using divergence metrics (such as KL divergence, PSI, or Wasserstein distance) as the most effective approach for detecting gradual drift. By continuously comparing incoming data distributions against training distributions in a sliding window, this method directly measures the core issue: changing feature characteristics. Option A monitors prediction confidence, which may remain artificially stable even as the model makes incorrect predictions with high confidence on drifted data. Option C uses time-based triggers that lack responsiveness to actual drift patterns—the model might retrain too early or too late relative to actual performance degradation. Option D focuses on latency, an infrastructure concern unrelated to prediction quality or data drift. For maintaining ML solutions experiencing gradual drift, distribution-based monitoring provides the most direct signal that the production data no longer resembles training conditions, enabling timely and justified retraining decisions.
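The sliding-window comparison in option B can be sketched with a histogram-based KL divergence. The 0.1 threshold stands in for a calibrated value, and the session-length distributions are invented for the example:

```python
import numpy as np

def kl_divergence(p_sample, q_sample, bins=20, eps=1e-9):
    """KL(P || Q) between histogram estimates of two samples."""
    edges = np.histogram_bin_edges(np.concatenate([p_sample, q_sample]), bins=bins)
    p = np.histogram(p_sample, bins=edges)[0].astype(float)
    q = np.histogram(q_sample, bins=edges)[0].astype(float)
    # Smooth and normalize so empty bins don't produce log(0).
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def drift_detected(training, window, threshold=0.1):
    """Compare a sliding window of incoming features against the training baseline."""
    return kl_divergence(window, training) > threshold

rng = np.random.default_rng(42)
training = rng.normal(30, 5, 10_000)      # e.g. desktop-era session lengths (minutes)
window_ok = rng.normal(30, 5, 1_000)      # window resembling training data
window_drift = rng.normal(18, 4, 1_000)   # shorter mobile-era sessions

print(drift_detected(training, window_ok), drift_detected(training, window_drift))
```

Run over each new window of production features, this directly flags the desktop-to-mobile shift in the scenario, independent of prediction confidence or latency.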
Question 8: (Select all that apply) A financial services company deploys ML models that process sensitive customer data in production. The ML engineering team needs to establish dependency management practices that minimize supply chain security risks while maintaining reproducibility. Which practices would help protect against malicious package injection and dependency vulnerabilities?
- A. Pin all Python package versions with cryptographic hash verification in requirements files, and configure the package installer to reject packages without matching hashes to ensure installation integrity (Correct Answer)
- B. Use a private package repository mirror that scans and caches approved packages, combined with network policies that prevent direct installation from public repositories during model deployment (Correct Answer)
- C. Configure automated dependency scanning tools that run on each training pipeline execution to detect known vulnerabilities, and establish a quarterly manual review process for updating pinned versions
- D. Implement a centralized requirements management system where all package versions are tested in an isolated environment before approval, with automated rollback capabilities if post-deployment issues occur (Correct Answer)
Explanation: Secure dependency management for production ML systems requires multiple complementary approaches. Option A is correct because hash verification (using tools like pip's --require-hashes flag) ensures that installed packages match exact cryptographic signatures, preventing substitution attacks even if a repository is compromised. Option B is correct as private mirrors with security scanning create a controlled supply chain where only vetted packages are available, and network policies enforce this boundary during deployment. Option D is correct because centralized testing and approval workflows ensure packages work correctly in the target environment before production use, while rollback capabilities provide recovery options. Option C is insufficient alone because quarterly manual reviews create large windows of exposure to newly discovered vulnerabilities, and scanning during training doesn't protect deployment environments. Effective ML solution maintenance combines preventive controls (hash pinning, private repositories) with detective controls (scanning) and response mechanisms (rollback), forming defense-in-depth against supply chain attacks that could compromise sensitive financial data.
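The hash-verification idea behind option A can be illustrated with the standard library. Note that pip performs the equivalent check itself when run with `--require-hashes`; this sketch only demonstrates the principle of rejecting artifacts whose digest does not match a pinned value:

```python
import hashlib

def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """Accept an artifact only if its digest matches the pinned hash."""
    return hashlib.sha256(data).hexdigest() == pinned_sha256

# Pin the hash at review time, when the package is vetted...
package_bytes = b"wheel contents approved during review"
pinned = hashlib.sha256(package_bytes).hexdigest()

# ...then verify at install time; a substituted artifact fails the check
# even if the repository serving it has been compromised.
print(verify_artifact(package_bytes, pinned))
print(verify_artifact(b"tampered wheel contents", pinned))
```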
Question 9: A company operates ML models across multiple jurisdictions and must absorb quarterly regulatory changes in each of them. Which approach provides the most sustainable framework for maintaining compliance while minimizing operational overhead?
- A. Implement a centralized compliance metadata registry that maps model artifacts to jurisdiction-specific requirements, coupled with automated policy engines that trigger re-evaluation workflows when regulatory updates are ingested, and maintain versioned compliance attestations linked to model lineage graphs (Correct Answer)
- B. Deploy separate model training and serving infrastructure in each jurisdiction with localized compliance controls, establish manual quarterly audit cycles where compliance officers review model documentation, and implement hard geographic boundaries that prevent cross-jurisdiction data flow
- C. Create a compliance dashboard that aggregates model performance metrics across jurisdictions, schedule quarterly retraining cycles aligned with regulatory review periods, and maintain a centralized data warehouse with encryption-at-rest to satisfy all privacy requirements simultaneously
- D. Establish region-specific feature stores with jurisdiction-tagged datasets, implement quarterly compliance sprints where engineering teams manually update model documentation to reflect new regulations, and deploy versioned models with jurisdiction identifiers in their metadata tags
Explanation: Maintaining ML solutions in regulated environments requires scalable, automated compliance frameworks that adapt to regulatory evolution. Option A establishes a sustainable architecture through: (1) a compliance metadata registry that creates a single source of truth for jurisdiction-specific requirements mapped to model artifacts, (2) automated policy engines that proactively detect regulatory changes and trigger necessary workflows (re-evaluation, attestation updates, or remediation), (3) versioned compliance attestations that create auditable records tied to model lineage, enabling historical compliance verification. This approach minimizes operational overhead by automating compliance monitoring while maintaining the flexibility to adapt to quarterly changes. Option B relies heavily on manual processes (quarterly audits, manual reviews) that don't scale with regulatory change frequency and creates operational silos. Option C conflates performance monitoring with compliance maintenance and incorrectly assumes a single encryption approach satisfies diverse jurisdictional requirements (data residency, purpose limitation, consent management vary significantly). Option D depends on manual quarterly sprints, which introduces lag between regulatory changes and compliance updates, creating risk windows. The competency of maintaining ML solutions emphasizes establishing proactive, automated mechanisms that ensure continuous compliance rather than reactive, manual processes.
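A minimal sketch of the registry-plus-policy-engine pattern in option A. The model IDs, jurisdiction names, and requirement tags are hypothetical, and a production registry would persist its state and attestations rather than keep them in memory:

```python
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    model_id: str
    jurisdictions: dict                              # jurisdiction -> requirement tags
    attestations: list = field(default_factory=list)  # versioned audit trail

class ComplianceRegistry:
    """Single source of truth mapping model artifacts to requirements."""
    def __init__(self):
        self.entries = {}

    def register(self, model_id, jurisdictions):
        self.entries[model_id] = RegistryEntry(model_id, jurisdictions)

    def ingest_regulatory_update(self, jurisdiction, requirement):
        """Policy engine: find affected models and record a re-evaluation."""
        affected = []
        for entry in self.entries.values():
            if jurisdiction in entry.jurisdictions:
                entry.jurisdictions[jurisdiction].append(requirement)
                entry.attestations.append(f"re-eval:{jurisdiction}:{requirement}")
                affected.append(entry.model_id)
        return affected

registry = ComplianceRegistry()
registry.register("credit-risk-v4", {"EU": ["gdpr-minimization"], "US": []})
registry.register("fraud-v2", {"US": ["glba-safeguards"]})

# A new EU rule triggers re-evaluation only for models deployed in the EU.
print(registry.ingest_regulatory_update("EU", "ai-act-logging"))
```

The attestation entries form the auditable, lineage-linked record the explanation describes, so regulatory ingestion drives workflows automatically instead of through quarterly manual sprints.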
Question 10: What is an effective approach to manage and reduce technical debt in an ML project that was deployed hastily?
- A. Conduct regular code reviews and refactor the codebase incrementally. (Correct Answer)
- B. Focus solely on adding new features to outpace technical debt issues.
- C. Ignore the existing technical debt and prioritize performance improvements.
- D. Establish a dedicated team to monitor and address technical debt continuously.
Explanation: Conducting regular code reviews and refactoring the codebase incrementally is an effective way to manage and reduce technical debt. This allows for continuous improvement and ensures that the code remains maintainable and scalable over time. Ignoring debt or focusing only on new features can exacerbate the problem, while a dedicated team may not be feasible or necessary for all projects.
Ready for More?
These 10 questions are just a preview. Create a free account to practice up to 3 topics with 50 questions per day — or upgrade to Pro for unlimited access.