
GCP PROFESSIONAL DATA ENGINEER CERTIFICATION Questions P-8

1. Which Google Cloud service is recommended as the primary, cost-effective object storage for a data lakehouse, capable of handling structured, semi-structured, and unstructured data?
- Apache Iceberg
- Cloud Storage
- BigQuery
- AlloyDB

✅ Correct Answer: Cloud Storage
🧠 Why this is the exam-correct answer

Cloud Storage is the primary and most cost-effective object storage service on Google Cloud and is explicitly recommended as the foundation for a data lake or lakehouse.

It can natively store:

Structured data (CSV, Parquet, Avro)

Semi-structured data (JSON, logs, events)

Unstructured data (images, videos, audio, documents)

✔ Exam keywords matched:

primary · cost-effective · object storage · data lakehouse · all data types

This combination unambiguously maps to Cloud Storage.
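
💻 Hands-on sketch: a minimal example of landing all three data types in one Cloud Storage bucket with the Python client library. Bucket, folder, and file names are illustrative, not from the question.

```python
from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client()
bucket = client.bucket("cymbal-lakehouse-raw")  # hypothetical bucket name

# Structured, semi-structured, and unstructured objects all live side by side.
bucket.blob("sales/2024/06/sales.parquet").upload_from_filename("sales.parquet")       # structured
bucket.blob("clickstream/2024-06-01/events.json").upload_from_filename("events.json")  # semi-structured
bucket.blob("product-images/sku-123.jpg").upload_from_filename("sku-123.jpg")          # unstructured

# List what has landed under the sales/ prefix.
for blob in client.list_blobs("cymbal-lakehouse-raw", prefix="sales/"):
    print(blob.name, blob.size)
```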

❌ Why the other options are wrong (common PDE traps)
Apache Iceberg

❌ Not a storage service

It is a table format that runs on top of object storage (like Cloud Storage)

Exam trap: confuses format with storage

BigQuery

Optimized for analytics, not raw object storage

More expensive for large raw datasets

❌ Not used as the base storage layer of a lakehouse

AlloyDB

OLTP relational database

❌ Completely unsuitable for data lakes or lakehouses


2. How does BigQuery enable a unified analytics platform in the Cymbal company's data lakehouse?
- By using federated queries to analyze data directly in external sources like AlloyDB and in Iceberg tables on Cloud Storage without moving the data.
- By converting all unstructured and semi-structured data into a single proprietary format before analysis.
- By requiring all data to be moved from Cloud Storage and AlloyDB into its native storage.
- By exclusively managing high-transaction operational workloads with low latency.

✅ Correct Answer

By using federated queries to analyze data directly in external sources like AlloyDB and in Iceberg tables on Cloud Storage without moving the data.

🧠 Why this is the exam-correct answer

BigQuery enables a unified analytics platform in a lakehouse by supporting federated queries and external tables, allowing analytics without data movement.

In the Cymbal data lakehouse scenario, BigQuery can:

Query data in place from:

AlloyDB (via federated queries over an external connection)

Cloud Storage using open formats like Apache Iceberg

Provide a single SQL analytics layer

Avoid costly, complex ETL pipelines

Preserve data in open formats

✔ Key exam phrases matched:

unified analytics platform · lakehouse · without moving the data · external sources

These phrases directly map to BigQuery federated analytics.
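
💻 Hands-on sketch: what "query in place" can look like in practice. This assumes a federated connection to AlloyDB (here called alloydb-conn) and a BigLake Iceberg table named lakehouse.iceberg_reviews already exist; all identifiers are illustrative.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Operational orders are read live from AlloyDB via EXTERNAL_QUERY, and review
# history is read from an Iceberg table on Cloud Storage -- nothing is copied first.
sql = """
SELECT o.customer_id, o.order_total, r.avg_rating
FROM EXTERNAL_QUERY(
  'cymbal-prj.us.alloydb-conn',
  'SELECT customer_id, order_total FROM orders WHERE order_date = CURRENT_DATE'
) AS o
JOIN `cymbal-prj.lakehouse.iceberg_reviews` AS r
  USING (customer_id)
"""

for row in client.query(sql).result():
    print(dict(row))
```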

❌ Why the other options are wrong (exam traps)
By converting all unstructured and semi-structured data into a single proprietary format

BigQuery supports open formats (Parquet, Iceberg)

Google promotes open lakehouse architectures

❌ Contradicts lakehouse principles

By requiring all data to be moved from Cloud Storage and AlloyDB into its native storage

BigQuery explicitly supports query-in-place

Data movement is optional, not required

❌ Opposite of federated analytics

By exclusively managing high-transaction operational workloads

That describes OLTP databases, not BigQuery

BigQuery is OLAP, not low-latency transactional

❌ Completely incorrect


3. What is the primary role of Apache Iceberg in a data lakehouse architecture built on Google Cloud?
- It acts as the central query engine for running high-speed SQL analytics.
- It is a fully managed, PostgreSQL-compatible database for operational workloads.
- It is the primary service used to store raw, unprocessed data files of any format or size.
- It adds a metadata layer to files in Cloud Storage, enabling features like schema evolution, time travel, and efficient querying.

✅ Correct Answer: It adds a metadata layer to files in Cloud Storage, enabling features like schema evolution, time travel, and efficient querying.

🧠 Why this is the exam-correct answer

Apache Iceberg plays the role of a table format / metadata layer in a Google Cloud lakehouse—not a storage system and not a query engine.

In a Google Cloud lakehouse, Iceberg:

Sits on top of object storage like Cloud Storage

Manages table metadata (schemas, partitions, snapshots)

Enables advanced lakehouse features:

Schema evolution

Time travel

ACID-like guarantees

Efficient pruning and querying

✔ Exam keywords matched:

metadata layer · schema evolution · time travel · efficient querying

These are signature capabilities of Apache Iceberg.
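
💻 Hands-on sketch: registering an existing Iceberg table in BigQuery as a BigLake external table. Note that the table definition points at Iceberg's metadata file, not at the data files themselves. Project, connection, and paths are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# The URI references Iceberg's metadata JSON -- the layer that tracks schemas,
# partitions, and snapshots -- while the Parquet data files stay where they are.
ddl = """
CREATE EXTERNAL TABLE `cymbal-prj.lakehouse.iceberg_sales`
WITH CONNECTION `cymbal-prj.us.lake-conn`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://cymbal-lakehouse-raw/iceberg/sales/metadata/v3.metadata.json']
)
"""
client.query(ddl).result()
```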

❌ Why the other options are wrong (common PDE traps)
It acts as the central query engine for running high-speed SQL analytics

Iceberg does not execute queries

Query engines include BigQuery, Spark, Trino, etc.

❌ Confuses table format with query engine

It is a fully managed, PostgreSQL-compatible database

That describes AlloyDB

Iceberg is not a database

❌ Completely incorrect

It is the primary service used to store raw, unprocessed data

Raw storage is handled by Cloud Storage

Iceberg references files, it doesn’t store them

❌ Confuses storage with metadata management


4. For an online retail business like Cymbal, which of the following use cases is best suited for AlloyDB?
- Storing raw, unstructured log files generated from website clicks.
- Performing complex analytical queries on historical marketing campaign performance.
- Storing large volumes of historical sales data for quarterly trend analysis.
- Managing real-time customer order processing and inventory updates.

✅ Correct Answer

Managing real-time customer order processing and inventory updates.

🧠 Why this is the exam-correct answer

AlloyDB is Google Cloud’s high-performance, PostgreSQL-compatible OLTP database, purpose-built for transactional workloads.

For an online retail business like Cymbal, AlloyDB is ideal for:

Real-time order processing

Inventory updates

Customer transactions

Low-latency reads/writes

High concurrency with ACID guarantees

✔ Exam keywords matched:

real-time · order processing · inventory updates · operational workloads

These are classic OLTP use cases, which map directly to AlloyDB.
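
💻 Hands-on sketch: because AlloyDB is PostgreSQL-compatible, a standard Postgres driver can run the order insert and inventory update as one ACID transaction. Connection details are placeholders; in practice you would connect through the AlloyDB Auth Proxy or the AlloyDB Python connector.

```python
import psycopg2  # any PostgreSQL driver works against AlloyDB

conn = psycopg2.connect(host="127.0.0.1", dbname="retail", user="app", password="...")  # placeholders
try:
    # `with conn` commits on success and rolls back on error, so both statements
    # become visible together or not at all.
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (customer_id, sku, quantity) VALUES (%s, %s, %s)",
            (42, "SKU-123", 2),
        )
        cur.execute(
            "UPDATE inventory SET stock = stock - %s WHERE sku = %s AND stock >= %s",
            (2, "SKU-123", 2),
        )
finally:
    conn.close()
```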

❌ Why the other options are wrong (common PDE traps)
Storing raw, unstructured log files

Best handled by Cloud Storage

Logs are append-heavy and unstructured

❌ Not a relational OLTP use case

Performing complex analytical queries on historical marketing data

Best suited for BigQuery

Analytical, scan-heavy workloads

❌ OLAP ≠ AlloyDB

Storing large volumes of historical sales data for trend analysis

Again, BigQuery or a lakehouse (Cloud Storage + Iceberg) is preferred

AlloyDB is optimized for current operational data, not large historical scans

❌ Costly and inefficient for analytics


5. When querying Apache Iceberg tables in Cloud Storage via BigLake, how does BigQuery leverage Iceberg's structure for performance?

- BigQuery loads all Iceberg data into native Colossus storage before querying for optimization, increasing data movement.
- BigQuery reimplements its own partitioning and clustering logic over Iceberg files, ignoring the existing Iceberg metadata.
- BigQuery performs full scans of all data files within the Cloud Storage bucket regardless of query filters, as it doesn't understand Iceberg's file organization.
- BigQuery intelligently leverages the existing structure defined in the Iceberg table's metadata, using its partitioning (e.g., transaction_date) to identify and prune irrelevant files, and its file-level statistics (mimicking clustering by sorted columns like customer_id) for powerful predicate pushdown to skip non-matching files.

✅ Correct Answer

BigQuery intelligently leverages the existing structure defined in the Iceberg table's metadata, using its partitioning (e.g., transaction_date) to identify and prune irrelevant files, and its file-level statistics (mimicking clustering by sorted columns like customer_id) for powerful predicate pushdown to skip non-matching files.

🧠 Why this is the exam-correct answer

When querying Apache Iceberg tables stored in Cloud Storage via BigLake, BigQuery does not copy data or ignore Iceberg semantics. Instead, it understands Iceberg metadata and uses it for performance optimizations:

Partition pruning: Uses Iceberg partitions (e.g., transaction_date) to avoid scanning irrelevant data.

Predicate pushdown: Uses file-level statistics (min/max, counts) recorded by Iceberg to skip files that cannot match the query.

Query-in-place: Reads data where it lives in Cloud Storage via BigLake—no data movement.

This delivers warehouse-grade performance on lakehouse data.

✔ Exam keywords matched: leverage metadata, partition pruning, predicate pushdown, skip files, no data movement.
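
💻 Hands-on sketch: one way to see pruning at work is to run a filtered query and check how much data BigQuery actually read. Filtering on the partition column should keep the bytes processed far below the table's total size. Table and column names are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT SUM(amount) AS total
FROM `cymbal-prj.lakehouse.iceberg_sales`
WHERE transaction_date = '2024-06-01'   -- partition column: irrelevant files are pruned
  AND customer_id = 'C-1001'            -- sorted column: file-level stats allow skipping
"""

job = client.query(sql)
job.result()  # run the query, then inspect how much data was actually read
print(f"Bytes processed: {job.total_bytes_processed:,}")
```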

❌ Why the other options are wrong (common PDE traps)

Loads all data into native storage first → ❌ BigQuery does not require copying Iceberg data into Colossus for querying.

Ignores Iceberg metadata and reimplements partitioning/clustering → ❌ BigQuery uses Iceberg’s metadata.

Performs full scans regardless of filters → ❌ Iceberg metadata enables file pruning, avoiding full scans.


6. In BigQuery, what roles do "slots" and "shuffle" play in enabling fast query execution on massive datasets?

- Slots are physical servers that users must provision, and shuffle is a data encryption method for security.
- Slots manage data storage within Colossus, while shuffle performs data archival to reduce costs.
- Slots optimize data loading into BigQuery, and shuffle is responsible for data visualization post-query.
- Slots are virtual workers, self-contained units of computational power that process small pieces of data simultaneously in a massively parallel manner, while shuffle is the process of redistributing intermediate data using Google’s petabit internal network (Jupiter) for efficient aggregation and joining.

✅ Correct Answer

Slots are virtual workers, self-contained units of computational power that process small pieces of data simultaneously in a massively parallel manner, while shuffle is the process of redistributing intermediate data using Google’s petabit internal network (Jupiter) for efficient aggregation and joining.

🧠 Why this is the exam-correct answer

In BigQuery, fast query execution at massive scale is enabled by two core concepts:

Slots

Virtual compute workers (not physical servers)

Execute query stages in massive parallelism

Each slot processes a small chunk of data

Google dynamically allocates thousands of slots per query

Shuffle

The redistribution of intermediate query results

Happens during operations like:

GROUP BY

JOIN

ORDER BY

Uses Google’s Jupiter petabit-scale internal network

Enables efficient aggregation and joins across slots

✔ Exam keywords matched:

massively parallel · virtual workers · redistributing intermediate data · aggregation · joining

This description exactly matches how BigQuery is architected internally.
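
💻 Hands-on sketch: you never manage slots directly, but you can see how many slot-milliseconds your recent queries consumed via INFORMATION_SCHEMA. Region and project scope are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Slot usage is reported per job after the fact; more complex queries with heavy
# shuffle stages (GROUP BY, JOIN) typically show higher total_slot_ms.
sql = """
SELECT job_id, total_slot_ms, total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY total_slot_ms DESC
LIMIT 5
"""
for row in client.query(sql).result():
    print(row.job_id, row.total_slot_ms, row.total_bytes_processed)
```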

❌ Why the other options are wrong (common PDE traps)

Physical servers users must provision → ❌ BigQuery is fully serverless

Data storage or archival roles → ❌ That’s Colossus, not slots or shuffle

Data loading or visualization → ❌ Not related to execution internals


7. What problem does the BigLake service solve within a lakehouse architecture for enterprises like Cymbal?

- It creates separate data silos for structured and unstructured data, making it difficult to analyze different types of data together.
- It forces data into a single, proprietary format, leading to vendor lock-in and limited interoperability.
- It requires building complex ETL pipelines to duplicate data from the data lake to the data warehouse for structured analysis.
- BigLake acts as a storage engine and connector, allowing BigQuery to directly query data stored in open formats within object storage (like Google Cloud Storage) as external tables, combining the low-cost, flexible storage of a data lake with the powerful querying and governance features of a data warehouse, without moving or duplicating data.

✅ Correct Answer

BigLake acts as a storage engine and connector, allowing BigQuery to directly query data stored in open formats within object storage (like Google Cloud Storage) as external tables, combining the low-cost, flexible storage of a data lake with the powerful querying and governance features of a data warehouse, without moving or duplicating data.

🧠 Why this is the exam-correct answer

BigLake solves a core lakehouse problem: how to get warehouse-grade analytics and governance on data lake storage without copying data.

For enterprises like Cymbal, BigLake:

Connects BigQuery directly to data in Cloud Storage

Supports open formats (e.g., Apache Iceberg)

Enables external tables with:

Fine-grained access control

Centralized governance

High-performance query-in-place

Eliminates data duplication and complex ETL

✔ Exam keywords matched:

lakehouse · open formats · query in place · governance · no data movement
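
💻 Hands-on sketch: creating a BigLake table over Parquet files that already sit in Cloud Storage. The WITH CONNECTION clause is what turns a plain external table into a governed BigLake table; all names are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE EXTERNAL TABLE `cymbal-prj.lakehouse.sales_lake`
WITH CONNECTION `cymbal-prj.us.lake-conn`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://cymbal-lakehouse-raw/sales/*.parquet']
)
"""
client.query(ddl).result()

# Analysts now query it like any other BigQuery table -- no copy, no ETL.
print(list(client.query(
    "SELECT COUNT(*) AS orders FROM `cymbal-prj.lakehouse.sales_lake`"
).result()))
```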

❌ Why the other options are wrong (common PDE traps)

Creates data silos → ❌ BigLake does the opposite by unifying access

Forces proprietary formats → ❌ BigLake promotes open formats

Requires duplicating data via ETL → ❌ BigLake removes the need for ETL copies


8. How does BigLake centralize governance and security for external data queried through BigQuery?

- It duplicates data into BigQuery's native storage to apply security policies, increasing storage costs and data latency.
- It only supports basic table-level security, not fine-grained controls like row-level or column-level security.
- It requires end-users to have direct permissions on the underlying Cloud Storage buckets to access the data, decentralizing control.
- BigLake uses access delegation by associating an external table with a service account that has Cloud Storage permissions, allowing users to be granted fine-grained security controls (including row-level and column-level security) directly on the BigLake tables within BigQuery, without needing direct access to the raw files.

✅ Correct Answer

BigLake uses access delegation by associating an external table with a service account that has Cloud Storage permissions, allowing users to be granted fine-grained security controls (including row-level and column-level security) directly on the BigLake tables within BigQuery, without needing direct access to the raw files.

🧠 Why this is the exam-correct answer

BigLake centralizes governance and security for lakehouse data by decoupling storage permissions from analytics access.

Here’s how it works with BigQuery and Cloud Storage:

Access delegation:
BigLake tables are bound to a connection whose service account has permissions on Cloud Storage.

No direct bucket access for users:
Analysts query via BigQuery and do not need IAM access to the underlying files.

Fine-grained controls in BigQuery:
Apply row-level security (RLS) and column-level security (CLS) directly on BigLake tables.

Centralized governance:
Security policies live in BigQuery—consistent, auditable, and scalable.

✔ Exam keywords matched:

centralized governance · access delegation · fine-grained security · no direct file access
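
💻 Hands-on sketch: with access delegation in place, fine-grained policies are applied to the BigLake table itself. Below is a row-level security policy (the group name and predicate are illustrative); column-level security works similarly via policy tags.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Analysts in this group only ever see EMEA rows, even though the underlying
# files in Cloud Storage are never exposed to them directly.
client.query("""
CREATE ROW ACCESS POLICY emea_only
ON `cymbal-prj.lakehouse.sales_lake`
GRANT TO ('group:emea-analysts@cymbal.example')
FILTER USING (region = 'EMEA')
""").result()
```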

❌ Why the other options are wrong (common PDE traps)

Duplicates data into native storage → ❌ BigLake is query-in-place; no copying required.

Only table-level security → ❌ BigLake supports RLS and CLS via BigQuery.

Requires direct Cloud Storage permissions for users → ❌ BigLake removes this requirement.


9. Which statement accurately describes BigQuery's interaction with Apache Iceberg tables via BigLake?

- BigQuery ignores Iceberg's metadata, requiring full table scans for queries, making it inefficient for large datasets.
- BigQuery requires a proprietary connector to interact with Iceberg, limiting interoperability and open standards.
- BigQuery can only read data from Iceberg tables but cannot perform write operations like UPDATE, DELETE, or MERGE.
- BigQuery offers first-class, native support for Apache Iceberg through BigLake, understanding its metadata for advanced optimizations like partitioning and clustering, and allowing direct UPDATE, DELETE, and MERGE statements on Iceberg tables.

✅ Correct Answer

BigQuery offers first-class, native support for Apache Iceberg through BigLake, understanding its metadata for advanced optimizations like partitioning and clustering, and allowing direct UPDATE, DELETE, and MERGE statements on Iceberg tables.

🧠 Why this is the exam-correct answer

Through BigLake, BigQuery provides native, first-class support for Apache Iceberg—this is a key pillar of Google’s lakehouse strategy.

BigQuery:

Understands Iceberg metadata (schemas, partitions, snapshots, file statistics)

Uses that metadata for partition pruning and predicate pushdown

Treats Iceberg tables as first-class analytical tables in BigQuery

Supports DML operations directly on Iceberg tables:

UPDATE

DELETE

MERGE

✔ Exam keywords matched:

first-class support · native · open formats · metadata-aware · UPDATE / DELETE / MERGE

This is exactly how Google positions BigQuery + BigLake in PDE case studies.
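
💻 Hands-on sketch: a MERGE against an Iceberg table, issued exactly as you would against a native table. Identifiers are illustrative, and in practice DML applies to BigQuery-managed Iceberg tables rather than read-only external ones.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Upsert staged changes into the Iceberg sales table with standard BigQuery DML.
client.query("""
MERGE `cymbal-prj.lakehouse.iceberg_sales` AS t
USING `cymbal-prj.staging.sales_updates` AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN
  INSERT (order_id, customer_id, status) VALUES (s.order_id, s.customer_id, s.status)
""").result()
```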

❌ Why the other options are wrong (common PDE traps)

Ignores Iceberg metadata and requires full scans
❌ Opposite of reality—BigQuery actively uses Iceberg metadata for optimization.

Requires a proprietary connector
❌ BigLake is native, standards-based, and built around open formats.

Read-only access to Iceberg tables
❌ BigQuery supports full DML, which is a major differentiator versus legacy external tables.


10. What is a primary benefit of BigQuery being a "fully managed" and "serverless" data warehouse for an online retail company like Cymbal?

- It necessitates building complex ETL pipelines to move data from the data lake to the warehouse for analysis.
- It allows Cymbal's data architects to spend significant time on capacity planning and server management.
- It requires manual provisioning of hardware and software patching, increasing control for data architects.
- Google handles all infrastructure, patches, updates, and hardware failures, and automatically allocates necessary resources, significantly reducing operational overhead and enabling seamless scaling.

✅ Correct Answer

Google handles all infrastructure, patches, updates, and hardware failures, and automatically allocates necessary resources, significantly reducing operational overhead and enabling seamless scaling.

🧠 Why this is the exam-correct answer

Being fully managed and serverless is one of the most important advantages of BigQuery, especially for an online retail company like Cymbal that needs to scale analytics with demand (sales, campaigns, seasonal spikes).

BigQuery:

Requires no server provisioning

Eliminates capacity planning

Automatically:

Scales compute and storage

Handles OS and software patching

Manages hardware failures

Allows teams to focus on data and insights, not infrastructure

✔ Exam keywords matched:

fully managed · serverless · reduced operational overhead · automatic scaling

These phrases directly map to the correct option.

❌ Why the other options are wrong (common PDE traps)

Necessitates complex ETL pipelines → ❌ BigQuery reduces ETL via federation and BigLake

Spend time on capacity planning → ❌ BigQuery removes this responsibility

Manual provisioning and patching → ❌ Opposite of serverless


11. The separation of storage and compute in BigQuery's architecture offers which key advantage for managing costs and ensuring performance?

- It requires pre-provisioning all necessary compute power, even when not in use, leading to potential overspending.
- It ties compute resources directly to storage, meaning both must scale together, which can increase fixed costs.
- Storage and compute can scale independently, allowing users to add more storage as data grows, and use more compute power only when needed for complex queries, paying only for the compute used.
- It increases the complexity of managing data growth and query demands, as resources are tightly coupled.

✅ Correct Answer

Storage and compute can scale independently, allowing users to add more storage as data grows, and use more compute power only when needed for complex queries, paying only for the compute used.

🧠 Why this is the exam-correct answer

A core architectural principle of BigQuery is the separation of storage and compute, which directly enables cost efficiency and performance at scale.

This design means:

Storage (data in Colossus) scales independently as data grows

Compute (slots) is allocated only when queries run

No need to provision idle compute

You pay for:

Stored data

Compute actually consumed (on-demand or reserved)

✔ Exam keywords matched:

independent scaling · pay only for what you use · cost control · performance on demand

This is exactly what Google tests around serverless analytics economics.
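
💻 Hands-on sketch: because compute is billed per query (on-demand) while storage is billed separately, a dry run plus a little arithmetic gives a cost estimate before running anything. The per-TiB rate below is an assumption; check current pricing. Table names are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()
job = client.query(
    "SELECT customer_id, SUM(amount) AS total FROM `cymbal-prj.dw.sales` GROUP BY customer_id",
    job_config=bigquery.QueryJobConfig(dry_run=True),  # estimate only, nothing is executed
)

tib_scanned = job.total_bytes_processed / 2**40
assumed_rate_usd_per_tib = 6.25  # assumed on-demand rate; verify against current pricing
print(f"~{tib_scanned:.3f} TiB scanned -> ~${tib_scanned * assumed_rate_usd_per_tib:.2f} of compute")
# Storage for the table itself is billed independently of this query.
```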

❌ Why the other options are wrong (common PDE traps)

Pre-provision all compute → ❌ BigQuery is serverless

Compute tied to storage → ❌ Opposite of BigQuery’s design

Increased management complexity → ❌ Separation reduces ops burden


12. How do partitioning and clustering optimize BigQuery performance and cost for a large dataset like Cymbal's sales data?

- Partitioning divides tables into smaller, manageable segments (like by date) so BigQuery only scans relevant partitions, drastically reducing data scanned; clustering sorts data within those partitions (like by customer ID), allowing BigQuery to jump directly to specific data, providing finer-grained data pruning.
- Partitioning shuffles data across all tables to distribute load, and clustering encrypts sensitive columns within the data.
- Partitioning creates duplicate copies of data for redundancy, while clustering is only applicable to external tables.
- Both partitioning and clustering require scanning the entire dataset for every query, increasing processing time and cost.

✅ Correct Answer

Partitioning divides tables into smaller, manageable segments (like by date) so BigQuery only scans relevant partitions, drastically reducing data scanned; clustering sorts data within those partitions (like by customer ID), allowing BigQuery to jump directly to specific data, providing finer-grained data pruning.

🧠 Why this is the exam-correct answer

For large datasets like Cymbal’s sales data, BigQuery relies on partitioning and clustering to minimize scanned data—directly improving performance and cost.

Partitioning

Splits tables into logical segments (commonly by date)

BigQuery scans only the partitions referenced in the query

Huge cost savings for time-bound queries (e.g., last 30 days)

Clustering

Sorts data within each partition by selected columns (e.g., customer_id)

Enables fine-grained pruning using file-level statistics

Reduces scan even further when filters match clustered columns

✔ Exam keywords matched:

reduce data scanned · partition pruning · finer-grained pruning · cost optimization
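
💻 Hands-on sketch: the DDL that sets this up for a Cymbal-style sales table (names are illustrative, and transaction_date is assumed to be a TIMESTAMP). After this, queries that filter on transaction_date and customer_id benefit from partition pruning plus clustering.

```python
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE TABLE `cymbal-prj.dw.sales_partitioned`
PARTITION BY DATE(transaction_date)   -- scan only the dates the query asks for
CLUSTER BY customer_id                -- sort within each partition for finer pruning
AS
SELECT * FROM `cymbal-prj.dw.sales_raw`
""").result()
```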

❌ Why the other options are wrong (common PDE traps)

Shuffling and encryption → ❌ Not related to partitioning or clustering

Duplicate data and external-only → ❌ Incorrect and misleading

Requires scanning entire dataset → ❌ Opposite of their purpose


13. How does a data lakehouse architecture combine the best features of data lakes and data warehouses?

- By always requiring data to be moved and duplicated between a data lake and a data warehouse.
- By creating separate, specialized platforms for BI and AI workloads to maximize performance.
- By enforcing a strict schema-on-write approach for all data types to ensure high data quality.
- By implementing a metadata and governance layer on top of open-format files in low-cost object storage.

✅ Correct Answer

By implementing a metadata and governance layer on top of open-format files in low-cost object storage.

🧠 Why this is the exam-correct answer

A data lakehouse merges the strengths of data lakes (low cost, flexibility) and data warehouses (performance, governance) by adding a metadata and governance layer on top of data stored in open formats (like Parquet/Iceberg) in low-cost object storage.

In Google Cloud terms, this typically means:

Data stored in Cloud Storage

Table/metadata layers such as Apache Iceberg

Warehouse-grade analytics and governance via BigQuery (often through BigLake)

This approach enables:

Schema evolution & time travel

ACID-like guarantees

Fine-grained security and governance

Query-in-place without data duplication

✔ Exam keywords matched:

metadata layer · governance · open formats · low-cost object storage

❌ Why the other options are wrong (common PDE traps)

Always requiring data movement and duplication
❌ Lakehouse architectures aim to avoid duplication

Separate platforms for BI and AI
❌ Lakehouse unifies analytics and ML on the same data

Strict schema-on-write for all data
❌ Lakehouses support schema-on-read and schema evolution


14. Cymbal, an e-commerce company, needs a solution to support traditional business intelligence (BI), modern data science, and artificial intelligence (AI) workloads on a single, governed copy of its data, while also breaking down data silos between sales and customer review data. Which architecture is best suited for this need?

- A data lake, using Cloud Storage, for initial, low-cost storage of massive volumes of raw data for AI/ML research.
- Separate data lakes and data warehouses, with manual data integration as needed.
- A data lakehouse architecture with BigQuery and BigLake.
- A data warehouse, like BigQuery, for high-speed, interactive BI on structured data.

✅ Correct Answer

A data lakehouse architecture with BigQuery and BigLake.

🧠 Why this is the exam-correct answer

Cymbal’s requirements are very explicit:

Single, governed copy of data

Support for traditional BI, data science, and AI/ML

Break down data silos (sales + customer reviews)

Avoid duplication and manual integration

A data lakehouse architecture using BigQuery and BigLake is purpose-built for exactly this scenario.

With this architecture:

Data lives once in open formats (typically in Cloud Storage)

BigQuery provides a single SQL analytics layer for BI, ML (BigQuery ML), and ad-hoc analysis

BigLake enables:

Query-in-place on lake data

Fine-grained governance (row/column-level security)

Unified access across structured and semi-structured data

Sales data and customer review data are analyzed together, eliminating silos

✔ Exam keywords matched:

single governed copy · BI + data science + AI · break down silos · lakehouse
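
💻 Hands-on sketch: the same governed copy can serve BI (a plain SQL join) and ML (BigQuery ML training) without exporting data to a separate platform. Table, column, and model names are entirely illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Train a model directly on sales joined with customer reviews -- one copy of the data.
client.query("""
CREATE OR REPLACE MODEL `cymbal-prj.lakehouse.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT s.customer_id, s.total_spend, r.avg_rating, c.churned
FROM `cymbal-prj.lakehouse.sales_lake`      AS s
JOIN `cymbal-prj.lakehouse.iceberg_reviews` AS r USING (customer_id)
JOIN `cymbal-prj.lakehouse.customer_status` AS c USING (customer_id)
""").result()
```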

❌ Why the other options are wrong (common PDE traps)
A data lake using Cloud Storage only

Great for low-cost storage

❌ Lacks native BI performance and governance

❌ Not ideal for enterprise BI users

Separate data lakes and data warehouses

Causes data duplication

Requires manual ETL

❌ Reinforces silos instead of breaking them

A data warehouse only (BigQuery)

Excellent for BI

❌ Not ideal alone for raw, semi-structured, or unstructured data


15. Which of the following best describes the primary characteristic of a data lake?

- Data is cleaned, transformed, and structured before storage, optimized for business intelligence.
- It primarily handles structured data and is optimized for high-performance queries on consistent information.
- It stores enormous amounts of raw data in its native format with schema-on-read, ideal for exploration and machine learning (ML).
- It is a highly organized library designed for fast and accurate reports based on pre-defined metrics.

✅ Correct Answer

It stores enormous amounts of raw data in its native format with schema-on-read, ideal for exploration and machine learning (ML).

🧠 Why this is the exam-correct answer

The primary characteristic of a data lake is flexibility:

Data is stored as-is (raw, native formats)

Supports structured, semi-structured, and unstructured data

Uses schema-on-read (structure is applied when data is queried, not when stored)

Ideal for:

Data exploration

Data science

Machine learning

Advanced analytics

✔ Exam keywords matched:

raw data · native format · schema-on-read · ML and exploration

These phrases are definitive indicators of a data lake in PDE exams.
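
💻 Hands-on sketch: schema-on-read in practice. Raw JSON events stay untouched in the lake, and BigQuery infers a schema only at query time via a temporary external table definition. Paths and the ad-hoc table name are illustrative.

```python
from google.cloud import bigquery

client = bigquery.Client()

external = bigquery.ExternalConfig("NEWLINE_DELIMITED_JSON")
external.source_uris = ["gs://cymbal-lakehouse-raw/clickstream/*.json"]
external.autodetect = True  # structure is inferred when queried, not when stored

job_config = bigquery.QueryJobConfig(table_definitions={"raw_clicks": external})
rows = client.query("SELECT COUNT(*) AS events FROM raw_clicks", job_config=job_config).result()
print(list(rows))
```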

❌ Why the other options are wrong (common exam traps)

Cleaned, transformed, structured before storage
❌ That describes a data warehouse

Primarily handles structured data, optimized for high-performance queries
❌ Again, a data warehouse

Highly organized library for fast reports with predefined metrics
❌ Classic BI warehouse description


16. What is a key disadvantage of a traditional data warehouse?

- It primarily uses low-cost object storage, leading to security risks with raw data.
- It is inflexible and cannot easily accommodate new data types or unstructured data.
- It is ideal for AI/ML model training due to its support for advanced analytics.
- It risks becoming a "data swamp" without proper governance.

✅ Correct Answer

It is inflexible and cannot easily accommodate new data types or unstructured data.

🧠 Why this is the exam-correct answer

A traditional data warehouse is built around:

Schema-on-write

Highly structured data

Carefully modeled tables designed for BI and reporting

This makes it excellent for consistent, repeatable analytics, but a key disadvantage is inflexibility when:

New data sources arrive frequently

Data is semi-structured or unstructured (JSON, logs, text, images)

Schemas evolve rapidly (common in modern digital businesses)

In PDE exams, this limitation is often contrasted with data lakes and lakehouse architectures, which are designed to handle diverse and evolving data types more easily.

✔ Exam keywords matched:

inflexible · new data types · unstructured data · schema-on-write

❌ Why the other options are wrong (common PDE traps)

Low-cost object storage with security risks
❌ Describes a poorly governed data lake, not a warehouse

Ideal for AI/ML model training
❌ Warehouses (e.g., BigQuery) can support ML, but this is not a traditional warehouse’s primary strength

Risk of becoming a data swamp
❌ That is a data lake risk without governance, not a warehouse issue