1. You need to choose a data storage solution to support a transactional system. Your customers are primarily based in one region. You want to reduce your administration tasks and focus engineering effort on building your business application. What should you do?
- Create a regional Cloud Storage bucket.
- Use Spanner.
- Install a database of your choice on a Compute Engine VM.
- Use Cloud SQL.
✅ Correct Answer: Use Cloud SQL.
Why Cloud SQL is the best choice
Your requirements are:
- Transactional system (OLTP)
- Customers primarily in one region
- Minimal administration
- Focus engineering effort on application development, not ops

Cloud SQL is a fully managed relational database service designed exactly for this scenario.
What Cloud SQL gives you
- Managed MySQL, PostgreSQL, or SQL Server
- Automatic backups, patching, replication, and high availability
- Strong consistency and ACID transactions
- Regional deployment with low latency
- Minimal operational overhead
This lets your team focus on business logic, not database maintenance.
Why the other options are not appropriate

❌ Cloud Storage (regional bucket)
- Object storage, not transactional
- No ACID transactions or SQL support

❌ Spanner
- Designed for global-scale, multi-region systems
- More complex and expensive than needed

❌ Database on a Compute Engine VM
- High operational burden (patching, backups, scaling)
- Not aligned with "reduce administration tasks"
Exam rule of thumb
Single-region transactional app + low ops → Cloud SQL
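To illustrate the low-ops point, here is a minimal sketch of an application connecting to a Cloud SQL for PostgreSQL instance through the Cloud SQL Python Connector; the instance connection name, user, password, and database are hypothetical placeholders.

```python
# Requires: pip install "cloud-sql-python-connector[pg8000]" sqlalchemy
from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

def getconn():
    # "my-project:us-central1:orders-db" is a hypothetical instance connection name.
    return connector.connect(
        "my-project:us-central1:orders-db",
        "pg8000",
        user="app-user",
        password="change-me",
        db="orders",
    )

# The application only manages a connection pool; backups, patching,
# replication, and high availability are handled by Cloud SQL.
pool = sqlalchemy.create_engine("postgresql+pg8000://", creator=getconn)

with pool.connect() as conn:
    conn.execute(sqlalchemy.text("SELECT 1"))
```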
2. You have data that is ingested daily and frequently analyzed in the first month. Thereafter, the data is retained only for audits, which happen occasionally every few years. You need to configure cost-effective storage. What should you do?
- Configure a lifecycle policy on Cloud Storage.
- Create a bucket on Cloud Storage with Autoclass configured.
- Configure a data retention policy on Cloud Storage.
- Create a bucket on Cloud Storage with object versioning configured.
✅ Correct Answer: Configure a lifecycle policy on Cloud Storage.
Clear and final explanation
Your data access pattern is well-defined and predictable:
- Ingested daily
- Frequently analyzed in the first month
- Rarely accessed afterward (audits every few years)
- Goal: cost-effective storage

When the hot → cold → archive timeline is known, the recommended and exam-expected solution is to use Cloud Storage lifecycle policies.
Why lifecycle policy is correct
A lifecycle policy lets you explicitly control storage costs by automatically transitioning data based on age (see the sketch after this question), for example:
- Standard for the first 30 days (frequent access)
- Coldline or Archive after 30 days (rare access)
- Optional deletion after a long retention period
This provides:
- Precise cost optimization
- Predictable behavior
- Low operational overhead
- Alignment with Google Cloud best practices when access patterns are known
Why the other options are incorrect

❌ Autoclass
- Best when access patterns are unknown or unpredictable; here they are clearly known.

❌ Retention policy
- Enforces immutability; it does not reduce storage cost.

❌ Object versioning
- Increases storage usage and cost; does not optimize for access frequency.
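As referenced above, a minimal sketch of such a lifecycle policy using the google-cloud-storage Python client; the bucket name and the exact ages are hypothetical assumptions for this scenario.

```python
# Requires: pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("cymbal-daily-ingest")  # hypothetical bucket name

# Transition objects to Coldline 30 days after creation,
# then delete them after roughly seven years (assumed audit window).
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=30)
bucket.add_lifecycle_delete_rule(age=7 * 365)
bucket.patch()

for rule in bucket.lifecycle_rules:
    print(rule)
```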
3. A manager at Cymbal Retail expresses concern about unauthorized access to objects in your Cloud Storage bucket. You need to evaluate all access on all objects in the bucket. What should you do?
- Enable and then review the Data Access audit logs.
- Route the Admin Activity logs to a BigQuery sink and analyze the logs with SQL queries.
- Change the permissions on the bucket to only trusted employees.
- Review the Admin Activity audit logs.
✅ Correct Answer: Enable and then review the Data Access audit logs.
Why this is the correct solution
Your concern is unauthorized access to objects in a Cloud Storage bucket.
To evaluate who accessed which objects (read/write/list), you must look at data-level access events.
Data Access audit logs record:
- Object reads
- Object writes
- Object deletions
- Object list operations
These logs provide per-object visibility, which is exactly what you need.
⚠️ Important: Data Access audit logs are NOT enabled by default; you must explicitly enable them first.
Why the other options are incorrect

❌ Review Admin Activity audit logs
- Admin Activity logs track configuration changes (IAM, bucket settings)
- They do NOT show object reads or writes

❌ Route Admin Activity logs to BigQuery
- Still the wrong log type
- You would only see admin actions, not object access

❌ Change permissions to trusted employees
- Preventive, not investigative
- Does not help you evaluate existing or past access
Exam rule of thumb
Need to see who accessed data → Data Access audit logs
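Once Data Access audit logs are enabled for Cloud Storage, reviewing them could look roughly like this sketch with the google-cloud-logging Python client; the bucket name is hypothetical and the payload fields printed are the usual audit-log fields, so treat this as an illustration rather than a definitive query.

```python
# Requires: pip install google-cloud-logging
from google.cloud import logging

client = logging.Client()

# Filter for Cloud Storage Data Access audit log entries on one bucket.
log_filter = (
    'logName:"cloudaudit.googleapis.com%2Fdata_access" '
    'AND resource.type="gcs_bucket" '
    'AND resource.labels.bucket_name="cymbal-retail-objects"'  # hypothetical bucket
)

for entry in client.list_entries(filter_=log_filter, order_by=logging.DESCENDING):
    payload = entry.payload  # AuditLog proto rendered as a dict
    print(
        entry.timestamp,
        payload.get("methodName"),    # e.g. storage.objects.get
        payload.get("resourceName"),  # which object was touched
        payload.get("authenticationInfo", {}).get("principalEmail"),
    )
```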
4. You need to store data long term and use it to create quarterly reports. What storage class should you choose?
- Archive
- Standard
- Coldline
- Nearline
✅ Correct Answer: Coldline
Why Coldline is the best choice
Your requirements are:
- Store data long term
- Access it quarterly (about once every 3 months)
- Optimize storage cost

Cloud Storage Coldline is specifically designed for data that is:
- Accessed roughly once a quarter or less (90-day minimum storage duration)
- Needed occasionally (e.g., quarterly reports)
- Still readily retrievable, with lower access costs than Archive
Coldline offers:
- Much lower storage cost than Standard or Nearline
- Reasonable access costs for infrequent (quarterly) reads
- No long restore delays
Why the other options are incorrect

❌ Standard
- Too expensive for long-term, infrequently accessed data

❌ Nearline
- Optimized for data accessed about once per month, not quarterly

❌ Archive
- Best for data accessed less than once a year
- Higher retrieval cost than Coldline
- Overkill for quarterly reporting
Quick decision table

| Access frequency | Best storage class |
| --- | --- |
| Daily / weekly | Standard |
| Monthly | Nearline |
| Quarterly | Coldline ✅ |
| Less than once a year | Archive |
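A minimal sketch of provisioning a Coldline bucket for the quarterly-report data with the google-cloud-storage Python client; the bucket name and location are hypothetical.

```python
# Requires: pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()

# Hypothetical bucket for long-term data read about once per quarter.
bucket = storage.Bucket(client, name="cymbal-quarterly-reports")
bucket.storage_class = "COLDLINE"
bucket = client.create_bucket(bucket, location="us-central1")

print(bucket.name, bucket.storage_class)  # cymbal-quarterly-reports COLDLINE
```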
5. You have several large tables in your transaction databases. You need to move all the data to BigQuery for the business analysts to explore and analyze the data. How should you design the schema in BigQuery?
- Combine all the transactional database tables into a single table using outer joins.
- Redesign the schema to normalize the data by removing all redundancies.
- Retain the data on BigQuery with the same schema as the source.
- Redesign the schema to denormalize the data with nested and repeated data.
✅ Correct Answer: Redesign the schema to denormalize the data with nested and repeated data.
Why this is the best practice in BigQuery
BigQuery is a columnar, analytics-first data warehouse, not a traditional OLTP database. It is optimized for:
- Large-scale scans
- Fewer joins
- Analytical queries run by business analysts

Because of this, denormalized schemas with nested and repeated fields are the recommended design pattern.
Benefits of denormalization in BigQuery
- Better query performance (fewer joins)
- Lower cost (less data scanned)
- Simpler SQL for analysts
- Uses BigQuery's strengths: nested (STRUCT) and repeated (ARRAY) fields

Example: an Orders table with repeated line items instead of a separate order_items table (see the schema sketch after this question).
Why the other options are incorrect

❌ Combine all tables into a single table with outer joins
- Creates extremely wide tables
- Difficult to maintain
- Often leads to data duplication and confusion

❌ Normalize the schema
- An OLTP best practice, not an OLAP one
- Causes expensive joins and poor performance in BigQuery

❌ Retain the same schema as the source
- OLTP schemas are usually normalized
- Misses BigQuery optimization opportunities
Exam rule of thumb
BigQuery analytics → Denormalize + nested & repeated fields
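A minimal sketch of the Orders example above, defining a nested and repeated line_items field with the google-cloud-bigquery Python client; the project, dataset, and column names are hypothetical.

```python
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

# Denormalized orders table: line items are nested (STRUCT) and repeated (ARRAY),
# so analysts do not need to join a separate order_items table.
schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("order_date", "DATE"),
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField(
        "line_items",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
            bigquery.SchemaField("unit_price", "NUMERIC"),
        ],
    ),
]

table = bigquery.Table("my-project.retail.orders", schema=schema)  # hypothetical IDs
client.create_table(table)
```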
6. Cymbal Retail has accumulated a large amount of data. Analysts and leadership are finding it difficult to understand the meaning of the data, such as BigQuery columns. Users of the data don't know who owns what. You need to improve the searchability of the data. What should you do?
- Export the data to Cloud Storage with descriptive file names.
- Create tags for data entries in Data Catalog.
- Rename BigQuery columns with more descriptive names.
- Add a description column corresponding to each data column.
✅ Correct Answer: Create tags for data entries in Data Catalog.
Why this is the right solution
Your core problems are about data discovery and understanding, not data storage or schema design:
- Analysts don't understand what columns mean
- Users don't know who owns which data
- You need to improve searchability across datasets

Data Catalog is Google Cloud's centralized metadata management and discovery service, designed exactly for this use case.
What Data Catalog tags provide
- Business and technical metadata such as data owner, data domain, sensitivity / classification, and column meaning
- Searchable metadata across BigQuery, Cloud Storage, etc.
- A shared, authoritative source of truth about data assets
- No need to modify schemas or duplicate data

This directly answers:
- "What does this column mean?"
- "Who owns this data?"
- "Where is the authoritative dataset?"
Why the other options are not sufficient

❌ Export data to Cloud Storage with descriptive names
- Loses BigQuery analytics capabilities
- Does not solve metadata or ownership discovery

❌ Rename BigQuery columns
- Helpful, but limited
- Does not capture ownership, domain, or business context
- Risky for existing queries and pipelines

❌ Add a description column for each column
- Not scalable or maintainable
- Pollutes the schema
- Not searchable in a meaningful way
Exam rule of thumb
Data meaning + ownership + searchability → Data Catalog tags
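As a rough illustration, a sketch of attaching an ownership tag to an existing BigQuery table with the google-cloud-datacatalog Python client. It assumes a tag template named data_governance with a string owner field already exists; every project, dataset, table, and template name here is hypothetical.

```python
# Requires: pip install google-cloud-datacatalog
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Look up the catalog entry for an existing BigQuery table (hypothetical IDs).
entry = client.lookup_entry(
    request={
        "linked_resource": (
            "//bigquery.googleapis.com/projects/my-project"
            "/datasets/retail/tables/orders"
        )
    }
)

# Attach a tag from a pre-existing "data_governance" template that records the owner.
tag = datacatalog_v1.Tag()
tag.template = "projects/my-project/locations/us/tagTemplates/data_governance"
tag.fields["owner"] = datacatalog_v1.TagField()
tag.fields["owner"].string_value = "retail-analytics-team"

created_tag = client.create_tag(parent=entry.name, tag=tag)
print(created_tag.name)
```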
7. You are ingesting data that is spread out over a wide range of dates into BigQuery at a fast rate. You need to partition the table to make queries performant. What should you do?
- Create an integer-range partitioned table.
- Create an ingestion-time partitioned table with daily partitioning type.
- Create an ingestion-time partitioned table with yearly partitioning type.
- Create a time-unit column-partitioned table with yearly partitioning type.
✅ Correct Answer: Create an ingestion-time partitioned table with daily partitioning type.
Why this is the best choice
Your situation:
- Data spans a wide range of dates
- Data is ingested at a fast rate
- You want performant queries in BigQuery
- Queries likely filter by recent days / date ranges

Ingestion-time partitioning with daily partitions is the recommended and most common approach for this scenario.
Benefits
- Automatically partitions data based on when it arrives
- No dependency on a date column being present or clean
- Optimized for fast ingestion and for queries that filter by recent time windows
- Simple to manage and widely used in streaming and batch ingestion
Why the other options are not ideal

❌ Integer-range partitioned table
- Useful for numeric ranges (IDs, counters)
- Not suitable for date-based analytics

❌ Ingestion-time partitioned table with yearly partitioning
- Partitions would be too large
- Poor partition pruning and query performance

❌ Time-unit column-partitioned table with yearly partitioning
- Yearly partitions are too coarse
- Leads to scanning excessive data
Exam rule of thumb
High ingest rate + wide date range + analytics → Ingestion-time partitioned table (daily)
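A minimal sketch of creating an ingestion-time, daily-partitioned table and pruning partitions in a query, using the google-cloud-bigquery Python client; the table ID is hypothetical.

```python
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

# Ingestion-time partitioned table: no partitioning column is specified,
# so BigQuery partitions rows by the day they arrive.
table = bigquery.Table("my-project.analytics.events")  # hypothetical table ID
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY
)
client.create_table(table)

# Queries prune partitions by filtering on the _PARTITIONDATE pseudocolumn.
query = """
    SELECT COUNT(*) AS events_last_week
    FROM `my-project.analytics.events`
    WHERE _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""
print(list(client.query(query).result()))
```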
8. Your analysts repeatedly run the same complex queries that combine and filter through a lot of data on BigQuery. The data changes frequently. You need to reduce the effort for the analysts. What should you do?
- Create a dataset with the data that is frequently queried.
- Export the frequently queried data into Cloud SQL.
- Export the frequently queried data into a new table.
- Create a view of the frequently queried data.
✅ Correct Answer: Create a view of the frequently queried data.
Why this is the best solution
Your situation:
- Analysts repeatedly run the same complex queries
- Underlying data changes frequently
- The goal is to reduce analyst effort, not duplicate data

BigQuery views are designed exactly for this use case.
Benefits of using a view
- Encapsulates complex SQL logic in one place
- Analysts can query the view with simple SELECT statements
- Always reflects the latest underlying data
- No data duplication
- Easy to maintain and update centrally
This dramatically improves productivity and consistency.
Why the other options are incorrect

❌ Create a dataset with frequently queried data
- A dataset is just a container, not a solution

❌ Export data into Cloud SQL
- Not suitable for large analytical workloads
- Adds unnecessary complexity

❌ Export data into a new table
- Duplicates data
- Requires refresh logic
- Risks serving stale data
Exam rule of thumb
Repeated complex queries + frequently changing data → View
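A minimal sketch of wrapping the analysts' complex query in a view with the google-cloud-bigquery Python client; the view, table, and column names are hypothetical.

```python
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

# A logical view that encapsulates the complex query once (hypothetical IDs).
view = bigquery.Table("my-project.retail.daily_store_sales_v")
view.view_query = """
    SELECT
      store_id,
      DATE(order_ts) AS order_date,
      SUM(amount) AS total_sales
    FROM `my-project.retail.orders`
    WHERE status = 'COMPLETE'
    GROUP BY store_id, order_date
"""
client.create_table(view)

# Analysts now run a simple SELECT against the view, and results always
# reflect the latest underlying data:
#   SELECT * FROM `my-project.retail.daily_store_sales_v`
#   WHERE order_date = CURRENT_DATE()
```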
9. You have data stored in a Cloud Storage bucket. You are using both Identity and Access Management (IAM) and Access Control Lists (ACLs) to configure access control. Which statement describes a user's access to objects in the bucket?
- The user has no access if IAM denies the permission.
- The user has access if either IAM or ACLs grant a permission.
- The user has no access if either IAM or ACLs deny a permission.
- The user only has access if both IAM and ACLs grant a permission.
✅ Correct Answer: The user has access if either IAM or ACLs grant a permission.
Why this is correct
In Cloud Storage, when both IAM and ACLs are enabled, access evaluation works as follows:
- Permissions are additive
- There is no explicit deny
- A user is granted access if ANY applicable policy allows it

This means: if IAM allows OR an ACL allows, access is granted.
Key rule to remember
Effective access = IAM permissions ∪ ACL permissions
So even if:
- IAM does not grant access but an ACL does → ✅ access
- An ACL does not grant access but IAM does → ✅ access
Why the other options are incorrect

❌ The user has no access if IAM denies the permission
- There is no explicit "deny" in this access model; lack of permission ≠ deny.

❌ The user has no access if either IAM or ACLs deny a permission
- Again, there is no explicit deny; permissions are additive.

❌ The user only has access if both IAM and ACLs grant a permission
- Incorrect; both are not required.
Exam rule of thumb
IAM + ACLs = additive permissions (logical OR)
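To make the additive rule concrete, a minimal sketch that grants read access to a single object through an ACL alone, using the google-cloud-storage Python client; the bucket, object, and user are hypothetical, and this only applies to buckets that still use fine-grained (ACL-based) access rather than uniform bucket-level access.

```python
# Requires: pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("cymbal-legacy-bucket")  # hypothetical, fine-grained access

# Grant object-level read via an ACL only; no IAM role is involved.
blob = bucket.blob("reports/2024-q1.csv")
blob.acl.user("analyst@example.com").grant_read()
blob.acl.save()

# Because IAM and ACL permissions are additive (a logical OR), this user can
# now read the object even without any IAM grant on the bucket or project.
```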
10. You have large amounts of data stored on Cloud Storage and BigQuery. Some of it is processed, but some is yet unprocessed. You have a data mesh created in Dataplex. You need to make it convenient for internal users of the data to discover and use the data. What should you do?
- Create a lake for Cloud Storage data and a zone for BigQuery data.
- Create a lake for unprocessed data and assets for processed data.
- Create a raw zone for the unprocessed data and a curated zone for the processed data.
- Create a lake for BigQuery data and a zone for Cloud Storage data.
✅ Correct Answer: Create a raw zone for the unprocessed data and a curated zone for the processed data.
Why this is the right approach
You already have a data mesh implemented with Dataplex, and your goal is to make data:
- Easy to discover
- Easy to understand
- Convenient for internal users to consume
Dataplex best practices recommend organizing data by data lifecycle and quality, not by storage system.
How Dataplex is meant to be structured
- Lake → represents a business domain (e.g., Retail, Sales)
- Zones → represent stages of data maturity
- Assets → point to actual data (BigQuery datasets, Cloud Storage buckets)

The canonical zone pattern is:

Raw zone
- Unprocessed / ingested data
- Source-aligned, minimal validation
- Not intended for broad consumption

Curated zone
- Cleaned, transformed, trusted data
- Business-ready
- Intended for analysts, dashboards, ML, and reporting

This makes it very clear to users:
- What data is safe and ready to use
- What data is still being prepared
Why the other options are incorrect

❌ Create a lake for Cloud Storage data and a zone for BigQuery data
- Lakes are not meant to separate data by storage type.

❌ Create a lake for unprocessed data and assets for processed data
- Lakes represent domains, not processing stages.

❌ Create a lake for BigQuery data and a zone for Cloud Storage data
- Same issue: the wrong abstraction.
Exam rule of thumb
Dataplex data organization → Raw zone → Curated zone
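A rough sketch of creating raw and curated zones inside an existing Dataplex lake with the google-cloud-dataplex Python client; the project, location, and lake IDs are hypothetical, and the exact zone fields should be verified against the current client library.

```python
# Requires: pip install google-cloud-dataplex
from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1/lakes/retail-lake"  # hypothetical

def make_zone(zone_type):
    # Single-region zone of the given maturity type (RAW or CURATED).
    return dataplex_v1.Zone(
        type_=zone_type,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    )

# Raw zone for unprocessed data, curated zone for processed, business-ready data.
client.create_zone(
    parent=parent, zone_id="raw", zone=make_zone(dataplex_v1.Zone.Type.RAW)
).result()
client.create_zone(
    parent=parent, zone_id="curated", zone=make_zone(dataplex_v1.Zone.Type.CURATED)
).result()
```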
11. Cymbal Retail collects large amounts of data that is useful for improving business operations. The company wants to store and analyze this data in a serverless and cost-effective manner using Google Cloud. The analysts need to use SQL to write the queries. What tool can you use to meet these requirements?
- Data Fusion
- Spanner
- BigQuery
- Memorystore
✅ Correct Answer: BigQuery
Why BigQuery is the right tool
Your requirements are very clear:
- Serverless (no infrastructure management)
- Cost-effective for large-scale analytics
- Large amounts of data
- SQL-based analysis for analysts
- Designed for business analytics

BigQuery is Google Cloud's fully managed, serverless data warehouse, built exactly for this use case.
What BigQuery provides
- Serverless architecture (no servers, clusters, or tuning)
- Pay-per-query or capacity-based pricing → cost control
- Standard SQL support
- Scales automatically to petabytes of data
- Integrates with BI tools and ML (BigQuery ML)
Why the other options are incorrect

❌ Data Fusion
- ETL / pipeline creation tool
- Not a data warehouse or analytics engine

❌ Spanner
- Globally distributed transactional database (OLTP)
- Not cost-effective for analytical workloads

❌ Memorystore
- In-memory cache (Redis / Memcached)
- Not for analytics or SQL querying
Exam rule of thumb
Serverless + SQL analytics + large data → BigQuery
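A minimal sketch of the analyst workflow: running a standard SQL aggregation with the google-cloud-bigquery Python client, with no infrastructure to provision; the table and columns are hypothetical.

```python
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client()

# An analyst-style aggregation over a hypothetical orders table;
# no clusters or servers need to be provisioned to run it.
query = """
    SELECT store_id, SUM(amount) AS total_sales
    FROM `my-project.retail.orders`
    WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
    GROUP BY store_id
    ORDER BY total_sales DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.store_id, row.total_sales)
```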
12. Cymbal Retail also collects large amounts of structured, semistructured, and unstructured data. The company wants a centralized repository to store this data in a cost-effective manner using Google Cloud. What tool can you use to meet these requirements?
- Cloud SQL
- Cloud Storage
- Bigtable
- Dataflow
✅ Correct Answer: Cloud Storage
Why Cloud Storage is the right choice
Your requirements are:
- Large amounts of data
- Structured, semi-structured, and unstructured data
- Centralized repository
- Cost-effective
- Google Cloud native

Cloud Storage is designed exactly for this use case and acts as a data lake on Google Cloud.
What Cloud Storage provides
- Supports all data types: structured (CSV, Parquet, Avro), semi-structured (JSON, XML), and unstructured (images, videos, logs)
- Highly durable and scalable
- Multiple cost-optimized storage classes
- Serverless (no infrastructure to manage)
- Integrates with BigQuery, Dataflow, Dataproc, and Dataplex
This makes Cloud Storage the recommended centralized storage layer.
Why the other options are incorrect

❌ Cloud SQL
- Relational database
- Not designed for large-scale or unstructured data

❌ Bigtable
- Optimized for low-latency key-value access
- Not a general-purpose data lake

❌ Dataflow
- Data processing service
- Does not store data
Exam rule of thumb
Centralized, cost-effective storage for all data types → Cloud Storage
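A minimal sketch of using one Cloud Storage bucket as the centralized repository for all three data shapes, via the google-cloud-storage Python client; the bucket name, object paths, and local files are hypothetical.

```python
# Requires: pip install google-cloud-storage
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("cymbal-data-lake")  # hypothetical central bucket

# The same bucket holds structured, semi-structured, and unstructured data.
bucket.blob("structured/orders/2024-06-01.parquet").upload_from_filename("orders.parquet")
bucket.blob("semi_structured/clickstream/2024-06-01.json").upload_from_filename("events.json")
bucket.blob("unstructured/images/storefront.jpg").upload_from_filename("storefront.jpg")

# Downstream tools (BigQuery, Dataflow, Dataproc, Dataplex) can read directly
# from these paths; here we just list what landed in the structured prefix.
for blob in client.list_blobs("cymbal-data-lake", prefix="structured/"):
    print(blob.name, blob.size)
```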