
Good News!!! The Professional-Data-Engineer Google Professional Data Engineer Exam is now stable, with passing results.

Professional-Data-Engineer Practice Exam Questions and Answers

Google Professional Data Engineer Exam

Last Update: 2 days ago
Total Questions: 372

The Google Professional Data Engineer Exam is stable now, with all the latest exam questions added 2 days ago. Incorporating Professional-Data-Engineer practice exam questions into your study plan is more than just a preparation strategy.

Professional-Data-Engineer exam questions often include scenarios and problem-solving exercises that mirror real-world challenges. Working through Professional-Data-Engineer dumps allows you to practice pacing yourself, ensuring that you can complete the entire Google Professional Data Engineer Exam practice test within the allotted time frame.

Professional-Data-Engineer PDF

Professional-Data-Engineer PDF (Printable)
$43.75
$124.99

Professional-Data-Engineer Testing Engine

Professional-Data-Engineer Testing Engine
$50.75
$144.99

Professional-Data-Engineer PDF + Testing Engine

Professional-Data-Engineer PDF + Testing Engine
$63.70
$181.99
Question # 1

You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given day. The model will be based on multiple attributes collected from multiple sources. Telemetry data, including location in GeoJSON format, will be pulled from each ship and loaded every hour. You want to have a dashboard that shows how many and which ships are likely to cause delays within a region. You want to use a storage solution that has native functionality for prediction and geospatial processing. Which storage solution should you use?

Options:

A.  

BigQuery

B.  

Cloud Bigtable

C.  

Cloud Datastore

D.  

Cloud SQL for PostgreSQL

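To illustrate the "native prediction and geospatial processing" part of this scenario, here is a minimal, hypothetical sketch of how BigQuery can parse GeoJSON and train a model in place with BigQuery ML. All project, dataset, table, and column names are invented for illustration.

```python
# Hypothetical sketch: BigQuery geospatial functions plus BigQuery ML,
# so prediction and GeoJSON handling stay inside the storage system.
from google.cloud import bigquery

client = bigquery.Client()

# Parse the hourly GeoJSON telemetry into a native GEOGRAPHY column.
client.query("""
    CREATE OR REPLACE TABLE shipping.telemetry_geo AS
    SELECT
      ship_id,
      event_time,
      ST_GEOGFROMGEOJSON(location_geojson) AS location
    FROM shipping.telemetry_raw
""").result()

# Train a delay-prediction model in place with BigQuery ML.
client.query("""
    CREATE OR REPLACE MODEL shipping.delay_model
    OPTIONS (model_type = 'logistic_reg',
             input_label_cols = ['is_delayed']) AS
    SELECT region, speed_knots, port_congestion, is_delayed
    FROM shipping.training_features
""").result()
```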
Question # 2

You’ve migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data is in Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you’d like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you’d like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.

What should you do?

Options:

A.  

Increase the size of your Parquet files to ensure that they are at least 1 GB each.

B.  

Switch to the TFRecords format (approx. 200 MB per file) instead of Parquet files.

C.  

Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.

D.  

Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.

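As background for the Parquet file-size discussion in the options above, here is a minimal PySpark sketch (the kind of job you could run on Dataproc) that compacts many small Parquet inputs into fewer, larger files. The bucket paths and partition count are hypothetical.

```python
# Hypothetical PySpark sketch: compact many 200-400 MB Parquet inputs into
# fewer, larger files before the shuffle-heavy analytical stages.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-parquet").getOrCreate()

df = spark.read.parquet("gs://example-bucket/raw/")

# Choose the partition count so each output file lands near the 1 GB range
# (roughly: total input size divided by 1 GB).
df.repartition(40).write.mode("overwrite").parquet(
    "gs://example-bucket/compacted/")
```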
Question # 3

Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)

Options:

A.  

Create a new view over events using standard SQL

B.  

Create a new partitioned table using a standard SQL query

C.  

Create a new view over events_partitioned using standard SQL

D.  

Create a service account for the ODBC connection to use for authentication

E.  

Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and shared “events”

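For context on the standard SQL options above, this is a minimal sketch of recreating a 14-day view over a partitioned table with the BigQuery Python client, which uses standard SQL for view definitions by default. The project, dataset, and table names are hypothetical.

```python
# Hypothetical sketch: a 14-day view over the partitioned table, defined in
# standard SQL (the default for view_query in the Python client).
from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("my-project.analytics.events")
view.view_query = """
    SELECT *
    FROM `my-project.analytics.events_partitioned`
    WHERE _PARTITIONTIME >=
          TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
"""
client.create_table(view, exists_ok=True)
```

An ODBC driver configured for standard SQL could then read this view; how the connection authenticates (for example, with a service account) is outside this sketch.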
Question # 4

You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be made available for the initial data load to the cloud. The files being transferred are not large in number, but each file is 90 GB. Additionally, you want your transactional systems to continually update the warehouse on Google Cloud in real time. What tools should you use to migrate the data and ensure that it continues to write to your warehouse?

Options:

A.  

Storage Transfer Service for the migration, Pub/Sub and Cloud Data Fusion for the real-time updates

B.  

BigQuery Data Transfer Service for the migration, Pub/Sub and Dataproc for the real-time updates

C.  

gsutil for the migration; Pub/Sub and Dataflow for the real-time updates

D.  

gsutil for both the migration and the real-time updates

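To make the two halves of this scenario concrete, here is a hedged sketch: a one-time bulk copy with gsutil, followed by a minimal Apache Beam pipeline that streams change events from Pub/Sub into BigQuery on Dataflow. The bucket, topic, table names, and schema are hypothetical.

```python
# Hypothetical sketch: a one-time bulk copy of the 90 GB export files,
# then a streaming pipeline for the ongoing transactional updates.
import subprocess

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Bulk load: parallel composite uploads help with files of this size.
subprocess.run(
    ["gsutil", "-m",
     "-o", "GSUtil:parallel_composite_upload_threshold=150M",
     "cp", "-r", "/warehouse/exports", "gs://example-migration-bucket/"],
    check=True,
)

# Real-time updates: Pub/Sub -> Dataflow -> BigQuery.
options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/warehouse-updates")
     | beam.Map(lambda msg: {"payload": msg.decode("utf-8")})
     | beam.io.WriteToBigQuery("my-project:warehouse.updates",
                               schema="payload:STRING"))
```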
Question # 5

An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible. What should you do?

Options:

A.  

Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.

B.  

Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source

C.  

Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

D.  

Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format

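As an illustration of the custom-connector idea referenced in the options, here is a minimal Apache Beam sketch in which a DoFn stands in for a decoder of the proprietary flight-data format. The parsing logic, topic, table, and schema are hypothetical.

```python
# Hypothetical sketch: a custom DoFn stands in for a connector that decodes
# the proprietary flight-data format before streaming rows into BigQuery.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class ParseProprietaryFormat(beam.DoFn):
    def process(self, record):
        # Placeholder decode step; real parsing depends on the vendor format.
        fields = record.decode("utf-8").split("|")
        yield {"tail_number": fields[0], "altitude_ft": int(fields[1])}


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/flight-telemetry")
     | beam.ParDo(ParseProprietaryFormat())
     | beam.io.WriteToBigQuery(
           "my-project:aerospace.flight_data",
           schema="tail_number:STRING,altitude_ft:INTEGER"))
```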
Question # 6

You have data located in BigQuery that is used to generate reports for your company. You have noticed that some weekly executive report fields do not conform to company format standards; for example, report errors include different telephone formats and different country code identifiers. This is a frequent issue, so you need to create a recurring job to normalize the data. You want a quick solution that requires no coding. What should you do?

Options:

A.  

Use Cloud Data Fusion and Wrangler to normalize the data, and set up a recurring job.

B.  

Use BigQuery and GoogleSQL to normalize the data, and schedule recurring queries in BigQuery.

C.  

Create a Spark job and submit it to Dataproc Serverless.

D.  

Use Dataflow SQL to create a job that normalizes the data, and after the first run of the job, schedule the pipeline to execute recurrently.

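For readers who want to see what the recurring normalization itself might look like, here is a hypothetical GoogleSQL sketch run through the BigQuery Python client; the no-code alternatives in the options build equivalent transforms visually. Table and column names are invented.

```python
# Hypothetical sketch of the normalization itself, expressed as GoogleSQL
# and run through the BigQuery Python client.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE OR REPLACE TABLE reporting.weekly_exec_clean AS
    SELECT
      * EXCEPT (phone, country_code),
      -- Keep digits only so every telephone number shares one format.
      REGEXP_REPLACE(phone, r'[^0-9]', '') AS phone_digits,
      -- Normalize country code identifiers to a single case.
      UPPER(country_code) AS country_code
    FROM reporting.weekly_exec_raw
""").result()
```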
Question # 7

You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also noticed that the pipeline graph was automatically optimized by Dataflow and merged into one step. You want to identify where the potential bottleneck is occurring. What should you do?

Options:

A.  

Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.

B.  

Log debug information in each ParDo function, and analyze the logs at execution time.

C.  

Insert output sinks after each key processing step, and observe the writing throughput of each block.

D.  

Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks

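To show how fusion can be broken so that per-step metrics become visible in the Dataflow console, here is a minimal Apache Beam sketch with Reshuffle inserted between steps. The transforms, topic, table, and schema are hypothetical.

```python
# Hypothetical sketch: Reshuffle between steps prevents Dataflow from fusing
# them into one stage, so the console shows per-step execution details.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(
           topic="projects/my-project/topics/events")
     | "Parse" >> beam.Map(lambda b: b.decode("utf-8"))
     | "BreakFusion1" >> beam.Reshuffle()
     | "Enrich" >> beam.Map(lambda s: {"payload": s.upper()})
     | "BreakFusion2" >> beam.Reshuffle()
     | "Write" >> beam.io.WriteToBigQuery(
           "my-project:monitoring.events", schema="payload:STRING"))
```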
Question # 8

You need to migrate a Redis database from an on-premises data center to a Memorystore for Redis instance. You want to follow Google-recommended practices and perform the migration for minimal cost, time, and effort. What should you do?

Options:

A.  

Make a secondary instance of the Redis database on a Compute Engine instance, and then perform a live cutover.

B.  

Write a shell script to migrate the Redis data, and create a new Memorystore for Redis instance.

C.  

Create a Dataflow job to read the Redis database from the on-premises data center, and write the data to a Memorystore for Redis instance.

D.  

Make an RDB backup of the Redis database, use the gsutil utility to copy the RDB file into a Cloud Storage bucket, and then import the RDB file into the Memorystore for Redis instance.

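As a concrete illustration of the RDB-based approach mentioned in the options, here is a hedged sketch driven from Python via subprocess. The backup path, bucket, instance name, and region are hypothetical; verify the gcloud flags against current documentation before relying on them.

```python
# Hypothetical sketch: copy an RDB snapshot to Cloud Storage, then import it
# into the Memorystore for Redis instance.
import subprocess

# Copy the on-premises RDB backup into a bucket.
subprocess.run(
    ["gsutil", "cp", "/backups/redis/dump.rdb",
     "gs://example-redis-migration/dump.rdb"],
    check=True,
)

# Import the snapshot into the target instance.
subprocess.run(
    ["gcloud", "redis", "instances", "import",
     "gs://example-redis-migration/dump.rdb",
     "example-memorystore-instance", "--region=us-central1"],
    check=True,
)
```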
Question # 9

Your team is building a data lake platform on Google Cloud. As a part of the data foundation design, you are planning to store all the raw data in Cloud Storage. You are expecting to ingest approximately 25 GB of data a day, and your billing department is worried about the increasing cost of storing old data. The current business requirements are:

• The old data can be deleted anytime

• You plan to use the visualization layer for current and historical reporting

• The old data should be available instantly when accessed

• There should not be any charges for data retrieval.

What should you do to optimize for cost?

Options:

A.  

Create the bucket with the Autoclass storage class feature.

B.  

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to nearline, 90 days to coldline, and 365 days to archive storage class. Delete old data as needed.

C.  

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to coldline, 90 days to nearline, and 365 days to archive storage class. Delete old data as needed.

D.  

Create an Object Lifecycle Management policy to modify the storage class for data older than 30 days to nearline, 45 days to coldline, and 60 days to archive storage class. Delete old data as needed.

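Since several options above rely on Object Lifecycle Management, here is a minimal sketch of setting storage-class transition rules with the Cloud Storage Python client. The bucket name and age thresholds are hypothetical and should be weighed against the retrieval-cost requirement in the question.

```python
# Hypothetical sketch: storage-class transition rules on the raw-data bucket
# using the Cloud Storage Python client.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-data-lake-raw")

bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", age=365)
bucket.patch()
```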
Question # 10

You work for a large ecommerce company. You are using Pub/Sub to ingest the clickstream data to Google Cloud for analytics. You observe that when a new subscriber connects to an existing topic to analyze data, they are unable to subscribe to older data. For an upcoming yearly sale event in two months, you need a solution that, once implemented, will enable any new subscriber to read the last 30 days of data. What should you do?

Options:

A.  

Create a new topic, and publish the last 30 days of data each time a new subscriber connects to an existing topic.

B.  

Set the topic retention policy to 30 days.

C.  

Set the subscriber retention policy to 30 days.

D.  

Ask the source system to re-push the data to Pub/Sub, and subscribe to it.

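As background on topic-level retention referenced in the options, here is a minimal sketch that enables 30-day message retention on a topic via gcloud (wrapped in Python). The project and topic names are hypothetical; new subscriptions can then seek back over messages published within the retention window.

```python
# Hypothetical sketch: enable 30-day message retention on the clickstream
# topic so newly created subscriptions can seek back over older data.
import subprocess

subprocess.run(
    ["gcloud", "pubsub", "topics", "update", "clickstream-events",
     "--project=my-project",
     "--message-retention-duration=30d"],
    check=True,
)
```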
Get Professional-Data-Engineer dumps and pass your exam in 24 hours!
