Databricks Certified Data Analyst Associate – Part I

Section 1: Databricks SQL
● Describe the key audience and side audiences for Databricks SQL.
● Describe that a variety of users can view and run Databricks SQL dashboards as
stakeholders.
● Describe the benefits of using Databricks SQL for in-Lakehouse platform data processing.

https://learn.microsoft.com/en-us/azure/databricks/lakehouse

● Describe Unity Catalog.

https://www.mssqltips.com/sqlservertip/7827/databricks-unity-catalog-for-unified-data-governance


● Describe how to complete a basic Databricks SQL query.
● Identify Databricks SQL queries as a place to write and run SQL code.
● Identify the information displayed in the schema browser from the Query Editor page.
● Identify Databricks SQL dashboards as a place to display the results of multiple queries at
once.
● Describe how to complete a basic Databricks SQL dashboard.
● Describe how dashboards can be configured to automatically refresh.
● Describe the purpose of Databricks SQL endpoints/warehouses.
● Identify Serverless Databricks SQL endpoint/warehouses as a quick-starting option.
● Describe the trade-off between cluster size and cost for Databricks SQL
endpoints/warehouses.
● Identify Partner Connect as a tool for implementing simple integrations with a number of
other data products.
● Describe how to connect Databricks SQL to ingestion tools like Fivetran.
● Identify the need to be set up with a partner account in order to use Partner Connect.
● Identify small-file upload as a solution for importing small text files like lookup tables and
quick data integrations.
● Import from object storage using Databricks SQL.
● Identify that Databricks SQL can ingest directories of files if the files are the same type.
● Describe how to connect Databricks SQL to visualization tools like Tableau, Power BI, and
Looker.
● Identify Databricks SQL as a complementary tool for BI partner tool workflows.
● Describe the medallion architecture as a sequential data organization and pipeline system
of progressively cleaner data.
● Identify the gold layer as the most common layer for data analysts using Databricks SQL.
● Describe the cautions and benefits of working with streaming data.
● Identify that the Lakehouse allows the mixing of batch and streaming workloads.
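
The ingestion objectives above (small-file upload, importing from object storage, ingesting directories of same-type files) come down to a small amount of SQL once a warehouse is running. A minimal sketch — the bucket path and table name are hypothetical, and the `read_files` table-valued function assumes a recent Databricks Runtime:

```sql
-- Create a bronze table from a directory of files in object storage.
-- All files in the directory must be the same type (here, JSON).
CREATE TABLE bronze_events AS
SELECT * FROM read_files(
  's3://my-bucket/events/',   -- hypothetical path
  format => 'json'
);
```

Small-file upload (e.g. a CSV lookup table) is done through the UI instead and produces a table that can be queried the same way.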
Section 2: Data Management
● Describe Delta Lake as a tool for managing data files.


● Describe that Delta Lake manages table metadata.
● Identify that Delta Lake tables maintain history for a period of time.
● Describe the benefits of Delta Lake within the Lakehouse.
● Describe persistence and scope of tables on Databricks.
● Compare and contrast the behavior of managed and unmanaged tables.
● Identify whether a table is managed or unmanaged.
● Explain how the LOCATION keyword changes the default location of database contents.
● Use Databricks to create, use, and drop databases, tables, and views.
● Describe the persistence of data in a view and a temp view.
● Compare and contrast views and temp views.
● Explore, preview, and secure data using Data Explorer.
● Use Databricks to create, drop, and rename tables.
● Identify the table owner using Data Explorer.
● Change access rights to a table using Data Explorer.
● Describe the responsibilities of a table owner.
● Identify organization-specific considerations of PII data.
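
The managed-vs-unmanaged and `LOCATION` objectives above can be sketched as follows. All names and paths are hypothetical; the key behavioral difference is that dropping a managed table deletes its data files, while dropping an external (unmanaged) table removes only the metadata:

```sql
-- Managed table: Databricks controls the storage location;
-- DROP TABLE removes both metadata and data files.
CREATE TABLE sales_managed (id INT, amount DOUBLE);

-- Unmanaged (external) table: LOCATION points at existing storage;
-- DROP TABLE removes only the metadata.
CREATE TABLE sales_external (id INT, amount DOUBLE)
LOCATION 'abfss://container@account.dfs.core.windows.net/sales';  -- hypothetical path

-- LOCATION on a schema changes the default root for its managed tables.
CREATE SCHEMA analytics LOCATION '/mnt/analytics';  -- hypothetical path

-- Check whether a table is managed or external:
DESCRIBE TABLE EXTENDED sales_external;  -- see the "Type" row (MANAGED / EXTERNAL)
```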
Section 3: SQL in the Lakehouse
● Identify a query that retrieves data from the database with specific conditions.
● Identify the output of a SELECT query.
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-qry-select
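
A representative query for these two objectives, using a hypothetical `sales` table — note how `WHERE` filters rows before aggregation and `HAVING` filters after:

```sql
SELECT customer_id, SUM(amount) AS total_spend
FROM sales                       -- hypothetical table
WHERE order_date >= '2024-01-01'
  AND region = 'EMEA'
GROUP BY customer_id
HAVING SUM(amount) > 1000
ORDER BY total_spend DESC
LIMIT 10;
```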


● Compare and contrast MERGE INTO, INSERT TABLE, and COPY INTO.
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-merge-into
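
A side-by-side sketch of the three statements, with hypothetical table names and paths. The short version: `INSERT` appends unconditionally, `COPY INTO` is an idempotent file loader, and `MERGE INTO` is a conditional upsert:

```sql
-- INSERT INTO: append rows unconditionally.
INSERT INTO customers VALUES (1, 'Ada');

-- COPY INTO: incrementally load new files from object storage;
-- files already loaded are skipped on re-run (idempotent).
COPY INTO customers
FROM 's3://my-bucket/customers/'   -- hypothetical path
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true');

-- MERGE INTO: upsert — update matching rows, insert the rest.
MERGE INTO customers AS t
USING customer_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.name = s.name
WHEN NOT MATCHED THEN INSERT (id, name) VALUES (s.id, s.name);
```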


● Simplify queries using subqueries.
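
A small example of simplification with a subquery, against a hypothetical `order_totals` table. The same logic is shown twice: once as a scalar subquery and once as a common table expression, which is often easier to read:

```sql
-- Scalar subquery: customers spending above the overall average.
SELECT customer_id, total
FROM order_totals
WHERE total > (SELECT AVG(total) FROM order_totals);

-- Equivalent with a CTE:
WITH avg_total AS (SELECT AVG(total) AS a FROM order_totals)
SELECT o.customer_id, o.total
FROM order_totals o
CROSS JOIN avg_total
WHERE o.total > avg_total.a;
```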


● Compare and contrast different types of JOINs.
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-qry-select-join
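
The main join types, sketched against hypothetical `orders` and `customers` tables:

```sql
-- INNER JOIN: only rows with a match on both sides.
SELECT o.id, c.name
FROM orders o JOIN customers c ON o.customer_id = c.id;

-- LEFT (OUTER) JOIN: all orders, NULL name where no customer matches.
SELECT o.id, c.name
FROM orders o LEFT JOIN customers c ON o.customer_id = c.id;

-- LEFT ANTI JOIN: orders with no matching customer at all.
SELECT o.id
FROM orders o LEFT ANTI JOIN customers c ON o.customer_id = c.id;
```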


● Aggregate data to achieve a desired output, including the use of CUBE and ROLLUP.

https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-syntax-qry-select-groupby
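
`ROLLUP` and `CUBE` both extend `GROUP BY` with subtotal rows; the difference is which grouping combinations they produce. A sketch against a hypothetical `sales` table:

```sql
-- ROLLUP: subtotals along a hierarchy —
-- (region, country), (region), and a grand total.
SELECT region, country, SUM(amount) AS total
FROM sales
GROUP BY ROLLUP (region, country);

-- CUBE: subtotals for every combination of the grouping columns —
-- adds the (country)-only subtotal that ROLLUP omits.
SELECT region, country, SUM(amount) AS total
FROM sales
GROUP BY CUBE (region, country);
```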



● Use window functions.

https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/sql-ref-window-functions
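
Window functions compute a value per row over a partition without collapsing the rows, unlike `GROUP BY`. A sketch with a hypothetical `orders` table showing a ranking function and a running aggregate:

```sql
SELECT
  customer_id,
  order_id,
  amount,
  -- Rank orders by amount within each customer.
  RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rnk,
  -- Running total of spend per customer, in date order.
  SUM(amount) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM orders;
```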


● Identify a benefit of having ANSI SQL as the standard in the Lakehouse.
● Identify, access, and clean silver-level data.
● Utilize query history and caching to reduce development time and query latency.
● Optimize performance using higher-order Spark SQL functions.
● Create and apply UDFs in common scaling scenarios.
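
The last two objectives can be sketched together. Higher-order functions like `transform` operate on array columns with a lambda, avoiding an explode/re-aggregate round trip, and SQL UDFs package reusable scalar logic. Table and function names are hypothetical:

```sql
-- Higher-order function: apply a lambda to each element of an array column.
SELECT transform(prices, p -> p * 1.2) AS prices_with_tax
FROM quotes;   -- hypothetical table with an ARRAY<DOUBLE> column

-- SQL UDF: reusable scalar logic, callable like a built-in function.
CREATE FUNCTION add_tax(price DOUBLE)
RETURNS DOUBLE
RETURN price * 1.2;

SELECT add_tax(amount) AS amount_with_tax
FROM orders;
```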


Section 4: Data Visualization and Dashboarding
● Create basic, schema-specific visualizations using Databricks SQL.

https://docs.databricks.com/en/sql/user/visualizations/index.html


● Identify which types of visualizations can be developed in Databricks SQL (table, details,
counter, pivot).
● Explain how visualization formatting changes the reception of a visualization.
● Describe how to add visual appeal through formatting.
● Identify that customizable tables can be used as visualizations within Databricks SQL.
● Describe how different visualizations tell different stories.
● Create customized data visualizations to aid in data storytelling.
● Create a dashboard using multiple existing visualizations from Databricks SQL Queries.
● Describe how to change the colors of all of the visualizations in a dashboard.
● Describe how query parameters change the output of underlying queries within a
dashboard.
● Identify the behavior of a dashboard parameter.

https://docs.databricks.com/en/dashboards/tutorials/query-based-params.html
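
A query parameter is written with double-brace syntax inside the query text; Databricks SQL then renders a widget for it in the editor and on any dashboard that uses the query. A minimal sketch with a hypothetical `sales` table:

```sql
-- {{ region }} becomes an input widget; changing it re-filters the results.
SELECT order_id, amount
FROM sales
WHERE region = {{ region }};
```

For a Query Based Dropdown List, the widget's choices come from the distinct output of another saved query, e.g. one containing `SELECT DISTINCT region FROM sales`.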


● Identify the use of the “Query Based Dropdown List” as a way to create a query parameter
from the distinct output of a different query.
● Identify the method for sharing a dashboard with up-to-date results.
● Describe the pros and cons of sharing dashboards in different ways.
● Identify that users without permission to all queries, databases, and endpoints can easily
refresh a dashboard using the owner’s credentials.
● Describe how to configure a refresh schedule.

https://learn.microsoft.com/en-us/azure/databricks/sql/user/queries/schedule-query


● Identify what happens if a refresh rate is less than the Warehouse’s “Auto Stop” setting.
● Describe how to configure and troubleshoot a basic alert.
● Describe how notifications are sent when alerts are set up, based on the configuration.


Section 5: Analytics Applications
● Understand common statistical terms.

● Compare and contrast discrete and continuous statistics.


● Describe descriptive statistics.

● Describe statistical distributions.


● Compare and contrast key statistical measures.
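
Most of the key descriptive measures (count, mean, median, spread, percentiles) can be computed in a single query. A sketch against a hypothetical `orders` table; `MEDIAN` and `PERCENTILE` assume a recent Databricks Runtime:

```sql
-- Common descriptive statistics in one pass.
SELECT
  COUNT(amount)            AS n,
  AVG(amount)              AS mean,
  MEDIAN(amount)           AS median,
  STDDEV(amount)           AS std_dev,
  MIN(amount)              AS min_val,
  MAX(amount)              AS max_val,
  PERCENTILE(amount, 0.25) AS p25,
  PERCENTILE(amount, 0.75) AS p75
FROM orders;
```

Comparing mean to median, and standard deviation to the interquartile range (p75 − p25), is a quick way to spot skew and outliers.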


● Perform an ETL pipeline.
