Why do I need a Data Warehouse if I have Power BI?

Power BI is often the first choice at the early stage of Business Intelligence projects. With every new project, more customers ask the same question: why do we need a Data Warehouse if Power Query is available in Power BI? Let’s discuss it!

Memory Consumption of the Power BI Service

Any data transformation on a large data volume can generate high memory and CPU usage in the Power BI service, degrading the performance of your workspace and affecting other business users exploring Power BI reports. Ultimately, slower reporting significantly reduces overall business user satisfaction.
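To make this concrete, below is a minimal Power Query (M) sketch of the kind of step that drives memory pressure; the file path and column name are illustrative assumptions:

let
    Source = Csv.Document(File.Contents("C:\data\Sales.csv"), [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Table.Buffer materializes the whole table in the mashup engine's memory;
    // on a large file this alone can exhaust the evaluation container.
    Buffered = Table.Buffer(Promoted),
    // Sorting a buffered table is also evaluated entirely in memory.
    Sorted = Table.Sort(Buffered, {{"Amount", Order.Descending}})
in
    Sorted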

Additionally, it is worth noting that some data processing (especially on big datasets) cannot complete successfully in the Power BI service because of limitations like the Mashup Evaluation Container process limitation (commit size), which can stop your Power BI data processing with the error below. This limitation cannot be easily managed by a Power BI administrator alone, because it requires scaling up your Power BI Premium capacity (with the corresponding cost impact).

The M evaluation exceeded the memory limit. 
To address the issue consider optimizing the M expressions, reducing the concurrency of operations that are memory intensive or upgrading to increase the available memory.

Heavy data transformations can instead be performed in a database or in Databricks.
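One option, if your source supports native queries, is to hand the heavy lifting to the database engine directly from Power Query. A minimal sketch, assuming an illustrative Azure SQL server, database, and fact table:

let
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    // The aggregation runs on the database engine, not in the mashup engine;
    // EnableFolding lets subsequent steps fold on top of the native query.
    Aggregated = Value.NativeQuery(
        Source,
        "SELECT CustomerId, SUM(Amount) AS TotalAmount FROM dbo.FactSales GROUP BY CustomerId",
        null,
        [EnableFolding = true]
    )
in
    Aggregated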

Query Folding

Query folding allows Power Query to generate a single query statement that retrieves and transforms source data. Such a query is executed on the source compute engine, meaning there is no data transformation processing in the Power Query mashup engine, which is why it is so efficient. Query folding is available only for sources such as relational databases, OData feeds (including SharePoint lists), Exchange, and Active Directory, and only for a limited set of transformations. When query folding is used, data transformation processing is pushed down to the source, reducing memory and CPU usage in the Power BI service. Query folding does not work for data sources like flat files, blobs, etc.
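As a sketch of what a foldable query looks like (server, database, and table names are illustrative), both steps below translate into a single SELECT ... WHERE statement executed by the database:

let
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // This row filter folds into a WHERE clause on the source database.
    Filtered = Table.SelectRows(FactSales, each [OrderDate] >= #date(2023, 1, 1)),
    // Column selection folds into the SELECT list.
    Selected = Table.SelectColumns(Filtered, {"OrderDate", "CustomerId", "Amount"})
in
    Selected

In Power Query Editor you can right-click the last applied step and choose View Native Query to verify that the steps folded into one statement; if the option is greyed out, folding has been broken.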

Single Source of Truth

A database allows you to store terabytes of data and create a single platform for certified data, eliminating data silos and increasing data reliability. A centralized data platform ensures high data quality, maintained by the Business Intelligence team, and allows that data to be shared across the company. Once most KPIs are defined, tested, and accepted in the database, they can be re-used in multiple Power BI datasets by different business departments (see the sketch after the table). Power BI, by contrast, is constrained by the following maximum model size limits:

Model Size Limit:
Power BI Pro – 1 GB
Power BI Premium Per User – 100 GB
Power BI Premium Per Capacity – 400 GB
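For example, every departmental dataset can connect to the same certified object instead of re-implementing the logic. A minimal sketch, assuming a hypothetical certified view rpt.vwCertifiedRevenueKPI:

let
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    // The view is assumed to encapsulate the tested, accepted KPI logic;
    // each dataset reads the same definition, so numbers match across reports.
    CertifiedKPIs = Source{[Schema = "rpt", Item = "vwCertifiedRevenueKPI"]}[Data]
in
    CertifiedKPIs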

Audit of Data Preparation

There is no built-in functionality in Power Query to track the number of inserted, updated, and deleted records. Data Warehouse and ETL tools significantly increase the transparency of data processing, which is extremely important for meeting compliance regulations.

Incremental Refresh

Incremental refresh is not recommended (and possible only via custom query functions) for data sources that do not support query folding; attempting it produces the warning below. It works best with relational data sources like SQL Database or Azure Synapse (see the sketch after the warning).

Unable to confirm if the M query can be folded. 
It is not recommended to use incremental refresh with non-foldable queries.
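For a foldable source, the standard pattern filters on the reserved RangeStart and RangeEnd datetime parameters that Power BI substitutes for each refresh partition. A minimal sketch, with an illustrative fact table whose OrderDate column is assumed to be of type datetime:

let
    Source = Sql.Database("myserver.database.windows.net", "SalesDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // The filter folds to a WHERE clause, so each partition pulls only its
    // own date range instead of reloading the entire table.
    Filtered = Table.SelectRows(
        FactSales,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered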

Data Lineage of Power Query

It is much easier to manage dependencies and the order of data transformations at enterprise scale in a Data Warehouse than in Power Query. Additionally, you cannot refresh a single table in the Power BI service during a dataset refresh: all source tables used in its data transformations are refreshed too, which significantly increases the total refresh time.

Composite Model & Real Time Analytics

A Composite Model allows both DirectQuery and Import connections in a single dataset. This approach is commonly used for big data volumes or near-real-time analytics.

♦ big data volume – aggregated data is stored in import mode (e.g. refreshed once per day), while individual transaction data stays in the database. Import mode ensures fast response times for commonly used data, and only some queries (single-transaction requests) are executed against the database. The Power BI model size is significantly reduced.

♦ near-real-time analytics – aggregated data is stored in import mode (e.g. refreshed once per day), which provides fast response times, while frequently changing data stays in the database. Those reporting requests are executed directly against the source database, reducing the data latency tied to the Power BI data refresh schedule (ref: Refresh Rate limits below).

Refresh Rate:
Power BI Pro – 8/day
Power BI Premium Per User – 48/day
Power BI Premium Per Capacity – 48/day

The data sources supporting DirectQuery and Live Connection are limited. DirectQuery can be used for databases, Azure Data Explorer, Databricks, etc.; flat files and blobs, however, are not supported. Power BI data sources and their corresponding connection types are listed here.

Summary

If your data is expected to scale up quickly, it is strongly recommended to process it outside Power BI to avoid performance issues and to gain richer data transformation capabilities. It is extremely important to keep the Power BI reporting platform fast and stable for end users.