Power BI – The Data Model – An Introduction to Relationships

An Introduction to Relationships

Previous Session Summary

In the previous lesson, we covered how to enter the PowerQuery editor for the first time by highting some Purchase Order Data in our current workbook and then going to the Data tab and then going to From Table/Range.
We then looked at the different components of the PowerQuery editor being the Preview Screen, the Query Pane, the Tabs and Ribbon and the Query Settings Pane.

Lesson Synopsis

In this lesson we start looking at some of the basic steps that we or PowerQuery automatically use to transform our source data from its original ‘dirty’ state, to clean columnar data. Now before we start off in this lesson, one very simple, important point:
You cannot edit individual cells in the PowerQuery Editor. You can only perform transformations on whole columns, rows or tables.
In this lesson then we’re going to continue working with the PO data from the last lesson. We’re going to look at a couple of transformation steps which PowerQuery has added automatically and then we’re going to put in a few of our own, including removing some unnecessary rows, promoting the top row into the Header row and then loading our nice, neat columnar data into Excel.

Resource Listing

The Video

Play Video

Lesson Notes

In the next few lessons we’re going to look at how to create relationships between datasets in our data model. Now this is a very important topic in Power BI so first of all, what is a relationship?

Well, a relationship connects two or more data tables together using a single column of data which is called a Key and being able to create relationships is very exciting as they allow us to:

  1. Link our datasets together to act like one big dataset
  2. Create filters that manipulate more than one dataset
  3. Perform lookups of information from one dataset to another more easily

This might all sound a little complicated but actually the concept of a relationship is one that you’ve unconsciously been aware of for years if you’ve been using HLOOKUPS, VLOOKUPS or INDEXMATCHES as part of. In each of those formulas you identify a piece of data common in both datasets to  bring information from one report and put it into another.

Switch to Data tab in Power BI

Now as we can see in the Data tab of the Power BI screen, we already have sales data in our Power BI file. It’s this one here marked Retail Analysis showing the sales of each unit in each store as well as the price and margin but the data’s not particularly rich or easy to interrogate. It doesn’t for example show where the product was sold, who the product buyer was, what category the product belongs to and so on.

 This is what the other tables bring in, and these are called Dimension tables. Let’s have a look at those.

First up we have the District report and this shows us the District Manager responsible for each region of our stores. We can see that we have a district name and a district ID as well as the name of the District Manager.

Next we have the Item dataset. This shows us the person responsible for buying each item, the category that each item belongs to as well as the category segment and ItemID which are the same as the SKU numbers that we have in the Retail analysis tab.

Thirdly we have a Store information which shows information about each store where we have sold products we can see that we have a LocationID, the name of the city or town, county and country where the store is held, when the store was opened and a few other useful pieces of information.

As it stands, before Power BI, if we want to say do some analysis of our sales say by District and by Product category then we’d need to do a couple of things.

Cut to Excel

Ok, here I am in an Excel file and this file shows the Retail analysis tab here, the District, Item and Store information respectively on these tabs.

In order to do our analysis, we need to create a static table or pivot table (which I prefer) of the information in the Retail analysis tab, but at the moment it doesn’t have all the information that we need. We need to bring into it,

The Category field from the tab Item and the DistrictName column from the tab Store. These I can do with a couple of Vlookups or indexmatches

  1. =VLOOKUP(B2,Item!$A$2:$C$364185,3,FALSE)
  2. =VLOOKUP(C2,Store!$A$2:$F$103,6,FALSE)

That wasn’t too painful but it could take a while on a less powerful computer and of course if you were doing this live, so you were receiving new sales results each day you’d have to do that lookup each day too as well as bringing in any other information from those other tables that you might want to analyse. From here of course we cold create a pivot table and cut the data however we want.

Not so with Power BI. No VLOOKING up required, we just need to make sure that both our reports are in the data model and that we’ve created a relationship between the two reports.

So here we are back in Power BI and in our Model View. First of all we’re going to create a relationship between the Retail analysis table and our Store Dimension table.

To create a relationship, we need to find a piece of data that’s present in both datasets. That for us is going to be the Location ID, because as we can see in the field pane, the Location ID is present in both datasets.

So why do we need a piece of data that’s present in both datasets? Well think of it like the VLOOKUP that we just performed in the Excel file. Here we go, here’s the VLOOPKUP formula that we just put into the last file to put the ItemCategory column form the Store table and put it into the Retail analysis table. Here we have an extract from the Retail Analysis table and here from the Item table.

The first parameter of the VLOOKUP says to Excel, ok look in Column B of the Retail Analysis table which is ItemID, then go and look in the Item table in column A and find me that same piece of data. When you’ve found that piece of data in Column A, go 3 columns across and grab the piece of data you find in that column and put it back in the retail analysis table.

We are therefore creating a sort of relationship that only exists in this vlookup between the column ItemID in the Retail Analysis table and ItemID in the Item table. We can also therefore call these two columns, our “keys”.

We can create the same relationship in Power BI but there’s no need to do VLOOKUPS. Once you have created the relationship, you have access to all the dataset all the time. Our common piece of data, our key therefore acts as a link between the two datasets and acts to bring them together into one combined dataset.

So how do we create our relationship?

On the Home tab in the relationships section of the ribbon, I’ll select the button Manage relationships. That then brings up our Manage Relationships menu and we’ve got a couple of options here. So first I’ll select New, and then we get a Create Relationship menu up which is very similar to the Merge and Append functions in Stage 1. So then to create our relationship we select the two tables between which we want to create the relationship.  We’ll create the same relationship that the VLOOKUP created so the relationship between ItemID in the Item dataset and Item ID in the Retail Analysis dataset. I’ll select the Item dataset from the drop down menu and then highlight tuhe column Item ID. Next in tuhe drop down menu below I’ll select the Retail Analysis dataset and the ItemID column.

Now there’s a few other bits and pieces below which we’ll get to in later lessons, but for the moment I’m going to select OK to create our first relationship. Ok that brings us back to the Manage Relationships screen and we can see that we now have a relationship between Item and Retail Analysis. Ok now I can press close.

Ok that’s our relationship created. So let’s look at our Model View to make it official. Here we go, we can now see a line joining the two datasets between which we’ve just created a relationship, Item and Retail Analysis. If I hover over the line then we can see that it turns yellow and the two fields that link the two datasets also turn yellow.

Ok next lesson we’re going to delve deeper into how this relationship works…

.... and a little more!

The Formula Bar – The Formula Bar does not automatically come up in the PowerQuery Editor the first time that you enter the Editor in Excel. To bring the Formula Bar up, go to the View tab. In the Layout section of the Ribbon, tick the Formula Bar box.

Remove rows – While we remove the unnecessary top rows in the video, many system reports also have totals lines at the bottom of the reports which can cause a problem when we’re doing things like summing our data later once we’re finished in PowerQuery. We also have the option to remove a specific number of bottom rows using the Remove Rows drop down in the Home tab, as well as removing alternative rows, duplicate rows, rows with errors and blank rows.

Use First Row as Headers – In the video we promote the first row in the table into the headers row. However we can also take the Headers and put them into the first row. To do this go to the same Use First Row as Headers drop down in the Home or Transform tab and select Use Headers as First Row.