Power BI – The Data Model – Controlling Multiple Datasets Using Relationships

First Transformations in the PowerQuery Editor

Previous Session Summary

In the previous lesson, we covered how to enter the PowerQuery editor for the first time by highting some Purchase Order Data in our current workbook and then going to the Data tab and then going to From Table/Range.
We then looked at the different components of the PowerQuery editor being the Preview Screen, the Query Pane, the Tabs and Ribbon and the Query Settings Pane.

Lesson Synopsis

In this lesson we start looking at some of the basic steps that we or PowerQuery automatically use to transform our source data from its original ‘dirty’ state, to clean columnar data. Now before we start off in this lesson, one very simple, important point:
You cannot edit individual cells in the PowerQuery Editor. You can only perform transformations on whole columns, rows or tables.
In this lesson then we’re going to continue working with the PO data from the last lesson. We’re going to look at a couple of transformation steps which PowerQuery has added automatically and then we’re going to put in a few of our own, including removing some unnecessary rows, promoting the top row into the Header row and then loading our nice, neat columnar data into Excel.

Resource Listing

The Video

Play Video

Lesson Notes

Hi there and welcome to Lesson 8 of the second stage of the Power BI pack and this is our final Relationships talk for the moment. Now as we talked about a couple of lessons ago we’re going to use relationships to control multiple datasets at once – we’re going to use our ProductDimTable to control the 2018 Sales Dataset and the 2017 Sales Dataset at once.

So here we are back in our Power BI file. Now to do this we need to do a couple of things. Firstly we need to import the 2017 Sales data and that’s straight forward enough: we do it in exactly the same way as the 2018 Sales Data. I’ll select Get Data, connect to the folder and off we go, but I’m going to do the rest of screen as you know how to connect to and clean data now.

Switch to Data View

Ok now we’ve got our 2017 Sales Data view and let’s have a quick look, and yes it’s exactly the same as the 2018 Sales Data view. So it’s the same report but just showing sales that were made in 2017, not 2018.

Ok then the next thing to do is create a relationship between our ProductDimTable and our newly imported 2017 Sales Data. Now of course to do this, we go to our Manage Relationships button, select New and then go through the same process as we did to connect the ProductDimTbale and the 2018 Sales Data table. We first select the ProductDimTable and then select the SKU column. We then select the 2017 Sales Data table and select the SKU column. Now finally I need to make sure that the Cardinality says Many to One and press OK.

Right now let’s look at the Model View. Now we can see that we have connected the ProductDimTable to the 2017 Sales Data table and the 2018 Sales Data tab. Now pay attention here – you see these arrows here on the lines connecting the datasets? These represent how the filter context flows. And that at the moment is going to mean absolutely nothing so let’s switch back into the report view and look at our visualisation again before we look at that statement again. Right let’s get rid of all of these fields and then bring back the table visualisation and put in the SKU field from the ProductDimTable. Ok Power BI has automatically summarised the dataset to a count so I’ll just select the drop down menu of the SKU in Values and select Don’t Summarize. Ok now we’ve got our SKU numbers.

Then next let’s put in the 2017 Sales Units and then the same from the 2018 report. Ok now look at this we can compare the 2017 Sales units to the 2018 Sales Units – easy!!!

Ok then why don’t we get rid of the SKU numbers – we don’t actually need that even if we have connected use this column to connect the two datasets. Let’s put in the Departments instead – another field in the ProductDimTable. Now we’re talking – our sales Units for 2017 and 2018 are summarised by department. And this will work for any field in the ProductDimTable. We could take off department and port in port code or Port name. We could do margin if we wanted to assessing performance by mention – ah so many options.

Now let’s go back to the point I mentioned earlier, when I was talking about the direction that the filter flows. Let’s go back to the Model View for a moment. Ok we can see the birds eye view of our report and here we can see the arrows on the lines and as we can see the arrows are pointing away from the ProductDimTable and toward the 2017 Sales Data and 2018 Sales Data. What does this mean?

What this means is that we can use the ProductDimTable to control the 2017 Sales Data table and 2018 Sales Data table but not the other way around and this is because filter context only flows downhill from the Table designated as 1 to the table designated as Many. And what this means is that the filter that’s being applied to any field in the “1” table also applies to any data included in our visualisations that comes from our tables that are linked via relationships. However filters (unless we specifically ask them to) do not flow uphill. So we cannot employ a filter on a field in the 2018 Sales Data table to influence fields in the 2017 Sales Data Table. Let’s do an example though. I’ll go back into the View and then take the Departments field out of our visualisation and then put on our SKU field from the ProductDimTable and this table of course works and we see the total amount of sales of each Product in 2017 and 2018. This works because we’ve taken the field from the ProductDimTable and this asks Power BI to pivot the information in the other two reports according to this field as if all three datasets were part of one big table.

Now let’s take out the SKU field from our visualisation and and put in the SKU field from the 2018 Sales data report and see what happens. Right now this time, the SKU information comes in and the 2018 Sales Units sum for each line just like in a pivot table and that’s to be expected as after all, these two fields come from the same report so the information should be able to interact and be pivoted just as though it were in a normal pivot table. However the 2017 Sales Data simply shows one number on every line. Let’s go back to the Model View quickly to work out why.

Ok the reason that it does this is because there is neither a direct relationship between the 2018 and 2017 Sales Data and while the 2018 Sales Data and 2017 Sales Data are connected indirectly, they are connected through the ProductDimTable. Now the relationship between the 2018 Sales Data and the ProductDimTable allows for filters to flow downhill i.e. from the Product DimTable down to the 2018 Sales Data but not the other way round and then back down to the 2017 Sales Data table and therefore we can’t use the 2018 table to influence the 2017 table.

.... and a little more!

The Formula Bar – The Formula Bar does not automatically come up in the PowerQuery Editor the first time that you enter the Editor in Excel. To bring the Formula Bar up, go to the View tab. In the Layout section of the Ribbon, tick the Formula Bar box.

Remove rows – While we remove the unnecessary top rows in the video, many system reports also have totals lines at the bottom of the reports which can cause a problem when we’re doing things like summing our data later once we’re finished in PowerQuery. We also have the option to remove a specific number of bottom rows using the Remove Rows drop down in the Home tab, as well as removing alternative rows, duplicate rows, rows with errors and blank rows.

Use First Row as Headers – In the video we promote the first row in the table into the headers row. However we can also take the Headers and put them into the first row. To do this go to the same Use First Row as Headers drop down in the Home or Transform tab and select Use Headers as First Row.