Power BI – The Data Model – Calculated Columns

How to Create Calculated Columns

Previous Session Summary

In the previous lesson, we covered how to enter the PowerQuery editor for the first time by highting some Purchase Order Data in our current workbook and then going to the Data tab and then going to From Table/Range.
 
We then looked at the different components of the PowerQuery editor being the Preview Screen, the Query Pane, the Tabs and Ribbon and the Query Settings Pane.

Lesson Synopsis

In this lesson we start looking at some of the basic steps that we or PowerQuery automatically use to transform our source data from its original ‘dirty’ state, to clean columnar data. Now before we start off in this lesson, one very simple, important point:
 
You cannot edit individual cells in the PowerQuery Editor. You can only perform transformations on whole columns, rows or tables.
 
In this lesson then we’re going to continue working with the PO data from the last lesson. We’re going to look at a couple of transformation steps which PowerQuery has added automatically and then we’re going to put in a few of our own, including removing some unnecessary rows, promoting the top row into the Header row and then loading our nice, neat columnar data into Excel.

Resource Listing

The Video

Play Video

Lesson Notes

Hi there and we’re going to leave behind relationships for the moment and move onto a new topic of calculated columns.

So to kick off, what is a calculated column?

A calculated column is a column that is created by the user within Power BI that is not in the original dataset. Of course we have calculated columns all the time in standard Excel and these work in the same way as those in Excel formatted tables which is to say that you define the calculation once and the calculation is applied to every cell in the calculated column. There is no capability to ‘hard code’ numbers or have different calculations in different rows of the same column

Creating a Calculated Field can be very useful so let’s dive right in and have a go.

Hwere we are in the data view of Power BI in our 2018 Retail Analysis tab and to kick off we’re going to do something really simple and calculate the cost of our product. We’ve already got margin here and price here, so cost should be really straight forward – it’s simply Price less margin.

To add a Calculated Column then, we can do a couple of different things. We can either go up to the Home tab, and in the Calculations section of the ribbon select New Column, or we can go to the modelling tab and select New Column in the Calculations section, or we can do what I normally do and go to the fields section on the right hand side of the pane, right click the report that we’d like to add the new column into and select New Column.

Our new column then appears at the end of the dataset, highlighted in yellow and saying Column. We can also see up here in the formula bar, that we the word “Column” and an equals sign.

So first off, this formula bar is not like the formula bar in PowerQuery, because it uses a completely different code. Whereas in the PowerQuery editor we used M Code, in the Data View screen and in fact in most other parts of Power BI, we use a code called DAX which stands for Data Analysis Expression if you’re interested. We go into DAX a bit more in future lessons, but DAX is a code which syntax very closely resembles code that we use everyday in Excel. For example all the aggregation formulas, such as SUM, COUNT, STDEV, MAX, MIN, they’re all here. Text Functions like LEFT, RIGHT, UPPER, CONCATENATE, and so on also make an appearance and this of course makes DAX easier to start off with, however be prepared because there are a lot of other new functions which make DAX particularly brilliant.

 So let’s rename the column Cost_GBP. To do that I’ll just type over the word Column in the formula bar and then enter the name and put the equals sign back in. Now for the simple formula. Ok. Writing a formula in Power BI is like writing a formula in Excel formatted tables, you write a formula over a column not over a cell, so you won’t find any cell coordinates in our formulas here, we instead reference table and column names instead.

So here we go; first of all, I want to put in my report name that houses the column that I want to use. To do this I’ll put in an apostrophe, and intellisense pops up offering me all of the tables and fields currently in the dataset. I’ll select 2018 Retail Analysis and press tab and then intellisense narrows down my options to only the columns that are available in the 2018 Retail Analysis table. I’ll type in a square bracket, which denotes to Power BI that I’m starting a column and then put in the first few letters of Price_GBP. Intellisense then gives me the option of Proce GBP and I can select it using tab.  I’ll then put in a minus sign, then put in an apostrophe again to bring up Intellisense and I’ll select 2018 Retail Analysis again by pressing tab, then put in a bracket and then scroll down to Gross Margin.  And there we go, that’s the whole formula. I’ll press enter now, then Power BI does its magic and calculates the whole column as we can see over here

Ok now a spot of formatting I think, so I’ll go to the Modelling tab and then go to the formatting section of the Ribbon and then select the apostrophe button which formats the column to have two decimal places and a comma to separate thousands.

Ok great. And it’s that simple. Calculated Columns!

Now it’s best practice to ensure that every time we reference a column in a calculation, we have the name of the report first and then the name of the column in square brackets. All of the fields we used in this calculation are in the report in which we’ve created the column, so we could get away with not referencing the report and just having the name of the fields. However it’s best to have the report name first as there may be columns called the same title in multiple reports as we do here between the 2018 and 2019 Retail Analysis reports and it’ll take more time to work out which ones are being used in a calculation. It’s really not that different to Calculated Fields in Formatted tables, and those of you that are used to that will adapt very quickly. Notice though that there are no cell coordinates and that’s because there’s no need for them. Every calculation is done on a by column basis and therefore, cell coordinates aren’t required. That also of course means that there can be no exceptions on a cell by cell basis.

Let’s try another one. Let’s calculate the total units sold in the year 2018. I’ll select another new column, and call the column Total Sales Units. Next I’ll then put in a SUM formula for this one so I’ll write SUM and then a bracket and automatically, Power BI offers me the choice of all of the available columns currently connected to the data model. I’ll select ‘2018 Retail Analysis with Units at standard Price in square brackets and then close the brackets. Ok and then Power BI has calculated the same number for every column. Does that surprise you?

It shouldn’t do really – it’s just giving us the total of all the items in the Units at standard price column and because we’ve asked the model to perform the same calculation on every row and the numbers in the Units at standard price column doesn’t change, then it returns the same number for every row.

Ok let’s summarise this lesson with some useful guidelines:

Power BI will apply the same calculation to ALL rows in a column

Formulas should include table titles and column titles surrounded by square brackets e.g. Item[ItemID]

Report titles with spaces have to be encased in apostrophes ‘2018 Retail Analysis’         

Column formulas don’t have to use functions such as SUM/COUNT/AVERAGE

.... and a little more!

The Formula Bar – The Formula Bar does not automatically come up in the PowerQuery Editor the first time that you enter the Editor in Excel. To bring the Formula Bar up, go to the View tab. In the Layout section of the Ribbon, tick the Formula Bar box.

Remove rows – While we remove the unnecessary top rows in the video, many system reports also have totals lines at the bottom of the reports which can cause a problem when we’re doing things like summing our data later once we’re finished in PowerQuery. We also have the option to remove a specific number of bottom rows using the Remove Rows drop down in the Home tab, as well as removing alternative rows, duplicate rows, rows with errors and blank rows.

Use First Row as Headers – In the video we promote the first row in the table into the headers row. However we can also take the Headers and put them into the first row. To do this go to the same Use First Row as Headers drop down in the Home or Transform tab and select Use Headers as First Row.