First Transformations in the PowerQuery Editor

First Transformations in the PowerQuery Editor

Previous Session Summary

In the previous lesson, we covered how to enter the PowerQuery editor for the first time by highting some Purchase Order Data in our current workbook and then going to the Data tab and then going to From Table/Range.
 
We then looked at the different components of the PowerQuery editor being the Preview Screen, the Query Pane, the Tabs and Ribbon and the Query Settings Pane.

Lesson Synopsis

In this lesson we start looking at some of the basic steps that we or PowerQuery automatically use to transform our source data from its original ‘dirty’ state, to clean columnar data. Now before we start off in this lesson, one very simple, important point:
 
You cannot edit individual cells in the PowerQuery Editor. You can only perform transformations on whole columns, rows or tables.
 
In this lesson then we’re going to continue working with the PO data from the last lesson. We’re going to look at a couple of transformation steps which PowerQuery has added automatically and then we’re going to put in a few of our own, including removing some unnecessary rows, promoting the top row into the Header row and then loading our nice, neat columnar data into Excel.

Resource Listing

The Video

Play Video

Lesson Notes

The first two steps that have been created in the Query Settings pane are ‘Source’ and ‘Changed Type’.

The Source step tells PowerQuery where to find the information that you want to bring into the PowerQuery Editor. ‘Changed Type’ on the other hand dictates to PowerQuery, the format that our data should be in in each column.
 
There are a number of different datatypes in PowerQuery including Text, Whole Number, Decimal Number, Date, DateTime, Percentage, True/False (Boolean) and a few more. Having the correct datatype is much more important in PowerQuery than it is in Standard Excel. For example:
 
a) the Text, Date, DateTime and Number types have sets of functions that you can only use with each datatype.
b) assigning incorrect datatypes can result in cell errors e.g. assigning number a column which has text in some cells will result in errors in the cells where text exists. We’ll look at this in a later lesson.
 
Next we look at how to add our first manual command to PowerQuery.
 
We frequently need to remove unncessary rows, and we can do this either at the top or the bottom of the dataset using the Remove Rows option in the Home tab. This is the first command that we manually give to PowerQuery to transform the dataset and it will remove the unnecessary rows at the top of our PO dataset. After that we want to move the top row of data into the Column Headers position. To do this, we use the Use First Row as Headers option in the Home or in the Transform tab.
 
Finally we then put the resulting dataset, all cleaned and formatted back into the Excel workbook using the Close and Load option in the Home tab.
 

.... and a little more!

The Formula Bar – The Formula Bar does not automatically come up in the PowerQuery Editor the first time that you enter the Editor in Excel. To bring the Formula Bar up, go to the View tab. In the Layout section of the Ribbon, tick the Formula Bar box.

Remove rows – While we remove the unnecessary top rows in the video, many system reports also have totals lines at the bottom of the reports which can cause a problem when we’re doing things like summing our data later once we’re finished in PowerQuery. We also have the option to remove a specific number of bottom rows using the Remove Rows drop down in the Home tab, as well as removing alternative rows, duplicate rows, rows with errors and blank rows.

Use First Row as Headers – In the video we promote the first row in the table into the headers row. However we can also take the Headers and put them into the first row. To do this go to the same Use First Row as Headers drop down in the Home or Transform tab and select Use Headers as First Row.