So far in this course, we’ve learned that PowerQuery changes data from a ‘dirty’ format into a ‘clean’ columnar version. It does this through recording a list of steps, which it then regurgitates when you click refresh.
What we start to look at in this lesson is the code and code structure which PowerQuery uses to write these steps. The code is called ‘M’ and we can write this code in the formula bar or in the Advanced Editor.
Hi there and congratulations on getting to this lesson, where we start to have a look at the code that PowerQuery uses.
Now, you’ve already got more than enough information to save time, perform some brilliant transformations of your datasets, and perhaps start to have a look at Powerpivot.
However, even with all of this new knowledge we haven’t really touched the code that writes all of these wonderful transformations. That’s not necessarily a bad thing. As we’ve already shown, you can do an awful lot without needing to know about the code, but understanding how it works starts to be more important as you get more proficient at using PowerQuery.
Why is it important? Because at some point, you’ll want to do something in PowerQuery and you’ll hit a wall with what you’re able to do through the ribbon. In fact, in a few lessons time, we have a look at conditional statements, and while basic and nested IF statements are straightforward, you need to understand a bit of M to put in ‘and’ and ‘or’ operators.
Alternatively, perhaps we have a common code failure where a column in some imported data has changed name and that’s resulted in a refresh error. That’s a really easy thing to fix…once you understand a little bit of how the code is structured.
Where can I look at the Code behind the steps?
A nice simple question to start off with; where can I look at the code.
Well there’s two places that you can look at the code. If you just want to look at the code of a particular step, then you can of course select that step in the Query Settings Pane and then review the code in the Formula Bar. If you haven’t had a go at reviewing the Formula Bar yet, then you may need to select the option for it to show up in the PowerQuery Editor. If this is the case then go to the View tab and select the Formula Bar box.
If you want to look at all the code for all steps, then we need to go to the Advanced Editor. We can do this by going to the View tab and the on the ribbon, select ‘Advanced Editor’.
That’s a lot of words in an incomprehensible structure…
Well, yes to start with, a block of code in a long query can look a bit daunting so let’s break it down into steps (haha!).
The ‘let’
Queries always start with a ‘let’ expression which basically tells PowerQuery that you’re about to start giving it some code. Note that it is completely in lower case. Leave it where it is and don’t mess with it 🙂
The big block of code (eeek!)
Once you’ve got past the ‘let’ expression, you’ll then be left with a block of code, with an ‘in’ statement at the bottom and a smidge of text afterwards. When I first saw this code, I looked at it cluelessly, wondering what on earth it all meant. However it is quite logical, and with a little bit of knowledge, it can start to be unpicked and understood.
Principle 1:
Each line of text in the code corresponds to a step in the Query Settings pane. Once the code for that step has finished, there is a comma to designate the end of the step.
So consider the following text for example.
Source = Folder.Files(“C:\Users\xyza\OneDrive\Excel lessons\Excel – PowerQuery\Lesson 8\Arrivals data”),
#”Filtered Rows” = Table.SelectRows(Source, each ([Extension] = “.xlsx”)),
This code contains two steps: a Source step of a query that we looked at back in Lesson 8 and then a Filtered Rows step. The Source step finishes at the comma on the second line, and then the Filtered Rows step starts. This step then finishes with a comma ready to go onto the next step, for which we don’t have the code here.
You would be able to find the Source step in pretty much any query (until you start getting super advanced) as all queries need to start with some sort of data, and the Source step is always needed to connect to data somewhere. To find it – simply head over to your query settings pane, and you’ll find it right at the top.
Principle 2:
The first words of each line, until the ‘=’ sign, is the name of the step. These can be matched directly through to the names of the steps in the Query Settings pane.
Again look at the following code:
Source = Folder.Files(“C:\Users\xyza\OneDrive\Excel lessons\Excel – PowerQuery\Lesson 8\Arrivals data”),
#”Filtered Rows” = Table.SelectRows(Source, each ([Extension] = “.xlsx”)),
You can see right at the beginning of each step, the word ‘Source’ and the words #”Filtered Rows”. These are the names that correspond to each step name in the Query Settings pane. You will be able to match through every line to every step in the Query Settings Pane in this way.
A couple of additional points here as well.
You’ll probably have spotted that #”Filtered Rows” has a hashtag preceding the text and speechmarks around the text, while the word Source does not. The hashtag and the speechmarks are required for any step name that has a space in it hence the step name Source doesn’t need them. If you don’t want the bother of the hashtag and the speechmarks then replace the space between words with an underscore, and that will do the trick.
Secondly, you can rename the steps whatever you like – just change the name in the Advanced Editor. However you may be safer doing it in the Query Settings pane (just right click the step you want to rename and click ‘Rename’) as we’ll discover in Principle 3
Principle 3
Each ‘Step’ is actually the name for a table of data that is then referenced in a later step. Each row is fundamentally therefore an equation.
This one will become a bit clearer in the following lesson, but again we’ll go back to the following code
Source = Folder.Files(“C:\Users\xyza\OneDrive\Excel lessons\Excel – PowerQuery\Lesson 8\Arrivals data”),
#”Filtered Rows” = Table.SelectRows(Source, each ([Extension] = “.xlsx”)),
Let’s look at the #”Filtered Rows” row.
a) We start off with the name of the table or step.
b) We then have an equals sign. This is where we, the user, explain to PowerQuery how the #”Filtered Rows” should be calculated.
c) After the equals sign we have a PowerQuery function (which we go into in the next lesson). In this case, the function is called ‘Table.SelectRows’. Pretty much all functions as a starting parameter take another table that we have defined and in this case it is the ‘Source’ table which is the previous step. This step is therefore saying:
To calculate the #”Filtered Rows” table, grab the result of the Source step (or the ‘Source’ table) and then filter it for only the rows where in the Extension column there is the value “.xlsx”
A couple more quick points on this principle.
1) While in this example the #“Filtered Rows” table takes the table from the previous step as a starting point (the Source table), it doesn’t have to be the previous step. A step in PowerQuery can take a table from wherever it’s defined in the code, even if the code is after the current step.
2) Secondly, remember in the previous principle when you could rename a table or a step? If you rename a step or table in the Advanced Editor then you’ve got to make sure that you rename it wherever else that table is referenced in the code (probably the following step). If you don’t rename the step or table everywhere in the code, then you’ll get an error. If you rename the step in the Query Settings pane however, then PowerQuery will rename the step everywhere in the code.
Principle 4:
The code will finish win an ‘in’ statement, followed by a previously defined table name
Simple principle this one – have a look at the below text:
#”Removed Columns” = Table.RemoveColumns(#”Added Custom1″,{“MoS Booking.1.1”, “MoS Booking.1.2”})
in
#”Removed Columns”
This text shows the last three lines of a query: the last defined table in the code, the ‘in’ statement, and the table that we’d like to ultimately load to a table or Data Model.
Once we’ve got to the point where we’re happy with the look of the final table in the preview screen, then we have an ‘in’ statement denoting that we’ve got to the end of the code. The text after the ‘in’ statement is the name of the table with which we want to finish.
Again, please note that you don’t need to finish the code with the last defined table or step in the code (although you almost always will!). You can finish it anywhere, even on the Source table if you like.
Also note that in the final row before the ‘in’ statement, we don’t have a comma to finish the code.
How to think of the code – Think of the code as a series of calculations that look like the following:
let
Source = Go and grab the data for cleaning at this location
Table1 = Take the data in the Source table and perform the action in this row’s function
Table2 = Take the data in Table1 and perform the action in this row’s function
Table3 = Take the data in Table2 and perform the action in this row’s function
in
Give the result of Table 3
You could also think of each row as a variable or virtual table if you’ve got some DAX experience