Referencing and Creating Lists

Referencing and Creating Lists

Previous Session Summary

In the previous lesson we had a little look at PowerPivot (great tool) and how we can load data from PowerQuery straight into this advanced Data Analytics tool.

Lesson Synopsis

The last lesson was very much skimming the surface of PowerPivot. In this lesson we’re going to go a bit deeper and show actually how you could put a basic model together and how PowerQuery can really help with making model creation so easy.

In terms of PowerQuery skills, we’re looking particularly at referencing queries and creating lists.

Lists are essentially a single column worth of data, and for the purpose of this course, that’s all we need to know. They do get quite a bit more advanced as you get more competent in PowerQuery though and have their own separate set of customs (more on these later!)

Referencing queries on the other hand is another way of duplicating a query, which you might remember right from the beginning of the course is where you create an exact copy of a query. However, whereas a duplicated query won’t change if you change the original query, a referenced query WILL change. A referenced query will always start at the point that the original query finishes. If you change the original query, then the referenced query updates itself to match.

Resource Listing

The Video

Play Video

Lesson Notes

In the video to this lesson we cover Referencing and Creating lists, as well as building a basic payroll model in PowerPivot. In this accompanying text, we’re going to look at the referencing and creation of lists only, but if you do want to find out a bit more about how the PowerPivot Model is created, then go to the ‘a little bit more’ section at the bottom of the page.

Referencing

Referencing is one of two ways to copy queries in PowerQuery (the other of course is the duplication method). However, although they look the same, they are fundamentally different.

Duplicated queries copy every single step in an original or base query. If you choose to duplicate a query then the query settings pane on the right hand side of the screen in both the base query and your duplicated query will have exactly the same steps. Once the query has been duplicated, from that point on the queries ca diverge.

Referenced queries on the other hand copies the original query so that any changes subsequently made to that original query come through into the new duplicated query as well. A referenced query starts at exactly the step that the original query finishes, regardless of where that query finishes and whether new steps are added after the referencing has taken place. We can see this in the query settings pane, where a referenced query will always just show a single step ‘Source’. In the formula bar of that single step will be the name of the original query. Very different to a duplicated query.

So how do we create a referenced query?

Creating a referenced query.

To create a referenced query;

  1. In the query pane on the left hand side of the screen, select the query that you’d like to reference.
  2. In the Home tab, select the ‘Manage’ drop down
  3. Select Reference from the subsequent list.

What is a list?

A list is one of the three main building blocks of PowerQuery code (also known as ‘M’ code), the others being ‘Records’ which are essentially rows, and ‘Tables’, which are fairly self explanatory.

Lists are principally used in more complex PowerQuery functions and more as a sort of interim between complex functions, but they can be created and viewed on screen as we’ll see in this lesson.

Particularly in PowerPivot, we use what we call dimension tables, which are essentially descriptive tables that you can connect to fact tables which are historical tables of records such as sales. We teach this in our PowerPivot pack, but a dimension table can be used to filter multiple fact tables. A very basic dimension is a single list of unique values, and this is what we’re going to create here.

How do I create a list?

Creating a list is very straightforward:

  1. simply highlight the column which is going to act as a list. Bear in mind that all other columns will be removed.
  2. Go to the Transform tab and then in the Any Column section of the ribbon, select Convert to List.

And it’s that straightforward.

What you’ll now be left with is a single column of data, with all the other data removed. As you get more advanced, you’re now able to use a lot of List functions, but as I previously said, these are a good deal more advanced than we need to go into at the moment.

An alternative to this method (which is particularly advantageous, if you subsequently want to add in other columns), is simply selecting the column you want to keep, right clicking on the column header and then selecting Remove Other Columns. This has the same result as the List option insofar as only the column you selected is kept, but the query remains in a table format rather than a list format. This means that you can continue to add in columns to your one remaining column, which you can’t when the column is in a list format. If you want to convert your list back to a table format, then simply select ‘Convert to table’ on the ribbon, and you can start adding in data again.

How can I remove duplicates?

Removing duplicate values in a column or list of data is a really useful function, and we use it a lot in standard Excel (just look in the Data tab, in the Data Tools section of the ribbon and you’ll find the Excel version of this function).

To remove duplicate values in a column of data:

  1. select the header of the column where you’d like to remove duplicates,
  2. go to the Transform tab
  3. select ‘Remove Duplicates’.

Alternatively, with the column selected, right click the header of the selected column and select Remove Duplicates from the pop up list.

Remove duplicates and character/case sensitivity.

A really important point to remind ourselves of here is that PowerQuery is character and case sensitive. Consider the following list of values:

Browning 1
Browning 1
Browning2
BROWNING 1
BrOwNiNg 1

Removing duplicates from this list would leave us with the following list:

Browning 1
Browning2
BROWNING 1
BrOwNiNg 1

Even though the first, second, fourth and fifth values in the original list have the same characters and spaces, remove duplicates will only remove the first or second value. This is because only those values are exactly the same in terms of case as well as character.

What would you do if you wanted to remove all duplicates from the above list then regardless of case?

We’d of course go to the Format section of the Transform tab, and from the drop down list, format the capitalisation, using ‘Lower’, ‘Upper’ or ‘Proper’. As long as the characters in each value are the same then case issues should be resolved, and remove duplicates should then leave us with the following list:

Browning 1
Browning2

It’s also wise to check for any trailing spaces using the ‘Trim’ function, again found in the formatting section.

.... and a little more!

PowerPivot and relationships – Oh go on then – here’s a bit more information on how we managed to put data from three different tables into a single pivot table. We don’t show this in the video so pay attention!

a) First of all, once all the data from all the queries is in the data model, we need to create a relationship between our Employee List and Gross Salary File, and then our Employee List and our Payroll File. To do this, go to the PowerPivot tab in standard Excel and select Manage.

Once you’re in the data model, go to the ‘Design’ tab and select ‘Create Relationship’. A menu, similar to the Merge menu in PowerQuery should pop up asking you to select the tables between which you want to create a relationship. Select the Employee List table and then highlight the only column in that table. Below, select the Gross Salary File and then highlight the employee second name column. Click ok and then create the same relationship between the Employee list table and the Payroll File.  What we’ve done here is connected both ‘fact’ tables (the gross salary and payroll tables) to the ‘dimenson’ table using a ‘key’, a piece of information common in both datasets.

b) Go back to the Home tab and select PivotTable on the ribbon.

c) In standard Excel in the field list of the PivotTable, you’ll now see all three tables.

d) Drag the Employee Listing column from the Employee Listing query into the rows pane.

e) You now need to create three calculations. One calculating the total gross salary, the next calculating the total payroll and the third calculating the difference between the two.

To do this, in the field list right click on the Gross Salary query header and select Add Measure. In the Measure Name box, type ‘ Total Gross Salary’ and then in the Formula box, type ‘=sum(‘. This should bring up intellisense and you can select the column from the Gross Salary Query that you want to sum which should be [Gross Salary GBP]. Close the brackets and click ok.

Create the same measure in the Payroll table, calling the measure something like [Total Payroll GBP] but summing the Payroll Value GBP column.

Finally create a measure calculating the difference between the two. Create the measure by right clicking on the Employee Listing Header in the Field List and call the measure ‘Variance’. In the formula box type an = and then open a square bracket “[“. This will bring up the measures that you have just created and you can select each, putting a subtract operator inbetween (-). Click Ok.

f) You can then drag the three new measures into the values pane of the field list.