Other Data Sources and Querying Web Pages

Other Data Sources and Querying Web Pages

Previous Session Summary

In the previous lesson, we revised dataset refreshing including how refreshing picks up changes in data included in a data source as well as any referenced queries created from a query that we have refreshed. We also saw how refreshing a query automatically updates any pivot tables connected to the refreshed dataset.

Lesson Synopsis

So far in this course we’ve focused on creating queries from either tables in Excel workbooks, worksheets in Excel workbooks, or folders of Excel workbooks. We’ve looked at these primarily as these are by far and away, the most popular source of data for finance teams. There are plenty of other options though and we list some of these in this lesson.
 
We also look at one of the other main sources that you might connect to; web data and show you some of the pitfalls of simply trying to copy and paste data from a website.
 
Finally we look at some of the options for refreshing data in Excel including automatically refreshing data and fast and slow refreshing options.

Resource Listing

The Video

Play Video

Lesson Notes

So far in this course we’ve done a lot of work with data that’s been imported from Excel files, but in fact we can connect our queries directly from an awful lot of other sources of data.

Now we’re going to look at some of the ones available in Excel, but Power BI which also uses PowerQuery has an awful lot more and we’ll look at those in a few lessons time.

So how can we find these other data sources? Of course in the usual way!

  1. In an Excel spreadsheet go to the Data tab
  2. Go to the get Data button on the ribbon.

From here there are plenty of options of course. In the course, we’ve always gone to “From File” and used either “From Workbook” or “From Folder”.

– “From Text/CSV” is also available which connects to a CSV file, although in truth, PowerQuery will do the same thing if you go through the From Workbook route and then select a CSV file (you’ll need to show ‘All Files’ in the browsing menu for a CSV file to show).

– There’s also a couple of other options including JSON which is a Java file and XML files which are Extensible Markup Language. In both cases, simply select either XML or JSON and then select where you’ve saved the file to connect to them.

– Finally, and not shown in the video, depending on your version of Excel are Sharepoint Folders. These are extremely useful particularly when using Power BI as they make automated file refreshing easier. However, as long as you’ve synced your Sharepoint site to your OneDrive account, and therefore the Sharepoint site can be accessed through Windows Explorer, you’ll be able to access files saved in Sharepoint through the normal “From Workbook connector.

Database connection options:

Other options that are available including querying information directly from a database including:

  • Access databases
  • SQL databases,
  • Analysis Services or
  • SQL Server Analysis Service Databases
  • Azure databases.

Aside from Access which your Finance Team may use independently, if your business happens to use one or more of these databases, they’ll very likely be run and curated by your IT team so get in touch with them and they should be able to help you out with connection. This would include giving you vital information including server or database addresses.

Microsoft Subscriptions

Additional connection options are available in Excel depending on your level of Microsoft subscription. Two other useful connectors that were not shown on the video is the ability to connect to Microsoft Exchange and extract data from emails as well as Sharepoint Folders.

Connecting to inboxes and extracting data from attachments is such a handy feature as it means that you don’t need to always be saving attachments to drives before connecting to those attachments. Do note though that you can’t connect to a personal email address, such as one ending in ‘hotmail.com’ or ‘outlook.com – it has to be a company email address that uses Microsoft Exchange.

If your company uses Sharepoint, then the Sharepoint folder is also extremely helpful as it allows you to connect directly to Sharepoint folders and files through the web. If you’re interested in learning more about this file connection type then have a look at this lesson.

Other Queries

 We’ve already looked at connecting to a table or range right at the beginning of this course, and we’re going to look at connecting to a web page in a moment, but these three options here are also worth a mention.

  • ODATA connection is a way of connecting directly to web data that can’t be accessed through the normal web connection capabilities.
  • ODBC connectors and OLEDB connectors are both used for connecting to systems that don’t have readymade connectors for PowerQuery. When we get into using Power BI you’ll see that there are a lot of these readymade connectors but not one for every application.
  • If you’re trying to extract data from a system and there isn’t a readymade PowerQuery connector, you may be able to fashion one using these options. This topic is super advanced and beyond the scope of this course
 

Connecting to data on a website

Connecting to data on a website is actually a very straightforward thing to do. Let’s imagine that we wanted to download some information directly from a website. Now historically, we would have had to go to the webpage and attempt to copy and paste the data. Sometimes this would work, other times it might not. Sometimes we might be able to copy the data but then pasting it would result in some strange code or peculiar format.

So how can we use PowerQuery to connect to website data?

  1. Go into the Data tab
  2. Select Get Data then “From Other Queries”
  3. From here, select “From Web”.
  4. In the resulting pop up menu you’ll be asked to input a URL and in our video we used the following web address:

https://en.wikipedia.org/wiki/List_of_Category_5_Atlantic_hurricanes  

At this point, Excel may ask how we want to connect to the web data. As we’re querying a public website here, Anonymous on the left works fine.

PowerQuery then connects to the website and gives us the Navigator screen which has the options for which of the tables on the webpage we want to use. A bit of trial and error may be needed to find the data that you’d like but use the preview screen here and you’ll find the data you’d like to use!

Refreshing options in Excel

Now while, we’re at it – another excellent tool that PowerQuery has is the automatic refresh option.

A few lessons ago we looked at how you could manually refresh queries and looked at it again in the last lesson, but if other people have access to our files and we want the information to be as live as possible, without having to hit refresh…well we need to do something else.

 What we can do is switch on automatic refreshing which will refresh the connection every hour, every 10 minutes or whatever amount of time that we choose.

To do that we need to:

  1. Go to the Queries and Connections screen which we can do through the Data tab and then the Queries and Connections option.
  2. Right click on the query that we want to automatically refresh and select Properties right at the bottom of the list.
  3. On the resulting pop up menu, on the Usage tab we have a nice little section called Refresh Control.
  4. Here we can select however often we would like the query to be refreshed.

There’s also a couple of other useful options including Refreshing Data When Opening a File and Enabling a Fast Data Load. Both of these are useful options, but be warned that the former could really slow down the opening of a file and if external users are using the file, they need to be warned of that! The latter is also useful but can also really slow down any other work you’re doing on your computer (depending upon how pacey your computer is!)

.... and a little more!

Refreshing CSV files vs .xlsx files – A key difference that’s worth noting between .xlsx files (standard Excel workbooks) and CSV files is on refreshing. If you connect to a .xlsx file and then subsequently add new columns to that file, then PowerQuery will spot the new columns and bring them into the query. That’s not the case with CSV files where PowerQuery will only bring in the amount of columns that it identified in the original CSV file. The number of columns brought in can be changed in the Source step in the Formula bar but it has to be done manually, when new columns are included on the source data.