How to get data from a PDF into Excel using PowerQuery
One of the questions I used to ask all the time as a finance analyst was ‘how can I get data from a PDF and put it into Excel?’
A lot of useful information that’s delivered to a business is in a PDF format – invoices particularly. In one of my earlier roles, one of the jobs of my team was to extract data from PDF invoices and put it in Excel so that we could validate that we were paying for the right services. We received a lot of invoices every month and some suppliers wouldn’t give us Excel backup so we had to extract the data out of the PDF invoices instead.
I wish I had known PowerQuery then as well as I do now! Back then we used to literally copy and paste the data in or if that didn’t work, manually (eek!) write the information out into Excel.
With PowerQuery it is so easy to connect to a PDF file and then find the data that you want. Once you’ve found it, then it’s really easy to transform the data (if it needs transforming) and pop it back into Excel.
The thing is though that many people who know PowerQuery in Excel, don’t know that you can really easily connect to PDFs because there’s no obvious connector in the Get Data menu on the Data tab’s ribbon. That’s ok though, because we can just put in a blank query and then use a really easy bit of code that you can simply copy from the below video.
Of course, while I’m talking about PowerQuery in Excel, it’s even easier to connect to, and extract data from a PDF using Power BI. That’s because in Power BI, there is a connector. So check out the below video to see how it’s done, firstly in PowerQuery in Power BI and then in PowerQuery in Exce
If you’d like to follow along with the video, then download the PDF file in the link to the right, and connect Power BI or PowerQuery in Excel to the file:
A little bit about this connector...
So this code to enable the connector in Excel is a bit more complicated compared to some of the other codes that we’ve seen in the past.
=Pdf.Tables(File.Contents(PDF location on OneDrive), [Implementation = “1.1”])
Other Popular Finance Team Training posts on PowerQuery...
- Why is My Merge Not Working in PowerQuery?Excel, Power BI, PowerQueryOctober 28, 2021
- How to VLOOKUP Multiple Matching Rows Using PowerQueryPower BI, PowerQueryJanuary 12, 2021
- How to Connect PowerQuery in Power BI or Excel to a Sharepoint filePower BI, PowerQueryDecember 14, 2020
- How to use Date.IsInNextMonth in PowerQueryPower BI, PowerQueryDecember 2, 2020
- Creating a Pivot Table from Data on Multiple Sheets in an Excel WorkbookExcel, PowerQuery, UncategorizedNovember 11, 2020