Tag Archives: Internals

Extracting Power Queries

TL;DR Another place to find your Power BI queries is in the data sources of the embedded tabular model. They are stored in a zip file which is encoded as a base64 string in the Mashup field of the connection string.

While we are waiting for Microsoft team to release new features that would allow exporting Power Query queries outside Power BI ecosystem, we might as well explore all of the other possibilities available to us. So far I’ve counted 3 ways of getting a hold of our precious M queries:

  1. Sending a frown email
  2. Parsing the *.pbix files
  3. Using the clipboard (doh!)

As I’ve been poking around this black box that is Power BI, I have discovered another place where M queries are stored — the data source connection string of the embedded tabular model. The steps that I took in order to get there are:

  1. Find the port number on which the $Embedded$ Tabular model is running on
  2. Connect to that Tabular instance and get the Connection Strings of the Tabular Schema Data Sources
  3. Parse the Connection Strings for Mashup field
  4. Convert the base64 Mashup field into a binary file
  5. Unzip the binary file and display Formulas/Section1.m file

Here is me trying to do everything in one go via PowerShell.

PowerThiefEZGIF

When it comes to scripting, the power of the Shell is far greater than that of the Query, but in this case all of the prerequisites are there to replicate the above steps in pure M.

 

Finding $Embedded$ Tabular port number

I’ve tried to solve this problem before via R scripts, however as Mim pointed out to me, there is a simpler Power Query way of doing this. Because Power BI stores its auto generated port number for the tabular model in a text file (msmdsrv.port.txt) under AnalysisServicesWorkspaces, the task is reduced to reading the contents of that file.

By the way my approach uses a short circuit logic, assuming that the NTFS Access Date records for C:/Users are up to date, apparently they might take up to an hour to update. So if you find this script misbehaving have a look at Imke’s blog post for the original script.

Get Tabular Schema Data Sources

You can read about the latest undocumented Tabular DMVs from Meagan’s blog, or fire up SQL Server Profiler, check the boxes for End Command events, and script out the tabular model from SSMS and you’ll  discover a plethora of DMV being displayed in the profiler’s trace. DMV hiding all of the connection strings is TMSCHEMA_DATA_SOURCES.

Extract Mashup field into binary

I’ve encountered some difficulties parsing the connection string with a Text.Split function so I’ve used a Splitter function instead.

Unzip Formulas/Section1.m from Mashup binary

I’ve seen at least 2 functions for unzipping files in Power Query written by KenR and Mark White. Also I’ve described before how to call R unz function from an R.Execute wrapper. This is the result of combining the two.

Wrapping all of the functions above and adjusting them to support multiple instances of Power BI while applying some minification techniques we get the final M script.

Some of the use cases that are still left for me to explore with this approach:

  • pushing M scripts to GitHub, thus replicating the query catalog functionality
  • exporting the Power Queries to *.odc files and consequently importing M scripts from Power BI desktop into Excel

 PowerShell version

For the curious minds, here is the PowerShell code I used in Power Query reconnaissance.

Please note that this approach can be used for extracting Power Queries for already loaded data and only in Power BI desktop.

Some Power Query internals

Whenever I’m given a black box, I marvel at its elegance. Then after a while, frustration settles in, as the IN’s and OUT’s don’t add up and even going through the official documentation doesn’t help. This is when I reach a point that I bring in a hammer and start banging at this black box until something gives in and secrets start pouring.

Same happened with Power BI and its elegant data mashup engine. I love the fact that it is a “pure, higher-order, dynamically typed, partially lazy functional language” , but the documentation doesn’t tell you much on how things work under the hood. So I started digging. And when I peeked at Microsoft.MashupEngine.dll internals, I was surprised to find half a dozen modules filled with Power Query (M) code. Being written by Microsoft, I assume that these modules can be used as a reference for best practices in Power Query code standards.

Some of the practices shared across all of the modules are:

Each module is wrapped in a section which

is an organizational concept that allows related expressions to be named and grouped within a document

Also the functions that are exposed to users are decorated with shared statement –

which omits the requirement to use a section-access-expression when referring to shared members outside of the containing section.

Although I’ve seen occasional use of section-access-expression as well –  LibraryModule!UICulture.GetString and TableModule!Table.AddColumns. My interpretation of these practices are, if you create a library in M, make sure to create a new named section, and mark all of the public members with shared decorator.

Moreover, the use of recursion is acceptable as long as it reduces the complexity of the code. And also I’ve seen a novel use of let expression, where the else clause of an if statement required some precomputed values (check Table.SplitColumn).

Here is the summary of Power Query modules that I’ve managed to salvage.

Module.Date.m

Date time arithmetic which doesn’t cover all of the business use-cases, however presents a very good source of inspiration for your own Date related functions. For even more Date related functions, make sure to check pquery repository.

Module.List.m

This module is a great introduction to functional programming. It contains 10 List related functions that are build upon other List related functions. My favorite one is:

shared List.RemoveLastN =
(list as list, optional countOrCondition) as list =>
 List.Reverse(
List.Skip(
List.Reverse(list), countOrCondition));

Module.SharePoint15.m

Contains the details the code for SharePoint15.Tables, SharePoint15.Contents and SharePoint15.Files. Anyone that had to deal with fetching data from SharePoint into Power BI will recognize some of the details they had to think of when using the SharePoint connector. Moreover there are some great examples on how to use OData.Feeds and Web.Contents.

Module.SqlExpression.m

Contains details on data type mapping between SQL Server and Power BI that are used in SqlExpression.SchemaFrom. An interesting observation in this module is that although you can write a switch statement using some lists constructs, Microsoft kept this function very expressive, which means that you can see a dozen of else if’s.

Module.Table.m

Probably one of the most useful modules of the lot. As everyone is using Tables to output the results of their data wrangling. This module includes the details of 78 Table related functions.There are even some details about query folding. As expected most of the Table functions are based on Record and List primitives.

Module.Text.m

The last of the lot. Although this module contains the details of a single public function Text.Format, it also also contains the details of a function which can be used for text parsing. Have a look at GetPlaceholders.

GetPlaceholders =(
        offset as number,
        openCode as text,
        closeCode as text,
        numberReference as logical
) as list => ...

I’ve added all of these modules to GitHub.

github-80257bbcc9a5340365b9ad97f4b67032

https://github.com/Hugoberry/PowerQueryModules