Tag Archives: Power Query

Extracting Power Queries

TL;DR Another place to find your Power BI queries is in the data sources of the embedded tabular model. They are stored in a zip file which is encoded as a base64 string in the Mashup field of the connection string.

While we are waiting for Microsoft team to release new features that would allow exporting Power Query queries outside Power BI ecosystem, we might as well explore all of the other possibilities available to us. So far I’ve counted 3 ways of getting a hold of our precious M queries:

  1. Sending a frown email
  2. Parsing the *.pbix files
  3. Using the clipboard (doh!)

As I’ve been poking around this black box that is Power BI, I have discovered another place where M queries are stored — the data source connection string of the embedded tabular model. The steps that I took in order to get there are:

  1. Find the port number on which the $Embedded$ Tabular model is running on
  2. Connect to that Tabular instance and get the Connection Strings of the Tabular Schema Data Sources
  3. Parse the Connection Strings for Mashup field
  4. Convert the base64 Mashup field into a binary file
  5. Unzip the binary file and display Formulas/Section1.m file

Here is me trying to do everything in one go via PowerShell.


When it comes to scripting, the power of the Shell is far greater than that of the Query, but in this case all of the prerequisites are there to replicate the above steps in pure M.


Finding $Embedded$ Tabular port number

I’ve tried to solve this problem before via R scripts, however as Mim pointed out to me, there is a simpler Power Query way of doing this. Because Power BI stores its auto generated port number for the tabular model in a text file (msmdsrv.port.txt) under AnalysisServicesWorkspaces, the task is reduced to reading the contents of that file.

//Assume current user is the one with latest [Date accessed] folder in C:\Users
CurrentUser = Table.FirstN(
{{"Date accessed", Order.Descending}}),
//Read the contents of file msmdsrv.port.txt from subfolder of AnalysisServicesWorkspaces
Port = Lines.FromBinary(
"\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"),
each ([Name] = "msmdsrv.port.txt"))[Content]{0},null,null,1200){0}
view raw MSMDSRVport.m hosted with ❤ by GitHub

By the way my approach uses a short circuit logic, assuming that the NTFS Access Date records for C:/Users are up to date, apparently they might take up to an hour to update. So if you find this script misbehaving have a look at Imke’s blog post for the original script.

Get Tabular Schema Data Sources

You can read about the latest undocumented Tabular DMVs from Meagan’s blog, or fire up SQL Server Profiler, check the boxes for End Command events, and script out the tabular model from SSMS and you’ll  discover a plethora of DMV being displayed in the profiler’s trace. DMV hiding all of the connection strings is TMSCHEMA_DATA_SOURCES.

(port as text) =>
//Get $Embedded$ database name
catalog = AnalysisServices.Databases("localhost:"&port)[Name]{0},
//Run DMV query to get data source details
dataSources = AnalysisServices.Database("localhost:"&port,catalog,
[Query="select * from $system.TMSCHEMA_DATA_SOURCES"])

Extract Mashup field into binary

I’ve encountered some difficulties parsing the connection string with a Text.Split function so I’ve used a Splitter function instead.

(connectionString as text) =>
//Define a semicolon splitter
split = Splitter.SplitTextByDelimiter(";"),
//Split the connection string by semicaolon and assume that Mashup is the last element
mashup = Text.Replace(List.Last(split(connectionString)),"Mashup=","")
//Convert base64 string to binary

Unzip Formulas/Section1.m from Mashup binary

I’ve seen at least 2 functions for unzipping files in Power Query written by KenR and Mark White. Also I’ve described before how to call R unz function from an R.Execute wrapper. This is the result of combining the two.

(binaryZip,fileName) =>
UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),
UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),
//ZIP file header fixed size structure
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
CompressedSize = UInt32,
UncompressedSize = UInt32,
FileNameLen = UInt16,
ExtraFieldLen = UInt16]),
//ZIP file header dynamic size structure
FileData = (h)=> BinaryFormat.Record([
FileName = BinaryFormat.Text(h[FileNameLen]),
ExtraField = BinaryFormat.Text(h[ExtraFieldLen]),
UncompressedData = BinaryFormat.Transform(
(x) => try
Binary.Buffer(Binary.Decompress(x, Compression.Deflate))
otherwise null)]),
//Parsing the binary in search for PKZIP header signature
ZipIterator = BinaryFormat.Choice(UInt32, (signature) => if signature <> 0x04034B50
then BinaryFormat.Record([FileName=null])
else BinaryFormat.Choice(Header,(z)=>FileData(z))),
ZipFormat = BinaryFormat.List(ZipIterator),
out = List.Select(ZipFormat(binaryZip), each _[FileName]=fileName)
view raw Unz.m hosted with ❤ by GitHub

Wrapping all of the functions above and adjusting them to support multiple instances of Power BI while applying some minification techniques we get the final M script.

//Declaring compact function
MSMDSRVport = () => let cu=Table.FirstN(Table.Sort(Folder.Contents("C:\Users"),{{"Date accessed",Order.Descending}}),1)[Name]{0} in List.Transform(Table.SelectRows(Folder.Files("C:\Users\"&cu&"\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"),each [Name]="msmdsrv.port.txt")[Content],each Lines.FromBinary(_,null,null,1200){0}),
GetConnectionString = (port) => let c=AnalysisServices.Databases("localhost:"&port)[Name]{0} in AnalysisServices.Database("localhost:"&port,c,[Query="select * from $system.TMSCHEMA_DATA_SOURCES"])[ConnectionString],
ConnectionString2Bin = (cs) => let sp=Splitter.SplitTextByDelimiter(";") in Binary.FromText(Text.Replace(Text.Replace(List.Last(sp(cs)),"Mashup=",""),"mashup=","")),
Unz = (bZ,fN) => let UInt32=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),UInt16=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),H=BinaryFormat.Record([MH=BinaryFormat.Binary(14),CS=UInt32,US=UInt32,FNL=UInt16,EFL=UInt16]),FD=(h)=>BinaryFormat.Record([FN=BinaryFormat.Text(h[FNL]),EF=BinaryFormat.Text(h[EFL]),UD=BinaryFormat.Transform(BinaryFormat.Binary(h[CS]),(x)=>try Binary.Buffer(Binary.Decompress(x,Compression.Deflate)) otherwise null)]),ZI=BinaryFormat.Choice(UInt32,(_)=>if _<>0x04034B50 then BinaryFormat.Record([FN=null]) else BinaryFormat.Choice(H,(z)=>FD(z))),ZF=BinaryFormat.List(ZI),out=List.Select(ZF(bZ),each _[FN]=fN) in out{0}[UD],
//Get all the ports for $Embedded$ Tabular instances
Port = Table.FromList(MSMDSRVport(),null,{"port"}),
//Add a column with embedded connection strings
AddConString = Table.ExpandListColumn(Table.AddColumn(Port,"ConnectionString",each GetConnectionString([port])), "ConnectionString"),
//Parse the ConnectionString column to reveal the Mashup string in binary format
AddMashupBin = Table.AddColumn(AddConString,"Mashup",each ConnectionString2Bin([ConnectionString])),
//Unzip from the binary Mashups the Formulas/Section1.m files
ParseMScripts = Table.TransformColumns(AddMashupBin,{"Mashup", each Unz(_,"Formulas/Section1.m")}),
//Present the M scripts as lines
out = Table.AddColumn(ParseMScripts,"MLines", each Lines.FromBinary([Mashup]))
view raw PowerQueryThief.ts hosted with ❤ by GitHub

Some of the use cases that are still left for me to explore with this approach:

  • pushing M scripts to GitHub, thus replicating the query catalog functionality
  • exporting the Power Queries to *.odc files and consequently importing M scripts from Power BI desktop into Excel

 PowerShell version

For the curious minds, here is the PowerShell code I used in Power Query reconnaissance.

#Finding the portnumber on which the $Embedded$ tabular model is running on
$embedded = "$env:LOCALAPPDATA\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"
$ports = Get-ChildItem $embedded -rec | where {$_.Name -eq "msmdsrv.port.txt"}
$port = Get-Content $ports.FullName -Encoding Unicode
#Getting the data sources from the $Embedded$ tabular model
[xml] $db = Invoke-ASCmd -Server:localhost:$port -Query:"SELECT * from `$SYSTEM.TMSCHEMA_DATA_SOURCES"
$cs = $db.return.root.row.ConnectionString
#Cleaning up the connection string
$b64 = $cs.Split(";")[-1].Trim("Mashup=").Trim("""")
$bytes = [Convert]::FromBase64String($b64)
$temp = "c:\temp"
#Output to a binary ZIP file
[IO.File]::WriteAllBytes("$temp\a.zip", $bytes)
Add-Type -AssemblyName System.IO.Compression.FileSystem
($m = Get-Content $temp\a\Formulas\Section1.m)
view raw PowerQueryThief.ps1 hosted with ❤ by GitHub

Please note that this approach can be used for extracting Power Queries for already loaded data and only in Power BI desktop.

R, Power Query and msmdsrv.exe port number

The quest to fetch the details of the ever evasive $Embedded$ connection to the local tabular data model has always interested the Power BI community. There are numerous tips on how to obtain the details involving either shell or PowerShell scripts or even V..B..A…

I want to share with you an approach that is based on using R.Execute() in Power Query, and has the advantage of not implying a dependency outside the Power BI environment. This is a re-iteration of a pattern that I’ve already mentioned in one of my previous posts, which explained hot to use R to get STDOUT as a data source in Power Query.

In the R script I use, the function system2() executes system commands tasklist and netstat. Because I need to output the results of these commands into data frames, I also use the function read.fwf() in order to parse the fixed delimited output of STDOUT.

##Get msmdsrv process details
msmdsrv_proc <- system2("tasklist",args = c('/fi','"imagename eq msmdsrv.exe"'), stdout=TRUE)
##Get all TCP ports
tcp <- system2("netstat", args = c('-anop','TCP'), stdout=TRUE)
view raw msmdsrvr.r hosted with ❤ by GitHub

Already in the Power Query world, I transform the two data frames msmdsrv and ports via function formatTable, which trims the table contents and promotes the headers. Then I match/join the two tables on [PID] and parse the contents of column [Local Address] to get the desired port number. Here is the whole script :

RScript = R.Execute("msmdsrv_proc <- system2(""tasklist"",args = c('/fi','""imagename eq msmdsrv.exe""'), stdout=TRUE) #(lf)msmdsrv_clean<-msmdsrv_proc[-c(1,3)]#(lf)msmdsrv<-read.fwf(textConnection(msmdsrv_clean),widths=c(25,9,17,12,13),comment.char="""")#(lf)tcp <- system2(""netstat"", args = c('-anop','TCP'), stdout=TRUE)#(lf)ports<-read.fwf(textConnection(tcp),skip=3,widths=c(9,23,23,16,10))"),
formatTable = (T) => Table.PromoteHeaders(Table.TransformColumns(T,{{"V1",Text.Trim},{"V2",Text.Trim},{"V3",Text.Trim},{"V4",Text.Trim},{"V5",Text.Trim}})),
msmdsrv = formatTable(RScript{[Name="msmdsrv"]}[Value]),
ports = formatTable(RScript{[Name="ports"]}[Value]),
match = Table.SelectRows(ports, each [PID]=Table.ToRecords(msmdsrv){0}[PID]){0}[Local Address],
out = List.Last(Text.Split(match,":"))
view raw msmdsrvPORT.M hosted with ❤ by GitHub

If you prefer keeping all the filtering logic inside the R script, here is my take on how to do it:

Source = #table(null,{{1}}),
RScript = R.Execute("output<-dataset #(lf)msmdsrv_proc <- system2(""tasklist"",args = c('/nh','/fi','""imagename eq msmdsrv.exe""'), stdout=TRUE) #(lf)msmdsrv_pid <- strsplit(msmdsrv_proc[2],""\\s+"")[[1]][2]#(lf)msmdsrv_regex <- paste(c(""ESTABLISHED\\s*"", msmdsrv_pid),collapse="""")#(lf)tcp <- system2(""netstat"", args = c('-ano'), stdout=TRUE) #(lf)msmdsrv_connections<-grep(msmdsrv_regex,tcp,value=TRUE)#(lf)msmdsrv_inbound<-strsplit(strsplit(msmdsrv_connections[1],""\\s+"")[[1]][3],"":"")[[1]] #(lf)output$port<-tail(msmdsrv_inbound,n=1)",[dataset=Source]),
out = RScript{[Name="output"]}[Value]
view raw MSMDSRVport.m hosted with ❤ by GitHub

As a final note I wanted to add that there is actually a way of connecting to the embedded tabular data model which doesn’t require any dependencies. Try the command:

= AnalysisServices.Databases("*")

Unfortunately it uses the same connection context as the live connection so you’ll get an exception like:

Microsoft.Mashup.Evaluator.Interface.ErrorException: Connections to the server context that is currently active are not supported.


Which is not very practical.

R.Execute() the Swiss Army knife of Power Query

I rarely use R in Power BI, but when I do,  I make sure to keep the data mashup out of it. I don’t have anything against R, it’s arguably the best statistical tool out there, however when it comes to using an expressive language for doing data wrangling, Power Query (M) is my language of choice.

Although R has been introduced to Power BI ecosystem as a complimentary tool, its capabilities exceed by far those of the Microsoft’s analytics suite. That is why in areas where Power BI is falling behind, I can always rely on R to bridge the gap.

Below I describe 5 patterns of how I use R.Execute() in Power Query. I have divided these in 2 classes – as intended and unintended use cases – just to separate the expectation for how much support these patterns will be getting from Power BI team.

Note: When using R.Execute() the only rule I follow is that if I need some data out of an R routine, I must put the execution results into a dataframe. Other than that all R rules apply.

The intended use of R in Power Query

Executing an algorithm

This is probably one of the most legit use-cases for using R in Power BI. Because there is no language/framework/library that can compare to the vast variety of algorithms implemented in R. If you need to run some exotic algorithm on your data, most likely there is an optimised version written an R, you just have to find the right package (and the right repository: CRAN or MRAN to name a few).

Here is a standard R call executing KMeans clustering on iris datataset.

RScript = R.Execute("set.seed(20)#(lf)irisCluster<-kmeans(iris[,3:4],3,nstart=20)#(lf)iris$cluster<-irisCluster$cluster"),
out = RScript{[Name="iris"]}[Value]
view raw RKmeans.m hosted with ❤ by GitHub

Calling an R function on Power Query input

Regex is the perfect example here. While we wait for its implementation to arrive in Power Query, why not use R regex related functions regexpr, regmatches, etc?

Here is an example for parsing @mentions and #tags from a twitter feed.

Source = Csv.Document(File.Contents("RTwitterSample.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
twitts = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
RScript = R.Execute("temp<-dataset#(lf)pattern<-""[#@]\\w+""#(lf)temp$mention<-sapply(temp$text,function(x) toString(regmatches(x, regexpr(pattern, x))))",[dataset=twitts ]),
out = RScript{[Name="temp"]}[Value]
view raw RTwitts.m hosted with ❤ by GitHub

Or another cheeky  example of trying to salvage removed passwords from GitHub commits.

Unintended uses of R in Power Query

My favorite ones of the lot. I must admit these are the patterns that I abuse the most.

Performing file based operations

Say you have some CSV data stored in a ZIP file on your file system somewhere, the only way to get a hold of that data is either by manually unzipping the data, or use one of the lovely UnZip functions written in Power Query by either KenR or Mark White. The R way is even simpler, use read.csv(unz(zip,file.csv)) and you’re done!

RScript = R.Execute("datacsv<-read.csv(unz(""d:\\Downloads\\R_Twitter_sample.zip"",""R_Twitter_sample.csv""))"),
out = RScript{[Name="datacsv"]}[Value]
view raw ReadRZip.m hosted with ❤ by GitHub

Exporting data from Power Query

You’ve done your data analysts in Power Query and you’re stuck with all this data in the $Embedded$ Tabular Data Model. The only way to get the data out is by exporting this data via a table chart, into a CSV. On the other hand you can use R, and export effortlessly millions of records via write.csv  function and not worry about rendering problems that might be caused by the table chart visualisation.

seed = {1..10000000},
seedTable = Table.FromList(seed, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RScript = R.Execute("write.csv(dataset,""d:\\Downloads\\out.csv"")",[dataset=seedTable]),
out = RScript
view raw 10MlinesToCSV.m hosted with ❤ by GitHub

Using STDOUT as a data source in Power Query

This pattern was inspired by one of the questions from the Power BI community. Again, referring back to R, you can call a system commands via system2() function and use its console output as a data source in Power Query.

RScript = R.Execute("output <- read.table(text=system2(""whoami"", stdout=TRUE))"),
output = RScript{[Name="output"]}[Value]
view raw STOUTin.m hosted with ❤ by GitHub

Some Power Query internals

Whenever I’m given a black box, I marvel at its elegance. Then after a while, frustration settles in, as the IN’s and OUT’s don’t add up and even going through the official documentation doesn’t help. This is when I reach a point that I bring in a hammer and start banging at this black box until something gives in and secrets start pouring.

Same happened with Power BI and its elegant data mashup engine. I love the fact that it is a “pure, higher-order, dynamically typed, partially lazy functional language” , but the documentation doesn’t tell you much on how things work under the hood. So I started digging. And when I peeked at Microsoft.MashupEngine.dll internals, I was surprised to find half a dozen modules filled with Power Query (M) code. Being written by Microsoft, I assume that these modules can be used as a reference for best practices in Power Query code standards.

Some of the practices shared across all of the modules are:

Each module is wrapped in a section which

is an organizational concept that allows related expressions to be named and grouped within a document

Also the functions that are exposed to users are decorated with shared statement –

which omits the requirement to use a section-access-expression when referring to shared members outside of the containing section.

Although I’ve seen occasional use of section-access-expression as well –  LibraryModule!UICulture.GetString and TableModule!Table.AddColumns. My interpretation of these practices are, if you create a library in M, make sure to create a new named section, and mark all of the public members with shared decorator.

Moreover, the use of recursion is acceptable as long as it reduces the complexity of the code. And also I’ve seen a novel use of let expression, where the else clause of an if statement required some precomputed values (check Table.SplitColumn).

Here is the summary of Power Query modules that I’ve managed to salvage.


Date time arithmetic which doesn’t cover all of the business use-cases, however presents a very good source of inspiration for your own Date related functions. For even more Date related functions, make sure to check pquery repository.


This module is a great introduction to functional programming. It contains 10 List related functions that are build upon other List related functions. My favorite one is:

shared List.RemoveLastN =
(list as list, optional countOrCondition) as list =>
List.Reverse(list), countOrCondition));


Contains the details the code for SharePoint15.Tables, SharePoint15.Contents and SharePoint15.Files. Anyone that had to deal with fetching data from SharePoint into Power BI will recognize some of the details they had to think of when using the SharePoint connector. Moreover there are some great examples on how to use OData.Feeds and Web.Contents.


Contains details on data type mapping between SQL Server and Power BI that are used in SqlExpression.SchemaFrom. An interesting observation in this module is that although you can write a switch statement using some lists constructs, Microsoft kept this function very expressive, which means that you can see a dozen of else if’s.


Probably one of the most useful modules of the lot. As everyone is using Tables to output the results of their data wrangling. This module includes the details of 78 Table related functions.There are even some details about query folding. As expected most of the Table functions are based on Record and List primitives.


The last of the lot. Although this module contains the details of a single public function Text.Format, it also also contains the details of a function which can be used for text parsing. Have a look at GetPlaceholders.

GetPlaceholders =(
        offset as number,
        openCode as text,
        closeCode as text,
        numberReference as logical
) as list => ...

I’ve added all of these modules to GitHub.



[Matrix] x [Matrix]

Although matrices are not first class citizens in Power Query, quite often a business problem comes along which might greatly be simplified by thinking of it in matrix terms.

You can of course imitate matrices in Power BI by means of tables, two dimensional lists or records but this approach doesn’t come with any matrix related functions. So as always you have to roll up your sleeves and roll out your own set of needed operations.

I would like to share with you an approach of doing matrix multiplication which can be translated into other algebra operations.

We’ll start by reviewing how matrix multiplication works from the diagram below.


If we break down the problem, we can think of two matrices A and B, in M(Power Query) terms as two tables with matching size of rows in matrix A and columns in matrix B respectively. Then the matrix multiplication operation can be reduced to a problem of iterating through all row and column combinations and computing the dot product of equally sized vectors.

The dot product in M can be performed by using List.Accumulate which aggregates the product of a List.Zip which in turn pulls the respective elements of two vectors into a tuples. Here is the code.

dotProduct = (v1,v2) =>
       List.Zip({v1,v2}),0,(agg,_)=> agg+(_{1}*_{0}))

Next, we breakdown the matrix A into rows – Table.ToRows(A) – and matrix B into columns Table.ToColumns(B). And iterate through all the combinations of row vectors and column vectors in order to accumulate the results into a nested list of dot products.

Check the code below for use of nested List.Accumulate.

               (_,ai)=> _&{ List.Accumulate(Table.ToColumns(B),
                          {},(_,bj)=> _&{ dotProduct(ai,bj) })

Putting the two operations together results in the following final function:

(A,B) =>
//vector multiplication
dotProduct = (v1,v2) =>
_& { List.Accumulate(Table.ToColumns(B),
_&{ dotProduct(ai,bj) })

How much data can fit in Power BI Desktop

I’ve heard this question being asked multiple times, in forums, blog posts and every single client that doesn’t trust the performance of the Power BI desktop app. I’ve set myself the challenge to finding this out, while exploring the versatility of Power Query M language.

One of the most powerful data structures that M has in its arsenal is the List. I’ll start by creating a simple list of numbers.

= {1..1000}

Let’s Apply this query and check the result in the Data Model.


This list looks like a table, acts like a table – then it must be a table. While I’m here I’ll change the Data Type of the column Query1 from Text to Whole Number. This way I’ll save the memory footprint for next time I try to load as much data as possible in in this single column table.

Next step – construct the biggest list possible.

= {1..3000000000]


And I’ve hit my first limit. Another try with 231-1.

= {1..2147483647}

Success! But not for long, as once I apply the changes and wait for the Query Changes to be committed to Data Model, at around 2,000,707,684 loaded records I was faced with another error.


I’ve tried to repeat the process with a smaller lists and got my first success at 1,999,999,981 records.


Which produced a 29GB file.


That’s a lot of records, and only to add them to the Data Model I needed 55GB of memory.


Not too bad for a desktop application.

What about if I wanted to load lots of data that is not that unique? Say 232-2 of ones?

each [counter]<2147483647,
each [counter=[counter]+1, uno=1],
each [uno]

This query completed in no-time, and with very little memory footprint as the entire msmdsrv.exe service stayed within the 1GB limit. The output saved on disk occupied 191KB. Go compression!

How about the wideness of data? I used a different set of queries to find the limit.

I’ve hit gold with :


Which resulted in a 167MB file.

And the imprecise errors that kept pop-ing up were: