All posts by hugoberry

Closures, metadata and cascading parameters

If you're interested in cascading parameters and Power BI custom connectors are an option for your project, you should give Navigation Tables a try; soon enough they could make their way into the M standard library. If this is not an option, you might find your solution in the next few scripts tangled between functional programming concepts. You will also learn about:

  1. Where to set the allowed values for function parameters
  2. How to build a function inception with higher order functions
  3. How to access a variable outside its scope
  4. And finally get to a result which mimics cascading selections

[Animation: cascading parameter selection demo]

Documenting the parameters

Weirdly enough, there isn't much documentation on how to document your functions. Apart from Chris's and Imke's blogs, this information is scattered across various forums and Q&A sites. [*]

For the purpose of our exercise, we'll need to know how to decorate a function parameter with a list of allowed values. This list is stored as a metadata tag on the parameter's data type, which in turn sits inside the function's type.

Here is an illustrative diagram on how to interpret the previous statement:

[Diagram: function type, parameter type and documentation metadata]

The best way to add the AllowedValues list to a function is by building the chain above and assigning it to an already defined function by means of Value.ReplaceType. I will be using the following function (setParameters), which does just that.

(f as function, allowedValues as list) =>
let
    //create a new parameter type with metadata for the allowed values
    parameterType = Value.Type(allowedValues{1}) meta [Documentation.AllowedValues = allowedValues],
    //create a function type with the custom parameter type
    myFunType = type function (select as parameterType) as function
in
    //replace the original function type with the new myFunType
    Value.ReplaceType(f, myFunType)

Higher order function

In Power Query a function can return another function as a value. This is called a Higher Order Function (HOF). We've already seen an example of a HOF in the previous paragraph — Value.ReplaceType. Although HOFs can sometimes lead to unusual notation (see below), they come in very handy when you need to decorate a function definition dynamically, or, in our case, depending on user selection.

let
ring = () => () =>[Answer="Hello"],
firstCall = ring(), //returns a function
secondCall = ring()() //returns [Answer="Hello"]
in
secondCall

Let's start with a function fn, which returns the value of its parameter x.

(x) => x
//do filtered operations on x

Say we want the allowed values for parameter x to be the digits 1 to 9. Because we can’t modify the function definition of fn from within its scope, we’ll have to create a HOF that returns a copy of fn with decorated parameter x. Call it makeClosure.

let
fnCopy = setParameters(fn,{1..9})
in fnCopy

If you've rushed ahead and invoked makeClosure with a parameter selection, you must have noticed that the allowed values were indeed {1..9}; however, not much cascading is going on, as the execution of this function returns only your selection.

To get to the double hop of the cascading functionality, we'll have to wrap the call inside makeClosure into another HOF that takes as a parameter the selection of x from the previous call and again returns a copy of the original function fn, but with a filtered-down list of allowed values for x.

let
//overwrite function type for _fn_ with a parameter list limited to x
fnCopy = (x) => setParameters(fn,{1..x})
in //wrap the previous overwrite in another function call with allowed parameters {1..9}
setParameters(fnCopy,{1..9})

This time around, when making a selection on makeClosure and clicking invoke, you'll be presented with another dialog with a pre-filtered list of values for x. Cascading in action!

Closures

Another important feature of cascading parameters is making a final invocation dependent on both of the previous selections. This functionality can be achieved by means of closures.

Functions in Power Query have access to at least two environments, the global environment visible via #shared and the local environment, which is the union of parameters and local variables. When you define a function inside another function, the enclosed function can access the variables of the outer function. A HOF in which the returned function references the variables of the HOF local environment is called a closure.

Closures turn out to be a great way to remember the environments that existed at the time they were created. Let's follow an example:

let
addTo = (x) => (y) => x + y,
//make a closure with scope x=5
addToFive = addTo(5),
//becomes addTo(5) => (y) => 5 + y
eight = addToFive(3)
in //becomes addTo(5) => (3) => 5 + 3
eight

addTo is a HOF that returns the function (y) => x + y. Although this function doesn't have a local variable x, on invocation it will read it from the environment of addTo. Thus addToFive is nothing but the returned inner function carrying the stored environment [x=5].

Getting back to our original example, in order to call the function fn with a previous selection, we’ll have to convert it to a closure. For the sake of simplicity I assume the final operation is multiplication.

(x) =>
//make selection or calculation based on parameters (x,y)
(y) => x*y

Subsequently, the only additional change to the function that triggers the cascading selection (makeClosure) is to pass the first selection of x as a parameter to fn. This way we modify the definition of fn with a filtered selection of x and also pass it in as part of the calling environment.

let
//overwrite function type for _fn_ with a parameter list limited to x AND pass it into the scope of _fn_
fnCopy = (x) => setParameters(fn(x),{1..x})
in //wrap the previous overwrite in another function call with allowed parameters {1..9}
setParameters(fnCopy,{1..9})

Final notes

Although the examples in this post are trivial and involve only filtering digits and performing a multiplication, the solution can easily be adapted to perform database calls and/or web requests, and of course to cascade more than one parameter. A rough sketch of the database-driven variant follows below.
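The sketch below is illustrative only: it assumes a hypothetical dbo.Products table in a local Demo database and reuses the setParameters helper and a closure-style fn from above.

let
    //fetch the first-level allowed values from a hypothetical table
    categories = Table.Column(Sql.Database(".", "Demo", [Query = "SELECT DISTINCT Category FROM dbo.Products"]), "Category"),
    //the second-level allowed values are filtered by the first selection
    productsFor = (category) => Table.Column(Sql.Database(".", "Demo", [Query = "SELECT Name FROM dbo.Products WHERE Category = '" & category & "'"]), "Name"),
    //same closure pattern as before, just with database-driven lists
    fnCopy = (category) => setParameters(fn(category), productsFor(category))
in
    setParameters(fnCopy, categories)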

You can read the original discussion that inspired this solution here.

* [Update] While browsing through the Data Connectors repository, I found this useful documentation as well: Adding Function Documentation

Exposing #shared in Azure Analysis Services

While I enjoy that I can now write Power Query expressions everywhere in Analysis Services, the fact that I don't know what is available at hand when I embed an M script in my model is a bit of an annoyance. Especially when that script ends up working in Data Tools and Power BI but spits out various errors during processing of a table.

Expression in partition ‘Partition’ in table ‘SharedExpressionTable’ references an unknown entity.

Being able to access the #shared variable would solve the issue. However, the only way to access it is by creating a cube-level expression and calling it from a partition-level expression. This can be achieved in 3 ways:

  1. Via clicky-click Data Tools
  2. By creating a new cube with all the required expressions via a single TMSL script
  3. Or by creating an expression and a table based on that expression with some TMSL and XMLA (because TMSL doesn’t allow you to create all the objects that are available in tabular)

I’ll leave the first option for the brave ones, as I kept experiencing various problems when trying to use Power Query in SSDT 2017. As for the remaining two options here are the scripts.

Option 2

In order to create a new cube [Mcube] with all the required prerequisites, run the following command in SSMS as an XMLA script:

[Image: createMcube XMLA script]

The interesting bits are at line 8, a construct which creates a cube-level Expression, and at line 36, an M script that calls the previously created expression via the #sections[main] section of the document.
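In M terms, that partition-level call is just a reference into the document's main section. Assuming the cube-level expression is named SharedExpression (the name used in Option 3 below), it boils down to:

= #sections[main][SharedExpression]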

Option 3

If you're not happy with polluting your AS instance with various transient cubes, you can always add a cube-level expression to an existing database with the XMLA template below:

<Batch Transaction="true" xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Create xmlns="http://schemas.microsoft.com/analysisservices/2014/engine">
<DatabaseID>tab</DatabaseID>
<Expressions>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<xs:element>
<xs:complexType>
<xs:sequence>
<xs:element type="row"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="row">
<xs:sequence>
<xs:element name="Name" type="xs:string" sql:field="Name" minOccurs="0"/>
<xs:element name="Description" type="xs:string" sql:field="Description" minOccurs="0"/>
<xs:element name="Kind" type="xs:long" sql:field="Kind" minOccurs="0"/>
<xs:element name="Expression" type="xs:string" sql:field="Expression" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<row xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<Name>SharedExpression</Name>
<Kind>0</Kind>
<Expression>#shared</Expression>
</row>
</Expressions>
</Create>
</Batch>

It is worth pointing out that the Kind of this expression is 0 (line 24), which could only mean that M is a first class citizen in this new world.

Don't forget to create a table that sources its data from the SharedExpression expression.

{
"create": {
"parentObject": {
"database": "tab"
},
"table": {
"name": "SharedExpressionTable",
"columns": [
{
"name": "Name",
"dataType": "string",
"sourceColumn": "Name"
},
{
"name": "Value",
"dataType": "string",
"sourceColumn": "Value"
}
],
"partitions": [
{
"name": "Partition",
"dataView": "full",
"source": {
"type": "m",
"expression": "#sections[main][SharedExpression]"
}
}
]
}
}
}

The rest of the action

In order to populate the SharedExpressionTable with the contents of #shared, you will have to process the [Partition] with another TMSL command, or clicky-click in SSMS, which is also a reliable choice.

{
"refresh": {
"type": "full",
"objects": [
{
"database": "Mcube",
"table": "SharedExpressionTable",
"partition": "Partition"
}
]
}
}

To view the results use DAX or navigate to SharedExpressionTable in your reporting tool of choice (which has to be Power BI 🙂 )

evaluate 'SharedExpressionTable'

Drawing Homer Simpson

Recently I started exploring the limits of standard charts in Power BI and ended up drawing all sorts of mathematical functions. For the fun of it, I’ve decided to port a famous Homer Simpson-like curve to DAX and was really impressed by the result.

As always, there is no magic involved, just the usual trio:

  1. Prepare some iterator ranges in M
  2. Add some curve functions in DAX
  3. Visualize it 😍

Depending on how you look at this, the final solution can be summarized in 65 LOC = 15 LOC of M + 2*25 LOC of DAX. But check out the curves on those lines.

The M code

The Power Query code necessary for this doodle generates some ranges that are specific to drawing various parts of the cartoon figure. For convenience, all of them are expressed in units of π.

let
//generate a range between [start] and [end] in number of [steps] with a category descriptor
range = (category,steps,start,end)=>List.Accumulate({1..steps},{{category,start}},(s,c)=>s&{{category,start+(c/steps)*(end-start)}}),
//shortcut
pi = Number.PI
//stitching multiple ranges into an iterator table
in #table(type table[#"fn"=number,#"i"=number],
range(0,20,35*pi,36*pi)&
range(1,10,31*pi,32*pi)&
range(2,10,27*pi,28*pi)&
range(3,30,23*pi,24*pi)&
range(4,30,19*pi,20*pi)&
range(5,30,15*pi,17*pi)&
range(6,30,11*pi,13*pi)&
range(7,30,7*pi,9*pi)&
range(8,30,3*pi,5*pi)&
range(9,30,1*pi,3*pi))

The DAX code

The DAX code is broken down into 2 measures, one for the X and one for the Y coordinates. Each measure performs two steps:

  1. Define a collection of 10 heavily trigonometric functions (fn[index])
  2. Map the curve function fn[index] to a category fn which corresponds to a drawing range iterator

X coordinates measure

X =
VAR t = LASTNONBLANK(Homer[i],1)
VAR fnX = LASTNONBLANK(Homer[fn],1)
VAR fn0 = (-11/8*sin(17/11 - 8*t) - 3/4*sin(11/7 - 6*t) - 9/10*sin(17/11 - 5*t) + 349/9*sin(t + 11/7) + 17/12*sin(2*t + 8/5) + 288/41*sin(3*t + 8/5) + 69/10*sin(4*t + 8/5) + 8/5*sin(7*t + 13/8) + 4/7*sin(9*t + 28/17) + 4/7*sin(10*t + 19/11) + 1051/8)
VAR fn1 = (-3/4*sin(11/7 - 5*t) - 54*sin(11/7 - t) + 237/8*sin(2*t + 11/7) + 52/11*sin(3*t + 33/7) + 38/9*sin(4*t + 11/7) + 249/2)
VAR fn2 = (-16/9*sin(14/9 - 5*t) - 5/2*sin(14/9 - 3*t) + 781/8*sin(t + 33/7) + 291/11*sin(2*t + 11/7) + 23/7*sin(4*t + 11/7) + 18/19*sin(6*t + 11/7) + 2/5*sin(7*t + 61/13) + 24/23*sin(8*t + 14/9) + 1/27*sin(9*t + 5/11) + 4/11*sin(10*t + 11/7) + 1/75*sin(11*t + 5/8) + 1411/7)
VAR fn3 = (-7/11*sin(13/10 - 13*t) + 3003/16*sin(t + 33/7) + 612/5*sin(2*t + 11/7) + 542/11*sin(3*t + 47/10) + 137/7*sin(4*t + 51/11) + 53/7*sin(5*t + 17/11) + 23/12*sin(6*t + 41/9) + 94/11*sin(7*t + 51/11) + 81/11*sin(8*t + 41/9) + 53/12*sin(9*t + 23/5) + 73/21*sin(10*t + 13/9) + 15/7*sin(11*t + 6/5) + 37/7*sin(12*t + 7/5) + 5/9*sin(14*t + 27/7) + 36/7*sin(15*t + 9/2) + 68/23*sin(16*t + 48/11) + 14/9*sin(17*t + 32/7) + 1999/9)
VAR fn4 = (1692/19*sin(t + 29/19) + 522/5*sin(2*t + 16/11) + 767/12*sin(3*t + 59/13) + 256/11*sin(4*t + 31/7) + 101/5*sin(5*t + 48/11) + 163/8*sin(6*t + 43/10) + 74/11*sin(7*t + 49/12) + 35/4*sin(8*t + 41/10) + 22/15*sin(9*t + 29/14) + 43/10*sin(10*t + 4) + 16/7*sin(11*t + 6/5) + 11/21*sin(12*t + 55/14) + 3/4*sin(13*t + 37/10) + 13/10*sin(14*t + 27/7) + 2383/6)
VAR fn5 = (-1/9*sin(7/5 - 10*t) - 2/9*sin(11/9 - 6*t) + 20/11*sin(t + 16/15) + 7/13*sin(2*t + 15/4) + 56/13*sin(3*t + 25/9) + 1/6*sin(4*t + 56/15) + 5/16*sin(5*t + 19/8) + 2/5*sin(7*t + 5/16) + 5/12*sin(8*t + 17/5) + 1/4*sin(9*t + 3) + 1181/4)
VAR fn6 = (-1/6*sin(8/11 - 5*t) + 5/8*sin(t + 6/5) + 13/5*sin(2*t + 45/14) + 10/3*sin(3*t + 7/2) + 13/10*sin(4*t + 24/25) + 1/6*sin(6*t + 9/5) + 1/4*sin(7*t + 37/13) + 1/8*sin(8*t + 13/4) + 1/9*sin(9*t + 7/9) + 2/9*sin(10*t + 63/25) + 1/10*sin(11*t + 1/9) + 4137/8)
VAR fn7 = (-17/13*sin(6/5 - 12*t) - 15/7*sin(25/26 - 11*t) - 13/7*sin(3/14 - 10*t) - 25/7*sin(9/13 - 6*t) - 329/3*sin(8/17 - t) + 871/8*sin(2*t + 2) + 513/14*sin(3*t + 5/4) + 110/9*sin(4*t + 3/8) + 43/8*sin(5*t + 1/5) + 43/13*sin(7*t + 42/11) + 49/16*sin(8*t + 11/13) + 11/5*sin(9*t + 2/7) + 5/7*sin(13*t + 42/13) + 1729/4)
VAR fn8 = (427/5*sin(t + 91/45) + 3/11*sin(2*t + 7/2) + 5656/11)
VAR fn9 = (-10/9*sin(7/10 - 4*t) - 7/13*sin(5/6 - 3*t) - 732/7*sin(4/7 - t) + 63/31*sin(2*t + 1/47) + 27/16*sin(5*t + 11/4) + 3700/11)
RETURN
SWITCH(TRUE(),
fnX=0,fn0,
fnX=1,fn1,
fnX=2,fn2,
fnX=3,fn3,
fnX=4,fn4,
fnX=5,fn5,
fnX=6,fn6,
fnX=7,fn7,
fnX=8,fn8,
fnX=9,fn9,
BLANK())

Y coordinates measure

Y =
VAR t = LASTNONBLANK(Homer[i],1)
VAR fnX = LASTNONBLANK(Homer[fn],1)
VAR fn0 = (-4/11*sin(7/5 - 10*t) - 11/16*sin(14/13 - 7*t) - 481/11*sin(17/11 - 4*t) - 78/7*sin(26/17 - 3*t) + 219/11*sin(t + 11/7) + 15/7*sin(2*t + 18/11) + 69/11*sin(5*t + 11/7) + 31/12*sin(6*t + 47/10) + 5/8*sin(8*t + 19/12) + 10/9*sin(9*t + 17/11) + 5365/11)
VAR fn1 = (-75/13*sin(14/9 - 4*t) - 132/5*sin(11/7 - 2*t) - 83*sin(11/7 - t) + 1/7*sin(3*t + 47/10) + 1/8*sin(5*t + 14/11) + 18332/21)
VAR fn2 = (191/3*sin(t + 33/7) + 364/9*sin(2*t + 33/7) + 43/22*sin(3*t + 14/3) + 158/21*sin(4*t + 33/7) + 1/4*sin(5*t + 74/17) + 121/30*sin(6*t + 47/10) + 1/9*sin(7*t + 17/6) + 25/11*sin(8*t + 61/13) + 1/6*sin(9*t + 40/9) + 7/6*sin(10*t + 47/10) + 1/14*sin(11*t + 55/28) + 7435/8)
VAR fn3 = (-4/7*sin(14/9 - 13*t) + 2839/8*sin(t + 47/10) + 893/6*sin(2*t + 61/13) + 526/11*sin(3*t + 8/5) + 802/15*sin(4*t + 47/10) + 181/36*sin(5*t + 13/3) + 2089/87*sin(6*t + 14/3) + 29/8*sin(7*t + 69/16) + 125/12*sin(8*t + 47/10) + 4/5*sin(9*t + 53/12) + 93/47*sin(10*t + 61/13) + 3/10*sin(11*t + 9/7) + 13/5*sin(12*t + 14/3) + 41/21*sin(14*t + 22/5) + 4/5*sin(15*t + 22/5) + 14/5*sin(16*t + 50/11) + 17/7*sin(17*t + 40/9) + 4180/7)
VAR fn4 = (-7/4*sin(8/11 - 14*t) - 37/13*sin(3/2 - 12*t) + 2345/11*sin(t + 32/21) + 632/23*sin(2*t + 14/3) + 29/6*sin(3*t + 31/21) + 245/11*sin(4*t + 5/4) + 193/16*sin(5*t + 7/5) + 19/2*sin(6*t + 32/7) + 19/5*sin(7*t + 17/9) + 334/23*sin(8*t + 35/8) + 11/3*sin(9*t + 21/11) + 106/15*sin(10*t + 22/5) + 52/15*sin(11*t + 19/12) + 7/2*sin(13*t + 16/13) + 12506/41)
VAR fn5 = (-3/7*sin(1/10 - 9*t) - 1/8*sin(5/14 - 5*t) - 9/8*sin(26/17 - 2*t) + 18/7*sin(t + 14/11) + 249/50*sin(3*t + 37/8) + 3/13*sin(4*t + 19/9) + 2/5*sin(6*t + 65/16) + 9/17*sin(7*t + 1/4) + 5/16*sin(8*t + 44/13) + 2/9*sin(10*t + 29/10) + 6689/12)
VAR fn6 = (-1/27*sin(1 - 11*t) - 1/6*sin(4/11 - 10*t) - 1/5*sin(2/11 - 9*t) - 7/20*sin(1/2 - 5*t) - 51/14*sin(29/28 - 3*t) + 23/7*sin(t + 18/5) + 25/9*sin(2*t + 53/12) + 3/2*sin(4*t + 41/15) + 1/5*sin(6*t + 36/11) + 1/12*sin(7*t + 14/3) + 3/10*sin(8*t + 19/9) + 3845/7)
VAR fn7 = (-8/7*sin(1/3 - 13*t) - 9/13*sin(4/5 - 11*t) - 32/19*sin(17/12 - 9*t) - 11/6*sin(9/13 - 8*t) - 169/15*sin(8/17 - 3*t) + 917/8*sin(t + 55/12) + 669/10*sin(2*t + 4/13) + 122/11*sin(4*t + 49/24) + 31/9*sin(5*t + 1/8) + 25/9*sin(6*t + 6/7) + 43/10*sin(7*t + 1/21) + 18/19*sin(10*t + 9/13) + 2/9*sin(12*t + 31/15) + 1309/5)
VAR fn8 = (-267/38*sin(3/10 - 2*t) + 625/8*sin(t + 62/17) + 8083/14)
VAR fn9 = (1370/13*sin(t + 25/6) + 41/21*sin(2*t + 205/51) + 11/16*sin(3*t + 8/13) + 9/13*sin(4*t + 26/9) + 6/5*sin(5*t + 11/14) + 2251/4)
RETURN
SWITCH(TRUE(),
fnX=0,fn0,
fnX=1,fn1,
fnX=2,fn2,
fnX=3,fn3,
fnX=4,fn4,
fnX=5,fn5,
fnX=6,fn6,
fnX=7,fn7,
fnX=8,fn8,
fnX=9,fn9,
BLANK())

The visual

To build the visualization, select the standard Scatter Chart. Add the column i to Details and Play Axis. Then add fn to the Legend. And to finish it all off, map X to the X Axis and Y to the Y Axis — DOH!

[Image: scatter chart setup]

Select all of the fn ids from the charts’ legend and press play.

[Animation: the Homer curve being drawn on the scatter chart]

The origin

For the curious minds, the functions above were taken from WolframAlpha. Their scientists have written a series of blog posts in which they describe in minute detail the steps required for generating various curves. Start with:

It would be an understatement to say that everything boils down to a Fast Fourier Transform — but it does 🙂

And finally — here is the original Homer Simpson-like curve:

[Image: the original Homer Simpson-like curve formula]

Navigating over 600+ M functions

Whenever I'm faced with a data mashup problem in Power BI, I try to check whether it can be resolved with the standard M functions exposed by the intrinsic variable #shared. Navigating this data structure when you don't know what you're looking for is a tedious task. Up until recently, my function discovery workflow was to turn the output of #shared into a table and then search for keywords in the names of the functions, hoping that the developers had named the functions accordingly. However, after discovering where the documentation for the functions is stored, I've come up with more flexible ways of navigating the 600+ definitions.
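That keyword search boils down to something like the following snippet (the keyword "Split" is only an example):

let
    functions = Record.ToTable(#shared),
    //keep only the entries whose name contains the keyword
    matches = Table.SelectRows(functions, each Text.Contains([Name], "Split"))
in
    matches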

Function documentation

A while ago I stumbled across a wonderful GitHub repository — Power BI Desktop Query Extensions, written by Taylor Clark. This is one of those hidden treasures that is filled with golden nuggets, and I recommend it to anyone interested in taking their Power Query skills to the next level. Although the repository is actually a single file with 545 LOC, in those lines you'll see some of the best examples of text, list and table manipulation. My favorite finds are those on testing and documenting M functions.

Apparently the documentation that you see while browsing #shared is stored as a metadata record on the type of the functions.

[Image: List.Positions documentation as shown in the #shared browser]

In order to view it you need to first check the type of the function and then retrieve the metadata. For example:

= Value.Metadata(Value.Type(List.Positions))

[Image: documentation metadata record]

Something that looks like a record in the query view can easily be transformed into a table, and there are so many ways you can render the information once it is stored as a table.
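For example, the documentation record shown above can be flattened with a single call:

= Record.ToTable(Value.Metadata(Value.Type(List.Positions)))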

Function signature

Before proceeding any further, there is still one piece of the puzzle missing — the signature of the functions. I couldn't find it in the documentation metadata, nor via other standard functions, so I had to roll out a custom query that generates text like:

"function (list as list) as list"

It uses a combination of Type.FunctionParameters, Type.FunctionReturn and a looooong list of type-to-text conversions.

(placeholder as function)=>
let
//Serialize type to text
TypeAsText = (value as any) =>
let
prefix = if Type.IsNullable(value) then "nullable " else ""
in
prefix&(
if Type.Is(value, type binary) then "binary" else
if Type.Is(value, type date) then "date" else
if Type.Is(value, type datetime) then "datetime" else
if Type.Is(value, type datetimezone) then "datetimezone" else
if Type.Is(value, type duration) then "duration" else
if Type.Is(value, type function) then "function" else
if Type.Is(value, type list) then "list" else
if Type.Is(value, type logical) then "logical" else
if Type.Is(value, type none) then "none" else
if Type.Is(value, type null) then "null" else
if Type.Is(value, type number) then "number" else
if Type.Is(value, type record) then "record" else
if Type.Is(value, type table) then "table" else
if Type.Is(value, type text) then "text" else
if Type.Is(value, type time) then "time" else
if Type.Is(value, type type) then "type" else
if Type.Is(value, type any) then "any"
else error "unknown"),
//if parameter is Optional set prefix
OptionalPrefix = (_)=>if Type.IsNullable(_) then "optional " else "",
//get list of function parameters
parameters = Type.FunctionParameters(Value.Type(placeholder)),
//create a text list of parameters and associate types "[optional] paramname as type"
parametersWithTypes = List.Accumulate(Record.FieldNames(parameters),{},
(state,cur)=>state&{
OptionalPrefix(Record.Field(parameters,cur))&
cur&" as "&TypeAsText(Record.Field(parameters,cur))})
in
//merge parameter list and prefix with "function (" and suffix with function return type
"function ("&
Text.Combine(parametersWithTypes,", ")&
") as "&
TypeAsText(Type.FunctionReturn(Value.Type(placeholder)))

Stitching everything together

I started from the #shared record, transformed it into a table, filtered it down to function definitions only, and then added columns one by one for Category, Description, function Examples and Signature. After some string manipulation to parse the Module names, I ended up with the query below.

let
sharedTab = Record.ToTable(#shared),
//select only functions from #shared
functions = Table.SelectRows(sharedTab, each Type.Is(Value.Type([Value]),type function)),
//parse Module from function name
modules = Table.AddColumn(functions, "Module", each Text.Split([Name], "."){0}, type text),
functionNames = Table.AddColumn(modules, "FunctionName", each List.Last(Text.Split([Name], ".")), type text),
//get category from documentation
categories = Table.AddColumn(functionNames, "Category", each try Value.Metadata(Value.Type(Record.Field(#shared,[Name])))[Documentation.Category] otherwise ""),
//parse only the first code example from documentation
examples = Table.AddColumn(categories, "Examples", each
let eg = Value.Metadata(Value.Type(Record.Field(#shared,[Name])))[Documentation.Examples]?
in if Type.Is(Value.Type(eg),type record) then eg[Code] else eg{0}?[Code]?),
//get the short description from the documentation
descriptions = Table.AddColumn(examples, "Description", each Value.Metadata(Value.Type(Record.Field(#shared,[Name])))[Documentation.Description]?),
//parse subcategories
subcategories = Table.AddColumn(descriptions, "DotCategory", each List.Last(Text.Split([Category],"."))),
//adding the signature of the functions
out = Table.AddColumn(subcategories, "Signature", each Signature(Record.Field(#shared,[Name])))
in
out

I was planning to represent this data in a tile chart, similar to a periodic table. However, after experiencing performance problems while rendering 600 tiles with Infographic Designer 1.5.2, I gave up on the whole idea and opted to visualize all of this information in a treemap.

I added a dummy column ([Uno]) with value 1 and used it as the Value field in the treemap chart. Then I added the [Module] column to the Group field and [FunctionName] to the Details section. All of the remaining columns ([Description], [Signature] and [Examples]) I added to the Tooltips section of the chart. To control the treemap, I used a matrix which acts as a filter on the [Category] column.

[Image: treemap of the 600+ M functions]

Click to view the embedded version of the report ➡

Exporting Power Query tables to SQL Server

Power Query is a great tool for data mashup; however, when it comes to really showing its powers as a desktop ETL tool, you realise that no matter how powerful at E and T it is, the L is falling behind, as you can only load the data into the embedded tabular data model that comes with Power BI.

Ignoring R, local cubes and HTTP POST requests, I wanted to showcase a few other methods for exporting data from Power Query and loading it to SQL Server.

The logic I follow is very simple: because there is no obvious limitation to the type of queries you can run against OleDb or ADO.NET data sources, a SELECT query will be executed with the same success as an INSERT query.

As a prerequisite, on the SQL Server side I’ve created a table which I populate initially with single records, then with 1M records.

CREATE TABLE [dbo].[Demo](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Key] [int] NULL,
[Value] [varchar](50) NULL
)

OleDb

Following the logic described above, inserting a single entry into the Demo table via OleDb looks fairly trivial.

= OleDb.DataSource(
"provider=SQLOLEDB.1;data source=.;database=QueryPower",
[Query=Text.Format("INSERT INTO [dbo].[Demo] VALUES(#{0},'#{1}')",{1,"A"})])

However, if you try to follow this approach, you'll soon notice that in addition to fiddling around with Text.Format to get your INSERT statement right, you will also have to approve each parameter variation of your SQL statement. This makes the approach unfeasible unless you construct your INSERT statement to include all of the millions of records in a single string.

ADO.NET

On the other hand, an ADO.NET connection wrapped in a Value.NativeQuery call provides a more familiar approach to performing data INSERTs. This method has the convenience of passing query parameters as a record or list alongside the SQL statement. As a result, you need to approve the firewall rule for the SQL query only once.

= Value.NativeQuery(
Sql.Database(".", "QueryPower"),
"INSERT INTO [dbo].[Demo] VALUES(@key,@value)",
[key=10,value="A"])

Changing the scale

The methods I've described so far work reasonably well for times when you need to insert a few hundred records into a SQL table. However, when the problem explodes to millions of rows, you need to do some prep work before sending over all of that data. I'll detail 3 techniques for exporting datasets with millions of records: Table Valued Parameters, XML and JSON.

As an example, I use a 1M-record table, dummy, which I generate from a list with an incremental Key sequence and a GUID-generated Value field.

= Table.FromRecords(
List.Generate(()=> [ key = 0, value = Text.NewGuid()],
each [key] < 1000000,
each [key = [key]+1,
value = Text.NewGuid()]))

TVP

When doing bulk inserts in ADO.NET, the most recommended approach is to use a Table Valued Parameter to send data to a Stored Procedure which performs the INSERT. As a prerequisite, on the SQL Server side I had to add a table type (dbo.DemoList) and a stored procedure (usp_InsertDemoTVP) that accepts this table type as a READONLY parameter.


Although sending a TVP into an SP is a straightforward task in .NET, it requires casting the data reader (table) to a SqlDbType.Structured data type. 

I wanted to rely on Power Query's internal data type conversion and sent a table as a parameter to my newly created SP. Needless to say, I was greeted with the error below.


After giving up on the idea of casting a Power Query table into a SQL Structured data type, I decided to replicate the internal call of an exec SP with a @TVP. Thus I built a rather lengthy 1002-row SQL statement that declares an internal table, inserts all the records from the dummy table and calls the SP with that internal table as a parameter.

let
//building the Table Value Constructor for a record
rows = (t) => Table.TransformRows(t, each "("&Text.From([key])&",'"&[value]&"')"),
//group entries in sets of 1000 because Table Value Constructors have a limitation of 1000 entries
values = List.Generate(()=>0,
(i)=>i<1000,
(i)=>i+1,
(i)=>Text.Combine(rows(Table.Range(dummy,i*1000,1000)),",")),
//adding the declaration of TVP and the call to SP that uses TVP
TVP = Text.Combine({"declare @p1 dbo.DemoList #(lf)"}&
List.Transform(values, each "insert into @p1 values"&_&"#(lf)")
&{"#(lf)exec dbo.usp_InsertDemoTVP @TVP=@p1"}),
//call the lengthy SQL statement
out = Value.NativeQuery(Sql.Database(".", "QueryPower"),TVP)
in
out

XML

Back in the days when XML was still cool, sending some tabular data to SQL Server was a matter of serializing a recordset into an XML document and sending it to a stored procedure that parsed it back into a recordset. Nothing has changed since on the SQL Server side.

CREATE PROC [dbo].[usp_InsertDemoXML] @xmlBin AS varbinary(max)
AS
BEGIN
declare @xml xml = convert(xml,@xmlBin)
INSERT INTO Demo ([Key],[Value])
SELECT
[key] = x.v.value('key[1]','int'),
[Value] = x.v.value('value[1]','varchar(50)')
FROM @xml.nodes('/table/row') x(v)
END

Since Power Query lacks any functions for constructing XML documents, I found it very difficult to come up with an approach for building binary XML documents that performs on par with JSON serialization. I tried various combinations of looping through the records of a table via List.Generate and List.Accumulate; nevertheless, Table.TransformRows combined with avoiding any calls to Text.Format seemed to deliver the best performance.

let
//Transform each row of [dummy] table into a binary XML
rows = Table.TransformRows(dummy,
each Binary.Combine({Text.ToBinary("<row><key>")
,Text.ToBinary(Text.From([key]))
,Text.ToBinary("</key><value>")
,Text.ToBinary([value])
,Text.ToBinary("</value></row>")})),
//Wrapping up all the rows in <table> tags
out = Binary.Combine({Text.ToBinary("<table>")}&
rows&
{Text.ToBinary("</table>")})
in
out

On calling Value.NativeQuery, Power Query converts the binary value into the SQL varbinary(max) data type, so the final call only requires mapping the parameter placeholder to the binary XML document.

let
out = Value.NativeQuery(Sql.Database(".", "QueryPower"),"usp_InsertDemoXML @xmlBin",[xmlBin=dummy2XML])
in
out

JSON 

Since the introduction of JSON support in SQL Server 2016, sending and receiving data to/from SQL Server has become a lot easier. Power BI is no exception: sending data to a SQL Server table requires the addition of an SP with a JSON parameter and, on the Power Query side, serializing the dataset as a text-based JSON object with Json.FromValue.

CREATE PROC [dbo].[usp_InsertDemoJSON] @json AS nvarchar(max)
AS
BEGIN
INSERT INTO Demo ([Key],[Value])
SELECT *
FROM OPENJSON(@json)
WITH ([Key] int, [Value] varchar(50))
END
let
//serialize table [dummy] into a JSON object
json = Text.FromBinary(Json.FromValue(dummy)),
//pass JSON object to SP that unwrapps it as a recordset & performs INSERT operation
out = Value.NativeQuery(Sql.Database(".", "QueryPower"),"usp_InsertDemoJSON @json",[json=json])
in
out

Because of native JSON support on both the SQL and Power BI sides, this was by far the easiest and fastest way of sending data across.

4 ways to get USERNAME in Power Query

Regardless of what your requirements are, be that providing some user context while slicing data or a filtering mechanism for data querying, having a function like USERNAME() in the Power Query toolbox can greatly simplify your data mashup scripts. While trying to build a more robust approach for self-referencing queries in M, I've managed to collect 4 methods of getting the current username in Power BI. These methods rely on a variety of dependencies (R, NTFS, LocalDB and PowerShell) and come with their own advantages and disadvantages. Based on your setup you might find some of the methods more appropriate than others; however, they are not a universal replacement for a natively supported function like USERNAME().

1. R.Execute()

Once again, the Swiss Army knife of Power Query, R with its hook function R.Execute, never fails at extending Power BI's capabilities. Here is an example of how I call the system command whoami via the R function system2 and use its output as a data source in Power Query.

let
RScript = R.Execute("output <- read.table(text=system2(""whoami"", stdout=TRUE))"),
output = RScript{[Name="output"]}[Value]
in
output

Pros: The advantage of this approach is that it can be adapted to get the STDOUT of other system commands as well.

Cons: It depends on calling R, which is currently supported only by Power BI Desktop.

2. NTFS

The output table of Folder.Contents in Power Query contains the field Date accessed, which records the timestamp of when a file/folder was moved, opened, or accessed by any other means. Assuming that the NTFS access-date records for the C:/Users sub-folders are up to date, the folder with the latest Date accessed should belong to the current user.

A note of caution here: NTFS might take up to an hour to update the file times, which makes this method unreliable in certain scenarios. Nevertheless, I've recommended this shortcut in the past, and it is still my preferred way of getting to the msmdsrv.port.txt file.

//Assumes current user is the one with latest [Date accessed] folder in C:\Users
let
Users = Folder.Contents("C:/Users"),
CurrentUser = Table.FirstN(Table.Sort(Users,{{"Date accessed", Order.Descending}}),1)[Name]{0}
in
CurrentUser

Pros: Simple

Cons: Unreliable, as NTFS can take up to 1 hour to update the access-date records of a folder. Antivirus and other background processes can interfere with the correctness of the Access Date ⇒ Current User mapping. Also, the output of this function doesn't contain the domain name.

3. LocalDB or any other SQL source

If you happen to have a SQL Server instance lying around or by some curious accident you’ve installed SQL Server Express with support for LocalDB, you can make a trivial SQL query to get SYSTEM_USER. You don’t even need an assigned database for that.

There is always the option of installing only the LocalDB feature of SQL Server Express.

[Image: SQL Server Express installation with the LocalDB feature selected]

Make sure to select the version that is relevant to your setup, as the connection strings for LocalDB have changed over the years.

//LocalDB
UserNameLocalDB =
() => Sql.Database("(localdb)\v11.0","",[Query="SELECT SYSTEM_USER"])
//LocalDB from SQL Server 2016 Express
UserNameLocalDB2016 =
() => Sql.Database("(localdb)\MSSQLLocalDB","",[Query="SELECT SYSTEM_USER"])
//Local instance of SQL Server
UserNameSQL =
() => Sql.Database(".","",[Query="SELECT SYSTEM_USER"])

[Image: SELECT SYSTEM_USER query result]

Pros: You might already have LocalDB installed.

Cons: Requires access to a SQL Server instance or installation of LocalDB.

4. HTTP API for Executing PowerShell Scripts

This method relies on a lightweight service which listens on port 8888 and executes PowerShell commands sent over HTTP. The PowerShell team has released this script/service as an alternative to the SOAP-heavy Windows Remote Management; for more details please read the associated blog post: Simple HTTP api for Executing PowerShell Scripts. The service itself is written in PowerShell, and its 300 lines of code are very easy to follow and adapt to the relevant scenario.

Once I have downloaded, installed and imported the module, I start an instance by using the cmdlet Start-HTTPListener.

[Image: Start-HTTPListener running in a PowerShell console]

To get the current user in Power Query I make an HTTP call to localhost:8888 with system command whoami.

// Based on running the simple HTTP API service from:
// https://blogs.msdn.microsoft.com/powershell/2014/09/29/simple-http-api-for-executing-powershell-scripts/
let
output = Json.Document(Web.Contents("http://localhost:8888/?command=whoami"))
in
output

Back in the PowerShell console, the flag -verbose provides more details to what is happening behind the scenes:

[Image: verbose listener output for the whoami call]

Pros: This method can be expanded to sourcing the output from other system commands as well as PowerShell cmdlets.

Cons: Introduces an additional dependency which might interfere with users' execution policies. The PowerShell listener has to be running in the background.

IMAO

We shouldn't be asking the Power BI team to implement only small utilities like UserName() or CookBreakfast(). Instead we should see the bigger problem and ask for a bigger hammer. I would want Power BI to expose a function for parsing console output; this would help with accessing a lot of obsolete data sources as well as standard system commands. Or, if we want to stay in the system-specific domain, how about having a connector for WMI, the same as SSIS does?

Extracting Power Queries

TL;DR Another place to find your Power BI queries is in the data sources of the embedded tabular model. They are stored in a zip file which is encoded as a base64 string in the Mashup field of the connection string.

While we are waiting for the Microsoft team to release new features that would allow exporting Power Query queries outside the Power BI ecosystem, we might as well explore all of the other possibilities available to us. So far I've counted 3 ways of getting hold of our precious M queries:

  1. Sending a frown email
  2. Parsing the *.pbix files
  3. Using the clipboard (doh!)

As I’ve been poking around this black box that is Power BI, I have discovered another place where M queries are stored — the data source connection string of the embedded tabular model. The steps that I took in order to get there are:

  1. Find the port number on which the $Embedded$ Tabular model is running
  2. Connect to that Tabular instance and get the Connection Strings of the Tabular Schema Data Sources
  3. Parse the Connection Strings for Mashup field
  4. Convert the base64 Mashup field into a binary file
  5. Unzip the binary file and display Formulas/Section1.m file

Here is me trying to do everything in one go via PowerShell.

[Animation: extracting the M queries via PowerShell]

When it comes to scripting, the power of the Shell is far greater than that of the Query, but in this case all of the prerequisites are there to replicate the above steps in pure M.

 

Finding $Embedded$ Tabular port number

I've tried to solve this problem before via R scripts; however, as Mim pointed out to me, there is a simpler Power Query way of doing this. Because Power BI stores the auto-generated port number for the tabular model in a text file (msmdsrv.port.txt) under AnalysisServicesWorkspaces, the task is reduced to reading the contents of that file.

()=>
let
//Assume current user is the one with latest [Date accessed] folder in C:\Users
CurrentUser = Table.FirstN(
Table.Sort(Folder.Contents("C:\Users"),
{{"Date accessed", Order.Descending}}),
1)[Name]{0},
//Read the contents of file msmdsrv.port.txt from subfolder of AnalysisServicesWorkspaces
Port = Lines.FromBinary(
Table.SelectRows(
Folder.Files("C:\Users\"&
CurrentUser&
"\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"),
each ([Name] = "msmdsrv.port.txt"))[Content]{0},null,null,1200){0}
in
Port

By the way, my approach uses short-circuit logic, assuming that the NTFS access-date records for C:/Users are up to date; apparently they might take up to an hour to update. So if you find this script misbehaving, have a look at Imke's blog post for the original script.

Get Tabular Schema Data Sources

You can read about the latest undocumented Tabular DMVs on Meagan's blog, or fire up SQL Server Profiler, check the boxes for End Command events, script out the tabular model from SSMS, and you'll discover a plethora of DMVs being displayed in the profiler's trace. The DMV hiding all of the connection strings is TMSCHEMA_DATA_SOURCES.

(port as text) =>
let
//Get $Embedded$ database name
catalog = AnalysisServices.Databases("localhost:"&port)[Name]{0},
//Run DMV query to get data source details
dataSources = AnalysisServices.Database("localhost:"&port,catalog,
[Query="select * from $system.TMSCHEMA_DATA_SOURCES"])
in
dataSources[ConnectionString]{0}

Extract Mashup field into binary

I’ve encountered some difficulties parsing the connection string with a Text.Split function so I’ve used a Splitter function instead.

(connectionString as text) =>
let
//Define a semicolon splitter
split = Splitter.SplitTextByDelimiter(";"),
//Split the connection string by semicolon and assume that Mashup is the last element
mashup = Text.Replace(List.Last(split(connectionString)),"Mashup=","")
in
//Convert base64 string to binary
Binary.FromText(mashup)

Unzip Formulas/Section1.m from Mashup binary

I’ve seen at least 2 functions for unzipping files in Power Query written by KenR and Mark White. Also I’ve described before how to call R unz function from an R.Execute wrapper. This is the result of combining the two.

(binaryZip,fileName) =>
let
//shorthand
UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),
UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),
//ZIP file header fixed size structure
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
CompressedSize = UInt32,
UncompressedSize = UInt32,
FileNameLen = UInt16,
ExtraFieldLen = UInt16]),
//ZIP file header dynamic size structure
FileData = (h)=> BinaryFormat.Record([
FileName = BinaryFormat.Text(h[FileNameLen]),
ExtraField = BinaryFormat.Text(h[ExtraFieldLen]),
UncompressedData = BinaryFormat.Transform(
BinaryFormat.Binary(h[CompressedSize]),
(x) => try
Binary.Buffer(Binary.Decompress(x, Compression.Deflate))
otherwise null)]),
//Parsing the binary in search for PKZIP header signature
ZipIterator = BinaryFormat.Choice(UInt32, (signature) => if signature <> 0x04034B50
then BinaryFormat.Record([FileName=null])
else BinaryFormat.Choice(Header,(z)=>FileData(z))),
ZipFormat = BinaryFormat.List(ZipIterator),
out = List.Select(ZipFormat(binaryZip), each _[FileName]=fileName)
in
out{0}[UncompressedData]

Wrapping up all of the functions above, adjusting them to support multiple instances of Power BI and applying some minification techniques, we get the final M script.

let
//Declaring compact function
MSMDSRVport = () => let cu=Table.FirstN(Table.Sort(Folder.Contents("C:\Users"),{{"Date accessed",Order.Descending}}),1)[Name]{0} in List.Transform(Table.SelectRows(Folder.Files("C:\Users\"&cu&"\AppData\Local\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"),each [Name]="msmdsrv.port.txt")[Content],each Lines.FromBinary(_,null,null,1200){0}),
GetConnectionString = (port) => let c=AnalysisServices.Databases("localhost:"&port)[Name]{0} in AnalysisServices.Database("localhost:"&port,c,[Query="select * from $system.TMSCHEMA_DATA_SOURCES"])[ConnectionString],
ConnectionString2Bin = (cs) => let sp=Splitter.SplitTextByDelimiter(";") in Binary.FromText(Text.Replace(Text.Replace(List.Last(sp(cs)),"Mashup=",""),"mashup=","")),
Unz = (bZ,fN) => let UInt32=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32,ByteOrder.LittleEndian),UInt16=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16,ByteOrder.LittleEndian),H=BinaryFormat.Record([MH=BinaryFormat.Binary(14),CS=UInt32,US=UInt32,FNL=UInt16,EFL=UInt16]),FD=(h)=>BinaryFormat.Record([FN=BinaryFormat.Text(h[FNL]),EF=BinaryFormat.Text(h[EFL]),UD=BinaryFormat.Transform(BinaryFormat.Binary(h[CS]),(x)=>try Binary.Buffer(Binary.Decompress(x,Compression.Deflate)) otherwise null)]),ZI=BinaryFormat.Choice(UInt32,(_)=>if _<>0x04034B50 then BinaryFormat.Record([FN=null]) else BinaryFormat.Choice(H,(z)=>FD(z))),ZF=BinaryFormat.List(ZI),out=List.Select(ZF(bZ),each _[FN]=fN) in out{0}[UD],
//Get all the ports for $Embedded$ Tabular instances
Port = Table.FromList(MSMDSRVport(),null,{"port"}),
//Add a column with embedded connection strings
AddConString = Table.ExpandListColumn(Table.AddColumn(Port,"ConnectionString",each GetConnectionString([port])), "ConnectionString"),
//Parse the ConnectionString column to reveal the Mashup string in binary format
AddMashupBin = Table.AddColumn(AddConString,"Mashup",each ConnectionString2Bin([ConnectionString])),
//Unzip from the binary Mashups the Formulas/Section1.m files
ParseMScripts = Table.TransformColumns(AddMashupBin,{"Mashup", each Unz(_,"Formulas/Section1.m")}),
//Present the M scripts as lines
out = Table.AddColumn(ParseMScripts,"MLines", each Lines.FromBinary([Mashup]))
in
out

Some of the use cases that are still left for me to explore with this approach:

  • pushing M scripts to GitHub, thus replicating the query catalog functionality
  • exporting the Power Queries to *.odc files and consequently importing M scripts from Power BI desktop into Excel

 PowerShell version

For the curious minds, here is the PowerShell code I used in Power Query reconnaissance.

#Finding the portnumber on which the $Embedded$ tabular model is running on
$embedded = "$env:LOCALAPPDATA\Microsoft\Power BI Desktop\AnalysisServicesWorkspaces"
$ports = Get-ChildItem $embedded -rec | where {$_.Name -eq "msmdsrv.port.txt"}
$port = Get-Content $ports.FullName -Encoding Unicode
#Getting the data sources from the $Embedded$ tabular model
[xml] $db = Invoke-ASCmd -Server:localhost:$port -Query:"SELECT * from `$SYSTEM.TMSCHEMA_DATA_SOURCES"
$db.return.root.row
$cs = $db.return.root.row.ConnectionString
#Cleaning up the connection string
$b64 = $cs.Split(";")[-1].Trim("Mashup=").Trim("""")
$bytes = [Convert]::FromBase64String($b64)
$temp = "c:\temp"
#Output to a binary ZIP file
[IO.File]::WriteAllBytes("$temp\a.zip", $bytes)
#Unzip
Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory("$temp\a.zip","$temp\a")
#TADA!
($m = Get-Content $temp\a\Formulas\Section1.m)

Please note that this approach can be used for extracting Power Queries only for already loaded data, and only in Power BI Desktop.

R, Power Query and msmdsrv.exe port number

The quest to fetch the details of the ever evasive $Embedded$ connection to the local tabular data model has always interested the Power BI community. There are numerous tips on how to obtain the details involving either shell or PowerShell scripts or even V..B..A…

I want to share with you an approach that is based on using R.Execute() in Power Query, and has the advantage of not requiring a dependency outside the Power BI environment. This is a re-iteration of a pattern that I've already mentioned in one of my previous posts, which explained how to use R to get STDOUT as a data source in Power Query.

In the R script I use, the function system2() executes the system commands tasklist and netstat. Because I need to output the results of these commands into data frames, I also use the function read.fwf() in order to parse the fixed-width output of STDOUT.

##Get msmdsrv process details
msmdsrv_proc <- system2("tasklist",args = c('/fi','"imagename eq msmdsrv.exe"'), stdout=TRUE)
msmdsrv_clean<-msmdsrv_proc[-c(1,3)]
msmdsrv<-read.fwf(textConnection(msmdsrv_clean),widths=c(25,9,17,12,13),comment.char="")
##Get all TCP ports
tcp <- system2("netstat", args = c('-anop','TCP'), stdout=TRUE)
ports<-read.fwf(textConnection(tcp),skip=3,widths=c(9,23,23,16,10))

Already in the Power Query world, I transform the two data frames msmdsrv and ports via the function formatTable, which trims the table contents and promotes the headers. Then I match/join the two tables on [PID] and parse the contents of the [Local Address] column to get the desired port number. Here is the whole script:

let
RScript = R.Execute("msmdsrv_proc <- system2(""tasklist"",args = c('/fi','""imagename eq msmdsrv.exe""'), stdout=TRUE) #(lf)msmdsrv_clean<-msmdsrv_proc[-c(1,3)]#(lf)msmdsrv<-read.fwf(textConnection(msmdsrv_clean),widths=c(25,9,17,12,13),comment.char="""")#(lf)tcp <- system2(""netstat"", args = c('-anop','TCP'), stdout=TRUE)#(lf)ports<-read.fwf(textConnection(tcp),skip=3,widths=c(9,23,23,16,10))"),
formatTable = (T) => Table.PromoteHeaders(Table.TransformColumns(T,{{"V1",Text.Trim},{"V2",Text.Trim},{"V3",Text.Trim},{"V4",Text.Trim},{"V5",Text.Trim}})),
msmdsrv = formatTable(RScript{[Name="msmdsrv"]}[Value]),
ports = formatTable(RScript{[Name="ports"]}[Value]),
match = Table.SelectRows(ports, each [PID]=Table.ToRecords(msmdsrv){0}[PID]){0}[Local Address],
out = List.Last(Text.Split(match,":"))
in
out

If you prefer keeping all the filtering logic inside the R script, here is my take on how to do it:

let
Source = #table(null,{{1}}),
RScript = R.Execute("output<-dataset #(lf)msmdsrv_proc <- system2(""tasklist"",args = c('/nh','/fi','""imagename eq msmdsrv.exe""'), stdout=TRUE) #(lf)msmdsrv_pid <- strsplit(msmdsrv_proc[2],""\\s+"")[[1]][2]#(lf)msmdsrv_regex <- paste(c(""ESTABLISHED\\s*"", msmdsrv_pid),collapse="""")#(lf)tcp <- system2(""netstat"", args = c('-ano'), stdout=TRUE) #(lf)msmdsrv_connections<-grep(msmdsrv_regex,tcp,value=TRUE)#(lf)msmdsrv_inbound<-strsplit(strsplit(msmdsrv_connections[1],""\\s+"")[[1]][3],"":"")[[1]] #(lf)output$port<-tail(msmdsrv_inbound,n=1)",[dataset=Source]),
out = RScript{[Name="output"]}[Value]
in
out

As a final note I wanted to add that there is actually a way of connecting to the embedded tabular data model which doesn’t require any dependencies. Try the command:

= AnalysisServices.Databases("*")

Unfortunately it uses the same connection context as the live connection so you’ll get an exception like:

Microsoft.Mashup.Evaluator.Interface.ErrorException: Connections to the server context that is currently active are not supported.


Which is not very practical.

R.Execute() the Swiss Army knife of Power Query

I rarely use R in Power BI, but when I do, I make sure to keep the data mashup out of it. I don't have anything against R; it's arguably the best statistical tool out there. However, when it comes to using an expressive language for doing data wrangling, Power Query (M) is my language of choice.

Although R has been introduced to the Power BI ecosystem as a complementary tool, its capabilities far exceed those of Microsoft's analytics suite. That is why, in areas where Power BI is falling behind, I can always rely on R to bridge the gap.

Below I describe 5 patterns of how I use R.Execute() in Power Query. I have divided these into 2 classes, intended and unintended use cases, just to separate the expectation of how much support these patterns will be getting from the Power BI team.

Note: When using R.Execute() the only rule I follow is that if I need some data out of an R routine, I must put the execution results into a dataframe. Other than that all R rules apply.

The intended use of R in Power Query

Executing an algorithm

This is probably one of the most legit use cases for R in Power BI, because there is no language/framework/library that can compare to the vast variety of algorithms implemented in R. If you need to run some exotic algorithm on your data, most likely there is an optimised version written in R; you just have to find the right package (and the right repository: CRAN or MRAN, to name a few).

Here is a standard R call executing KMeans clustering on the iris dataset.

let
RScript = R.Execute("set.seed(20)#(lf)irisCluster<-kmeans(iris[,3:4],3,nstart=20)#(lf)iris$cluster<-irisCluster$cluster"),
out = RScript{[Name="iris"]}[Value]
in
out

Calling an R function on Power Query input

Regex is the perfect example here. While we wait for its implementation to arrive in Power Query, why not use R's regex-related functions regexpr, regmatches, etc.?

Here is an example for parsing @mentions and #tags from a twitter feed.

let
Source = Csv.Document(File.Contents("RTwitterSample.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
twitts = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
RScript = R.Execute("temp<-dataset#(lf)pattern<-""[#@]\\w+""#(lf)temp$mention<-sapply(temp$text,function(x) toString(regmatches(x, regexpr(pattern, x))))",[dataset=twitts ]),
out = RScript{[Name="temp"]}[Value]
in
out

Or another cheeky example of trying to salvage removed passwords from GitHub commits.

Unintended uses of R in Power Query

My favorite ones of the lot. I must admit these are the patterns that I abuse the most.

Performing file based operations

Say you have some CSV data stored in a ZIP file on your file system somewhere. The only way to get hold of that data is either by manually unzipping it, or by using one of the lovely UnZip functions written in Power Query by either KenR or Mark White. The R way is even simpler: use read.csv(unz(zip, file.csv)) and you're done!

let
RScript = R.Execute("datacsv<-read.csv(unz(""d:\\Downloads\\R_Twitter_sample.zip"",""R_Twitter_sample.csv""))"),
out = RScript{[Name="datacsv"]}[Value]
in
out

Exporting data from Power Query

You've done your data analysis in Power Query and you're stuck with all this data in the $Embedded$ Tabular Data Model. The only way to get the data out is by exporting it via a table chart into a CSV. On the other hand, you can use R and effortlessly export millions of records via the write.csv function, without worrying about the rendering problems that might be caused by the table chart visualisation.

let
seed = {1..10000000},
seedTable = Table.FromList(seed, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RScript = R.Execute("write.csv(dataset,""d:\\Downloads\\out.csv"")",[dataset=seedTable]),
out = RScript
in
out

Using STDOUT as a data source in Power Query

This pattern was inspired by one of the questions from the Power BI community. Again, referring back to R, you can call system commands via the system2() function and use their console output as a data source in Power Query.

let
RScript = R.Execute("output <- read.table(text=system2(""whoami"", stdout=TRUE))"),
output = RScript{[Name="output"]}[Value]
in
output

Some Power Query internals

Whenever I'm given a black box, I marvel at its elegance. Then, after a while, frustration settles in, as the INs and OUTs don't add up and even going through the official documentation doesn't help. This is when I reach the point where I bring in a hammer and start banging at this black box until something gives in and the secrets start pouring out.

The same happened with Power BI and its elegant data mashup engine. I love the fact that it is a "pure, higher-order, dynamically typed, partially lazy functional language", but the documentation doesn't tell you much about how things work under the hood. So I started digging. And when I peeked at the Microsoft.MashupEngine.dll internals, I was surprised to find half a dozen modules filled with Power Query (M) code. Being written by Microsoft, I assume these modules can be used as a reference for best practices in Power Query coding standards.

Some of the practices shared across all of the modules are:

Each module is wrapped in a section which

is an organizational concept that allows related expressions to be named and grouped within a document

Also the functions that are exposed to users are decorated with shared statement –

which omits the requirement to use a section-access-expression when referring to shared members outside of the containing section.

Although I've seen occasional use of section-access-expressions as well – LibraryModule!UICulture.GetString and TableModule!Table.AddColumns. My interpretation of these practices is: if you create a library in M, make sure to create a new named section and mark all of the public members with the shared decorator.
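To illustrate those conventions, here is a minimal sketch of a section document with made-up names (not code from the actual modules):

section MyLibrary;

//shared member: callable from outside the section without a section-access-expression
shared MyLibrary.AddOne = (x as number) as number => x + 1;

//non-shared member: referenced from other sections only via a section-access-expression, e.g. MyLibrary!Helper
Helper = (x as number) as number => x * 2;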

Moreover, the use of recursion is acceptable as long as it reduces the complexity of the code. I've also seen a novel use of the let expression, where the else clause of an if statement required some precomputed values (check Table.SplitColumn).
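The pattern looks roughly like this (purely illustrative, not the actual Table.SplitColumn code):

(input as text) =>
    if input = "" then
        null
    else
        let
            //the precomputed values are only needed on this branch
            parts = Text.Split(input, ","),
            first = List.First(parts)
        in
            first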

Here is the summary of Power Query modules that I’ve managed to salvage.

Module.Date.m

Date/time arithmetic which doesn't cover all of the business use cases, but presents a very good source of inspiration for your own Date-related functions. For even more Date-related functions, make sure to check the pquery repository.

Module.List.m

This module is a great introduction to functional programming. It contains 10 List-related functions that are built upon other List-related functions. My favorite one is:

shared List.RemoveLastN =
(list as list, optional countOrCondition) as list =>
 List.Reverse(
List.Skip(
List.Reverse(list), countOrCondition));

Module.SharePoint15.m

Contains the code for SharePoint15.Tables, SharePoint15.Contents and SharePoint15.Files. Anyone who has had to deal with fetching data from SharePoint into Power BI will recognize some of the details they had to think about when using the SharePoint connector. Moreover, there are some great examples of how to use OData.Feed and Web.Contents.

Module.SqlExpression.m

Contains details on the data type mapping between SQL Server and Power BI that is used in SqlExpression.SchemaFrom. An interesting observation in this module is that although you can write a switch statement using some list constructs, Microsoft kept this function very expressive, which means that you can see a dozen else if's.
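For contrast, a switch built from list constructs over the same kind of type mapping might look like this (a hypothetical sketch, not the module's code):

(t as type) as text =>
    let
        //pairs of {M type, SQL type name}
        mapping = {
            {type number, "float"},
            {type text, "nvarchar(max)"},
            {type logical, "bit"}},
        match = List.Select(mapping, each Type.Is(t, _{0}))
    in
        if List.IsEmpty(match) then "sql_variant" else match{0}{1}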

Module.Table.m

Probably one of the most useful modules of the lot, as everyone is using Tables to output the results of their data wrangling. This module includes the details of 78 Table-related functions. There are even some details about query folding. As expected, most of the Table functions are based on Record and List primitives.

Module.Text.m

The last of the lot. Although this module contains the details of a single public function, Text.Format, it also contains the details of a function which can be used for text parsing. Have a look at GetPlaceholders.

GetPlaceholders =(
        offset as number,
        openCode as text,
        closeCode as text,
        numberReference as logical
) as list => ...

I’ve added all of these modules to GitHub.


https://github.com/Hugoberry/PowerQueryModules