Navigating 600+ M functions

Whenever I’m faced with a data mashup problem in Power BI, I first check whether it can be solved with one of the standard M functions exposed by the intrinsic variable #shared. Navigating this data structure when you don’t know what you’re looking for is a tedious task. Up until recently my function discovery workflow consisted of turning the output of #shared into a table and then searching for keywords in the function names, hoping that the developers had named the functions accordingly. However, after discovering where the documentation for the functions is stored, I’ve come up with more flexible ways of navigating the 600+ definitions.
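
That keyword-search workflow boils down to something like the sketch below; "Split" is just a sample search term:

let
    sharedTab = Record.ToTable(#shared),
    //keep only the entries whose name contains the keyword
    matches = Table.SelectRows(sharedTab, each Text.Contains([Name], "Split"))
in
    matches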

Function documentation

A while ago I stumbled across a wonderful GitHub repository, Power BI Desktop Query Extensions, written by Taylor Clark. It is one of those hidden treasures filled with golden nuggets, and I recommend it to anyone interested in taking their Power Query skills to the next level. Although the repository is actually a single file with 545 LOC, in those lines you’ll find some of the best examples of text, list and table manipulation. My favorite finds are those on testing and documenting M functions.

Apparently the documentation that you see while browsing #shared is stored as a metadata record on the type of the functions.

(screenshot: the List.Positions documentation)

In order to view it you need to first check the type of the function and then retrieve the metadata. For example:

= Value.Metadata(Value.Type(List.Positions))

(screenshot: the documentation metadata record)

Something that looks like a record in the query view can easily be transformed into a table, and there are many ways you can render the information once it is stored as a table.
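
For instance, the documentation record from the example above can be flattened into a Name/Value table with a one-liner:

//render the documentation metadata of List.Positions as a table
= Record.ToTable(Value.Metadata(Value.Type(List.Positions)))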

Function signature

Before proceeding any further, there is still one piece of the puzzle missing: the signature of the functions. I couldn’t find it in the documentation metadata, nor via other standard functions, so I had to roll out a custom query that generates a text like:

"function (list as list) as list"

It uses a combination of Type.FunctionParameters, Type.FunctionReturn and a looooong list of type-to-text conversions.

(placeholder as function) =>
let
    //Serialize type to text
    TypeAsText = (value as any) =>
        let
            prefix = if Type.IsNullable(value) then "nullable " else ""
        in
            prefix & (
                if Type.Is(value, type binary) then "binary" else
                if Type.Is(value, type date) then "date" else
                if Type.Is(value, type datetime) then "datetime" else
                if Type.Is(value, type datetimezone) then "datetimezone" else
                if Type.Is(value, type duration) then "duration" else
                if Type.Is(value, type function) then "function" else
                if Type.Is(value, type list) then "list" else
                if Type.Is(value, type logical) then "logical" else
                if Type.Is(value, type none) then "none" else
                if Type.Is(value, type null) then "null" else
                if Type.Is(value, type number) then "number" else
                if Type.Is(value, type record) then "record" else
                if Type.Is(value, type table) then "table" else
                if Type.Is(value, type text) then "text" else
                if Type.Is(value, type time) then "time" else
                if Type.Is(value, type type) then "type" else
                if Type.Is(value, type any) then "any"
                else error "unknown"),
    //if the parameter is optional, set the prefix
    OptionalPrefix = (_) => if Type.IsNullable(_) then "optional " else "",
    //get the record of function parameters
    parameters = Type.FunctionParameters(Value.Type(placeholder)),
    //create a text list of parameters and associated types: "[optional] paramname as type"
    parametersWithTypes = List.Accumulate(Record.FieldNames(parameters), {},
        (state, cur) => state & {
            OptionalPrefix(Record.Field(parameters, cur)) &
            cur & " as " & TypeAsText(Record.Field(parameters, cur))})
in
    //merge the parameter list, prefix with "function (" and suffix with the function return type
    "function (" &
    Text.Combine(parametersWithTypes, ", ") &
    ") as " &
    TypeAsText(Type.FunctionReturn(Value.Type(placeholder)))
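
Assuming the query above is saved under the name Signature, calling it on List.Positions produces the text shown earlier:

//returns "function (list as list) as list"
= Signature(List.Positions)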

Stitching everything together

I started from the #shared record, transformed it into a table, filtered it down to function definitions only and then added columns one by one for Category, Description, function Examples and Signature. After some string manipulation to parse the Module names, I ended up with the query below.

let
    sharedTab = Record.ToTable(#shared),
    //select only functions from #shared
    functions = Table.SelectRows(sharedTab, each Type.Is(Value.Type([Value]), type function)),
    //parse Module from function name
    modules = Table.AddColumn(functions, "Module", each Text.Split([Name], "."){0}, type text),
    functionNames = Table.AddColumn(modules, "FunctionName", each List.Last(Text.Split([Name], ".")), type text),
    //get category from documentation
    categories = Table.AddColumn(functionNames, "Category", each try Value.Metadata(Value.Type(Record.Field(#shared, [Name])))[Documentation.Category] otherwise ""),
    //parse only the first code example from documentation
    examples = Table.AddColumn(categories, "Examples", each
        let eg = Value.Metadata(Value.Type(Record.Field(#shared, [Name])))[Documentation.Examples]?
        in if Type.Is(Value.Type(eg), type record) then eg[Code] else eg{0}?[Code]?),
    //get the short description from the documentation
    descriptions = Table.AddColumn(examples, "Description", each Value.Metadata(Value.Type(Record.Field(#shared, [Name])))[Documentation.Description]?),
    //parse subcategories
    subcategories = Table.AddColumn(descriptions, "DotCategory", each List.Last(Text.Split([Category], "."))),
    //adding the signature of the functions
    out = Table.AddColumn(subcategories, "Signature", each Signature(Record.Field(#shared, [Name])))
in
    out

I was planning to represent this data in a tile chart, similar to a periodic table. However, after experiencing performance problems while rendering 600 tiles with Infographic Designer 1.5.2, I gave up on the whole idea and opted to visualize all of this information in a treemap instead.

I added a dummy column ([Uno]) with the value 1 and used it as the Value field in the treemap chart. Then I added the [Module] column to the Group field and [FunctionName] to the Details section. All of the remaining columns, [Description], [Signature] and [Examples], went into the Tooltips section of the chart. To control the treemap, I used a matrix which acts as a filter on the [Category] column.
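
The dummy column itself is a one-line step; a sketch, assuming the navigation query above is loaded under the name NavigateShared:

//hypothetical extra step: a constant column used as the Value field of the treemap
= Table.AddColumn(NavigateShared, "Uno", each 1, Int64.Type)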

(screenshot: the treemap report)


Exporting Power Query tables to SQL Server

Power Query is a great tool for data mashup. However, when it comes to really showing its powers as a desktop ETL tool, you realise that no matter how powerful the E and T are, the L is falling behind, as you can only load the data into the embedded tabular data model that comes with Power BI.

Ignoring R, local cubes and HTTP POST requests, I wanted to showcase a few other methods for exporting data from Power Query and loading it to SQL Server.

The logic I follow is very simple: since there is no obvious limitation on the type of queries you can run against OleDb or ADO.NET data sources, a SELECT query will be executed with the same success as an INSERT query.

As a prerequisite, on the SQL Server side I created a table which I populate first with single records and then with 1M records.

CREATE TABLE [dbo].[Demo](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [Key] [int] NULL,
    [Value] [varchar](50) NULL
)

OleDb

Following the logic described above, inserting a single entry into the Demo table via OleDb looks fairly trivial.

= OleDb.DataSource(
    "provider=SQLOLEDB.1;data source=.;database=QueryPower",
    [Query=Text.Format("INSERT INTO [dbo].[Demo] VALUES(#{0},'#{1}')", {1, "A"})])

However, if you try to follow this approach you’ll soon notice that, in addition to fiddling around with Text.Format to get your INSERT statement right, you will also have to approve each parameter variation of your SQL statement. This makes the approach unfeasible unless you construct your INSERT statement to include all of the millions of records in a single string.

ADO.NET

On the other hand, an ADO.NET connection wrapped in a Value.NativeQuery function provides a more familiar approach to performing an INSERT. This method has the convenience of passing query parameters as a record or list to the SQL statement. As a result you need to approve the firewall rule for the SQL query only once.

= Value.NativeQuery(
    Sql.Database(".", "QueryPower"),
    "INSERT INTO [dbo].[Demo] VALUES(@key,@value)",
    [key=10, value="A"])

Changing the scale

The methods I’ve described so far work reasonably well when you need to insert a few hundred records into a SQL table. However, when the problem explodes to millions of rows, you need to do some prep work before sending over all of that data. I’ll detail three techniques for exporting datasets with millions of records: Table Valued Parameters, XML and JSON.

As an example I use a 1M-record table, dummy, generated from a list with an incremental Key and a GUID-generated Value field.

= Table.FromRecords(
    List.Generate(
        () => [key = 0, value = Text.NewGuid()],
        each [key] < 1000000,
        each [key = [key] + 1, value = Text.NewGuid()]))

TVP

When doing bulk inserts in ADO.NET, the recommended approach is to use a Table Valued Parameter to send data to a stored procedure which performs the INSERT. As a prerequisite, on the SQL Server side I had to add a table type and a stored procedure that accepts this table type as a READONLY parameter.


Although sending a TVP into an SP is a straightforward task in .NET, it requires casting the data reader (table) to a SqlDbType.Structured data type. 

I wanted to rely on Power Query’s internal data type conversion and sent a table as a parameter to my newly created SP. Needless to say, I was greeted with an error.
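
The naive attempt looked roughly like the sketch below; this is a reconstruction rather than the exact query, and the @TVP parameter name is an assumption:

//fails: Power Query does not convert an M table into a SQL Structured (TVP) parameter
= Value.NativeQuery(
    Sql.Database(".", "QueryPower"),
    "exec dbo.usp_InsertDemoTVP @TVP",
    [TVP = dummy])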


After giving up on the idea of casting a Power Query table into a SQL Structured data type, I decided to replicate the internal call of an exec SP with a @TVP parameter. Thus I built a rather lengthy 1002-row SQL statement that declares an internal table variable, inserts all the records from the dummy table into it and calls the SP with that table as a parameter.

let
    //building the Table Value Constructor entry for a record
    rows = (t) => Table.TransformRows(t, each "(" & Text.From([key]) & ",'" & [value] & "')"),
    //group entries in sets of 1000, because a Table Value Constructor is limited to 1000 rows
    values = List.Generate(
        () => 0,
        (i) => i < 1000,
        (i) => i + 1,
        (i) => Text.Combine(rows(Table.Range(dummy, i*1000, 1000)), ",")),
    //adding the declaration of the TVP and the call to the SP that uses it
    TVP = Text.Combine({"declare @p1 dbo.DemoList #(lf)"} &
        List.Transform(values, each "insert into @p1 values" & _ & "#(lf)") &
        {"#(lf)exec dbo.usp_InsertDemoTVP @TVP=@p1"}),
    //call the lengthy SQL statement
    out = Value.NativeQuery(Sql.Database(".", "QueryPower"), TVP)
in
    out

XML

Back in the days when XML was still cool, sending some tabular data to SQL Server was a matter of serializing a recordset into an XML document and sending it to a stored procedure that parsed it back into a recordset. Nothing has changed since then on the SQL Server side.

CREATE PROC [dbo].[usp_InsertDemoXML] @xmlBin AS varbinary(max)
AS
BEGIN
    declare @xml xml = convert(xml, @xmlBin)
    INSERT INTO Demo ([Key],[Value])
    SELECT
        [key] = x.v.value('key[1]','int'),
        [Value] = x.v.value('value[1]','varchar(50)')
    FROM @xml.nodes('/table/row') x(v)
END

Since Power Query lacks any functions for constructing XML documents, I found it difficult to come up with an approach for building binary XML documents that performs on par with JSON serialization. I tried various combinations of looping through the records of a table via List.Generate and List.Accumulate; nevertheless, Table.TransformRows combined with avoiding any calls to Text.Format seemed to deliver the best performance.

let
    //Transform each row of the [dummy] table into binary XML
    rows = Table.TransformRows(dummy,
        each Binary.Combine({
            Text.ToBinary("<row><key>"),
            Text.ToBinary(Text.From([key])),
            Text.ToBinary("</key><value>"),
            Text.ToBinary([value]),
            Text.ToBinary("</value></row>")})),
    //Wrapping up all the rows in <table> tags
    out = Binary.Combine({Text.ToBinary("<table>")} &
        rows &
        {Text.ToBinary("</table>")})
in
    out

On calling Value.NativeQuery, Power Query converts the binary value into the SQL varbinary(max) data type, so the final call only requires mapping the parameter placeholder to the binary XML document.

let
    out = Value.NativeQuery(Sql.Database(".", "QueryPower"), "usp_InsertDemoXML @xmlBin", [xmlBin=dummy2XML])
in
    out

JSON 

Since the introduction of JSON support in SQL Server 2016, sending and receiving data to and from SQL Server has become a lot easier. Power BI is no exception: sending data to a SQL Server table requires only the addition of an SP with a JSON parameter and, on the Power Query side, serializing the dataset as a text-based JSON object with Json.FromValue.

CREATE PROC [dbo].[usp_InsertDemoJSON] @json AS nvarchar(max)
AS
BEGIN
    INSERT INTO Demo ([Key],[Value])
    SELECT *
    FROM OPENJSON(@json)
    WITH ([Key] int, [Value] varchar(50))
END
On the Power Query side:

let
    //serialize table [dummy] into a JSON object
    json = Text.FromBinary(Json.FromValue(dummy)),
    //pass the JSON object to the SP that unwraps it as a recordset & performs the INSERT operation
    out = Value.NativeQuery(Sql.Database(".", "QueryPower"), "usp_InsertDemoJSON @json", [json=json])
in
    out

Because of native JSON support on both the SQL and Power BI sides, this was by far the easiest and fastest way of sending data across.

4 ways to get USERNAME in Power Query

Regardless of what your requirements are, be that providing some user context while slicing data or a filtering mechanism for data querying, having a function like USERNAME() in the Power Query toolbox can greatly simplify your data mashup scripts. While trying to build a more robust approach for self-referencing queries in M, I’ve managed to collect 4 methods of getting the current username in Power BI. These methods rely on a variety of dependencies (R, NTFS, LocalDB and PowerShell) and come with their own advantages and disadvantages. Based on your setup you might find some of the methods more appropriate than others, however none of them is a universal replacement for a natively supported function like USERNAME().

1. R.Execute()

Once again the Swiss Army knife of Power Query, R with its hook function R.Execute, never fails at extending Power BI’s capabilities. Here is an example of how I call the system command whoami via the R function system2 and use its output as a data source in Power Query.

let
    RScript = R.Execute("output <- read.table(text=system2(""whoami"", stdout=TRUE))"),
    output = RScript{[Name="output"]}[Value]
in
    output

Pros: The advantage of this approach is that it can be adapted to capture the STDOUT of other system commands as well (see the sketch below).

Cons: It depends on calling R, which is currently supported only by Power BI Desktop.
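
For instance, swapping whoami for another command (hostname here) follows the same pattern:

let
    //same technique as above, just a different system command
    RScript = R.Execute("output <- read.table(text=system2(""hostname"", stdout=TRUE))"),
    output = RScript{[Name="output"]}[Value]
in
    output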

2. NTFS

The output table of Folder.Contents in Power Query contains the field [Date accessed], which records the timestamp of when a file/folder was moved, opened, or accessed by any other means. Assuming that the NTFS access-date records for the C:/Users sub-folders are up to date, the folder with the latest [Date accessed] should belong to the current user.

A note of caution here: NTFS might take up to an hour to update the file times, which makes this method unreliable in certain scenarios. Nevertheless, I’ve recommended this shortcut in the past, and it is still my preferred way of getting to the msmdsrv.port.txt file.

//Assumes current user is the one with the latest [Date accessed] folder in C:\Users
let
    Users = Folder.Contents("C:/Users"),
    CurrentUser = Table.FirstN(Table.Sort(Users, {{"Date accessed", Order.Descending}}), 1)[Name]{0}
in
    CurrentUser

Pros: Simple

Cons: Unreliable, as NTFS can take up to an hour to update the access-date records of a folder, and antivirus or other background processes can break the assumption that the most recently accessed folder belongs to the current user. Also, the output of this method doesn’t contain the domain name.

3. LocalDB or any other SQL source

If you happen to have a SQL Server instance lying around, or by some curious accident you’ve installed SQL Server Express with support for LocalDB, you can run a trivial SQL query to get SYSTEM_USER. You don’t even need an assigned database for that.

There is always the option of installing only the LocalDB feature of SQL Server Express.

(screenshot: SQL Server Express setup with the LocalDB feature)

Make sure to select the version that is relevant to your setup, as the connection strings for LocalDB have changed over the years:

//LocalDB
UserNameLocalDB =
    () => Sql.Database("(localdb)\v11.0", "", [Query="SELECT SYSTEM_USER"])

//LocalDB from SQL Server 2016 Express
UserNameLocalDB2016 =
    () => Sql.Database("(localdb)\MSSQLLocalDB", "", [Query="SELECT SYSTEM_USER"])

//Local instance of SQL Server
UserNameSQL =
    () => Sql.Database(".", "", [Query="SELECT SYSTEM_USER"])

(screenshot: the SYSTEM_USER query result)

Pros: You might already have LocalDB installed.

Cons: Requires access to a SQL Server instance or installation of LocalDB.

4. HTTP API for Executing PowerShell Scripts

This method relies on a lightweight service which listens on port 8888 and executes PowerShell commands sent over HTTP. The PowerShell team released this script/service as an alternative to the SOAP-heavy Windows Remote Management; for more details please read the associated blog post: Simple HTTP api for Executing PowerShell Scripts. The service itself is written in PowerShell and its 300 lines of code are very easy to follow and adapt to the relevant scenario.

Once I have downloaded, installed and imported the module, I start an instance by using the cmdlet Start-HTTPListener.

(screenshot: Start-HTTPListener in a PowerShell console)

To get the current user in Power Query I make an HTTP call to localhost:8888 with system command whoami.

// Based on running the simple HTTP API service from:
// https://blogs.msdn.microsoft.com/powershell/2014/09/29/simple-http-api-for-executing-powershell-scripts/
let
    output = Json.Document(Web.Contents("http://localhost:8888/?command=whoami"))
in
    output

Back in the PowerShell console, the -Verbose flag provides more details about what is happening behind the scenes:

(screenshot: verbose output of the whoami request)

Pros: This method can be extended to source the output of other system commands as well as PowerShell cmdlets.

Cons: Introduces an additional dependency which might interfere with users’ execution policies, and the PowerShell listener has to be running in the background.

IMAO

We shouldn’t be asking the Power BI team to implement only small utilities like UserName() or CookBreakfast(). Instead we should see the bigger problem and ask for a bigger hammer. I would want Power BI to expose a function for parsing console output; this would help with accessing a lot of obsolete data sources as well as standard system commands. Or, if we would rather stay in the system-specific domain, how about having a connector for WMI, the same as SSIS does?