Monthly Archives: February 2017

[Matrix] x [Matrix]

Although matrices are not first class citizens in Power Query, quite often a business problem comes along which might greatly be simplified by thinking of it in matrix terms.

You can of course imitate matrices in Power BI by means of tables, two dimensional lists or records but this approach doesn’t come with any matrix related functions. So as always you have to roll up your sleeves and roll out your own set of needed operations.

I would like to share with you an approach of doing matrix multiplication which can be translated into other algebra operations.

We’ll start by reviewing how matrix multiplication works from the diagram below.

313px-matrix_multiplication_diagram_2-svg

If we break down the problem, we can think of two matrices A and B, in M(Power Query) terms as two tables with matching size of rows in matrix A and columns in matrix B respectively. Then the matrix multiplication operation can be reduced to a problem of iterating through all row and column combinations and computing the dot product of equally sized vectors.

The dot product in M can be performed by using List.Accumulate which aggregates the product of a List.Zip which in turn pulls the respective elements of two vectors into a tuples. Here is the code.

dotProduct = (v1,v2) =>
       List.Accumulate(
       List.Zip({v1,v2}),0,(agg,_)=> agg+(_{1}*_{0}))

Next, we breakdown the matrix A into rows – Table.ToRows(A) – and matrix B into columns Table.ToColumns(B). And iterate through all the combinations of row vectors and column vectors in order to accumulate the results into a nested list of dot products.

Check the code below for use of nested List.Accumulate.


List.Accumulate(Table.ToRows(A),
               {},
               (_,ai)=> _&{ List.Accumulate(Table.ToColumns(B),
                          {},(_,bj)=> _&{ dotProduct(ai,bj) })
})

Putting the two operations together results in the following final function:

How much data can fit in Power BI Desktop

I’ve heard this question being asked multiple times, in forums, blog posts and every single client that doesn’t trust the performance of the Power BI desktop app. I’ve set myself the challenge to finding this out, while exploring the versatility of Power Query M language.

One of the most powerful data structures that M has in its arsenal is the List. I’ll start by creating a simple list of numbers.


= {1..1000}

Let’s Apply this query and check the result in the Data Model.

table_list

This list looks like a table, acts like a table – then it must be a table. While I’m here I’ll change the Data Type of the column Query1 from Text to Whole Number. This way I’ll save the memory footprint for next time I try to load as much data as possible in in this single column table.

Next step – construct the biggest list possible.


= {1..3000000000]

32bitlimit

And I’ve hit my first limit. Another try with 231-1.


= {1..2147483647}

Success! But not for long, as once I apply the changes and wait for the Query Changes to be committed to Data Model, at around 2,000,707,684 loaded records I was faced with another error.

2bn

I’ve tried to repeat the process with a smaller lists and got my first success at 1,999,999,981 records.

big

Which produced a 29GB file.

filesize

That’s a lot of records, and only to add them to the Data Model I needed 55GB of memory.

resources

Not too bad for a desktop application.

What about if I wanted to load lots of data that is not that unique? Say 232-2 of ones?


List.Generate(()=>[counter=-2147483647,uno=1],
each [counter]<2147483647,
each [counter=[counter]+1, uno=1],
each [uno]
)


This query completed in no-time, and with very little memory footprint as the entire msmdsrv.exe service stayed within the 1GB limit. The output saved on disk occupied 191KB. Go compression!

How about the wideness of data? I used a different set of queries to find the limit.

I’ve hit gold with :


=#table(null,{{1..15986}})

Which resulted in a 167MB file.

And the imprecise errors that kept pop-ing up were: