Power Query is a great tool for data mashup, however when it comes to really show its powers as a desktop ETL tool, you realise that no matter how powerful at E an T it is, the L is falling behind, as you can only load the data into the embedded tabular data model that comes with Power BI.
Ignoring R, local cubes and HTTP POST requests, I wanted to showcase a few other methods for exporting data from Power Query and loading it to SQL Server.
The logic I follow is very simple — because there is no obvious limitation to the type of queries you can run against OleDb or ADO.NET datasets — a SELECT query will be executed with same success as an INSERT query.
As a prerequisite, on the SQL Server side I’ve created a table which I populate initially with single records, then with 1M records.
Following the logic described above, inserting a single entry into Demo table via OleDb looks fairly trivial.
However if you try to follow this approach you’ll soon notice that in addition to fiddling around with Text.Format to get your INSERT statement righ, you will also have to approve each parameter variation from your SQL statement. Which makes this approach unfeasible unless you construct your INSERT statement to include all of the millions of records in a single string.
On the other hand an ADO.NET connection wrapped in a Value.NativeQuery function provides a more familiar approach to performing data INSERT. This method has the convenience of passing query parameters as a record or list inside the SQL statement. As a result you need to approve the firewall rule for the SQL query only once.
Changing the scale
The methods I’ve described so far work reasonably well for times when you need to insert a few hundred records in a SQL table. However when the problem explodes to millions of rows, you need to do some prep work before sending over all of that data. I’ll detail 3 techniques for exporting datasets with millions of records: Table Valued Parameters, XML and JSON.
As an example I use a 1M records table I generate from a list with an incremental sequence Key and GUID generated Value field. Table dummy.
When doing bulk inserts in ADO.NET, the most recommended approach is to use a Table Valued Parameter to send data to a Stored Procedure which performs the INSERT. As a prerequisite on SQL Server side I had to add a table type and a stored procedure that accepts this table type as a READONLY parameter.
Although sending a TVP into an SP is a straightforward task in .NET, it requires casting the data reader (table) to a SqlDbType.Structured data type.
I wanted to rely on Power Query internal data type conversion and sent a table as a parameter to my newly created SP. Needless to say I was greeted with the error below
After giving up on the idea of casting a Power Query table into a SQL Structured data type, I’ve decided to replicate the internal call during of an exec SP with @TVP. Thus I’ve built a rather lengthy 1002 row SQL statement that declares an internal table, inserts all the records from the dummy table and calls the SP with internal table as a parameter.
Back in the days when XML was still cool, sending some tabular data to SQL Server was the matter of serializing a recordset into an XML document and sending it to a stored procedure that parsed it back into a recordset. Nothing has changed since on the SQL Server side.
Since Power Query lacks any functions for constructing XML documents, I found it very difficult to come up with an approach for building binary XML documents that performs on par to JSON serialization. I’ve tried various combinations of looping through the records of a table via List.Generate and List.Accumulate, nevertheless Table.TransformRows and avoiding making any calls to Text.Format seemed to deliver best performance.
On calling Value.NativeQuery, Power Query converts the binary string into SQL varbinary(max) data type, thus for the final call it requires only mapping the placeholder to the binary XML document.
Since SQL Server 2016 introduction of JSON support, sending and receiving data to/from SQL Server got a lot easier. Power BI is no exception, sending data to a SQL Server table requires addition of a SP with JSON parameter and on Power Query side serializing the dataset as a text bases JSON object with Json.FomValue.
Because of native JSON support on both SQL and Power BI sides, this was by far the most easy and fast way of sending data across.