R.Execute(): the Swiss Army knife of Power Query

I rarely use R in Power BI, but when I do, I make sure to keep the data mashup out of it. I don’t have anything against R; it’s arguably the best statistical tool out there; however, when it comes to using an expressive language for doing data wrangling, Power Query (M) is my language of choice.

Although R was introduced into the Power BI ecosystem as a complementary tool, its capabilities exceed those of Microsoft's analytics suite. Wherever Power BI falls behind, I can rely on R to bridge the gap.

Below I describe five patterns for how I use R.Execute() in Power Query. I have divided these into two classes – intended and unintended use cases – to set expectations for how much support these patterns are likely to get from the Power BI team.

Note: When using R.Execute(), the only rule I follow is that if I need some data out of an R routine, I must put the execution results into a data frame. Other than that, all R rules apply.
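As a minimal sketch of that rule (the variable names here are my own, not from any of the examples below): only objects that end up in an R data frame surface back in Power Query, where R.Execute() returns them as a table of [Name, Value] records.

let
    // df is a data frame, so it surfaces in the result table; the plain scalar does not
    RScript = R.Execute("df <- data.frame(x = 1:3, y = c(""a"", ""b"", ""c""))#(lf)notVisible <- 42"),
    // pick the data frame out of the result table by its R variable name
    out = RScript{[Name="df"]}[Value]
in
    out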

The intended use of R in Power Query

Executing an algorithm

This is probably one of the most legitimate use cases for R in Power BI, because no language/framework/library can compare to the vast variety of algorithms implemented in R. If you need to run some exotic algorithm on your data, most likely there is an optimised version written in R; you just have to find the right package (and the right repository: CRAN or MRAN, to name a few).

Here is a standard R call executing k-means clustering on the iris dataset.

let
    RScript = R.Execute("set.seed(20)#(lf)irisCluster<-kmeans(iris[,3:4],3,nstart=20)#(lf)iris$cluster<-irisCluster$cluster"),
    out = RScript{[Name="iris"]}[Value]
in
    out

Calling an R function on Power Query input

Regex is the perfect example here. While we wait for its implementation to arrive in Power Query, why not use R's regex-related functions (regexpr, regmatches, etc.)?

Here is an example of parsing @mentions and #tags from a Twitter feed.

let
    Source = Csv.Document(File.Contents("RTwitterSample.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.Csv]),
    twitts = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    RScript = R.Execute("temp<-dataset#(lf)pattern<-""[#@]\\w+""#(lf)temp$mention<-sapply(temp$text,function(x) toString(regmatches(x, regexpr(pattern, x))))",[dataset=twitts]),
    out = RScript{[Name="temp"]}[Value]
in
    out

Or, another cheeky example: trying to salvage removed passwords from GitHub commits.
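I won't reproduce that one in full, but the shape is the same as the Twitter example: run a credentials-looking regex over a column of commit diffs. A hypothetical sketch (the sample table, column names, and pattern are my own, not the original example):

let
    // toy stand-in for real commit diff text
    commits = Table.FromRecords({[diff = "- password = hunter2"], [diff = "+ credentials removed"]}),
    // regmatches/regexpr pull the first match of the pattern out of each diff line
    RScript = R.Execute("temp<-dataset#(lf)pattern<-""password\\s*=\\s*\\S+""#(lf)temp$leak<-sapply(temp$diff,function(x) toString(regmatches(x, regexpr(pattern, x))))",[dataset=commits]),
    out = RScript{[Name="temp"]}[Value]
in
    out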

Unintended uses of R in Power Query

These are my favourites of the lot. I must admit they are the patterns I abuse the most.

Performing file based operations

Say you have some CSV data stored in a ZIP file somewhere on your file system; the only way to get hold of that data is either by manually unzipping it or by using one of the lovely UnZip functions written in Power Query by KenR or Mark White. The R way is even simpler: use read.csv(unz("archive.zip", "file.csv")) and you're done!

let
    RScript = R.Execute("datacsv<-read.csv(unz(""d:\\Downloads\\R_Twitter_sample.zip"",""R_Twitter_sample.csv""))"),
    out = RScript{[Name="datacsv"]}[Value]
in
    out

Exporting data from Power Query

You’ve done your data analysis in Power Query, and you’re stuck with all this data in the $Embedded$ Tabular Data Model. The only way to get the data out is by exporting it via a table chart into a CSV. Alternatively, you can use R and effortlessly export millions of records via the write.csv function, without worrying about the rendering problems that the table chart visualisation might cause.

let
    seed = {1..10000000},
    seedTable = Table.FromList(seed, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RScript = R.Execute("write.csv(dataset,""d:\\Downloads\\out.csv"")",[dataset=seedTable]),
    out = RScript
in
    out

Using STDOUT as a data source in Power Query

This pattern was inspired by one of the questions from the Power BI community. Again referring back to R, you can call system commands via the system2() function and use their console output as a data source in Power Query.

let
    RScript = R.Execute("output <- read.table(text=system2(""whoami"", stdout=TRUE))"),
    output = RScript{[Name="output"]}[Value]
in
    output

13 Comments

  1. When writing a CSV File in Power Query, is there a way to have the file name in a Parameter that can be passed to R.Execute … rather than having the output file name hard coded?

    1. Is it possible to pass parameters?
    2. If so, what is the syntax to use?

    Would appreciate any help.


    1. Absolutely. Because R.Execute() expects a string with all the R code inside, you can patch it with the parameterized file path. Have a look at the example below.


      let
      seed = {1..1000000},
      path = "d:\\Downloads\\out.csv",
      seedTable = Table.FromList(seed, Splitter.SplitByNothing()),
      RScript = R.Execute("write.csv(dataset,"""&path&""")",[dataset=seedTable]),
      out = RScript
      in
      out


        1. Much appreciated. I had only 2 sets of " around the filename in my test.

        By using path I also cut out the need to go setting the working directory.

        Thank you so much.

        Ted


  2. Great post, thank you!

    Can the following be rewritten to specify the table using strTable thus avoiding the need to use TableQuery?

    let

    strTable = "Root Cause",
    TableQuery = #"Root Cause",

    RScript = R.Execute("write.csv(dataset,""C:\\Users\\RichardLove\\Desktop\\" & strTable & ".csv"")",[dataset=TableQuery]),
    out = RScript
    in
    out

    Thanks again,
    Richard


  3. I really appreciate your post. It offers some new options for some reporting solutions. I would like to know how I would refer to another library in the Power Query Editor. I’m successfully using some libraries in the R script editor, i.e. library(tidyr), but I can’t find anywhere how to use another library. My motivation is to improve the speed of write.csv by using fwrite from data.table. Any help would be appreciated.

    Eric


  4. Thanks for the post, this will come in handy. Is it possible to modify the write.csv to append the dataset to an existing file? I am thinking of incremental data loads.


  5. I have a field that is typed as text in the Power BI table, but in R it is treated as numeric and gets written through write.csv as scientific notation, so this does mess with the data in my example.

