Basics of Python in Power BI

Disclaimer: My main area of expertise is SQL Server Business Intelligence and Power BI. I am a novice at Python. There are probably a million better ways to write the Python code, but I am impressed by the language and what I have been able to achieve with it.

I started using Python a bit more than 6 months ago because of the projects I am working on where data services are predominantly made available through rest APIs. The libraries available in Python made life easy and I knew it was inevitable for me to go down this road when Machine Learning Services (MLS) was released in SQL Server 2017.

To shorten a long story, I started using Python as a data staging solution using the Anaconda distribution (for machine learning capability and alignment to MLS). My focus was retrieving data from APIs into Pandas dataframes and then writing the results to SQL Server tables. Python proved to be surprising easy to learn (although I had some .NET background), although I knew my focus was incredibly narrow.

When Microsoft announced a preview of Python integration (for both Power Query and data visualization) I was so excited that I got up early on a bank holiday to try it out for myself. There was literally no examples to follow other than the data visualization bits from the blog post announcing the August release of Power BI Desktop.

I made some assumptions that the Pandas dataframes would be recognized by Power BI and got the Power Query bits working quickly. I struggled a little bit with a basic Python data visualization because it was the first time I ever tried to use Python for visualization. The end result was 3 simple steps to enable Python in Power BI and 3 more steps to get started with Python in Power BI.

Step A: Install the August 2018 release of Power BI Desktop

Step B: Enable Python support from the Preview featurespy1

 

Step C: Restart Power BI Desktop and set your Python home directory
py2

You may leave the default directory if you don’t have Python installed. Otherwise choose the path of the Python distribution you want Power BI to use when running your report.

Now 3 steps to get started with Python in Power BI

Step 1: Test your Python script in a Python editor

My sample data is a daily historical exchange rate for South African Rand (ZAR) against the US Dollar (USD). Its a simple CSV file containing the date and the ZAR amount for $1.

With the Anaconda distribution of Python, the Pandas data manipulation and analytics library is already installed. If you are using a different Python distribution, you may need to install Pandas.

The code has been kept fairly simple to get the basics right:

import pandas
df_USD = pandas.read_csv('D:\\data\\csv\\ZARUSD201x.csv')
df_USD.plot(kind='line', x=['ValueDate'], y=['TheValue'])

‘The first line of code adds a reference to the pandas library. The second line of code uses Pandas to open a CSV file and load it into a data structure called a dataframe. The third line of code plots a line chart using the date as the x-axis and the value as the y-axis.

pp1

The result may look good in the image above, but it appears much smaller in the editor. The CSV file also adds one small issue viz. the data is sorted descending. We need two small changes to the code to improve the result.

import pandas
df_USD = pandas.read_csv('D:\\data\\csv\\ZARUSD201x.csv').sort_values('ValueDate')
df_USD.plot(kind='line', x=['ValueDate'], y=['TheValue'], figsize=(25,10))

Line 2 makes use of the Pandas dataframe sort_values function to re-order the data. Line 3 adds the figsize parameter to control the display size of the chart. The result is much better.

pp2

Step 2: Get data into the Power BI data model using Python script

Now that we have a working Python script we can add it to Power BI. We do this in two parts. In this step we add the data to the data model via Power Query and in step 3 we will add the code for the visualization.

py4

Copy the first two lines of code into the Python script dialogue.

py5

The Navigator will show all available dataframes. In this example it will only be df_USD.

py6

A preview of the data will be displayed when selecting the dataframe. The data types were correctly detected in this example so there is no need to edit in Power Query. Select Load to add the data to the data model.

py7

Step 3: Plot the data in Power BI using the “Py” visual

Add a “Py” visual to the canvas. You will need to enable script visuals. Add the value and date fields to the visual. Change the date field to show the actual ValueDate (not the hierarchy). To make the Py visual respond to slicers, add year and month slicers from the ValueDate hierarchy.

py10

The “Py” visual will be blank initially and a Python script window appears below that. For the Python visual the data is required as a Pandas dataframe. The Power BI data model fields that are selected are converted to a dataframe (dataset) and the dataset is de-duplicated. This part of the script is generated by Power BI and appears in grey. The code to plot the chart goes below this.

 

py8

Now copy the last line of code from the Python editor and modify it to use the Power BI dataframe.

dataset.plot(kind='line', x=['ValueDate'], y=['TheValue'], figsize=(25,10))

Unfortunately this does not render anything. If we look at the details of the error message we see that nothing is rendered (even though the Spyder editor was able to render the plot)

py9

To fix this we have to use the matplotlib Python library to render the plot.

from matplotlib import pyplot
dataset.plot(kind='line', x=['ValueDate'], y=['TheValue'], figsize=(25,10))
pyplot.show()

Now the visual appears, but there is still one problem.

py11

Its not impossible, but highly unlikely that the exchange rate is just climbing on a daily basis for July 2018. The problem here is that a Pandas dataframe has an index and automatically sorts the data on this index. To change the sort order we change the sorting of the dataframe in a similar way to the way we sorted the dataframe in step 1.

dataset.sort_values('ValueDate').plot(kind='line', x=['ValueDate'], y=['TheValue'], figsize=(25,10))

Now we get the correct result.

py12

I hope this post helps you get started with Python in Power BI. While I’m a novice at Python, I thought it would be worth sharing with others who are also starting out.

Posted in Uncategorized

Managed Self-service BI

… and the journey ahead.

It’s interesting to watch technology vendors and consulting companies conjure new buzz words to sell products and services. Sometimes they just try too hard. They should try to focus on practical realities we can relate to. Self-service BI is a classic example of a term that’s used as if self-service BI is something completely new, to the extent that it’s often referred to as the second wave of BI.

Really? Is it just not a fresh look at self-service BI and an attempt to improve the approach through new technology and better processes? In other words, is it not just an attempt to shift from an unmanaged self-service mode to a managed self-service mode? In my opinion, it certainly is, but it is also much more than that.

Shadow BI

Self-service BI has many guises. I am not sure about the rest of the world, but in South Africa the most popular incarnation is an MIS (Management Information Systems) capability. Self-service BI or Shadow BI exists as a response to challenges that IT business intelligence teams have failed to solve. This generally boils down to taking too long to deliver answers and not always getting the answers right. The single most important point to acknowledge is that Shadow BI teams are answering critical business questions, and hence immensely valuable to the business.

ShadowBI

But Shadow BI comes with its own challenges. This includes the infamous “spread-mart” hell which breaks every rule in the book around data management and governance. It is also labor intense (repeated effort and high maintenance) and prone to human error (business rules live in the minds of individuals). So the business ends up with serious risk exposure around business continuity and accuracy of process measurement, and hence decision quality could be compromised.

Technology Evolution

Meanwhile in the world of technology things are moving forward exponentially. Scale of computing, spectrum of functional possibilities and ease of use drive massive potential change. Just like we’ve packed super computers into smart phones, business intelligence desktop tools start to deliver capabilities at scale without the complexities of the business intelligence developers’ traditional toolkit. All of a sudden technology vendors have made available DIY versions of ETL tools, analytical data modelling tools (“cubes”) and data visualisation tools. Now the world is your oyster … or not.

Process Evolution  

A new process is required that must resolve two key challenges. We have already pointed out that traditional IT BI methods have failed to deliver accurate and timeous results. Here we need to use the new technology to build prototypes before we embark on lengthy IT BI projects. Prototyping quickly establishes the viability and correctness of a solution. Compare this to creating requirements documents, developing a solution and finding gaps and problems during user testing. Scale of DIY BI tools also provide a means to leverage the prototype as an interim solution while the more robust solution is being developed.

The second challenge we need to resolve, is to find a permanent home for self-service solutions. If we simply adopted the new technology without rethinking the self-service process, then we will be responsible for a new breed of super spread-marts and a more chaotic world of Shadow BI. To address this, we need to establish a self-service solution industrialization process. This effectively means promoting self-service solutions to enterprise class platforms.

SelfServiceProcess

Continuous Collaboration

But it’s not just collaboration within teams that’s required. Our new process will also require improved collaboration between IT and business. More than ever do we need to combine business process (domain) knowledge with IT design and technical skills. Even with the most advanced technology in the world, success is not a given unless there is serious Business-IT collaboration. Collaboration is assumed to be ongoing and encouraged through iterative and agile delivery techniques.

Driving Change

Technology and process adoption is not a switch you flick. The historical reasons for the existence of Shadow BI makes it all the more difficult to drive change. At least it is a known and acknowledged change that is required. It gets more complicated though because the world has changed. Consumer culture has changed. Businesses are changing and the elevated importance of the data driven organisation adds pressure to the equation.

Bigger questions emerge. We have to go beyond our need for a managed self-service BI model. Our organisation needs a new data and analytics strategy. So we go to the drawing board and ask:

  • Have we appropriately prioritized and leveraged data/analytics in our organisation? Is it a C-suite priority? (we can’t drive change without the right levels of support).
  • How do we drive a shift in analytical focus? (looking back must only be for the sake of looking forward).
  • Have we structured our people/teams accordingly? Are we trying to solve new problems based on old team structures and dated skill sets?
  • Have we got the right systems and building blocks in place? Is our analytics platform modern enough? How do we retro-fit master data management into our existing systems?

Big questions indeed. And there are many more we could ask. So while we focus on moving from Shadow BI to managed self-service BI, we have to keep in mind that it requires a broader strategic outlook and road-map.

This post was motivated by the challenges I see BI teams facing and their attempt to solve these challenges with a bottom-up approach. Too few of these teams are succeeding and it is not a lack of skill or motivation, but a failure at the very top of these organisations to respond (or acknowledge) the importance of data and analytics in the face of an economic paradigm shift.

Posted in Self-Service BI

What is Power BI?

… and why you should be interested.

Technology is a business enabler. Our mind-set must be one of solving business problems and then determine how technology will help solve the problems. The first pass for evaluating technology requires an initial understanding of the key concepts rather than platforms and architecture. In this post I’ll try to demystify what Power BI 2.0 is. It has its origins in Power BI version 1 which was preceded by Power Pivot, and this may have left a trail of confusion. As a rapidly evolving platform the information in this post may be out of date, or simply incorrect because I have misunderstood some concepts or features.

Power BI is labelled as a Business Analytics suite. There is much confusion and variation around the term Business Analytics, so I am surprised that Microsoft used this label. While I am not going to attempt to define business analytics, it is sufficient for the purpose of this post to say that it encompasses self-service BI and also refers to a continuous and iterative analytics process that matches the business need for speed, agility and a forward/future focused analytics approach.

Power BI Solar System

I like to describe Power BI as a solar system. The Power BI Service is the sun in this solar system. This solar system belongs to the Cortana Analytics galaxy in the Microsoft universe.

Rafal Lukawiecki provided a very simple but elegant explanation of Cortana Analytics during a keynote at SQLSaturday in Johannesburg (slides available here). Also have a look at the example of Cortana Analytics in healthcare. But back to Power BI …

The Power BI Service is the glue between the user and the data. This analytics function is supported by content builders who create reports, dashboards and content packs. The content is further enhanced through features such as natural language query, data driven alerts, annotations and sharing.

To gain a better understanding of this over-simplified view, we have to navigate the map of the Power BI solar system. The original idea of this layout came from a Jen Underwood slide, but I have adapted this with in an attempt to pack more detail and context into a single image.

Power BI Map

Navigating the map is reasonably simple. There are 4 areas to review:

  • Source Data (entire bottom half of the image)
  • Development Tools (top left hand corner)
  • Power BI Service (top center)
  • Data Presentation and Exploration (top right hand corner)

Source data is the data you want to surface insights from. This data may reside in files, databases or accessible via services. The data may be on-premises or in the cloud. The data may be public or private. The data may be raw (transactional system data), curated (data warehouse data) or packaged (content packs) for end user consumption. A personal gateway lets you connect to data on your laptop/device. While a connector is available for on-premises SQL Server Analysis Services, it is the much anticipated enterprise gateway that we patiently are waiting for. But even more exciting are the possibilities with the Power BI Rest APIs that allow organisations to create custom applications that can push data directly to the Power BI service.

PowerBIDesktop

To author content for Power BI we need a combination of the Power BI Desktop tool (or Excel with the Power add-ins) and the Power BI Service. Power BI Desktop and Excel are self-service BI tools that are more than capable of delivering “power user” features for data integration (ETL), data modelling and report building. Officially Excel is not part of the Power BI suite, but Amir Netz has promised much love for the Excel crowd and Office 2016 is proof of this (currently in preview and rolling out from 22 September). Excel 2016 preview sports the same version of the tabular model engine that is currently available in Power BI Desktop. Transition between the tools is not what users are hoping for yet, but this is outside of the scope of this post.

PowerBIDashboard

Once a Power BI Desktop (or Excel) solution has been created, the content can be published to the Power BI service. The service becomes the host for the queries, data (model) and reports. The only parts of the solution that can only be created in the service are the dashboards and content packs (prepacked solutions including dashboards). These content packs could be commercially packaged solutions or custom content packs created within your organisation. The service includes a natural language query feature at dashboard level that lets you type in questions that are answered through data or data visualizations. The service also manages data refresh, alerts and annotations.

Finally there is the end user experience. Content can be accessed via browsers (yes, Chrome and Safari are supported), via the Power BI application (in the Windows Store) on Windows 8 & 10, or via mobiles devices (phones and tablets … it appears that Apple devices are a priority). The tablet experience is richer than the phone in that it can drill down to underlying reports which the phone app can’t do.

Overall the Power BI offering from Microsoft ranks high in completeness of vision both as a standalone business analytics suite and as a solar system within the Microsoft universe that offers companion platforms such as SQL Server, HDInsight, Machine Learning, etc. As a new product, ability to execute is reasonable with a demonstrated commitment to incremental delivery. The service is currently updated weekly and Power BI Desktop monthly. The Power BI team in Microsoft has a keen ear for customer input which bodes well for future expansion of this solar system.

More compelling though is that Microsoft are opening up parts of the platform to developers. Custom visualization is a very clever response to the data viz reputation of products like Tableau. REST APIs are a powerful means to embed analytics in your organisations application & data architecture, and entrenches the Microsoft integration capability.

While the marketing material talks about any data, anywhere, any way … it is the lesser emphasized platform to extend and build on (rather than replace) that I find attractive.

Tagged with:
Posted in Power BI

First thoughts

Having threatened to blog for many years, there is no better time than the present to actually do so.

We live in an era that creates more data about events or activities than we can make sense of. New technologies can help us harvest, shape and analyse data. But finding patterns and determining & actioning insight requires both analytical & practical intelligence. So technology is only an enabler if we leverage what’s possible and know how to put the tools to good use.

As a consultant, I get exposure to a variety of business challenges that require analytical solutions. Using new age self-service BI tools (mostly Power BI) has made it possible to convert a conceptual solution to a tangible proof of concept with ease and within a very short space of time. My background as a BI professional with tools such as Analysis Services, Integration Services and Reporting  Services has made the transition trivial. In contrast, business adoption of self-service BI tools are slow and challenging for many reasons that may be a future topic of discussion.

In this space I’ll attempt to blog about common, useful and interesting patterns that I encounter during my labour of love in the world of data. Although it has a technology focus, expect some random posts about BI concepts, analytical thinking and practical intelligence.

Posted in Other