Developing Predictive Analytics Solutions Using Agile Techniques

Let me preface this post by giving full credit to Dr. Martin John Madsen, Senior Data Analytics Consultant with Allegient. He wanted some feedback on this post, and I thought it focused on a problem many companies run into: the processes around developing data science solutions. He doesn’t have his own website, but I wanted this content published, so he gave me permission to publish it on mine. With that said, let’s dig in.

 

There is a lot of confusion around the traditional software life-cycle vs a data science life-cycle. Are they completely different animals? They both leverage technology that solves business problems, so we should be able to use similar techniques, right?  Well, yes and no. Let’s discuss.

Half of predictive analytics projects started by companies fail “because they aren’t completed within budget or on schedule, or because they fail to deliver the features and benefits that are optimistically agreed on at their outset.” [[1]] This story is very familiar in the software development world. [[2]] Agile techniques, complemented by Development Operations (DevOps) methodologies, were developed to address some of the key challenges in bringing software projects to completion. In this post, I address one way to adapt these techniques for use in a data analytics project. [[3]]

CRISP-DM Methodology

The current standard [[4]] methodology for data science projects is the Cross Industry Standard Process for Data Mining (CRISP-DM) [[5]], illustrated in the circular diagram below. Like traditional BI, data science is iterative by nature, and CRISP-DM captures that.

CRISP-DM_Process_Diagram[[6]]

The challenge with CRISP-DM is getting actionable results from the data science project – turning it into business processes and getting results out to decision makers. There are several potential traps in the methodology that can lead to project failure:

  • Getting stuck in the data understanding-data preparation loop – Large data can be overwhelming and lead to the team getting lost in trying to match the business use cases with the available data.
  • Getting stuck in the data preparation-modeling loop – This can become an infinite loop without sufficient controls and focus for breaking out of it. There is no such thing as a perfect model, but it is hard to determine when the model is “good enough”.
  • Failing to get out of the main business understanding-to-evaluation loop – Data analytics projects can iterate many times over this entire loop and never break out into deployment.

A great way to avoid these traps is to combine the CRISP-DM methodology [[7]] with result-driven Agile methodology, with the integrated techniques from DevOps/DataOps [[8]].

Agile Methodology

Adapt the Agile Scrum Framework [[9]] to the needs of a data analytics project [5] by mapping the roles and events onto the CRISP-DM methodology. The resources involved in the Scrum Framework are illustrated below and can be heavily leveraged with the CRISP-DM methodology as well.

Agile Scrum Framework at a Glance[[10]]

The Scrum Team

The Product Owner

From the Scrum Guide:

The Product Owner is responsible for maximizing the value of the product and the work of the Development Team. How this is done may vary widely across organizations, Scrum Teams, and individuals. [9]

For a predictive analytics project, this is either the data science project sponsor or a member of the organizational leadership team. Having a product owner helps to provide clear guidance and direction to the data science team and keep the project focused on real business needs.

The Data Science Team

Typically called the “development team” in Agile guides and recently modified to be a DevOps team, the data science team includes everyone who is working on the data science project. From [9]:

The Development Team consists of professionals who do the work of delivering a potentially releasable Increment of “Done” product at the end of each Sprint. Only members of the Development Team create the Increment.

Likewise, a data science (or DataOps) team consists of members with complementary skills [[11]] including:

  • Data engineers who are responsible for capturing, storing, and processing data;
  • Data scientists who work on the data cleaning and predictive modeling;
  • Business analysts who connect an understanding of the business with data understanding;
  • Platform administrators who work with the data engineers and data scientists to develop deployable products; and,
  • UX designers who work on the front-end data communication with the data product users.

The Scrum Master

A Scrum Master acts as the data science team guide and interface between the data science team, the product owner, and the organization.

Scrum Events

The Scrum methodology breaks up the overall project into smaller pieces of work known as sprints with the goal of producing a potentially usable product at the end of each sprint.

The heart of Scrum is a Sprint, a time-box of one month or less during which a “Done”, useable, and potentially releasable product Increment is created. Sprints best have consistent durations throughout a development effort. A new Sprint starts immediately after the conclusion of the previous Sprint. [9]

The iterative nature of CRISP-DM doesn’t map neatly onto the more linearly-focused Agile Sprint. However, mapping key components of CRISP-DM onto Agile Sprints helps keep the focus on creating usable business products at the end of each sprint.

First Sprint

The goal of the first sprint is to reach a point where the team understands the business objectives and organizational data. From the CRISP-DM method:

The first stage of the CRISP-DM process is to understand what you want to accomplish from a business perspective. Your organization may have competing objectives and constraints that must be properly balanced. The goal of this stage of the process is to uncover important factors that could influence the outcome of the project. Neglecting this step can mean that a great deal of effort is put into producing the right answers to the wrong questions. [5]

Furthermore, the sprint should gather an initial collection of data sources including the tools required for data loading. [5]

This sprint is considered “Done” when the team presents a report describing the key business issues, an inventory of available data assets, a plan for answering the top business data questions, and a description of what success will look like.

Second Sprint

To front-load the entire data process, combine several of the CRISP-DM stages into a single sprint with the goal of delivering a minimally viable predictive product at the end of the sprint. The combined CRISP-DM stages are:

  1. Data Preparation: perform data cleaning, enrichment, and feature engineering steps
  2. Modeling: select and assess modeling techniques, tune model parameters
  3. Evaluation: evaluate model performance against the business goals

This sprint is “done” when the team either has a model that performs at an acceptable level, or has determined that the data are not sufficient to meet the business goals. In the case of an acceptable model, the goal of the sprint is to have the initial model ready for further testing and deployment into a production environment. When the data are not sufficient to meet the business goals, the sprint produces a report documenting the evidence for this outcome.
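One way to make the “good enough” decision explicit is to time-box the prepare-model-evaluate loop and agree on an acceptance threshold before the sprint starts. The Python sketch below only illustrates that idea and is not a prescribed implementation: the function name, the `SprintOutcome` fields, the candidate names, and the scores are all hypothetical, and in a real project `evaluate` would train and score an actual model against the agreed business metric.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SprintOutcome:
    accepted: bool                 # True if a candidate met the threshold
    best_score: float              # best evaluation score seen in the sprint
    best_candidate: Optional[str]  # name of the best candidate, if any
    iterations: int                # prepare-model-evaluate passes actually run


def run_modeling_sprint(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    threshold: float,
    max_iterations: int,
) -> SprintOutcome:
    """Try candidates in order until one is good enough or the time-box runs out."""
    best_score = float("-inf")
    best_candidate = None
    iterations = 0
    for name in candidates:
        if iterations >= max_iterations:
            break  # time-box exhausted: stop iterating and report what we have
        iterations += 1
        score = evaluate(name)
        if score > best_score:
            best_score, best_candidate = score, name
        if best_score >= threshold:
            return SprintOutcome(True, best_score, best_candidate, iterations)
    return SprintOutcome(False, best_score, best_candidate, iterations)


# Made-up scores standing in for real model evaluations.
demo_scores = {"baseline": 0.61, "tuned_tree": 0.74, "ensemble": 0.82}
outcome = run_modeling_sprint(demo_scores, demo_scores.__getitem__, 0.80, 5)
print(outcome)
```

If `accepted` comes back `False`, the best score and candidate can go straight into the report documenting that the data did not support the business goal.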

Third Sprint

In the case where the test model developed in the second sprint is meeting business goals, the goal of the third sprint is to get the model into production.

In the deployment stage you’ll take your evaluation results and determine a strategy for their deployment. If a general procedure has been identified to create the relevant model(s), this procedure is documented here for later deployment. It makes sense to consider the ways and means of deployment during the business understanding phase as well, because deployment is absolutely crucial to the success of the project. This is where predictive analytics really helps to improve the operational side of your business. [5]

The sprint is considered “done” when the team deploys a functional predictive analytics model in the production environment. At this point, the predictive analytics model can start to generate value for the business.

In the event where the second sprint finds that the business goal cannot be met with existing data, a third (and successive) sprint starts back at the beginning, selecting another business goal for evaluation or selecting a different set of data to work with.
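The sprint flow described above can be written down as a small lookup, which some teams may find handy as a planning checklist. This is a sketch under my own assumptions: the stage names follow CRISP-DM, but the dictionary layout and the `next_sprint` helper are hypothetical, and what happens after a successful deployment is left open because the process above doesn’t prescribe it.

```python
from typing import Optional

# Each sprint, the CRISP-DM stages it covers, and its "done" criterion as
# described in the text. The data structure itself is illustrative only.
SPRINT_PLAN = {
    "first": {
        "stages": ["Business Understanding", "Data Understanding"],
        "done": "report on business issues, data inventory, plan, success criteria",
    },
    "second": {
        "stages": ["Data Preparation", "Modeling", "Evaluation"],
        "done": "acceptable model, or documented evidence that the data are insufficient",
    },
    "third": {
        "stages": ["Deployment"],
        "done": "functional model running in the production environment",
    },
}


def next_sprint(current: str, model_acceptable: Optional[bool] = None) -> Optional[str]:
    """Return the name of the sprint that follows `current`."""
    if current == "first":
        return "second"
    if current == "second":
        if model_acceptable is None:
            raise ValueError("the second sprint must end with a yes/no decision on the model")
        # Acceptable model: deploy it. Otherwise start over with a new goal or data set.
        return "third" if model_acceptable else "first"
    if current == "third":
        return None  # not specified by the process described here
    raise KeyError(current)


print(next_sprint("second", model_acceptable=False))
```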

Conclusion

Adopting this combination of Agile and CRISP-DM methodologies creates a framework for moving predictive analytics projects into the production environment where they can have a positive impact on the business. It helps teams break out of potential infinite loop traps and keep them focused on the overall goal: providing a positive return on investment for the business.

[1] http://analytics-magazine.org/the-data-economy-why-do-so-many-analytics-projects-fail/, https://www.analyticsvidhya.com/blog/2016/05/8-reasons-analytics-machine-learning-models-fail-deployed/

[2] http://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/delivering-large-scale-it-projects-on-time-on-budget-and-on-value

[3] Other posts that also look at the Agile/Data analytics mash-up: http://www.kdnuggets.com/2017/04/librarian-scientist-alchemist-engineer-dataops.html, https://www.svds.com/tbt-successful-data-teams-are-agile-and-cross-functional/

[4] http://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html

[5] http://www.sv-europe.com/crisp-dm-methodology/

[6] https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

[7] http://www.kdnuggets.com/2017/02/real-world-results-agile-data-science-teams.html

[8] https://www.tamr.com/from-devops-to-dataops-by-andy-palmer/, https://en.wikipedia.org/wiki/Dataops

[9] https://www.scrumguides.org/scrum-guide.html

[10] http://agileforall.com/resources/introduction-to-agile/

[11] http://www.datasciencecentral.com/profiles/blogs/what-roles-do-you-need-in-your-data-science-team, http://www.kdnuggets.com/2015/08/3-components-successful-data-science-team.html


Mobile Reporting for a Mobile World – SSRS 2016 and Mobile Reports

With SQL Server 2016, as most of us are aware, there are numerous features that bring some amazing functionality to end users and really strive to complete the end-to-end user experience.  Microsoft has improved all key areas of the BI stack over the last decade by leaps and bounds up through SQL Server 2014…..with the exception of SSRS. Yes, there have been improvements, but they’ve been minor. In my humble yet honest opinion, Microsoft has fallen short of meeting customer expectations in this area. Their visualizations were incomplete, the user experience was a bit clunky, and performance was never terribly great.

There were constant hurdles to overcome when using SSRS. Personally, I would snapshot everything, cache as much as possible, automate delivery during off-hours, and do my best to have SSRS display reports and nothing more, because performance seemed sub-par. All of my stored procedures would do the heavy lifting on the database side, and the majority of my “report development” time would be spent on the data architecture piece. SSRS would simply render and display the data.

Fast forward to February 4, 2014: the day Satya Nadella was appointed CEO of Microsoft and, as I see it, the day Microsoft began its path toward data and cloud leadership. He has done more for this company than I could have imagined, bringing it from the brink of obscurity to industry leadership. He accepted the shortcomings of the product stack and began planning to overcome each and every one of them.

Microsoft knew that the user community wasn’t terribly fond of SSRS; licensing was clear evidence of this. Tableau was the recognized market leader with their flashy visualizations and that “shiny new object” appeal. Rather than trying to build in-house with personnel that may not have had the creativity or vision, they went after a company that clearly demonstrated visualization leadership: Datazen.

What came next was a combination of the great visualizations of mobile reports (and still including their paginated reports) with the one piece of SSRS that was stable and highly respected, the security layer.

The Business Case for SSRS 2016 and Mobile Reports

So the obvious question here is, “Well I already have SSRS, why do I need this?” Or “I already have Tableau, Cognos, or some other reporting platform. Why would I want to switch to this?”

For me, the above image makes one aspect crystal clear, and it is possibly the most important one (particularly among leadership): consumability. That is the ability to consume reports on ANY device, from your PC to your tablet to your mobile phone, each with a level of control that lets you maximize your screen real estate.

Capabilities

Business insights through rich visualizations on any device. Native apps for Windows, iOS and Android

Benefits

  • Access your data from anywhere
  • Touch optimized data exploration and perfect scaling to any screen form-factor
  • Collaborate with colleagues on the go

Innovations

  • Acquisition of Datazen
  • Datazen Migration Assistant (for those that currently use Datazen)
  • New Web Portal
  • Branding
  • Updated Report Builder
  • Mobile Report Publisher
  • Higher DPI
  • Printing (no ActiveX)
  • HTML 5
  • Visual Studio 2015

With all of this stated, it’s important to understand…..

Mobile Reports vs Traditional (Paginated) SSRS Reports

SSRS 2016 Overview

Now that we have a decent understanding of the inception of SSRS 2016, as well as what mobile reporting is, let’s dive into what SSRS 2016 IS and how Microsoft came to the conclusions they did by looking at the roadmap below. As we can see, the yellow circled items reflect SSRS 2016, but there’s an entire ecosystem of business intelligence that includes Power BI, mobile apps, Excel, etc.

Mobile - BI Landscape.JPG

SSRS 2016 also includes the following improvements:

  • HTML 5 Rendering Engine: cross-browser compliant (Edge, Explorer, Firefox, Safari, etc.)
  • PDF Replaces ActiveX for Remote Printing: no ActiveX controls required (security hole)
  • Modern paginated reports: new, modern styles for charts, gauges, maps and other visualizations
  • File Export: XML, CSV, PDF, MHTML, Excel, PowerPoint, TIFF and Word *(advanced features)
  • Custom Branding: customize the web portal logo and colors by using a branding pack
  • KPIs: create data-driven and contextual KPIs in the web portal, including the ability to drill through to more detailed reports (mobile, paginated, etc.)
  • Favorites: save your daily/favorite content, including KPIs, Mobile/Paginated Reports, and PBI Reports

SSRS Web Portal

Web Portal

One of the great improvements is the addition of the “Web Portal”. I generally change the URL to reflect this nomenclature as well so it becomes http://localhost/WebPortal/.  I don’t follow the traditional “reports” nomenclature because it’s so much more than that now. It’s a “one stop shop” for all of your reporting needs, from data sources to data sets, to KPIs to mobile reports, to paginated reports, to Power BI reports (to be discussed later), to any other file of your choosing (similar to SharePoint repository).

Branding

Similar to SharePoint and modifying the cascading style sheets (CSS), users can modify some of the basic items such as the logo, title, title colors, background colors, etc.

Mobile - Web Portal.JPG

A brand package for Reporting Services consists of three items and is packaged as a zip file, which must then be uploaded to SSRS in the Branding section of the Settings page.

Mobile - Branding Files
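To make the packaging step concrete, here is a minimal Python sketch that zips three items into a brand package in memory. Treat the details as assumptions to verify against Microsoft’s branding documentation: the file names `metadata.xml`, `colors.json`, and `logo.png` are what I recall the package using, the `build_brand_package` helper is hypothetical, and the metadata and color values shown are placeholders rather than a complete, valid brand definition.

```python
import io
import json
import zipfile


def build_brand_package(metadata_xml: str, colors: dict, logo_png: bytes) -> bytes:
    """Zip the three brand items and return the archive as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # File names are assumed; confirm them against the official docs.
        archive.writestr("metadata.xml", metadata_xml)
        archive.writestr("colors.json", json.dumps(colors, indent=2))
        archive.writestr("logo.png", logo_png)
    return buffer.getvalue()


# Placeholder content only: start from Microsoft's sample brand package
# for the real metadata.xml and the full set of color keys.
package = build_brand_package(
    metadata_xml="<!-- replace with metadata.xml from a sample brand package -->",
    colors={"name": "Example brand"},
    logo_png=b"",
)

# Write the bytes out, then upload the zip in the Branding section of Settings:
# with open("example_brand.zip", "wb") as handle:
#     handle.write(package)
```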

Favorites

Once at the Web Portal, users can add their “every day” items to their list of Favorites. This way, every time they log into the portal, only their Favorites are displayed. This helps to overcome the overbearing directory structure that is traditionally seen in SSRS. Users don’t need to remember which items reside in which path; they simply “Star” an item and it appears on their home page. There is still the ability to search for specific reports, but I would only use that when you are unsure of what you’re looking for.

Mobile - Favorite KPI

How to Create SSRS 2016 Mobile Reports

Before we discuss how to create SSRS 2016 Mobile Reports, let’s quickly review the evolution of the tools from SQL Server inception to current day.

Mobile - Reporting Tool Evolution

As we can see, some tools have come and gone, while one has remained relatively stable and that’s Visual Studio. This tool will still be the primary and recommended development tool for standard (paginated) SSRS reports. Assuming most have used this tool and written SSRS reports, we’re going to focus on SSRS 2016 Mobile Reports, and the new tool we use to create them.

Mobile - SSRS Tools

SQL Server Mobile Report Publisher

Install the Mobile Report Publisher tool to create mobile reports. Use this SQL Server Mobile Report Publisher link to download the tool.

This application requires .NET Framework 4.5 (or later) and Visual C++ Redistributable for Visual Studio 2012 (x86).

Mobile - SSRS Mobile Report Publisher.JPG

Once installation is complete, fire up the Mobile Report Publisher!

Mobile - SSRS Mobile Report Publisher Screen.JPG

You’ll immediately notice a completely different UI compared to both the classic Report Builder and Visual Studio. There are 6 basic settings you need to understand in order to create reports.

  • Layout (Navigators, Gauges, Charts)
  • Data (Add, Refresh, Export)
  • Report Settings
  • Preview
  • Grid Control
  • Layout Control

Before we dig into more detail, let’s decide which approach we want to take when creating our mobile report.

Design First

In the design-first approach, create a mobile report layout first without importing any data.  This is a good way to create a mobile report when unsure that the data is formatted correctly.  Without real data, gallery elements are automatically bound to generated simulated data, which can be exported and used as a template to describe the required data.

Data First

The data-first approach is to import all required data first, then design the mobile report and set data properties on the mobile report elements.  This has the advantage of being able to connect each element to real data when added to the layout.  When using a data-first approach, be sure that the real data is formatted correctly for use with Mobile Report Publisher.

My personal preference is the Data First approach. I like to understand how each user role is going to interact with the data, at which level of granularity, etc. Once that’s established, I create data sets at the lowest level first, then aggregate up so that each and every layer of reporting can be satisfied (detailed level [paginated], mobile report, KPI, etc.).

I feel that this process is akin to building a house. The foundation is your data architecture, the windows, curtains, doors, etc. are your reporting solution (final and decorative pieces), while your framing and walls are your data strategy which helps combine the two.
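To illustrate the “lowest level first, then aggregate up” idea, here is a small Python sketch with made-up order data. In practice these layers would more likely be SQL queries or stored procedures behind shared data sets; the column names, the sample rows, and both helper functions are hypothetical and only show the shape of the approach.

```python
from collections import defaultdict
from datetime import date

# Lowest level of granularity: one row per order (what a paginated
# detail report would show). Sample data is invented for illustration.
detail_rows = [
    {"order_date": date(2016, 1, 5), "region": "East", "amount": 120.0},
    {"order_date": date(2016, 1, 19), "region": "East", "amount": 80.0},
    {"order_date": date(2016, 1, 7), "region": "West", "amount": 200.0},
    {"order_date": date(2016, 2, 2), "region": "East", "amount": 50.0},
]


def monthly_by_region(rows):
    """Middle layer: totals per (month, region), e.g. for a mobile report chart."""
    totals = defaultdict(float)
    for row in rows:
        month = row["order_date"].strftime("%Y-%m")
        totals[(month, row["region"])] += row["amount"]
    return dict(totals)


def total_sales_kpi(monthly):
    """Top layer: a single number for a KPI tile, built from the layer below."""
    return sum(monthly.values())


monthly = monthly_by_region(detail_rows)
kpi = total_sales_kpi(monthly)
print(monthly)
print(kpi)
```

Because each layer is derived from the one beneath it, the KPI, the mobile chart, and the detailed report stay consistent with one another.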

Mobile - Reporting Approach

Once you’ve selected your approach, create a data source and a data set with either the standard Report Builder or Visual Studio. This is a pain point I hope they fix soon, because as of now you cannot create a data source or dataset with the SQL Server Mobile Report Publisher.

Mobile - Available data sources.JPG

Proper Chart Types

Once you’ve created both the data source and data set, select the proper chart type. There is some overlap with the standard SSRS paginated report chart types, so select the one matching the type and granularity of data. The chart below should be helpful.

Mobile - Report Chart Types.JPG

Mobile Specific Chart Types

For chart types specific to the mobile reporting component of SSRS, reference the following chart to follow best practices and recommendations.

Mobile - Mobile Specific Charts

Number of Charts per Layout

Once you’ve decided which chart type to use, click and drag it to the grid. You can then resize it, change the data source, add drill-through actions, etc. from there. To maximize the user experience, be sure to configure each report size with a specific layout. Screen real estate is tricky. Too much is overpowering, while too little leaves something to be desired. Consider putting more report parts on the computer layout, fewer on the tablet, and the fewest on the mobile. Prioritize according to chart importance and insight value to the anticipated audience.

Mobile - Select Display Type.png
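One simple way to think about that prioritization is to rank the charts by importance and keep only as many as each layout can comfortably hold. The Python sketch below is a planning aid, not something Mobile Report Publisher does for you; the capacity numbers, the chart names, and the importance scores are all assumptions chosen for illustration.

```python
# Assumed maximum number of charts per layout; tune these to your audience.
LAYOUT_CAPACITY = {"master": 6, "tablet": 4, "phone": 2}


def charts_for_layout(charts, layout):
    """Pick the most important charts that fit on the given layout.

    `charts` is a list of (name, importance) pairs; higher importance wins.
    """
    ranked = sorted(charts, key=lambda chart: chart[1], reverse=True)
    return [name for name, _ in ranked[: LAYOUT_CAPACITY[layout]]]


# Hypothetical charts with hypothetical importance scores.
charts = [
    ("Sales trend", 9),
    ("Top customers", 7),
    ("Returns gauge", 4),
    ("Regional map", 8),
    ("Inventory", 3),
]

for layout in LAYOUT_CAPACITY:
    print(layout, charts_for_layout(charts, layout))
```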

Once your high-level approach is complete, dig into the details with the Layout, Data and Settings tabs to specify the following information:

  • Title
  • Sub-Title
  • Number Format
  • Data Structure
  • Drill-through (URL and parameter)
  • Series
  • Aggregations (Sum, Avg, etc.)
  • Currency (USD, EUR, etc.)
  • Date Information

We won’t dig into all of this, or specifics around mobile report development, as the intent of this was to demonstrate the overall value and capabilities of SSRS 2016 Mobile Reports.

One final thing to note is that there is an option to Preview the report similar to Report Builder and Visual Studio, so be sure to use that before you deploy (save) to the SSRS 2016 server.

SSRS 2016 and Power BI

Many people mistakenly think that SSRS 2016 and Power BI are the same thing, but in my opinion, they are completely different. The functionality, the licensing, etc. are all different. The only things they have in common are that they’re reporting solutions (geared to different audiences), and they play well with one another.

I’ll be posting another blog in the coming weeks comparing and contrasting the two, as well as possibly leveraging BOTH!

The current version (SP1 as of this writing) supports uploading Power BI files, but they behave like files on SharePoint or any other website. When the user clicks on the file in the download bar, the browser detects the MIME type and opens the associated thick client (if available), in this case Power BI Desktop. SSRS 2016 currently DOES NOT render Power BI reports. This means that each and every person who would like to use the Power BI files MUST have the Desktop application installed. There is a workaround where you create an HTML page with an embedded iFrame that you get from the Power BI portal, but bear in mind this isn’t truly an on-premises solution. The report and (usually) the data are still coming FROM the Power BI portal.

Mobile - PBI Download
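For the iFrame workaround mentioned above, the HTML page can be as small as a single embedded frame. The Python sketch below generates such a page; the `build_embed_page` helper is hypothetical, the URL is a placeholder for whatever embed link the Power BI portal gives you, and the width and height are arbitrary defaults.

```python
from html import escape


def build_embed_page(title: str, embed_url: str, width: int = 1024, height: int = 768) -> str:
    """Return a minimal HTML page that embeds a report in an iframe."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        "<body>\n"
        f'  <iframe src="{escape(embed_url)}" width="{width}" height="{height}" '
        'frameborder="0" allowfullscreen></iframe>\n'
        "</body>\n"
        "</html>\n"
    )


# Placeholder URL: replace it with the embed link from the Power BI portal.
page = build_embed_page("Sales Dashboard", "https://example.com/your-power-bi-embed-url")

# Save the result and upload the .html file wherever your users can reach it:
# with open("sales_dashboard.html", "w", encoding="utf-8") as handle:
#     handle.write(page)
print(page)
```

Keep in mind that the page only frames the hosted report, so viewers still need network access to the Power BI portal.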

However, with all of that said, the crowd has been clamoring for an on-premises solution for Power BI. Previously, Microsoft’s answer would be SSRS 2016…..and it looks like it still is.

SSRS 2016 Technical Preview

Microsoft has answered the call and will support Power BI on-premises, a capability that is currently in technical preview. The SSRS 2016 Technical Preview can currently RENDER Power BI reports, not just store them. Here are a few important points on the current state.

  • On-premises solution to Power BI Sites (Cloud)
  • Create reports with Power BI Desktop
  • Connect “live” to Analysis Services models – both Tabular and Multidimensional (cubes)
  • Visually explore data and create an interactive report
  • Save reports to SSRS 2016 Technical Preview
  • View and interact with the report in your web browser
  • Same version of Power BI Desktop

Mobile - SSRS Technical Preview

So it looks like this may be released with SQL Server 2016 SP2, which would be a great addition. It’s still a mystery as to which features will be supported and which will not. Things like the Natural Language, Quick Insights and dashboards probably will not, while a myriad of additional data sources and drill-throughs probably will. This is all speculation, but time will tell and there are exciting times ahead!

To see/hear the full Allegient webinar, be sure to check it out here!