Titanic Data


Here’s a link to the Titanic data set — http://lib.stat.cmu.edu/S/Harrell/data/descriptions/titanic.html — it’s really useful for analytics and data science projects. You can:

  1. Build a predictive model. Example: the Kaggle Titanic competition at https://www.kaggle.com/c/titanic (a minimal model sketch follows after this list).
  2. Create interactive dashboards. I also use this data set in tools like Qlik and Tableau to explore their features.
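If you want a feel for item 1, here’s a minimal sketch of a predictive model in Python. It assumes you have a local copy saved as titanic.csv with survived, pclass, sex, and age columns (the file name and column names are assumptions; different copies of this data set label them differently), and it is only a starting point, not a competitive Kaggle entry:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Load a local copy of the Titanic data (file and column names are assumptions).
    df = pd.read_csv("titanic.csv")

    # Keep a few simple features and drop rows with missing values.
    df = df[["survived", "pclass", "sex", "age"]].dropna()
    df["sex"] = (df["sex"] == "female").astype(int)  # encode sex as 0/1

    X = df[["pclass", "sex", "age"]]
    y = df["survived"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("Accuracy on held-out data:", model.score(X_test, y_test))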


If you liked this, you may also like other data sets that I have here: http://parasdoshi.com/2012/07/31/where-can-we-find-datasets-that-we-can-play-with-for-business-intelligence-data-mining-data-analysis-projects/

Qlik Sense: How to see Data Load Editor scripts for apps developed by your team members?


(This post first appeared on the Qlik Community here.)


So you just joined a Business Intelligence team, and one of your responsibilities is building apps for your business users. Eventually, you will need to see the Data Load Editor scripts for apps developed by other members of the team. So what permission do you need to be able to do that?

Credits: darkhorse

Qlik Sense version: Enterprise Server 2.0

Source: can’t see a peer’s data load editor scripts


This is a two-step process.

1) Get “Content Admin” access (or a “higher” access level)

2) Double-check that you have access to see the data load scripts for ALL apps

Step 1:

The short answer is that you need the “Content Admin” permission from your Qlik Sense admin… but with this access level, you will only have access to other developers’ apps via the QMC. If you need to do this via the Hub as well, then you will have to change the Content Admin role.

Here’s how Serhan (darkhorse) explained getting this done:

QMC -> Security Rules -> Content Admin -> Edit -> Context -> Both in hub and QMC

Qlik sense management console

Step 2:

Now, once you get the “Content Admin” access, you might want to double-check two things:

1) You can get access to data load scripts on published apps — (I was able to do this, but there still seem to be some open questions around some folks not being able to see the data load scripts for published apps. If this is the case for you, duplicate the app into your “My Work” area and see the scripts there.)

2) You can duplicate apps into your “My Work” area and see their scripts — this is also useful if you want to make changes to published apps that are already out there.


I hope this helps you resolve the permission issues and helps you collaborate with your team members!

Data puking and how T-Mobile alienated a potential customer:


I saw this ad on a highway earlier today, and my reaction was: why would I switch to a network that has just “96%” coverage?

T-Mobile ad — an example of data puking

…Instead of converting a potential buyer, this ad actually made me more nervous. You know why? It’s a case of what I like to call “data puking”, where you throw a bunch of numbers/stats/data at someone hoping that they will take action based on it. So what would have helped in this ad? It would have been great to see the number compared against someone else. Something like: we have the largest coverage compared to xyz. My AT&T connection is spotty in downtown areas, so if the ad had said something like “we have 96% coverage compared to AT&T’s 80%”, I would have been much more likely to make the switch.

I wrote about adding benchmarks to your analysis here.

Takeaway from this blog: don’t throw data points at your customers. Give them the context and guide them through the actions that you want them to take.

How to add Sparkline data visualization to Google spreadsheets?


I like using sparklines when they make sense! They’re a great way to visualize trends in the data without taking up too much space. Now, I knew how to add sparklines in Excel, but recently I wanted to do the same in Google Sheets and had to figure it out, so here are my notes:

1. Google has a built-in function called “SPARKLINE” to do this.

2. Sample usage: =SPARKLINE(B2:G2) — by default this puts a line chart in the cell.

3. There are other options, including changing the chart type. You can find them documented here: https://support.google.com/docs/answer/3093289

4. One of the best practices that I advocate when you use sparklines to “compare” trends is to make sure that you have a consistent axis definition. The sample usage for that could look like this:
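A minimal example, assuming a line sparkline and hypothetical bounds of 0 and 100 (pick values that cover every series you’re comparing): =SPARKLINE(B2:G2, {"ymin",0;"ymax",100}). Giving every sparkline the same y-axis range is what makes the trends directly comparable.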


(If you want to do this in Excel, then here’s the post: http://parasdoshi.com/2015/03/10/how-to-assign-same-axis-values-to-a-group-of-spark-lines-in-excel/ )

After you’re done, here’s what a finished version could look like in Google Sheets:

Google Sheet Data visualization spark line

Here’s the working Google Sheet: https://docs.google.com/a/parasdoshi.com/spreadsheets/d/1EJYDTxOifeEL-YwW1a0oxXw7tFG1iAVQlwjo4EU8R-s/edit?usp=sharing

Data -> Insights -> ?


I was at the HP Big Data conference last week, and I heard something during the keynote that’s worth sharing with you.

As data & analytics professionals, we spend a lot of our time finding insights, trends & patterns in the data, but the keynote speaker (Ken Rudin, Facebook) encouraged everyone to take that a step further: think about driving impact based on the insights. It’s a simple yet powerful idea! Over the past few months, I have started working closely with decision makers and helping drive impact vs. just “handing off” insights.

I hope that helps! Just wanted to share that with you. What do you think?


What percentage of users are authenticated? (Google Universal Analytics)


You’re using Google’s Universal Analytics — that’s great! The key to getting the most out of it is to incentivize your users to log in, a.k.a. authenticate. The first step in doing that is to figure out the percentage of users that are authenticated… Here’s how you can see that report:

1. Login to Google Analytics

2. Select your view > Go to “Reporting” section

3. Navigate to Audience > Behavior > User-ID coverage

Google Analytics User ID Universal

4. On this report, you can see authenticated vs unauthenticated sessions:

Percentage of authenticated users google analytics Universal
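For example (hypothetical numbers): if the report shows 1,200 sessions with a User ID assigned out of 4,800 total sessions, then roughly 25% of your sessions are authenticated.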


In this post, we talked about how to run a report that shows you the percentage of authenticated users (in Google’s Universal Analytics).

A key driver for business intelligence adoption: Embedded analytics.


Did you know most business intelligence (BI) solutions are under-utilized? Your BI solution might be one of them — I definitely had some BI solutions that were not as widely used as I had imagined! Don’t believe me? Take a guess at the “number of active users” for your BI solution and then look up that number using your BI server logs. Invariably, this is shocking to most BI project leaders: their BI solution is not as widely used as they had imagined! OK, so what can you do? Let me share one key driver of business intelligence adoption: embedded analytics.

Embedded analytics

#1: What is embedded analytics?

Embedded analytics is the practice of integrating analytics inside software applications. In the context of this post, it means integrating BI reports/dashboards into the most commonly used apps inside your organization.

#2: Why should you care?

You should care because it increases your business intelligence adoption. I’ve seen 2x gains in the number of active users just by embedding analytics. If you want to understand why it’s effective at driving adoption, here’s my interpretation:

Change is hard. You know that — so why do you ask your business users to “change” their workflow and come to your BI solution to access the data that they need? Let’s consider an alternative — put data left, right & center of their workflow!

Example: if you are working with a team that spends most of their time in a CRM system, then consider putting your reports & dashboards inside that CRM system instead of asking them to do this:

Open a new tab > Enter your BI tool URL > Enter User Name > Enter Password > Oops wrong password > Enter password again > Ok, I am in > Search for the Report > Oops, not this one! > Ok go back and search again > Open report > loading…1….2….3…. > Ok, here’s the report!  

You see, that’s painful! Here’s an alternative user experience with embedded analytics:

They are in their favorite CRM system! They see a nice little report embedded inside their system, and they can click on that report to open it for deeper analysis in your BI solution.

How easy* was that?

*Some quick notes from the field:

1) It’s easy for users, but it’s not easy to implement! That said, there’s ROI if you invest your resources in setting up embedded analytics correctly!

2) Don’t forget context! Example: if a user is in their CRM system looking at one of their problem customers, wouldn’t it be great if your reports displayed key data points filtered for that customer? So context. Very important!

3) Start small. Implement embedded analytics for one subject area (e.g. customer analysis) for one business team inside one app! Learn from that. Adjust according to your specific needs & company culture, AND if that works — then do a broad rollout!

Now, think of all the places you can embed analytics in your organization. Give your users an easy way to get access to the reports. Don’t build it and wait for them to come to you — go embed your analytics anywhere and everywhere it makes sense!

#3: Stepping back

Other than embedded analytics, you need to take a look at providing user support and training as well… and keep monitoring usage! (If you’re trying to spread a data-driven culture via your BI solution, then you should “eat at your own restaurant” and base your adoption efforts on your usage numbers, not guesses!)


In this post, I shared why embedded analytics can be a key driver of business intelligence adoption.

Every data analyst needs to check out this FREE Excel add-in: Power Query!


Power Query is amazing! It takes the data analysis capabilities of Excel to a whole new level! In this post, I am going to share three reasons why:

1. It enables repeatable mash-ups of data!

Have you ever had to repeat your data analysis tasks on data with the same structure? Do you get “new” data every other week and need to go through the same data transformation workflow to get to the data that you need?

What’s the solution? Well, you could look at macros! Or you could ask your IT department to create a Business Intelligence platform. However, if you ever need to modify your data mashup workflow, these solutions don’t look so great, do they?

Don’t worry! Power Query is here!

It enables repeatable mashups of data like you might have never seen before! You need to try it to believe it.

It’s very easy to feed new data into Power Query, and the “refresh” feature lets you regenerate the final output from that new data.

Each data mashup is recorded as a series of steps, which you can go back and edit if you need to.

Power Query Refresh

2. It’s super-flexible!

Any data mashup performed using Power Query is expressed in its formula language, called “M”. You can edit the code if you need to, and as you can imagine, such a platform gives analysts much-needed flexibility.

3. It has awesome advanced features!

Do you want to merge data? How about joins? Are you tired of VLOOKUPs? Don’t worry! It’s super easy with Power Query! Here’s a post: Join Excel Tables in Power Query

How about Pivot or Unpivot? Done! Check this out: Unpivot excel data using Power Query

How about searching for online & open data sets? Done!

How about connecting to data sources that the “Data” section of Excel doesn’t support yet? (Example: Facebook) – DONE! Power Query makes that happen for you.

And that’s not even a complete list!

Plus, you can unlock the “Power” (pun intended) of Power Query by using it with other tools in the Power BI stack (Power Pivot, Power View, etc.), OR you can use your final output from Power Query with other tools too! After all, it’s an Excel file.


If you haven’t already, check out Power Query! It’s free and works with Excel 2010 and above.

Author: Paras Doshi

Top two key techniques to analyze data:


There are many techniques for analyzing data. In this post, we’re going to talk about two that are critical for good data analysis! They are called “benchmarking” and “segmentation” – let’s talk a bit more about them:

1. Benchmarking

It means that when you analyze your numbers, you compare them against some point of reference. This helps you quickly add context to your analysis and assess whether the number is good or bad. This is super important! It adds meaning to your data!

Let’s look at an example. A CEO wants to see revenue numbers for 2014, and an analyst is tasked with creating this report. If you were the analyst, which report do you think would resonate more with the CEO? Left or right?

Benchmarking data providing context in analysis
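To make that concrete with hypothetical numbers: a report that just says “2014 revenue: $10M” leaves the CEO guessing, while “2014 revenue: $10M, up 25% from $8M in 2013” immediately tells them whether the year was good or bad.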

I hope the above example helped you understand the importance of providing context with your data.

Now, let’s briefly talk about where you get the data for a benchmark.

There are two main sources: 1) Internal & 2) External

The example that you saw above was using an internal source as a benchmark.

An example of an external benchmark could be subscribing to industry news/data so that you understand how your business is doing compared to other, similar businesses. If your business sees a huge spike in sales, you need to know whether it’s just your business or an industry-wide phenomenon. For instance, in Q4 most e-commerce sites see a spike in their sales – they can understand what’s driving it only by looking at industry data and realizing that it’s shopping season!

Now, let’s shift gears and talk about technique #2: Segmentation.

2. Segmentation

Segmentation means that you break your data into categories (a.k.a. segments) for analysis. So why do you want to do that? Looking at the data at an aggregated level is certainly helpful and helps you figure out the direction for your analysis, but the real magic & powerful insights are usually derived by analyzing the segments (or subsets of the data).

Let’s take a look at an example.

Let’s say the CEO of a company looks at profitability numbers. He sees $6.5M, and it’s $1M greater than last year’s – so that’s great news, right? But does that mean everything is fine and there’s no scope for optimization? Well – that can only be found out if you segment your data. So he asks his analyst to look at the data for him. The analyst goes back, and after some experimentation & interviews with business leaders, he finds an interesting insight by segmenting the data by customer & sales channel! He finds that even though the company is profitable, there is a huge opportunity to optimize profitability for customer segment #1 across all sales channels (especially channel #1, where there’s a $2M+ loss!). Here’s a visual:

Segmentation example: profitability by customer segment & sales channel
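If your data lives in a table of transactions, this kind of segment-level view is often just a group-by away. Here’s a minimal sketch in Python/pandas; the file name and the customer_segment, sales_channel, and profit columns are hypothetical:

    import pandas as pd

    # Hypothetical transaction-level data: one row per sale.
    df = pd.read_csv("transactions.csv")  # columns: customer_segment, sales_channel, profit

    # The aggregated number can look healthy...
    print("Overall profit:", df["profit"].sum())

    # ...while segmenting by customer segment and sales channel exposes
    # loss-making pockets hidden inside the total.
    by_segment = df.pivot_table(index="customer_segment",
                                columns="sales_channel",
                                values="profit",
                                aggfunc="sum")
    print(by_segment)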

I hope that helps to show that segmentation is a very important technique in data analysis!


In this post, we saw the segmentation & benchmarking techniques that you can apply in your daily data analysis tasks!

Five actions that you can take if you measure your analytics/business-intelligence solution usage:



In this post, I am going to share five actions that you can take if you measure your analytics/business-intelligence solution usage:

Five actions!

I highly encourage business stakeholders & IT managers to consider measuring the usage of their analytics/business-intelligence solutions. From a technical standpoint, it shouldn’t be a difficult problem, since most analytics & business intelligence tools will give you user activity logs. So, what’s the benefit of measuring usage? Well, in short, it’s like “eating at your own restaurant” – if you’re trying to spread a culture of data-driven decision-making in your organization, you need to lead by example! And one way you can achieve that is by building a tiny Business Intelligence solution that measures user activity on top of your analytics/business-intelligence solution. If you decide to build that, then here are five actions that you can take based on your usage activity:

Let’s broadly classify them into two main categories: pro-active & reactive actions.

A. Pro-active actions:

1. Identify “top” users and get qualitative feedback from them. Understand why they find it valuable & find a way to spread their story to others in the organization.

2. Reach out to users who were once active but lately haven’t logged into the system. Figure out why they stopped using it.

3. Reach out to inactive users who have never used the system. It’s easy to find them by comparing your user list with the usage activity logs (see the sketch after this list). Once you have done that, figure out the root cause – a. lack of training/documentation, b. an unfriendly/hard-to-use system, c. difficult navigation – and once you have identified the root cause, fix it!

B. Reactive actions:

4. If the usage trend is going down, then alert your business stakeholders about it and find the root cause so you can fix it.

Possible root causes:

– IT system failure? Fix: make sure that the problem never happens again!

– Lack of documentation/training? Fix: increase the number of training sessions & documentation.

downward trend line chart

5. It’s a great way to prove the ROI of an analytics/business-intelligence solution, and it can help you secure sponsorship for your future projects!
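As a rough sketch of how the pro-active actions (1–3) above might be bootstrapped from your logs, assuming you can export a simple activity log and a full user list as CSVs (all file and column names here are hypothetical):

    import pandas as pd

    # Hypothetical exports: one row per login/report view, plus the full user list.
    activity = pd.read_csv("activity_log.csv", parse_dates=["timestamp"])  # columns: user, timestamp
    all_users = pd.read_csv("user_list.csv")["user"]                       # every provisioned user

    # Action 1: top users -- candidates for qualitative feedback.
    top_users = activity["user"].value_counts().head(10)

    # Action 2: once-active users who haven't logged in for 90+ days.
    last_seen = activity.groupby("user")["timestamp"].max()
    cutoff = activity["timestamp"].max() - pd.Timedelta(days=90)
    lapsed_users = last_seen[last_seen < cutoff]

    # Action 3: users who have never used the system at all.
    never_used = all_users[~all_users.isin(activity["user"])]

    print(top_users, lapsed_users, never_used, sep="\n\n")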


In this post, you saw five actions that you can take if you measure the usage activity of your analytics/business-intelligence solution.

I hope this was helpful! I mentioned user training in this article, so if you want to learn a little bit more about it, here are a couple of my posts:

1. http://parasdoshi.com/2014/05/05/presented-at-sqlsat-305-dallas-ba-edition/

2. http://parasdoshi.com/2014/05/07/how-to-train-your-users-to-create-their-own-business-intelligence-reports-5-of-5-post-training/