SQL: How to get first/last transaction details for each user account?

Looking at user’s first/last transaction details is a common requirement. So given that you have a “user_transaction_details” table which looks something like:

Transaction_id | user_id | transaction_date | purchase amount

How would get first/last transaction details for each user account?

Here’s a design pattern:


select AllTransactions.user_id,AllTransactions.purchase_amt from user_transaction_details AllTransactions
inner join 
(
select user_id, min(transaction_timestamp) as first_transaction_timestamp from user_transaction_details
group by user_id
) FirstTransaction
on AllTransactions.user_id = FirstTransaction.user_id and AllTransactions.transaction_timestamp = FirstTransaction.first_transaction_timestamp

To get the last transaction details for each user account, replace min function with max and that should be it.

Note: Depending on your data model and how you used it in the join, it might be that there would be multiple rows marked as “first/last” transaction and so would want to spend some time figuring out how to deal with these multiple row situation especially if you’re planning to further join this data.

Conclusion:
In this post, I shared a design pattern to write a SQL query to get first/last transaction details for each user account type.

Question for you:
How would you solve this problem? Share you query in the comments section!

PASS Business Analytics conference 2015 – Keynote Day 1. #passbac

I’m at PASS BA conference this week to learn about current & future state of business analytics world. I would categorize today’s keynote as a session that gave me an insight what’s our “future”.

Before I begin describing my takeaway’s, I would provide some context by providing keynote speaker’s background.

The keynote speaker was Carlo Ratti who directs MIT’s seanseable city — a group that focuses on how to use data to understand our cities and works on research projects to make it better!

So, the theme of the keynote was to show some of the research projects and inspire audience on how big data was used to come up with actionable insights to make our cities better. The projects are usually at the intersection of Internet of things and Big Data.

With that context, here’s are my notes:

– What is Big Data? 

(I thought that was funny but behind it, there’s an interesting takeaway: we need NEW tools and approach to deal with Big Data! btw, MIT has their in-house tools to generate data viz’s on bigdata — one such example: Data Collider http://datacollider.io/)

– Internet of things & Big data

Carlo also showed the relation between Internet of things and Big data by using data to show exponential growth that we have seen in mobile connections and how it’s poised to grow further in coming decade because of internet of things.

– Big Data = opportunistic + user-generated + purposely sensed. 

Carlo then categorized his projects in three categories: Opportunistic, User-generated & Purposely sensed.

– Opportunistic

couple of examples shared by the speaker:

1. Re-drawing Great Britain’s map from a network of human interactions:

2. A better way to handle traffic at traffic lights. (Project Wave) http://senseable.mit.edu/wave/

user-generated

1. using user-generated content to figure out where’s the best party in Barcelona? (project World’s eye)

http://senseable.mit.edu/worldseyes/visuals.html

2. analyzing user tweets during an event. (Project: Tweetbursts)

http://senseable.mit.edu/tweetbursts/

– purposely sensed

1. Having sensors in the bike that gives you info like pollution, traffic congestion and road congestions in real-time.

(Project: Copenhagen wheel.)

copenhagen wheel

2. Following e-waste around the world. Sometimes the energy put into discarding the waste is more than we can ever get out of it after its recycled. This should be improved! (Project: Trash Track)

http://senseable.mit.edu/trashtrack/

That’s it for the notes.

You can check out all of their projects on their site!

After seeing the demo’s, I can’t wait to live in one of this “smarter” city. What a fascinating application of data! — The future is awesome!

How to analyze “new users” metric by specific pages in Google Analytics?

Problem:

you want to create a funnel chart of how your new users move from their landing page to your desired destination. Ideally it’s goes something like this:

Stage 1) lands on your home/landing page

Stage 2) goes to a product page

stage 3) goes to a checkout page

stage 4) sees a thank you page

Now, if you want to analyze the conversion among these stages for a “new” user then you will need create custom reports in google analytics. You will basically need to create a report for each specific page that you want to analyze. So how to set one up?

Solution: 

1) Navigate to Google analytics profile

2) On the top of the go to “customization” section and click on create a new custom report

3) here’s how you can set up a custom report that will use you new users by a specific page (notice the page filter?).

New users by specific pages funnel visualization google analytics

Conclusion:

In this post, I outlined the steps that you need to take to setup a custom report in google analytics that shows you new users by specific pages.

How to assign same axis values to a group of spark-lines in Excel?

Spark-line is a very handy data visualization technique! It’s great when you are space constrained to show trends among multiple data points.

Here’s an example:

Spark Line Trend Excel Data Visualization

But there’s an issue with above chart! Axis values for these group of spark-lines do not seem match – it could throw someone off if they didn’t pay close attention. So a good practice – when you know users are going to compare segments based on the spark-lines – is to assign them same axis values so it’s easier to compare. Here’s the modified version:

Excel Sparkline data visualization same axis

And…here are the steps:

1. Make sure that spark-lines are grouped.

Select the spark-lines > go to toolbar > Sparkline Tools > Design > Group

Excel Sparkline Group

2. On the “group” section, you’ll also find the “Axis” option – select that and make sure that “same for all axis” is selected for Vertical axis minimum and maximum values:

Excel Spark Line Data Viz same min max value

 

That’s about it. Just a quick formatting option that makes your spark-lines much more effective!

Author: Paras Doshi

Business Metric #6 of N: Net Promoter Score (NPS)

In this post, you’ll see the definition, benefits and basic calculation tutorial for using Net promoter score (NPS)

What is it? 

Net Promoter Score is a nice indicator to measure customer loyalty and satisfaction. The way you do that is by measuring how users likelihood to recommend your products/services. You can do this by asking a simple question: In a scale of 0-10, How likely are you to recommend to a friend?

Here’s how you calculate it: 

1) After you get responses, you need to classify the range in three categories “Promoters”, “passive”, “Detractors”. It could something like:

0-5 -> Detractors

5-8-> Passive

9-10 -> Promoters.

2) Now, here’s the formula

(Total promoters – Total detractors)/(Total survey users)

How to interpret it? 

So, what’s a good NPS?

Let’s take an example.

1) Promoters = 100, passive = 100, detractors = 100 THEN NPS = 0

2) Promoters = 50, passive = 100, detractors =  10 THEN NPS = 0.25 (or 25%)

3) Promoters = 10, passive = 100, detractors = 50 THEN NPS = -25%

As a basic rule of thumb, higher the number then better it is for you! You don’t want this to be negative because as you can see from example below it indicates that you have more detractors then promoters.

Other than general rule of thumb, you might want to keep an eye on the trend of NPS on a monthly/quarterly basis to make sure it’s moving in right direction. You might also want to benchmark this against your Industry standard – because NPS tends to be different for different industries.

Conclusion:

In this post, you learned about Net Promoter score and how to use it to measure customer loyalty and satisfaction.

Cohort Analysis: What is it and why use it?

In this post, you’ll learn definition and benefits of Cohort Analysis. Let’s get started!

Cohort Analysis: What is it?

Cohort analysis is a data analysis technique used to compare similar groups over time.

Cohort Analysis: Why use it?

Here’s the basic idea: Businesses are dynamic and thus are continuously evolving. A customer who joined previous year might get a different experience compared to customer who joined this year. This is especially true if it’s a startup or tech company where the business models change (or Pivot!) often. You might miss crucial insights if you ignore the dynamic nature of businesses in your data analysis. To see if the business models are evolving in right direction, you need to to use cohort analysis to analyze similar groups over time – Let’s see an example to make it a little bit more clear for you.

You decide to analyze “Average Revenue per Customer” by Fiscal Year and came up with following report:

Simple Data Analysis Averages Hide Interesting Trends

It seems that your “Average revenue per customer” is dropping and you worry that your investors might freak out and you won’t secure new investments. That’s sad! But hold on – Let’s put some cohort analysis technique to use and look at the same data-set from a different angle.

In this case, you decide to create cohorts of customers based on their joining year and then plot “Average Revenue Per Customer” by their year from joining date. Same data-set but it might give you different view. See here:

Cohort Analysis Customer Revenue and Year Joined

It seems you’re doing a good job! your latest cohort is performing better than previous cohorts since it has a higher average revenue per customer. This is a great sign – and you don’t need to worry about your investors pulling out either and well, start preparations to attract new investors – all because of cohort analysis! :) WIN-WIN!

Conclusion:

As you saw, cohort analysis is a very powerful technique which can help you uncover trends that you wouldn’t otherwise find by traditional data analysis techniques.

You might also like: Top 2 techniques to analyze data

Author: Paras Doshi

Every Data Analyst Needs to check out this FREE excel add-in: Power Query!

Power Query is amazing! It takes the data analysis capabilities of Excel to whole new level! In this post, I am going to share three reasons:

1. it enables repeatable mash-up of data!

Have you every had to do your data analysis tasks repeatedly on the data with same structure? Do you get “new” data every other week and need to go through the same data transformation workflow to get to the data that you need?

What’s the solution? Well, you can look at MACRO’s! Or you can request your IT department to create a Business Intelligence platform. However, what if you need to modify your data mashup workflow then these solutions don’t look great, do they now?

Don’t worry! Power Query is here!

It enables repeatable mashup of data like you might have never seen before! You need to try it to believe.

It’s very easy to input new data to Power Query and it enables you to retrieve final output based on new data using a “refresh” feature.

Each data-mashup is recorded as steps which you can go back and edit if you need to.

Power Query Refresh

2. It’s super-flexible!

Any data mashup performed using Power Query is expressed using its formula language called “M”. You can edit the code if you need to and as you can imagine such a platform enables much-needed flexibility for the analyst’s.

3. It has awesome advance features!

Do you want to Merge data? How about Join? Are you tired with VLOOKUP’s! Don’t worry! it’s super easy with Power Query! Here’s a post: Join Excel Tables in Power Query

How about Pivot or Unpivot? Done! Check this out: Unpivot excel data using Power Query

How about searching for online & open data sets? Done!

How about connecting to data sources that “Data” section of Excel doesn’t support yet? (Example: Facebook) – DONE! Power Query makes that happen for you.

And That’s not a complete list!

Plus you can unlock the “Power” (pun intended) of Power Query by using it with other tools in Power BI Stack. (Power Pivot, Power View, etc…) OR you can use the your final output from Power Query with other tools too! After all it’s an excel file.

Action-Item!

If you haven’t already then check out Power Query! it’s free and works with Excel 2010 and above.

Author: Paras Doshi

Achievement Unlocked! Passed Google Analytics certification Exam.

I recently completed the Google Analytics certification exam!

Here’s the Google Analytics Individual Qualification (GAIQ) certificate:

Google Analytics Certification Exam

Top two key techniques to analyze data:

There are many techniques to analyze data. In this post, we’re going to talk about two techniques that are critical for good data analysis! They are called “Benchmarking” and “Segmentation” techniques – Let’s talk a bit more about them:

1. Benchmarking

It means that when you analyze your numbers, you compare it against some point of reference. This would help you quickly add context to your analysis and help you assess if the number if good or bad. This is super important! it adds meaning to you data!

Let’s look at an example. CEO wants to see Revenue numbers for 2014 and an analyst is tasked to create this report. If you were the analyst, which report would you think resonated more w/ the CEO? Left or Right?

Benchmarking data providing context in analysis

I hope the above example helped you understand the importance of providing context w/ your data.

Now, let’s briefly talk about where do you get the data for benchmark?

There are two main sources: 1) Internal & 2) External

The example that you saw above was using an Internal source as a benchmark.

An example of an external benchmark could be subscribing to Industry news/data so that you understand how your business is running compared to similar other businesses. If your business sees a huge spike in sales, you need to know if it’s just your business or if it’s an Industry wide phenomenon. For instance, in Q4 most e-commerce sites would see spike in their sales – they would be able to understand what’s driving it only if they analyze by looking at Industry data and realizing that it’s shopping season!

Now, let’s shift gears and talk about technique #2: Segmentation.

2. Segmentation

Segmentation means that you break your data into categories (a.k.a segments) for analysis. So why do want to do that? Looking at the data at aggregated level is certainly helpful and helps you figure out the direction for your analysis. The real magic & powerful insights are usually derived by analyzing the segments (or sub sets of data)

Let’s a look at an example.

Let’s say CEO of a company looks at profitability numbers. He sees $6.5M and it’s $1M greater than last years – so that’s great news, right? But does that mean everything is fine and there’s no scope of optimization? Well – that could only be found out if you segment your data. So he asks his analyst to look at the data for him. So analyst goes back and after some experimentation & interviews w/ business leaders, he find an interesting insight by segmenting data by customers & sales channel! He finds that even though the company is profitable – there is a huge opportunity to optimize profitability for customer segment #1 across all sales channel (especially channel #1 where there’s a $2M+ loss!) Here’s a visual:

segmentation data Improve profitability low margin service offerings customers

I hope that helps to show that segmentation is a very important technique in data analysis!

Conclusion:

In this post, we saw segmentation & benchmark techniques that you can apply in your daily data analysis tasks!

Answering a question using data: Are marketers around the globe shifting their dollars to digital ads?

Answering a question using data: Are marketers around the globe shifting their dollars to digital ads?

According to the data shared by emarketer, we can clearly see that the Traditional Ad market is reaching a saturation state in 5 major economies (US, China, UK, Japan, Germany) while the digital ad market will see steady growth in some economies & explosive growth in US & China…but the market size of traditional ads will still certainly remain much bigger in US while market size of digital ads in china will overtake the traditional ads in 2017.

So to answer the question: Marketers are not decreasing their existing budgets for traditional ad channels but the increased marketing budget dollars seems to be directed to digital ads market.

Very interesting data-set, I encourage you to play with it!

Thanks Avinash Kaushik for sharing this interesting tool.

I was playing with the data using Excel & Tableau, here’s a public workbook if you’re interested: https://public.tableausoftware.com/profile/paras.doshi#!/vizhome/WorldWideAdSpend/Dashboard-DigitalAdSpendvsTraditionalAdSpend

YoY growth - Digital Ad Spends vs Traditional Ad Spend

Now, it’s your turn! What insights do you get from this data?