Nice series on WordPress Blog Statistics:

WordPress recently did a good series on how to analyze the data that’s available to you via the WordPress Blog Stats tool. This series is great if you have a WordPress.com blog. It’s also a good read for anyone in a data analytics role who wants to learn how to write up content like this.

1. Stats Wrangling I: Digging into Your Data
2. Stats Wrangling II: Days, Weeks, and Months
3. Stats Wrangling III: Top Posts and Pages
4. Stats Wrangling IV: Referrers and Clicks
5. Stats Wrangling V: The Words that Bring You Traffic

Along with WordPress Stats, I also use data from Google Webmaster Tools. It’s a great way to see keywords and top posts and pages from a search engine’s point of view. It’s always good to have a healthy number of people searching for your content on search engines like Google.

I hope you take a look at how data analytics can help your blog grow. The series that WordPress ran focused on their platform, but if you run your blog on another platform, it should still give you a good sense of how to analyze blog statistics.

Know who is Talking about YOU on the web – set up Google Alerts!

I’ve been using Google Alerts for more than a year now, and I thought I’d talk about how it helps me keep track of who is talking about XYZ on the web. Here, XYZ could be a brand, your full name, your competitor’s name, or your company name, among other things.

So why should you care?

Well, whether you know it or not, someone out there is talking about your brand, about YOU or about your company. You can’t control that, but what you can do is “monitor” it. Keep an eye on what people are saying about YOU or your brand on the internet. That way, you stay current with the conversations about things that matter to you.

And if you’re a blogger, you can set up an alert for when your blog gets found (a.k.a. indexed) by Google. Nice, right?

Great! How can you set it up?

1. Go To http://www.google.com/alerts

2. Configure the options:

[Image: Google Alerts options for bloggers (SEO)]

Example:

Here are a couple of alerts that I’ve set up:

[Image: Google Alerts set up to track indexed documents]

Conclusion:

Set up Google Alerts so that you can monitor mentions about your brand, company or just things that interest you!

Google Analytics: How to Track an email campaign?

In this post, I’ll share how I learned to track an email campaign via Google Analytics.

First up, what do I mean by email campaign?

Let’s say you email 1000 newsletter subscribers a link (URL) along with a summary. How do you track the traffic generated by this email campaign? That’s where Google Analytics can help. One metric would be how many people clicked on that link and visited your site.

Why should I care?

“If you can’t measure it, you can’t manage it” – Peter Drucker

If you do not measure what’s working and what’s not, then you can’t improve, can you? Let’s take a hypothetical example. Suppose it costs you $25 to email 1000 people. How do you calculate the ROI on it? Well, track it! And the tool you can consider using is Google Analytics.
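As a rough sketch of the arithmetic, ROI is commonly computed as (gain - cost) / cost. In the Python snippet below, the $25 cost comes from the example above. The number of conversions and the value of each conversion are made-up placeholders that you would replace with what your tracking actually shows.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


campaign_cost = 25.0          # from the example: $25 to email 1000 people
conversions = 5               # hypothetical: tracked visits that led to a sale
value_per_conversion = 10.0   # hypothetical: dollars earned per sale

campaign_gain = conversions * value_per_conversion
campaign_roi = roi(campaign_gain, campaign_cost)
print(f"ROI: {campaign_roi:.0%}")
```

Without tracking, the `conversions` number is unknown, which is exactly why the campaign needs to be measurable.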

Now, here is how to track an email campaign via Google Analytics.

Here’s the visual:

[Image: tracking an email campaign with Google Analytics]

Here are the steps:

1. The first step is to create a URL.

Why do you need this? Basically, this URL carries “meta data” that helps Google Analytics identify that the link belongs to one of your campaigns.

How do we create it? Use this web service: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1033867 to create a URL:

This is how a URL that I created looks: http://parasdoshiblog.blogspot.com/?utm_source=newsletter&utm_medium=email&utm_campaign=UTDEmailCampaign

[Image: Google URL builder for Google Analytics]

2. Create an advanced segment in Google Analytics:

> Open Google Analytics.

> Select your site

> you should be in the audience overview report

> From here, click on advanced segment and click on new custom segment

[Image: Google Analytics advanced segments]

> Here I’ve configured it as shown in the image below. Note that the name of the campaign is the same as the name of the campaign in STEP 1.

[Image: custom segment configured to track the email campaign in Google Analytics]

> Save segment

> Next time you visit, you’ll see this custom segment. Select it and you’ll see only the traffic from the campaign that you want to track:

[Image: traffic for the custom segment in Google Analytics]
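The tagged URL from step 1 can also be assembled in code instead of through the web form. Here is a minimal Python sketch, assuming only the three utm_ parameters that appear in the URL above. The helper name `build_campaign_url` is hypothetical and not part of any Google tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_campaign_url(base_url: str, source: str, medium: str, campaign: str) -> str:
    """Append utm_source, utm_medium and utm_campaign to base_url (hypothetical helper)."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))  # keep any query parameters already present
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


url = build_campaign_url(
    "http://parasdoshiblog.blogspot.com/",
    source="newsletter",
    medium="email",
    campaign="UTDEmailCampaign",
)
print(url)
```

The campaign value passed here has to match the campaign name used in the custom segment, otherwise the segment will show no traffic.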

That’s about it for this post. Your comments are very welcome!

Found something interesting by exploring a “List of companies by revenue” Data Set:

I like exploring data sets to find interesting patterns in them. To that end, I was exploring a data set, List of companies by revenue, and I added a column to calculate Revenue/Employee:

And I found an outlier!

Here’s the outlier: Exor

Here’s why it’s interesting:

Its revenue in 2012 was 109.15 billion USD

And number of employees is just 40!

Just think of Revenue/Employee !

To put things in perspective, let’s compare that with its neighbors in the data set:

Rank | Company | Industry | Revenue in USD billion | Employees
48 | Koch Industries | Conglomerate | 110.00 | 60,000
49 | EXOR | Investment | 109.15 | 40
50 | Cardinal Health | Pharmaceuticals | 107.55 | 40,000
51 | CVS Caremark | Retail | 107.10 | 202,000
52 | IBM | Computer services | 106.92 | 433,362
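To make the Revenue/Employee column concrete, here is a small Python sketch that recomputes it for the five rows above. The figures are copied from the table. The function name and the choice of USD million as the unit are my assumptions for illustration.

```python
# (company, revenue in USD billion, employees), copied from the table above
companies = [
    ("Koch Industries", 110.00, 60_000),
    ("EXOR", 109.15, 40),
    ("Cardinal Health", 107.55, 40_000),
    ("CVS Caremark", 107.10, 202_000),
    ("IBM", 106.92, 433_362),
]


def revenue_per_employee(revenue_billion: float, employees: int) -> float:
    """Revenue per employee in USD million."""
    return revenue_billion * 1_000 / employees


ratios = {name: revenue_per_employee(rev, emp) for name, rev, emp in companies}

# Print from highest to lowest ratio so the outlier shows up first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:16s} {ratio:10.2f} USD million per employee")
```

EXOR comes out at roughly 2,729 USD million per employee, while every neighbor stays below 3 USD million per employee.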

I got to know about this by quickly creating a data visualization to explore the data-set:

[Image: visualization of the list of companies by revenue]

And removing Trafigura, Vitol and Exor, this is what we have:

[Image: Power View chart in Excel 2013 of rank, revenue and employees]

Observation: the oil and gas industry has a relatively high revenue/employee ratio.

That’s about it for this post. Thanks for reading about my data exploration!

Data visualization: List of largest IT companies in the world

I was going through the list of largest IT companies in the world, and I thought it would be great to see it visually! So here it goes:

(created using Power View in Excel 2013)

[Image: Power View scatter chart in Excel 2013]

Configuration of Scatter Plot:

[Image: fields list for the Power View scatter chart in Excel 2013]

Some observations:

- Foxconn has a low revenue/employee ratio (my guess is that they employ a lot of workers at low cost for their electronics manufacturing plants)

- Samsung is ranked number 1 and Apple is ranked 2, but Apple has a better revenue/employee ratio. Apple’s market cap (represented by the size of the bubble) is also greater than that of Samsung

- There’s a cluster that comprises Microsoft, Google, Amazon, etc., and one more cluster of HP, Panasonic and IBM

And here’s the Data that I’ve used:

Rank | Company | Revenue (USD Billion) | Employees | Market cap
1 | Samsung Electronics | $188.10 | 221,726 | $200
2 | Apple | $156.50 | 76,100 | $427.62
3 | HP | $120.30 | 331,800 | $32.46
4 | Foxconn | $119 | 1,230,000 | $27.20
5 | IBM | $104.50 | 433,362 | $229.45
6 | Panasonic | $99.65 | 327,512 | $22.70
7 | Microsoft | $73.72 | 94,000 | $231.03
8 | Dell | $62.07 | 106,700 | $22.97
9 | Amazon.com | $61.09 | 88,400 | $120.03
10 | Fujitsu | $54.46 | 173,155 | $125.83
11 | Intel | $53.34 | 104,700 | $105.26
12 | Google | $50.17 | 53,546 | $248.31
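The first two observations can be checked directly against this data without the chart. Below is a minimal Python sketch using the values from the table. Expressing revenue per employee in USD thousand is my own choice, and the market cap figures are used exactly as listed.

```python
# (company, revenue in USD billion, employees, market cap as listed in the table)
rows = [
    ("Samsung Electronics", 188.10, 221_726, 200.00),
    ("Apple", 156.50, 76_100, 427.62),
    ("HP", 120.30, 331_800, 32.46),
    ("Foxconn", 119.00, 1_230_000, 27.20),
    ("IBM", 104.50, 433_362, 229.45),
    ("Panasonic", 99.65, 327_512, 22.70),
    ("Microsoft", 73.72, 94_000, 231.03),
    ("Dell", 62.07, 106_700, 22.97),
    ("Amazon.com", 61.09, 88_400, 120.03),
    ("Fujitsu", 54.46, 173_155, 125.83),
    ("Intel", 53.34, 104_700, 105.26),
    ("Google", 50.17, 53_546, 248.31),
]

# Revenue per employee in USD thousand: billions * 1e9 / employees / 1e3.
per_employee = {name: rev * 1_000_000 / emp for name, rev, emp, _ in rows}
market_cap = {name: cap for name, _, _, cap in rows}

lowest = min(per_employee, key=per_employee.get)
highest = max(per_employee, key=per_employee.get)
print(f"Lowest revenue/employee:  {lowest} ({per_employee[lowest]:.0f}K USD)")
print(f"Highest revenue/employee: {highest} ({per_employee[highest]:.0f}K USD)")

# Apple vs Samsung on both measures mentioned in the observations.
print(per_employee["Apple"] > per_employee["Samsung Electronics"])
print(market_cap["Apple"] > market_cap["Samsung Electronics"])
```

On these numbers, Foxconn has the lowest ratio in the list and Apple the highest, which matches what the scatter chart suggests.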

 

How many websites in the USA exceed the data collection limitations of Google Analytics?

Little bit of background:

- I was researching the limitations of Google Analytics

- After reading the limitations, I wanted to know: how many websites in the USA exceed the limitations of Google Analytics?

So Here’s the Short Answer:

Only 108 sites exceed this limitation

(as of today)

And Here’s the long answer:

Limitations of Google Analytics. Here’s the URL: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983

And I am quoting from the above URL:

Data Collection limit: You should not send more than 10 million hits per month. If you exceed this limit, there is no assurance that the excess hits will be processed.
Data Freshness limit: Sending more than 200,000 visits per day to Google Analytics will result in your reports being refreshed only once per day

To take it further, I wanted to know how many websites in the USA get more than 10 million hits per month. It turns out that only 108 websites in the US get that much traffic.
Source: http://www.quantcast.com/top-sites/US?jump-to=108

So from a data collection limit standpoint, only these 100-odd sites would exceed the limitations of Google Analytics.

To put things in perspective: MySpace.com does not exceed the Google Analytics Data Collection limit:

[Image: MySpace traffic, within the Google Analytics limit]

Conclusion

Just knowing about the Data Collection limit was not that interesting, but once I combined it with data from another source, it seemed very interesting to me! Anyhoo, in this post I shared:

> Limitations of Google Analytics

> An answer to: how many websites in the USA exceed the limitations of Google Analytics?

[UPDATE Feb 10th 2013] I made a mistake in correlating data from Quantcast and Google Analytics. Lesson learned: double-check the units when comparing data from two different sources.

Florin Dumitrescu pointed out that Quantcast uses people/month while Google uses hits/month, and the two may NOT always be the same. Sorry about this.
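To illustrate why the units matter, here is a small Python sketch. The 10 million hits per month limit comes from the quote above. The site traffic and the hits-per-person figures are hypothetical, since neither source provides a conversion factor.

```python
GA_MONTHLY_HIT_LIMIT = 10_000_000  # from the quoted Data Collection limit


def estimated_hits(people_per_month: int, hits_per_person: float) -> float:
    """Convert people/month into hits/month using an assumed hits-per-person factor."""
    return people_per_month * hits_per_person


def exceeds_limit(people_per_month: int, hits_per_person: float) -> bool:
    return estimated_hits(people_per_month, hits_per_person) > GA_MONTHLY_HIT_LIMIT


people = 4_000_000  # hypothetical site with 4 million people per month

naive = exceeds_limit(people, hits_per_person=1.0)     # treats people as hits
adjusted = exceeds_limit(people, hits_per_person=3.0)  # hypothetical 3 hits per person
print(naive, adjusted)
```

The same site lands on different sides of the limit depending on the conversion factor, so a count based on people/month alone likely understates how many sites exceed the hit limit.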

Back to basics: Data Mining and Knowledge Discovery Process

Once in a while I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over the past few years. Today, I want to revisit the Data Mining and Knowledge Discovery Process:

Here are the steps:

1) Raw Data

2) Data Preprocessing (cleaning, sampling, transformation, integration, etc.)

3) Modeling (Building a Data Mining Model)

4) Testing the Model, a.k.a. assessing the Model

5) Knowledge Discovery

Here is the visualization:

[Image: knowledge discovery process in data mining]

Additional Note:

In the world of Data Mining and Knowledge Discovery, we’re looking for a specific type of intelligence in the data: patterns. This is important because patterns tend to repeat, so if we find patterns in our data, we can predict or forecast that such things may happen in the future.
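The five steps can be sketched end to end in a few lines of Python. This is only a toy illustration: the data is synthetic, and the threshold classifier is a stand-in that I picked for brevity, not a recommended data mining model.

```python
from statistics import mean

# 1) Raw data: synthetic (feature, label) records; None marks a missing feature.
raw = [(x, x >= 5) for x in range(11)] * 4 + [(None, True), (None, False)]

# 2) Preprocessing: clean out records with missing features, then sample
#    every 4th record into a held-out test set.
clean = [(x, y) for x, y in raw if x is not None]
test = clean[::4]
train = [record for i, record in enumerate(clean) if i % 4 != 0]

# 3) Modeling: a one-feature classifier whose threshold sits midway
#    between the mean feature value of each class in the training data.
positive_mean = mean(x for x, y in train if y)
negative_mean = mean(x for x, y in train if not y)
threshold = (positive_mean + negative_mean) / 2


def predict(x: float) -> bool:
    return x > threshold


# 4) Testing: assess the model on records it has not seen.
accuracy = sum(predict(x) == y for x, y in test) / len(test)

# 5) Knowledge discovery: state the pattern the model found.
print(f"Pattern: records with feature > {threshold} tend to be positive")
print(f"Accuracy on held-out data: {accuracy:.0%}")
```

The printed pattern is the “knowledge” here: a rule learned from past records that can be applied to future ones.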

Conclusion:

In this blog post, we saw the Knowledge Discovery and Data Mining process.

Things I shared on Social Media Networks during Nov 12 – Dec 31 (2012)

Big Data: The Coming Sensor Data Driven Productivity Revolution http://bit.ly/TQAPsW

Check out some nice getting started tutorials at beyondrelational site: http://bit.ly/RVVHRV

Complexity is your enemy. Any fool can make something complicated. It is hard to make something simple – Richard Branson

— via Paras Doshi – Blog http://on.fb.me/WAQ5ky

“The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing to retail and healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data,” says Tim O’Reilly

— via Paras Doshi – Blog http://on.fb.me/WAQ5ky

Nice collection of about 20+ videos around the topic of “Data Science”: http://bit.ly/WMkZqc

Nice collection of videos by Berkeley school of information: http://bit.ly/Tf1yAD #Information #Data

Just found Facebook’s data team’s page: http://on.fb.me/ToYILO

via V Talk Tech – A Parth Acharya Blog – Nice HeatMap of stocks! http://on.fb.me/SfBbvF

What’s the biggest fear about cloud computing? via Windows Azure http://on.fb.me/VjIiHR

Resource: Presentations from the Sentiment Analysis Symposium http://bit.ly/VtPH3B

If I switched to the newest “holiday” theme on WordPress, this is how it would look: http://on.fb.me/UEuyFr

Nice! Code School now has R programming language! I have been playing with R for a while now and definitely want to learn more – here’s the link to learn R: http://bit.ly/VEAnkZ

Interesting tool from Google to optimize and analyze web page speeds: http://bit.ly/HTubNC

Performed #sentiment #Analysis on #starbucks twitter data using #R ! It was fun! http://on.fb.me/Z3qLo8

In 2002: The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year. And of course, over the past 10 years, this number would be bigger. http://bit.ly/TPT9r3

Reading: Business Analytics vs Business Intelligence? http://bit.ly/YUtJwx

Big data is a nickname for the recent increase in largely external and unstructured business and consumer information. How are businesses across industries harnessing traditional enterprise information management functions and systems to translate big data into useful business intelligence? http://www.deloitte.com/view/en_US/us/Services/additional-services/deloitte-analytics-service/217c19e69249b310VgnVCM2000003356f70aRCRD.htm

For business analytics professionals: 12 webcasts on Jan 30th 2013 http://bit.ly/RUFsZ3 #sqlpass #analytics #24hop

Some nice insights about how to build an Internet platform, from the founder of Zipcar: http://bit.ly/Yco6IP

Let’s connect and converse on any of these people networks!

[Links: Paras Doshi Blog on Facebook, Twitter, Google Plus, LinkedIn]

Seven Interesting Google Projects that a Data Professional may not have heard about:

Here’s the list:

1. Google Refine

2. Google Prediction API

3. Google Trends

4. Google Chart Tools

5. Google BigQuery

6. Google Correlate

7. Google Fusion Tables

Note: These projects may not be ready to be used in your production environment, as some of them are in Beta/Experimental stages and their support/development may be deprecated in the future.

Thanks: I thought of writing this blog post after a discussion I had with Parth Acharya about Google and its projects for Data Professionals. He pointed me to some of the most interesting samples that used Google Fusion Tables, and here’s one of his blog posts on a related topic: Google Fusion Table & Data Visualization

There’s been a growing interest in Hadoop & Big Data, Here’s the Proof:

I like to keep an eye on technology trends. One of the ways I do that is by subscribing to leading magazines. I may not always read the entire article, but I definitely read the headlines to see what the industry is talking about. During the last 12 months or so I have seen a lot of buzz around Big Data, and I thought to myself: it would be nice to see a trend line for Big Data. Taking it a step further, I was also interested in seeing whether there is a correlation between the growing trends in “Hadoop” and “Big Data”. I also wanted to see how they compare with terms like Business Intelligence and Data Science. With this, I turned to Google Trends to quickly create a trend report.

Here’s the report:

[Image: Google Trends report for Big Data, Hadoop and Business Intelligence]

Here are some observations:

1) There’s a correlation between the trends of Big Data and Hadoop. In fact, it looks like growing interest in Hadoop fueled interest in “Big Data”.

2) The trend lines of Big Data and Hadoop overtook that of Business Intelligence in Oct 2012 and Sep 2012, respectively.

3) There is a decline in the trend line for Business Intelligence.

4) There seems to be a steady increase in the trend lines for Business Analytics and Data Science.

And here’s the Google Trends report URL: http://www.google.com/trends/explore#q=Big%20Data%2C%20Hadoop%2C%20Business%20Intelligence%2C%20Business%20Analytics%2C%20Data%20Science&cmpt=q

What do you think about these trends?