[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)


Presented at #sqlpass summit 2015.

Data -> Insights -> ?


I was at the HP Big data conference last week and I heard something during the keynote that’s worth sharing with you.

As Data & Analytics professionals, we spend a lot of our time finding insights, trends & patterns in data, but the keynote speaker (Ken Rudin, Facebook) encouraged everyone to take that a step further: think about driving impact based on the insights. It’s a simple yet powerful idea! Over the past few months, I have started working closely with decision makers and helping drive impact vs. just “handing off” insights.

I hope that helps! Just wanted to share that with you. What do you think?


Book Review: R in a Nutshell


R is a popular tool among data scientists because it’s just like a Swiss Army knife (or maybe more!) for them!

R Language Data scientist swiss army knife tool

Analogy credit: Tapping the Data Deluge with R by Jeffrey Breen

Some time back, I worked on a research project that involved writing some R code. We were searching for tools and ways to pull data from multiple social networks, perform text analysis and create effective data visualizations. R seemed like a great tool, so I went looking for books/guides that would teach me the fundamentals I needed to get a few R-related things done. One of the books that I used often during the research project was “R in a Nutshell”. I didn’t read it cover-to-cover, but it was a great reference book for me. I would read guides online and other books, and then combine that with information from this book to get stuff done. The section I liked the most was on data visualization, which included some great code snippets for creating effective data visualizations using the ggplot2 library. I used to take code snippets from this book & apply them to the data-sets that I had.

text analysis sentiment

Fun stuff!

I also liked that the book has some end-to-end examples that cover the entire life cycle of data analysis/statistical analysis.


I recommend this book as a “reference” for someone who has started working with R.


I received a copy of this book as part of O’Reilly’s Blogger program. Thanks, O’Reilly! If you are a blogger, you should check out that program!

Business Analytics Continuum:


Think of a “continuum” as something you start and never stop improving upon. In my mind, the Business Analytics Continuum is a continuous investment of resources to take business analytics capabilities to the next level. So what are these levels? Douglas McDowell explained this concept in a recent post here. It was great food for thought for me, hence I am posting about this particular concept here.

Here is the visual representation of the concept:

business analytics continuum

And I would encourage you to read the entire post and other posts in the series here: PASS BAC Preview Series: Business Analytics Defined

Resource: 12 recorded sessions from the 24hop business analytics edition are online! #passbac #msbi


Recently, PASS hosted a 24hop business analytics event:

And now, the 12 one-hour sessions, ranging from data visualization and predictive analytics to Big Data, are online for you to watch! They also serve as a “trailer” for what you can expect at the PASS Business Analytics conference!

Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx

And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!

Also, I participated in the twitter contest (by Microsoft BI) that was happening along w/ the event – and this is what I got for my win!

24 hop twitter contest prize

hoodie w/ embedded earphones!

That’s about it for this post. Enjoy the recordings!

Quick Post: Uploading Local Data to Hadoop file system using Hadoop Command Line


This is a quick post: I just want to share a command to upload local data to HDFS using the Hadoop Command Line.

The command looks like:

> hadoop fs -copyFromLocal input.txt input/SqrtJob/input.txt
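If you want to run the same upload from a script, here is a minimal Python sketch. The helper names `copy_from_local_args` and `upload` are my own (hypothetical), and the sketch assumes the `hadoop` launcher is on the PATH of the machine where you run it.

```python
import shutil
import subprocess


def copy_from_local_args(local_path, hdfs_path):
    """Build the argument list for `hadoop fs -copyFromLocal` (hypothetical helper)."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_path]


def upload(local_path, hdfs_path):
    """Run the upload; raises if hadoop is missing or the command fails."""
    launcher = shutil.which("hadoop")  # resolves hadoop.cmd on Windows as well
    if launcher is None:
        raise RuntimeError("hadoop launcher not found on PATH")
    args = copy_from_local_args(local_path, hdfs_path)
    args[0] = launcher
    subprocess.run(args, check=True)


# Show the command that would be executed, same as the one above.
print(" ".join(copy_from_local_args("input.txt", "input/SqrtJob/input.txt")))
```

Calling `upload("input.txt", "input/SqrtJob/input.txt")` would then perform the actual copy.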


How to start Analyzing Twitter Data w/ R?


Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I’m listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

How to load some data to Hadoop on Windows to get started?


In this post, I want to point out that HDInsight (Hadoop on Windows) comes with sample datasets (log files) that you can load using the following steps:

1. Open the Hadoop Command Line and navigate to c:\Hadoop\GettingStarted

2. Execute the following command:

powershell -ExecutionPolicy unrestricted -F importdata.ps1 w3c

import data to hadoop on windows file system

After you have successfully executed the command, you can see the sample files in the /w3c/input folder:

w3c log files iis hadoop on windows
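To check the loaded files from a script instead of the screenshot, something like the following Python sketch could work. It wraps `hadoop fs -ls`; the helper names are hypothetical, and the parser assumes the usual listing layout (an optional "Found N items" header, then one entry per line with the path as the last field).

```python
import shutil
import subprocess


def parse_ls_paths(ls_output):
    """Return the path column from `hadoop fs -ls` output.

    Assumption: the path is the last whitespace-separated field on each
    entry line, so paths that contain spaces are not handled.
    """
    paths = []
    for line in ls_output.splitlines():
        line = line.strip()
        if not line or line.startswith("Found "):
            continue  # skip blank lines and the "Found N items" header
        paths.append(line.split()[-1])
    return paths


def list_hdfs_dir(hdfs_dir="/w3c/input"):
    """Run `hadoop fs -ls` on hdfs_dir and return the listed paths."""
    launcher = shutil.which("hadoop")
    if launcher is None:
        raise RuntimeError("hadoop launcher not found on PATH")
    result = subprocess.run(
        [launcher, "fs", "-ls", hdfs_dir],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_paths(result.stdout)
```

On a machine with the cluster running, `list_hdfs_dir()` should return the entries under /w3c/input.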

Conclusion: In this post, we saw how to load some data to Hadoop on Windows file system to get started. Your comments are very welcome.

Official Resource: http://gettingstarted.hadooponazure.com/loadingData.html

Microsoft® HDInsight Preview for Windows: How to use Sqoop to load data into HDFS from SQL Server?


In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server.

With that, here are the steps:

1. Make sure you have the Microsoft® HDInsight Preview for Windows installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Before we begin, decide on three things:

3a. The username and password that Sqoop will use to log in to the SQL Server database. If you create a new username and password, test it via SSMS before you proceed.

3b. The table that you want to load into HDFS.

In my case, it’s this table:

sql table to be loaded into hadoop hdfs from sql server

3c. The target directory in HDFS. In my case, I want it to be /user/data/sqoopstudent1

You can create it with the command: hadoop fs -mkdir /user/data/sqoopstudent1

[to learn about how to create directory, read: How to create a directory in Hadoop File System? ]

4. Now let’s start the Hadoop Command Line (can you see the icon on the Desktop? Yes? Great! Open that!)

5. Navigate to: c:\Hadoop\sqoop-1.4.2\bin>

*This path may change in future, but navigate to the bin folder under the SQOOP_HOME.

6. Run dir command to see various files under this directory.

sqoop list files under the HOMe directory import export

Also you can run sqoop help for more information on the command that we are about to run.

sqoop list of commands help

7. Now here’s the command to Load data from SQL Server to HDFS:

c:\Hadoop\sqoop-1.4.2\bin>sqoop import --connect "jdbc:sqlserver://localhost;database=UniversityDB;username=sqoop;password=**********" --table student --target-dir /user/data/sqoopstudent1 -m 1

sqoop command to load data from sql server to hadoop file system

8. After successfully running the above command, let’s browse the file in HDFS!

sqoop see the content of the file
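If you plan to repeat the import from step 7 for several tables, it may help to assemble the command in a script. Below is a minimal Python sketch under a few assumptions: the helper names and the `SQOOP_DB_PASSWORD` environment variable are made up for illustration, and the connection string simply mirrors the shape used in step 7.

```python
import os


def sqlserver_jdbc_url(host, database, username, password):
    """Assemble a SQL Server JDBC URL in the same shape as the step 7 command."""
    return (
        f"jdbc:sqlserver://{host};database={database};"
        f"username={username};password={password}"
    )


def sqoop_import_args(connect_url, table, target_dir, num_mappers=1):
    """Build the argument list for `sqoop import` (hypothetical helper)."""
    return [
        "sqoop", "import",
        "--connect", connect_url,
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(num_mappers),
    ]


# SQOOP_DB_PASSWORD is a made-up environment variable name for this sketch.
password = os.environ.get("SQOOP_DB_PASSWORD", "")
url = sqlserver_jdbc_url("localhost", "UniversityDB", "sqoop", password)
args = sqoop_import_args(url, "student", "/user/data/sqoopstudent1")

# Print everything except the connection string, which contains the password.
print([a for a in args if a != url])
```

Passing the list to `subprocess.run` from the Sqoop bin folder would then launch the import; building the list first keeps the password out of the script and makes the command easy to inspect.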

That’s about it for this post!


Thanks Aviad Ezra who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS


In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL Hadoop)
