Hey, so I’m a big fan of Apache Spark and I’ve been using it for all of my independent projects. I recently had this idea to create a project that would showcase how to do some data wrangling with Apache Spark. For this project, we used Apache Spark 2.0.2 on Databricks cloud. Instead of using […]
Author: Aakash Sharan
The Art of Election Forecasting: Analyzing the 2012 US Presidential Election with Data Science
Hey there! Let’s talk about this dataset from RealClearPolitics and the US Presidential Election. Before we dive in, let’s get on the same page about a few things: The US Presidential Election happens every four years. There are 50 states in the US and each gets a certain number of electoral votes based on its […]
Making Sense of Big Data: A Beginner’s Guide to Logistic Regression Training in SparkR
Let’s do some Machine Learning with SparkR 1.6! The package only gives us the option to do linear or logistic regression, so for this exercise, we’re going to train a […]
Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio
Let’s talk about SparkR! It’s an R package that provides a lightweight frontend to use Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR has a distributed data frame implementation that supports operations like selection, filtering, and more. Cool, right? In RStudio, run the following code to check the system […]
Databricks Certified Spark Developer: Mastering Big Data Processing
So, I recently completed the XSeries course by UC Berkeley on Apache Spark and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I cleared it last week! Woohoo! The course on edX was great for brushing up on my Spark skills (I had worked with Spark MLlib before), […]
Eliminating the Spam Menace: Building an Effective Machine Learning-Based Spam Filter
Hey there! Let’s talk about spam. You know, those annoying emails that keep showing up in your inbox, even though you never signed up for them. Yeah, those. Well, a spam filter is a program that filters out those unwanted emails and messages. Pretty cool, right? So, we’re going to build and evaluate a […]
The Tim Ferriss Show: A Love-Hate Relationship – A Personal Account of the Popular Podcast Series
So, I’m a bit of a podcast junkie. I love listening to them on my way to work. Some of my faves are Talking Machines, Linear Digressions, Data Skeptic, Freakonomics, The Art of Manliness, Lore, Myths and Legends. But my absolute favorite? The Tim Ferriss Show. Now, I haven’t read any of Tim Ferriss’ books, […]
Transforming Data Analytics: An Honest Review of MITx’s 15.071x Course, The Analytics Edge
Alright, folks! The Analytics Edge course on edX is almost over and boy, have I learned a lot about Machine Learning in the past 2 months! This MOOC is hands down the best one I’ve taken so far, and I hope my other courses can at least live up to its awesomeness. I first […]
Unlocking the Power of Big Data: Embarking on a Journey of Learning Apache Spark
I am currently enrolled in the Introduction to Apache Spark, offered by UC Berkeley on edX. This course is the first installment in a five-part series, which commenced on the 15th of June. A few months ago, I attended a Spark workshop hosted by IBM. Although the workshop was satisfactory, I believe it could have […]
Revving Up the Engines of Racing Games: Review of The Crew
So, Xbox One’s free game of the month was "The Crew". Being a racing game enthusiast, I thought I’d give it a shot. But, boy oh boy, was I disappointed after playing it for just an hour! Most racing games don’t really have a great plot, but this one had a good story – or […]