Latest

Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio

Let’s talk about SparkR! It’s an R package that provides a lightweight frontend to use Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR has a distributed data frame implementation that supports operations like selection, filtering, and more. Cool, right? In RStudio, run the following code to check the system […]


Databricks Certified Spark Developer: Mastering Big Data Processing

So, I recently completed the XSeries course by UC Berkeley on Apache Spark and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I passed the exam last week! Woohoo! The course on edX was great for brushing up on my Spark skills (I had worked with Spark MLlib before), […]






Revving Up the Engines of Racing Games: Review of The Crew

So, Xbox One’s free game of the month was "The Crew". Being a racing game enthusiast, I thought I’d give it a shot. But, boy oh boy, was I disappointed after playing it for just an hour! Most racing games don’t really have a great plot, but this one had a good story – or […]



MIT’s Kaggle Competition Sees Fierce Competition Among Enrolled Students, with Overfitting a Concern for Some Top Contenders

It’s an absolute thrill to be in the top 1% of the Kaggle competition hosted by MIT! This contest is no joke, with some seriously experienced ML implementers throwing their hats into the ring. And let me tell you, the top 3 are on a whole other level – they’ve achieved over 90% accuracy, which […]


Revolutionizing Baseball Strategy: Validating Moneyball Predictions through Machine Learning Models

After diving into regression analysis, I couldn’t wait to test my newfound skills on some real-world data. Luckily, Kaggle has just the thing – they’re hosting a competition called ‘History of Baseball’ and, even better, they’ve provided a dataset for it! I had a blast analyzing Paul DePodesta’s predictions and statistical findings, using linear regression […]