Hey, so I’m a big fan of Apache Spark, and I’ve been using it for all of my independent projects. I recently had the idea of creating a project that showcases how to do some data wrangling with Apache Spark. For this project, we used Apache Spark 2.0.2 on the Databricks cloud platform. Instead of using […]
Category: Data Engineering
Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio
Let’s talk about SparkR! It’s an R package that provides a lightweight frontend for using Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR provides a distributed data frame implementation that supports operations like selection, filtering, aggregation, and more. Cool, right? In RStudio, run the following code to check the system […]
Databricks Certified Spark Developer: Mastering Big Data Processing
So, I recently completed the UC Berkeley XSeries on Apache Spark, and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I passed the exam last week! Woohoo! The courses on edX were great for brushing up on my Spark skills (I had worked with Spark MLlib before), […]
Unlocking the Power of Big Data: Embarking on a Journey of Learning Apache Spark
I am currently enrolled in Introduction to Apache Spark, offered by UC Berkeley on edX. The course, which started on the 15th of June, is the first installment in a five-part series. A few months ago, I attended a Spark workshop hosted by IBM. Although the workshop was satisfactory, I believe it could have […]