Tag: apache spark


Unlocking the Power of MapReduce: Using Python and Apache Spark for Enhanced Data Processing

Hey there! We decided to create a Word Count application, a classic MapReduce example. But what the heck is a Word Count application, you ask? It’s basically a program that reads data, counts how often each word occurs, and reports the most common words. Easy peasy. For example: dataDF = sqlContext.createDataFrame([('Jax',), ('Rammus',), ('Zac',), ('Xin', ), ('Hecarim', ), ('Zac', ), […]
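To make the idea concrete, here is a minimal sketch of the map and reduce steps behind a word count, written in plain Python rather than Spark so it runs without a cluster. It uses only the names visible in the excerpt above (the original list is truncated), and the helper names `map_phase` and `reduce_phase` are made up for illustration; they are not part of any Spark API.

```python
from collections import Counter
from functools import reduce

# Only the names visible in the excerpt; the original list is truncated.
names = ["Jax", "Rammus", "Zac", "Xin", "Hecarim", "Zac"]


def map_phase(words):
    """Emit a (word, 1) pair for every word, as a MapReduce mapper would."""
    return [(word, 1) for word in words]


def reduce_phase(pairs):
    """Sum the counts per word, as a MapReduce reducer would."""
    def add_pair(counts, pair):
        word, n = pair
        counts[word] += n
        return counts

    return reduce(add_pair, pairs, Counter())


word_counts = reduce_phase(map_phase(names))
most_common = word_counts.most_common(1)
print(most_common)  # [('Zac', 2)]
```

In Spark the same two steps are spread across the cluster, but the logic per word stays the same: emit a count of one, then add the counts up.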




Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio

Let’s talk about SparkR! It’s an R package that provides a lightweight frontend to use Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR has a distributed data frame implementation that supports operations like selection, filtering, and more. Cool, right? In RStudio, run the following code to check the system […]


Databricks Certified Spark Developer: Mastering Big Data Processing

So, I recently completed the XSeries course by UC Berkeley on Apache Spark and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I passed the exam last week! Woohoo! The course on edX was great for brushing up on my Spark skills (I had worked with Spark MLlib before), […]


Unlocking the Power of Big Data: Embarking on a Journey of Learning Apache Spark

I am currently enrolled in Introduction to Apache Spark, a course offered by UC Berkeley on edX. It is the first installment in a five-part series, which commenced on the 15th of June. A few months ago, I attended a Spark workshop hosted by IBM. Although the workshop was satisfactory, I believe it could have […]