Apache Spark

Artificial Intelligence | Machine Learning

Breaking Down the Beats: A Comprehensive Guide to Using ML Pipelines to Predict Song Release Years
By Aakash Sharan | December 1, 2016 | May 7, 2023

Linear regression is one of the most commonly used techniques for predictive analysis. It models the relationship between a dependent variable and one or more independent variables. In this project we create an ML pipeline to train a linear regression model to predict the release year of a song given a set of audio features. We…
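To make the idea of fitting a dependent variable against an independent one concrete, here is a minimal sketch in plain Python. It is not the Spark ML pipeline the post builds: it fits an ordinary least squares line with a single feature, and the feature values, years, and function names are all made up for illustration.

```python
# Hypothetical toy data: one made-up audio feature per song and its release year.
features = [0.1, 0.4, 0.5, 0.9]
years = [1970.0, 1985.0, 1990.0, 2010.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a single feature value."""
    return slope * x + intercept


slope, intercept = fit_line(features, years)
print(predict(0.7, slope, intercept))
```

With several audio features the same idea generalizes to multiple linear regression, which is the kind of model a pipeline like the one described would train.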

Artificial Intelligence | Machine Learning

Unlocking the Power of MapReduce: Using Python and Apache Spark for Enhanced Data Processing
By Aakash Sharan | November 26, 2016 | May 7, 2023

Hey there! So we decided to create a Word Count application, a classic MapReduce example. But what the heck is a Word Count application, you ask? It’s basically a program that reads data and counts how often each word occurs, so you can find the most common words. Easy peasy. For example: dataDF = sqlContext.createDataFrame([('Jax',), ('Rammus',), ('Zac',), ('Xin', ), ('Hecarim', ), ('Zac', ),…
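For readers who want to see the map and reduce steps without a cluster, here is a rough sketch in plain Python. It is not the Spark code from the post: the sample list reuses only the names visible in the truncated excerpt, and `add_pair` is a hypothetical helper introduced for illustration.

```python
from functools import reduce

# Sample data: only the names visible in the excerpt above (the original list is truncated).
names = ["Jax", "Rammus", "Zac", "Xin", "Hecarim", "Zac"]

# Map phase: emit a (word, 1) pair for every record.
pairs = [(name, 1) for name in names]


def add_pair(counts, pair):
    """Reduce step: fold one (word, count) pair into the running totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# Reduce phase: sum the counts per word.
word_counts = reduce(add_pair, pairs, {})

# Sort by count, highest first, to find the most common words.
most_common = sorted(word_counts.items(), key=lambda kv: kv[1], reverse=True)
print(most_common[0])  # ('Zac', 2)
```

In Spark the same two phases run in parallel across partitions, but the logic per record is the same.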

Data Engineering

Data Wrangling Made Easy: Leveraging Apache Spark to Transform Raw Data into Valuable Insights
By Aakash Sharan | November 20, 2016 | May 7, 2023

Hey, so I’m a big fan of Apache Spark and I’ve been using it for all of my independent projects. I recently had this idea to create a project that would showcase how to do some data wrangling with Apache Spark. For this project, we used Apache Spark 2.0.2 on Databricks cloud. Instead of using…

Artificial Intelligence | Machine Learning

Making Sense of Big Data: A Beginner’s Guide to Logistic Regression Training in SparkR
By Aakash Sharan | November 6, 2016 | May 7, 2023

Let’s do some Machine Learning with SparkR 1.6! The package only gives us the option to do linear or logistic regression, so for this exercise, we’re going to train a…

Data Engineering

Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio
By Aakash Sharan | November 5, 2016 | May 7, 2023

Let’s talk about SparkR! It’s an R package that provides a lightweight frontend to use Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR has a distributed data frame implementation that supports operations like selection, filtering, and more. Cool, right? In RStudio, run the following code to check the system…

Data Engineering

Databricks Certified Spark Developer: Mastering Big Data Processing
By Aakash Sharan | October 16, 2016 | May 7, 2023

So, I recently completed the XSeries course by UC Berkeley on Apache Spark and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I passed the exam last week! Woohoo! The course on edX was great for brushing up on my Spark skills (I had worked with Spark MLlib before),…

Data Engineering

Unlocking the Power of Big Data: Embarking on a Journey of Learning Apache Spark
By Aakash Sharan | June 24, 2016 | May 7, 2023

I am currently enrolled in Introduction to Apache Spark, a course offered by UC Berkeley on edX. This course is the first installment in a five-part series, which commenced on the 15th of June. A few months ago, I attended a Spark workshop hosted by IBM. Although the workshop was satisfactory, I believe it could have…
