The Hidden Complexity Behind Scaling Dense Vector Search
Distributed Vector Search: How Real Vector Databases Scale Beyond One Machine
Latest
Empowering R Programmers: Exploring the Capabilities of SparkR with RStudio
Let’s talk about SparkR! It’s an R package that provides a lightweight frontend to use Apache Spark from R. I used RStudio and Spark 1.6.1 for this exercise. SparkR has a distributed data frame implementation that supports operations like selection, filtering, and more. Cool, right? In RStudio, run the following code to check the system […]
Databricks Certified Spark Developer: Mastering Big Data Processing
So, I recently completed the XSeries course by UC Berkeley on Apache Spark and I was all set to take the Databricks Spark Developer Certification exam. And guess what? I cleared the exam last week! Woohoo! The course on edX was great for brushing up on my Spark skills (I had worked with Spark MLlib before), […]
Eliminating the Spam Menace: Building an Effective Machine Learning-Based Spam Filter
Hey there! Let’s talk about spam filters. You know, those annoying emails that keep showing up in your inbox, even though you never signed up for them. Yeah, those. Well, a spam filter is a program that filters out those unwanted emails and messages. Pretty cool, right? So, we’re going to build and evaluate a […]
The Tim Ferriss Show: A Love-Hate Relationship – A Personal Account of the Popular Podcast Series
So, I’m a bit of a podcast junkie. I love listening to them on my way to work. Some of my faves are Talking Machines, Linear Digressions, Data Skeptic, Freakonomics, The Art of Manliness, Lore, and Myths and Legends. But my absolute favorite? The Tim Ferriss Show. Now, I haven’t read any of Tim Ferriss’ books, […]
Transforming Data Analytics: An Honest Review of MITx’s 15.071x Course, The Analytics Edge
Alright, folks! The Analytics Edge course on edX is almost over and boy, have I learned a lot about Machine Learning in the past 2 months! This MOOC is hands down the best one I’ve taken so far, and I hope my other courses can at least live up to its awesomeness. I first […]
Unlocking the Power of Big Data: Embarking on a Journey of Learning Apache Spark
I am currently enrolled in the Introduction to Apache Spark, offered by UC Berkeley on edX. This course is the first installment in a five-part series, which commenced on the 15th of June. A few months ago, I attended a Spark workshop hosted by IBM. Although the workshop was satisfactory, I believe it could have […]
Revving Up the Engines of Racing Games: Review of The Crew
So, Xbox One’s free game of the month was "The Crew". Being a racing game enthusiast, I thought I’d give it a shot. But, boy oh boy, was I disappointed after playing it for just an hour! Most racing games don’t really have a great plot, but this one had a good story – or […]
Streamlining Your Scala Development: A Guide to Creating SBT Projects in Eclipse
Hey there, I’m currently knee-deep in a Scala course on Coursera called "Functional Programming in Scala" taught by none other than Martin Odersky – the inventor of Scala. Let me tell you, creating a Scala ecosystem is no walk in the park. Recently, I hit a roadblock while trying to create an Eclipse project using […]
MIT’s Kaggle Competition Sees Fierce Competition Among Enrolled Students, with Overfitting a Concern for Some Top Contenders
It’s an absolute thrill to be in the top 1% of the Kaggle competition hosted by MIT! This contest is no joke, with some seriously experienced ML implementers throwing their hats into the ring. And let me tell you, the top 3 are on a whole other level – they’ve achieved over 90% accuracy, which […]
Revolutionizing Baseball Strategy: Validating Moneyball Predictions through Machine Learning Models
After diving into regression analysis, I couldn’t wait to test my newfound skills on some real-world data. Luckily, Kaggle has just the thing – they’re hosting a competition called ‘History of Baseball’ and, even better, they’ve provided a dataset for it! I had a blast analyzing Paul DePodesta’s predictions and statistical findings, using linear regression […]