The Hidden Complexity Behind Scaling Dense Vector Search
Distributed Vector Search: How Real Vector Databases Scale Beyond One Machine
Latest
Simplifying JavaScript Techniques: A Guide to the Power of Hoisting
Hoisting is a funky JavaScript mechanism where variables and functions get moved around like they’re playing a game of musical chairs. Here’s the deal: Only declarations get hoisted. Assignments or other executable logic are left in place, just like your friend who always bails on moving day. Functions get hoisted first and then variables, kind […]
Optimizing Your Search Algorithms: Notes on Quiescent Search and Branching Factor
Branching Factor: The branching factor is the number of children at each node. The effective branching factor is the number of children generated by a "typical" node for a given search problem. Quiescent Search: A full-width search sees everything up to its horizon, and nothing beyond. This is called the horizon effect. It’s like not […]
Mastering Search Optimization: An Introduction to Iterative Deepening
Depth-First Search (DFS): In DFS, we start from a node and go down a path until we reach a node that has no children. Whenever we run out of moves, we backtrack and explore the sibling of the node. And if there are no siblings, we go for the sibling of the parent and so […]
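To make the traversal order concrete, here is a minimal Python sketch of depth-first search. It assumes a small, made-up tree stored as a dictionary of children; the node names and the `dfs` helper are illustrative and not taken from the original post.

```python
def dfs(tree, node, visited=None):
    """Return the nodes in the order a depth-first search visits them."""
    if visited is None:
        visited = []
    visited.append(node)
    # Follow the first child all the way down before trying its siblings.
    for child in tree.get(node, []):
        dfs(tree, child, visited)
    # Returning to the caller is the backtracking step: the caller's loop
    # then moves on to the next sibling, if there is one.
    return visited


# Hypothetical tree: A has children B and C, B has D and E, C has F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
order = dfs(tree, "A")
print(order)
```

The search runs down A, B, D first, backtracks to visit E, and only then moves over to C and F.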
Mastering AI: Enrolling in Udacity’s Artificial Intelligence Nanodegree
Let me tell you about my experience with creating an AI that plays Connect 4. I took an undergraduate CS course during my master’s and we created the AI using a few algorithms. To be honest, I don’t remember all of them, but I do recall using the minimax algorithm. The AI wasn’t too difficult […]
Bridging the Gap Between Knowledge and Expertise: Professional Certification on Apache Cassandra
So I’ve been working with Apache Cassandra for over a year now and recently I decided to step up my game and get certified. And guess what? I passed DataStax’s Professional Certification on Apache Cassandra! The exam was purely theoretical, which means it didn’t ask me to calculate things like "partition size" and stuff. DataStax […]
Breaking Down the Beats: A Comprehensive Guide to Using ML Pipelines to Predict Song Release Years
Linear regression is one of the most commonly used forms of predictive analysis. It is used to model the relationship between a dependent variable and one or more independent variables. In this project we create an ML pipeline to train a linear regression model to predict the release year of a song given a set of audio features. We […]
Unlocking the Power of MapReduce: Using Python and Apache Spark for Enhanced Data Processing
Hey there! So we decided to create a Word Count application – a classic MapReduce example. But what the heck is a Word Count application, you ask? It’s basically a program that reads data and calculates the most common words. Easy peasy. For example: dataDF = sqlContext.createDataFrame([('Jax',), ('Rammus',), ('Zac',), ('Xin', ), ('Hecarim', ), ('Zac', ), […]
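As a rough illustration of what a word count produces, here is a plain-Python sketch using `collections.Counter`. It is a stand-in for the idea only, not the Spark code from the post, and it uses just the names visible in the truncated excerpt above.

```python
from collections import Counter

# Only the names visible in the excerpt; the original list is truncated.
words = ["Jax", "Rammus", "Zac", "Xin", "Hecarim", "Zac"]

# Count how many times each word occurs.
counts = Counter(words)

# Ask for the single most frequent word and its count.
most_common = counts.most_common(1)
print(most_common)
```

With this sample, "Zac" is the only name that appears twice, so it comes out on top.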
Data Wrangling Made Easy: Leveraging Apache Spark to Transform Raw Data into Valuable Insights
Hey, so I’m a big fan of Apache Spark and I’ve been using it for all of my independent projects. I recently had this idea to create a project that would showcase how to do some data wrangling with Apache Spark. For this project, we used Apache Spark 2.0.2 on Databricks cloud. Instead of using […]
The Art of Election Forecasting: Analyzing the 2012 US Presidential Election with Data Science
Hey there! Let’s talk about this dataset from RealClearPolitics and the US Presidential Election. Before we dive in, let’s get on the same page about a few things: The US Presidential Election happens every four years. There are 50 states in the US and each gets a certain number of electoral votes based on its […]
Making Sense of Big Data: A Beginner’s Guide to Logistic Regression Training in SparkR
Let’s do some Machine Learning with SparkR 1.6! The package only gives us the option to do linear or logistic regression, so for this exercise, we’re going to train a […]