All Posts

May 25, 2025
Understanding Paxos the intuitive way
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over Paxos. Paxos revolutionized distributed computing by providing one of the first provably correct solutions for achieving consensus among unreliable processors, forming the theoretical foundation for modern distributed systems and databases. Paxos is one of the most important, and most difficult to understand, algorithms in the field. In this blog I will simplify Paxos and explain it in a very intuitive way.
May 25, 2025
FLP Impossibility and beyond
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘Impossibility of Distributed Consensus with One Faulty Process’ published in 1985 by Fischer, Lynch and Paterson. The goal of studying this paper is to understand the limitations of distributed systems. The paper essentially showed what is not possible, and a whole generation of researchers used it to design consensus algorithms that accept those limitations.
May 18, 2025
A formal model of crash recovery in a distributed system
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘A Formal Model of Crash Recovery in a Distributed System’ published in 1983 by Skeen and Stonebraker. The goal of studying this paper is to learn how to formalize the crash recovery problem in a distributed database environment. This lays the foundation for reasoning about crash recovery problems and building a mathematical framework for them.
May 17, 2025
Getting distributed consensus using a quorum-based commit protocol
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘A Quorum-Based Commit Protocol’ published in 1982 by Dale Skeen. This paper builds on and improves the earlier work on two-phase commit. A quorum-based commit protocol is a way to reach distributed consensus among participating database nodes: it requires a minimum number of nodes (a quorum) to agree on each commit/abort decision for a transaction, which ensures atomicity and consistency.
May 2, 2025
Distributed Consensus in Distributed Systems: Two Generals Problem and Byzantine Fault Tolerance Explained
We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will start with the paper ‘The Byzantine Generals Problem’ published in 1982 by Leslie Lamport, Robert Shostak, and Marshall Pease. This paper introduces the concept of Byzantine faults and formally proves the conditions under which consensus is impossible in the presence of arbitrary (malicious or faulty) behavior.
Apr 29, 2025
Understanding distributed time using Vector Clocks
To build a strong foundation in distributed systems, it’s essential to first understand the concept of distributed time. Friedemann Mattern’s 1988 paper “Virtual Time and Global States of Distributed Systems” is one of the foundational works in distributed computing, introducing key concepts that help us reason about the ordering of events and capture consistent global states in distributed systems.
Apr 26, 2025
Understanding distributed time using Logical Clocks
To build a strong foundation in distributed systems, it’s essential to first understand the concept of distributed time. To begin this journey, we’ll start by exploring the landmark research paper “Time, Clocks, and the Ordering of Events in a Distributed System,” published by Leslie Lamport in 1978. This paper is a must-read for anyone designing or building distributed systems.
Mar 9, 2025
Apache Iceberg Internals Dive Deep On Performance
In this blog I will go over how Apache Iceberg contributes to the performance of a compute engine. Apache Iceberg is an ACID table format designed for large-scale analytics workloads. While its consistency and schema evolution features are covered in the previous blog, its impact on query performance can be equally transformative. By the end of this document, you will have a deep understanding of how Iceberg enhances performance, the trade-offs involved, and best practices for maximizing efficiency in read-heavy workloads.
Feb 22, 2025
Apache Iceberg Architecture Dive Deep
This is the first part of the Iceberg series, where I dive deep into Apache Iceberg internals. In this blog I will explain the architecture, specifications and protocols of Apache Iceberg in great detail. I will go over the internal workings, the high-level design, and metadata evolution, with examples for the two write modes Apache Iceberg supports: Merge On Read and Copy On Write. I will also go over the control flow and low-level design of the Apache Iceberg write path, in both simplified and detailed versions. I will also touch upon compaction and consistency handling in Apache Iceberg.
Feb 15, 2025
Introduction To Distributed Systems
This blog introduces the essentials of distributed systems, a starting point for anyone beginning their journey in distributed systems.