• Understanding Paxos the intuitive way

    We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over Paxos. Paxos reshaped distributed computing by providing a provably correct protocol for reaching consensus among unreliable processes, and it forms the theoretical foundation for many modern distributed systems and databases. Paxos is one of the most important algorithms in the field and also one of the most difficult to understand. In this blog I will simplify Paxos and explain it in a very intuitive way.

  • FLP Impossibility and beyond

    We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘Impossibility of Distributed Consensus with One Faulty Process’ submitted in 1982 by Fischer, Lynch and Paterson. The purpose of studying this paper is to understand the limitations of distributed systems. The paper essentially showed what is not possible, and a whole generation of researchers used it to design consensus algorithms that accept those limitations.

  • A formal model of crash recovery in distributed system

    We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘A Formal Model of Crash Recovery in a Distributed System’ published in 1983 by Skeen and Stonebraker. The purpose of studying this paper is to learn how to formalize the crash recovery problem in a distributed database environment. This will lay the foundation for how to think about crash recovery problems and build a mathematical framework for them.

  • Getting distributed Consensus using quorum based commit protocol

    We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will go over the paper ‘A Quorum-Based Commit Protocol’ published in 1982 by Dale Skeen. This paper builds on and improves the earlier work on two-phase commit. Quorum-based commit protocols are a way to reach distributed consensus among participating database nodes: a minimum number of nodes (a quorum) must agree on each transaction's commit or abort decision, which ensures atomicity and consistency.

  • Distributed Consensus in Distributed Systems: Two Generals Problem and Byzantine Fault Tolerance Explained

    We are on a path to build a strong foundation in distributed systems. We have already gone over distributed time; the next topic we will cover is Distributed Consensus. To build the foundation on distributed consensus, we will start with the paper ‘The Byzantine Generals Problem’ published in 1982 by Leslie Lamport, Robert Shostak, and Marshall Pease. This paper introduces the concept of Byzantine faults and formally proves the conditions under which consensus is impossible in the presence of arbitrary (malicious or faulty) behavior.

  • Understanding distributed time using Vector Clocks

    To build a strong foundation in distributed systems, it’s essential to first understand the concept of distributed time. Friedemann Mattern’s 1988 paper “Virtual Time and Global States of Distributed Systems” is one of the foundational works in distributed computing, introducing key concepts that help us reason about the ordering of events and capture consistent global states in distributed systems.

  • Understanding distributed time using Logical Clocks

    To build a strong foundation in distributed systems, it’s essential to first understand the concept of distributed time. To begin this journey, we’ll start by exploring the landmark research paper “Time, Clocks, and the Ordering of Events in a Distributed System,” published by Leslie Lamport in 1978. This paper is a must-read for anyone designing or building distributed systems.

  • Apache Iceberg Internals Dive Deep On Performance

    In this blog I will go over how Apache Iceberg contributes to the performance of compute engines. Apache Iceberg is an ACID table format designed for large-scale analytics workloads. While its consistency and schema evolution features are covered in a previous blog, its impact on query performance can be equally transformative. By the end of this blog, you will have a deep understanding of how Iceberg enhances performance, the trade-offs involved, and best practices for maximizing efficiency in read-heavy workloads.

  • Apache Iceberg Architecture Dive Deep

    This is the first part of the Iceberg series, where I dive deep into Apache Iceberg internals. In this blog I will explain the architecture, specifications and protocols of Apache Iceberg in great detail. I will go over the internal workings, the high-level design, and metadata evolution, with examples for the two write modes Apache Iceberg supports: Merge On Read and Copy On Write. I will also go over the control flow and low-level design of the Apache Iceberg write path, in both simplified and detailed versions, and touch upon compaction and consistency handling.

  • Introduction To Distributed Systems

    This blog introduces the essentials of distributed systems, a starting point for anyone beginning their journey in distributed systems.