CS 7210 - Distributed Computing


Introduction

I recently finished CS 7210: Distributed Computing in the OMSCS program, and I walked away with an A, but not without earning every bit of it. It has easily been the most difficult and most time-consuming course I’ve taken so far, and also my favorite.

If you're working in software engineering today — especially with microservices, cloud computing, Cosmos DB, etc. — you’re already dealing with distributed systems. This course helped me understand what’s really going on under the hood. It was eye-opening, to say the least.


The Projects (aka the Gauntlet)

The course is built around projects using DSLabs, the Java framework from UW's distributed systems course (Ellis Michael’s project). The framework is amazing but brutal. You can’t just hack things together: your solution has to be correct and performant. If there’s an edge case or timing bug, the test harness will find it. Many tests also have strict timeouts, so performance matters.

Here’s a quick rundown of the projects:

  • Project 1: DSLabs Intro – Basic intro, took just a couple of hours. (5% grade)

  • Project 2: Client-Server RPC – Implemented an exactly-once RPC over an async network. Also fairly easy. (10% grade)

  • Project 3: Primary-Backup – Here’s where the course hits hard. I underestimated it and didn’t pass all the test cases. Learned my lesson. (10% grade)

  • Project 4: Paxos – Implementing Paxos for consensus. Super tough, but I started early and got 100%. (15% grade)

  • Project 5: Sharded KV Store – Built a sharded, fault-tolerant KV store using Paxos and 2PC. Didn’t get full marks but came very close.
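To make the "exactly-once" idea from Project 2 concrete, here is a minimal Python sketch. It is not DSLabs code (that framework is Java) and not a project solution; every name in it (`KVServer`, `handle`, `call_with_retries`, `lost_replies`) is made up for illustration. The assumption is the usual recipe: the client retries until it gets a reply, and the server deduplicates by client ID and sequence number so a retried request is never executed twice.

```python
from dataclasses import dataclass, field


@dataclass
class KVServer:
    """Toy key-value server that executes each (client, seq) request at most once."""
    store: dict = field(default_factory=dict)
    last_reply: dict = field(default_factory=dict)  # client_id -> (seq, reply)
    executions: int = 0                             # how many times an append really ran

    def handle(self, client_id, seq, key, value):
        cached = self.last_reply.get(client_id)
        if cached is not None:
            cached_seq, cached_reply = cached
            if seq == cached_seq:
                return cached_reply   # duplicate: resend the saved reply, do not re-execute
            if seq < cached_seq:
                return None           # stale request the client has already moved past
        # First time this request is seen: execute a non-idempotent append.
        self.executions += 1
        self.store[key] = self.store.get(key, "") + value
        reply = self.store[key]
        self.last_reply[client_id] = (seq, reply)
        return reply


def call_with_retries(server, client_id, seq, key, value, lost_replies):
    """Simulate a client whose first `lost_replies` replies are dropped by the network."""
    attempts = 0
    while True:
        attempts += 1
        reply = server.handle(client_id, seq, key, value)
        if attempts > lost_replies:   # this reply finally reaches the client
            return reply


if __name__ == "__main__":
    server = KVServer()
    print(call_with_retries(server, "c1", 1, "k", "a", lost_replies=2))
    print(server.executions)
```

Retries alone give at-least-once delivery, and the server-side cache turns that into at-most-once execution; together they behave like exactly-once from the client's point of view. The sketch assumes each client has only one outstanding request at a time, which is why caching just the latest reply per client is enough.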

From Project 3 onwards, I realized this course demands real commitment. I started each project as soon as it was released and was putting in close to 10-hour days on weekends. I probably didn’t reach the "60 hours per week" of OMSCS lore, but it was intense.


The Content

The video lectures were top-notch. Here's a breakdown of what we covered:

  • Time in distributed systems:
    Lamport clocks, vector clocks, matrix clocks.

  • States and cuts:
    System models, definite vs possible state.

  • Consensus protocols:
    2PC, 3PC, Paxos, Raft (you could pick either for Project 4; most chose Paxos due to lecture focus).

  • Replication techniques:
    Active replication vs standby (primary-backup), replicated state machines, chain replication (CR vs CRAQ), scalability.

  • Fault tolerance:
    Checkpointing (coordinated vs uncoordinated), logging.

  • Distributed transactions:
    Google Spanner, TrueTime, ordering write transactions with TT timestamps.

  • Consistency models:
    Linearizability, sequential consistency, causal, eventual.

  • Real-world systems:
    Memcached in clusters, peer-to-peer systems, DHTs like Chord.

  • Modern topics:
    Distributed data analytics, Borg, distributed ML (Cartel), BFT & blockchain, edge computing.
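As a small taste of the "time in distributed systems" material, here is a minimal vector clock sketch in Python. It is my own illustration rather than anything from the lectures or DSLabs, and the function names (`tick`, `merge_on_receive`, `happened_before`, `concurrent`) are hypothetical. It assumes a fixed number of processes, each identified by its index into the clock.

```python
def tick(clock, i):
    """Return a copy of `clock` advanced for a local or send event at process i."""
    updated = list(clock)
    updated[i] += 1
    return updated


def merge_on_receive(local, received, i):
    """Receive event at process i: element-wise max of both clocks, then tick i."""
    merged = [max(a, b) for a, b in zip(local, received)]
    merged[i] += 1
    return merged


def happened_before(a, b):
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


if __name__ == "__main__":
    start = [0, 0, 0]
    send = tick(start, 0)                    # process 0 sends a message
    recv = merge_on_receive(start, send, 1)  # process 1 receives it
    other = tick(start, 2)                   # process 2 does unrelated work
    print(send, recv, other)
    print(happened_before(send, recv), concurrent(recv, other))
```

The point of the comparison functions is the thing Lamport clocks cannot give you: a single Lamport timestamp always orders two events, while vector clocks can report that two events are concurrent.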


Final Thoughts

If you’re even remotely interested in distributed systems, microservices, or cloud architecture, take this course. It’s not easy. The DSLabs framework will humble you. But you’ll walk away with a deep understanding of the systems that power nearly everything in modern software.

I’ve already seen this knowledge pay off in how I think about architecture decisions in my day job — from fault tolerance to eventual consistency.

Highly recommended. Just don’t sleep on Project 3 😅.