Below are some of my projects roughly categorized by area.
Distributed Systems
service-capacity-modeling
library for capacity planning (determining which kind and how much of a computer to buy) for a particular workload such as Apache Cassandra or Elasticsearch. Essentially a multi-variate monte carlo simulation with a least regret optimizer over per workload models.
jvmquake
agent for rescuing distributed databases written in Java from themselves by killing them (while grabbing a core dump) when they enter JVM death spirals. This is basically an extension of jvmkill that also detects GC spirals of death.
synapse and nerve
service mesh aka SmartStack. I help maintain and have contributed significant features to Airbnb’s service mesh, including improving the scalability of the system by multiple orders of magnitude (Yelp ran tens of thousands of containers across a global network with full global service discovery), and extending Synapse to be fully pluggable and support any proxy.
Paasta
distributed platform as a service. I was a somewhat minor contributor to Paasta, mostly working on SmartStack integrations, but I did help port the whole codebase to Python 3 so that was interesting…
Priam
distributed sidecar for Apache Cassandra. I work on this at Netflix, improving operability of Cassandra.
pinch
toolkit for compressing, hashing and moving data around a network as fast as you can. This is just a docker container with all of my favorite data compression and validation tools built in (e.g. zstd, lz4, xxhash, etc.) and a local go server that can do it all via HTTP for you (assuming the commands are installed).
Debugging / Performance Analysis
performance-analysis
collection of jupyter notebooks and various python scripts I’ve used to analyze the performance of various service or database setups. Perhaps one of the more interesting ones is my notebook for modeling Cassandra availability with different numbers of vnodes.
cqltrace
dynamic tracer for observing live CQL traffic in real time. I mostly use this for debugging Cassandra clients and their performance.
Educational
python_service_performance
A step by step guide on how to make a python3 web service based on uwsgi, pyramid, gevent and nginx production ready. This means high scalability and low latency with a typical microservice setup.
Economics
splitit
algorithm and Python web service for fairly dividing items that are hard to value (e.g. rents in a 5 bedroom apartment). I used this with my roommates to divide rent fairly in San Francisco.
Debate
MIT-TAB
APDA parliamentary debate tabulation software. Basically this is a very complicated constraint optimization problem that debaters used to do by hand, and now a good fraction of the American Parliamentary Debate Association’s tournaments run with this software. I was the original author but have since handed development off to Ben Muschol who has really improved the project!
Machine Learning / AI
service-capacity-modeling
library for capacity planning (determining which kind and how much of a computer to buy) for a particular workload such as Apache Cassandra or Elasticsearch. Essentially a multi-variate monte carlo simulation with a least regret optimizer over per workload models.
python_hqsom
implementation of the HQSOM deep learning algorithm. This was a project I worked on for a few graduate classes at MIT (6.867 and 6.868) that used genetic algorithms and deep learning and other such buzzwords. Surprisingly it actually worked pretty well.
food.op
recipe recommender based on gradient boosting classifiers. It was a hackathon project but sorta neat to have recipes recommended based on previous cooking experiences.
organon
symbolic constraint framework and solver for modeling complex constrained systems that may not have solutions. The paper is a decent read if you’re interested in what this project can do.
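
To give a flavor of what "monte carlo simulation with a least regret optimizer" means in the service-capacity-modeling entry, here is a minimal sketch of the general idea. It is not the library's code or API: the candidate clusters, their costs and limits, the lognormal demand priors, and the under-provisioning penalty are all made-up assumptions for illustration.

```python
import random

# Hypothetical hardware options (all numbers are illustrative assumptions):
# (name, annual_cost, max_reads_per_s, max_data_gib)
CANDIDATES = [
    ("small", 10_000, 50_000, 1_000),
    ("medium", 25_000, 150_000, 4_000),
    ("large", 60_000, 500_000, 16_000),
]

# Assumed extra cost of running a cluster that cannot handle the workload.
UNDER_PROVISION_PENALTY = 200_000


def sample_worlds(n, seed=0):
    """Draw n possible futures as (reads_per_s, data_gib) pairs.

    Two uncertain inputs are sampled per world, which is the
    "multi-variate" part. The lognormal parameters are assumptions.
    """
    rng = random.Random(seed)
    return [
        (rng.lognormvariate(11.0, 0.6), rng.lognormvariate(7.5, 0.7))
        for _ in range(n)
    ]


def cost_in_world(candidate, world):
    """Annual cost of a candidate, plus a penalty if it is too small."""
    _, annual_cost, max_reads, max_data = candidate
    reads, data = world
    if reads > max_reads or data > max_data:
        return annual_cost + UNDER_PROVISION_PENALTY
    return annual_cost


def least_regret_choice(candidates, worlds):
    """Pick the candidate with the lowest mean regret across all worlds.

    Regret in one world is the candidate's cost minus the cost of the
    best candidate for that world, so it is never negative.
    """
    mean_regret = {}
    for cand in candidates:
        total = 0.0
        for world in worlds:
            best = min(cost_in_world(c, world) for c in candidates)
            total += cost_in_world(cand, world) - best
        mean_regret[cand[0]] = total / len(worlds)
    best_name = min(mean_regret, key=mean_regret.get)
    return best_name, mean_regret


if __name__ == "__main__":
    choice, regrets = least_regret_choice(CANDIDATES, sample_worlds(1_000))
    print(choice, regrets)
```

The point of minimizing regret rather than expected cost alone is that the choice is judged against what would have been best in each simulated future, which tends to favor options that are never badly wrong.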