Below are some of my projects roughly categorized by area.
Distributed Systems
service-capacity-modeling: library for capacity planning (determining which kind and how much of a computer to buy) for a particular workload such as Apache Cassandra or Elasticsearch. Essentially a multi-variate Monte Carlo simulation with a least regret optimizer over per-workload models.
jvmquake: agent for rescuing distributed databases written in Java from themselves by killing them (while grabbing a core dump) when they enter JVM death spirals. This is basically an extension of jvmkill that also detects GC spirals of death.
synapse and nerve: service mesh, aka SmartStack. I help maintain and have contributed significant features to Airbnb's service mesh, including improving the scalability of the system by multiple orders of magnitude (Yelp ran tens of thousands of containers across a global network with full global service discovery), and extending Synapse to be fully pluggable and support any proxy.
Paasta: distributed platform as a service. I was a somewhat minor contributor to Paasta, mostly working on SmartStack integrations, but I did help port the whole codebase to Python 3, so that was interesting…
Priam: distributed sidecar for Apache Cassandra. I work on this at Netflix, improving the operability of Cassandra.
pinch: toolkit for compressing, hashing and moving data around a network as fast as you can. This is just a Docker container with all of my favorite data compression and validation tools built in (e.g. zstd, lz4, xxhash, etc.) and a local Go server that can do it all via HTTP for you (assuming the commands are installed).
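To make the "Monte Carlo simulation with a least regret optimizer" idea concrete, here is a minimal sketch in Python. It is not the service-capacity-modeling API: the cost model, the prices, the lognormal demand distribution and every function name below are assumptions made up for illustration, and it samples a single uncertain input where the real library is multi-variate. The idea shown is only this: sample many possible "worlds", score every candidate cluster size in each world, and pick the candidate whose average regret (its cost minus the best achievable cost in that world) is lowest.

```python
import random
from statistics import mean


def plan_cost(nodes, demand, node_capacity=1000.0, node_price=1.0,
              shortfall_price=0.01):
    """Hypothetical cost model: pay per node, plus a penalty per unit of
    demand that the cluster cannot serve."""
    shortfall = max(0.0, demand - nodes * node_capacity)
    return nodes * node_price + shortfall * shortfall_price


def least_regret_choice(candidates, demands):
    """Return (best_candidate, mean_regret_by_candidate).

    Regret of a candidate in one sampled world is its cost in that world
    minus the cost of the cheapest candidate in that same world.
    """
    regrets = {c: [] for c in candidates}
    for demand in demands:
        costs = {c: plan_cost(c, demand) for c in candidates}
        best = min(costs.values())
        for c in candidates:
            regrets[c].append(costs[c] - best)
    mean_regret = {c: mean(r) for c, r in regrets.items()}
    return min(candidates, key=lambda c: mean_regret[c]), mean_regret


def sample_demands(n, seed=0):
    """Monte Carlo step: draw n demand values from an assumed lognormal
    distribution (the parameters are arbitrary)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(8.0, 0.5) for _ in range(n)]


if __name__ == "__main__":
    choice, table = least_regret_choice(range(1, 11), sample_demands(10_000))
    print("nodes to buy:", choice, "mean regret:", round(table[choice], 3))
```

Minimizing average regret rather than average cost is a design choice in this sketch: it ranks each option against the best that could have been done in each sampled world, so a plan is penalized for being far from optimal rather than for being in an expensive world.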
Debugging / Performance Analysis
performance-analysis: collection of jupyter notebooks and various Python scripts I've used to analyze the performance of various service or database setups. Perhaps one of the more interesting ones is my notebook for modeling Cassandra availability with different numbers of vnodes.
cqltrace: dynamic tracer for observing live CQL traffic in real time. I mostly use this for debugging Cassandra clients and their performance.
Educational
python_service_performance: A step-by-step guide on how to make a python3 web service based on uwsgi, pyramid, gevent and nginx production ready. This means high scalability and low latency with a typical microservice setup.
Economics
splitit: algorithm and Python web service for fairly dividing items that are hard to value (e.g. rents in a 5 bedroom apartment). I used this with my roommates to divide rent fairly in San Francisco.
Debate
MIT-TAB: APDA parliamentary debate tabulation software. Basically this is a very complicated constraint optimization problem that debaters used to do by hand, and now a good fraction of the American Parliamentary Debate Association's tournaments run with this software. I was the original author but have since handed development off to Ben Muschol, who has really improved the project!
Machine Learning / AI
service-capacity-modeling: library for capacity planning (determining which kind and how much of a computer to buy) for a particular workload such as Apache Cassandra or Elasticsearch. Essentially a multi-variate Monte Carlo simulation with a least regret optimizer over per-workload models.
python_hqsom: implementation of the HQSOM deep learning algorithm. This was a project I worked on for a few graduate classes at MIT (6.867 and 6.868) that used genetic algorithms and deep learning and other such buzzwords. Surprisingly it actually worked pretty well.
food.op: recipe recommender based on gradient boosting classifiers. It was a hackathon project but sorta neat to have recipes recommended based on previous cooking experiences.
organon: symbolic constraint framework and solver for modeling complex constrained systems that may not have solutions. The paper is a decent read if you're interested in what this project can do.