In 2017 I decided to print some PDFs to read instead of just have them in an ever-growing folder of Papers to Read. Here is the list of the ones I managed to read this year.
Out of the Tar Pit
/ Ben Moseley, Peter Marks / 2006
This is my favorite paper. It is a bit long (around 60 pages), but well worth reading. It is very thorough in defining what software complexity is. The authors make the distinction between essential and accidental complexity (following Fred Brook’s ideas on the topic) and postulate that, to this day, most complexity in software is still accidental. To make things more concrete, they give an outline for a potential complexity-minimizing approach. The outline might throw some readers off. Some reviewers criticize the paper focusing only on this outline. I think that is a mistake. The given outline is one of the many possible solutions and many applicable insights can be found in the less concrete part of the paper.
In this paper, djb
describes his approaches to writing secure and bug-free software.
Kqueue: A generic and scalable event notification facility
/ Jonathan Lemon / 2001
This paper describes the FreeBSD event notification facility:
. The main application of such facility is in servers that rely on asynchronous I/O to be able to handle many clients concurrently (think Java NIO, Netty, Node.js, Nginx) without a thread/process per connection.
is very elegant and can do much more than just asynchronous I/O. It is a pity that the Linux equivalent
much less nicer than
/ Bryan Cantrill
, Jeff Bonwick / 2008
Although we assert that less—much less—code needs to be parallel than some might fear, it is nonetheless true that writing parallel code remains something of a black art. We also therefore give specific implementation techniques for developing a highly parallel system. As such, this article is somewhat dichotomous: we try both to argue that most code can (and should) achieve concurrency without explicit parallelism, and at the same time to elucidate techniques for those who must write explicitly parallel code.
This paper describes an incremental algorithm to efficiently solve linear constraints. This is the kind of algorithm behind systems like AutoLayout
— a layout system that is widely used by iOS developers.
Software Engineering at Google
/ Fergus Henderson / 2017
A description of Google’s software engineering practices.
Mosh: An Interactive Remote Shell for Mobile Clients
/ Keith Winstein, Hari Balakrishnan / 2012
This paper describes how Mosh
works. Great read for anyone building interactive networking software that needs to work in the presence of intermittent connection and keep perceived latency to a minimum.
A classical CS paper from the 70s. I think what it describes is known to anyone developing software in 2017. It still is a valuable reading for the historical value.
Spanner, TrueTime & The CAP Theorem
/ Eric Brewer
This paper describes how Spanner
can exist if the CAP theorem
is still true. Spanner is Google’s highly available global-scale distributed database that provides strong consistency for all transactions.
The Triumph of Types: Principia Mathematica’s Impact on Computer Science
/ Robert Constable / 2011
I read this when I was writing the Where do Type Systems Come From
post which I obviously recommend you to read.
Another classical CS paper. Reading it will give you an initial idea of what software verification is about.
State the Problem Before Describing the Solution
/ Leslie Lamport / 1978
How Complex Systems Fail
/ Richard Irvin Cook / 2002
This paper is also from Fabian Giesen’s Papers I like
Time, Clocks and the Ordering of Events
/ Leslie Lamport / 1978
Another paper from Leslie Lamport
. Probably his most cited paper.
It describes how to build distributed state machines (AKA distributed systems) given that only partial ordering of events is knowable in such systems (i.e. it is not possible to know for every event, what events happened before it).
In a distributed system, it is sometimes impossible to say that one of two events occurred first. The relation “happened before” is therefore only a partial ordering of the events in the system. We have found that problems often arise because people are not fully aware of this fact and its implications.
To this day, many systems are built as if it was possible to have complete ordering of events in distributed systems. This causes many subtle bugs that annoy users and puzzle programmers.
Holistic Configuration Management at Facebook
/ Chunqiang (CQ) Tang, Thawan Kooburat, Pradeep Venkat, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, Robert Karl / 2015
This paper describes how Facebook manages configuration (changing, testing, deploying…) of its backend services and mobile clients. My interest in reading this paper is in how they manage configuration of mobile clients in a reliable, scalable and backwards-compatible way. That is described in part 4 — Gatekeeper.