Can you introduce yourselves and your project?
I’m Sam, and I represent the Earthstar project. We’re working on Willow, a family of protocols for synchronisable data stores that can be used for many applications: traditional data storage, social applications, and resilient data archival, to name a few.
The Earthstar project was started in 2020 by Cinnamon, whose work put user safety at the heart of everything. Cinnamon sadly passed away in 2022, but since then, we’ve tried to carry the torch they passed on to us.
There are two of us working on Willow:
👩🎓 Aljoscha Meyer is pursuing a PhD at TU Berlin.
🧑💻 and myself, Sam Gwilym. I’m a programmer and illustrator, and I’ve been working on the Earthstar project since 2021.
What are the key issues you see with the state of the Internet today?
The key issue we see is fragility. Most networked services are built in a tightly coupled way where a single component failure can bring the whole service down, and users regularly lose access to their data. Even if they can keep up 100% uptime, these services are usually housed within data centres, which depend upon prodigious (and growing) amounts of electricity and water to keep running.
Building networked software this way assumes that we will always have equal or greater access to the connectivity, natural resources, and supply chains we have today. That may be a dangerous assumption, one that could leave us disconnected from each other after a natural disaster, a geopolitical event, or simply the next unscheduled service outage.
Our view is that we need more technologies for antifragile networking.
Another issue we see is that we’re overserved by services that attempt to connect everyone, everywhere.
How does your project contribute to correcting some of those issues?
Devices using Willow can connect and collaborate on data just as they do today, with two significant differences. First, they can connect directly, with no privileged intermediary infrastructure such as a data centre. Second, they can disconnect from the network indefinitely and still read and write data. Whenever they reconnect, they are caught up as though they had been connected all along.
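To make the idea of catching up concrete, here is a minimal sketch assuming a simple last-writer-wins rule: each device keeps a map from a key (a hypothetical (subspace, path) pair) to a (timestamp, payload) pair, writes only touch local state, and syncing keeps the newer value for each key. The `Replica` class and its methods are illustrative names, not Willow’s API, and Willow’s actual conflict resolution and sync protocol are more involved.

```python
class Replica:
    """A device's local store: key -> (timestamp, payload). Hypothetical."""

    def __init__(self):
        self.entries = {}

    def write(self, key, timestamp, payload):
        # Writes need no network connection: they only touch local state.
        self._keep_newer(key, (timestamp, payload))

    def _keep_newer(self, key, value):
        current = self.entries.get(key)
        # Last writer wins. Comparing whole tuples breaks timestamp ties by
        # payload, so every replica picks the same winner.
        if current is None or value > current:
            self.entries[key] = value

    def sync_from(self, other):
        """Catch up on everything the other replica has written."""
        for key, value in other.entries.items():
            self._keep_newer(key, value)


phone, laptop = Replica(), Replica()
phone.write(("alfie", "todo"), 1, "buy milk")        # phone is offline
laptop.write(("alfie", "todo"), 2, "buy oat milk")   # laptop is offline too
laptop.write(("alfie", "notes"), 3, "call Bea")
phone.sync_from(laptop)                              # later, they reconnect
laptop.sync_from(phone)
print(phone.entries == laptop.entries)  # True
```

Because keeping the newer value is commutative and idempotent, it does not matter in which order or how often devices sync: they converge on the same state.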
You may have heard this kind of system described as “local-first” or “offline-first”, and we’re far from the first to build one. However, we think Willow brings some significant advances to this space. Deletion is one example.
In a traditional network, you know where the data is stored, so you can go there and delete it. But when a network of independent, local-first data stores wants something gone, a marker saying what must be deleted has to propagate across the network. These markers (or ‘tombstones’) can only partially be removed, so they accumulate over time and leak potentially sensitive metadata, hinting at what was deleted, by whom, and when.
In Willow, we’ve introduced a new technique that deletes many items while leaving behind only a single marker, reducing both the storage used and the metadata leaked.
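As an illustration of how one marker can stand in for many deletions, here is a simplified sketch loosely modelled on what Willow’s data model specification calls prefix pruning: a newer entry at some path also replaces every older entry whose path starts with that path. Writing one empty entry at `("chat",)` therefore removes all older messages below it, and only that single entry remains. The class and method names are hypothetical, and the sketch ignores namespaces, subspaces, payload digests, and Willow’s tie-breaking rules.

```python
class PrefixPruningStore:
    """Hypothetical store: path (tuple of components) -> (timestamp, payload)."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def is_prefix(prefix, path):
        return path[: len(prefix)] == prefix

    def insert(self, path, timestamp, payload):
        # Reject the write if a newer (or equally new) entry sits at a prefix
        # of this path: that entry has already overwritten this subtree.
        for other, (ts, _) in self.entries.items():
            if self.is_prefix(other, path) and ts >= timestamp:
                return False
        # Otherwise prune every older entry at or below this path.
        self.entries = {
            other: value
            for other, value in self.entries.items()
            if not (self.is_prefix(path, other) and value[0] < timestamp)
        }
        self.entries[path] = (timestamp, payload)
        return True


store = PrefixPruningStore()
for i in range(1, 101):
    store.insert(("chat", f"msg{i}"), i, b"hello")
store.insert(("profile",), 50, b"Sam")
store.insert(("chat",), 200, b"")  # one empty entry deletes all 100 messages
print(sorted(store.entries))  # [('chat',), ('profile',)]
```

A late-arriving copy of an old message, such as `store.insert(("chat", "msg5"), 5, b"hello")`, is rejected, so the deletion holds without keeping a tombstone per message.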
Connecting to other devices directly also risks connecting to devices we don’t know or trust. This creates an apparent Catch-22: we want to know whether an untrusted device is interested in the same data as us, but we also want to keep our interests private if we have nothing in common. We have to prevent malicious actors from collecting our interests, whether that knowledge would be used for targeted marketing or political persecution. Willow uses a cryptographic technique called private set intersection, which ensures that others can’t discover what you’re interested in unless they already know about it. A device with nothing in common with yours is given the networking equivalent of being blanked.
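For readers unfamiliar with private set intersection, here is a toy sketch of the textbook Diffie-Hellman-style variant, which relies on exponentiation commuting: (h^a)^b = (h^b)^a mod P. Each side blinds hashed items with its own secret exponent, so only items both sides hold end up with matching doubly-blinded values. This is not Willow’s protocol, whose details differ, and it is not secure as written: the modulus, the hash-to-group step, and simulating both parties in one function are all simplifying assumptions.

```python
import hashlib
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1). Real schemes use a
# prime-order group, such as an elliptic curve, and a proper hash-to-group.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to a nonzero residue mod P (simplified)."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % (P - 1) + 1


def new_secret() -> int:
    """Random secret exponent that each party keeps to itself."""
    return secrets.randbelow(P - 3) + 2


def blind(values, secret):
    return [pow(v, secret, P) for v in values]


def private_intersection(alice_items, bob_items):
    a, b = new_secret(), new_secret()
    alice_list = list(alice_items)
    # Alice -> Bob: her items, blinded with a. Bob cannot undo the blinding.
    alice_blinded = blind([hash_to_group(x) for x in alice_list], a)
    # Bob -> Alice: her values re-blinded with b (order preserved),
    # plus his own items blinded with b only.
    alice_double = blind(alice_blinded, b)
    bob_blinded = blind([hash_to_group(y) for y in bob_items], b)
    # Alice blinds Bob's values with a; only shared items match.
    bob_double = set(blind(bob_blinded, a))
    return {x for x, v in zip(alice_list, alice_double) if v in bob_double}


shared = private_intersection({"cats", "gardening", "rust"}, {"rust", "opera", "cats"})
print(sorted(shared))  # ['cats', 'rust']
```

Alice learns only the overlap; items Bob holds that she doesn’t arrive as blinded values she cannot invert.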
Efficiency is critical in this context, not only because of all the extra communication a system like this requires, but also because we want Willow to run on very low-spec hardware. To achieve this, devices continuously tell each other their memory constraints, so that each only ever sends as much data as its partner can handle. Devices also tell each other what they’re interested in and how much of it they want; for example, a phone with very little storage space can still always get the latest 100 messages.
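The two ideas above can be sketched together as credit-based flow control plus a count limit on interests. In this minimal sketch, assuming a receiver that advertises its remaining buffer space in bytes, the sender transmits the newest messages first and stops once it would exceed that credit or the requested count. `Receiver`, `newest`, and `send` are hypothetical names; Willow’s sync protocol has its own, more detailed mechanisms for both.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical peer that advertises how many payload bytes it can buffer."""
    capacity: int
    received: list = field(default_factory=list)

    def credit(self) -> int:
        # Remaining buffer space, which the receiver reports to the sender.
        return self.capacity - sum(len(p) for _, p in self.received)


def newest(messages, max_count):
    """The receiver's interest: only the max_count most recent messages."""
    return sorted(messages, key=lambda m: m[0], reverse=True)[:max_count]


def send(messages, receiver, max_count):
    """Send newest-first, never exceeding the credit the receiver granted."""
    budget = receiver.credit()
    for timestamp, payload in newest(messages, max_count):
        if len(payload) > budget:
            break  # wait for more credit rather than overflow the peer
        receiver.received.append((timestamp, payload))
        budget -= len(payload)
    return receiver.received


messages = [(t, b"0123456789") for t in range(500)]  # 500 ten-byte messages
phone = Receiver(capacity=1_000)
send(messages, phone, max_count=100)
print(len(phone.received), phone.received[0][0])  # 100 499
```

Sending newest-first means that when credit runs short, the device still ends up with the most recent messages rather than the oldest ones.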
What do you like most about (working on) your project?
A lot of what we do is writing specifications for Willow, so communication is a massive part of our work. We may have over-indulged ourselves in making Willow comprehensible, filling our website with illustrations, diagrams, and an interactive cross-referencing system.
We’re working in a space with many precedents and parallel projects. It’s interesting to see what lessons everyone has taken away from previous efforts, and it’s a lot of fun to bring our take to the table. I think we’ve seen this space’s Cambrian explosion, and we’re now entering the Devonian period, where all these slimy creatures are climbing onto land for the first time with their half-leg, half-flipper things.
Where will you take your project next?
Our long-term goal is to make Willow as boring as possible, as in ‘PVC piping under the sink’ boring. We want to move this technology to a place where application authors and users take its benefits for granted while it stays out of view, doing its job.
Our next step in this direction is to implement Willow in Rust, which will let us improve on the efficiencies we’ve been touting and make Willow available on many platforms. NGI Core is funding this implementation work. Willow’s specification is more or less stable, so, with any luck, we can focus on steadily stabilising and optimising that implementation and let it serve as a platform for academic research, for example into data structures for Willow’s three-dimensional data model.
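To give a feel for the kind of query such data structures would need to answer, here is a simplified sketch assuming the three dimensions are subspace, path, and timestamp, as described in Willow’s specification. A query is then a box in that space, and the naive baseline scans every entry. The `Entry` and `Range3d` types here are hypothetical stand-ins, not the Rust implementation’s types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    subspace: str    # simplified stand-in for a subspace (e.g. an author)
    path: tuple      # e.g. ("blog", "hello"); compared component by component
    timestamp: int
    payload: bytes


@dataclass(frozen=True)
class Range3d:
    """Half-open [start, end) ranges on each of the three dimensions."""
    subspaces: tuple
    paths: tuple
    times: tuple

    def includes(self, e: Entry) -> bool:
        return (self.subspaces[0] <= e.subspace < self.subspaces[1]
                and self.paths[0] <= e.path < self.paths[1]
                and self.times[0] <= e.timestamp < self.times[1])


def query(entries, rng: Range3d):
    """Linear-scan baseline: touches every entry. Better data structures
    aim to answer the same query without doing so."""
    return [e for e in entries if rng.includes(e)]


entries = [
    Entry("alfie", ("blog", "hello"), 10, b"..."),
    Entry("alfie", ("photos", "cat"), 20, b"..."),
    Entry("betty", ("blog", "news"), 30, b"..."),
]
# Alfie's blog posts written before time 100:
rng = Range3d(subspaces=("a", "b"), paths=(("blog",), ("blog\x00",)), times=(0, 100))
print([e.path for e in query(entries, rng)])  # [('blog', 'hello')]
```

Because Python compares tuples lexicographically, the path range `[("blog",), ("blog\x00",))` covers every path whose first component is exactly `"blog"`.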
I’m also excited to see where others will take Willow next, such as Iroh, who have their own Willow implementation in the works. Still, I hope we can eventually get everyone on board with a single implementation that serves the whole ecosystem.
How did NGI Assure help you reach your goals for your project?
NGI Assure funded the Earthstar project twice over two years, and having that sustained funding let us figure out which questions we should be asking and even try to formulate some answers to those questions. This would have been impossible in a commercial context where we might have been pressured into pivoting to AI somehow. Conversely, an academic setting would not have been grounded enough for a project like this to develop into something people could use.
Do you have advice for people who are considering applying for NGI funding?
Don’t over-commit and try to stuff your grant proposal with a hundred things in a bid to make it more attractive. Do just enough to explore your idea and feel its edges, or focus wholeheartedly on your core idea and deliver something solid and nothing else.
Do you have any recommendations to improve future NGI programs or the wider NGI initiative?
I don’t think there’s anything NGI hasn’t already been made aware of: we need more integrated, sustained projects, with different kinds of specialities involved, if we are going to reach the point where there are viable alternatives to the proprietary services many of us rely on.