Decentralized Syndication — The Missing Internet Protocol

The Internet is decentralized by design. It came into being not at once, but in parts. New protocols were added on top of previous ones, with each new protocol extending and improving the functionality of the global network. The TCP and IP protocols were built in the 1970s, followed by SMTP and DNS in the 1980s. The 1990s gave us HTTP, probably the best-known protocol, which delivers the visual experience of the web. All these core protocols were built with a decentralized Internet in mind.

TCP/IP was well suited for data transfers between applications and for building application-level interactions. SMTP enabled a new way of human communication: email. HTTP made the web accessible. Now everyone could create a website and share public information on the net. However, one essential problem remained: discovering content on the web. In the early days people shared links to their websites on forums and mailing lists, and that worked quite well while the Internet was small. But there was no generic protocol for publishing and discovering information on the web. It seemed that this essential Internet protocol was still missing.

As the Internet grew, enterprises stepped in to fill the content discovery gap. Search engines tried to scrape the whole web and index every website and every piece of information. Later, private companies created web publishing platforms that allowed users to publish without owning a personal website or domain. Anyone could create a new blog and publish content on a blogging platform. Then the Internet evolved and adopted an even simpler publishing model: social network apps and link aggregators. Now people could publish content even more easily, but their content was owned and walled in by the content platforms.

The age of RSS

RSS (Really Simple Syndication) was one attempt to establish an open content syndication and discovery mechanism. The main shortcoming of this system was that it was designed for content syndication from a single website. You needed another (usually closed-source) platform, called an aggregator or reader, to get a feed of all the content from the different domains that you followed. Users had to hand-pick every blog they wanted to follow and manually add it to the reader. There was no way to discover who was posting on the net using the RSS protocol. At the peak of the blogging age, the combination of RSS and aggregator apps worked quite well, but there was trouble brewing on the horizon.

Eventually most RSS users migrated to Google's RSS aggregator platform, Google Reader. It was an easy and convenient app to use, but unfortunately one day Google decided to shut it down. This event caused outrage and dealt a killing blow to the RSS ecosystem. Many people did not notice that the landscape of the web had already changed by that time. Everyone was starting to become a content creator, so without a global aggregation and discovery solution RSS was too cumbersome. People just moved on to closed platforms where consuming and discovering content was much easier. It was not Google but the lack of a global information syndication vision that killed RSS.

In the beginning these social network platforms championed open data. Twitter offered an open and free API, and Facebook hyped the Open Graph protocol. As time passed, all of these open-data solutions closed up. These efforts did not need to be sustained because there was no open decentralized protocol at the core of these platforms. Eventually they just became profit and retention maximization machines. All the open data initiatives were shut down.

Bluesky

Bluesky is the new knight in shining armor for decentralization and open data. At the core of Bluesky is the AT Protocol. It is open-source and open-data. It seems that Bluesky could really be an open alternative to Twitter, but can it be truly decentralized? Or is it just cosplaying decentralization? For a system to be decentralized, there has to be either a low cost of entry or a financial gain for new people who want to host network instances. In November 2024 an AT Protocol relay instance required 5 terabytes of storage (a 4 TB increase in 4 months). And this is while Bluesky is only just taking off. Such an architecture might lead to eventual centralization because of the massive cost of running an instance. In a truly decentralized system, running a node should be as straightforward and cost-efficient as running an SMTP or HTTP server.

RSDS

Having said all that, I want to propose an alternative protocol. I call it RSDS: Really Simple Decentralized Syndication. It offers a decentralized platform for global social post syndication at a fraction of the storage and complexity cost. I want to formulate the core architecture tenets of RSDS, which I believe should apply to any decentralized global syndication protocol that has a chance to succeed.

Everybody has to host their own content

As the Bluesky example shows, there is no storage-efficient way for a decentralized global news feed to work when the content is baked into the decentralized part of the platform. However, it could be possible to build a scalable and fast decentralized infrastructure if instances only kept references to hosted content.

Let's define what could be the absolute minimum structure of a decentralized content unit:

It is not unreasonable to expect that all this information could fit into roughly 100 bytes. Even if the platform received 1000 posts per second, it would take around three terabytes to store one year of information. This calculation assumes hundreds of millions of active users on the platform.
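The estimate is easy to reproduce. The short Python sketch below only multiplies the two figures quoted in the text (about 100 bytes per post reference and 1000 posts per second) by the number of seconds in a year. Both inputs are rough assumptions rather than measurements, and the function name is made up for illustration.

```python
# Back-of-the-envelope storage estimate for a reference-only index.
# Both inputs are the rough figures from the text, not measurements.
BYTES_PER_POST = 100       # assumed size of one post reference
POSTS_PER_SECOND = 1000    # assumed sustained network-wide post rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def yearly_storage_bytes(bytes_per_post: int, posts_per_second: int) -> int:
    """Return the bytes needed to store one year of post references."""
    return bytes_per_post * posts_per_second * SECONDS_PER_YEAR


total = yearly_storage_bytes(BYTES_PER_POST, POSTS_PER_SECOND)
print(f"{total / 1e12:.2f} TB per year")  # prints "3.15 TB per year"
```

Index overhead, replication and signatures would add to this figure, so it is best read as a lower bound on the raw reference data.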

Domain names should be decentralized IDs (DIDs)

There are many alternative, overly complicated proposals for DIDs, but there is one that already works and is built into the foundation of the decentralized web: your own domain name. Even Bluesky allows you to connect your domain as a handle. There are several reasons why domains as user IDs could be a good solution for a global decentralized system.

In the RSDS protocol, the DID public key is hosted on each domain, and everyone is free to verify all the posts that the user submitted to the decentralized system.

Proof of work time IDs can be used as timestamps

Keeping track of time and the order of operations is one of the most complicated challenges of a decentralized system. The innovation of Bitcoin was that the network uses proof-of-work, hash-linked blocks and Merkle trees to make sure that operations are in the correct sequence. A new Bitcoin block is mined roughly every 10 minutes. Each mined block is identified by a specific hash, so these hashes essentially represent a clock that ticks every 10 minutes.

When submitting a social post URL to the network, the user would sign the URL together with the latest Bitcoin block hash. That would give each post a decentralized time ID. Such block-based time could also be used to throttle posts by limiting the number of posts per domain per block. This architecture would also prevent posting content with future timestamps, because the hash of a block that has not been mined yet cannot be known in advance.
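To make the idea more concrete, here is a minimal Python sketch of how an instance might combine a block hash with a post URL and enforce a per-domain limit for each block. It is an illustration under stated assumptions, not the RSDS implementation: the payload layout, the names `signing_payload` and `BlockThrottle`, and the limit of 10 posts are all hypothetical, and the actual public-key signature step is left out.

```python
import hashlib
from collections import Counter
from urllib.parse import urlsplit

MAX_POSTS_PER_DOMAIN_PER_BLOCK = 10  # hypothetical limit, not defined in the text


def signing_payload(url: str, block_hash: str) -> bytes:
    """Digest the author would sign: SHA-256 over block hash and URL.

    The "block_hash\\nurl" layout is an assumption made for this sketch.
    """
    return hashlib.sha256(f"{block_hash}\n{url}".encode("utf-8")).digest()


class BlockThrottle:
    """Accepts posts only for known blocks and within a per-domain limit."""

    def __init__(self, known_block_hashes, limit=MAX_POSTS_PER_DOMAIN_PER_BLOCK):
        self.known_block_hashes = set(known_block_hashes)
        self.limit = limit
        self.counts = Counter()  # (block_hash, domain) -> accepted posts

    def accept(self, url: str, block_hash: str) -> bool:
        # An unknown hash is either bogus or from a block not mined yet,
        # so posts cannot be stamped with a future time.
        if block_hash not in self.known_block_hashes:
            return False
        domain = urlsplit(url).hostname
        if domain is None:
            return False
        key = (block_hash, domain)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


throttle = BlockThrottle(known_block_hashes={"00ab"}, limit=2)
print(throttle.accept("http://example.com/post-1", "00ab"))
print(throttle.accept("http://example.com/post-2", "ffff"))
```

In this run the first post is accepted and the second is rejected, because "ffff" is not a block hash the instance knows about. A real instance would also verify the signature over `signing_payload(...)` against the public key hosted on the author's domain.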

Hosting content should be as simple as possible

To make hosting easy, the hosted content should leverage existing protocols and infrastructure. The easiest way to do this is to implement social posts as simple web pages. To submit a post to the decentralized syndication platform, all you would need is a domain name and a simple web server.

It is important that decentralized syndication instances accept non-HTTPS URLs. Requiring HTTPS adds an additional layer of complexity to hosting your content. Because the platform would be decentralized in nature, it would be technically impossible to implement a persistent man-in-the-middle attack that affects all instance nodes of the platform.

The protocol has to allow the right to be forgotten

One of the shortcomings of Bluesky is that all social interaction data is stored in a Merkle tree data structure. This makes all interaction history permanent. There is simply no easy way to delete a post or interaction. Although Merkle trees work well in blockchains and in version control systems like git, such an architecture only adds bloat and unnecessary complication to a decentralized information syndication system. Removing your hosted content and informing the network about the deletion should be enough to propagate a simple data state update across all decentralized content index instances.

The protocol has to support publishing licenses

The open source community has a well-established precedent of licensing code. As the age of AI is upon us, it is essential to have built-in standards in social media that allow the owner of a post to specify the scope of content republishing and reproduction rights. It is also important to be able to specify whether each post can be used for AI training. Although the Bluesky platform promises not to train AI on user data, the protocol itself has no built-in licensing or republishing preference mechanisms. That means that anyone is free to crawl your feed and do whatever they like with your data.

Decentralized instances should be able to host partial data

In the RSDS protocol, content index data is divided into time blocks. Each time block has a unique ID that references a Bitcoin block hash and represents roughly ten minutes of time. Since the entries do not reference one another, any instance can freely choose which blocks it wants to host. There could be full instances that host all the post references, and others that host only the newest ones. Each instance could also decide which block lists to use for potentially harmful or offending domains. Each post would be an independent reference, so the data hosted on an instance would not have to be complete.
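A partial instance of this kind can be sketched in a few lines of Python. The class below keeps only the newest few time blocks and filters out domains on a local block list. The name `PartialIndex`, the `max_blocks` parameter and the assumption that blocks arrive in chronological order are all choices made for this illustration, not part of a published RSDS specification.

```python
from collections import OrderedDict
from urllib.parse import urlsplit


class PartialIndex:
    """Keeps post references grouped by time block, newest blocks only."""

    def __init__(self, max_blocks: int, blocked_domains=()):
        self.max_blocks = max_blocks
        self.blocked_domains = set(blocked_domains)
        # block_id -> list of post URLs; blocks are assumed to be
        # added in chronological order, so the first key is the oldest.
        self.blocks = OrderedDict()

    def add_block(self, block_id: str, post_urls) -> None:
        # Drop references to domains on this instance's own block list.
        kept = [
            url for url in post_urls
            if urlsplit(url).hostname not in self.blocked_domains
        ]
        self.blocks[block_id] = kept
        # Entries never reference each other, so old blocks can simply
        # be discarded without breaking the rest of the index.
        while len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)


index = PartialIndex(max_blocks=2, blocked_domains={"spam.example"})
index.add_block("block-1", ["http://alice.example/hello"])
index.add_block("block-2", ["http://spam.example/buy", "http://bob.example/news"])
index.add_block("block-3", ["http://alice.example/update"])
print(list(index.blocks))
```

After the third block arrives, the instance holds only "block-2" and "block-3", and the reference to the blocked domain never enters the index. A full instance would be the same structure with no limit on the number of blocks.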

Decentralized system should be resilient to bloat attacks

One of the main challenges in a decentralized system is securing it against bloat and spam attacks. The Bitcoin protocol works because posting information into the decentralized system is not free, and proof-of-work prevents spamming the network. In the case of RSDS, several mechanisms could prevent bloat attacks:

There should be an open path to commercialization

For a decentralized social information network to exist, there should be a way to build and monetize derivative products related to it. This could be achieved by building monetized "reader" platforms, comparable to what Bluesky is in relation to the AT Protocol. Such reader platforms would subscribe to decentralized index updates and crawl and cache the hosted content. Reader platforms could provide a user-friendly interface for consuming all the data. Running such platforms would be costly due to storage and network costs, but users could pay for these services with a subscription fee, or the platforms could run ads.

There is no limit to what could be built on top of such a protocol, but the most important aspect of RSDS should be this: even without reader platforms it should be super easy to spin up an instance of a network node and start reading the newest posts of your favorite content authors.

You can check out the source code of the RSDS protocol on GitHub. If you want to try publishing your first post using this protocol, you can do it here.