# Atomic Data Docs - Overview

Atomic Data is a modular specification for sharing, modifying and modeling graph data. It combines the ease of use of JSON, the connectivity of RDF (linked data) and the reliability of type-safety.

[Figure: Venn diagram showing that Atomic Data is the combination of JSON, RDF and Type-Safety]

Atomic Data uses links to connect pieces of data, and therefore makes it easier to connect datasets to each other - even when these datasets exist on separate machines.

Atomic Data is especially suitable for knowledge graphs, distributed datasets, semantic data, p2p applications, decentralized apps and linked open data. It is designed to be highly extensible, easy to use, and to make the process of domain specific standardization as simple as possible.

Atomic Data is Linked Data, as it is a strict subset of RDF. It is type-safe (you know if something is a string, number, date, URL, etc.) and extensible through Atomic Schema, which means that you can define your own Classes, Properties and Datatypes.

The default serialization format for Atomic Data is JSON-AD, which is simply JSON where each key is a URL of an Atomic Property. These Properties are responsible for setting the datatype (to ensure type-safety) and setting shortnames (which help to keep names short, for example in JSON serialization) and descriptions (which provide semantic explanations of what a property should be used for).

Atomic Data has a standard for communicating state changes called Commits. These Commits are signed using cryptographic keys, which ensures that every change can be audited. Commits are also used to construct a history of versions.

Agents are Users that enable authentication. Atomic Data can be traversed using Paths, or queried using Collections. Hierarchies are used for authorization and keeping data organized. Invites can be used to easily create new users and provide them with rights.

Get Started

If you want to read more about how Atomic Data works - read on. If you'd rather play and discover for yourself, play with the existing open source tooling:

Make sure to join our Discord if you'd like to discuss Atomic Data with others.


Keep in mind that none of the Atomic Data projects has reached v1, which means that breaking changes can happen.

Reading these docs

This is written mostly as a book, so reading it in the order of the Table of Contents will probably give you the best experience. That being said, feel free to jump around - links are often used to refer to earlier discussed concepts. If you encounter any issues while reading, please leave an issue on GitHub. Use the arrows on the side / bottom to go to the next page.

Table of contents

Specification (core)

Specification (extended)

Using Atomic Data

Acknowledgements Newsletter Get involved

Motivation: Why Atomic Data?

Give people more control over their data

The world wide web was designed by Tim Berners-Lee to be a decentralized network of servers that help people share information. As I'm writing this, it has been exactly 30 years since the first website was launched. Unfortunately, the web today is not the decentralized network it was supposed to be. A handful of large tech companies are in control of how the internet is evolving, and where and how our data is being stored. The various services that companies like Google and Microsoft offer (often for free) integrate really well with their other services, but are mostly designed to lock you in. Vendor lock-in means that it is often difficult to take your information from one app to another. This limits innovation, and limits users' ability to decide how they want to interact with their data. Companies often have incentives that are not fully aligned with what users want. For example, Facebook sorts your newsfeed not to make you satisfied, but to make you spend as much time as possible looking at ads. They don't want you to be able to control your own newsfeed. Even companies like Apple, that don't have an ad-revenue model, still have a reason to (and very much do) lock you in. To make things even worse, even open-source projects made by volunteers often don't work well together. That's not because of bad intentions; it's because it is hard to make things interoperable.

If we want to change this, we need open tech that works really well together. And if we want that, we need to standardize. The existing standards are well-suited for documents and webpages, but not for structured personal data. If we want to have that, we need to standardize the read-write web, which includes standardizing how items are changed, how their types are checked, how we query lists, and more. I want all people to have a (virtual) private server that contains their own data, that they control. This Personal Data Store could very well be an old smartphone with a broken screen that is always on, running next to your router.

Atomic Data is designed to be a standard that achieves this. But we need more than a standard to get adoption - we need implementations. That's why I've been working on a server, various libraries, a GUI and more - all MIT licensed. If Atomic Data is successful, there will likely be other, better implementations.

Linked data is awesome, but it is too difficult for developers in its current form

Linked data (RDF / the semantic web) enables us to use the web as a large, decentralized graph database. Using links everywhere in data has amazing merits: links remove ambiguity, they enable exploration, they enable connected datasets. But the existing specs are too difficult to use, and that is harming adoption.

At my company Ontola, we've been working with linked data quite intensely for the last couple of years. We went all-in on RDF, and challenged ourselves to create software that communicates exclusively using it. That has been an inspiring, but at times also a frustrating journey. While building our e-democracy platform Argu.co, we had to solve many RDF-related problems. How to properly model data in RDF? How to deal with sequences? How to communicate state changes? Which serialization format to use? How to convert RDF to HTML, and build a front-end? We tackled some of these problems by keeping a tight grip on the data that we create (e.g. we know the type of data, because we control the resources), and others by creating new protocols, formats, tools, and libraries. But it took a long time, and it was hard. It's been almost 15 years since the introduction of linked data, and its adoption has been slow. We know that some of its merits are undeniable, and we truly want the semantic web to succeed. I believe the lack of growth partially has to do with a lack of tooling, but also with some problems that lie in the RDF data model.

Atomic Data aims to take the best parts from RDF, and learn from the past to make a more developer-friendly, performant and reliable data model to achieve a truly linked web. Read more about how Atomic Data relates to RDF, and why these changes have been made.

Make it easier for developers to build really good, interoperable apps

Every time a developer builds an application, they have to figure a lot of things out. How to design the API, how to implement forms, how to deal with authentication, authorization, versioning, search... By having a more complete, strict standard, Atomic Data aims to decrease this burden. Atomic Schema enables developers to easily share their datamodels, and re-use those from others. Atomic Commits helps developers to deal with versioning, history, undo and audit logs. Atomic Hierarchies provides an intuitive model for authorization and access control. And finally, the existing open source libraries, server, GUI and templates help developers to have these features without writing them.

When (not) to use Atomic Data

When should you use Atomic Data

  • Flexible schemas. When dealing with structured wikis or semantic data, various instances of things will have different attributes. Atomic Data allows any kind of property on any resource.
  • High-value open data. Atomic Data is a bit harder to create than plain JSON, for example, but it is easier to re-use and understand. Its use of URLs for properties makes data self-documenting.
  • High interoperability requirements. When multiple groups of people have to use the same schema, Atomic Data provides easy ways to constrain and validate the data and ensure type safety.
  • Multi-class / multi-model. Contrary to (SQL) tables, Atomic Data allows a single thing to have multiple classes, each with their own properties.
  • Connected / decentralized data. With Atomic Data, you use URLs to point to things on other computers. This makes it possible to connect datasets very explicitly, without creating copies. Very useful for decentralized social networks, for example.
  • Auditability & Versioning. Using Atomic Commits, we can store all changes to data as transactions that can be replayed. This creates a complete audit log and history.
  • JSON or RDF as Output. Atomic Data serializes to idiomatic, clean JSON as well as various RDF formats (Turtle / JSON-LD / n-triples / RDF/XML).

When not to use Atomic Data

  • Internal use only. If you're not sharing structured data, Atomic Data will probably only make things harder for you.
  • Big Data. If you're dealing with terabytes of data, you probably don't want to use Atomic Data. The added cost of schema validation and the lack of distributed / large scale persistence tooling make it the wrong choice.
  • Video / Audio / 3D. These should have unique, optimized binary representations and have very strict, static schemas. The advantages of atomic / linked data do little to improve this, unless it's just for metadata.

What is Atomic Data?

Atomic Data Core

Atomic Data is a modular specification for sharing information on the web. Since Atomic Data is a modular specification, you can mostly take what you want to use, and ignore the rest. The Core part, however, is the only required part of the specification, as all others depend on it.

Atomic Data Core can be used to express any type of information, including personal data, vocabularies, metadata, documents, files and more. It's designed to be easily serializable to both JSON and linked data formats. It is a typed data model, which means that every value should be validated and predictable.

It is a directed, labeled graph, similar to RDF, so contrary to some other (labeled) graph data models (e.g. Neo4j), a relationship between two items (Resources) does not have attributes.

Design goals

  • Browsable: Data should explicitly link to other pieces of data, and these links should be followable.
  • Semantic: Every data Atom and relation has a clear semantic meaning.
  • Interoperable: Plays nice with other data formats (e.g. JSON, XML, and all RDF formats).
  • Open: Free to use, open source, no strings attached.
  • Clear Ownership: The data shows who (or which domain) is in control of the data, so new versions of the data can easily be retrieved.
  • Mergeable: Any two sets of Atoms can be merged into a single graph without any merge conflicts / name collisions.
  • Extensible: Anyone can define their own data types and create Atoms with it.
  • ORM-friendly: Navigate a decentralized graph by using dot.syntax, similar to how you navigate a JSON object in JavaScript.
  • Type-safe: All valid Atomic Data has an unambiguous, static datatype.


Resource

A Resource is a bunch of information about a thing, referenced by a single link (the Subject). Formally, it is a set of Atoms (i.e. a Graph) that share a Subject URL. You can think of a Resource as a single row in a spreadsheet or database. In practice, Resources can be anything - a Person, a Blogpost, a Todo item. A Resource consists of at least one Atom, so it always has some Property and some Value. A Property can only occur once in every Resource.

Atom (or Atomic Triple)

Every Resource is composed of Atoms. The Atom is the smallest possible piece of meaningful data / information (hence the name). You can think of an Atom as a single cell in a spreadsheet or database. An Atom consists of three fields:

  • Subject: the Thing that the atom is providing information about.
  • Property: the property of the Thing that the atom is about (will always be a URL to a Property).
  • Value: the new piece of information about the Thing.

If you're familiar with RDF, you'll notice similarities. An Atom is comparable with an RDF Triple / Statement (although there are important differences).

Let's turn this sentence into Atoms:

Arnold Peters, who was born on the 20th of January 1991, has a best friend named Britta Smalls.

| Subject | Property | Value |
|---|---|---|
| Arnold | last name | Peters |
| Arnold | best friend | Britta |
| Britta | last name | Smalls |

The table above shows human readable strings, but in Atomic Data, we use links (URLs) wherever we can. That's because links are awesome. Links remove ambiguity (we know exactly which person or property we mean), they are resolvable (we can click on them), and they are machine readable (machines can fetch links to do useful things with them). So the table above will more closely resemble this one:


The standard serialization format for Atomic Data is JSON-AD, which looks like this:

[
  {
    "@id": "https://example.com/arnold",
    "https://example.com/properties/lastname": "Peters",
    "https://example.com/properties/birthDate": "1991-01-20",
    "https://example.com/properties/bestFriend": "https://example.com/britta"
  },
  {
    "@id": "https://example.com/britta",
    "https://example.com/properties/lastname": "Smalls"
  }
]

The @id field denotes the Subject of each Resource, which is also the URL that should point to where the resource can be found.

In the JSON-AD example above, we have:

  • two Resources, describing two different Subjects: https://example.com/arnold and https://example.com/britta.
  • three different Properties (https://example.com/properties/lastname, https://example.com/properties/birthDate, and https://example.com/properties/bestFriend)
  • four Values (Peters, 1991-01-20, https://example.com/britta and Smalls)
  • four Atoms - every key-value pair other than @id is one Atom.
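The breakdown above can be checked mechanically. The following is a minimal sketch, not part of any Atomic Data library: it parses the JSON-AD example with Python's standard `json` module and flattens it into (Subject, Property, Value) tuples. The helper name `to_atoms` is made up for this illustration.

```python
import json

# The example document: an array as root, because it describes two Resources.
JSON_AD = """
[
  {
    "@id": "https://example.com/arnold",
    "https://example.com/properties/lastname": "Peters",
    "https://example.com/properties/birthDate": "1991-01-20",
    "https://example.com/properties/bestFriend": "https://example.com/britta"
  },
  {
    "@id": "https://example.com/britta",
    "https://example.com/properties/lastname": "Smalls"
  }
]
"""


def to_atoms(resources):
    """Flatten named Resources into (subject, property, value) tuples."""
    atoms = []
    for resource in resources:
        subject = resource["@id"]
        for prop, value in resource.items():
            if prop == "@id":
                continue  # "@id" names the Subject; it is not an Atom itself
            atoms.append((subject, prop, value))
    return atoms


atoms = to_atoms(json.loads(JSON_AD))
subjects = {subject for subject, _, _ in atoms}
properties = {prop for _, prop, _ in atoms}

for atom in atoms:
    print(atom)
print(len(subjects), "Subjects,", len(properties), "Properties,", len(atoms), "Atoms")
```

Note that this sketch only handles flat, named Resources; Nested Resources would need a recursive walk.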

All Subjects and Properties are Atomic URLs: they are links that point to more Atomic Data. One of the Values is a URL, too, but we also have values like Peters and 1991-01-20. Values can have different Datatypes. In most other data formats, the datatypes are limited and are visually distinct. JSON, for example, has array, object, string, number and boolean. In Atomic Data, however, datatypes are defined somewhere else, and are extensible. To find the Datatype of an Atom, you fetch the Property, and that Property will have a Datatype. For example, the https://example.com/properties/birthDate Property requires an ISO Date string, and the https://example.com/properties/lastname Property requires a regular string. This might seem a little tedious and weird at first, but it has some nice advantages!
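To make the lookup concrete, here is a rough sketch of how a client might check a Value against the Datatype of its Property. Several things are assumptions made for illustration: the `PROPERTY_DATATYPES` dictionary stands in for fetching each Property over HTTP, the three datatype URLs are assumed to follow the https://atomicdata.dev/datatypes/ pattern, and the property URLs are the ones used in the JSON-AD example.

```python
from datetime import date
from urllib.parse import urlparse

# Assumed datatype URLs (illustrative, following the atomicdata.dev pattern).
DATE = "https://atomicdata.dev/datatypes/date"
STRING = "https://atomicdata.dev/datatypes/string"
ATOMIC_URL = "https://atomicdata.dev/datatypes/atomicURL"

# Hypothetical in-memory stand-in for fetching each Property resource.
PROPERTY_DATATYPES = {
    "https://example.com/properties/lastname": STRING,
    "https://example.com/properties/birthDate": DATE,
    "https://example.com/properties/bestFriend": ATOMIC_URL,
}


def is_valid(prop, value):
    """Check a JSON string value against the Datatype its Property declares."""
    datatype = PROPERTY_DATATYPES[prop]
    if not isinstance(value, str):
        return False  # all three datatypes here are serialized as JSON strings
    if datatype == DATE:
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
        return True
    if datatype == ATOMIC_URL:
        parsed = urlparse(value)
        return parsed.scheme in ("http", "https") and bool(parsed.netloc)
    return True  # STRING accepts any string


print(is_valid("https://example.com/properties/birthDate", "1991-01-20"))
print(is_valid("https://example.com/properties/birthDate", "20 January 1991"))
print(is_valid("https://example.com/properties/bestFriend", "Britta"))
```

The point of the design is that the value "1991-01-20" carries no type marker of its own; the type is known only because the Property says so.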

Subject field

The Subject field is the first part of an Atom. It is the identifier that the rest of the Atom is providing information about. The Subject field is a URL that points to the Resource. The creator of the Subject MUST make sure that it resolves. In other words: following / downloading the Subject link will provide you with all the Atoms about the Subject (see Querying Atomic Data). This also means that the creator of a Resource must make sure that it is available at its URL - probably by hosting the data, or by using some service that hosts it. In JSON-AD, the Subject is denoted by @id.

Property field

The Property field is the second part of an Atom. It is a URL that points to an Atomic Property. For example https://example.com/createdAt or https://example.com/firstName.

The Property field MUST be a URL, and that URL MUST resolve to an Atomic Property, which contains information about the Datatype.

Value field

The Value field is the third part of an Atom. In RDF, this is called an object. Contrary to the Subject and Property values, the Value can be of any datatype. This includes URLs, strings, integers, dates and more.


Graph

A Graph is a collection of Atoms. A Graph can describe various subjects, which may or may not be related. Graphs can have several characteristics (Schema Complete, Valid, Closed).

Nested Resource

A Nested Resource only exists inside of another resource. It does not have its own subject.

In the following JSON-AD example, the address is a nested resource:

{
  "@id": "https://example.com/arnold",
  "https://example.com/properties/address": {
    "https://example.com/properties/firstLine": "Longstreet 22",
    "https://example.com/properties/city": "Watertown",
    "https://example.com/properties/country": "the Netherlands"
  }
}

Nested Resources can be named or anonymous. An Anonymous Nested Resource does not have its own @id field. It does have its own unique path, which can be used as its identifier.

In the next chapter, we'll explore how Atomic Data is serialized.

Serialization of Atomic Data

Atomic Data is not necessarily bound to a single serialization format. It's fundamentally a data model, and that's an important distinction to make.


However, it's recommended to use JSON-AD (more about that on the next page), which is specifically designed to be a simple, complete and performant format for Atomic Data.

JSON (simple)

Atomic Data is designed to be serializable to clean, simple JSON, for usage in (client) apps that don't need to know the full URLs of properties.
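As a rough illustration of the idea, the sketch below turns a JSON-AD object into plain JSON by swapping each Property URL for its shortname. The `SHORTNAMES` mapping is hypothetical; in practice the shortname would come from each Property resource. Keeping the `@id` key in the output is also an assumption of this sketch.

```python
# Hypothetical mapping from Property URL to shortname.
SHORTNAMES = {
    "https://example.com/properties/lastname": "lastname",
    "https://example.com/properties/birthDate": "birthdate",
    "https://example.com/properties/address": "address",
    "https://example.com/properties/city": "city",
}


def to_simple_json(resource, shortnames):
    """Replace Property URLs by shortnames, recursing into nested objects."""
    simple = {}
    for key, value in resource.items():
        name = key if key == "@id" else shortnames[key]
        if isinstance(value, dict):
            value = to_simple_json(value, shortnames)
        simple[name] = value
    return simple


arnold = {
    "@id": "https://example.com/arnold",
    "https://example.com/properties/lastname": "Peters",
    "https://example.com/properties/birthDate": "1991-01-20",
    "https://example.com/properties/address": {
        "https://example.com/properties/city": "Watertown",
    },
}

simple = to_simple_json(arnold, SHORTNAMES)
print(simple)
```

The conversion is lossy in one direction: without the mapping, a client cannot recover the full Property URLs from the short keys.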

RDF serialization formats

Since Atomic Data is a strict subset of RDF, RDF serialization formats can be used to communicate and store Atomic Data, such as N-Triples, Turtle, HexTuples, JSON-LD and other RDF serialization formats. However, not all valid RDF is valid Atomic Data. Atomic Data is more strict. Read more about serializing Atomic Data to RDF in the RDF interoperability section.

Experimental serialization formats

Some experimental ideas for Atomic Data serialization are written here.

JSON-AD: The Atomic Data serialization format

Although you can use various serialization formats for Atomic Data, JSON-AD is the default serialization format. It is what the current Rust and Typescript / React implementations use to communicate. It is designed to feel familiar to developers and to be easy and performant to parse and serialize. It is inspired by JSON-LD.

It uses JSON, but has some additional constraints:

  • Every single Object is a Resource.
  • Every Key is a Property URL. Other keys are invalid. Each Property URL must resolve to an online Atomic Data Property.
  • The @id field is special: it defines the Subject of the Resource.
  • JSON arrays are mapped to ResourceArrays.
  • Numbers can be Integers, Timestamps or Floats.
  • JSON booleans map to Booleans.
  • JSON strings can be many datatypes, including String, Markdown, Date or other.
  • Nested JSON Objects are Nested Resources. A Nested Resource can either be Anonymous (without an @id subject) or a Named Nested Resource with an @id subject. Everywhere a Subject URL can be used as a value (i.e. all properties with the datatype atomicURL), a Nested Resource can be used instead. This also means that an item in a ResourceArray can be a Nested Resource.
  • The root data structure must either be a Named Resource (with an @id), or an Array containing Named Resources. When you want to describe multiple Resources in one JSON-AD document, use an array as the root item.
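A few of the structural constraints above can be sketched as a small checker. This is an illustrative sketch only: the function names are made up, it treats only http(s) URLs as valid (a simplification), and it does not verify that each Property URL actually resolves to an online Atomic Data Property or that values match their datatypes.

```python
from urllib.parse import urlparse


def is_url(text):
    """Rough URL check; assumes only http(s) Subjects and Properties."""
    if not isinstance(text, str):
        return False
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_resource(obj, nested=False):
    """Return a list of structural problems found in one JSON-AD object."""
    if not isinstance(obj, dict):
        return ["expected a JSON object"]
    errors = []
    if not nested and "@id" not in obj:
        errors.append("root Resource has no @id")
    for key, value in obj.items():
        if key == "@id":
            if not is_url(value):
                errors.append(f"@id is not a URL: {value!r}")
            continue
        if not is_url(key):
            errors.append(f"key is not a Property URL: {key!r}")
        # Nested Resources may appear directly or as items of an array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                errors.extend(check_resource(item, nested=True))
    return errors


def check_document(root):
    """The root is either one Named Resource or an array of them."""
    resources = root if isinstance(root, list) else [root]
    errors = []
    for resource in resources:
        errors.extend(check_resource(resource))
    return errors


good = {
    "@id": "https://example.com/arnold",
    "https://example.com/properties/address": {
        "https://example.com/properties/city": "Watertown",
    },
}
bad = {"https://example.com/properties/city": "Watertown", "name": "x"}

print(check_document(good))
print(check_document(bad))
```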

Let's look at an example JSON-AD Resource:

{
  "@id": "https://atomicdata.dev/properties/description",
  "https://atomicdata.dev/properties/datatype": "https://atomicdata.dev/datatypes/markdown",
  "https://atomicdata.dev/properties/description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
  "https://atomicdata.dev/properties/isA": [
    "https://atomicdata.dev/classes/Property"
  ],
  "https://atomicdata.dev/properties/shortname": "description"
}

The MIME type (for HTTP content negotiation) is application/ad+json (registration ongoing).

Nested, Anonymous and Named resources

In JSON-AD, a Resource can be represented in multiple ways:

  • Subject: A URL string, such as https://atomicdata.dev/classes/Class.
  • Named Resource: A JSON Object with an @id field containing the Subject.
  • Anonymous Nested Resource: A JSON Object without an @id field. This is only possible if it is a Nested Resource, which means that it has a parent Resource.

Note that this is also valid for ResourceArrays, which usually only contain Subjects, but are allowed to contain Nested Resources.

JSON-AD Parsers, serializers and other libraries

  • Typescript / JavaScript: @tomic/lib JSON-AD parser + in-memory store. Works with @tomic/react for rendering Atomic Data in React.
  • Rust: atomic_lib has a JSON-AD parser / serializer (and does a lot more).

Canonicalized JSON-AD

When you need deterministic serialization of Atomic Data (e.g. when calculating a cryptographic hash or signature, used in Atomic Commits), you can use the following procedure:

  1. Serialize your Resource to JSON-AD
  2. Do not include empty objects, empty arrays or null values.
  3. All keys are sorted alphabetically (lexicographically) - both in the root object, as in any nested objects.
  4. The JSON-AD is minified: no newlines, no spaces.

The last two steps of this process are more formally defined by the JSON Canonicalization Scheme (JCS, RFC 8785).
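The procedure can be approximated with Python's standard `json` module, as in the sketch below. It is not a full RFC 8785 implementation: JCS also prescribes ECMAScript-style number formatting and sorting keys by UTF-16 code units, which can differ from Python's behaviour for floats and for characters outside the Basic Multilingual Plane. The function names are made up for this illustration, and the choice to prune only object members (leaving array items in place) is an assumption about step 2.

```python
import json


def prune(value):
    """Recursively drop object members that are null, {} or []."""
    if isinstance(value, dict):
        cleaned = {key: prune(item) for key, item in value.items()}
        return {
            key: item
            for key, item in cleaned.items()
            if item is not None and item != {} and item != []
        }
    if isinstance(value, list):
        # Assumption: array items are kept in place, only cleaned inside.
        return [prune(item) for item in value]
    return value


def canonicalize(resource):
    """Sorted keys, no whitespace; an approximation of JCS for simple data."""
    return json.dumps(
        prune(resource),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )


john = {
    "https://example.com/lastName": "McLovin",
    "@id": "https://example.com/john",
    "https://example.com/employer": None,
    "https://example.com/tags": [],
}

canonical = canonicalize(john)
print(canonical)
```

Because the output is deterministic for the same input, hashing or signing the resulting string gives a stable result regardless of the key order in which the Resource was originally written.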

Interoperability with JSON and JSON-LD

Read more about this subject.

Querying Atomic Data

There are multiple ways of getting Atomic Data into some system:

Atomic Paths

An Atomic Path is a string that consists of one or more URLs, which when traversed point to an item. For more information, see Atomic Paths.

Subject fetching (HTTP)

The simplest way of getting Atomic Data when the Subject is an HTTP URL is to send a GET request to the subject URL. Set the Accept header to an Atomic Data compatible MIME type, such as application/ad+json.

GET https://atomicdata.dev/test HTTP/1.1
Accept: application/ad+json

The server SHOULD respond with all the Atoms of which the requested URL is the subject:

HTTP/1.1 200 OK
Content-Type: application/ad+json
Connection: Closed

{
  "@id": "https://atomicdata.dev/test",
  "https://atomicdata.dev/properties/shortname": "1611489928"
}

The server MAY also include other resources, if they are deemed relevant.
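A client-side sketch of subject fetching, using only Python's standard library, could look like this. It asks for JSON-AD through the Accept header, which is the standard HTTP request header for content negotiation. The function names are made up for this illustration, and the sketch assumes the server answers with a single JSON-AD document; only the request is built here, so nothing is sent until `fetch_resource` is called.

```python
import json
import urllib.request

JSON_AD_MIME = "application/ad+json"


def build_request(subject):
    """Build a GET request that asks the server for JSON-AD."""
    return urllib.request.Request(
        subject,
        headers={"Accept": JSON_AD_MIME},
        method="GET",
    )


def fetch_resource(subject):
    """Send the request and parse the JSON-AD body (performs network I/O)."""
    with urllib.request.urlopen(build_request(subject), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("https://atomicdata.dev/test")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Since the server MAY include other resources, a caller should be prepared for the parsed body to be either a single object or an array of objects.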

Triple Pattern Fragments

Triple Pattern Fragments (TPF) is an interface for querying RDF. It works great for Atomic Data as well.

An HTTP implementation of a TPF endpoint might accept a GET request to a URL such as this:


Make sure to URL encode the subject, property, value strings.
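For instance, the encoding can be left to `urllib.parse.urlencode`, as in this sketch. The helper name `tpf_url` is made up, and the query parameter names `subject`, `property` and `value` are assumed from the wording above; check the endpoint you are using before relying on them.

```python
from urllib.parse import urlencode


def tpf_url(endpoint, subject=None, prop=None, value=None):
    """Build a TPF query URL, percent-encoding every filter that is set."""
    filters = (("subject", subject), ("property", prop), ("value", value))
    params = {name: text for name, text in filters if text is not None}
    if not params:
        return endpoint
    return f"{endpoint}?{urlencode(params)}"


by_value = tpf_url("https://atomicdata.dev/tpf", value="0")
by_subject = tpf_url(
    "https://atomicdata.dev/tpf",
    subject="https://atomicdata.dev/agents",
)
print(by_value)
print(by_subject)
```

Leaving a filter out means "match anything" for that position, so a URL with no filters at all asks for every Atom the endpoint exposes.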

For example, let's search for all Atoms where the value is 0.

GET https://atomicdata.dev/tpf?value=0 HTTP/1.1
Accept: text/turtle

This is the HTTP response:

HTTP/1.1 200 OK
Content-Type: text/turtle
Connection: Closed

<https://atomicdata.dev/agents> <https://atomicdata.dev/properties/collection/currentPage> "0"^^<https://atomicdata.dev/datatypes/integer> .


SPARQL

SPARQL is a powerful RDF query language. Since all Atomic Data is also valid RDF, it should be possible to query Atomic Data using SPARQL.

  • Convert / serialize Atomic Data to RDF (for example by using the /tpf endpoint and an Accept header: curl -i -H "Accept: text/turtle" "https://atomicdata.dev/tpf")
  • Load it into a SPARQL engine

Atomic Paths

An Atomic Path is a string that consists of at least one URL, followed by one or more URLs or Shortnames. Every single value in an Atomic Resource can be targeted through such a Path. They can be used as identifiers for specific Values.

The simplest path is the URL of a resource, which represents the entire Resource with all its properties. If you want to target a specific atom, you can use an Atomic Path with a second URL. This second URL can be replaced by a Shortname, if the Resource is an instance of a class which has properties with that Shortname (sounds more complicated than it is).


Let's start with this simple Resource:

{
  "@id": "https://example.com/john",
  "https://example.com/lastName": "McLovin"
}

Then the following Path targets the McLovin value:

https://example.com/john https://example.com/lastName => McLovin

Instead of using the full URL of the lastName Property, we can use its shortname:

https://example.com/john lastname => McLovin

We can also traverse relationships between resources:

[
  {
    "@id": "https://example.com/john",
    "https://example.com/lastName": "McLovin",
    "https://example.com/employer": "https://example.com/XCorp"
  },
  {
    "@id": "https://example.com/XCorp",
    "https://example.com/description": "The greatest company!"
  }
]

https://example.com/john employer description => The greatest company!

In the example above, the XCorp subject exists and is the source of the "The greatest company!" value. We can use this path as a unique identifier for the description of John's current employer. Note that the data for the description of that employer does not have to be in John's control for this path to work - it can live on a totally different server. However, in Atomic Data it's also possible to include this description in the resource of John as a Nested Resource.
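The traversal described above can be sketched in a few lines. This is a simplified illustration, not how any particular library does it: `STORE` is an in-memory stand-in for fetching Resources over HTTP, and `SHORTNAMES` is a single hypothetical lookup table, whereas a real implementation would derive shortnames from the Properties of each Resource's classes.

```python
# In-memory stand-in for Resources that would normally be fetched by URL.
STORE = {
    "https://example.com/john": {
        "@id": "https://example.com/john",
        "https://example.com/lastName": "McLovin",
        "https://example.com/employer": "https://example.com/XCorp",
    },
    "https://example.com/XCorp": {
        "@id": "https://example.com/XCorp",
        "https://example.com/description": "The greatest company!",
    },
}

# Hypothetical global shortname table (normally derived from the schema).
SHORTNAMES = {
    "lastname": "https://example.com/lastName",
    "employer": "https://example.com/employer",
    "description": "https://example.com/description",
}


def resolve(path):
    """Follow a space-separated Atomic Path through the store."""
    subject, *steps = path.split(" ")
    current = STORE[subject]
    for step in steps:
        if isinstance(current, str):
            current = STORE[current]  # the value is a link: follow it
        prop = step if step.startswith("http") else SHORTNAMES[step]
        current = current[prop]
    return current


print(resolve("https://example.com/john https://example.com/lastName"))
print(resolve("https://example.com/john lastname"))
print(resolve("https://example.com/john employer description"))
```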

Nested Resources

All Atomic Data Resources that we've discussed so far have an explicit URL as a subject. Unfortunately, creating unique and resolvable URLs can be a bother, and is sometimes not necessary. If you've worked with RDF, this is what Blank Nodes are used for. In Atomic Data, we have something similar: Nested Resources.

Let's use a Nested Resource in the example from the previous section:

{
  "@id": "https://example.com/john",
  "https://example.com/lastName": "McLovin",
  "https://example.com/employer": {
    "https://example.com/description": "The greatest company!"
  }
}

Now the employer is simply a nested Object. Note that it no longer has its own @id. However, we can still identify this Nested Resource using its Path.

The Subject of the nested resource is its path: https://example.com/john https://example.com/employer, including the space.

Note that the path from before still resolves:

https://example.com/john employer description => The greatest company!

Traversing Arrays

We can also navigate Arrays using paths.

For example:

{
  "@id": "https://example.com/john",
  "hasShoes": [
    {
      "https://example.com/name": "Mr. Boot"
    },
    {
      "https://example.com/name": "Sunny Sandals"
    }
  ]
}

The Path of Mr. Boot is:

https://example.com/john hasShoes 0 name

You can target an item in an array by using a number to indicate its position, starting with 0.

Notice how the Resource with the name: Mr. Boot does not have an explicit @id, but it does have a Path. This means that we still have a unique, globally resolvable identifier - yay!
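Paths into Nested Resources and Arrays can be resolved without any network access, because the targeted values live inside the parent Resource. The sketch below walks a single Resource; the `SHORTNAMES` table is hypothetical, and the `hasShoes` key is used as-is because the example writes it without a full URL.

```python
JOHN = {
    "@id": "https://example.com/john",
    "https://example.com/employer": {
        "https://example.com/description": "The greatest company!",
    },
    "hasShoes": [
        {"https://example.com/name": "Mr. Boot"},
        {"https://example.com/name": "Sunny Sandals"},
    ],
}

# Hypothetical shortname table; unknown steps are used as literal keys.
SHORTNAMES = {
    "employer": "https://example.com/employer",
    "description": "https://example.com/description",
    "name": "https://example.com/name",
}


def resolve(root, path):
    """Walk nested objects and arrays inside one Resource."""
    subject, *steps = path.split(" ")
    if subject != root["@id"]:
        raise KeyError(subject)
    current = root
    for step in steps:
        if isinstance(current, list):
            current = current[int(step)]  # array positions start at 0
        else:
            current = current[SHORTNAMES.get(step, step)]
    return current


print(resolve(JOHN, "https://example.com/john hasShoes 0 name"))
print(resolve(JOHN, "https://example.com/john employer description"))
```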

Try for yourself

Install the atomic-cli software and run atomic-cli get https://atomicdata.dev/classes/Class description.

Atomic Schema

Atomic Schema is the proposed standard for specifying classes, properties and datatypes in Atomic Data. You can compare it to UML diagrams, or what XSD is for XML. Atomic Schema deals with validating and constraining the shape of data. It is designed for checking if all the required properties are present, and whether the values conform to the datatype requirements (e.g. datetime, or URL).

This section will define various Classes, Properties and Datatypes (discussed in Atomic Core: Concepts).

Design Goals

  • Decentralized: Classes and Properties can be defined in external systems, and are resolved using web protocols such as HTTP.
  • Typed: Every Atom of data has a clear datatype. Validated data should be highly predictable.
  • IDE-friendly: Although Atomic Schema uses many URLs, users / developers should not have to type full URLs. The schema uses shortnames as aliases.
  • Self-documenting: When seeing a piece of data, simply following links will explain how the data model is to be understood. This removes the need for (most of) existing API documentation.
  • Extensible: Anybody can create their own Datatypes, Properties and Classes.
  • Accessible: Support for languages, easily translatable. Useful for humans and machines.
  • Atomic: All the design goals of Atomic Data itself also apply here. Atomic Schema is defined using Atomic Data.

In short

In short, Atomic Schema works like this:

The Property field in an Atom, or the key in a JSON-AD object, links to a Property Resource. It is important that the URL to the Property Resource resolves, as others can re-use it and check its datatype. This Property does three things:

  1. it links to a Datatype which indicates which Value is acceptable.
  2. it has a description which tells you what the property means, what the relationship between the Subject and the Value means.
  3. it provides a Shortname, which is sometimes used as an alternative to the full URL of the Property.

Datatypes define the shape of the Value, e.g. a Number (124) or Boolean (true).

Classes are a special kind of Resource that describe an abstract class of things (such as "Person" or "Blog"). Classes can recommend or require a set of Properties. They behave as Models, similar to structs in C or interfaces in Typescript. A Resource could have one or more classes, which could provide information about which Properties are expected or required.
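As a rough sketch of how required Properties might be checked, the code below models a Class with `requires` and `recommends` lists and reports which required Properties a Resource lacks. The `Person` class, its URL and the `CLASSES` registry are hypothetical and exist only for this illustration; the isA Property URL is the one used in the JSON-AD examples.

```python
from dataclasses import dataclass, field

IS_A = "https://atomicdata.dev/properties/isA"


@dataclass
class AtomicClass:
    subject: str
    shortname: str
    requires: list = field(default_factory=list)
    recommends: list = field(default_factory=list)


# Hypothetical class definition, normally fetched from its URL.
PERSON = AtomicClass(
    subject="https://example.com/classes/Person",
    shortname="person",
    requires=["https://example.com/properties/lastname"],
    recommends=["https://example.com/properties/birthDate"],
)
CLASSES = {PERSON.subject: PERSON}


def missing_properties(resource):
    """List required Property URLs that the Resource does not provide."""
    missing = []
    for class_url in resource.get(IS_A, []):
        cls = CLASSES[class_url]
        missing += [prop for prop in cls.requires if prop not in resource]
    return missing


arnold = {
    "@id": "https://example.com/arnold",
    IS_A: ["https://example.com/classes/Person"],
    "https://example.com/properties/lastname": "Peters",
}
nameless = {
    "@id": "https://example.com/nameless",
    IS_A: ["https://example.com/classes/Person"],
}

print(missing_properties(arnold))
print(missing_properties(nameless))
```

Recommended Properties are deliberately not checked here: a Resource that omits them is still valid, they only hint at what a form or editor might offer.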


  "@id": "https://atomicdata.dev/classes/Agent",
  "https://atomicdata.dev/properties/description": "An Agent is a user that can create or modify data. It has two keys: a private and a public one. The private key should be kept secret. The public key is used to verify signatures (on [Commits](https://atomicdata.dev/classes/Commit)) set by the of the Agent.",
  "https://atomicdata.dev/properties/isA": [
  "https://atomicdata.dev/properties/recommends": [
  "https://atomicdata.dev/properties/requires": [
  "https://atomicdata.dev/properties/shortname": "agent"

Atomic Schema: Classes

The following Classes are some of the most fundamental concepts in Atomic Data, as they make data validation possible.

Click the URLs of the classes to read the most up-to-date data, and discover their properties!


Property

URL: https://atomicdata.dev/classes/Property

The Property class. The thing that the Property field should link to. A Property is an abstract type of Resource that describes the relation between a Subject and a Value. A Property provides some semantic information about the relationship (in its description), it provides a shorthand (the shortname) and it links to a Datatype.

Properties of a Property instance:

  • shortname - (required, Slug) the shortname for the property, used in ORM-style dot syntax (thing.property.anotherproperty).
  • description - (optional, AtomicURL, TranslationBox) the semantic meaning of the Property.
  • datatype - (required, AtomicURL, Datatype) a URL to an Atomic Datatype, which defines the datatype of the Value in any Atom that uses this Property.
  • classtype - (optional, AtomicURL, Class) if the datatype is an Atomic URL, the classtype defines which class(es?) is (are?) acceptable.

{
  "@id": "https://atomicdata.dev/properties/description",
  "https://atomicdata.dev/properties/datatype": "https://atomicdata.dev/datatypes/markdown",
  "https://atomicdata.dev/properties/description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
  "https://atomicdata.dev/properties/isA": [
    "https://atomicdata.dev/classes/Property"
  ],
  "https://atomicdata.dev/properties/shortname": "description"
}

Visit https://atomicdata.dev/collections/property for a list of example Properties.


Datatype

URL: https://atomicdata.dev/classes/Datatype

A Datatype specifies how a Value should be interpreted. Datatypes are concepts such as boolean, string, integer. Since Datatypes can be linked to, you can define your own. However, using non-standard datatypes limits how many applications will know what to do with the data.


  • description - (required, AtomicURL, TranslationBox) how the datatype functions.
  • stringSerialization - (required, AtomicURL, TranslationBox) how the datatype should be parsed / serialized as a UTF-8 string.
  • stringExample - (required, string) an example stringSerialization that should be parsed correctly
  • binarySerialization - (optional, AtomicURL, TranslationBox) how the datatype should be parsed / serialized as a byte array.
  • binaryExample - (optional, string) an example binarySerialization that should be parsed correctly. Should have the same contents as the stringExample. Required if binarySerialization is present on the DataType.

Visit https://atomicdata.dev/collections/datatype for a list of example Datatypes.


Class

URL: https://atomicdata.dev/classes/Class

A Class is an abstract type of Resource, such as Person. By convention, the name in its URI is capitalized. Note that in Atomic Data, a Resource can have several Classes - not just a single one. If you need to set more complex constraints on your Classes (e.g. maximum string length, Properties that depend on each other), check out SHACL.


  • shortname - (required, Slug) a short string shorthand.
  • description - (required, AtomicURL, TranslationBox) human readable explanation of what the Class represents.
  • requires - (optional, ResourceArray, Property) a list of Properties that are required. If absent, none are required. These SHOULD have unique shortnames.
  • recommends - (optional, ResourceArray, Property) a list of Properties that are recommended. These SHOULD have unique shortnames.

A resource indicates it is an instance of that class by adding a https://atomicdata.dev/properties/isA Atom.

{
  "@id": "https://atomicdata.dev/classes/Class",
  "https://atomicdata.dev/properties/description": "A Class describes an abstract concept, such as 'Person' or 'Blogpost'. It describes the data shape of data and explains what the thing represents. It is convention to use Uppercase in its URL. Note that in Atomic Data, a Resource can have several Classes - not just a single one.",
  "https://atomicdata.dev/properties/isA": [
    "https://atomicdata.dev/classes/Class"
  ],
  "https://atomicdata.dev/properties/recommends": [
    "https://atomicdata.dev/properties/recommends",
    "https://atomicdata.dev/properties/requires"
  ],
  "https://atomicdata.dev/properties/requires": [
    "https://atomicdata.dev/properties/shortname",
    "https://atomicdata.dev/properties/description"
  ],
  "https://atomicdata.dev/properties/shortname": "class"
}

Visit https://atomicdata.dev/collections/class for a list of example Classes.
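To make the role of requires more concrete, here is a minimal sketch of how a client might check a Resource against a Class. The Person class, its Property URLs and the missingRequired helper are hypothetical and only serve as illustration; a real client would fetch the Class resource instead of hard-coding it.

```javascript
const IS_A = "https://atomicdata.dev/properties/isA";
const REQUIRES = "https://atomicdata.dev/properties/requires";

// Hypothetical in-memory copy of a Class resource, keyed by Property URL.
const personClass = {
  "@id": "https://example.com/classes/Person",
  [REQUIRES]: ["https://example.com/properties/name"],
};

// Returns the required Property URLs that are missing from the resource.
// Returns an empty array when the resource does not claim to be an
// instance of the given Class.
function missingRequired(resource, classResource) {
  const classes = resource[IS_A] ?? [];
  if (!classes.includes(classResource["@id"])) return [];
  // If requires is absent, no Properties are required.
  const required = classResource[REQUIRES] ?? [];
  return required.filter((property) => !(property in resource));
}

const john = {
  "@id": "https://example.com/john",
  [IS_A]: ["https://example.com/classes/Person"],
};
console.log(missingRequired(john, personClass));
```

The sketch only covers requires; checking the datatype of each Value would be a separate step.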

Atomic Schema: Datatypes

The Atomic Datatypes consist of some of the most commonly used Datatypes.

Please visit https://atomicdata.dev/collections/datatype for the latest list of official Datatypes.


Slug

URL: https://atomicdata.dev/datatypes/slug

A string with a limited set of allowed characters, used in IDE / Text editor context. Only lowercase letters, numbers and dashes are allowed.

Regex: ^[a-z0-9]+(?:-[a-z0-9]+)*$
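As a small illustration, the regex above can be used directly to validate a Slug. The isSlug helper name is an assumption made for this sketch.

```javascript
// The Slug pattern from the text above.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns true if the value is a string that matches the Slug pattern.
function isSlug(value) {
  return typeof value === "string" && SLUG_PATTERN.test(value);
}

console.log(isSlug("some-shortname")); // true
console.log(isSlug("Some_Shortname")); // false
```

Note that the pattern rejects leading, trailing and repeated dashes.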

Atomic URL

URL: https://atomicdata.dev/datatypes/atomicURL

A URL that should resolve to an Atomic Resource.


URI

URL: https://atomicdata.dev/datatypes/URI

A Uniform Resource Identifier, preferably a URL (i.e. a URI that can be fetched). Could be HTTP, HTTPS, or any other type of scheme.


String

URL: https://atomicdata.dev/datatypes/string

UTF-8 String, no max character count. Newlines use backslash escaped \n characters. Should not contain language specific data, use a TranslationBox instead.

e.g. String time! \n Second line!


Markdown

URL: https://atomicdata.dev/datatypes/markdown

A markdown string, using the CommonMark syntax. UTF-8 formatted, no max character count, newlines are \n.


# Heading

Paragraph with [link](https://example.com).


Integer

URL: https://atomicdata.dev/datatypes/integer

Signed Integer, max 64 bit. Max value: 9223372036854775807

e.g. -420


Float

URL: https://atomicdata.dev/datatypes/float

Number with a fractional part, written with a decimal point. Max value: 9223372036854775807

e.g. -420


Boolean

URL: https://atomicdata.dev/datatypes/boolean

True or false, one or zero.

String serialization

true or false.

Binary serialization

Use a single bit for one boolean.

1 for true, or 0 for false.


Date

URL: https://atomicdata.dev/datatypes/date

ISO date without time. YYYY-MM-DD.

e.g. 1991-01-20


Timestamp

URL: https://atomicdata.dev/datatypes/timestamp

Similar to Unix Timestamp. Milliseconds since midnight UTC 1970 jan 01 (aka the Unix Epoch). Use this for most DateTime fields. Signed 64 bit integer (instead of 32 bit in Unix systems).

e.g. 1596798919000 (= 07 Aug 2020 11:15:19 UTC)
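Since the timestamp is defined in milliseconds, it maps directly onto the JavaScript Date type. A minimal sketch, assuming the example value is interpreted in milliseconds as the description states (the timestampToIso helper is hypothetical):

```javascript
// Current time as an Atomic timestamp: milliseconds since the Unix Epoch.
const now = Date.now();

// Converts an Atomic timestamp to an ISO 8601 string in UTC.
function timestampToIso(timestamp) {
  return new Date(timestamp).toISOString();
}

console.log(timestampToIso(1596798919000)); // "2020-08-07T11:15:19.000Z"
```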


ResourceArray

URL: https://atomicdata.dev/datatypes/resourceArray

Sequential, ordered list of Atomic URIs. Serialized as a JSON array with strings. Note that other types of arrays are not included in this spec, but can be perfectly valid. (discussion)

  • e.g. ["https://example.com/1", "https://example.com/2"]

Atomic Translations

Status: design / concept stage

Dealing with translations can be hard. (See discussion on this subject here.)


URL: https://atomicdata.dev/classes/TranslationBox (does not resolve yet)

A TranslationBox is a collection of translated strings, used to provide multiple translations. It has a long list of optional properties, each corresponding to some language. Each possible language Property uses the following URL template: https://atomicdata.dev/languages/{languageTag}. Use a BCP 47 language tag, e.g. nl or en-US.

For example:

{
  "@id": "https://example.com/john",
  "https://example.com/properties/lifestory": {
    "https://atomicdata.dev/languages/en": "Well, John was born and later he died.",
    "https://atomicdata.dev/languages/nl": "Tsja, John werd geboren en stierf later."
  }
}

Every property used for translation strings is an instance of the Translation class.

A translation string uses the MDString datatype, which means it allows Markdown syntax.
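Since this part of the spec is still in the concept stage, the following is only a sketch of how a client could read a TranslationBox. It assumes the URL template described above; the translate helper and its fallback behaviour are assumptions, not part of the spec.

```javascript
const LANGUAGE_PREFIX = "https://atomicdata.dev/languages/";

// Picks the first available translation from a TranslationBox, trying each
// preferred BCP 47 tag in order. Returns undefined if none of them is present.
function translate(translationBox, preferredTags) {
  for (const tag of preferredTags) {
    const value = translationBox[LANGUAGE_PREFIX + tag];
    if (value !== undefined) return value;
  }
  return undefined;
}

const lifestory = {
  "https://atomicdata.dev/languages/en": "Well, John was born and later he died.",
  "https://atomicdata.dev/languages/nl": "Tsja, John werd geboren en stierf later.",
};
// No German translation exists, so the Dutch one is returned.
console.log(translate(lifestory, ["de", "nl", "en"]));
```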

Atomic Schema FAQ

How do I create a Property that supports multiple Datatypes?

A property only has one single Datatype. However, feel free to create a new kind of Datatype that, in turn, refers to other Datatypes. Perhaps Generics, or Option like types should be part of the Atomic Base Datatypes.

Do you have an enum datatype?

In Atomic Data, enum is not a datatype, but a constraint that can be added to Properties. You can set allows-only on a Property, and use that to limit which values are allowed.

How should a client deal with Shortname collisions?

Atomic Data guarantees Subject-Property uniqueness, which means that Valid Resources are guaranteed to have only one of each Property. Properties offer Shortnames, which are short strings. These strings SHOULD be unique inside Classes, but these are not guaranteed to be unique inside all Resources. Note that Resources can have multiple Classes, and through that, they can have colliding Shortnames. Resources are also free to include Properties from other Classes, and their Shortnames, too, might collide.

For example:

{
  "@id": "https://example.com/people/123",
  "https://example.com/name": "John",
  "https://another.example.com/someOtherName": "Martin"
}

Let's assume that https://example.com/name and https://another.example.com/someOtherName are Properties that have the Shortname: name.

What if a client tries something such as people123.name? To consistently return a single value, we need some type of precedence:

  1. The earlier Class mentioned in the isA Property of the resource. Resources can have multiple Classes, but they appear in an ordered ResourceArray. Classes SHOULD have no internal shortname collisions in their required and recommended Properties, but since this is not a MUST, collisions might still occur. If they do, sort the Properties by how their Classes are ordered in the isA array - the first item is preferred.
  2. When the Properties are not part of any of the mentioned Classes, use Alphabetical sorting of the Property URL.

When shortname collisions are possible, it's recommended to not use the shortname, but to use the URL of the Property instead.


It is likely that using the URL for keys is also the most performant, since it probably more closely mimics the internal data model.
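The precedence rules above can be sketched as follows. The lookup tables (shortnames, classProperties), the Person class URL and the resolveShortname helper are all hypothetical; a real client would resolve the Properties and Classes by fetching them.

```javascript
const IS_A = "https://atomicdata.dev/properties/isA";

// Hypothetical lookup tables, normally derived from fetched Resources.
const shortnames = {
  "https://example.com/name": "name",
  "https://another.example.com/someOtherName": "name",
};
// Class URL -> Property URLs listed in its requires / recommends.
const classProperties = {
  "https://example.com/classes/Person": ["https://example.com/name"],
};

function resolveShortname(resource, shortname) {
  const candidates = Object.keys(resource).filter(
    (url) => shortnames[url] === shortname
  );
  // Rule 1: prefer a Property that belongs to the earliest Class in isA.
  for (const classUrl of resource[IS_A] ?? []) {
    const match = (classProperties[classUrl] ?? []).find((url) =>
      candidates.includes(url)
    );
    if (match !== undefined) return resource[match];
  }
  // Rule 2: fall back to alphabetical sorting of the Property URL.
  const [first] = [...candidates].sort();
  return first === undefined ? undefined : resource[first];
}

const person = {
  "@id": "https://example.com/people/123",
  "https://example.com/name": "John",
  "https://another.example.com/someOtherName": "Martin",
};
console.log(resolveShortname(person, "name")); // no Class: alphabetical order wins
console.log(resolveShortname({ ...person, [IS_A]: ["https://example.com/classes/Person"] }, "name"));
```

Without a Class, the alphabetically first Property URL wins; once the (hypothetical) Person class is listed in isA, its own name Property takes precedence.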

Many features in Atomic Data apps depend on the availability of Resources on their subject URL. If that server is offline, or the URL has changed, the existing links will break. This is a fundamental problem to HTTP, and not unique to Atomic Data. Like with websites, hosts should make sure that their server stays available, and that URLs remain static.

One possible solution to this problem is Content Addressing, as enabled by the IPFS protocol, which is why we're planning to use that in the near future.

Another approach is using foreign keys (see issue).

How does Atomic Schema relate to RDF / SHACL / ShEx / OWL / RDFS?

Atomic Schema is the schema language for Atomic Data, whereas RDF has a couple of competing ones, which all vary greatly. In short, OWL is not designed for schema validation, but SHACL and ShEx are roughly comparable to Atomic Schema. An important difference is that SHACL and ShEx have to deal with all the complexities of RDF, whereas Atomic Data is more constrained.

For more information, see RDF interoperability.

Atomic Agents

Atomic Agents are used for authentication: to set an identity and prove who an actor actually is. Agents can represent either actual individuals or machines that interact with data. Agents are the entities that can be granted read / write rights. Agents are used to sign Requests and Commits and to accept Invites.

Design goals

  • Decentralized: Atomic Agents can be created by anyone, at any domain
  • Easy: It should be easy to work with, code with, and use
  • Privacy-friendly: Agents should allow for privacy friendly workflows
  • Verifiable: Others should be able to verify who did what
  • Secure: Resistant to attacks by malicious others

The Agent model

url: https://atomicdata.dev/classes/Agent

An Agent is a Resource with its own URL. When it is created, the one creating the Agent will generate a cryptographic (Ed25519) keypair. It is required to include the publicKey in the Agent resource. The privateKey should be kept secret, and should be safely stored by the creator. For convenience, a secret can be generated, which is a single long string of characters that encodes both the privateKey and the subject of the Agent. This secret can be used to instantly, easily log in using a single string.

The publicKey is used to verify commit signatures by that Agent, to check if that Agent actually did create and sign that Commit.

Creating an Agent

Since an Agent is used for verification of commits, the Agent's subject should resolve and be publicly available. This means that the one creating the Agent has to deal with this. One way of doing this is by hosting an Atomic Server. An easier way is accepting an Invite that exists on someone else's server.

Hierarchy, rights and authorization

Hierarchies help make information easier to find and understand. For example, most websites use breadcrumbs to show you where you are. Your computer probably has a bunch of drives and deeply nested folders that contain files. We generally use these hierarchical elements to keep data organized, and to keep a tighter grip on rights management. For example, you might share a specific folder with a team while keeping a different folder private.

Although you are free to use Atomic Data with your own custom authorization system, we have a standardized model that is currently being used by some of the tools that we've built.

Design goals

  • Fast. Authorization can sometimes be costly, but in this model we'll be considering performance.
  • Simple. Easy to understand, easy to implement.
  • Handles most basic use-cases. Should deal with basic read / write access control, calculating the size of a folder, rendering things in a tree.

Atomic Hierarchy Model

  • Every Resource SHOULD have a parent.
  • Any Resource can be a parent of some other Resource, as long as both Resources exist on the same Atomic Server.
  • Inversely, every Resource could have children.
  • Only Drives (Resources with the class Drive) are allowed to be a top-level parent.
  • Any Resource might have read and write Atoms. These both contain a list of Agents. These Agents will be granted the rights to edit (using Commits) or read / use the Resources.
  • Rights are additive, which means that the rights add up. If a Resource itself has no write Atom containing your Agent, but its parent does have one, you will still get the write right.
  • Rights cannot be removed by children or parents - they can only be added.
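The additive model above boils down to walking up the hierarchy until some ancestor grants the right. Here is a minimal sketch; the in-memory store, the shortname keys (parent, read, write) used instead of full Property URLs, and the hasRight helper are assumptions made for illustration.

```javascript
// Hypothetical in-memory store: subject URL -> Resource (shortname keys for brevity).
const store = new Map([
  ["https://example.com/drive", { write: ["https://example.com/agents/alice"] }],
  ["https://example.com/drive/folder", {
    parent: "https://example.com/drive",
    read: ["https://example.com/agents/bob"],
  }],
  ["https://example.com/drive/folder/doc", {
    parent: "https://example.com/drive/folder",
  }],
]);

// Walks from the Resource up through its parents. Rights are additive,
// so the first Resource in the chain that grants the right is enough.
function hasRight(store, subject, agent, right) {
  const visited = new Set(); // guards against accidental parent cycles
  let current = subject;
  while (current !== undefined && !visited.has(current)) {
    visited.add(current);
    const resource = store.get(current);
    if (resource === undefined) return false;
    if ((resource[right] ?? []).includes(agent)) return true;
    current = resource.parent;
  }
  return false;
}

// Alice has write rights on the Drive, so she can also edit the nested doc.
console.log(hasRight(store, "https://example.com/drive/folder/doc", "https://example.com/agents/alice", "write"));
```

The sketch treats read and write as independent lists; whether a write right should also imply read is left open here.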


See authentication.

Limitations of the current Authorization model

The specification is growing (and please contribute in the docs repo), but the current specification lacks some features:

  • Rights can only be added, but not removed in a higher item of a hierarchy. This means that you cannot have a secret folder inside a public folder.
  • No model for representing groups of Agents, or other runtime checks for authorization.
  • No way to limit delete access separately from write rights
  • No way to request a set of rights for a Resource

Authentication in Atomic Data

Atomic Data uses Hierarchies to describe who gets to access some resource, and who can edit it. When an Agent wants to edit a resource, they have to send a signed Commit. But how do we deal with reading data, how do we know who is trying to get access?

Design goals

  • Secure: Because, what's the point of authentication if it's not?
  • Ease of use: Setting up an identity should not require any effort, and proving identity should be minimal effort.
  • Anonymity allowed: Users should be able to have multiple identities, some of which are fully anonymous.
  • Self-sovereign: No dependency on servers that users don't control. Or at least, minimise this.
  • Dummy-proof: We need a mechanism for dealing with forgetting passwords / client devices losing data.
  • Compatible with Commits: Atomic Commits require clients to sign things. Ideally, this functionality / strategy would also fit with the new model.
  • Fast: Of course, authentication will slow things down. But let's keep that to a minimum.

Authentication is done by signing individual (HTTP) requests with the Agent's private key.

Sending a request

Here's an example (js) client side implementation with comments:

// Builds the authentication headers for a request.
// The `subject` is the full HTTP url that is to be fetched.
// `agent` and `signToBase64` are assumed to be provided by your client library.
// The Private Key of the agent is used for signing.
async function signRequest(subject, agent, privateKey) {
  // The current time as milliseconds since unix epoch
  const timestamp = Date.now();
  // This is what you will need to sign.
  // The timestamp is to limit the harm of a man-in-the-middle attack.
  const message = `${subject} ${timestamp}`;
  // Sign using Ed25519
  const signed = await signToBase64(message, privateKey);
  // Set all of these headers
  const headers = new Headers();
  headers.set('x-atomic-public-key', await agent.getPublicKey());
  headers.set('x-atomic-signature', signed);
  headers.set('x-atomic-timestamp', timestamp.toString());
  headers.set('x-atomic-agent', agent.subject);
  return headers;
}

Handling a request

  • If none of the x-atomic HTTP headers are present, the server assigns the PublicAgent to the request. This Agent represents any guest who is not signed in.
  • If some (but not all) of the x-atomic headers are present, the server will return with a 500.
  • The server must check the timestamp (max 10 seconds difference).
  • The server must check whether the public key matches the one from the Agent.
  • The server must check if the signature is valid.
  • The server must check if the requested resource can be shared.
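The checks above could be sketched on the server side like this. The authenticate function, its return shape, the PUBLIC_AGENT placeholder URL and the injected getPublicKey / verify helpers are all assumptions for illustration; only the header names, the 10 second window, the 500 response and the signed message format come from the text above.

```javascript
const AUTH_HEADERS = [
  "x-atomic-public-key",
  "x-atomic-signature",
  "x-atomic-timestamp",
  "x-atomic-agent",
];
const MAX_CLOCK_DIFFERENCE_MS = 10 * 1000;
// Placeholder for however the server represents the PublicAgent (hypothetical URL).
const PUBLIC_AGENT = "https://example.com/agents/publicAgent";

// `headers` is a plain object with lowercase header names.
// `getPublicKey(agentUrl)` looks up the key stored on the Agent and
// `verify(message, signature, publicKey)` checks the Ed25519 signature;
// both are injected, hypothetical helpers.
function authenticate(headers, subject, now, getPublicKey, verify) {
  const present = AUTH_HEADERS.filter((name) => headers[name] !== undefined);
  if (present.length === 0) return { ok: true, agent: PUBLIC_AGENT };
  if (present.length < AUTH_HEADERS.length) {
    return { ok: false, status: 500, reason: "incomplete x-atomic headers" };
  }
  const timestamp = Number(headers["x-atomic-timestamp"]);
  if (!Number.isFinite(timestamp) || Math.abs(now - timestamp) > MAX_CLOCK_DIFFERENCE_MS) {
    return { ok: false, reason: "timestamp out of range" };
  }
  const agent = headers["x-atomic-agent"];
  const publicKey = headers["x-atomic-public-key"];
  if (getPublicKey(agent) !== publicKey) {
    return { ok: false, reason: "public key does not match Agent" };
  }
  // The client signed "<subject> <timestamp>".
  const message = `${subject} ${headers["x-atomic-timestamp"]}`;
  if (!verify(message, headers["x-atomic-signature"], publicKey)) {
    return { ok: false, reason: "invalid signature" };
  }
  // Whether this Agent may read the requested resource is an authorization
  // question and is checked separately.
  return { ok: true, agent };
}
```

Injecting the key lookup and the signature check keeps the sketch independent of any particular crypto library.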

Authentication for websockets

  • Since there's only a single HTTP request, we don't have a subject to fetch. Use ws as a subject, so sign a string like ws 12940791247.

Limitations / considerations

  • Since we need the Private Key to sign Commits and requests, the client should have this available. This means the client software as well as the user should deal with key management.

Invitations & Tokens

Discussion: https://github.com/ontola/atomic-data/issues/23

At some point while working on something in a web application, you're pretty likely to want to share it, often not with the entire world. In order to make this process of inviting others as simple as possible, we've come up with an Invitation standard.

Design goals

  • Edit without registration. Be able to edit or view things without being required to complete a registration process.
  • Share with a single URL. A single URL should contain all the information needed.
  • (Un)limited URL usage. A URL might be re-usable, or maybe not.


  1. The Owner of a resource creates an Invite. This Invite points to a target Resource, provides read rights by default but can additionally add write rights, and contains a number of usagesLeft.
  2. The Guest opens the Invite URL. This returns the Invite resource, which provides the client with the information needed to do the next request which adds the actual rights.
  3. The browser client app might generate a set of keys, or use an existing one. It sends the Agent URL to the Invite in a query param.
  4. The server will respond with a Redirect resource, which links to the newly granted target resource.
  5. The Guest will now be able to access the Resource.
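The server side of this flow (steps 3 and 4) might look roughly like the sketch below. Everything here is an assumption made for illustration: the acceptInvite helper, the shortname keys (target, write, usagesLeft, read), the shape of the redirect, and treating an absent usagesLeft as unlimited.

```javascript
// Applies an Invite for the given Agent and returns the updated Invite,
// the updated target Resource and a redirect pointing at the target.
function acceptInvite(invite, target, agentUrl) {
  const limited = invite.usagesLeft !== undefined; // assumption: absent means unlimited
  if (limited && invite.usagesLeft <= 0) {
    throw new Error("Invite has no usages left");
  }
  // Adds the Agent to a list of Agents, without duplicating it.
  const grant = (agents = []) =>
    agents.includes(agentUrl) ? agents : [...agents, agentUrl];

  // Read rights are given by default, write rights only if the Invite says so.
  const updatedTarget = { ...target, read: grant(target.read) };
  if (invite.write === true) updatedTarget.write = grant(target.write);

  const updatedInvite = limited
    ? { ...invite, usagesLeft: invite.usagesLeft - 1 }
    : invite;
  return { updatedInvite, updatedTarget, redirect: { destination: invite.target } };
}

const invite = { target: "https://example.com/doc", write: true, usagesLeft: 1 };
const result = acceptInvite(invite, {}, "https://example.com/agents/guest");
console.log(result.updatedTarget, result.updatedInvite.usagesLeft);
```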

Try it on https://atomicdata.dev/invites/1

Atomic Commits

Disclaimer: Work in progress, prone to change.

Atomic Commits is a specification for communicating state changes (events / transactions / patches / deltas / mutations) of Atomic Data. It is the part of Atomic Data that is concerned with writing, editing, removing and updating information.

Design goals

  • Event sourced: Store and standardize changes, as well as the current state. This enables versioning, history playback, undo, audit logs, and more.
  • Traceable origin: Every change should be traceable to an actor and a point in time.
  • Verifiable: Have cryptographic proof for every change. Know when, and what was changed by whom.
  • Identifiable: A single commit has an identifier - it is a resource.
  • Decentralized: Commits can be shared in P2P networks from device to device, whilst maintaining verifiability.
  • Extensible: The methods inside a commit are not fixed. Use-case specific methods can be added by anyone.
  • Streamable: The commits could be used in streaming context.
  • Familiar: Introduces as little new stuff as possible (no new formats or language to learn)
  • Pub/Sub: Subscribe to changes and get notified on changes.
  • ACID-compliant: An Atomic commit will only occur if it results in a valid state.
  • Atomic: All the Atomic Data design goals also apply here.


Although it's a good idea to keep data at the source as much as possible, we'll often need to synchronize two systems. For example when data has to be queried or indexed differently than its source can support. Doing this synchronization can be very difficult, since most of our software is designed to only maintain and share the current state of a system.

I noticed this mainly when working on OpenBesluitvorming.nl - an open data project where we aimed to fetch and standardize meeting data (votes, meeting minutes, documents) from 150+ local governments in the Netherlands. We wrote software that fetched data from various systems (who all had different models, serialization formats and APIs), transformed this data to a single standard and shared it through an API and a fulltext search endpoint. One of the hard parts was keeping our data in sync with the sources. How could we know if something was changed upstream? We queried all these systems every night for all meetings from the next and previous month, and made deep comparisons to our own data.

This approach has a couple of issues:

  • It costs a lot of resources, both for us and for the data suppliers.
  • It's not real-time - we can only run this once every 24 hours (because of how costly it is).
  • It's very prone to errors. We've had issues during all phases of Extraction, Transformation and Loading (ETL) processing.
  • It causes privacy issues. When some data at the source is removed (because it contained faulty or privacy sensitive data), how do we learn about that?

Persisting and sharing state changes could solve these issues. In order for this to work, we need to standardize this for all data suppliers. We need a specification that is easy to understand for most developers.

Keeping track of where data comes from is essential to knowing whether you can trust it - whether you consider it to be true. When you want to persist data, that quickly becomes bothersome. Atomic Data and Atomic Commits aim to make this easier by using cryptography for ensuring data comes from some particular source, and is therefore trustworthy.

If you want to know how Atomic Commits differ from other specs, see the compare section.

Atomic Commits: Concepts


url: https://atomicdata.dev/classes/Commit

A Commit is a Resource that describes how a Resource must be updated. It can be used for auditing, versioning and feeds. It is cryptographically signed by an Agent.

The required fields are:

  • subject - The thing being changed. The Subject URL (HTTP identifier) of the Resource that the Commit is changing. A Commit Subject must not contain query parameters, as these are reserved for dynamic resources.
  • signer - Who's making the change. The Atomic URL of the Author's profile - which in turn must contain a publicKey.
  • signature - Cryptographic proof of the change. A hash of the JSON-AD serialized Commit (without the signature field), signed by the Agent's private-key. This proves that the author is indeed the one who created this exact commit. The signature of the Commit is also used as the identifier of the commit.
  • created-at - When the change was made. A UNIX timestamp number of when the commit was created.

The optional method fields describe how the data must be changed:

  • destroy - If true, the existing Resource will be removed.
  • remove - an array of Properties that need to be removed (including their values).
  • set - a Nested Resource which contains all the new or edited fields.

These commands are executed in the order above. This means that you can set destroy to true and include set, which empties the existing resource and sets new values.
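The order of these methods can be made explicit in a small sketch. It assumes a Resource is held as a plain object keyed by Property URL and that the Commit uses the shortnames destroy, remove and set as keys; the applyCommit helper is hypothetical, and signature and rights checks are left out.

```javascript
// Applies the method fields of a Commit to a Resource and returns the new state.
// The order follows the text: destroy, then remove, then set.
function applyCommit(resource, commit) {
  // destroy: start from an empty Resource instead of the existing one.
  const next = commit.destroy === true ? {} : { ...resource };
  // remove: drop the listed Properties, including their values.
  for (const property of commit.remove ?? []) {
    delete next[property];
  }
  // set: add new fields or overwrite edited ones.
  Object.assign(next, commit.set ?? {});
  return next;
}

const before = {
  "https://example.com/name": "John",
  "https://example.com/age": 30,
};
const after = applyCommit(before, {
  remove: ["https://example.com/age"],
  set: { "https://example.com/name": "Johnny" },
});
console.log(after);
```

Because the function returns a new object, the previous state stays available, which fits the idea of building a version history from Commits.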

Posting commits using HTTP

Since Commits contain cryptographic proof of authorship, they can be accepted at a public endpoint. There is no need for authentication.

A commit should be sent (using an HTTPS POST request) to a /commit endpoint of an Atomic Server. The server then checks the signature and the author rights, and responds with a 2xx status code if it succeeded, or a 5xx error if something went wrong. The error will be a JSON object.

Serialization with JSON-AD

Let's look at an example Commit:

{
  "@id": "https://atomicdata.dev/commits/3n+U/3OvymF86Ha6S9MQZtRVIQAAL0rv9ZQpjViht4emjnqKxj4wByiO9RhfL+qwoxTg0FMwKQsNg6d0QU7pAw==",
  "https://atomicdata.dev/properties/createdAt": 1611489929370,
  "https://atomicdata.dev/properties/isA": [
    "https://atomicdata.dev/classes/Commit"
  ],
  "https://atomicdata.dev/properties/set": {
    "https://atomicdata.dev/properties/shortname": "1611489928"
  },
  "https://atomicdata.dev/properties/signature": "3n+U/3OvymF86Ha6S9MQZtRVIQAAL0rv9ZQpjViht4emjnqKxj4wByiO9RhfL+qwoxTg0FMwKQsNg6d0QU7pAw==",
  "https://atomicdata.dev/properties/signer": "https://surfy.ddns.net/agents/9YCs7htDdF4yBAiA4HuHgjsafg+xZIrtZNELz4msCmc=",
  "https://atomicdata.dev/properties/subject": "https://atomicdata.dev/test"
}

This Commit can be sent to any Atomic Server. This server, in turn, should verify the signature and the author's rights before the server applies the Commit.

Calculating the signature

The signature is a base64 encoded Ed25519 signature of the deterministically serialized Commit. Calculating the signature is a delicate process that should be followed to the letter - even a single character in the wrong place will result in an incorrect signature, which makes the Commit invalid.

The first step is serializing the commit deterministically. This means that the process will always end in the exact same string.

  • Serialize the Commit as JSON-AD.
  • Do not serialize the signature field.
  • Do not include empty objects or arrays.
  • If destroy is false, do not include it.
  • All keys are sorted alphabetically, both in the root object and in any nested objects.
  • The JSON-AD is minified: no newlines, no spaces.

This will result in a string. The next step is to sign this string using the Ed25519 private key from the Author. This signature is a byte array, which should be encoded in base64 for serialization. Make sure that the Author's URL resolves to a Resource that contains the linked public key.
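The serialization rules listed above can be sketched as follows. This is not one of the reference implementations: the serializeForSigning helper is hypothetical, the destroy Property URL is assumed to follow the same pattern as the other Properties, and empty objects / arrays are only filtered at the root. The signing step itself is left out.

```javascript
const SIGNATURE = "https://atomicdata.dev/properties/signature";
// Assumed URL, following the pattern of the other Commit Properties.
const DESTROY = "https://atomicdata.dev/properties/destroy";

function isEmptyContainer(value) {
  if (Array.isArray(value)) return value.length === 0;
  return value !== null && typeof value === "object" && Object.keys(value).length === 0;
}

// Recursively rebuilds objects with sorted keys; arrays keep their order.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value === null || typeof value !== "object") return value;
  const sorted = {};
  for (const key of Object.keys(value).sort()) {
    sorted[key] = sortKeys(value[key]);
  }
  return sorted;
}

// Returns the minified, key-sorted JSON string that should be signed.
function serializeForSigning(commit) {
  const filtered = {};
  for (const [key, value] of Object.entries(commit)) {
    if (key === SIGNATURE) continue; // never serialize the signature
    if (key === DESTROY && value === false) continue; // omit destroy: false
    if (isEmptyContainer(value)) continue; // omit empty objects and arrays
    filtered[key] = value;
  }
  // JSON.stringify without an indent argument emits no newlines or spaces.
  return JSON.stringify(sortKeys(filtered));
}

console.log(serializeForSigning({
  "https://atomicdata.dev/properties/subject": "https://atomicdata.dev/test",
  "https://atomicdata.dev/properties/createdAt": 1611489929370,
  [DESTROY]: false,
}));
```

One caveat of this approach: JavaScript objects always list integer-like keys first, regardless of insertion order. That does not matter for Property URLs, but it would for arbitrary keys.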

Congratulations, you've just created a valid Commit!

Here are currently working implementations of this process, including serialization and signing (links are permalinks).

If you want to validate your implementation, check out the tests for these two projects.


  • Commits adjust only one Resource at a time, which means that you cannot change multiple in one commit.
  • The one creating the Commit will need to sign it, which may make clients that write data more complicated than you'd like.
  • Commits require signatures, which means key management. Doing this securely is no trivial matter.
  • The signatures require JSON-AD serialization
  • If your implementation stores all Commits, this means

Atomic Commits compared to other (RDF) delta models

Let's compare the Atomic Commit approach with some existing protocols for communicating state changes / patches / mutations / deltas in linked data, JSON and text files. First, I'll briefly discuss the existing examples (open a PR / issue if we're missing something!). After that, we'll discuss how Atomic Data differs from the existing ones.


Git

This might be an odd one in this list, but it is an interesting one nonetheless. Git is an incredibly popular version control system that is used by most software developers to manage their code. It's a decentralized concept which allows multiple computers to share a log of commits, which together represent a folder with its files and its history. It uses hashing to represent (parts of) data (which keeps the .git folder compact through deduplication), and uses cryptographic keys to sign commits and verify authorship. It is designed to work in the paradigm of text files, newlines and folders. Since most data can be represented as text files in a folder, Git is very flexible. This is partly because people are familiar with Git, but also because it has a great ecosystem - platforms such as GitHub provide a clean UI, cloud storage, issue tracking, authorization, authentication and more for free, as long as you use Git to manage your versions.

However, Git doesn't work great for structured data - especially when it changes a lot. Git, on its own, does not perform any validations on integrity of data. Git also does not adhere to some standardized serialization format for storing commits, which makes sense, because it was designed as a tool to solve a problem, and not as some standard that is to be used in various other systems. Also, git is kind of a heavyweight abstraction for many applications. It is designed for collaborating on open source projects, which means dealing with decentralized data storage and merge conflicts - things that might not be required in other kinds of scenarios.

RDF mutation systems



RDF-Delta

Describes changes (RDF Patches) in a specialized turtle-like serialization format.

TX .
PA "rdf" "http://www.w3.org/1999/02/22-rdf-syntax-ns#" .
PA "owl" "http://www.w3.org/2002/07/owl#" .
PA "rdfs" "http://www.w3.org/2000/01/rdf-schema#" .
A <http://example/SubClass> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
A <http://example/SubClass> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example/SUPER_CLASS> .
A <http://example/SubClass> <http://www.w3.org/2000/01/rdf-schema#label> "SubClass" .
TC .

Similar to Atomic Commits, these Deltas should have identifiers (URLs), which are denoted in a header.



Spec for classifying and representing state changes between two RDF resources. I wasn't able to find a serialization or an implementation for this.



PatchR

An ontology for RDF change requests. Looks very interesting, but I'm not able to find any implementations.

@prefix :     <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix pat:  <http://purl.org/hpi/patchr#> .
@prefix guo:  <http://webr3.org/owl/guo#> .
@prefix prov: <http://purl.org/net/provenance/ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dbp:  <http://dbpedia.org/resource/> .
@prefix dbo:  <http://dbpedia.org/ontology/> .

:Patch_15 a pat:Patch ;
  pat:appliesTo <http://dbpedia.org/void.ttl#DBpedia_3.5> ;
  pat:status pat:Open ;
  pat:update [
    a guo:UpdateInstruction ;
    guo:target_graph <http://dbpedia.org/> ;
    guo:target_subject dbp:Oregon ;
    guo:delete [dbo:language dbp:De_jure ] ;
    guo:insert [dbo:language dbp:English_language ]
  ] ;
  prov:wasGeneratedBy [a prov:Activity ;
  pat:confidence "0.5"^^xsd:decimal ;
  prov:wasAssociatedWith :WhoKnows ;
  prov:actedOnBehalfOf :WhoKnows#Player_25 ;
  prov:performedAt "..."^^xsd:dateTime ] .



LD-Patch

This offers quite a few features besides adding and deleting triples, such as updating lists. It's a unique serialization format, inspired by Turtle. Some implementations exist, such as one in Ruby.

PATCH /timbl HTTP/1.1
Host: example.org
Content-Length: 478
Content-Type: text/ldpatch
If-Match: "abc123"

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix profile: <http://ogp.me/ns/profile#> .
@prefix ex: <http://example.org/vocab#> .

Delete { <#> profile:first_name "Tim" } .
Add {
  <#> profile:first_name "Timothy" ;
    profile:image <https://example.org/timbl.jpg> .
} .

Bind ?workLocation <#> / schema:workLocation .
Cut ?workLocation .

UpdateList <#> ex:preferredLanguages 1..2 ( "fr-CH" ) .

Bind ?event <#> / schema:performerIn [ / schema:url = <https://www.w3.org/2012/ldp/wiki/F2F5> ]  .
Add { ?event rdf:type schema:Event } .

Bind ?ted <http://conferences.ted.com/TED2009/> / ^schema:url ! .
Delete { ?ted schema:startDate "2009-02-04" } .
Add {
  ?ted schema:location [
    schema:name "Long Beach, California" ;
    schema:geo [
      schema:latitude "33.7817" ;
      schema:longitude "-118.2054"
    ]
  ]
} .



Linked-Delta

An N-Quads serialized delta format. Methods are URLs, which means they are extensible. Does not specify how to bundle lines. Used in production of a web app that we're working on (Argu.co). Designed with simplicity (no new serialization format, simple to parse) and performance in mind.

Initial state:

<http://example.org/resource> <http://example.org/predicate> "Old value 🙈" .


<http://example.org/resource> <http://example.org/predicate> "New value 🐵" <http://purl.org/linked-delta/replace> .

New state:

<http://example.org/resource> <http://example.org/predicate> "New value 🐵" .



A JSON denoted patch notation for RDF. Seems similar to the RDF/JSON serialization format. Uses string literals as operators / methods. Conceptually perhaps most similar to linked-delta.

Has a JS implementation.

[
  {
    "op": "add",
    "s": "http://example.org/my/resource",
    "p": "http://example.org/ontology#title",
    "o": {
      "value": "New Title",
      "type": "http://www.w3.org/2001/XMLSchema#string"
    }
  }
]



SPARQL Update

SPARQL queries that change data.

PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT DATA
{
  <http://example/book1> dc:title "A new book" ;
                         dc:creator "A.N.Other" .
}

Allows for very powerful queries, combined with updates. E.g. rename all persons named Bill to William:

PREFIX foaf:  <http://xmlns.com/foaf/0.1/>

WITH <http://example/addresses>
DELETE { ?person foaf:givenName 'Bill' }
INSERT { ?person foaf:givenName 'William' }
WHERE
  { ?person foaf:givenName 'Bill'
  }

SPARQL Update is the most powerful of the formats, but also perhaps the most difficult to implement and understand.



JSON Patch

A simple way to edit JSON objects:

The original document

{
  "baz": "qux",
  "foo": "bar"
}

The patch

[
  { "op": "replace", "path": "/baz", "value": "boo" },
  { "op": "add", "path": "/hello", "value": ["world"] },
  { "op": "remove", "path": "/foo" }
]

The result

{
  "baz": "boo",
  "hello": ["world"]
}

It uses the JSON Pointer spec for denoting paths, and it has many implementations in various languages.
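
The three operations used above can be sketched in a few lines of TypeScript. This is a deliberately limited illustration, not a complete implementation: the `applyPatch` function is a hypothetical name, and it only handles single-segment paths such as "/baz" on a flat object, whereas JSON Pointer also allows nested paths and array indices.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

type JsonObject = { [key: string]: JsonValue };

type PatchOp =
  | { op: "add" | "replace"; path: string; value: JsonValue }
  | { op: "remove"; path: string };

// Applies a patch to a flat object and returns a new object.
// Assumption: every path is a single segment like "/baz".
function applyPatch(doc: JsonObject, patch: PatchOp[]): JsonObject {
  const result: JsonObject = { ...doc };
  for (const operation of patch) {
    const key = operation.path.slice(1); // strip the leading "/"
    if (operation.op === "remove") {
      if (!(key in result)) throw new Error(`Cannot remove missing key: ${key}`);
      delete result[key];
    } else if (operation.op === "replace") {
      if (!(key in result)) throw new Error(`Cannot replace missing key: ${key}`);
      result[key] = operation.value;
    } else {
      // "add" creates the key, or overwrites it when it already exists.
      result[key] = operation.value;
    }
  }
  return result;
}

const original: JsonObject = { baz: "qux", foo: "bar" };

const patched = applyPatch(original, [
  { op: "replace", path: "/baz", value: "boo" },
  { op: "add", path: "/hello", value: ["world"] },
  { op: "remove", path: "/foo" },
]);
```

Here `patched` has the same content as the result shown above, and `original` is left untouched because the function works on a shallow copy.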

Atomic Commits

Let's talk about the differences between the concepts above and Atomic Commits.

For starters, Atomic Commits can only work with a specific subset of RDF, namely Atomic Data. RDF allows for blank nodes, does not have subject-predicate uniqueness and offers named graphs - which all make it hard to unambiguously select a single value. Most of the alternative patch / delta models described above had to support these concepts. Atomic Data is more strict and constrained than RDF. It does not support named graphs and blank nodes. This enables a simpler approach to describing state changes, but it also means that Atomic Commits will not work with most existing RDF data.

Secondly, individual Atomic Commits are tightly coupled to specific Resources. A single Commit cannot change multiple resources, whereas most of the models discussed above do enable this. This is a big constraint, and it does not allow for things like compact migrations in a database. However, this resource-bound constraint opens up some interesting possibilities:

  • it becomes easier to combine it with authorization (i.e. check if the person has the correct rights to edit some resource): simply check if the Author has the rights to edit the Subject.
  • it makes it easier to find all Commits for a Resource, which is useful when constructing a history / audit log / previous version.
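
Both possibilities can be sketched in TypeScript. The `Commit` shape below is a hypothetical, heavily simplified stand-in (a real Commit carries more fields, including the changes and a signature), and the `Rights` table, `mayApply` and `historyOf` are names invented for this illustration.

```typescript
// Hypothetical, simplified Commit: one subject, one author, one timestamp.
type Commit = { subject: string; author: string; createdAt: number };

// Hypothetical rights table: resource subject -> agents that may edit it.
type Rights = Map<string, Set<string>>;

// Because a Commit targets exactly one resource, authorization is one lookup.
function mayApply(commit: Commit, rights: Rights): boolean {
  const editors = rights.get(commit.subject);
  return editors !== undefined && editors.has(commit.author);
}

// The history of a resource is every Commit with that subject, oldest first.
function historyOf(subject: string, commits: Commit[]): Commit[] {
  return commits
    .filter((commit) => commit.subject === subject)
    .sort((a, b) => a.createdAt - b.createdAt);
}

const rights: Rights = new Map<string, Set<string>>();
rights.set("https://example.com/doc", new Set<string>(["https://example.com/agents/alice"]));

const commits: Commit[] = [
  { subject: "https://example.com/doc", author: "https://example.com/agents/alice", createdAt: 20 },
  { subject: "https://example.com/other", author: "https://example.com/agents/bob", createdAt: 15 },
  { subject: "https://example.com/doc", author: "https://example.com/agents/alice", createdAt: 10 },
];

const docHistory = historyOf("https://example.com/doc", commits);
```

A model in which one patch can touch many resources would need to check rights for every affected subject and could not index its history by a single key.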

Thirdly, Atomic Commits don't introduce a new serialization format. It's just JSON. This means that it will feel familiar for most developers, and will be supported by many existing environments.

Finally, Atomic Commits use cryptography (signatures) to determine the authenticity of commits. This concept is borrowed from git commits, which can also be signed to prove authorship. As is the case with git, this also allows for verifiable P2P sharing of changes.

WebSockets in Atomic Data

WebSockets are a very fast and efficient way to have a client and server communicate in an asynchronous fashion. They are used in Atomic Data to allow real-time updates, which makes it possible to create things like collaborative applications and multiplayer games. These have been implemented in atomic-server and atomic-data-browser (powered by @tomic/lib).

Initializing a WebSocket connection

Send an HTTP GET request to the /ws endpoint of an atomic-server. The Server should upgrade that request to a secure WebSocket (wss) connection. Use x-atomic authentication headers (read more here) and use ws as a subject when signing.

Client to server messages

  • SUBSCRIBE ${subject} tells the Server that you'd like to receive Commits about this Subject.
  • UNSUBSCRIBE ${subject} tells the Server that you'd like to stop receiving Commits about this Subject.
  • GET ${subject} fetches an individual resource.

Server to client messages

  • COMMIT ${CommitBody} an entire Commit for a resource that you're subscribed to
  • RESOURCE ${CommitBody} a resource as a response to a GET request.
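
The message formats listed above are plain strings, so building and parsing them takes little code. The following TypeScript sketch assumes that the verb and the rest of the message are separated by a single space, as the notation above suggests. The function and type names are hypothetical, and no actual socket is opened here.

```typescript
type ClientVerb = "SUBSCRIBE" | "UNSUBSCRIBE" | "GET";
type ServerMessage = { kind: "COMMIT" | "RESOURCE"; body: string };

// Builds a client-to-server message such as "SUBSCRIBE https://example.com/doc".
function clientMessage(verb: ClientVerb, subject: string): string {
  return `${verb} ${subject}`;
}

// Splits a server-to-client message into its verb and its body.
// Assumption: the body is everything after the first space.
function parseServerMessage(raw: string): ServerMessage {
  const space = raw.indexOf(" ");
  if (space === -1) throw new Error(`Malformed message: ${raw}`);
  const kind = raw.slice(0, space);
  if (kind === "COMMIT" || kind === "RESOURCE") {
    return { kind, body: raw.slice(space + 1) };
  }
  throw new Error(`Unknown message type: ${kind}`);
}

const subscribe = clientMessage("SUBSCRIBE", "https://example.com/doc");
const incoming = parseServerMessage('COMMIT {"example": "body"}');
```

In a real client, `subscribe` would be sent over the open WebSocket, and `incoming.body` would be parsed and applied to the local copy of the resource.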

Atomic Endpoints

URL: https://atomicdata.dev/classes/Endpoint

An Endpoint is a resource that accepts parameters in order to generate a response. You can think of it like a function in a programming language, or an API endpoint in an OpenAPI spec. It can be used to perform calculations on the server side, such as filtering data, sorting data, selecting a page in a collection, or performing some calculation. Because Endpoints are resources, they can be defined and read programmatically. This means that it's possible to render Endpoints as forms.

The most important property in an Endpoint is parameters, which is the list of Properties that can be filled in.

You can find a list of Endpoints supported by Atomic-Server on atomicdata.dev/endpoints.

Endpoint Resources are dynamic, because their properties could be calculated server-side. When a Property tends to be calculated server-side, it will have an isDynamic property set to true, which tells the client that it's probably useless to try to overwrite it.

A Server can also send a partial Resource for an Endpoint to the client, which means that some properties may be missing. When this is the case, the Resource will have an incomplete property set to true. This tells the client that it has to individually fetch the resource from the server to get the full body.

Design Goals

  • Familiar API: should look like something that most developers already know
  • Auto-generate forms: a front-end app should present Endpoints as forms that non-developers can interact with

Discussion in issue tracker.

Atomic Collections

URL: https://atomicdata.dev/classes/Collection

Sooner or later, developers will have to deal with (long) lists of items. For example, a set of blog posts, activities or users. These lists often need to be paginated, sorted, and filtered. For dealing with these problems, we have Atomic Collections.

An Atomic Collection is a Resource that links to a set of resources. Note that Collections are designed to be dynamic resources, often (partially) generated at runtime. Collections are Endpoints, which means that part of their properties are calculated server-side. Collections have various filters (subject, property, value) that can help to build a useful query.

  • members: The list of items (members) in the Collection.
  • subject: Filter results by a subject URL.
  • property: Filter results by a property URL.
  • value: Filter results by a Value.
  • sort_by: A property URL by which to sort.
  • sort_desc: Sort descending, instead of ascending. Defaults to false.
  • current_page: The number of the current page.
  • page_size: How many items (members) are visible per page.
  • total_pages: How many pages there are for the current collection.
  • total_items: How many items (members) there are in total.

Persisting Properties vs Query Parameters

Since Atomic Collections are dynamic resources, you can pass query parameters to them. The keys of the query params match the shortnames of the properties of the Collection.

For example, let's take the Properties Collection on atomicdata.dev. We could limit the page size to 2 by adding the page_size=2 query parameter: https://atomicdata.dev/collections/property?page_size=2. Or we could sort the list by the description property: https://atomicdata.dev/collections/property?sort_by=https%3A%2F%2Fatomicdata.dev%2Fproperties%2Fdescription. Note that URL values in query parameters need to be URL-encoded.

These properties of Collections can either be set by passing query parameters, or they can be persisted by the Collection creator / editor.
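
Building such URLs by hand is error-prone because of the encoding, so a small helper can be useful. The TypeScript sketch below uses the shortnames from the list above as keys. The `CollectionParams` type and the `collectionUrl` function are hypothetical names for this illustration and are not part of any Atomic Data library.

```typescript
// Query parameters keyed by the shortnames of the Collection properties.
type CollectionParams = {
  property?: string;
  value?: string;
  sort_by?: string;
  sort_desc?: boolean;
  current_page?: number;
  page_size?: number;
};

// Appends the given parameters to a Collection URL, URL-encoding each value.
function collectionUrl(base: string, params: CollectionParams): string {
  const pairs: string[] = [];
  for (const key of Object.keys(params) as (keyof CollectionParams)[]) {
    const value = params[key];
    if (value === undefined) continue;
    pairs.push(`${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`);
  }
  return pairs.length === 0 ? base : `${base}?${pairs.join("&")}`;
}

const base = "https://atomicdata.dev/collections/property";

// Limit the page size to 2.
const smallPages = collectionUrl(base, { page_size: 2 });

// Sort by the description property; the property URL gets percent-encoded.
const sorted = collectionUrl(base, {
  sort_by: "https://atomicdata.dev/properties/description",
});
```

The two resulting strings are the same URLs as the ones in the example above.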

Interoperability: Relation to other technology

Atomic Data is designed to be easy to use in existing projects, and to be interoperable with existing formats. This section will discuss how Atomic Data differs from or is similar to various data formats and paradigms, and how it can interoperate.

Data formats

  • JSON: Atomic Data is designed to be easily serializable to clean, idiomatic JSON. However, if you want to turn JSON into Atomic Data, you'll have to make sure that all keys in the JSON object are URLs that link to Atomic Properties, and the data itself also has to be available at its Subject URL.
  • RDF: Atomic Data is a strict subset of RDF, and can therefore be trivially serialized to all RDF formats (Turtle, N-triples, RDF/XML, JSON-LD, and others). The other way around is more difficult. Turning RDF into Atomic Data requires that all predicates are Atomic Properties, that the values match their Property's datatype, that the atoms are available at the subject URL, and that the subject-predicate combinations are unique.


  • IPFS: Content-based addressing to prevent 404s and centralization

Database paradigms

  • SQL: How Atomic Data differs from and could interact with SQL databases

Upgrade guide

  • Upgrade: How to make your existing server-side application compatible with Atomic Data

How does Atomic Data relate to RDF?

RDF (the Resource Description Framework) is a W3C specification from 1999 that describes the original data model for linked data. It is the forerunner of Atomic Data, and is therefore highly similar in its model. Both heavily rely on using URLs, and both have a fundamentally simple and uniform model for data statements. Both view the web as a single, connected graph database. Because of that, Atomic Data is also highly compatible with RDF - all Atomic Data is also valid RDF. Atomic Data can be thought of as a more constrained, type safe version of RDF. However, it does differ in some fundamental ways.

  • Atomic calls the three parts of a Triple subject, property and value, instead of subject, predicate, object.
  • Atomic does not support having multiple statements with the same <subject> <predicate>; every combination must be unique.
  • Atomic does not have literals, named nodes and blank nodes - these are all values, but with different datatypes.
  • Atomic uses nested Resources and paths instead of blank nodes
  • Atomic requires URL (not URI) values in its subjects and properties (predicates), which means that they should be resolvable. Properties must resolve to an Atomic Property, which describes its datatype.
  • Atomic only allows those who control a resource's subject URL endpoint to edit the data. This means that you can't add triples about something that you don't control.
  • Atomic has no separate datatype field, but it requires that Properties (the resources that are shown when you follow a predicate value) specify a datatype. However, it is allowed to serialize the datatype explicitly, of course.
  • Atomic has no separate language field, but it does support Translation Resources.
  • Atomic has a native Event (state changes) model (Atomic Commits), which enables communication of state changes
  • Atomic has a native Schema model (Atomic Schema), which helps developers to know what data types they can expect (string, integer, link, array)
  • Atomic does not support Named Graphs. These should not be needed, because all statements should be retrievable by fetching the Subject of a resource. However, it is allowed to include other resources in a response.

Why these changes?

I have been working with RDF for quite some time now, and absolutely believe in some of the core premises of RDF. I started a company that specializes in Linked Data (Ontola), and we use it extensively in our products and services. Using URIs (and even more so URLs, which are URIs that can be fetched) for everything is a great idea, since it helps with interoperability and enables truly decentralized knowledge graphs. However, some of the characteristics of RDF make it hard to use, and have probably contributed to its relative lack of adoption.

It's too hard to select a specific value (object) in RDF

For example, let's say I want to render someone's birthday:

<example:joep> <schema:birthDate> "1991-01-20"^^xsd:date

Rendering this item might be as simple as fetching the subject URL, filtering by predicate URL, and parsing the object as a date.

However, this is also valid RDF:

<example:joep> <schema:birthDate> "1991-01-20"^^xsd:date <example:someNamedGraph>
<example:joep> <schema:birthDate> <example:birthDateObject> <example:someOtherNamedGraph>
<example:joep> <schema:birthDate> "20th of januari 1991"@en <example:someNamedGraph>
<example:joep> <schema:birthDate> "20 januari 1991"@nl <example:someNamedGraph>
<example:joep> <schema:birthDate> "2000-02-30"^^xsd:date <example:someNamedGraph>

Now things get more complicated if you just want to select the original birthdate value:

  1. Select the named graph. The triple containing that birthday may exist in some named graph different from the subject URL, which means that I first need to identify and fetch that graph.
  2. Select the subject.
  3. Select the predicate.
  4. Select the datatype. You probably need a specific datatype (in this case, a Date), so you need to filter the triples to match that specific datatype.
  5. Select the language. Same could be true for language, too, but that is not necessary in this birthdate example.
  6. Select the specific triple. Even after all our previous selectors, we still might have multiple values. How do I know which is the triple I'm supposed to use?

To be fair, with a lot of RDF data, only steps 2 and 3 are needed, since there are often no subject-predicate collisions. And if you control the data of the source, you can set any constraints that you like, including subject-predicate uniqueness. But if you're building a system that uses arbitrary RDF, that system also needs to deal with steps 1, 4, 5 and 6. That often means writing a lot of conditionals and other client-side logic to get the value that you need. It also means that serializing to a format like JSON becomes complicated: you can't just map predicates to keys, because you might get collisions. And you can't use key-value stores for storing RDF, at least not in a trivial way. Every single selected value should be treated as an array of unknown datatypes, and that makes it really difficult to build software. All this complexity is the direct result of the lack of subject-predicate uniqueness.
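
The selection steps can be sketched in TypeScript to show where the ambiguity comes from. The `Quad` and `Selector` types and the `select` function are hypothetical, and the prefixed names from the example above are kept as plain strings instead of being expanded to full URIs.

```typescript
type Quad = {
  subject: string;
  predicate: string;
  object: string;
  datatype?: string;
  language?: string;
  graph?: string;
};

type Selector = {
  subject: string;
  predicate: string;
  graph?: string;
  datatype?: string;
  language?: string;
};

// Steps 1 to 5: keep only the quads that match every selector that was given.
function select(quads: Quad[], selector: Selector): Quad[] {
  return quads.filter(
    (quad) =>
      quad.subject === selector.subject &&
      quad.predicate === selector.predicate &&
      (selector.graph === undefined || quad.graph === selector.graph) &&
      (selector.datatype === undefined || quad.datatype === selector.datatype) &&
      (selector.language === undefined || quad.language === selector.language),
  );
}

const birthDate = { subject: "example:joep", predicate: "schema:birthDate" };

// The five statements from the example above.
const quads: Quad[] = [
  { ...birthDate, object: "1991-01-20", datatype: "xsd:date", graph: "example:someNamedGraph" },
  { ...birthDate, object: "example:birthDateObject", graph: "example:someOtherNamedGraph" },
  { ...birthDate, object: "20th of januari 1991", language: "en", graph: "example:someNamedGraph" },
  { ...birthDate, object: "20 januari 1991", language: "nl", graph: "example:someNamedGraph" },
  { ...birthDate, object: "2000-02-30", datatype: "xsd:date", graph: "example:someNamedGraph" },
];

const candidates = select(quads, {
  ...birthDate,
  graph: "example:someNamedGraph",
  datatype: "xsd:date",
});
```

Even with the graph and the datatype fixed, `candidates` still contains two quads, so the client needs some additional rule (step 6) to decide which one to use.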

As a developer who uses RDF data, I want to be able to do something like this:

// Fetches the resource
const joep = get("https://example.com/person/joep")
// Returns the value of the birthDate atom
console.log(joep.birthDate()) // => Date(1991-01-20)
// Fetches the employer relation at possibly some other domain, checks that resource for a property with the 'name' shortkey
console.log(joep.employer().name()) // => "Ontola.io"

Basically, I'd like to use all knowledge of the world as if it were a big JSON object. Being able to do that requires using some things that are present in JSON, and using some things that are present in RDF.

  • Traverse data on various domains (which is already possible with RDF)
  • Have unique subject-predicate combinations (which is default in JSON)
  • Map property URLs to keys (which often requires local mapping with RDF, e.g. in JSON-LD)
  • Link properties to datatypes (which is possible with ontologies like SHACL / SHEX)

Less focus on semantics, more on usability

One of the core ideas of the semantic web is that anyone should be able to say anything about anything, using semantic triples. This is one of the reasons why it can be so hard to select a specific value in RDF. When you want to make all graphs mergeable (which is a great idea), but also want to allow anyone to create any triples about any subject, you get subject-predicate non-uniqueness. For the Semantic Web, having semantic triples is great. For linked data, and connecting datasets, having atomic triples (with unique subject-predicate combinations) seems preferable. Atomic Data chooses a more constrained approach, which makes it easier to use the data, but at the cost of some expressiveness.

Changing the names

RDF's subject, predicate and object terminology can be confusing to newcomers, so Atomic Data uses subject, property, value. This more closely resembles common CS terminology. (discussion)

Subject + Predicate uniqueness

As discussed above, in RDF, it's very much possible for a graph to contain multiple statements that share both a subject and a predicate. This is probably because of two reasons:

  1. RDF graphs must always be mergeable (just like Atomic Data).
  2. Anyone can make any statement about any subject (unlike Atomic Data, see next section).

However, this introduces a lot of extra complexity for data users (see above), which makes it not very attractive to use RDF in any client. Whereas most languages and datatypes have key-value uniqueness that allows for unambiguous value selection, RDF clients have to deal with the possibility that multiple triples with the same subject-predicate combination might exist. It also introduces a different problem: how should you interpret a set of triples that share a subject-predicate combination? Does this represent a non-ordered collection, or did something go wrong with setting values? In the RDF world, I've seen many occurrences of both.

Atomic Data requires subject-property uniqueness, which means that these issues are no more. However, in order to guarantee this, and still retain graph merge-ability we also need to limit who creates statements about a subject:

Limiting subject usage

RDF allows anne.com to create and host statements about the subject john.com. In other words, domain A creates statements about domain B. It allows anyone to say anything about any subject, thus allowing for extending data that is not under your control.

For example, developers at both Ontola and Inrupt (two companies that work a lot with RDF) use this feature to extend the Schema.org ontology with translations. This means they can still use standards from Schema.org, and have their own translations of these concepts.

However, I think this is a flawed approach. In the example above, two companies are adding statements about a subject. In this case, both are adding translations. They're doing the same work twice. And as more and more people will use that same resource, they will be forced to add the same translations, again and again.

I think one of the core perks of linked data is being able to make your information highly re-usable. When you've created statements about an external thing, these statements are hard to re-use.

This means that someone using RDF data about domain B cannot know that domain B is actually the source of the data. Knowing where data comes from is one of the great things about URIs, but RDF does not require that you can think of subjects as the source of data. Many subjects in RDF don't actually resolve to all the known triples about that subject. It would make the conceptual model way simpler if statements about a subject could only be made by the owner of the domain that hosts the subject. When triples are created about a resource in a place other than where the subject is hosted, these triples are hard to share.

RDF projects deal with this by using named graphs. As a consequence, all systems that use these triples should keep track of another field for every atom. To make things worse, it makes subject-predicate uniqueness impossible to guarantee. That's a high price to pay.

I've asked two RDF developers (who did not know each other) about limiting subject usage, and both were critical. Interestingly, they provided the same use case for using named graphs that would conflict with the limiting subject usage constraint. They both wanted to extend the schema.org ontology by adding properties to these items in a local graph. I don't think even this use case is appropriate for named graphs. They were actually using an external resource that did not provide them with the things they needed. The things that they would add (the translations) are not re-usable, so in the end they will just keep spreading a URL that doesn't provide people with the things that they will come to expect. The schema.org URL still won't provide the translations that they wrote! I believe a better solution is to copy the resource (in this case a part of the schema.org ontology), extend it, host it somewhere else, and use that URL. Or even better: have a system for sharing your change suggestions with the source of the data, and allow for easy collaboration on ontologies.

No more literals / named nodes

In RDF, an object can either be a named node, blank node or literal. A literal has a value, a datatype and an optional language (if the literal is a string). Although RDF statements are often called triples, a single statement can consist of five fields: subject, predicate, object, language, datatype. Having five fields is way more than most information systems. Usually we have just key and value. This difference leads to compatibility issues when using RDF in applications. In practice, clients have to run a lot of checks before they can use the data - which makes RDF in most contexts harder to use than something such as JSON.

Atomic Data drops the named node / literal distinction. We just have values, and they are interpreted by looking at the datatype, which is defined in the property. When a value is a URL, we don't call it a named node, but we simply use a URL datatype.

Requiring URLs

A URL (Uniform Resource Locator) is a specific and cooler version of a URI (Uniform Resource Identifier), because a URL tells you where you can find more information about this thing (hence Locator).

RDF allows any type of URI for subject and predicate values, which means they can be URLs, but don't have to be. This means they don't always resolve, or even function as locators. The links don't work, and that restricts how useful the links are. Atomic Data takes a different approach: these links MUST resolve. Requiring Properties to resolve is part of what enables the type system of Atomic Schema - they provide the shortname and datatype.

Requiring URLs makes things easier for data users, but makes things a bit more difficult for the data producer. With Atomic Data, the data producer MUST offer the data at the URL of the subject. This is a challenge that requires tooling, which is why I've built Atomic-Server: an easy to use, performant, open source data management system.

Making sure that links actually work offers tremendous benefits for data consumers, and that advantage is often worth the extra trouble.

Replace blank nodes with paths

Blank (or anonymous) nodes are RDF resources with identifiers that exist only locally. In other words, their identifiers are not URLs. They make life easier for data producers, who can easily create (nested) resources without having to mint all the URLs. In most non-RDF data models, blank nodes are the default. For example, we nest JSON objects without thinking twice.

Unfortunately, blank nodes tend to make things harder for clients. These clients will now need to keep track of where these blank nodes came from, and they need to create internal identifiers that will not collide. Cache invalidation with blank nodes also becomes a challenge. To make this a bit easier, Atomic Data introduces a new way of dealing with names of things that you have not given a URL yet: Atomic Paths.

Since Atomic Data has subject-predicate uniqueness (like JSON does, too), we can use the path of triples as a unique identifier:

https://example.com/john https://schema.org/employer

This prevents collisions and still makes it easy to point to a specific value.

Serialization formats are free to use nesting to denote paths - which means that it is not necessary to include these path strings explicitly in most serialization formats, such as in JSON-AD.
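
Resolving a path then comes down to following one property at a time. The TypeScript sketch below assumes an in-memory store keyed by subject URL and treats a path as a space-separated string, like the one shown above. The `Store` type, the `resolvePath` function and the example data are all hypothetical.

```typescript
// A value is a primitive, or a nested resource that has no URL of its own.
type Value = string | number | Resource;
interface Resource {
  [propertyUrl: string]: Value;
}
type Store = Map<string, Resource>;

// Resolves "subject property property ..." to a single value.
// This only works because each subject-property pair has at most one value.
function resolvePath(store: Store, path: string): Value | undefined {
  const parts = path.split(" ");
  let current: Value | undefined = store.get(parts[0] ?? "");
  for (const property of parts.slice(1)) {
    // A string value may be a link to another resource in the store.
    if (typeof current === "string") current = store.get(current);
    if (typeof current !== "object") return undefined;
    current = current[property];
  }
  return current;
}

const store: Store = new Map<string, Resource>();
store.set("https://example.com/john", {
  // Nested resource: addressed by its path instead of by its own URL.
  "https://schema.org/employer": { "https://schema.org/name": "Ontola" },
});

const employerName = resolvePath(
  store,
  "https://example.com/john https://schema.org/employer https://schema.org/name",
);
```

The nested employer has no identifier of its own, yet the path points to it unambiguously, and `employerName` resolves to the name stored inside it.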

Combining datatype and predicate

Having both a datatype and a predicate value can lead to confusing situations. For example, the schema:dateCreated Property requires an ISO DateTime string (according to the schema.org definition), but using a value true with an xsd:boolean datatype results in perfectly valid RDF. This means that client software using triples with a schema:dateCreated predicate cannot safely assume that its value will be a DateTime. So if the client wants to use schema:dateCreated values, the client must also specify which type of data it expects, check the datatype field of every Atom and provide logic for when these don't match. Also important: combining datatype and predicate fits the model of most programmers and languages better. Just look at how every single struct / model / class / shape is defined in programming languages: key: datatype. This is why Atomic Data requires that a predicate links to a Property which must have a Datatype.
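
When the datatype lives in the Property, validation becomes a lookup followed by a check. The TypeScript sketch below uses a hypothetical set of datatype names (real Atomic Datatypes are identified by URLs) and pretends, purely for illustration, that schema:dateCreated is an Atomic Property with a datetime datatype.

```typescript
// Hypothetical datatype names for this sketch.
type Datatype = "string" | "boolean" | "integer" | "datetime";
type Property = { shortname: string; datatype: Datatype };

// Hypothetical registry; a real client would fetch Properties by URL.
const properties = new Map<string, Property>([
  ["https://schema.org/dateCreated", { shortname: "dateCreated", datatype: "datetime" }],
]);

function matchesDatatype(datatype: Datatype, value: unknown): boolean {
  if (datatype === "string") return typeof value === "string";
  if (datatype === "boolean") return typeof value === "boolean";
  if (datatype === "integer") return typeof value === "number" && Number.isInteger(value);
  // Remaining case "datetime": accept any string that Date.parse can read.
  return typeof value === "string" && !isNaN(Date.parse(value));
}

// The Property alone decides which values are acceptable.
function validate(propertyUrl: string, value: unknown): boolean {
  const property = properties.get(propertyUrl);
  if (property === undefined) throw new Error(`Unknown property: ${propertyUrl}`);
  return matchesDatatype(property.datatype, value);
}

const validDate = validate("https://schema.org/dateCreated", "2021-01-20T10:00:00Z");
const booleanDate = validate("https://schema.org/dateCreated", true);
```

The boolean value that was valid RDF in the example above is rejected here, because there is no per-statement datatype that could disagree with the Property.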

Adding shortnames (slugs / keys) in Properties

Using full URI strings as keys (in RDF predicates) results in a relatively clunky Developer Experience. Consider the short strings that developers are used to in pretty much all languages and data formats (object.attribute). Adding a required / tightly integrated key mapping (from long URLs to short, simple strings) in Atomic Properties solves this issue, and provides developers a way to write code like this: someAtomicPerson.bestFriend.name => "Britta". Although the RDF ecosystem does have some solutions for this (@context objects in JSON-LD, @prefix mappings, the @ontologies library), these prefixes are not defined in Properties themselves and therefore are often defined locally or separate from the ontology, which means that developers have to manually map them most of the time. This is why Atomic Data introduces a shortname field in Properties, which forces modelers to choose a 'key' that can be used in ORM contexts.
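
As a rough illustration of what the shortname makes possible, the TypeScript sketch below turns a resource keyed by property URLs into an object keyed by shortnames. The property URLs, the `PropertyDef` type and the `toShortnames` function are hypothetical and only serve this example.

```typescript
type PropertyDef = { shortname: string };

// Replaces every property URL key by the shortname of its Property.
function toShortnames(
  resource: Record<string, unknown>,
  properties: Map<string, PropertyDef>,
): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const url of Object.keys(resource)) {
    const property = properties.get(url);
    if (property === undefined) throw new Error(`No Property found for ${url}`);
    result[property.shortname] = resource[url];
  }
  return result;
}

// Hypothetical Properties; in Atomic Data these would be fetched from their URLs.
const properties = new Map<string, PropertyDef>([
  ["https://example.com/properties/name", { shortname: "name" }],
  ["https://example.com/properties/bestFriend", { shortname: "bestFriend" }],
]);

const bestFriend = toShortnames(
  { "https://example.com/properties/name": "Britta" },
  properties,
);

const person = toShortnames(
  { "https://example.com/properties/bestFriend": bestFriend },
  properties,
);
```

Because the mapping comes from the Properties themselves, every client derives the same keys, and no local context or prefix table is needed.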

Adding native arrays

RDF lacks a clear solution for dealing with ordered data, resulting in confusion when developers have to create lists of content. Adding an Array data type as a base data type helps solve this. (discussion)

Adding a native state changes standard

RDF has no integrated standard for communicating state changes. Although linked-delta and rdf-delta do exist, they aren't referred to by the RDF spec. I think developers need guidance when learning a new system such as RDF, and that's why Atomic Commits is included in this book.

Adding a schema language and type safety

A schema language is necessary to constrain and validate instances of data. This is very useful when creating domain-specific standards, which can in turn be used to generate forms or language-specific types / interfaces. Shape validations are already possible in RDF using both SHACL and SHEX, and these are both very powerful and well designed.

However, with Atomic Data, I'm going for simplicity. This also means providing an all-inclusive documentation. I want people who read this book to have a decent grasp of creating, modeling, sharing, versioning and querying data. It should provide all information that most developers (new to linked data) will need to get started quickly. Simply linking to SHACL / SHEX documentation could be intimidating for new developers, who simply want to define a simple shape with a few keys and datatypes.

Also, SHACL requires named graphs (which are not specified in Atomic Data) and SHEX requires a new serialization format, which might limit adoption. Atomic Data has some unique constraints (such as subject-predicate uniqueness) which also might make things more complicated when using SHEX / SHACL.

However, it is not the intention of Atomic Data to create a modeling abstraction that is just as powerful as the ones mentioned above, so perhaps it is better to include a SHACL / SHEX tutorial and come up with a nice integration of both worlds.

A new name, with new docs

Besides the technical reasons described above, I think that there are social reasons to start with a new concept and give it a new name:

  • The RDF vocabulary is intimidating. When trying to understand RDF, you're likely to traverse many pages with new concepts: literal, named node, graph, predicate, named graph, blank node... The core specification provides a formal description of these concepts, but fails to do this in a way that results in quick understanding and workable intuitions. Even experienced RDF developers tend to be confused about the nuances of the core model.
  • There is a lack of learning resources that provide a clear, complete answer to the lifecycle of RDF data: modeling data, making data, hosting it, fetching it, updating it. Atomic Data aims to provide an opinionated answer to all of these steps. It feels more like a one-stop-shop for questions that developers are likely to encounter, whilst keeping the extensibility.
  • All Core / Schema URLs should resolve to simple, clear explanations with both examples and machine readable definitions. Especially the Property and Class concepts.
  • The Semantic Web community has had a lot of academic attention from formal logic departments, resulting in a highly developed standard for knowledge modeling: the Web Ontology Language (OWL). While this is mostly great, its open-world philosophy and focus on reasoning abilities can confuse developers who are simply looking for a simple way to share models in RDF.

Convert RDF to Atomic Data

  • All the subject URLs MUST actually resolve, and return all triples about that subject. All blank nodes should be converted into URLs. Atomic Data tools might help to achieve this, for example by hosting the data.
  • All predicates SHOULD resolve to Atomic Properties, and these SHOULD have a datatype. You will probably need to change predicate URLs to Atomic Property URLs, or update the resources that the predicates point to so that they include the required Atomic Property items (e.g. having a Datatype and a Shortname). This also means that the datatype in the original RDF statement can be dropped.
  • Literals with a language tag are converted to TranslationBox resources, which also means their identifiers must be created. Keep in mind that Atomic Data does not allow for blank nodes, so the TranslationBox identifiers must be URLs.

Step by step, it entails:

  1. Set up some server to make sure the URLs will resolve.
  2. Create (or find and refer to) Atomic Properties for all the predicates. Make sure they have a DataType and a Shortname.
  3. If you have triples about a subject that you don't control, change the URL to one that you do control, and refer to that external resource.

Atomic Data will need tooling to facilitate this process. This tooling should help to create URLs, Properties, and host everything on an easy to use server.
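
Before converting, it helps to know which parts of an RDF dataset conflict with the constraints above. The TypeScript sketch below reports subject-predicate collisions and blank node subjects. The `Triple` and `Report` types and the `checkConvertible` function are hypothetical; this is one possible first step, not an existing tool.

```typescript
type Triple = { subject: string; predicate: string; object: string };

type Report = { collisions: string[]; blankNodes: string[] };

// Finds the parts of a dataset that cannot be converted as-is.
function checkConvertible(triples: Triple[]): Report {
  const seen = new Set<string>();
  const collisions = new Set<string>();
  const blankNodes = new Set<string>();
  for (const triple of triples) {
    // Blank node labels start with "_:" in N-Triples and Turtle.
    if (triple.subject.startsWith("_:")) blankNodes.add(triple.subject);
    const key = `${triple.subject} ${triple.predicate}`;
    // A second statement with the same subject and predicate is a collision.
    if (seen.has(key)) collisions.add(key);
    seen.add(key);
  }
  return { collisions: Array.from(collisions), blankNodes: Array.from(blankNodes) };
}

const report = checkConvertible([
  { subject: "https://example.com/joep", predicate: "https://schema.org/birthDate", object: "1991-01-20" },
  { subject: "https://example.com/joep", predicate: "https://schema.org/birthDate", object: "2000-02-30" },
  { subject: "_:b0", predicate: "https://schema.org/name", object: "Long Beach, California" },
]);
```

Each reported collision has to be resolved (for example by picking one value or by turning the values into an array), and each reported blank node needs a URL or has to become a nested resource.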

Convert Atomic data to RDF

Since all Atomic Data is also valid RDF, it's trivial to convert / serialize Atoms to RDF. This is why atomic can serialize Atomic Data to RDF. (For example, try atomic-cli get https://atomicdata.dev/properties/description --as n3)

However, contrary to Atomic Data, RDF has optional Language and Datatype elements in every statement. It is good practice to use these RDF concepts when serializing Atomic Data into Turtle / RDF/XML, or other RDF serialization formats.

  • Convert Atoms with linked TranslationBox Resources to Literals with an xsd:string datatype and the corresponding language in the tag.
  • Convert Atoms with ResourceArrays to Collections that are native to that serialization format.
  • Dereference the Property and Datatype from Atomic Properties, and add the URLs in datatypes in RDF statements.
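
The last step in the list can be sketched as follows. This TypeScript example serializes a single Atom to an N-Triples line and takes the datatype from a lookup table that stands in for dereferencing the Property. The `Atom` type, the table and the function names are hypothetical, and the sketch only covers literal values, not links, translations or arrays.

```typescript
type Atom = { subject: string; property: string; value: string };

// Hypothetical lookup: Property URL -> datatype URL to use in RDF.
const datatypes = new Map<string, string>([
  ["https://schema.org/birthDate", "http://www.w3.org/2001/XMLSchema#date"],
]);

// Escapes the characters that may not appear raw in an N-Triples literal.
function escapeLiteral(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Writes one Atom as one N-Triples statement with an explicit datatype.
function toNTriple(atom: Atom, table: Map<string, string>): string {
  const datatype = table.get(atom.property);
  if (datatype === undefined) throw new Error(`No datatype known for ${atom.property}`);
  return `<${atom.subject}> <${atom.property}> "${escapeLiteral(atom.value)}"^^<${datatype}> .`;
}

const line = toNTriple(
  {
    subject: "https://example.com/joep",
    property: "https://schema.org/birthDate",
    value: "1991-01-20",
  },
  datatypes,
);
```

The datatype that Atomic Data keeps in the Property ends up repeated on every statement, which is what RDF consumers expect.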

Atomic Data and Solid

The Solid project is an initiative by the inventor of linked data and the world wide web: Sir Tim Berners-Lee. In many ways, it has similar goals to Atomic Data:

  • Decentralize the web
  • Make things more interoperable
  • Give people more control over their data

Technically, both are also similar:

  • Usage of personal servers, or PODs (Personal Online Datastores). Both Atomic Data and Solid aim to provide users with a highly personal server where all sorts of data can be stored.
  • Usage of linked data. All Atomic Data is valid RDF, which means that all Atomic Data is compatible with Solid. In other words, if you choose to use Atomic Data, you can always put it in your Solid Pod. However, the other way around is more difficult.

But there are some important differences, too, which will be explained in more detail below.

  • Atomic Data uses a strict built-in schema to ensure type safety.
  • Atomic Data standardizes state changes (which also provides version control / history, audit trails)
  • Atomic Data is more easily serializable to other formats (like JSON)
  • Atomic Data has a different model for Authorization and Hierarchies
  • Atomic Data is less mature, and currently lacks things like authentication for read access

Disclaimer: I've been quite involved in the development of Solid, and have a lot of respect for all the people who are working on it. The following is not meant as a critique on Solid, let alone the individuals working on it.

Atomic Data is type-safe, because of its built-in schema

Atomic Data is more strict than Solid - which means that it only accepts data that conforms to a specific shape. In a Solid Pod, you're free to add any shape of data that you like - it is not validated by some schema. Yes, there are some efforts of using SHACL or SHEX to constrain data before putting it in, but as of now it is not part of the spec or any implementation that I know of. A lack of schema strictness can be helpful during prototyping and rapid development, especially if you write data by hand, but it also limits how easy it is to build reliable apps with that data. Atomic Data aims to be very friendly for developers that re-use data, and that's why we take a different approach: all data must be validated by Atomic Schema before it's stored on a server. This means that all Atomic Properties will have to exist on a publicly accessible URL, before the property can be used somewhere.

You can think of Atomic Data more like a (dynamic) SQL database that offers guarantees about its content type, and a Solid Pod more like a document store that takes in all kinds of content. Most of the differences have to do with how Atomic Schema aims to make linked data easier to work with, but that is covered in the previous RDF chapter.

Atomic Data standardizes state changes (event sourcing)

With Solid, you change a Resource by sending a POST request to the URL that you want to change. With Atomic, you change a Resource by sending a signed Commit that contains the requested changes to a Server.

Event sourcing means that all changes are stored (persisted) and used to calculate the current state of things. In practice, this means that users get a couple of nice features for free:

  • Versioning for all items by default. Storing events means that these events can be replayed, which means you get to traverse time / undo / redo.
  • Edit / audit log for everything. Events contain information about who made which change at which point in time. Can be useful for finding out why things are the way they are.
  • Easier to add query options / indexes. Any system can play-back the events, which means that the events can be used as an API to add new query options / fill new indexes. This is especially useful if you want to add things like full-text search, or some geolocation index.

It also means that, compared to Solid, there is a relatively simple and strict API for changing data. Atomic Data has a uniform write API. All changes to data are done by posting Commits to the /commits endpoint of a Server. This removes the need to think about differences between all sorts of HTTP methods like POST / PUT / PATCH, and how servers should reply to that.
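
The replay idea behind event sourcing can be sketched in a few lines. The commit shape used here (plain subject, set and remove keys) is a simplification assumed for this example, and signatures are left out entirely; it is not the actual Commit model.

```python
from typing import Any

Commit = dict[str, Any]
State = dict[str, dict[str, Any]]


def replay(commits: list[Commit]) -> State:
    """Rebuild the current state by applying every commit in order."""
    state: State = {}
    for commit in commits:
        resource = state.setdefault(commit["subject"], {})
        # "set" adds or overwrites property values.
        resource.update(commit.get("set", {}))
        # "remove" deletes properties from the resource.
        for prop in commit.get("remove", []):
            resource.pop(prop, None)
    return state


# Hypothetical history of a single resource.
JOHN = "https://example.com/people/john"
NAME = "https://example.com/properties/name"
BIRTHDATE = "https://example.com/properties/birthDate"
history = [
    {"subject": JOHN, "set": {NAME: "Jon"}},
    {"subject": JOHN, "set": {NAME: "John", BIRTHDATE: "1991-01-20"}},
    {"subject": JOHN, "remove": [BIRTHDATE]},
]

current = replay(history)       # state after all commits
previous = replay(history[:2])  # an earlier version: state after two commits
```

Because the state is derived from the list of commits, an older version is obtained by replaying a shorter prefix of that list.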

Atomic Data is more easily serializable to other formats (like JSON)

Atomic Data is designed with the modern developer in mind. One of the things that developers expect is to be able to traverse (JSON) objects easily. Doing this with RDF is not easily possible, because it requires subject-predicate uniqueness, which RDF does not guarantee. Atomic Data does not have this problem (properties must be unique within a resource), which means that traversing objects becomes easy.

Another problem that Atomic Data solves, is dealing with long URLs as property keys. Atomic Data uses shortnames to map properties to short, human-readable strings.

For more information about these differences, see the previous RDF chapter.

Hierarchy model, authorization, authentication

Atomic Data identities (Agents) are a combination of HTTP based, and cryptography (public / private key) based. In Atomic, all actions (from GET requests to Commits) are signed using the private key of the Agent. This makes Atomic Data a bit more unconventional, but also makes its auth mechanism very decentralized and lightweight.

Solid uses HTTP based WebID identifiers combined with an OIDC flow.

Atomic Data uses parent-child hierarchies to model data and perform authorization checks. This closely resembles how filesystems work, and is therefore familiar to most users.

Solid is more mature

Atomic Data has significant gaps at this moment - not just in the implementations, but also in the spec. This makes it not yet usable for most applications. Here's a list of things missing in Atomic Data, with links to their open issues and links to their existing Solid counterpart.

  • No inbox or notifications (issue)
  • No support from a big community, a well-funded business or the inventor of the world wide web.

How does Atomic Data relate to JSON?

Because JSON is so popular, Atomic Data is designed with JSON in mind.

Atomic Data is often (by default) serialized to JSON-AD, which itself uses JSON. JSON-AD uses URLs as keys, which is what gives Atomic Data many of its perks, but using these long strings as keys is not very easy to use in many contexts. That's why you can serialize Atomic Data to simple, clean JSON.

From Atomic Data to plain JSON

The JSON keys are then derived from the shortnames of properties. For example, we could convert this JSON-AD:

{
  "@id": "https://atomicdata.dev/properties/description",
  "https://atomicdata.dev/properties/datatype": "https://atomicdata.dev/datatypes/markdown",
  "https://atomicdata.dev/properties/description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
  "https://atomicdata.dev/properties/isA": [
    "https://atomicdata.dev/classes/Property"
  ],
  "https://atomicdata.dev/properties/shortname": "description"
}

... into this plain JSON:

{
  "@id": "https://atomicdata.dev/properties/description",
  "datatype": "https://atomicdata.dev/datatypes/markdown",
  "description": "A textual description of something. When making a description, make sure that the first few words tell the most important part. Give examples. Since the text supports markdown, you're free to use links and more.",
  "is-a": [
    "https://atomicdata.dev/classes/Property"
  ],
  "shortname": "description"
}

Note that when you serialize Atomic Data to plain JSON, some information is lost: the URLs are no longer there. This means that it is no longer possible to find out what the datatype of a single value is - we now only know if it's a string, but not if it actually represents a markdown string or something else. Most Atomic Data systems will therefore not use this plain JSON serialization, but for some clients (e.g. a front-end app), it might be easier to use the plain JSON, as the keys are easier to write than the long URLs that JSON-AD uses.
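
A minimal sketch of this conversion is shown below. The lookup table from Property URL to shortname is hard-coded here as an assumption; a real client would read the shortnames from the Property resources themselves.

```python
import json

# Assumed lookup table from Property URL to shortname.
SHORTNAMES = {
    "https://atomicdata.dev/properties/datatype": "datatype",
    "https://atomicdata.dev/properties/description": "description",
    "https://atomicdata.dev/properties/isA": "is-a",
    "https://atomicdata.dev/properties/shortname": "shortname",
}


def json_ad_to_plain(resource: dict) -> dict:
    """Replace Property URL keys by shortnames; unknown keys such as "@id" stay as-is."""
    return {SHORTNAMES.get(key, key): value for key, value in resource.items()}


json_ad = {
    "@id": "https://atomicdata.dev/properties/description",
    "https://atomicdata.dev/properties/datatype": "https://atomicdata.dev/datatypes/markdown",
    "https://atomicdata.dev/properties/shortname": "description",
}
plain = json_ad_to_plain(json_ad)
print(json.dumps(plain, indent=2))
```

Note that the function only renames keys. The values are untouched, which is exactly why the datatype information is no longer recoverable from the output alone.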


Atomic Data requires a bit more information about pieces of data than JSON tends to contain. Let's take a look at a regular JSON example:

{
  "name": "John",
  "birthDate": "1991-01-20"
}

We need more information to convert this JSON into Atomic Data. The following things are missing:

  • What is the Subject URL of the resource being described?
  • What is the Property URL of the keys being used? (name and birthDate), and consequentially, how should the values be parsed? What are their DataTypes?

In order to make this conversion work, we need to link to three URLs that resolve to atomic data resources. The @id subject should resolve to the Resource itself, returning the JSON-AD from below. The Property keys (e.g. "https://example.com/properties/name") need to resolve to Atomic Properties.

{
  "@id": "https://example.com/people/john",
  "https://example.com/properties/name": "John",
  "https://example.com/properties/birthDate": "1991-01-20"
}

In practice, the easiest approach to make this conversion, is to create the data and host it using software like Atomic Server.
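
If you do want to script the conversion yourself, it could look roughly like this. The function name and the key-to-URL mapping are assumptions for this sketch, and the URLs in the mapping would still have to resolve to real Atomic Properties.

```python
def plain_to_json_ad(data: dict, subject: str, property_urls: dict) -> dict:
    """Build a JSON-AD object from plain JSON, a subject URL and a key mapping."""
    missing = [key for key in data if key not in property_urls]
    if missing:
        # Without a Property URL there is no way to know how to parse the value.
        raise KeyError(f"No Property URL known for keys: {missing}")
    resource = {"@id": subject}
    for key, value in data.items():
        resource[property_urls[key]] = value
    return resource


# Hypothetical mapping from plain JSON keys to Property URLs.
mapping = {
    "name": "https://example.com/properties/name",
    "birthDate": "https://example.com/properties/birthDate",
}
john = plain_to_json_ad(
    {"name": "John", "birthDate": "1991-01-20"},
    "https://example.com/people/john",
    mapping,
)
```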

From Atomic Data to JSON-LD

Atomic Data is a strict subset of RDF, and the most popular serialization of RDF for JSON data is JSON-LD.

Since Atomic Schema requires the presence of a key slug in Properties, converting Atomic Data to JSON results in dev-friendly objects with nice shorthands.

{
  "@id": "https://example.com/people/John",
  "https://example.com/properties/lastname": "John",
  "https://example.com/properties/bestFriend": "https://example.com/sarah"
}

Can be automatically converted to:

{
  "@context": {
    "name": "https://example.com/properties/lastname",
    "bestFriend": "https://example.com/properties/bestFriend"
  },
  "@id": "https://example.com/people/John",
  "name": "John",
  "bestFriend": {
    "@id": "https://example.com/sarah"
  }
}

The @context object provides a mapping to the original URLs.
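
The conversion can be sketched as follows. This is an illustration under assumptions, not an existing serializer: the shortname table and the set of link properties are passed in by hand, whereas a real implementation would derive both from the Property resources (shortname and datatype). The subject is kept as a top-level @id.

```python
def json_ad_to_json_ld(resource: dict, shortnames: dict, link_props: set) -> dict:
    """Convert one JSON-AD object to JSON-LD with a generated @context."""
    context = {}
    body = {}
    for key, value in resource.items():
        if key == "@id":
            body["@id"] = value
            continue
        short = shortnames[key]
        context[short] = key
        # Links become node references, so JSON-LD does not read them as strings.
        body[short] = {"@id": value} if key in link_props else value
    return {"@context": context, **body}


LASTNAME = "https://example.com/properties/lastname"
BEST_FRIEND = "https://example.com/properties/bestFriend"

json_ld = json_ad_to_json_ld(
    {
        "@id": "https://example.com/people/John",
        LASTNAME: "John",
        BEST_FRIEND: "https://example.com/sarah",
    },
    shortnames={LASTNAME: "name", BEST_FRIEND: "bestFriend"},
    link_props={BEST_FRIEND},
)
```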

JSON-AD and JSON-LD are very similar by design, but there are some important differences:

  • JSON-AD is designed just for atomic data, and is therefore easier and more performant to parse / serialize.
  • JSON-LD uses @context to map keys to URLs. Any type of mapping is valid. JSON-AD, on the other hand, doesn't map anything - all keys are URLs.
  • JSON-LD uses nested objects for links and sequences, such as @list. JSON-AD does not.
  • Arrays in JSON-LD do not indicate ordered data - they indicate that for some subject-predicate combination, multiple values exist. This is a result of how RDF works.

JSON-LD Requirements for valid Atomic Data

  • Make sure the URLs used in the @context resolve to Atomic Properties.
  • Convert JSON-LD arrays into ResourceArrays
  • Creating nested JSON objects is possible (by resolving the identifiers from @id relations), but it is up to the serializer to decide how deep this object nesting should happen.

Note that as of now, there are no JSON-LD parsers for Atomic Data.

Atomic Data and IPFS

What is IPFS

IPFS (the InterPlanetary File System) is a standard that enables decentralized file storage and retrieval using content-based identifiers. Instead of using an HTTP URL like http://example.com/helloworld, it uses the IPFS scheme, such as ipfs:QmX6j9DHcPhgBcBtZsuRkfmk2v7G5mzb11vU9ve9i8vDsL. IPFS identifies things based on their unique content hash (the long, seemingly random string) using a thing called a Merkle DAG (this great article explains it nicely). This is called a CID, or Content ID. This simple idea (plus some not so simple network protocols) allows for decentralized, tamper-proof storage of data. This fixes some issues with HTTP that are related to its centralized philosophy: no more 404s!
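
The core idea of content-based identifiers can be illustrated with a plain hash. This is a deliberately simplified sketch: a bare SHA-256 hex digest is used as a stand-in and is not the actual CID format that IPFS uses.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (simplified, not a real CID)."""
    return hashlib.sha256(data).hexdigest()


original = b'{"shortname": "description"}'
tampered = b'{"shortname": "descriptiom"}'

# The same bytes always yield the same identifier, wherever they are stored.
same = content_id(original) == content_id(bytes(original))
# Any change to the content yields a different identifier.
changed = content_id(original) != content_id(tampered)
```

Because the identifier is derived from the bytes, anyone who receives the data can recompute the hash and detect tampering, regardless of which machine served it.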

Why is IPFS interesting for Atomic Data

Atomic Data is highly dependent on the availability of Resources, especially Properties and Datatypes. These resources are meant to be re-used a lot, and when these go offline or change (for whatever reason), it could cause issues and confusion. IPFS guarantees that these resources are entirely static, which means that they cannot change. This is useful when dealing with Properties, as a change in datatype could break things. IPFS also allows for location-independent fetching, which means that resources can be retrieved from any machine that has a copy, as long as that machine is online. This peer-to-peer functionality is a very fundamental advantage of IPFS over HTTP, especially when the resources are very likely to be re-used, which is especially the case for Atomic Data Properties.

Considerations using IPFS URLs

IPFS URLs are static, which means that their contents can never change. This is great for some types of data, but not so much for others. If you're describing a time-dependent thing (such as a person's job), you'll probably want to know what the current value is, and that is not possible when you only have an IPFS identifier. This can be fixed by including an HTTP URL in IPFS bodies.

IPFS data is also hard to remove, as it tends to be replicated across machines. If you're describing personal, private information, it can therefore be a bad idea to use IPFS.

And finally, its performance is typically not as good as HTTP. If you know the IPFS gateway that hosts the IPFS resource that you're looking for, things improve drastically. Luckily for Atomic Data, this is often the case, as we know the HTTP url of the server and could try whether that server has an IPFS gateway.

Atomic Data and IPLD

IPLD (not IPFS) stands for InterPlanetary Linked Data, but is not related to RDF. The scope seems fundamentally different from RDF, too, but I have to read more about this.

Atomic Data and SQL

Atomic Data has some characteristics that make it both similar to and different from SQL.

  • Atomic Data has a dynamic schema. Any Resource could have different properties. However, the properties themselves are validated (contrary to most NOSQL solutions)
  • Atomic Data separates reading and writing, whereas SQL has one language for both.
  • Atomic Data has a standardized way of storing changes (Commits)

Tables and Rows vs. Classes and Properties

At its core, SQL is a query language based around tables and rows. The tables in SQL are similar to Classes in Atomic Data: they both define a set of properties which an item could have. Every single item in a table is called a row in SQL, and a Resource in Atomic Data.

Identifiers: numbers vs. URLs

In SQL, rows have numbers as identifiers, whereas in Atomic Data, every resource has a resolvable HTTP URL as an identifier. This allows Atomic Data records to be easily re-used by other systems, as there is a guarantee that identifiers will be globally unique.

Dynamic vs static schema

In SQL, the schema of the database defines which shape the data can have, which properties are required, what datatypes they have. In Atomic Data, the schema exists as a Resource on the web, which means that they can be retrieved using HTTP. An Atomic Database (such as Atomic-Server) uses a dynamic schema, which means that any Resource can have different properties, and the properties themselves can be validated, even when the server is not aware of these properties beforehand. In SQL, you'd have to manually adjust the schema of your database to add a new property. Atomic Data is a decentralized, open system, which can read new schema data from other sources. SQL is a centralized, closed system, which relies on the DB manager to define the schema.
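
The sketch below illustrates how a server might validate values against Properties it has never seen before. Everything in it is an assumption for illustration: the validator table covers only three datatypes, and fetch_property stands in for an HTTP request that resolves a Property URL to its definition.

```python
from datetime import date


def _is_iso_date(value) -> bool:
    """True if the value is a string that parses as an ISO 8601 calendar date."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Assumed validators per datatype URL; a real implementation covers more datatypes.
VALIDATORS = {
    "https://atomicdata.dev/datatypes/string": lambda v: isinstance(v, str),
    "https://atomicdata.dev/datatypes/integer":
        lambda v: isinstance(v, int) and not isinstance(v, bool),
    "https://atomicdata.dev/datatypes/date": _is_iso_date,
}


def validate(resource: dict, fetch_property) -> list[str]:
    """Return validation errors; an empty list means the resource is valid."""
    errors = []
    for prop_url, value in resource.items():
        if prop_url == "@id":
            continue
        # The schema is looked up at runtime, per Property.
        datatype = fetch_property(prop_url)["datatype"]
        if not VALIDATORS[datatype](value):
            errors.append(f"{prop_url}: {value!r} is not a valid {datatype}")
    return errors


# Hypothetical Property definitions that would normally be fetched over HTTP.
PROPERTIES = {
    "https://example.com/properties/name":
        {"datatype": "https://atomicdata.dev/datatypes/string"},
    "https://example.com/properties/birthDate":
        {"datatype": "https://atomicdata.dev/datatypes/date"},
}
errors_ok = validate(
    {"@id": "https://example.com/people/john",
     "https://example.com/properties/name": "John",
     "https://example.com/properties/birthDate": "1991-01-20"},
    PROPERTIES.__getitem__,
)
errors_bad = validate(
    {"@id": "https://example.com/people/john",
     "https://example.com/properties/birthDate": "yesterday"},
    PROPERTIES.__getitem__,
)
```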


The SQL query language is for both reading and writing data. In Atomic Data a distinction is made between Query and Command - getting and setting (Command Query Responsibility Segregation, CQRS). The Query side is handled using Subject Fetching (sending a GET request to a URL, to get a single resource) and Collections (filtering and sorting data). The Command side is typically done using Atomic Commits, although you're free not to use it.

SQL is way more powerful, as a query language. In SQL, the one creating the query basically defines the shape of a table that is requested, and the database returns that shape. Atomic Data does not offer such functionality. So if you need to create custom tables at runtime, you might be better off using SQL, or move your Atomic Data to a query system.


Is Atomic Data NOSQL or SQL?

Generally, Atomic Data apps do not use SQL - so they are NOSQL. Atomic-server, for example, internally uses a key-value store (sled) for persistence.

Like most NOSQL systems, Atomic Data does not limit data entries to a specific table shape, so you can add any property that you like to a resource. However, unlike most NOSQL systems, Atomic Data does perform validations on each value. So in a way, Atomic Data tries to combine best of both worlds: the extendibility and flexibility of NOSQL, with the type safety of SQL.

Is Atomic Data transactional / ACID?

Yes, if you use Atomic-Server, then you can only write to the server by using Atomic Commits, which are in fact transactions. This means that if part of the transaction fails, it is reverted - transactions are only applied when they are 100% OK. This prevents inconsistent DB states.

Can I use a SQL database with Atomic Data?

Yes, if you want to make your existing project serve Atomic Data, you can keep your existing SQL database, see the upgrade guide. When you want to import arbitrary Atomic Data, it might be easier to use atomic-server. If you want to store arbitrary Atomic Data in a SQL database, you might be best off by creating a Resources table with a subject and a propertyValues column, or create both a properties table and a resources one.
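
The first storage option could look roughly like the sketch below, which uses an in-memory SQLite database. The table name, column names and the choice to store the property-value pairs as a JSON string are assumptions for this example.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources (subject TEXT PRIMARY KEY, property_values TEXT NOT NULL)"
)


def save(resource: dict) -> None:
    """Store a JSON-AD resource; its property-value pairs go into one JSON column."""
    propvals = {key: value for key, value in resource.items() if key != "@id"}
    conn.execute(
        "INSERT OR REPLACE INTO resources (subject, property_values) VALUES (?, ?)",
        (resource["@id"], json.dumps(propvals)),
    )


def load(subject: str) -> Optional[dict]:
    """Read a resource back as JSON-AD, or return None if the subject is unknown."""
    row = conn.execute(
        "SELECT property_values FROM resources WHERE subject = ?", (subject,)
    ).fetchone()
    if row is None:
        return None
    return {"@id": subject, **json.loads(row[0])}


john = {
    "@id": "https://example.com/people/john",
    "https://example.com/properties/name": "John",
}
save(john)
```

A single JSON column keeps the schema dynamic, at the cost of not being able to filter on individual properties with plain SQL. The two-table variant (properties and resources) trades simplicity for that ability.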

Atomic Data and Graph Databases

Atomic Data fundamentally is a graph data model. We can think of Atomic Resources as nodes, and links to other resources through properties as edges.

In this section, we'll explore how Atomic Data relates to some graph technologies.

Comparing Atomic Data to Neo4j

Neo4j is a popular graph database that supports multiple query languages. The first difference is that Atomic Data is not a single piece of software but a specification. However, we can compare Neo4j as a product with the open source Atomic-Server. Atomic-Server is fully open source and free (MIT licensed), whereas Neo4j is partially open source and GPL licensed.

Labeled Property Graph

The data model of Neo4j features a labeled property graph, which means that edges (relationships between nodes) can have their own properties. This can be useful when adding data to relationship between nodes. For example: in the john - (knows) -> mary relationship, you might want to specify for how long they have known each other. In Neo4j, we can add this data to the labeled property graph.

In Atomic Data, we'd have to make a new resource to describe the relation between the two, if we wanted to add information about the relationship itself. This is called reification. This process can be time consuming, especially in Atomic Data, as this means that you'll have to specify the Class of this relationship and its properties. However, one benefit of this approach, is that the relationship itself becomes clearly defined and re-usable. Another benefit is that the simpler model of Atomic Data maps perfectly to datamodels like JSON, which makes things very convenient and familiar for developers.

Query language vs REST

Neo4j supports multiple query languages, but it's mainly known for Cypher. It is used for doing practically everything: reading, writing, modelling, and more.

Atomic Data on the other hand does not have a query language. It uses a RESTful HTTP + JSON-AD approach for everything. Atomic Data uses Endpoints for specific goals that you'd do in a query language:

  • Collections (which can filter by Property or Value, and sort by any Property) to generate lists of resources
  • Paths for traversing graphs by property

And finally, data is written using Commits. Commits are very strict, as each one describes modifications to individual resources, and every Commit has to be signed. This means that with Atomic Data, we get versioning + audit trails for all data, but at the cost of more storage requirements and a slightly more expensive write process.
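
To give a feel for what a Collection does, here is a small in-memory sketch of filtering by Property and Value and sorting by a Property. The function and its parameters are hypothetical; an actual server exposes this through an Endpoint and query parameters rather than a function call.

```python
def collection(resources, prop=None, value=None, sort_by=None):
    """Filter resources by property and/or value, then optionally sort them."""
    members = []
    for resource in resources:
        if prop is not None and prop not in resource:
            continue
        if value is not None:
            if prop is not None:
                candidates = [resource[prop]]
            else:
                # No property given: match the value under any property.
                candidates = [v for k, v in resource.items() if k != "@id"]
            if value not in candidates:
                continue
        members.append(resource)
    if sort_by is not None:
        # Values are compared as strings to keep the sketch short.
        members.sort(key=lambda r: str(r.get(sort_by, "")))
    return members


NAME = "https://example.com/properties/name"
FRIEND = "https://example.com/properties/bestFriend"
SARAH = "https://example.com/people/sarah"
store = [
    {"@id": "https://example.com/people/john", NAME: "John", FRIEND: SARAH},
    {"@id": SARAH, NAME: "Sarah"},
    {"@id": "https://example.com/people/anna", NAME: "Anna", FRIEND: SARAH},
]

friends_of_sarah = collection(store, prop=FRIEND, value=SARAH, sort_by=NAME)
everyone = collection(store, sort_by=NAME)
```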

Schema language and type safety

In Neo4j, constraints can be added to the database, whereas Atomic Data uses Atomic Schema for validating datatypes and required properties in resources.

Other differences

Upgrade your existing application to Atomic Data

If you want to make your existing project compatible with Atomic Data, you probably don't have to get rid of your existing storage / DB implementation. The only thing that matters, is how you make the data accessible to others: the serialization. You can keep your existing software and logic, but simply change the last little part of your API. In short, this is what you'll have to do:

  • Map all properties of resources to Atomic Properties. Either use existing ones, or create new ones and make them accessible (using any Atomic Server, as long as the URLs of the properties resolve).
  • Make sure that when the user requests some URL, that you return that resource as a JSON-AD object (at the very least if the user requests it using an HTTP Accept header).

Don't feel obliged to implement all parts of the Atomic Data spec, such as Collections and Commits.
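
The second step (returning JSON-AD when the client asks for it) can be sketched without any web framework. The media type application/ad+json is used here on the assumption that it is the one registered for JSON-AD; the respond function and the HTML fallback are purely illustrative.

```python
import json

JSON_AD_MIME = "application/ad+json"  # assumed media type for JSON-AD


def wants_json_ad(accept_header: str) -> bool:
    """Check whether an HTTP Accept header lists the JSON-AD media type."""
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    return JSON_AD_MIME in media_types


def respond(resource: dict, accept_header: str) -> tuple[str, str]:
    """Return (content type, body) for a resource taken from your existing storage."""
    if wants_json_ad(accept_header):
        return JSON_AD_MIME, json.dumps(resource)
    # Fall back to whatever your application already served.
    return "text/html", f"<h1>{resource['@id']}</h1>"


john = {
    "@id": "https://example.com/people/john",
    "https://example.com/properties/name": "John",
}
content_type, body = respond(john, "text/html, application/ad+json;q=0.9")
```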

If you need any help, get in touch on our Discord.

Various Use Cases for Atomic Data

Most of this book is either abstract or technical, but this section aims to be different. In this section, we'll present concrete examples of things that can be built with Atomic Data. Although you could use Atomic Data for pretty much any type of application, it is especially valuable where data re-use, standardization, and data ownership are important.

Atomic Data for personal data stores

A Personal Data Store (or personal data service) is a place where you store all sorts of personal information. For example a list of contacts, todo items, pictures, or your profile data. Not that long ago, the default for this was the My Documents folder on your hard drive. But as web applications became better, we started moving our data to the cloud. More and more of our personal information is stored by large corporations who use the information to build profiles to show us ads. And as cloud consumers, we often don't have the luxury of moving our personal data to a place where we want it to be. Many services don't even provide export functionality, and even if they do, the exports often lack information or are not interoperable with other apps.

Atomic Data could help to re-introduce data ownership. Because the specification helps to standardize information, it becomes easier to make data interoperable. And even more important: Apps don't need their own back-end - they can use the same personal data store: an Atomic Server (such as this one).

Realizing this goal requires quite a bit of work, though. This specification needs to mature, and we need reliable implementations. We also need proper tutorials, libraries and tools that convince developers to use atomic data to power their applications.

Atomic Data for e-commerce & marketplaces

Buying goods and services on the internet is currently responsible for about 15% of all commerce, and is steadily climbing. The internet makes it easier to find products, compare prices, get information and reviews, and finally order something. But the current e-commerce situation is far from perfect, as large corporations tend to monopolize, which means that we have less competition, which ultimately harms prices and quality for consumers. Atomic Data can help empower smaller businesses, make searching for specific things way easier and ultimately make things cheaper for everyone.

Decentralize platform / sharing economy service marketplaces

Platforms like Uber, AirBNB and SnapCar are virtual marketplaces that help people share and find services. These platforms are responsible for:

  1. providing an interface for managing offers (e.g. describe your car, add specifications and pricing)
  2. hosting the data of the offers themselves (make the data available on the internet)
  3. providing a search interface (which means indexing the data from all the existing offers)
  4. facilitating the transaction / payments
  5. providing trust through reviews and warranties (e.g. refunds if the seller fails to deliver)

The fact that these responsibilities are almost always combined in a single platform leads to vendor lock-in and an uncompetitive landscape, which ultimately harms consumers. Currently, if you want to manage your listing / offer on various platforms, you need to manually adjust it on each of them. Some companies even prohibit offering on multiple platforms (which is a legal problem, not a technical one). This means that the biggest (most known) platforms have the most listings, so if you're looking for a house / car / rental / meal, you're likely to go for the biggest business - because that's the one that has the biggest assortment.

Compare this to how the web works: every browser should support every type of webpage, and it does not matter where the webpage is hosted. I can browse a webpage written on a Mac on my Windows machine, and I can read a webpage hosted by Amazon on a Google device. It does not matter, because the web is standardized and open, instead of being centralized and managed by one single company as proprietary data. This openness of the web means that we get search engines like Google and Bing that scrape the web and add it to their index. This results in a dynamic where those who want to sell their stuff will need to share their stuff using an open standard (for webpages things like HTML and sometimes a bit of metadata), so crawlers can properly index the webpages. We could do the same thing for structured data instead of pages, and that's what Atomic Data is all about.

Let's discuss a more practical example of what this could mean. Consider a restaurant owner who currently uses UberEats as their delivery platform. Using Atomic Data, they could define their menu on their own website. The Atomic Schema specification makes it easy to standardize what the data of a menu item looks like (e.g. price, image, title, allergens, vegan...). Several platforms (potentially modern variants of platforms like JustEat / UberEats) could then crawl this standardized Atomic Data, index it, and make it easily searchable. The customer would use one (or multiple) of these platforms, which would probably have the exact same offers. Where these platforms might differ is in their own service offering, such as delivery speed or price. This would result in a more competitive and free market, where customers would be able to pick a platform based on its service price and quality, instead of its list of offerings. It would empower the small business owner to be far more flexible in which service they do business with.

Searching for products on the internet is mostly limited to text search. If we want to buy a jacket, we see tonnes of jackets that are not even available in our own size. Every single website has their own way of searching and filtering.

Imagine making a search query in one application, and sending that to multiple suppliers, after which you'll receive a fully personalized and optimized list of products. Browsing in an application that you like to use, not bound to any one specific store, that doesn't track you, and doesn't show advertisements. It is a tool that helps you to find what you need, and it is the job of producers to accurately describe their products in a format that your product browser can understand.

How do we get there?

Well, for starters, producers and suppliers will need to reach a consensus on how to describe their articles. This is not new; for many products, we already have a common language. Shoes have a shoe size, televisions have a screen size in diagonal inches, brightness is measured in nits, etc. Describing this in a machine-readable and predictable format as data is the next logical step. This is, of course, where Atomic Schema could help. Atomic-server could be the connected, open source database that suppliers use to describe their products as data.

Atomic Data for Surveys

Surveys and Questionnaires haven't been evolving that much over the past few years. However, Atomic Data has a couple of unique characteristics that would make it especially suitable for surveys. It could help make surveys easier to fill in, easier to analyze, easier to create, and more privacy friendly.

  • Re-useable survey responses which enable pre-filled form fields which can save the respondent a lot of time. They also make it possible for users to use their own responses to gather insights, for example into their own health.
  • Question standardization which helps researchers to re-use (validated) questions, which saves time for the researcher
  • Privacy friendly, yet highly personalized invites: as a researcher, send profile descriptions to servers, and let the servers tell if the question is relevant.

Re-useable survey responses

Since many surveys describe personal information, it makes sense, as a respondent, to have a way of storing the information you filled in in a place that you control. Making this possible enables a few nice use cases.

  1. Auto-fill forms. Previously entered response data could be usable while filling in new surveys. This could result in a UX similar to auto-filling forms, but far more powerful and rich than browsers currently support.
  2. Analyze your own personal data. Standardized survey responses could also be used to gather insights into your own personal information. For example, filling in a survey about how your shortness of breath linked to air pollution has been today could be used in a different app to make a graph that visualizes how your shortness of breath has progressed over the months for personal insight.

Achieving something like this requires a high degree of standardization in both the surveys and the responses. The survey and its questions should provide information about:

  • The question. This is required in all survey questions, of course.
  • The required datatype of the response, such as 'string', or 'datetime' or some 'enumeration'.
  • A (link to a) semantic definition of the property being described. This is a bit more obscure: all pieces of linked data use links, instead of keys, to describe the relation between some resource and its property. For example, a normal resource might have a 'birthdate', while in linked data, we'd use 'https://schema.org/birthDate'. This semantic definition makes things easier to share, because it prevents misinterpretation. Links remove ambiguity.
  • A query description. This is even more obscure, but perhaps the most interesting. A query description means describing how a piece of information can be retrieved. Perhaps a question in a survey will want to know what your payment pointer is. If a piece of software wants to auto-fill this field, it needs to know where it can find your payment pointer.

Question Standardization

We can think of Questions as Resources that have a URL, and can be shared. Sharing questions like that can make it easier to use the same questions across surveys, which in turn can make it easier to interpret data. Some fields (e.g. medical) have highly standardized questions, which have been validated by studies. These Question resources should contain information about:

  • The question itself and its translations
  • The datatype of the response (e.g. date, string, enum), denoted by the Property of the response.
  • The path of the data, relative to the user. For example, a user's birthdate can be found by going to / profile birthdate

Atomic Schema and Atomic Paths can be of value here.
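
As a sketch of how such a path could be followed, the example below walks from a user to their birthdate through an in-memory dictionary. The store, the URLs and the use of bare shortnames as keys are assumptions for this illustration; in a real system every hop would be an HTTP request for the next resource.

```python
def follow_path(store: dict, start: str, properties: list):
    """Walk from a starting subject through a list of property keys."""
    current = start
    for prop in properties:
        # Each intermediate value must be the subject of another resource.
        resource = store[current]
        current = resource[prop]
    return current


# Hypothetical personal data store with two linked resources.
store = {
    "https://example.com/users/john": {"profile": "https://example.com/profiles/john"},
    "https://example.com/profiles/john": {"birthdate": "1991-01-20"},
}

birthdate = follow_path(store, "https://example.com/users/john", ["profile", "birthdate"])
```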

Privacy friendly invites with client-side filtering

Currently, a researcher needs to either build their own panel, or use a service that has a lot of respondents. Sometimes, researchers will need a very specific target audience, like a specific age group, nationality, gender, or owners of specific types of devices. Targeting these individuals is generally done by having a large database of personal information from many individuals. But there is another way of doing this: client-side filtering. Instead of asking for the user's data, and storing it centrally, we could send queries to decentralized personal data stores. These queries basically contain the targeting information and an invitation. The query is executed on the personal data store, and if the user characteristics align with the desired participant profile, the user receives an invite. The user only sees invitations that are highly relevant, without sharing any information with the researcher.

The Atomic Data specification solves at least part of this problem. Paths are used to describe the queries that researchers make. Atomic Server can be used as the personal online data store.
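
Client-side filtering as described above might be sketched like this. The profile keys, the criteria format and the function name are all hypothetical, since this part of the process has not been specified.

```python
from typing import Optional


def handle_query(profile: dict, criteria: dict, invitation: str) -> Optional[str]:
    """Run on the personal data store: show the invite only if the profile matches."""
    for key, accepted in criteria.items():
        if profile.get(key) not in accepted:
            return None  # nothing is reported back to the researcher
    return invitation


# Hypothetical profile that never leaves the user's own data store.
profile = {"ageGroup": "25-34", "nationality": "NL"}

# Hypothetical query sent by a researcher: targeting criteria plus an invitation.
criteria = {"ageGroup": {"25-34", "35-44"}, "nationality": {"NL"}}
shown = handle_query(profile, criteria, "https://example.com/surveys/air-quality")
hidden = handle_query(profile, {"nationality": {"BE"}}, "https://example.com/surveys/other")
```

The important property is the direction of the data flow: the criteria travel to the profile, and the function returns its result to the user, not to the sender of the query.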

However, we still need to specify the process of sending a request to an individual (probably by introducing an inbox).

Atomic Data and Verifiable Credentials / SSI

What are Verifiable Credentials / Self-Sovereign Identity

Verifiable Credentials are pieces of information that carry cryptographic proof from some reliable third party. For example, you could have a credential that proves your degree, signed by your educational institution. These credentials can enable privacy-friendly transactions where a credential owner can prove being part of some group, without needing to actually identify themselves. For example, you could prove that you're over 18 by showing a credential issued by your government, without actually having to show your ID card with your birthdate. Verifiable Credentials are still not that widely used, but various projects exist that have had moderate success in implementing them.

What makes Atomic Data suitable for this

Firstly, Atomic Commits are already verifiable using signatures that contain all the needed information. Secondly, Atomic Schema can be used for standardizing Credential Schemas.

Every Atomic Commit is a Verifiable Credential

Every time an Agent updates a Resource, an Atomic Commit is made. This Commit is cryptographically signed by an Agent, just like Verifiable Credentials are signed. In essence, this means that all Atomic Data created through Commits is fully verifiable.

How could this verification work?

  • Find the Commit that has created / edited the value that you want to verify. This can be made easier with a specialized Endpoint that takes a resource, property and signer and returns the associated Commit(s).
  • Check the signer of the Commit. Is that an Agent that you trust?
  • Verify the signature of the Commit using the public key of the Agent.
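
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the way any Atomic Data implementation does it: the Commit class is a simplified, hypothetical record, payload is an invented serialization, and the signature check is passed in as a function. Python's standard library has no public-key signature primitive, so the demo plugs in toy_check, an HMAC-based stand-in that only makes the example runnable. A real verifier would check the signature against the Agent's public key.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Commit:
    """Simplified, hypothetical view of a Commit: who set which value, plus a signature."""

    subject: str
    property: str
    value: str
    signer: str  # URL of the Agent that signed the Commit
    signature: str = ""

    def payload(self) -> bytes:
        """The bytes that get signed (an invented format, for this example only)."""
        return "\n".join([self.subject, self.property, self.value, self.signer]).encode()


def find_commit(commits: list, subject: str, prop: str, signer: str) -> Optional[Commit]:
    """Step 1: find the newest Commit by `signer` that set `prop` on `subject`."""
    for commit in reversed(commits):
        if (commit.subject, commit.property, commit.signer) == (subject, prop, signer):
            return commit
    return None


def verify_value(commits: list, subject: str, prop: str, signer: str,
                 trusted_keys: dict, check: Callable[[bytes, bytes, str], bool]) -> bool:
    """`trusted_keys` maps the URL of each Agent you trust to that Agent's key."""
    commit = find_commit(commits, subject, prop, signer)
    if commit is None:
        return False
    key = trusted_keys.get(commit.signer)  # Step 2: is the signer an Agent that you trust?
    if key is None:
        return False
    return check(key, commit.payload(), commit.signature)  # Step 3: verify the signature


def toy_check(key: bytes, payload: bytes, signature: str) -> bool:
    """Stand-in ONLY: HMAC is symmetric, real Commits need a public-key signature check."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


GOVERNMENT = "https://example.com/agents/government"
ALICE = "https://example.com/people/alice"
BIRTHDATE = "https://example.com/properties/birthdate"
key = b"demo-key"

unsigned = Commit(ALICE, BIRTHDATE, "1990-05-17", GOVERNMENT)
signed = replace(unsigned, signature=hmac.new(key, unsigned.payload(), hashlib.sha256).hexdigest())
tampered = replace(signed, value="1970-01-01")

print(verify_value([signed], ALICE, BIRTHDATE, GOVERNMENT, {GOVERNMENT: key}, toy_check))    # True
print(verify_value([tampered], ALICE, BIRTHDATE, GOVERNMENT, {GOVERNMENT: key}, toy_check))  # False
print(verify_value([signed], ALICE, BIRTHDATE, GOVERNMENT, {}, toy_check))                   # False
```

Keeping trust and keys in a single mapping means an unknown signer and an untrusted signer fail in the same way, before any signature is checked.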

Sometimes, credentials need to be revoked. How could revocation work?

  • Find the Commit (see above)
  • Get the signer (see above)
  • Find the /isRevoked Endpoint of that signer and send a Request there to make sure the linked Commit is still valid and has not been revoked.
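
A revocation check along these lines might look like the Python sketch below. The /isRevoked Endpoint is only a proposal, so every detail here is an assumption: that the Endpoint lives on the same origin as the signer, that it takes a commit query parameter, and that it answers with a JSON object containing a revoked boolean. The HTTP request is passed in as a function so the example runs without a network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def is_revoked_url(signer: str, commit_url: str) -> str:
    """Build the URL of the signer's (proposed) /isRevoked Endpoint for one Commit."""
    parts = urlsplit(signer)
    query = urlencode({"commit": commit_url})  # hypothetical parameter name
    return urlunsplit((parts.scheme, parts.netloc, "/isRevoked", query, ""))


def commit_is_valid(signer: str, commit_url: str, fetch) -> bool:
    """`fetch` takes a URL and returns the parsed response as a dict.

    Fails closed: anything other than an explicit `"revoked": False` counts as invalid.
    """
    response = fetch(is_revoked_url(signer, commit_url))
    return response.get("revoked") is False


# Demo with a fake Endpoint that knows about one revoked Commit.
REVOKED = {"https://example.com/commits/old"}


def fake_fetch(url: str) -> dict:
    commit = parse_qs(urlsplit(url).query)["commit"][0]
    return {"revoked": commit in REVOKED}


signer = "https://example.com/agents/government"
print(is_revoked_url(signer, "https://example.com/commits/new"))
print(commit_is_valid(signer, "https://example.com/commits/new", fake_fetch))  # True
print(commit_is_valid(signer, "https://example.com/commits/old", fake_fetch))  # False
```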

Visit the issue on GitHub to join the discussion about this subject.

Use Atomic Schema for standardizing Credentials

If you are a Verifier who wants to check someone's birthdate, you'll probably expect a certain datatype in return, such as a date that is formatted in some specific way. Atomic Schema makes it possible to express which properties are required in a certain Class, and it also makes it possible to describe which datatype is linked to a specific Property. Combined, they allow for fine-grained descriptions of models / classes / schemas.
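
To make this concrete, the Python sketch below checks a credential against a Class that lists required Properties, where each Property is linked to a datatype. It is an illustration only: the dictionaries are hypothetical stand-ins for Class and Property resources, the example.com URLs are invented, and only a few datatypes are covered.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """Accept only strings that parse as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical datatype checks, keyed by datatype name.
DATATYPE_CHECKS = {
    "date": is_iso_date,
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

BIRTHDATE = "https://example.com/properties/birthdate"
NAME = "https://example.com/properties/name"

# Each Property is linked to exactly one datatype.
PROPERTIES = {BIRTHDATE: "date", NAME: "string"}

# A Class states which Properties are required.
BIRTHDATE_CREDENTIAL = {"requires": [BIRTHDATE]}


def validate(resource: dict, klass: dict) -> list:
    """Return a list of human-readable errors. An empty list means the resource is valid."""
    errors = []
    for prop in klass["requires"]:
        if prop not in resource:
            errors.append(f"missing required property: {prop}")
    for prop, value in resource.items():
        datatype = PROPERTIES.get(prop)
        if datatype is None:
            errors.append(f"unknown property: {prop}")
        elif not DATATYPE_CHECKS[datatype](value):
            errors.append(f"{prop} is not a valid {datatype}")
    return errors


print(validate({BIRTHDATE: "1990-05-17", NAME: "Alice"}, BIRTHDATE_CREDENTIAL))  # []
print(validate({BIRTHDATE: "17-05-1990"}, BIRTHDATE_CREDENTIAL))  # one datatype error
print(validate({NAME: "Alice"}, BIRTHDATE_CREDENTIAL))            # one missing property
```

A Verifier that runs a check like this before trusting a value knows both that the birthdate is present and that it is formatted the way the Property promises.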

Software and libraries for Atomic Data

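Although Atomic Data is a specification, it also has reference implementations.

Open source (MIT licensed) software for Atomic Data:

  • atomic-server
  • atomic-data-browser
  • atomic-cli

Libraries (MIT licensed) to build apps with:

  • @tomic/lib and @tomic/react (JavaScript / TypeScript)
  • atomic-lib (Rust)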



atomic-server

Server for hosting Atomic Data. Uses atomic-lib.

  • Responds to requests for created Atomic Resources, makes atomic data available at their URL.
  • Embedded database
  • Authorization, authentication, versioning, collections, pagination
  • Browser-friendly HTML presentation, JSON serialization, RDF serialization.

One liner: $ docker run -p 80:80 -p 443:443 -v atomic-storage:/atomic-storage joepmeneer/atomic-server


repository + issue tracker.


atomic-data-browser

Data browser, powered by @tomic/lib and @tomic/react.

  • View & edit atomic data, using dynamic forms
  • Collections with pagination and sorting
  • Client-side full-text search

demo (same as atomic-server)

repository + issue tracker.


atomic-cli

A tool for generating / querying Atomic Data from the command line. Install with cargo install atomic-cli.

atomic 0.20.0
Joep Meindertsma <joep@ontola.io>
Create, share, fetch and model linked atomic data!

USAGE:
    atomic-cli [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    destroy    Permanently removes a Resource. Uses Commits.
    edit       Edit a single Atom from a Resource using your text editor. Uses Commits.
    get        Traverses a Path and prints the resulting Resource or Value.
    help       Prints this message or the help of the given subcommand(s)
    list       List all bookmarks
    new        Create a Resource
    remove     Remove a single Atom from a Resource. Uses Commits.
    set        Update an Atom's value. Uses Commits.
    tpf        Finds Atoms using Triple Pattern Fragments.

Visit https://github.com/joepio/atomic for more info

repository + issue tracker.


@tomic/lib and @tomic/react

JavaScript / TypeScript libraries, especially useful for creating front-end apps.

Fork the atomic-data-react-template on CodeSandbox to get started directly!

atomic-lib (Rust)

Library that powers atomic-server and atomic-cli. Features:

  • An in-memory store
  • Parsing (JSON-AD) / Serialization (JSON-AD, JSON-LD, TTL, N-Triples)
  • Commit validation and processing
  • TPF queries
  • Constructing Collections
  • Path traversal
  • Basic validation

repository + issue tracker.

Want to add to this list? Some ideas for tooling

This section lists ideas for tooling that would help Atomic Data succeed. Open a PR and edit this file to add your project!

Atomic Companion

A mobile app for granting permissions to your data and signing things. See the GitHub issue.

  • Show a notification when you try to log in somewhere with your agent
  • Notifications for mentions and other social items
  • Check uptime of your server

Atomizer (data importer and conversion kit)

  • Import data from some data source (CSV / SQL / JSON / RDF), fill in the gaps (mapping / IRI creation / datatypes) and create new Atoms
  • Perhaps a CLI, library, GUI or a combination of all of these

Atomic Preview

  • A simple (JS) widget that can be embedded anywhere, which converts an Atomic Graph into an HTML view.
  • Would be useful for documentation, and as a default view for Atomic Data.
  • Use @tomic/react and @tomic/lib to get started

Atomic-Dart + Flutter

Library + front-end app for browsing / manipulating Atomic Data on mobile devices.



Special thanks to:

  • Thom van Kalkeren (my colleague, friend and programming mentor who came up with many great ideas on how to work with RDF, such as HexTuples and linked-delta)
  • Tim Berners-Lee (for everything he did for linked data and the web)
  • Ruben Verborgh (for doing great work with RDF, such as the TPF spec)
  • Pat McBennett (for lots of valuable feedback on initial Atomic Data docs)
  • Manu Sporny (for his work on JSON-LD, which was an important inspiration for JSON-AD)
  • Jonas Smedegaard (for the various interesting talks we had and the feedback he provided)
  • Arthur Dingemans (for sharing his thoughts, providing feedback and his valuable suggestions)
  • Anja Koopman (for all her support, even when this project ate away days and nights of our time together)
  • All the other people who contributed to linked data related standards

Subscribe to the Atomic Data newsletter

We'll send you an update (max once per month) when there's something relevant to share, such as:

  • Major changes to the specification
  • Major new releases (with new features)
  • Use-cases, implementations
  • Tutorials, blog posts
  • Organizational / funding news

Click here to sign up to the Atomic Data Newsletter

Get involved

Atomic Data is an open specification, and that means that you're very welcome to share your thoughts and help make this standard as good as possible.

Things you can do: