Date: 2024-10-18
Maybe Not LevelDB all the things
Reminder of what LevelDB is
LevelDB is just a sorted key-value store.
Why did I care about LevelDB so much?
LevelDB is available in the browser on top of IndexedDB, has libraries for basically every programming language, can run on top of S3, and has a pretty epic Awesome page.
My goal with LevelDB was to make a provenance backed file system, basically S3 with version control that uses cryptographic signatures to manage history rather than just trusting the server.
I was successful with this goal: I have a proof of concept that works, though it still needs to take advantage of batching and stuff.
I have realized I cannot easily write my Nostr relay without creating indexes on the author, tags, and timestamp. Implementing those indexes basically turns LevelDB into a Document Database, and the idea of running a Document Database on S3 doesn't really make sense. In fact, someone has already done this: this nostr server runs on LevelDB, and its author built their own Document Database on top of LevelDB to make it work.
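Here is a minimal TypeScript sketch that makes the indexing problem concrete. It rests on two assumptions. First, a toy in-memory sorted store stands in for LevelDB; it is not the LevelDB or level API. Second, the key layout with `e:`, `a:`, and `t:` prefixes is hypothetical. Each event is written once as a record and once more per index, so a query like "this author's events, oldest first" becomes a single prefix scan.

```typescript
// Toy sorted key-value store standing in for LevelDB (NOT the LevelDB API).
// Keys stay in lexicographic order so prefix scans are cheap, as in LevelDB.
// JS string comparison is by UTF-16 code unit; for ASCII keys it matches byte order.
class SortedKV {
  private keys: string[] = [];
  private values = new Map<string, string>();

  // Index of the first key >= `key` (binary search).
  private lowerBound(key: string): number {
    let lo = 0;
    let hi = this.keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.keys[mid] < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  put(key: string, value: string): void {
    if (!this.values.has(key)) this.keys.splice(this.lowerBound(key), 0, key);
    this.values.set(key, value);
  }

  get(key: string): string | undefined {
    return this.values.get(key);
  }

  // All [key, value] pairs whose key starts with `prefix`, in key order.
  scanPrefix(prefix: string): Array<[string, string]> {
    const out: Array<[string, string]> = [];
    for (let i = this.lowerBound(prefix); i < this.keys.length && this.keys[i].startsWith(prefix); i++) {
      out.push([this.keys[i], this.values.get(this.keys[i]) as string]);
    }
    return out;
  }
}

interface NostrEvent {
  id: string;
  pubkey: string;
  created_at: number;
  kind: number;
  tags: string[][];
  content: string;
}

// Hypothetical key layout. Timestamps are zero-padded so string order matches numeric order.
const ts = (t: number): string => String(t).padStart(10, "0");

// One primary record plus one key per index entry: every query shape needs its own key family.
function indexEvent(db: SortedKV, ev: NostrEvent): void {
  db.put(`e:${ev.id}`, JSON.stringify(ev));
  db.put(`a:${ev.pubkey}:${ts(ev.created_at)}:${ev.id}`, ev.id);
  for (const tag of ev.tags) {
    if (tag.length >= 2) db.put(`t:${tag[0]}:${tag[1]}:${ts(ev.created_at)}:${ev.id}`, ev.id);
  }
}

// "Events by this author, oldest first" is a prefix scan plus one lookup per hit.
function eventsByAuthor(db: SortedKV, pubkey: string): NostrEvent[] {
  return db
    .scanPrefix(`a:${pubkey}:`)
    .map(([, id]) => JSON.parse(db.get(`e:${id}`) as string) as NostrEvent);
}
```

Every new query shape (author plus kind, tag plus time range, and so on) needs yet another key family maintained on every write. Tag values that contain `:` would also need escaping. That bookkeeping is the Document Database layer described above.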
Persistent Browser Based Relays Exist Already
I did some additional research and found a nostr relay implementation that can run in the browser on top of IndexedDB as well as SQLite, both of which are implemented in nostrudel:
- SQLite
- IndexedDB
Pivoting from POC Signed Nostr Provenance to a public blogging engine
I have to remember that my goal is to build on top of Nostr, not to reinvent Nostr. One should aim to stand on the shoulders of giants rather than stand alone.
My gripe was that I wanted Nostr to integrate with Multiformats, support other cryptographic key and signature types via Multicodec, and therefore be compatible with IPFS. I want to see Multiformats used to represent hashes, public keys, and signatures, as well as to hash the nostr events themselves.
At this point it is probably better to temporarily forget that Multiformats exists and build the version control using nostr-native event IDs and tags.
The general plan was to represent the content of Nostr documents via CIDs and then have metadata events structure those CIDs. A CID decouples the content being produced from the Nostr event that publishes it. So we still need Multiformats and CIDs, but for a very specific scope.
Rather than using CIDv1, we can use CIDv0 and just add a nostr tag that says what format the content is in, such as Markdown, raw text, or LaTeX.
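A CIDv0 is just the base58btc encoding of a sha2-256 multihash: the bytes 0x12 and 0x20 followed by the 32-byte digest. It carries no codec of its own, so the format has to travel in a separate tag. The sketch below assumes hypothetical tag names `cid` and `format`, which are not defined by any NIP. One caveat: IPFS interprets every CIDv0 as a dag-pb block. A CIDv0 computed over raw Markdown bytes therefore works as a content label inside Nostr, but it is not what `ipfs add` would produce for the same file.

```typescript
const BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

// Base58btc (Bitcoin alphabet), the encoding CIDv0 strings use.
function base58btc(bytes: Uint8Array): string {
  const digits: number[] = []; // base-58 digits, least significant first
  for (let i = 0; i < bytes.length; i++) {
    let carry = bytes[i];
    // Multiply the running number by 256 and add the next byte.
    for (let j = 0; j < digits.length; j++) {
      carry += digits[j] * 256;
      digits[j] = carry % 58;
      carry = Math.floor(carry / 58);
    }
    while (carry > 0) {
      digits.push(carry % 58);
      carry = Math.floor(carry / 58);
    }
  }
  let out = "";
  // Each leading zero byte becomes a literal "1".
  for (let i = 0; i < bytes.length && bytes[i] === 0; i++) out += "1";
  for (let j = digits.length - 1; j >= 0; j--) out += BASE58_ALPHABET[digits[j]];
  return out;
}

function hexToBytes(hex: string): Uint8Array {
  if (!/^([0-9a-f]{2})*$/i.test(hex)) throw new Error("invalid hex");
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  return out;
}

// CIDv0 = base58btc(0x12 || 0x20 || digest): a bare sha2-256 multihash, no codec.
function cidV0FromSha256(digestHex: string): string {
  const digest = hexToBytes(digestHex);
  if (digest.length !== 32) throw new Error("expected a 32-byte sha2-256 digest");
  const multihash = new Uint8Array(34);
  multihash[0] = 0x12; // sha2-256
  multihash[1] = 0x20; // digest length: 32 bytes
  multihash.set(digest, 2);
  return base58btc(multihash);
}

// Hypothetical tags: the CID names the bytes, and a separate tag names their format.
function contentTags(digestHex: string, format: string): string[][] {
  return [
    ["cid", cidV0FromSha256(digestHex)],
    ["format", format], // e.g. "text/markdown" or "text/plain"
  ];
}
```

Every sha2-256 CIDv0 comes out as a 46-character string starting with `Qm`. That fixed shape is why the format cannot be read from the CID itself.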
The plan now is to come up with our own Nostr event types for Directories/Documents as well as for CID Memes.
So would a directory just be a Markdown file with a series of special dd URLs, or something like that?
The real question is: is a directory a document, and is a document a directory?
On Unix everything is a file, and directories are just a special kind of file. Following that abstraction, a Directory is just a Document that embeds specific CID Memes of structured data, say JSON, containing the directory metadata. Also, a document can contain a single directory metadata meme.
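Here is one way to read "a Directory is a Document with an embedded directory meme" in code. A document is an ordered list of content blocks, and it acts as a directory when one of its JSON blocks is directory metadata. The block shape, the `"type": "directory"` marker, and the one-meme-per-document limit are assumptions for this sketch, not a settled schema.

```typescript
// Hypothetical block shape: each block is addressed by CID and tagged with a format.
interface Block {
  cid: string;
  format: string; // e.g. "text/markdown" or "application/json"
  body: string;   // inline content, kept here only so the sketch is self-contained
}

// Hypothetical directory metadata carried inside a JSON block.
interface DirectoryMeta {
  type: "directory";
  entries: Array<{ name: string; cid: string }>;
}

// A document is an ordered list of blocks.
type Doc = Block[];

// Returns the document's directory metadata, or null if it is a plain document.
// Assumption for this sketch: at most one directory meme per document.
function directoryMeta(doc: Doc): DirectoryMeta | null {
  let found: DirectoryMeta | null = null;
  for (const block of doc) {
    if (block.format !== "application/json") continue;
    const parsed: unknown = JSON.parse(block.body);
    if (typeof parsed === "object" && parsed !== null && (parsed as { type?: unknown }).type === "directory") {
      if (found !== null) throw new Error("more than one directory meme in one document");
      found = parsed as DirectoryMeta;
    }
  }
  return found;
}
```

Under this reading, a plain Markdown document becomes a directory simply by gaining one such block, which keeps the Unix-style "directories are just special files" abstraction.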
How does this relate to the POC Signed Nostr Provenance we built?
The POC Signed Nostr Provenance was built within the constraints of LevelDB, an updatable key-value store sorted by key. A document or a file system is also just a sorted key-value store.
If a document has 10 sentences, they can be labeled 1 through 10. When a sentence is added after the third sentence, it gets the label 3.1. When the sentence at position 5 is removed, there is no longer an index at position 5. Everything gets placed relative to what is already in the list. Since sentence 5 no longer exists, it is never used as a reference again, and future edits in that gap get added under 4 (4.1, 4.2, and so on).
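The labelling scheme above can be sketched with labels stored as number arrays, so "3.1" is [3, 1], and ordering compares them component by component. Two choices here belong to the sketch, not the scheme. Arrays are used rather than decimal numbers, where 3.10 and 3.1 would collide. The list also remembers every label ever issued, so a deleted label is never handed out again.

```typescript
type Label = number[]; // "3.1" is represented as [3, 1]

const formatLabel = (l: Label): string => l.join(".");

// Compare component by component; a label sorts before its own children,
// so 3 < 3.1 < 3.2 < 4.
function compareLabels(a: Label, b: Label): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

class LabeledList {
  private items = new Map<string, string>(); // live items only: label -> text
  private issued: Label[] = [];               // every label ever handed out, deleted ones included

  constructor(initial: string[]) {
    initial.forEach((text, i) => this.add([i + 1], text));
  }

  private add(label: Label, text: string): Label {
    this.items.set(formatLabel(label), text);
    this.issued.push(label);
    return label;
  }

  // Insert as the last child of `after` (3 -> 3.1, then 3.2, and so on).
  // Child numbers come from `issued`, so a deleted label is never reused.
  insertAfter(after: Label, text: string): Label {
    let maxChild = 0;
    for (const l of this.issued) {
      if (l.length === after.length + 1 && compareLabels(l.slice(0, after.length), after) === 0) {
        maxChild = Math.max(maxChild, l[after.length]);
      }
    }
    return this.add([...after, maxChild + 1], text);
  }

  remove(label: Label): void {
    this.items.delete(formatLabel(label));
  }

  // Live items in document order, each prefixed with its label.
  toArray(): string[] {
    return Array.from(this.items.keys())
      .map((key) => key.split(".").map(Number))
      .sort(compareLabels)
      .map((l) => `${formatLabel(l)} ${this.items.get(formatLabel(l))}`);
  }
}
```

Starting from six sentences, inserting after 3 yields 3.1. After removing 5 and 3.1, the next insert after 3 yields 3.2 rather than reusing 3.1, and an insert after 4 yields 4.1.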
What about indexes within the list getting too long?
We deal with it through abstraction. A single paragraph may be 10 sentences, but that paragraph is a single entity that can be moved around and referenced anywhere.
The problem we are actually dealing with here is that of a linked list, when Nostr produces an event log of immutable events.
Well, it is probably easier to think of everything as a series of modular Markdown/LaTeX/HTML blocks that each have an ID, with the document just providing a list or graph of those IDs to render.
That sorta reminds me of how fed.wiki works.
It looks like UUIDs are going to make a comeback. The cells/blocks of Markdown/LaTeX/HTML will all need IDs, and they won't have titles the way a Document/Directory does.
Or do we just reference the OG nostr event ID that occupied a specific position as it gets updated? We can think of each update as replying to the same event over and over again. That makes more sense than generating a UUID.
So the document is just going to be a list or graph of event IDs, with each event updated via replies to the original event at that index. We can also link other events within context.
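Here is a sketch of "the document is a list of original event IDs, and each block is updated by replying to its original event". To render, pick the newest revision for each slot. Two assumptions apply. A revision carries an `e` tag pointing at the block's original event ID; real NIP-10 replies also carry markers and relay hints. The event fields are also trimmed down from NIP-01.

```typescript
// Trimmed-down NIP-01 event: only the fields this sketch needs.
interface NostrEvent {
  id: string;
  created_at: number;
  tags: string[][];
  content: string;
}

// Hypothetical convention: a revision carries an "e" tag pointing at the block's
// ORIGINAL event ID, so every revision replies to the same root event.
function rootOf(ev: NostrEvent): string {
  const e = ev.tags.find((t) => t[0] === "e");
  return e ? e[1] : ev.id;
}

// Render a document (an ordered list of original event IDs) using the newest
// revision of each block. Ties on created_at are broken by event ID so every
// client picks the same winner.
function render(docOrder: string[], events: NostrEvent[]): string[] {
  const latest = new Map<string, NostrEvent>();
  for (const ev of events) {
    const root = rootOf(ev);
    const cur = latest.get(root);
    if (!cur || ev.created_at > cur.created_at || (ev.created_at === cur.created_at && ev.id > cur.id)) {
      latest.set(root, ev);
    }
  }
  const out: string[] = [];
  for (const id of docOrder) {
    const ev = latest.get(id);
    if (ev) out.push(ev.content); // blocks with no known events are skipped
  }
  return out;
}
```

The document only stores original IDs, so in this sketch editing a block never changes the document's list itself; only adding, removing, or reordering blocks does.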
Should meme events function as modular nodes and edges, or are we adding more complex concepts here, such as an edge that connects three nodes?
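An edge that connects three nodes (a hyperedge) does not necessarily need a new primitive. A common trick is to reify the relation as its own node with one ordinary edge per member. Below is a minimal sketch; the role names and the annotation example are hypothetical.

```typescript
type NodeId = string;

interface Edge {
  from: NodeId;
  to: NodeId;
  label: string; // the member's role in the relation
}

interface Graph {
  nodes: Set<NodeId>;
  edges: Edge[];
}

// Store an n-ary relation as its own node plus one binary edge per member,
// so the graph itself only ever holds ordinary two-ended edges.
function addHyperedge(g: Graph, relationId: NodeId, members: Record<string, NodeId>): void {
  g.nodes.add(relationId);
  for (const role of Object.keys(members)) {
    g.nodes.add(members[role]);
    g.edges.push({ from: relationId, to: members[role], label: role });
  }
}

// Recover the relation's members by following its outgoing edges.
function membersOf(g: Graph, relationId: NodeId): Record<string, NodeId> {
  const out: Record<string, NodeId> = {};
  for (const e of g.edges) {
    if (e.from === relationId) out[e.label] = e.to;
  }
  return out;
}
```

For example, a Hypothes.is-style annotation that ties a document, a quoted block, and an author together becomes one relation node with three edges.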
What does this have to do with:
- Publishing your PKMS
- Adding Tokenized RBAC to the PKMS via QE
- Web based PKMS that can replace Obsidian and Notion
- Raindrop.io Social Bookmarking
- ActivityWatch Event Streaming
- Hypothes.is Social Annotation
- Discord Clone with Bots and Moderation
- ChatGPT (Open WebUI) Clone
- Recompiling social media platforms to Nostr events
- Jira level Project Management
- AWS Mechanical Turk Clone
- More at Epic User Journeys
So can all those use cases work in the CGFS Schema - Core context?
Within the context of a knowledge graph, everything is a matter of nodes and edges. A node or edge can also include metadata for graph traversal, such as a Root Meme and a summary of the route the meme is trying to take, before we navigate there via raw recursive graph querying.
This is definitely a problem that will solve itself when the time comes. We should take it one step at a time and tackle it once the problem is well defined; right now we are trying to solve a problem we have not even defined.
The problem now is how to create a graph of CIDs that maps the public version of this graph with version control. To begin with, we will have to work with entire raw Markdown documents until we write our own Markdown parser that can split everything by headings.
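Until a real parser exists, splitting raw Markdown at ATX headings (`#` through `######` followed by a space) gets one block per section. This is a rough sketch under simplifying assumptions. It ignores setext headings, closing `#` sequences, and `#` lines inside fenced code, all of which a CommonMark parser would handle.

```typescript
interface Section {
  heading: string | null; // null for any text before the first heading
  level: number;          // 0 for that preamble, otherwise 1-6
  body: string;
}

// ATX heading: 1-6 '#' characters, then whitespace and the title (or nothing).
const ATX_HEADING = /^(#{1,6})(?:[ \t]+(.*?))?[ \t]*$/;

function splitByHeadings(markdown: string): Section[] {
  const sections: Section[] = [];
  let heading: string | null = null;
  let level = 0;
  let body: string[] = [];

  const flush = (): void => {
    // Drop an empty preamble, but keep headed sections even when their body is empty.
    if (heading !== null || body.join("").trim() !== "") {
      sections.push({ heading, level, body: body.join("\n").trim() });
    }
  };

  for (const line of markdown.split(/\r?\n/)) {
    const m = ATX_HEADING.exec(line);
    if (m) {
      flush();
      heading = (m[2] ?? "").trim();
      level = m[1].length;
      body = [];
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```

Each returned section could then be hashed and published as its own block. A line like `#NotAHeading` (no space) stays in the body, as CommonMark requires.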
Outside Context Notes
Getting Autistic about Hashing
TL;DR: the way Nostr events are hashed and verified is compatible with Multiformats and IPFS.
I still really like the idea of Nostr Events themselves being Content Addressable but that is a fight we will have to revisit in the future.
NIP-01 specifies exactly how the content of a nostr event is laid out before hashing. This is basically the same idea as DAG-JSON and the other IPLD data formats. In fact, it would be very simple to give Nostr events their own Multicodec prefix for CIDv1, which would go right in this table.
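A NIP-01 event ID is already the hex SHA-256 of the serialized event, so it can be wrapped into a multihash and then a CIDv1 without rehashing. This sketch uses the registered `raw` codec (0x55) as a stand-in, because the serialized event string is exactly the byte block that was hashed. A dedicated Nostr multicodec, as proposed above, would be passed as the `codec` argument instead. `cidV1FromEventId` is a hypothetical helper, not part of nostr-tools or any multiformats library.

```typescript
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// RFC 4648 base32, lowercase, unpadded: the alphabet behind the multibase "b" prefix.
function base32(bytes: Uint8Array): string {
  let out = "";
  let buffer = 0; // holds the not-yet-emitted low `bits` bits
  let bits = 0;
  for (let i = 0; i < bytes.length; i++) {
    buffer = (buffer << 8) | bytes[i];
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(buffer >>> (bits - 5)) & 31];
      bits -= 5;
    }
    buffer &= (1 << bits) - 1; // drop bits already emitted
  }
  if (bits > 0) out += BASE32_ALPHABET[(buffer << (5 - bits)) & 31];
  return out;
}

// Unsigned LEB128 varint, the encoding multiformats uses for version and codec codes.
function varint(n: number): number[] {
  const out: number[] = [];
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  out.push(n);
  return out;
}

// Hypothetical helper: CIDv1 = "b" + base32(version || codec || 0x12 || 0x20 || digest).
// 0x55 (raw) is a stand-in; a Nostr-specific codec would replace it.
function cidV1FromEventId(eventIdHex: string, codec: number = 0x55): string {
  if (!/^[0-9a-f]{64}$/.test(eventIdHex)) throw new Error("expected 64 lowercase hex chars");
  const bytes: number[] = [...varint(1), ...varint(codec), 0x12, 0x20];
  for (let i = 0; i < 64; i += 2) bytes.push(parseInt(eventIdHex.slice(i, i + 2), 16));
  return "b" + base32(new Uint8Array(bytes));
}
```

With the default codec the result is a 59-character base32 CIDv1 beginning with `bafkrei`. An IPFS node could resolve it to the serialized event if that string were stored as a raw block.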
We should be able to test this using nostr-tools.
Here is the code; just start reading at verifyEvent. The key line is line 45, where all the data from the JSON event is placed into a fixed-order list, so there is no ambiguity about how to structure the data when you hash it. The string from the serializeEvent function is what would be stored on IPFS, not the original JSON string of the event.
return JSON.stringify([0, evt.pubkey, evt.created_at, evt.kind, evt.tags, evt.content])
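For completeness, NIP-01 defines the event ID as the lowercase hex SHA-256 of the UTF-8 bytes of that serialized array. Below is a minimal re-implementation for illustration, assuming Node's built-in `node:crypto` for hashing. It is not nostr-tools' code. It also only checks the ID, whereas `verifyEvent` additionally checks the Schnorr signature.

```typescript
import { createHash } from "node:crypto";

interface UnsignedEvent {
  pubkey: string;
  created_at: number;
  kind: number;
  tags: string[][];
  content: string;
}

// NIP-01 serialization: a fixed-order JSON array with no extra whitespace.
function serializeEvent(evt: UnsignedEvent): string {
  return JSON.stringify([0, evt.pubkey, evt.created_at, evt.kind, evt.tags, evt.content]);
}

// Event ID = lowercase hex of sha256(UTF-8 bytes of the serialized event).
function computeEventId(evt: UnsignedEvent): string {
  return createHash("sha256").update(serializeEvent(evt), "utf8").digest("hex");
}

// Recompute the ID and compare it with the one the event claims.
// Signature verification, which verifyEvent also does, is out of scope here.
function idMatches(evt: UnsignedEvent & { id: string }): boolean {
  return computeEventId(evt) === evt.id;
}
```

Changing any field, even one character of `content`, changes the serialized string and therefore the ID. That is what makes the ID usable as a content address.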
How does nostr-rs-relay do NIP-05 validation?
- If there is no tag in the event with a NIP-05 identifier, there is no way to look a user up via their public key besides using a lookup service
- One must reply to an event on a nostr-rs-relay to let the relay know they exist
- Then the relay looks up the kind 0 event on that user's relay, and now they can use the nostr-rs-relay
Source: nostr-rs-relay: docs/user-verification-nip05.md
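The NIP-05 check itself is small. The `nip05` field in a user's kind 0 metadata holds `name@domain`, which maps to `https://domain/.well-known/nostr.json?name=name`. The identifier is valid when that document's `names` object maps the name to the user's hex public key. The sketch below covers only the parsing, URL building, and comparison, with no network access or caching. The function names are hypothetical, and this is not how nostr-rs-relay is written.

```typescript
// Shape of https://<domain>/.well-known/nostr.json, as far as this check needs it.
interface Nip05Response {
  names?: Record<string, string>;
}

// Hypothetical helper: pull the nip05 identifier out of a kind 0 event's JSON content.
function nip05FromMetadata(content: string): string | null {
  const meta = JSON.parse(content) as { nip05?: unknown };
  return typeof meta.nip05 === "string" ? meta.nip05 : null;
}

// Split "name@domain" and build the well-known lookup URL that NIP-05 describes.
function nip05Url(identifier: string): { name: string; url: string } {
  const parts = identifier.split("@");
  if (parts.length !== 2 || !/^[a-z0-9._-]+$/i.test(parts[0]) || parts[1] === "") {
    throw new Error(`not a NIP-05 identifier: ${identifier}`);
  }
  const name = parts[0].toLowerCase();
  const domain = parts[1].toLowerCase();
  return { name, url: `https://${domain}/.well-known/nostr.json?name=${encodeURIComponent(name)}` };
}

// The identifier is verified only if the domain maps the name to this exact hex pubkey.
function nip05Matches(response: Nip05Response, name: string, pubkeyHex: string): boolean {
  return response.names !== undefined && response.names[name] === pubkeyHex;
}
```

A relay would fetch the URL, parse the JSON into `Nip05Response`, and call `nip05Matches` with the pubkey of the kind 0 event's author.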