Repl DB
ApoorvSingal (105)

ReplDB

A full-fledged file-based database management system with a JSON-like data storage format. It's completely asynchronous, can handle multiple connections and operations at once, is super fast (< 10ms latency for most operations), and uses a dedicated repl for managing the database.

Note: This is not the official Replit DB. I created it before the official one was released.

NOTE #1: A lot (really lot) of things are undocumented here. Since this post was getting too big, I decided to make a website for the docs (https://repldb.repl.co) but before I could complete it, my tablet (the only device I had) got damaged beyond repair (I dropped it), and now I can't code (probably for this whole year) cuz no money *crying*.

NOTE #2: @Lord_Poseidon implemented the ReplDB server in Golang (https://repl.it/@Lord_Poseidon/FantasticThirdHashmap) which is compatible with the same existing ReplDB client API and is actually faster than the NodeJS server I made in handling a small number of operations concurrently while the NodeJS server is better at handling a large number of operations concurrently. Here is the performance comparison between the two servers, https://repl.it/@ApoorvSingal/db-server-performance-comparison. You can use it to choose the right server for you. The Golang server doesn't have error handling and querying yet.

NOTE #3: ReplDB can be made many times faster by exploiting the fact that Replit gives repls more RAM than file storage. Therefore, we can always keep the entire database in RAM without worrying about heap overflow, making reads many times faster. Also, this allows us to use resource-intensive compression and encryption algorithms, because we don't have to worry about the performance loss caused by them. But as mentioned in note #1, I can't work on all this right now.

API Reference

UPDATE #1: major change in doc.set() functionality, read docs below for info.
UPDATE #2: major change in doc.get() functionality, read docs below for info.
UPDATE #3: add database snapshots
UPDATE #4: add beautiful error messages for command execution failures.
UPDATE #5: add querying and multithreading (undocumented).

Contents

  • class DB

    • constructor(serverUrl)
    • db.init(key, [len])
    • db.list()
    • db.createCollection(name)
    • db.createSnap(name)
    • db.collection(name)
    • db.doc(name)
  • class Collection

    • collection.parent
    • collection.exists()
    • collection.list()
    • collection.createCollection(name)
    • collection.collection(name)
    • collection.doc(name)
    • collection.delete()
  • class Doc

    • doc.parent
    • doc.exists()
    • doc.set(content)
    • doc.set(content, [preserveRest])
    • doc.get()
    • doc.get([props])
    • doc.delete()
  • class Snap extends Collection

    • snap.save(target)

DB

  • constructor(serverUrl)

    • serverUrl <string> url of the database server (do not include the protocol in url)
    • Returns <DB>
  • db.init(key, [len])

    • key <string> a secret key used for authentication.
    • len <integer> (default = 5) length of command IDs (you can ignore this in most cases, more detailed info is given below).
    • Returns: <Promise> (resolves to undefined).

Performs authentication and a quick handshake with the database server. It is necessary to call db.init() before using the db.

const db = new DB("server url"); // url of the database server, without the protocol
await db.init(process.env.DB_KEY);
// use db here
// await db.collection("users").doc("Kakashi").set({ age: 173 });

  • db.list()
    • Returns: <Promise> (resolves to object[]).

Lists all the collections and docs of the db. The objects in the returned array are of format { name: string, type: string<"doc" | "collection"> }.

const db = new DB(url);
await db.init(process.env.DB_KEY); // required before any other call

const stuff = await db.list();

for (const child of stuff) {
  if (child.type === "doc") {
    console.log("Doc:", child.name);

    const doc = db.doc(child.name);
    console.log(child.name + "'s data:", await doc.get());
  }
  else {
    console.log("Collection:", child.name);
  }
}

  • db.createCollection(name)
    • name <string> name of the collection.
    • Returns: <Promise> (resolves to the newly made collection).

Creates a new collection with name name.


  • db.createSnap(name)
    • name <string> name of the snapshot.
    • Returns: <Promise> (resolves to the newly made snap).

Creates a new empty snapshot of the database with name name.


  • db.collection(name)
    • name <string> name of the collection.
    • Returns <Collection>

Returns a collection by name. It does not check whether the collection exists; you can check that later with collection.exists().

let collec = db.collection("hehe");

if(!(await collec.exists())){
  // createCollection() returns a promise, so it has to be awaited
  collec = await db.createCollection("hehe");
}
// do stuff with collection
console.log(await collec.doc("Kaka").get());

  • db.doc(name)
    • name <string> name of the doc.
    • Returns <Doc>

Returns a document. Just like db.collection(name), it does not check whether the document exists; you can check that later with doc.exists().

let doc = db.doc("hehe");

if(!(await doc.exists())){
  // wait for the write to finish before reading the doc back
  await doc.set({...defaultStuff});
}
// do stuff with doc
console.log(await doc.get());

Collection

  • collection.parent <DB> | <Collection>

Gives the parent of the collection.

  • collection.exists()
    • Returns: <Promise> (resolves to either true or false).

Checks whether the collection exists or not.


  • collection.list()
    • Returns: <Promise> (resolves to object[]).

Lists all the collections and docs of the collection. The objects in the returned array are of format { name: string, type: string<"doc" | "collection"> }. Same as db.list(), but for the children of collection.


  • collection.createCollection(name)
    • Returns: <Promise> (resolves to newly made collection).

Creates a child collection inside collection.


  • collection.collection(name)
    • Returns: <Collection>

Same as db.collection(name) but returns child collection of collection.


  • collection.doc(name)
    • name <string> name of the doc.
    • Returns: <Doc>

Same as db.doc(name) but returns a child doc of collection.


  • collection.delete()
    • Returns: <Promise> (resolves to undefined)

Deletes collection.


Doc

  • doc.parent <DB> | <Collection>

Gives the parent of the doc.


  • doc.exists()
    • Returns: <Promise> (resolves to true or false)

Checks whether the document exists or not.


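  • doc.set(content)

    • content <object> the content for the doc.
    • Returns: <Promise> (resolves to undefined)

Sets the document's data to content. This is the original signature; it still works because preserveRest (documented below) defaults to false.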


  • doc.set(content, preserveRest)

    • content <object> the content for the doc.
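    • preserveRest <boolean> (default = false) whether to preserve the existing properties of the doc that are not present in content.
    • Returns: <Promise> (resolves to undefined)

Sets the document's data to content. If preserveRest is true, properties already stored in the doc that content does not mention are kept; otherwise the doc is replaced by content.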

const doc = db.doc("doc");

await doc.set({ a: 1, b: 2 });
// prints { a: 1, b: 2 }
console.log(await doc.get());

await doc.set({ a: 1, c: 3 });
// prints { a: 1, c: 3 }
console.log(await doc.get());

await doc.set({ b: 2 }, true);
// prints { a: 1, b: 2, c: 3 }
console.log(await doc.get());

// Earlier, the preserveRest functionality could be achieved like this
const oldData = await doc.get();
await doc.set({ ...oldData, ...newData });

// The new update is not just more readable and easier to use, it is also more than twice as fast.
await doc.set(newData, true);
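
To make the two modes of doc.set() concrete, here is a minimal sketch of the stored result as a pure function. The name applySet is hypothetical and not part of db.js, and it assumes a shallow merge, which matches the example above but is not confirmed for nested objects.

```javascript
// Hypothetical helper (not part of db.js): models what ends up stored
// after doc.set(content, preserveRest). Assumes a shallow merge.
function applySet(current, content, preserveRest = false) {
  // Without preserveRest, the new content replaces the document outright.
  if (!preserveRest) return { ...content };
  // With preserveRest, existing properties are kept and only the
  // properties present in content are overwritten or added.
  return { ...current, ...content };
}

let stored = applySet({}, { a: 1, b: 2 });      // { a: 1, b: 2 }
stored = applySet(stored, { a: 1, c: 3 });      // { a: 1, c: 3 }
const merged = applySet(stored, { b: 2 }, true);
console.log(merged); // same properties as { a: 1, b: 2, c: 3 }
```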

  • doc.get()

    • Returns: <Promise> (resolves to <object>)

Gives the content of the document.


  • doc.get([props])

    • props <string[]> (optional) name of document properties to get.
    • Returns: <Promise> (resolves to <object>)

Gives the values for props in the doc. If props is not specified, it returns the full content of the document.

const doc = collection.doc("doc");
await doc.set({ name: "Kakashi", age: 174 });

// prints { name: 'Kakashi', age: 174 }
console.log(await doc.get());

// prints { name: 'Kakashi' }
console.log(await doc.get(["name"]));

Why this update?

Suppose you had a really big doc like this one,

{
  name: "Kakashi",
  age: 174,
  accountCreatedAt: "20 April 2020",
  dateOfBirth: "10 April 1846",
  password: "a long big hash",
  // ...and 100 more fields
}

If you ever needed the data of this doc, you could call doc.get() which would fetch the whole doc and return its content.
Suppose a part of your app only required name and age properties. To get those, you would do something like this,

const { name, age } = await doc.get();

This may seem okay, but it actually fetches the entire contents of your really long document, deserializes all of it, and returns the deserialized data. You keep references to name and age only, and the next time the GC runs it throws away everything else.

So, this wastes CPU time and memory on bringing the contents through WebSockets, deserializing the data, storing it, and cleaning up the unneeded parts.

But now, with the new update, the above code will change to something like this,

const { name, age } = await doc.get(["name", "age"]);

This time, out of your massive document, only name and age properties are fetched and all the performance issues mentioned above are perfectly solved.
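
As an illustration of what "only name and age are fetched" means, here is a small sketch of a property filter. The name pickProps is hypothetical, and silently skipping properties the doc does not have is an assumption; the source does not say how missing props are handled.

```javascript
// Hypothetical helper (not part of db.js): returns only the requested
// properties of a document, the way doc.get(props) is described.
function pickProps(doc, props) {
  // No props requested: return the full document, like doc.get().
  if (props === undefined) return doc;
  const result = {};
  for (const prop of props) {
    // Assumption: names the doc does not have are skipped.
    if (Object.prototype.hasOwnProperty.call(doc, prop)) {
      result[prop] = doc[prop];
    }
  }
  return result;
}

const bigDoc = { name: "Kakashi", age: 174, password: "a long big hash" };
console.log(pickProps(bigDoc, ["name", "age"])); // { name: 'Kakashi', age: 174 }
```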

You can still fetch the whole document by not providing the props argument to the function, so the new update doesn't break old API implementations.

Along with the update in doc.set(), now the DBMS is capable of handling really huge documents with no performance issues.

Although, it's still not recommended to have huge documents. If you think you can break the contents of a large document into two, please do it, because even if your main application has no performance issues, the database server still needs to serialize and deserialize complete documents.


  • doc.delete()

    • Returns: <Promise> (resolves to undefined)

Deletes the document.

const doc = collec.doc("doc");
await doc.delete();

// prints false
console.log(await doc.exists());
// throws error
console.log(await doc.get());

// works
await doc.set({ name: "Kakashi", age: "174" });

// prints true
console.log(await doc.exists());
// prints { name: 'Kakashi', age: '174' }
console.log(await doc.get());

Snap

A detailed description of snapshots is given in the implementation guide below.

  • snap.save(target)

    • target <Collection | Doc | DB> collection or document to save

Saves the contents of target to the snapshot. If DB is passed as an argument, the whole database is saved in the snapshot.

Implementation Guide

How to use?

  • Fork the database repl, change the KEY variable in .env to any long, secret string, and run the repl.
  • Fork the client repl, import db.js, and start messing with the db. The db.js exports 3 classes (DB, Doc, and Collection) which are documented above.
    If you want to implement the client code in any other language, you don't need to do this step, read the below section and you can make a client API yourself.

How it works?

First of all, the DBMS uses a separate repl for maintaining the database, because,

  • it reduces the workload on the main application.
  • repl.it editor doesn't work very well with a huge number of files.

A WebSocket server is hosted on the database repl and the client library communicates with the server for manipulating the database.

Database Structure

The database is a directory called _db inside the repl's home directory, each collection is a subdirectory of _db, every sub-collection is a subdirectory of its parent collection and every document is a file with BSON encoded data.
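
The layout above means every collection or doc has a path relative to the home directory. A minimal sketch of how such paths compose follows; childPath is a hypothetical name, and the name validation is an assumption, since the source does not describe any.

```javascript
// Hypothetical helper: joins a parent path and a child name into a path
// relative to the home directory, e.g. "_db/users/Kakashi".
function childPath(parentPath, name) {
  // Assumption: reject names that would escape the parent directory or
  // create extra nesting. The source does not describe any validation.
  if (name === "" || name === "." || name === ".." || name.includes("/")) {
    throw new Error("invalid name: " + JSON.stringify(name));
  }
  return parentPath + "/" + name;
}

const dbPath = "_db";                               // the database itself
const usersPath = childPath(dbPath, "users");       // a collection
const kakashiPath = childPath(usersPath, "Kakashi"); // a doc inside it
console.log(kakashiPath); // _db/users/Kakashi
```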

The database server treats the database exactly like a collection, therefore, all the collection commands are valid for the database as well, although, some of them are undocumented in the above API reference because they are never needed and their usage is totally not recommended.

For example, calling db.delete() deletes the database i.e. the _db directory and the database can no longer be used, even if you reconnect and re-perform the handshake (db.init()). The db.init() does not create the database, it only performs the authentication and session management.
The database is already created in the server repl and there is no standard way to recreate the database if you ever delete it.

Although, there is a somewhat hacky way to recreate a deleted database,

const db = new DB("server url"); // deleted db

await db.init(key); // creates websocket connection

db.path = "";
await db.createCollection("_db"); // creates the "_db" directory

db.path = "_db";

// use db normally here

The db.path here is another undocumented property of the DB class which represents the path of the database relative to the home directory, the default value of db.path is "_db" and there is never a need to change this.
Collections and documents also have the collection.path and doc.path properties which represent their path relative to the home directory. Altering their values is totally not good, never touch these properties.

The DBMS uses the BSON format for storing data (which is also used by MongoDB).

Database Snapshots

A snapshot of a database represents the state of the database at the time the snapshot was created. A snapshot can save the whole database or a few specific collections or documents, depending on your purpose for creating the snapshot.
You can use snapshots for having a backup of your database in case your app accidentally deletes something important because of a bug or any other reason, or you can use snapshots for storing the history of the database, etc.

Just like the database, the snapshots are also treated as ordinary collections by the server, therefore all collection commands are valid for snapshots as well.

The snapshots are stored in the _snaps subdirectory of the home directory of the container. The path to a snapshot is _snaps/[snapshot name] and the path to a collection/doc inside a snap is _snaps/[snapshot name]/[parent collection name]/.../[collection name | doc name]
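
A minimal sketch of that path mapping, assuming targets live under the default "_db" directory. The name snapTargetPath is hypothetical; the real server may compute this differently.

```javascript
// Hypothetical helper: maps the path of a db, collection or doc to the
// place it would occupy inside a snapshot.
function snapTargetPath(snapName, targetPath) {
  const parts = targetPath.split("/");
  // Assumption: every target lives under the default "_db" directory.
  if (parts[0] !== "_db") {
    throw new Error("target is not inside the database: " + targetPath);
  }
  // Replace the leading "_db" with "_snaps/[snapshot name]".
  return ["_snaps", snapName, ...parts.slice(1)].join("/");
}

console.log(snapTargetPath("snap", "_db/collec/doc")); // _snaps/snap/collec/doc
console.log(snapTargetPath("snap", "_db"));            // _snaps/snap
```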

These collections and docs can be used exactly like the ordinary ones but it's not recommended to edit them directly since it defies the whole purpose of snapshots.

If you save a document, its parent collections are recursively created but only the given document is saved.

Here is an example to explain the workings of snapshots,

const db = new DB("server");
await db.init(process.env.DB_KEY);

const collec = await db.createCollection("collec");
const doc = collec.doc("doc");

await doc.set({ a: 1, b: 2 });

const snap = await db.createSnap("snap");

await snap.save(doc); // recursively creates "collec" collection.

const snapDoc = snap.collection("collec").doc("doc");

// prints { a: 1, b: 2 }
console.log(await snapDoc.get());

await doc.set({ c: 3 }, true);

// prints { a: 1, b: 2, c: 3 }
console.log(await doc.get());
// prints { a: 1, b: 2 }
console.log(await snapDoc.get());

await snap.save(doc);
// prints { a: 1, b: 2, c: 3 }
console.log(await snapDoc.get());

Working with the server

Handshake

When the client library connects to the database server, it must send a secret key for authentication and the length of command IDs (we'll talk about command IDs in a sec; for now, think of it as just a number).

Here is the authentication code on server,

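wss.on("connection", (ws) => {

  // the first message of every connection must be the handshake payload
  ws.once("message", (message) => {

    const data = BSON.deserialize(message);

    if(data.key === process.env.KEY){
      len = data.len; // length of command ids, stored for future use
      ws.on("message", msg => handleCommand(ws, msg.toString("ascii")));
      ws.send("0"); // handshake successful
    }
    else {
      ws.send("BUH!");
      ws.terminate();
    }
  });
});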

The client sends a BSON-encoded { key: [some string], len: [some number] } payload.
If the key matches the KEY environment variable, authentication is successful, the len property is stored for future use, and the client can now send commands to the server (we'll talk about commands later in this section).
If the key sent by the client does not match the KEY environment variable, the socket is terminated.

This is the small handshake process at the beginning of WebSocket connection and is handled by db.init(key, [len]) method which is documented above in API reference of DB class.


Database Commands

The client can send 8 different kinds of instructions to the database server, each kind of instruction along with necessary input is called a command.
A command is a simple buffer of format: [command id][command index][command input]
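
A rough sketch of that framing for commands whose input is a plain path. The names encodeCommand and decodeCommand are hypothetical, and strings are used for simplicity; commands that carry BSON input would need binary handling instead.

```javascript
// Hypothetical helpers: build and split [command id][command index][command input].
function encodeCommand(id, index, input = "") {
  // The index is a single character, so only 0..7 fit this format.
  if (!Number.isInteger(index) || index < 0 || index > 7) {
    throw new RangeError("unknown command index: " + index);
  }
  return id + String(index) + input;
}

function decodeCommand(message, idLength) {
  return {
    id: message.slice(0, idLength),      // fixed-length command id
    index: Number(message[idLength]),    // one character after the id
    input: message.slice(idLength + 1),  // everything that follows
  };
}

const command = encodeCommand("ab12c", 1, "_db/users");
console.log(command);                   // ab12c1_db/users
console.log(decodeCommand(command, 5)); // { id: 'ab12c', index: 1, input: '_db/users' }
```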


Command ID

A command id is a unique id for each command and is generated by the client library. It is used to keep the data of different commands separate when two or more commands are being executed at the same time because all the data flows through the same WebSocket.
The length of the command id is always fixed and is sent to the database server by the client at the time of handshake. The length of command ids cannot be changed after the handshake.
Here is the super high quality uuid generator used by our JS API,

genKey(len){
  return Math.random().toString(30).substr(2, len); // the default value for len is 5
}

Yes, the command id is not some super unique permanent kind of thing. Once a command is completed, the same id can be used to represent any other command.
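
Because ids only need to be unique among the commands currently in flight, a client could also track them explicitly. The sketch below is not how db.js works; PendingIds is a hypothetical class that retries until it gets an unused id of the right length (Math.random().toString(30) can occasionally yield fewer digits than requested).

```javascript
// Same idea as genKey above, written as a standalone function.
function genKey(len) {
  return Math.random().toString(30).slice(2, 2 + len);
}

// Hypothetical class (not part of db.js): hands out ids that are unique
// among in-flight commands and lets them be reused once released.
class PendingIds {
  constructor(len = 5) {
    this.len = len;
    this.inFlight = new Set();
  }

  acquire() {
    let id;
    do {
      id = genKey(this.len);
      // Retry on short ids and on ids already used by a running command.
    } while (id.length !== this.len || this.inFlight.has(id));
    this.inFlight.add(id);
    return id;
  }

  release(id) {
    // Once the command completes, its id may be handed out again.
    this.inFlight.delete(id);
  }
}

const pending = new PendingIds(5);
const firstId = pending.acquire();
const secondId = pending.acquire();
pending.release(firstId);
console.log(pending.inFlight.size); // 1
```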

Even if your app always has 100 database commands under execution, the probability that a newly generated id matches one of them is about 0.0004% (100 / 30^5 = 0.000411522633744855%, assuming an id length of 5 and uniformly distributed ids), and ofc if you don't like this number, you can change the id length to anything in db.init() as documented above in the API reference.
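
The arithmetic can be checked with a short sketch. It assumes ids are uniformly distributed over 30^len values, which is only an approximation of what genKey() produces, and both function names are hypothetical. The second function gives the stricter "birthday" figure: the chance that any two ids in a batch coincide.

```javascript
// Chance that one new id equals one of `inFlight` distinct existing ids.
function collisionWithInFlight(inFlight, len, alphabet = 30) {
  return inFlight / Math.pow(alphabet, len);
}

// Chance that at least two ids out of `count` random ids are equal.
function anyCollision(count, len, alphabet = 30) {
  const space = Math.pow(alphabet, len);
  let allUnique = 1;
  for (let i = 0; i < count; i++) {
    // The i-th id must avoid the i ids drawn before it.
    allUnique *= 1 - i / space;
  }
  return 1 - allUnique;
}

const single = collisionWithInFlight(100, 5);
console.log((single * 100).toFixed(6) + "%"); // 0.000412%
console.log((anyCollision(100, 5) * 100).toFixed(4) + "%");
```

Raising the id length to 8 divides both figures by 30^3 = 27000.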


Command Index

As mentioned above, there are 8 different commands in total. Each command has an index from 0 to 7. Here is the list of all commands mapped to their index.

  • 0 => create collection (implemented by db.createCollection(), collection.createCollection() and db.createSnap())
  • 1 => list collections (implemented by db.list() and collection.list())
  • 2 => delete collection (implemented by collection.delete())
  • 3 => check whether collection/doc exists (implemented by collection.exists() and doc.exists())
  • 4 => set document content (implemented by doc.set())
  • 5 => get document content (implemented by doc.get())
  • 6 => delete document (implemented by doc.delete())
  • 7 => saves a collection/doc to the snapshot (implemented by snap.save())
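
For anyone writing a client in another language, the list above could be captured as a lookup table. The constant names below are hypothetical; only the numbers come from the list.

```javascript
// Hypothetical names for the eight command indexes listed above.
const COMMANDS = Object.freeze({
  CREATE_COLLECTION: 0,
  LIST: 1,
  DELETE_COLLECTION: 2,
  EXISTS: 3,
  SET_DOC: 4,
  GET_DOC: 5,
  DELETE_DOC: 6,
  SAVE_SNAP: 7,
});

// Reverse lookup, handy for logging which command a message carries.
function commandName(index) {
  const entry = Object.entries(COMMANDS).find(([, value]) => value === index);
  if (!entry) throw new RangeError("unknown command index: " + index);
  return entry[0];
}

console.log(commandName(4)); // SET_DOC
```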

Command Input

Some commands require input to work. Here is the index-to-input map of commands.

  • 0 => path to collection
  • 1 => path to collection
  • 2 => path to collection
  • 3 => path to collection/doc
  • 4 => BSON encoded { path: [path to document], preserve: [true | false], data: [BSON encoded object] } // read doc.set() api docs for more info
  • 5 => BSON encoded { path: [path to document], props: [array of properties to fetch | undefined] } // read doc.get() api docs for more info
  • 6 => path to doc
  • 7 => BSON encoded { snap: [path to snap], target: [path to doc/collection/db] }

The paths of documents and collections are explained above in the "Database Structure" section.


Command Output

The output of the command starts with its id, and the next byte tells whether the command was executed successfully or not. If the command completed successfully, the first byte after the command id is '0', followed by the output of the command (if any); otherwise, the first byte after the id is '1', followed by the error message given by the file system.

So, the format for output is, [command id][0 | 1][command output]

The command output is only returned in a few commands. Here is an index-to-output map of commands.

  • 0 => none
  • 1 => BSON encoded array
  • 2 => none
  • 3 => none
  • 4 => none
  • 5 => BSON encoded object
  • 6 => none
  • 7 => none

In the case of index 3 (command to check whether doc/collection exists), if the collection/doc exists, the byte after id is 0, and if it doesn't, the byte after id is 1.
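
A minimal sketch of parsing that output on the client side. parseOutput is a hypothetical name, the message is treated as a string, and the payload of commands 1 and 5 is left raw here (a real client would BSON-decode it). Note that for index 3 this format cannot tell "does not exist" apart from a genuine error.

```javascript
// Hypothetical helper: splits [command id][0 | 1][command output].
function parseOutput(message, idLength, commandIndex) {
  const id = message.slice(0, idLength);
  const status = message[idLength];
  const payload = message.slice(idLength + 1);

  if (status !== "0" && status !== "1") {
    throw new Error("malformed output: " + JSON.stringify(message));
  }
  // Index 3 (exists) uses the status byte itself as the answer.
  if (commandIndex === 3) return { id, ok: true, value: status === "0" };
  // For every other command, '1' means failure plus an error message.
  if (status === "1") return { id, ok: false, error: payload };
  return { id, ok: true, value: payload === "" ? undefined : payload };
}

console.log(parseOutput("ab12c0", 5, 2)); // { id: 'ab12c', ok: true, value: undefined }
console.log(parseOutput("ab12c1", 5, 3)); // { id: 'ab12c', ok: true, value: false }
```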


This is how each command is implemented in the client library,

// Runs inside a method of the client class; `this.ws` is the open WebSocket
// and `this.len` is the command id length agreed on during the handshake.
return new Promise((resolve, reject) => {
  const key = Collection.genKey(this.len);

  const listener = (message) => {
    // ignore output that belongs to other commands
    if(!message.startsWith(key)) return;
    this.ws.off("message", listener);

    if(message[this.len] == '0')
      resolve(undefined); // it is `BSON.deserialize(Buffer.from(message.substring(this.len + 1)))` instead of `undefined` in case of command 1 (list collections/docs) and 5 (get doc content)
    else
      reject(message.substring(this.len + 1));
  };
  this.ws.on("message", listener);
  // key + commandIndex + path; is `key + commandIndex + BSON.serialize({ path, data, preserve })` for command 4 (set doc content) and `key + commandIndex + BSON.serialize({ path, props })` for command 5 (get doc content)
  this.ws.send(key + "1" + this.path);
});

This ends the implementation guide for the DBMS. Thanks for reading the docs this far.
I am working on some really cool things for this project. I will update this guide and let you guys know when I finalize the changes.

I would love to see client libraries in different languages from you guys.

Special thanks to @JSer for his awesome markdown guide and the person who gave me the idea to make a DBMS (I forgot your name, sorry).

Thanks for reading :).

And as you would have guessed, it took me more time to write the README than to make the whole DBMS.

Peace!

Comments
Science (4)

Good but there are more ways now to work on it except those long codes but its ok if you want more functions

Btw your hands must be broken at the time u finished writing post

Lord_Poseidon (170)

If anyone needs help with setting up the DBMS, hmu! I've translated the DBMS so I know how stuff works.

NoelB33 (351)

Just how much time did you spend writing this post? It’s super long.

NoelB33 (351)

Woah, that must have been so exciting to finally finish.

DaLiteralPanda (11)

Good Job though I understand nothing and many ppl dont read much so if you can make this reading small then your a pro dev (:

ApoorvSingal (105)

@DaLiteralPanda This post actually has a lot (really lot) of stuff undocumented. If I had documented everything, the post would have been atleast 3 times bigger than this ¯_(ツ)_/¯.

Jakman (454)

This is good. Fine work man.

anishanne (7)

wow. What took longer? Writing the post or the project?

ApoorvSingal (105)

the post lmao. But now with the new updates(especially the unreleased ones), the code has gotten far bigger than the post. @anishanne

wulv (61)

Cool! Seems like it needed a ton of work to make

ApoorvSingal (105)

@wulv Writing this post was like 65% of the work. The implementation was easy and small. :)

ApoorvSingal (105)

@Codemonkey51 did you put the KEY in .env in both client and database repls?

ApoorvSingal (105)

Oh, seems like I messed up a bit with markdown lol.