read24 - Server: In-Memory Database

August 12, 2020

Roger Ngo

It is now time to put the effort into implementing the in-memory database to allow it to perform the same functions as a SQL database.‌

Yesterday I had defined a stub for this in-memory database connector as memory.ts. Let’s do some house-keeping, and set up on various places before I start implementing the functions needed for this in-memory database connector.

First, since data only exists in memory, the data will be lost once the server process has terminated. This can be good, or bad. It is good in that the database is always in a predictable state every time the process is restarted, but bad in that data is lost if there is no means to flush the data to disk periodically.

The in-memory database should be implemented so that the contents can be dumped to the filesystem and reloaded later.

In the _MemoryDb class, provide an instance variable that will hold all the data, and serve as a mock database. This variable will be a flat array with elements in the following structure:

{
    lastId: 0,
    data: []
}

Where lastId is the last increment ID which is used to assign to an AUTO_INCREMENT field. data is a collection of contents which are analogous to rows in a database table.

For each of these entries, they will be accessible by string keys. The keys are just the table names found in TableMapping, and so the database can be access like this:

db[TableMapping[tableName]].data.find(r => r.id === id)

Which will find a row in tableName with a particular id.

So all this can initialize in the constructor with empty data, and a dump, and initialize method can be used to save the data to disk, and load the data from disk respectively.

class _MemoryDb extends DbConnector {
    private _db = {};

    constructor() {
        super();

        for(const k in TableMapping)
            this._db[TableMapping[k]] = {
                lastId: 0,
                data: []
            };
    }

    public dump() {
        return this._db;
    }

    public initialize(db: any) {
        this._db = db;
    }

    public insert(tableName: string, data: Partial<DataRow>): Promise<number> {
        ....
    }
    
    ...
}

export const MemoryDb = new _MemoryDb();

Now, I can provide some API to dump, and load the database to memory. In index.ts, I can bring back those debug routes.

app.get('/debug/db', (req, res) => {
    if (process.env.DATA_SOURCE === 'mysql')
        return res.status(304);

    return res.status(200).json(MemoryDb.dump());
});

app.get('/debug/db/dump', (req, res) => {
    if (process.env.DATA_SOURCE === 'mysql')
        return res.status(304);
        
    fs.writeFileSync('db.json', JSON.stringify(MemoryDb.dump(), undefined, 4));

    return res.status(200).json({message: 'Dumped'});
});


app.listen(process.env.PORT || 5000, () => {
    // Initialize the in-memory DB if exists
    if (process.env.DATA_SORUCE !== 'mysql' && fs.existsSync('db.json')) 
        MemoryDb.initialize(JSON.parse(fs.readFileSync('db.json', 'utf-8')));

    console.log(`Running on: ${__dirname}`);
    console.log('It works!');
});

/debug/db is used to inspect the contents of the in-memory database, while /debug/db/dump is used to explicitly dump contents to disk.

When the server is restarted, the file db.json is checked to see if it exists. If so, then the file is used as an image of the memory database, and is loaded.‌

Now, the server itself must be updated to check for an environment variable DATA_SOURCE, and determine which type of database should be used for the duration of the application. This is similar to how I had configured before.‌

Within connector.ts:

if (process.env.DATA_SOURCE === 'mysql')
    DatabaseConnector = MySqlDb;
else 
    DatabaseConnector = MemoryDb;

console.log(`Using the ${process.env.DATA_SOURCE === 'mysql' ? 'MySQL' : 'Memory'} database connector.`);

If you can recall, the reason I had removed the previous in-memory database earlier was that it was getting difficult to maintain due to not being able to support various SelectOption properly. The most glaring use-case were aggregate functions like COUNT. This time, I will build the in-memory database connector with the options in mind.

There will be a new option in SelectOption called aggregate which will be of AggregateOption to perform aggregate queries such as COUNT. For now, this will be the only type of aggregate function supported.

export enum AggregateType {
    Count = 'COUNT'
}

export interface AggregateOption {
    type: AggregateType;
    column: string;
    alias: string;
}

export interface SelectOption {
    includeDeleted?: boolean;
    orderBy?: OrderByOption;
    offset?: number;
    limit?: number;
    columns?: string[];
    filters?: FilterOption[];
    fullTextMatch?: FullTextMatchOption[];
    in?: InOption;
    aggregate?: AggregateOption;
}

What is alias? That is how the caller will want to name the column that is returned by the aggregate function. It is also how the data will be accessed in the results. ‌

So for example, for the function numberOfPages found in the Book resource, a count of the books can be implemented as:

public static async numberOfPages() {
    const results = await DatabaseConnector.select('books', {
        aggregate: {
            type: AggregateType.Count,
            column: 'id',
            alias: 'count(id)'
        }
    });

    return results[0]['count(id)'];
}

Where now the result can be accessed as count(id) since it was specified to be the alias.‌

There will be a restriction when it comes to using an aggregate function. The downside right now is that aggregate function in select statements can only be used on their own. The columns property in SelectionOption must not be provided, or contain any string values. And so, the other way also applies: If using columns, then provide no aggregate in SelectOption.

In sql.ts, the select method can be updated to be:

public select(tableName: string, opts: SelectOption): Promise<DataRow[]> {
    const promise = new Promise<DataRow[]>((resolve, reject) => {
        let queryString = 'SELECT ';

        if (opts.aggregate && (opts.columns && opts.columns.length > 0))
            return reject('Can only select with either columns, or aggregrate function, but not both.');

        if (opts.columns && opts.columns.length > 0)
            queryString += `${opts.columns.join(',')} `;
        else if ((!opts.columns || opts.columns.length === 0) && !opts.aggregate)
            queryString += `* `;
        else if (opts.aggregate && (!opts.columns || opts.columns.length === 0)) {
            // Currently only support aggregate when no columns are provided for selection
            if (opts.aggregate.type === AggregateType.Count)
                queryString += `count(${opts.aggregate.column}) as ${connection.escape(opts.aggregate.alias)}`;
        }

        queryString += `FROM ${TableMapping[tableName]}`;
        
        ....
        
    }
    
    ...
    
}

Note, that if no columns, or aggregate is provided, then the query is assumed to be SELECT * ....

Phew, now I can get started in implementing the required methods for the _MemoryDb connector!

insert

The first, and easiest method is the insert method. It is just taking the provided data passed as a parameter and adding it to the table data. All I have to remember is to just provide an id value using the lastId. The id value here should actually be lastId + 1.

After inserting an item to a “table”, the main lastId should be incremented by 1, so that the next insertion can take on a higher id value.

public insert(tableName: string, data: Partial<DataRow>): Promise<number> {
    const table = this._db[TableMapping[tableName]];
    const tableData = table.data;

    const newData = {...data};
    for(const k in newData) {
        if (typeof newData[k] === 'boolean')
            newData[k] = newData[k].toString() === 'true' ? 1 : 0;
    }

    tableData.push({...newData, id: table.lastId + 1});
    table.lastId++;

    return Promise.resolve(table.lastId);
}

update

The next “easiest” method to implement is the update method. The approach is straightforward. It is to just find an existing entry in the “table”, and update all the values found in the properties provided in the data parameter.

public update(tableName: string, data: Partial<DataRow>): Promise<number> {
    const toUpdate = {...data};
    if (!toUpdate.id)
        return Promise.resolve(0);
    
    for(const col in toUpdate) {
        if (typeof toUpdate[col] === 'boolean')
            toUpdate[col] = toUpdate[col].toString() === 'true' ? 1 : 0;
    }

    const ref = this._db[TableMapping[tableName]].data.find(r => r.id === toUpdate.id);
    for(const k in ref)
        ref[k] = toUpdate[k];

    return Promise.resolve(1);
}

find

find is easy too. Just search for the element with the given id.

public find(tableName: string, id: number): Promise<Partial<DataRow>> {
    const ref = this._db[TableMapping[tableName]].data.find(r => r.id === id);

    return Promise.resolve(ref);
}

select

The select method is perhaps the most complex to implement. It is not as straightforward as translating the entire implementation of select found in sql.ts.

The key approach here is to operate on the same dataset, further reducing the number of elements until the desired selection is achieved.

The dataset, filteredData is initialized to include all data in the table. If opts.includeDeleted is not specified, then all data that has been logically deleted by having a dateDeleted value is excluded.

The first step in reducing the data is to filter out data specified in opts.filters. Since opts.filters is an array, each filter is iterated on, and filteredData is modified to contain values that match the condition. Notice that this is an AND condition.

For filtering this logic is to only allow data to be stored into the filteredData array if all filter conditions match through iterating each opts.filters element.

Full-text matching is through by an OR condition. For every property in the row in the list of columns to match, if the occurrence of the string occurs, then it is considered to be the match.

public select(tableName: string, opts: SelectOption): Promise<Partial<DataRow>[]> {
    const allData = [];
    
    if(!opts.includeDeleted)
        allData.push(...this._db[TableMapping[tableName]].data);
    else
        allData.push(...this._db[TableMapping[tableName]].data.filter(r => r.dateDeleted === 0));

    let filteredData = [];
    if (opts.filters) {
        const result = [...allData].filter(r => {
            let condition = undefined;
            opts.filters.forEach(filter => {
                condition = !condition
                    ? (r[filter.column] === filter.value) 
                    : condition && (r[filter.column] === filter.value);

                return filter;
            });

        });

        filteredData = [...result]
    } else {
        filteredData = [...allData];
    }


    if (opts.fullTextMatch && opts.fullTextMatch.length > 0) {
        for(const filter of opts.fullTextMatch) {
            filteredData = filteredData.filter(r => {
                let result = false;
                for(const col of filter.columns)
                    result = result || (r[col].toLowerCase().indexOf(filter.value) > -1);

                return result;
            });
        }
    }

    if (opts.orderBy) {
        filteredData.sort((x, y) => {
            const a = x[opts.orderBy.column];
            const b = y[opts.orderBy.column];

            if(typeof a === 'number') {
                if (opts.orderBy.ascending)
                    return a - b;
                else
                    return b - a;
            } else {
                if (opts.orderBy.ascending)
                    if (a <= b)
                        return -1;
                    else
                        return 0;
                else
                    return 1;
            }
        })
    }

    if (opts.limit) {
        const offset = opts.offset ? opts.offset : 0;
        const limit = offset + opts.limit;
        filteredData = filteredData.slice(offset, limit);
    }

    if ((opts.columns && opts.columns.length > 0) && opts.aggregate)
        return Promise.reject('Can only select with either columns, or aggregate function, but not both');

    // This is like a SELECT * 
    if ((!opts.columns || opts.columns.length === 0) && !opts.aggregate)
        return Promise.resolve(filteredData);

    // If it is a SELECT by columns, or aggregate
    const results = [];
    if ((opts.columns && opts.columns.length > 0) && !opts.aggregate)
        for(const f of filteredData) {
            const row = {};
            for(const c  of opts.columns) {
                row[c] = f[c];
            }

            results.push(row);
        }
    else if((!opts.columns || opts.columns.length === 0) && opts.aggregate) {
        if (opts.aggregate.type === AggregateType.Count) {
            results.push({
                [opts.aggregate.alias]: filteredData.filter(r => !!r[opts.aggregate.column]).length
            })
        }
    }

    return Promise.resolve(results);
}

ORDER BY and LIMIT are fairly straightforward, so I won’t go too much into detail here.

For selecting columns from each row, a new result set is created, and new objects are initialized to only included the indicated columns specified in opts.columns.

For aggregate functions, the column name in the result set must adhere to the alias.

[opts.aggregate.alias]: filteredData.filter(r => !!r[opts.aggregate.column]).length

delete

public delete(tableName: string, opts: DeleteOption): Promise<number> {
    // There needs to be some filtering
    if (opts.filters.length === 0 && !opts.in && !isValue(opts.in.column) && !isValue(opts.in.value))
        return Promise.resolve(0);

    const combined = [];
    const allData = [...this._db[TableMapping[tableName]].data];

    if (opts.filters.length > 0) {
        const deletedFilteredEntries = [...allData].filter(r => {
            let condition = undefined;
            opts.filters.forEach(filter => {
                condition = !condition
                    ? (r[filter.column] === filter.value) 
                    : condition && (r[filter.column] === filter.value);

                return filter;
            });
        });

        combined.push(...deletedFilteredEntries);
    }

    if (opts.in) {
        let deletedInEntries = [...allData];
        for (const v of opts.in.value) {
            deletedInEntries = deletedInEntries.filter(r => r[opts.in.column] === v);
        }
        
        combined.push(...deletedInEntries);
    }

    this._db[TableMapping[tableName]].data = this._db[TableMapping[tableName]].data.filter(r => !combined.find(d => d.id === r.id));

    return Promise.resolve(Array.from((new Set(combined)).values()).length);
}

For delete operations, I just need to be mindful and make sure I have this code only perform the delete if, and only if filters, or an in-clause is provided. Again, I should never be performing an uncontrolled delete.

I will build a list of entries to delete by inspecting the opts.filters, and opts.in individually. Once I have this “master list” of elements to delete, I simply just filter out those elements from the final set, and reassign the entire table with those elements.

Testing

So how do I know all this works? Well, I will set .env to use the memory database, and perform some front-end operations:

DATA_SOURCE=memory
PORT=5000

However, the better way is to actually create an automated test script that will test the in-memory database functions. ‌

I will explore that tomorrow.