I came up with a possible data structure for the Volumio2 music library, and wrote some code to test it.
Here is a Gist of the test code.
The test code first generates 105,000 “items”, each representing a single track on a given music service. They are organized to roughly approximate a real music collection: 7 genres, 50 artists per genre, 10 albums per artist, 10 tracks per album, and 3 services offering each track (7 × 50 × 10 × 10 × 3 = 105,000). Some tracks have multiple artists, albums, or associated genres.
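For readers without the Gist, here is a minimal JavaScript sketch of a generator with the same shape. It is not the original test code. The service ids after `spop`, the URI format, and the naming scheme are hypothetical. It also leaves out tracks with multiple artists, albums, or genres so that the counts are easy to check.

```javascript
// Hypothetical generator mirroring the collection shape described above.
// Names, service ids (other than 'spop') and URI formats are placeholders.
const GENRES = 7;
const ARTISTS_PER_GENRE = 50;
const ALBUMS_PER_ARTIST = 10;
const TRACKS_PER_ALBUM = 10;
const SERVICES = ['spop', 'service2', 'service3']; // hypothetical service ids

function generateItems() {
  const items = [];
  for (let g = 1; g <= GENRES; g++) {
    for (let a = 1; a <= ARTISTS_PER_GENRE; a++) {
      const artist = `Artist ${g}-${a}`;
      for (let al = 1; al <= ALBUMS_PER_ARTIST; al++) {
        const album = `Album ${g}-${a}-${al}`;
        for (let t = 1; t <= TRACKS_PER_ALBUM; t++) {
          const title = `Track ${g}-${a}-${al}-${t}`;
          // Every service offers the same track, each under its own URI.
          for (const service of SERVICES) {
            items.push({
              service,
              uri: `${service}:track:${g}-${a}-${al}-${t}`,
              metadata: {
                title,
                genres: [`Genre ${g}`],
                artists: [artist],
                albums: [album],
              },
            });
          }
        }
      }
    }
  }
  return items;
}

const items = generateItems();
console.log(items.length); // 105000
```

Every title in this sketch is unique, so the 105,000 items collapse to 35,000 distinct tracks once the three services are grouped together.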
Then it builds the music library by reading the metadata of each item. On my RPi B+, building the database from 105,000 items takes a little over 7 minutes. The finished library takes up only about 140 MB, so I’m keeping the whole thing in RAM for now.
In the current library structure, the following tables are created:
- Genres
- Artists
- Albums
- Tracks
- Items
For instance, the Artist Table has one entry per artist. Each entry stores the artist’s name, keys to the genres the artist is affiliated with, and keys to the albums the artist has worked on. These keys let the user navigate up to the relevant genres or down to the relevant albums.
<Artist key 1 (hash of artist name)>: {
  albumkeys: {
    '<Album key 1 (hash of album title)>': null,
    '<Album key 2 (hash of album title)>': null,
    '<Album key 3 (hash of album title)>': null,
    '<Album key 4 (hash of album title)>': null
  },
  genrekeys: {
    '<Genre key 1 (hash of genre name)>': null,
    '<Genre key 2 (hash of genre name)>': null
  },
  metadata: {
    name: 'Artist Name'
  }
}
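As a sketch of how such an entry could be built, the code below derives keys with SHA-1 from Node's built-in `crypto` module. The hash algorithm is not specified here, so SHA-1 and the helper names `keyOf` and `makeArtistEntry` are assumptions for illustration.

```javascript
const crypto = require('crypto');

// Assumed key function: SHA-1 hex digest of the name (algorithm is a guess).
function keyOf(text) {
  return crypto.createHash('sha1').update(text).digest('hex');
}

// Build an Artist Table entry in the shape shown above. The link fields are
// objects used as sets: only the keys matter, and every value is null.
function makeArtistEntry(name, albumTitles, genreNames) {
  const entry = { albumkeys: {}, genrekeys: {}, metadata: { name } };
  for (const title of albumTitles) entry.albumkeys[keyOf(title)] = null;
  for (const genre of genreNames) entry.genrekeys[keyOf(genre)] = null;
  return entry;
}

const artists = {};
artists[keyOf('Artist Name')] = makeArtistEntry(
  'Artist Name',
  ['Album 1', 'Album 2', 'Album 3', 'Album 4'],
  ['Genre 1', 'Genre 2'],
);
```

Using objects with `null` values as sets means that adding the same album key twice is harmless: the key is simply written again.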
The other tables share this same style, with similar keys linking up and down the tree. Search functions would also iterate over this tree-like structure.
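Navigation then reduces to following a link set from one table into another. Below is a minimal sketch. It assumes a library object holding `genres`, `artists`, and `albums` tables (hypothetical names) in the shapes shown here, and adds a naive search that scans one table's metadata.

```javascript
const crypto = require('crypto');

// Assumed key function (SHA-1 hex); the actual hash is not specified.
const keyOf = (text) => crypto.createHash('sha1').update(text).digest('hex');

// A tiny hand-built library: one genre, one artist, two albums.
const library = {
  genres: {
    [keyOf('Rock')]: { artistkeys: { [keyOf('Band')]: null }, metadata: { name: 'Rock' } },
  },
  artists: {
    [keyOf('Band')]: {
      albumkeys: { [keyOf('First')]: null, [keyOf('Second')]: null },
      genrekeys: { [keyOf('Rock')]: null },
      metadata: { name: 'Band' },
    },
  },
  albums: {
    [keyOf('First')]: { trackkeys: {}, artistkeys: { [keyOf('Band')]: null }, metadata: { title: 'First' } },
    [keyOf('Second')]: { trackkeys: {}, artistkeys: { [keyOf('Band')]: null }, metadata: { title: 'Second' } },
  },
};

// Follow a link set (such as an artist's albumkeys) into a target table and
// return the linked entries, skipping keys the target table does not contain.
function follow(linkSet, targetTable) {
  return Object.keys(linkSet)
    .filter((key) => Object.prototype.hasOwnProperty.call(targetTable, key))
    .map((key) => targetTable[key]);
}

// Naive search: scan every entry of one table for a case-insensitive
// substring match on a metadata field.
function searchTable(table, field, query) {
  const q = query.toLowerCase();
  return Object.values(table).filter((entry) =>
    String(entry.metadata[field]).toLowerCase().includes(q));
}

const band = library.artists[keyOf('Band')]; // direct lookup by name
const down = follow(band.albumkeys, library.albums).map((a) => a.metadata.title);
const up = follow(band.genrekeys, library.genres).map((g) => g.metadata.name);
console.log(down, up); // [ 'First', 'Second' ] [ 'Rock' ]
```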
At the bottom of the tree in the Item Table, each entry contains all the metadata of the item. For example, an entry in the Item Table might look like:
<Item key from service 1 (hash of service + URI)>: {
  service: 'spop',
  uri: '<spotify URI>',
  metadata: {
    title: 'Test Track',
    genres: [ 'Genre 1', 'Genre 2' ],
    artists: [ 'Artist 1', 'Artist 2', 'Artist 3' ],
    albums: [ 'Album 1', 'Album 2' ]
  },
  trackkey: { '<Track key 1 (hash of track title)>': null }
}
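A sketch of how an Item Table key and entry might be derived, again assuming SHA-1. Joining the service and URI with a `|` separator is an added assumption, not part of the design. It only keeps two different service/URI pairs from concatenating to the same string. The helper names `itemKey` and `makeItemEntry` are hypothetical.

```javascript
const crypto = require('crypto');

// Assumed key function (SHA-1 hex); the actual hash is not specified.
const keyOf = (text) => crypto.createHash('sha1').update(text).digest('hex');

// Item key = hash of service + URI, with an assumed '|' separator.
function itemKey(service, uri) {
  return keyOf(`${service}|${uri}`);
}

// Turn a raw item into an Item Table entry that links up to its track.
function makeItemEntry(item) {
  return {
    service: item.service,
    uri: item.uri,
    metadata: item.metadata,
    trackkey: { [keyOf(item.metadata.title)]: null },
  };
}

const raw = {
  service: 'spop',
  uri: 'spotify:track:example', // placeholder URI
  metadata: {
    title: 'Test Track',
    genres: ['Genre 1', 'Genre 2'],
    artists: ['Artist 1', 'Artist 2', 'Artist 3'],
    albums: ['Album 1', 'Album 2'],
  },
};
const itemTable = { [itemKey(raw.service, raw.uri)]: makeItemEntry(raw) };
```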
Moving up a level, an entry in the Track Table might look like:
<Track key 1 (hash of track title)>: {
  itemkeys: {
    '<Item key from service 1 (hash of service + URI)>': null,
    '<Item key from service 2 (hash of service + URI)>': null,
    '<Item key from service 3 (hash of service + URI)>': null
  },
  albumkeys: {
    '<Album key 1 (hash of album title)>': null,
    '<Album key 2 (hash of album title)>': null
  },
  metadata: {
    title: 'Track Title'
  }
}
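A track's key depends only on its title. As a result, the same track reported by several services ends up in one Track Table entry whose `itemkeys` collect one key per service. Below is a minimal sketch. The `addItemToTrack` helper, the service ids, and the hash and separator choices are all hypothetical.

```javascript
const crypto = require('crypto');

// Assumed key functions (SHA-1 hex, '|' separator for item keys).
const keyOf = (text) => crypto.createHash('sha1').update(text).digest('hex');
const itemKey = (service, uri) => keyOf(`${service}|${uri}`);

// Add one item to the Track Table, creating the track entry on first sight
// and accumulating the item key and album keys on every later call.
function addItemToTrack(tracks, item) {
  const tkey = keyOf(item.metadata.title);
  if (!Object.prototype.hasOwnProperty.call(tracks, tkey)) {
    tracks[tkey] = { itemkeys: {}, albumkeys: {}, metadata: { title: item.metadata.title } };
  }
  const track = tracks[tkey];
  track.itemkeys[itemKey(item.service, item.uri)] = null;
  for (const album of item.metadata.albums) track.albumkeys[keyOf(album)] = null;
  return tkey;
}

const tracks = {};
for (const service of ['spop', 'service2', 'service3']) { // hypothetical ids
  addItemToTrack(tracks, {
    service,
    uri: `${service}:track:1`,
    metadata: { title: 'Track Title', albums: ['Album 1', 'Album 2'] },
  });
}
console.log(Object.keys(tracks).length); // 1
```

One consequence of keying on the title alone is that two different songs sharing a title would also be merged into a single entry.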
Moving up another level, an entry in the Album Table might be:
<Album key (hash of album title)>: {
  trackkeys: {
    '<Track key 1 (hash of track title)>': null,
    '<Track key 2 (hash of track title)>': null,
    '<Track key 3 (hash of track title)>': null
  },
  artistkeys: {
    '<Artist key 1 (hash of artist name)>': null,
    '<Artist key 2 (hash of artist name)>': null
  },
  metadata: {
    title: 'Album Title'
  }
}
The Artist Table is the next level up; an example entry is shown above. At the top level is the Genre Table:
<Genre key 1 (hash of genre name)>: {
  artistkeys: {
    '<Artist key 1 (hash of artist name)>': null,
    '<Artist key 2 (hash of artist name)>': null,
    '<Artist key 3 (hash of artist name)>': null,
    '<Artist key 4 (hash of artist name)>': null
  },
  metadata: {
    name: 'Genre Name'
  }
}
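Putting the levels together, the build step can be sketched as a single pass over the items. The pass creates or updates entries in all five tables and links each level both up and down. This is an illustrative sketch, not the code in the Gist. Three choices here are assumptions:

- the hash function;
- the `|` separator in item keys;
- linking every artist of an item to every album and genre of that item.

```javascript
const crypto = require('crypto');

// Assumed key functions (SHA-1 hex, '|' separator for item keys).
const keyOf = (text) => crypto.createHash('sha1').update(text).digest('hex');
const itemKey = (service, uri) => keyOf(`${service}|${uri}`);

// Return the entry under `key`, creating it with makeEntry() if absent.
function upsert(table, key, makeEntry) {
  if (!Object.prototype.hasOwnProperty.call(table, key)) table[key] = makeEntry();
  return table[key];
}

// Build all five tables from a flat list of items in one pass.
function buildLibrary(items) {
  const lib = { genres: {}, artists: {}, albums: {}, tracks: {}, items: {} };
  for (const item of items) {
    const m = item.metadata;
    const ikey = itemKey(item.service, item.uri);
    const tkey = keyOf(m.title);
    // Item level: one entry per service/URI, linking up to its track.
    lib.items[ikey] = { service: item.service, uri: item.uri, metadata: m, trackkey: { [tkey]: null } };
    // Track level: link down to the item.
    const track = upsert(lib.tracks, tkey, () => ({ itemkeys: {}, albumkeys: {}, metadata: { title: m.title } }));
    track.itemkeys[ikey] = null;
    // Album level: link track <-> album and album -> artists.
    for (const albumTitle of m.albums) {
      const alkey = keyOf(albumTitle);
      track.albumkeys[alkey] = null;
      const album = upsert(lib.albums, alkey, () => ({ trackkeys: {}, artistkeys: {}, metadata: { title: albumTitle } }));
      album.trackkeys[tkey] = null;
      for (const artistName of m.artists) album.artistkeys[keyOf(artistName)] = null;
    }
    // Artist and genre levels: link artist -> albums and artist <-> genres.
    for (const artistName of m.artists) {
      const arkey = keyOf(artistName);
      const artist = upsert(lib.artists, arkey, () => ({ albumkeys: {}, genrekeys: {}, metadata: { name: artistName } }));
      for (const albumTitle of m.albums) artist.albumkeys[keyOf(albumTitle)] = null;
      for (const genreName of m.genres) {
        const gkey = keyOf(genreName);
        artist.genrekeys[gkey] = null;
        const genre = upsert(lib.genres, gkey, () => ({ artistkeys: {}, metadata: { name: genreName } }));
        genre.artistkeys[arkey] = null;
      }
    }
  }
  return lib;
}

const lib = buildLibrary([
  { service: 'spop', uri: 'spotify:track:a', metadata: { title: 'Song A', genres: ['Rock'], artists: ['Band'], albums: ['Debut'] } },
  { service: 'service2', uri: 'svc2:a', metadata: { title: 'Song A', genres: ['Rock'], artists: ['Band'], albums: ['Debut'] } },
  { service: 'spop', uri: 'spotify:track:b', metadata: { title: 'Song B', genres: ['Rock', 'Pop'], artists: ['Band', 'Guest'], albums: ['Debut'] } },
]);
console.log(Object.keys(lib.items).length, Object.keys(lib.tracks).length); // 3 2
```

A plain write replaces an entry, but `upsert` merges new links into an existing one. As a result, an artist who appears on many items accumulates the album and genre keys from all of them, not just the last one.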
These tables use a key-value organization. Lookups may not be as fast as indexing into a plain array, but there are some advantages:
- Each entry can be accessed easily by its key, and those keys are cross-referenced in other tables. Because the keys are just hashes of the genre names, artist names, etc., anyone who knows the full name can also access an entry directly, without any searching.
- Duplicate entries are impossible: writing to an existing key simply replaces the contents of that entry. This removes the need to run deduplication routines on, for example, search results.
- The parts of each table that store links to other tables can be kept in memory for fast access, while the portions that store larger objects (like album art) can be placed on disk. When the complete data is needed, the two portions can simply be merged with Object.assign() and the result presented to the user.
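The three properties above can be demonstrated in a few lines. In this sketch, `onDisk` stands in for the portion of an entry that would be loaded from disk. Its field names are hypothetical, and SHA-1 is again an assumed hash.

```javascript
const crypto = require('crypto');

// Assumed key function (SHA-1 hex); the actual hash is not specified.
const keyOf = (text) => crypto.createHash('sha1').update(text).digest('hex');

const artists = {};

// Deduplication: writing the same artist twice targets the same key, so the
// second write replaces the first and the table holds one entry per name.
for (const name of ['Artist 1', 'Artist 2', 'Artist 1']) {
  artists[keyOf(name)] = { albumkeys: {}, genrekeys: {}, metadata: { name } };
}
console.log(Object.keys(artists).length); // 2

// Direct access: knowing the full name is enough to find the entry, no scan.
const hit = artists[keyOf('Artist 2')];

// Split storage: link data stays in memory, bulky fields come from disk.
// `onDisk` is a hypothetical stand-in for a record read from storage.
const onDisk = { albumart: '<image bytes>', biography: 'Placeholder biography text' };
const complete = Object.assign({}, hit, onDisk);
```

`Object.assign()` performs a shallow merge. If both portions contained a `metadata` object, the on-disk one would replace the in-memory one rather than be combined with it. The two portions are therefore easiest to merge when they hold disjoint top-level fields.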
I’m also looking into LevelDB as persistent storage for the music library tables. It is fast, well supported in Node, and can store and retrieve these tables directly as JavaScript objects when configured with a JSON value encoding.