Implementing a NIPL.
First up, we need to consider what we need to store.
- The music
- MetaData about the music
- Scores for song-pairs.
- Music is stored on a per-track basis as any format supported by the player. (.ogg and/or .mp3 are the most likely formats, most likely stored simply as files on the filesystem)
- Music metadata should be stored in an associated file - simple text file initially? (this ensures the data is stored in a consistent manner regardless of music format type)
- The Album, track number, Artist(s), date recorded, instruments played, genre... there could be alot of stuff. Refer to:
- For SongA->SongB where SongB follows songA, we cannot score the A->B pair untill after B begins. Nevertheless, we want to store the score with SongA, since that's where we'll need to know about it for next time we want to refer to it.
- For each song, a simple database to hold the scores and the song that score relates to. (maybe hold the number of votes also, for added precision when re-scoring)
Issues to think about:
- Implied scores (album/artist/genre) are difficult without further lookup tables keeping track of what songs belong to which album(s), artist(s) and genre(s)...
- Implied scores are gonna be kinda nescessary IMHO, given it's the only reasonable way to populate the scoring database with any reasonable speed.
This should maybe be put with some other part of the wiki, but for the time being, here 'tis:
We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" and TRACKNAME column.
- Note that "TRACKID" key uniqueness is covered in NCDI
So what about Albums?
- Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (arbitrary), and ALBUMNAME column.
How about Artists?
- Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column
- Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"
So, we have 4 tables so far - lists for TRACK, ALBUM, ARTIST, and GENRE
Now, the essential part of NIPL is knowing what the links between tracks are:
A table called "TRACKLINK" with columns of TRACKLAST, TRACKNEXT, SCORE, VOTECOUNT. This table doesn't have a unique key - though in theory, the TRACKLAST & TRACKNEXT combo should always unique.
Similarly, for implied scores, we have a ALBUMLINK, ARTISTLINK and GENRELINK tables - each with the same LAST, NEXT, SCORE and COUNT columns.
Is there anything we've missed out?
- Q. How do we know what artists did which albums or tracks, and which genres. etc?
- A. Another table Everything links back to tracks - that's what's important. So, let's do "TRACK_TO_GENRE" table with TRACKID and GENREID as columns. Similarly, we can do "TRACK_TO_ARTIST" with TRACKID and ARTISTID. Lastly, "TRACK_TO_ALBUM" is desirable, with ALBUMID and TRACKID columns. Desirable because even though it provides information discoverable from NCDI interogation, this way it's native, plus provides advantages listed below.
- Q. What about tracks that are on multiple albums?
- A. The TRACK_TO_ALBUM table can easily have multiple ALBUMID with the same TRACKID.
- Q. What about albums with multiple discs?
- A. The ALBUMS table handles albums which may be multi-disc sets. We add DISCID and DISC_TRACK_NUM columns to the ALBUM_TO_TRACK table to match the trackID to a precise location. More importantly, this allows the system to play an entire disc or album in order :)
- Q. So Albums are different from discs then? How are they named?
- Q. Yes, an "ALBUM" is the multidisc set as released. Eg, Pink Floyd's album "The Wall" is released on two discs. Back in vinyl days, it was on 4 sides, and in DVD-audio, it will probably be just one disc!). We add a DISCS table now to provide for names on individual discs. (rare, we concede, and this is possibly information more suited to a NUMB database encompasing the NIPL db)
- Q. Doesn't this make for a horrible UI?
- A. No. This granularity of data means the UI can slice and dice the information any way it wants, but there's no rule that this complexity must be passed on to the user. Ideally, the UI will let the user edit summaries, which will be decomposed into values for all the relevant tables. eg: The end UI can ask the user to set a genre by Album - it just stores it as a genre per track of that said album. The UI could even try and predict the "most-likely" genre for an album by looking at the artist(s) for that album, then looking at other genres for that artist(s)!)
- Q. Wouldn't it be nice to have a way of noting when individual artists are part of larger groups? (eg, Paul McCartney is reknown solo artist, but also reknown as part of Wings (and The Beatles, I am told ;)
- A. Sounds good. "BANDMEMBERS" table with ARTISTID and MEMBER_ARTISTID columns - both populated with ARTISTID keys.
The database so far...
(for those who came in late)
- TRACKS (with TRACKID and TRACKNAME) (grows at rate of track addition)
- DISCS (with DISCID and DISCNAME) (grows at rate of disc addition - approx 1/12th that of track addition)
- ALBUMS (with ALBUMID and ALBUMNAME) (grows at 1/10th rate of track addition, approx)
- ARTISTS (with ARTISTID and ARTISTNAME) (very slow growth)
- GENRES (with GENREID and GENRENAME) (effectively static size)
- BANDMEMBERS (with ARTISTID and MEMBER_ARTISTID) (extremely slow rate of growth)
- TRACK_TO_GENRE (with TRACKID and GENREID) (grows faster than rate of addition)
- TRACK_TO_ALBUM (with TRACKID, ALBUMID, DISCID and DISC_TRACK_NUM) (grows faster than rate of track addition)
- TRACK_TO_ARTIST (with TRACKID and ARTISTID) (grows faster than rate of track addition)
- TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT) (growth will slow, tend towards trackcount^2)
- ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT) (growth will slow, tend towards albumcount^2)
- ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT) (growth tending towards artistcount^2)
- GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT) (growth tends to genrecount^2)
Note that the data in the ALBUMLINK, ARTISTLINK, and GENRELINK tables should be able to be extrapolated from the data in other tables - however we make provision to store it seperately as caching this data is expected to be useful for performance. (the alternative is to calculate the ALBUMLINK info (for example) by looking at links between all songs on albumA to all the songs on albumB - this would usually be, on average, almost 150 links to look up. The situation only gets worse if you're trying to recalculate the ARTISTLINK data, and GENRELINK data! This is especially relevant when you consider that the ALBUMLINK, ARTISTLINK and GENRELINK data is used in implied scores. Compacting down 150 lookups to one is VERY important when that one lookup is just one of *many* before deciding what the next TRACK will be...
Note2 this set of tables are designed such that a subset can be used if desired (eg, drop the ALBUMLINK, ARTISTLINK and GENRELINK tables and don't use implied scoring at all), and yet still be a usable subset of complete NIPL such that the collected data can be incorporated or harvested into a more-complete NIPL implementation. Similarly, the tables and fields should lend themselves into being part of a larger database with scope outside NIPL. (eg, in a complete NUMB system, we might also have a table "TRACKINFO" with columns COPYRIGHTOWNER COMPOSER AUTHOR PERFORMER ID3GENRE PARODYOF SAMPLES MIXVERSION ...and so on... all usefull, valid information... but outside the scope of what NIPL is aiming to accomplish.)
or: Reading and writing, when and where?
- When tracks are added, we will most likely write to the TRACKS, DISCS, ALBUMS and ARTISTS tables, as well as to the TRACK_TO_* tables.
- As the system matures, ARTISTS is less likely to need writing to.
- DISCS is only written to for a multidisc album.
- When tracks are played, at each changover there is the potential for a score to be set. We write to the four *LINK tables.
- As the system matures, these writes do not generate new lines, but rather modify existing lines.
- Worst case: score written after each track change.
- Note that scores linking tracks/albums/artists/genres may be set at any time outside of the playlist UI. (eg, a webUI to do this)
- We read the database extensively to determine which song to play next. Ideally the database reads and calculations should not take more than several seconds. (up to 30 seconds would seem OK imho - a good implementation will use longer tracks to calculate several tracks ahead to accomodate for an abnormally short track)
- we want to get the current_track-to-next_track score for every possible value of "next_track". Worse case - this is n-1 values, where n is the total number of tracks. Will this require n-1 reads, or is SQL smart enough to get all the data we want in one read? (nemo guesses here that this might be implementation dependant?). Once all the possible next-track and scores are known, then it's CPU grunt to work out which next-track to choose. We may need to read from the following tables
- TRACKS (to know what tracks are possible for next_track
- *LINK tables (to work out how likely each track is)
old stuff below
For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.
To connect genres and songs, we'll need a third table, called (why not?) GENRECONNECT. It can have two colums, TRACKID and GENREID.
To find the genre IDs attached to a song with, say, SONGID=123, we run:
select GENREID from GENRECONNECT where TRACKID=123;
To find the genre names attached to a song, we have the slightly more complex query:
select GENRES.NAME from GENRECONNECT, GENRES where GENRECONNECT.TRACKID=123 and GENRECONNECT.GENREID=GENRES.GENREID;