NIPL/Implementation
(Commenting) |
(nemo notes table growth rates) |
||
Line 80: | Line 80: | ||
(for those who came in late) |
(for those who came in late) |
||
− | * TRACKS (with TRACKID and TRACKNAME) |
+ | * TRACKS (with TRACKID and TRACKNAME) (grows at rate of track addition) |
− | * ALBUMS (with ALBUMID and ALBUMNAME) |
+ | * ALBUMS (with ALBUMID and ALBUMNAME) (grows at 1/12th rate of track addition, approx) |
− | * ARTISTS (with ARTISTID and ARTISTNAME) |
+ | * ARTISTS (with ARTISTID and ARTISTNAME) (grows at 1/100th, estimated, rate of track addition) |
− | * GENRES (with GENREID and GENRENAME) |
+ | * GENRES (with GENREID and GENRENAME) (static size) |
− | * TRACKEQUIV (with TRACKID and SAMETRACK) |
+ | * TRACKEQUIV (with TRACKID and SAMETRACK) (extremely slow rate of growth) |
− | * BANDMEMBERS (with NAME and MEMBER) |
+ | * BANDMEMBERS (with NAME and MEMBER) (extremely slow rate of growth) |
− | * GENRECONNECT (with TRACKID and GENREID) |
+ | * GENRECONNECT (with TRACKID and GENREID) (grows faster than rate of track addition) |
− | * ALBUMCONNECT (with TRACKID and ALBUMID) |
+ | * ALBUMCONNECT (with TRACKID and ALBUMID) (grows at rate of track addition) |
− | * ARTISTCONNECT (with TRACKID and ARTISTID) |
+ | * ARTISTCONNECT (with TRACKID and ARTISTID) (grows at rate of track addition) |
+ | |||
+ | * TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT) (growth will slow, tend towards trackcount^2) |
||
+ | * ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT) (growth will slow, tend towards albumcount^2) |
||
+ | * ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT) (growth tending to artistcount^2) |
||
+ | * GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT) (growth tends to genrecount^2) |
||
− | * TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT) |
||
− | * ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT) |
||
− | * ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT) |
||
− | * GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT) |
||
''What I mentioned before about TRACKLINK goes for ALBUMLINK. ARTISTLINK, and GENRELINK. -- Screwtape'' |
''What I mentioned before about TRACKLINK goes for ALBUMLINK. ARTISTLINK, and GENRELINK. -- Screwtape'' |
Revision as of 17:08, 11 June 2002
Implementing a NIPL.
First up, we need to consider what we need to store.
- The music
- MetaData about the music
- Scores for song-pairs.
Method
- Music
- Music is stored on a per-track basis as any format supported by the player. (.ogg and/or .mp3 are the most likely formats, most likely stored simply as files on the filesystem)
- MetaData
- Music metadata should be stored in an associated file - simple text file initially? (this ensures the data is stored in a consistent manner regardless of music format type)
- The Album, track number, Artist(s), date recorded, instruments played, genre... there could be alot of stuff. Refer to:
- Scores
- For SongA->SongB where SongB follows songA, we cannot score the A->B pair untill after B begins. Nevertheless, we want to store the score with SongA, since that's where we'll need to know about it for next time we want to refer to it.
- For each song, a simple database to hold the scores and the song that score relates to. (maybe hold the number of votes also, for added precision when re-scoring)
Issues to think about:
- Implied scores (album/artist/genre) are difficult without further lookup tables keeping track of what songs belong to which album(s), artist(s) and genre(s)...
- Implied scores are gonna be kinda nescessary IMHO, given it's the only reasonable way to populate the scoring database with any reasonable speed.
Schema
This should maybe be put with some other part of the wiki, but for the time being, here 'tis:
We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" and TRACKNAME column.
- Note that "TRACKID" key uniqueness is covered in NCDI
So what about Albums?
- Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (see NCDI), and ALBUMNAME column.
How about Artists?
- Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column
And Genres?
- Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"
So, we have 4 tables so far - lists for TRACK, ALBUM, ARTIST, and GENRE
Now, the essential part of NIPL is knowing what the links between tracks are:
A table called "TRACKLINK" with columns of TRACKLAST, TRACKNEXT, SCORE, VOTECOUNT. This table doesn't have a unique key - though in theory, the TRACKLAST & TRACKNEXT combo should always unique.
TRACKLINK seems too close to TRACKCONNECT for my liking - how about TRACKLINKAGE or TRACKCHAINING or something? -- Screwtape
Similarly, for implied scores, we have a ALBUMLINK, ARTISTLINK and GENRELINK tables - each with the same LAST, NEXT, SCORE and COUNT columns.
Is there anything we've missed out?
- Q. What about tracking which songs come from which albums?
- A. This is handled by using NCDI for TRACKID and ALBUMID.
- Q. What about tracks that are on multiple albums?
- A. New table time! Let's call it TRACKEQUIV, and has a TRACKID column, and a ALIAS column. Thus linking two seperate NCDI's as being the same actual track.
Note that every time you add A as an alias of B, you should also add B as an alias of A, for completeness. -- Screwtape
- Q. How do we know what artists did which albums or tracks, and which genres. etc?
- A. More tables huh? Everything links back to tracks - that's what's important. So, let's do "GENRECONNECT" table with TRACKID and GENREID as columns. Similarly, we can do "ARTISTCONNECT" with TRACKID and ARTISTID. Lastly, "ALBUMCONNECT" may be desirable, with ALBUMID and TRACKID columns. Desirable because it's implemented hte same as the rest of hte system now, rather than relying on NCDI ;)
Note that ALBUMCONNECT would also solve the previous two questions rather neatly. Question 1 is an obvious outcome, question 2 just requires having two or more records with the same TRACKID and different ALBUMIDs. -- Screwtape
- Q. Doesn't this make for a horrible UI?
- A. No. The end UI can ask for a genre by Album - it just stores it as a genre by track. (for example). The UI could even try and predict the "most-likely" genre for an album by looking at the artist(s) for that album, then looking at other genres for that artist(s).
No. This granularity of data means the UI can slice and dice the information any way it wants, but there's no rule that this complexity must be passed on to the user. Ideally, the UI will let the user edit summaries, which will be decomposed into values for all the relevant tables. -- Screwtape
- Q. Wouldn't it be nice to have a way of noting when individual artists are part of larger groups? (eg, Paul McCartney is reknown solo artist, but also reknown as part of Wings (and The Beatles, I am told ;)
- A. Sounds good. "BANDMEMBERS" table with NAME and MEMBER columns - both populated with ARTISTID keys.
I'd call them ARTISTID and MEMBER_ARTISTID, to highlight that they're both ARTISTIDs. I'd expect a field called NAME to be a human-readable name. -- Screwtape
The database so far...
(for those who came in late)
- TRACKS (with TRACKID and TRACKNAME) (grows at rate of track addition)
- ALBUMS (with ALBUMID and ALBUMNAME) (grows at 1/12th rate of track addition, approx)
- ARTISTS (with ARTISTID and ARTISTNAME) (grows at 1/100th, estimated, rate of track addition)
- GENRES (with GENREID and GENRENAME) (static size)
- TRACKEQUIV (with TRACKID and SAMETRACK) (extremely slow rate of growth)
- BANDMEMBERS (with NAME and MEMBER) (extremely slow rate of growth)
- GENRECONNECT (with TRACKID and GENREID) (grows faster than rate of track addition)
- ALBUMCONNECT (with TRACKID and ALBUMID) (grows at rate of track addition)
- ARTISTCONNECT (with TRACKID and ARTISTID) (grows at rate of track addition)
- TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT) (growth will slow, tend towards trackcount^2)
- ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT) (growth will slow, tend towards albumcount^2)
- ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT) (growth tending to artistcount^2)
- GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT) (growth tends to genrecount^2)
What I mentioned before about TRACKLINK goes for ALBUMLINK. ARTISTLINK, and GENRELINK. -- Screwtape
Note that the data in the ALBUMLINK, ARTISTLINK, and GENRELINK tables can be extrapolated from the
data in other tables - however we make provision to store it seperately as caching this data is expected to be useful for performance. (the alternative is to calculate the ALBUMLINK info (for example) by looking at links between all songs on albumA to all the songs on albumB - this would usually be, on average, almost 150 links to look up. The situation only gets worse if you're trying to recalculate the ARTISTLINK data, and GENRELINK data! This is especially relevant when you consider that the ALBUMLINK, ARTISTLINK and GENRELINK data is used in implied scores. Compacting down 150 lookups to one is VERY important when that one lookup is just one of *many* before deciding what the next TRACK will be...
old stuff below
For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.
To connect genres and songs, we'll need a third table, called (why not?) GENRECONNECT. It can have two colums, TRACKID and GENREID.
To find the genre IDs attached to a song with, say, SONGID=123, we run:
SELECT GENREID FROM GENRECONNECT WHERE TRACKID=123
To find the genre names attached to a song, we have the slightly more complex query:
SELECT GENRES.NAME FROM GENRECONNECT, GENRES WHERE GENRECONNECT.TRACKID=123 AND GENRECONNECT.GENREID=GENRES.GENREID
Screwtape sez: SQL gurus can correct me on this, SQL examples included only for the Edification and Education of Nemo, who persists in his unorthodox obsessions.