NIPL/Implementation

From ThorxWiki
Revision as of 16:57, 11 June 2002 by Screwtape (Talk)

Jump to: navigation, search

Implementing a NIPL.

First up, we need to consider what we need to store.

  • The music
  • MetaData about the music
  • Scores for song-pairs.

Method

  • Music
    • Music is stored on a per-track basis as any format supported by the player. (.ogg and/or .mp3 are the most likely formats, most likely stored simply as files on the filesystem)
  • MetaData
    • Music metadata should be stored in an associated file - simple text file initially? (this ensures the data is stored in a consistent manner regardless of music format type)
    • The Album, track number, Artist(s), date recorded, instruments played, genre... there could be alot of stuff. Refer to:
  • Scores
    • For SongA->SongB where SongB follows songA, we cannot score the A->B pair untill after B begins. Nevertheless, we want to store the score with SongA, since that's where we'll need to know about it for next time we want to refer to it.
    • For each song, a simple database to hold the scores and the song that score relates to. (maybe hold the number of votes also, for added precision when re-scoring)

Issues to think about:

  • Implied scores (album/artist/genre) are difficult without further lookup tables keeping track of what songs belong to which album(s), artist(s) and genre(s)...
    • Implied scores are gonna be kinda nescessary IMHO, given it's the only reasonable way to populate the scoring database with any reasonable speed.

Schema

This should maybe be put with some other part of the wiki, but for the time being, here 'tis:

We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" and TRACKNAME column.

  • Note that "TRACKID" key uniqueness is covered in NCDI

So what about Albums?

Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (see NCDI), and ALBUMNAME column.

How about Artists?

Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column

And Genres?

Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"

So, we have 4 tables so far - lists for TRACK, ALBUM, ARTIST, and GENRE

Now, the essential part of NIPL is knowing what the links between tracks are:

A table called "TRACKLINK" with columns of TRACKLAST, TRACKNEXT, SCORE, VOTECOUNT. This table doesn't have a unique key - though in theory, the TRACKLAST & TRACKNEXT combo should always unique.

TRACKLINK seems too close to TRACKCONNECT for my liking - how about TRACKLINKAGE or TRACKCHAINING or something? -- Screwtape

Similarly, for implied scores, we have a ALBUMLINK, ARTISTLINK and GENRELINK tables - each with the same LAST, NEXT, SCORE and COUNT columns.


Is there anything we've missed out?

Q. What about tracking which songs come from which albums?
A. This is handled by using NCDI for TRACKID and ALBUMID.
Q. What about tracks that are on multiple albums?
A. New table time! Let's call it TRACKEQUIV, and has a TRACKID column, and a ALIAS column. Thus linking two seperate NCDI's as being the same actual track.

Note that every time you add A as an alias of B, you should also add B as an alias of A, for completeness. -- Screwtape

Q. How do we know what artists did which albums or tracks, and which genres. etc?
A. More tables huh? Everything links back to tracks - that's what's important. So, let's do "GENRECONNECT" table with TRACKID and GENREID as columns. Similarly, we can do "ARTISTCONNECT" with TRACKID and ARTISTID. Lastly, "ALBUMCONNECT" may be desirable, with ALBUMID and TRACKID columns. Desirable because it's implemented hte same as the rest of hte system now, rather than relying on NCDI ;)

Note that ALBUMCONNECT would also solve the previous two questions rather neatly. Question 1 is an obvious outcome, question 2 just requires having two or more records with the same TRACKID and different ALBUMIDs. -- Screwtape

Q. Doesn't this make for a horrible UI?
A. No. The end UI can ask for a genre by Album - it just stores it as a genre by track. (for example). The UI could even try and predict the "most-likely" genre for an album by looking at the artist(s) for that album, then looking at other genres for that artist(s).

No. This granularity of data means the UI can slice and dice the information any way it wants, but there's no rule that this complexity must be passed on to the user. Ideally, the UI will let the user edit summaries, which will be decomposed into values for all the relevant tables. -- Screwtape

Q. Wouldn't it be nice to have a way of noting when individual artists are part of larger groups? (eg, Paul McCartney is reknown solo artist, but also reknown as part of Wings (and The Beatles, I am told ;)
A. Sounds good. "BANDMEMBERS" table with NAME and MEMBER columns - both populated with ARTISTID keys.

I'd call them ARTISTID and MEMBER_ARTISTID, to highlight that they're both ARTISTIDs. I'd expect a field called NAME to be a human-readable name. -- Screwtape


The database so far...

(for those who came in late)

  • TRACKS (with TRACKID and TRACKNAME)
  • ALBUMS (with ALBUMID and ALBUMNAME)
  • ARTISTS (with ARTISTID and ARTISTNAME)
  • GENRES (with GENREID and GENRENAME)
  • TRACKEQUIV (with TRACKID and SAMETRACK)
  • BANDMEMBERS (with NAME and MEMBER)
  • GENRECONNECT (with TRACKID and GENREID)
  • ALBUMCONNECT (with TRACKID and ALBUMID)
  • ARTISTCONNECT (with TRACKID and ARTISTID)
  • TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT)
  • ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT)
  • ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT)
  • GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT)

What I mentioned before about TRACKLINK goes for ALBUMLINK. ARTISTLINK, and GENRELINK. -- Screwtape


Note that the data in the ALBUMLINK, ARTISTLINK, and GENRELINK tables can be extrapolated from the data in other tables - however we make provision to store it seperately as caching this data is expected to be useful for performance. (the alternative is to calculate the ALBUMLINK info (for example) by looking at links between all songs on albumA to all the songs on albumB - this would usually be, on average, almost 150 links to look up. The situation only gets worse if you're trying to recalculate the ARTISTLINK data, and GENRELINK data! This is especially relevant when you consider that the ALBUMLINK, ARTISTLINK and GENRELINK data is used in implied scores. Compacting down 150 lookups to one is VERY important when that one lookup is just one of *many* before deciding what the next TRACK will be...



old stuff below

For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.

To connect genres and songs, we'll need a third table, called (why not?) GENRECONNECT. It can have two colums, TRACKID and GENREID.

To find the genre IDs attached to a song with, say, SONGID=123, we run:

SELECT GENREID FROM GENRECONNECT WHERE TRACKID=123

To find the genre names attached to a song, we have the slightly more complex query:


SELECT GENRES.NAME FROM GENRECONNECT, GENRES WHERE GENRECONNECT.TRACKID=123 AND GENRECONNECT.GENREID=GENRES.GENREID

Screwtape sez: SQL gurus can correct me on this, SQL examples included only for the Edification and Education of Nemo, who persists in his unorthodox obsessions.

Personal tools
Namespaces

Variants
Actions
Navigation
meta navigation
More thorx
Tools