NIPL/Implementation

From ThorxWiki
(Difference between revisions)
Jump to: navigation, search
(expansion....)
(db expansion)
Line 30: Line 30:
 
This should maybe be put with some other part of the wiki, but for the time being, here 'tis:
 
This should maybe be put with some other part of the wiki, but for the time being, here 'tis:
   
We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" for TRACKS.
+
We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" and TRACKNAME column.
 
* Note that "TRACKID" key uniqueness is covered in [[NCDI]]
 
* Note that "TRACKID" key uniqueness is covered in [[NCDI]]
  +
  +
So what about Albums?
  +
:Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (see NCDI), and ALBUMNAME column.
  +
  +
How about Artists?
  +
:Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column
  +
  +
And Genres?
  +
:Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"
  +
  +
----
  +
  +
So, we have 4 tables so far - lists for TRACK, ALBUM, ARTIST, and GENRE
  +
  +
Now, the essential part of NIPL is knowing what the links between tracks are:
  +
  +
A table called "TRACKLINK" with columns of TRACKLAST, TRACKNEXT, SCORE, VOTECOUNT. This table doesn't have a unique key - though in theory, the TRACKLAST & TRACKNEXT combo should always unique.
  +
  +
Similarly, for implied scores, we have a ALBUMLINK, ARTISTLINK and GENRELINK tables - each with the same LAST, NEXT, SCORE and COUNT columns.
  +
  +
----
  +
  +
Is there anything we've missed out?
  +
  +
;Q. What about tracking which songs come from which albums?:A. This is handled by using [[NCDI]] for TRACKID and ALBUMID.
  +
;Q. What about tracks that are on multiple albums?:A. New table time! Let's call it TRACKEQUIV, and has a TRACKID column, and a SAMETRACK column. Thus linking two seperate NCDI's as being the same actual track.
  +
;Q. How do we know what artists did which albums or tracks, and which genres. etc?:A. More tables huh? Everything links back to tracks - that's what's important. So, let's do "GENRECONNECT" table with TRACKID and GENREID as columns. Similarly, We can do "ARTISTCONNECT" with TRACKID and ARTISTID. Lastly, "ALBUMCONNECT" may be desirable, with ALBUMID and TRACKID columns. Desirable because it's implemented hte same as the rest of hte system now, rather than relying on NCDI ;)
  +
;Q. Doesn't this make for a horrible UI?:A. No. The end UI can ask for a genre by Album - it just stores it as a genre by track. (for example). The UI could even try and predict the "most-likely" genre for an album by looking at the artist(s) for that album, then looking at other genres for that artist(s).
  +
;Q. Wouldn't it be nice to have a way of noting when individual artists are part of larger groups? (eg, Paul McCartney is reknown solo artist, but also reknown as part of Wings (and The Beatles, I am told ;):A. Sounds good. "BANDMEMBERS" table with NAME and MEMBER columns - both populated with ARTISTID keys.
  +
  +
----
  +
  +
=== The database so far... ===
  +
(for those who came in late)
  +
  +
* TRACKS (with TRACKID and TRACKNAME)
  +
* ALBUMS (with ALBUMID and ALBUMNAME)
  +
* ARTISTS (with ARTISTID and ARTISTNAME)
  +
* GENRES (with GENREID and GENRENAME)
  +
  +
* TRACKEQUIV (with TRACKID and SAMETRACK)
  +
* BANDMEMBERS (with NAME and MEMBER)
  +
  +
* GENRECONNECT (with TRACKID and GENREID)
  +
* ALBUMCONNECT (with TRACKID and ALBUMID)
  +
* ARTISTCONNECT (with TRACKID and ARTISTID)
  +
  +
* TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT)
  +
* ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT)
  +
* ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT)
  +
* GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT)
  +
  +
  +
'''Note''' that the data in the ALBUMLINK, ARTISTLINK, and GENRELINK tables can be extrapolated from the
  +
data in other tables - however we make provision to store it seperately as caching this data is expected to be usefull for performance. (the alternative is to calculate the ALBUMLINK info (for example) by looking at links between all songs on albumA to all the songs on albumB - this would usually be, on average, almost 150 links to look up. The situation only gets worse if you're trying to recalculate the ARTISTLINK data, and GENRELINK data! This is especially relevant when you consider that the ALBUMLINK, ARTISTLINK and GENRELINK data is used in implied scores. Compacting down 150 lookups to 1 is VERY important when that 1 lookup is just one of *many* before deciding what the next TRACK will be...
  +
  +
  +
----
  +
  +
''old stuff here below''
   
 
For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.
 
For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.
Line 45: Line 105:
 
SELECT GENRES.NAME FROM GENRECONNECT, GENRES WHERE GENRECONNECT.TRACKID=123 AND GENRECONNECT.GENREID=GENRES.GENREID
 
SELECT GENRES.NAME FROM GENRECONNECT, GENRES WHERE GENRECONNECT.TRACKID=123 AND GENRECONNECT.GENREID=GENRES.GENREID
   
SQL gurus can correct me on this, SQL examples included only for the Edification and Education of [[Nemo]], who persists in his [http://xdm.house.cx/ unorthodox obsessions].
+
'''Screwtape sez''':''SQL gurus can correct me on this, SQL examples included only for the Edification and Education of [[Nemo]], who persists in his [http://xdm.house.cx/ unorthodox obsessions].''
   
 
...
 
...
 
So what about Albums?
 
:Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (see NCDI), and ALBUMNAME column.
 
 
How about Artists?
 
:Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column
 
 
And Genres?
 
:Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"
 

Revision as of 15:44, 11 June 2002

Implementing a NIPL.

First up, we need to consider what we need to store.

  • The music
  • MetaData about the music
  • Scores for song-pairs.

Method

  • Music
    • Music is stored on a per-track basis as any format supported by the player. (.ogg and/or .mp3 are the most likely formats, most likely stored simply as files on the filesystem)
  • MetaData
    • Music metadata should be stored in an associated file - simple text file initially? (this ensures the data is stored in a consistent manner regardless of music format type)
    • The Album, track number, Artist(s), date recorded, instruments played, genre... there could be alot of stuff. Refer to:
  • Scores
    • For SongA->SongB where SongB follows songA, we cannot score the A->B pair untill after B begins. Nevertheless, we want to store the score with SongA, since that's where we'll need to know about it for next time we want to refer to it.
    • For each song, a simple database to hold the scores and the song that score relates to. (maybe hold the number of votes also, for added precision when re-scoring)

Issues to think about:

  • Implied scores (album/artist/genre) are difficult without further lookup tables keeping track of what songs belong to which album(s), artist(s) and genre(s)...
    • Implied scores are gonna be kinda nescessary IMHO, given it's the only reasonable way to populate the scoring database with any reasonable speed.

Schema

This should maybe be put with some other part of the wiki, but for the time being, here 'tis:

We'll need a TRACKS table. We can safely assume we'll need a primary key "TRACKID" and TRACKNAME column.

  • Note that "TRACKID" key uniqueness is covered in NCDI

So what about Albums?

Just like the TRACKS table, we have an ALBUMS table. It has "ALBUMID" primary key (see NCDI), and ALBUMNAME column.

How about Artists?

Similarly, an ARTISTS table with ARTISTID primary key (assignment can be sequential :), and ARTISTNAME column

And Genres?

Am thinking just have a flat list of genres - any sub-genre organisation is, for now, outside the scope of this implementation. Thus, "GENREID" and "GENRENAME"

So, we have 4 tables so far - lists for TRACK, ALBUM, ARTIST, and GENRE

Now, the essential part of NIPL is knowing what the links between tracks are:

A table called "TRACKLINK" with columns of TRACKLAST, TRACKNEXT, SCORE, VOTECOUNT. This table doesn't have a unique key - though in theory, the TRACKLAST & TRACKNEXT combo should always unique.

Similarly, for implied scores, we have a ALBUMLINK, ARTISTLINK and GENRELINK tables - each with the same LAST, NEXT, SCORE and COUNT columns.


Is there anything we've missed out?

Q. What about tracking which songs come from which albums?
A. This is handled by using NCDI for TRACKID and ALBUMID.
Q. What about tracks that are on multiple albums?
A. New table time! Let's call it TRACKEQUIV, and has a TRACKID column, and a SAMETRACK column. Thus linking two seperate NCDI's as being the same actual track.
Q. How do we know what artists did which albums or tracks, and which genres. etc?
A. More tables huh? Everything links back to tracks - that's what's important. So, let's do "GENRECONNECT" table with TRACKID and GENREID as columns. Similarly, We can do "ARTISTCONNECT" with TRACKID and ARTISTID. Lastly, "ALBUMCONNECT" may be desirable, with ALBUMID and TRACKID columns. Desirable because it's implemented hte same as the rest of hte system now, rather than relying on NCDI ;)
Q. Doesn't this make for a horrible UI?
A. No. The end UI can ask for a genre by Album - it just stores it as a genre by track. (for example). The UI could even try and predict the "most-likely" genre for an album by looking at the artist(s) for that album, then looking at other genres for that artist(s).
Q. Wouldn't it be nice to have a way of noting when individual artists are part of larger groups? (eg, Paul McCartney is reknown solo artist, but also reknown as part of Wings (and The Beatles, I am told ;)
A. Sounds good. "BANDMEMBERS" table with NAME and MEMBER columns - both populated with ARTISTID keys.

The database so far...

(for those who came in late)

  • TRACKS (with TRACKID and TRACKNAME)
  • ALBUMS (with ALBUMID and ALBUMNAME)
  • ARTISTS (with ARTISTID and ARTISTNAME)
  • GENRES (with GENREID and GENRENAME)
  • TRACKEQUIV (with TRACKID and SAMETRACK)
  • BANDMEMBERS (with NAME and MEMBER)
  • GENRECONNECT (with TRACKID and GENREID)
  • ALBUMCONNECT (with TRACKID and ALBUMID)
  • ARTISTCONNECT (with TRACKID and ARTISTID)
  • TRACKLINK (with TRACKLAST, TRACKNEXT, VOTE and COUNT)
  • ALBUMLINK (with ALBUMLAST, ALBUMNEXT, VOTE and COUNT)
  • ARTISTLINK (with ARTISTLAST, ARTISTNEXT, VOTE and COUNT)
  • GENRELINK (with GENRELAST, GENRENEXT, VOTE and COUNT)


Note that the data in the ALBUMLINK, ARTISTLINK, and GENRELINK tables can be extrapolated from the data in other tables - however we make provision to store it seperately as caching this data is expected to be usefull for performance. (the alternative is to calculate the ALBUMLINK info (for example) by looking at links between all songs on albumA to all the songs on albumB - this would usually be, on average, almost 150 links to look up. The situation only gets worse if you're trying to recalculate the ARTISTLINK data, and GENRELINK data! This is especially relevant when you consider that the ALBUMLINK, ARTISTLINK and GENRELINK data is used in implied scores. Compacting down 150 lookups to 1 is VERY important when that 1 lookup is just one of *many* before deciding what the next TRACK will be...



old stuff here below

For the database to work properly, it matters only that genres are uniquely identified - the actual human-readable strings are not necessary. So, we'll make another table GENRES, with two columns: GENREID and NAME. How sub-genres can be implemented, I don't know - I don't think anyone knows how subgenres work yet, so implementing them is pointless.

To connect genres and songs, we'll need a third table, called (why not?) GENRECONNECT. It can have two colums, TRACKID and GENREID.

To find the genre IDs attached to a song with, say, SONGID=123, we run:

SELECT GENREID FROM GENRECONNECT WHERE TRACKID=123

To find the genre names attached to a song, we have the slightly more complex query:

SELECT GENRES.NAME FROM GENRECONNECT, GENRES WHERE GENRECONNECT.TRACKID=123 AND GENRECONNECT.GENREID=GENRES.GENREID

Screwtape sez:SQL gurus can correct me on this, SQL examples included only for the Edification and Education of Nemo, who persists in his unorthodox obsessions.

...

Personal tools
Namespaces

Variants
Actions
Navigation
meta navigation
More thorx
Tools