WhyDoWeNeedADiscID
Screwtape speaking, here
Nemo asked me to look at a thread on the FreeDB message board about an XML-based query format for FreeDB, and to comment here. Not on the XML bit, but because Nemo posted a link to NCDI and someone else replied that it was rather a bad idea.
If that link above is dead, let me summarize:
- nemo: The current DiscID implementation has some problems. Here's a better one.
- pbxx: Why do we need an ID system at all?
- nemo: Uh, I need sleep. Screwtape, will you look at this till I wake up?
pbxx asks a Very Good Question. Why do we need a disc ID?
Let's start with the problem we're trying to solve.
The Problem
FreeDB is obviously trying to figure out what CD a client has in their drive, so that appropriate metadata can be returned. To this end, the FreeDB server needs to be given all the identifying information it can get.
From talks with Nemo, the only easily obtainable distinguishing features of a CD are the track offsets and the length of the disc. That's all the information the server really needs to be given; anything more is gratuitous.
So the question is, why is encoding this information in a disc ID better (or worse!) than just sending the raw data?
Pros and Cons
Advantages of a disc ID:
- Bandwidth - It's quicker (though probably not by much) to transmit even a 64-bit ID than up to 99 track-offset fields and a length.
Nemo sez: However, the offsets, etc, are sent with the query anyways...
- Database ID - Any serious server implementation is going to try to use the information in the query as a database key at some point, so we might as well make it a useful key to begin with.
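To put rough numbers on the bandwidth point, here is a minimal sketch of the classic 32-bit FreeDB disc ID and the query line that carries it, as I understand the scheme. This is not NCDI. The function names and the three-track TOC are made up for illustration, and the offsets are assumed to be in frames (75 per second) with the lead-in included.

```python
from typing import Sequence


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))


def freedb_disc_id(offsets: Sequence[int], leadout: int) -> str:
    """Classic 32-bit FreeDB/CDDB disc ID, as 8 hex digits.

    offsets: start of each track in frames (75 per second), lead-in included.
    leadout: start of the lead-out in frames.
    """
    # Top byte: digit sums of each track's start time in seconds, mod 255.
    checksum = sum(digit_sum(off // 75) for off in offsets)
    # Middle two bytes: playing time in seconds, first track to lead-out.
    length = leadout // 75 - offsets[0] // 75
    # Low byte: number of tracks.
    value = ((checksum % 255) << 24) | (length << 8) | len(offsets)
    return f"{value:08x}"


def freedb_query(offsets: Sequence[int], leadout: int) -> str:
    """Build a 'cddb query' line. Note that it carries the raw offsets too."""
    fields = [freedb_disc_id(offsets, leadout), str(len(offsets))]
    fields += [str(off) for off in offsets]
    fields.append(str(leadout // 75))  # total length in seconds
    return "cddb query " + " ".join(fields)


toc = [150, 15000, 30000]  # hypothetical three-track disc
print(freedb_disc_id(toc, 45000))  # 08025603
print(freedb_query(toc, 45000))    # cddb query 08025603 3 150 15000 30000 600
```

Even with 99 tracks the offsets only add a few hundred bytes to the query, which is why the bandwidth saving is small either way.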
Advantages of using raw data:
- Future-compatibility - If somebody comes up with a better way of hashing, you don't have to update every client to take advantage of it.
Nemo sez: If the protocol is designed right, this doesn't matter. The disc identification between server and client should be handled by just looking at the TOC - I agree with that. The server indexes the CDs with the NCDI, but there isn't any special requirement (beyond convenience) that the client do the same. It can use DiscID, CDindex, or no ID at all for all the server should care.
- Fewer collisions - Any hashing system has the potential to create collisions, so using the raw data will give you the fewest collisions possible.
Nemo sez: True, but if you accept that a collision is inevitable (i.e., raw data is the same, no matter what resolution), then storing just the raw data helps none when that collision occurs. NCDI provides a means to keep the data separate. (no reason this couldn't be done with a raw-data system either, in truth. ;)
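The collision point can be sketched with two made-up discs that differ only in where the second track starts. Indexed by the classic 32-bit FreeDB ID (as I understand that scheme) they land on the same key; indexed by the raw TOC they stay apart. The names `classic_id`, `by_id` and `by_toc` are hypothetical, and nothing here is NCDI.

```python
from typing import Dict, List, Tuple

# (track offsets in frames, lead-out in frames); 75 frames per second
Toc = Tuple[Tuple[int, ...], int]


def classic_id(toc: Toc) -> str:
    """Classic 32-bit FreeDB/CDDB disc ID for a TOC, as 8 hex digits."""
    offsets, leadout = toc
    # Digit sums of each track's start time in whole seconds.
    checksum = sum(sum(int(d) for d in str(off // 75)) for off in offsets)
    length = leadout // 75 - offsets[0] // 75
    value = ((checksum % 255) << 24) | (length << 8) | len(offsets)
    return f"{value:08x}"


disc_a: Toc = ((150, 15000, 30000), 45000)  # second track starts at 200 s
disc_b: Toc = ((150, 8250, 30000), 45000)   # second track starts at 110 s

by_id: Dict[str, List[str]] = {}
by_toc: Dict[Toc, List[str]] = {}
for toc, title in [(disc_a, "Album A"), (disc_b, "Album B")]:
    by_id.setdefault(classic_id(toc), []).append(title)
    by_toc.setdefault(toc, []).append(title)

print(by_id)        # one key holding both albums
print(len(by_toc))  # 2
```

Both starts have a digit sum of 2 (2+0+0 and 1+1+0), so the hashed key cannot tell the discs apart. Two discs with truly identical TOCs would still share a key in `by_toc`, which is Nemo's point: at that stage something other than the raw data has to keep the entries separate.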
Decision
Go not to the Elves for counsel, for they will say both no and yes.
Look at the respective advantages. Which do you think is more important? For FreeDB, I think that making the query string include all relevant details to the highest level of accuracy is probably worth it.
For NUDI, who knows? I don't know exactly what problem NUDI is trying to solve, so I can't make any recommendations.
Nemo sez: NUDI is, to be wholly correct, merely a small utility. It will, ultimately, identify a CD with a DiscID, NCDI, CDindex, raw TOC, and more if they exist. NCDI is the ID designed for:
- Identifying and indexing CD data within:
  - NUMB
  - NIPL
  - CDwiki