WhyDoWeNeedADiscID

From ThorxWiki

Screwtape speaking, here

Nemo asked me to look at a thread on the FreeDB messageboard about an XML-based query format for FreeDB, and comment here. Not on the XML bit, but on the fact that Nemo posted a link to NCDI and someone else replied that it was rather a bad idea.

If that link above is dead, let me summarize:

nemo
The current DiscID implementation has some problems. Here's a better one.
pbxx
Why do we need an ID system at all?
nemo
Uh, I need sleep, Screwtape, will you look at this till I wake up?

pbxx asks a Very Good Question. Why do we need a disc ID?

Let's start with the problem we're trying to solve.

The Problem

FreeDB is obviously trying to figure out what CD a client has in their drive, so that appropriate metadata can be returned. To this end, the FreeDB server needs to be given all the identifying information it can get.

From talks with Nemo, the only easily-obtainable distinguishing features of a CD are the track offsets and the length of the disc. That's all the information the server really needs to be given; anything more is gratuitous.

So the question is, why is encoding this information in a disc ID better (or worse!) than just sending the raw data?

Pros and Cons

Advantages of a disc ID:

  • Bandwidth - It's quicker (though probably not by much) to transmit even a 64-bit ID than 99 track-offset fields and a length.
Nemo sez: However, the offsets, etc, are sent with the query anyways...
  • Database ID - Any serious server implementation is going to try to use the information in the query as a database key at some point, so we might as well make it a useful key to begin with.
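
To put a rough number on the bandwidth point, here is a minimal sketch that compares the size of the raw data for a worst-case 99-track disc against a 64-bit ID. The space-separated text encoding, the made-up track spacing, and the name raw_payload are all assumptions for illustration, not anything FreeDB or NCDI specifies.

```python
def raw_payload(offsets, length_seconds):
    """Encode the raw data as space-separated decimal text (an assumed format)."""
    return " ".join(str(value) for value in [*offsets, length_seconds])

# Hypothetical worst case: 99 tracks of 48 seconds each.
# Offsets are in frames (75 per second), starting after a 150-frame lead-in.
offsets = [150 + track * 3600 for track in range(99)]
length_seconds = (150 + 99 * 3600) // 75

payload = raw_payload(offsets, length_seconds)
raw_bytes = len(payload.encode("ascii"))
id_bytes = 64 // 8  # a 64-bit ID sent as binary

print(f"raw data: {raw_bytes} bytes, 64-bit ID: {id_bytes} bytes")
```

Even in this worst case the raw data comes to well under a kilobyte, which fits the "probably not by much" caveat above.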

Advantages of using raw data:

  • Future-compatibility - If somebody comes up with a better way of hashing, you don't have to update every client to take advantage of it.
Nemo sez: If the protocol is designed right, this doesn't matter.
The disc identification between server and client should be handled
by just looking at the toc - I agree with that. The server indexes
the CDs with the NCDI, but there isn't any special requirement
(beyond convenience) that the client do the same. It can use DiscID,
CDindex, or no ID at all for all the server should care.
  • Fewer collisions - Any hashing system has the potential to create collisions, so using the raw data will give you the fewest collisions possible.
Nemo sez: True, but if you accept that a collision is inevitable
(ie, raw data is the same, no matter what resolution), then storing
just the raw data helps none when that collision occurs.
NCDI provides a means to keep the data separate. (no reason this
couldn't be done with a raw-data system either, in truth. ;)
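
To make the collision point concrete, here is a rough sketch in the style of the classic 32-bit CDDB/FreeDB DiscID as I understand it (digit-sum checksum, playing time, track count). It is not taken from the FreeDB source, the two discs are made-up values, and the names digit_sum and classic_disc_id are mine.

```python
FRAMES_PER_SECOND = 75  # CD positions are counted in 1/75-second frames

def digit_sum(n):
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total

def classic_disc_id(offsets, leadout):
    """32-bit ID from track offsets and lead-out position, both in frames."""
    # Checksum: digit sums of each track's start time in whole seconds.
    checksum = sum(digit_sum(o // FRAMES_PER_SECOND) for o in offsets)
    # Playing time: lead-out minus first track start, in whole seconds.
    playing_time = leadout // FRAMES_PER_SECOND - offsets[0] // FRAMES_PER_SECOND
    return ((checksum % 0xFF) << 24) | (playing_time << 8) | len(offsets)

# Two hypothetical two-track discs, each as (offsets, leadout).
disc_a = ([150, 975], 3000)   # tracks start at 2 s and 13 s
disc_b = ([150, 1650], 3000)  # tracks start at 2 s and 22 s

id_a = classic_disc_id(*disc_a)
id_b = classic_disc_id(*disc_b)

print(f"{id_a:08x} {id_b:08x}")  # the two IDs are identical
print(disc_a == disc_b)          # the raw data is not
```

The second track starts nine seconds later on one disc than on the other, but 13 and 22 have the same digit sum, so the hash cannot tell the discs apart while the raw offsets can.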

Decision

Go not to the Elves for counsel, for they will say both no and yes.

Look at the respective advantages. Which do you think is more important? For FreeDB, I think that making the query string include all relevant details to the highest level of accuracy is probably worth it.

For NUDI, who knows? I don't know exactly what problem NUDI is trying to solve, so I can't make any recommendations.

Nemo sez: NUDI is, to be wholly correct, merely a small utility.
It will, ultimately, identify a CD with a DiscID, NCDI, CDindex,
raw toc, and more if they exist. NCDI is the ID designed for:
* Identifying and indexing CD data within
** NUMB
** NIPL
** CDwiki