Revision as of 01:15, 25 October 2007

Screwtape speaking, here

Nemo asked me to look at a thread on the FreeDB messageboard (http://www.freedb.org/cgi-bin/ib3/ikonboard.cgi?s=3cd2b9c64c17ffff;act=ST;f=3;t=89) about an XML-based query format for FreeDB, and to comment here. Not on the XML bit, but because Nemo posted a link to NCDI and someone else replied that it was rather a bad idea.

If that link above is dead, let me summarize:

nemo: The current DiscID implementation has some problems. Here's a better one.
pbxx: Why do we need an ID system at all?
nemo: Uh, I need sleep, Screwtape, will you look at this till I wake up?

pbxx asks a Very Good Question. Why do we need a disc ID?

Let's start with the problem we're trying to solve.

The Problem

FreeDB is obviously trying to figure out what CD a client has in their drive, so that appropriate metadata can be returned. To this end, the FreeDB server needs to be given all the identifying information it can get.

From talks with Nemo, the only easily-obtainable distinguishing features of a CD are the track offsets and the length of the disc. That's all the information the server really needs to be given; anything more is gratuitous.

So the question is, why is encoding this information in a disc ID better (or worse!) than just sending the raw data?

Pros and Cons

Advantages of a disc ID:

Bandwidth - It's quicker (though probably not by much) to transmit even a 64-bit ID than up to 99 track-offset fields and a length.

Nemo sez: However, the offsets, etc, are sent with the query anyways...

Database ID - Any serious server implementation is going to try to use the information in the query as a database key at some point, so we might as well make it a useful key to begin with.
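
For a rough sense of scale on the bandwidth point, and of how a server could still get a compact database key out of raw data, here is a minimal Python sketch. The text encoding (space-separated decimal frame offsets followed by the disc length in frames) and the SHA-1-based key are assumptions made purely for illustration; neither is the FreeDB wire format nor the NCDI algorithm.

```python
import hashlib

FRAMES_PER_SECOND = 75   # CD audio positions are counted in 1/75 s frames
MAX_TRACKS = 99          # the upper bound mentioned above

def encode_raw_toc(offsets, length):
    """Hypothetical wire format: decimal frame offsets, then the disc length."""
    return " ".join(str(n) for n in list(offsets) + [length])

def key_from_toc(offsets, length):
    """Hypothetical server-side key: first 64 bits of SHA-1 over the raw TOC."""
    digest = hashlib.sha1(encode_raw_toc(offsets, length).encode("ascii"))
    return digest.hexdigest()[:16]

# Worst case for size: 99 evenly spaced tracks on an 80-minute disc.
disc_frames = 80 * 60 * FRAMES_PER_SECOND
worst_offsets = [150 + i * (disc_frames // MAX_TRACKS) for i in range(MAX_TRACKS)]

raw = encode_raw_toc(worst_offsets, disc_frames)
key = key_from_toc(worst_offsets, disc_frames)

print(len(raw), "bytes of raw TOC vs", len(key), "bytes of hex ID")
```

Even in this worst case the raw data stays well under a kilobyte, and the server can compute whatever key it likes from it without the client knowing how.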

Advantages of using raw data:

Future-compatibility - If somebody comes up with a better way of hashing, you don't have to update every client to take advantage of it.

Nemo sez: If the protocol is designed right, this doesn't matter.
The disc identification between server and client should be handled
by just looking at the toc - I agree with that. The server indexes
the CDs with the NCDI, but there isn't any special requirement
(beyond convenience) that the client do the same. It can use DiscID,
CDindex, or no ID at all for all the server should care.

Fewer collisions - Any hashing system has the potential to create collisions, so using the raw data will give you the fewest collisions possible.

Nemo sez: True, but if you accept that a collision is inevitable
(i.e., raw data is the same, no matter what resolution), then storing
just the raw data helps none when that collision occurs.
NCDI provides a means to keep the data separate. (no reason this
couldn't be done with a raw-data system either, in truth. ;)
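
To make the collision trade-off concrete, here is a sketch comparing a lossy 32-bit ID with the raw TOC. The checksum follows the classic CDDB-style disc ID as it is commonly described (digit sums of the track start times in whole seconds, total playing time, track count); this page does not spell that algorithm out, so treat the details as an assumption. The two TOCs are invented for the example.

```python
FRAMES_PER_SECOND = 75

def digit_sum(n):
    return sum(int(d) for d in str(n))

def classic_disc_id(offsets, leadout):
    """32-bit ID in the style of the classic CDDB scheme (assumed, see above)."""
    starts = [o // FRAMES_PER_SECOND for o in offsets]   # whole seconds only
    checksum = sum(digit_sum(s) for s in starts) % 255
    playing_time = leadout // FRAMES_PER_SECOND - starts[0]
    return (checksum << 24) | (playing_time << 8) | len(offsets)

# Two invented discs: same track count and length, different start of track 2.
disc_a = ([150, 15150, 30150], 45150)
disc_b = ([150, 15825, 30150], 45150)

id_a = classic_disc_id(*disc_a)
id_b = classic_disc_id(*disc_b)

print(f"{id_a:08x} {id_b:08x}")         # the hashed IDs
print(id_a == id_b, disc_a == disc_b)   # True False: IDs collide, raw data does not
```

Comparing the full offset lists tells these two discs apart. Only when two discs share an identical TOC does the raw data collide as well, which is the case Nemo describes above.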

Decision

Go not to the Elves for counsel, for they will say both no and yes.

Look at the respective advantages. Which do you think is more important? For FreeDB, I think that making the query string include all relevant details to the highest level of accuracy is probably worth it.

For NUDI, who knows? I don't know exactly what problem NUDI is trying to solve, so I can't make any recommendations.

Nemo sez: NUDI is, to be wholly correct, merely a small utility.
It will, ultimately, identify a CD with a DiscID, NCDI, CDindex,
raw toc, and more if they exist. NCDI is the ID designed for:
* Identifying and indexing CD data within
** NUMB
** NIPL
** CDwiki


WhyDoWeNeedADiscID
