FreeDB

From ThorxWiki

FreeDB[1] is a clone of CDDB[2], based on the GPL software and database that CDDB made available before they locked them up and became, frankly, proprietary pricks.

FreeDB is now often considered superior to the original CDDB in quality of data, and certainly in friendliness to developers. However, it still uses the original CDDB-developed method, while CDDB is moving quickly to its own 100% proprietary CDDB2 system (which offers much good stuff, but is sadly very proprietary).

Nevertheless, the existing freedb database is a very interesting resource to look at...

Here is an explanation of how the DiscID is generated.
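
In outline, the classic CDDB scheme packs three values into a 32-bit ID: a checksum built from the digit sums of each track's start time in seconds, the playing length in seconds, and the track count. A minimal Python sketch, assuming track offsets are given in frames (75 per second, with the usual 150-frame lead-in included); the function names and the sample disc are made up for illustration:

```python
FRAMES_PER_SECOND = 75  # CD audio frames per second


def digit_sum(n):
    """Sum of the decimal digits of n (e.g. 200 -> 2)."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_discid(track_offsets, leadout_offset):
    """Compute a CDDB-style disc ID.

    track_offsets  -- start of each track in frames (lead-in included)
    leadout_offset -- start of the lead-out in frames
    """
    starts = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    checksum = sum(digit_sum(s) for s in starts)
    length = leadout_offset // FRAMES_PER_SECOND - starts[0]
    return ((checksum % 0xFF) << 24) | (length << 8) | len(track_offsets)


# Hypothetical three-track disc, not taken from the freedb database.
example_id = format(cddb_discid([150, 15000, 30000], 45000), "08x")
print(example_id)
```

Because the checksum is only one byte and the rest of the ID is just length and track count, different CDs can easily end up sharing a discid.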

Given that freedb makes the entire database available, it's relatively easy to extract a list of discids, break them down into their respective fields, and run some statistical analysis over them.

Following are the results, based on the database as it existed in October 2001.
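
Breaking a discid down into its fields is just a matter of shifting and masking the 8-digit hex value. A small sketch, assuming the standard layout (one checksum byte, two bytes of length in seconds, one byte of track count); the function and field names are illustrative only:

```python
def split_discid(discid):
    """Split an 8-hex-digit discid string into its three fields."""
    value = int(discid, 16)
    return {
        "checksum": (value >> 24) & 0xFF,        # top byte
        "length_seconds": (value >> 8) & 0xFFFF,  # middle two bytes
        "tracks": value & 0xFF,                   # bottom byte
    }


# Hypothetical discid, used here only as sample input.
fields = split_discid("08025603")
print(fields)
```

Presumably the track count and length fields are what feed stats like the ones on this page; the checksum is of no statistical interest.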

The table shows:

  • Pop - the number of CDs counted, given separately for the track-count stats and the CD-length stats
  • Mean - the average
  • Med - the median
  • Mode - the most common value (which track count / CD length holds the record for being "most"?)
  • rec - the record itself: the number of CDs that share the mode value
  • St.Dev - the standard deviation about the mean
     \Stats  ------------ Tracks on CD -----------    ----------- Length of CD (seconds) -----------
 Genre\       Pop.   Mean  Med  Mode (rec)  St.Dev      Pop.     Mean     Med  Mode (rec)     St.Dev
 ---------------------------------------------------------------------------------------------------
 Blues       18254  13.42   13  12 ( 2308)  5.2042     18254  3129.61    3122  2905 ( 18)   921.8151
 Classical   39420  12.71   12   8 ( 3003)  7.0740     39419  3693.20    3778  4212 ( 37)   680.7465
 Country     11457  13.74   12  10 ( 2869)  5.3808     11455  2674.79    2520  2402 ( 16)   903.2864
 Data         2308  10.76   11   1 (  400)  8.8888      2307  3206.00  3430.5  4431 (  6)  1264.5403
 Folk        21303  13.80   13  12 ( 3137)  4.9533     21302  3036.90    2973  2649 ( 23)   853.2145
 Jazz        30727  11.53   11  10 ( 3959)  4.8772     30727  3269.34    3270  2683 ( 27)   775.9943
 Misc       119584  13.15   13  12 (12438)  6.8964    119578  3177.45    3280  4440 (150)  1074.1188
 Newage      14453  10.83   11  10 ( 1639)  5.3237     14453  3170.68    3217  4437 ( 19)  1023.5188
 Reggae       4906  13.16   13  10 (  625)  4.9640      4906  3053.04    3076  4440 ( 11)   942.4191
 Rock       151572  12.08   12  12 (16755)  5.8031    151566  2934.76    2936  4440 (115)  1096.7841
 Soundtrack  12373  15.22   14  12 ( 1010)  8.1135     12373  3089.63    3097  2722 ( 15)   996.4178
 ---------------------------------------------------------------------------------------------------
 Total      426357  12.64   12  12 (45417)  6.2549    426340  3118.85    3167  4440 (359)  1032.6259
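
For anyone wanting to reproduce columns like these, each one can be computed from a plain list of per-CD values (track counts, or lengths in seconds). This is a rough sketch rather than the script actually used for the table; in particular, using the population standard deviation (as opposed to the sample one) is an assumption, and the input list is made up:

```python
from collections import Counter
import statistics


def summarize(values):
    """Return the table's columns for one list of per-CD values."""
    mode, rec = Counter(values).most_common(1)[0]  # most common value and its count
    return {
        "pop": len(values),
        "mean": statistics.mean(values),
        "med": statistics.median(values),
        "mode": mode,
        "rec": rec,
        "stdev": statistics.pstdev(values),  # assumption: population st.dev
    }


# Made-up track counts for four CDs, not real freedb data.
summary = summarize([10, 12, 12, 14])
print(summary)
```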

Things to be aware of:

  • The "length of CD" stats ignored any CDs longer than 5500 seconds, which is well beyond the maximum length the CD standard allows anyway. The "tracks on CD" stats did not ignore those CDs. This explains the discrepancy between the population counts (17 CDs out of 426357 total, less than 0.004%).
  • Many CDs exist in more than one genre in the freedb database. In fact, non-unique discids make up 21.9% of the entire database! (Note that the above stats are also available separately for the subset of duplicate IDs and the subset of unique IDs.)
  • The stats do not (and cannot) weight for the popularity of different CDs. In an extreme case, with CDone having 3 tracks and selling a billion copies, and CDtwo having 1 track and selling one copy, this method would happily say that the average track count in this two-CD universe is "2 tracks"!
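
The popularity caveat is easy to see in numbers. A toy sketch using the two-CD universe from the last point; the sales figures are the hypothetical ones above, and the function name is made up:

```python
def mean_tracks(discs, weighted=False):
    """Average track count over (track_count, copies_sold) pairs.

    weighted=False counts each CD once, as the stats on this page do;
    weighted=True counts each CD once per copy sold.
    """
    if weighted:
        total_copies = sum(copies for _, copies in discs)
        return sum(tracks * copies for tracks, copies in discs) / total_copies
    return sum(tracks for tracks, _ in discs) / len(discs)


# CDone: 3 tracks, a billion copies. CDtwo: 1 track, one copy.
universe = [(3, 1_000_000_000), (1, 1)]
print(mean_tracks(universe))                 # per-CD average
print(mean_tracks(universe, weighted=True))  # per-copy average
```

The per-CD average comes out at 2 tracks, while the per-copy average is a hair under 3, which is what nearly every listener in that universe actually owns.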