Connect with your Facebook account

SQL database schema?

 
    •  
      CommentAuthorfcahoon
    • CommentTimeNov 6th 2009
     permalink
    I just downloaded the most recent xml dump and am planning to put the data into an SQL database.

    Although the structure is pretty clear from the wiki description, I would like to know the max length of all the character fields.

    I'm also wondering if location is in another database table, and maybe has ids which aren't dumped into the xml?

    It would be great to post some generic SQL on the wiki to create a compatible table structure -- I'm sure I'm not the only one who would benefit.

    Thanks for any info.
    •  
      CommentAuthorBarabbas
    • CommentTimeNov 6th 2009 edited
     permalink
    Hi fcahoon,

    I am using the following structure for my sqlite database:

    sql = """
    CREATE TABLE artists (
    uid INTEGER PRIMARY KEY,
    id INTEGER,
    name TEXT,
    country TEXT,
    image TEXT,
    mbgid TEXT,
    state TEXT,
    city TEXT,
    url TEXT,
    latitude TEXT,
    longitude TEXT,
    albumcount INTEGER
    )
    """
    self.conn.execute(sql)
    self.conn.commit()

    # ALBUMS
    sql = """
    CREATE TABLE albums (
    uid INTEGER PRIMARY KEY,
    id INTEGER,
    name TEXT,
    url TEXT,
    releasedate TEXT,
    filename TEXT,
    id3genre INTEGER,
    mbgid TEXT,
    license_artwork TEXT,
    artist_id INTEGER,
    trackcount INTEGER
    )
    """
    self.conn.execute(sql)
    self.conn.commit()
    # TRACKS
    sql = """
    CREATE TABLE tracks (
    uid INTEGER PRIMARY KEY,
    id INTEGER,
    name TEXT,
    duration INTEGER,
    album_id INTEGER,
    artist_id INTEGER,
    numalbum INTEGER,
    filename TEXT,
    mbgid TEXT,
    id3genre INTEGER,
    license TEXT
    )
    """
    self.conn.execute(sql)
    self.conn.commit()

    # TAGS
    sql = """
    CREATE TABLE tags (
    uid INTEGER PRIMARY KEY,
    artist_id INTEGER,
    album_id INTEGER,
    track_id INTEGER,
    idstr TEXT,
    weight FLOAT
    )
    """
    self.conn.execute(sql)
    self.conn.commit()


    Perhaps this helps you. It is somehow redundant as it has a artist_id, a album_id and a track_id field for every tag where an track_id would to the trick - but I had the impression that this would speed up some queries.
    Even the "uid" should be replaced by the id-field given by jamendo.

    As you can see, i tend to use TEXT-fields all the time. This thread is a good opportunity to improve that, isn't it?
    •  
      CommentAuthorfcahoon
    • CommentTimeNov 6th 2009
     permalink
    Yeah, I was really hoping to get those lengths: it seems to me that most if not all of those fields should be varchars.

    If the developers don't weigh in, I guess I'll make some educated guesses based on the max lengths of the existing data ... but that could lead me into trouble down the road if I guess too short.
    •  
      CommentAuthorfcahoon
    • CommentTimeNov 8th 2009
     permalink
    Treating everything as strings for the purpose, I found the max lengths from yesterday's dump:

    album:
    filename => 151
    id => 5
    id3genre => 2
    license_artwork => 53
    mbgid => 36
    name => 133
    releasedate => 25
    url => 34

    artist:
    id => 6
    image => 96
    mbgid => 36
    name => 125
    url => 94

    location:
    city => 58
    country => 3
    latitude => 9
    longitude => 10
    state => 3

    tag:
    idstr => 52
    weight => 6

    track:
    id => 6
    id3genre => 2
    license => 53
    mbgid => 36
    name => 196
    numalbum => 3

    looks like all names, urls and filenames could easily be varchar(255) -- that's what I'm going to use for now.
    •  
      CommentAuthoribuddz
    • CommentTimeDec 12th 2009
     permalink
    How are you loading the XML dump file into the database (PHP, Java)?
    •  
      CommentAuthorfcahoon
    • CommentTimeDec 12th 2009
     permalink
    I'm using Perl, with the Rose::DB::Object ORM mapping module. I'm almost done, but it appears there's some bad (non-numeric) longitudes and/or latitudes in the last dump I downloaded, so I have to figure out what that bad data is and how I want to deal with it.

    i'm certainly happy to share my code with anyone who wants it.
 

Forum powered by Vanilla