When looking at the tags of the MP3 versions of the albums from Jamendo, the French accented characters (this is just one example; other country-specific characters are broken, too) look like garbage.
This is NOT only happening with ONE album or so, but with ALL albums I downloaded.
Now, I don't know which software Jamendo.com uses to write the tags (it is done by some Jamendo software, am I right?). But I suspect that Unicode issues are the reason for that.
Looking at the tags in a hex editor, it seems the v2 tags are written as UTF-16 Unicode. Writing the tags as Unicode is fine, as it's the right step towards internationalization, but I suspect that accented characters, etc. somehow get lost in the conversion process.
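For anyone who would rather not use a hex editor, the same check can be sketched in a few lines of Python. This is only a rough illustration, not a full ID3 parser: it assumes an ID3v2.3 tag with no extended header and no unsynchronisation, and the function names (`syncsafe_to_int`, `text_frames`) are made up for this example. In ID3v2.3 the first byte of a text frame is the encoding marker: 0 for ISO-8859-1 (Latin-1) and 1 for Unicode with a byte order mark.

```python
import struct

# ID3v2.3 text-encoding byte -> Python codec name
ENCODINGS = {0: "latin-1", 1: "utf-16"}

def syncsafe_to_int(b: bytes) -> int:
    """Decode the 4-byte 'syncsafe' tag size (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def text_frames(tag: bytes):
    """Yield (frame_id, codec, text) for each text frame in an ID3v2.3 tag.

    Assumes no extended header and no unsynchronisation (a simplification).
    """
    if tag[:3] != b"ID3" or tag[3] != 3:
        raise ValueError("not an ID3v2.3 tag")
    end = 10 + syncsafe_to_int(tag[6:10])
    pos = 10
    while pos + 10 <= end:
        frame_id = tag[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":  # reached the padding
            break
        # In v2.3 the frame size is a plain big-endian integer
        (size,) = struct.unpack(">I", tag[pos + 4:pos + 8])
        body = tag[pos + 10:pos + 10 + size]
        pos += 10 + size
        # Text frames start with "T"; TXXX has an extra description field
        if frame_id.startswith(b"T") and frame_id != b"TXXX" and body:
            codec = ENCODINGS.get(body[0])
            if codec:
                text = body[1:].decode(codec).rstrip("\x00")
                yield frame_id.decode("ascii"), codec, text
```

Reading the first few kilobytes of an MP3 in binary mode and passing them to `text_frames` should show which encoding each frame declares and what the decoded text looks like. If a frame says `utf-16` but the text shows two odd characters where one accented letter should be, the damage happened before the tag was written.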
Maybe some Jamendo official could say something about that (what software is used to tag the files, and why accents get lost).
I also had this problem with any of the MP3 versions I've DL'd here. I think what I did to correct it was to specify UTF-8 in my BT client (Azureus) and it seemed to work although I can't be certain if that was indeed the problem/fix over the long term, as soon afterwards I switched to getting the ogg versions only, since the ogg/flac format tags don't display this problem. I was also having a problem fixing it, so ended up re-DLing or transcoding to flac if I was having a problem getting the ogg versions on some of the older releases.
After looking into the subject of ID3 tags, it looks like the format has developed haphazardly over the years, with each version changing what it does or which features it adds, so I avoid MP3s whenever I can now.
Well, you are mainly talking about the filenames, and you are absolutely correct: using a Unicode-compatible client should solve that problem.
But what I was focusing on was the tags, and as far as they are concerned, there is no "excuse" for declaring the tags as being Unicode (UTF-16) ID3v2.3 tags (which is in fact perfect for accented characters and all other international characters), but then writing wrongly encoded special characters into them. Note that buggy software (read: buggy ID3v2 implementations) is one of the main reasons for the mp3 tag chaos.
If software uses Unicode, it has to "do it right", and I have the strong suspicion that the software Jamendo.com uses for tagging the albums has issues with Unicode. Which would be a shame.
PS: Don't get me wrong, Jamendo guys. I don't want to sound offensive or disrespectful at all, but I think this is a real issue that should be taken care of.
Ya. I should have specified about the filenames and tags issue within the BT client. I went back to an old MP3 DL I had previous problems with and indeed the filename was corrected but the tag info was fubar'd. And all the accented characters were still mangled.
I never did manage to look into it far enough to actually find a way to fix the problem. However, if I might make a suggestion to the Jamendo folks: they could first enter the music info into MusicBrainz and then use the resulting tags their system generates. That would standardize the tags and also get the artists' info out to anybody who uses their database (like Last.fm). MusicBrainz is a freely open system in which anyone can edit or add information.
sylvinus wrote: I'll fix that next month, I'll post some test MP3s here before re-encoding our whole database ;-)
Any updates on that? The problem still persists, I am currently working around it by retagging the files here, but I guess that most people don't know how to do it. It should not be too hard for jamendo to fix the problem. The ID3V1 tags are OK (or at least as good as ID3V1 tags can be). The problem is only in the ID3V2 tags, which are usually preferred if both exist. They are encoded with UTF16, which is also OK (spacemarine already mentioned it).
This is the start of the song "Si mon rêve", but the accented "e" is somehow mangled into 2 UTF-16 characters, which is the problem. Could somebody from jamendo comment on how the tags are converted and written? I could then do some more debugging, or maybe provide a script that fixes already downloaded files.
Same encoding issue here (of course :-) As long as there's no fix, and also for the already downloaded albums: does anybody know a Windows or platform-independent application which can convert the tags in a batch job? Of course I could edit the tags one by one, but this could take days :-)
I don't think the problem is with that file per se but may be with the way you get the information for tags.
Here is the diagnosis:
The tag strings are first encoded as UTF-8, which should be fine for MP3s from the whole world. But at some point the raw string data gets treated as if it were Latin-1 and is then encoded as UTF-16. That's why accented characters turn into double characters: they were two-byte sequences in UTF-8 and get treated as two characters when read as Latin-1. If you want, I could investigate further where things go wrong; send me an email at femistofel@gmail.com.
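The mix-up described in this diagnosis is easy to reproduce with nothing but Python's built-in codecs. The snippet below is a minimal sketch; `simulate_mangling` is a made-up name, and it only mimics the suspected mistake rather than whatever the real tagging pipeline does.

```python
def simulate_mangling(text: str) -> str:
    """Encode as UTF-8, then misread those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")

title = "Si mon r\u00eave"           # "Si mon rêve"
mangled = simulate_mangling(title)

print(mangled)                        # the single "ê" is now "Ã" followed by "ª"
print(len(title), len(mangled))       # the mangled title is one character longer

# What would then be written into a UTF-16 text frame (little-endian, BOM omitted)
print(mangled.encode("utf-16-le").hex(" "))
```

The "ê" is the two bytes C3 AA in UTF-8. Read as Latin-1, each byte becomes its own character, and each of those is then stored as a separate UTF-16 code unit, which matches the two mangled characters seen in the tag.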
To illustrate this, I've made a tiny Python script that fixes the problem. I used the original pytagger library to read and write tags, but it should work with your version as well. I didn't bother fixing ID3v1 tags because I don't see them in players.
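The script itself is not reproduced in this thread, so what follows is not artm's code, just a minimal sketch of the string repair such a fix would need, under the assumption that the diagnosis (UTF-8 bytes misread as Latin-1) is right. The function name `repair` is made up, and reading and writing the frames with a tagging library is deliberately left out.

```python
def repair(text: str) -> str:
    """Undo 'UTF-8 bytes misread as Latin-1' if the text looks affected.

    Strings that cannot be the result of that mistake are returned
    unchanged, so running this on an already correct tag is harmless.
    """
    try:
        # Turn the characters back into the original bytes, then decode properly
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes
        # are not valid UTF-8: in both cases it was not mangled this way
        return text
```

Applied to the mangled title this gives back "Si mon rêve", while a correctly encoded "Si mon rêve" passes through untouched because its Latin-1 bytes are not valid UTF-8. If the wrong decoding step was really Windows-1252 rather than strict Latin-1, a few characters would need different handling.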
Thanks for your post, artm. I've worked a bit on the issue. There seems to be an updated version of pytagger out (0.5). We are currently using 0.3 :-/ Does 0.5 fix the bug?
I'll post some examples of encoded files on the blog before updating the blog so that everyone can comment on them.
I had never used pytagger before; I only checked it out because you use it, so I can't say whether the older version was causing problems. I don't see any text encoding changes in the ChangeLog between 0.3 and 0.5, though.
But I suspect the problem could creep in before the actual tagging occurs. How do you acquire the text for the tags? Does it come from some database? Maybe the text is stored as UTF-8 in the database but read as if it were Latin-1?