Hey, even more on character encoding

OK, let's say I have to downgrade, if you will, a file that was pushed to me as valid xml in UTF-8 to iso-8859-1 html. What follows is the only way I've found to do this that is both easily scriptable and uses common command line tools in a *nix environment (including OSX!). If there's a better way, or if any of this frankencode can be improved on, please leave a note in the comments :)

The xml in question is coming from InDesign CS2, and I've seen to it that the xml tags used in the document are actually html, so that part is taken care of.

The next step, since some special characters in the UTF-8 source (fancy quotes and apostrophes, for example) have no iso-8859-1 equivalent, is to turn them into their ascii equivalents *before* converting the rest of the file. HTML Tidy seems to be the best tool for the job: its -b (bare) flag replaces smart quotes, em dashes and the like with plain ascii characters.

[After testing this a bit more, I've added the -wrap 0 flag and specified xml]

tidy -q -b -xml -wrap 0 -utf8 filename

Tidy will also do a bunch more cleanup, like balancing tags and adding proper html head, title and body tags (specifying -xml keeps it from adding those). I don't need the xml declaration, so sed deletes that first line on the fly:

tidy -q -b -xml -wrap 0 -utf8 filename | sed '1,1d'
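sed '1,1d' deletes the first line no matter what it contains. If tidy ever emits something other than the declaration first, a slightly more defensive variant only drops line 1 when it actually starts with <?xml. A minimal sketch; strip_xml_decl is a made-up name for illustration:

```shell
#!/bin/sh
# Hypothetical helper: remove an XML declaration, but only if it is on line 1.
# The address "1" limits the grouped command to the first line, and
# /^<?xml/ checks that the line really is a declaration before deleting it.
# (In a basic regular expression "?" is an ordinary character.)
strip_xml_decl() {
    sed '1{/^<?xml/d;}' "$@"
}

# The declaration goes away...
printf '<?xml version="1.0" encoding="utf-8"?>\n<p>hi</p>\n' | strip_xml_decl
# ...but input without one keeps its first line.
printf '<p>hi</p>\n' | strip_xml_decl
```

It drops into the pipeline in place of the plain sed call: tidy -q -b -xml -wrap 0 -utf8 filename | strip_xml_decl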

Once the file's in good shape I can convert it with iconv:

/usr/bin/iconv -c -f UTF-8 -t iso-8859-1 filename >newfilename
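Worth knowing what -c does here: without it, iconv stops with an error at the first character that has no iso-8859-1 equivalent (which is exactly why the quotes have to go first); with it, such characters are silently discarded. The sketch below shows both on a short string; latin1_safe and to_latin1 are hypothetical helper names, and od just dumps the output bytes so the Latin-1 é (0xe9) is visible.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the input (files given as arguments,
# or stdin) converts from UTF-8 to ISO-8859-1 without losing anything.
# Without -c, iconv exits nonzero at the first character it cannot
# represent in the target encoding.
latin1_safe() {
    iconv -f UTF-8 -t ISO-8859-1 "$@" >/dev/null 2>&1
}

# Hypothetical helper: the same conversion with -c, which silently
# discards unconvertible characters instead of failing.
to_latin1() {
    iconv -c -f UTF-8 -t ISO-8859-1 "$@"
}

# "café" plus curly quotes (U+201C/U+201D), written as octal UTF-8 bytes.
# The e-acute exists in ISO-8859-1; the quotes don't.
sample='caf\303\251 \342\200\234hi\342\200\235\n'

printf "$sample" | latin1_safe || echo "won't convert cleanly without -c"

# With -c the quotes vanish and the e-acute becomes the single byte 0xe9,
# so od should show the bytes 63 61 66 e9 20 68 69 0a.
printf "$sample" | to_latin1 | od -An -tx1
```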

Two for loops in a bash script and we should have converted files sitting in a 'done' folder:

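mkdir -p done

for f in *.xml; do
    tidy -q -b -xml -wrap 0 -utf8 "$f" 2>/dev/null | sed '1,1d' > "$f.html"
done

for i in *.xml.html; do
    /usr/bin/iconv -c -f UTF-8 -t iso-8859-1 "$i" > "done/$i.conv"
done

Redirecting stderr to /dev/null suppresses tidy's messages. The more pipe I first tried, a workaround found on Dave Raggett's Tidy page, doesn't help here: with 2>&1 in front of it, the messages get merged into the output file.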

All that's left to do now is rename the files to something a bit shorter.
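For the renaming, plain shell parameter expansion is enough. Assuming the converted files end up named like done/story.xml.html.conv (which is what the loops above produce), this strips the stacked suffixes and leaves done/story.html. A minimal sketch; shorten_names is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: rename DIR/NAME.xml.html.conv to DIR/NAME.html.
# DIR defaults to "done". ${f%.xml.html.conv} removes that suffix from $f.
shorten_names() {
    dir=${1:-done}
    for f in "$dir"/*.xml.html.conv; do
        [ -e "$f" ] || continue   # no matches: the glob is left as literal text
        mv -- "$f" "${f%.xml.html.conv}.html"
    done
}
```

Run it from the folder that holds done, e.g. shorten_names done. Since it only touches files ending in .xml.html.conv, running it twice is harmless.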

Character encoding revisited

Some time ago I posted about converting filename character encoding with convmv. Well, a recent bit of problem solving over mismatched file encodings led to the discovery of another conversion tool, iconv. Where convmv converts file names, iconv converts a file's contents, and it supports a huge number of encodings. The happy part of this story, for me anyway, is that it's also included by default in OSX. This means I can build a small shell script to do what I need instead of resorting to AppleScript *shudder*. Every time I avoid translating to POSIX paths, I'm happier.
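A minimal sketch of that kind of small script, assuming the goal is converting files in place. iconv only writes to stdout, so each result goes to a temporary file first and is copied back over the original only if the conversion succeeded. convert_in_place is a hypothetical name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: convert_in_place FROM TO FILE...
# Re-encodes the contents of each FILE from encoding FROM to encoding TO.
convert_in_place() {
    src_enc=$1 dst_enc=$2
    shift 2
    for f in "$@"; do
        # Temp file next to the original, e.g. notes.txt.a1B2c3
        tmpfile=$(mktemp "${f}.XXXXXX") || return 1
        if iconv -f "$src_enc" -t "$dst_enc" "$f" > "$tmpfile"; then
            # Copy back over the original so its permissions are kept.
            cat "$tmpfile" > "$f"
        else
            echo "could not convert: $f" >&2
        fi
        rm -f -- "$tmpfile"
    done
}
```

For example, convert_in_place ISO-8859-1 UTF-8 *.txt. Running iconv -l prints the encoding names your iconv accepts (it's a long list).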

In-Grid



Haven't heard this in a while. Was reminded of it last night while watching the episode of Flight of the Conchords with the '60s-inspired French song.

Desktop Clutter


Working in OSX leads to desktop clutter. Partially, I think, because many more app defaults are set to drop things there instead of in the root of your home directory. A messy home dir is the problem my Linux box has, and I'm less likely to do anything about it since it's not constantly visible. Kind of like the messy closet, or stuff stashed under the bed. So I guess it's time to try to be more disciplined about file organization, 'cause the desktop chaos kind of freaks me out.

A bit over a month


Pitchfork has the first track from the new My Brightest Diamond album up. It's really quite good. Hard to believe it isn't out until June. I've seen copies of it floating around already, but there is something to be said for waiting, for the increasing anticipation. That building, can't-wait feeling hasn't happened to me in a long while, with any media, and especially not with music. It's nice to still be able to feel that way about someone's work.

Spooky

Finally picked up Spook Country. I'm only as far as the first chapter, but it already feels good to have new Gibson to read.

Music Odds and Ends

Crystal Castles played live in a (devastating) Sid-centric episode of Skins.

Contact Contemporary Music is going to be playing Eno's Discreet Music at this year's Bang on a Can Marathon in NY.

I found Under Byen while checking out the label's site. Nice stuff.

Been listening to a couple of Verdi operas, La Forza del Destino and Otello. They are so huge in sound. I want more :)

My Brightest Diamond | Magic Rabbit

Hadn't heard this version of the song before seeing this fantastic video.

Film in the trees


Finally continued work on my new project this weekend and shot a roll of Tri-X with a Beaulieu 5008s. It's a bit more finicky (the location of the power button, for instance) than the Canon 1014 XLS I've gotten used to. Hopefully I'll be processing the footage either late this week or on the weekend. It would have been sooner, but I picked up the wrong developing solution at the photo store on Saturday.

After this reel is processed I'll be hitting the optical printer with about 2 1/2 reels to be blown up to 16mm. Then I'll finally have enough footage to start cutting on the Steenbeck.