Music Purchase Decision Support System

Submitted by Alaa on Thu, 06/01/2005 - 4:01pm.

yes, you read it right: my little GNU/Linux box runs the world's most advanced, state-of-the-art music purchase decision support system. we're talking all the buzzwords here: XML-enabled web services, distributed processing and peer2peer capability, all for the sake of helping me decide how to spend $20 online.

now to tell you what all this is about: as explained here, I decided to pay for some of the free music I love.

the decision involves finding out which Magnatune artists I found through Irate and gave high ratings to. how I did that is what this post is about: cool CLI hacks to get info out of your Irate XML files (they should help you with any XML file, I suppose).

irate stores rating info in ~/irate/trackdatabase.xml. I needed to process this file and answer the question: which tracks rated 10, 7 or 5 are by Magnatune artists?

to answer this I needed to understand the structure of the file. turns out irate does not insert end-of-line characters or indent the file in any way, so it was pretty hard to read; I needed a tool to indent XML files.

searching freshmeat for xml indent led me to the aptly named XML Indent tool, which unfortunately had no Mandrake package (note to self: get DarknessWolf to package this one), so after a quick make I could use it to figure out the structure of the ~/irate/trackdatabase.xml file

$ xmlindent ~/irate/trackdatabase.xml | head
<?xml version="1.0"?>
<TrackDatabase serial="663">
    <User port="2278" name="alaa" password="OUCH" host="server.irateradio.org"/><AutoDownload setting="37" count="5012"/>
    <Player path="madplay"/>
    <PlayList length="49" UnratedRatio="97"/>
    <Track url="http://artists.iuma.com/dl/CARV/audio/CARV_-_Slipbackinto.mp3" state="erased" played="1" weight="" file="" artist="CARV" rating="0.0" title="Slipbackinto" last="30-Aug-03 6:01:56 PM"/>

all right, so there is a Track element and it has url, artist and rating attributes; that's all we need.

from this point on I could use grep, but I decided to find out what I can do with XML CLI tools, so first I asked which XML CLI tools are available on Mandrake

$ urpmf --summary xml | grep -i 'command\|cli'
xmlclitools:Command-line xml tools
xmlstarlet:Command Line XML Toolkit

ok, only 2 packages; let's install them and find out what commands they add

# urpmi xmlclitools xmlstarlet
$ urpmq -l xmlstarlet | grep bin/
/usr/bin/xml
$ urpmq -l xmlclitools | grep bin/
/usr/bin/xmlfmt
/usr/bin/xmlgrep
/usr/bin/xmlmod
/usr/bin/xmlngrep

turns out both packages come with no man pages. after checking --help and the README files under /usr/share/doc/ I concluded that xmlstarlet is complex and requires knowledge of XPath and XSLT, so let's leave that one for later and focus on the straightforward xmlclitools.

turns out the thing was very simple: xmlgrep helps me select XML elements based on the value of their attributes, and xmlfmt displays only the info I need. here is a step-by-step walkthrough

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track
<Track url="http://artists.iuma.com/dl/CARV/audio/CARV_-_Slipbackinto.mp3" state="erased" played="1" weight="" file="" artist="CARV" rating="0.0" title="Slipbackinto" last="30-Aug-03 6:01:56 PM"/>
...

ok, this selects Track elements only: -f tells xmlgrep which file to read, and the TrackDatabase.Track argument is the path to the elements we want to select (TrackDatabase being the parent of all Track elements). we could use -g to make searches global and not bother about parents.

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*'
<Track played="14" rating="7.0" url="http://magnatune.com/all/03-Monsters-Beth%20Quist.mp3" serial="406" file="/home/alaa/irate/download/03-Monsters-Beth Quist.mp3" last="20040827183419" artist="Beth Quist" title="Monsters"/>
...

ok, now the search is narrowed down to Tracks whose url attribute matches the regular expression .*magnatune.*. xmlclitools has this quirk where all periods have to be escaped, so that regular expression has to be written as '\.*magnatune\.*'.

attribute~=regexp matches a regular expression, attribute=value matches an exact value and attribute!=value selects items that don't match the value.

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0'
<Track played="14" rating="7.0" url="http://magnatune.com/all/03-Monsters-Beth%20Quist.mp3" serial="406" file="/home/alaa/irate/download/03-Monsters-Beth Quist.mp3" last="20040827183419" artist="Beth Quist" title="Monsters"/>
<Track played="10" rating="5.0" url="http://magnatune.com/all/04-Freedom-Jeff%20Wahl.mp3" file="/home/alaa/irate/download/04-Freedom-Jeff Wahl.mp3" serial="349" last="20040629183951" artist="Jeff Wahl" title="Freedom"/>
...

this narrows the search down further to only tracks with ratings 10, 7 or 5; note how I had to escape the period again.

you can search for any combination of attributes; I'm not sure if we can do OR searches instead of the default AND search, though.
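a side note on the rating pattern: written as '10|7|5\.0', the alternation has three branches (10, 7 and 5\.0) and nothing anchors them to the start or end of the value. the sketch below uses grep -E on bare rating values to show what an anchored pattern changes. it assumes POSIX extended regex semantics, which may not be exactly what xmlgrep implements, and the sample values are made up for illustration.

```shell
# hypothetical rating values, one per line (not taken from a real trackdatabase.xml)
ratings='10.0
7.0
5.0
2.0
0.7'

# unanchored: matches any value that contains "10", "7" or "5.0" anywhere
loose=$(printf '%s\n' "$ratings" | grep -E '10|7|5\.0')

# anchored and grouped: matches only the exact values 10.0, 7.0 and 5.0
strict=$(printf '%s\n' "$ratings" | grep -E '^(10|7|5)\.0$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

with ratings that only ever take a handful of fixed values the loose pattern gives the same answer, which is presumably why it works fine here.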

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist
...
Seismic Anamoly
Seismic Anamoly
Human Response
Barbara Leoni
Seismic Anamoly
Thursday Group
Jade Leary
Curandero
Falik
...

finally we pipe it all to xmlfmt and ask it to show only the artist attribute of Track elements.

notice that some artists are repeated because I rated multiple tracks by them; it's easy to remedy that

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist | sort | uniq
Artemis
Barbara Leoni
Beth Quist
C. Layne
Curandero
Drop Trio
Ed Martin
Falik
Four Stones.Net
Human Response
Jacob Heringman and Catherine
Jade Leary
Jay Kishor
Jeff Wahl
Kyiv Chamber Choir
Norine Braun
Paul Berget
Rapoon
Seismic Anamoly
Solace
SoulPrint
Stellamara
The Napoleon Blown Aparts
Thursday Group
Tim Rayborn
Tom Paul
touchingGrace
Version
Very Large Array

good ol' GnuTextUtils to the rescue.
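if you also want to know how many tracks you rated per artist, uniq -c prints a count in front of each line. a small sketch, using a few artist names from the output above as stand-in input instead of the real xmlgrep | xmlfmt pipeline:

```shell
# stand-in for the output of: xmlgrep ... | xmlfmt Track:artist
artists='Seismic Anamoly
Seismic Anamoly
Human Response
Barbara Leoni
Seismic Anamoly
Thursday Group'

# sort so duplicates end up adjacent, count them, then list the most-rated artists first
counts=$(printf '%s\n' "$artists" | sort | uniq -c | sort -rn)
printf '%s\n' "$counts"

# sort -u is a shorter spelling of "sort | uniq" when no counts are needed
unique=$(printf '%s\n' "$artists" | sort -u)
```

uniq only compares neighbouring lines, which is why the sort has to come first in both cases.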

finally maybe I want to see the rating in front of each artist

$ xmlgrep -f ~/irate/trackdatabase.xml TrackDatabase.Track:url~='\.*magnatune\.*':rating~='10|7|5\.0' | xmlfmt Track:artist Track:rating | sort
Artemis 10.0
Barbara Leoni 7.0
Beth Quist 10.0
Beth Quist 7.0
C. Layne 5.0
Curandero 7.0
Curandero 7.0
Drop Trio 10.0
Ed Martin 5.0
Falik 7.0
...

that's it folks. you could go on to calculate the average rating of an artist if you want; I went on and wrote a couple of scripts to help me leech the files from http://magnatune.com, but that's another story.
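the average-rating idea could be sketched in awk roughly like this. it assumes the "artist rating" line format that xmlfmt Track:artist Track:rating printed earlier, with the rating always as the last whitespace-separated field (artist names can contain spaces). the artist_stats function name is made up, and the sample lines are copied from the output above.

```shell
# artist_stats (hypothetical helper): read "artist rating" lines on stdin and
# print, per artist, the number of rated tracks plus min, max and average rating
artist_stats() {
    awk '
    NF < 2 { next }                               # skip blank or malformed lines
    {
        rating = $NF + 0                          # last field is the rating
        artist = $0
        sub(/[ \t]+[^ \t]+[ \t]*$/, "", artist)   # drop the rating, keep the full name
        if (!(artist in n) || rating < min[artist]) min[artist] = rating
        if (!(artist in n) || rating > max[artist]) max[artist] = rating
        n[artist]++
        sum[artist] += rating
    }
    END {
        for (a in n)
            printf "%s: tracks=%d min=%.1f max=%.1f avg=%.2f\n", a, n[a], min[a], max[a], sum[a] / n[a]
    }' | sort
}

# sample input copied from the xmlfmt output shown earlier
stats=$(printf '%s\n' 'Artemis 10.0' 'Barbara Leoni 7.0' 'Beth Quist 10.0' \
    'Beth Quist 7.0' 'C. Layne 5.0' 'Curandero 7.0' 'Curandero 7.0' | artist_stats)
printf '%s\n' "$stats"
```

in real use you would pipe the xmlgrep | xmlfmt Track:artist Track:rating output straight into artist_stats instead of the printf sample.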

so irate, xmlgrep, xmlfmt, sort, uniq, sed, awk, grep, wget and all the usual suspects are my bleeding edge music purchase decision support system.

add to your todo list


  • get DarknessWolf to package this one

but hey, i thought bash scripting and xml files don't mix. very cool.

whirlpool's picture

pipe the last command to uniq -f 1

$ xmlgrep -gf trackdatabase.xml 0 Track:url~='\.*magna\.*':rating~='10|7|5\.0' | xmlfmt Track:rating Track:artist | sort | uniq -f 1
10.0 Reza Manzoori
5.0 Beth Quist
7.0 Kourosh Zolani
7.0 MRDC
7.0 Psychetropic
whirlpool's picture

not a good idea

since in Irate you rate by track and not by artist, this is not a good idea; you want to look at all the ratings you gave to this artist.

however, something to tell you the maximum, minimum and average rating would be nice.

cheers,
Alaa


http://www.manalaa.net "i`m feeling for the 2nd time like alice in wonderland reading el wafd"

