A couple of weeks ago I discovered code swarm while surfing the web. For those of you who haven't heard of it yet: it uses Processing, a programming language for creating images, animations, and interactions, to generate a set of images representing the activity of a software project, based on the commit history stored in the project's VCS. The resulting images can then be used to create a video of the project's development history.
Here's a short, and maybe better, description from the code swarm website:
This visualization, called code_swarm, shows the history of commits in a software project. A commit happens when a developer makes changes to the code or documents and transfers them into the central project repository. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.
Since I missed getting a cake for DokuWiki's 4th birthday, I thought I'd create a code swarm video of the commit history of this fabulous project. Since code swarm doesn't ship a tool to convert darcs repository logs yet, I'll describe what I did to prepare the required XML data.
First you have to get the revision history of your project. To be sure you have the complete logs1), I suggest doing a fresh checkout of the main repository.
% darcs get http://remote-domain.org/repository
Luckily darcs already provides an option to export the commit log as XML. The --summary switch is needed to get the list of changed files per commit as well!
% cd repository/
% darcs changes --xml-output --summary
Here's what a sample commit looks like.
<patch author='Michael Klier ' date='20080505142356' local_date='Mon May 5 16:23:56 CEST 2008' inverted='False' hash='20080505142356-23886-9d3b1f1512a94acc241a01af0d3a9f9053783f3c.gz'>
  <name>actionOK() should honour </name>
  <summary>
    <modify_file>
      inc/confutils.php<added_lines num='4'/>
    </modify_file>
  </summary>
</patch>
Now let's convert that into the format code swarm requires. Code swarm only needs the filename, the author and the date2). That means a commit which changed multiple files becomes one code swarm event per changed file. The code swarm XML looks like this:
<event filename="$filename" date="$timestamp" author="$author" />
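The date conversion is the one step in this mapping that is easy to get subtly wrong, so here is a minimal sketch of it. It assumes the timestamp is wanted in milliseconds since the epoch, which is what the conversion script in this post produces by multiplying by 1000. It also assumes the darcs date attribute is in UTC: in the sample patch, date reads 14:23:56 while local_date reads 16:23:56 CEST, which fits that assumption. The function name darcs_date_to_millis is made up for this example.

```python
import calendar
import time


def darcs_date_to_millis(darcs_date):
    """Convert a darcs 'date' attribute (YYYYMMDDhhmmss) to epoch milliseconds.

    Assumption: the attribute is in UTC, so calendar.timegm() is used.
    time.mktime() would read the same fields as local time instead.
    """
    parsed = time.strptime(darcs_date, "%Y%m%d%H%M%S")
    return calendar.timegm(parsed) * 1000


if __name__ == "__main__":
    # The date attribute of the sample patch shown earlier in the post.
    print(darcs_date_to_millis("20080505142356"))
```

With time.mktime() the result shifts by the local UTC offset of the machine running the script. For a video covering several years that hardly matters, but timegm() keeps the output independent of where the conversion is run.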
I used a Python script to do the conversion. But I also had to manually edit the source XML as well as the final XML to get rid of unicode problems and to normalize the data. By normalize I mean that, for example, some long-time committers used different email addresses over the years, sometimes with their real names and sometimes without. I simply made sure that their real name is always set as the author attribute. That's something you can't really script and which needs manual tuning. But to get a halfway accurate representation of the commit history it's absolutely necessary.
Here's the piece of quick and dirty Python I used to convert the XML3). Note that in order not to make the DOM parser choke on the input file, you have to add the <?xml version="1.0"?> declaration to the top of the source file!
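Deciding which addresses belong to the same person stays a manual job, but once those decisions are made they can be written down in a lookup table and applied consistently. The following is only a sketch of that idea, not what was used for the video: the alias table, the example addresses and the function name normalize_author are all hypothetical, and it assumes author strings look like either "Real Name <mail@host>" or a bare address or name.

```python
import re

# Hypothetical alias table, filled in by hand: keys are lower-cased email
# addresses or names as they appear in the log, values are the display names.
AUTHOR_ALIASES = {
    "jane@example.org": "Jane Doe",
    "jdoe@example.com": "Jane Doe",
}


def normalize_author(raw, aliases=AUTHOR_ALIASES):
    """Return one canonical display name for a raw author string."""
    raw = raw.strip()
    # Split "Real Name <mail@host>" into its two parts when it has that shape.
    match = re.match(r"^(.*?)\s*<([^>]*)>$", raw)
    if match:
        name, email = match.group(1), match.group(2)
    else:
        name, email = raw, raw
    # Try the most specific key first, then fall back to the name as written.
    for key in (email.lower(), name.lower()):
        if key in aliases:
            return aliases[key]
    return name or email


if __name__ == "__main__":
    for sample in ("Jane Doe <jane@example.org>", "jdoe@example.com", "Someone Else <s@example.net>"):
        print(normalize_author(sample))
```

Authors that are not in the table pass through with just the address stripped, so the table only has to list the people who actually show up under several identities.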
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Note: Python 2 syntax (print statement).
import sys
import time
from xml.dom import minidom

xmldoc = minidom.parse('dokuwiki-darcs.xml')
patches = xmldoc.getElementsByTagName('patch')
events = []

for patch in patches:
    author = patch.getAttribute('author').strip()
    # darcs date (YYYYMMDDhhmmss) -> milliseconds since the epoch
    date = int(time.mktime(time.strptime(patch.getAttribute('date'), '%Y%m%d%H%M%S'))) * 1000
    files = patch.getElementsByTagName('modify_file')
    for file in files:
        fname = file.firstChild.data.strip()
        event = (fname, date, author)
        # prepend, so the finished list is in the reverse of the input order
        events.insert(0, event)

print '<?xml version="1.0"?>'
print '<file_events>'
for event in events:
    try:
        print '<event filename="%s" date="%s" author="%s" />' % event
    except UnicodeEncodeError:
        # report events that can't be encoded so they can be fixed by hand
        print >>sys.stderr, event
print '</file_events>'
Just run it on your source file; it will write the resulting XML to stdout. If it encounters a unicode problem, it prints the offending event to stderr and you have to fix it in the source file.
% ./convert_darcs.py > final.xml
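For readers on Python 3, here is a rough sketch of the same conversion. It is a variation and not the script that produced the video: it uses xml.etree.ElementTree instead of minidom, reads the date as UTC via calendar.timegm() (an assumption based on the sample patch), and escapes attribute values with quoteattr() so that characters like & in a filename or author name don't break the output XML. Like the original it only looks at modify_file entries and reverses the input order. The function name darcs_to_events and the sample data are made up for this example.

```python
import calendar
import time
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def darcs_to_events(darcs_xml):
    """Turn darcs changes XML into a list of code swarm <event/> lines."""
    root = ET.fromstring(darcs_xml)
    lines = []
    for patch in root.iter("patch"):
        author = patch.get("author", "").strip()
        stamp = time.strptime(patch.get("date"), "%Y%m%d%H%M%S")
        millis = calendar.timegm(stamp) * 1000  # assumes the date is UTC
        # One event per modified file; other change types are ignored here.
        for changed in patch.iter("modify_file"):
            fname = (changed.text or "").strip()
            lines.append("<event filename=%s date=%s author=%s />" % (
                quoteattr(fname), quoteattr(str(millis)), quoteattr(author)))
    # Same effect as prepending every event: reverse the input order.
    lines.reverse()
    return lines


if __name__ == "__main__":
    # Made-up input in the shape of the sample patch shown earlier.
    sample = """<changelog>
      <patch author='Jane Doe' date='20080505142356'>
        <name>second change</name>
        <summary><modify_file> inc/b.php<added_lines num='4'/></modify_file></summary>
      </patch>
      <patch author='Jane Doe' date='20080504101500'>
        <name>first change</name>
        <summary><modify_file> inc/a.php</modify_file></summary>
      </patch>
    </changelog>"""
    print('<?xml version="1.0"?>')
    print("<file_events>")
    print("\n".join(darcs_to_events(sample)))
    print("</file_events>")
```

Because Python 3 strings are unicode throughout, the UnicodeEncodeError handling of the original has no direct counterpart here; encoding problems would show up when parsing the input or writing the output instead.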
Now just check out the code swarm svn repo4), follow the installation instructions on their website, copy data/sample.config to a new configuration file and alter the settings to fit your needs.
I defined different colors for different parts of the DokuWiki code base.
The default settings are OK for the most part, the only thing I've changed is the FramesPerDay setting which I've set to 24.
Once everything is set up, just run code swarm through the provided run.sh script and wait until it's finished.
I've used ffmpeg to create the resulting video from the images and added an MP3 to have some ambience. I am not that experienced with video creation yet, but I got some good results with those settings, though the file got quite big5).
% cd frames/
% ffmpeg -f image2 -r 24 -i ./code_swarm-%05d.png -i ./audio.mp3 -acodec copy -sameq ./dokuwiki-codeswarm.avi -pass 2
That's it. Here's the resulting video. To be continued, enjoy!!
PS.: I plan to create one with a bigger resolution, the current one seems to suffer a bit from the conversion vimeo does. Also the sound seems to clip at the first second. You'll find them at my vimeo account once they're finished.
Ahmm … if I try posting a comment without subscribing to the comments the comment is rejected with a “CAPTCHA answered wrong”. Might be something you might wanna look into?
2008/07/17 17:31
Hmm… now it worked, but I don't know if that is because I already subscribed with the first entry…
Enough spam here…
2008/07/17 17:31
Of course I know that going through the data once would be better. However, code swarm expects the commits to be in reverse chronological order. That's why I pack the events into a list first, by putting every parsed event at the beginning of the list. It seemed to me to be the easiest way to do this.
This is so cool. When I first saw the Code Swarm videos I thought they were nice, but having a video where I recognize all the names is just plain awesome.
Excellent work, thanks chi.
2008/07/17 17:49
At first I thought these looked like galactic star maps, but after seeing the video, I now know it's a black hole. ;)
Great work !!
2008/07/17 18:16
I am glad you all like it.
@Chris: Yeah, it really looks like Andi is absorbing everything into nothingness.
Try this:
mencoder 'mf://*.png' -mf fps=24:type=png -ovc lavc vcodec=libx264 -oac copy -o movie.avi
Just a short remark: Why not just have the Python script go once through the data? You could remove the last loop and just throw the code into the “for file in files:” loop. Don't know how long it runs and probably was due to the weird data that darcs exported (or the way minidom read the xml data).
Apart from that, it's a really nice gift to the project; more projects should do something like this.