2008/07/17

CodeSwarm, Darcs and DokuWiki

A couple of weeks ago I discovered code swarm while surfing the web. For those of you who haven't heard of it yet: it uses Processing, a programming language for creating images, animations, and interactions, to generate a set of images representing the activity of a software project, based on the commit history in the project's VCS. The resulting images can then be used to create a video of the project's development history.

Here's a short, and maybe better, description from the code swarm website:

This visualization, called code_swarm, shows the history of commits in a software project. A commit happens when a developer makes changes to the code or documents and transfers them into the central project repository. Both developers and files are represented as moving elements. When a developer commits a file, it lights up and flies towards that developer. Files are colored according to their purpose, such as whether they are source code or a document. If files or developers have not been active for a while, they will fade away. A histogram at the bottom keeps a reminder of what has come before.

Since I missed getting a cake for DokuWiki's 4th birthday, I thought I'd create a code swarm video of the commit history of this fabulous project ;-). Since code swarm doesn't ship a tool to convert darcs repository logs yet, I'll describe what I did to prepare the required XML data.

First you have to get the revision history of your project. To be sure you have the complete logs 1), I suggest doing a fresh checkout of the main repository.

% darcs get http://remote-domain.org/repository

Luckily darcs already provides an option to export the commit log as XML. The --summary switch is needed to get the list of changed files per commit as well!

% cd repository/
% darcs changes --xml-output --summary > dokuwiki-darcs.xml

Here's what a sample commit looks like.

<patch author='Michael Klier ' date='20080505142356' local_date='Mon May  5 16:23:56 CEST 2008' inverted='False' hash='20080505142356-23886-9d3b1f1512a94acc241a01af0d3a9f9053783f3c.gz'>
    <name>actionOK() should honour </name>
    <summary>
    <modify_file>
    inc/confutils.php<added_lines num='4'/>
    </modify_file>
    </summary>
</patch>

Now let's convert that into the format code swarm requires. Code swarm only needs the filename, the author, and the date 2). That means a commit that changed multiple files becomes one code swarm event per changed file. The code swarm XML looks like this:

<event filename="$filename" date="$timestamp" author="$author" />
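
Applied to the sample commit above, the mapping can be sketched roughly as follows. This is only an illustration, not the script used for the video: the helper names `to_millis` and `patch_to_events` are made up for this example, and it assumes the `date` attribute is UTC (the sample shows 14:23:56 next to a local time of 16:23:56 CEST). Like the rest of this post, it only looks at `modify_file` entries.

```python
import calendar
import time
from xml.dom import minidom

# The sample commit from above, reduced to the attributes that matter here.
SAMPLE_PATCH = """<patch author='Michael Klier ' date='20080505142356'>
    <name>actionOK() should honour </name>
    <summary>
    <modify_file>
    inc/confutils.php<added_lines num='4'/>
    </modify_file>
    </summary>
</patch>"""


def to_millis(darcs_date):
    """Turn a darcs date such as '20080505142356' (assumed UTC) into epoch milliseconds."""
    parsed = time.strptime(darcs_date, '%Y%m%d%H%M%S')
    # timegm() interprets the struct as UTC; mktime() would use the local timezone
    return calendar.timegm(parsed) * 1000


def patch_to_events(patch):
    """Return one (filename, timestamp, author) tuple per modified file."""
    author = patch.getAttribute('author').strip()
    stamp = to_millis(patch.getAttribute('date'))
    events = []
    for node in patch.getElementsByTagName('modify_file'):
        # The file name is the text node directly inside <modify_file>
        events.append((node.firstChild.data.strip(), stamp, author))
    return events


patch = minidom.parseString(SAMPLE_PATCH).documentElement
for event in patch_to_events(patch):
    print('<event filename="%s" date="%s" author="%s" />' % event)
```

A patch touching three files would yield three tuples sharing the same timestamp and author.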

I used a Python script to do the conversion. But I also had to manually edit the source XML as well as the final XML to get rid of unicode problems and to normalize the data. By "normalize" I mean that, for example, some long-time committers used different email addresses over the years, sometimes with their real names and sometimes without. I simply made sure that their real name is always set as the author attribute. That's something you can't really script and which needs manual tuning ;-). But to get a halfway accurate representation of the commit history it's absolutely necessary.
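
Deciding which addresses belong to the same person stays manual work, but once you have that list, applying it could be sketched like this. Everything here is hypothetical: the `ALIASES` table, the example names and addresses, and the assumption that author strings look like `Name <email>` or a bare email address.

```python
import re

# Hypothetical alias table, built by hand: every address seen in the log
# maps to the one display name that should appear in the video.
ALIASES = {
    'jane@example.org': 'Jane Doe',
    'jdoe@example.com': 'Jane Doe',
}

EMAIL_RE = re.compile(r'<([^>]+)>')


def normalize_author(raw, aliases=ALIASES):
    """Map an author string to a single display name where an alias is known."""
    raw = raw.strip()
    match = EMAIL_RE.search(raw)
    # Use the address inside <...> if present, otherwise the whole string
    email = match.group(1).lower() if match else raw.lower()
    if email in aliases:
        return aliases[email]
    # Unknown author: keep the name part in front of the address, if any
    name = EMAIL_RE.sub('', raw).strip()
    return name or raw


print(normalize_author('Jane Doe <jane@example.org>'))
print(normalize_author('jdoe@example.com'))
```

Authors missing from the table pass through with just their name part, so the leftovers are easy to spot and add by hand.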

Here's the piece of quick and dirty Python I used to convert the XML 3). Note that to keep the DOM parser from choking on the input file, you have to add the <?xml version="1.0"?> declaration to the top of the source file!

convert_darcs.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import calendar
import sys
import time
from xml.dom import minidom

xmldoc = minidom.parse('dokuwiki-darcs.xml')

patches = xmldoc.getElementsByTagName('patch')

events = []

for patch in patches:
    author = patch.getAttribute('author').strip()
    # darcs stores the date in UTC, so use timegm() instead of mktime();
    # code swarm wants a unix timestamp in milliseconds
    parsed = time.strptime(patch.getAttribute('date'), '%Y%m%d%H%M%S')
    date = calendar.timegm(parsed) * 1000
    files = patch.getElementsByTagName('modify_file')

    for file in files:
        # the file name is the text node directly inside <modify_file>
        fname = file.firstChild.data.strip()
        event = (fname, date, author)
        # insert at the front to reverse the order of the darcs log
        events.insert(0, event)

print('<?xml version="1.0"?>')
print('<file_events>')

for event in events:
    try:
        print('<event filename="%s" date="%s" author="%s" />' % event)
    except UnicodeEncodeError:
        # report the offending event so it can be fixed in the source file
        print(event, file=sys.stderr)

print('</file_events>')

Just run it on your source file; it will write the resulting XML to stdout. If it encounters a unicode problem, it prints the offending event to stderr, and you have to fix it in the source file.

% ./convert_darcs.py > final.xml
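
Some of the manual cleanup might be avoidable by writing the file with an explicit encoding and escaping the attribute values. The following is an untested-on-real-data sketch of that idea, not what I did for the video: `write_events` is a made-up helper, the demo events are invented, and it assumes that UTF-8 output is acceptable for code swarm.

```python
import os
import tempfile
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def write_events(events, path):
    """Write (filename, timestamp, author) tuples as a code swarm event file."""
    # An explicit encoding avoids depending on the terminal's default encoding
    with open(path, 'w', encoding='utf-8') as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<file_events>\n')
        for fname, stamp, author in events:
            # quoteattr() escapes &, < and > and adds the surrounding quotes
            out.write('<event filename=%s date=%s author=%s />\n'
                      % (quoteattr(fname), quoteattr(str(stamp)), quoteattr(author)))
        out.write('</file_events>\n')


# Invented demo data with characters that would break a naive "%s" template
demo = [
    ('inc/a&b.php', 1209997436000, 'J\u00f6rg "JM" M\u00fcller'),
    ('inc/confutils.php', 1209997436000, 'Michael Klier'),
]
path = os.path.join(tempfile.mkdtemp(), 'final.xml')
write_events(demo, path)

# Re-parse the result to check that it is well-formed XML
parsed_events = minidom.parse(path).getElementsByTagName('event')
print(parsed_events[0].getAttribute('filename'))
```

Re-parsing the output with minidom is a cheap sanity check before handing the file to code swarm.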

Now just check out the code swarm svn repo 4), follow the installation instructions on their website, copy data/sample.config to a new configuration file, and alter the settings to fit your needs.

I defined different colors for different parts of the DokuWiki code base.

  • red = core code
  • yellow = geshi (to distinguish geshi updates from the core code)
  • magenta = parser code (sadly you don't really see it, keep an eye on Harry Fuecks ;-))
  • white = test cases (you'll notice them once they show up for the first time)
  • green = template and plugins
  • blue = language files

The default settings are OK for the most part, the only thing I've changed is the FramesPerDay setting which I've set to 24.

Once everything is set up, just run code swarm through the provided run.sh script and wait until it's finished.

I used ffmpeg to create the resulting video from the images and added an MP3 for some ambience ;-). I am not that experienced with video creation yet, but I got good results with these settings, though the file got quite big 5).

% cd frames/
% ffmpeg -f image2 -r 24 -i ./code_swarm-%05d.png -i ./audio.mp3 -acodec copy -sameq ./dokuwiki-codeswarm.avi -pass 2

That's it :-). Here's the resulting video. To be continued, enjoy!!


PS: I plan to create one with a higher resolution; the current one seems to suffer a bit from the conversion Vimeo does. Also, the sound seems to clip during the first second. You'll find the new versions on my Vimeo account once they're finished.

1) usually you use darcs get --partial to clone a repo, which doesn't download the whole commit history
2) as unix timestamp with milliseconds
3) I plan to incorporate a cleaner and better version of it into the code swarm conversion tool and send it upstream
4) you'll need to have the java-jdk and apache-ant installed on your system
5) around 40MB for 7000+ commits

Comments

1

Just a short remark: Why not just have the Python script go through the data once? You could remove the last loop and move its code into the “for file in files:” loop. I don't know how long it runs, and the second loop was probably due to the weird data that darcs exported (or the way minidom read the XML data).

Apart from that, it's a really nice gift to the project; more projects should do something like this.

2008/07/17 17:30
2

Ahmm … if I try posting a comment without subscribing to the comments the comment is rejected with a “CAPTCHA answered wrong”. Might be something you might wanna look into?

2008/07/17 17:31
3

Hmm… now it worked, but I don't know if that is because I already subscribed with the first entry…

Enough spam here…

2008/07/17 17:31
4

Of course I know that going through the data once would be better ;-). However, code swarm expects the commits to be in reverse chronological order. That's why I pack the events into a list first, by putting every parsed event at the beginning of the list. It seemed to me to be the easiest way to do this :-).

2008/07/17 17:33
5

Hmmm, strange, thanks for letting me know. I'll look into it!

2008/07/17 17:34
6

This is so cool. When I first saw the Code Swarm videos I thought they were nice, but having a video where I recognize all the names is just plain awesome.

Excellent work, thanks chi.

2008/07/17 17:49
7

I simply LOVE that video, thanks so much! :)

2008/07/17 18:00
8

At first I thought these looked like galactic star maps, but after seeing the video, I now know it's a black hole. ;)

Great work !!

2008/07/17 18:16
9

I am glad you all like it :-).

@Chris: Yeah, it really looks like Andi is absorbing everything into nothingness ;-).

2008/07/17 20:28
10 [...] created [...]
2008/07/18 23:32
11

Try this:

mencoder 'mf://*.png' -mf fps=24:type=png -ovc lavc vcodec=libx264 -oac copy -o movie.avi
Felipe Contreras
2008/08/30 11:04


