2009/04/20

Spider Your DokuWiki Using Wget

Some of you may already have been in the situation where you need to spider your DokuWiki, for example to rebuild the search index, or because you use the tag plugin and don't want to visit every page by hand to trigger the (re)generation of the required metadata 1).

Here's a short bash snippet using wget that I want to share with you. You have to run it from inside your <dokuwiki>/data/pages folder, otherwise it won't work.

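# Run from inside <dokuwiki>/data/pages; only *.txt files are wiki pages
find . -type f -name '*.txt' | while IFS= read -r file; do
    id=${file#./}      # strip the leading "./" that find prints
    id=${id%.txt}      # strip the ".txt" extension
    id=${id//\//:}     # namespace separators: "/" becomes ":"
    url="http://yourdomain.org/doku.php?id=$id"
    # Download the page and discard it; report any failure
    wget -nv "$url" -O /dev/null || echo "ERROR fetching $url"
    sleep 1
done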

There are probably a million other ways to do this in bash. I search the pages directory instead of reading the <dokuwiki>/data/index/page.idx file because pages added by a script may be missing from the global index.
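
As one of those other ways, the path-to-ID conversion can be pulled out into a small function so it can be checked on its own before any request is sent. This is only a sketch: the name page_id_from_path is made up for this example, and it assumes that every page file ends in .txt and that file names contain no whitespace. It uses tr for the namespace separators so that it also runs under a plain sh.

```shell
#!/usr/bin/env bash
# Sketch: turn a path found under data/pages into a DokuWiki page ID.
# page_id_from_path is a hypothetical helper name, not part of DokuWiki.
page_id_from_path() {
    local id
    id=${1#./}        # drop the leading "./" that find prints
    id=${id%.txt}     # drop the ".txt" extension
    # namespaces: every "/" becomes ":"
    printf '%s\n' "$id" | tr '/' ':'
}

# Two sample paths as find would print them
page_id_from_path ./start.txt
page_id_from_path ./wiki/syntax.txt
```

The two sample calls should print the IDs start and wiki:syntax, which are the values that end up in the id parameter of the URL.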

Note: I would set the sleep to at least one second (if not more) to give the indexer enough time to finish its job and to avoid lock conflicts.
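
If you want to experiment with the delay, a variant with the delay as a parameter and a dry-run mode may help, since it lets you look at the list of URLs before anything hits the server. Again, this is just a sketch under assumptions: spider_pages, its arguments and the DRY_RUN variable are names invented for this example, and the right delay for your wiki is something you have to find out yourself.

```shell
#!/usr/bin/env bash
# Sketch: the same loop with a configurable delay and a dry-run mode.
# spider_pages and DRY_RUN are hypothetical names for this example.
# Usage: spider_pages <pages_dir> <base_url> [delay_in_seconds]
spider_pages() {
    local pages_dir="$1" base_url="$2" delay="${3:-1}"
    local file id url
    # List page files in a stable order, relative to the pages directory
    ( cd "$pages_dir" && find . -type f -name '*.txt' | sort ) |
    while IFS= read -r file; do
        id=${file#./}
        id=${id%.txt}
        id=$(printf '%s' "$id" | tr '/' ':')
        url="$base_url/doku.php?id=$id"
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the URL, send no request
            echo "$url"
            continue
        fi
        wget -nv "$url" -O /dev/null || echo "ERROR fetching $url" >&2
        sleep "$delay"
    done
}
```

For example, DRY_RUN=1 spider_pages /path/to/dokuwiki/data/pages http://yourdomain.org only prints the URLs, while leaving DRY_RUN unset and passing 2 as the third argument fetches every page with a two second pause in between.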

And yes, it's intentional that I don't use wget's --spider switch: it only checks the response headers instead of downloading the page, which may not be enough to trigger the indexer.

1) DokuWiki uses a web bug to do everything that's needed in the background.

Comments

1

Hmm, when you're running a CLI script anyway, why not run bin/indexer.php? Or, if that's not an option, you could at least call the web bug directly instead of doku.php.

2009/04/20 18:48
2

To also let syntax plugins update their metadata (like the tag plugin, for example).

2009/04/20 20:42
3

Ah I see. Go on then ;)

2009/04/21 08:51
4

Thanks for this… really useful! I needed some adjustments to make it work, though: I changed the wget call to act recursively, one level deep.

  wget -q "$url" -r -l1 -O /dev/null
Menno
2009/08/20 15:14


