Distributed Web Spider - Demo for the Terracotta WorkManager

About

This application implements a distributed web spider and is a demo application for the Terracotta WorkManager.

Build Instructions

First download and install Terracotta DSO.

This sample application requires Maven 2 and Java 5 and needs to be built before you can run it.

Next, download and install Maven 2. Then perform these steps:

  1. Step up into the parent (workmanager) directory and run 'mvn install'. This installs the Maven pom.xml files, builds the workmanager and spider jars, and installs them into your local Maven repository.
  2. Step down into the spider directory and run 'mvn assembly:assembly' to build the application and the distribution.
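
The two build steps can be sketched as a small shell function. This is only an illustration: the function name build_spider is made up for this example, and it assumes you call it from the spider directory with the workmanager directory as its parent.

```shell
#!/bin/sh
# build_spider: hypothetical helper, not shipped with the demo.
# Assumes the current directory is the spider directory and that
# its parent is the workmanager directory.
build_spider() {
    # Step 1: install the pom.xml files and the workmanager and
    # spider jars into the local Maven repository.
    (cd .. && mvn install) || return 1
    # Step 2: build the application and the distribution.
    mvn assembly:assembly
}
```

Calling build_spider from the spider directory runs both steps in order and stops if the first one fails.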

Run Instructions

  1. Set the TC_HOME environment variable to point to the root of the Terracotta DSO installation directory.

  2. Start a Terracotta Server from the $TC_HOME/dso/samples directory.

    Unix: ./start-demo-server.sh &
    Windows: start-demo-server.bat

  3. Start a Master:

    Unix: master.sh
    Windows: master.bat

    Pass the master a colon-separated list of all routing IDs with the '-i' option, for example '-i 0:1:2'. Routing IDs simply associate workers with queues. To run the demo with two workers, pass '-i 0:1'.
    You can also pass an optional "fail-over" routing ID with the '-f' option. It names the queue and worker that will receive failed work.

    By default the web spider crawls the www.terracotta.org web site. To change that, edit the master.bat/master.sh script.

  4. Start one or more Workers:

    Unix: worker.sh
    Windows: worker.bat

    Pass each worker its routing ID as an argument. The routing ID selects the queue the worker uses and has to be one of the IDs listed in the master's '-i' option.

For example, with two workers on Windows:
master.bat -i route1:route2

worker.bat route1 
worker.bat route2
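
Since every worker routing ID has to appear in the master's '-i' list, it can help to check the IDs before starting the workers. The sketch below shows one way to do that in a shell script. The helper is_known_route and the ROUTES variable are hypothetical names for this example and are not part of the demo scripts.

```shell
#!/bin/sh
# is_known_route ROUTE LIST
# Succeeds when ROUTE is one of the colon-separated IDs in LIST.
# Hypothetical helper, not part of the demo scripts.
is_known_route() {
    # Wrap the list and the ID in colons so only whole IDs match.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The same value that is passed to the master with -i.
ROUTES="route1:route2"

for id in route1 route2 route3; do
    if is_known_route "$id" "$ROUTES"; then
        echo "$id: in the master list, start with: worker.sh $id"
    else
        echo "$id: not in the master list"
    fi
done
```

The loop only prints what it would do. Replace the echo with a call to worker.sh to actually start a worker for each valid ID.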