The selftest webapp is the repository for the server side of the integration test.
The integration self test is run from the command line. Invoking it makes the crawler crawl itself by trawling the selftest webapp. When the crawl is done, its product -- ARC and log files -- is analyzed by code in this package to determine whether each test passed or failed.
The integration self test is the aggregation of multiple individual tests,
each testing a particular crawler aspect. For example, the Robots test
validates the crawler's parse of robots.txt. Each test comprises two parts:
a directory under the selftest webapp, named for the test, into which we put
the server pages that express the scenario to test, and a class from this
package named for the test's webapp directory with a SelfTest suffix.
The selftest class verifies test success. Each selftest class subclasses
org.archive.crawler.selftest.SelfTestCase (which is itself a subclass of
junit.framework.TestCase). All tests need to be
registered with the {@link org.archive.crawler.selftest.AllSelfTestCases}
class and must live in the org.archive.crawler.selftest package. The class
{@link org.archive.crawler.selftest.SelfTestCrawlJobHandler}
manages the running of selftest.
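The naming convention described above can be sketched in a few lines of Java. This is an illustration only, not code from this package: the helper class SelfTestNaming and its methods are hypothetical, and the example assumes the Robots test lives in a webapp directory named "Robots", so that its class would be RobotsSelfTest.

```java
// Hypothetical helper, not part of org.archive.crawler.selftest.
// It only illustrates the directory-name-plus-suffix convention.
final class SelfTestNaming {
    // Package in which all selftest classes are required to live.
    private static final String PACKAGE = "org.archive.crawler.selftest";
    // Suffix appended to the webapp directory name to form the class name.
    private static final String SUFFIX = "SelfTest";

    private SelfTestNaming() {
    }

    /** Maps a webapp directory name to the expected fully qualified class name. */
    static String classNameFor(String testDirectory) {
        if (testDirectory == null || testDirectory.isEmpty()) {
            throw new IllegalArgumentException("test directory name must not be empty");
        }
        return PACKAGE + "." + testDirectory + SUFFIX;
    }

    /** Recovers the webapp directory name from a selftest class name. */
    static String directoryFor(String className) {
        // Drop the package prefix, if any, to get the simple class name.
        String simple = className.substring(className.lastIndexOf('.') + 1);
        if (!simple.endsWith(SUFFIX) || simple.length() == SUFFIX.length()) {
            throw new IllegalArgumentException("not a selftest class name: " + className);
        }
        return simple.substring(0, simple.length() - SUFFIX.length());
    }
}
```

Under these assumptions, classNameFor("Robots") yields the class name that would have to be registered with AllSelfTestCases, and directoryFor maps that class name back to the webapp directory holding the test's server pages.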
To run a single test only, pass its name as the option value to the selftest argument.
The first crop of self tests is derived from tests developed by Parker Thompson < pt at archive dot org >. See Tests. These tests in turn look to have been derived from Testing Search Indexing Systems [1].