public class RuntimeLimitEnforcer extends Processor
This processor extends the 'max-time' capability of Heritrix. The 'Terminate job' option behaves the same way as 'max-time'. In addition, the processor can pause the crawl, or block all URIs, once the runtime is exceeded.
The processor allows a variable runtime based on host (or other override/refinement criteria). Such overrides only make sense with 'Block URIs', however, because pause and terminate affect the whole crawl as soon as they are triggered anywhere.
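The interaction between per-host limits and 'Block URIs' can be illustrated with a minimal, self-contained sketch. It is not the Heritrix implementation: the class `HostOverrideSketch`, its methods, and the plain map used as a stand-in for the override mechanism are all hypothetical, and the strict "greater than" comparison is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Heritrix.
class HostOverrideSketch {
    private final long defaultRuntimeSeconds;
    // Stand-in for host-based overrides of runtimeSeconds.
    private final Map<String, Long> perHostRuntimeSeconds = new HashMap<>();

    HostOverrideSketch(long defaultRuntimeSeconds) {
        this.defaultRuntimeSeconds = defaultRuntimeSeconds;
    }

    void overrideForHost(String host, long seconds) {
        perHostRuntimeSeconds.put(host, seconds);
    }

    // Returns the override for the host if one exists, else the default limit.
    long limitFor(String host) {
        return perHostRuntimeSeconds.getOrDefault(host, defaultRuntimeSeconds);
    }

    // Assumption: a URI is blocked once elapsed time is strictly above its limit.
    boolean shouldBlock(String host, long elapsedSeconds) {
        return elapsedSeconds > limitFor(host);
    }

    public static void main(String[] args) {
        HostOverrideSketch limits = new HostOverrideSketch(3600);
        limits.overrideForHost("slow.example.org", 600);

        // 900 seconds into the crawl, only the overridden host is affected.
        System.out.println(limits.shouldBlock("slow.example.org", 900));  // true
        System.out.println(limits.shouldBlock("other.example.org", 900)); // false
    }
}
```

Blocking is decided per URI, so a shorter limit on one host leaves the other hosts untouched. With pause or terminate, the first host to exceed its limit would stop the whole crawl.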
Modifier and Type | Class and Description |
---|---|
static class | RuntimeLimitEnforcer.Operation - The action that the processor takes once the runtime has elapsed. |
Modifier and Type | Field and Description |
---|---|
protected CrawlController | controller |
protected RuntimeLimitEnforcer.Operation | expirationOperation - The action that the processor takes once the runtime has elapsed. |
protected static Logger | logger |
protected long | runtimeSeconds - The amount of time, in seconds, that the crawl will be allowed to run before this processor performs its 'end operation.' |
protected StatisticsTracker | statisticsTracker |
Constructor and Description |
---|
RuntimeLimitEnforcer() |
Modifier and Type | Method and Description |
---|---|
CrawlController | getCrawlController() |
RuntimeLimitEnforcer.Operation | getExpirationOperation() |
long | getRuntimeSeconds() |
StatisticsTracker | getStatisticsTracker() |
protected void | innerProcess(CrawlURI curi) - Actually performs the process. |
protected ProcessResult | innerProcessResult(CrawlURI curi) |
void | setCrawlController(CrawlController controller) |
void | setExpirationOperation(RuntimeLimitEnforcer.Operation op) |
void | setRuntimeSeconds(long secs) |
void | setStatisticsTracker(StatisticsTracker statisticsTracker) |
protected boolean | shouldProcess(CrawlURI puri) - Determines whether the given uri should be processed by this processor. |
doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
protected static Logger logger
protected long runtimeSeconds
protected RuntimeLimitEnforcer.Operation expirationOperation
Operation: Pause job - Pauses the crawl. A change (increase) to the runtime duration will make it possible to resume the crawl. Attempts to resume the crawl without modifying the runtime will cause it to be immediately paused again.
Operation: Terminate job - Terminates the job. Equivalent to using the max-time setting on the CrawlController.
Operation: Block URIs - Blocks each URI with a -5002 (blocked by custom processor) fetch status code. This will cause all the URIs queued to wind up in the crawl.log.
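The three operations described above can be summarized in a small decision sketch. This is a hedged illustration, not the processor's actual code: the class `ExpirationSketch`, both enums and their constant names, and the strict "greater than" boundary are assumptions. Only the -5002 status code and the behavior of each operation come from the descriptions above.

```java
// Hypothetical illustration only; none of these names are taken from Heritrix.
class ExpirationSketch {
    // Stand-in for RuntimeLimitEnforcer.Operation; constant names are assumed.
    enum Operation { PAUSE, TERMINATE, BLOCK_URIS }

    // What the sketch decides for a single URI.
    enum Outcome { PROCEED, REQUEST_PAUSE, REQUEST_TERMINATE, BLOCK_URI }

    // Fetch status documented above for "blocked by custom processor".
    static final int BLOCKED_BY_CUSTOM_PROCESSOR = -5002;

    // Assumption: the limit counts as exceeded only when elapsed > runtime.
    static Outcome decide(long elapsedSeconds, long runtimeSeconds, Operation op) {
        if (elapsedSeconds <= runtimeSeconds) {
            return Outcome.PROCEED; // runtime not yet exceeded
        }
        switch (op) {
            case PAUSE:
                return Outcome.REQUEST_PAUSE;     // crawl-wide effect
            case TERMINATE:
                return Outcome.REQUEST_TERMINATE; // crawl-wide effect, like max-time
            case BLOCK_URIS:
                return Outcome.BLOCK_URI;         // per-URI effect
            default:
                throw new IllegalStateException("unknown operation: " + op);
        }
    }

    public static void main(String[] args) {
        // Resuming without raising the limit pauses again immediately.
        System.out.println(decide(120, 100, Operation.PAUSE)); // REQUEST_PAUSE
        // Raising the limit above the elapsed time lets the crawl proceed.
        System.out.println(decide(120, 200, Operation.PAUSE)); // PROCEED
        // Blocked URIs would be recorded with the -5002 status.
        if (decide(120, 100, Operation.BLOCK_URIS) == Outcome.BLOCK_URI) {
            System.out.println(BLOCKED_BY_CUSTOM_PROCESSOR);   // -5002
        }
    }
}
```

The second call shows why a paused crawl can only be resumed after the runtime is increased: with an unchanged limit the same check fails again on the next URI.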
protected CrawlController controller
protected StatisticsTracker statisticsTracker
public long getRuntimeSeconds()
public void setRuntimeSeconds(long secs)
public RuntimeLimitEnforcer.Operation getExpirationOperation()
public void setExpirationOperation(RuntimeLimitEnforcer.Operation op)
public CrawlController getCrawlController()
public void setCrawlController(CrawlController controller)
public StatisticsTracker getStatisticsTracker()
public void setStatisticsTracker(StatisticsTracker statisticsTracker)
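protected boolean shouldProcess(CrawlURI puri)
Description copied from class: Processor
Determines whether the given uri should be processed by this processor.
Specified by: shouldProcess in class Processor
Parameters: puri - the URI to test
protected void innerProcess(CrawlURI curi)
Description copied from class: Processor
Actually performs the process. Called for a URI that passes the #ENABLED, the #DECIDE_RULES and the #shouldProcess(ProcessorURI) tests.
Specified by: innerProcess in class Processor
Parameters: curi - the URI to process
protected ProcessResult innerProcessResult(CrawlURI curi) throws InterruptedException
Overrides: innerProcessResult in class Processor
Throws: InterruptedException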
Copyright © 2003-2014 Internet Archive. All Rights Reserved.