public class PreconditionEnforcer extends Processor
Modifier and Type | Field and Description |
---|---|
protected CrawlerLoggerModule | loggerModule |
protected CrawlMetadata | metadata: Auto-discovered module providing configured (or overridden) User-Agent value and RobotsHonoringPolicy |
protected ServerCache | serverCache |
Constructor and Description |
---|
PreconditionEnforcer() |
Modifier and Type | Method and Description |
---|---|
protected boolean | authenticated(Credential credential, CrawlURI curi): Has passed credential already been authenticated. |
protected boolean | considerDnsPreconditions(CrawlURI curi) |
protected boolean | considerRobotsPreconditions(CrawlURI curi): Consider the robots precondition. |
protected boolean | credentialPrecondition(CrawlURI curi): Consider credential preconditions. |
boolean | getCalculateRobotsOnly() |
CredentialStore | getCredentialStore() |
int | getIpValidityDurationSeconds() |
CrawlerLoggerModule | getLoggerModule() |
CrawlMetadata | getMetadata() |
int | getRobotsValidityDurationSeconds() |
ServerCache | getServerCache() |
protected void | innerProcess(CrawlURI puri): Actually performs the process. |
protected ProcessResult | innerProcessResult(CrawlURI puri) |
boolean | isIpExpired(CrawlURI curi): Return true if ip should be looked up. |
void | setCalculateRobotsOnly(boolean calcOnly) |
void | setCredentialStore(CredentialStore credentials) |
void | setIpValidityDurationSeconds(int duration) |
void | setLoggerModule(CrawlerLoggerModule loggerModule) |
void | setMetadata(CrawlMetadata provider) |
void | setRobotsValidityDurationSeconds(int duration) |
void | setServerCache(ServerCache serverCache) |
protected boolean | shouldProcess(CrawlURI puri): Determines whether the given uri should be processed by this processor. |
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
protected CrawlMetadata metadata
protected ServerCache serverCache
protected CrawlerLoggerModule loggerModule
public int getIpValidityDurationSeconds()
public void setIpValidityDurationSeconds(int duration)
public int getRobotsValidityDurationSeconds()
public void setRobotsValidityDurationSeconds(int duration)
public boolean getCalculateRobotsOnly()
public void setCalculateRobotsOnly(boolean calcOnly)
public CrawlMetadata getMetadata()
public void setMetadata(CrawlMetadata provider)
public CredentialStore getCredentialStore()
public void setCredentialStore(CredentialStore credentials)
public ServerCache getServerCache()
public void setServerCache(ServerCache serverCache)
public CrawlerLoggerModule getLoggerModule()
public void setLoggerModule(CrawlerLoggerModule loggerModule)
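The two validity-duration properties above are expressed in whole seconds. As a minimal sketch of how the age of a cached lookup could be compared against such a setting, consider the following. This is not the PreconditionEnforcer implementation: `ValiditySketch` and `isExpired` are hypothetical names, and the real class may give special meaning to particular duration values that this sketch does not model.

```java
class ValiditySketch {
    /**
     * Hypothetical helper (an assumption for illustration): returns true when a
     * value fetched at fetchedAtMillis is older than validitySeconds at nowMillis.
     */
    static boolean isExpired(long fetchedAtMillis, int validitySeconds, long nowMillis) {
        // Multiply by a long literal so large durations do not overflow int.
        long expiresAtMillis = fetchedAtMillis + validitySeconds * 1000L;
        return nowMillis > expiresAtMillis;
    }

    public static void main(String[] args) {
        long fetched = 1_000_000L;
        int sixHours = 6 * 60 * 60; // 21600 seconds

        // One second after the fetch: still valid.
        System.out.println(isExpired(fetched, sixHours, fetched + 1_000L));      // prints false
        // One millisecond past the six-hour window: expired.
        System.out.println(isExpired(fetched, sixHours, fetched + 21_600_001L)); // prints true
    }
}
```

Passing the current time as a parameter, rather than reading the system clock inside the helper, keeps the comparison easy to test.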
protected boolean shouldProcess(CrawlURI puri)
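Determines whether the given uri should be processed by this processor (description from Processor).
shouldProcess in class Processor
Parameters: puri - the URI to test
protected void innerProcess(CrawlURI puri)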
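Actually performs the process (description from Processor). The given URI has already passed the #ENABLED, the #DECIDE_RULES and the #shouldProcess(ProcessorURI) tests.
innerProcess in class Processor
Parameters: puri - the URI to process
protected ProcessResult innerProcessResult(CrawlURI puri)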
innerProcessResult in class Processor
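The precondition methods in the method summary (considerDnsPreconditions, considerRobotsPreconditions, credentialPrecondition) each return a boolean, where true signals an outstanding precondition. One way such checks could be combined into a single result is sketched below. This is an illustration under stated assumptions, not the actual body of innerProcessResult: `PreconditionChainSketch`, `Outcome`, and `enforce` are hypothetical names, plain strings stand in for CrawlURI, and the real method may order or filter its checks differently.

```java
import java.util.List;
import java.util.function.Predicate;

class PreconditionChainSketch {
    /** Stand-in for a processing outcome; not the real ProcessResult type. */
    enum Outcome { PROCEED, FINISH }

    /**
     * Runs each check in order. A check that returns true means the URI has an
     * unmet precondition, so processing of that URI stops for now.
     */
    static <T> Outcome enforce(T uri, List<Predicate<T>> checks) {
        for (Predicate<T> check : checks) {
            if (check.test(uri)) {
                return Outcome.FINISH;
            }
        }
        return Outcome.PROCEED;
    }

    public static void main(String[] args) {
        // Hypothetical checks on plain strings standing in for CrawlURI.
        Predicate<String> needsDns = u -> u.contains("unresolved");
        Predicate<String> needsRobots = u -> u.contains("norobots");
        List<Predicate<String>> checks = List.of(needsDns, needsRobots);

        System.out.println(enforce("http://example.com/", checks));        // prints PROCEED
        System.out.println(enforce("http://unresolved.example/", checks)); // prints FINISH
    }
}
```

Returning at the first check that reports true means later checks never run for a URI that is already blocked.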
protected boolean considerRobotsPreconditions(CrawlURI curi)
Consider the robots precondition.
Parameters: curi - CrawlURI we're checking for any required preconditions.
Returns: true if curi has a precondition or processing should be terminated for some other reason; false if we can proceed to process this URL.
protected boolean considerDnsPreconditions(CrawlURI curi)
Parameters: curi - CrawlURI whose DNS prerequisite we're to check.
public boolean isIpExpired(CrawlURI curi)
Returns true if the IP should be looked up.
Parameters: curi - the URI to check.
protected boolean credentialPrecondition(CrawlURI curi)
Consider credential preconditions. Checks whether there are any credential preconditions for this CrawlServer. If there are, have they been run already? If not, make the running of these logins a precondition of accessing any other URL on this CrawlServer.
One day, do optimization and avoid running the bulk of the code below. The argument for running the code every time is that overrides and refinements may change what comes back from the credential store.
Parameters: curi - CrawlURI we're checking for any required preconditions.
Returns: true if curi has a precondition that needs to be met before we can proceed; false if we can proceed to process this URL.
protected boolean authenticated(Credential credential, CrawlURI curi)
Checks whether the passed credential has already been authenticated.
Parameters: credential - Credential to test.
curi - CrawlURI.
Copyright © 2003-2014 Internet Archive. All Rights Reserved.