public class LinksScoper extends Scoper
Since this scoper has to create CrawlURIs anyway, there is no sense in discarding them: later in the processing chain, CrawlURIs rather than Links are what is needed to schedule extracted links with the Frontier (Frontier#schedule expects a CrawlURI, not a Link). This class therefore replaces each Link with the CrawlURI that wraps it.
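The replace-in-place idea described above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: `Link`, `WrappedUri`, and `scopeOutlinks` below are hypothetical stand-ins invented for illustration, and dropping out-of-scope links from the list is an assumption rather than documented behavior of this class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.function.Predicate;

public class LinkWrappingSketch {

    /** Hypothetical stand-in for an extracted Link: only a destination URI string. */
    static final class Link {
        final String destination;
        Link(String destination) { this.destination = destination; }
    }

    /** Hypothetical stand-in for a CrawlURI that wraps the Link it was built from. */
    static final class WrappedUri {
        final Link via;
        WrappedUri(Link via) { this.via = via; }
        String uri() { return via.destination; }
    }

    /**
     * Walks the outlink list once. Each Link is wrapped; an in-scope wrapper
     * replaces the Link in place so later stages can schedule it without
     * rebuilding it. Out-of-scope links are removed (an assumption here).
     * Returns the number of rejected links.
     */
    static int scopeOutlinks(List<Object> outlinks, Predicate<String> inScope) {
        int rejected = 0;
        ListIterator<Object> it = outlinks.listIterator();
        while (it.hasNext()) {
            Object o = it.next();
            if (!(o instanceof Link)) {
                continue; // already converted on an earlier pass
            }
            WrappedUri candidate = new WrappedUri((Link) o);
            if (inScope.test(candidate.uri())) {
                it.set(candidate);   // keep the wrapper instead of the Link
            } else {
                it.remove();         // ruled out of scope
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        List<Object> outlinks = new ArrayList<>();
        outlinks.add(new Link("http://example.org/a"));
        outlinks.add(new Link("http://elsewhere.test/b"));
        int rejected = scopeOutlinks(outlinks, u -> u.startsWith("http://example.org/"));
        System.out.println(outlinks.size() + " kept, " + rejected + " rejected");
    }
}
```

Keeping the wrapper in the same list slot means the scheduling stage never sees a bare link, which is the motivation the class description gives for not discarding the CrawlURIs.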
fileLogger, isRunning, loggerModule, scope
beanName, kp, recoveryCheckpoint, uriCount
Constructor and Description |
---|
LinksScoper()
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
DecideRule |
getLogRejectsRule()
Deprecated.
|
int |
getPreferenceDepthHops()
Deprecated.
|
protected int |
getSchedulingFor(CrawlURI curi,
Link wref,
int preferenceDepthHops)
Deprecated.
Determine scheduling for the curi.
|
boolean |
getSeedsRedirectNewSeeds()
Deprecated.
|
protected void |
handlePrerequisite(CrawlURI curi)
Deprecated.
The CrawlURI has a prerequisite; apply scoping and update
Link to CrawlURI in a manner analogous to outlink handling.
|
protected void |
innerProcess(CrawlURI puri)
Deprecated.
Actually performs the process.
|
protected void |
outOfScope(CrawlURI caUri)
Deprecated.
Called when a CrawlURI is ruled out of scope.
|
void |
setLogRejectsRule(DecideRule rule)
Deprecated.
|
void |
setPreferenceDepthHops(int depth)
Deprecated.
|
void |
setSeedsRedirectNewSeeds(boolean redirect)
Deprecated.
|
protected boolean |
shouldProcess(CrawlURI puri)
Deprecated.
Determines whether the given uri should be processed by this
processor.
|
getLoggerModule, getLogToFile, getScope, isInScope, isRunning, setLoggerModule, setLogToFile, setScope, start, stop
doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, startCheckpoint, toCheckpointJson
public LinksScoper()
name - Name of this filter.
public boolean getSeedsRedirectNewSeeds()
public void setSeedsRedirectNewSeeds(boolean redirect)
public DecideRule getLogRejectsRule()
public void setLogRejectsRule(DecideRule rule)
public int getPreferenceDepthHops()
public void setPreferenceDepthHops(int depth)
protected boolean shouldProcess(CrawlURI puri)
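Description copied from class: Processor
Determines whether the given uri should be processed by this processor.
shouldProcess in class Processor
puri - the URI to test
protected void innerProcess(CrawlURI puri)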
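Description copied from class: Processor
Actually performs the process. By the time this method is invoked, the given URI has passed the #ENABLED, the #DECIDE_RULES and the #shouldProcess(ProcessorURI) tests.
innerProcess in class Processor
puri - the URI to process
protected void handlePrerequisite(CrawlURI curi)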
curi - CrawlURI with prereq to consider
protected void outOfScope(CrawlURI caUri)
Description copied from class: Scoper
Called when a CrawlURI is ruled out of scope.
outOfScope in class Scoper
caUri - CrawlURI that is out of scope.
protected int getSchedulingFor(CrawlURI curi, Link wref, int preferenceDepthHops)
Determine scheduling for the curi.
As with the LinksScoper in general, this only handles extracted links; seeds do not pass through here, but are given MEDIUM priority. Imports into the frontier similarly do not pass through here, but are given NORMAL priority.
Copyright © 2003-2014 Internet Archive. All Rights Reserved.