public class CandidatesProcessor extends Processor
Modifier and Type | Field and Description |
---|---|
protected CandidateChain | candidateChain: The candidate chain. |
protected Frontier | frontier: The frontier to use. |
protected CrawlerLoggerModule | loggerModule |
protected SeedModule | seeds |
protected static int | SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS |
protected SheetOverlaysManager | sheetOverlaysManager |
Constructor and Description |
---|
CandidatesProcessor(): Usual no-argument constructor. |
Modifier and Type | Method and Description |
---|---|
protected boolean | checkForSeedPromotion(CrawlURI curi): Check if the URI needs special 'discovered seed' treatment. |
CandidateChain | getCandidateChain() |
Frontier | getFrontier() |
CrawlerLoggerModule | getLoggerModule() |
boolean | getProcessErrorOutlinks() |
SeedModule | getSeeds() |
boolean | getSeedsRedirectNewSeeds() |
SheetOverlaysManager | getSheetOverlaysManager() |
protected void | innerProcess(CrawlURI curi): Run the candidate chain on each of (1) any prerequisite, if present; (2) any outCandidates, if present; and (3) all outlinks, if appropriate. |
protected int | runCandidateChain(CrawlURI candidate, CrawlURI source): Run candidateChain on a single candidate CrawlURI; if its reported status is nonnegative, schedule it to the frontier. |
void | setCandidateChain(CandidateChain candidateChain) |
void | setFrontier(Frontier frontier) |
void | setLoggerModule(CrawlerLoggerModule loggerModule) |
void | setProcessErrorOutlinks(boolean errorOutlinks) |
void | setSeeds(SeedModule seeds) |
void | setSeedsRedirectNewSeeds(boolean redirect) |
void | setSheetOverlaysManager(SheetOverlaysManager sheetOverlaysManager) |
protected boolean | shouldProcess(CrawlURI puri): Determines whether the given URI should be processed by this processor. |
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
protected CandidateChain candidateChain
protected Frontier frontier
protected CrawlerLoggerModule loggerModule
protected static final int SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS
protected SeedModule seeds
protected SheetOverlaysManager sheetOverlaysManager
public CandidateChain getCandidateChain()
public void setCandidateChain(CandidateChain candidateChain)
public Frontier getFrontier()
public void setFrontier(Frontier frontier)
public CrawlerLoggerModule getLoggerModule()
public void setLoggerModule(CrawlerLoggerModule loggerModule)
public boolean getSeedsRedirectNewSeeds()
public void setSeedsRedirectNewSeeds(boolean redirect)
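This page lists the `seedsRedirectNewSeeds` property and the `SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS` constant but does not say how they interact. One plausible reading, sketched here purely as an assumption, is that when the flag is on, a candidate reached by a redirect from a seed, within a bounded hop count, is itself treated as a seed. `SeedRedirectSketch`, its `Candidate` record, the `'R'` redirect hop marker, and the limit value `5` are all hypothetical stand-ins, not Heritrix API.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of a "seed redirect yields a new seed" rule.
 * Not Heritrix code: the types, the 'R' hop marker, and the limit value are assumptions.
 */
final class SeedRedirectSketch {
    // Assumed value for illustration only; this page does not state the real constant's value.
    static final int SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS = 5;

    /** Minimal candidate model: hop path from its seed (e.g. "LR") and whether its source is a seed. */
    record Candidate(String hopPath, boolean sourceIsSeed) {
        Candidate {
            Objects.requireNonNull(hopPath, "hopPath");
        }

        /** Last hop character, or NUL for an empty path. */
        char lastHop() {
            return hopPath.isEmpty() ? '\0' : hopPath.charAt(hopPath.length() - 1);
        }

        int hopCount() {
            return hopPath.length();
        }
    }

    private final boolean seedsRedirectNewSeeds;

    SeedRedirectSketch(boolean seedsRedirectNewSeeds) {
        this.seedsRedirectNewSeeds = seedsRedirectNewSeeds;
    }

    /** Promote only if the flag is on, the source is a seed, the last hop is a redirect, and the path is short. */
    boolean shouldPromote(Candidate c) {
        return seedsRedirectNewSeeds
                && c.sourceIsSeed()
                && c.lastHop() == 'R'                          // assumed: 'R' marks a redirect hop
                && c.hopCount() < SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS;
    }
}
```

Under these assumptions, with the flag enabled a candidate with hop path `"R"` from a seed is promoted, while `"LLLLR"` is not, because its five hops are not below the assumed limit of 5.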
public boolean getProcessErrorOutlinks()
public void setProcessErrorOutlinks(boolean errorOutlinks)
public SeedModule getSeeds()
public void setSeeds(SeedModule seeds)
public SheetOverlaysManager getSheetOverlaysManager()
public void setSheetOverlaysManager(SheetOverlaysManager sheetOverlaysManager)
protected boolean shouldProcess(CrawlURI puri)
Specified by: shouldProcess in class Processor
Parameters: puri - the URI to test
protected int runCandidateChain(CrawlURI candidate, CrawlURI source) throws InterruptedException
Parameters: candidate - CrawlURI to consider
source - CrawlURI from which candidate was discovered/derived
Throws: InterruptedException
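The documented contract of runCandidateChain (run the chain on one candidate, then schedule it only if its resulting status is nonnegative) can be sketched with minimal stand-in types. `Uri`, `Chain`, and `Scheduler` are hypothetical classes invented for illustration. They are not the Heritrix `CrawlURI`, `CandidateChain`, or `Frontier` APIs, and the sketch omits anything the real method may do beyond what this page states.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins illustrating the documented runCandidateChain contract; not Heritrix code. */
final class RunCandidateChainSketch {
    /** Minimal URI model: a string plus a mutable status (negative means rejected). */
    static final class Uri {
        final String uri;
        int fetchStatus;

        Uri(String uri) {
            this.uri = uri;
        }
    }

    /** Stand-in for a candidate chain: it may set a negative status to reject a candidate. */
    interface Chain {
        void process(Uri candidate, Uri source) throws InterruptedException;
    }

    /** Stand-in for a frontier that simply records what was scheduled. */
    static final class Scheduler {
        final List<String> scheduled = new ArrayList<>();

        void schedule(Uri u) {
            scheduled.add(u.uri);
        }
    }

    private final Chain chain;
    private final Scheduler frontier;

    RunCandidateChainSketch(Chain chain, Scheduler frontier) {
        this.chain = chain;
        this.frontier = frontier;
    }

    /** Runs the chain on one candidate; schedules it only if the resulting status is >= 0. */
    int runCandidateChain(Uri candidate, Uri source) throws InterruptedException {
        chain.process(candidate, source);
        int status = candidate.fetchStatus;
        if (status >= 0) {
            frontier.schedule(candidate);
        }
        return status;
    }
}
```

Returning the post-chain status, rather than a boolean, lets a caller distinguish among rejection reasons if the chain uses several negative codes. That is a design choice of this sketch, not something this page specifies.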
protected void innerProcess(CrawlURI curi) throws InterruptedException
Specified by: innerProcess in class Processor
Parameters: curi - the URI to process
Throws: InterruptedException - if the thread is interrupted
See Also: Processor.innerProcess(org.archive.modules.CrawlURI)
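The three-part order given for innerProcess (prerequisite, then outCandidates, then outlinks "if appropriate") can be sketched as follows. Everything here is a hypothetical stand-in: the `Page` model, the `Consumer` used in place of runCandidateChain, and in particular the reading of "if appropriate" as "the fetch status is not an error, or processErrorOutlinks is enabled". That reading is an assumption suggested by the `processErrorOutlinks` property and is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of the documented innerProcess ordering; not Heritrix code. */
final class InnerProcessSketch {
    /** Minimal model of a processed URI and the candidates it produced. */
    static final class Page {
        String prerequisite;                                   // null when there is none
        final List<String> outCandidates = new ArrayList<>();
        final List<String> outlinks = new ArrayList<>();
        int fetchStatus;                                       // e.g. an HTTP status code
    }

    private final boolean processErrorOutlinks;
    private final Consumer<String> runCandidateChain;          // stand-in for runCandidateChain(candidate, source)

    InnerProcessSketch(boolean processErrorOutlinks, Consumer<String> runCandidateChain) {
        this.processErrorOutlinks = processErrorOutlinks;
        this.runCandidateChain = runCandidateChain;
    }

    void innerProcess(Page page) {
        // (1) Any prerequisite, if present.
        if (page.prerequisite != null) {
            runCandidateChain.accept(page.prerequisite);
        }
        // (2) Any outCandidates, if present.
        for (String candidate : page.outCandidates) {
            runCandidateChain.accept(candidate);
        }
        // (3) All outlinks, "if appropriate": assumed here to mean the status is within
        //     200-399, or that processErrorOutlinks is enabled.
        boolean errorPage = page.fetchStatus < 200 || page.fetchStatus >= 400;
        if (!errorPage || processErrorOutlinks) {
            for (String link : page.outlinks) {
                runCandidateChain.accept(link);
            }
        }
    }
}
```

With processErrorOutlinks disabled, a page fetched with status 404 still passes its prerequisite and outCandidates to the chain, but its outlinks are skipped.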
protected boolean checkForSeedPromotion(CrawlURI curi)
Parameters: curi
Copyright © 2003-2014 Internet Archive. All Rights Reserved.