public class BdbFrontier extends WorkQueueFrontier implements Checkpointable, org.springframework.beans.factory.BeanNameAware
Nested classes/interfaces inherited from interface Frontier: Frontier.FrontierGroup, Frontier.State
Modifier and Type | Field and Description |
---|---|
protected BdbModule | bdb |
protected String | beanName |
protected boolean | dumpPendingAtClose |
protected SortedMap<Integer,Queue<String>> | inactiveQueuesByPrecedence: All 'inactive' queues, not yet in active rotation. |
protected BdbMultipleWorkQueues | pendingUris: All URIs scheduled to be crawled. |
protected Checkpoint | recoveryCheckpoint |
protected StoredQueue<String> | retiredQueues: 'Retired' queues, no longer considered for activation. |
Fields inherited from class WorkQueueFrontier:
allQueues, appCtx, futureUris, highestPrecedenceWaiting, inProcessQueues, largestQueues, MAX_SNOOZED_IN_MEMORY, maxQueuesPerReportCategory, precedenceFloor, readyClassQueues, snoozedClassQueues, snoozedOverflow, snoozedOverflowCount, snoozeLongMs, uriUniqFilter
Fields inherited from class AbstractFrontier:
controller, dispositionInProgressLock, dispositionPending, disregardedUriCount, failedFetchCount, futureUriCount, kp, lastReachedState, loggerModule, managerThread, nextOrdinal, outboundLock, preparer, queuedUriCount, recover, scope, seeds, serverCache, sheetOverlaysManager, succeededFetchCount, targetState, totalProcessedBytes
Constructor and Description |
---|
BdbFrontier() |
Modifier and Type | Method and Description |
---|---|
void | close(): Release resources only needed when running. |
void | consistencyCheck(): Run a self-consistency check over queue collections, queues-of-queues, etc. |
protected void | consistencyMarkup(DisposableStoredSortedMap<String,String> queueSummaries, Iterable<?> queues, String mark) |
protected Queue<String> | createInactiveQueueForPrecedence(int precedence): Create an inactiveQueue to hold queue names at the given precedence. |
protected Queue<String> | createInactiveQueueForPrecedence(int precedence, boolean usePriorData): Optionally reuse prior data, for use when resuming from a checkpoint. |
protected BdbMultipleWorkQueues | createMultipleWorkQueues(): Create the single object (within which is one BDB database) inside which all the other queues live. |
void | doCheckpoint(Checkpoint checkpointInProgress): Do the actual checkpoint. |
void | dumpAllPendingToLog(): Dump all still-enqueued URIs to the crawl.log, without actually dequeuing. |
protected void | finalTasks(): Perform any tasks necessary before entering FINISH frontier state/FINISHED crawl state. |
void | finishCheckpoint(Checkpoint checkpointInProgress): Cleanup/unlock; need not complete for a checkpoint to be valid. |
boolean | getDumpPendingAtClose() |
Frontier.FrontierGroup | getGroup(CrawlURI curi): Get the 'frontier group' (usually queue) for the given CrawlURI. |
protected SortedMap<Integer,Queue<String>> | getInactiveQueuesByPrecedence(): Return a sorted map of all queues of WorkQueue keys, keyed by precedence. |
protected WorkQueue | getQueueFor(String classKey): Return the work queue for the given classKey, or null if no such queue exists. |
protected Queue<String> | getRetiredQueues(): Return queue of all retired queue names. |
CompositeData | getURIsList(String marker, int numberOfMatches, String pattern, boolean verbose): Return list of URLs. |
protected BdbMultipleWorkQueues | getWorkQueues() |
protected void | initAllQueues(): Initialize the allQueues field in an implementation-appropriate way. |
protected void | initOtherQueues(): Initialize all other internal queues in an implementation-appropriate way. |
void | setBdbModule(BdbModule bdb) |
void | setBeanName(String name) |
void | setDumpPendingAtClose(boolean dumpPendingAtClose) |
void | setRecoveryCheckpoint(Checkpoint checkpoint): Used to inform a bean that it should restore its state from the given Checkpoint when launched (Lifecycle start()). |
void | startCheckpoint(Checkpoint checkpointInProgress): Note a checkpoint is about to begin. |
protected boolean | workQueueDataOnDisk(): Returns true if the WorkQueue implementation of this Frontier stores its workload on disk instead of relying on serialization mechanisms. |
Methods inherited from class WorkQueueFrontier:
activateInactiveQueue, allNonemptyReportTo, allQueuesReportTo, appendQueueReports, averageDepth, checkFutures, congestionRatio, considerIncluded, deactivateQueue, deepestUri, deleted, deleteURIs, destroy, discoveredUriCount, findEligibleURI, forceWakeQueues, forget, getBalanceReplenishAmount, getErrorPenaltyAmount, getInactiveQueuesForPrecedence, getInProcessCount, getLargestQueuesCount, getMaxInWait, getMaxQueuesPerReportCategory, getPrecedenceFloor, getQueuePrecedencePolicy, getQueueTotalBudget, getSnoozedCount, getSnoozeLongMs, getTotalEligibleInactiveQueues, getTotalInactiveQueues, getTotalIneligibleInactiveQueues, getUriUniqFilter, handleQueue, initInternalQueues, isEmpty, processFinish, processScheduleAlways, processScheduleIfUnique, readyQueue, reconsiderRetiredQueues, reenqueueQueue, reportTo, retireQueue, schedule, sendToQueue, setApplicationContext, setBalanceReplenishAmount, setErrorPenaltyAmount, setLargestQueuesCount, setMaxQueuesPerReportCategory, setPrecedenceFloor, setQueuePrecedencePolicy, setQueueTotalBudget, setSnoozeLongMs, setUriUniqFilter, shortReportLegend, shortReportLineTo, shortReportMap, start, stop, updateHighestWaiting, wakeQueues
Methods inherited from class AbstractFrontier:
addedSeed, beginDisposition, concludedSeedBatch, crawlEnded, decrementQueuedCount, disregardedUriCount, doJournalAdded, doJournalDisregarded, doJournalEmitted, doJournalFinishedFailure, doJournalFinishedSuccess, doJournalReenqueued, doJournalRelocated, endDisposition, failedFetchCount, finished, finishedUriCount, futureUriCount, getClassKey, getCrawlController, getExtract404s, getExtractIndependently, getFrontierJournal, getFrontierPreparer, getKeyedProperties, getLoggerModule, getMaxOutlinks, getMaxRetries, getRecoveryLogEnabled, getRetryDelaySeconds, getScope, getSeeds, getServerCache, getSheetOverlaysManager, importRecoverFormat, importURIs, importURIsSimple, incrementDisregardedUriCount, incrementFailedFetchCount, incrementQueuedUriCount, incrementQueuedUriCount, incrementSucceededFetchCount, isDisregarded, isRunning, log, logNonfatalErrors, managementTasks, needsReenqueuing, next, nonseedLine, noteAboutToEmit, onApplicationEvent, overMaxRetries, pause, prepForFrontier, queuedUriCount, reachedState, receive, requestState, retryDelayFor, run, setCrawlController, setExtract404s, setExtractIndependently, setFrontierPreparer, setLoggerModule, setMaxOutlinks, setMaxRetries, setRecoveryLogEnabled, setRetryDelaySeconds, setScope, setSeeds, setServerCache, setSheetOverlaysManager, shortReportLine, startManagerThread, succeededFetchCount, tally, terminate, unpause
protected SortedMap<Integer,Queue<String>> inactiveQueuesByPrecedence
protected StoredQueue<String> retiredQueues
protected transient BdbMultipleWorkQueues pendingUris
protected BdbModule bdb
protected String beanName
protected boolean dumpPendingAtClose
protected Checkpoint recoveryCheckpoint
public void setBdbModule(BdbModule bdb)
public void setBeanName(String name)
Specified by: setBeanName in interface org.springframework.beans.factory.BeanNameAware
public boolean getDumpPendingAtClose()
public void setDumpPendingAtClose(boolean dumpPendingAtClose)
protected SortedMap<Integer,Queue<String>> getInactiveQueuesByPrecedence()
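Description copied from class: WorkQueueFrontier
Return a sorted map of all queues of WorkQueue keys, keyed by precedence.
Overrides: getInactiveQueuesByPrecedence in class WorkQueueFrontier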
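The declared type `SortedMap<Integer,Queue<String>>` suggests how inactive queues are organised: each precedence value maps to a queue of work-queue names, and the map's ordering decides which precedence is consulted first. The following is a minimal sketch of that data structure only, not the Heritrix implementation. The class `InactiveQueuesSketch` and its methods `deactivate` and `nextToActivate` are hypothetical, and it assumes that lower integer values are served first, as the natural ordering of the sorted map implies.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical illustration of a precedence-keyed map of queue-name queues. */
final class InactiveQueuesSketch {
    // precedence -> FIFO queue of work-queue names waiting at that precedence
    private final SortedMap<Integer, Queue<String>> inactiveByPrecedence = new TreeMap<>();

    /** Park a work queue (identified by name) at the given precedence. */
    public void deactivate(int precedence, String queueName) {
        inactiveByPrecedence
                .computeIfAbsent(precedence, p -> new ArrayDeque<>())
                .add(queueName);
    }

    /**
     * Take the next queue name to activate: the oldest entry at the lowest
     * precedence value that still has entries, or null if nothing is waiting.
     */
    public String nextToActivate() {
        for (Map.Entry<Integer, Queue<String>> entry : inactiveByPrecedence.entrySet()) {
            String name = entry.getValue().poll();
            if (name != null) {
                return name;
            }
        }
        return null;
    }
}
```

In `BdbFrontier` the per-precedence queues are created by `createInactiveQueueForPrecedence`; the sketch substitutes an in-memory `ArrayDeque` for whatever queue that method returns.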
protected Queue<String> getRetiredQueues()
Description copied from class: WorkQueueFrontier
Return queue of all retired queue names.
Overrides: getRetiredQueues in class WorkQueueFrontier
protected BdbMultipleWorkQueues createMultipleWorkQueues() throws com.sleepycat.je.DatabaseException
Throws: com.sleepycat.je.DatabaseException
protected WorkQueue getQueueFor(String classKey)
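Overrides: getQueueFor in class WorkQueueFrontier
Parameters: classKey - key to look for
public Frontier.FrontierGroup getGroup(CrawlURI curi)
Description copied from interface: Frontier
Get the 'frontier group' (usually queue) for the given CrawlURI.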
public CompositeData getURIsList(String marker, int numberOfMatches, String pattern, boolean verbose)
Specified by: getURIsList in interface Frontier
Parameters (no descriptions given): marker, numberOfMatches, verbose
See also: FrontierMarker, #getInitialMarker(String, boolean)
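The parameter names of `getURIsList` (a `marker`, a `numberOfMatches` limit and a `pattern`) suggest marker-based paging over the pending URIs. The sketch below illustrates that general pattern under stated assumptions; it is not the Heritrix implementation. The class `UriPagingSketch`, its `Page` type, the use of a sorted in-memory set, and the treatment of `pattern` as a regular expression are all hypothetical, and the real method returns a `CompositeData` rather than a plain list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.regex.Pattern;

/** Hypothetical illustration of marker-based paging with a regex filter. */
final class UriPagingSketch {

    /** One page of matches plus the marker to pass to the next call (null when exhausted). */
    static final class Page {
        final List<String> uris;
        final String nextMarker;

        Page(List<String> uris, String nextMarker) {
            this.uris = uris;
            this.nextMarker = nextMarker;
        }
    }

    /**
     * Return up to numberOfMatches URIs matching the regex, scanning strictly
     * after marker (a null marker starts from the beginning).
     */
    static Page list(NavigableSet<String> pending, String marker,
                     int numberOfMatches, String pattern) {
        Pattern regex = Pattern.compile(pattern);
        NavigableSet<String> rest =
                (marker == null) ? pending : pending.tailSet(marker, false);
        List<String> matches = new ArrayList<>();
        String lastVisited = null;
        for (String uri : rest) {
            if (matches.size() == numberOfMatches) {
                // Page is full and more entries remain: resume after the last one visited.
                return new Page(matches, lastVisited);
            }
            lastVisited = uri;
            if (regex.matcher(uri).matches()) {
                matches.add(uri);
            }
        }
        return new Page(matches, null); // scanned to the end
    }
}
```

A caller would loop, feeding each returned `nextMarker` into the next call until it comes back null.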
protected void finalTasks()
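Description copied from class: AbstractFrontier
Perform any tasks necessary before entering FINISH frontier state/FINISHED crawl state.
Overrides: finalTasks in class AbstractFrontier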
public void close()
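Description copied from class: WorkQueueFrontier
Release resources only needed when running.
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Overrides: close in class WorkQueueFrontier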
protected BdbMultipleWorkQueues getWorkQueues()
protected boolean workQueueDataOnDisk()
Description copied from class: WorkQueueFrontier
Returns true if the WorkQueue implementation of this Frontier stores its workload on disk instead of relying on serialization mechanisms.
TODO: rename! (this is a very misleading name) or kill (don't see any implementations that return false)
Overrides: workQueueDataOnDisk in class WorkQueueFrontier
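The checkpoint callbacks documented next (`startCheckpoint`, `doCheckpoint`, `finishCheckpoint`) read as three phases: announce, write, then clean up, where the cleanup "need not complete for a checkpoint to be valid". The sketch below shows one way a coordinator might drive such phases. It is an illustration under assumptions, not Heritrix code: the `Participant` interface and `runCheckpoint` method are hypothetical, and a plain `String` name stands in for the `Checkpoint` object the real methods receive.

```java
import java.util.List;

/** Hypothetical illustration of a three-phase checkpoint protocol. */
final class CheckpointPhasesSketch {

    /** Stand-in for the three per-checkpoint callbacks; not the real Checkpointable. */
    interface Participant {
        void startCheckpoint(String checkpointName);
        void doCheckpoint(String checkpointName) throws Exception;
        void finishCheckpoint(String checkpointName);
    }

    /**
     * Run all three phases across the participants. Returns true only if every
     * doCheckpoint call completed; finishCheckpoint always runs so that any
     * locks taken in startCheckpoint are released.
     */
    static boolean runCheckpoint(String name, List<Participant> participants) {
        for (Participant p : participants) {
            p.startCheckpoint(name);      // phase 1: note that a checkpoint begins
        }
        boolean valid = true;
        try {
            for (Participant p : participants) {
                p.doCheckpoint(name);     // phase 2: write the actual checkpoint data
            }
        } catch (Exception e) {
            valid = false;                // validity is decided by phase 2 alone
        } finally {
            for (Participant p : participants) {
                p.finishCheckpoint(name); // phase 3: cleanup/unlock
            }
        }
        return valid;
    }
}
```

Keeping validity tied to the second phase mirrors the documented note that cleanup is not required for the checkpoint to count.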
public void startCheckpoint(Checkpoint checkpointInProgress)
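Description copied from interface: Checkpointable
Note a checkpoint is about to begin.
Specified by: startCheckpoint in interface Checkpointable
Parameters: checkpointInProgress - Checkpoint
public void doCheckpoint(Checkpoint checkpointInProgress)
Description copied from interface: Checkpointable
Do the actual checkpoint.
Specified by: doCheckpoint in interface Checkpointable
Parameters: checkpointInProgress - Checkpoint
public void finishCheckpoint(Checkpoint checkpointInProgress)
Description copied from interface: Checkpointable
Cleanup/unlock; need not complete for a checkpoint to be valid.
Specified by: finishCheckpoint in interface Checkpointable
Parameters: checkpointInProgress - Checkpoint
public void setRecoveryCheckpoint(Checkpoint checkpoint)
Description copied from interface: Checkpointable
Used to inform a bean that it should restore its state from the given Checkpoint when launched (Lifecycle start()).
Specified by: setRecoveryCheckpoint in interface Checkpointable
Parameters: checkpoint - Checkpoint
protected void initAllQueues() throws com.sleepycat.je.DatabaseException
Description copied from class: WorkQueueFrontier
Initialize the allQueues field in an implementation-appropriate way.
Overrides: initAllQueues in class WorkQueueFrontier
Throws: com.sleepycat.je.DatabaseException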
protected void initOtherQueues() throws com.sleepycat.je.DatabaseException
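Description copied from class: WorkQueueFrontier
Initialize all other internal queues in an implementation-appropriate way.
Overrides: initOtherQueues in class WorkQueueFrontier
Throws: com.sleepycat.je.DatabaseException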
protected Queue<String> createInactiveQueueForPrecedence(int precedence)
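Description copied from class: WorkQueueFrontier
Create an inactiveQueue to hold queue names at the given precedence.
Overrides: createInactiveQueueForPrecedence in class WorkQueueFrontier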
protected Queue<String> createInactiveQueueForPrecedence(int precedence, boolean usePriorData)
public void dumpAllPendingToLog() throws com.sleepycat.je.DatabaseException
Throws: com.sleepycat.je.DatabaseException
public void consistencyCheck()
protected void consistencyMarkup(DisposableStoredSortedMap<String,String> queueSummaries, Iterable<?> queues, String mark)
Copyright © 2003-2014 Internet Archive. All Rights Reserved.