public abstract class WorkQueue extends Object implements Frontier.FrontierGroup, Serializable, org.archive.util.Reporter, Delayed, IdentityCacheable
Modifier and Type | Field and Description |
---|---|
protected boolean | active - Whether queue is active (ready/in-process/snoozed) or on a waiting queue |
protected String | classKey - The classKey |
protected long | costCount - Total number of items charged against queue; with totalExpenditure can be used to calculate 'average cost'. |
protected long | count - Total number of stored items |
protected long | enqueueCount - Total number of items ever enqueued |
protected long | errorCount - Count of errors encountered |
protected long | expenditureAtLastActivation - Record of expenditures at last activation (session start) |
protected boolean | isManaged - Whether queue is already in lifecycle stage |
protected int | lastCost - Cost of the last item to be charged against queue |
protected long | lastDequeueTime - Time of last dequeue (disposition of some URI) |
protected String | lastPeeked - Last URI peeked |
protected String | lastQueued - Last URI enqueued |
protected CrawlURI | peekItem - The next item to be returned |
protected PrecedenceProvider | precedenceProvider - Assigned precedence |
protected boolean | retired |
protected int | sessionBudget - Per-session 'budget' controlling activity duration |
protected FetchStats | substats - Substats for all CrawlURIs in this group |
protected long | totalBudget - Total to spend on this queue over its lifetime |
protected long | totalExpenditure - Running tally of total expenditures on this queue |
protected long | wakeTime - Time to wake, if snoozed |
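The costCount description says it can be combined with totalExpenditure to compute an 'average cost'. A minimal sketch of that arithmetic follows. The class `CostTally` and its `charge` and `averageCost` methods are hypothetical stand-ins, not part of WorkQueue, and the assumption that every charge updates all three fields together is not confirmed by this page.

```java
// Hypothetical stand-in for the cost-tracking fields; not the Heritrix class.
final class CostTally {
    long costCount;        // number of items charged against the queue
    long totalExpenditure; // running tally of all charges
    int lastCost;          // cost of the most recent charge

    // Assumption: each charge updates all three fields together.
    void charge(int amount) {
        totalExpenditure += amount;
        lastCost = amount;
        costCount++;
    }

    // Average cost per charged item; 0.0 when nothing has been charged yet.
    double averageCost() {
        return costCount == 0 ? 0.0 : (double) totalExpenditure / costCount;
    }
}
```

Guarding the division keeps a freshly created queue from reporting NaN before its first charge.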
Modifier and Type | Method and Description |
---|---|
int | compareTo(Delayed obj) |
void | considerActive() - Begin an 'active' session, which begins when a queue first offers a URI for crawling, and continues until it is deactivated (for example, for session-budget reasons). |
protected abstract void | deleteItem(WorkQueueFrontier frontier, CrawlURI item) - Removes the given item from the queue. |
long | deleteMatching(WorkQueueFrontier frontier, String match) - Delete URIs matching the given pattern from this queue. |
protected abstract long | deleteMatchingFromQueue(WorkQueueFrontier frontier, String match) - Delete URIs matching the given pattern from this queue. |
protected void | dequeue(WorkQueueFrontier frontier, CrawlURI expected) - Remove the peekItem from the queue and adjust the count. |
protected long | enqueue(WorkQueueFrontier frontier, CrawlURI curi) - Add the given CrawlURI, noting its addition in running count. |
void | expend(int amount) - Decrease the internal running budget by the given amount. |
String | getClassKey() |
long | getCount() - Count of URIs in this queue. |
long | getDelay(TimeUnit unit) |
String | getKey() |
int | getPrecedence() |
PrecedenceProvider | getPrecedenceProvider() |
protected long | getSessionBalance() |
int | getSessionBudget() - Return current session 'activity budget balance' |
FetchStats | getSubstats() |
long | getTotalExpenditure() - Return the tally of all expenditures on this queue |
long | getWakeTime() |
protected abstract void | insertItem(WorkQueueFrontier frontier, CrawlURI curi, boolean overwriteIfPresent) - Insert the given curi, whether it is already present or not. |
boolean | isManaged() - Whether the queue is already in a lifecycle stage (such as ready, in-progress, snoozed) and thus should not be redundantly inserted to readyClassQueues |
boolean | isOverSessionBudget() - Check whether queue has temporarily (session) exceeded its budget. |
boolean | isOverTotalBudget() - Check whether queue has permanently (total) exceeded its budget. |
boolean | isRetired() |
void | makeDirty() |
void | noteDeactivated() - Update queue state to recognize it has been sent to one of the inactive (by-precedence) queues, waiting for a turn. |
void | noteError(int penalty) - Note an error and assess an extra penalty. |
void | noteExhausted() - Update queue state to recognize it has been completely exhausted, and is no longer on any of the ready/inactive queues-of-queues |
CrawlURI | peek(WorkQueueFrontier frontier) - Return the topmost queue item and remember it, such that even later higher-priority inserts don't change it. |
protected abstract CrawlURI | peekItem(WorkQueueFrontier frontier) - Returns first item from queue (does not delete) |
void | reportTo(PrintWriter writer) |
void | setIdentityCache(ObjectIdentityCache<?> cache) |
void | setPrecedenceProvider(PrecedenceProvider precedenceProvider) |
protected void | setRetired(boolean b) - Set the retired status of this queue. |
protected void | setSessionBudget(int budget) - Set the session 'activity budget' to the given value. |
protected void | setTotalBudget(long budget) - Set the total expenditure level allowable before queue is considered inherently 'over-budget'. |
void | setWakeTime(long l) |
String | shortReportLegend() |
String | shortReportLine() |
void | shortReportLineTo(PrintWriter writer) |
Map<String,Object> | shortReportMap() |
void | tally(CrawlURI curi, FetchStats.Stage stage) |
String | toString() |
void | unpeek(CrawlURI expected) - Forgive the peek, allowing a subsequent peek to return a different item. |
protected void | update(WorkQueueFrontier frontier, CrawlURI curi) - Update the given CrawlURI, which should already be present. |
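The peek, unpeek and dequeue descriptions above imply a "sticky" peek: once an item has been peeked, later higher-priority inserts do not displace it until it is dequeued or the peek is forgiven. The sketch below models that contract only. `StickyPeekQueue` is a hypothetical class, plain strings stand in for CrawlURI, the frontier argument is omitted, and natural string ordering stands in for whatever priority order a real queue uses.

```java
import java.util.PriorityQueue;

// Hypothetical model of a "sticky" peek; strings stand in for CrawlURI.
final class StickyPeekQueue {
    private final PriorityQueue<String> items = new PriorityQueue<>();
    private String peeked; // remembered head, or null if nothing is peeked

    void enqueue(String uri) {
        items.add(uri);
    }

    // Returns the head and remembers it, so later inserts cannot displace it.
    String peek() {
        if (peeked == null) {
            peeked = items.peek();
        }
        return peeked;
    }

    // Forgets the remembered item; the next peek re-reads the current head.
    void unpeek(String expected) {
        if (expected != null && expected.equals(peeked)) {
            peeked = null;
        }
    }

    // Removes the remembered item; the caller states which item it expects.
    void dequeue(String expected) {
        if (peeked == null || !peeked.equals(expected)) {
            throw new IllegalStateException("dequeue of an item that was not peeked");
        }
        items.remove(peeked);
        peeked = null;
    }

    long getCount() {
        return items.size();
    }
}
```

Passing the expected item to unpeek and dequeue lets the queue detect a caller that is working from a stale peek.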
protected final String classKey
protected boolean active
protected long count
protected long enqueueCount
protected boolean isManaged
protected long wakeTime
protected PrecedenceProvider precedenceProvider
protected int sessionBudget
protected int lastCost
protected long costCount
protected long totalExpenditure
protected long expenditureAtLastActivation
protected long totalBudget
protected transient CrawlURI peekItem
protected String lastQueued
protected String lastPeeked
protected long lastDequeueTime
protected long errorCount
protected FetchStats substats
protected boolean retired
public WorkQueue(String pClassKey)
public long deleteMatching(WorkQueueFrontier frontier, String match)
frontier -
match -
protected long enqueue(WorkQueueFrontier frontier, CrawlURI curi)
frontier - Work queues manager.
curi - CrawlURI to insert.
public CrawlURI peek(WorkQueueFrontier frontier)
frontier - Work queues manager.
protected void dequeue(WorkQueueFrontier frontier, CrawlURI expected)
frontier - Work queues manager.
protected void setSessionBudget(int budget)
budget - the budget to use
public int getSessionBudget()
public void considerActive()
protected void setTotalBudget(long budget)
budget -
public boolean isOverSessionBudget()
public boolean isOverTotalBudget()
public long getTotalExpenditure()
public void expend(int amount)
amount - to decrement
public void noteError(int penalty)
penalty - additional amount to deduct
public void setWakeTime(long l)
l -
public long getWakeTime()
public String getClassKey()
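The budget-related members (setSessionBudget, setTotalBudget, considerActive, expend, noteError, isOverSessionBudget, isOverTotalBudget) can be read as a small accounting model. The sketch below is one plausible reading, not the Heritrix source: `BudgetSketch` is a hypothetical class, and the exact comparisons, the treatment of a non-positive session budget as "no limit", and the treatment of a negative total budget as "no limit" are all assumptions.

```java
// Hypothetical accounting model for session and total budgets.
final class BudgetSketch {
    boolean active;
    int sessionBudget;                // assumption: <= 0 means no session limit
    long totalBudget = -1;            // assumption: negative means no total limit
    long totalExpenditure;            // running tally of all charges
    long expenditureAtLastActivation; // snapshot taken at session start
    long errorCount;

    // Starts a session: snapshot the tally so session spending can be measured.
    void considerActive() {
        if (!active) {
            active = true;
            expenditureAtLastActivation = totalExpenditure;
        }
    }

    void noteDeactivated() {
        active = false;
    }

    // Spending raises the tally, which lowers the remaining session balance.
    void expend(int amount) {
        totalExpenditure += amount;
    }

    // An error is counted and charged an extra penalty.
    void noteError(int penalty) {
        totalExpenditure += penalty;
        errorCount++;
    }

    long getSessionBalance() {
        return sessionBudget - (totalExpenditure - expenditureAtLastActivation);
    }

    boolean isOverSessionBudget() {
        return sessionBudget > 0 && getSessionBalance() < 0;
    }

    boolean isOverTotalBudget() {
        return totalBudget >= 0 && totalExpenditure >= totalBudget;
    }
}
```

In this reading, exceeding the session budget is temporary because a later considerActive call takes a fresh snapshot, while exceeding the total budget is permanent because totalExpenditure never decreases.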
public void unpeek(CrawlURI expected)
public final int compareTo(Delayed obj)
compareTo
in interface Comparable<Delayed>
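Because the class implements Delayed (getDelay plus compareTo) and carries a wakeTime, instances can be held in a java.util.concurrent.DelayQueue until their wake time passes. The sketch below shows that pattern with a hypothetical `SnoozedQueue` class. Treating wakeTime as epoch milliseconds and breaking ties by classKey are assumptions, not documented behaviour of WorkQueue.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical snoozed-queue stand-in, ordered by wake time.
final class SnoozedQueue implements Delayed {
    final String classKey;
    final long wakeTime; // assumption: epoch milliseconds

    SnoozedQueue(String classKey, long wakeTime) {
        this.classKey = classKey;
        this.wakeTime = wakeTime;
    }

    // Remaining delay in the requested unit; zero or negative means "awake".
    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(wakeTime - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
    }

    // Earlier wake times sort first; ties are broken by classKey (assumption).
    @Override
    public int compareTo(Delayed other) {
        if (other instanceof SnoozedQueue) {
            SnoozedQueue o = (SnoozedQueue) other;
            int byWake = Long.compare(wakeTime, o.wakeTime);
            return byWake != 0 ? byWake : classKey.compareTo(o.classKey);
        }
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                other.getDelay(TimeUnit.MILLISECONDS));
    }

    // Drains every queue whose wake time has passed and returns how many woke.
    static int wakeExpired(DelayQueue<SnoozedQueue> snoozed) {
        int woken = 0;
        while (snoozed.poll() != null) {
            woken++;
        }
        return woken;
    }
}
```

DelayQueue.poll() returns null while the earliest element is still delayed, so a caller can drain only the queues that are ready without blocking.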
protected void update(WorkQueueFrontier frontier, CrawlURI curi)
frontier - Work queues manager.
curi - CrawlURI to update.
public long getCount()
protected abstract void insertItem(WorkQueueFrontier frontier, CrawlURI curi, boolean overwriteIfPresent) throws IOException
frontier - WorkQueueFrontier.
curi - CrawlURI to insert.
IOException - if there was a problem while inserting the item
protected abstract long deleteMatchingFromQueue(WorkQueueFrontier frontier, String match) throws IOException
frontier - WorkQueues manager.
match - the pattern to match
IOException - if there was a problem while deleting
protected abstract void deleteItem(WorkQueueFrontier frontier, CrawlURI item) throws IOException
frontier - Work queues manager.
IOException - if there was a problem while deleting the item
protected abstract CrawlURI peekItem(WorkQueueFrontier frontier) throws IOException
IOException - if there was a problem while peeking
public Map<String,Object> shortReportMap()
shortReportMap
in interface org.archive.util.Reporter
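The four abstract methods (insertItem, deleteMatchingFromQueue, deleteItem, peekItem) leave storage to subclasses, while enqueue, dequeue and deleteMatching are documented as maintaining the running counts. The sketch below illustrates that split with hypothetical classes `QueueTemplate` and `InMemoryQueue`. Strings stand in for CrawlURI, the frontier argument is omitted, the match argument is assumed to be a regular expression, and wrapping IOException in UncheckedIOException is a choice made for this sketch rather than documented behaviour.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

// Hypothetical base class: bookkeeping lives here, storage in subclasses.
abstract class QueueTemplate {
    protected long count;        // items currently stored
    protected long enqueueCount; // items ever enqueued

    protected abstract void insertItem(String uri) throws IOException;
    protected abstract void deleteItem(String uri) throws IOException;
    protected abstract String peekItem() throws IOException;
    protected abstract long deleteMatchingFromQueue(String regex) throws IOException;

    // Stores the item, then notes the addition in both counters.
    long enqueue(String uri) {
        try {
            insertItem(uri);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        count++;
        enqueueCount++;
        return count;
    }

    String peek() {
        try {
            return peekItem();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes the item from storage and adjusts the stored-item count.
    void dequeue(String expected) {
        try {
            deleteItem(expected);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        count--;
    }

    // Deletes all matching items and returns how many were removed.
    long deleteMatching(String regex) {
        try {
            long deleted = deleteMatchingFromQueue(regex);
            count -= deleted;
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// In-memory storage; a real subclass might use a disk-backed store instead.
final class InMemoryQueue extends QueueTemplate {
    private final Deque<String> items = new ArrayDeque<>();

    @Override
    protected void insertItem(String uri) {
        items.addLast(uri);
    }

    @Override
    protected void deleteItem(String uri) {
        items.remove(uri);
    }

    @Override
    protected String peekItem() {
        return items.peekFirst();
    }

    @Override
    protected long deleteMatchingFromQueue(String regex) {
        Pattern pattern = Pattern.compile(regex);
        int before = items.size();
        items.removeIf(uri -> pattern.matcher(uri).matches());
        return before - items.size();
    }
}
```

Keeping the counters in the base class means every storage backend reports the same counts without reimplementing the bookkeeping.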
protected long getSessionBalance()
public void shortReportLineTo(PrintWriter writer)
shortReportLineTo
in interface org.archive.util.Reporter
public String shortReportLegend()
shortReportLegend
in interface org.archive.util.Reporter
public String shortReportLine()
public void reportTo(PrintWriter writer)
reportTo
in interface org.archive.util.Reporter
writer -
IOException
public FetchStats getSubstats()
getSubstats
in interface FetchStats.HasFetchStats
protected void setRetired(boolean b)
b - new value for retired status
public boolean isRetired()
public PrecedenceProvider getPrecedenceProvider()
public void setPrecedenceProvider(PrecedenceProvider precedenceProvider)
precedenceProvider - the precedenceProvider to set
public int getPrecedence()
public void tally(CrawlURI curi, FetchStats.Stage stage)
tally
in interface FetchStats.CollectsFetchStats
public void noteDeactivated()
public void noteExhausted()
public boolean isManaged()
public String getKey()
getKey
in interface IdentityCacheable
public void makeDirty()
makeDirty
in interface IdentityCacheable
public void setIdentityCache(ObjectIdentityCache<?> cache)
setIdentityCache
in interface IdentityCacheable
Copyright © 2003-2014 Internet Archive. All Rights Reserved.