Class | Description |
---|---|
AbstractFrontier | Shared facilities for Frontier implementations. |
AntiCalendarCostAssignmentPolicy | CostAssignmentPolicy that further penalizes URIs containing calendar-suggestive strings by one extra unit of cost. |
AssignmentLevelSurtQueueAssignmentPolicy | Creates a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain). |
BdbFrontier | A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues) and pending URIs. |
BdbMultipleWorkQueues | A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs. |
BdbWorkQueue | One independent queue of items with the same 'classKey' (e.g. host). |
BucketQueueAssignmentPolicy | Uses target IPs as the basis for queue assignment, distributing them over a fixed number of sub-queues. |
CostAssignmentPolicy | Calculates an integer 'cost' value for the given CrawlURI. |
FrontierJournal | Helper class for managing a simple Frontier change-events journal, which is useful for recovering from crawl problems. |
HostnameQueueAssignmentPolicy | QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI. |
IPQueueAssignmentPolicy | Uses the target IP as the basis for queue assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy. |
QueueAssignmentPolicy | Establishes a mapping from CrawlURIs to String keys (queue names). |
RecyclingSerialBinding<K> | A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer across repeated serializations and after mid-serialization expansions. |
SurtAuthorityQueueAssignmentPolicy | QueueAssignmentPolicy based on the SURT form of the hostname. |
UnitCostAssignmentPolicy | A CostAssignmentPolicy that uses a constant value of 1 for all CrawlURIs. |
URIAuthorityBasedQueueAssignmentPolicy | Base QueueAssignmentPolicy for policies that derive the queue key from the URI authority (for example, its hostname or SURT form). |
WagCostAssignmentPolicy | A CostAssignmentPolicy based on some wild guesses about which kinds of URIs should be deferred into the (potentially never-crawled) future. |
WorkQueue | A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar). |
WorkQueueFrontier | A common Frontier base using several queues to hold pending URIs. |
ZeroCostAssignmentPolicy | CostAssignmentPolicy that considers all URIs costless, essentially disabling budgeting features. |
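
The queue-assignment and cost-assignment ideas summarized in the table can be illustrated with a small standalone sketch. It does not use the Heritrix classes: `FrontierPolicySketch` and all of its methods are hypothetical names invented for illustration, the default-port fallback and the hash-based bucketing are assumptions, and the "calendar-suggestive" test is reduced to a plain substring check.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch only; none of these names belong to Heritrix. */
public class FrontierPolicySketch {

    /** Builds a "hostname:port" style queue key for a URI. */
    public static String hostnameKey(URI uri) {
        int port = uri.getPort();
        if (port == -1) {
            // Assumption: only http and https default ports are handled here.
            port = "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
        }
        // Assumption: the URI is hierarchical, so getHost() is not null.
        return uri.getHost().toLowerCase(Locale.ROOT) + ":" + port;
    }

    /** Spreads IPs over a fixed number of sub-queues by hashing the IP string. */
    public static String bucketKey(String ip, int bucketCount) {
        // floorMod keeps the bucket index non-negative for negative hash codes.
        return "bucket-" + Math.floorMod(ip.hashCode(), bucketCount);
    }

    /** Constant cost of 1 for every URI. */
    public static int unitCost(URI uri) {
        return 1;
    }

    /** Unit cost plus one extra unit when the URI looks calendar-like. */
    public static int antiCalendarCost(URI uri) {
        // Assumption: a substring match stands in for "calendar-suggestive".
        boolean calendarLike =
                uri.toString().toLowerCase(Locale.ROOT).contains("calendar");
        return unitCost(uri) + (calendarLike ? 1 : 0);
    }

    public static void main(String[] args) {
        URI page = URI.create("http://Example.com/events/calendar?month=5");
        System.out.println(hostnameKey(page));
        System.out.println(bucketKey("192.0.2.7", 8));
        System.out.println(unitCost(page));
        System.out.println(antiCalendarCost(page));
    }
}
```

In this sketch, URIs that share a key would land in the same work queue, and a higher cost would use up a queue's budget faster. The real policies operate on CrawlURI objects and may compute keys and costs differently.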
Copyright © 2003-2014 Internet Archive. All Rights Reserved.