Package | Description |
---|---|
org.archive.crawler.frontier | |
org.archive.crawler.reporting | |
org.archive.modules | The beginnings of a refactored settings framework. |
org.archive.modules.extractor | |
org.archive.modules.net | |
org.archive.net | |
org.archive.util | |
Modifier and Type | Method and Description |
---|---|
protected String | URIAuthorityBasedQueueAssignmentPolicy.bucketBasis(UURI uuri): Base subqueue on first path-segment, if any. |
protected String | HostnameQueueAssignmentPolicy.getCoreKey(UURI basis) |
protected abstract String | URIAuthorityBasedQueueAssignmentPolicy.getCoreKey(UURI basis) |
protected String | SurtAuthorityQueueAssignmentPolicy.getCoreKey(UURI basis) |
protected int | URIAuthorityBasedQueueAssignmentPolicy.getSubqueue(UURI basisUuri, int parallelQueues) |
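
The `bucketBasis` and `getSubqueue` entries above suggest a two-step scheme: pick a basis string from the first path segment, then map it onto one of `parallelQueues` subqueues. The following is a minimal sketch of that idea, not the Heritrix implementation. It assumes `java.net.URI` as a stand-in for `UURI`, and the class name `SubqueueSketch`, both method names, and the use of `String.hashCode()` are hypothetical choices made for illustration.

```java
import java.net.URI;

/** Illustrative only: not the code of URIAuthorityBasedQueueAssignmentPolicy. */
final class SubqueueSketch {

    /** Returns the first path segment of the URI, or "" if there is none. */
    static String firstPathSegment(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        // Skip a leading '/', then cut at the next '/' if one is present.
        int start = path.startsWith("/") ? 1 : 0;
        int end = path.indexOf('/', start);
        return end < 0 ? path.substring(start) : path.substring(start, end);
    }

    /** Maps the basis string to a subqueue index in [0, parallelQueues). */
    static int subqueue(URI uri, int parallelQueues) {
        if (parallelQueues <= 1) {
            return 0; // a single queue needs no bucketing
        }
        // Assumption: any stable hash works for the sketch; floorMod keeps
        // the result non-negative even when hashCode() is negative.
        return Math.floorMod(firstPathSegment(uri).hashCode(), parallelQueues);
    }
}
```

With this sketch, `SubqueueSketch.firstPathSegment(URI.create("http://example.com/news/2014/a.html"))` yields `news`, and every URI sharing that first segment lands in the same subqueue for a given `parallelQueues`.
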
Modifier and Type | Method and Description |
---|---|
void | CrawlerLoggerModule.logUriError(org.apache.commons.httpclient.URIException e, UURI u, CharSequence l): Log a URIException from deep inside other components to the crawl's shared log. |
Modifier and Type | Method and Description |
---|---|
UURI | CrawlURI.getBaseURI(): Get the (HTML) Base URI used for derelativizing internal URIs. |
UURI | CrawlURI.getPolicyBasisUURI(): Get the UURI that should be used as the basis of policy/overlay decisions. |
UURI | CrawlURI.getUURI() |
UURI | CrawlURI.getVia() |
protected UURI | CrawlURI.readUuri(String u): Read a UURI from a String, handling a null or URIException. |
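
The `readUuri` description above mentions tolerating both a null input and a parse failure. This page does not say what the method returns in those cases, so the sketch below simply assumes a null result for both. It uses `java.net.URI` and `URISyntaxException` as stand-ins for `UURI` and `URIException`; the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative only: a null-tolerant parse, not the code of CrawlURI.readUuri. */
final class ReadUriSketch {

    /** Parses s into a URI; returns null for null input or a malformed string. */
    static URI readUri(String s) {
        if (s == null) {
            return null;
        }
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            // Assumption: callers prefer a null over a propagated exception.
            return null;
        }
    }
}
```

Callers of such a helper would need to check for null before using the result, for example `URI u = ReadUriSketch.readUri(text); if (u != null) { ... }`.
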
Modifier and Type | Method and Description |
---|---|
CrawlURI | CrawlURI.createCrawlURI(UURI baseUURI, Link link): Utility method for creation of CandidateURIs found extracting links from this CrawlURI. |
CrawlURI | CrawlURI.createCrawlURI(UURI baseUURI, Link link, int scheduling, boolean seed): Utility method for creation of CandidateURIs found extracting links from this CrawlURI. |
void | CrawlURI.setBaseURI(UURI base) |
void | CrawlURI.setVia(UURI via) |
Constructor and Description |
---|
CrawlURI(UURI uuri): Create a new instance of CrawlURI from a UURI. |
CrawlURI(UURI u, String pathFromSeed, UURI via, LinkContext viaContext) |
Modifier and Type | Method and Description |
---|---|
protected static List<String> | ExtractorURI.extractQueryStringLinks(UURI source): Look for URIs inside the supplied UURI. |
void | UriErrorLoggerModule.logUriError(org.apache.commons.httpclient.URIException e, UURI u, CharSequence l) |
void | Extractor.logUriError(org.apache.commons.httpclient.URIException e, UURI uuri, CharSequence l) |
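
`extractQueryStringLinks` is described above as looking for URIs inside the supplied UURI. One plausible reading is that redirect-style query parameters can carry a full URI as their value. The sketch below illustrates that reading only. It is not the `ExtractorURI` logic: `java.net.URI` replaces `UURI`, the class and method names are hypothetical, and the `http://`/`https://` prefix test is a simplifying assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: finds URI-looking values in a query string. */
final class QueryLinkSketch {

    /** Returns decoded query values that start with http:// or https://. */
    static List<String> queryStringLinks(URI source) {
        List<String> links = new ArrayList<>();
        String rawQuery = source.getRawQuery();
        if (rawQuery == null) {
            return links; // no query string, nothing to inspect
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // bare key without a value
            }
            String value;
            try {
                value = URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                continue; // skip values with broken percent-escapes
            }
            if (value.startsWith("http://") || value.startsWith("https://")) {
                links.add(value);
            }
        }
        return links;
    }
}
```

For a source such as `http://example.com/redirect?to=http%3A%2F%2Fexample.org%2Fpage&x=1`, the sketch returns a single entry, `http://example.org/page`.
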
Modifier and Type | Method and Description |
---|---|
CrawlHost | ServerCache.getHostFor(UURI uuri): Get the CrawlHost associated with curi. |
CrawlServer | ServerCache.getServerFor(UURI uuri): Get the CrawlServer associated with curi. |
static String | CrawlServer.getServerKey(UURI uuri): Get key to use doing lookup on server instances. |
Modifier and Type | Method and Description |
---|---|
static UURI | UURIFactory.getInstance(String uri) |
static UURI | UURIFactory.getInstance(UURI base, String relative) |
protected UURI | UURIFactory.makeOne(String fixedUpUri, boolean escaped, String charset) |
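
The `getInstance(UURI base, String relative)` signature above pairs a base with a relative string, which suggests resolving a relative reference against that base. The sketch below shows that operation with `java.net.URI.resolve`. It is an analogy under that assumption, not the `UURIFactory` code, which may apply further fixups not described on this page. The class and method names are hypothetical.

```java
import java.net.URI;

/** Illustrative only: relative-reference resolution with java.net.URI. */
final class ResolveSketch {

    /** Resolves relative against base; an absolute input is returned as-is. */
    static URI resolve(URI base, String relative) {
        // URI.resolve(String) throws IllegalArgumentException on bad syntax.
        return base.resolve(relative);
    }
}
```

For example, resolving `../c.html` against `http://example.com/a/b.html` gives `http://example.com/c.html`, while `d.html` against the same base gives `http://example.com/a/d.html`.
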
Modifier and Type | Method and Description |
---|---|
static UURI | UURIFactory.getInstance(UURI base, String relative) |
Modifier and Type | Method and Description |
---|---|
static String | UriUtils.speculativeFixup(String candidate, UURI base): Perform additional fixup of likely-URI Strings. |
Copyright © 2003-2014 Internet Archive. All Rights Reserved.