public class ExtractorURI extends Extractor
Modifier and Type | Field and Description |
---|---|
protected static String | ABS_HTTP_URI_PATTERN |
Fields inherited from class Extractor: DEFAULT_PARAMETERS, extractorParameters, loggerModule, numberOfLinksExtracted
Constructor | Description |
---|---|
ExtractorURI() | Constructor. |
Modifier and Type | Method | Description |
---|---|---|
void | extract(CrawlURI curi) | Perform usual extraction on a CrawlURI. |
protected void | extractLink(CrawlURI curi, Link wref) | Examine a single Link for URIs embedded inside it. |
protected static List<String> | extractQueryStringLinks(UURI source) | Look for URIs inside the supplied UURI. |
protected boolean | shouldProcess(CrawlURI uri) | Determines whether the given uri should be processed by this processor. |
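To illustrate what "look for URIs inside the supplied UURI" can mean for a query string, here is a minimal, self-contained sketch. It is not the Heritrix implementation: it assumes java.net.URI in place of UURI, java.net.URLDecoder (Java 10+) in place of whatever decoder Heritrix uses, and an assumed value for ABS_HTTP_URI_PATTERN. The class name QueryStringLinks is hypothetical. It checks two cases: the whole decoded query string being an absolute http(s) URI, and an individual parameter value being one.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class QueryStringLinks {
    // ASSUMPTION: stand-in for ABS_HTTP_URI_PATTERN (absolute http/https URI,
    // no whitespace or angle brackets). The real constant may differ.
    private static final Pattern ABS_HTTP_URI = Pattern.compile("^https?://[^\\s<>]*$");

    /** Returns absolute http(s) URIs found in the query string of source. */
    static List<String> extractQueryStringLinks(URI source) {
        List<String> results = new ArrayList<>();
        String rawQuery = source.getRawQuery();
        if (rawQuery == null) {
            return results; // no query string, nothing to find
        }
        // Case 1: the entire (percent-decoded) query string is itself a URI.
        String decodedQuery = source.getQuery();
        if (ABS_HTTP_URI.matcher(decodedQuery).matches()) {
            results.add(decodedQuery);
        }
        // Case 2: a key=value parameter carries a URI as its value.
        for (String param : rawQuery.split("&")) {
            String[] keyVal = param.split("=", 2);
            if (keyVal.length != 2) {
                continue; // no value to inspect
            }
            String candidate;
            try {
                candidate = URLDecoder.decode(keyVal[1], StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                continue; // malformed percent-encoding: skip this parameter
            }
            if (ABS_HTTP_URI.matcher(candidate).matches()) {
                results.add(candidate);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        URI page = URI.create("http://example.com/redirect?url=http%3A%2F%2Fexample.org%2Fpage&x=1");
        System.out.println(extractQueryStringLinks(page)); // prints [http://example.org/page]
    }
}
```

Decoding each parameter value separately matters: the raw value `http%3A%2F%2Fexample.org%2Fpage` only matches the pattern once its percent-escapes are decoded.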
Methods inherited from class Extractor: addOutlink, fromCheckpointJson, getExtractorParameters, getLoggerModule, innerProcess, logUriError, report, setExtractorParameters, setLoggerModule, toCheckpointJson
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop
protected static final String ABS_HTTP_URI_PATTERN
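This page does not show the value of ABS_HTTP_URI_PATTERN. The sketch below assumes a plausible value (an absolute http or https URI with no whitespace or angle brackets) purely to show how such a pattern would be applied with a full-string match. The class name AbsHttpUriPattern and the helper isAbsoluteHttpUri are hypothetical.

```java
import java.util.regex.Pattern;

public class AbsHttpUriPattern {
    // ASSUMED value for illustration only; not confirmed by this page.
    static final String ABS_HTTP_URI_PATTERN = "^https?://[^\\s<>]*$";
    private static final Pattern COMPILED = Pattern.compile(ABS_HTTP_URI_PATTERN);

    /** True if the whole candidate string looks like an absolute http(s) URI. */
    static boolean isAbsoluteHttpUri(String candidate) {
        // matches() requires the entire string to fit the pattern.
        return COMPILED.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAbsoluteHttpUri("https://example.org/a?b=c")); // true
        System.out.println(isAbsoluteHttpUri("ftp://example.org/"));        // false: wrong scheme
        System.out.println(isAbsoluteHttpUri("/relative/path"));            // false: not absolute
        System.out.println(isAbsoluteHttpUri("http://example.org/a b"));    // false: contains a space
    }
}
```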
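protected boolean shouldProcess(CrawlURI uri)
Description copied from: shouldProcess in class Processor
Determines whether the given uri should be processed by this processor.
Parameters:
uri - the URI to test

public void extract(CrawlURI curi)
Perform usual extraction on a CrawlURI.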
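One way an extract/extractLink pair like this can be organized is sketched below: extract visits each outlink already recorded on the URI, and extractLink adds any URIs embedded in that link's query parameters back to the same collection. This is a hypothetical simplification, not Heritrix code: Page stands in for CrawlURI, plain strings stand in for Link, and the URI pattern is an assumed value. Because new discoveries go into the collection being scanned, the loop iterates over a snapshot copy.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

public class OutlinkScan {
    // Hypothetical, simplified stand-in for a CrawlURI: only its outlinks.
    static final class Page {
        final Set<String> outlinks = new LinkedHashSet<>();
    }

    // ASSUMED stand-in for ABS_HTTP_URI_PATTERN.
    private static final Pattern ABS_HTTP_URI = Pattern.compile("^https?://[^\\s<>]*$");

    /** Visit every outlink discovered so far. */
    static void extract(Page curi) {
        // Iterate over a copy: extractLink adds to curi.outlinks, and adding to a
        // set while iterating it directly would typically throw
        // ConcurrentModificationException.
        for (String link : new ArrayList<>(curi.outlinks)) {
            extractLink(curi, link);
        }
    }

    /** Add any absolute http(s) URIs found in one link's query parameters. */
    static void extractLink(Page curi, String link) {
        String rawQuery;
        try {
            rawQuery = new URI(link).getRawQuery();
        } catch (URISyntaxException e) {
            return; // unparseable link: nothing to examine
        }
        if (rawQuery == null) {
            return;
        }
        for (String param : rawQuery.split("&")) {
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // parameter without a value
            }
            try {
                String value = URLDecoder.decode(param.substring(eq + 1), StandardCharsets.UTF_8);
                if (ABS_HTTP_URI.matcher(value).matches()) {
                    curi.outlinks.add(value); // record the embedded URI as a new outlink
                }
            } catch (IllegalArgumentException e) {
                // malformed percent-encoding: skip this parameter
            }
        }
    }

    public static void main(String[] args) {
        Page page = new Page();
        page.outlinks.add("http://example.com/out?to=http%3A%2F%2Fexample.org%2F");
        page.outlinks.add("http://example.com/plain");
        extract(page);
        // prints [http://example.com/out?to=http%3A%2F%2Fexample.org%2F, http://example.com/plain, http://example.org/]
        System.out.println(page.outlinks);
    }
}
```

A side effect of the snapshot design is that links added during a pass are not themselves re-scanned in that same pass.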
protected void extractLink(CrawlURI curi, Link wref)
Examine a single Link for URIs embedded inside it.
Parameters:
curi - CrawlURI to add discoveries to
wref - Link to examine for embedded URIs

Copyright © 2003-2014 Internet Archive. All Rights Reserved.