public class ExtractorImpliedURI extends Extractor
Fields inherited from class Extractor: DEFAULT_PARAMETERS, extractorParameters, loggerModule, numberOfLinksExtracted
Constructor and Description |
---|
ExtractorImpliedURI() Constructor. |
Modifier and Type | Method and Description |
---|---|
void | extract(CrawlURI curi) Perform usual extraction on a CrawlURI. |
protected static String | extractImplied(CharSequence uri, Pattern trigger, String build) Utility method for extracting 'implied' URI given a source uri, trigger pattern, and build pattern. |
String | getFormat() |
Pattern | getRegex() |
boolean | getRemoveTriggerUris() |
void | setFormat(String format) |
void | setRegex(Pattern regex) |
void | setRemoveTriggerUris(boolean remove) |
protected boolean | shouldProcess(CrawlURI uri) Determines whether the given uri should be processed by this processor. |
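The summary of extractImplied above describes deriving an 'implied' URI from a source URI, a trigger pattern, and a build pattern. The following is a minimal sketch of that idea using only java.util.regex, not the Heritrix source. The class name ImpliedUriSketch, the method name impliedFrom, and the example URIs are hypothetical. It assumes the trigger must match the whole source URI and that the build pattern uses $1-style group references, with null returned when nothing is implied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImpliedUriSketch {

    // Hypothetical stand-in for the idea behind extractImplied;
    // this is an illustration, not the Heritrix implementation.
    static String impliedFrom(CharSequence uri, Pattern trigger, String build) {
        if (trigger == null) {
            return null; // no trigger configured, so nothing can be implied
        }
        Matcher m = trigger.matcher(uri);
        if (m.matches()) {
            // Assumption: build is a java.util.regex replacement string,
            // so $1, $2, ... refer to groups captured by the trigger.
            return m.replaceFirst(build);
        }
        return null; // trigger did not match the whole URI
    }

    public static void main(String[] args) {
        // Hypothetical trigger: a player page URI implies a media file URI.
        Pattern trigger =
                Pattern.compile("^(http://example\\.com/player)\\?id=(\\d+)$");
        String build = "$1/media/$2.flv";

        // prints "http://example.com/player/media/42.flv"
        System.out.println(
                impliedFrom("http://example.com/player?id=42", trigger, build));
        // prints "null" because the trigger does not match
        System.out.println(
                impliedFrom("http://example.com/about", trigger, build));
    }
}
```

In this sketch the regex and format properties of the class would play the roles of trigger and build; how the removeTriggerUris setting interacts with the result is not shown here.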
Methods inherited from class Extractor: addOutlink, fromCheckpointJson, getExtractorParameters, getLoggerModule, innerProcess, logUriError, report, setExtractorParameters, setLoggerModule, toCheckpointJson
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop
public Pattern getRegex()
public void setRegex(Pattern regex)
public String getFormat()
public void setFormat(String format)
public boolean getRemoveTriggerUris()
public void setRemoveTriggerUris(boolean remove)
protected boolean shouldProcess(CrawlURI uri)
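Description copied from class: Processor
Determines whether the given uri should be processed by this processor.
Specified by: shouldProcess in class Processor
Parameters:
uri - the URI to test
public void extract(CrawlURI curi)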
protected static String extractImplied(CharSequence uri, Pattern trigger, String build)
Parameters:
uri - source to check for implied URI
trigger - regex pattern which if matched implies another URI
build - replacement pattern to build the implied URI
Copyright © 2003-2014 Internet Archive. All Rights Reserved.