public class AggressiveExtractorHTML extends ExtractorHTML
Modifier and Type | Field and Description |
---|---|
protected static Logger | logger |
Fields inherited from class ExtractorHTML: A_FORM_OFFSETS, A_META_ROBOTS, extractorJS, metadata
Fields inherited from class Extractor: DEFAULT_PARAMETERS, extractorParameters, loggerModule, numberOfLinksExtracted
Constructor and Description |
---|
AggressiveExtractorHTML() |
Modifier and Type | Method and Description |
---|---|
protected void | processScript(CrawlURI curi, CharSequence sequence, int endOfOpenTag) |
Methods inherited from class ExtractorHTML: addLinkFromString, afterPropertiesSet, considerIfLikelyUri, considerQueryStringValues, elementContext, extract, getContentDeclaredCharset, getExtractJavascript, getExtractOnlyFormGets, getExtractorJS, getExtractValueAttributes, getIgnoreFormActionUrls, getIgnoreUnexpectedHtml, getMaxAttributeNameLength, getMaxAttributeValLength, getMaxElementLength, getMetadata, getTreatFramesAsEmbedLinks, innerExtract, isHtmlExpectedHere, processEmbed, processEmbed, processGeneralTag, processLink, processMeta, processScriptCode, processStyle, setExtractJavascript, setExtractOnlyFormGets, setExtractorJS, setExtractValueAttributes, setIgnoreFormActionUrls, setIgnoreUnexpectedHtml, setMaxAttributeNameLength, setMaxAttributeValLength, setMaxElementLength, setMetadata, setTreatFramesAsEmbedLinks, shouldExtract
Methods inherited from class ContentExtractor: extract, shouldProcess
Methods inherited from class Extractor: addOutlink, fromCheckpointJson, getExtractorParameters, getLoggerModule, innerProcess, logUriError, report, setExtractorParameters, setLoggerModule, toCheckpointJson
Methods inherited from class Processor: doCheckpoint, finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop
protected static Logger logger
protected void processScript(CrawlURI curi, CharSequence sequence, int endOfOpenTag)
Overrides: processScript in class ExtractorHTML
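This page gives only the signature of the override, not its behavior. The following is a minimal, self-contained sketch of the override pattern, not the Heritrix implementation. `StubCrawlURI`, `StubExtractorHTML` and `StubAggressiveExtractor` are hypothetical stand-ins, the regular expressions are illustrative, and the sketch assumes that `endOfOpenTag` is the index just past the opening `<script ...>` tag within `sequence`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for CrawlURI: only collects discovered links.
class StubCrawlURI {
    final List<String> outlinks = new ArrayList<>();
}

// Hypothetical stand-in for ExtractorHTML, reduced to the overridden hook.
class StubExtractorHTML {
    private static final Pattern SRC = Pattern.compile("src\\s*=\\s*\"([^\"]*)\"");

    protected void processScript(StubCrawlURI curi, CharSequence sequence, int endOfOpenTag) {
        // Base pass (assumed for this sketch): read src="..." from the open tag only.
        Matcher m = SRC.matcher(sequence.subSequence(0, endOfOpenTag));
        if (m.find()) {
            curi.outlinks.add(m.group(1));
        }
    }
}

// Hypothetical subclass showing the shape of a more aggressive override.
class StubAggressiveExtractor extends StubExtractorHTML {
    private static final Pattern QUOTED_HTML = Pattern.compile("\"([^\"\\s]+\\.html)\"");

    @Override
    protected void processScript(StubCrawlURI curi, CharSequence sequence, int endOfOpenTag) {
        // Keep the inherited behavior first.
        super.processScript(curi, sequence, endOfOpenTag);
        // Extra pass (illustrative): scan the script body for quoted .html strings.
        Matcher m = QUOTED_HTML.matcher(sequence.subSequence(endOfOpenTag, sequence.length()));
        while (m.find()) {
            curi.outlinks.add(m.group(1));
        }
    }
}

class OverrideSketch {
    public static void main(String[] args) {
        String html = "<script src=\"a.js\">var u = \"page2.html\";</script>";
        StubCrawlURI curi = new StubCrawlURI();
        new StubAggressiveExtractor().processScript(curi, html, html.indexOf('>') + 1);
        System.out.println(curi.outlinks);
    }
}
```

Calling `super.processScript(...)` first keeps whatever the parent extracts, so in this sketch the subclass can only add candidate links, never drop them.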
Copyright © 2003-2014 Internet Archive. All Rights Reserved.