A

A_ANNOTATIONS - Static variable in interface org.archive.modules.CoreAttributeConstants
shorthand string tokens indicating notable occurrences, separated by commas
A_CONTENT_DIGEST - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
content digest
A_CONTENT_DIGEST_COUNT - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
number of times we've seen this content digest (1 original + n duplicates)
A_CONTENT_DIGEST_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
content digest history map
A_CONTENT_TYPE - Static variable in interface org.archive.modules.CoreAttributeConstants
Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)
A_CREDENTIALS_KEY - Static variable in interface org.archive.modules.CoreAttributeConstants
Key to get credential avatars from A_LIST.
A_DELAY_FACTOR - Static variable in interface org.archive.modules.CoreAttributeConstants
Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)
A_DISTANCE_FROM_SEED - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_DNS_FETCH_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_DNS_SERVER_IP_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_ETAG_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
header name (and AList key) for ETag
A_FETCH_BEGAN_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FETCH_COMPLETED_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FETCH_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
fetch history array
A_FORCE_RETIRE - Static variable in interface org.archive.modules.CoreAttributeConstants
flag indicating the containing queue should be retired
A_FORM_OFFSETS - Static variable in class org.archive.modules.extractor.ExtractorHTML
 
A_FTP_CONTROL_CONVERSATION - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FTP_FETCH_STATUS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HERITABLE_KEYS - Static variable in interface org.archive.modules.CoreAttributeConstants
Key to (optional) attribute specifying a list of keys that are passed to CandidateURIs that 'descend' (are discovered via) this URI.
A_HREF - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
A_HTML_BASE - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HTML_FORM_OBJECTS - Static variable in class org.archive.modules.forms.ExtractorHTMLForms
 
A_HTTP_AUTH_CHALLENGES - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HTTP_PROXY_HOST - Static variable in interface org.archive.modules.CoreAttributeConstants
local override of proxy host
A_HTTP_PROXY_PORT - Static variable in interface org.archive.modules.CoreAttributeConstants
local override of proxy port
A_LAST_MODIFIED_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
header name (and AList key) for last-modified timestamp
A_META_ROBOTS - Static variable in class org.archive.modules.extractor.ExtractorHTML
 
A_MINIMUM_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
Minimum delay before fetching another item of the same class (e.g. host).
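The A_DELAY_FACTOR and A_MINIMUM_DELAY entries describe a wait derived from the last fetch duration, with a floor. The following is a minimal sketch of that arithmetic, assuming the two values combine as a simple maximum. The class and method names are hypothetical and are not part of Heritrix.

```java
// Hypothetical illustration only; not Heritrix code.
final class PolitenessDelaySketch {

    /**
     * Wait time in milliseconds before the next fetch from the same class
     * (e.g. host): the last fetch duration scaled by delayFactor, but never
     * less than minDelayMs. Combining them with max() is an assumption.
     */
    static long nextDelayMs(long lastFetchDurationMs, double delayFactor, long minDelayMs) {
        long scaled = Math.round(lastFetchDurationMs * delayFactor);
        return Math.max(scaled, minDelayMs);
    }

    public static void main(String[] args) {
        // A fast fetch is held to the minimum; a slow fetch is scaled up.
        System.out.println(nextDelayMs(400, 5.0, 3000));
        System.out.println(nextDelayMs(1200, 5.0, 3000));
    }
}
```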
A_MIRROR_PATH - Static variable in interface org.archive.modules.CoreAttributeConstants
Define for org.archive.crawler.writer.MirrorWriterProcessor.
A_MIRROR_PATH - Static variable in class org.archive.modules.writer.MirrorWriterProcessor
 
A_NONFATAL_ERRORS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_ORIGINAL_DATE - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
date content payload was written
A_ORIGINAL_URL - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
url that the content payload was written for
A_PRECALC_PRECEDENCE - Static variable in interface org.archive.modules.CoreAttributeConstants
key to attribute containing pre-calculated precedence
A_PREREQUISITE_URI - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_REFERENCE_LENGTH - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
reference length (content length or virtual length)
A_RETRY_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_RRECORD_SET_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_RUNTIME_EXCEPTION - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_SOURCE_TAG - Static variable in interface org.archive.modules.CoreAttributeConstants
a 'source' (usu.
A_STATUS - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
key for status (when in history)
A_SUBMIT_DATA - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_VIA_DIGEST - Static variable in class org.archive.modules.extractor.TrapSuppressExtractor
AList attribute key for carrying-forward content-digest from 'via'
A_WARC_FILE_OFFSET - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
offset into warc file of warc record with content payload
A_WARC_FILENAME - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
warc filename containing the content payload
A_WARC_RECORD_ID - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
warc record id of warc record with the content payload
A_WARC_RESPONSE_HEADERS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_WHOIS_SERVER_IP - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_WRITE_TAG - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
Writer processors of all types are encouraged to put a 'writeTag' (analogous to HTTP 'etag') in the CrawlURI state.
abort() - Method in class org.apache.commons.httpclient.HttpMethodBase
Aborts the execution of this method.
aboutToLog() - Method in class org.archive.modules.CrawlURI
Notify CrawlURI it is about to be logged; opportunity for self-annotation
ABS_HTTP_URI_PATTERN - Static variable in class org.archive.modules.extractor.ExtractorURI
 
AbstractContentDigestHistory - Class in org.archive.modules.recrawl
Represents a store of information, presumably persistent, keyed by content digest.
AbstractContentDigestHistory() - Constructor for class org.archive.modules.recrawl.AbstractContentDigestHistory
 
AbstractCookieStorage - Class in org.archive.modules.fetcher
 
AbstractCookieStorage() - Constructor for class org.archive.modules.fetcher.AbstractCookieStorage
 
AbstractFrontier - Class in org.archive.crawler.frontier
Shared facilities for Frontier implementations.
AbstractFrontier() - Constructor for class org.archive.crawler.frontier.AbstractFrontier
 
AbstractLongFPSet - Class in org.archive.util
Shell of functionality for a Set of primitive long fingerprints, held in an array of possibly-empty slots.
AbstractLongFPSet() - Constructor for class org.archive.util.AbstractLongFPSet
To support serialization TODO: verify needed?
AbstractLongFPSet(int, float) - Constructor for class org.archive.util.AbstractLongFPSet
Create a new AbstractLongFPSet with a given capacity and load factor
AbstractPersistProcessor - Class in org.archive.modules.recrawl
 
AbstractPersistProcessor() - Constructor for class org.archive.modules.recrawl.AbstractPersistProcessor
 
ac - Variable in class org.archive.crawler.framework.CrawlJob
 
AcceptDecideRule - Class in org.archive.modules.deciderules
 
AcceptDecideRule() - Constructor for class org.archive.modules.deciderules.AcceptDecideRule
 
acceptNonDnsResolves - Variable in class org.archive.modules.fetcher.FetchDNS
If a DNS lookup fails, whether or not to fall back to InetAddress resolution, which may use local 'hosts' files or other mechanisms.
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EngineResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EnhDirectoryResource
Accept a POST used to edit or create a file.
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.JobResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.ScriptResource
 
accepts(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
accumulate(CrawlURI) - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
actionDir - Variable in class org.archive.crawler.framework.ActionDirectory
 
ActionDirectory - Class in org.archive.crawler.framework
Directory watched for new files.
ActionDirectory() - Constructor for class org.archive.crawler.framework.ActionDirectory
 
actions - Variable in class org.archive.modules.extractor.CustomSWFTags
 
activateInactiveQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Activate an inactive queue, if any are available.
active - Variable in class org.archive.crawler.frontier.WorkQueue
whether queue is active (ready/in-process/snoozed) or on a waiting queue
actOn(File) - Method in class org.archive.crawler.framework.ActionDirectory
Process an individual action file found
actOn(File) - Method in class org.archive.modules.seeds.SeedModule
 
actOn(File) - Method in class org.archive.modules.seeds.TextSeedModule
Treat the given file as a source of additional seeds, announcing to SeedListeners.
add(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, if not already present.
add(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
add(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
add(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
add(long) - Method in class org.archive.util.AbstractLongFPSet
Add the given value to this set
add(CharSequence) - Method in interface org.archive.util.BloomFilter
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Adds a character sequence to the filter.
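For readers unfamiliar with the data structure behind the two add(CharSequence) entries above, the following is a minimal sketch of a Bloom filter. It is an assumption-laden illustration, not the BloomFilter64bit implementation: the class name, the use of java.util.BitSet, and the double-hashing scheme are all hypothetical choices.

```java
import java.util.BitSet;

// Hypothetical illustration of a Bloom filter; not org.archive.util code.
final class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    private int position(CharSequence s, int i) {
        int h1 = s.toString().hashCode();
        int h2 = Integer.rotateLeft(h1, 15) * 0x9E3779B1 + 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Sets the bits for s; returns true if at least one bit was previously unset. */
    boolean add(CharSequence s) {
        boolean changed = false;
        for (int i = 0; i < numHashes; i++) {
            int p = position(s, i);
            if (!bits.get(p)) {
                bits.set(p);
                changed = true;
            }
        }
        return changed;
    }

    /** False means definitely never added; true means possibly added. */
    boolean contains(CharSequence s) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(s, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1 << 16, 4);
        System.out.println(filter.add("http://example.com/"));
        System.out.println(filter.contains("http://example.com/"));
    }
}
```

A filter like this can report false positives but never false negatives, which is why it suits an already-seen check where an occasional skipped URI is tolerable.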
add(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
add(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Add a fingerprint to the set.
add(Histotable<K>) - Method in class org.archive.util.Histotable
 
add(Iterator<E>) - Method in class org.archive.util.iterator.CompositeIterator
Add an iterator to the internal chain.
addAllow(String) - Method in class org.archive.modules.net.RobotsDirectives
 
addCap(byte[]) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Add a dummy 'cap' entry at the given insertion key.
addCookie(Cookie) - Method in class org.apache.commons.httpclient.HttpState
Adds an HTTP cookie, replacing any existing equivalent cookies.
addCookieRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Cookie request headers for those cookies that match the given host, port and path.
addCookies(Cookie[]) - Method in class org.apache.commons.httpclient.HttpState
Adds an array of HTTP cookies.
addCredential(Credential) - Method in class org.archive.modules.net.CrawlServer
Add an avatar.
addDataPersistentMember(String) - Static method in class org.archive.modules.CrawlURI
Add the key of data map items you want to persist across processings.
addDisallow(String) - Method in class org.archive.modules.net.RobotsDirectives
 
added(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
addedSeed(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
When notified of a seed via the SeedListener interface, schedule it.
addedSeed(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
Create a seed record, even on initial notification (before any real attempt/processing).
addedSeed(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
If appropriate, convert seed notification into prefix-addition.
addedSeed(CrawlURI) - Method in interface org.archive.modules.seeds.SeedListener
 
addExternalPath(String) - Method in class org.archive.spring.KeyedProperties
Add a path by which the outside world can reach this map
addExtraInfo(String, Object) - Method in class org.archive.modules.CrawlURI
 
addField(String, String, String) - Method in class org.archive.modules.forms.HTMLForm
Add a discovered INPUT, tracking it as potential username/password receiver.
addFlash(Response, String) - Static method in class org.archive.crawler.restlet.Flash
 
addFlash(Response, String, Flash.Kind) - Static method in class org.archive.crawler.restlet.Flash
 
addForce(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, all the way through to underlying destination, even if already present.
addForce(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addForce(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addGlobalVariable(String, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
addHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addHeaderLink(CrawlURI, String, String) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addHostRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Host request header, as long as no Host request header already exists.
addIfNotBlank(ANVLRecord, String, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
addJobDirectory(File) - Method in class org.archive.crawler.framework.Engine
Adds a job directory to the Engine's known jobConfigs, if not already present.
addLinkFromString(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
 
addLogger(Logger) - Method in class org.archive.crawler.reporting.AlertThreadGroup
 
addNewFp(long) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
addNewFp(long) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Add an FP (which may be an old or new FP) to the new complete list.
addNewFp(long) - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
addNow(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Immediately add uri.
addNow(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addNow(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addOutlink(CrawlURI, String, LinkContext, Hop) - Method in class org.archive.modules.extractor.Extractor
Create and add a 'Link' to the CrawlURI with given URI/context/hop-type
addPersistentDataMapKey(String) - Method in class org.archive.modules.CrawlURI
 
addPresentableNestedNames(Collection<Object>, Object, Set<Object>) - Method in class org.archive.crawler.restlet.JobRelatedResource
Starting at (and including) the given object, adds nested Map representations of named beans to the namedBeans Collection.
addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.io.ReadSourceEditor
 
addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.spring.ConfigPathEditor
 
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Proxy-Connection: Keep-Alive request header when communicating via a proxy server.
AddRedirectFromRootServerToScope - Class in org.archive.modules.deciderules
 
AddRedirectFromRootServerToScope() - Constructor for class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
 
addRefreshHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addRelativeToBase(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
addRelativeToVia(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
addRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Adds the specified request header, NOT overwriting any previous value.
addRequestHeader(String, String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Adds the specified request header, NOT overwriting any previous value.
addRequestHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates all the required request headers to be submitted via the given connection.
addResponseContent(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
This method populates curi with response status and content type.
addResponseFooter(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Use this method internally to add footers.
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter64bit
 
addRuleAssociation(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
addRuleAssociations(Set<DecideRuledSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Collect all rule-based SheetAssociations.
addSeed(CrawlURI) - Method in class org.archive.modules.seeds.SeedModule
 
addSeed(CrawlURI) - Method in class org.archive.modules.seeds.TextSeedModule
Add a new seed to scope.
addSeedListener(SeedListener) - Method in class org.archive.modules.seeds.SeedModule
 
addStats(Map<String, Map<String, Long>>) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
addSurtAssociation(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
addSurtAssociations(List<SurtPrefixesSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Collect all SURT-based SheetAssociations.
addSurtsAssociation(SurtPrefixesSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Add an individual surtsAssociation to the sheetNamesBySurt map.
addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Add a file to the manifest of files used/generated by the current crawl.
addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
addUserAgentRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates default User-Agent request header, as long as no User-Agent request header already exists.
addWhoisLink(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
addWhoisLinks(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
Adds outlinks to whois:{domain} and whois:{ipAddress}
afterPropertiesSet() - Method in class org.archive.checkpointing.Checkpoint
 
afterPropertiesSet() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
afterPropertiesSet() - Method in class org.archive.crawler.util.BloomUriUniqFilter
Initializer.
afterPropertiesSet() - Method in class org.archive.modules.CrawlMetadata
 
afterPropertiesSet() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
afterPropertiesSet() - Method in class org.archive.modules.extractor.ExtractorHTML
 
afterPropertiesSet() - Method in class org.archive.modules.ScriptedProcessor
 
agentsToDirectives - Variable in class org.archive.modules.net.Robotstxt
 
AggressiveExtractorHTML - Class in org.archive.modules.extractor
Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regex, and then by the speculative JavaScript link regex.
AggressiveExtractorHTML() - Constructor for class org.archive.modules.extractor.AggressiveExtractorHTML
 
AlertHandler - Class in org.archive.crawler.reporting
Stub Handler, catching and relaying WARNING/SEVERE events to AlertThreadGroup.
AlertHandler() - Constructor for class org.archive.crawler.reporting.AlertHandler
 
alertsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlController
 
alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlJob
 
AlertThreadGroup - Class in org.archive.crawler.reporting
Parent thread group which lets all child threads find the right 'alert' error handler.
AlertThreadGroup(String) - Constructor for class org.archive.crawler.reporting.AlertThreadGroup
 
allBeans - Variable in class org.archive.spring.ConfigPathConfigurer
 
allConfigPaths - Variable in class org.archive.spring.ConfigPathConfigurer
 
allErrors - Variable in class org.archive.spring.PathSharingContext
 
allFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
allNonemptyReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Compact report of all nonempty queues (one queue per line)
allowCreate - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.CustomRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.IgnoreRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.ObeyRobotsPolicy
 
allows - Variable in class org.archive.modules.net.RobotsDirectives
 
allows(String) - Method in class org.archive.modules.net.RobotsDirectives
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.RobotsPolicy
 
allowsAll() - Method in class org.archive.modules.net.Robotstxt
Does this policy effectively allow everything? (No disallows or timing (crawl-delay) directives?)
allowsEdit(File) - Method in class org.archive.crawler.restlet.EnhDirectory
 
allowsPaging(File) - Method in class org.archive.crawler.restlet.EnhDirectory
 
allQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All known queues.
allQueuesReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Compact report of all nonempty queues (one queue per line)
alreadySeen - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
analyze(CrawlURI, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
Run analysis: find form METHOD, ACTION, and all INPUT names/values. Log as configured.
ANNOTATION_UNWRITTEN - Static variable in class org.archive.modules.writer.WriterPoolProcessor
CrawlURI annotation indicating no record was written.
announceSeeds() - Method in class org.archive.modules.seeds.SeedModule
 
announceSeeds() - Method in class org.archive.modules.seeds.TextSeedModule
Announce all seeds from configured source to SeedListeners (including nonseed lines mixed in).
announceSeeds(CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
 
announceSeedsFromReader(BufferedReader, CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
Announce all seeds (and nonseed possible-directive lines) from the given Reader
AntiCalendarCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost.
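The idea of adding one unit of cost to calendar-like URIs can be sketched as follows. The pattern below is a hypothetical stand-in, since this index does not say which strings the real policy treats as calendar-suggestive, and the class and method names are likewise illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the real policy's pattern and API may differ.
final class AntiCalendarCostSketch {

    // Assumed notion of "calendar-suggestive": the word "calendar", or a
    // year/month/day query parameter. This pattern is invented for the example.
    private static final Pattern CALENDARISH = Pattern.compile(
            "calendar|[?&](year|month|day)=", Pattern.CASE_INSENSITIVE);

    /** Returns baseCost, plus one extra unit if the URI looks calendar-like. */
    static int costOf(String uri, int baseCost) {
        return CALENDARISH.matcher(uri).find() ? baseCost + 1 : baseCost;
    }

    public static void main(String[] args) {
        System.out.println(costOf("http://example.com/about", 1));
        System.out.println(costOf("http://example.com/events?month=5", 1));
    }
}
```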
AntiCalendarCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
appCtx - Variable in class org.archive.crawler.framework.ActionDirectory
 
appCtx - Variable in class org.archive.crawler.framework.CheckpointService
 
appCtx - Variable in class org.archive.crawler.framework.CrawlController
 
appCtx - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
appCtx - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
appCtx - Variable in class org.archive.crawler.restlet.BeanBrowseResource
 
appCtx - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
 
appCtx - Variable in class org.archive.modules.ScriptedProcessor
 
appCtx - Variable in class org.archive.spring.ConfigPathConfigurer
 
append(String) - Method in class org.archive.util.PaddingStringBuffer
append a string directly to the buffer
append(int) - Method in class org.archive.util.PaddingStringBuffer
append an int to the buffer.
append(long) - Method in class org.archive.util.PaddingStringBuffer
append a long to the buffer.
appendQueueReports(PrintWriter, String, Iterator<?>, int, int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Append queue report to general Frontier report.
applyOverlaysTo(CrawlURI) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Apply the proper overlays (by Sheet beanName) to the given CrawlURI, according to configured associations.
applyQuota(CrawlURI, String, long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Apply the quota specified by the given key against the actual value provided.
Arc2Warc - Class in org.archive.io
Convert ARCs to (sort of) WARCs.
Arc2Warc() - Constructor for class org.archive.io.Arc2Warc
 
ARCHIVE_TIME_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
ARCWriterProcessor - Class in org.archive.modules.writer
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
ARCWriterProcessor() - Constructor for class org.archive.modules.writer.ARCWriterProcessor
 
ArrayLongFPCache - Class in org.archive.util.fingerprint
Simple long fingerprint cache using a backing array; any long maps to one of 'smear' slots.
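One way to read "any long maps to one of 'smear' slots" is that each fingerprint has a base index in the backing array and may live in any of the next few slots. The sketch below illustrates that reading only. The class name, the use of 0 as an empty marker, and the eviction rule are assumptions, not the ArrayLongFPCache implementation.

```java
// Hypothetical illustration of a fixed-size fingerprint cache; not Heritrix code.
final class SmearCacheSketch {
    private final long[] slots; // 0 marks an empty slot (assumption), so 0 itself cannot be cached
    private final int smear;    // number of candidate slots per fingerprint

    SmearCacheSketch(int capacity, int smear) {
        this.slots = new long[capacity];
        this.smear = smear;
    }

    private int baseIndex(long fp) {
        return (int) Math.floorMod(fp, (long) slots.length);
    }

    /** True if fp is currently held in one of its candidate slots. */
    boolean contains(long fp) {
        int base = baseIndex(fp);
        for (int i = 0; i < smear; i++) {
            if (slots[(base + i) % slots.length] == fp) {
                return true;
            }
        }
        return false;
    }

    /** Stores fp; returns false if it was already cached. May evict an older entry. */
    boolean add(long fp) {
        if (contains(fp)) {
            return false;
        }
        int base = baseIndex(fp);
        for (int i = 0; i < smear; i++) {
            int idx = (base + i) % slots.length;
            if (slots[idx] == 0L) {
                slots[idx] = fp;
                return true;
            }
        }
        // Every candidate slot is occupied: overwrite the first one.
        slots[base] = fp;
        return true;
    }

    public static void main(String[] args) {
        SmearCacheSketch cache = new SmearCacheSketch(8, 2);
        cache.add(3L);
        cache.add(11L);
        cache.add(19L); // shares a base index with 3 and 11, so one entry is evicted
        System.out.println(cache.contains(3L));
    }
}
```

Because entries can be evicted, a structure like this behaves as a cache rather than a set: a miss does not prove the fingerprint was never added.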
ArrayLongFPCache() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCache
 
asAnnotation() - Method in class org.archive.modules.forms.HTMLForm
Provide abbreviated annotation, of the form...
asHttpClientDataWith(String, String) - Method in class org.archive.modules.forms.HTMLForm
Create the NameValuePair array expected by HttpClient, merging username and password into the appropriate value slots.
assertNoSideEffects(CrawlURI) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
Asserts that the given URI has no URI errors, no localized errors, and no annotations.
assertNotOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Throws an IllegalStateException if the connection is already open.
assertOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Throws an IllegalStateException if the connection is not open.
AssignmentLevelSurtQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Create a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain).
AssignmentLevelSurtQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
 
atFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
atProcessor(Processor) - Method in class org.archive.crawler.framework.ToeThread
 
atProcessor(Processor) - Method in interface org.archive.modules.ProcessorChain.ChainStatusReceiver
 
attach(CrawlURI) - Method in class org.archive.modules.credential.Credential
Attach this credential's avatar to the passed curi.
ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.modules.writer.Kw3WriterProcessor
Max size for each file. Key for the maximum ARC bytes to write attribute.
audience - Variable in class org.archive.modules.CrawlMetadata
 
AUDIO_VIDEO_IMAGE_MIMETYPE_SET - Static variable in class org.archive.util.UriUtils
 
AUDIO_VIDEO_IMAGE_MIMETYPES - Static variable in class org.archive.util.UriUtils
 
authenticate(Request) - Method in class org.archive.crawler.restlet.RateLimitGuard
 
authenticated(Credential, CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Has the passed credential already been authenticated?
AutoKryo - Class in org.archive.bdb
Extensions to Kryo to let classes control their own registration, suggest other classes to register together, and use the same (Sun-JVM-only) trick for deserializing classes without no-arg constructors.
AutoKryo() - Constructor for class org.archive.bdb.AutoKryo
 
autoregister(Class<?>) - Method in class org.archive.bdb.AutoKryo
 
autoregisterTo(AutoKryo) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.CrawlURI
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlHost
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlServer
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.RobotsDirectives
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.Robotstxt
 
autoregisterTo(AutoKryo) - Static method in class org.archive.util.IdentityCacheableWrapper
 
AVAILABLE_EXTRACTOR - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
availableRobotsPolicies - Variable in class org.archive.modules.CrawlMetadata
Map of all available RobotsPolicies, by name, to choose from.
averageDepth() - Method in interface org.archive.crawler.framework.Frontier
Average depth of the last URI in all eligible queues.
averageDepth() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
averageDepth - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 

B

base - Variable in class org.archive.spring.ConfigPath
 
Base32 - Class in org.archive.util
Base32 - encodes and decodes RFC3548 Base32 (see http://www.faqs.org/rfcs/rfc3548.html). Imported public-domain code of Bitzi.
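As a reminder of what RFC 3548 Base32 encoding involves, the following is a minimal encoder sketch. It is not the org.archive.util.Base32 source: the class name is hypothetical, and omitting the trailing '=' padding is a choice made for this example.

```java
// Hypothetical illustration of Base32 encoding; not the Bitzi-derived class.
final class Base32Sketch {
    // The 32-character alphabet defined by RFC 3548.
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    /** Encodes bytes as Base32 text, without '=' padding (a choice of this sketch). */
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0;   // accumulates input bits; only the low bitsLeft bits are pending
        int bitsLeft = 0; // number of pending bits not yet emitted
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsLeft += 8;
            // Emit one character for every complete group of 5 bits.
            while (bitsLeft >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bitsLeft - 5)) & 0x1F));
                bitsLeft -= 5;
            }
        }
        // Left-align any remaining bits in a final 5-bit group, zero-filled.
        if (bitsLeft > 0) {
            out.append(ALPHABET.charAt((buffer << (5 - bitsLeft)) & 0x1F));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("foobar".getBytes()));
    }
}
```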
Base32() - Constructor for class org.archive.util.Base32
 
baseClass - Variable in class org.archive.bdb.KryoBinding
 
BaseQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
QueuePrecedencePolicy that sets a uri-queue's precedence to a configured single value.
BaseQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
BaseResource - Class in org.archive.crawler.restlet
Abstract Resource with common shared functionality.
BaseResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.BaseResource
 
BaseRule - Class in org.archive.modules.canonicalize
Base of all rules applied canonicalizing a URL that are configurable via the Heritrix settings system.
BaseRule() - Constructor for class org.archive.modules.canonicalize.BaseRule
Constructor.
BaseUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which assigns URIs a set value (perhaps overridden for different URIs).
BaseUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
bdb - Variable in class org.archive.crawler.frontier.BdbFrontier
 
bdb - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
bdb - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
bdb - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
bdb - Variable in class org.archive.modules.fetcher.BdbCookieStorage
 
bdb - Variable in class org.archive.modules.fetcher.FetchWhois
 
bdb - Variable in class org.archive.modules.net.BdbServerCache
 
bdb - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
bdb - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
BdbContentDigestHistory - Class in org.archive.modules.recrawl
Bdb content digest history store.
BdbContentDigestHistory() - Constructor for class org.archive.modules.recrawl.BdbContentDigestHistory
 
BdbCookieStorage - Class in org.archive.modules.fetcher
CookieStorage using BDB, so that cookies accumulated in large crawls do not outgrow RAM.
BdbCookieStorage() - Constructor for class org.archive.modules.fetcher.BdbCookieStorage
 
BdbFrontier - Class in org.archive.crawler.frontier
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
BdbFrontier() - Constructor for class org.archive.crawler.frontier.BdbFrontier
 
BdbModule - Class in org.archive.bdb
Utility module for managing a shared BerkeleyDB-JE environment
BdbModule() - Constructor for class org.archive.bdb.BdbModule
 
BdbModule.BdbConfig - Class in org.archive.bdb
Configuration object for databases.
BdbModule.BdbConfig() - Constructor for class org.archive.bdb.BdbModule.BdbConfig
 
BdbMultipleWorkQueues - Class in org.archive.crawler.frontier
A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs.
BdbMultipleWorkQueues(Database, StoredClassCatalog) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues
Create the multi queue in the given environment.
BdbServerCache - Class in org.archive.modules.net
ServerCache backed by BDB big maps; the usual choice for crawls.
BdbServerCache() - Constructor for class org.archive.modules.net.BdbServerCache
 
BdbUriUniqFilter - Class in org.archive.crawler.util
A BDB implementation of an AlreadySeen list.
BdbUriUniqFilter() - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
 
BdbUriUniqFilter(File) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File, int) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbWorkQueue - Class in org.archive.crawler.frontier
One independent queue of items with the same 'classKey' (eg host).
BdbWorkQueue(String, BdbFrontier) - Constructor for class org.archive.crawler.frontier.BdbWorkQueue
Create a virtual queue inside the given BdbMultipleWorkQueues
BeanBrowseResource - Class in org.archive.crawler.restlet
Restlet Resource which allows browsing the constructed beans in a hierarchical fashion.
BeanBrowseResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.BeanBrowseResource
 
beanFactory - Variable in class org.archive.crawler.spring.SheetOverlaysManager
 
beanFactory - Variable in class org.archive.spring.Sheet
 
BeanFieldsPatternValidator - Class in org.archive.spring
 
BeanFieldsPatternValidator(Class<?>, String...) - Constructor for class org.archive.spring.BeanFieldsPatternValidator
 
BeanFieldsPatternValidator.PropertyPatternRule - Class in org.archive.spring
 
BeanFieldsPatternValidator.PropertyPatternRule(String, String, String) - Constructor for class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
BeanLookupBindings - Class in org.archive.crawler.framework
Provides syntactic sugar for H3 scripts to reference beans without adding a line like def scope = appCtx.getBean("scope");.
BeanLookupBindings(ApplicationContext) - Constructor for class org.archive.crawler.framework.BeanLookupBindings
 
BeanLookupBindings(ApplicationContext, Map<String, Object>) - Constructor for class org.archive.crawler.framework.BeanLookupBindings
 
beanName - Variable in class org.archive.crawler.frontier.BdbFrontier
 
beanName - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
beanName - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
beanName - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
beanName - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
beanName - Variable in class org.archive.modules.Processor
 
beanPath - Variable in class org.archive.crawler.restlet.BeanBrowseResource
 
beansException(BeansException) - Method in class org.archive.crawler.framework.CrawlJob
Report a BeansException during instantiation; report chain in reverse order (so root cause is first); ignore non-BeansExceptions or messages without a useful compact message.
BeansModel - Class in org.archive.crawler.restlet.models
 
BeansModel(String, String, String, Object, boolean, String, Object, Collection<Object>) - Constructor for class org.archive.crawler.restlet.models.BeansModel
 
beanToNameMap - Variable in class org.archive.crawler.restlet.JobRelatedResource
 
beginCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Start the process of stopping the crawl.
beginDisposition(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Inform frontier that a block of processing that should complete atomically with respect to checkpoints is about to begin.
beginDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
beginFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
beginFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Begin merging pending candidates with complete list.
beginFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
BenchmarkUriUniqFilters - Class in org.archive.crawler.util
BenchmarkUriUniqFilters
BenchmarkUriUniqFilters() - Constructor for class org.archive.crawler.util.BenchmarkUriUniqFilters
 
bind(String, Object) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
bindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
bInheritHandle - Variable in class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter64bit
 
bitIndexesFor(CharSequence) - Method in class org.archive.util.BloomFilter64bit
 
bits - Variable in class org.archive.util.BloomFilter64bit
The underlying bit vector
BLOCK_SIZE - Static variable in interface org.archive.util.ms.BlockFileSystem
The size of a block in bytes.
blockAwaitingSeedLines - Variable in class org.archive.modules.seeds.TextSeedModule
Number of lines of seeds-source to read on initial load before proceeding with crawl.
BlockFileSystem - Interface in org.archive.util.ms
Describes the internal file system contained in .doc files.
BlockInputStream - Class in org.archive.util.ms
InputStream for a file contained in a BlockFileSystem.
BlockInputStream(BlockFileSystem, int) - Constructor for class org.archive.util.ms.BlockInputStream
Constructor.
bloom - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
BloomFilter - Interface in org.archive.util
Common interface for different Bloom filter implementations
BloomFilter64bit - Class in org.archive.util
A Bloom filter.
BloomFilter64bit(long, int) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter64bit(long, int, boolean) - Constructor for class org.archive.util.BloomFilter64bit
 
BloomFilter64bit(long, int, Random, boolean) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
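To make the two constructor parameters (expected number of elements, number of hash functions) concrete, the following is a rough standalone sketch of a Bloom filter. It is not the BloomFilter64bit implementation: the class BloomSketch, the sizing rule of roughly n·d/ln 2 bits, and the double-hashing scheme are all assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical illustration of a Bloom filter; not the Heritrix class.
final class BloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    // Assumed sizing: about n * d / ln(2) bits for n elements and d hashes.
    BloomSketch(int expectedElements, int hashCount) {
        this.hashCount = hashCount;
        this.size = Math.max(64,
                (int) Math.ceil((double) expectedElements * hashCount / Math.log(2)));
        this.bits = new BitSet(size);
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(CharSequence s, int i) {
        int h1 = s.toString().hashCode();
        int h2 = 0x811C9DC5; // FNV-1a style second hash over the characters
        for (int j = 0; j < s.length(); j++) {
            h2 ^= s.charAt(j);
            h2 *= 16777619;
        }
        h2 |= 1; // ensure the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set every bit selected by the hash functions.
    void add(CharSequence s) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(s, i));
        }
    }

    // False means definitely absent; true means possibly present.
    boolean contains(CharSequence s) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(s, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch seen = new BloomSketch(1000, 4);
        seen.add("http://example.com/");
        System.out.println(seen.contains("http://example.com/"));
    }
}
```

A filter like this can report false positives but never false negatives, which is the trade-off an already-seen URI list accepts in exchange for a fixed memory footprint.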
BloomUriUniqFilter - Class in org.archive.crawler.util
An implementation of an AlreadySeen list based on the MG4J BloomFilter.
BloomUriUniqFilter() - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Default constructor
bucketBasis(UURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
Base subqueue on first path-segment, if any.
bucketFor(long, int) - Method in class org.archive.util.LongToIntConsistentHash
Return the proper integer bucket-number for the given long hash, up to the given integer boundary (exclusive).
bucketFor(CharSequence, int) - Method in class org.archive.util.LongToIntConsistentHash
Convenience alternative which creates longHash from CharSequence
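The bucketFor contract described above (map a long hash to a bucket number below an exclusive boundary) can be sketched with a generic consistent-hash ring. This is only an illustration under stated assumptions: ConsistentHashSketch, its replicas parameter, and the mix function are hypothetical and do not reproduce the internals of LongToIntConsistentHash.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of consistent hashing; not the Heritrix class.
final class ConsistentHashSketch {
    // Ring positions mapped to bucket numbers.
    private final TreeMap<Long, Integer> circle = new TreeMap<>();

    // Place 'replicas' virtual nodes on the ring for each bucket.
    ConsistentHashSketch(int maxBuckets, int replicas) {
        for (int bucket = 0; bucket < maxBuckets; bucket++) {
            for (int r = 0; r < replicas; r++) {
                circle.put(mix(((long) bucket << 32) | r), bucket);
            }
        }
    }

    // 64-bit bijective mixer (splitmix64 finalizer) to spread ring positions.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // First bucket below upTo found walking clockwise from hash, wrapping once.
    int bucketFor(long hash, int upTo) {
        for (Map.Entry<Long, Integer> e : circle.tailMap(hash, true).entrySet()) {
            if (e.getValue() < upTo) {
                return e.getValue();
            }
        }
        for (Map.Entry<Long, Integer> e : circle.headMap(hash, false).entrySet()) {
            if (e.getValue() < upTo) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("No bucket below " + upTo);
    }

    public static void main(String[] args) {
        ConsistentHashSketch ring = new ConsistentHashSketch(8, 16);
        long hash = mix(42L);
        System.out.println(ring.bucketFor(hash, 8));
        System.out.println(ring.bucketFor(hash, 4));
    }
}
```

The property this design aims for: when the boundary shrinks, a hash whose bucket is still below the new boundary keeps the same bucket, so only hashes from the removed buckets are reassigned.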
bucketFor(char[], int) - Method in class org.archive.util.LongToIntConsistentHash
 
BucketQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues.
BucketQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
buffer - Variable in class org.archive.util.PaddingStringBuffer
 
bufLocal - Variable in class org.archive.crawler.io.UriProcessingFormatter
Reusable assembly buffer.
buildAndAddOutlink(CrawlURI, Map<String, Object>) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
buildDisplayingHeader(int, long) - Static method in class org.archive.crawler.util.LogReader
 
buildSurtPrefixSet() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Construct the set of prefixes to use, from the seed list (which may include both URIs and '+'-prefixed directives).
busyThreads - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
bytesProcessed - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 

C

cache - Variable in class org.archive.crawler.processor.CrawlMapper
 
cache - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
cachedFormat - Variable in class org.archive.crawler.io.UriProcessingFormatter
 
cacheLength() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
cachePercent - Variable in class org.archive.bdb.BdbModule
 
cacheSize - Variable in class org.archive.bdb.BdbModule
 
calcOutputDirs() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
calcReverseSortedHostsDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
calcSchemeAuthorityKeyBytes(String) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
 
calcSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
calculateInsertKey(CrawlURI) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the insertKey that places a CrawlURI in the desired spot.
calculateOriginKey(String) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the 'origin' key for a virtual queue of items with the given classKey.
calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
Calculate the precedence value for the given queue.
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
Calculate the precedence value for the given URI.
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
CALENDARISH - Static variable in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
canary - Variable in class org.archive.util.ObjectIdentityBdbCache
 
candidateChain - Variable in class org.archive.crawler.framework.CrawlController
Candidate chain
candidateChain - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
Candidate chain
CandidateChain - Class in org.archive.modules
 
CandidateChain() - Constructor for class org.archive.modules.CandidateChain
 
CandidateScoper - Class in org.archive.crawler.prefetch
Simple single-URI scoper, considers passed-in URI as candidate; sets fetchstatus negative and skips to end of processing if out-of-scope.
CandidateScoper() - Constructor for class org.archive.crawler.prefetch.CandidateScoper
 
CandidatesProcessor - Class in org.archive.crawler.postprocessor
Processor which sends all candidate outlinks through the CandidateChain, scheduling those with non-negative status codes to the frontier.
CandidatesProcessor() - Constructor for class org.archive.crawler.postprocessor.CandidatesProcessor
Usual no-argument constructor
candidateUserAgents - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
list of user-agents to try; if any are allowed, a URI will be crawled
candidateUserAgents - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
list of user-agents to try; if any are allowed, a URI will be crawled
CanonicalizationRule - Interface in org.archive.modules.canonicalize
A rule to apply when canonicalizing a URL.
canonicalize(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Canonicalize passed CrawlURI.
canonicalize(String) - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
Apply this canonicalization rule.
canonicalize(String) - Method in class org.archive.modules.canonicalize.FixupQueryString
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.LowercaseRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.RegexRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
Run the passed uuri through the list of rules.
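Running a URI string through an ordered list of rules can be sketched as below. The class CanonicalizeSketch and the two example rules are hypothetical; they are loosely suggested by rule names in this index and make no claim about what the actual Heritrix rules do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration of a rule chain; not the Heritrix classes.
final class CanonicalizeSketch {
    // Example rule: lowercase the whole string.
    static final UnaryOperator<String> LOWERCASE =
            s -> s.toLowerCase(Locale.ROOT);

    // Example rule: drop a leading "www." after an http or https scheme.
    static final UnaryOperator<String> STRIP_WWW =
            s -> s.replaceFirst("^(https?://)www\\.", "$1");

    // Apply each rule in order, feeding each result to the next rule.
    static String canonicalize(String url, List<UnaryOperator<String>> rules) {
        String current = url;
        for (UnaryOperator<String> rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String result = canonicalize("HTTP://WWW.Example.COM/Path",
                Arrays.asList(LOWERCASE, STRIP_WWW));
        System.out.println(result); // prints http://example.com/path
    }
}
```

Rule order matters in this sketch: STRIP_WWW matches case-sensitively, so it only takes effect here because LOWERCASE runs first.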
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripExtraSlashes
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionCFIDs
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionIDs
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripUserinfoRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWNRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.UriCanonicalizationPolicy
 
canonicalString - Variable in class org.archive.modules.CrawlURI
 
capacityPowerOfTwo - Variable in class org.archive.util.AbstractLongFPSet
the capacity of this set, specified as the exponent of a power of 2
caseSensitiveFilesystem - Variable in class org.archive.modules.writer.MirrorWriterProcessor
True if the file system is case-sensitive, like UNIX.
catalog - Variable in class org.archive.modules.extractor.PDFParser
 
characterMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
checkAvailableSpace(File) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Probe via File.getUsableSpace to see if monitored paths have fallen below the pause threshold.
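The kind of probe described above can be sketched with the standard java.io.File.getUsableSpace() call. The class DiskSpaceSketch and the thresholdBytes parameter are hypothetical; how DiskSpaceMonitor configures its threshold and reacts to it is not shown in this index.

```java
import java.io.File;

// Hypothetical illustration of a usable-space probe; not the Heritrix class.
final class DiskSpaceSketch {
    // True when the partition holding 'path' has fewer usable bytes than the threshold.
    static boolean belowThreshold(File path, long thresholdBytes) {
        return path.getUsableSpace() < thresholdBytes;
    }

    public static void main(String[] args) {
        File current = new File(".");
        long oneGiB = 1024L * 1024L * 1024L; // assumed example threshold
        System.out.println("usable bytes: " + current.getUsableSpace());
        System.out.println("below 1 GiB: " + belowThreshold(current, oneGiB));
    }
}
```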
checkBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
checkForLimitsExceeded(CrawlStatSnapshot) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
checkForNull(String) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
checkForSeedPromotion(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
Check if the URI needs special 'discovered seed' treatment.
checkFutures() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Check for any future-scheduled URIs now eligible for reenqueuing
checkMidfetchAbort(CrawlURI, HttpRecorderMethod, HttpConnection) - Method in class org.archive.modules.fetcher.FetchHTTP
 
checkNotUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Throws an IllegalStateException if the HTTP method has been already executed, but not recycled.
checkOutlinks - Variable in class org.archive.crawler.processor.CrawlMapper
Whether to apply the mapping to discovered outlinks, for example after extraction has occurred.
Checkpoint - Class in org.archive.checkpointing
Represents a single checkpoint, by its name and main store directory.
Checkpoint() - Constructor for class org.archive.checkpointing.Checkpoint
 
checkpoint - Variable in class org.archive.crawler.framework.CheckpointSuccessEvent
 
Checkpointable - Interface in org.archive.checkpointing
Interface for objects that can checkpoint their state, possibly but not necessarily into the provided Checkpoint instance, on request.
checkpointDir - Variable in class org.archive.checkpointing.Checkpoint
Checkpoints directory; either an absolute path, or relative to the CheckpointService's checkpointsDirectory (which will be inserted as the ConfigPath base before the Checkpoint is consulted).
checkpointFailed(Exception) - Method in class org.archive.crawler.framework.CheckpointService
Note that a checkpoint failed
checkpointFailed(String) - Method in class org.archive.crawler.framework.CheckpointService
 
checkpointInProgress - Variable in class org.archive.crawler.framework.CheckpointService
 
checkpointIntervalMinutes - Variable in class org.archive.crawler.framework.CheckpointService
 
checkpointsDir - Variable in class org.archive.crawler.framework.CheckpointService
 
CheckpointService - Class in org.archive.crawler.framework
Executes checkpoints, and offers convenience methods for enumerating available Checkpoints and injecting a recovery-Checkpoint after build and before launch (setRecoveryCheckpointByName).
CheckpointService() - Constructor for class org.archive.crawler.framework.CheckpointService
Create a new Checkpointer
CheckpointSuccessEvent - Class in org.archive.crawler.framework
Report success of a Checkpoint (so that it may be reported by the CrawlJob to the job log).
CheckpointSuccessEvent(CheckpointService, Checkpoint) - Constructor for class org.archive.crawler.framework.CheckpointSuccessEvent
 
checkpointTask - Variable in class org.archive.crawler.framework.CheckpointService
 
CheckpointUtils - Class in org.archive.crawler.util
Utilities useful for checkpointing.
CheckpointUtils() - Constructor for class org.archive.crawler.util.CheckpointUtils
 
CheckpointValidator - Class in org.archive.crawler.framework
 
CheckpointValidator() - Constructor for class org.archive.crawler.framework.CheckpointValidator
 
checkQuotas(CrawlURI, FetchStats.HasFetchStats, int) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Check all quotas for the given substats and category (server, host, or group).
checkUri - Variable in class org.archive.crawler.processor.CrawlMapper
Whether to apply the mapping to a URI being processed itself, for example early in processing (while its status is still 'unattempted').
checkUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Throws an IllegalStateException if the HTTP method has not been executed since last recycle.
checkXML() - Method in class org.archive.crawler.framework.CrawlJob
Is the primary XML config minimally well-formed?
chmod - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Should permissions be changed for the newly created dirs.
chmodValue - Variable in class org.archive.modules.writer.Kw3WriterProcessor
What should the permissions be set to.
chosenEngine - Variable in class org.archive.crawler.restlet.ScriptResource
 
circle - Variable in class org.archive.util.LongToIntConsistentHash
 
cj - Variable in class org.archive.crawler.restlet.JobRelatedResource
 
cj - Variable in class org.archive.crawler.restlet.JobResource
 
classCatalog - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
classCatalogDB - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
classKey - Variable in class org.archive.crawler.frontier.WorkQueue
The classKey
ClassKeyMatchesRegexDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURI class key -- i.e.
ClassKeyMatchesRegexDecideRule() - Constructor for class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
Usual constructor.
clazz - Variable in class org.archive.spring.BeanFieldsPatternValidator
 
cleanup() - Method in class org.archive.crawler.framework.ToePool
 
cleanupHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
Perform any final cleanup related to the HttpClient instance.
cleanUpOldFiles(String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
cleanUpOldFiles(File, String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
clear() - Method in class org.apache.commons.httpclient.HttpState
Clears the state information (all cookies, credentials and proxy credentials).
clear() - Method in class org.archive.crawler.io.UriProcessingFormatter
 
clearAllOverrideContexts() - Static method in class org.archive.spring.KeyedProperties
 
clearAt(long) - Method in class org.archive.util.AbstractLongFPSet
 
clearAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
clearCookies() - Method in class org.apache.commons.httpclient.HttpState
Clears all cookies.
clearCredentials() - Method in class org.apache.commons.httpclient.HttpState
Clears all credentials.
clearOverridesFrom(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
 
clearPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
Clear prerequisite, if any.
clearProxyCredentials() - Method in class org.apache.commons.httpclient.HttpState
Clears all proxy credentials.
CLibrary - Interface in org.archive.util
Interface to standard C library functions; initially just link().
ClientFTP - Class in org.archive.net
Client for FTP operations.
ClientFTP() - Constructor for class org.archive.net.ClientFTP
Constructs a new ClientFTP.
close() - Method in class org.apache.commons.httpclient.HttpConnection
Closes the socket and streams.
close() - Method in class org.archive.bdb.BdbModule
 
close() - Method in class org.archive.bdb.StoredQueue
 
close() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Close down any allocated resources.
close() - Method in class org.archive.crawler.frontier.BdbFrontier
 
close() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
clean up
close() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Release resources only needed when running
close() - Method in class org.archive.crawler.reporting.AlertHandler
 
close() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
close() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
close() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
close() - Method in class org.archive.io.CrawlerJournal
Flush and close the underlying IO objects.
close() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
close() - Method in class org.archive.modules.fetcher.DefaultServerCache
Called when shutting down the cache so we can do clean up.
close() - Method in class org.archive.util.bdbje.EnhancedEnvironment
 
close() - Method in class org.archive.util.ObjectIdentityBdbCache
 
close() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
close() - Method in interface org.archive.util.ObjectIdentityCache
close/release any associated resources
close() - Method in class org.archive.util.ObjectIdentityMemCache
 
closeDatabase(Database) - Method in class org.archive.bdb.BdbModule
 
closeDatabase(String) - Method in class org.archive.bdb.BdbModule
 
closeDataConnection() - Method in class org.archive.net.ClientFTP
 
closeIfStale() - Method in class org.apache.commons.httpclient.HttpConnection
Closes the connection if stale.
closeLogFiles() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Close all log files and remove handlers from loggers.
closeSocketAndStreams() - Method in class org.apache.commons.httpclient.HttpConnection
Closes everything out.
collect(CrawlController, StatisticsTracker) - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
Collect all relevant snapshot samples, from the given CrawlController and StatisticsTracker (which also provides the previous snapshot for rate-calculations).
collection - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Name of collection.
COLLECTION_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
comment - Variable in class org.archive.modules.deciderules.DecideRule
 
compactReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
compare(Object, Object) - Method in class org.apache.commons.httpclient.Cookie
Compares two cookies to determine order for cookie header.
compareTo(CrawlJob) - Method in class org.archive.crawler.framework.CrawlJob
Sort for reverse-chronological listing.
compareTo(Delayed) - Method in class org.archive.crawler.frontier.WorkQueue
 
compareTo(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
compareTo(FPMergeUriUniqFilter.PendingItem) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
compareTo(Link) - Method in class org.archive.modules.extractor.Link
 
completePause() - Method in class org.archive.crawler.framework.CrawlController
 
completeStop() - Method in class org.archive.crawler.framework.CrawlController
Called when the last toethread exits.
component - Variable in class org.archive.crawler.Heritrix
 
composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbCache
 
composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
CompositeIterator<E> - Class in org.archive.util.iterator
An iterator that's built up out of any number of other iterators.
CompositeIterator() - Constructor for class org.archive.util.iterator.CompositeIterator
Create an empty CompositeIterator.
CompositeIterator(Iterator<E>, Iterator<E>) - Constructor for class org.archive.util.iterator.CompositeIterator
Convenience method for concatenating together two iterators.
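An iterator built up out of other iterators can be sketched as follows. ChainedIterator and its add method are hypothetical names used for illustration and are not the CompositeIterator API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical illustration of iterator concatenation; not the Heritrix class.
final class ChainedIterator<E> implements Iterator<E> {
    // Iterators still to be drained, in the order they were added.
    private final Deque<Iterator<E>> pending = new ArrayDeque<>();

    // Append another iterator to the end of the chain.
    void add(Iterator<E> iterator) {
        pending.addLast(iterator);
    }

    @Override
    public boolean hasNext() {
        // Discard exhausted iterators until one with elements is at the front.
        while (!pending.isEmpty() && !pending.peekFirst().hasNext()) {
            pending.removeFirst();
        }
        return !pending.isEmpty();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.peekFirst().next();
    }

    public static void main(String[] args) {
        ChainedIterator<String> chain = new ChainedIterator<>();
        chain.add(Arrays.asList("a", "b").iterator());
        chain.add(Arrays.asList("c").iterator());
        while (chain.hasNext()) {
            System.out.println(chain.next());
        }
    }
}
```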
compress - Variable in class org.archive.modules.writer.WriterPoolProcessor
Whether to gzip-compress files when writing to disk; by default true, meaning do-compress.
concludedSeedBatch() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
concludedSeedBatch() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
concludedSeedBatch() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
concludedSeedBatch() - Method in interface org.archive.modules.seeds.SeedListener
 
ConfigFile - Class in org.archive.spring
ConfigPath with added implication that it is an individual, readable/writable File.
ConfigFile() - Constructor for class org.archive.spring.ConfigFile
 
ConfigFile(String, String) - Constructor for class org.archive.spring.ConfigFile
 
ConfigFileEditor - Class in org.archive.spring
PropertyEditor allowing Strings to become ConfigFile instances.
ConfigFileEditor() - Constructor for class org.archive.spring.ConfigFileEditor
 
ConfigPath - Class in org.archive.spring
A filesystem path, as a bean, for the convenience of configuration via Spring beans.xml or user interfaces to same.
ConfigPath() - Constructor for class org.archive.spring.ConfigPath
 
ConfigPath(String, String) - Constructor for class org.archive.spring.ConfigPath
 
configPathConfigurer - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
ConfigPathConfigurer - Class in org.archive.spring
Bean to fixup all configuration-relative ConfigPath instances, and maintain an inventory of referenced paths.
ConfigPathConfigurer() - Constructor for class org.archive.spring.ConfigPathConfigurer
 
ConfigPathEditor - Class in org.archive.spring
PropertyEditor allowing Strings to become ConfigPath instances.
ConfigPathEditor() - Constructor for class org.archive.spring.ConfigPathEditor
 
ConfigString - Class in org.archive.spring
A configuration string that provides its own reader via the ReadSource interface, for convenient use in spring configuration where any of an inline string, path to local file (ConfigPath), or any other readable-text-source would all be equally welcome.
ConfigString() - Constructor for class org.archive.spring.ConfigString
 
ConfigString(String) - Constructor for class org.archive.spring.ConfigString
 
configureHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
 
configureHttp(int, String, String, int, String, String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
configureMethod(CrawlURI, HttpMethod) - Method in class org.archive.modules.fetcher.FetchHTTP
Configure the HttpMethod setting options and headers.
configurer - Variable in class org.archive.spring.ConfigPath
 
congestionRatio() - Method in interface org.archive.crawler.framework.Frontier
Ratio of number of threads that would theoretically allow maximum crawl progress (if each was as productive as current threads), to current number of threads.
congestionRatio() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
congestionRatio - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
conhash - Variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
connect() - Method in class org.archive.net.s3.S3URLConnection
Connect to S3 and get the object reference, but don't read any of the object data yet.
connectTimeoutMs - Variable in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
consecutiveConnectionErrors - Variable in class org.archive.modules.net.CrawlServer
 
considerActive() - Method in class org.archive.crawler.frontier.WorkQueue
Begin an 'active' session, which begins when a queue first offers a URI for crawling, and continues until it is deactivated (for example, for session-budget reasons).
considerDnsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
considerIfLikelyUri(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
Consider whether a given string is URI-like.
considerIncluded(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider the given UURI as if already scheduled.
considerIncluded(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
considerQueryStringValues(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
Consider a query-string-like collection of key=value[&key=value] pairs for URI-like strings in the values.
considerRobotsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Consider the robots precondition.
considerString(Extractor, CrawlURI, boolean, String) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStringAsUri(String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
considerStrings(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStrings(Extractor, CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStrings(Extractor, CrawlURI, CharSequence, boolean) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerTimestamp() - Method in class org.archive.io.CrawlerJournal
Write a timestamp line if appropriate
consistencyCheck() - Method in class org.archive.crawler.frontier.BdbFrontier
Run a self-consistency check over queue collections, queues-of-queues, etc.
consistencyMarkup(DisposableStoredSortedMap<String, String>, Iterable<?>, String) - Method in class org.archive.crawler.frontier.BdbFrontier
 
CONSTRUCTOR_CACHE - Static variable in class org.archive.bdb.AutoKryo
 
constructRegex(int) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
contains(long) - Method in class org.archive.util.AbstractLongFPSet
Does this set contain the given value?
contains(CharSequence) - Method in interface org.archive.util.BloomFilter
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Checks whether the given character sequence is in this filter.
contains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
contains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Does this set contain a given fingerprint?
contains(int) - Method in class org.archive.util.ms.Piece
 
containsContentTypeCharsetDeclaration() - Method in class org.archive.modules.CrawlURI
 
containsDataKey(String) - Method in class org.archive.modules.CrawlURI
 
containsHost(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
 
containsKey(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
 
containsServer(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
 
CONTENT_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
CONTENT_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
CONTENT_TYPE_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
ContentDigestHistoryLoader - Class in org.archive.modules.recrawl
 
ContentDigestHistoryLoader() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
ContentDigestHistoryStorer - Class in org.archive.modules.recrawl
 
ContentDigestHistoryStorer() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
ContentExtractor - Class in org.archive.modules.extractor
Extracts links from the fetched content of a URI, as opposed to its headers.
ContentExtractor() - Constructor for class org.archive.modules.extractor.ContentExtractor
 
ContentExtractorTestBase - Class in org.archive.modules.extractor
Abstract base class for unit testing ContentExtractor implementations.
ContentExtractorTestBase() - Constructor for class org.archive.modules.extractor.ContentExtractorTestBase
 
ContentLengthDecideRule - Class in org.archive.modules.deciderules
 
ContentLengthDecideRule() - Constructor for class org.archive.modules.deciderules.ContentLengthDecideRule
Usual constructor.
contentSinceCheck - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
contentTypeMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
ContentTypeMatchesRegexDecideRule - Class in org.archive.modules.deciderules
DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression.
ContentTypeMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
 
ContentTypeNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression.
ContentTypeNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
 
controlConversation - Variable in class org.archive.net.ClientFTP
 
controller - Variable in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
controller - Variable in class org.archive.crawler.framework.CheckpointService
 
controller - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
 
controller - Variable in class org.archive.crawler.framework.ToePool
 
controller - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
controller - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
controller - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
controller - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
controller - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
Cookie - Class in org.apache.commons.httpclient
HTTP "magic-cookie" represents a piece of state information that the HTTP agent and the target server can exchange to maintain a session.
Cookie() - Constructor for class org.apache.commons.httpclient.Cookie
Default constructor.
Cookie(String, String, String) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value and domain attribute.
Cookie(String, String, String, String, Date, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value, domain attribute, path attribute, expiration attribute, and secure attribute
Cookie(String, String, String, String, int, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value, domain attribute, path attribute, maximum age attribute, and secure attribute
COOKIEDB_NAME - Static variable in class org.archive.modules.fetcher.BdbCookieStorage
 
cookiesLoadFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
CookieSpec - Interface in org.apache.commons.httpclient.cookie
Defines the cookie management specification.
CookieSpecBase - Class in org.apache.commons.httpclient.cookie
Cookie management functions shared by all specifications.
CookieSpecBase() - Constructor for class org.apache.commons.httpclient.cookie.CookieSpecBase
Default constructor
cookiesSaveFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
CookieStorage - Interface in org.archive.modules.fetcher
 
cookieStorage - Variable in class org.archive.modules.fetcher.FetchHTTP
 
copy(CrawlJob, File, boolean) - Method in class org.archive.crawler.framework.Engine
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
copy(CrawlJob, String, boolean) - Method in class org.archive.crawler.framework.Engine
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
copyForwardWriteTagIfDupe(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
If this fetch is identical to the last written (archived) fetch, then copy forward the writeTag.
copyJob(String, boolean) - Method in class org.archive.crawler.restlet.JobResource
 
copyPersistSourceToHistoryMap(File, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
Populates a given StoredSortedMap (history map) from an old environment db or a persist log.
copyPersistSourceToHistoryMap(URL, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
Populates a given StoredSortedMap (history map) from an old persist log.
CoreAttributeConstants - Interface in org.archive.modules
Attribute keys and constant strings used by the core crawler classes.
CostAssignmentPolicy - Class in org.archive.crawler.frontier
Calculate an integer 'cost' value for the given CrawlURI.
CostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.CostAssignmentPolicy
 
costCount - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of items charged against queue; with totalExpenditure can be used to calculate 'average cost'.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.WagCostAssignmentPolicy
Add constant penalties for certain features of URI (and its 'via') that make it more delayable/skippable.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
CostUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which sets a URI's precedence to its 'cost' -- which simulates the in-queue sorting order in Heritrix 1.x, where cost contributed the same bits to the queue-insert-key that precedence now does.
CostUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.CostUriPrecedencePolicy
 
count() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
 
count - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of stored items
count - Variable in class org.archive.crawler.reporting.AlertThreadGroup
 
count - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
count - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
count - Variable in class org.archive.util.AbstractLongFPSet
The current number of elements in the set
count() - Method in class org.archive.util.AbstractLongFPSet
Return the number of entries in this set.
count - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in interface org.archive.util.fingerprint.LongFPSet
get the number of elements in the Set
count - Variable in class org.archive.util.ObjectIdentityBdbCache
 
count - Variable in class org.archive.util.ObjectIdentityBdbManualCache
 
countryCodes - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
Country code name.
Cp1252 - Class in org.archive.util.ms
A fast implementation of code page 1252.
crawlCheckpoint(Object, File) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
CrawlController - Class in org.archive.crawler.framework
CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl.
CrawlController() - Constructor for class org.archive.crawler.framework.CrawlController
 
CrawlController.State - Enum in org.archive.crawler.framework
 
CrawlController.StopCompleteEvent - Class in org.archive.crawler.framework
 
CrawlController.StopCompleteEvent(Object) - Constructor for class org.archive.crawler.framework.CrawlController.StopCompleteEvent
 
crawlDelay - Variable in class org.archive.modules.net.RobotsDirectives
 
crawledBytes - Variable in class org.archive.crawler.reporting.StatisticsTracker
tally sizes novel, verified (same hash), vouched (not-modified)
CrawledBytesHistotable - Class in org.archive.crawler.util
 
CrawledBytesHistotable() - Constructor for class org.archive.crawler.util.CrawledBytesHistotable
 
crawledBytesSummary() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEmpty(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEnding(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEndTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time the crawl ended
crawlerCount - Variable in class org.archive.crawler.processor.HashCrawlMapper
Number of crawlers among which to split up the URIs.
CrawlerJournal - Class in org.archive.io
Utility class for a crawler journal/log that is compressed and rotates by serial number at checkpoints.
CrawlerJournal(String, String) - Constructor for class org.archive.io.CrawlerJournal
Create a new crawler journal at the given location
CrawlerJournal(File) - Constructor for class org.archive.io.CrawlerJournal
Create a new crawler journal at the given location
CrawlerLoggerModule - Class in org.archive.crawler.reporting
Module providing all expected whole-crawl logging facilities
CrawlerLoggerModule() - Constructor for class org.archive.crawler.reporting.CrawlerLoggerModule
 
CrawlHost - Class in org.archive.modules.net
Represents a single remote "host".
CrawlHost(String) - Constructor for class org.archive.modules.net.CrawlHost
Create a new CrawlHost object.
CrawlHost(String, String) - Constructor for class org.archive.modules.net.CrawlHost
Create a new CrawlHost object.
CrawlJob - Class in org.archive.crawler.framework
CrawlJob represents a crawl configuration, including its configuration files, instantiated/running ApplicationContext, and disk output, potentially across multiple runs.
CrawlJob(File) - Constructor for class org.archive.crawler.framework.CrawlJob
 
CrawlJob.JobLogFormatter - Class in org.archive.crawler.framework
Formatter for job.log
CrawlJob.JobLogFormatter() - Constructor for class org.archive.crawler.framework.CrawlJob.JobLogFormatter
 
CrawlJobModel - Class in org.archive.crawler.restlet.models
 
CrawlJobModel(CrawlJob, String) - Constructor for class org.archive.crawler.restlet.models.CrawlJobModel
 
CrawlLimitEnforcer - Class in org.archive.crawler.framework
Bean to enforce limits on the size of a crawl in URI count, byte count, or elapsed time.
CrawlLimitEnforcer() - Constructor for class org.archive.crawler.framework.CrawlLimitEnforcer
 
crawlLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
CrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
CrawlMapper() - Constructor for class org.archive.crawler.processor.CrawlMapper
Constructor.
CrawlMetadata - Class in org.archive.modules
Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
CrawlMetadata() - Constructor for class org.archive.modules.CrawlMetadata
 
crawlPaused(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlPauseStarted - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time of last pause, while pause in progress
crawlPausing(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlResuming(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
CrawlServer - Class in org.archive.modules.net
Represents a single remote "server".
CrawlServer(String) - Constructor for class org.archive.modules.net.CrawlServer
Creates a new CrawlServer object.
crawlStartTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time the crawl started
CrawlStateEvent - Class in org.archive.crawler.event
 
CrawlStateEvent(Object, CrawlController.State, String) - Constructor for class org.archive.crawler.event.CrawlStateEvent
 
CrawlStatSnapshot - Class in org.archive.crawler.reporting
Frozen snapshot of a variety of crawl statistics.
CrawlStatSnapshot() - Constructor for class org.archive.crawler.reporting.CrawlStatSnapshot
 
CrawlStatus - Enum in org.archive.crawler.framework
 
CrawlSummaryReport - Class in org.archive.crawler.reporting
The "Crawl Report", with summaries of overall crawl size.
CrawlSummaryReport() - Constructor for class org.archive.crawler.reporting.CrawlSummaryReport
 
crawlTotalPausedTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
duration tally of all time spent in paused state
CrawlURI - Class in org.archive.modules
Represents a candidate URI and the associated state it collects as it is crawled.
CrawlURI(UURI) - Constructor for class org.archive.modules.CrawlURI
Create a new instance of CrawlURI from a UURI.
CrawlURI(UURI, String, UURI, LinkContext) - Constructor for class org.archive.modules.CrawlURI
 
CrawlURI.FetchType - Enum in org.archive.modules
 
CrawlURIDispositionEvent - Class in org.archive.crawler.event
 
CrawlURIDispositionEvent(Object, CrawlURI, CrawlURIDispositionEvent.Disposition) - Constructor for class org.archive.crawler.event.CrawlURIDispositionEvent
 
CrawlURIDispositionEvent.Disposition - Enum in org.archive.crawler.event
 
createCrawlURI(UURI, Link) - Method in class org.archive.modules.CrawlURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createCrawlURI(UURI, Link, int, boolean) - Method in class org.archive.modules.CrawlURI
Utility method for creation of CandidateURIs found extracting links from this CrawlURI.
createdEnvironment - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbCache
 
createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
createFileLogger(File, String, Logger) - Static method in class org.archive.crawler.util.LogUtils
Creates a file logger that uses the heritrix.properties file logger configuration.
createFormSubmissionAttempt(CrawlURI, HTMLForm, String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
createFp(CharSequence) - Static method in class org.archive.crawler.util.FPMergeUriUniqFilter
Create a fingerprint from the given key
CreateHardLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
createHostDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Create a subdirectory named for the host in the URI.
createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.BdbFrontier
 
createInactiveQueueForPrecedence(int, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Optionally reuse prior data, for use when resuming from a checkpoint
createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Create an inactiveQueue to hold queue names at the given precedence
createKey(CharSequence) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
Create fingerprint.
createMultipleWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
Create the single object (within which is one BDB database) inside which all the other queues live.
createNewJobWithDefaults(File) - Method in class org.archive.crawler.framework.Engine
Create a new job dir and copy profile CXML into it as non-profile CXML
createPortDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Create a subdirectory named for the port in the URI.
createRecorder(String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
Deprecated.
createRecorder(String, String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
 
createRoot() - Method in class org.archive.crawler.restlet.EngineApplication
 
createSocket() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(InetAddress, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(Socket, String, int, boolean) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
CreateSymbolicLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
createUriSet() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
createUriSet() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
Credential - Class in org.archive.modules.credential
Credential type.
Credential() - Constructor for class org.archive.modules.credential.Credential
Constructor.
credentialPrecondition(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Consider credential preconditions.
CredentialStore - Class in org.archive.modules.credential
Front door to the credential store.
CredentialStore() - Constructor for class org.archive.modules.credential.CredentialStore
Constructor.
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.modules.extractor.ExtractorCSS
 
CSS_URI_EXTRACTOR - Static variable in class org.archive.modules.extractor.ExtractorCSS
CSS URL extractor pattern.
curi - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
 
curi - Variable in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
current() - Static method in class org.archive.crawler.reporting.AlertThreadGroup
 
current - Variable in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
currentDocsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
currentFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
currentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
currentKiBPerSec - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
currentLaunchDir - Variable in class org.archive.spring.PathSharingContext
 
currentLaunchId - Variable in class org.archive.spring.PathSharingContext
 
currentLaunchJobLogHandler - Variable in class org.archive.crawler.framework.CrawlJob
 
customRobots - Variable in class org.archive.modules.net.CustomRobotsPolicy
textual alternate robots.txt rules to follow
CustomRobotsPolicy - Class in org.archive.modules.net
Follow a custom-written robots policy, rather than the site's own declarations. Does not support overlays of different custom-robots; instead it is recommended each custom policy be declared as a separate bean, with a distinct name.
CustomRobotsPolicy() - Constructor for class org.archive.modules.net.CustomRobotsPolicy
 
customRobotstxt - Variable in class org.archive.modules.net.CustomRobotsPolicy
 
CustomSWFTags - Class in org.archive.modules.extractor
Overwrite action tags that may hold a URI, to use the CrawlUriSWFAction action.
CustomSWFTags(SWFActions) - Constructor for class org.archive.modules.extractor.CustomSWFTags
 

D

d - Variable in class org.archive.util.BloomFilter64bit
The number of hash functions used by this filter.
data - Variable in class org.archive.modules.CrawlURI
Flexible dynamic attributes list.
data - Variable in class org.archive.modules.extractor.Link
Flexible dynamic attributes list.
data - Variable in class org.archive.spring.PathSharingContext
 
databaseConfig() - Static method in class org.archive.bdb.StoredQueue
A suitable DatabaseConfig for the Database backing a StoredQueue.
dataSocket - Variable in class org.archive.net.ClientFTP
 
db - Variable in class org.archive.bdb.DisposableStoredSortedMap
 
db - Variable in class org.archive.util.ObjectIdentityBdbCache
The BDB JE database used for this instance.
db - Variable in class org.archive.util.ObjectIdentityBdbManualCache
The BDB JE database used for this instance.
deactivateQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Put the given queue on the inactiveQueues queue
DEBUG - Static variable in class org.archive.util.BloomFilter64bit
 
DecideResult - Enum in org.archive.modules.deciderules
The decision of a DecideRule.
DecideRule - Class in org.archive.modules.deciderules
 
DecideRule() - Constructor for class org.archive.modules.deciderules.DecideRule
 
DecideRuledSheetAssociation - Class in org.archive.crawler.spring
SheetAssociation applied on the basis of DecideRules.
DecideRuledSheetAssociation() - Constructor for class org.archive.crawler.spring.DecideRuledSheetAssociation
 
DecideRuleSequence - Class in org.archive.modules.deciderules
 
DecideRuleSequence() - Constructor for class org.archive.modules.deciderules.DecideRuleSequence
 
decideToMapOutlink(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
decisionFor(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
decode(String) - Static method in class org.archive.util.Base32
Decodes the given Base32 String to a raw byte array.
decode(int) - Static method in class org.archive.util.ms.Cp1252
Returns the Unicode character for the given Cp1252 byte.
decrementQueuedCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that a number of queued URIs have been deleted.
deepestUri() - Method in interface org.archive.crawler.framework.Frontier
Ordinal position of the 'deepest' URI eligible for crawling.
deepestUri() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
deepestUri - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
DEFAULT_CAPACITY - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_CLASS_KEY - Static variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
DEFAULT_IP_WHOIS_SERVER - Static variable in class org.archive.modules.fetcher.FetchWhois
 
DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Default lower bound
DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Default lower bound
DEFAULT_MAX_PENDING - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
DEFAULT_PARAMETERS - Static variable in class org.archive.modules.extractor.Extractor
 
DEFAULT_REPLICAS - Static variable in class org.archive.util.LongToIntConsistentHash
 
DEFAULT_SMEAR - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_TEST_TMP_DIR - Static variable in class org.archive.util.TmpDirTestCase
Default test tmp.
DEFAULT_TOE_PRIORITY - Static variable in class org.archive.crawler.framework.ToePool
run worker threads at slightly lower priority than usual
DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Default upper bound
DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Default upper bound
DefaultBlockFileSystem - Class in org.archive.util.ms
Default implementation of the Block File System.
DefaultBlockFileSystem(SeekInputStream, int) - Constructor for class org.archive.util.ms.DefaultBlockFileSystem
Constructor.
DefaultServerCache - Class in org.archive.modules.fetcher
Server and Host cache.
DefaultServerCache() - Constructor for class org.archive.modules.fetcher.DefaultServerCache
Constructor.
DefaultServerCache(ObjectIdentityCache<CrawlServer>, ObjectIdentityCache<CrawlHost>) - Constructor for class org.archive.modules.fetcher.DefaultServerCache
 
DefaultTempDirProvider - Class in org.archive.modules.net
 
DefaultTempDirProvider() - Constructor for class org.archive.modules.net.DefaultTempDirProvider
 
defaultUpdateDescriptor(PropertyDescriptor) - Method in class org.archive.crawler.restlet.JobRelatedResource
 
defaultURI() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Returns a CrawlURI for testing purposes.
deferOrFinishGeneric(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
deferredWrite - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
degree - Variable in class st.ata.util.FPGenerator
The number of bits in fingerprints generated by this.
delaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
delay between scans of actionDirectory for new files
delete(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete the given CrawlURI from persistent store.
deleted(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that a CrawlURI has been deleted outside of the normal next()/finished() lifecycle.
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Force logging, etc.
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the given item from the queue.
deleteJob(CrawlJob) - Method in class org.archive.crawler.framework.Engine
 
deleteMatching(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteMatchingFromQueue(String, String, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete all CrawlURIs matching the given expression.
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Delete a named sheet from all associations and the master named sheets map.
deleteURIs(String, String) - Method in interface org.archive.crawler.framework.Frontier
Delete any URI that matches the given regular expression from the list of discovered and pending URIs.
deleteURIs(String, String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
dequeue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the peekItem from the queue and adjusts the count.
desc - Variable in enum org.archive.crawler.framework.CrawlStatus
 
description - Variable in class org.archive.modules.CrawlMetadata
 
DescriptorUpdater - Interface in org.archive.crawler.restlet
 
destroy() - Method in class org.archive.bdb.BdbModule
 
destroy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
destroy() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
destroy() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
detach(CrawlURI) - Method in class org.archive.modules.credential.Credential
Detach this credential from passed curi.
detachAll(CrawlURI) - Method in class org.archive.modules.credential.Credential
Detach all credentials of this type from passed curi.
determineRootRef(Request) - Method in class org.archive.crawler.restlet.EnhDirectory
 
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchDNS
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchFTP
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchHTTP
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
dir - Variable in class org.archive.bdb.BdbModule
 
directory - Variable in class org.archive.modules.writer.WriterPoolProcessor
 
directoryFile - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Implicitly append this to a URI ending with '/'.
dirResource - Variable in class org.archive.crawler.restlet.EditRepresentation
 
dirResource - Variable in class org.archive.crawler.restlet.PagedRepresentation
wrapped EnhDirectoryResource; used to formulate self-links
dirtyItems - Variable in class org.archive.util.ObjectIdentityBdbManualCache
 
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
dirtyKey(String) - Method in interface org.archive.util.ObjectIdentityCache
force the persistent backend, if any, to eventually be updated with live object state for the given key
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityMemCache
 
disallows - Variable in class org.archive.modules.net.RobotsDirectives
 
disconnect() - Method in class org.archive.net.ClientFTP
 
discoveredUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of discovered URIs.
discoveredUriCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Number of discovered URIs.
discoveredUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
DiskFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude FPMergeUriUniqFilter using a disk data file of raw longs as the overall FP record.
DiskFPMergeUriUniqFilter(File) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
DiskFPMergeUriUniqFilter.DataFileLongIterator - Class in org.archive.crawler.util
 
DiskFPMergeUriUniqFilter.DataFileLongIterator(DataInputStream) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Construct a long iterator reading from the given stream.
diskMap - Variable in class org.archive.util.ObjectIdentityBdbCache
The Collection view of the BDB JE database used for this instance.
diskMap - Variable in class org.archive.util.ObjectIdentityBdbManualCache
The Collection view of the BDB JE database used for this instance.
DiskSpaceMonitor - Class in org.archive.crawler.monitor
Monitors the available space on the paths configured.
DiskSpaceMonitor() - Constructor for class org.archive.crawler.monitor.DiskSpaceMonitor
 
DisposableStoredSortedMap<K,V> - Class in org.archive.bdb
TempStoredSortedMap remembers its backing Database, and offers a dispose() method for closing/discarding the underlying Database.
DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
dispose() - Method in class org.archive.bdb.DisposableStoredSortedMap
 
disposition - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
 
dispositionChain - Variable in class org.archive.crawler.framework.CrawlController
Disposition chain
DispositionChain - Class in org.archive.modules
 
DispositionChain() - Constructor for class org.archive.modules.DispositionChain
 
dispositionInProgressLock - Variable in class org.archive.crawler.frontier.AbstractFrontier
lock allowing steps of outside processing that need to complete all-or-nothing to signal their in-progress status
dispositionPending - Variable in class org.archive.crawler.frontier.AbstractFrontier
remembers a disposition-in-progress, so that extra endDisposition() calls are harmless
DispositionProcessor - Class in org.archive.crawler.postprocessor
A step, late in the processing of a CrawlURI, for marking-up the CrawlURI with values to affect frontier disposition, and updating information that may have been affected by the fetch.
DispositionProcessor() - Constructor for class org.archive.crawler.postprocessor.DispositionProcessor
 
disregardedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that were scheduled at one point but have been disregarded.
disregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
URIs that are disregarded (for example, because of robots.txt rules)
disregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
diversionDir - Variable in class org.archive.crawler.processor.CrawlMapper
Directory to write diversion logs.
diversionLogs - Variable in class org.archive.crawler.processor.CrawlMapper
Mapping of target crawlers to logs (PrintWriters)
divertLog(CrawlURI, String) - Method in class org.archive.crawler.processor.CrawlMapper
Note the given CrawlURI in the appropriate diversion log.
DNSJavaUtil - Class in org.archive.util
Utility methods based on DNSJava.
doAbort(CrawlURI, HttpMethod, String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
Doc - Class in org.archive.util.ms
Reads .doc files.
doCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
doCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Do the actual checkpoint.
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Run checkpointing.
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
docsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
document - Variable in class org.archive.modules.extractor.PDFParser
 
DOCUMENT_BUILDER - Static variable in class org.archive.crawler.migrate.MigrateH1to3Tool
 
documentReader - Variable in class org.archive.modules.extractor.PDFParser
 
doJournalAdded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalEmitted(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalReenqueued(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalRelocated(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
domain - Variable in class org.archive.modules.credential.Credential
The root domain this credential goes against: E.g.
DOMAIN_OVERBOUNDS - Static variable in class org.apache.commons.httpclient.Cookie
Character which, if appended to end of a domain, will give a boundary key that sorts past all Cookie sortKeys for the same domain.
domainMatch(String, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Performs domain-match as defined by the cookie specification.
domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Performs domain-match as implemented in common browsers.
domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
doneDir - Variable in class org.archive.crawler.framework.ActionDirectory
 
doRecover() - Method in class org.archive.bdb.BdbModule
 
doStripRegexMatch(String, String) - Method in class org.archive.modules.canonicalize.BaseRule
Run a regex that strips elements of a string.
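A regex-based strip of this kind can be sketched as follows. The sketch assumes (this index does not say so) that the pattern captures the text before and after the stripped element as groups 1 and 2; the class StripRegexSketch, its method names, and the sample pattern are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of stripping part of a string with a regex.
final class StripRegexSketch {

    // If the whole input matches, keep group 1 + group 2; otherwise return it unchanged.
    static String strip(String url, String regex) {
        Matcher m = Pattern.compile(regex).matcher(url);
        if (!m.matches()) {
            return url;
        }
        return nullToEmpty(m.group(1)) + nullToEmpty(m.group(2));
    }

    // An optional group that did not participate in the match is reported as null.
    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    public static void main(String[] args) {
        // Assumed example pattern: drop a ";jsessionid=..." path parameter.
        String pattern = "^(.+?);jsessionid=[0-9A-Za-z]+(\\?.*)?$";
        System.out.println(strip("http://example.com/page;jsessionid=ABC123?x=1", pattern));
    }
}
```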
dotBegin - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If a segment starts with '.', the '.' is replaced by this.
doTeardown() - Method in class org.archive.crawler.framework.CrawlJob
 
dotEnd - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If a directory name ends with '.' it is replaced by this.
doubleToString(double, int) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
downloadDisregards - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
downloadedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
downloadFailures - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
dropboxes - Static variable in class org.archive.crawler.restlet.Flash
 
dumpAllPendingToLog() - Method in class org.archive.crawler.frontier.BdbFrontier
Dump all still-enqueued URIs to the crawl.log -- without actually dequeuing.
dumpPendingAtClose - Variable in class org.archive.crawler.frontier.BdbFrontier
 
dumpReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
Run the reports.
dumpSurtPrefixSet() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Dump the current prefixes in use to configured dump file (if any)
dupByHashBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
dupByHashUrls - Variable in class org.archive.modules.fetcher.FetchStats
 
DUPLICATE - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
DUPLICATECOUNT - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
duplicateCount - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
duplicatesAtLastSample - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 

E

EDIT_FILTER - Static variable in class org.archive.crawler.restlet.JobResource
 
EDIT_FILTER - Static variable in class org.archive.crawler.restlet.models.CrawlJobModel
 
editFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
 
EditRepresentation - Class in org.archive.crawler.restlet
Representation wrapping a FileRepresentation, displaying its contents in a TextArea for editing.
EditRepresentation(FileRepresentation, EnhDirectoryResource) - Constructor for class org.archive.crawler.restlet.EditRepresentation
 
elapsedMilliseconds - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
elapsedReport() - Method in class org.archive.crawler.framework.CrawlJob
 
elapsedReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
elementContext(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorHTML
Create a suitable XPath-like context from an element name and optional attribute name.
EMBED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for embeds without other context.
emitBumper(PrintWriter, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
Emit a "start" or "EOF" bumper as appropriate to prominently indicate whether the page borders the start or end of the file.
emitControls(PrintWriter) - Method in class org.archive.crawler.restlet.PagedRepresentation
Emit the navigational controls.
emitted(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
EMPTY - Static variable in class org.archive.util.AbstractLongFPSet
A constant used to indicate that a slot in the set storage is empty.
empty - Variable in class st.ata.util.FPGenerator
Fingerprint of the empty string of bytes.
encode(byte[]) - Static method in class org.archive.util.Base32
Encodes byte array to Base32 String.
EncodingUtil - Class in org.apache.commons.httpclient.util
The home for utility methods that handle various encoding tasks.
encounteredReferences - Variable in class org.archive.modules.extractor.PDFParser
 
endDisposition() - Method in interface org.archive.crawler.framework.Frontier
Inform frontier the processing signalled by an earlier pending beginDisposition() call has finished.
endDisposition() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
Engine - Class in org.archive.crawler.framework
Implementation for Engine.
Engine(File) - Constructor for class org.archive.crawler.framework.Engine
 
engine - Variable in class org.archive.crawler.Heritrix
 
engine - Variable in class org.archive.crawler.restlet.EngineApplication
 
EngineApplication - Class in org.archive.crawler.restlet
Restlet Application for a Heritrix crawl 'Engine', which is aware of local job configurations/directories and can assemble/launch/monitor/manage crawls.
EngineApplication(Engine) - Constructor for class org.archive.crawler.restlet.EngineApplication
 
EngineApplication.EngineStatusService - Class in org.archive.crawler.restlet
Customize Restlet error to include back button and full stack.
EngineApplication.EngineStatusService() - Constructor for class org.archive.crawler.restlet.EngineApplication.EngineStatusService
 
EngineModel - Class in org.archive.crawler.restlet.models
 
EngineModel(Engine, String) - Constructor for class org.archive.crawler.restlet.models.EngineModel
 
engineName - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
engine name; default "beanshell"
engineName - Variable in class org.archive.modules.ScriptedProcessor
engine name; default "beanshell"
EngineResource - Class in org.archive.crawler.restlet
Restlet Resource representing an Engine that may be used to assemble, launch, monitor, and manage crawls.
EngineResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.EngineResource
 
EnhancedEnvironment - Class in org.archive.util.bdbje
Version of BDB_JE Environment with additional convenience features, such as a shared, cached StoredClassCatalog.
EnhancedEnvironment(File, EnvironmentConfig) - Constructor for class org.archive.util.bdbje.EnhancedEnvironment
Constructor
EnhDirectory - Class in org.archive.crawler.restlet
Enhanced version of Restlet Directory, which allows the local filesystem directory to be determined dynamically based on the request details.
EnhDirectory(Context, Reference) - Constructor for class org.archive.crawler.restlet.EnhDirectory
 
EnhDirectory(Context, String) - Constructor for class org.archive.crawler.restlet.EnhDirectory
 
EnhDirectoryResource - Class in org.archive.crawler.restlet
Enhanced version of Restlet DirectoryResource, adding ability to edit some files.
EnhDirectoryResource(EnhDirectory, Request, Response) - Constructor for class org.archive.crawler.restlet.EnhDirectoryResource
 
enqueue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Add the given CrawlURI, noting its addition in running count.
enqueueCount - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of items ever enqueued
enqueuedCounts - Variable in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
ensureStandardPoliciesAvailable() - Method in class org.archive.modules.CrawlMetadata
 
ensureStaticInitialization() - Static method in class org.archive.crawler.reporting.AlertHandler
Simply to ensure static initialization (installing catchall handler on topmost logger) is run.
Entry - Interface in org.archive.util.ms
 
Entry.EntryType - Enum in org.archive.util.ms
 
entryString(Object) - Static method in class org.archive.util.Histotable
Utility method to convert a key->Long into the string "count key".
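As a small illustration of the "count key" format, the following sketch formats a map entry with the count first. EntryStringSketch is a hypothetical stand-in, not the Histotable class, and the single-space separator is an assumption.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical stand-in for a "count key" formatter.
final class EntryStringSketch {

    // Puts the count first, then the key, separated by one space (assumed separator).
    static String entryString(Map.Entry<String, Long> entry) {
        return entry.getValue() + " " + entry.getKey();
    }

    public static void main(String[] args) {
        Map.Entry<String, Long> e = new AbstractMap.SimpleEntry<>("text/html", 42L);
        System.out.println(entryString(e));
    }
}
```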
entryToObject(DatabaseEntry) - Method in class org.archive.bdb.KryoBinding
 
equals(Object) - Method in class org.apache.commons.httpclient.Cookie
Two cookies are equal if the name, path and domain match.
equals(Object) - Method in class org.archive.modules.extractor.Link
 
equals(Object) - Method in class org.archive.modules.extractor.LinkContext
 
equals(Object) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory are the same.
equals(Object) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
equals(Object) - Method in class org.archive.modules.net.CrawlHost
 
equals(Object) - Method in class org.archive.modules.net.CrawlServer
 
errorCount - Variable in class org.archive.crawler.frontier.WorkQueue
count of errors encountered
errorMessage - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
escape(String) - Static method in class org.archive.util.JavaLiterals
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
Evaluate whether given object is equal to the configured status
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
Evaluate whether given object's FetchStatus does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HasViaDecideRule
Evaluate whether given object has a via (a referring URI).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
Evaluate whether given object's string version matches configured regexes
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
Evaluate whether given object's string version matches configured regex
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns "true" if the provided CrawlURI has a fetch status that falls within this instance's specified range.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesFilePatternDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesListRegexDecideRule
Evaluate whether given object's string version does not match configured regexes (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesRegexDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns "true" if the provided CrawlURI has a fetch status that does not fall within this instance's specified range.
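The two status-code rules above amount to a range test and its negation. A minimal sketch follows, assuming both bounds are inclusive (the index does not state this); StatusRangeSketch and its parameter names are hypothetical.

```java
// Hypothetical illustration of a fetch-status range check and its inverse.
final class StatusRangeSketch {

    // True when status lies within [lowerBound, upperBound]; inclusive bounds are an assumption.
    static boolean matches(int status, int lowerBound, int upperBound) {
        return status >= lowerBound && status <= upperBound;
    }

    // The "not matches" variant simply reverses the answer.
    static boolean notMatches(int status, int lowerBound, int upperBound) {
        return !matches(status, lowerBound, upperBound);
    }

    public static void main(String[] args) {
        System.out.println(matches(404, 400, 499));
        System.out.println(notMatches(200, 400, 499));
    }
}
```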
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Evaluate whether given CrawlURI's content-digest exactly matches that of preceding fetch.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
Evaluate whether given object's URI scheme is not in the configured set.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnDomainsDecideRule
Evaluate whether given object's URI is NOT in the set of domains -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnHostsDecideRule
Evaluate whether given object's URI is NOT in the set of hosts -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotSurtPrefixedDecideRule
Evaluate whether given object's URI is NOT in the SURT prefix set -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Evaluate whether given object's URI is covered by the SURT prefix set
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
Evaluate whether given object is over the threshold number of hops.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
Evaluate whether given object is over the threshold number of path-segments.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
Evaluate whether given object is within the acceptable thresholds of transitive hops.
exactKey(String) - Static method in class org.archive.surt.SURTTokenizer
 
execute(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Executes this method using the specified HttpConnection and HttpState.
execute(ScriptEngine, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
executor - Variable in class org.archive.crawler.framework.ActionDirectory
 
executor - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
expectedConcurrency - Variable in class org.archive.bdb.BdbModule
Expected number of concurrent threads; used to tune nLockTables according to JE FAQ http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
expectedInserts - Variable in class org.archive.util.BloomFilter64bit
The expected number of inserts; determines calculated size
expectedResult - Variable in class org.archive.modules.extractor.StringExtractorTestBase.TestData
 
expend(int) - Method in class org.archive.crawler.frontier.WorkQueue
Decrease the internal running budget by the given amount.
expenditureAtLastActivation - Variable in class org.archive.crawler.frontier.WorkQueue
Record of expenditures at last activation (session start)
expirationOperation - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
The action that the processor takes once the runtime has elapsed.
extend(long, byte) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the low eight bits of "b".
extend(long, char) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, long) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, byte[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding "n" bytes of "buf" starting from "buf[start]".
extend(long, char[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" characters of "buf" starting from "buf[i]".
extend(long, CharSequence) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) the characters of "s".
extend(long, int[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" ints of "buf" starting from "buf[i]".
extend(long, long[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" longs of "buf" starting from "buf[i]".
extend8(long, String) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the lower eight bits of the characters of "s".
extend8(long, char[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the lower eight bits of "n" characters of "buf" starting from "buf[i]".
extend_byte(long, int) - Method in class st.ata.util.FPGenerator
Extends f with lower eight bits of v without full reduction.
extend_char(long, int) - Method in class st.ata.util.FPGenerator
Extends f with lower sixteen bits of v.
extend_int(long, int) - Method in class st.ata.util.FPGenerator
Extends f with (all bits of) v.
extend_long(long, long) - Method in class st.ata.util.FPGenerator
Extends f with v.
extendHopsPath(String, char) - Static method in class org.archive.modules.CrawlURI
Extend a 'hopsPath' (pathFromSeed string of single-character hop-type symbols), keeping the number of displayed hop-types under MAX_HOPS_DISPLAYED.
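One way to keep a hops path bounded is to drop the oldest hop symbols and record how many were dropped. The sketch below is an illustration under assumptions, not the CrawlURI implementation: the "N+" overflow prefix, the limit of 5, and the names HopsPathSketch and MAX_SHOWN are all hypothetical.

```java
// Hypothetical illustration of extending a hops path while capping the displayed hops.
final class HopsPathSketch {

    static final int MAX_SHOWN = 5; // assumed limit, for demonstration only

    // Appends one hop symbol; when the cap is exceeded, the oldest hops are
    // dropped and counted in a leading "N+" prefix (assumed format).
    static String extend(String path, char hop) {
        int plus = path.indexOf('+');
        int overflow = plus < 0 ? 0 : Integer.parseInt(path.substring(0, plus));
        String hops = path.substring(plus + 1) + hop;
        if (hops.length() > MAX_SHOWN) {
            overflow += hops.length() - MAX_SHOWN;
            hops = hops.substring(hops.length() - MAX_SHOWN);
        }
        return overflow == 0 ? hops : overflow + "+" + hops;
    }

    public static void main(String[] args) {
        String path = "";
        for (char hop : "LLLLLEX".toCharArray()) {
            path = extend(path, hop);
            System.out.println(path);
        }
    }
}
```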
ExternalGeoLocationDecideRule - Class in org.archive.modules.deciderules
A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface.
ExternalGeoLocationDecideRule() - Constructor for class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
ExternalGeoLookupInterface - Interface in org.archive.modules.deciderules
Interface used by ExternalGeoLocationDecideRule.
externalPaths - Variable in class org.archive.spring.KeyedProperties
The alternate global property-paths leading to this map. TODO: consider whether a deterministically ordered list is important.
extract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Extracts links
extract(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
Extracts links from the given URI.
extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
Perform usual extraction on a CrawlURI
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorURI
Perform usual extraction on a CrawlURI
extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
extractImplied(CharSequence, Pattern, String) - Static method in class org.archive.modules.extractor.ExtractorImpliedURI
Utility method for extracting 'implied' URI given a source uri, trigger pattern, and build pattern.
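The trigger/build idea can be sketched with java.util.regex: the trigger pattern must match the source URI, and the build pattern is a replacement string that refers to the trigger's groups. The class ImpliedUriSketch, the sample URI, and the choice to return null on a non-match are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of deriving an 'implied' URI from a source URI.
final class ImpliedUriSketch {

    // Returns the built URI when the trigger matches the whole source, else null (assumed convention).
    static String extractImplied(CharSequence uri, Pattern trigger, String build) {
        Matcher m = trigger.matcher(uri);
        if (m.matches()) {
            // $1, $2, ... in the build string refer to the trigger's capture groups.
            return m.replaceFirst(build);
        }
        return null;
    }

    public static void main(String[] args) {
        Pattern trigger = Pattern.compile("^(http://example\\.com/view)\\?id=(\\d+)$");
        System.out.println(extractImplied("http://example.com/view?id=42", trigger, "$1/image?id=$2"));
    }
}
```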
extractLink(CrawlURI, Link) - Method in class org.archive.modules.extractor.ExtractorURI
Consider a single Link for internal URIs
extractor - Variable in class org.archive.modules.extractor.ContentExtractorTestBase
An extractor created during the setUp.
Extractor - Class in org.archive.modules.extractor
Extracts links from fetched URIs.
Extractor() - Constructor for class org.archive.modules.extractor.Extractor
 
ExtractorCSS - Class in org.archive.modules.extractor
Extracts URIs from CSS files.
ExtractorCSS() - Constructor for class org.archive.modules.extractor.ExtractorCSS
 
ExtractorDOC - Class in org.archive.modules.extractor
Allows the caller to extract href-style links from Word 97-format documents.
ExtractorDOC() - Constructor for class org.archive.modules.extractor.ExtractorDOC
 
ExtractorHTML - Class in org.archive.modules.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
ExtractorHTML() - Constructor for class org.archive.modules.extractor.ExtractorHTML
 
ExtractorHTMLForms - Class in org.archive.modules.forms
Extracts extra information about FORMs in HTML, loading this into the CrawlURI (for potential later use by FormLoginProcessor) and adding a small annotation to the crawl.log.
ExtractorHTMLForms() - Constructor for class org.archive.modules.forms.ExtractorHTMLForms
 
ExtractorHTTP - Class in org.archive.modules.extractor
Extracts URIs from HTTP response headers.
ExtractorHTTP() - Constructor for class org.archive.modules.extractor.ExtractorHTTP
 
ExtractorImpliedURI - Class in org.archive.modules.extractor
An extractor for finding 'implied' URIs inside other URIs.
ExtractorImpliedURI() - Constructor for class org.archive.modules.extractor.ExtractorImpliedURI
Constructor.
extractorJS - Variable in class org.archive.modules.extractor.ExtractorHTML
JavaScript extractor to use to process inline JavaScript.
ExtractorJS - Class in org.archive.modules.extractor
Processes JavaScript files for strings that are likely to be crawlable URIs.
ExtractorJS() - Constructor for class org.archive.modules.extractor.ExtractorJS
 
extractorJS - Variable in class org.archive.modules.extractor.ExtractorSWF
JavaScript extractor to use to process inline JavaScript.
ExtractorMultipleRegex - Class in org.archive.modules.extractor
An extractor that uses regular expressions to find strings in the fetched content of a URI, and constructs outlink URIs from those strings.
ExtractorMultipleRegex() - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex
 
ExtractorMultipleRegex.GroupList - Class in org.archive.modules.extractor
 
ExtractorMultipleRegex.GroupList(MatchResult) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.GroupList
 
ExtractorMultipleRegex.MatchList - Class in org.archive.modules.extractor
 
ExtractorMultipleRegex.MatchList(String, CharSequence) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
 
ExtractorMultipleRegex.MatchList(ExtractorMultipleRegex.GroupList...) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
 
extractorParameters - Variable in class org.archive.modules.extractor.Extractor
 
ExtractorParameters - Interface in org.archive.modules.extractor
Bean interface for parameters consulted by multiple Extractors, and thus provided by some shared object.
ExtractorPDF - Class in org.archive.modules.extractor
Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs
ExtractorPDF() - Constructor for class org.archive.modules.extractor.ExtractorPDF
 
ExtractorSWF - Class in org.archive.modules.extractor
Extracts URIs from SWF (flash/shockwave) files.
ExtractorSWF() - Constructor for class org.archive.modules.extractor.ExtractorSWF
 
ExtractorSWF.CrawlUriSWFAction - Class in org.archive.modules.extractor
SWF action that handles discovered URIs.
ExtractorSWF.CrawlUriSWFAction(CrawlURI, Extractor) - Constructor for class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
ExtractorSWF.ExtractorTagParser - Class in org.archive.modules.extractor
TagParser customized to ignore SWFTags that will never contain extractable URIs.
ExtractorSWF.ExtractorTagParser(SWFTagTypes) - Constructor for class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
ExtractorUniversal - Class in org.archive.modules.extractor
A last ditch extractor that will look at the raw byte code and try to extract anything that looks like a link.
ExtractorUniversal() - Constructor for class org.archive.modules.extractor.ExtractorUniversal
Constructor.
ExtractorURI - Class in org.archive.modules.extractor
An extractor for finding URIs inside other URIs.
ExtractorURI() - Constructor for class org.archive.modules.extractor.ExtractorURI
Constructor
ExtractorXML - Class in org.archive.modules.extractor
A simple extractor which finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents).
ExtractorXML() - Constructor for class org.archive.modules.extractor.ExtractorXML
 
extractQueryStringLinks(UURI) - Static method in class org.archive.modules.extractor.ExtractorURI
Look for URIs inside the supplied UURI.
extractURIs() - Method in class org.archive.modules.extractor.PDFParser
Extract URIs from all objects found in a PDF document's catalog.
extractURIs(PdfObject) - Method in class org.archive.modules.extractor.PDFParser
Parse a PdfDictionary, looking for URIs recursively and adding them to foundURIs
extraInfo - Variable in class org.archive.modules.CrawlURI
 
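
The extractQueryStringLinks(UURI) entry above describes looking for URIs inside another URI. The following is a minimal sketch of that general idea using only the JDK; it is not the Heritrix implementation. The class name QueryStringLinkSketch, the method linksInQuery, and the rule "a decoded query value that starts with http:// or https:// counts as a link" are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.archive.modules.extractor.
public class QueryStringLinkSketch {

    /** Returns decoded query-string values that look like absolute HTTP(S) URIs. */
    public static List<String> linksInQuery(String uri) {
        List<String> found = new ArrayList<>();
        String query = URI.create(uri).getRawQuery();
        if (query == null) {
            return found; // no query string, nothing to inspect
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            // Use the value part of "name=value"; a bare token is treated as a value.
            String raw = eq >= 0 ? pair.substring(eq + 1) : pair;
            String value;
            try {
                value = URLDecoder.decode(raw, "UTF-8");
            } catch (UnsupportedEncodingException | IllegalArgumentException e) {
                continue; // skip values with malformed percent-escapes
            }
            // Assumed heuristic: only absolute http/https values count as links.
            if (value.startsWith("http://") || value.startsWith("https://")) {
                found.add(value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(linksInQuery(
                "http://example.com/redirect?to=http%3A%2F%2Farchive.org%2F&x=1"));
    }
}
```

A real extractor would likely apply broader heuristics (other schemes, URIs embedded in the path); this sketch covers only the simplest query-parameter case.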

F

F_ADD - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_DISREGARD - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_EMIT - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_FAILURE - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_INCLUDE - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_REENQUEUED - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_SUCCESS - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
FACTORIES - Static variable in class org.archive.crawler.restlet.ScriptResource
 
failedFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that failed to process.
failedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
failedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Number of URIs that failed to process.
fakeResponse(StatusLine, HeaderGroup, InputStream) - Method in class org.apache.commons.httpclient.HttpMethodBase
This method is a dirty hack intended to work around the current (2.0) design flaw that prevents the user from obtaining the correct status code, headers and response body from the preceding HTTP CONNECT method.
fastOutputStreamHolder - Variable in class org.archive.crawler.frontier.RecyclingSerialBinding
Thread-local cache of reusable FastOutputStream
fetch(CrawlURI, String, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
fetchChain - Variable in class org.archive.crawler.framework.CrawlController
Fetch chain
FetchChain - Class in org.archive.modules
 
FetchChain() - Constructor for class org.archive.modules.FetchChain
 
fetchDisregards - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchDNS - Class in org.archive.modules.fetcher
Processor to resolve 'dns:' URIs.
FetchDNS() - Constructor for class org.archive.modules.fetcher.FetchDNS
 
FetchErrors - Class in org.archive.modules.fetcher
 
FetchErrors() - Constructor for class org.archive.modules.fetcher.FetchErrors
 
fetchFailures - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchFTP - Class in org.archive.modules.fetcher
Fetches documents and directory listings using FTP.
FetchFTP() - Constructor for class org.archive.modules.fetcher.FetchFTP
Constructs a new FetchFTP.
FetchFTP.SocketFactoryWithTimeout - Class in org.archive.modules.fetcher
A SocketFactory much like DefaultSocketFactory, except that the createSocket() methods that open connections support a connect timeout.
FetchFTP.SocketFactoryWithTimeout() - Constructor for class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
FetchHistoryProcessor - Class in org.archive.modules.recrawl
Maintain a history of fetch information inside the CrawlURI's attributes.
FetchHistoryProcessor() - Constructor for class org.archive.modules.recrawl.FetchHistoryProcessor
 
FetchHTTP - Class in org.archive.modules.fetcher
HTTP fetcher that uses Apache Jakarta Commons HttpClient library.
FetchHTTP() - Constructor for class org.archive.modules.fetcher.FetchHTTP
Constructor.
fetchNonResponses - Variable in class org.archive.modules.fetcher.FetchStats
 
fetchResponses - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchStats - Class in org.archive.modules.fetcher
Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (eg queue).
FetchStats() - Constructor for class org.archive.modules.fetcher.FetchStats
 
FetchStats.CollectsFetchStats - Interface in org.archive.modules.fetcher
 
FetchStats.HasFetchStats - Interface in org.archive.modules.fetcher
 
FetchStats.Stage - Enum in org.archive.modules.fetcher
 
FetchStatusCodes - Interface in org.archive.modules.fetcher
Constant flag codes to be used, in lieu of per-protocol codes (like HTTP's 200, 404, etc.), when network/internal/out-of-band conditions occur.
fetchStatusCodesToString(int) - Static method in class org.archive.modules.CrawlURI
Takes a status code and converts it into a human readable string.
FetchStatusDecideRule - Class in org.archive.modules.deciderules
Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting.
FetchStatusDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusDecideRule
Usual constructor.
FetchStatusMatchesRegexDecideRule - Class in org.archive.modules.deciderules
 
FetchStatusMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
Usual constructor.
FetchStatusNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
 
FetchStatusNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
Usual constructor.
fetchSuccesses - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchWhois - Class in org.archive.modules.fetcher
WHOIS Fetcher (RFC 3912).
FetchWhois() - Constructor for class org.archive.modules.fetcher.FetchWhois
 
FetchWhois.UrlStatus - Enum in org.archive.modules.fetcher
 
file - Variable in class org.archive.crawler.restlet.PagedRepresentation
File
fileLogger - Variable in class org.archive.crawler.framework.Scoper
 
fileLogger - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
filename - Variable in enum org.archive.crawler.util.Logs
 
fileRepresentation - Variable in class org.archive.crawler.restlet.EditRepresentation
 
fileRepresentation - Variable in class org.archive.crawler.restlet.PagedRepresentation
wrapped FileRepresentation
FilesystemLinkMaker - Class in org.archive.util
Wrapper for platform-dependent hard link creation.
FilesystemLinkMaker() - Constructor for class org.archive.util.FilesystemLinkMaker
 
FilesystemLinkMaker.Kernel32Library - Interface in org.archive.util
 
FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES - Class in org.archive.util
 
FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES() - Constructor for class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
fillWith(CrawlURI, String) - Method in class org.archive.crawler.reporting.SeedRecord
Fill instance with given values; skips makeDirty so may be used on initialization.
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache
 
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
When collected/finalized -- as should be expected in low-memory conditions -- trigger an expunge and a new 'canary' insertion.
finalize() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
finalTasks() - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform any tasks necessary before entering FINISH frontier state/FINISHED crawl state
finalTasks() - Method in class org.archive.crawler.frontier.BdbFrontier
 
find(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
Extracts prefixes of a given string from a SortedSet.
findAttributeValueGroup(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
findAvailableCheckpointDirectories() - Method in class org.archive.crawler.framework.CheckpointService
Returns a list of available, valid (contains 'valid' file) checkpoint directories, as File instances, with the more recently-written appearing first.
findEligibleURI() - Method in class org.archive.crawler.frontier.AbstractFrontier
Find a CrawlURI eligible to be put on the outbound queue for processing.
findEligibleURI() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the next CrawlURI eligible to be processed (and presumably visited/fetched) by a worker thread.
findFirstLineBeginning(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineBeginningFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineContaining(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContaining(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContainingFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findGroups(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
findJobConfigs() - Method in class org.archive.crawler.framework.Engine
Find all job configurations in the usual place -- subdirectories of the jobs directory with files ending '.cxml', and from jobPathFiles (previously added by user) found in the jobs directory.
findKeys(SortedMap<String, ?>, String) - Static method in class org.archive.util.PrefixFinder
 
findTarget(Request, Response) - Method in class org.archive.crawler.restlet.EnhDirectory
 
FINISH - Static variable in class org.archive.modules.ProcessResult
 
finishCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
finishCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Cleanup/unlock; need not complete for a checkpoint to be valid.
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
finished(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Report a URI being processed as having finished processing.
finished(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that the previously emitted CrawlURI has completed its processing (for now).
finishedDisregard(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that have finished processing.
finishedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Number of URIs that have finished processing.
finishedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
finishFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
finishFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).
finishFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
FirstNamedRobotsPolicy - Class in org.archive.modules.net
Working from an ordered list of potential User-Agents, consisting of first the regularly-configured User-Agent and then those in the candidateUserAgents list, consider each potential agent in order.
FirstNamedRobotsPolicy() - Constructor for class org.archive.modules.net.FirstNamedRobotsPolicy
 
fixup(String) - Method in class org.archive.crawler.reporting.HostsReport
 
fixupConfigPath(ConfigPath, String) - Method in class org.archive.spring.ConfigPathConfigurer
 
fixupPaths(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
Find any ConfigPath properties in the passed bean; ensure that if they have a null 'base', that is replaced with the job home directory.
FixupQueryString - Class in org.archive.modules.canonicalize
Strip any trailing question mark.
FixupQueryString() - Constructor for class org.archive.modules.canonicalize.FixupQueryString
 
Flash - Class in org.archive.crawler.restlet
Utility for including a brief last-action or background-action message on web responses.
Flash(String) - Constructor for class org.archive.crawler.restlet.Flash
Create an ACK flash of default styling with the given message.
Flash(String, Flash.Kind) - Constructor for class org.archive.crawler.restlet.Flash
Create a Flash of the given kind and message, with default styling.
Flash.Kind - Enum in org.archive.crawler.restlet
usual types
flattenH1Order(Document) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
Given a Document, return a Map of all non-blank simple text nodes, keyed by the pseudo-XPath to their parent element.
flattenVia() - Method in class org.archive.modules.CrawlURI
Returns the string version of this URI's referral URI.
flattenVia(CrawlURI) - Static method in class org.archive.modules.Processor
 
flush() - Method in class org.archive.crawler.reporting.AlertHandler
 
flush() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
flush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Perform a merge of all 'pending' items to the overall fingerprint list.
FLUSH_DELAY_FACTOR - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
flushRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Flushes the output request stream.
forAllHostsDo(Closure) - Method in class org.archive.modules.fetcher.DefaultServerCache
NOTE: Should not mutate the CrawlHost instance so retrieved; depending on the hostscache implementation, the change may not be reliably persistent.
forAllHostsDo(Closure) - Method in class org.archive.modules.net.ServerCache
Utility for performing an action on every CrawlHost.
forAllPendingDo(Closure) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Utility method to perform action for all pending CrawlURI instances.
forceFetch() - Method in class org.archive.modules.CrawlURI
If this method returns true, this URI should be fetched even though it already has been crawled.
forceScarceMemory() - Static method in class org.archive.util.TestUtils
Temporarily exhaust memory, forcing weak/soft references to be broken.
forceWakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Utility method for advanced users/experimentation: force wake all snoozed queues -- for example to kick a crawl where connectivity problems have put all queues in slow-retry-snoozes back to busy-ness.
forget(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Forget that the item was seen.
forget(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Forget the given CrawlURI.
forget(String, CrawlURI) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
forget(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
forget(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
forgetAllButLatest - Variable in class org.archive.checkpointing.Checkpoint
 
forgetAllButLatest - Variable in class org.archive.crawler.framework.CheckpointService
 
forgetAllSchemeAuthorityMatching(String) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Forget all entries that match the scheme+host+port of the given url, so that they can be crawled again if discovered again.
format(LogRecord) - Method in class org.archive.crawler.framework.CrawlJob.JobLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.NonFatalErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.RuntimeErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.StatisticsLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
format(LogRecord) - Method in class org.archive.util.OneLineSimpleLogger
 
formatBytes(Long) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
formatCookie(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" header value for a single cookie.
formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Return a string suitable for sending in a "Cookie" header
formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookieHeader(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" Header for an array of Cookies.
formatCookieHeader(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" Header for single Cookie.
formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" Header containing all Cookies in cookies.
formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" Header containing the Cookie.
formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookies(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" header value for an array of cookies.
formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" header value containing all Cookies in cookies suitable for sending in a "Cookie" header
formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formItems - Variable in class org.archive.modules.credential.HtmlFormCredential
Form items.
FormLoginProcessor - Class in org.archive.modules.forms
A step, post-ExtractorHTMLForms, where a followup CrawlURI to attempt a form submission may be synthesized.
FormLoginProcessor() - Constructor for class org.archive.modules.forms.FormLoginProcessor
 
formUrlEncode(NameValuePair[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Form-urlencoding routine.
foundURIs - Variable in class org.archive.modules.extractor.PDFParser
 
fp(byte[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of "n" bytes of "buf" starting from "buf[start]".
fp(char[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp(CharSequence) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) the characters of "s".
fp(int[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp(long[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp8(String) - Method in class st.ata.util.FPGenerator
Compute fingerprint of the lower eight bits of the characters of "s".
fp8(char[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of the lower eight bits of "n" characters of "buf" starting from "buf[i]".
FPGenerator - Class in st.ata.util
This class provides methods that construct fingerprints of strings of bytes via operations in GF[2^d] for 0 < d <= 64.
FPMergeUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on merging FP arrays (in memory or from disk).
FPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter
 
FPMergeUriUniqFilter.PendingItem - Class in org.archive.crawler.util
Represents a long fingerprint and (possibly) its corresponding CrawlURI, awaiting the next merge in a 'pending' state.
FPMergeUriUniqFilter.PendingItem(long, CrawlURI) - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
fpset - Variable in class org.archive.crawler.util.FPUriUniqFilter
 
FPUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter storing 64-bit UURI fingerprints, using an internal LongFPSet instance.
FPUriUniqFilter(LongFPSet) - Constructor for class org.archive.crawler.util.FPUriUniqFilter
Create FPUriUniqFilter wrapping given long set
FPUriUniqFilter() - Constructor for class org.archive.crawler.util.FPUriUniqFilter
 
freeReserveMemory() - Method in class org.archive.crawler.framework.CrawlController
 
frequentFlushes - Variable in class org.archive.modules.writer.WriterPoolProcessor
Whether to flush to underlying file frequently (at least after each record), or not.
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.extractor.Extractor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.forms.FormLoginProcessor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.Processor
Restore internal state from JSONObject stored at earlier checkpoint-time.
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
fromHopsViaString(String) - Static method in class org.archive.modules.CrawlURI
 
frontier - Variable in class org.archive.crawler.framework.ActionDirectory
autowired frontier for actions
frontier - Variable in class org.archive.crawler.framework.CrawlController
The frontier to use for the crawl.
Frontier - Interface in org.archive.crawler.framework
An interface for URI Frontiers.
frontier - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
The frontier to use.
frontier - Variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
frontier - Variable in class org.archive.crawler.processor.HashCrawlMapper
 
frontier - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
 
Frontier.FrontierGroup - Interface in org.archive.crawler.framework
Generic interface representing the internal groupings of a Frontier's URIs -- usually queues.
Frontier.State - Enum in org.archive.crawler.framework
Enumeration of possible target states.
FrontierJournal - Class in org.archive.crawler.frontier
Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems.
FrontierJournal(String, String) - Constructor for class org.archive.crawler.frontier.FrontierJournal
Create a new recovery journal at the given location
FrontierNonemptyReport - Class in org.archive.crawler.reporting
Report of all nonempty Frontier queues (as usually dumped at end of crawl for reference).
FrontierNonemptyReport() - Constructor for class org.archive.crawler.reporting.FrontierNonemptyReport
 
FrontierPreparer - Class in org.archive.crawler.prefetch
Processor to preload URI with as much precalculated policy-based info as possible before it reaches frontier critical sections.
FrontierPreparer() - Constructor for class org.archive.crawler.prefetch.FrontierPreparer
 
frontierReport() - Method in class org.archive.crawler.framework.CrawlJob
 
frontierReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
FrontierSummaryReport - Class in org.archive.crawler.reporting
Frontier summary report showing a limited number of queues of each type -- as typically consulted during a crawl in progress.
FrontierSummaryReport() - Constructor for class org.archive.crawler.reporting.FrontierSummaryReport
 
fullVia - Variable in class org.archive.modules.CrawlURI
 
futureUriCount() - Method in interface org.archive.crawler.framework.Frontier
 
futureUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
futureUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
futureUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
futureUris - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
URIs scheduled to be re-enqueued at a future date
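
The find(SortedSet<String>, String) entry above (PrefixFinder) describes extracting the prefixes of a given string from a SortedSet. The following is a minimal sketch of that technique using only the JDK; it is not the Heritrix implementation. The class name PrefixSketch, the method findPrefixes, and the longest-first result order are assumptions made for illustration. It relies on the fact that every prefix of a string sorts at or before that string, so only the head of the set needs to be examined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; not part of org.archive.util.
public class PrefixSketch {

    /** Returns every element of sorted that is a prefix of input, longest first. */
    public static List<String> findPrefixes(SortedSet<String> sorted, String input) {
        List<String> result = new ArrayList<>();
        // input + '\0' is the smallest string greater than input, so this head
        // set holds exactly the elements that are <= input.
        SortedSet<String> head = sorted.headSet(input + '\0');
        // Walk backwards from the largest candidate. This simple version is
        // linear in the size of the head set.
        while (!head.isEmpty()) {
            String last = head.last();
            if (input.startsWith(last)) {
                result.add(last);
            }
            head = head.headSet(last); // drop 'last' and continue with smaller entries
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> prefixes = new TreeSet<>(Arrays.asList(
                "http://(com,", "http://(org,", "http://(org,archive,"));
        System.out.println(findPrefixes(prefixes, "http://(org,archive,www,)/"));
    }
}
```

The backward walk could skip ahead whenever the current candidate shares only a short common prefix with the input; that optimization is left out here to keep the sketch easy to verify.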

G

generateCrawlLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateFrom(ConfigPath, int) - Method in class org.archive.checkpointing.Checkpoint
Use immediately after instantiation to fill in a Checkpoint created outside Spring configuration.
generateJobLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateReports() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateRequestLine(HttpConnection, String, String, String, String) - Static method in class org.apache.commons.httpclient.HttpMethodBase
Generates HTTP request line according to the specified attributes.
generator - Variable in class org.archive.io.Arc2Warc
 
generator - Variable in class org.archive.modules.writer.WARCWriterProcessor
Generator for record IDs
get(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
 
get(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get the next nearest item after the given key.
get(String) - Static method in class org.archive.crawler.util.LogReader
Returns the entire file.
get(InputStreamReader) - Static method in class org.archive.crawler.util.LogReader
Reads entire contents of reader, returns as string.
get(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(InputStreamReader, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(Object, String) - Method in class org.archive.modules.credential.CredentialStore
 
get(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.HTMLLinkContext
return an instance of HTMLLinkContext for attribute attr in element el.
get(String) - Static method in class org.archive.modules.extractor.HTMLLinkContext
return an instance of HTMLLinkContext for path path.
get(String) - Method in class org.archive.spring.KeyedProperties
Get the given value, checking override maps if appropriate.
get(Object) - Method in class org.archive.util.Histotable
Return 0 instead of null for absent keys.
get() - Method in class org.archive.util.IdentityCacheableWrapper
 
get(String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
get(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
get(String) - Method in interface org.archive.util.ObjectIdentityCache
get the object under the given key/name -- but should not mutate object state
get(String) - Method in class org.archive.util.ObjectIdentityMemCache
 
get() - Method in class org.archive.util.Supplier
 
getAcceptCompression() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAcceptHeaders() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAcceptNonDnsResolves() - Method in class org.archive.modules.fetcher.FetchDNS
 
getAction() - Method in class org.archive.modules.forms.HTMLForm
 
getActionDir() - Method in class org.archive.crawler.framework.ActionDirectory
 
getActiveToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getActiveToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getAlertCount() - Method in class org.archive.crawler.framework.CrawlJob
 
getAlertCount() - Method in class org.archive.crawler.reporting.AlertThreadGroup
 
getAlertCount() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getAlertsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getAll() - Method in class org.archive.modules.credential.CredentialStore
 
getAllConfigPaths() - Method in class org.archive.spring.ConfigPathConfigurer
 
getAllErrors() - Method in class org.archive.spring.PathSharingContext
 
getAllowByRegex() - Method in class org.archive.crawler.prefetch.Preselector
 
getAlsoCheckVia() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getAnnotations() - Method in class org.archive.modules.CrawlURI
Get the annotations set for this uri.
getApplicableSurtPrefix() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getAsciiBytes(String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the specified string to byte array of ASCII characters.
getAsciiString(byte[], int, int) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of ASCII characters to a string.
getAsciiString(byte[]) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of ASCII characters to a string.
getAsText() - Method in class org.archive.io.ReadSourceEditor
 
getAsText() - Method in class org.archive.spring.ConfigPathEditor
 
getAt(long) - Method in class org.archive.util.AbstractLongFPSet
Get the stored value at the given slot.
getAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getAttributeEither(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchHTTP
Get a value either from inside the CrawlURI instance, or from settings (module attributes).
getAudience() - Method in class org.archive.modules.CrawlMetadata
 
getAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
use #getHostAuthState()
getAuthScheme(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getAvailableRobotsPolicies() - Method in class org.archive.modules.CrawlMetadata
 
getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.ScriptResource
 
getBalanceReplenishAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getBase() - Method in class org.archive.spring.ConfigPath
 
getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
getBaseURI() - Method in class org.archive.modules.CrawlURI
Get the (HTML) Base URI used for derelativizing internal URIs.
getBdbSubDirectory(File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getBeanName() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getBeanName() - Method in class org.archive.modules.Processor
 
getBeanName() - Method in class org.archive.spring.HeritrixLifecycleProcessor
 
getBeanpathTarget(String) - Method in class org.archive.crawler.framework.CrawlJob
Utility method for getting a bean or any other object addressable with a 'bean path' -- a property-path string (with dots and []indexes) starting with a bean name.
getBeansRefPath() - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
getBit(long) - Method in interface org.archive.util.BloomFilter
 
getBit(long) - Method in class org.archive.util.BloomFilter64bit
Returns from the local bitvector the value of the bit with the specified index.
getBlockAll() - Method in class org.archive.crawler.prefetch.Preselector
 
getBlockAwaitingSeedLines() - Method in class org.archive.modules.seeds.TextSeedModule
 
getBlockByRegex() - Method in class org.archive.crawler.prefetch.Preselector
 
getBloomFilter() - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
getBuiltJobs() - Method in class org.archive.crawler.restlet.EngineResource
 
getByRealm(Set<Credential>, String, CrawlURI) - Static method in class org.archive.modules.credential.HttpAuthenticationCredential
Convenience method that looks up a credential in the passed set, using the realm as the key.
getByRegex(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(InputStreamReader, String, int, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(InputStreamReader, String, String, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegexFromSeries(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegexFromSeries(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getBytes(String, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the specified string to a byte array.
getBytesPerFileType(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the accumulated number of bytes from files of a given file type.
getBytesPerHost(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the accumulated number of bytes downloaded from a given host.
getCacheMisses() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getCachePercent() - Method in class org.archive.bdb.BdbModule
 
getCacheSize() - Method in class org.archive.bdb.BdbModule
 
getCalculateRobotsOnly() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getCandidateChain() - Method in class org.archive.crawler.framework.CrawlController
 
getCandidateChain() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getCandidateUserAgents() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
getCandidateUserAgents() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
getCanonicalizationPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getCanonicalString() - Method in class org.archive.modules.CrawlURI
 
getCaseSensitiveFilesystem() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCharacterMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCharPosLimit() - Method in class org.archive.util.ms.Piece
 
getCharPosStart() - Method in class org.archive.util.ms.Piece
 
getCheckOutlinks() - Method in class org.archive.crawler.processor.CrawlMapper
 
getCheckpoint() - Method in class org.archive.crawler.framework.CheckpointSuccessEvent
 
getCheckpointDir() - Method in class org.archive.checkpointing.Checkpoint
 
getCheckpointIntervalMinutes() - Method in class org.archive.crawler.framework.CheckpointService
 
getCheckpointsDir() - Method in class org.archive.crawler.framework.CheckpointService
 
getCheckpointService() - Method in class org.archive.crawler.framework.CrawlJob
Return the configured Checkpointer instance, if there is exactly one, otherwise null.
getCheckUri() - Method in class org.archive.crawler.processor.CrawlMapper
 
getChild() - Method in interface org.archive.util.ms.Entry
 
getChmod() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getChmodValue() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getClassCatalog() - Method in class org.archive.bdb.BdbModule
 
getClassCatalog() - Method in class org.archive.util.bdbje.EnhancedEnvironment
Return a StoredClassCatalog backed by a Database in this environment, either pre-existing or created (and cached) if necessary.
getClassCheckpointFile(File, String, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFile(File, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class<?>, String) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassKey(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Get the String key (name) of the queue to which the CrawlURI should be assigned.
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getClassKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getClassKey() - Method in class org.archive.modules.CrawlURI
Get the token (usually the hostname + port) which indicates what "class" this CrawlURI should be grouped with, for the purposes of ensuring only one item of the class is processed at once, all items of the class are held for a politeness period, etc.
getCollection() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getComment() - Method in class org.apache.commons.httpclient.Cookie
Returns the comment describing the purpose of this cookie, or null if no such comment has been defined.
getComment() - Method in class org.archive.modules.deciderules.DecideRule
 
getComponent() - Method in class org.archive.crawler.Heritrix
 
getCompoundName(String) - Static method in class org.archive.util.JndiUtils
 
getCompoundName(ObjectName) - Static method in class org.archive.util.JndiUtils
Return name to use as jndi name.
getCompress() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getConfigPathConfigurer() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getConfigPaths() - Method in class org.archive.crawler.framework.CrawlJob
Return all known ConfigPaths, as an aid to viewing or editing.
getConfigurationFile() - Method in class org.archive.spring.PathSharingContext
 
getConfigurationFilePath() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
getConnectTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
getContentCharSet(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character set from the Content-Type header.
getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorHTML
 
getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorXML
 
getContentDigest() - Method in class org.archive.modules.CrawlURI
Return the retained content-digest value, if any.
getContentDigestHistory() - Method in class org.archive.modules.CrawlURI
 
getContentDigestSchemeString() - Method in class org.archive.modules.CrawlURI
 
getContentDigestString() - Method in class org.archive.modules.CrawlURI
 
getContentLength() - Method in class org.archive.modules.CrawlURI
For completed HTTP transactions, the length of the content-body.
getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
 
getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
getContentRegexes() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getContentSize() - Method in class org.archive.modules.CrawlURI
Get the size in bytes of this URI's recorded content, inclusive of things like protocol headers.
getContentType() - Method in class org.archive.modules.CrawlURI
Get the content type of this URI.
getContentType() - Method in class org.archive.net.s3.S3URLConnection
XXX Not sure what this should be or if it even matters for our use.
getContentTypeMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getContext() - Method in class org.archive.modules.extractor.Link
 
getControlConversation() - Method in class org.archive.net.ClientFTP
 
getController() - Method in class org.archive.crawler.framework.ToePool
 
getController() - Method in class org.archive.crawler.framework.ToeThread
Get the CrawlController associated with this thread.
getControlUri(long, int, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
Construct navigational URI for given parameters.
getCookiePolicy() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpMethodParams.getCookiePolicy(), HttpMethod.getParams().
getCookies() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use getCookiesMap() instead (IA/Heritrix change).
getCookies(String, int, String, boolean) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use CookieSpec#match(String, int, String, boolean, Cookie)
getCookiesLoadFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookiesMap() - Method in class org.apache.commons.httpclient.HttpState
Returns a sorted map of cookies that this HTTP state currently contains.
getCookiesMap() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookiesMap() - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
getCookiesMap() - Method in interface org.archive.modules.fetcher.CookieStorage
 
getCookiesMap() - Method in class org.archive.modules.fetcher.SimpleCookieStorage
 
getCookiesSaveFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookieStorage() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getCost(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Return the 'cost' of a CrawlURI (how much of its associated queue's budget it depletes upon attempted processing)
getCostAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getCount() - Method in class org.archive.crawler.frontier.WorkQueue
Count of URIs in this queue.
getCountryCode() - Method in class org.archive.modules.net.CrawlHost
Get country code of this host
getCountryCodes() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getCrawlController() - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
getCrawlController() - Method in class org.archive.crawler.framework.CheckpointService
 
getCrawlController() - Method in class org.archive.crawler.framework.CrawlJob
 
getCrawlController() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getCrawlController() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getCrawlController() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getCrawlController() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getCrawlController() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getCrawlController() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlDelay() - Method in class org.archive.modules.net.RobotsDirectives
 
getCrawlDuration() - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns how long the current crawl has been running *including* time paused (contrast with getCrawlElapsedTime()).
getCrawledBytes() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlElapsedTime() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlerCount() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getCrawlExitStatus() - Method in class org.archive.crawler.framework.CrawlController
 
getCrawlJob() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getCrawlJobShortName() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getCrawlJobUrl() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getCrawlLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getCrawlURI() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
 
getCreateHostDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCreatePortDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use #getCredentials(AuthScope)
getCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
Get the credentials for the given authentication scope.
getCredentials() - Method in class org.archive.modules.CrawlURI
 
getCredentials() - Method in class org.archive.modules.credential.CredentialStore
 
getCredentials() - Method in class org.archive.modules.net.CrawlServer
 
getCredentialStore() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getCredentialStore() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getCredentialTypes() - Static method in class org.archive.modules.credential.CredentialStore
 
getCurrentLaunchDir() - Method in class org.archive.spring.PathSharingContext
 
getCurrentLaunchId() - Method in class org.archive.spring.PathSharingContext
 
getCurrentProcessorName() - Method in class org.archive.crawler.framework.ToeThread
 
getCustomEditor() - Method in class org.archive.io.ReadSourceEditor
 
getCustomEditor() - Method in class org.archive.spring.ConfigPathEditor
 
getCustomRobots() - Method in class org.archive.modules.net.CustomRobotsPolicy
 
getData() - Method in class org.archive.modules.CrawlURI
 
getData() - Method in class org.archive.modules.extractor.Link
Attribute list
getData() - Method in class org.archive.spring.PathSharingContext
 
getDatabase(String) - Method in class org.archive.bdb.BdbModule
 
getDatabaseConfig() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbCache
 
getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
getDataList(String) - Method in class org.archive.modules.CrawlURI
Convenience method: return (creating if necessary) list at given data key
getDecision() - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
getDefaultCharset() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDefaultEncoding() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDefaultRules() - Static method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
A reasonable set of default rules to use, if no others are provided by operator configuration.
getDefaultStorePaths() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getDefaultStorePaths() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getDefaultStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDefaultUriPrecedencePolicy() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
getDeferrals() - Method in class org.archive.modules.CrawlURI
Get the deferral count.
getDeferToPrevious() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
Whether to always defer to a previously-assigned key inside the CrawlURI.
getDelay(TimeUnit) - Method in class org.archive.crawler.frontier.WorkQueue
 
getDelayFactor() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getDescription() - Method in enum org.archive.crawler.framework.CrawlStatus
 
getDescription() - Method in class org.archive.modules.CrawlMetadata
 
getDestination() - Method in class org.archive.modules.extractor.Link
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchDNS
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchFTP
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchDNS
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchFTP
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDir() - Method in class org.archive.bdb.BdbModule
 
getDirectivesFor(String, boolean) - Method in class org.archive.modules.net.Robotstxt
Return the RobotsDirectives, if any, appropriate for the given User-Agent string.
getDirectivesFor(String) - Method in class org.archive.modules.net.Robotstxt
Return directives to use for the given User-Agent, resorting to wildcard rules or the default no-directives if necessary.
getDirectory() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDirectoryFile() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDisposition() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
 
getDisposition() - Method in class org.archive.crawler.reporting.SeedRecord
 
getDispositionChain() - Method in class org.archive.crawler.framework.CrawlController
 
getDiversionDir() - Method in class org.archive.crawler.processor.CrawlMapper
 
getDiversionLog(String) - Method in class org.archive.crawler.processor.CrawlMapper
Get the diversion log for a given target crawler node.
getDNSRecord(long, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
getDNSServerIPLabel() - Method in class org.archive.modules.CrawlURI
 
getDoAuthentication() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method should automatically handle HTTP authentication challenges (status code 401, etc.), false otherwise.
getDomain() - Method in class org.apache.commons.httpclient.Cookie
Returns domain attribute of the cookie.
getDomain() - Method in class org.archive.modules.credential.Credential
 
getDomDocument(File) - Method in class org.archive.crawler.framework.CrawlJob
Read a file to a DOM Document; return null if this isn't possible for any reason.
getDoneDir() - Method in class org.archive.crawler.framework.ActionDirectory
 
getDotBegin() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDotEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDumpPendingAtClose() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getDupByHashBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getDupByHashUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getEarliestNextURIEmitTime() - Method in class org.archive.modules.net.CrawlHost
Get the earliest time a URI for this host could be emitted.
getEffectiveVersion() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the HTTP version used with this method (may be null if undefined, that is, the method has not been executed)
getEmbedHopCount() - Method in class org.archive.modules.CrawlURI
Get the embed hop count.
getEnabled() - Method in class org.archive.modules.canonicalize.BaseRule
 
getEnabled() - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
 
getEnabled() - Method in class org.archive.modules.deciderules.DecideRule
 
getEnabled() - Method in class org.archive.modules.Processor
 
getEngine() - Method in class org.archive.crawler.Heritrix
 
getEngine() - Method in class org.archive.crawler.restlet.EngineApplication
 
getEngine() - Method in class org.archive.crawler.restlet.EngineResource
 
getEngine() - Method in class org.archive.crawler.restlet.JobRelatedResource
 
getEngine() - Method in class org.archive.crawler.restlet.JobResource
 
getEngine() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
Get the proper ScriptEngine instance -- either shared or local to this thread.
getEngine() - Method in class org.archive.modules.ScriptedProcessor
Get the proper ScriptEngine instance -- either shared or local to this thread.
getEngineName() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getEngineName() - Method in class org.archive.modules.ScriptedProcessor
 
getEnhDirectory() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
 
getEntriesDescending() - Method in class org.archive.crawler.util.TopNSet
Get descending ordered list of key,count Entries.
getEntry(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
Returns the entry with the given number.
getEntryByFrequencySortedSet() - Static method in class org.archive.util.Histotable
Get a SortedSet that, when filled with (String key)->(long count) Entry instances, sorts them by (count, key) descending, as is useful for most-frequent displays.
getErrorPenaltyAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getException() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getException() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getExpectedConcurrency() - Method in class org.archive.bdb.BdbModule
 
getExpectedInserts() - Method in interface org.archive.util.BloomFilter
Report the number of expected inserts used at instantiation time to calculate the bitfield size.
getExpectedInserts() - Method in class org.archive.util.BloomFilter64bit
 
getExpirationOperation() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getExpiryDate() - Method in class org.apache.commons.httpclient.Cookie
Returns the expiration Date of the cookie, or null if none exists.
getExtract404s() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getExtract404s() - Method in interface org.archive.modules.extractor.ExtractorParameters
Whether to extract links from responses with a 404 'not found' response code.
getExtractAllForms() - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
getExtractFromDirs() - Method in class org.archive.modules.fetcher.FetchFTP
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
getExtractIndependently() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getExtractIndependently() - Method in interface org.archive.modules.extractor.ExtractorParameters
Whether each extractor should make an independent decision as to whether it can extract links from a URI's content (when value is true), or whether a previous extractor's success (marking the URI as hasBeenLinkExtracted) should cancel later extractors (when value is false).
getExtractJavascript() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractOnlyFormGets() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorSWF
 
getExtractorParameters() - Method in class org.archive.modules.extractor.Extractor
 
getExtractParent() - Method in class org.archive.modules.fetcher.FetchFTP
Returns the extract.parent attribute for this FetchFTP and the given curi.
getExtractValueAttributes() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtraInfo() - Method in class org.archive.modules.CrawlURI
 
getFetchAttempts() - Method in class org.archive.modules.CrawlURI
Get the count of attempts (trips through the processing loop) at getting the document referenced by this URI.
getFetchBeginTime() - Method in class org.archive.modules.CrawlURI
 
getFetchChain() - Method in class org.archive.crawler.framework.CrawlController
 
getFetchCompletedTime() - Method in class org.archive.modules.CrawlURI
 
getFetchDisregards() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchDuration() - Method in class org.archive.modules.CrawlURI
 
getFetchNonResponses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchResponses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchStatus() - Method in class org.archive.modules.CrawlURI
Return the overall/fetch status of this CrawlURI for its current trip through the processing loop.
getFetchSuccesses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchType() - Method in class org.archive.modules.CrawlURI
 
getFile() - Method in class org.archive.spring.ConfigPath
 
getFileDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns a HashMap that contains information about distributions of encountered mime types.
getFilename() - Method in class org.archive.crawler.reporting.CrawlSummaryReport
 
getFilename() - Method in class org.archive.crawler.reporting.FrontierNonemptyReport
 
getFilename() - Method in class org.archive.crawler.reporting.FrontierSummaryReport
 
getFilename() - Method in class org.archive.crawler.reporting.HostsReport
 
getFilename() - Method in class org.archive.crawler.reporting.MimetypesReport
 
getFilename() - Method in class org.archive.crawler.reporting.ProcessorsReport
 
getFilename() - Method in class org.archive.crawler.reporting.Report
 
getFilename() - Method in class org.archive.crawler.reporting.ResponseCodeReport
 
getFilename() - Method in class org.archive.crawler.reporting.SeedsReport
 
getFilename() - Method in class org.archive.crawler.reporting.SourceTagsReport
 
getFilename() - Method in class org.archive.crawler.reporting.ToeThreadsReport
 
getFilename() - Method in enum org.archive.crawler.util.Logs
 
getFilePos() - Method in class org.archive.util.ms.Piece
 
getFileRepresentation() - Method in class org.archive.crawler.restlet.EditRepresentation
 
getFirstARecord(Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
getFirstKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFlashes(Request) - Static method in class org.archive.crawler.restlet.Flash
 
getFollowRedirects() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method should automatically follow HTTP redirects (status code 302, etc.), false otherwise.
getForceQueueAssignment() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
getForceRetire() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getForceRetire() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getForgetAllButLatest() - Method in class org.archive.checkpointing.Checkpoint
 
getForgetAllButLatest() - Method in class org.archive.crawler.framework.CheckpointService
 
getFormat() - Method in class org.archive.modules.canonicalize.RegexRule
 
getFormat() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getFormItems() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getFormProvince(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
Get the 'form province' - either the configured (applicableSurtPrefix) or inferred (full current server) range of URIs that is considered covered by one form login
getFpset() - Method in class org.archive.crawler.util.FPUriUniqFilter
 
getFrequentFlushes() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getFrom(String, int, Pattern, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFrom() - Method in class org.archive.modules.CrawlMetadata
 
getFrom() - Method in interface org.archive.modules.fetcher.UserAgentProvider
 
getFromSeries(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log spread across a numbered series of files.
getFrontier() - Method in class org.archive.crawler.framework.ActionDirectory
 
getFrontier() - Method in class org.archive.crawler.framework.CrawlController
 
getFrontier() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getFrontier() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getFrontier() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getFrontier() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getFrontierJournal() - Method in interface org.archive.crawler.framework.Frontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierPreparer() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierReportShort() - Method in class org.archive.crawler.framework.CrawlController
 
getFullVia() - Method in class org.archive.modules.CrawlURI
 
getGroup(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Get the 'frontier group' (usually queue) for the given CrawlURI.
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.BdbFrontier
 
getGroupMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHarvester() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getHashCount() - Method in interface org.archive.util.BloomFilter
Report the number of internal independent hash functions (and thus the number of bits set/checked for each item presented).
getHashCount() - Method in class org.archive.util.BloomFilter64bit
 
getHeritrixHome() - Static method in class org.archive.crawler.Heritrix
Exploit -Dheritrix.home if available to us.
getHeritrixVersion() - Method in class org.archive.crawler.framework.Engine
 
getHistoryDbName() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
getHistoryDbName() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
getHistoryLength() - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
getHolder() - Method in class org.archive.modules.CrawlURI
Return the 'holder' for the convenience of an external facility.
getHolderCost() - Method in class org.archive.modules.CrawlURI
Return the 'holderCost' for the convenience of an external facility (Frontier).
getHolderKey() - Method in class org.archive.modules.CrawlURI
Return the 'holderKey' for convenience of an external facility (Frontier).
getHopChar() - Method in enum org.archive.modules.extractor.Hop
Returns a hop character suitable for display in logs.
getHopCount() - Method in class org.archive.modules.CrawlURI
Get total hops from seed.
getHopString() - Method in enum org.archive.modules.extractor.Hop
 
getHopType() - Method in class org.archive.modules.extractor.Link
 
getHost() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the host.
getHostAddress(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
from WriterPoolProcessor
getHostAddress(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line).
getHostAddress(String) - Static method in class org.archive.util.DNSJavaUtil
Return an InetAddress for passed host.
getHostAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the target host authentication state
getHostConfiguration() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
no longer applicable
getHostFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
Get the CrawlHost associated with name.
getHostFor(String) - Method in class org.archive.modules.net.ServerCache
 
getHostFor(UURI) - Method in class org.archive.modules.net.ServerCache
Get the CrawlHost associated with curi.
getHostLastFinished(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the time (in milliseconds) when a URI belonging to a given host last finished processing.
getHostMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getHostMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostName() - Method in class org.archive.modules.net.CrawlHost
Get the host name.
getHrefPath(File, CrawlJob) - Static method in class org.archive.crawler.restlet.JobResource
Get a usable HrefPath, relative to the JobResource, for the given file.
getHtmlOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getHtmlOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpAuthChallenges() - Method in class org.archive.modules.CrawlURI
 
getHttpAuthChallenges() - Method in class org.archive.modules.net.CrawlServer
 
getHttpBindAddress() - Method in class org.archive.modules.fetcher.FetchHTTP
Local IP address or hostname to use when making connections (binding sockets).
getHttpConnectionManager() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the httpConnectionManager.
getHttpMethod() - Method in class org.archive.modules.CrawlURI
 
getHttpMethod() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getHttpProxyHost() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyPassword() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyPort() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyUser() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getIgnoreCookies() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getIgnoreFormActionUrls() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getIgnoreUnexpectedHtml() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getImportedConfigs(File) - Method in class org.archive.crawler.framework.CrawlJob
Return all config files included via 'import' statements in the primary config (or other included configs).
getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return a sorted map of all queues of WorkQueue keys, keyed by precedence
getInactiveQueuesForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Get the queue of inactive uri-queue names at the given precedence.
getIncrementCounts() - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
getIndex() - Method in interface org.archive.util.ms.Entry
 
getInferRootPage() - Method in class org.archive.modules.extractor.ExtractorHTTP
 
getInFromFile(String) - Method in class org.archive.modules.extractor.PDFParser
Read a file named 'doc' and store its bytes for later processing.
getInitialDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getInProcessCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
The number of CrawlURIs 'in process' (passed to the outbound queue and not yet finished by returning through the inbound queue).
getInProcessCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getInputStream() - Method in class org.archive.net.s3.S3URLConnection
Get an InputStream for the object, connecting to S3 if connect() hasn't been called yet.
getInstance(String) - Static method in class org.archive.net.UURIFactory
 
getInstance(UURI, String) - Static method in class org.archive.net.UURIFactory
 
getIntervalSeconds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getIP() - Method in class org.archive.modules.net.CrawlHost
Get the IP address for this host.
getIpAddresses() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
getIpFetched() - Method in class org.archive.modules.net.CrawlHost
Get the time when the IP address for this host was last looked up.
getIpTTL() - Method in class org.archive.modules.net.CrawlHost
Get the TTL value from the DNS record for this host.
getIpValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getIsolateThreads() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getIsolateThreads() - Method in class org.archive.modules.ScriptedProcessor
 
getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getJavaInitializationString() - Method in class org.archive.io.ReadSourceEditor
 
getJavaInitializationString() - Method in class org.archive.spring.ConfigPathEditor
 
getJeLogsFilter() - Static method in class org.archive.crawler.util.CheckpointUtils
 
getJob(String) - Method in class org.archive.crawler.framework.Engine
 
getJobConfigs() - Method in class org.archive.crawler.framework.Engine
 
getJobContext() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobDir() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobDirectoryFrom(File) - Method in class org.archive.crawler.framework.Engine
Return the job directory File read from the supplied ".jobpath" file, or null on any error.
getJobLog() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobLogger() - Method in class org.archive.crawler.framework.CrawlJob
Get a logger to a distinguished file, job.log in the job's directory, into which job-specific events may be reported.
getJobName() - Method in class org.archive.modules.CrawlMetadata
 
getJobsDir() - Method in class org.archive.crawler.framework.Engine
 
getJobStatusDescription() - Method in class org.archive.crawler.framework.CrawlJob
 
getJumpTarget() - Method in class org.archive.modules.ProcessResult
 
getKeepSnapshotsCount() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getKey() - Method in class org.archive.crawler.reporting.SeedRecord
 
getKey() - Method in class org.archive.modules.credential.Credential
 
getKey() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getKey() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getKey() - Method in class org.archive.modules.net.CrawlHost
 
getKey() - Method in class org.archive.modules.net.CrawlServer
 
getKey() - Method in interface org.archive.util.IdentityCacheable
 
getKey() - Method in class org.archive.util.IdentityCacheableWrapper
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
getKeyedProperties() - Method in class org.archive.modules.canonicalize.BaseRule
 
getKeyedProperties() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
getKeyedProperties() - Method in class org.archive.modules.CrawlMetadata
 
getKeyedProperties() - Method in class org.archive.modules.credential.CredentialStore
 
getKeyedProperties() - Method in class org.archive.modules.deciderules.DecideRule
 
getKeyedProperties() - Method in class org.archive.modules.Processor
 
getKeyedProperties() - Method in class org.archive.modules.ProcessorChain
 
getKeyedProperties() - Method in interface org.archive.spring.HasKeyedProperties
 
getKind() - Method in class org.archive.crawler.restlet.Flash
 
getKryo() - Method in class org.archive.bdb.KryoBinding
 
getLargest() - Method in class org.archive.crawler.util.TopNSet
 
getLargestQueuesCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
remember this many largest queues for reporting's sake; actual tracking can be somewhat approximate when some queues shrink before others' sizes are again noted, or if the size is adjusted mid-crawl.
getLargestValue() - Method in class org.archive.util.Histotable
Return the largest value of any key that is larger than 0.
getLastActivityTime() - Method in class org.archive.crawler.framework.CrawlJob
 
getLastCacheMissDiff() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getLastHop() - Method in class org.archive.modules.CrawlURI
convenience access to last hop character, as string
getLastLaunch() - Method in class org.archive.crawler.framework.CrawlJob
 
getLastLaunchTime() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
getLastResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the stream used to read the last response's body.
getLastSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getLastSuccessTime() - Method in class org.archive.modules.fetcher.FetchStats
 
getLaunchCount() - Method in class org.archive.crawler.framework.CrawlJob
 
getLinesExecuted() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getLinesExecuted() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getLinkCount() - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
getLinkHopCount() - Method in class org.archive.modules.CrawlURI
Get the link hop count.
getListLogicalOr() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
getLiveHostReportSize() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getLocalAddress() - Method in class org.apache.commons.httpclient.HttpConnection
Return the local address used when creating the connection.
getLocalName() - Method in class org.archive.crawler.processor.CrawlMapper
 
getLogExtraInfo() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getLogFile() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
getLogger() - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
getLoggerModule() - Method in class org.archive.crawler.framework.CrawlController
 
getLoggerModule() - Method in class org.archive.crawler.framework.Scoper
 
getLoggerModule() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getLoggerModule() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getLoggerModule() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getLoggerModule() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getLoggerModule() - Method in class org.archive.modules.extractor.Extractor
 
getLoggerModule() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLogin() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getLoginPassword() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLoginUri() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getLoginUsername() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLogRejectsRule() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getLogToFile() - Method in class org.archive.crawler.framework.Scoper
 
getLogToFile() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getLookup() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getLowerBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns the lower bound on the range of acceptable status codes.
getLowerBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns the lower bound on the range of acceptable status codes.
getLowerBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
getMap() - Method in class org.archive.spring.Sheet
Return map of full bean-path (starting with a target bean-name) to the alternate value for that targeted property
getMap() - Method in class org.archive.util.ObjectIdentityMemCache
Offer raw map access for convenience of checkpoint/recovery.
getMapPath() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getMapUri() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getMaxAttributeNameLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxAttributeValLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxBytesDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMaxDocumentsDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxElementLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchFTP
 
getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getMaxFileSizeBytes() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getMaxFileSizeBytes() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMaxHops() - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
 
getMaxInWait() - Method in class org.archive.crawler.frontier.AbstractFrontier
Maximum amount of time to wait for an inbound update event before giving up and rechecking on the ability to further fill the outbound queue.
getMaxInWait() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchFTP
 
getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getMaxOutlinks() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getMaxOutlinks() - Method in interface org.archive.modules.extractor.ExtractorParameters
The maximum number of outlinks to discover from any URI's content.
getMaxPathDepth() - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
 
getMaxPathLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getMaxPerHostBandwidthUsageKbSec() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMaxQueuesPerReportCategory() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getMaxRepetitions() - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
getMaxRetries() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getMaxSegLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getMaxSize() - Method in class org.archive.crawler.util.TopNSet
 
getMaxSizeToDigest() - Method in class org.archive.modules.extractor.HTTPContentDigest
 
getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorPDF
 
getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorUniversal
 
getMaxSpeculativeHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
getMaxTimeSeconds() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxToeThreads() - Method in class org.archive.crawler.framework.CrawlController
 
getMaxTotalBytesToWrite() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMaxTransHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
getMaxWaitForIdleMs() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMessage() - Method in class org.archive.crawler.event.CrawlStateEvent
 
getMessage() - Method in class org.archive.crawler.restlet.Flash
 
getMetadata() - Method in class org.archive.crawler.framework.CrawlController
 
getMetadata() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMetadata() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getMetadata() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMetadata() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getMetadata() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMetadataProvider() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMethodRetryHandler() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
use HttpMethodParams
getMigrateMap() - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
getMinDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getModuleClass() - Method in class org.archive.state.ModuleTestBase
Returns the class of the module to test.
getMonitorConfigPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getMonitorMounts() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getMonitorPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getName() - Method in class org.apache.commons.httpclient.HttpMethodBase
Obtains the name of the HTTP method as used in the HTTP request line, for example "GET" or "POST".
getName() - Method in class org.archive.checkpointing.Checkpoint
 
getName() - Method in class org.archive.modules.net.CrawlServer
 
getName() - Method in class org.archive.spring.ConfigPath
 
getName() - Method in class org.archive.spring.Sheet
 
getName() - Method in interface org.archive.util.ms.Entry
 
getNamedUserAgents() - Method in class org.archive.modules.net.Robotstxt
 
getNavlinksOnly() - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
getNext() - Method in interface org.archive.util.ms.Entry
 
getNextBlock(int) - Method in interface org.archive.util.ms.BlockFileSystem
Returns the number of the block that follows the given block.
getNextBlock(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getNextCheckpointNumber() - Method in class org.archive.crawler.framework.CheckpointService
 
getNextNearestItem(DatabaseEntry, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getNonfatalErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getNonfatalErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getNonFatalFailures() - Method in class org.archive.modules.CrawlURI
 
getNotModifiedBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getNotModifiedUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getNovelBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getNovelUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getObjectCache(String, boolean, Class<V>) - Method in class org.archive.bdb.BdbModule
 
getObjectCache(String, boolean, Class<V>, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
Get an ObjectIdentityCache, backed by a BDB Database of the given name, with objects of the given valueClass type.
getOIBCCache(String, boolean, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
Get an ObjectIdentityBdbCache, backed by a BDB Database of the given name, with the given value class type.
getOnlyStoreIfWriteTagPresent() - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
 
getOperator() - Method in class org.archive.modules.CrawlMetadata
 
getOperatorContactUrl() - Method in class org.archive.modules.CrawlMetadata
 
getOperatorFrom() - Method in class org.archive.modules.CrawlMetadata
 
getOrCreateSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Get a Sheet of the given name, or create if it does not already exist.
getOrder() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
getOrder() - Method in class org.archive.spring.ConfigPathConfigurer
Act as late as possible.
getOrdinal() - Method in class org.archive.modules.CrawlURI
Get the ordinal (serial number) assigned at creation.
getOrganization() - Method in class org.archive.modules.CrawlMetadata
 
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbCache
 
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
getOrUse(String, Supplier<V>) - Method in interface org.archive.util.ObjectIdentityCache
get the object under the given key/name, using (and remembering) the object supplied by the supplier if no prior mapping exists -- but should not mutate object state
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityMemCache
 
getOutCandidates() - Method in class org.archive.modules.CrawlURI
Returns discovered candidate URIs.
getOutlinkRule() - Method in class org.archive.crawler.processor.CrawlMapper
 
getOutLinks() - Method in class org.archive.modules.CrawlURI
Returns discovered links.
getOverlayMap(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Retrieve the named overlay Map.
getOverlayMap(String) - Method in class org.archive.modules.CrawlURI
 
getOverlayMap(String) - Method in interface org.archive.spring.OverlayContext
get the map corresponding to the overlay name
getOverlayMap(String) - Method in interface org.archive.spring.OverlayMapsSource
 
getOverlayNames() - Method in class org.archive.modules.CrawlURI
 
getOverlayNames() - Method in interface org.archive.spring.OverlayContext
return a list of the names of overlay maps to consider
getOverrideKeys(String) - Method in class org.archive.spring.KeyedProperties
Compose the complete keys (externalPath + local key name) to use for checking for contextual overrides.
getParallelQueues() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
The number of parallel queues to split a core key into.
getParams() - Method in class org.apache.commons.httpclient.HttpConnection
Returns HTTP protocol parameters associated with this connection.
getParams() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns HTTP protocol parameters associated with this method.
getPassword() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getPassword() - Method in class org.archive.modules.fetcher.FetchFTP
 
getPath() - Method in class org.apache.commons.httpclient.Cookie
Returns the path attribute of the cookie
getPath() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the path of this HTTP method.
getPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getPath() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getPath() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getPath() - Method in class org.archive.spring.ConfigPath
 
getPath() - Method in class org.archive.spring.ConfigPathConfigurer
 
getPathFromSeed() - Method in class org.archive.modules.CrawlURI
 
getPathQuery(CrawlURI) - Method in class org.archive.modules.net.RobotsPolicy
 
getPattern() - Method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
 
getPauseAtStart() - Method in class org.archive.crawler.framework.CrawlController
 
getPauseThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getPauseThresholdMiB() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getPersistentDataKeys() - Static method in class org.archive.modules.CrawlURI
Add the key of items you want to persist across processings.
getPersistentDataMap() - Method in class org.archive.modules.CrawlURI
 
getPolicyBasisUURI() - Method in class org.archive.modules.CrawlURI
Get the UURI that should be used as the basis of policy/overlay decisions.
getPolitenessDelay() - Method in class org.archive.modules.CrawlURI
 
getPool() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPoolMaxActive() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPort() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the port of the host.
getPort() - Method in class org.archive.modules.net.CrawlServer
Get the port number for this server.
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.WorkQueue
 
getPrecedence() - Method in class org.archive.modules.CrawlURI
 
getPrecedenceFloor() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getPrecedenceProvider() - Method in class org.archive.crawler.frontier.WorkQueue
 
getPreferenceDepthHops() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getPreferenceDepthHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getPreferenceEmbedHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getPreferredVariant() - Method in class org.archive.crawler.restlet.BaseResource
If client can accept text/html, always prefer it.
getPrefix() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPrefixClassKey(byte[]) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
getPreloadSource() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
getPreloadSourceUrl() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
Return the authentication URI, either absolute or relative, that serves as a prerequisite for the passed curi.
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
Get the prerequisite for this URI.
getPrevious() - Method in interface org.archive.util.ms.Entry
 
getPrimaryConfig() - Method in class org.archive.crawler.framework.CrawlJob
 
getPrimaryConfigurationPath() - Method in class org.archive.spring.PathSharingContext
 
getProcessErrorOutlinks() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getProcessors() - Method in class org.archive.modules.ProcessorChain
 
getProcessStatus() - Method in class org.archive.modules.ProcessResult
 
getProfileCxmlResource() - Method in class org.archive.crawler.framework.Engine
 
getProgressLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getProgressStamp() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getProgressStatisticsLine() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
Return one line of current progress-statistics
getProgressStats() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getPropertyDescriptors(BeanWrapperImpl) - Method in class org.archive.crawler.restlet.JobRelatedResource
Get and modify the PropertyDescriptors associated with the BeanWrapper.
getProtocol() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the protocol used to establish the connection.
getProxyAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
use #getProxyAuthState()
getProxyAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the proxy authentication state
getProxyCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use #getProxyCredentials(AuthScope)
getProxyCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
Get the proxy credentials for the given authentication scope.
getProxyHost() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the proxy host.
getProxyPort() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the port of the proxy host.
getPseudoXpath(Node) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
Given a node, give back an XPath-like string that addresses it.
getQueryString() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the query string of this HTTP method.
getQueueAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getQueueFor(String) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueueFor(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueuePrecedencePolicy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getQueueTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getRawInput() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the raw input stream for this file system.
getRawInput() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getRawOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getRawOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getReader() - Method in class org.archive.crawler.restlet.EditRepresentation
 
getReader() - Method in class org.archive.crawler.restlet.PagedRepresentation
 
getRealm() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getRecheckScope() - Method in class org.archive.crawler.prefetch.Preselector
 
getRecheckThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getRecordedFinishes() - Method in class org.archive.modules.fetcher.FetchStats
 
getRecordedSize() - Method in class org.archive.modules.CrawlURI
Get size of data recorded (transferred)
getRecordedSize(CrawlURI) - Static method in class org.archive.modules.Processor
 
getRecorder() - Method in class org.archive.modules.CrawlURI
Get the http recorder associated with this uri.
getRecorderInBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
 
getRecorderOutBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
 
getRecordID() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getRecordIDGenerator() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getRecoverableExceptionCount() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
No longer used. Returns the number of "recoverable" exceptions thrown and handled, to allow for monitoring the quality of the connection.
getRecoveryCheckpoint() - Method in class org.archive.crawler.framework.CheckpointService
 
getRecoveryLogEnabled() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getRedirectUri() - Method in class org.archive.crawler.reporting.SeedRecord
 
getReducePrefixRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getReduceRegex(CrawlURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getReference(ObjectName) - Static method in class org.archive.util.JndiUtils
 
getRegex() - Method in class org.archive.modules.canonicalize.RegexRule
 
getRegex() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
Use a preset if configured to do so.
getRegex() - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
 
getRegex() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getRegexList() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
getRemaining() - Method in class org.archive.modules.fetcher.FetchStats
 
getRemoveTriggerUris() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getReplyStrings() - Method in class org.archive.net.ClientFTP
 
getReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getReportsDir() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getRepresentation(Status, Request, Response) - Method in class org.archive.crawler.restlet.EngineApplication.EngineStatusService
 
getRequestCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character encoding of the request from the Content-Type header.
getRequestHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the specified request header.
getRequestHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the request headers.
getRequestHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the request headers that the HTTP method currently has.
getRequestHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
 
getRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns an OutputStream suitable for writing the request.
getRescheduleDelaySeconds() - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
getRescheduleTime() - Method in class org.archive.modules.CrawlURI
 
getResourceDir() - Method in class org.archive.state.ModuleTestBase
Returns the location of the Java resources directory for your project.
getRespectCrawlDelayUpToSeconds() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getResponseBody() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as an array of bytes.
getResponseBodyAsStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as an InputStream.
getResponseBodyAsString() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as a String.
getResponseCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character encoding of the response from the Content-Type header.
getResponseContentLength() - Method in class org.apache.commons.httpclient.HttpMethodBase
Return the length (in bytes) of the response body, as specified in a Content-Length header.
getResponseFooter(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the response footer associated with the given name.
getResponseFooters() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the response footers that the HTTP method currently has in the order in which they were read.
getResponseHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the response header associated with the given name.
getResponseHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the response headers.
getResponseHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
 
getResponseHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the response headers that the HTTP method currently has in the order in which they were read.
getResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns an InputStream suitable for reading the response.
getResponseStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns a stream from which the body of the current response may be read.
getResponseTrailerHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the response trailer headers as per RFC 2616 section 3.6.1.
getRetiredQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getRetiredQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return queue of all retired queue names.
getRetryDelaySeconds() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getReverseSortedCopy(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
Sort the entries of the given Map in descending order by their values, which must be longs wrapped with AtomicLong.
getReverseSortedHostCounts(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getRobotsDenials() - Method in class org.archive.modules.fetcher.FetchStats
 
getRobotsPolicy() - Method in class org.archive.modules.CrawlMetadata
Get the currently-effective RobotsPolicy, as specified by the string name and chosen from the full available map.
getRobotsPolicyName() - Method in class org.archive.modules.CrawlMetadata
 
getRobotstxt() - Method in class org.archive.modules.net.CrawlServer
 
getRobotsValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getRoot() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the root entry of the file system.
getRoot() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getRotationDigits() - Method in class org.archive.crawler.processor.CrawlMapper
 
getRuleAssociations() - Method in class org.archive.crawler.spring.SheetOverlaysManager
All DecideRuledSheetAssociations, in Ordered order
getRules() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
getRules() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
getRules() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getRuntimeErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getRuntimeErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getRuntimeSeconds() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getRunWhileEmpty() - Method in class org.archive.crawler.framework.CrawlController
 
getSchedulingDirective(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Calculate the coarse, original 'schedulingDirective' prioritization for the given CrawlURI
getSchedulingDirective() - Method in class org.archive.modules.CrawlURI
 
getSchedulingFor(CrawlURI, Link, int) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
Determine scheduling for the curi.
getSchemes() - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
 
getScope() - Method in interface org.archive.crawler.framework.Frontier
 
getScope() - Method in class org.archive.crawler.framework.Scoper
 
getScope() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getScratchDir() - Method in class org.archive.crawler.framework.CrawlController
 
getScratchDisk() - Method in interface org.archive.modules.extractor.TempDirProvider
 
getScratchDisk() - Method in class org.archive.modules.net.DefaultTempDirProvider
 
getScript() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getScript() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getScriptSource() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getScriptSource() - Method in class org.archive.modules.ScriptedProcessor
 
getSecure() - Method in class org.apache.commons.httpclient.Cookie
 
getSeedCollection() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSeedForUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
Returns seed for urlString (null if seed not found).
getSeedListeners() - Method in class org.archive.modules.seeds.SeedModule
 
getSeeds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getSeeds() - Method in class org.archive.crawler.framework.CrawlController
 
getSeeds() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getSeeds() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSeedsAsSurtPrefixes() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSeedsIterator() - Method in class org.archive.crawler.reporting.StatisticsTracker
Get a seed iterator for the job being monitored.
getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getSeedUrlToDiscoveredUrlsMap() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSendBufferSize() - Method in class org.apache.commons.httpclient.HttpConnection
Gets the socket's sendBufferSize.
getSendConnectionClose() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendIfModifiedSince() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendIfNoneMatch() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendRange() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendReferer() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSerialNo() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getSerialNumber() - Method in class org.archive.crawler.framework.ToeThread
 
getServerCache() - Method in class org.archive.crawler.framework.CrawlController
 
getServerCache() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getServerCache() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getServerCache() - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getServerCache() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getServerCache() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getServerCache() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerCache() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getServerCache() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getServerCache() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchDNS
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchWhois
 
getServerCache() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getServerCache() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getServerFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
Get the CrawlServer associated with name.
getServerFor(String) - Method in class org.archive.modules.net.ServerCache
 
getServerFor(UURI) - Method in class org.archive.modules.net.ServerCache
Get the CrawlServer associated with curi.
getServerKey(UURI) - Static method in class org.archive.modules.net.CrawlServer
Get key to use doing lookup on server instances.
getServerMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getSessionBalance() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Return current session 'activity budget balance'
getSheetOverlaysManager() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getSheetOverlaysManager() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSheetsByName() - Method in class org.archive.crawler.spring.SheetOverlaysManager
Sheets, by name; starts with all autowired Sheets but others may be added by other means (mid-crawl reconfiguration).
getSheetsNamesBySurt() - Method in class org.archive.crawler.spring.SheetOverlaysManager
Sheet names, by the SURT prefix to which they should be applied.
getShortName() - Method in class org.archive.checkpointing.Checkpoint
 
getShortName() - Method in class org.archive.crawler.framework.CrawlJob
 
getShouldFetchBodyRule() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getShouldMasquerade() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
getShouldMasquerade() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
getShouldProcessRule() - Method in class org.archive.modules.Processor
 
getShouldReportAtEndOfCrawl() - Method in class org.archive.crawler.reporting.Report
 
getShouldReportDuringCrawl() - Method in class org.archive.crawler.reporting.Report
 
getSizeBytes() - Method in interface org.archive.util.BloomFilter
The amount of memory in bytes consumed by the bloom bitfield.
getSizeBytes() - Method in class org.archive.util.BloomFilter64bit
 
getSkipIdenticalDigests() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getSlotState(long) - Method in class org.archive.util.AbstractLongFPSet
Check the state of a slot in the storage.
getSlotState(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getSmallest() - Method in class org.archive.crawler.util.TopNSet
 
getSnapshot() - Method in class org.archive.crawler.event.StatSnapshotEvent
 
getSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getSnoozedCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getSnoozeLongMs() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getSocket() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the connection socket.
getSortedByCounts() - Method in class org.archive.util.Histotable
 
getSortedByKeys() - Method in class org.archive.util.Histotable
 
getSortedDuplicates() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
getSortKey() - Method in class org.apache.commons.httpclient.Cookie
Create a 'sort key' for this Cookie that will cause it to sort alongside other Cookies of the same domain (with or without leading '.').
getSoTimeout() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.getSoTimeout(), HttpConnection.getParams().
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP
 
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchWhois
 
getSource() - Method in class org.archive.modules.extractor.Link
 
getSourceCodeDir() - Method in class org.archive.state.ModuleTestBase
Returns the location of the source code directory for your project.
getSourceTag() - Method in class org.archive.modules.CrawlURI
 
getSourceTagSeeds() - Method in class org.archive.modules.seeds.SeedModule
 
getSslTrustLevel() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getStackTrace() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getStartNewFilesOnCheckpoint() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getState() - Method in class org.archive.crawler.event.CrawlStateEvent
 
getState() - Method in class org.archive.crawler.framework.CrawlController
 
getStaticRef(String) - Method in class org.archive.crawler.restlet.BaseResource
 
getStaticRef(String) - Method in class org.archive.crawler.restlet.EditRepresentation
 
getStatisticsTracker() - Method in class org.archive.crawler.framework.CrawlController
 
getStatisticsTracker() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getStats() - Method in class org.archive.crawler.framework.CrawlJob
 
getStatusCode() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response status code.
getStatusCode() - Method in class org.archive.crawler.reporting.SeedRecord
 
getStatusCodeDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Return an objectCache representing the distribution of status codes for successfully fetched curis, where each key -> val entry maps (string)code -> (integer)count.
getStatusCodes() - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
 
getStatusLine() - Method in class org.apache.commons.httpclient.HttpMethodBase
Provides access to the response status line.
getStatusText() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the status text (or "reason phrase") associated with the latest response.
getStep() - Method in class org.archive.crawler.framework.ToeThread
 
getStoredMap(String, Class<K>, Class<V>, boolean, boolean) - Method in class org.archive.bdb.BdbModule
Creates a database-backed TempStoredSortedMap for transient reporting requirements.
getStoredQueue(String, Class<K>, boolean) - Method in class org.archive.bdb.BdbModule
 
getStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getString(byte[], int, int, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of HTTP content characters to a string.
getString(byte[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of HTTP content characters to a string.
getString(CrawlURI) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
 
getStripRegex() - Method in class org.archive.modules.extractor.HTTPContentDigest
 
getSubContext(String) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubContext(CompoundName) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubqueue(UURI, int) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getSubstats() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSubstats() - Method in interface org.archive.modules.fetcher.FetchStats.HasFetchStats
 
getSubstats() - Method in class org.archive.modules.net.CrawlHost
 
getSubstats() - Method in class org.archive.modules.net.CrawlServer
 
getSuccess() - Method in class org.archive.checkpointing.Checkpoint
 
getSuccessBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getSuccessfullyCrawledUrls() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSuffixAtEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getSupplementaryRule() - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
getSurtAuthority(String) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getSurtPrefixes() - Method in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
 
getSurtsDumpFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSurtsSource() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSurtsSourceFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Deprecated.
redundant now that we have SurtPrefixedDecideRule.surtsSource
getTags() - Method in class org.archive.io.ReadSourceEditor
 
getTags() - Method in class org.archive.spring.ConfigPathEditor
 
getTargetSheetNames() - Method in class org.archive.crawler.spring.SheetAssociation
 
getTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getTemplate() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.EngineResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.JobResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.ScriptResource
 
getTestEnvironment(File) - Static method in class org.archive.util.bdbje.EnhancedEnvironment
Create a temporary test environment in the given directory.
getText(String) - Static method in class org.archive.util.ms.Doc
Returns the text of the .doc file with the given file name.
getText(File) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(SeekInputStream) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(BlockFileSystem, int) - Static method in class org.archive.util.ms.Doc
Returns the text for the given .doc file.
getTextSource() - Method in class org.archive.modules.seeds.TextSeedModule
 
getThreadNumber() - Method in class org.archive.modules.CrawlURI
Get the number of the ToeThread responsible for processing this uri.
getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchFTP
 
getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getTmpDir() - Method in class org.archive.util.TmpDirTestCase
 
getToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getToePool() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReport() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReportShort() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReportShortData() - Method in class org.archive.crawler.framework.CrawlController
 
getTooLongDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getTopSet() - Method in class org.archive.crawler.util.TopNSet
Make internal map available (for checkpoint/restore purposes).
getTotal() - Method in class org.archive.util.Histotable
Return the total of all tallies.
getTotalBytes() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
getTotalBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getTotalBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getTotalEligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at precedences above the floor
getTotalExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
Return the tally of all expenditures on this queue
getTotalInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at all precedences
getTotalIneligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at precedences at or below the floor
getTotalScheduled() - Method in class org.archive.modules.fetcher.FetchStats
 
getTotalUrls() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
getTrackSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getTrackSources() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getTransHops() - Method in class org.archive.modules.CrawlURI
Tally up the number of transitive (non-simple-link) hops at the end of this CrawlURI's pathFromSeed.
getTreatFramesAsEmbedLinks() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getType() - Method in interface org.archive.util.ms.Entry
 
getUnderscoreSet() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getUpperBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns the upper bound on the range of acceptable status codes.
getUpperBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns the upper bound on the range of acceptable status codes.
getUpperBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
getURI() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the URI of the HTTP method
getUri() - Method in class org.archive.crawler.reporting.SeedRecord
 
getURI() - Method in class org.archive.modules.CrawlURI
 
getURICount() - Method in class org.archive.modules.Processor
Returns the number of URIs this processor has handled.
getUriErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriPrecedencePolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getUriProcessing() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriRegex() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getURIs() - Method in class org.archive.modules.extractor.PDFParser
Get a list of URIs retrieved from the PDF during the extractURIs operation.
getURIsList(String, int, String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Returns a list of all uncrawled URIs starting from a specified marker until numberOfMatches is reached.
getURIsList(String, int, String, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Return a list of URLs.
getUriUniqFilter() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getURL(String, String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
Overwrite handling of discovered URIs.
getUseHardLinkCheckpoints() - Method in class org.archive.bdb.BdbModule
 
getUseHeaderLength() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
getUseHTTP11() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getUsePreset() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
 
getUsePublicSuffixesRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getUserAgent() - Method in class org.archive.modules.CrawlMetadata
 
getUserAgent() - Method in class org.archive.modules.CrawlURI
Get the user agent to use for crawling this URI.
getUserAgent() - Method in interface org.archive.modules.fetcher.UserAgentProvider
 
getUserAgentProvider() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getUserAgentTemplate() - Method in class org.archive.modules.CrawlMetadata
 
getUsername() - Method in class org.archive.modules.fetcher.FetchFTP
 
getUseSharedCache() - Method in class org.archive.bdb.BdbModule
 
getUURI() - Method in class org.archive.modules.CrawlURI
 
getValidator() - Method in class org.archive.crawler.framework.CheckpointService
 
getValidator() - Method in class org.archive.modules.CrawlMetadata
 
getValidator() - Method in interface org.archive.spring.HasValidator
 
getValidDateFormats() - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Returns the Collection of date patterns used for parsing.
getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
 
getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
getValidTestData() - Method in class org.archive.modules.extractor.StringExtractorTestBase
Returns an array of valid test data pairs.
getValue() - Method in class org.archive.io.ReadSourceEditor
 
getValue() - Method in class org.archive.spring.ConfigFileEditor
 
getValue() - Method in class org.archive.spring.ConfigPathEditor
 
getValue() - Method in class org.archive.spring.ConfigString
 
getVariants() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
Add EditRepresentation as a variant when appropriate.
getVersion() - Method in class org.apache.commons.httpclient.Cookie
Returns the version of the cookie specification to which this cookie conforms.
getVia() - Method in class org.archive.modules.CrawlURI
 
getViaContext() - Method in class org.archive.modules.CrawlURI
 
getVirtualHost() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
no longer applicable
getWakeTime() - Method in class org.archive.crawler.frontier.WorkQueue
 
getWhoisQuery(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
getWhoisServer(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
getWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getWriteBufferSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getWriteMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRequests() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRevisitForIdenticalDigests() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRevisitForNotModified() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getXmlWriter(Writer) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
groovyTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
groovyTemplates - Variable in class org.archive.modules.extractor.ExtractorMultipleRegex
 
GROUP - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
gzipFile - Variable in class org.archive.io.CrawlerJournal
File we're writing journal to.
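
The getReverseSortedCopy(Map<String, AtomicLong>) entry above describes sorting a map's entries in descending order by their AtomicLong values. The following is a minimal sketch of that idea using only java.util. It is not the StatisticsTracker implementation: the class name ReverseSortedCopySketch, the method name reverseSortedCopy, and the LinkedHashMap<String, Long> return type are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not taken from the Heritrix source.
public class ReverseSortedCopySketch {

    // Returns a copy of the map whose iteration order is largest value first.
    public static LinkedHashMap<String, Long> reverseSortedCopy(Map<String, AtomicLong> counts) {
        // Read each counter once so the sort works on stable values,
        // even if other threads keep incrementing the originals.
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, AtomicLong> e : counts.entrySet()) {
            entries.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue().get()));
        }

        // Descending order: compare b against a.
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        // LinkedHashMap preserves the sorted insertion order.
        LinkedHashMap<String, Long> sorted = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, AtomicLong> hostCounts = new HashMap<>();
        hostCounts.put("a.example", new AtomicLong(1));
        hostCounts.put("b.example", new AtomicLong(7));
        hostCounts.put("c.example", new AtomicLong(3));
        System.out.println(reverseSortedCopy(hostCounts));
    }
}
```

Copying the counter values before sorting is a design choice of this sketch: comparing live AtomicLong instances while they are still being updated could give the sort an inconsistent ordering.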

H

handle401(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
Server is looking for basic/digest auth credentials (RFC2617).
handlePrerequisite(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
The CrawlURI has a prerequisite; apply scoping and update Link to CrawlURI in a manner analogous to outlink handling.
handleQueue(WorkQueue, boolean, long, long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Send an active queue to its next state, based on the supplied parameters.
Handler - Class in org.archive.net.s3
Handler for Amazon S3 URLs of the form s3://id:secret@bucket/key
Handler() - Constructor for class org.archive.net.s3.Handler
 
handleSeed(CrawlURI, String) - Method in class org.archive.crawler.reporting.StatisticsTracker
If the curi is a seed, we update the processedSeeds cache.
handleUnregisteredClass(Class) - Method in class org.archive.bdb.AutoKryo
 
harvester - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Name of the harvester that is used for the web harvesting.
HARVESTER_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
hasApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
 
hasAvailableCheckpoints() - Method in class org.archive.crawler.framework.CheckpointService
 
hasBeenLinkExtracted() - Method in class org.archive.modules.CrawlURI
If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content.
hasBeenLookedUp() - Method in class org.archive.modules.net.CrawlHost
Return true if the IP for this host has been looked up.
hasBeenUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method has been already executed, but not recycled.
hasContentDigestHistory() - Method in class org.archive.modules.CrawlURI
 
hasCredentials() - Method in class org.archive.modules.CrawlURI
 
hasCredentials() - Method in class org.archive.modules.net.CrawlServer
 
hasData() - Method in class org.archive.modules.extractor.Link
 
hasErrors - Variable in class org.archive.modules.net.Robotstxt
 
hash(CharSequence, int, int) - Method in class org.archive.util.BloomFilter64bit
Hashes the given sequence with the given hash function.
hash(CharSequence) - Method in class org.archive.util.LongToIntConsistentHash
 
hashCode() - Method in class org.apache.commons.httpclient.Cookie
Returns a hash code in keeping with the Object.hashCode() general hashCode contract.
hashCode() - Method in class org.archive.modules.extractor.Link
 
hashCode() - Method in class org.archive.modules.extractor.LinkContext
 
hashCode() - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory have the same hash code.
hashCode() - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
hashCode() - Method in class org.archive.modules.net.CrawlHost
 
hashCode() - Method in class org.archive.modules.net.CrawlServer
 
HashCrawlMapper - Class in org.archive.crawler.processor
Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.
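The general idea of hashing a classKey into one of N buckets can be sketched as follows. This is an assumption-laden illustration: it uses String.hashCode() and invented "crawler-N" names, which may differ from the hash function and naming that HashCrawlMapper actually uses.

```java
/** Illustrative only; the hash and the crawler names are assumptions. */
final class HashMapperSketch {
    /** Maps a classKey to one of crawlerCount hypothetical crawler names. */
    static String mapToCrawler(String classKey, int crawlerCount) {
        // floorMod keeps the bucket non-negative even for negative hash codes
        int bucket = Math.floorMod(classKey.hashCode(), crawlerCount);
        return "crawler-" + bucket;
    }
}
```

Because the bucket depends only on the key and N, the same classKey always lands on the same crawler for as long as N is unchanged.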
HashCrawlMapper() - Constructor for class org.archive.crawler.processor.HashCrawlMapper
Constructor.
hashSet - Variable in class org.archive.crawler.util.MemUriUniqFilter
 
hasHttpAuthenticationCredential(CrawlURI) - Static method in class org.archive.modules.Processor
 
hasIdenticalDigest(CrawlURI) - Static method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Utility method for testing if a CrawlURI's last two history entries (one being the most recent fetch) have identical content-digest information.
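A rough sketch of such a comparison is shown below. It assumes the fetch history is a list of maps with the most recent fetch first, and that the digest is stored under a key named "content-digest"; both are assumptions for illustration, not taken from the Heritrix source.

```java
import java.util.List;
import java.util.Map;

/** Illustrative only; the history layout and key name are assumptions. */
final class IdenticalDigestSketch {
    static final String DIGEST_KEY = "content-digest"; // hypothetical key name

    /** True if the two most recent history entries carry the same non-null digest. */
    static boolean lastTwoDigestsMatch(List<Map<String, String>> history) {
        if (history == null || history.size() < 2) {
            return false; // nothing to compare against
        }
        String latest = history.get(0).get(DIGEST_KEY);
        String previous = history.get(1).get(DIGEST_KEY);
        return latest != null && latest.equals(previous);
    }
}
```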
HasKeyedProperties - Interface in org.archive.spring
Interface indicating an object has an internal map of properties, and thus at least partially amenable to sheet-based contextual overriding of properties.
hasNext() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Test whether any items remain; loads next item into holding 'next' field.
hasNext() - Method in class org.archive.util.iterator.CompositeIterator
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
hasPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
 
hasRfc2617Credential() - Method in class org.archive.modules.CrawlURI
 
hasStarted - Variable in class org.archive.crawler.framework.CrawlController
 
hasStarted() - Method in class org.archive.crawler.framework.CrawlController
 
hasValidApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
Did the ApplicationContext self-validate? Returns true if validation passed without errors.
HasValidator - Interface in org.archive.spring
 
hasValidStamp(File) - Static method in class org.archive.checkpointing.Checkpoint
 
HasViaDecideRule - Class in org.archive.modules.deciderules
Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or one of some kinds of mid-crawl adds).
HasViaDecideRule() - Constructor for class org.archive.modules.deciderules.HasViaDecideRule
Usual constructor.
hasWriteTag(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
 
haveOverlayNamesBeenSet() - Method in class org.archive.modules.CrawlURI
 
haveOverlayNamesBeenSet() - Method in interface org.archive.spring.OverlayContext
test if this context has actually been configured with overlays (even if in fact no overlays were added)
haveSeen(int, int) - Method in class org.archive.modules.extractor.PDFParser
Indicates, based on a PDFObject's generation/id pair, whether the parser has already encountered this object (or a reference to it), so we don't loop infinitely on circuits within the PDF.
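One common way to implement this kind of cycle guard is a set of packed (generation, id) keys. The sketch below is hypothetical and is not the PDFParser code; in particular, it assumes that the first query for a pair also records it.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; not the PDFParser implementation. */
final class SeenObjectsSketch {
    private final Set<Long> seen = new HashSet<>();

    /** Returns true if the pair was recorded earlier; otherwise records it and returns false. */
    boolean haveSeen(int generation, int id) {
        // pack both ints into one long so the pair can live in a single set
        long key = ((long) generation << 32) | (id & 0xFFFFFFFFL);
        return !seen.add(key);
    }
}
```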
HEADER_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
HEADER_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
HEADER_PREDICTS_MISSING - Static variable in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
HEADER_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
 
HEADER_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
 
headSetInclusive(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
 
heapReport() - Method in class org.archive.crawler.framework.Engine
 
heapReportData() - Method in class org.archive.crawler.framework.Engine
 
Heritrix - Class in org.archive.crawler
Main class for Heritrix crawler.
Heritrix() - Constructor for class org.archive.crawler.Heritrix
 
HeritrixHttpMethodRetryHandler - Class in org.archive.modules.fetcher
Retry handler that tries up to ten times to establish a connection and then, once connected, tries up to ten times to get a response for a GET method (a POST is tried only once).
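The retry policy described above can be sketched as a single decision function. This is an illustration under stated assumptions, not the HeritrixHttpMethodRetryHandler code: the method name, the parameters, and the treatment of methods other than GET are all hypothetical.

```java
/** Illustrative only; names and parameters are assumptions. */
final class RetrySketch {
    /**
     * @param method               HTTP method name, e.g. "GET" or "POST"
     * @param connectionEstablished whether the failure happened after connecting
     * @param executionCount       number of attempts already made
     * @param maxRetries           attempt limit (ten in the description above)
     */
    static boolean shouldRetry(String method, boolean connectionEstablished,
                               int executionCount, int maxRetries) {
        if (executionCount >= maxRetries) {
            return false;              // attempt budget used up
        }
        if (!connectionEstablished) {
            return true;               // still trying to connect
        }
        // once connected, only GET is retried; POST gets a single try
        return "GET".equalsIgnoreCase(method);
    }
}
```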
HeritrixHttpMethodRetryHandler() - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixHttpMethodRetryHandler(int) - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixLifecycleProcessor - Class in org.archive.spring
Stand-in LifecycleProcessor to avoid a full automatic start() when our ApplicationContext (PathSharingContext) is built ('refreshed').
HeritrixLifecycleProcessor() - Constructor for class org.archive.spring.HeritrixLifecycleProcessor
 
HeritrixProtocolSocketFactory - Class in org.archive.modules.fetcher
Version of protocol socket factory that tries to get IP from heritrix IP cache -- if it has been set into the HttpConnectionParameters.
HeritrixProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
Constructor.
HeritrixSSLProtocolSocketFactory - Class in org.archive.modules.fetcher
Implementation of the commons-httpclient SSLProtocolSocketFactory so we can return SSLSockets whose trust manager is ConfigurableX509TrustManager.
HeritrixSSLProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
Shutdown constructor.
HIDDEN_PROPS - Static variable in class org.archive.crawler.restlet.JobRelatedResource
suppress problematic properties
HIGH - Static variable in class org.archive.modules.SchedulingConstants
High scheduling priority.
HIGHEST - Static variable in class org.archive.modules.SchedulingConstants
Highest scheduling priority.
highestPrecedenceWaiting - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
HighestUriQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
QueuePrecedencePolicy that sets a uri-queue's precedence to that of the highest URI currently enqueued within itself, added to the configured base-precedence.
HighestUriQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
 
HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider - Class in org.archive.crawler.frontier.precedence
Helper provider for maintaining the tracked distribution of included URIs and calculating the queue precedence.
HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider(int) - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
HISTORY_DB_CONFIG - Static variable in class org.archive.modules.recrawl.PersistProcessor
 
historyDb - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
historyDb - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDb - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
historyDbConfig - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbConfig() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbName - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbName - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
historyLength - Variable in class org.archive.modules.recrawl.FetchHistoryProcessor
Desired history array length.
Histotable<K> - Class in org.archive.util
Collect and report frequency information.
Histotable() - Constructor for class org.archive.util.Histotable
 
holder - Variable in class org.archive.modules.CrawlURI
 
holderCost - Variable in class org.archive.modules.CrawlURI
spot for an integer cost to be placed by external facility (frontier).
holderKey - Variable in class org.archive.modules.CrawlURI
 
hookupDatabase(Database, Class<E>, StoredClassCatalog) - Method in class org.archive.bdb.StoredQueue
 
Hop - Enum in org.archive.modules.extractor
The kind of "hop" from one URI to another.
HopCrossesAssignmentLevelDomainDecideRule - Class in org.archive.modules.deciderules
Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars, its 'assignment-level-domain' (ALD) (AKA 'public suffix' or in previous Heritrix versions, 'topmost assigned SURT')
HopCrossesAssignmentLevelDomainDecideRule() - Constructor for class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
 
HopsPathMatchesRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regex.
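As a small illustration of matching a hops-path such as "LLXE" against a regex, the sketch below uses java.util.regex.Pattern. It assumes whole-string matching; whether the rule itself matches the whole string or a substring is not stated here.

```java
import java.util.regex.Pattern;

/** Illustrative only; assumes the regex must match the entire hops-path. */
final class HopsPathRegexSketch {
    static boolean matches(String regex, String hopsPath) {
        return Pattern.matches(regex, hopsPath);
    }
}
```

For example, the regex ".*X.*" would select any path that contains an 'X' hop, while "L*" would select paths made up only of 'L' hops.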
HopsPathMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
Usual constructor.
hopString - Variable in enum org.archive.modules.extractor.Hop
 
HopsUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which assigns each URI a precedence equal to the number of hops in its hops-path-from-seed (either all hops or just navlink ('L') hops).
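The two counting modes can be sketched as follows. This is a simplified illustration: the class and method names are hypothetical, and any base precedence or special hops-path encodings that the real policy may handle are ignored.

```java
/** Illustrative only; counts hops in a plain hops-path string such as "LLXE". */
final class HopsCountSketch {
    static int hops(String hopsPath, boolean navlinksOnly) {
        if (!navlinksOnly) {
            return hopsPath.length();   // every character is one hop
        }
        int count = 0;
        for (char c : hopsPath.toCharArray()) {
            if (c == 'L') {
                count++;                // count only navlink hops
            }
        }
        return count;
    }
}
```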
HopsUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
HOST - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
hostKeys() - Method in class org.archive.modules.fetcher.DefaultServerCache
 
hostKeys() - Method in class org.archive.modules.net.ServerCache
 
hostMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
HostnameQueueAssignmentPolicy - Class in org.archive.crawler.frontier
QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI.
HostnameQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
HostResolver - Interface in org.archive.modules.fetcher
 
hosts - Variable in class org.archive.modules.fetcher.DefaultServerCache
hostname -> CrawlHost.
hostsBytesTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
hostsDistributionTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
hostsLastFinishedTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
HostsReport - Class in org.archive.crawler.reporting
The "Hosts Report", tallies by host.
HostsReport() - Constructor for class org.archive.crawler.reporting.HostsReport
 
HTML_TAGS - Static variable in class org.archive.util.UriUtils
 
HTMLForm - Class in org.archive.modules.forms
Simple representation of a discovered HTML Form.
HTMLForm() - Constructor for class org.archive.modules.forms.HTMLForm
 
HTMLForm.FormInput - Class in org.archive.modules.forms
 
HTMLForm.FormInput() - Constructor for class org.archive.modules.forms.HTMLForm.FormInput
 
HtmlFormCredential - Class in org.archive.modules.credential
Credential that holds all that is needed to do a GET/POST to an HTML form.
HtmlFormCredential() - Constructor for class org.archive.modules.credential.HtmlFormCredential
Constructor.
HTMLLinkContext - Class in org.archive.modules.extractor
XPath-like context for HTML discovered URIs.
HTMLLinkContext(String) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
Constructor.
HTMLLinkContext(CharSequence, CharSequence) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
 
HTTP_BIND_ADDRESS - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HTTP_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HttpAuthenticationCredential - Class in org.archive.modules.credential
A Basic/Digest HTTP Authentication (RFC2617) credential.
HttpAuthenticationCredential() - Constructor for class org.archive.modules.credential.HttpAuthenticationCredential
Constructor.
HttpConnection - Class in org.apache.commons.httpclient
An abstraction of an HTTP InputStream and OutputStream pair, together with the relevant attributes.
HttpConnection(String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port.
HttpConnection(String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port using the given protocol.
HttpConnection(String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host with the virtual alias and port using given protocol.
HttpConnection(String, int, String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port via the given proxy host and port using the default protocol.
HttpConnection(HostConfiguration) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host configuration.
HttpConnection(String, int, String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnection(String, int, String, int, Protocol) instead.
HttpConnection(String, int, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host with the virtual alias and port via the given proxy host and port using the given protocol.
HTTPContentDigest - Class in org.archive.modules.extractor
A processor for calculating custom HTTP content digests in place of the default (if any) computed by the HTTP fetcher processors.
HTTPContentDigest() - Constructor for class org.archive.modules.extractor.HTTPContentDigest
Constructor.
httpMethod - Variable in class org.archive.modules.credential.HtmlFormCredential
GET or POST.
HttpMethodBase - Class in org.apache.commons.httpclient
An abstract base implementation of HttpMethod.
HttpMethodBase() - Constructor for class org.apache.commons.httpclient.HttpMethodBase
No-arg constructor.
HttpMethodBase(String) - Constructor for class org.apache.commons.httpclient.HttpMethodBase
Constructor specifying a URI.
HttpParser - Class in org.apache.commons.httpclient
This class exists solely for compatibility with httpclient. The actual functionality is in LaxHttpParser.
HttpParser() - Constructor for class org.apache.commons.httpclient.HttpParser
 
HTTPS_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HttpState - Class in org.apache.commons.httpclient
A container for HTTP attributes that may persist from request to request, such as cookies and authentication credentials.
HttpState() - Constructor for class org.apache.commons.httpclient.HttpState
Default constructor.

I

id - Variable in class org.archive.net.s3.S3URLConnection
 
IdenticalDigestDecideRule - Class in org.archive.modules.deciderules.recrawl
Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch.
IdenticalDigestDecideRule() - Constructor for class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Usual constructor.
IdentityCacheable - Interface in org.archive.util
Common interface for objects held in ObjectIdentityCaches.
IdentityCacheableWrapper<K> - Class in org.archive.util
Wrapper allowing other objects to be held in an ObjectIdentityCache.
IdentityCacheableWrapper(String, K) - Constructor for class org.archive.util.IdentityCacheableWrapper
 
IgnoreCookiesSpec - Class in org.apache.commons.httpclient.cookie
A cookie spec that does nothing.
IgnoreCookiesSpec() - Constructor for class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
IgnoreRobotsPolicy - Class in org.archive.modules.net
Policy to ignore robots.
IgnoreRobotsPolicy() - Constructor for class org.archive.modules.net.IgnoreRobotsPolicy
 
IMG_SRC - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
importRecoverFormat(File, boolean, boolean, boolean, String) - Method in interface org.archive.crawler.framework.Frontier
Import URIs from the given file (in recover-log-like format, with a 3-character 'type' tag preceding a URI with optional hops/via).
importRecoverFormat(File, boolean, boolean, boolean, String) - Method in class org.archive.crawler.frontier.AbstractFrontier
Import URIs from the given file (in recover-log-like format, with a 3-character 'type' tag preceding a URI with optional hops/via).
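A line in the format described above (a 3-character type tag, then a URI, then optional hops and via) might be split as in the sketch below. The field separator (whitespace) and the sample tag "F+ " are assumptions for illustration, and the class is hypothetical rather than part of the Frontier API.

```java
/** Illustrative only; assumes whitespace-separated fields after a 3-character tag. */
final class RecoverLineSketch {
    final String tag;   // 3-character type tag, e.g. "F+ "
    final String uri;
    final String hops;  // null when absent
    final String via;   // null when absent

    private RecoverLineSketch(String tag, String uri, String hops, String via) {
        this.tag = tag;
        this.uri = uri;
        this.hops = hops;
        this.via = via;
    }

    static RecoverLineSketch parse(String line) {
        if (line.length() < 4) {
            throw new IllegalArgumentException("line too short: " + line);
        }
        String tag = line.substring(0, 3);
        String[] fields = line.substring(3).trim().split("\\s+");
        if (fields[0].isEmpty()) {
            throw new IllegalArgumentException("missing URI: " + line);
        }
        String hops = fields.length > 1 ? fields[1] : null;
        String via = fields.length > 2 ? fields[2] : null;
        return new RecoverLineSketch(tag, fields[0], hops, via);
    }
}
```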
importRecoverLog(JSONObject, Frontier) - Static method in class org.archive.crawler.frontier.FrontierJournal
Utility method for scanning a recovery journal and applying it to a Frontier.
importURIs(String) - Method in interface org.archive.crawler.framework.Frontier
Load URIs from a file, for scheduling and/or considered-included status (if from a recovery log).
importURIs(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
importURIsSimple(JSONObject) - Method in class org.archive.crawler.frontier.AbstractFrontier
Import URIs from either a simple (one URI per line) or crawl.log format.
inactiveQueuesByPrecedence - Variable in class org.archive.crawler.frontier.BdbFrontier
All 'inactive' queues, not yet in active rotation.
included(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
includesRetireDirective() - Method in class org.archive.modules.CrawlURI
 
incrementConsecutiveConnectionErrors() - Method in class org.archive.modules.net.CrawlServer
 
incrementDeferrals() - Method in class org.archive.modules.CrawlURI
Increment the deferral count.
incrementDiscardedOutLinks() - Method in class org.archive.modules.CrawlURI
 
incrementDisregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of disregarded URIs.
incrementFailedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of failed URIs.
incrementFetchAttempts() - Method in class org.archive.modules.CrawlURI
Increment the count of attempts (trips through the processing loop) at getting the document referenced by this URI.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String) - Static method in class org.archive.crawler.reporting.StatisticsTracker
Increment a counter for a key in a given ConcurrentMap.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.reporting.StatisticsTracker
Increment a counter for a key in a given ConcurrentMap by an arbitrary amount.
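With a ConcurrentMap<String, AtomicLong>, a thread-safe per-key increment is commonly written with computeIfAbsent. The sketch below shows that pattern; it is not the StatisticsTracker source, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only; a common pattern for per-key counters. */
final class MapCountSketch {
    /** Adds delta to the counter for key, creating the counter at zero if absent. */
    static void increment(ConcurrentMap<String, AtomicLong> map, String key, long delta) {
        map.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(delta);
    }

    /** Adds one to the counter for key. */
    static void increment(ConcurrentMap<String, AtomicLong> map, String key) {
        increment(map, key, 1L);
    }
}
```

computeIfAbsent creates the counter atomically, so two threads that race on a new key end up sharing one AtomicLong instead of overwriting each other.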
incrementQueuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementQueuedUriCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementSucceededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of successfully fetched URIs.
INDEX_FORMAT - Static variable in class org.archive.checkpointing.Checkpoint
format for serial numbers
indexOfCurrentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
INFERRED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for inferred urls without other context.
inferRootPage - Variable in class org.archive.modules.extractor.ExtractorHTTP
should all HTTP URIs be used to infer a link to the site's root?
inheritFrom(CrawlURI) - Method in class org.archive.modules.CrawlURI
Inherit (copy) the relevant keys-values from the ancestor.
initAllQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initAllQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initialize the allQueues field in an implementation-appropriate way.
INITARGS - Static variable in class org.archive.bdb.AutoKryo
 
initialDelaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
how long after crawl start to first scan action directory
initialize(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Method shared by constructors.
initialize(File) - Method in class org.archive.io.CrawlerJournal
 
initialize() - Method in class org.archive.modules.extractor.PDFParser
Initialize opens the document for reading.
initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbCache
Call this method when you have an instance created with the default constructor, or a deserialized instance that you want to reconnect with an extant bdbje environment.
initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbManualCache
Call this method when you have an instance created with the default constructor, or a deserialized instance that you want to reconnect with an extant bdbje environment.
initializeFromReader(BufferedReader) - Method in class org.archive.modules.net.Robotstxt
 
initInternalQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initializes internal queues.
initLaunchDir() - Method in class org.archive.spring.PathSharingContext
 
initLaunchId() - Method in class org.archive.spring.PathSharingContext
 
initLifecycleProcessor() - Method in class org.archive.spring.PathSharingContext
Initialize the LifecycleProcessor.
initOtherQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initOtherQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initialize all other internal queues in an implementation-appropriate way.
initOutputStream(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
Get the OutputStream for the file to write to.
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.AcceptDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PrerequisiteAcceptDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.RejectDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.SeedAcceptDecideRule
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Actually extracts links.
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorCSS
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorDOC
Processes a word document and extracts any hyperlinks from it.
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorJS
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorPDF
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorSWF
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorUniversal
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorXML
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.TrapSuppressExtractor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
Run candidates chain on each of (1) any prerequisite, if present; (2) any outCandidates, if present; (3) all outlinks, if appropriate
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
Processes the given URI.
innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.HTTPContentDigest
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchDNS
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchFTP
Processes the given URI.
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
innerProcess(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.Processor
Actually performs the process.
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistStoreProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.ScriptedProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
Notes a CrawlURI's content size in its running tally.
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.Processor
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.ARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerRejectProcess(CrawlURI) - Method in class org.archive.modules.Processor
Invoked after a URI has been rejected.
innerRejectProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.SimpleCookieStorage
 
inProcessQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
all per-class queues from which a URI is outstanding
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Insert the given curi, whether it is already present or not.
insertKeyToString(DatabaseEntry) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
Install the appropriate provider helper object into the WorkQueue, if necessary.
installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
 
installReplicasUpTo(int) - Method in class org.archive.util.LongToIntConsistentHash
Install necessary replicas, if not already present.
INSTANCE - Static variable in class org.archive.modules.net.IgnoreRobotsPolicy
 
INSTANCE - Static variable in class org.archive.modules.net.ObeyRobotsPolicy
 
INSTANCE - Static variable in interface org.archive.util.CLibrary
 
INSTANCE - Static variable in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
instance - Variable in class org.archive.util.Supplier
 
instanceMain(String[]) - Method in class org.archive.crawler.Heritrix
 
instanceMain(String[]) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
instanceMain(String[]) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
instantiateContainer() - Method in class org.archive.crawler.framework.CrawlJob
Can the configuration yield an assembled ApplicationContext?
interpolate(String) - Method in class org.archive.spring.ConfigPathConfigurer
 
intervalSeconds - Variable in class org.archive.crawler.reporting.StatisticsTracker
The interval between writing progress information to log.
invert(DecideResult) - Static method in enum org.archive.modules.deciderules.DecideResult
 
invokeStatic(String, Class<?>, Class<?>[], Object[]) - Method in class org.archive.bdb.AutoKryo
 
IP_ADDRESS - Static variable in class org.archive.modules.extractor.ExtractorUniversal
Matches any string that begins with http:// or https:// followed by something that looks like an IP address (four numbers, none longer than 3 chars, separated by 3 dots).
IP_ADDRESS_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
IP_ADDRESS_REGEX - Static variable in class org.archive.modules.fetcher.FetchWhois
 
IP_NEVER_EXPIRES - Static variable in class org.archive.modules.net.CrawlHost
Flag value indicating always-valid IP
IP_NEVER_LOOKED_UP - Static variable in class org.archive.modules.net.CrawlHost
Flag value indicating an IP has not yet been looked up
IpAddressSetDecideRule - Class in org.archive.modules.deciderules
IpAddressSetDecideRule must be used with Preselector.setRecheckScope(boolean) set to true because it relies on Heritrix's DNS lookup to establish the IP address for a URI before it can run.
IpAddressSetDecideRule() - Constructor for class org.archive.modules.deciderules.IpAddressSetDecideRule
 
IPQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy.
IPQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
is2XXSuccess() - Method in class org.archive.modules.CrawlURI
 
isAborted() - Method in class org.apache.commons.httpclient.HttpMethodBase
Tests whether the execution of this method has been aborted
isActive() - Method in class org.archive.crawler.framework.CrawlController
Is this crawl actively able/trying to crawl? Includes both states RUNNING and EMPTY.
isActive() - Method in class org.archive.crawler.framework.ToeThread
Is this thread validly processing a URI, not paused, waiting for a URI, or interrupted?
isAllowCreate() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
isARCType(String) - Method in class org.archive.io.Warc2Arc
 
isAuthenticationPreemptive() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpClientParams.isAuthenticationPreemptive(), HttpClient.getParams().
isCheckpointing() - Method in class org.archive.crawler.framework.CheckpointService
 
isCheckpointRecovery - Variable in class org.archive.modules.fetcher.BdbCookieStorage
are we a checkpoint recovery? (in which case, reuse stored cookie data?)
isCheckpointRecovery - Variable in class org.archive.modules.net.BdbServerCache
 
isConnectionCloseForced() - Method in class org.apache.commons.httpclient.HttpMethodBase
Tests if the connection should be force-closed when no longer needed.
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isDomainAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
Returns true if cookie's domain was set via a domain attribute in the Set-Cookie header.
isEmpty() - Method in class org.archive.bdb.StoredQueue
 
isEmpty() - Method in interface org.archive.crawler.framework.Frontier
Returns true if the frontier contains no more URIs to crawl.
isEmpty() - Method in class org.archive.crawler.frontier.AbstractFrontier
Frontier is empty only if all queues are empty and no URIs are in-process
isEmpty() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return whether frontier is exhausted: all crawlable URIs done (none waiting or pending).
isEveryTime() - Method in class org.archive.modules.credential.Credential
 
isEveryTime() - Method in class org.archive.modules.credential.HtmlFormCredential
 
isEveryTime() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isExpired() - Method in class org.apache.commons.httpclient.Cookie
Returns true if this cookie has expired.
isExpired(Date) - Method in class org.apache.commons.httpclient.Cookie
Returns true if this cookie has expired according to the time passed in.
isExpired() - Method in class org.archive.crawler.restlet.Flash
Indicate whether the Flash should persist.
isFailure() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
isFinished() - Method in class org.archive.crawler.framework.CrawlController
 
isHtmlExpectedHere(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.
isHttp11() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpMethodParams.getVersion()
isHttpTransaction() - Method in class org.archive.modules.CrawlURI
Return true if this is an HTTP transaction.
isInScope(CrawlURI) - Method in class org.archive.crawler.framework.Scoper
Schedule the given CrawlURI with the Frontier.
isInScope(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
isIpExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Return true if the IP should be looked up.
isLaunchable() - Method in class org.archive.crawler.framework.CrawlJob
Is it reasonable to offer a launch button?
isLaunchInfoPartial - Variable in class org.archive.crawler.framework.CrawlJob
 
isLaunchInfoPartial() - Method in class org.archive.crawler.framework.CrawlJob
 
isLikelyFalsePositive(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
Deprecated.
produces too many false positives, UriUtils.isVeryLikelyUri(CharSequence) is preferred
isLikelyUriHtmlContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUriJavascriptContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLocation() - Method in class org.archive.modules.CrawlURI
 
isLocked() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if the connection is locked.
isManaged - Variable in class org.archive.crawler.frontier.WorkQueue
Whether queue is already in lifecycle stage
isManaged() - Method in class org.archive.crawler.frontier.WorkQueue
Whether the queue is already in a lifecycle stage -- such as ready, in-progress, snoozed -- and thus should not be redundantly inserted to readyClassQueues
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.CustomRobotsPolicy
 
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
isolateThreads - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
Whether each ToeThread should get its own independent script engine, or they should share synchronized access to one engine.
isolateThreads - Variable in class org.archive.modules.ScriptedProcessor
Whether each ToeThread should get its own independent script engine, or they should share synchronized access to one engine.
isOpen - Variable in class org.apache.commons.httpclient.HttpConnection
Whether or not the connection is connected.
isOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if the connection is open.
isOverSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has temporarily (session) exceeded its budget.
isOverTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has permanently (total) exceeded its budget.
isPaintable() - Method in class org.archive.io.ReadSourceEditor
 
isPaintable() - Method in class org.archive.spring.ConfigPathEditor
 
isPathAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
Returns true if cookie's path was set via a path attribute in the Set-Cookie header.
isPausable() - Method in class org.archive.crawler.framework.CrawlJob
 
isPaused() - Method in class org.archive.crawler.framework.CrawlController
Tell if the controller is paused
isPausing() - Method in class org.archive.crawler.framework.CrawlController
 
isPersistent() - Method in class org.apache.commons.httpclient.Cookie
Returns false if the cookie should be discarded at the end of the "session"; true otherwise.
isPossibleUri(CharSequence) - Static method in class org.archive.util.UriUtils
 
isPost() - Method in class org.archive.modules.credential.Credential
 
isPost() - Method in class org.archive.modules.credential.HtmlFormCredential
 
isPost() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isPrerequisite() - Method in class org.archive.modules.CrawlURI
Returns true if this CrawlURI is a prerequisite.
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
 
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isProfile() - Method in class org.archive.crawler.framework.CrawlJob
Is this job a 'profile' (or template), meaning it may be edited or copied to another job, but should not be launched.
isProxied() - Method in class org.apache.commons.httpclient.HttpConnection
Returns true if the connection is established via a proxy, false otherwise.
isQuadAddress(CrawlURI, String, CrawlHost) - Method in class org.archive.modules.fetcher.FetchDNS
 
isRequestSent() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP request has been transmitted to the target server in its entirety, false otherwise.
isResponseAvailable() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if input data is available.
isResponseAvailable(int) - Method in class org.apache.commons.httpclient.HttpConnection
Tests if input data becomes available within the given time period in milliseconds.
isRetired() - Method in class org.archive.crawler.frontier.WorkQueue
 
isRobotsExpired(int) - Method in class org.archive.modules.net.CrawlServer
Is the robots policy expired.
isRunning - Variable in class org.archive.bdb.BdbModule
 
isRunning() - Method in class org.archive.bdb.BdbModule
 
isRunning() - Method in class org.archive.crawler.framework.ActionDirectory
 
isRunning - Variable in class org.archive.crawler.framework.CheckpointService
 
isRunning() - Method in class org.archive.crawler.framework.CheckpointService
 
isRunning - Variable in class org.archive.crawler.framework.CrawlController
 
isRunning() - Method in class org.archive.crawler.framework.CrawlController
 
isRunning() - Method in class org.archive.crawler.framework.CrawlJob
 
isRunning - Variable in class org.archive.crawler.framework.Scoper
 
isRunning() - Method in class org.archive.crawler.framework.Scoper
 
isRunning() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isRunning() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
isRunning() - Method in class org.archive.crawler.processor.CrawlMapper
 
isRunning - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
isRunning() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
isRunning - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
isRunning() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
isRunning - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
isRunning() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
isRunning - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
isRunning() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
isRunning - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
isRunning() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
isRunning() - Method in class org.archive.modules.fetcher.FetchHTTP
 
isRunning() - Method in class org.archive.modules.fetcher.FetchWhois
 
isRunning - Variable in class org.archive.modules.net.BdbServerCache
 
isRunning() - Method in class org.archive.modules.net.BdbServerCache
 
isRunning - Variable in class org.archive.modules.Processor
 
isRunning() - Method in class org.archive.modules.Processor
 
isRunning - Variable in class org.archive.modules.ProcessorChain
 
isRunning() - Method in class org.archive.modules.ProcessorChain
 
isRunning() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
isRunning() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
isRunning() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
isSecure() - Method in class org.apache.commons.httpclient.HttpConnection
Returns true if the connection is established over a secure protocol.
isSeed() - Method in class org.archive.modules.CrawlURI
 
isStale() - Method in class org.apache.commons.httpclient.HttpConnection
Determines whether this connection is "stale", which is to say that either it is no longer open, or an attempt to read the connection would fail.
isStaleCheckingEnabled() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.isStaleCheckingEnabled(), HttpConnection.getParams().
isStopComplete - Variable in class org.archive.crawler.framework.CrawlController
 
isStopComplete() - Method in class org.archive.crawler.framework.CrawlController
 
isStrictMode() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpParams.setParameter(String, Object) to exercise a more granular control over HTTP protocol strictness.
isSuccess() - Method in class org.archive.modules.CrawlURI
Ask this URI if it was a success or not.
isSuccess(CrawlURI) - Static method in class org.archive.modules.Processor
 
isTransactional() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
isTransparent() - Method in class org.apache.commons.httpclient.HttpConnection
Indicates if the connection is completely transparent from end to end.
isUnicode() - Method in class org.archive.util.ms.Piece
 
isUnpausable() - Method in class org.archive.crawler.framework.CrawlJob
 
isValidRobots() - Method in class org.archive.modules.net.CrawlServer
If true then valid robots.txt information has been retrieved.
isVeryLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
 
isXmlOk() - Method in class org.archive.crawler.framework.CrawlJob
Is the primary config file legal XML?
iterator() - Method in class org.archive.bdb.StoredQueue
 
iterator() - Method in class org.archive.modules.ProcessorChain
 
iterator - Variable in class org.archive.util.Iteratorable
 
iterator() - Method in class org.archive.util.Iteratorable
 
iterator() - Method in class org.archive.util.Transform
 
Iteratorable<K> - Class in org.archive.util
Make an Iterator usable as an Iterable (and thus enable new-style for-each loops).
Iteratorable(Iterator<K>) - Constructor for class org.archive.util.Iteratorable
 
iterators - Variable in class org.archive.util.iterator.CompositeIterator
 

J

JavaLiterals - Class in org.archive.util
Utility functions to escape or unescape Java literal strings.
JavaLiterals() - Constructor for class org.archive.util.JavaLiterals
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.modules.extractor.ExtractorJS
 
JerichoExtractorHTML - Class in org.archive.modules.extractor
Improved link-extraction from an HTML content-body using jericho-html parser.
JerichoExtractorHTML() - Constructor for class org.archive.modules.extractor.JerichoExtractorHTML
 
JndiUtils - Class in org.archive.util
JNDI utilities.
JndiUtils() - Constructor for class org.archive.util.JndiUtils
 
jobConfigs - Variable in class org.archive.crawler.framework.Engine
map of job short names -> CrawlJob instances
jobDirRelativePath(File) - Method in class org.archive.crawler.framework.CrawlJob
Compute a path relative to the job directory for all contained files, or null if the File is not inside the job directory.
jobLogger - Variable in class org.archive.crawler.framework.CrawlJob
 
jobName - Variable in class org.archive.modules.CrawlMetadata
 
JobRelatedResource - Class in org.archive.crawler.restlet
Shared superclass for resources that represent functional aspects of a CrawlJob.
JobRelatedResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.JobRelatedResource
 
JobResource - Class in org.archive.crawler.restlet
Restlet Resource representing a single local CrawlJob inside an Engine.
JobResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.JobResource
 
jobsDir - Variable in class org.archive.crawler.framework.Engine
directory where job directories are expected
JS_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for JavaScript-discovered URLs without other context.
JSONUtils - Class in org.archive.util
Utilities for working with JSON/JSONObjects.
JSONUtils() - Constructor for class org.archive.util.JSONUtils
 
JSSTRING - Static variable in class org.archive.modules.extractor.ExtractorSWF
 
jump(String) - Static method in class org.archive.modules.ProcessResult
 

K

keepSnapshotsCount - Variable in class org.archive.crawler.reporting.StatisticsTracker
Number of crawl-stat sample snapshots to keep for calculation purposes.
key - Variable in class org.archive.util.IdentityCacheableWrapper
 
KeyedProperties - Class in org.archive.spring
Map for storing overridable properties.
KeyedProperties() - Constructor for class org.archive.spring.KeyedProperties
 
keys - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
keySet() - Method in class org.archive.util.ObjectIdentityBdbCache
 
keySet() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
keySet() - Method in interface org.archive.util.ObjectIdentityCache
set of all keys
keySet() - Method in class org.archive.util.ObjectIdentityMemCache
 
kill() - Method in class org.archive.crawler.framework.ToeThread
Terminates a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.CrawlController
Kills a thread.