A

A_ANNOTATIONS - Static variable in interface org.archive.modules.CoreAttributeConstants
shorthand string tokens indicating notable occurrences, separated by commas
A_CONTENT_DIGEST - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
content digest
A_CONTENT_DIGEST_COUNT - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
number of times we've seen this content digest (1 original + n duplicates)
A_CONTENT_DIGEST_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
content digest history map
A_CONTENT_TYPE - Static variable in interface org.archive.modules.CoreAttributeConstants
Extracted MIME type of fetched content; should be set immediately by fetching module if possible (rather than waiting for a later analyzer)
A_CREDENTIALS_KEY - Static variable in interface org.archive.modules.CoreAttributeConstants
Key to get credential avatars from A_LIST.
A_DELAY_FACTOR - Static variable in interface org.archive.modules.CoreAttributeConstants
Multiplier of last fetch duration to wait before fetching another item of the same class (e.g. host)
A_DISTANCE_FROM_SEED - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_DNS_FETCH_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_DNS_SERVER_IP_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_ETAG_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
header name (and AList key) for ETag
A_FETCH_BEGAN_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FETCH_COMPLETED_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FETCH_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
fetch history array
A_FORCE_RETIRE - Static variable in interface org.archive.modules.CoreAttributeConstants
flag indicating the containing queue should be retired
A_FORM_OFFSETS - Static variable in class org.archive.modules.extractor.ExtractorHTML
 
A_FTP_CONTROL_CONVERSATION - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_FTP_FETCH_STATUS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HERITABLE_KEYS - Static variable in interface org.archive.modules.CoreAttributeConstants
Key to (optional) attribute specifying a list of keys that are passed to CandidateURIs that 'descend' (are discovered via) this URI.
A_HREF - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
A_HTML_BASE - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HTML_FORM_OBJECTS - Static variable in class org.archive.modules.forms.ExtractorHTMLForms
 
A_HTTP_AUTH_CHALLENGES - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_HTTP_PROXY_HOST - Static variable in interface org.archive.modules.CoreAttributeConstants
local override of proxy host
A_HTTP_PROXY_PORT - Static variable in interface org.archive.modules.CoreAttributeConstants
local override of proxy port
A_LAST_MODIFIED_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
header name (and AList key) for last-modified timestamp
A_META_ROBOTS - Static variable in class org.archive.modules.extractor.ExtractorHTML
 
A_MINIMUM_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
Minimum delay before fetching another item of the same class (e.g. host).
A_MIRROR_PATH - Static variable in interface org.archive.modules.CoreAttributeConstants
Define for org.archive.crawler.writer.MirrorWriterProcessor.
A_MIRROR_PATH - Static variable in class org.archive.modules.writer.MirrorWriterProcessor
 
A_NONFATAL_ERRORS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_ORIGINAL_DATE - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
date content payload was written
A_ORIGINAL_URL - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
url that the content payload was written for
A_PRECALC_PRECEDENCE - Static variable in interface org.archive.modules.CoreAttributeConstants
key to attribute containing pre-calculated precedence
A_PREREQUISITE_URI - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_REFERENCE_LENGTH - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
reference length (content length or virtual length)
A_RETRY_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_RRECORD_SET_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_RUNTIME_EXCEPTION - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_SOURCE_TAG - Static variable in interface org.archive.modules.CoreAttributeConstants
a 'source' (usu.
A_STATUS - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
key for status (when in history)
A_SUBMIT_DATA - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_VIA_DIGEST - Static variable in class org.archive.modules.extractor.TrapSuppressExtractor
AList attribute key for carrying-forward content-digest from 'via'
A_WARC_FILE_OFFSET - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
offset into warc file of warc record with content payload
A_WARC_FILENAME - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
warc filename containing the content payload
A_WARC_RECORD_ID - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
warc record id of warc record with the content payload
A_WARC_RESPONSE_HEADERS - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_WHOIS_SERVER_IP - Static variable in interface org.archive.modules.CoreAttributeConstants
 
A_WRITE_TAG - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
Writer processors of all types are encouraged to put a 'writeTag' (analogous to HTTP 'etag') in the CrawlURI state.
abort() - Method in class org.apache.commons.httpclient.HttpMethodBase
Aborts the execution of this method.
aboutToLog() - Method in class org.archive.modules.CrawlURI
Notify CrawlURI it is about to be logged; opportunity for self-annotation
ABS_HTTP_URI_PATTERN - Static variable in class org.archive.modules.extractor.ExtractorURI
 
AbstractContentDigestHistory - Class in org.archive.modules.recrawl
Represents a store of information, presumably persistent, keyed by content digest.
AbstractContentDigestHistory() - Constructor for class org.archive.modules.recrawl.AbstractContentDigestHistory
 
AbstractCookieStorage - Class in org.archive.modules.fetcher
 
AbstractCookieStorage() - Constructor for class org.archive.modules.fetcher.AbstractCookieStorage
 
AbstractFrontier - Class in org.archive.crawler.frontier
Shared facilities for Frontier implementations.
AbstractFrontier() - Constructor for class org.archive.crawler.frontier.AbstractFrontier
 
AbstractLongFPSet - Class in org.archive.util
Shell of functionality for a Set of primitive long fingerprints, held in an array of possibly-empty slots.
AbstractLongFPSet() - Constructor for class org.archive.util.AbstractLongFPSet
To support serialization. TODO: verify needed?
AbstractLongFPSet(int, float) - Constructor for class org.archive.util.AbstractLongFPSet
Create a new AbstractLongFPSet with a given capacity and load factor
AbstractPersistProcessor - Class in org.archive.modules.recrawl
 
AbstractPersistProcessor() - Constructor for class org.archive.modules.recrawl.AbstractPersistProcessor
 
ac - Variable in class org.archive.crawler.framework.CrawlJob
 
AcceptDecideRule - Class in org.archive.modules.deciderules
 
AcceptDecideRule() - Constructor for class org.archive.modules.deciderules.AcceptDecideRule
 
acceptNonDnsResolves - Variable in class org.archive.modules.fetcher.FetchDNS
If a DNS lookup fails, whether or not to fallback to InetAddress resolution, which may use local 'hosts' files or other mechanisms.
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EngineResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EnhDirectoryResource
Accept a POST used to edit or create a file.
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.JobResource
 
acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.ScriptResource
 
accepts(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
accumulate(CrawlURI) - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
actionDir - Variable in class org.archive.crawler.framework.ActionDirectory
 
ActionDirectory - Class in org.archive.crawler.framework
Directory watched for new files.
ActionDirectory() - Constructor for class org.archive.crawler.framework.ActionDirectory
 
actions - Variable in class org.archive.modules.extractor.CustomSWFTags
 
activateInactiveQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Activate an inactive queue, if any are available.
active - Variable in class org.archive.crawler.frontier.WorkQueue
whether queue is active (ready/in-process/snoozed) or on a waiting queue
actOn(File) - Method in class org.archive.crawler.framework.ActionDirectory
Process an individual action file found
actOn(File) - Method in class org.archive.modules.seeds.SeedModule
 
actOn(File) - Method in class org.archive.modules.seeds.TextSeedModule
Treat the given file as a source of additional seeds, announcing to SeedListeners.
add(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, if not already present.
add(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
add(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
add(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
add(long) - Method in class org.archive.util.AbstractLongFPSet
Add the given value to this set
add(CharSequence) - Method in interface org.archive.util.BloomFilter
Adds a character sequence to the filter.
add(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Adds a character sequence to the filter.
add(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
add(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Add a fingerprint to the set.
add(Histotable<K>) - Method in class org.archive.util.Histotable
 
add(Iterator<E>) - Method in class org.archive.util.iterator.CompositeIterator
Add an iterator to the internal chain.
addAllow(String) - Method in class org.archive.modules.net.RobotsDirectives
 
addCap(byte[]) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Add a dummy 'cap' entry at the given insertion key.
addCookie(Cookie) - Method in class org.apache.commons.httpclient.HttpState
Adds an HTTP cookie, replacing any existing equivalent cookies.
addCookieRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Cookie request headers for those cookies that match the given host, port and path.
addCookies(Cookie[]) - Method in class org.apache.commons.httpclient.HttpState
Adds an array of HTTP cookies.
addCredential(Credential) - Method in class org.archive.modules.net.CrawlServer
Add an avatar.
addDataPersistentMember(String) - Static method in class org.archive.modules.CrawlURI
Add the key of data map items you want to persist across processings.
addDisallow(String) - Method in class org.archive.modules.net.RobotsDirectives
 
added(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
addedSeed(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
When notified of a seed via the SeedListener interface, schedule it.
addedSeed(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
Create a seed record, even on initial notification (before any real attempt/processing).
addedSeed(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
If appropriate, convert seed notification into prefix-addition.
addedSeed(CrawlURI) - Method in interface org.archive.modules.seeds.SeedListener
 
addExternalPath(String) - Method in class org.archive.spring.KeyedProperties
Add a path by which the outside world can reach this map
addExtraInfo(String, Object) - Method in class org.archive.modules.CrawlURI
 
addField(String, String, String) - Method in class org.archive.modules.forms.HTMLForm
Add a discovered INPUT, tracking it as potential username/password receiver.
addFlash(Response, String) - Static method in class org.archive.crawler.restlet.Flash
 
addFlash(Response, String, Flash.Kind) - Static method in class org.archive.crawler.restlet.Flash
 
addForce(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Add given uri, all the way through to underlying destination, even if already present.
addForce(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addForce(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addGlobalVariable(String, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
addHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addHeaderLink(CrawlURI, String, String) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addHostRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Host request header, as long as no Host request header already exists.
addIfNotBlank(ANVLRecord, String, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
addJobDirectory(File) - Method in class org.archive.crawler.framework.Engine
Adds a job directory to the Engine's known jobConfigs if not extant.
addLinkFromString(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
 
addLogger(Logger) - Method in class org.archive.crawler.reporting.AlertThreadGroup
 
addNewFp(long) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
addNewFp(long) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Add an FP (which may be an old or new FP) to the new complete list.
addNewFp(long) - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
addNow(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Immediately add uri.
addNow(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
addNow(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
addOutlink(CrawlURI, String, LinkContext, Hop) - Method in class org.archive.modules.extractor.Extractor
Create and add a 'Link' to the CrawlURI with given URI/context/hop-type
addPersistentDataMapKey(String) - Method in class org.archive.modules.CrawlURI
 
addPresentableNestedNames(Collection<Object>, Object, Set<Object>) - Method in class org.archive.crawler.restlet.JobRelatedResource
Starting at (and including) the given object, adds nested Map representations of named beans to the namedBeans Collection.
addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.io.ReadSourceEditor
 
addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.spring.ConfigPathEditor
 
addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates Proxy-Connection: Keep-Alive request header when communicating via a proxy server.
AddRedirectFromRootServerToScope - Class in org.archive.modules.deciderules
 
AddRedirectFromRootServerToScope() - Constructor for class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
 
addRefreshHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
addRelativeToBase(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
addRelativeToVia(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
 
addRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Adds the specified request header, NOT overwriting any previous value.
addRequestHeader(String, String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Adds the specified request header, NOT overwriting any previous value.
addRequestHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates all the required request headers to be submitted via the given connection.
addResponseContent(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
This method populates curi with response status and content type.
addResponseFooter(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Use this method internally to add footers.
ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter64bit
 
addRuleAssociation(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
addRuleAssociations(Set<DecideRuledSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Collect all rule-based SheetAssociations.
addSeed(CrawlURI) - Method in class org.archive.modules.seeds.SeedModule
 
addSeed(CrawlURI) - Method in class org.archive.modules.seeds.TextSeedModule
Add a new seed to scope.
addSeedListener(SeedListener) - Method in class org.archive.modules.seeds.SeedModule
 
addStats(Map<String, Map<String, Long>>) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
addSurtAssociation(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
addSurtAssociations(List<SurtPrefixesSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Collect all SURT-based SheetAssociations.
addSurtsAssociation(SurtPrefixesSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Add an individual surtsAssociation to the sheetNamesBySurt map.
addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Add a file to the manifest of files used/generated by the current crawl.
addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
addUserAgentRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Generates default User-Agent request header, as long as no User-Agent request header already exists.
addWhoisLink(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
addWhoisLinks(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
Adds outlinks to whois:{domain} and whois:{ipAddress}
afterPropertiesSet() - Method in class org.archive.checkpointing.Checkpoint
 
afterPropertiesSet() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
afterPropertiesSet() - Method in class org.archive.crawler.util.BloomUriUniqFilter
Initializer.
afterPropertiesSet() - Method in class org.archive.modules.CrawlMetadata
 
afterPropertiesSet() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
afterPropertiesSet() - Method in class org.archive.modules.extractor.ExtractorHTML
 
afterPropertiesSet() - Method in class org.archive.modules.ScriptedProcessor
 
agentsToDirectives - Variable in class org.archive.modules.net.Robotstxt
 
AggressiveExtractorHTML - Class in org.archive.modules.extractor
Extended version of ExtractorHTML with more aggressive JavaScript link extraction, where JavaScript code is parsed first with the general HTML tags regex, and then by the JavaScript speculative link regex.
AggressiveExtractorHTML() - Constructor for class org.archive.modules.extractor.AggressiveExtractorHTML
 
AlertHandler - Class in org.archive.crawler.reporting
Stub Handler, catching and relaying WARNING/SEVERE events to AlertThreadGroup.
AlertHandler() - Constructor for class org.archive.crawler.reporting.AlertHandler
 
alertsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlController
 
alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlJob
 
AlertThreadGroup - Class in org.archive.crawler.reporting
Parent thread group which lets all child threads find the right 'alert' error handler.
AlertThreadGroup(String) - Constructor for class org.archive.crawler.reporting.AlertThreadGroup
 
allBeans - Variable in class org.archive.spring.ConfigPathConfigurer
 
allConfigPaths - Variable in class org.archive.spring.ConfigPathConfigurer
 
allErrors - Variable in class org.archive.spring.PathSharingContext
 
allFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
allNonemptyReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Compact report of all nonempty queues (one queue per line)
allowCreate - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.CustomRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.IgnoreRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.ObeyRobotsPolicy
 
allows - Variable in class org.archive.modules.net.RobotsDirectives
 
allows(String) - Method in class org.archive.modules.net.RobotsDirectives
 
allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.RobotsPolicy
 
allowsAll() - Method in class org.archive.modules.net.Robotstxt
Does this policy effectively allow everything? (No disallows or timing (crawl-delay) directives?)
allowsEdit(File) - Method in class org.archive.crawler.restlet.EnhDirectory
 
allowsPaging(File) - Method in class org.archive.crawler.restlet.EnhDirectory
 
allQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All known queues.
allQueuesReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Compact report of all nonempty queues (one queue per line)
alreadySeen - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
analyze(CrawlURI, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
Run analysis: find form METHOD, ACTION, and all INPUT names/values. Log as configured.
ANNOTATION_UNWRITTEN - Static variable in class org.archive.modules.writer.WriterPoolProcessor
CrawlURI annotation indicating no record was written.
announceSeeds() - Method in class org.archive.modules.seeds.SeedModule
 
announceSeeds() - Method in class org.archive.modules.seeds.TextSeedModule
Announce all seeds from configured source to SeedListeners (including nonseed lines mixed in).
announceSeeds(CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
 
announceSeedsFromReader(BufferedReader, CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
Announce all seeds (and nonseed possible-directive lines) from the given Reader
AntiCalendarCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost.
AntiCalendarCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
appCtx - Variable in class org.archive.crawler.framework.ActionDirectory
 
appCtx - Variable in class org.archive.crawler.framework.CheckpointService
 
appCtx - Variable in class org.archive.crawler.framework.CrawlController
 
appCtx - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
appCtx - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
appCtx - Variable in class org.archive.crawler.restlet.BeanBrowseResource
 
appCtx - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
 
appCtx - Variable in class org.archive.modules.ScriptedProcessor
 
appCtx - Variable in class org.archive.spring.ConfigPathConfigurer
 
append(String) - Method in class org.archive.util.PaddingStringBuffer
append a string directly to the buffer
append(int) - Method in class org.archive.util.PaddingStringBuffer
append an int to the buffer.
append(long) - Method in class org.archive.util.PaddingStringBuffer
append a long to the buffer.
appendQueueReports(PrintWriter, String, Iterator<?>, int, int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Append queue report to general Frontier report.
applyOverlaysTo(CrawlURI) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Apply the proper overlays (by Sheet beanName) to the given CrawlURI, according to configured associations.
applyQuota(CrawlURI, String, long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Apply the quota specified by the given key against the actual value provided.
Arc2Warc - Class in org.archive.io
Convert ARCs to (sort of) WARCs.
Arc2Warc() - Constructor for class org.archive.io.Arc2Warc
 
ARCHIVE_TIME_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
ARCWriterProcessor - Class in org.archive.modules.writer
Processor module for writing the results of successful fetches (and perhaps someday, certain kinds of network failures) to the Internet Archive ARC file format.
ARCWriterProcessor() - Constructor for class org.archive.modules.writer.ARCWriterProcessor
 
ArrayLongFPCache - Class in org.archive.util.fingerprint
Simple long fingerprint cache using a backing array; any long maps to one of 'smear' slots.
ArrayLongFPCache() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCache
 
asAnnotation() - Method in class org.archive.modules.forms.HTMLForm
Provide abbreviated annotation, of the form...
asHttpClientDataWith(String, String) - Method in class org.archive.modules.forms.HTMLForm
Create the NameValuePair array expected by HttpClient, merging username and password into the appropriate value slots.
assertNoSideEffects(CrawlURI) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
Asserts that the given URI has no URI errors, no localized errors, and no annotations.
assertNotOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Throws an IllegalStateException if the connection is already open.
assertOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Throws an IllegalStateException if the connection is not open.
AssignmentLevelSurtQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Create a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain).
AssignmentLevelSurtQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
 
atFinish() - Method in class org.archive.crawler.framework.CrawlController
Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
atProcessor(Processor) - Method in class org.archive.crawler.framework.ToeThread
 
atProcessor(Processor) - Method in interface org.archive.modules.ProcessorChain.ChainStatusReceiver
 
attach(CrawlURI) - Method in class org.archive.modules.credential.Credential
Attach this credential's avatar to the passed curi.
ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.modules.writer.Kw3WriterProcessor
Max size for each file. Key for the maximum ARC bytes to write attribute.
audience - Variable in class org.archive.modules.CrawlMetadata
 
AUDIO_VIDEO_IMAGE_MIMETYPE_SET - Static variable in class org.archive.util.UriUtils
 
AUDIO_VIDEO_IMAGE_MIMETYPES - Static variable in class org.archive.util.UriUtils
 
authenticate(Request) - Method in class org.archive.crawler.restlet.RateLimitGuard
 
authenticated(Credential, CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Whether the passed credential has already been authenticated.
AutoKryo - Class in org.archive.bdb
Extensions to Kryo to let classes control their own registration, suggest other classes to register together, and use the same (Sun-JVM-only) trick for deserializing classes without no-arg constructors.
AutoKryo() - Constructor for class org.archive.bdb.AutoKryo
 
autoregister(Class<?>) - Method in class org.archive.bdb.AutoKryo
 
autoregisterTo(AutoKryo) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.CrawlURI
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlHost
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlServer
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.RobotsDirectives
 
autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.Robotstxt
 
autoregisterTo(AutoKryo) - Static method in class org.archive.util.IdentityCacheableWrapper
 
AVAILABLE_EXTRACTOR - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
availableRobotsPolicies - Variable in class org.archive.modules.CrawlMetadata
Map of all available RobotsPolicies, by name, to choose from.
averageDepth() - Method in interface org.archive.crawler.framework.Frontier
Average depth of the last URI in all eligible queues.
averageDepth() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
averageDepth - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
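
The A_DELAY_FACTOR and A_MINIMUM_DELAY entries above describe two inputs to a politeness delay: a multiplier applied to the last fetch duration, and a minimum wait before the next fetch from the same class (e.g. host). A minimal sketch of how two such values might be combined follows. The class name `PolitenessDelaySketch`, the method `delayMs`, and the decision to take the larger of the two values are assumptions made for illustration; this is not Heritrix code.

```java
// Hypothetical helper for illustration only; not part of Heritrix.
final class PolitenessDelaySketch {

    /**
     * Returns how long to wait, in milliseconds, before fetching another
     * item of the same class (e.g. host).
     * Assumption: the scaled duration and the minimum are combined with max().
     */
    static long delayMs(double delayFactor, long lastFetchDurationMs, long minimumDelayMs) {
        // Scale the duration of the previous fetch by the configured factor.
        long scaled = Math.round(delayFactor * lastFetchDurationMs);
        // Never wait less than the configured minimum.
        return Math.max(minimumDelayMs, scaled);
    }

    public static void main(String[] args) {
        // Fast fetch: 5.0 * 400 ms = 2000 ms, so the 3000 ms minimum applies.
        System.out.println(delayMs(5.0, 400, 3000));   // prints 3000
        // Slow fetch: 5.0 * 1200 ms = 6000 ms, above the minimum.
        System.out.println(delayMs(5.0, 1200, 3000));  // prints 6000
    }
}
```

Scaling the wait by the previous fetch duration means a slow host is contacted less often, while the minimum keeps fast hosts from being fetched back to back.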
 
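The AbstractLongFPSet entry above describes a set of primitive long fingerprints held in an array of possibly-empty slots, created with a capacity and a load factor. The sketch below shows one way such a structure could work, using linear probing. The class `LongFpSetSketch`, its method names, the treatment of the `int` argument as a slot count, and the doubling growth policy are all assumptions for illustration; the actual Heritrix implementation may differ.

```java
// Illustrative only: a hypothetical open-addressing set of long fingerprints.
final class LongFpSetSketch {
    private long[] slots;       // fingerprint values
    private boolean[] used;     // which slots are occupied
    private int count;          // number of stored fingerprints
    private final float loadFactor;

    LongFpSetSketch(int capacity, float loadFactor) {
        if (capacity < 1 || loadFactor <= 0f || loadFactor >= 1f) {
            throw new IllegalArgumentException("need capacity >= 1 and 0 < loadFactor < 1");
        }
        this.slots = new long[capacity];
        this.used = new boolean[capacity];
        this.loadFactor = loadFactor;
    }

    /** Adds the fingerprint; returns true if it was not already present. */
    boolean add(long fp) {
        if (contains(fp)) {
            return false;
        }
        // Grow before the table would exceed its load factor.
        if (count + 1 > slots.length * loadFactor) {
            grow();
        }
        place(fp);
        return true;
    }

    /** Probes linearly from the home slot until a match or an empty slot. */
    boolean contains(long fp) {
        int i = indexFor(fp);
        while (used[i]) {
            if (slots[i] == fp) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    int size() {
        return count;
    }

    private int indexFor(long fp) {
        // Fold the 64-bit value into a non-negative int, then reduce to a slot.
        int h = (int) ((fp ^ (fp >>> 32)) & 0x7fffffffL);
        return h % slots.length;
    }

    private void place(long fp) {
        int i = indexFor(fp);
        while (used[i]) {
            i = (i + 1) % slots.length;
        }
        slots[i] = fp;
        used[i] = true;
        count++;
    }

    private void grow() {
        long[] oldSlots = slots;
        boolean[] oldUsed = used;
        slots = new long[oldSlots.length * 2];
        used = new boolean[slots.length];
        count = 0;
        // Re-place every stored fingerprint into the larger table.
        for (int i = 0; i < oldSlots.length; i++) {
            if (oldUsed[i]) {
                place(oldSlots[i]);
            }
        }
    }

    public static void main(String[] args) {
        LongFpSetSketch set = new LongFpSetSketch(4, 0.75f);
        System.out.println(set.add(42L));      // prints true
        System.out.println(set.add(42L));      // prints false
        System.out.println(set.contains(7L));  // prints false
    }
}
```

Keeping the load factor strictly below 1 guarantees at least one empty slot, which is what lets the probe loop in `contains` terminate.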

B

base - Variable in class org.archive.spring.ConfigPath
 
Base32 - Class in org.archive.util
Encodes and decodes RFC 3548 Base32 (see http://www.faqs.org/rfcs/rfc3548.html). Imported public-domain code of Bitzi.
Base32() - Constructor for class org.archive.util.Base32
 
baseClass - Variable in class org.archive.bdb.KryoBinding
 
BaseQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
QueuePrecedencePolicy that sets a uri-queue's precedence to a configured single value.
BaseQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
BaseResource - Class in org.archive.crawler.restlet
Abstract Resource with common shared functionality.
BaseResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.BaseResource
 
BaseRule - Class in org.archive.modules.canonicalize
Base of all rules applied when canonicalizing a URL that are configurable via the Heritrix settings system.
BaseRule() - Constructor for class org.archive.modules.canonicalize.BaseRule
Constructor.
BaseUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which assigns URIs a set value (perhaps overridden for different URIs).
BaseUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
bdb - Variable in class org.archive.crawler.frontier.BdbFrontier
 
bdb - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
bdb - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
bdb - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
bdb - Variable in class org.archive.modules.fetcher.BdbCookieStorage
 
bdb - Variable in class org.archive.modules.fetcher.FetchWhois
 
bdb - Variable in class org.archive.modules.net.BdbServerCache
 
bdb - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
bdb - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
BdbContentDigestHistory - Class in org.archive.modules.recrawl
Bdb content digest history store.
BdbContentDigestHistory() - Constructor for class org.archive.modules.recrawl.BdbContentDigestHistory
 
BdbCookieStorage - Class in org.archive.modules.fetcher
CookieStorage using BDB, so that cookies accumulated in large crawls do not outgrow RAM.
BdbCookieStorage() - Constructor for class org.archive.modules.fetcher.BdbCookieStorage
 
BdbFrontier - Class in org.archive.crawler.frontier
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
BdbFrontier() - Constructor for class org.archive.crawler.frontier.BdbFrontier
 
BdbModule - Class in org.archive.bdb
Utility module for managing a shared BerkeleyDB-JE environment
BdbModule() - Constructor for class org.archive.bdb.BdbModule
 
BdbModule.BdbConfig - Class in org.archive.bdb
Configuration object for databases.
BdbModule.BdbConfig() - Constructor for class org.archive.bdb.BdbModule.BdbConfig
 
BdbMultipleWorkQueues - Class in org.archive.crawler.frontier
A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs.
BdbMultipleWorkQueues(Database, StoredClassCatalog) - Constructor for class org.archive.crawler.frontier.BdbMultipleWorkQueues
Create the multi queue in the given environment.
BdbServerCache - Class in org.archive.modules.net
ServerCache backed by BDB big maps; the usual choice for crawls.
BdbServerCache() - Constructor for class org.archive.modules.net.BdbServerCache
 
BdbUriUniqFilter - Class in org.archive.crawler.util
A BDB implementation of an AlreadySeen list.
BdbUriUniqFilter() - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
 
BdbUriUniqFilter(File) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbUriUniqFilter(File, int) - Constructor for class org.archive.crawler.util.BdbUriUniqFilter
Constructor.
BdbWorkQueue - Class in org.archive.crawler.frontier
One independent queue of items with the same 'classKey' (eg host).
BdbWorkQueue(String, BdbFrontier) - Constructor for class org.archive.crawler.frontier.BdbWorkQueue
Create a virtual queue inside the given BdbMultipleWorkQueues
BeanBrowseResource - Class in org.archive.crawler.restlet
Restlet Resource which allows browsing the constructed beans in a hierarchical fashion.
BeanBrowseResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.BeanBrowseResource
 
beanFactory - Variable in class org.archive.crawler.spring.SheetOverlaysManager
 
beanFactory - Variable in class org.archive.spring.Sheet
 
BeanFieldsPatternValidator - Class in org.archive.spring
 
BeanFieldsPatternValidator(Class<?>, String...) - Constructor for class org.archive.spring.BeanFieldsPatternValidator
 
BeanFieldsPatternValidator.PropertyPatternRule - Class in org.archive.spring
 
BeanFieldsPatternValidator.PropertyPatternRule(String, String, String) - Constructor for class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
BeanLookupBindings - Class in org.archive.crawler.framework
Provides syntactic sugar for H3 scripts to reference beans without adding a line like def scope = appCtx.getBean("scope");.
BeanLookupBindings(ApplicationContext) - Constructor for class org.archive.crawler.framework.BeanLookupBindings
 
BeanLookupBindings(ApplicationContext, Map<String, Object>) - Constructor for class org.archive.crawler.framework.BeanLookupBindings
 
beanName - Variable in class org.archive.crawler.frontier.BdbFrontier
 
beanName - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
beanName - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
beanName - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
beanName - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
beanName - Variable in class org.archive.modules.Processor
 
beanPath - Variable in class org.archive.crawler.restlet.BeanBrowseResource
 
beansException(BeansException) - Method in class org.archive.crawler.framework.CrawlJob
Report a BeansException during instantiation; report chain in reverse order (so root cause is first); ignore non-BeansExceptions or messages without a useful compact message.
BeansModel - Class in org.archive.crawler.restlet.models
 
BeansModel(String, String, String, Object, boolean, String, Object, Collection<Object>) - Constructor for class org.archive.crawler.restlet.models.BeansModel
 
beanToNameMap - Variable in class org.archive.crawler.restlet.JobRelatedResource
 
beginCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Start the process of stopping the crawl.
beginDisposition(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Inform frontier that a block of processing that should complete atomically with respect to checkpoints is about to begin.
beginDisposition(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
beginFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
beginFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Begin merging pending candidates with complete list.
beginFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
BenchmarkUriUniqFilters - Class in org.archive.crawler.util
BenchmarkUriUniqFilters
BenchmarkUriUniqFilters() - Constructor for class org.archive.crawler.util.BenchmarkUriUniqFilters
 
bind(String, Object) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
bindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
bInheritHandle - Variable in class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
BIT_INDEX_MASK - Static variable in class org.archive.util.BloomFilter64bit
 
bitIndexesFor(CharSequence) - Method in class org.archive.util.BloomFilter64bit
 
bits - Variable in class org.archive.util.BloomFilter64bit
The underlying bit vector
BLOCK_SIZE - Static variable in interface org.archive.util.ms.BlockFileSystem
The size of a block in bytes.
blockAwaitingSeedLines - Variable in class org.archive.modules.seeds.TextSeedModule
Number of lines of seeds-source to read on initial load before proceeding with crawl.
BlockFileSystem - Interface in org.archive.util.ms
Describes the internal file system contained in .doc files.
BlockInputStream - Class in org.archive.util.ms
InputStream for a file contained in a BlockFileSystem.
BlockInputStream(BlockFileSystem, int) - Constructor for class org.archive.util.ms.BlockInputStream
Constructor.
bloom - Variable in class org.archive.crawler.util.BloomUriUniqFilter
 
BloomFilter - Interface in org.archive.util
Common interface for different Bloom filter implementations
BloomFilter64bit - Class in org.archive.util
A Bloom filter.
BloomFilter64bit(long, int) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
BloomFilter64bit(long, int, boolean) - Constructor for class org.archive.util.BloomFilter64bit
 
BloomFilter64bit(long, int, Random, boolean) - Constructor for class org.archive.util.BloomFilter64bit
Creates a new Bloom filter with given number of hash functions and expected number of elements.
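For readers unfamiliar with the structure, a Bloom filter of the kind described above can be sketched as a bit vector plus several hash functions. This is a minimal illustration only: the class BloomSketch, its int-sized java.util.BitSet, and its double-hashing scheme are assumptions and do not reflect how BloomFilter64bit is actually implemented.

```java
import java.util.BitSet;

/** Illustrative Bloom filter; NOT the BloomFilter64bit implementation. */
final class BloomSketch {
    private final BitSet bits;
    private final int size;      // number of bits in the vector
    private final int hashCount; // number of hash functions

    BloomSketch(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** The i-th bit index for s, derived from two base hashes (double hashing). */
    private int indexFor(CharSequence s, int i) {
        int h1 = s.toString().hashCode();
        int h2 = 0x811C9DC5; // FNV-1a-style mix over the chars
        for (int c = 0; c < s.length(); c++) {
            h2 = (h2 ^ s.charAt(c)) * 0x01000193;
        }
        return Math.floorMod(h1 + i * (h2 | 1), size);
    }

    /** Adds s; returns true if at least one bit was newly set. */
    boolean add(CharSequence s) {
        boolean changed = false;
        for (int i = 0; i < hashCount; i++) {
            int idx = indexFor(s, i);
            if (!bits.get(idx)) {
                bits.set(idx);
                changed = true;
            }
        }
        return changed;
    }

    /** True if s may have been added; false means it definitely was not. */
    boolean contains(CharSequence s) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(indexFor(s, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A false result from contains is definitive, while a true result may be a false positive; that trade-off is what makes the structure attractive for an already-seen URI list.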
BloomUriUniqFilter - Class in org.archive.crawler.util
An implementation of an AlreadySeen list based on the MG4J BloomFilter.
BloomUriUniqFilter() - Constructor for class org.archive.crawler.util.BloomUriUniqFilter
Default constructor
bucketBasis(UURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
Base subqueue on first path-segment, if any.
bucketFor(long, int) - Method in class org.archive.util.LongToIntConsistentHash
Return the proper integer bucket-number for the given long hash, up to the given integer boundary (exclusive).
bucketFor(CharSequence, int) - Method in class org.archive.util.LongToIntConsistentHash
Convenience alternative which creates longHash from CharSequence
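One way to picture a consistent hash with an exclusive bucket boundary is a ring of points, each labelled with a bucket number, where lookup walks clockwise from the hash and skips buckets at or above the boundary. The sketch below assumes that design; the class RingHashSketch, its replica count, and its mixing function are hypothetical and are not taken from LongToIntConsistentHash.

```java
import java.util.TreeMap;

/** Illustrative hash ring; NOT the LongToIntConsistentHash implementation. */
final class RingHashSketch {
    private final TreeMap<Long, Integer> circle = new TreeMap<>();
    private final int maxBuckets;

    RingHashSketch(int maxBuckets, int replicas) {
        if (maxBuckets < 1 || replicas < 1) {
            throw new IllegalArgumentException("maxBuckets and replicas must be >= 1");
        }
        this.maxBuckets = maxBuckets;
        // Place several pseudo-random points on the ring for each bucket.
        for (int b = 0; b < maxBuckets; b++) {
            for (int r = 0; r < replicas; r++) {
                circle.put(mix(((long) b << 32) | r), b);
            }
        }
    }

    /** SplitMix64-style bit mixer, used here to scatter points over the ring. */
    static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** First bucket below upTo found walking the ring upward from hash, wrapping once. */
    int bucketFor(long hash, int upTo) {
        if (upTo < 1 || upTo > maxBuckets) {
            throw new IllegalArgumentException("upTo out of range: " + upTo);
        }
        for (int b : circle.tailMap(hash, true).values()) {
            if (b < upTo) {
                return b;
            }
        }
        for (int b : circle.headMap(hash, false).values()) {
            if (b < upTo) {
                return b;
            }
        }
        throw new IllegalStateException("unreachable: bucket 0 is always on the ring");
    }
}
```

With this layout, raising the boundary from n to n+1 either leaves a key in its old bucket or moves it to the new bucket n, which is the property that keeps reassignment small when the bucket count grows.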
bucketFor(char[], int) - Method in class org.archive.util.LongToIntConsistentHash
 
BucketQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues.
BucketQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
buffer - Variable in class org.archive.util.PaddingStringBuffer
 
bufLocal - Variable in class org.archive.crawler.io.UriProcessingFormatter
Reusable assembly buffer.
buildAndAddOutlink(CrawlURI, Map<String, Object>) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
buildDisplayingHeader(int, long) - Static method in class org.archive.crawler.util.LogReader
 
buildSurtPrefixSet() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Construct the set of prefixes to use, from the seed list (which may include both URIs and '+'-prefixed directives).
busyThreads - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
bytesProcessed - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 

C

cache - Variable in class org.archive.crawler.processor.CrawlMapper
 
cache - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
cachedFormat - Variable in class org.archive.crawler.io.UriProcessingFormatter
 
cacheLength() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
cachePercent - Variable in class org.archive.bdb.BdbModule
 
cacheSize - Variable in class org.archive.bdb.BdbModule
 
calcOutputDirs() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
calcReverseSortedHostsDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
calcSchemeAuthorityKeyBytes(String) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
 
calcSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
calculateInsertKey(CrawlURI) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the insertKey that places a CrawlURI in the desired spot.
calculateOriginKey(String) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Calculate the 'origin' key for a virtual queue of items with the given classKey.
calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
Calculate the precedence value for the given queue.
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
Calculate the precedence value for the given URI.
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
CALENDARISH - Static variable in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
canary - Variable in class org.archive.util.ObjectIdentityBdbCache
 
candidateChain - Variable in class org.archive.crawler.framework.CrawlController
Candidate chain
candidateChain - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
Candidate chain
CandidateChain - Class in org.archive.modules
 
CandidateChain() - Constructor for class org.archive.modules.CandidateChain
 
CandidateScoper - Class in org.archive.crawler.prefetch
Simple single-URI scoper, considers passed-in URI as candidate; sets fetchstatus negative and skips to end of processing if out-of-scope.
CandidateScoper() - Constructor for class org.archive.crawler.prefetch.CandidateScoper
 
CandidatesProcessor - Class in org.archive.crawler.postprocessor
Processor which sends all candidate outlinks through the CandidateChain, scheduling those with non-negative status codes to the frontier.
CandidatesProcessor() - Constructor for class org.archive.crawler.postprocessor.CandidatesProcessor
Usual no-argument constructor
candidateUserAgents - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
list of user-agents to try; if any are allowed, a URI will be crawled
candidateUserAgents - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
list of user-agents to try; if any are allowed, a URI will be crawled
CanonicalizationRule - Interface in org.archive.modules.canonicalize
A rule to apply when canonicalizing a URL.
canonicalize(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Canonicalize passed CrawlURI.
canonicalize(String) - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
Apply this canonicalization rule.
canonicalize(String) - Method in class org.archive.modules.canonicalize.FixupQueryString
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.LowercaseRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.RegexRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
Run the passed uuri through the list of rules.
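Running a URI string through a list of rules, as described above, amounts to applying each String-to-String rule in order to the result of the previous one. The sketch below illustrates that idea under stated assumptions: CanonicalizerSketch and its two rules are simplified stand-ins, not the RulesCanonicalizationPolicy, LowercaseRule, or StripWWWRule classes themselves.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Illustrative rule chain; NOT the Heritrix canonicalization classes. */
final class CanonicalizerSketch {
    /** Stand-in rule: lowercase the whole URI string. */
    static final UnaryOperator<String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);

    /** Stand-in rule: drop a leading "www." from an http(s) host. */
    static final UnaryOperator<String> STRIP_WWW =
            s -> s.replaceFirst("^(https?://)www\\.", "$1");

    /** Applies each rule in list order, feeding each result to the next rule. */
    static String canonicalize(String uri, List<UnaryOperator<String>> rules) {
        String result = uri;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }
}
```

Rule order matters in this sketch: the www-stripping pattern is case-sensitive, so it only matches an upper-case input after the lowercasing rule has run.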
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripExtraSlashes
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionCFIDs
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionIDs
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripUserinfoRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWNRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWRule
 
canonicalize(String) - Method in class org.archive.modules.canonicalize.UriCanonicalizationPolicy
 
canonicalString - Variable in class org.archive.modules.CrawlURI
 
capacityPowerOfTwo - Variable in class org.archive.util.AbstractLongFPSet
the capacity of this set, specified as the exponent of a power of 2
caseSensitiveFilesystem - Variable in class org.archive.modules.writer.MirrorWriterProcessor
True if the file system is case-sensitive, like UNIX.
catalog - Variable in class org.archive.modules.extractor.PDFParser
 
characterMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
checkAvailableSpace(File) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Probe via File.getUsableSpace to see if monitored paths have fallen below the pause threshold.
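The probe described above can be approximated with the standard java.io.File.getUsableSpace call. The helper below is a hypothetical sketch: the class DiskSpaceSketch, the method belowThreshold, and the byte-valued threshold parameter are assumptions and are not the DiskSpaceMonitor API.

```java
import java.io.File;

/** Illustrative free-space probe; NOT the DiskSpaceMonitor implementation. */
final class DiskSpaceSketch {
    /**
     * Returns true if the partition holding path reports fewer usable bytes
     * than the given threshold. getUsableSpace returns 0 when the path does
     * not name a partition, which this sketch treats as "no space".
     */
    static boolean belowThreshold(File path, long pauseThresholdBytes) {
        return path.getUsableSpace() < pauseThresholdBytes;
    }
}
```

A caller would typically run such a check periodically over every monitored path and request a pause as soon as any one of them reports true.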
checkBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
checkForLimitsExceeded(CrawlStatSnapshot) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
checkForNull(String) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
checkForSeedPromotion(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
Check if the URI needs special 'discovered seed' treatment.
checkFutures() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Check for any future-scheduled URIs now eligible for reenqueuing
checkMidfetchAbort(CrawlURI, HttpRecorderMethod, HttpConnection) - Method in class org.archive.modules.fetcher.FetchHTTP
 
checkNotUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Throws an IllegalStateException if the HTTP method has been already executed, but not recycled.
checkOutlinks - Variable in class org.archive.crawler.processor.CrawlMapper
Whether to apply the mapping to discovered outlinks, for example after extraction has occurred.
Checkpoint - Class in org.archive.checkpointing
Represents a single checkpoint, by its name and main store directory.
Checkpoint() - Constructor for class org.archive.checkpointing.Checkpoint
 
checkpoint - Variable in class org.archive.crawler.framework.CheckpointSuccessEvent
 
Checkpointable - Interface in org.archive.checkpointing
Interface for objects that can checkpoint their state, possibly but not necessarily into the provided Checkpoint instance, on request.
checkpointDir - Variable in class org.archive.checkpointing.Checkpoint
Checkpoints directory; either an absolute path, or relative to the CheckpointService's checkpointsDirectory (which will be inserted as the ConfigPath base before the Checkpoint is consulted).
checkpointFailed(Exception) - Method in class org.archive.crawler.framework.CheckpointService
Note that a checkpoint failed
checkpointFailed(String) - Method in class org.archive.crawler.framework.CheckpointService
 
checkpointInProgress - Variable in class org.archive.crawler.framework.CheckpointService
 
checkpointIntervalMinutes - Variable in class org.archive.crawler.framework.CheckpointService
 
checkpointsDir - Variable in class org.archive.crawler.framework.CheckpointService
 
CheckpointService - Class in org.archive.crawler.framework
Executes checkpoints, and offers convenience methods for enumerating available Checkpoints and injecting a recovery-Checkpoint after build and before launch (setRecoveryCheckpointByName).
CheckpointService() - Constructor for class org.archive.crawler.framework.CheckpointService
Create a new Checkpointer
CheckpointSuccessEvent - Class in org.archive.crawler.framework
Report success of a Checkpoint (so that it may be reported by the CrawlJob to the job log).
CheckpointSuccessEvent(CheckpointService, Checkpoint) - Constructor for class org.archive.crawler.framework.CheckpointSuccessEvent
 
checkpointTask - Variable in class org.archive.crawler.framework.CheckpointService
 
CheckpointUtils - Class in org.archive.crawler.util
Utilities useful for checkpointing.
CheckpointUtils() - Constructor for class org.archive.crawler.util.CheckpointUtils
 
CheckpointValidator - Class in org.archive.crawler.framework
 
CheckpointValidator() - Constructor for class org.archive.crawler.framework.CheckpointValidator
 
checkQuotas(CrawlURI, FetchStats.HasFetchStats, int) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
Check all quotas for the given substats and category (server, host, or group).
checkUri - Variable in class org.archive.crawler.processor.CrawlMapper
Whether to apply the mapping to a URI being processed itself, for example early in processing (while its status is still 'unattempted').
checkUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Throws an IllegalStateException if the HTTP method has not been executed since last recycle.
checkXML() - Method in class org.archive.crawler.framework.CrawlJob
Is the primary XML config minimally well-formed?
chmod - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Should permissions be changed for the newly created dirs.
chmodValue - Variable in class org.archive.modules.writer.Kw3WriterProcessor
What should the permissions be set to.
chosenEngine - Variable in class org.archive.crawler.restlet.ScriptResource
 
circle - Variable in class org.archive.util.LongToIntConsistentHash
 
cj - Variable in class org.archive.crawler.restlet.JobRelatedResource
 
cj - Variable in class org.archive.crawler.restlet.JobResource
 
classCatalog - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
classCatalogDB - Variable in class org.archive.util.bdbje.EnhancedEnvironment
 
classKey - Variable in class org.archive.crawler.frontier.WorkQueue
The classKey
ClassKeyMatchesRegexDecideRule - Class in org.archive.crawler.deciderules
Rule applies configured decision to any CrawlURI class key -- i.e.
ClassKeyMatchesRegexDecideRule() - Constructor for class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
Usual constructor.
clazz - Variable in class org.archive.spring.BeanFieldsPatternValidator
 
cleanup() - Method in class org.archive.crawler.framework.ToePool
 
cleanupHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
Perform any final cleanup related to the HttpClient instance.
cleanUpOldFiles(String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
cleanUpOldFiles(File, String) - Method in class org.archive.util.TmpDirTestCase
Delete any files left over from previous run.
clear() - Method in class org.apache.commons.httpclient.HttpState
Clears the state information (all cookies, credentials and proxy credentials).
clear() - Method in class org.archive.crawler.io.UriProcessingFormatter
 
clearAllOverrideContexts() - Static method in class org.archive.spring.KeyedProperties
 
clearAt(long) - Method in class org.archive.util.AbstractLongFPSet
 
clearAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
clearCookies() - Method in class org.apache.commons.httpclient.HttpState
Clears all cookies.
clearCredentials() - Method in class org.apache.commons.httpclient.HttpState
Clears all credentials.
clearOverridesFrom(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
 
clearPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
Clear prerequisite, if any.
clearProxyCredentials() - Method in class org.apache.commons.httpclient.HttpState
Clears all proxy credentials.
CLibrary - Interface in org.archive.util
Interface to standard C library functions; initially just link().
ClientFTP - Class in org.archive.net
Client for FTP operations.
ClientFTP() - Constructor for class org.archive.net.ClientFTP
Constructs a new ClientFTP.
close() - Method in class org.apache.commons.httpclient.HttpConnection
Closes the socket and streams.
close() - Method in class org.archive.bdb.BdbModule
 
close() - Method in class org.archive.bdb.StoredQueue
 
close() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Close down any allocated resources.
close() - Method in class org.archive.crawler.frontier.BdbFrontier
 
close() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
clean up
close() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Release resources only needed when running
close() - Method in class org.archive.crawler.reporting.AlertHandler
 
close() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
close() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
close() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
close() - Method in class org.archive.io.CrawlerJournal
Flush and close the underlying IO objects.
close() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
close() - Method in class org.archive.modules.fetcher.DefaultServerCache
Called when shutting down the cache so we can do clean up.
close() - Method in class org.archive.util.bdbje.EnhancedEnvironment
 
close() - Method in class org.archive.util.ObjectIdentityBdbCache
 
close() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
close() - Method in interface org.archive.util.ObjectIdentityCache
close/release any associated resources
close() - Method in class org.archive.util.ObjectIdentityMemCache
 
closeDatabase(Database) - Method in class org.archive.bdb.BdbModule
 
closeDatabase(String) - Method in class org.archive.bdb.BdbModule
 
closeDataConnection() - Method in class org.archive.net.ClientFTP
 
closeIfStale() - Method in class org.apache.commons.httpclient.HttpConnection
Closes the connection if stale.
closeLogFiles() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Close all log files and remove handlers from loggers.
closeSocketAndStreams() - Method in class org.apache.commons.httpclient.HttpConnection
Closes everything out.
collect(CrawlController, StatisticsTracker) - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
Collect all relevant snapshot samples, from the given CrawlController and StatisticsTracker (which also provides the previous snapshot for rate-calculations).
collection - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Name of collection.
COLLECTION_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
comment - Variable in class org.archive.modules.deciderules.DecideRule
 
compactReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
compare(Object, Object) - Method in class org.apache.commons.httpclient.Cookie
Compares two cookies to determine order for cookie header.
compareTo(CrawlJob) - Method in class org.archive.crawler.framework.CrawlJob
Sort for reverse-chronological listing.
compareTo(Delayed) - Method in class org.archive.crawler.frontier.WorkQueue
 
compareTo(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
compareTo(FPMergeUriUniqFilter.PendingItem) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
compareTo(Link) - Method in class org.archive.modules.extractor.Link
 
completePause() - Method in class org.archive.crawler.framework.CrawlController
 
completeStop() - Method in class org.archive.crawler.framework.CrawlController
Called when the last toethread exits.
component - Variable in class org.archive.crawler.Heritrix
 
composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbCache
 
composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
CompositeIterator<E> - Class in org.archive.util.iterator
An iterator that's built up out of any number of other iterators.
CompositeIterator() - Constructor for class org.archive.util.iterator.CompositeIterator
Create an empty CompositeIterator.
CompositeIterator(Iterator<E>, Iterator<E>) - Constructor for class org.archive.util.iterator.CompositeIterator
Convenience method for concatenating together two iterators.
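An iterator built from other iterators can be sketched as a cursor over a list of sources that advances to the next source whenever the current one is exhausted. The class ChainedIterator below is a hypothetical illustration of that idea and does not reproduce the CompositeIterator API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative concatenating iterator; NOT the CompositeIterator implementation. */
final class ChainedIterator<E> implements Iterator<E> {
    private final Iterator<Iterator<E>> sources;
    private Iterator<E> current = Collections.emptyIterator();

    ChainedIterator(List<Iterator<E>> iterators) {
        // Copy the list so later changes by the caller do not affect iteration.
        this.sources = new ArrayList<>(iterators).iterator();
    }

    @Override
    public boolean hasNext() {
        // Skip exhausted (or empty) sources until one has an element.
        while (!current.hasNext() && sources.hasNext()) {
            current = sources.next();
        }
        return current.hasNext();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Because hasNext skips over empty sources, callers see one continuous sequence regardless of how many of the underlying iterators turn out to be empty.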
compress - Variable in class org.archive.modules.writer.WriterPoolProcessor
Whether to gzip-compress files when writing to disk; by default true, meaning do-compress.
concludedSeedBatch() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
concludedSeedBatch() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
concludedSeedBatch() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
concludedSeedBatch() - Method in interface org.archive.modules.seeds.SeedListener
 
ConfigFile - Class in org.archive.spring
ConfigPath with added implication that it is an individual, readable/writable File.
ConfigFile() - Constructor for class org.archive.spring.ConfigFile
 
ConfigFile(String, String) - Constructor for class org.archive.spring.ConfigFile
 
ConfigFileEditor - Class in org.archive.spring
PropertyEditor allowing Strings to become ConfigFile instances.
ConfigFileEditor() - Constructor for class org.archive.spring.ConfigFileEditor
 
ConfigPath - Class in org.archive.spring
A filesystem path, as a bean, for the convenience of configuration via spring beans.xml or user interfaces to same.
ConfigPath() - Constructor for class org.archive.spring.ConfigPath
 
ConfigPath(String, String) - Constructor for class org.archive.spring.ConfigPath
 
configPathConfigurer - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
ConfigPathConfigurer - Class in org.archive.spring
Bean to fixup all configuration-relative ConfigPath instances, and maintain an inventory of referenced paths.
ConfigPathConfigurer() - Constructor for class org.archive.spring.ConfigPathConfigurer
 
ConfigPathEditor - Class in org.archive.spring
PropertyEditor allowing Strings to become ConfigPath instances.
ConfigPathEditor() - Constructor for class org.archive.spring.ConfigPathEditor
 
ConfigString - Class in org.archive.spring
A configuration string that provides its own reader via the ReadSource interface, for convenient use in spring configuration where any of an inline string, path to local file (ConfigPath), or any other readable-text-source would all be equally welcome.
ConfigString() - Constructor for class org.archive.spring.ConfigString
 
ConfigString(String) - Constructor for class org.archive.spring.ConfigString
 
configureHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
 
configureHttp(int, String, String, int, String, String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
configureMethod(CrawlURI, HttpMethod) - Method in class org.archive.modules.fetcher.FetchHTTP
Configure the HttpMethod setting options and headers.
configurer - Variable in class org.archive.spring.ConfigPath
 
congestionRatio() - Method in interface org.archive.crawler.framework.Frontier
Ratio of number of threads that would theoretically allow maximum crawl progress (if each was as productive as current threads), to current number of threads.
congestionRatio() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
congestionRatio - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
conhash - Variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
connect() - Method in class org.archive.net.s3.S3URLConnection
Connect to S3 and get the object reference, but don't read any of the object data yet.
connectTimeoutMs - Variable in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
consecutiveConnectionErrors - Variable in class org.archive.modules.net.CrawlServer
 
considerActive() - Method in class org.archive.crawler.frontier.WorkQueue
Begin an 'active' session, which begins when a queue first offers a URI for crawling, and continues until it is deactivated (for example, for session-budget reasons).
considerDnsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
considerIfLikelyUri(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
Consider whether a given string is URI-like.
considerIncluded(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should consider the given UURI as if already scheduled.
considerIncluded(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
considerQueryStringValues(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
Consider a query-string-like collection of key=value[&key=value] pairs for URI-like strings in the values.
considerRobotsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Consider the robots precondition.
considerString(Extractor, CrawlURI, boolean, String) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStringAsUri(String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
considerStrings(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStrings(Extractor, CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerStrings(Extractor, CrawlURI, CharSequence, boolean) - Method in class org.archive.modules.extractor.ExtractorJS
 
considerTimestamp() - Method in class org.archive.io.CrawlerJournal
Write a timestamp line if appropriate
consistencyCheck() - Method in class org.archive.crawler.frontier.BdbFrontier
Run a self-consistency check over queue collections, queues-of-queues, etc.
consistencyMarkup(DisposableStoredSortedMap<String, String>, Iterable<?>, String) - Method in class org.archive.crawler.frontier.BdbFrontier
 
CONSTRUCTOR_CACHE - Static variable in class org.archive.bdb.AutoKryo
 
constructRegex(int) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
contains(long) - Method in class org.archive.util.AbstractLongFPSet
Does this set contain the given value?
contains(CharSequence) - Method in interface org.archive.util.BloomFilter
Checks whether the given character sequence is in this filter.
contains(CharSequence) - Method in class org.archive.util.BloomFilter64bit
Checks whether the given character sequence is in this filter.
contains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
contains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Does this set contain a given fingerprint?
contains(int) - Method in class org.archive.util.ms.Piece
 
containsContentTypeCharsetDeclaration() - Method in class org.archive.modules.CrawlURI
 
containsDataKey(String) - Method in class org.archive.modules.CrawlURI
 
containsHost(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
 
containsKey(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
 
containsServer(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
 
CONTENT_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
CONTENT_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
CONTENT_TYPE_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
ContentDigestHistoryLoader - Class in org.archive.modules.recrawl
 
ContentDigestHistoryLoader() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
ContentDigestHistoryStorer - Class in org.archive.modules.recrawl
 
ContentDigestHistoryStorer() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
ContentExtractor - Class in org.archive.modules.extractor
Extracts links from the fetched content of a URI, as opposed to its headers.
ContentExtractor() - Constructor for class org.archive.modules.extractor.ContentExtractor
 
ContentExtractorTestBase - Class in org.archive.modules.extractor
Abstract base class for unit testing ContentExtractor implementations.
ContentExtractorTestBase() - Constructor for class org.archive.modules.extractor.ContentExtractorTestBase
 
ContentLengthDecideRule - Class in org.archive.modules.deciderules
 
ContentLengthDecideRule() - Constructor for class org.archive.modules.deciderules.ContentLengthDecideRule
Usual constructor.
contentSinceCheck - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
contentTypeMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
ContentTypeMatchesRegexDecideRule - Class in org.archive.modules.deciderules
DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression.
ContentTypeMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
 
ContentTypeNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression.
ContentTypeNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
 
controlConversation - Variable in class org.archive.net.ClientFTP
 
controller - Variable in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
controller - Variable in class org.archive.crawler.framework.CheckpointService
 
controller - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
 
controller - Variable in class org.archive.crawler.framework.ToePool
 
controller - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
controller - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
controller - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
controller - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
controller - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
Cookie - Class in org.apache.commons.httpclient
HTTP "magic-cookie" represents a piece of state information that the HTTP agent and the target server can exchange to maintain a session.
Cookie() - Constructor for class org.apache.commons.httpclient.Cookie
Default constructor.
Cookie(String, String, String) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value and domain attribute.
Cookie(String, String, String, String, Date, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value, domain attribute, path attribute, expiration attribute, and secure attribute
Cookie(String, String, String, String, int, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
Creates a cookie with the given name, value, domain attribute, path attribute, maximum age attribute, and secure attribute
COOKIEDB_NAME - Static variable in class org.archive.modules.fetcher.BdbCookieStorage
 
cookiesLoadFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
CookieSpec - Interface in org.apache.commons.httpclient.cookie
Defines the cookie management specification.
CookieSpecBase - Class in org.apache.commons.httpclient.cookie
Cookie management functions shared by all specifications.
CookieSpecBase() - Constructor for class org.apache.commons.httpclient.cookie.CookieSpecBase
Default constructor
cookiesSaveFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
CookieStorage - Interface in org.archive.modules.fetcher
 
cookieStorage - Variable in class org.archive.modules.fetcher.FetchHTTP
 
copy(CrawlJob, File, boolean) - Method in class org.archive.crawler.framework.Engine
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
copy(CrawlJob, String, boolean) - Method in class org.archive.crawler.framework.Engine
Copy a job to a new location, possibly making a job a profile or a profile a runnable job.
copyForwardWriteTagIfDupe(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
If this fetch is identical to the last written (archived) fetch, then copy forward the writeTag.
copyJob(String, boolean) - Method in class org.archive.crawler.restlet.JobResource
 
copyPersistSourceToHistoryMap(File, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
Populates a given StoredSortedMap (history map) from an old environment db or a persist log.
copyPersistSourceToHistoryMap(URL, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
Populates a given StoredSortedMap (history map) from an old persist log.
CoreAttributeConstants - Interface in org.archive.modules
Attribute keys and constant strings used by the core crawler classes.
CostAssignmentPolicy - Class in org.archive.crawler.frontier
Calculate an integer 'cost' value for the given CrawlURI.
CostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.CostAssignmentPolicy
 
costCount - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of items charged against queue; with totalExpenditure can be used to calculate 'average cost'.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.CostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.WagCostAssignmentPolicy
Add constant penalties for certain features of URI (and its 'via') that make it more delayable/skippable.
costOf(CrawlURI) - Method in class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 
CostUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which sets a URI's precedence to its 'cost' -- which simulates the in-queue sorting order in Heritrix 1.x, where cost contributed the same bits to the queue-insert-key that precedence now does.
CostUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.CostUriPrecedencePolicy
 
count() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
 
count - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of stored items
count - Variable in class org.archive.crawler.reporting.AlertThreadGroup
 
count - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
count - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
count() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
count - Variable in class org.archive.util.AbstractLongFPSet
The current number of elements in the set
count() - Method in class org.archive.util.AbstractLongFPSet
Return the number of entries in this set.
count - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
count() - Method in interface org.archive.util.fingerprint.LongFPSet
get the number of elements in the Set
count - Variable in class org.archive.util.ObjectIdentityBdbCache
 
count - Variable in class org.archive.util.ObjectIdentityBdbManualCache
 
countryCodes - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
Country code name.
Cp1252 - Class in org.archive.util.ms
A fast implementation of code page 1252.
crawlCheckpoint(Object, File) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
CrawlController - Class in org.archive.crawler.framework
CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl.
CrawlController() - Constructor for class org.archive.crawler.framework.CrawlController
 
CrawlController.State - Enum in org.archive.crawler.framework
 
CrawlController.StopCompleteEvent - Class in org.archive.crawler.framework
 
CrawlController.StopCompleteEvent(Object) - Constructor for class org.archive.crawler.framework.CrawlController.StopCompleteEvent
 
crawlDelay - Variable in class org.archive.modules.net.RobotsDirectives
 
crawledBytes - Variable in class org.archive.crawler.reporting.StatisticsTracker
tally sizes novel, verified (same hash), vouched (not-modified)
CrawledBytesHistotable - Class in org.archive.crawler.util
 
CrawledBytesHistotable() - Constructor for class org.archive.crawler.util.CrawledBytesHistotable
 
crawledBytesSummary() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEmpty(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEnded(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
crawlEnded(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEnding(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlEndTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time the crawl ended
crawlerCount - Variable in class org.archive.crawler.processor.HashCrawlMapper
Number of crawlers among which to split up the URIs.
CrawlerJournal - Class in org.archive.io
Utility class for a crawler journal/log that is compressed and rotates by serial number at checkpoints.
CrawlerJournal(String, String) - Constructor for class org.archive.io.CrawlerJournal
Create a new crawler journal at the given location
CrawlerJournal(File) - Constructor for class org.archive.io.CrawlerJournal
Create a new crawler journal at the given location
CrawlerLoggerModule - Class in org.archive.crawler.reporting
Module providing all expected whole-crawl logging facilities
CrawlerLoggerModule() - Constructor for class org.archive.crawler.reporting.CrawlerLoggerModule
 
CrawlHost - Class in org.archive.modules.net
Represents a single remote "host".
CrawlHost(String) - Constructor for class org.archive.modules.net.CrawlHost
Create a new CrawlHost object.
CrawlHost(String, String) - Constructor for class org.archive.modules.net.CrawlHost
Create a new CrawlHost object.
CrawlJob - Class in org.archive.crawler.framework
CrawlJob represents a crawl configuration, including its configuration files, instantiated/running ApplicationContext, and disk output, potentially across multiple runs.
CrawlJob(File) - Constructor for class org.archive.crawler.framework.CrawlJob
 
CrawlJob.JobLogFormatter - Class in org.archive.crawler.framework
Formatter for job.log
CrawlJob.JobLogFormatter() - Constructor for class org.archive.crawler.framework.CrawlJob.JobLogFormatter
 
CrawlJobModel - Class in org.archive.crawler.restlet.models
 
CrawlJobModel(CrawlJob, String) - Constructor for class org.archive.crawler.restlet.models.CrawlJobModel
 
CrawlLimitEnforcer - Class in org.archive.crawler.framework
Bean to enforce limits on the size of a crawl in URI count, byte count, or elapsed time.
CrawlLimitEnforcer() - Constructor for class org.archive.crawler.framework.CrawlLimitEnforcer
 
crawlLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
CrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
CrawlMapper() - Constructor for class org.archive.crawler.processor.CrawlMapper
Constructor.
CrawlMetadata - Class in org.archive.modules
Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
CrawlMetadata() - Constructor for class org.archive.modules.CrawlMetadata
 
crawlPaused(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlPauseStarted - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time of last pause, while pause in progress
crawlPausing(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
crawlResuming(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
CrawlServer - Class in org.archive.modules.net
Represents a single remote "server".
CrawlServer(String) - Constructor for class org.archive.modules.net.CrawlServer
Creates a new CrawlServer object.
crawlStartTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
wall-clock time the crawl started
CrawlStateEvent - Class in org.archive.crawler.event
 
CrawlStateEvent(Object, CrawlController.State, String) - Constructor for class org.archive.crawler.event.CrawlStateEvent
 
CrawlStatSnapshot - Class in org.archive.crawler.reporting
Frozen snapshot of a variety of crawl statistics.
CrawlStatSnapshot() - Constructor for class org.archive.crawler.reporting.CrawlStatSnapshot
 
CrawlStatus - Enum in org.archive.crawler.framework
 
CrawlSummaryReport - Class in org.archive.crawler.reporting
The "Crawl Report", with summaries of overall crawl size.
CrawlSummaryReport() - Constructor for class org.archive.crawler.reporting.CrawlSummaryReport
 
crawlTotalPausedTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
duration tally of all time spent in paused state
CrawlURI - Class in org.archive.modules
Represents a candidate URI and the associated state it collects as it is crawled.
CrawlURI(UURI) - Constructor for class org.archive.modules.CrawlURI
Create a new instance of CrawlURI from a UURI.
CrawlURI(UURI, String, UURI, LinkContext) - Constructor for class org.archive.modules.CrawlURI
 
CrawlURI.FetchType - Enum in org.archive.modules
 
CrawlURIDispositionEvent - Class in org.archive.crawler.event
 
CrawlURIDispositionEvent(Object, CrawlURI, CrawlURIDispositionEvent.Disposition) - Constructor for class org.archive.crawler.event.CrawlURIDispositionEvent
 
CrawlURIDispositionEvent.Disposition - Enum in org.archive.crawler.event
 
createCrawlURI(UURI, Link) - Method in class org.archive.modules.CrawlURI
Utility method for creation of CandidateURIs found while extracting links from this CrawlURI.
createCrawlURI(UURI, Link, int, boolean) - Method in class org.archive.modules.CrawlURI
Utility method for creation of CandidateURIs found while extracting links from this CrawlURI.
createdEnvironment - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbCache
 
createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
createFileLogger(File, String, Logger) - Static method in class org.archive.crawler.util.LogUtils
Creates a file logger that uses the heritrix.properties file logger configuration.
createFormSubmissionAttempt(CrawlURI, HTMLForm, String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
createFp(CharSequence) - Static method in class org.archive.crawler.util.FPMergeUriUniqFilter
Create a fingerprint from the given key
CreateHardLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
createHostDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Create a subdirectory named for the host in the URI.
createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.BdbFrontier
 
createInactiveQueueForPrecedence(int, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Optionally reuse prior data, for use when resuming from a checkpoint
createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Create an inactiveQueue to hold queue names at the given precedence
createKey(CharSequence) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
Create fingerprint.
createMultipleWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
Create the single object (within which is one BDB database) inside which all the other queues live.
createNewJobWithDefaults(File) - Method in class org.archive.crawler.framework.Engine
Create a new job dir and copy the profile CXML into it as non-profile CXML
createPortDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Create a subdirectory named for the port in the URI.
createRecorder(String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
Deprecated.
createRecorder(String, String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
 
createRoot() - Method in class org.archive.crawler.restlet.EngineApplication
 
createSocket() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(InetAddress, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
 
createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
createSocket(Socket, String, int, boolean) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
CreateSymbolicLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
createUriSet() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
createUriSet() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
Credential - Class in org.archive.modules.credential
Credential type.
Credential() - Constructor for class org.archive.modules.credential.Credential
Constructor.
credentialPrecondition(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Consider credential preconditions.
CredentialStore - Class in org.archive.modules.credential
Front door to the credential store.
CredentialStore() - Constructor for class org.archive.modules.credential.CredentialStore
Constructor.
CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.modules.extractor.ExtractorCSS
 
CSS_URI_EXTRACTOR - Static variable in class org.archive.modules.extractor.ExtractorCSS
CSS URL extractor pattern.
curi - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
 
curi - Variable in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
current() - Static method in class org.archive.crawler.reporting.AlertThreadGroup
 
current - Variable in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
currentDocsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
currentFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
currentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
currentKiBPerSec - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
currentLaunchDir - Variable in class org.archive.spring.PathSharingContext
 
currentLaunchId - Variable in class org.archive.spring.PathSharingContext
 
currentLaunchJobLogHandler - Variable in class org.archive.crawler.framework.CrawlJob
 
customRobots - Variable in class org.archive.modules.net.CustomRobotsPolicy
textual alternate robots.txt rules to follow
CustomRobotsPolicy - Class in org.archive.modules.net
Follow a custom-written robots policy, rather than the site's own declarations. Does not support overlays of different custom-robots; instead it is recommended each custom policy be declared as a separate bean, with a distinct name.
CustomRobotsPolicy() - Constructor for class org.archive.modules.net.CustomRobotsPolicy
 
customRobotstxt - Variable in class org.archive.modules.net.CustomRobotsPolicy
 
CustomSWFTags - Class in org.archive.modules.extractor
Overwrite action tags that may hold a URI, so that they use the CrawlUriSWFAction action.
CustomSWFTags(SWFActions) - Constructor for class org.archive.modules.extractor.CustomSWFTags
 

D

d - Variable in class org.archive.util.BloomFilter64bit
The number of hash functions used by this filter.
data - Variable in class org.archive.modules.CrawlURI
Flexible dynamic attributes list.
data - Variable in class org.archive.modules.extractor.Link
Flexible dynamic attributes list.
data - Variable in class org.archive.spring.PathSharingContext
 
databaseConfig() - Static method in class org.archive.bdb.StoredQueue
A suitable DatabaseConfig for the Database backing a StoredQueue.
dataSocket - Variable in class org.archive.net.ClientFTP
 
db - Variable in class org.archive.bdb.DisposableStoredSortedMap
 
db - Variable in class org.archive.util.ObjectIdentityBdbCache
The BDB JE database used for this instance.
db - Variable in class org.archive.util.ObjectIdentityBdbManualCache
The BDB JE database used for this instance.
deactivateQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Put the given queue on the inactiveQueues queue
DEBUG - Static variable in class org.archive.util.BloomFilter64bit
 
DecideResult - Enum in org.archive.modules.deciderules
The decision of a DecideRule.
DecideRule - Class in org.archive.modules.deciderules
 
DecideRule() - Constructor for class org.archive.modules.deciderules.DecideRule
 
DecideRuledSheetAssociation - Class in org.archive.crawler.spring
SheetAssociation applied on the basis of DecideRules.
DecideRuledSheetAssociation() - Constructor for class org.archive.crawler.spring.DecideRuledSheetAssociation
 
DecideRuleSequence - Class in org.archive.modules.deciderules
 
DecideRuleSequence() - Constructor for class org.archive.modules.deciderules.DecideRuleSequence
 
decideToMapOutlink(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
decisionFor(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
decode(String) - Static method in class org.archive.util.Base32
Decodes the given Base32 String to a raw byte array.
decode(int) - Static method in class org.archive.util.ms.Cp1252
Returns the Unicode character for the given Cp1252 byte.
decrementQueuedCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that a number of queued URIs have been deleted.
deepestUri() - Method in interface org.archive.crawler.framework.Frontier
Ordinal position of the 'deepest' URI eligible for crawling.
deepestUri() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
deepestUri - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
DEFAULT_CAPACITY - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_CLASS_KEY - Static variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
DEFAULT_IP_WHOIS_SERVER - Static variable in class org.archive.modules.fetcher.FetchWhois
 
DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Default lower bound
DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Default lower bound
DEFAULT_MAX_PENDING - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
DEFAULT_PARAMETERS - Static variable in class org.archive.modules.extractor.Extractor
 
DEFAULT_REPLICAS - Static variable in class org.archive.util.LongToIntConsistentHash
 
DEFAULT_SMEAR - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
DEFAULT_TEST_TMP_DIR - Static variable in class org.archive.util.TmpDirTestCase
Default test tmp.
DEFAULT_TOE_PRIORITY - Static variable in class org.archive.crawler.framework.ToePool
run worker thread slightly lower than usual
DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Default upper bound
DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Default upper bound
DefaultBlockFileSystem - Class in org.archive.util.ms
Default implementation of the Block File System.
DefaultBlockFileSystem(SeekInputStream, int) - Constructor for class org.archive.util.ms.DefaultBlockFileSystem
Constructor.
DefaultServerCache - Class in org.archive.modules.fetcher
Server and Host cache.
DefaultServerCache() - Constructor for class org.archive.modules.fetcher.DefaultServerCache
Constructor.
DefaultServerCache(ObjectIdentityCache<CrawlServer>, ObjectIdentityCache<CrawlHost>) - Constructor for class org.archive.modules.fetcher.DefaultServerCache
 
DefaultTempDirProvider - Class in org.archive.modules.net
 
DefaultTempDirProvider() - Constructor for class org.archive.modules.net.DefaultTempDirProvider
 
defaultUpdateDescriptor(PropertyDescriptor) - Method in class org.archive.crawler.restlet.JobRelatedResource
 
defaultURI() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Returns a CrawlURI for testing purposes.
deferOrFinishGeneric(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
deferredWrite - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
degree - Variable in class st.ata.util.FPGenerator
The number of bits in fingerprints generated by this.
delaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
delay between scans of actionDirectory for new files
delete(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete the given CrawlURI from persistent store.
deleted(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that a CrawlURI has been deleted outside of the normal next()/finished() lifecycle.
deleted(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Force logging, etc.
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the given item from the queue.
deleteJob(CrawlJob) - Method in class org.archive.crawler.framework.Engine
 
deleteMatching(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteMatchingFromQueue(String, String, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Delete all CrawlURIs matching the given expression.
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
Delete URIs matching the given pattern from this queue.
deleteSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Delete a named sheet from all associations and the master named sheets map.
deleteURIs(String, String) - Method in interface org.archive.crawler.framework.Frontier
Delete any URI that matches the given regular expression from the list of discovered and pending URIs.
deleteURIs(String, String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
dequeue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Removes the peekItem from the queue and adjusts the count.
desc - Variable in enum org.archive.crawler.framework.CrawlStatus
 
description - Variable in class org.archive.modules.CrawlMetadata
 
DescriptorUpdater - Interface in org.archive.crawler.restlet
 
destroy() - Method in class org.archive.bdb.BdbModule
 
destroy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
destroy() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
destroy() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
detach(CrawlURI) - Method in class org.archive.modules.credential.Credential
Detach this credential from passed curi.
detachAll(CrawlURI) - Method in class org.archive.modules.credential.Credential
Detach all credentials of this type from passed curi.
determineRootRef(Request) - Method in class org.archive.crawler.restlet.EnhDirectory
 
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchDNS
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchFTP
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchHTTP
Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
dir - Variable in class org.archive.bdb.BdbModule
 
directory - Variable in class org.archive.modules.writer.WriterPoolProcessor
 
directoryFile - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Implicitly append this to a URI ending with '/'.
dirResource - Variable in class org.archive.crawler.restlet.EditRepresentation
 
dirResource - Variable in class org.archive.crawler.restlet.PagedRepresentation
wrapped EnhDirectoryResource; used to formulate self-links
dirtyItems - Variable in class org.archive.util.ObjectIdentityBdbManualCache
 
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
dirtyKey(String) - Method in interface org.archive.util.ObjectIdentityCache
force the persistent backend, if any, to eventually be updated with live object state for the given key
dirtyKey(String) - Method in class org.archive.util.ObjectIdentityMemCache
 
disallows - Variable in class org.archive.modules.net.RobotsDirectives
 
disconnect() - Method in class org.archive.net.ClientFTP
 
discoveredUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of discovered URIs.
discoveredUriCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
discoveredUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
DiskFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude FPMergeUriUniqFilter using a disk data file of raw longs as the overall FP record.
DiskFPMergeUriUniqFilter(File) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
DiskFPMergeUriUniqFilter.DataFileLongIterator - Class in org.archive.crawler.util
 
DiskFPMergeUriUniqFilter.DataFileLongIterator(DataInputStream) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Construct a long iterator reading from the given stream.
diskMap - Variable in class org.archive.util.ObjectIdentityBdbCache
The Collection view of the BDB JE database used for this instance.
diskMap - Variable in class org.archive.util.ObjectIdentityBdbManualCache
The Collection view of the BDB JE database used for this instance.
DiskSpaceMonitor - Class in org.archive.crawler.monitor
Monitors the available space on the paths configured.
DiskSpaceMonitor() - Constructor for class org.archive.crawler.monitor.DiskSpaceMonitor
 
DisposableStoredSortedMap<K,V> - Class in org.archive.bdb
DisposableStoredSortedMap remembers its backing Database, and offers a dispose() method for closing/discarding the underlying Database.
DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
 
dispose() - Method in class org.archive.bdb.DisposableStoredSortedMap
 
disposition - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
 
dispositionChain - Variable in class org.archive.crawler.framework.CrawlController
Disposition chain
DispositionChain - Class in org.archive.modules
 
DispositionChain() - Constructor for class org.archive.modules.DispositionChain
 
dispositionInProgressLock - Variable in class org.archive.crawler.frontier.AbstractFrontier
lock allowing steps of outside processing that need to complete all-or-nothing to signal their in-progress status
dispositionPending - Variable in class org.archive.crawler.frontier.AbstractFrontier
remembers a disposition-in-progress, so that extra endDisposition() calls are harmless
DispositionProcessor - Class in org.archive.crawler.postprocessor
A step, late in the processing of a CrawlURI, for marking-up the CrawlURI with values to affect frontier disposition, and updating information that may have been affected by the fetch.
DispositionProcessor() - Constructor for class org.archive.crawler.postprocessor.DispositionProcessor
 
disregardedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that were scheduled at one point but have been disregarded.
disregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
URIs that are disregarded (for example, because of robots.txt rules)
disregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
diversionDir - Variable in class org.archive.crawler.processor.CrawlMapper
Directory to write diversion logs.
diversionLogs - Variable in class org.archive.crawler.processor.CrawlMapper
Mapping of target crawlers to logs (PrintWriters)
divertLog(CrawlURI, String) - Method in class org.archive.crawler.processor.CrawlMapper
Note the given CrawlURI in the appropriate diversion log.
DNSJavaUtil - Class in org.archive.util
Utility methods based on DNSJava.
doAbort(CrawlURI, HttpMethod, String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
Doc - Class in org.archive.util.ms
Reads .doc files.
doCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
doCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Do the actual checkpoint.
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Run checkpointing.
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
doCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
doCheckpoint(Checkpoint) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
docsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
document - Variable in class org.archive.modules.extractor.PDFParser
 
DOCUMENT_BUILDER - Static variable in class org.archive.crawler.migrate.MigrateH1to3Tool
 
documentReader - Variable in class org.archive.modules.extractor.PDFParser
 
doJournalAdded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalEmitted(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalFinishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalReenqueued(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
doJournalRelocated(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
domain - Variable in class org.archive.modules.credential.Credential
The root domain this credential goes against: E.g.
DOMAIN_OVERBOUNDS - Static variable in class org.apache.commons.httpclient.Cookie
Character which, if appended to end of a domain, will give a boundary key that sorts past all Cookie sortKeys for the same domain.
domainMatch(String, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Performs domain-match as defined by the cookie specification.
domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Performs domain-match as implemented in common browsers.
domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
doneDir - Variable in class org.archive.crawler.framework.ActionDirectory
 
doRecover() - Method in class org.archive.bdb.BdbModule
 
doStripRegexMatch(String, String) - Method in class org.archive.modules.canonicalize.BaseRule
Run a regex that strips elements of a string.
dotBegin - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If a segment starts with '.', the '.' is replaced by this.
doTeardown() - Method in class org.archive.crawler.framework.CrawlJob
 
dotEnd - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If a directory name ends with '.' it is replaced by this.
doubleToString(double, int) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
downloadDisregards - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
downloadedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
downloadFailures - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
dropboxes - Static variable in class org.archive.crawler.restlet.Flash
 
dumpAllPendingToLog() - Method in class org.archive.crawler.frontier.BdbFrontier
Dump all still-enqueued URIs to the crawl.log -- without actually dequeuing.
dumpPendingAtClose - Variable in class org.archive.crawler.frontier.BdbFrontier
 
dumpReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
Run the reports.
dumpSurtPrefixSet() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Dump the current prefixes in use to configured dump file (if any)
dupByHashBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
dupByHashUrls - Variable in class org.archive.modules.fetcher.FetchStats
 
DUPLICATE - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
DUPLICATECOUNT - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
duplicateCount - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
duplicatesAtLastSample - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 

E

EDIT_FILTER - Static variable in class org.archive.crawler.restlet.JobResource
 
EDIT_FILTER - Static variable in class org.archive.crawler.restlet.models.CrawlJobModel
 
editFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
 
EditRepresentation - Class in org.archive.crawler.restlet
Representation wrapping a FileRepresentation, displaying its contents in a TextArea for editing.
EditRepresentation(FileRepresentation, EnhDirectoryResource) - Constructor for class org.archive.crawler.restlet.EditRepresentation
 
elapsedMilliseconds - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
elapsedReport() - Method in class org.archive.crawler.framework.CrawlJob
 
elapsedReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
elementContext(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorHTML
Create a suitable XPath-like context from an element name and optional attribute name.
EMBED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for embeds without other context.
emitBumper(PrintWriter, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
Emit a "start" or "EOF" bumper as appropriate to prominently indicate if the page borders the start or end of the file.
emitControls(PrintWriter) - Method in class org.archive.crawler.restlet.PagedRepresentation
Emit the navigational controls.
emitted(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
EMPTY - Static variable in class org.archive.util.AbstractLongFPSet
A constant used to indicate that a slot in the set storage is empty.
empty - Variable in class st.ata.util.FPGenerator
Fingerprint of the empty string of bytes.
encode(byte[]) - Static method in class org.archive.util.Base32
Encodes byte array to Base32 String.
EncodingUtil - Class in org.apache.commons.httpclient.util
The home for utility methods that handle various encoding tasks.
encounteredReferences - Variable in class org.archive.modules.extractor.PDFParser
 
endDisposition() - Method in interface org.archive.crawler.framework.Frontier
Inform frontier the processing signalled by an earlier pending beginDisposition() call has finished.
endDisposition() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
Engine - Class in org.archive.crawler.framework
Implementation for Engine.
Engine(File) - Constructor for class org.archive.crawler.framework.Engine
 
engine - Variable in class org.archive.crawler.Heritrix
 
engine - Variable in class org.archive.crawler.restlet.EngineApplication
 
EngineApplication - Class in org.archive.crawler.restlet
Restlet Application for a Heritrix crawl 'Engine', which is aware of local job configurations/directories and can assemble/launch/monitor/manage crawls.
EngineApplication(Engine) - Constructor for class org.archive.crawler.restlet.EngineApplication
 
EngineApplication.EngineStatusService - Class in org.archive.crawler.restlet
Customize Restlet error to include back button and full stack.
EngineApplication.EngineStatusService() - Constructor for class org.archive.crawler.restlet.EngineApplication.EngineStatusService
 
EngineModel - Class in org.archive.crawler.restlet.models
 
EngineModel(Engine, String) - Constructor for class org.archive.crawler.restlet.models.EngineModel
 
engineName - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
engine name; default "beanshell"
engineName - Variable in class org.archive.modules.ScriptedProcessor
engine name; default "beanshell"
EngineResource - Class in org.archive.crawler.restlet
Restlet Resource representing an Engine that may be used to assemble, launch, monitor, and manage crawls.
EngineResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.EngineResource
 
EnhancedEnvironment - Class in org.archive.util.bdbje
Version of BDB_JE Environment with additional convenience features, such as a shared, cached StoredClassCatalog.
EnhancedEnvironment(File, EnvironmentConfig) - Constructor for class org.archive.util.bdbje.EnhancedEnvironment
Constructor
EnhDirectory - Class in org.archive.crawler.restlet
Enhanced version of Restlet Directory, which allows the local filesystem directory to be determined dynamically based on the request details.
EnhDirectory(Context, Reference) - Constructor for class org.archive.crawler.restlet.EnhDirectory
 
EnhDirectory(Context, String) - Constructor for class org.archive.crawler.restlet.EnhDirectory
 
EnhDirectoryResource - Class in org.archive.crawler.restlet
Enhanced version of Restlet DirectoryResource, adding ability to edit some files.
EnhDirectoryResource(EnhDirectory, Request, Response) - Constructor for class org.archive.crawler.restlet.EnhDirectoryResource
 
enqueue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Add the given CrawlURI, noting its addition in running count.
enqueueCount - Variable in class org.archive.crawler.frontier.WorkQueue
Total number of items ever enqueued
enqueuedCounts - Variable in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
ensureStandardPoliciesAvailable() - Method in class org.archive.modules.CrawlMetadata
 
ensureStaticInitialization() - Static method in class org.archive.crawler.reporting.AlertHandler
Simply to ensure static initialization (installing catchall handler on topmost logger) is run.
Entry - Interface in org.archive.util.ms
 
Entry.EntryType - Enum in org.archive.util.ms
 
entryString(Object) - Static method in class org.archive.util.Histotable
Utility method to convert a key->Long into the string "count key".
entryToObject(DatabaseEntry) - Method in class org.archive.bdb.KryoBinding
 
equals(Object) - Method in class org.apache.commons.httpclient.Cookie
Two cookies are equal if the name, path and domain match.
equals(Object) - Method in class org.archive.modules.extractor.Link
 
equals(Object) - Method in class org.archive.modules.extractor.LinkContext
 
equals(Object) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory are the same.
equals(Object) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
equals(Object) - Method in class org.archive.modules.net.CrawlHost
 
equals(Object) - Method in class org.archive.modules.net.CrawlServer
 
errorCount - Variable in class org.archive.crawler.frontier.WorkQueue
count of errors encountered
errorMessage - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
escape(String) - Static method in class org.archive.util.JavaLiterals
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
Evaluate whether given object is equal to the configured status
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
Evaluate whether given object's FetchStatus does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HasViaDecideRule
Evaluate whether the given CrawlURI has a 'via' URI.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
Evaluate whether given object's string version matches configured regexes
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
Evaluate whether given object's string version matches configured regex
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns "true" if the provided CrawlURI has a fetch status that falls within this instance's specified range.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesFilePatternDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesListRegexDecideRule
Evaluate whether given object's string version does not match configured regexes (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesRegexDecideRule
Evaluate whether given object's string version does not match configured regex (by reversing the superclass's answer).
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns "true" if the provided CrawlURI has a fetch status that does not fall within this instance's specified range.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Evaluate whether given CrawlURI's content-digest exactly matches that of preceding fetch.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
Evaluate whether the given CrawlURI's scheme is not in the configured set of schemes.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnDomainsDecideRule
Evaluate whether given object's URI is NOT in the set of domains -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnHostsDecideRule
Evaluate whether given object's URI is NOT in the set of hosts -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotSurtPrefixedDecideRule
Evaluate whether given object's URI is NOT in the SURT prefix set -- simply reverse superclass's determination
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Evaluate whether given object's URI is covered by the SURT prefix set
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
Evaluate whether given object is over the threshold number of hops.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
Evaluate whether given object is over the threshold number of path-segments.
evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
Evaluate whether given object is within the acceptable thresholds of transitive hops.
exactKey(String) - Static method in class org.archive.surt.SURTTokenizer
 
execute(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Executes this method using the specified HttpConnection and HttpState.
execute(ScriptEngine, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
executor - Variable in class org.archive.crawler.framework.ActionDirectory
 
executor - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
expectedConcurrency - Variable in class org.archive.bdb.BdbModule
Expected number of concurrent threads; used to tune nLockTables according to JE FAQ http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
expectedInserts - Variable in class org.archive.util.BloomFilter64bit
The expected number of inserts; determines calculated size
expectedResult - Variable in class org.archive.modules.extractor.StringExtractorTestBase.TestData
 
expend(int) - Method in class org.archive.crawler.frontier.WorkQueue
Decrease the internal running budget by the given amount.
expenditureAtLastActivation - Variable in class org.archive.crawler.frontier.WorkQueue
Record of expenditures at last activation (session start)
expirationOperation - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
The action that the processor takes once the runtime has elapsed.
extend(long, byte) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the low eight bits of "b".
extend(long, char) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, long) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "v".
extend(long, byte[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding "n" bytes of "buf" starting from "buf[start]".
extend(long, char[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" characters of "buf" starting from "buf[i]".
extend(long, CharSequence) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) the characters of "s".
extend(long, int[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" ints of "buf" starting from "buf[i]".
extend(long, long[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding (all bits of) "n" longs of "buf" starting from "buf[i]".
extend8(long, String) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the lower eight bits of the characters of "s".
extend8(long, char[], int, int) - Method in class st.ata.util.FPGenerator
Extends fingerprint f by adding the lower eight bits of "n" characters of "buf" starting from "buf[i]".
extend_byte(long, int) - Method in class st.ata.util.FPGenerator
Extends f with lower eight bits of v without full reduction.
extend_char(long, int) - Method in class st.ata.util.FPGenerator
Extends f with lower sixteen bits of v.
extend_int(long, int) - Method in class st.ata.util.FPGenerator
Extends f with (all bits of) v.
extend_long(long, long) - Method in class st.ata.util.FPGenerator
Extends f with v.
extendHopsPath(String, char) - Static method in class org.archive.modules.CrawlURI
Extend a 'hopsPath' (pathFromSeed string of single-character hop-type symbols), keeping the number of displayed hop-types under MAX_HOPS_DISPLAYED.
ExternalGeoLocationDecideRule - Class in org.archive.modules.deciderules
A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface.
ExternalGeoLocationDecideRule() - Constructor for class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
ExternalGeoLookupInterface - Interface in org.archive.modules.deciderules
Interface used by ExternalGeoLocationDecideRule.
externalPaths - Variable in class org.archive.spring.KeyedProperties
the alternate global property-paths leading to this map. TODO: consider if deterministic ordered list is important
extract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Extracts links
extract(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
Extracts links from the given URI.
extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
Perform usual extraction on a CrawlURI
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorURI
Perform usual extraction on a CrawlURI
extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
Run extractor.
extract(CrawlURI) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
extractImplied(CharSequence, Pattern, String) - Static method in class org.archive.modules.extractor.ExtractorImpliedURI
Utility method for extracting 'implied' URI given a source uri, trigger pattern, and build pattern.
extractLink(CrawlURI, Link) - Method in class org.archive.modules.extractor.ExtractorURI
Consider a single Link for internal URIs
extractor - Variable in class org.archive.modules.extractor.ContentExtractorTestBase
An extractor created during the setUp.
Extractor - Class in org.archive.modules.extractor
Extracts links from fetched URIs.
Extractor() - Constructor for class org.archive.modules.extractor.Extractor
 
ExtractorCSS - Class in org.archive.modules.extractor
This extractor parses URIs from CSS files.
ExtractorCSS() - Constructor for class org.archive.modules.extractor.ExtractorCSS
 
ExtractorDOC - Class in org.archive.modules.extractor
This class allows the caller to extract href-style links from Word 97-format Word documents.
ExtractorDOC() - Constructor for class org.archive.modules.extractor.ExtractorDOC
 
ExtractorHTML - Class in org.archive.modules.extractor
Basic link-extraction, from an HTML content-body, using regular expressions.
ExtractorHTML() - Constructor for class org.archive.modules.extractor.ExtractorHTML
 
ExtractorHTMLForms - Class in org.archive.modules.forms
Extracts extra information about FORMs in HTML, loading this into the CrawlURI (for potential later use by FormLoginProcessor) and adding a small annotation to the crawl.log.
ExtractorHTMLForms() - Constructor for class org.archive.modules.forms.ExtractorHTMLForms
 
ExtractorHTTP - Class in org.archive.modules.extractor
Extracts URIs from HTTP response headers.
ExtractorHTTP() - Constructor for class org.archive.modules.extractor.ExtractorHTTP
 
ExtractorImpliedURI - Class in org.archive.modules.extractor
An extractor for finding 'implied' URIs inside other URIs.
ExtractorImpliedURI() - Constructor for class org.archive.modules.extractor.ExtractorImpliedURI
Constructor.
extractorJS - Variable in class org.archive.modules.extractor.ExtractorHTML
JavaScript extractor to use to process inline JavaScript.
ExtractorJS - Class in org.archive.modules.extractor
Processes JavaScript files for strings that are likely to be crawlable URIs.
ExtractorJS() - Constructor for class org.archive.modules.extractor.ExtractorJS
 
extractorJS - Variable in class org.archive.modules.extractor.ExtractorSWF
JavaScript extractor to use to process inline JavaScript.
ExtractorMultipleRegex - Class in org.archive.modules.extractor
An extractor that uses regular expressions to find strings in the fetched content of a URI, and constructs outlink URIs from those strings.
ExtractorMultipleRegex() - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex
 
ExtractorMultipleRegex.GroupList - Class in org.archive.modules.extractor
 
ExtractorMultipleRegex.GroupList(MatchResult) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.GroupList
 
ExtractorMultipleRegex.MatchList - Class in org.archive.modules.extractor
 
ExtractorMultipleRegex.MatchList(String, CharSequence) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
 
ExtractorMultipleRegex.MatchList(ExtractorMultipleRegex.GroupList...) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
 
extractorParameters - Variable in class org.archive.modules.extractor.Extractor
 
ExtractorParameters - Interface in org.archive.modules.extractor
Bean interface for parameters consulted by multiple Extractors, and thus provided by some shared object.
ExtractorPDF - Class in org.archive.modules.extractor
Allows the caller to process a CrawlURI representing a PDF for the purpose of extracting URIs
ExtractorPDF() - Constructor for class org.archive.modules.extractor.ExtractorPDF
 
ExtractorSWF - Class in org.archive.modules.extractor
Extracts URIs from SWF (flash/shockwave) files.
ExtractorSWF() - Constructor for class org.archive.modules.extractor.ExtractorSWF
 
ExtractorSWF.CrawlUriSWFAction - Class in org.archive.modules.extractor
SWF action that handles discovered URIs.
ExtractorSWF.CrawlUriSWFAction(CrawlURI, Extractor) - Constructor for class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
ExtractorSWF.ExtractorTagParser - Class in org.archive.modules.extractor
TagParser customized to ignore SWFTags that will never contain extractable URIs.
ExtractorSWF.ExtractorTagParser(SWFTagTypes) - Constructor for class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
ExtractorUniversal - Class in org.archive.modules.extractor
A last ditch extractor that will look at the raw byte code and try to extract anything that looks like a link.
ExtractorUniversal() - Constructor for class org.archive.modules.extractor.ExtractorUniversal
Constructor.
ExtractorURI - Class in org.archive.modules.extractor
An extractor for finding URIs inside other URIs.
ExtractorURI() - Constructor for class org.archive.modules.extractor.ExtractorURI
Constructor
ExtractorXML - Class in org.archive.modules.extractor
A simple extractor which finds HTTP URIs inside XML/RSS files, inside attribute values and simple elements (those with only whitespace + HTTP URI + whitespace as contents).
ExtractorXML() - Constructor for class org.archive.modules.extractor.ExtractorXML
 
extractQueryStringLinks(UURI) - Static method in class org.archive.modules.extractor.ExtractorURI
Look for URIs inside the supplied UURI.
extractURIs() - Method in class org.archive.modules.extractor.PDFParser
Extract URIs from all objects found in a PDF document's catalog.
extractURIs(PdfObject) - Method in class org.archive.modules.extractor.PDFParser
Parse a PdfDictionary, looking for URIs recursively and adding them to foundURIs
extraInfo - Variable in class org.archive.modules.CrawlURI
 

F

F_ADD - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_DISREGARD - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_EMIT - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_FAILURE - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_INCLUDE - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_REENQUEUED - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
F_SUCCESS - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
FACTORIES - Static variable in class org.archive.crawler.restlet.ScriptResource
 
failedFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that failed to process.
failedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
failedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
fakeResponse(StatusLine, HeaderGroup, InputStream) - Method in class org.apache.commons.httpclient.HttpMethodBase
This method is a dirty hack intended to work around the current (2.0) design flaw that prevents the user from obtaining the correct status code, headers and response body from the preceding HTTP CONNECT method.
fastOutputStreamHolder - Variable in class org.archive.crawler.frontier.RecyclingSerialBinding
Thread-local cache of reusable FastOutputStream
fetch(CrawlURI, String, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
fetchChain - Variable in class org.archive.crawler.framework.CrawlController
Fetch chain
FetchChain - Class in org.archive.modules
 
FetchChain() - Constructor for class org.archive.modules.FetchChain
 
fetchDisregards - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchDNS - Class in org.archive.modules.fetcher
Processor to resolve 'dns:' URIs.
FetchDNS() - Constructor for class org.archive.modules.fetcher.FetchDNS
 
FetchErrors - Class in org.archive.modules.fetcher
 
FetchErrors() - Constructor for class org.archive.modules.fetcher.FetchErrors
 
fetchFailures - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchFTP - Class in org.archive.modules.fetcher
Fetches documents and directory listings using FTP.
FetchFTP() - Constructor for class org.archive.modules.fetcher.FetchFTP
Constructs a new FetchFTP.
FetchFTP.SocketFactoryWithTimeout - Class in org.archive.modules.fetcher
A SocketFactory much like DefaultSocketFactory, except that the createSocket() methods that open connections support a connect timeout.
FetchFTP.SocketFactoryWithTimeout() - Constructor for class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
FetchHistoryProcessor - Class in org.archive.modules.recrawl
Maintain a history of fetch information inside the CrawlURI's attributes.
FetchHistoryProcessor() - Constructor for class org.archive.modules.recrawl.FetchHistoryProcessor
 
FetchHTTP - Class in org.archive.modules.fetcher
HTTP fetcher that uses the Apache Jakarta Commons HttpClient library.
FetchHTTP() - Constructor for class org.archive.modules.fetcher.FetchHTTP
Constructor.
fetchNonResponses - Variable in class org.archive.modules.fetcher.FetchStats
 
fetchResponses - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchStats - Class in org.archive.modules.fetcher
Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (eg queue).
FetchStats() - Constructor for class org.archive.modules.fetcher.FetchStats
 
FetchStats.CollectsFetchStats - Interface in org.archive.modules.fetcher
 
FetchStats.HasFetchStats - Interface in org.archive.modules.fetcher
 
FetchStats.Stage - Enum in org.archive.modules.fetcher
 
FetchStatusCodes - Interface in org.archive.modules.fetcher
Constant flag codes to be used, in lieu of per-protocol codes (like HTTP's 200, 404, etc.), when network/internal/out-of-band conditions occur.
fetchStatusCodesToString(int) - Static method in class org.archive.modules.CrawlURI
Takes a status code and converts it into a human readable string.
FetchStatusDecideRule - Class in org.archive.modules.deciderules
Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting.
FetchStatusDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusDecideRule
Usual constructor.
FetchStatusMatchesRegexDecideRule - Class in org.archive.modules.deciderules
 
FetchStatusMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
Usual constructor.
FetchStatusNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
 
FetchStatusNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
Usual constructor.
fetchSuccesses - Variable in class org.archive.modules.fetcher.FetchStats
 
FetchWhois - Class in org.archive.modules.fetcher
WHOIS Fetcher (RFC 3912).
FetchWhois() - Constructor for class org.archive.modules.fetcher.FetchWhois
 
FetchWhois.UrlStatus - Enum in org.archive.modules.fetcher
 
file - Variable in class org.archive.crawler.restlet.PagedRepresentation
File
fileLogger - Variable in class org.archive.crawler.framework.Scoper
 
fileLogger - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
filename - Variable in enum org.archive.crawler.util.Logs
 
fileRepresentation - Variable in class org.archive.crawler.restlet.EditRepresentation
 
fileRepresentation - Variable in class org.archive.crawler.restlet.PagedRepresentation
wrapped FileRepresentation
FilesystemLinkMaker - Class in org.archive.util
Wrapper for platform-dependent hard link creation.
FilesystemLinkMaker() - Constructor for class org.archive.util.FilesystemLinkMaker
 
FilesystemLinkMaker.Kernel32Library - Interface in org.archive.util
 
FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES - Class in org.archive.util
 
FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES() - Constructor for class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
fillWith(CrawlURI, String) - Method in class org.archive.crawler.reporting.SeedRecord
Fill instance with given values; skips makeDirty so may be used on initialization.
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache
 
finalize() - Method in class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
When collected/finalized -- as should be expected in low-memory conditions -- trigger an expunge and a new 'canary' insertion.
finalize() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
finalTasks() - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform any tasks necessary before entering FINISH frontier state/FINISHED crawl state
finalTasks() - Method in class org.archive.crawler.frontier.BdbFrontier
 
find(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
Extracts prefixes of a given string from a SortedSet.
findAttributeValueGroup(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
findAvailableCheckpointDirectories() - Method in class org.archive.crawler.framework.CheckpointService
Returns a list of available, valid (contains 'valid' file) checkpoint directories, as File instances, with the more recently-written appearing first.
findEligibleURI() - Method in class org.archive.crawler.frontier.AbstractFrontier
Find a CrawlURI eligible to be put on the outbound queue for processing.
findEligibleURI() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the next CrawlURI eligible to be processed (and presumably visited/fetched) by a worker thread.
findFirstLineBeginning(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineBeginningFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that begins with the given string.
findFirstLineContaining(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContaining(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findFirstLineContainingFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
Return the line number of the first line in the log/file that matches a given regular expression.
findGroups(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
findJobConfigs() - Method in class org.archive.crawler.framework.Engine
Find all job configurations in the usual place -- subdirectories of the jobs directory with files ending '.cxml', and from jobPathFiles (previously added by user) found in the jobs directory
findKeys(SortedMap<String, ?>, String) - Static method in class org.archive.util.PrefixFinder
 
findTarget(Request, Response) - Method in class org.archive.crawler.restlet.EnhDirectory
 
FINISH - Static variable in class org.archive.modules.ProcessResult
 
finishCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
finishCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Cleanup/unlock; need not complete for a checkpoint to be valid.
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
finishCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
finished(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Report a URI being processed as having finished processing.
finished(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Note that the previously emitted CrawlURI has completed its processing (for now).
finishedDisregard(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
finishedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs that have finished processing.
finishedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
finishedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
finishFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
finishFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Complete the merge of candidate and previously-known FPs (closing files/iterators as appropriate).
finishFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
FirstNamedRobotsPolicy - Class in org.archive.modules.net
Working from an ordered list of potential User-Agents, consisting of first the regularly-configured User-Agent and then those in the candidateUserAgents list, consider each potential agent in order.
FirstNamedRobotsPolicy() - Constructor for class org.archive.modules.net.FirstNamedRobotsPolicy
 
fixup(String) - Method in class org.archive.crawler.reporting.HostsReport
 
fixupConfigPath(ConfigPath, String) - Method in class org.archive.spring.ConfigPathConfigurer
 
fixupPaths(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
Find any ConfigPath properties in the passed bean; ensure that if they have a null 'base', that is replaced with the job home directory.
FixupQueryString - Class in org.archive.modules.canonicalize
Strip any trailing question mark.
FixupQueryString() - Constructor for class org.archive.modules.canonicalize.FixupQueryString
 
Flash - Class in org.archive.crawler.restlet
Utility for including a brief last-action or background-action message on web responses.
Flash(String) - Constructor for class org.archive.crawler.restlet.Flash
Create an ACK flash of default styling with the given message.
Flash(String, Flash.Kind) - Constructor for class org.archive.crawler.restlet.Flash
Create a Flash of the given kind and message, with default styling.
Flash.Kind - Enum in org.archive.crawler.restlet
usual types
flattenH1Order(Document) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
Given a Document, return a Map of all non-blank simple text nodes, keyed by the pseudo-XPath to their parent element.
flattenVia() - Method in class org.archive.modules.CrawlURI
Method returns string version of this URI's referral URI.
flattenVia(CrawlURI) - Static method in class org.archive.modules.Processor
 
flush() - Method in class org.archive.crawler.reporting.AlertHandler
 
flush() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
flush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Perform a merge of all 'pending' items to the overall fingerprint list.
FLUSH_DELAY_FACTOR - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
flushRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Flushes the output request stream.
forAllHostsDo(Closure) - Method in class org.archive.modules.fetcher.DefaultServerCache
NOTE: Should not mutate the CrawlHost instance so retrieved; depending on the hostscache implementation, the change may not be reliably persistent.
forAllHostsDo(Closure) - Method in class org.archive.modules.net.ServerCache
Utility for performing an action on every CrawlHost.
forAllPendingDo(Closure) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Utility method to perform action for all pending CrawlURI instances.
forceFetch() - Method in class org.archive.modules.CrawlURI
If this method returns true, this URI should be fetched even though it already has been crawled.
forceScarceMemory() - Static method in class org.archive.util.TestUtils
Temporarily exhaust memory, forcing weak/soft references to be broken.
forceWakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Utility method for advanced users/experimentation: force wake all snoozed queues -- for example to kick a crawl where connectivity problems have put all queues in slow-retry-snoozes back to busy-ness.
forget(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Forget that the item was seen.
forget(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Forget the given CrawlURI.
forget(String, CrawlURI) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
forget(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
forget(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
forgetAllButLatest - Variable in class org.archive.checkpointing.Checkpoint
 
forgetAllButLatest - Variable in class org.archive.crawler.framework.CheckpointService
 
forgetAllSchemeAuthorityMatching(String) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Forget all entries that match the scheme+host+port of the given url, so that they can be crawled again if discovered again.
format(LogRecord) - Method in class org.archive.crawler.framework.CrawlJob.JobLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.NonFatalErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.RuntimeErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.StatisticsLogFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriErrorFormatter
 
format(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
format(LogRecord) - Method in class org.archive.util.OneLineSimpleLogger
 
formatBytes(Long) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
formatCookie(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" header value for a single cookie.
formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Return a string suitable for sending in a "Cookie" header
formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookieHeader(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" Header for an array of Cookies.
formatCookieHeader(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" Header for single Cookie.
formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" Header containing all Cookies in cookies.
formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" Header containing the Cookie.
formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formatCookies(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Create a "Cookie" header value for an array of cookies.
formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Create a "Cookie" header value containing all Cookies in cookies suitable for sending in a "Cookie" header
formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
formItems - Variable in class org.archive.modules.credential.HtmlFormCredential
Form items.
FormLoginProcessor - Class in org.archive.modules.forms
A step, post-ExtractorHTMLForms, where a followup CrawlURI to attempt a form submission may be synthesized.
FormLoginProcessor() - Constructor for class org.archive.modules.forms.FormLoginProcessor
 
formUrlEncode(NameValuePair[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Form-urlencoding routine.
foundURIs - Variable in class org.archive.modules.extractor.PDFParser
 
fp(byte[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of "n" bytes of "buf" starting from "buf[start]".
fp(char[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp(CharSequence) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) the characters of "s".
fp(int[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp(long[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of (all bits of) "n" characters of "buf" starting from "buf[i]".
fp8(String) - Method in class st.ata.util.FPGenerator
Compute fingerprint of the lower eight bits of the characters of "s".
fp8(char[], int, int) - Method in class st.ata.util.FPGenerator
Compute fingerprint of the lower eight bits of "n" characters of "buf" starting from "buf[i]".
FPGenerator - Class in st.ata.util
This class provides methods that construct fingerprints of strings of bytes via operations in GF[2^d] for 0 < d <= 64.
FPMergeUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on merging FP arrays (in memory or from disk).
FPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter
 
FPMergeUriUniqFilter.PendingItem - Class in org.archive.crawler.util
Represents a long fingerprint and (possibly) its corresponding CrawlURI, awaiting the next merge in a 'pending' state.
FPMergeUriUniqFilter.PendingItem(long, CrawlURI) - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
 
fpset - Variable in class org.archive.crawler.util.FPUriUniqFilter
 
FPUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter storing 64-bit UURI fingerprints, using an internal LongFPSet instance.
FPUriUniqFilter(LongFPSet) - Constructor for class org.archive.crawler.util.FPUriUniqFilter
Create FPUriUniqFilter wrapping given long set
FPUriUniqFilter() - Constructor for class org.archive.crawler.util.FPUriUniqFilter
 
freeReserveMemory() - Method in class org.archive.crawler.framework.CrawlController
 
frequentFlushes - Variable in class org.archive.modules.writer.WriterPoolProcessor
Whether to flush to underlying file frequently (at least after each record), or not.
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.extractor.Extractor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.forms.FormLoginProcessor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.Processor
Restore internal state from JSONObject stored at earlier checkpoint-time.
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
fromHopsViaString(String) - Static method in class org.archive.modules.CrawlURI
 
frontier - Variable in class org.archive.crawler.framework.ActionDirectory
autowired frontier for actions
frontier - Variable in class org.archive.crawler.framework.CrawlController
The frontier to use for the crawl.
Frontier - Interface in org.archive.crawler.framework
An interface for URI Frontiers.
frontier - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
The frontier to use.
frontier - Variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
frontier - Variable in class org.archive.crawler.processor.HashCrawlMapper
 
frontier - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
 
Frontier.FrontierGroup - Interface in org.archive.crawler.framework
Generic interface representing the internal groupings of a Frontier's URIs -- usually queues.
Frontier.State - Enum in org.archive.crawler.framework
Enumeration of possible target states.
FrontierJournal - Class in org.archive.crawler.frontier
Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems.
FrontierJournal(String, String) - Constructor for class org.archive.crawler.frontier.FrontierJournal
Create a new recovery journal at the given location
FrontierNonemptyReport - Class in org.archive.crawler.reporting
Report of all nonempty Frontier queues (as usually dumped at end of crawl for reference).
FrontierNonemptyReport() - Constructor for class org.archive.crawler.reporting.FrontierNonemptyReport
 
FrontierPreparer - Class in org.archive.crawler.prefetch
Processor to preload URI with as much precalculated policy-based info as possible before it reaches frontier critical sections.
FrontierPreparer() - Constructor for class org.archive.crawler.prefetch.FrontierPreparer
 
frontierReport() - Method in class org.archive.crawler.framework.CrawlJob
 
frontierReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
FrontierSummaryReport - Class in org.archive.crawler.reporting
Frontier summary report showing a limited number of queues of each type -- as typically consulted during a crawl in progress.
FrontierSummaryReport() - Constructor for class org.archive.crawler.reporting.FrontierSummaryReport
 
fullVia - Variable in class org.archive.modules.CrawlURI
 
futureUriCount() - Method in interface org.archive.crawler.framework.Frontier
 
futureUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
futureUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
futureUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
futureUris - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
URIs scheduled to be re-enqueued at a future date

G

generateCrawlLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateFrom(ConfigPath, int) - Method in class org.archive.checkpointing.Checkpoint
Use immediately after instantiation to fill-in a Checkpoint created outside Spring configuration.
generateJobLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateReports() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
generateRequestLine(HttpConnection, String, String, String, String) - Static method in class org.apache.commons.httpclient.HttpMethodBase
Generates HTTP request line according to the specified attributes.
generator - Variable in class org.archive.io.Arc2Warc
 
generator - Variable in class org.archive.modules.writer.WARCWriterProcessor
Generator for record IDs
get(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
 
get(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Get the next nearest item after the given key.
get(String) - Static method in class org.archive.crawler.util.LogReader
Returns the entire file.
get(InputStreamReader) - Static method in class org.archive.crawler.util.LogReader
Reads entire contents of reader, returns as string.
get(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(InputStreamReader, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log file.
get(Object, String) - Method in class org.archive.modules.credential.CredentialStore
 
get(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.HTMLLinkContext
Return an instance of HTMLLinkContext for attribute attr in element el.
get(String) - Static method in class org.archive.modules.extractor.HTMLLinkContext
Return an instance of HTMLLinkContext for the given path.
get(String) - Method in class org.archive.spring.KeyedProperties
Get the given value, checking override maps if appropriate.
get(Object) - Method in class org.archive.util.Histotable
Return 0 instead of null for absent keys.
get() - Method in class org.archive.util.IdentityCacheableWrapper
 
get(String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
get(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
get(String) - Method in interface org.archive.util.ObjectIdentityCache
get the object under the given key/name -- but should not mutate object state
get(String) - Method in class org.archive.util.ObjectIdentityMemCache
 
get() - Method in class org.archive.util.Supplier
 
getAcceptCompression() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAcceptHeaders() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAcceptNonDnsResolves() - Method in class org.archive.modules.fetcher.FetchDNS
 
getAction() - Method in class org.archive.modules.forms.HTMLForm
 
getActionDir() - Method in class org.archive.crawler.framework.ActionDirectory
 
getActiveToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getActiveToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getAlertCount() - Method in class org.archive.crawler.framework.CrawlJob
 
getAlertCount() - Method in class org.archive.crawler.reporting.AlertThreadGroup
 
getAlertCount() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getAlertsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getAll() - Method in class org.archive.modules.credential.CredentialStore
 
getAllConfigPaths() - Method in class org.archive.spring.ConfigPathConfigurer
 
getAllErrors() - Method in class org.archive.spring.PathSharingContext
 
getAllowByRegex() - Method in class org.archive.crawler.prefetch.Preselector
 
getAlsoCheckVia() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getAnnotations() - Method in class org.archive.modules.CrawlURI
Get the annotations set for this uri.
getApplicableSurtPrefix() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getAsciiBytes(String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the specified string to byte array of ASCII characters.
getAsciiString(byte[], int, int) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of ASCII characters to a string.
getAsciiString(byte[]) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of ASCII characters to a string.
getAsText() - Method in class org.archive.io.ReadSourceEditor
 
getAsText() - Method in class org.archive.spring.ConfigPathEditor
 
getAt(long) - Method in class org.archive.util.AbstractLongFPSet
Get the stored value at the given slot.
getAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getAttributeEither(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchHTTP
Get a value either from inside the CrawlURI instance, or from settings (module attributes).
getAudience() - Method in class org.archive.modules.CrawlMetadata
 
getAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use getHostAuthState() instead.
getAuthScheme(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
 
getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getAvailableRobotsPolicies() - Method in class org.archive.modules.CrawlMetadata
 
getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.ScriptResource
 
getBalanceReplenishAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getBase() - Method in class org.archive.spring.ConfigPath
 
getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
getBaseURI() - Method in class org.archive.modules.CrawlURI
Get the (HTML) Base URI used for derelativizing internal URIs.
getBdbSubDirectory(File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getBeanName() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getBeanName() - Method in class org.archive.modules.Processor
 
getBeanName() - Method in class org.archive.spring.HeritrixLifecycleProcessor
 
getBeanpathTarget(String) - Method in class org.archive.crawler.framework.CrawlJob
Utility method for getting a bean or any other object addressable with a 'bean path' -- a property-path string (with dots and []indexes) starting with a bean name.
getBeansRefPath() - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
getBit(long) - Method in interface org.archive.util.BloomFilter
 
getBit(long) - Method in class org.archive.util.BloomFilter64bit
Returns from the local bitvector the value of the bit with the specified index.
getBlockAll() - Method in class org.archive.crawler.prefetch.Preselector
 
getBlockAwaitingSeedLines() - Method in class org.archive.modules.seeds.TextSeedModule
 
getBlockByRegex() - Method in class org.archive.crawler.prefetch.Preselector
 
getBloomFilter() - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
getBuiltJobs() - Method in class org.archive.crawler.restlet.EngineResource
 
getByRealm(Set<Credential>, String, CrawlURI) - Static method in class org.archive.modules.credential.HttpAuthenticationCredential
Convenience method that looks up a credential in the passed set, using the realm as the key.
getByRegex(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(InputStreamReader, String, int, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegex(InputStreamReader, String, String, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegexFromSeries(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getByRegexFromSeries(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
Returns all lines in a log/file matching a given regular expression.
getBytes(String, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the specified string to a byte array.
getBytesPerFileType(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the accumulated number of bytes from files of a given file type.
getBytesPerHost(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the accumulated number of bytes downloaded from a given host.
getCacheMisses() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getCachePercent() - Method in class org.archive.bdb.BdbModule
 
getCacheSize() - Method in class org.archive.bdb.BdbModule
 
getCalculateRobotsOnly() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getCandidateChain() - Method in class org.archive.crawler.framework.CrawlController
 
getCandidateChain() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getCandidateUserAgents() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
getCandidateUserAgents() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
getCanonicalizationPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getCanonicalString() - Method in class org.archive.modules.CrawlURI
 
getCaseSensitiveFilesystem() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCharacterMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCharPosLimit() - Method in class org.archive.util.ms.Piece
 
getCharPosStart() - Method in class org.archive.util.ms.Piece
 
getCheckOutlinks() - Method in class org.archive.crawler.processor.CrawlMapper
 
getCheckpoint() - Method in class org.archive.crawler.framework.CheckpointSuccessEvent
 
getCheckpointDir() - Method in class org.archive.checkpointing.Checkpoint
 
getCheckpointIntervalMinutes() - Method in class org.archive.crawler.framework.CheckpointService
 
getCheckpointsDir() - Method in class org.archive.crawler.framework.CheckpointService
 
getCheckpointService() - Method in class org.archive.crawler.framework.CrawlJob
Return the configured Checkpointer instance, if there is exactly one, otherwise null.
getCheckUri() - Method in class org.archive.crawler.processor.CrawlMapper
 
getChild() - Method in interface org.archive.util.ms.Entry
 
getChmod() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getChmodValue() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getClassCatalog() - Method in class org.archive.bdb.BdbModule
 
getClassCatalog() - Method in class org.archive.util.bdbje.EnhancedEnvironment
Return a StoredClassCatalog backed by a Database in this environment, either pre-existing or created (and cached) if necessary.
getClassCheckpointFile(File, String, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFile(File, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassCheckpointFilename(Class<?>, String) - Static method in class org.archive.crawler.util.CheckpointUtils
 
getClassKey(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Get the String key (name) of the queue to which the CrawlURI should be assigned.
getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getClassKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getClassKey(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getClassKey() - Method in class org.archive.modules.CrawlURI
Get the token (usually the hostname + port) which indicates what "class" this CrawlURI should be grouped with, for the purposes of ensuring only one item of the class is processed at once, all items of the class are held for a politeness period, etc.
getCollection() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getComment() - Method in class org.apache.commons.httpclient.Cookie
Returns the comment describing the purpose of this cookie, or null if no such comment has been defined.
getComment() - Method in class org.archive.modules.deciderules.DecideRule
 
getComponent() - Method in class org.archive.crawler.Heritrix
 
getCompoundName(String) - Static method in class org.archive.util.JndiUtils
 
getCompoundName(ObjectName) - Static method in class org.archive.util.JndiUtils
Return name to use as jndi name.
getCompress() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getConfigPathConfigurer() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getConfigPaths() - Method in class org.archive.crawler.framework.CrawlJob
Return all known ConfigPaths, as an aid to viewing or editing.
getConfigurationFile() - Method in class org.archive.spring.PathSharingContext
 
getConfigurationFilePath() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
getConnectTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
getContentCharSet(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character set from the Content-Type header.
getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorHTML
 
getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorXML
 
getContentDigest() - Method in class org.archive.modules.CrawlURI
Return the retained content-digest value, if any.
getContentDigestHistory() - Method in class org.archive.modules.CrawlURI
 
getContentDigestSchemeString() - Method in class org.archive.modules.CrawlURI
 
getContentDigestString() - Method in class org.archive.modules.CrawlURI
 
getContentLength() - Method in class org.archive.modules.CrawlURI
For completed HTTP transactions, the length of the content-body.
getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
 
getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
getContentRegexes() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getContentSize() - Method in class org.archive.modules.CrawlURI
Get the size in bytes of this URI's recorded content, inclusive of things like protocol headers.
getContentType() - Method in class org.archive.modules.CrawlURI
Get the content type of this URI.
getContentType() - Method in class org.archive.net.s3.S3URLConnection
XXX Not sure what this should be or if it even matters for our use.
getContentTypeMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getContext() - Method in class org.archive.modules.extractor.Link
 
getControlConversation() - Method in class org.archive.net.ClientFTP
 
getController() - Method in class org.archive.crawler.framework.ToePool
 
getController() - Method in class org.archive.crawler.framework.ToeThread
Get the CrawlController associated with this thread.
getControlUri(long, int, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
Construct navigational URI for given parameters.
getCookiePolicy() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpMethodParams.getCookiePolicy(), HttpMethod.getParams().
getCookies() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use getCookiesMap() // <- IA/HERITRIX CHANGE
getCookies(String, int, String, boolean) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use CookieSpec#match(String, int, String, boolean, Cookie)
getCookiesLoadFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookiesMap() - Method in class org.apache.commons.httpclient.HttpState
Returns a sorted map of cookies that this HTTP state currently contains.
getCookiesMap() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookiesMap() - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
getCookiesMap() - Method in interface org.archive.modules.fetcher.CookieStorage
 
getCookiesMap() - Method in class org.archive.modules.fetcher.SimpleCookieStorage
 
getCookiesSaveFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
getCookieStorage() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getCoreKey(UURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getCost(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Return the 'cost' of a CrawlURI (how much of its associated queue's budget it depletes upon attempted processing)
getCostAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getCount() - Method in class org.archive.crawler.frontier.WorkQueue
Count of URIs in this queue.
getCountryCode() - Method in class org.archive.modules.net.CrawlHost
Get country code of this host
getCountryCodes() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getCrawlController() - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
getCrawlController() - Method in class org.archive.crawler.framework.CheckpointService
 
getCrawlController() - Method in class org.archive.crawler.framework.CrawlJob
 
getCrawlController() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getCrawlController() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getCrawlController() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getCrawlController() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getCrawlController() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getCrawlController() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlDelay() - Method in class org.archive.modules.net.RobotsDirectives
 
getCrawlDuration() - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns how long the current crawl has been running *including* time paused (contrast with getCrawlElapsedTime()).
getCrawledBytes() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlElapsedTime() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getCrawlerCount() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getCrawlExitStatus() - Method in class org.archive.crawler.framework.CrawlController
 
getCrawlJob() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getCrawlJobShortName() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getCrawlJobUrl() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getCrawlLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getCrawlURI() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
 
getCreateHostDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCreatePortDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use #getCredentials(AuthScope)
getCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
Get the credentials for the given authentication scope.
getCredentials() - Method in class org.archive.modules.CrawlURI
 
getCredentials() - Method in class org.archive.modules.credential.CredentialStore
 
getCredentials() - Method in class org.archive.modules.net.CrawlServer
 
getCredentialStore() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getCredentialStore() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getCredentialTypes() - Static method in class org.archive.modules.credential.CredentialStore
 
getCurrentLaunchDir() - Method in class org.archive.spring.PathSharingContext
 
getCurrentLaunchId() - Method in class org.archive.spring.PathSharingContext
 
getCurrentProcessorName() - Method in class org.archive.crawler.framework.ToeThread
 
getCustomEditor() - Method in class org.archive.io.ReadSourceEditor
 
getCustomEditor() - Method in class org.archive.spring.ConfigPathEditor
 
getCustomRobots() - Method in class org.archive.modules.net.CustomRobotsPolicy
 
getData() - Method in class org.archive.modules.CrawlURI
 
getData() - Method in class org.archive.modules.extractor.Link
Attribute list
getData() - Method in class org.archive.spring.PathSharingContext
 
getDatabase(String) - Method in class org.archive.bdb.BdbModule
 
getDatabaseConfig() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbCache
 
getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
getDataList(String) - Method in class org.archive.modules.CrawlURI
Convenience method: return (creating if necessary) list at given data key
getDecision() - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
getDefaultCharset() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDefaultEncoding() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDefaultRules() - Static method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
A reasonable set of default rules to use, if no others are provided by operator configuration.
getDefaultStorePaths() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getDefaultStorePaths() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getDefaultStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDefaultUriPrecedencePolicy() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
getDeferrals() - Method in class org.archive.modules.CrawlURI
Get the deferral count.
getDeferToPrevious() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
Whether to always defer to a previously-assigned key inside the CrawlURI.
getDelay(TimeUnit) - Method in class org.archive.crawler.frontier.WorkQueue
 
getDelayFactor() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getDescription() - Method in enum org.archive.crawler.framework.CrawlStatus
 
getDescription() - Method in class org.archive.modules.CrawlMetadata
 
getDestination() - Method in class org.archive.modules.extractor.Link
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchDNS
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchFTP
 
getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchDNS
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchFTP
 
getDigestContent() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getDir() - Method in class org.archive.bdb.BdbModule
 
getDirectivesFor(String, boolean) - Method in class org.archive.modules.net.Robotstxt
Return the RobotsDirectives, if any, appropriate for the given User-Agent string.
getDirectivesFor(String) - Method in class org.archive.modules.net.Robotstxt
Return directives to use for the given User-Agent, resorting to wildcard rules or the default no-directives if necessary.
getDirectory() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getDirectoryFile() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDisposition() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
 
getDisposition() - Method in class org.archive.crawler.reporting.SeedRecord
 
getDispositionChain() - Method in class org.archive.crawler.framework.CrawlController
 
getDiversionDir() - Method in class org.archive.crawler.processor.CrawlMapper
 
getDiversionLog(String) - Method in class org.archive.crawler.processor.CrawlMapper
Get the diversion log for a given target crawler node.
getDNSRecord(long, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
getDNSServerIPLabel() - Method in class org.archive.modules.CrawlURI
 
getDoAuthentication() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method should automatically handle HTTP authentication challenges (status code 401, etc.), false otherwise
getDomain() - Method in class org.apache.commons.httpclient.Cookie
Returns domain attribute of the cookie.
getDomain() - Method in class org.archive.modules.credential.Credential
 
getDomDocument(File) - Method in class org.archive.crawler.framework.CrawlJob
Read a file to a DOM Document; return null if this isn't possible for any reason.
getDoneDir() - Method in class org.archive.crawler.framework.ActionDirectory
 
getDotBegin() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDotEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getDumpPendingAtClose() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getDupByHashBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getDupByHashUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getEarliestNextURIEmitTime() - Method in class org.archive.modules.net.CrawlHost
Get the earliest time a URI for this host could be emitted.
getEffectiveVersion() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the HTTP version used with this method (may be null if undefined, that is, the method has not been executed)
getEmbedHopCount() - Method in class org.archive.modules.CrawlURI
Get the embed hop count.
getEnabled() - Method in class org.archive.modules.canonicalize.BaseRule
 
getEnabled() - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
 
getEnabled() - Method in class org.archive.modules.deciderules.DecideRule
 
getEnabled() - Method in class org.archive.modules.Processor
 
getEngine() - Method in class org.archive.crawler.Heritrix
 
getEngine() - Method in class org.archive.crawler.restlet.EngineApplication
 
getEngine() - Method in class org.archive.crawler.restlet.EngineResource
 
getEngine() - Method in class org.archive.crawler.restlet.JobRelatedResource
 
getEngine() - Method in class org.archive.crawler.restlet.JobResource
 
getEngine() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
Get the proper ScriptEngine instance -- either shared or local to this thread.
getEngine() - Method in class org.archive.modules.ScriptedProcessor
Get the proper ScriptEngine instance -- either shared or local to this thread.
getEngineName() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getEngineName() - Method in class org.archive.modules.ScriptedProcessor
 
getEnhDirectory() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
 
getEntriesDescending() - Method in class org.archive.crawler.util.TopNSet
Get descending ordered list of key,count Entries.
getEntry(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
Returns the entry with the given number.
getEntryByFrequencySortedSet() - Static method in class org.archive.util.Histotable
Get a SortedSet that, when filled with (String key)->(long count) Entry instances, sorts them by (count, key) descending, as is useful for most-frequent displays.
getErrorPenaltyAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getException() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getException() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getExpectedConcurrency() - Method in class org.archive.bdb.BdbModule
 
getExpectedInserts() - Method in interface org.archive.util.BloomFilter
Report the number of expected inserts used at instantiation time to calculate the bitfield size.
getExpectedInserts() - Method in class org.archive.util.BloomFilter64bit
 
getExpirationOperation() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getExpiryDate() - Method in class org.apache.commons.httpclient.Cookie
Returns the expiration Date of the cookie, or null if none exists.
getExtract404s() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getExtract404s() - Method in interface org.archive.modules.extractor.ExtractorParameters
Whether to extract links from responses with a 404 'not found' response code.
getExtractAllForms() - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
getExtractFromDirs() - Method in class org.archive.modules.fetcher.FetchFTP
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
getExtractIndependently() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getExtractIndependently() - Method in interface org.archive.modules.extractor.ExtractorParameters
Whether each extractor should make an independent decision as to whether it can extract links from a URI's content (when value is true), or whether a previous extractor's success (marking the URI as hasBeenLinkExtracted) should cancel later extractors (when value is false).
getExtractJavascript() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractOnlyFormGets() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorSWF
 
getExtractorParameters() - Method in class org.archive.modules.extractor.Extractor
 
getExtractParent() - Method in class org.archive.modules.fetcher.FetchFTP
Returns the extract.parent attribute for this FetchFTP and the given curi.
getExtractValueAttributes() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getExtraInfo() - Method in class org.archive.modules.CrawlURI
 
getFetchAttempts() - Method in class org.archive.modules.CrawlURI
Get the count of attempts (trips through the processing loop) at getting the document referenced by this URI.
getFetchBeginTime() - Method in class org.archive.modules.CrawlURI
 
getFetchChain() - Method in class org.archive.crawler.framework.CrawlController
 
getFetchCompletedTime() - Method in class org.archive.modules.CrawlURI
 
getFetchDisregards() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchDuration() - Method in class org.archive.modules.CrawlURI
 
getFetchNonResponses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchResponses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchStatus() - Method in class org.archive.modules.CrawlURI
Return the overall/fetch status of this CrawlURI for its current trip through the processing loop.
getFetchSuccesses() - Method in class org.archive.modules.fetcher.FetchStats
 
getFetchType() - Method in class org.archive.modules.CrawlURI
 
getFile() - Method in class org.archive.spring.ConfigPath
 
getFileDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns a HashMap that contains information about distributions of encountered mime types.
getFilename() - Method in class org.archive.crawler.reporting.CrawlSummaryReport
 
getFilename() - Method in class org.archive.crawler.reporting.FrontierNonemptyReport
 
getFilename() - Method in class org.archive.crawler.reporting.FrontierSummaryReport
 
getFilename() - Method in class org.archive.crawler.reporting.HostsReport
 
getFilename() - Method in class org.archive.crawler.reporting.MimetypesReport
 
getFilename() - Method in class org.archive.crawler.reporting.ProcessorsReport
 
getFilename() - Method in class org.archive.crawler.reporting.Report
 
getFilename() - Method in class org.archive.crawler.reporting.ResponseCodeReport
 
getFilename() - Method in class org.archive.crawler.reporting.SeedsReport
 
getFilename() - Method in class org.archive.crawler.reporting.SourceTagsReport
 
getFilename() - Method in class org.archive.crawler.reporting.ToeThreadsReport
 
getFilename() - Method in enum org.archive.crawler.util.Logs
 
getFilePos() - Method in class org.archive.util.ms.Piece
 
getFileRepresentation() - Method in class org.archive.crawler.restlet.EditRepresentation
 
getFirstARecord(Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
getFirstKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFlashes(Request) - Static method in class org.archive.crawler.restlet.Flash
 
getFollowRedirects() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method should automatically follow HTTP redirects (status code 302, etc.), false otherwise.
getForceQueueAssignment() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
getForceRetire() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getForceRetire() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getForgetAllButLatest() - Method in class org.archive.checkpointing.Checkpoint
 
getForgetAllButLatest() - Method in class org.archive.crawler.framework.CheckpointService
 
getFormat() - Method in class org.archive.modules.canonicalize.RegexRule
 
getFormat() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getFormItems() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getFormProvince(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
Get the 'form province' - either the configured (applicableSurtPrefix) or inferred (full current server) range of URIs that is considered covered by one form login
getFpset() - Method in class org.archive.crawler.util.FPUriUniqFilter
 
getFrequentFlushes() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getFrom(String, int, Pattern, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getFrom() - Method in class org.archive.modules.CrawlMetadata
 
getFrom() - Method in interface org.archive.modules.fetcher.UserAgentProvider
 
getFromSeries(String, int, int) - Static method in class org.archive.crawler.util.LogReader
Gets a portion of a log spread across a numbered series of files.
getFrontier() - Method in class org.archive.crawler.framework.ActionDirectory
 
getFrontier() - Method in class org.archive.crawler.framework.CrawlController
 
getFrontier() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getFrontier() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getFrontier() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getFrontier() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getFrontierJournal() - Method in interface org.archive.crawler.framework.Frontier
 
getFrontierJournal() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierPreparer() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getFrontierReportShort() - Method in class org.archive.crawler.framework.CrawlController
 
getFullVia() - Method in class org.archive.modules.CrawlURI
 
getGroup(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Get the 'frontier group' (usually queue) for the given CrawlURI.
getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.BdbFrontier
 
getGroupMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getGroupMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHarvester() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getHashCount() - Method in interface org.archive.util.BloomFilter
Report the number of internal independent hash functions (and thus the number of bits set/checked for each item presented).
getHashCount() - Method in class org.archive.util.BloomFilter64bit
 
getHeritrixHome() - Static method in class org.archive.crawler.Heritrix
Exploit -Dheritrix.home if available to us.
getHeritrixVersion() - Method in class org.archive.crawler.framework.Engine
 
getHistoryDbName() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
getHistoryDbName() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
getHistoryLength() - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
getHolder() - Method in class org.archive.modules.CrawlURI
Return the 'holder' for the convenience of an external facility.
getHolderCost() - Method in class org.archive.modules.CrawlURI
Return the 'holderCost' for convenience of external facility (frontier)
getHolderKey() - Method in class org.archive.modules.CrawlURI
Return the 'holderKey' for convenience of an external facility (Frontier).
getHopChar() - Method in enum org.archive.modules.extractor.Hop
Returns a hop character suitable for display in logs.
getHopCount() - Method in class org.archive.modules.CrawlURI
Get total hops from seed.
getHopString() - Method in enum org.archive.modules.extractor.Hop
 
getHopType() - Method in class org.archive.modules.extractor.Link
 
getHost() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the host.
getHostAddress(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
from WriterPoolProcessor
getHostAddress(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line).
getHostAddress(String) - Static method in class org.archive.util.DNSJavaUtil
Return an InetAddress for passed host.
getHostAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the target host authentication state
getHostConfiguration() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
no longer applicable
getHostFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
Get the CrawlHost associated with name.
getHostFor(String) - Method in class org.archive.modules.net.ServerCache
 
getHostFor(UURI) - Method in class org.archive.modules.net.ServerCache
Get the CrawlHost associated with curi.
getHostLastFinished(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Returns the time (in milliseconds) when a URI belonging to a given host was last finished processing.
getHostMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getHostMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getHostName() - Method in class org.archive.modules.net.CrawlHost
Get the host name.
getHrefPath(File, CrawlJob) - Static method in class org.archive.crawler.restlet.JobResource
Get a usable HrefPath, relative to the JobResource, for the given file.
getHtmlOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getHtmlOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpAuthChallenges() - Method in class org.archive.modules.CrawlURI
 
getHttpAuthChallenges() - Method in class org.archive.modules.net.CrawlServer
 
getHttpBindAddress() - Method in class org.archive.modules.fetcher.FetchHTTP
Local IP address or hostname to use when making connections (binding sockets).
getHttpConnectionManager() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the httpConnectionManager.
getHttpMethod() - Method in class org.archive.modules.CrawlURI
 
getHttpMethod() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getHttpProxyHost() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyPassword() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyPort() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getHttpProxyUser() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getIgnoreCookies() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getIgnoreFormActionUrls() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getIgnoreUnexpectedHtml() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getImportedConfigs(File) - Method in class org.archive.crawler.framework.CrawlJob
Return all config files included via 'import' statements in the primary config (or other included configs).
getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return a sorted map of all queues of WorkQueue keys, keyed by precedence
getInactiveQueuesForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Get the queue of inactive uri-queue names at the given precedence.
getIncrementCounts() - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
getIndex() - Method in interface org.archive.util.ms.Entry
 
getInferRootPage() - Method in class org.archive.modules.extractor.ExtractorHTTP
 
getInFromFile(String) - Method in class org.archive.modules.extractor.PDFParser
Read a file named 'doc' and store its bytes for later processing.
getInitialDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getInProcessCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
The number of CrawlURIs 'in process' (passed to the outbound queue and not yet finished by returning through the inbound queue).
getInProcessCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getInputStream() - Method in class org.archive.net.s3.S3URLConnection
Get an InputStream for the object, connecting to S3 if connect() hasn't been called yet.
getInstance(String) - Static method in class org.archive.net.UURIFactory
 
getInstance(UURI, String) - Static method in class org.archive.net.UURIFactory
 
getIntervalSeconds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getIP() - Method in class org.archive.modules.net.CrawlHost
Get the IP address for this host.
getIpAddresses() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
getIpFetched() - Method in class org.archive.modules.net.CrawlHost
Get the time when the IP address for this host was last looked up.
getIpTTL() - Method in class org.archive.modules.net.CrawlHost
Get the TTL value from the DNS record for this host.
getIpValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getIsolateThreads() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getIsolateThreads() - Method in class org.archive.modules.ScriptedProcessor
 
getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getJavaInitializationString() - Method in class org.archive.io.ReadSourceEditor
 
getJavaInitializationString() - Method in class org.archive.spring.ConfigPathEditor
 
getJeLogsFilter() - Static method in class org.archive.crawler.util.CheckpointUtils
 
getJob(String) - Method in class org.archive.crawler.framework.Engine
 
getJobConfigs() - Method in class org.archive.crawler.framework.Engine
 
getJobContext() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobDir() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobDirectoryFrom(File) - Method in class org.archive.crawler.framework.Engine
Return the job directory File read from the supplied ".jobpath" file, or null on any error.
getJobLog() - Method in class org.archive.crawler.framework.CrawlJob
 
getJobLogger() - Method in class org.archive.crawler.framework.CrawlJob
Get a logger to a distinguished file, job.log in the job's directory, into which job-specific events may be reported.
getJobName() - Method in class org.archive.modules.CrawlMetadata
 
getJobsDir() - Method in class org.archive.crawler.framework.Engine
 
getJobStatusDescription() - Method in class org.archive.crawler.framework.CrawlJob
 
getJumpTarget() - Method in class org.archive.modules.ProcessResult
 
getKeepSnapshotsCount() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getKey() - Method in class org.archive.crawler.frontier.WorkQueue
 
getKey() - Method in class org.archive.crawler.reporting.SeedRecord
 
getKey() - Method in class org.archive.modules.credential.Credential
 
getKey() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getKey() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getKey() - Method in class org.archive.modules.net.CrawlHost
 
getKey() - Method in class org.archive.modules.net.CrawlServer
 
getKey() - Method in interface org.archive.util.IdentityCacheable
 
getKey() - Method in class org.archive.util.IdentityCacheableWrapper
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
getKeyedProperties() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
getKeyedProperties() - Method in class org.archive.modules.canonicalize.BaseRule
 
getKeyedProperties() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
getKeyedProperties() - Method in class org.archive.modules.CrawlMetadata
 
getKeyedProperties() - Method in class org.archive.modules.credential.CredentialStore
 
getKeyedProperties() - Method in class org.archive.modules.deciderules.DecideRule
 
getKeyedProperties() - Method in class org.archive.modules.Processor
 
getKeyedProperties() - Method in class org.archive.modules.ProcessorChain
 
getKeyedProperties() - Method in interface org.archive.spring.HasKeyedProperties
 
getKind() - Method in class org.archive.crawler.restlet.Flash
 
getKryo() - Method in class org.archive.bdb.KryoBinding
 
getLargest() - Method in class org.archive.crawler.util.TopNSet
 
getLargestQueuesCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Remember this many largest queues for reporting's sake; actual tracking can be somewhat approximate when some queues shrink before others' sizes are again noted, or if the size is adjusted mid-crawl.
getLargestValue() - Method in class org.archive.util.Histotable
Return the largest value of any key that is larger than 0.
getLastActivityTime() - Method in class org.archive.crawler.framework.CrawlJob
 
getLastCacheMissDiff() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
getLastHop() - Method in class org.archive.modules.CrawlURI
Convenience access to the last hop character, as a string.
getLastLaunch() - Method in class org.archive.crawler.framework.CrawlJob
 
getLastLaunchTime() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
 
getLastResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the stream used to read the last response's body.
getLastSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getLastSuccessTime() - Method in class org.archive.modules.fetcher.FetchStats
 
getLaunchCount() - Method in class org.archive.crawler.framework.CrawlJob
 
getLinesExecuted() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getLinesExecuted() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getLinkCount() - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
getLinkHopCount() - Method in class org.archive.modules.CrawlURI
Get the link hop count.
getListLogicalOr() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
getLiveHostReportSize() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getLocalAddress() - Method in class org.apache.commons.httpclient.HttpConnection
Return the local address used when creating the connection.
getLocalName() - Method in class org.archive.crawler.processor.CrawlMapper
 
getLogExtraInfo() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getLogFile() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
getLogger() - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
getLoggerModule() - Method in class org.archive.crawler.framework.CrawlController
 
getLoggerModule() - Method in class org.archive.crawler.framework.Scoper
 
getLoggerModule() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getLoggerModule() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getLoggerModule() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getLoggerModule() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getLoggerModule() - Method in class org.archive.modules.extractor.Extractor
 
getLoggerModule() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLogin() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getLoginPassword() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLoginUri() - Method in class org.archive.modules.credential.HtmlFormCredential
 
getLoginUsername() - Method in class org.archive.modules.forms.FormLoginProcessor
 
getLogRejectsRule() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getLogToFile() - Method in class org.archive.crawler.framework.Scoper
 
getLogToFile() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getLookup() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getLowerBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns the lower bound on the range of acceptable status codes.
getLowerBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns the lower bound on the range of acceptable status codes.
getLowerBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
getMap() - Method in class org.archive.spring.Sheet
Return map of full bean-path (starting with a target bean-name) to the alternate value for that targeted property
getMap() - Method in class org.archive.util.ObjectIdentityMemCache
Offer raw map access for convenience of checkpoint/recovery.
getMapPath() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getMapUri() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
getMaxAttributeNameLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxAttributeValLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxBytesDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMaxDocumentsDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxElementLength() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchFTP
 
getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getMaxFileSizeBytes() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getMaxFileSizeBytes() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMaxHops() - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
 
getMaxInWait() - Method in class org.archive.crawler.frontier.AbstractFrontier
Maximum amount of time to wait for an inbound update event before giving up and rechecking on the ability to further fill the outbound queue.
getMaxInWait() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchFTP
 
getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getMaxOutlinks() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getMaxOutlinks() - Method in interface org.archive.modules.extractor.ExtractorParameters
The maximum number of outlinks to discover from any URI's content.
getMaxPathDepth() - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
 
getMaxPathLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getMaxPerHostBandwidthUsageKbSec() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMaxQueuesPerReportCategory() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getMaxRepetitions() - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
getMaxRetries() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getMaxSegLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getMaxSize() - Method in class org.archive.crawler.util.TopNSet
 
getMaxSizeToDigest() - Method in class org.archive.modules.extractor.HTTPContentDigest
 
getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorPDF
 
getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorUniversal
 
getMaxSpeculativeHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
getMaxTimeSeconds() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
getMaxToeThreads() - Method in class org.archive.crawler.framework.CrawlController
 
getMaxTotalBytesToWrite() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMaxTransHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
getMaxWaitForIdleMs() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMessage() - Method in class org.archive.crawler.event.CrawlStateEvent
 
getMessage() - Method in class org.archive.crawler.restlet.Flash
 
getMetadata() - Method in class org.archive.crawler.framework.CrawlController
 
getMetadata() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getMetadata() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getMetadata() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getMetadata() - Method in class org.archive.modules.writer.ARCWriterProcessor
 
getMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getMetadata() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMetadataProvider() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getMethodRetryHandler() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
use HttpMethodParams
getMigrateMap() - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
getMinDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getModuleClass() - Method in class org.archive.state.ModuleTestBase
Returns the class of the module to test.
getMonitorConfigPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getMonitorMounts() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getMonitorPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getName() - Method in class org.apache.commons.httpclient.HttpMethodBase
Obtains the name of the HTTP method as used in the HTTP request line, for example "GET" or "POST".
getName() - Method in class org.archive.checkpointing.Checkpoint
 
getName() - Method in class org.archive.modules.net.CrawlServer
 
getName() - Method in class org.archive.spring.ConfigPath
 
getName() - Method in class org.archive.spring.Sheet
 
getName() - Method in interface org.archive.util.ms.Entry
 
getNamedUserAgents() - Method in class org.archive.modules.net.Robotstxt
 
getNavlinksOnly() - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
getNext() - Method in interface org.archive.util.ms.Entry
 
getNextBlock(int) - Method in interface org.archive.util.ms.BlockFileSystem
Returns the number of the block that follows the given block.
getNextBlock(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getNextCheckpointNumber() - Method in class org.archive.crawler.framework.CheckpointService
 
getNextNearestItem(DatabaseEntry, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
getNonfatalErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getNonfatalErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getNonFatalFailures() - Method in class org.archive.modules.CrawlURI
 
getNotModifiedBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getNotModifiedUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getNovelBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getNovelUrls() - Method in class org.archive.modules.fetcher.FetchStats
 
getObjectCache(String, boolean, Class<V>) - Method in class org.archive.bdb.BdbModule
 
getObjectCache(String, boolean, Class<V>, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
Get an ObjectIdentityCache, backed by a BDB Database of the given name, with objects of the given valueClass type.
getOIBCCache(String, boolean, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
Get an ObjectIdentityBdbCache, backed by a BDB Database of the given name, with the given value class type.
getOnlyStoreIfWriteTagPresent() - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
 
getOperator() - Method in class org.archive.modules.CrawlMetadata
 
getOperatorContactUrl() - Method in class org.archive.modules.CrawlMetadata
 
getOperatorFrom() - Method in class org.archive.modules.CrawlMetadata
 
getOrCreateSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Get a Sheet of the given name, or create if it does not already exist.
getOrder() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
getOrder() - Method in class org.archive.spring.ConfigPathConfigurer
Act as late as possible.
getOrdinal() - Method in class org.archive.modules.CrawlURI
Get the ordinal (serial number) assigned at creation.
getOrganization() - Method in class org.archive.modules.CrawlMetadata
 
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbCache
 
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
getOrUse(String, Supplier<V>) - Method in interface org.archive.util.ObjectIdentityCache
Get the object under the given key/name, using (and remembering) the object supplied by the supplier if no prior mapping exists -- but should not mutate object state.
getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityMemCache
 
getOutCandidates() - Method in class org.archive.modules.CrawlURI
Returns discovered candidate URIs.
getOutlinkRule() - Method in class org.archive.crawler.processor.CrawlMapper
 
getOutLinks() - Method in class org.archive.modules.CrawlURI
Returns discovered links.
getOverlayMap(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Retrieve the named overlay Map.
getOverlayMap(String) - Method in class org.archive.modules.CrawlURI
 
getOverlayMap(String) - Method in interface org.archive.spring.OverlayContext
Get the map corresponding to the overlay name.
getOverlayMap(String) - Method in interface org.archive.spring.OverlayMapsSource
 
getOverlayNames() - Method in class org.archive.modules.CrawlURI
 
getOverlayNames() - Method in interface org.archive.spring.OverlayContext
Return a list of the names of overlay maps to consider.
getOverrideKeys(String) - Method in class org.archive.spring.KeyedProperties
Compose the complete keys (externalPath + local key name) to use for checking for contextual overrides.
getParallelQueues() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
The number of parallel queues to split a core key into.
getParams() - Method in class org.apache.commons.httpclient.HttpConnection
Returns HTTP protocol parameters associated with this connection.
getParams() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns HTTP protocol parameters associated with this method.
getPassword() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getPassword() - Method in class org.archive.modules.fetcher.FetchFTP
 
getPath() - Method in class org.apache.commons.httpclient.Cookie
Returns the path attribute of the cookie
getPath() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the path of this HTTP method.
getPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getPath() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getPath() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getPath() - Method in class org.archive.spring.ConfigPath
 
getPath() - Method in class org.archive.spring.ConfigPathConfigurer
 
getPathFromSeed() - Method in class org.archive.modules.CrawlURI
 
getPathQuery(CrawlURI) - Method in class org.archive.modules.net.RobotsPolicy
 
getPattern() - Method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
 
getPauseAtStart() - Method in class org.archive.crawler.framework.CrawlController
 
getPauseThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getPauseThresholdMiB() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
getPersistentDataKeys() - Static method in class org.archive.modules.CrawlURI
Add the key of items you want to persist across processings.
getPersistentDataMap() - Method in class org.archive.modules.CrawlURI
 
getPolicyBasisUURI() - Method in class org.archive.modules.CrawlURI
Get the UURI that should be used as the basis of policy/overlay decisions.
getPolitenessDelay() - Method in class org.archive.modules.CrawlURI
 
getPool() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPoolMaxActive() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPort() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the port of the host.
getPort() - Method in class org.archive.modules.net.CrawlServer
Get the port number for this server.
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
 
getPrecedence() - Method in class org.archive.crawler.frontier.WorkQueue
 
getPrecedence() - Method in class org.archive.modules.CrawlURI
 
getPrecedenceFloor() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getPrecedenceProvider() - Method in class org.archive.crawler.frontier.WorkQueue
 
getPreferenceDepthHops() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getPreferenceDepthHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getPreferenceEmbedHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getPreferredVariant() - Method in class org.archive.crawler.restlet.BaseResource
If client can accept text/html, always prefer it.
getPrefix() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getPrefixClassKey(byte[]) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
 
getPreloadSource() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
getPreloadSourceUrl() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
Return the authentication URI, either absolute or relative, that serves as prerequisite for the passed curi.
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
Get the prerequisite for this URI.
getPrevious() - Method in interface org.archive.util.ms.Entry
 
getPrimaryConfig() - Method in class org.archive.crawler.framework.CrawlJob
 
getPrimaryConfigurationPath() - Method in class org.archive.spring.PathSharingContext
 
getProcessErrorOutlinks() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getProcessors() - Method in class org.archive.modules.ProcessorChain
 
getProcessStatus() - Method in class org.archive.modules.ProcessResult
 
getProfileCxmlResource() - Method in class org.archive.crawler.framework.Engine
 
getProgressLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getProgressStamp() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getProgressStatisticsLine() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
Return one line of current progress-statistics
getProgressStats() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getPropertyDescriptors(BeanWrapperImpl) - Method in class org.archive.crawler.restlet.JobRelatedResource
Get and modify the PropertyDescriptors associated with the BeanWrapper.
getProtocol() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the protocol used to establish the connection.
getProxyAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
use #getProxyAuthState()
getProxyAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the proxy authentication state
getProxyCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use #getProxyCredentials(AuthScope)
getProxyCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
Get the proxy credentials for the given authentication scope.
getProxyHost() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the proxy host.
getProxyPort() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the port of the proxy host.
getPseudoXpath(Node) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
Given a node, give back an XPath-like string that addresses it.
getQueryString() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the query string of this HTTP method.
getQueueAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getQueueFor(String) - Method in class org.archive.crawler.frontier.BdbFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueueFor(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return the work queue for the given classKey, or null if no such queue exists.
getQueuePrecedencePolicy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getQueueTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getRawInput() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the raw input stream for this file system.
getRawInput() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getRawOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getRawOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getReader() - Method in class org.archive.crawler.restlet.EditRepresentation
 
getReader() - Method in class org.archive.crawler.restlet.PagedRepresentation
 
getRealm() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
getRecheckScope() - Method in class org.archive.crawler.prefetch.Preselector
 
getRecheckThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
getRecordedFinishes() - Method in class org.archive.modules.fetcher.FetchStats
 
getRecordedSize() - Method in class org.archive.modules.CrawlURI
Get size of data recorded (transferred)
getRecordedSize(CrawlURI) - Static method in class org.archive.modules.Processor
 
getRecorder() - Method in class org.archive.modules.CrawlURI
Get the HTTP recorder associated with this URI.
getRecorderInBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
 
getRecorderOutBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
 
getRecordID() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getRecordIDGenerator() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getRecoverableExceptionCount() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
No longer used. Returns the number of "recoverable" exceptions thrown and handled, to allow for monitoring the quality of the connection.
getRecoveryCheckpoint() - Method in class org.archive.crawler.framework.CheckpointService
 
getRecoveryLogEnabled() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getRedirectUri() - Method in class org.archive.crawler.reporting.SeedRecord
 
getReducePrefixRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getReduceRegex(CrawlURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getReference(ObjectName) - Static method in class org.archive.util.JndiUtils
 
getRegex() - Method in class org.archive.modules.canonicalize.RegexRule
 
getRegex() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
Use a preset if configured to do so.
getRegex() - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
 
getRegex() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getRegexList() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
getRemaining() - Method in class org.archive.modules.fetcher.FetchStats
 
getRemoveTriggerUris() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
getReplyStrings() - Method in class org.archive.net.ClientFTP
 
getReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getReportsDir() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getRepresentation(Status, Request, Response) - Method in class org.archive.crawler.restlet.EngineApplication.EngineStatusService
 
getRequestCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character encoding of the request from the Content-Type header.
getRequestHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the specified request header.
getRequestHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the request headers.
getRequestHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the request headers that the HTTP method currently has.
getRequestHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
 
getRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns an OutputStream suitable for writing the request.
getRescheduleDelaySeconds() - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
getRescheduleTime() - Method in class org.archive.modules.CrawlURI
 
getResourceDir() - Method in class org.archive.state.ModuleTestBase
Returns the location of the Java resources directory for your project.
getRespectCrawlDelayUpToSeconds() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getResponseBody() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as an array of bytes.
getResponseBodyAsStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as an InputStream.
getResponseBodyAsString() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response body of the HTTP method, if any, as a String.
getResponseCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the character encoding of the response from the Content-Type header.
getResponseContentLength() - Method in class org.apache.commons.httpclient.HttpMethodBase
Return the length (in bytes) of the response body, as specified in a Content-Length header.
getResponseFooter(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the response footer associated with the given name.
getResponseFooters() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the response footers that the HTTP method currently has in the order in which they were read.
getResponseHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the response header associated with the given name.
getResponseHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the response headers.
getResponseHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
 
getResponseHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns an array of the response headers that the HTTP method currently has in the order in which they were read.
getResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
Returns an InputStream suitable for reading the response.
getResponseStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns a stream from which the body of the current response may be read.
getResponseTrailerHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
Gets the header group storing the response trailer headers as per RFC 2616 section 3.6.1.
getRetiredQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getRetiredQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return queue of all retired queue names.
getRetryDelaySeconds() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getReverseSortedCopy(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
Sort the entries of the given Map in descending order by their values, which must be longs wrapped with AtomicLong.
getReverseSortedHostCounts(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
Return a copy of the hosts distribution in reverse-sorted (largest first) order.
getRobotsDenials() - Method in class org.archive.modules.fetcher.FetchStats
 
getRobotsPolicy() - Method in class org.archive.modules.CrawlMetadata
Get the currently-effective RobotsPolicy, as specified by the string name and chosen from the full available map.
getRobotsPolicyName() - Method in class org.archive.modules.CrawlMetadata
 
getRobotstxt() - Method in class org.archive.modules.net.CrawlServer
 
getRobotsValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getRoot() - Method in interface org.archive.util.ms.BlockFileSystem
Returns the root entry of the file system.
getRoot() - Method in class org.archive.util.ms.DefaultBlockFileSystem
 
getRotationDigits() - Method in class org.archive.crawler.processor.CrawlMapper
 
getRuleAssociations() - Method in class org.archive.crawler.spring.SheetOverlaysManager
All DecideRuledSheetAssociations, in Ordered order
getRules() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
getRules() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
getRules() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
getRuntimeErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getRuntimeErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getRuntimeSeconds() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getRunWhileEmpty() - Method in class org.archive.crawler.framework.CrawlController
 
getSchedulingDirective(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Calculate the coarse, original 'schedulingDirective' prioritization for the given CrawlURI
getSchedulingDirective() - Method in class org.archive.modules.CrawlURI
 
getSchedulingFor(CrawlURI, Link, int) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
Determine scheduling for the curi.
getSchemes() - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
 
getScope() - Method in interface org.archive.crawler.framework.Frontier
 
getScope() - Method in class org.archive.crawler.framework.Scoper
 
getScope() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getScratchDir() - Method in class org.archive.crawler.framework.CrawlController
 
getScratchDisk() - Method in interface org.archive.modules.extractor.TempDirProvider
 
getScratchDisk() - Method in class org.archive.modules.net.DefaultTempDirProvider
 
getScript() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getScript() - Method in class org.archive.crawler.restlet.ScriptingConsole
 
getScriptSource() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
getScriptSource() - Method in class org.archive.modules.ScriptedProcessor
 
getSecure() - Method in class org.apache.commons.httpclient.Cookie
 
getSeedCollection() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSeedForUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
Returns seed for urlString (null if seed not found).
getSeedListeners() - Method in class org.archive.modules.seeds.SeedModule
 
getSeeds() - Method in class org.archive.crawler.framework.ActionDirectory
 
getSeeds() - Method in class org.archive.crawler.framework.CrawlController
 
getSeeds() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getSeeds() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSeedsAsSurtPrefixes() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSeedsIterator() - Method in class org.archive.crawler.reporting.StatisticsTracker
Get a seed iterator for the job being monitored.
getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
getSeedUrlToDiscoveredUrlsMap() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSendBufferSize() - Method in class org.apache.commons.httpclient.HttpConnection
Gets the socket's sendBufferSize.
getSendConnectionClose() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendIfModifiedSince() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendIfNoneMatch() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendRange() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSendReferer() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSerialNo() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getSerialNumber() - Method in class org.archive.crawler.framework.ToeThread
 
getServerCache() - Method in class org.archive.crawler.framework.CrawlController
 
getServerCache() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getServerCache() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
getServerCache() - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
getServerCache() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
getServerCache() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
getServerCache() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerCache() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getServerCache() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
getServerCache() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchDNS
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getServerCache() - Method in class org.archive.modules.fetcher.FetchWhois
 
getServerCache() - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
getServerCache() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getServerFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
Get the CrawlServer associated with name.
getServerFor(String) - Method in class org.archive.modules.net.ServerCache
 
getServerFor(UURI) - Method in class org.archive.modules.net.ServerCache
Get the CrawlServer associated with curi.
getServerKey(UURI) - Static method in class org.archive.modules.net.CrawlServer
Get the key to use when looking up server instances.
getServerMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getServerMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
getSessionBalance() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Return current session 'activity budget balance'
getSheetOverlaysManager() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
getSheetOverlaysManager() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
getSheetsByName() - Method in class org.archive.crawler.spring.SheetOverlaysManager
Sheets, by name; starts with all autowired Sheets but others may be added by other means (mid-crawl reconfiguration).
getSheetsNamesBySurt() - Method in class org.archive.crawler.spring.SheetOverlaysManager
Sheet names, by the SURT prefix to which they should be applied.
getShortName() - Method in class org.archive.checkpointing.Checkpoint
 
getShortName() - Method in class org.archive.crawler.framework.CrawlJob
 
getShouldFetchBodyRule() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getShouldMasquerade() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
getShouldMasquerade() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
getShouldProcessRule() - Method in class org.archive.modules.Processor
 
getShouldReportAtEndOfCrawl() - Method in class org.archive.crawler.reporting.Report
 
getShouldReportDuringCrawl() - Method in class org.archive.crawler.reporting.Report
 
getSizeBytes() - Method in interface org.archive.util.BloomFilter
The amount of memory in bytes consumed by the bloom bitfield.
getSizeBytes() - Method in class org.archive.util.BloomFilter64bit
 
getSkipIdenticalDigests() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getSlotState(long) - Method in class org.archive.util.AbstractLongFPSet
Check the state of a slot in the storage.
getSlotState(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
getSmallest() - Method in class org.archive.crawler.util.TopNSet
 
getSnapshot() - Method in class org.archive.crawler.event.StatSnapshotEvent
 
getSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getSnoozedCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getSnoozeLongMs() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getSocket() - Method in class org.apache.commons.httpclient.HttpConnection
Returns the connection socket.
getSortedByCounts() - Method in class org.archive.util.Histotable
 
getSortedByKeys() - Method in class org.archive.util.Histotable
 
getSortedDuplicates() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
getSortKey() - Method in class org.apache.commons.httpclient.Cookie
Create a 'sort key' for this Cookie that will cause it to sort alongside other Cookies of the same domain (with or without leading '.').
getSoTimeout() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.getSoTimeout(), HttpConnection.getParams().
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP
 
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchWhois
 
getSource() - Method in class org.archive.modules.extractor.Link
 
getSourceCodeDir() - Method in class org.archive.state.ModuleTestBase
Returns the location of the source code directory for your project.
getSourceTag() - Method in class org.archive.modules.CrawlURI
 
getSourceTagSeeds() - Method in class org.archive.modules.seeds.SeedModule
 
getSslTrustLevel() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getStackTrace() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
getStartNewFilesOnCheckpoint() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getState() - Method in class org.archive.crawler.event.CrawlStateEvent
 
getState() - Method in class org.archive.crawler.framework.CrawlController
 
getStaticRef(String) - Method in class org.archive.crawler.restlet.BaseResource
 
getStaticRef(String) - Method in class org.archive.crawler.restlet.EditRepresentation
 
getStatisticsTracker() - Method in class org.archive.crawler.framework.CrawlController
 
getStatisticsTracker() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
getStats() - Method in class org.archive.crawler.framework.CrawlJob
 
getStatusCode() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the response status code.
getStatusCode() - Method in class org.archive.crawler.reporting.SeedRecord
 
getStatusCodeDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
Return an objectCache representing the distribution of status codes for successfully fetched curis, where each key -> val entry maps (string)code -> (integer)count.
getStatusCodes() - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
 
getStatusLine() - Method in class org.apache.commons.httpclient.HttpMethodBase
Provides access to the response status line.
getStatusText() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the status text (or "reason phrase") associated with the latest response.
getStep() - Method in class org.archive.crawler.framework.ToeThread
 
getStoredMap(String, Class<K>, Class<V>, boolean, boolean) - Method in class org.archive.bdb.BdbModule
Creates a database-backed TempStoredSortedMap for transient reporting requirements.
getStoredQueue(String, Class<K>, boolean) - Method in class org.archive.bdb.BdbModule
 
getStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getString(byte[], int, int, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of HTTP content characters to a string.
getString(byte[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
Converts the byte array of HTTP content characters to a string.
getString(CrawlURI) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
 
getString(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
 
getStripRegex() - Method in class org.archive.modules.extractor.HTTPContentDigest
 
getSubContext(String) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubContext(CompoundName) - Static method in class org.archive.util.JndiUtils
Get subcontext.
getSubqueue(UURI, int) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
getSubstats() - Method in class org.archive.crawler.frontier.WorkQueue
 
getSubstats() - Method in interface org.archive.modules.fetcher.FetchStats.HasFetchStats
 
getSubstats() - Method in class org.archive.modules.net.CrawlHost
 
getSubstats() - Method in class org.archive.modules.net.CrawlServer
 
getSuccess() - Method in class org.archive.checkpointing.Checkpoint
 
getSuccessBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getSuccessfullyCrawledUrls() - Method in class org.archive.crawler.util.RecoveryLogMapper
 
getSuffixAtEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getSupplementaryRule() - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
getSurtAuthority(String) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
getSurtPrefixes() - Method in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
 
getSurtsDumpFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSurtsSource() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
getSurtsSourceFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Deprecated.
Redundant now that we have SurtPrefixedDecideRule.surtsSource.
getTags() - Method in class org.archive.io.ReadSourceEditor
 
getTags() - Method in class org.archive.spring.ConfigPathEditor
 
getTargetSheetNames() - Method in class org.archive.crawler.spring.SheetAssociation
 
getTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getTemplate() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.EngineResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.JobResource
 
getTemplateConfiguration() - Method in class org.archive.crawler.restlet.ScriptResource
 
getTestEnvironment(File) - Static method in class org.archive.util.bdbje.EnhancedEnvironment
Create a temporary test environment in the given directory.
getText(String) - Static method in class org.archive.util.ms.Doc
Returns the text of the .doc file with the given file name.
getText(File) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(SeekInputStream) - Static method in class org.archive.util.ms.Doc
Returns the text of the given .doc file.
getText(BlockFileSystem, int) - Static method in class org.archive.util.ms.Doc
Returns the text for the given .doc file.
getTextSource() - Method in class org.archive.modules.seeds.TextSeedModule
 
getThreadNumber() - Method in class org.archive.modules.CrawlURI
Get the number of the ToeThread responsible for processing this uri.
getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchFTP
 
getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getTmpDir() - Method in class org.archive.util.TmpDirTestCase
 
getToeCount() - Method in class org.archive.crawler.framework.CrawlController
 
getToeCount() - Method in class org.archive.crawler.framework.ToePool
 
getToePool() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReport() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReportShort() - Method in class org.archive.crawler.framework.CrawlController
 
getToeThreadReportShortData() - Method in class org.archive.crawler.framework.CrawlController
 
getTooLongDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getTopSet() - Method in class org.archive.crawler.util.TopNSet
Make internal map available (for checkpoint/restore purposes).
getTotal() - Method in class org.archive.util.Histotable
Return the total of all tallies.
getTotalBytes() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
getTotalBytes() - Method in class org.archive.modules.fetcher.FetchStats
 
getTotalBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getTotalEligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at precedences above the floor
getTotalExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
Return the tally of all expenditures on this queue
getTotalInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at all precedences
getTotalIneligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Total of all URIs in inactive queues at precedences at or below the floor
getTotalScheduled() - Method in class org.archive.modules.fetcher.FetchStats
 
getTotalUrls() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
getTrackSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getTrackSources() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
getTransHops() - Method in class org.archive.modules.CrawlURI
Tally up the number of transitive (non-simple-link) hops at the end of this CrawlURI's pathFromSeed.
getTreatFramesAsEmbedLinks() - Method in class org.archive.modules.extractor.ExtractorHTML
 
getType() - Method in interface org.archive.util.ms.Entry
 
getUnderscoreSet() - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
getUpperBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Returns the upper bound on the range of acceptable status codes.
getUpperBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Returns the upper bound on the range of acceptable status codes.
getUpperBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
getURI() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns the URI of the HTTP method
getUri() - Method in class org.archive.crawler.reporting.SeedRecord
 
getURI() - Method in class org.archive.modules.CrawlURI
 
getURICount() - Method in class org.archive.modules.Processor
Returns the number of URIs this processor has handled.
getUriErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriPrecedencePolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
getUriProcessing() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
getUriRegex() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
getURIs() - Method in class org.archive.modules.extractor.PDFParser
Get a list of URIs retrieved from the PDF during the extractURIs operation.
getURIsList(String, int, String, boolean) - Method in interface org.archive.crawler.framework.Frontier
Returns a list of all uncrawled URIs starting from a specified marker until numberOfMatches is reached.
getURIsList(String, int, String, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
Return a list of URLs.
getUriUniqFilter() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
getURL(String, String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
Overwrite handling of discovered URIs.
getUseHardLinkCheckpoints() - Method in class org.archive.bdb.BdbModule
 
getUseHeaderLength() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
getUseHTTP11() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getUsePreset() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
 
getUsePublicSuffixesRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
 
getUserAgent() - Method in class org.archive.modules.CrawlMetadata
 
getUserAgent() - Method in class org.archive.modules.CrawlURI
Get the user agent to use for crawling this URI.
getUserAgent() - Method in interface org.archive.modules.fetcher.UserAgentProvider
 
getUserAgentProvider() - Method in class org.archive.modules.fetcher.FetchHTTP
 
getUserAgentTemplate() - Method in class org.archive.modules.CrawlMetadata
 
getUsername() - Method in class org.archive.modules.fetcher.FetchFTP
 
getUseSharedCache() - Method in class org.archive.bdb.BdbModule
 
getUURI() - Method in class org.archive.modules.CrawlURI
 
getValidator() - Method in class org.archive.crawler.framework.CheckpointService
 
getValidator() - Method in class org.archive.modules.CrawlMetadata
 
getValidator() - Method in interface org.archive.spring.HasValidator
 
getValidDateFormats() - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Returns the Collection of date patterns used for parsing.
getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
 
getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
getValidTestData() - Method in class org.archive.modules.extractor.StringExtractorTestBase
Returns an array of valid test data pairs.
getValue() - Method in class org.archive.io.ReadSourceEditor
 
getValue() - Method in class org.archive.spring.ConfigFileEditor
 
getValue() - Method in class org.archive.spring.ConfigPathEditor
 
getValue() - Method in class org.archive.spring.ConfigString
 
getVariants() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
Add EditRepresentation as a variant when appropriate.
getVersion() - Method in class org.apache.commons.httpclient.Cookie
Returns the version of the cookie specification to which this cookie conforms.
getVia() - Method in class org.archive.modules.CrawlURI
 
getViaContext() - Method in class org.archive.modules.CrawlURI
 
getVirtualHost() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
no longer applicable
getWakeTime() - Method in class org.archive.crawler.frontier.WorkQueue
 
getWhoisQuery(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
getWhoisServer(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
getWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
getWriteBufferSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
getWriteMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRequests() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRevisitForIdenticalDigests() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getWriteRevisitForNotModified() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
getXmlWriter(Writer) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
groovyTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
groovyTemplates - Variable in class org.archive.modules.extractor.ExtractorMultipleRegex
 
GROUP - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
gzipFile - Variable in class org.archive.io.CrawlerJournal
File we're writing journal to.

H

handle401(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
Server is looking for basic/digest auth credentials (RFC2617).
handlePrerequisite(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
The CrawlURI has a prerequisite; apply scoping and update Link to CrawlURI in manner analogous to outlink handling.
handleQueue(WorkQueue, boolean, long, long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Send an active queue to its next state, based on the supplied parameters.
Handler - Class in org.archive.net.s3
Handler for Amazon S3 URLs of the form s3://id:secret@bucket/key
Handler() - Constructor for class org.archive.net.s3.Handler
 
handleSeed(CrawlURI, String) - Method in class org.archive.crawler.reporting.StatisticsTracker
If the curi is a seed, we update the processedSeeds cache.
handleUnregisteredClass(Class) - Method in class org.archive.bdb.AutoKryo
 
harvester - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Name of the harvester that is used for the web harvesting.
HARVESTER_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
hasApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
 
hasAvailableCheckpoints() - Method in class org.archive.crawler.framework.CheckpointService
 
hasBeenLinkExtracted() - Method in class org.archive.modules.CrawlURI
If true then a link extractor has already claimed this CrawlURI and performed link extraction on the document content.
hasBeenLookedUp() - Method in class org.archive.modules.net.CrawlHost
Return true if the IP for this host has been looked up.
hasBeenUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP method has been already executed, but not recycled.
hasContentDigestHistory() - Method in class org.archive.modules.CrawlURI
 
hasCredentials() - Method in class org.archive.modules.CrawlURI
 
hasCredentials() - Method in class org.archive.modules.net.CrawlServer
 
hasData() - Method in class org.archive.modules.extractor.Link
 
hasErrors - Variable in class org.archive.modules.net.Robotstxt
 
hash(CharSequence, int, int) - Method in class org.archive.util.BloomFilter64bit
Hashes the given sequence with the given hash function.
hash(CharSequence) - Method in class org.archive.util.LongToIntConsistentHash
 
hashCode() - Method in class org.apache.commons.httpclient.Cookie
Returns a hash code in keeping with the Object.hashCode() general hashCode contract.
hashCode() - Method in class org.archive.modules.extractor.Link
 
hashCode() - Method in class org.archive.modules.extractor.LinkContext
 
hashCode() - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
All instances of DefaultProtocolSocketFactory have the same hash code.
hashCode() - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
 
hashCode() - Method in class org.archive.modules.net.CrawlHost
 
hashCode() - Method in class org.archive.modules.net.CrawlServer
 
HashCrawlMapper - Class in org.archive.crawler.processor
Maps URIs to one of N crawler names by applying a hash to the URI's (possibly-transformed) classKey.
HashCrawlMapper() - Constructor for class org.archive.crawler.processor.HashCrawlMapper
Constructor.
hashSet - Variable in class org.archive.crawler.util.MemUriUniqFilter
 
hasHttpAuthenticationCredential(CrawlURI) - Static method in class org.archive.modules.Processor
 
hasIdenticalDigest(CrawlURI) - Static method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Utility method for testing if a CrawlURI's last two history entries (one being the most recent fetch) have identical content-digest information.
HasKeyedProperties - Interface in org.archive.spring
Interface indicating an object has an internal map of properties, and is thus at least partially amenable to sheet-based contextual overriding of properties.
hasNext() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Test whether any items remain; loads next item into holding 'next' field.
hasNext() - Method in class org.archive.util.iterator.CompositeIterator
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
hasPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
 
hasRfc2617Credential() - Method in class org.archive.modules.CrawlURI
 
hasStarted - Variable in class org.archive.crawler.framework.CrawlController
 
hasStarted() - Method in class org.archive.crawler.framework.CrawlController
 
hasValidApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
Did the ApplicationContext self-validate? Returns true if validation passed without errors.
HasValidator - Interface in org.archive.spring
 
hasValidStamp(File) - Static method in class org.archive.checkpointing.Checkpoint
 
HasViaDecideRule - Class in org.archive.modules.deciderules
Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or one of some kinds of mid-crawl adds).
HasViaDecideRule() - Constructor for class org.archive.modules.deciderules.HasViaDecideRule
Usual constructor.
hasWriteTag(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
 
haveOverlayNamesBeenSet() - Method in class org.archive.modules.CrawlURI
 
haveOverlayNamesBeenSet() - Method in interface org.archive.spring.OverlayContext
test if this context has actually been configured with overlays (even if in fact no overlays were added)
haveSeen(int, int) - Method in class org.archive.modules.extractor.PDFParser
Indicates, based on a PDFObject's generation/id pair whether the parser has already encountered this object (or a reference to it) so we don't infinitely loop on circuits within the PDF.
HEADER_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
HEADER_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
HEADER_PREDICTS_MISSING - Static variable in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
HEADER_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
 
HEADER_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
 
headSetInclusive(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
 
heapReport() - Method in class org.archive.crawler.framework.Engine
 
heapReportData() - Method in class org.archive.crawler.framework.Engine
 
Heritrix - Class in org.archive.crawler
Main class for Heritrix crawler.
Heritrix() - Constructor for class org.archive.crawler.Heritrix
 
HeritrixHttpMethodRetryHandler - Class in org.archive.modules.fetcher
Retry handler that tries ten times to establish a connection and then, once established, tries ten times to get a response if the method is a GET (a POST is tried only once).
HeritrixHttpMethodRetryHandler() - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixHttpMethodRetryHandler(int) - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
Constructor.
HeritrixLifecycleProcessor - Class in org.archive.spring
Stand-in LifecycleProcessor to avoid a full automatic start() when our ApplicationContext (PathSharingContext) is built ('refreshed').
HeritrixLifecycleProcessor() - Constructor for class org.archive.spring.HeritrixLifecycleProcessor
 
HeritrixProtocolSocketFactory - Class in org.archive.modules.fetcher
Version of protocol socket factory that tries to get IP from heritrix IP cache -- if it's been set into the HttpConnectionParameters.
HeritrixProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
Constructor.
HeritrixSSLProtocolSocketFactory - Class in org.archive.modules.fetcher
Implementation of the commons-httpclient SSLProtocolSocketFactory so we can return SSLSockets whose trust manager is ConfigurableX509TrustManager.
HeritrixSSLProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
Shutdown constructor.
HIDDEN_PROPS - Static variable in class org.archive.crawler.restlet.JobRelatedResource
suppress problematic properties
HIGH - Static variable in class org.archive.modules.SchedulingConstants
High scheduling priority.
HIGHEST - Static variable in class org.archive.modules.SchedulingConstants
Highest scheduling priority.
highestPrecedenceWaiting - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
HighestUriQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
QueuePrecedencePolicy that sets a uri-queue's precedence to that of the highest URI currently enqueued within itself, added to the configured base-precedence.
HighestUriQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
 
HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider - Class in org.archive.crawler.frontier.precedence
Helper provider for maintaining the tracked distribution of included URIs and calculating the queue precedence.
HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider(int) - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
HISTORY_DB_CONFIG - Static variable in class org.archive.modules.recrawl.PersistProcessor
 
historyDb - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
historyDb - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDb - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
historyDbConfig - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbConfig() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbName - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
historyDbName - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
historyLength - Variable in class org.archive.modules.recrawl.FetchHistoryProcessor
Desired history array length.
Histotable<K> - Class in org.archive.util
Collect and report frequency information.
Histotable() - Constructor for class org.archive.util.Histotable
 
holder - Variable in class org.archive.modules.CrawlURI
 
holderCost - Variable in class org.archive.modules.CrawlURI
spot for an integer cost to be placed by external facility (frontier).
holderKey - Variable in class org.archive.modules.CrawlURI
 
hookupDatabase(Database, Class<E>, StoredClassCatalog) - Method in class org.archive.bdb.StoredQueue
 
Hop - Enum in org.archive.modules.extractor
The kind of "hop" from one URI to another.
HopCrossesAssignmentLevelDomainDecideRule - Class in org.archive.modules.deciderules
Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars, its 'assignment-level-domain' (ALD) (AKA 'public suffix' or in previous Heritrix versions, 'topmost assigned SURT')
HopCrossesAssignmentLevelDomainDecideRule() - Constructor for class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
 
HopsPathMatchesRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regex.
HopsPathMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
Usual constructor.
hopString - Variable in enum org.archive.modules.extractor.Hop
 
HopsUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which assigns URIs a precedence equal to the number of hops in its hops-path-from-seed (either all hops or just navlink ('L') hops).
HopsUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
HOST - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
hostKeys() - Method in class org.archive.modules.fetcher.DefaultServerCache
 
hostKeys() - Method in class org.archive.modules.net.ServerCache
 
hostMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
This list is grouped in pairs.
HostnameQueueAssignmentPolicy - Class in org.archive.crawler.frontier
QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI.
HostnameQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
 
HostResolver - Interface in org.archive.modules.fetcher
 
hosts - Variable in class org.archive.modules.fetcher.DefaultServerCache
hostname -> CrawlHost.
hostsBytesTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
hostsDistributionTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
hostsLastFinishedTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
HostsReport - Class in org.archive.crawler.reporting
The "Hosts Report", tallies by host.
HostsReport() - Constructor for class org.archive.crawler.reporting.HostsReport
 
HTML_TAGS - Static variable in class org.archive.util.UriUtils
 
HTMLForm - Class in org.archive.modules.forms
Simple representation of a discovered HTML Form.
HTMLForm() - Constructor for class org.archive.modules.forms.HTMLForm
 
HTMLForm.FormInput - Class in org.archive.modules.forms
 
HTMLForm.FormInput() - Constructor for class org.archive.modules.forms.HTMLForm.FormInput
 
HtmlFormCredential - Class in org.archive.modules.credential
Credential that holds all needed to do a GET/POST to an HTML form.
HtmlFormCredential() - Constructor for class org.archive.modules.credential.HtmlFormCredential
Constructor.
HTMLLinkContext - Class in org.archive.modules.extractor
XPath-like context for HTML discovered URIs.
HTMLLinkContext(String) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
Constructor.
HTMLLinkContext(CharSequence, CharSequence) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
 
HTTP_BIND_ADDRESS - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HTTP_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HttpAuthenticationCredential - Class in org.archive.modules.credential
A Basic/Digest HTTP Authentication (RFC2617) credential.
HttpAuthenticationCredential() - Constructor for class org.archive.modules.credential.HttpAuthenticationCredential
Constructor.
HttpConnection - Class in org.apache.commons.httpclient
An abstraction of an HTTP InputStream and OutputStream pair, together with the relevant attributes.
HttpConnection(String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port.
HttpConnection(String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port using the given protocol.
HttpConnection(String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host with the virtual alias and port using given protocol.
HttpConnection(String, int, String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host and port via the given proxy host and port using the default protocol.
HttpConnection(HostConfiguration) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host configuration.
HttpConnection(String, int, String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Deprecated.
use #HttpConnection(String, int, String, int, Protocol)
HttpConnection(String, int, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
Creates a new HTTP connection for the given host with the virtual alias and port via the given proxy host and port using the given protocol.
HTTPContentDigest - Class in org.archive.modules.extractor
A processor for calculating custom HTTP content digests in place of the default (if any) computed by the HTTP fetcher processors.
HTTPContentDigest() - Constructor for class org.archive.modules.extractor.HTTPContentDigest
Constructor.
httpMethod - Variable in class org.archive.modules.credential.HtmlFormCredential
GET or POST.
HttpMethodBase - Class in org.apache.commons.httpclient
An abstract base implementation of HttpMethod.
HttpMethodBase() - Constructor for class org.apache.commons.httpclient.HttpMethodBase
No-arg constructor.
HttpMethodBase(String) - Constructor for class org.apache.commons.httpclient.HttpMethodBase
Constructor specifying a URI.
HttpParser - Class in org.apache.commons.httpclient
This class exists solely for compatibility with httpclient. The actual functionality is in LaxHttpParser.
HttpParser() - Constructor for class org.apache.commons.httpclient.HttpParser
 
HTTPS_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
HttpState - Class in org.apache.commons.httpclient
A container for HTTP attributes that may persist from request to request, such as cookies and authentication credentials.
HttpState() - Constructor for class org.apache.commons.httpclient.HttpState
Default constructor.
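
Example (illustrative only, not part of the indexed API): the HopsPathMatchesRegexDecideRule entry above describes matching a 'hops-path' string such as "LLXE" against a supplied regex. The sketch below shows that idea with plain java.util.regex. The class name HopsPathSketch and the method hopsPathMatches are hypothetical, and requiring the regex to match the whole hops-path is an assumption of this sketch rather than something stated in this index.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; it is not part of Heritrix.
class HopsPathSketch {

    // Returns true when the regex matches the entire hops-path string.
    // Whole-string matching is an assumption made for this sketch.
    static boolean hopsPathMatches(String hopsPath, String regex) {
        return Pattern.compile(regex).matcher(hopsPath).matches();
    }

    public static void main(String[] args) {
        // "LLXE" is the sample hops-path quoted in the index entry.
        System.out.println(hopsPathMatches("LLXE", ".*X.*")); // prints "true"
        // A path made only of 'L' (navlink) hops contains no 'X'.
        System.out.println(hopsPathMatches("LLL", ".*X.*"));  // prints "false"
    }
}
```

A rule configured with a regex like ".*X.*" would then apply its decision only to URIs whose path from the seed includes at least one 'X' hop.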

I

id - Variable in class org.archive.net.s3.S3URLConnection
 
IdenticalDigestDecideRule - Class in org.archive.modules.deciderules.recrawl
Rule applies configured decision to any CrawlURIs whose prior-history content-digest matches the latest fetch.
IdenticalDigestDecideRule() - Constructor for class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
Usual constructor.
IdentityCacheable - Interface in org.archive.util
Common interface for objects held in ObjectIdentityCaches.
IdentityCacheableWrapper<K> - Class in org.archive.util
Wrapper allowing other objects to be held in an ObjectIdentityCache.
IdentityCacheableWrapper(String, K) - Constructor for class org.archive.util.IdentityCacheableWrapper
 
IgnoreCookiesSpec - Class in org.apache.commons.httpclient.cookie
A cookie spec that does nothing.
IgnoreCookiesSpec() - Constructor for class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
IgnoreRobotsPolicy - Class in org.archive.modules.net
Policy to ignore robots.
IgnoreRobotsPolicy() - Constructor for class org.archive.modules.net.IgnoreRobotsPolicy
 
IMG_SRC - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
importRecoverFormat(File, boolean, boolean, boolean, String) - Method in interface org.archive.crawler.framework.Frontier
Import URIs from the given file (in recover-log-like format, with a 3-character 'type' tag preceding a URI with optional hops/via).
importRecoverFormat(File, boolean, boolean, boolean, String) - Method in class org.archive.crawler.frontier.AbstractFrontier
Import URIs from the given file (in recover-log-like format, with a 3-character 'type' tag preceding a URI with optional hops/via).
importRecoverLog(JSONObject, Frontier) - Static method in class org.archive.crawler.frontier.FrontierJournal
Utility method for scanning a recovery journal and applying it to a Frontier.
importURIs(String) - Method in interface org.archive.crawler.framework.Frontier
Load URIs from a file, for scheduling and/or considered-included status (if from a recovery log).
importURIs(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
importURIsSimple(JSONObject) - Method in class org.archive.crawler.frontier.AbstractFrontier
Import URIs from either a simple (one URI per line) or crawl.log format.
inactiveQueuesByPrecedence - Variable in class org.archive.crawler.frontier.BdbFrontier
All 'inactive' queues, not yet in active rotation.
included(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
includesRetireDirective() - Method in class org.archive.modules.CrawlURI
 
incrementConsecutiveConnectionErrors() - Method in class org.archive.modules.net.CrawlServer
 
incrementDeferrals() - Method in class org.archive.modules.CrawlURI
Increment the deferral count.
incrementDiscardedOutLinks() - Method in class org.archive.modules.CrawlURI
 
incrementDisregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of disregarded URIs.
incrementFailedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of failed URIs.
incrementFetchAttempts() - Method in class org.archive.modules.CrawlURI
Increment the count of attempts (trips through the processing loop) at getting the document referenced by this URI.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String) - Static method in class org.archive.crawler.reporting.StatisticsTracker
Increment a counter for a key in a given map.
incrementMapCount(ConcurrentMap<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.reporting.StatisticsTracker
Increment a counter for a key in a given map by an arbitrary amount.
incrementQueuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementQueuedUriCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of queued URIs.
incrementSucceededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
Increment the running count of successfully fetched URIs.
INDEX_FORMAT - Static variable in class org.archive.checkpointing.Checkpoint
format for serial numbers
indexOfCurrentIterator - Variable in class org.archive.util.iterator.CompositeIterator
 
INFERRED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for inferred urls without other context.
inferRootPage - Variable in class org.archive.modules.extractor.ExtractorHTTP
should all HTTP URIs be used to infer a link to the site's root?
inheritFrom(CrawlURI) - Method in class org.archive.modules.CrawlURI
Inherit (copy) the relevant keys-values from the ancestor.
initAllQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initAllQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initialize the allQueues field in an implementation-appropriate way.
INITARGS - Static variable in class org.archive.bdb.AutoKryo
 
initialDelaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
how long after crawl start to first scan action directory
initialize(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Method shared by constructors.
initialize(File) - Method in class org.archive.io.CrawlerJournal
 
initialize() - Method in class org.archive.modules.extractor.PDFParser
Initialize opens the document for reading.
initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbCache
Call this method when you have an instance created with the default constructor, or a deserialized instance that you want to reconnect with an extant bdbje environment.
initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbManualCache
Call this method when you have an instance created with the default constructor, or a deserialized instance that you want to reconnect with an extant bdbje environment.
initializeFromReader(BufferedReader) - Method in class org.archive.modules.net.Robotstxt
 
initInternalQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initializes internal queues.
initLaunchDir() - Method in class org.archive.spring.PathSharingContext
 
initLaunchId() - Method in class org.archive.spring.PathSharingContext
 
initLifecycleProcessor() - Method in class org.archive.spring.PathSharingContext
Initialize the LifecycleProcessor.
initOtherQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
 
initOtherQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Initialize all other internal queues in an implementation-appropriate way.
initOutputStream(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
Get the OutputStream for the file to write to.
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.AcceptDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PrerequisiteAcceptDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.RejectDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.SeedAcceptDecideRule
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Actually extracts links.
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorCSS
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorDOC
Processes a word document and extracts any hyperlinks from it.
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorJS
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorPDF
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorSWF
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorUniversal
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorXML
 
innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.TrapSuppressExtractor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
Run candidates chain on each of (1) any prerequisite, if present; (2) any outCandidates, if present; (3) all outlinks, if appropriate
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
Processes the given URI.
innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.HTTPContentDigest
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchDNS
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchFTP
Processes the given URI.
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
 
innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
innerProcess(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.Processor
Actually performs the process.
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistStoreProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.ScriptedProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
innerProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
Notes a CrawlURI's content size in its running tally.
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
innerProcessResult(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.Processor
 
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.ARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WARCWriterProcessor
Writes a CrawlURI and its associated data to store file.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerRejectProcess(CrawlURI) - Method in class org.archive.modules.Processor
Invoked after a URI has been rejected.
innerRejectProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.SimpleCookieStorage
 
inProcessQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
all per-class queues from whom a URI is outstanding
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Insert the given curi, whether it is already present or not.
insertKeyToString(DatabaseEntry) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
 
installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
Install the appropriate provider helper object into the WorkQueue, if necessary.
installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
 
installReplicasUpTo(int) - Method in class org.archive.util.LongToIntConsistentHash
Install necessary replicas, if not already present.
INSTANCE - Static variable in class org.archive.modules.net.IgnoreRobotsPolicy
 
INSTANCE - Static variable in class org.archive.modules.net.ObeyRobotsPolicy
 
INSTANCE - Static variable in interface org.archive.util.CLibrary
 
INSTANCE - Static variable in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
 
instance - Variable in class org.archive.util.Supplier
 
instanceMain(String[]) - Method in class org.archive.crawler.Heritrix
 
instanceMain(String[]) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
instanceMain(String[]) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
instantiateContainer() - Method in class org.archive.crawler.framework.CrawlJob
Can the configuration yield an assembled ApplicationContext?
interpolate(String) - Method in class org.archive.spring.ConfigPathConfigurer
 
intervalSeconds - Variable in class org.archive.crawler.reporting.StatisticsTracker
The interval between writing progress information to log.
invert(DecideResult) - Static method in enum org.archive.modules.deciderules.DecideResult
 
invokeStatic(String, Class<?>, Class<?>[], Object[]) - Method in class org.archive.bdb.AutoKryo
 
IP_ADDRESS - Static variable in class org.archive.modules.extractor.ExtractorUniversal
Matches any string that begins with http:// or https:// followed by something that looks like an IP address (four numbers, none longer than 3 chars, separated by 3 dots).
IP_ADDRESS_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
IP_ADDRESS_REGEX - Static variable in class org.archive.modules.fetcher.FetchWhois
 
IP_NEVER_EXPIRES - Static variable in class org.archive.modules.net.CrawlHost
Flag value indicating always-valid IP
IP_NEVER_LOOKED_UP - Static variable in class org.archive.modules.net.CrawlHost
Flag value indicating an IP has not yet been looked up
IpAddressSetDecideRule - Class in org.archive.modules.deciderules
IpAddressSetDecideRule must be used with Preselector.setRecheckScope(boolean) set to true because it relies on Heritrix's DNS lookup to establish the IP address for a URI before it can run.
IpAddressSetDecideRule() - Constructor for class org.archive.modules.deciderules.IpAddressSetDecideRule
 
IPQueueAssignmentPolicy - Class in org.archive.crawler.frontier
Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy.
IPQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
is2XXSuccess() - Method in class org.archive.modules.CrawlURI
 
isAborted() - Method in class org.apache.commons.httpclient.HttpMethodBase
Tests whether the execution of this method has been aborted
isActive() - Method in class org.archive.crawler.framework.CrawlController
Is this crawl actively able/trying to crawl? Includes both states RUNNING and EMPTY.
isActive() - Method in class org.archive.crawler.framework.ToeThread
Is this thread validly processing a URI, not paused, waiting for a URI, or interrupted?
isAllowCreate() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
isARCType(String) - Method in class org.archive.io.Warc2Arc
 
isAuthenticationPreemptive() - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpClientParams.isAuthenticationPreemptive(), HttpClient.getParams().
isCheckpointing() - Method in class org.archive.crawler.framework.CheckpointService
 
isCheckpointRecovery - Variable in class org.archive.modules.fetcher.BdbCookieStorage
are we a checkpoint recovery? (in which case, reuse stored cookie data?)
isCheckpointRecovery - Variable in class org.archive.modules.net.BdbServerCache
 
isConnectionCloseForced() - Method in class org.apache.commons.httpclient.HttpMethodBase
Tests if the connection should be force-closed when no longer needed.
isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isDomainAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
Returns true if cookie's domain was set via a domain attribute in the Set-Cookie header.
isEmpty() - Method in class org.archive.bdb.StoredQueue
 
isEmpty() - Method in interface org.archive.crawler.framework.Frontier
Returns true if the frontier contains no more URIs to crawl.
isEmpty() - Method in class org.archive.crawler.frontier.AbstractFrontier
Frontier is empty only if all queues are empty and no URIs are in-process
isEmpty() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Return whether frontier is exhausted: all crawlable URIs done (none waiting or pending).
isEveryTime() - Method in class org.archive.modules.credential.Credential
 
isEveryTime() - Method in class org.archive.modules.credential.HtmlFormCredential
 
isEveryTime() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isExpired() - Method in class org.apache.commons.httpclient.Cookie
Returns true if this cookie has expired.
isExpired(Date) - Method in class org.apache.commons.httpclient.Cookie
Returns true if this cookie has expired according to the time passed in.
isExpired() - Method in class org.archive.crawler.restlet.Flash
Indicate whether the Flash should persist.
isFailure() - Method in class org.archive.crawler.restlet.models.ScriptModel
 
isFinished() - Method in class org.archive.crawler.framework.CrawlController
 
isHtmlExpectedHere(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
Test whether this HTML is so unexpected (eg in place of a GIF URI) that it shouldn't be scanned for links.
isHttp11() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpMethodParams.getVersion()
isHttpTransaction() - Method in class org.archive.modules.CrawlURI
Return true if this is an HTTP transaction.
isInScope(CrawlURI) - Method in class org.archive.crawler.framework.Scoper
Schedule the given CrawlURI with the Frontier.
isInScope(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
isIpExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
Return true if ip should be looked up.
isLaunchable() - Method in class org.archive.crawler.framework.CrawlJob
Is it reasonable to offer a launch button
isLaunchInfoPartial - Variable in class org.archive.crawler.framework.CrawlJob
 
isLaunchInfoPartial() - Method in class org.archive.crawler.framework.CrawlJob
 
isLikelyFalsePositive(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
Deprecated.
produces too many false positives, UriUtils.isVeryLikelyUri(CharSequence) is preferred
isLikelyUriHtmlContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLikelyUriJavascriptContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
 
isLocation() - Method in class org.archive.modules.CrawlURI
 
isLocked() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if the connection is locked.
isManaged - Variable in class org.archive.crawler.frontier.WorkQueue
Whether queue is already in lifecycle stage
isManaged() - Method in class org.archive.crawler.frontier.WorkQueue
Whether the queue is already in a lifecycle stage -- such as ready, in-progress, snoozed -- and thus should not be redundantly inserted to readyClassQueues
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.CustomRobotsPolicy
 
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
isolateThreads - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
Whether each ToeThread should get its own independent script engine, or they should share synchronized access to one engine.
isolateThreads - Variable in class org.archive.modules.ScriptedProcessor
Whether each ToeThread should get its own independent script engine, or they should share synchronized access to one engine.
isOpen - Variable in class org.apache.commons.httpclient.HttpConnection
Whether or not the connection is connected.
isOpen() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if the connection is open.
isOverSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has temporarily (session) exceeded its budget.
isOverTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueue
Check whether queue has permanently (total) exceeded its budget.
isPaintable() - Method in class org.archive.io.ReadSourceEditor
 
isPaintable() - Method in class org.archive.spring.ConfigPathEditor
 
isPathAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
Returns true if cookie's path was set via a path attribute in the Set-Cookie header.
isPausable() - Method in class org.archive.crawler.framework.CrawlJob
 
isPaused() - Method in class org.archive.crawler.framework.CrawlController
Tell if the controller is paused
isPausing() - Method in class org.archive.crawler.framework.CrawlController
 
isPersistent() - Method in class org.apache.commons.httpclient.Cookie
Returns false if the cookie should be discarded at the end of the "session"; true otherwise.
isPossibleUri(CharSequence) - Static method in class org.archive.util.UriUtils
 
isPost() - Method in class org.archive.modules.credential.Credential
 
isPost() - Method in class org.archive.modules.credential.HtmlFormCredential
 
isPost() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isPrerequisite() - Method in class org.archive.modules.CrawlURI
Returns true if this CrawlURI is a prerequisite.
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
 
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
 
isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
isProfile() - Method in class org.archive.crawler.framework.CrawlJob
Is this job a 'profile' (or template), meaning it may be edited or copied to another job, but should not be launched.
isProxied() - Method in class org.apache.commons.httpclient.HttpConnection
Returns true if the connection is established via a proxy, false otherwise.
isQuadAddress(CrawlURI, String, CrawlHost) - Method in class org.archive.modules.fetcher.FetchDNS
 
isRequestSent() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the HTTP request has been transmitted to the target server in its entirety, false otherwise.
isResponseAvailable() - Method in class org.apache.commons.httpclient.HttpConnection
Tests if input data is available.
isResponseAvailable(int) - Method in class org.apache.commons.httpclient.HttpConnection
Tests if input data becomes available within the given period of time, in milliseconds.
isRetired() - Method in class org.archive.crawler.frontier.WorkQueue
 
isRobotsExpired(int) - Method in class org.archive.modules.net.CrawlServer
Is the robots policy expired.
isRunning - Variable in class org.archive.bdb.BdbModule
 
isRunning() - Method in class org.archive.bdb.BdbModule
 
isRunning() - Method in class org.archive.crawler.framework.ActionDirectory
 
isRunning - Variable in class org.archive.crawler.framework.CheckpointService
 
isRunning() - Method in class org.archive.crawler.framework.CheckpointService
 
isRunning - Variable in class org.archive.crawler.framework.CrawlController
 
isRunning() - Method in class org.archive.crawler.framework.CrawlController
 
isRunning() - Method in class org.archive.crawler.framework.CrawlJob
 
isRunning - Variable in class org.archive.crawler.framework.Scoper
 
isRunning() - Method in class org.archive.crawler.framework.Scoper
 
isRunning() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
isRunning() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
isRunning() - Method in class org.archive.crawler.processor.CrawlMapper
 
isRunning - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
isRunning() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
isRunning - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
isRunning() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
isRunning - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
isRunning() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
isRunning - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
isRunning() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
isRunning - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
 
isRunning() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
isRunning() - Method in class org.archive.modules.fetcher.FetchHTTP
 
isRunning() - Method in class org.archive.modules.fetcher.FetchWhois
 
isRunning - Variable in class org.archive.modules.net.BdbServerCache
 
isRunning() - Method in class org.archive.modules.net.BdbServerCache
 
isRunning - Variable in class org.archive.modules.Processor
 
isRunning() - Method in class org.archive.modules.Processor
 
isRunning - Variable in class org.archive.modules.ProcessorChain
 
isRunning() - Method in class org.archive.modules.ProcessorChain
 
isRunning() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
isRunning() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
isRunning() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
isSecure() - Method in class org.apache.commons.httpclient.HttpConnection
Returns true if the connection is established over a secure protocol.
isSeed() - Method in class org.archive.modules.CrawlURI
 
isStale() - Method in class org.apache.commons.httpclient.HttpConnection
Determines whether this connection is "stale", which is to say that either it is no longer open, or an attempt to read the connection would fail.
isStaleCheckingEnabled() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.isStaleCheckingEnabled(), HttpConnection.getParams().
isStopComplete - Variable in class org.archive.crawler.framework.CrawlController
 
isStopComplete() - Method in class org.archive.crawler.framework.CrawlController
 
isStrictMode() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpParams.setParameter(String, Object) to exercise a more granular control over HTTP protocol strictness.
isSuccess() - Method in class org.archive.modules.CrawlURI
Ask this URI if it was a success or not.
isSuccess(CrawlURI) - Static method in class org.archive.modules.Processor
 
isTransactional() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
isTransparent() - Method in class org.apache.commons.httpclient.HttpConnection
Indicates if the connection is completely transparent from end to end.
isUnicode() - Method in class org.archive.util.ms.Piece
 
isUnpausable() - Method in class org.archive.crawler.framework.CrawlJob
 
isValidRobots() - Method in class org.archive.modules.net.CrawlServer
If true then valid robots.txt information has been retrieved.
isVeryLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
 
isXmlOk() - Method in class org.archive.crawler.framework.CrawlJob
Is the primary config file legal XML?
iterator() - Method in class org.archive.bdb.StoredQueue
 
iterator() - Method in class org.archive.modules.ProcessorChain
 
iterator - Variable in class org.archive.util.Iteratorable
 
iterator() - Method in class org.archive.util.Iteratorable
 
iterator() - Method in class org.archive.util.Transform
 
Iteratorable<K> - Class in org.archive.util
Make an Iterator usable as an Iterable (and thus enable new-style for-each loops).
Iteratorable(Iterator<K>) - Constructor for class org.archive.util.Iteratorable
 
iterators - Variable in class org.archive.util.iterator.CompositeIterator
 

J

JavaLiterals - Class in org.archive.util
Utility functions to escape or unescape Java literal strings.
JavaLiterals() - Constructor for class org.archive.util.JavaLiterals
 
JAVASCRIPT_STRING_EXTRACTOR - Static variable in class org.archive.modules.extractor.ExtractorJS
 
JerichoExtractorHTML - Class in org.archive.modules.extractor
Improved link-extraction from an HTML content-body using jericho-html parser.
JerichoExtractorHTML() - Constructor for class org.archive.modules.extractor.JerichoExtractorHTML
 
JndiUtils - Class in org.archive.util
JNDI utilities.
JndiUtils() - Constructor for class org.archive.util.JndiUtils
 
jobConfigs - Variable in class org.archive.crawler.framework.Engine
map of job short names -> CrawlJob instances
jobDirRelativePath(File) - Method in class org.archive.crawler.framework.CrawlJob
Compute a path relative to the job directory for all contained files, or null if the File is not inside the job directory.
jobLogger - Variable in class org.archive.crawler.framework.CrawlJob
 
jobName - Variable in class org.archive.modules.CrawlMetadata
 
JobRelatedResource - Class in org.archive.crawler.restlet
Shared superclass for resources that represent functional aspects of a CrawlJob.
JobRelatedResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.JobRelatedResource
 
JobResource - Class in org.archive.crawler.restlet
Restlet Resource representing a single local CrawlJob inside an Engine.
JobResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.JobResource
 
jobsDir - Variable in class org.archive.crawler.framework.Engine
directory where job directories are expected
JS_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for JavaScript-discovered urls without other context.
JSONUtils - Class in org.archive.util
Utilities for working with JSON/JSONObjects.
JSONUtils() - Constructor for class org.archive.util.JSONUtils
 
JSSTRING - Static variable in class org.archive.modules.extractor.ExtractorSWF
 
jump(String) - Static method in class org.archive.modules.ProcessResult
 

K

keepSnapshotsCount - Variable in class org.archive.crawler.reporting.StatisticsTracker
Number of crawl-stat sample snapshots to keep for calculation purposes.
key - Variable in class org.archive.util.IdentityCacheableWrapper
 
KeyedProperties - Class in org.archive.spring
Map for storing overridable properties.
KeyedProperties() - Constructor for class org.archive.spring.KeyedProperties
 
keys - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
keySet() - Method in class org.archive.util.ObjectIdentityBdbCache
 
keySet() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
keySet() - Method in interface org.archive.util.ObjectIdentityCache
set of all keys
keySet() - Method in class org.archive.util.ObjectIdentityMemCache
 
kill() - Method in class org.archive.crawler.framework.ToeThread
Terminates a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.CrawlController
Kills a thread.
killThread(int, boolean) - Method in class org.archive.crawler.framework.ToePool
Kills specified thread.
kind - Variable in class org.archive.crawler.restlet.Flash
kind of flash: ACK, NACK, or ADVISORY
kp - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
kp - Variable in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
kp - Variable in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
kp - Variable in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
kp - Variable in class org.archive.modules.canonicalize.BaseRule
 
kp - Variable in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
kp - Variable in class org.archive.modules.CrawlMetadata
 
kp - Variable in class org.archive.modules.credential.CredentialStore
 
kp - Variable in class org.archive.modules.deciderules.DecideRule
 
kp - Variable in class org.archive.modules.Processor
 
kp - Variable in class org.archive.modules.ProcessorChain
 
kryo - Variable in class org.archive.bdb.KryoBinding
 
KryoBinding<K> - Class in org.archive.bdb
Binding for use with BerkeleyDB-JE that uses Kryo serialization rather than BDB's (custom version of) Java serialization.
KryoBinding(Class<K>) - Constructor for class org.archive.bdb.KryoBinding
Constructor.
Kw3Constants - Interface in org.archive.modules.writer
 
Kw3WriterProcessor - Class in org.archive.modules.writer
Processor module that writes the results of successful fetches to files on disk.
Kw3WriterProcessor() - Constructor for class org.archive.modules.writer.Kw3WriterProcessor
Constructor.

L

largestKnownKey - Variable in class org.archive.crawler.util.TopNSet
 
largestKnownValue - Variable in class org.archive.crawler.util.TopNSet
 
largestQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
remember keys of small number of largest queues for reporting
lastCacheMiss - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastCacheMissDiff - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
lastCheckpoint - Variable in class org.archive.crawler.framework.CheckpointService
 
lastCheckpointSnapshot - Variable in class org.archive.crawler.framework.CheckpointService
 
lastCost - Variable in class org.archive.crawler.frontier.WorkQueue
Cost of the last item to be charged against queue
lastDequeueTime - Variable in class org.archive.crawler.frontier.WorkQueue
time of last dequeue (disposition of some URI)
lastFailureTime - Variable in class org.archive.crawler.restlet.RateLimitGuard
 
lastLaunch - Variable in class org.archive.crawler.framework.CrawlJob
 
lastPeeked - Variable in class org.archive.crawler.frontier.WorkQueue
Last URI peeked
lastQueued - Variable in class org.archive.crawler.frontier.WorkQueue
Last URI enqueued
lastReachedState - Variable in class org.archive.crawler.frontier.AbstractFrontier
last Frontier.State reached; used to suppress duplicate notifications
lastSuccessTime - Variable in class org.archive.modules.fetcher.FetchStats
 
launch() - Method in class org.archive.crawler.framework.CrawlJob
Launch a crawl into 'running' status, assembling if necessary.
launchCount - Variable in class org.archive.crawler.framework.CrawlJob
 
LENGTH_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
 
LENGTH_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
 
LexicalCrawlMapper - Class in org.archive.crawler.processor
A simple crawl splitter/mapper, dividing up CrawlURIs between crawlers by diverting some range of URIs to local log files (which can then be imported to other crawlers).
LexicalCrawlMapper() - Constructor for class org.archive.crawler.processor.LexicalCrawlMapper
Constructor.
LIKELY_URI_PATH - Static variable in class org.archive.util.UriUtils
 
lineCount - Variable in class org.archive.crawler.restlet.PagedRepresentation
desired line count; negative to go back from position; default 128
linePos - Variable in class org.archive.util.PaddingStringBuffer
 
lines - Variable in class org.archive.crawler.restlet.PagedRepresentation
text lines
lines - Variable in class org.archive.io.CrawlerJournal
line count
Link - Class in org.archive.modules.extractor
Link represents one discovered "edge" of the web graph: the source URI, the destination URI, and the type of reference (represented by the context in which it was found).
Link(CharSequence, CharSequence, LinkContext, Hop) - Constructor for class org.archive.modules.extractor.Link
Create a Link with the given fields.
link(String, String) - Method in interface org.archive.util.CLibrary
 
LinkContext - Class in org.archive.modules.extractor
The context of link discovery.
LinkContext() - Constructor for class org.archive.modules.extractor.LinkContext
 
LinkContext.SimpleLinkContext - Class in org.archive.modules.extractor
Class for representing handy default LinkContext values.
LinkContext.SimpleLinkContext(String) - Constructor for class org.archive.modules.extractor.LinkContext.SimpleLinkContext
 
linkExtractorFinished() - Method in class org.archive.modules.CrawlURI
Note that link extraction has been performed on this CrawlURI.
LinksScoper - Class in org.archive.crawler.postprocessor
Deprecated.
Use CandidatesProcessor and CandidateChain/CandidateScoper instead
LinksScoper() - Constructor for class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
list() - Method in interface org.archive.util.ms.Entry
 
listProblems(List<String>) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
listSnapshots() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
liveHostReportSize - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
load(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
 
load(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractContentDigestHistory
Looks up the history by key persistKeyFor(curi) and loads it into curi.getContentDigestHistory().
load(CrawlURI) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
loadCookies(Reader, SortedMap<String, Cookie>) - Static method in class org.archive.modules.fetcher.AbstractCookieStorage
Load cookies.
loadCookies(ConfigFile, SortedMap<String, Cookie>) - Static method in class org.archive.modules.fetcher.AbstractCookieStorage
 
loadCookies(String, SortedMap<String, Cookie>) - Static method in class org.archive.modules.fetcher.AbstractCookieStorage
 
loadFactor - Variable in class org.archive.util.AbstractLongFPSet
The load factor, as a fraction.
loadJson(String) - Method in class org.archive.checkpointing.Checkpoint
 
loadLines() - Method in class org.archive.crawler.restlet.PagedRepresentation
Actually read the requested lines, and reverse them if appropriate.
loadMap() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
Retrieve and parse the mapping specification from a local path or HTTP URL.
loadOverridesFrom(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
 
loadReader(String, String) - Method in class org.archive.checkpointing.Checkpoint
 
loadReport() - Method in class org.archive.crawler.framework.CrawlJob
 
loadReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
localName - Variable in class org.archive.crawler.processor.CrawlMapper
Name of local crawler node; mappings to this name result in normal processing (no diversion).
LOG - Static variable in class org.apache.commons.httpclient.cookie.CookieSpecBase
Log object
log(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Log to the main crawl.log
log - Variable in class org.archive.modules.recrawl.PersistLogProcessor
 
LOG_ERROR - Static variable in class org.archive.io.CrawlerJournal
prefix for error lines
LOG_TIMESTAMP - Static variable in class org.archive.io.CrawlerJournal
prefix for timestamp lines
logExtraInfo - Variable in class org.archive.crawler.io.UriProcessingFormatter
 
logExtraInfo - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
Whether to include the "extra info" field for each entry in crawl.log.
logFile - Variable in class org.archive.modules.recrawl.PersistLogProcessor
 
logGeneration - Variable in class org.archive.crawler.processor.CrawlMapper
Truncated timestamp prefix for diversion logs; when current time doesn't match, it's time to close all current logs.
logger - Static variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
logger - Static variable in class org.archive.modules.canonicalize.RegexRule
 
logger - Static variable in class org.archive.modules.extractor.AggressiveExtractorHTML
 
loggerModule - Variable in class org.archive.crawler.framework.CrawlController
 
loggerModule - Variable in class org.archive.crawler.framework.Scoper
 
loggerModule - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
loggerModule - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
 
loggerModule - Variable in class org.archive.crawler.prefetch.PreconditionEnforcer
 
loggerModule - Variable in class org.archive.modules.deciderules.DecideRuleSequence
 
loggerModule - Variable in class org.archive.modules.extractor.Extractor
 
loggerModule - Variable in class org.archive.modules.forms.FormLoginProcessor
 
loggers - Variable in class org.archive.crawler.reporting.AlertThreadGroup
 
login - Variable in class org.archive.modules.credential.HttpAuthenticationCredential
Login.
loginUri - Variable in class org.archive.modules.credential.HtmlFormCredential
Full URI of page that contains the HTML login form we're to apply these credentials to: E.g.
LOGNAME_RECOVER - Static variable in class org.archive.crawler.frontier.FrontierJournal
 
logNonfatalErrors(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Take note of any processor-local errors that have been entered into the CrawlURI.
logNote(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
logProgressStatistics(String) - Method in class org.archive.crawler.framework.CrawlController
Log to the progress statistics log.
LogReader - Class in org.archive.crawler.util
This class contains a variety of methods for reading log files (or other text files containing repeated lines with similar information).
LogReader() - Constructor for class org.archive.crawler.util.LogReader
 
Logs - Enum in org.archive.crawler.util
Enumerates existing Heritrix logs
LOGS_DIR_NAME - Static variable in class org.archive.crawler.framework.Engine
 
logUriError(URIException, UURI, CharSequence) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
Log a URIException from deep inside other components to the crawl's shared log.
logUriError(URIException, UURI, CharSequence) - Method in class org.archive.modules.extractor.Extractor
 
logUriError(URIException, UURI, CharSequence) - Method in interface org.archive.modules.extractor.UriErrorLoggerModule
 
LogUtils - Class in org.archive.crawler.util
Logging utils.
LogUtils() - Constructor for class org.archive.crawler.util.LogUtils
 
longestPrefixLength(ConcurrentSkipListSet<String>, String) - Method in class org.archive.modules.net.RobotsDirectives
 
LongFPSet - Interface in org.archive.util.fingerprint
Set for holding primitive long fingerprints.
LongFPSetCache - Class in org.archive.util.fingerprint
Like a MemLongFPSet, but with fixed capacity and maximum size.
LongFPSetCache() - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetCache(int, float) - Constructor for class org.archive.util.fingerprint.LongFPSetCache
 
LongFPSetTestCase - Class in org.archive.util.fingerprint
JUnit test suite for LongFPSet.
LongFPSetTestCase(String) - Constructor for class org.archive.util.fingerprint.LongFPSetTestCase
Create a new LongFPSetTestCase object
LongToIntConsistentHash - Class in org.archive.util
Simple consistent-hashing implementation: provided a long and an integer bucket-number upper-bound (exclusive), return the matching integer.
LongToIntConsistentHash() - Constructor for class org.archive.util.LongToIntConsistentHash
 
LongToIntConsistentHash(int) - Constructor for class org.archive.util.LongToIntConsistentHash
 
lookahead() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Check if there's a next by trying to read it.
lookup - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
lookup(InetAddress) - Method in interface org.archive.modules.deciderules.ExternalGeoLookupInterface
 
lookupTable(String[]) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
LowDiskPauseProcessor - Class in org.archive.crawler.postprocessor
Deprecated.
Is highly system dependent. Use DiskSpaceMonitor instead.
LowDiskPauseProcessor() - Constructor for class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
LowercaseRule - Class in org.archive.modules.canonicalize
Lowercases the URL.
LowercaseRule() - Constructor for class org.archive.modules.canonicalize.LowercaseRule
Constructor.
lpSecurityDescriptor - Variable in class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
LRU<K,V> - Class in org.archive.util
A least-recently used cache.
LRU(int) - Constructor for class org.archive.util.LRU
Constructor.
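For readers unfamiliar with the idea, a least-recently-used cache with a fixed capacity can be sketched on top of java.util.LinkedHashMap in access order. This is only an illustration: the class name LruSketch and the field maxEntries are hypothetical, and the sketch makes no claim about how org.archive.util.LRU is actually written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU cache; hypothetical, not the org.archive.util.LRU source. */
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruSketch(int maxEntries) {
        // accessOrder=true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evict the eldest entry once over capacity.
        return size() > maxEntries;
    }
}
```

A get() counts as an access, so an entry that was just read survives the next eviction.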

M

m - Variable in class org.archive.util.BloomFilter64bit
The number of bits in this filter.
main(String[]) - Static method in class org.archive.crawler.frontier.precedence.PrecedenceLoader
Utility main for importing a text file (first argument) with lines of the form: URI [whitespace] precedence into a BDB-JE environment (second argument, created if necessary).
main(String[]) - Static method in class org.archive.crawler.Heritrix
Launches a local Engine and RESTful web interface given the command-line options or defaults.
main(String[]) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
main(String[]) - Static method in class org.archive.crawler.util.BenchmarkUriUniqFilters
Test the UriUniqFilter implementation (MemUriUniqFilter, BloomUriUniqFilter, or BdbUriUniqFilter) named in first argument against the file of one-per-line URIs named in the second argument.
main(String[]) - Static method in class org.archive.crawler.util.RecoveryLogMapper
 
main(String[]) - Static method in class org.archive.io.Arc2Warc
Command-line interface to Arc2Warc.
main(String[]) - Static method in class org.archive.io.Warc2Arc
Command-line interface to Warc2Arc.
main(String[]) - Static method in class org.archive.modules.extractor.PDFParser
 
main(String[]) - Static method in class org.archive.modules.recrawl.PersistProcessor
Utility main for importing a log into a BDB-JE environment or moving a database between environments (2 arguments), or simply dumping a log to stderr in a more readable format (1 argument).
main(String[]) - Static method in class org.archive.util.Base32
For testing: take a command-line argument in Base32, decode it, print it in hex, then encode and print it again.
main(String[]) - Static method in class org.archive.util.FilesystemLinkMaker
 
main(String[]) - Static method in class org.archive.util.JndiUtils
Testing code.
main(String[]) - Static method in class org.archive.util.OneLineSimpleLogger
Test this logger.
make(long, int) - Static method in class st.ata.util.FPGenerator
Return a fingerprint generator.
makeBindings(Map<String, ExtractorMultipleRegex.MatchList>, String[], int) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
makeConsequentCandidate(String, LinkContext, Hop) - Method in class org.archive.modules.CrawlURI
Create a consequent CrawlURI from this one, given the additional parameters
makeData(String, String) - Method in class org.archive.modules.extractor.StringExtractorTestBase
 
makeDataModel() - Method in class org.archive.crawler.restlet.BeanBrowseResource
Constructs a nested Map data structure with the information represented by this Resource.
makeDataModel() - Method in class org.archive.crawler.restlet.EngineResource
Constructs a nested Map data structure with the information represented by this Resource.
makeDataModel() - Method in class org.archive.crawler.restlet.JobResource
Constructs a nested Map data structure with the information represented by this Resource.
makeDataModel() - Method in class org.archive.crawler.restlet.ScriptResource
Constructs a nested Map data structure with the information represented by this Resource.
makeDirty() - Method in class org.archive.crawler.frontier.WorkQueue
 
makeDirty() - Method in class org.archive.crawler.reporting.SeedRecord
 
makeDirty() - Method in class org.archive.modules.net.CrawlHost
 
makeDirty() - Method in class org.archive.modules.net.CrawlServer
 
makeDirty() - Method in interface org.archive.util.IdentityCacheable
 
makeDirty() - Method in class org.archive.util.IdentityCacheableWrapper
 
makeExtractor() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Subclasses should return an Extractor instance to test.
makeHardLink(String, String) - Static method in class org.archive.util.FilesystemLinkMaker
Wrapper over platform-dependent system calls to create a hard link.
makeHeritable(String) - Method in class org.archive.modules.CrawlURI
Make the given key 'heritable', meaning its value will be added to descendant CrawlURIs.
makeLongFPSet() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
makeModule() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
 
makeModule() - Method in class org.archive.state.ModuleTestBase
Return an example instance of the module.
makeNonHeritable(String) - Method in class org.archive.modules.CrawlURI
Make the given key non-'heritable', meaning its value will not be added to descendant CrawlURIs.
makeOne(String, boolean, String) - Method in class org.archive.net.UURIFactory
 
makeOne(UsableURI, UsableURI) - Method in class org.archive.net.UURIFactory
 
makePackageSuite(Class<?>) - Static method in class org.archive.util.TestUtils
 
makePresentableMapFor(String, Object) - Method in class org.archive.crawler.restlet.JobRelatedResource
Constructs a nested Map data structure of the information represented by object.
makePresentableMapFor(String, Object, String) - Method in class org.archive.crawler.restlet.JobRelatedResource
Constructs a nested Map data structure of the information represented by object.
makePresentableMapFor(String, Object, HashSet<Object>, String) - Method in class org.archive.crawler.restlet.JobRelatedResource
Constructs a nested Map data structure of the information represented by object.
makeSpace() - Method in class org.archive.util.AbstractLongFPSet
Make additional space to keep the load under the target loadFactor level.
makeSpace() - Method in class org.archive.util.fingerprint.LongFPSetCache
 
makeSpace() - Method in class org.archive.util.fingerprint.MemLongFPSet
 
makeSuite(File, File) - Static method in class org.archive.util.TestUtils
 
makeSymbolicLink(String, String) - Static method in class org.archive.util.FilesystemLinkMaker
Wrapper over platform-dependent system calls to create a symbolic link.
makeTempDir() - Static method in class org.archive.modules.net.DefaultTempDirProvider
 
makeWhoisUrl(String, String) - Method in class org.archive.modules.fetcher.FetchWhois
 
managementTasks() - Method in class org.archive.crawler.frontier.AbstractFrontier
Main loop of frontier's managerThread.
MANAGER - Static variable in class org.archive.crawler.framework.ActionDirectory
shared ScriptEngineManager
MANAGER - Static variable in class org.archive.crawler.restlet.ScriptResource
 
managerThread - Variable in class org.archive.crawler.frontier.AbstractFrontier
Distinguished frontier manager thread which handles all juggling of URI queues and queues/maps of queues for proper ordering/delay of URI processing.
MANIFEST_CONFIG_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
abbreviation label for config files in manifest
MANIFEST_LOG_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
abbreviation label for log files in manifest
MANIFEST_REPORT_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
abbreviation label for report files in manifest
map(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
Look up the crawler node name to which the given CrawlURI should be mapped.
map(CrawlURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
Look up the crawler node name to which the given CrawlURI should be mapped.
map - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
Mapping of classKey ranges (as represented by their start) to crawlers (by abstract name/filename)
map(CrawlURI) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
Look up the crawler node name to which the given CrawlURI should be mapped.
map - Variable in class org.archive.spring.Sheet
map of full property-paths (from BeanFactory to individual property) and their changed value when this Sheet of overrides is in effect
map - Variable in class org.archive.util.ObjectIdentityMemCache
 
mapPath - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
Path to map specification file.
mapString(String, String, long) - Static method in class org.archive.crawler.processor.HashCrawlMapper
 
mapUri - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
URI to map specification file.
markAsSeen(int, int) - Method in class org.archive.modules.extractor.PDFParser
Note that an object (id/generation pair) has been seen by this parser so that it can be handled differently when it is encountered again.
markPrerequisite(String) - Method in class org.archive.modules.CrawlURI
Do all actions associated with setting a CrawlURI as requiring a prerequisite.
marshal(XmlWriter, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
marshal(XmlWriter, String, Map<?, ?>) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
marshal(XmlWriter, String, Iterable<?>) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
marshal(XmlWriter, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
 
marshalAsElement(Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
test if obj has XmlRootElement annotation.
marshalBean(XmlWriter, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
generate nested XML structure for a bean obj.
marshalDocument(Writer, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
Writes content as xml to writer.
match(String, int, String, boolean, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Determines if a Cookie matches a location.
match(String, int, String, boolean, Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Deprecated.
Use match(String, int, String, boolean, SortedMap) instead.
match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Determines which of an array of Cookies matches a location.
match(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Return true if the cookie should be submitted with a request with given attributes, false otherwise.
match(String, int, String, boolean, Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Deprecated.
Use match(String, int, String, boolean, SortedMap) instead.
match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Return an array of Cookies that should be submitted with a request with the given attributes.
match(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
match(String, int, String, boolean, Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Returns an empty cookie array.
match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
MatchesFilePatternDecideRule - Class in org.archive.modules.deciderules
Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches.
MatchesFilePatternDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesFilePatternDecideRule
Usual constructor.
MatchesFilePatternDecideRule.Preset - Enum in org.archive.modules.deciderules
 
MatchesListRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexes.
MatchesListRegexDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesListRegexDecideRule
Usual constructor.
MatchesRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regex.
MatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesRegexDecideRule
Usual constructor.
MatchesStatusCodeDecideRule - Class in org.archive.modules.deciderules
Provides a rule that returns "true" for any CrawlURIs which have a fetch status code that falls within the provided inclusive range.
MatchesStatusCodeDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Creates a new MatchesStatusCodeDecideRule instance.
MAX_SIZE - Static variable in class org.archive.modules.net.Robotstxt
 
MAX_SNOOZED_IN_MEMORY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
maxBytesDownload - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
Maximum number of bytes to download.
maxDocumentsDownload - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
Maximum number of documents to download.
maxFileSizeBytes - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Max size for each file.
maxFileSizeBytes - Variable in class org.archive.modules.writer.WriterPoolProcessor
Max size of each file.
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
Returns the maximum number of different keys this policy can create.
maxPathLength - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Maximum file system path length.
maxPending - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
size at which to force flush of pending items
maxQueuesPerReportCategory - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
truncate reporting of queues at this large but not unbounded number
maxSegLength - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Maximum file system path segment length.
maxsize - Variable in class org.archive.crawler.util.TopNSet
 
maxTimeSeconds - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
Maximum amount of time to crawl (in seconds).
maxToeThreads - Variable in class org.archive.crawler.framework.CrawlController
Maximum number of threads processing URIs at the same time.
maxTotalBytesToWrite - Variable in class org.archive.modules.writer.WriterPoolProcessor
Total file bytes to write to disk.
maxWaitForIdleMs - Variable in class org.archive.modules.writer.WriterPoolProcessor
Maximum time to wait on idle writer before (possibly) creating an additional instance.
MEDIUM - Static variable in class org.archive.modules.SchedulingConstants
Medium priority.
MemFPMergeUriUniqFilter - Class in org.archive.crawler.util
Crude all-in-memory FP-merging UriUniqFilter.
MemFPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
MemLongFPSet - Class in org.archive.util.fingerprint
Open-addressing in-memory hash set for holding primitive long fingerprints.
MemLongFPSet() - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
MemLongFPSet(int, float) - Constructor for class org.archive.util.fingerprint.MemLongFPSet
 
memMap - Variable in class org.archive.util.ObjectIdentityBdbCache
in-memory map of new/recent/still-referenced-elsewhere instances
memMap - Variable in class org.archive.util.ObjectIdentityBdbManualCache
in-memory map of new/recent/still-referenced-elsewhere instances
MemUriUniqFilter - Class in org.archive.crawler.util
A purely in-memory UriUniqFilter based on a HashSet, which remembers every full URI string it sees.
MemUriUniqFilter() - Constructor for class org.archive.crawler.util.MemUriUniqFilter
 
merge(ConfigPath) - Method in class org.archive.spring.ConfigPath
To maintain ConfigPath's 'base' and object-identity, this merge should be used to update ConfigPath properties in other beans, rather than discarding the old value.
mergeDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
mergeDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
mergePrior(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
Merge any data from the Map stored in the URI-history store into the current instance.
message - Variable in class org.archive.crawler.event.CrawlStateEvent
 
message - Variable in class org.archive.crawler.restlet.Flash
the message to show, if any
META - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
META_HREF - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
metadata - Variable in class org.archive.crawler.framework.CrawlController
 
metadata - Variable in class org.archive.crawler.postprocessor.DispositionProcessor
Auto-discovered module providing configured (or overridden) User-Agent value and RobotsHonoringPolicy
metadata - Variable in class org.archive.crawler.prefetch.PreconditionEnforcer
Auto-discovered module providing configured (or overridden) User-Agent value and RobotsHonoringPolicy
metadata - Variable in class org.archive.modules.extractor.ExtractorHTML
CrawlMetadata provides the robots honoring policy to use when considering a robots META tag.
MigrateH1to3Tool - Class in org.archive.crawler.migrate
Utility class which takes a H1 order.xml and creates a similar H3 job directory, with as many simple settings converted over (as top-of-crawler-beans overrides) as possible at this time.
MigrateH1to3Tool() - Constructor for class org.archive.crawler.migrate.MigrateH1to3Tool
 
mimeTypeBytes - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
mimeTypeDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
Keep track of the file types we see (mime type -> count)
MimetypesReport - Class in org.archive.crawler.reporting
The "Mimetypes Report": tallies by MIME type.
MimetypesReport() - Constructor for class org.archive.crawler.reporting.MimetypesReport
 
MIN_ROBOTS_RETRIES - Static variable in class org.archive.modules.net.CrawlServer
only check if robots-fetch is perhaps superfluous after this many tries
MirrorWriterProcessor - Class in org.archive.modules.writer
Processor module that writes the results of successful fetches to files on disk.
MirrorWriterProcessor() - Constructor for class org.archive.modules.writer.MirrorWriterProcessor
 
ModuleTestBase - Class in org.archive.state
Base class for unit testing Module implementations.
ModuleTestBase() - Constructor for class org.archive.state.ModuleTestBase
Magical constructor that attempts to auto-create static key field descriptions for your module class.
monitorConfigPaths - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
monitorMounts - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
List of filesystem mounts whose 'available' space should be monitored via 'df' (if available).
monitorPaths - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
MostFavoredRobotsPolicy - Class in org.archive.modules.net
Follow a most-favored robots policy -- allowing a URL if either the conventionally-configured User-Agent, or any of a number of alternate User-Agents (from the candidateUserAgents list) would be allowed.
MostFavoredRobotsPolicy() - Constructor for class org.archive.modules.net.MostFavoredRobotsPolicy
 

N

NAIVE_LIKELY_URI_PATTERN - Static variable in class org.archive.util.UriUtils
 
name - Variable in class org.archive.checkpointing.Checkpoint
 
name - Variable in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
name - Variable in class org.archive.modules.forms.HTMLForm.FormInput
 
name - Variable in class org.archive.spring.ConfigPath
 
name - Variable in class org.archive.spring.HeritrixLifecycleProcessor
 
name - Variable in class org.archive.spring.Sheet
unique name of this Sheet; if Sheet has a beanName from original configuration, that is always the name -- but the name might also be another string, in the case of Sheets added after initial container wiring
namedUserAgents - Variable in class org.archive.modules.net.Robotstxt
 
NATURAL_LOG_OF_2 - Static variable in class org.archive.util.BloomFilter64bit
The natural logarithm of 2, used in the computation of the number of bits.
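For context, the standard Bloom filter sizing relation ties the bit count m to the expected element count n and the number of hash functions d by m = n * d / ln 2. The sketch below computes that value. The class name BloomSizingSketch and the method bitsFor are hypothetical, and whether BloomFilter64bit rounds or pads in exactly this way is an assumption, not something this index states.

```java
/** Illustrative Bloom filter sizing; hypothetical, not the BloomFilter64bit source. */
class BloomSizingSketch {
    static final double NATURAL_LOG_OF_2 = Math.log(2);

    /**
     * Number of bits m for which d hash functions are optimal at n expected
     * elements, using the standard relation m = n * d / ln 2 (rounded up).
     */
    static long bitsFor(long expectedElements, int hashFunctions) {
        return (long) Math.ceil(expectedElements * hashFunctions / NATURAL_LOG_OF_2);
    }
}
```

For example, 1000 expected elements with 10 hash functions gives 14427 bits, or roughly 14.4 bits per element.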
NAVLINK_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for navlink urls without other context.
needsReenqueuing(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Checks if a recently processed CrawlURI that did not finish successfully needs to be reenqueued (and thus possibly, processed again after some time elapses)
needTeardown - Variable in class org.archive.crawler.framework.CrawlJob
 
newCount - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newEngine() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
Create a new ScriptEngine instance, preloaded with any supplied source file and the variables 'self' (this ScriptedDecideRule) and 'context' (the ApplicationContext).
newEngine() - Method in class org.archive.modules.ScriptedProcessor
Create a new ScriptEngine instance, preloaded with any supplied source file and the variables 'self' (this ScriptedProcessor) and 'context' (the ApplicationContext).
newFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
 
newFpsFile - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
newInstance(Class<T>) - Method in class org.archive.bdb.AutoKryo
 
newline() - Method in class org.archive.util.PaddingStringBuffer
Forces a new line in the buffer.
next() - Method in interface org.archive.crawler.framework.Frontier
Get the next URI that should be processed.
next() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
next() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
Return the next item.
next() - Method in class org.archive.util.iterator.CompositeIterator
 
nextCheckpointNumber - Variable in class org.archive.crawler.framework.CheckpointService
Next overall series checkpoint number
nextdrop - Static variable in class org.archive.crawler.restlet.Flash
 
nextFlushAllowableAfter - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
time-based throttle on flush-merge operations
nextLong() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
nextOrdinal - Variable in class org.archive.crawler.frontier.AbstractFrontier
ordinal numbers to assign to created CrawlURIs
nextSearch() - Method in class org.archive.surt.SURTTokenizer
update internal state and return the next smaller search string for the url
nextSerialNumber - Variable in class org.archive.crawler.framework.ToePool
 
nLength - Variable in class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
 
NO_DIRECTIVES - Static variable in class org.archive.modules.net.Robotstxt
 
NO_ROBOTS - Static variable in class org.archive.modules.net.Robotstxt
empty, reusable instance for all sites providing no rules
NonFatalErrorFormatter - Class in org.archive.crawler.io
 
NonFatalErrorFormatter(boolean) - Constructor for class org.archive.crawler.io.NonFatalErrorFormatter
 
nonfatalErrorsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
nonseedLine(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
Do nothing with non-seed lines
nonseedLine(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
Do nothing with nonseed lines.
nonseedLine(String) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Consider nonseed lines as possible SURT prefix directives.
nonseedLine(String) - Method in interface org.archive.modules.seeds.SeedListener
 
nonseedLine(String) - Method in class org.archive.modules.seeds.TextSeedModule
Handle a read line that is not a seed, but may still have meaning to seed-consumers (such as scoping beans).
NoopUriUniqFilter - Class in org.archive.crawler.util
A UriUniqFilter that doesn't actually provide any uniqueness filter on presented items: all are passed through.
NoopUriUniqFilter() - Constructor for class org.archive.crawler.util.NoopUriUniqFilter
 
NORMAL - Static variable in class org.archive.modules.SchedulingConstants
Normal/low priority.
note(String) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Note item as seen, without passing through to receiver.
note(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
note(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
noteAboutToEmit(CrawlURI, WorkQueue) - Method in class org.archive.crawler.frontier.AbstractFrontier
Perform fixups on a CrawlURI about to be returned via next().
noteAccess(long) - Method in class org.archive.util.fingerprint.LongFPSetCache
 
noteDeactivated() - Method in class org.archive.crawler.frontier.WorkQueue
Update queue state to recognize it has been sent to one of the inactive (by-precedence) queues, waiting for a turn.
noteError(int) - Method in class org.archive.crawler.frontier.WorkQueue
Note an error and assess an extra penalty.
noteExhausted() - Method in class org.archive.crawler.frontier.WorkQueue
Update queue state to recognize it has been completely exhausted, and is no longer on any of the ready/inactive queues-of-queues
noteFrontierState(Frontier.State) - Method in class org.archive.crawler.framework.CrawlController
Receive notification from the frontier, in the frontier's own manager thread, that the frontier has reached a new state.
noteLine() - Method in class org.archive.io.CrawlerJournal
Count and note a line
noteStart() - Method in class org.archive.crawler.reporting.StatisticsTracker
Notify tracker that crawl has begun.
NotMatchesFilePatternDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regex.
NotMatchesFilePatternDecideRule() - Constructor for class org.archive.modules.deciderules.NotMatchesFilePatternDecideRule
Usual constructor.
NotMatchesListRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regex.
NotMatchesListRegexDecideRule() - Constructor for class org.archive.modules.deciderules.NotMatchesListRegexDecideRule
Usual constructor.
NotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
Rule applies configured decision to any URIs which do *not* match the supplied regex.
NotMatchesRegexDecideRule(String) - Constructor for class org.archive.modules.deciderules.NotMatchesRegexDecideRule
Usual constructor.
NotMatchesStatusCodeDecideRule - Class in org.archive.modules.deciderules
Provides a rule that returns "true" for any CrawlURIs which have a fetch status code that does not fall within the provided inclusive range.
NotMatchesStatusCodeDecideRule() - Constructor for class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Creates a new NotMatchesStatusCodeDecideRule instance.
NOTMODIFIED - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
notModifiedBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
NOTMODIFIEDCOUNT - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
notModifiedUrls - Variable in class org.archive.modules.fetcher.FetchStats
 
NotOnDomainsDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set.
NotOnDomainsDecideRule() - Constructor for class org.archive.modules.deciderules.surt.NotOnDomainsDecideRule
Usual constructor.
NotOnHostsDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set.
NotOnHostsDecideRule() - Constructor for class org.archive.modules.deciderules.surt.NotOnHostsDecideRule
Usual constructor.
NotSurtPrefixedDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set.
NotSurtPrefixedDecideRule() - Constructor for class org.archive.modules.deciderules.surt.NotSurtPrefixedDecideRule
Usual constructor.
NOVEL - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
novelBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
NOVELCOUNT - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
 
novelUrls - Variable in class org.archive.modules.fetcher.FetchStats
 
NUMBER_OF_WEIGHTS - Static variable in class org.archive.util.BloomFilter64bit
The number of weights used to create hash functions.
numberOfCURIsHandled - Variable in class org.archive.modules.extractor.ExtractorJS
 
numberOfCURIsHandled - Variable in class org.archive.modules.extractor.TrapSuppressExtractor
 
numberOfCURIsSuppressed - Variable in class org.archive.modules.extractor.TrapSuppressExtractor
 
numberOfFormsProcessed - Variable in class org.archive.modules.extractor.JerichoExtractorHTML
 
numberOfLinksExtracted - Variable in class org.archive.modules.extractor.Extractor
 
numReplicas - Variable in class org.archive.util.LongToIntConsistentHash
 

O

obeyMetaRobotsNofollow - Variable in class org.archive.modules.net.CustomRobotsPolicy
whether to obey the 'nofollow' directive in an HTML META ROBOTS element
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.CustomRobotsPolicy
 
obeyMetaRobotsNofollow - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
whether to obey the 'nofollow' directive in an HTML META ROBOTS element
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.IgnoreRobotsPolicy
 
obeyMetaRobotsNofollow - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
whether to obey the 'nofollow' directive in an HTML META ROBOTS element
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.ObeyRobotsPolicy
 
obeyMetaRobotsNofollow() - Method in class org.archive.modules.net.RobotsPolicy
 
ObeyRobotsPolicy - Class in org.archive.modules.net
Classic obey-robots-as-declared policy.
ObeyRobotsPolicy() - Constructor for class org.archive.modules.net.ObeyRobotsPolicy
 
object - Variable in class org.archive.net.s3.S3URLConnection
 
ObjectIdentityBdbCache<V extends IdentityCacheable> - Class in org.archive.util
A BDB JE backed object cache.
ObjectIdentityBdbCache() - Constructor for class org.archive.util.ObjectIdentityBdbCache
Constructor.
ObjectIdentityBdbCache.LowMemoryCanary - Class in org.archive.util
 
ObjectIdentityBdbCache.LowMemoryCanary() - Constructor for class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
 
ObjectIdentityBdbManualCache<V extends IdentityCacheable> - Class in org.archive.util
A BDB JE backed object cache.
ObjectIdentityBdbManualCache() - Constructor for class org.archive.util.ObjectIdentityBdbManualCache
Constructor.
ObjectIdentityCache<V extends IdentityCacheable> - Interface in org.archive.util
An object cache for create-once-by-name-and-then-reuse objects.
ObjectIdentityMemCache<V extends IdentityCacheable> - Class in org.archive.util
Trivial all-in-memory object cache, using a single internal ConcurrentHashMap.
ObjectIdentityMemCache() - Constructor for class org.archive.util.ObjectIdentityMemCache
 
ObjectIdentityMemCache(int, float, int) - Constructor for class org.archive.util.ObjectIdentityMemCache
 
objectToEntry(K, DatabaseEntry) - Method in class org.archive.bdb.KryoBinding
Copies superclass simply to allow different source for FastOutputStream.
objectToEntry(Object, DatabaseEntry) - Method in class org.archive.crawler.frontier.RecyclingSerialBinding
Copies superclass simply to allow different source for FastOutputStream.
obtainReader() - Method in class org.archive.modules.seeds.TextSeedModule
 
obtainReader() - Method in class org.archive.spring.ConfigFile
 
obtainReader() - Method in class org.archive.spring.ConfigString
 
obtainWriter() - Method in class org.archive.spring.ConfigFile
 
obtainWriter(boolean) - Method in class org.archive.spring.ConfigFile
 
obtainWriter() - Method in interface org.archive.spring.WriteTarget
Obtain a Writer for changing this object's contents.
obtainWriter(boolean) - Method in interface org.archive.spring.WriteTarget
Obtain a Writer for changing this object's contents.
offer(E) - Method in class org.archive.bdb.StoredQueue
 
oldFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.framework.CrawlJob
Log note of all ApplicationEvents.
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Checks available space on StatSnapshotEvents.
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
onApplicationEvent(ApplicationEvent) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Ensure all sheets are 'primed' after the entire ApplicationContext is assembled.
onApplicationEvent(ApplicationEvent) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
onApplicationEvent(ApplicationEvent) - Method in class org.archive.spring.ConfigPathConfigurer
Fix all beans with ConfigPath properties that lack a base path or a name, to use a job-implied base path and name.
OnDomainsDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set.
OnDomainsDecideRule() - Constructor for class org.archive.modules.deciderules.surt.OnDomainsDecideRule
Usual constructor.
OneLineSimpleLogger - Class in org.archive.util
Logger that writes entry on one line with less verbose date.
OneLineSimpleLogger() - Constructor for class org.archive.util.OneLineSimpleLogger
 
onEviction(String, V) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
OnHostsDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set.
OnHostsDecideRule() - Constructor for class org.archive.modules.deciderules.surt.OnHostsDecideRule
Usual constructor.
onlyDecision(CrawlURI) - Method in class org.archive.modules.deciderules.AcceptDecideRule
 
onlyDecision(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
 
onlyDecision(CrawlURI) - Method in class org.archive.modules.deciderules.RejectDecideRule
 
onlyStoreIfWriteTagPresent - Variable in class org.archive.modules.recrawl.AbstractPersistProcessor
 
onRefresh() - Method in class org.archive.spring.HeritrixLifecycleProcessor
 
open() - Method in class org.apache.commons.httpclient.HttpConnection
Establishes a connection to the specified host and port (via a proxy if specified).
open(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
open() - Method in interface org.archive.util.ms.Entry
 
openConnection(URL) - Method in class org.archive.net.s3.Handler
 
openDatabase(String, BdbModule.BdbConfig, boolean) - Method in class org.archive.bdb.BdbModule
Open a Database inside this BdbModule's environment, and remember it for automatic close-at-module-stop.
openDatabase(Environment, String) - Method in class org.archive.util.ObjectIdentityBdbCache
 
openDatabase(Environment, String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
openDataConnection(int, String) - Method in class org.archive.net.ClientFTP
Opens a data connection.
operator - Variable in class org.archive.modules.CrawlMetadata
 
order - Variable in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
orderProperties(PropertyDescriptor[], String[]) - Static method in class org.archive.crawler.restlet.XmlMarshaller
sort PropertyDescriptors according to propOrder.
ordinal - Variable in class org.archive.modules.CrawlURI
Monotonically increasing number within a crawl; useful for tending towards breadth-first ordering.
org.apache.commons.httpclient - package org.apache.commons.httpclient
 
org.apache.commons.httpclient.cookie - package org.apache.commons.httpclient.cookie
 
org.apache.commons.httpclient.util - package org.apache.commons.httpclient.util
 
org.archive.bdb - package org.archive.bdb
 
org.archive.checkpointing - package org.archive.checkpointing
 
org.archive.crawler - package org.archive.crawler
Introduction to Heritrix.
org.archive.crawler.datamodel - package org.archive.crawler.datamodel
 
org.archive.crawler.deciderules - package org.archive.crawler.deciderules
Provides classes for a simple decision rules framework.
org.archive.crawler.event - package org.archive.crawler.event
 
org.archive.crawler.framework - package org.archive.crawler.framework
 
org.archive.crawler.frontier - package org.archive.crawler.frontier
 
org.archive.crawler.frontier.precedence - package org.archive.crawler.frontier.precedence
 
org.archive.crawler.io - package org.archive.crawler.io
 
org.archive.crawler.migrate - package org.archive.crawler.migrate
 
org.archive.crawler.monitor - package org.archive.crawler.monitor
This package consists of modules that monitor an ongoing crawl by various means, typically interceding if certain limits/thresholds/conditions are met.
org.archive.crawler.postprocessor - package org.archive.crawler.postprocessor
 
org.archive.crawler.prefetch - package org.archive.crawler.prefetch
 
org.archive.crawler.processor - package org.archive.crawler.processor
 
org.archive.crawler.reporting - package org.archive.crawler.reporting
 
org.archive.crawler.restlet - package org.archive.crawler.restlet
 
org.archive.crawler.restlet.models - package org.archive.crawler.restlet.models
 
org.archive.crawler.spring - package org.archive.crawler.spring
 
org.archive.crawler.util - package org.archive.crawler.util
 
org.archive.io - package org.archive.io
 
org.archive.modules - package org.archive.modules
The beginnings of a refactored settings framework.
org.archive.modules.canonicalize - package org.archive.modules.canonicalize
 
org.archive.modules.credential - package org.archive.modules.credential
Contains HTML form login and basic and digest credentials used by Heritrix when logging into sites.
org.archive.modules.deciderules - package org.archive.modules.deciderules
 
org.archive.modules.deciderules.recrawl - package org.archive.modules.deciderules.recrawl
 
org.archive.modules.deciderules.surt - package org.archive.modules.deciderules.surt
 
org.archive.modules.extractor - package org.archive.modules.extractor
 
org.archive.modules.fetcher - package org.archive.modules.fetcher
 
org.archive.modules.forms - package org.archive.modules.forms
 
org.archive.modules.net - package org.archive.modules.net
 
org.archive.modules.recrawl - package org.archive.modules.recrawl
 
org.archive.modules.seeds - package org.archive.modules.seeds
 
org.archive.modules.writer - package org.archive.modules.writer
 
org.archive.net - package org.archive.net
 
org.archive.net.s3 - package org.archive.net.s3
 
org.archive.spring - package org.archive.spring
 
org.archive.state - package org.archive.state
 
org.archive.surt - package org.archive.surt
 
org.archive.util - package org.archive.util
 
org.archive.util.bdbje - package org.archive.util.bdbje
 
org.archive.util.fingerprint - package org.archive.util.fingerprint
 
org.archive.util.iterator - package org.archive.util.iterator
 
org.archive.util.ms - package org.archive.util.ms
Memory-efficient reading of .doc files.
organization - Variable in class org.archive.modules.CrawlMetadata
 
out - Variable in class org.archive.io.CrawlerJournal
Stream on which we record frontier events.
outboundLock - Variable in class org.archive.crawler.frontier.AbstractFrontier
lock to allow holding all worker ToeThreads from taking URIs already on the outbound queue; they acquire read permission before take()ing; frontier can acquire write permission to hold threads
outCandidates - Variable in class org.archive.modules.CrawlURI
 
outlinkRule - Variable in class org.archive.crawler.processor.CrawlMapper
Decide rules to determine if an outlink is subject to mapping.
outLinks - Variable in class org.archive.modules.CrawlURI
All discovered outbound Links (navlinks, embeds, etc.). Can contain Link instances, CrawlURI instances, or both.
outOfScope(CrawlURI) - Method in class org.archive.crawler.framework.Scoper
Called when a CrawlURI is ruled out of scope.
outOfScope(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
outOfScope(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
Called when a CrawlURI is ruled out of scope.
OverlayContext - Interface in org.archive.spring
Interface for objects that can contribute 'overlays' to replace the usual values in configured objects.
overlayMapsSource - Variable in class org.archive.modules.CrawlURI
 
OverlayMapsSource - Interface in org.archive.spring
Interface for a source of overlay maps by name.
overlayNames - Variable in class org.archive.modules.CrawlURI
 
overMaxRetries(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
overridesActiveFrom(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
 

P

PaddingStringBuffer - Class in org.archive.util
StringBuffer-like utility which can add spaces to reach a certain column.
PaddingStringBuffer() - Constructor for class org.archive.util.PaddingStringBuffer
Create a new PaddingStringBuffer
padTo(int) - Method in class org.archive.util.PaddingStringBuffer
Pad to a given column.
PagedRepresentation - Class in org.archive.crawler.restlet
Representation wrapping a FileRepresentation, displaying its contents in batches of lines at a time, with forward and backward navigation.
PagedRepresentation(FileRepresentation, EnhDirectoryResource, String, String, String) - Constructor for class org.archive.crawler.restlet.PagedRepresentation
 
pageFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
 
pageOutStaleEntries() - Method in class org.archive.util.ObjectIdentityBdbCache
An incremental, poll-based expunger.
paintValue(Graphics, Rectangle) - Method in class org.archive.io.ReadSourceEditor
 
paintValue(Graphics, Rectangle) - Method in class org.archive.spring.ConfigPathEditor
 
parse(String, int, String, boolean, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Parse the "Set-Cookie" header value into Cookie array.
parse(String, int, String, boolean, Header) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Parse the "Set-Cookie" Header into an array of Cookies.
parse(String, int, String, boolean, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Parses the Set-Cookie value into an array of Cookies.
parse(String, int, String, boolean, Header) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Parse the "Set-Cookie" Header into an array of Cookies.
parse(String, int, String, boolean, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Returns an empty cookie array.
parse(String, int, String, boolean, Header) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Returns an empty cookie array.
parseAttribute(NameValuePair, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Parse the cookie attribute and update the corresponding Cookie properties.
parseAttribute(NameValuePair, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Parse the cookie attribute and update the corresponding Cookie properties.
parseAttribute(NameValuePair, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Does nothing.
parseDefineBits(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineBitsJPEG3(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineBitsLossless(InStream, int, boolean) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineButtonSound(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineFont(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineFont2(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineJPEG2(InStream, int) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineJPEGTables(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineShape(int, InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineSound(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseDefineSprite(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseFontInfo(InStream, int, boolean) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parsePlaceObject2(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
 
parseRevision(String) - Static method in class org.archive.io.Warc2Arc
 
password - Variable in class org.archive.modules.credential.HttpAuthenticationCredential
Password.
path - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
path - Variable in class org.archive.modules.writer.Kw3WriterProcessor
Top-level directory for archive files.
path - Variable in class org.archive.modules.writer.MirrorWriterProcessor
Top-level directory for mirror files.
path - Variable in class org.archive.spring.ConfigPath
 
path - Variable in class org.archive.spring.ConfigPathConfigurer
'home' directory for all other paths to be resolved relative to; defaults to directory of primary XML config file
PATH_DELIM - Static variable in interface org.apache.commons.httpclient.cookie.CookieSpec
Path delimiter
PATH_DELIM_CHAR - Static variable in interface org.apache.commons.httpclient.cookie.CookieSpec
Path delimiting character
pathMatch(String, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Performs path-match as defined by the cookie specification.
pathMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Performs path-match as implemented in common browsers.
pathMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
 
PathologicalPathDecideRule - Class in org.archive.modules.deciderules
Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (eg http://example.com/a/a/a/boo.html == 3 '/a' segments)
PathologicalPathDecideRule() - Constructor for class org.archive.modules.deciderules.PathologicalPathDecideRule
Constructs a new PathologicalPathFilter.
PathSharingContext - Class in org.archive.spring
Spring ApplicationContext extended for Heritrix use.
PathSharingContext(String) - Constructor for class org.archive.spring.PathSharingContext
 
PathSharingContext(String[], ApplicationContext) - Constructor for class org.archive.spring.PathSharingContext
 
PathSharingContext(String[], boolean, ApplicationContext) - Constructor for class org.archive.spring.PathSharingContext
 
PathSharingContext(String[], boolean) - Constructor for class org.archive.spring.PathSharingContext
 
PathSharingContext(String[]) - Constructor for class org.archive.spring.PathSharingContext
 
pause() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should not release any URIs, instead holding all threads, until instructed otherwise.
pause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
pauseAtStart - Variable in class org.archive.crawler.framework.CrawlController
whether to pause at crawl start
pauseThresholdKb - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
When available space on any monitored mounts falls below this threshold, the crawl will be paused.
pauseThresholdMiB - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
 
PDFParser - Class in org.archive.modules.extractor
Supports PDF parsing operations.
PDFParser(String) - Constructor for class org.archive.modules.extractor.PDFParser
 
PDFParser(byte[]) - Constructor for class org.archive.modules.extractor.PDFParser
 
peek() - Method in class org.archive.bdb.StoredQueue
 
peek(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Return the topmost queue item -- and remember it, such that even later higher-priority inserts don't change it.
peekItem - Variable in class org.archive.bdb.StoredQueue
 
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.BdbWorkQueue
 
peekItem - Variable in class org.archive.crawler.frontier.WorkQueue
The next item to be returned
peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
Returns first item from queue (does not delete)
pend(long, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
Place the given FP/CrawlURI pair into the pending set, awaiting a merge to determine if it's actually accepted.
pendDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pendDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Count of items added, but not yet filtered in or out.
pending() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
pending() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
pendingSet - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
items awaiting merge. TODO: consider only sorting just pre-merge. TODO: consider using a fastutil long->Object class. TODO: consider actually writing items to disk file, as in Najork/Heydon.
pendingUris - Variable in class org.archive.crawler.frontier.BdbFrontier
all URIs scheduled to be crawled
percentOfDiscoveredUrisCompleted() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
This returns the number of completed URIs as a percentage of the total number of URIs encountered (should be inverse to the discovery curve)
persistKeyFor(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractContentDigestHistory
 
persistKeyFor(CrawlURI) - Static method in class org.archive.modules.recrawl.PersistProcessor
Return a preferred String key for persisting the given CrawlURI's AList state.
persistKeyFor(String) - Static method in class org.archive.modules.recrawl.PersistProcessor
 
PersistLoadProcessor - Class in org.archive.modules.recrawl
Loads CrawlURI attributes from previous fetch from persistent storage for consultation by a later recrawl.
PersistLoadProcessor() - Constructor for class org.archive.modules.recrawl.PersistLoadProcessor
 
PersistLogProcessor - Class in org.archive.modules.recrawl
Log CrawlURI attributes from latest fetch for consultation by a later recrawl.
PersistLogProcessor() - Constructor for class org.archive.modules.recrawl.PersistLogProcessor
 
PersistOnlineProcessor - Class in org.archive.modules.recrawl
Common superclass for persisting Processors which directly store/load to persistence (as opposed to logging for batch load later).
PersistOnlineProcessor() - Constructor for class org.archive.modules.recrawl.PersistOnlineProcessor
 
PersistProcessor - Class in org.archive.modules.recrawl
Superclass for Processors which utilize BDB-JE for URI state (including most notably history) persistence.
PersistProcessor() - Constructor for class org.archive.modules.recrawl.PersistProcessor
 
PersistStoreProcessor - Class in org.archive.modules.recrawl
Store CrawlURI attributes from latest fetch to persistent storage for consultation by a later recrawl.
PersistStoreProcessor() - Constructor for class org.archive.modules.recrawl.PersistStoreProcessor
 
Piece - Class in org.archive.util.ms
 
Piece(int, int, int, boolean) - Constructor for class org.archive.util.ms.Piece
 
politenessDelay - Variable in class org.archive.modules.CrawlURI
 
politenessDelayFor(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
Update any scheduling structures with the new information in this CrawlURI.
poll() - Method in class org.archive.bdb.StoredQueue
 
polynomial - Variable in class st.ata.util.FPGenerator
The polynomial used by this to generate fingerprints.
polynomials - Static variable in class st.ata.util.FPGenerator
Array of irreducible polynomials.
poolMaxActive - Variable in class org.archive.modules.writer.WriterPoolProcessor
Maximum active files in pool.
popOverridesContext() - Static method in class org.archive.spring.KeyedProperties
Remove last-added override map from the stack
populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.Credential
 
populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.HtmlFormCredential
 
populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
populatePersistEnv(String, File) - Static method in class org.archive.modules.recrawl.PersistProcessor
Populates a new environment db from an old environment db or a persist log.
position - Variable in class org.archive.crawler.restlet.PagedRepresentation
position in file around which to fetch lines
position() - Method in class org.archive.util.ms.BlockInputStream
 
position(long) - Method in class org.archive.util.ms.BlockInputStream
 
postProcessAfterInitialization(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
Remember all beans for later fixup.
postProcessBeforeInitialization(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
 
power - Variable in class org.archive.util.BloomFilter64bit
if bitfield is an exact power of 2 in length, it is this power
precedence - Variable in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
 
precedenceFloor - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
precedence rank at or below which queues are not crawled
PrecedenceLoader - Class in org.archive.crawler.frontier.precedence
Utility class for loading externally-created URI-precedence values into the URI-history database.
PrecedenceLoader() - Constructor for class org.archive.crawler.frontier.precedence.PrecedenceLoader
 
PrecedenceProvider - Class in org.archive.crawler.frontier.precedence
Parent class for precedence-providers, stateful helpers that can be installed in a WorkQueue to implement various queue-precedence policies.
PrecedenceProvider() - Constructor for class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
precedenceProvider - Variable in class org.archive.crawler.frontier.WorkQueue
assigned precedence
PreconditionEnforcer - Class in org.archive.crawler.prefetch
Ensures the preconditions for a fetch -- such as DNS lookup or acquiring and respecting a robots.txt policy -- are satisfied before a URI is passed to subsequent stages.
PreconditionEnforcer() - Constructor for class org.archive.crawler.prefetch.PreconditionEnforcer
 
PredicatedDecideRule - Class in org.archive.modules.deciderules
Rule which applies the configured decision only if a test evaluates to true.
PredicatedDecideRule() - Constructor for class org.archive.modules.deciderules.PredicatedDecideRule
 
PREEMPTIVE_DEFAULT - Static variable in class org.apache.commons.httpclient.HttpState
Deprecated.
This field and feature will be removed following HttpClient 3.0.
PREEMPTIVE_PROPERTY - Static variable in class org.apache.commons.httpclient.HttpState
Deprecated.
This field and feature will be removed following HttpClient 3.0.
prefix - Variable in class org.archive.modules.writer.WriterPoolProcessor
File prefix.
PrefixFinder - Class in org.archive.util
Utility class for extracting prefixes of a given string from a SortedMap.
PrefixFinder() - Constructor for class org.archive.util.PrefixFinder
 
prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.OnDomainsDecideRule
 
prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.OnHostsDecideRule
 
prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
prefixKey(String) - Static method in class org.archive.surt.SURTTokenizer
 
preformat(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
 
PreloadedUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
UriPrecedencePolicy which assigns URIs a precedence from a value that was preloaded for them into the uri-history database.
PreloadedUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
preloadSource - Variable in class org.archive.modules.recrawl.PersistLoadProcessor
A source (either log file or BDB directory) from which to copy history information into the current store at startup.
preloadSourceUrl - Variable in class org.archive.modules.recrawl.PersistLoadProcessor
A log file source url from which to copy history information into the current store at startup.
prepare(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
Apply all configured policies to CrawlURI
prepareMap() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
prepareMap() - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
prepareMap() - Method in class org.archive.modules.fetcher.SimpleCookieStorage
 
preparer - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
prepForFrontier(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
PREREQ_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for prerequisite urls without other context.
PrerequisiteAcceptDecideRule - Class in org.archive.modules.deciderules
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position).
PrerequisiteAcceptDecideRule() - Constructor for class org.archive.modules.deciderules.PrerequisiteAcceptDecideRule
 
Preselector - Class in org.archive.crawler.prefetch
If set to recheck the crawl's scope, gives a yes/no on whether a CrawlURI should be processed at all.
Preselector() - Constructor for class org.archive.crawler.prefetch.Preselector
Constructor.
primaryConfig - Variable in class org.archive.crawler.framework.CrawlJob
 
prime() - Method in class org.archive.spring.Sheet
Ensure any properties targeted by this Sheet know to check the right property paths for overrides at lookup time, and that the override values are compatible types for their destination properties.
print(String) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnection.print(String, String). Writes the specified String (as bytes) to the output stream.
print(String, String) - Method in class org.apache.commons.httpclient.HttpConnection
Writes the specified String (as bytes) to the output stream.
printHelp() - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
printLine(String) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnection.printLine(String, String). Writes the specified String (as bytes), followed by "\r\n".getBytes() to the output stream.
printLine(String, String) - Method in class org.apache.commons.httpclient.HttpConnection
Writes the specified String (as bytes), followed by "\r\n".getBytes() to the output stream.
printLine() - Method in class org.apache.commons.httpclient.HttpConnection
Writes "\r\n".getBytes() to the output stream.
PROCEED - Static variable in class org.archive.modules.ProcessResult
 
process(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
 
process(CrawlURI) - Method in class org.archive.modules.Processor
Processes the given URI.
process(CrawlURI, ProcessorChain.ChainStatusReceiver) - Method in class org.archive.modules.ProcessorChain
 
processedSeedsRecords - Variable in class org.archive.crawler.reporting.StatisticsTracker
Record of seeds and latest results
processEmbed(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
 
processEmbed(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
 
processFinish(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Handle the given CrawlURI as having finished a worker ToeThread processing attempt.
processFinish(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Note that the previously emitted CrawlURI has completed its processing (for now).
processForm(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
processGeneralTag(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
 
processGeneralTag(CrawlURI, Element, Attributes) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
processingCleanup() - Method in class org.archive.modules.CrawlURI
Clean up after a run through the processing chain.
processLink(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
Handle generic HREF cases.
processMeta(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
Process metadata tags.
processMeta(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
Processor - Class in org.archive.modules
A processor of URIs.
Processor() - Constructor for class org.archive.modules.Processor
 
ProcessorChain - Class in org.archive.modules
Collection of Processors to run.
ProcessorChain() - Constructor for class org.archive.modules.ProcessorChain
 
ProcessorChain.ChainStatusReceiver - Interface in org.archive.modules
 
ProcessorsReport - Class in org.archive.crawler.reporting
The "Processors Report", delegated through the CrawlController to each Processor to dump whatever information it collects for this purpose.
ProcessorsReport() - Constructor for class org.archive.crawler.reporting.ProcessorsReport
 
ProcessorTestBase - Class in org.archive.modules
Unit test for Processor.
ProcessorTestBase() - Constructor for class org.archive.modules.ProcessorTestBase
 
processResponseBody(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
This method is invoked immediately after HttpMethodBase.readResponseBody(HttpState,HttpConnection) and can be overridden by sub-classes in order to provide custom body processing.
processResponseHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
This method is invoked immediately after HttpMethodBase.readResponseHeaders(HttpState,HttpConnection) and can be overridden by sub-classes in order to provide custom response headers processing.
ProcessResult - Class in org.archive.modules
Returned by a Processor's process method to indicate the status of the process.
ProcessResult.ProcessStatus - Enum in org.archive.modules
 
processScheduleAlways(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Schedule the given CrawlURI regardless of its already-seen status.
processScheduleAlways(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accept the given CrawlURI for scheduling, as it has passed the alreadyIncluded filter.
processScheduleIfUnique(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Schedule the given CrawlURI if not already-seen.
processScheduleIfUnique(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Arrange for the given CrawlURI to be visited, if it is not already scheduled/completed.
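The difference between the processScheduleIfUnique and processScheduleAlways entries above can be illustrated with a minimal sketch. It assumes a plain in-memory set as the already-seen filter and a simple queue of URI strings; the class ScheduleSketch and its methods are hypothetical and are not the Heritrix implementation.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for a frontier; names are illustrative only.
class ScheduleSketch {
    private final Set<String> alreadySeen = new HashSet<>();
    private final Queue<String> pending = new ArrayDeque<>();

    // Enqueue only if the URI has not been seen before; returns true if enqueued.
    boolean scheduleIfUnique(String uri) {
        if (!alreadySeen.add(uri)) {
            return false; // already scheduled or completed
        }
        pending.add(uri);
        return true;
    }

    // Enqueue regardless of already-seen status, still recording the URI as seen.
    void scheduleAlways(String uri) {
        alreadySeen.add(uri);
        pending.add(uri);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ScheduleSketch s = new ScheduleSketch();
        s.scheduleIfUnique("http://example.com/");
        s.scheduleIfUnique("http://example.com/"); // ignored as a duplicate
        s.scheduleAlways("http://example.com/");   // forced re-enqueue
        System.out.println(s.pendingCount());
    }
}
```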
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.AggressiveExtractorHTML
 
processScript(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.ExtractorHTML
 
processScript(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
processScriptCode(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
Extract the (java)script source in the given CharSequence.
processStatusLine(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
This method is invoked immediately after HttpMethodBase.readStatusLine(HttpState,HttpConnection) and can be overridden by sub-classes in order to provide custom response status line processing.
processStyle(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.ExtractorHTML
Process style text.
processStyle(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
processStyleCode(Extractor, CrawlURI, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorCSS
 
processXml(Extractor, CrawlURI, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorXML
 
profileCxmlPath - Variable in class org.archive.crawler.framework.Engine
 
profileLog - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
profileLog - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
profileLog(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
progressLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
progressStatisticsEvent() - Method in class org.archive.crawler.reporting.StatisticsTracker
A method for logging current crawler state.
progressStatisticsLegend(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
progressStatisticsLegend() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
progressStatisticsLine(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
propertyName - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
protocolCommandSent(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
 
protocolReplyReceived(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
 
publish(LogRecord) - Method in class org.archive.crawler.reporting.AlertHandler
Pass record to AlertThreadGroup.
publish(LogRecord) - Method in class org.archive.crawler.reporting.AlertThreadGroup
Pass a record to all loggers registered with the AlertThreadGroup.
publishAddedSeed(CrawlURI) - Method in class org.archive.modules.seeds.SeedModule
 
publishConcludedSeedBatch() - Method in class org.archive.modules.seeds.SeedModule
 
publishCurrent(LogRecord) - Static method in class org.archive.crawler.reporting.AlertThreadGroup
 
publishNonSeedLine(String) - Method in class org.archive.modules.seeds.SeedModule
 
purgeExpiredCookies() - Method in class org.apache.commons.httpclient.HttpState
Removes all cookies in this HTTP state that have expired according to the current system time.
purgeExpiredCookies(Date) - Method in class org.apache.commons.httpclient.HttpState
Removes all cookies in this HTTP state that have expired by the specified date.
push(String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
 
pushOverrideContext(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
Add an override map to the stack
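One way to picture a stack of override maps is the sketch below, in which a lookup consults the most recently pushed map first and falls back to defaults. The class OverrideStackSketch and all of its members are hypothetical and chosen only for illustration; the actual KeyedProperties behavior may differ.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of layered overrides; not the KeyedProperties API.
class OverrideStackSketch {
    private final Map<String, Object> defaults = new HashMap<>();
    private final Deque<Map<String, Object>> overrides = new ArrayDeque<>();

    void setDefault(String key, Object value) {
        defaults.put(key, value);
    }

    // Add an override map on top of the stack.
    void push(Map<String, Object> overrideMap) {
        overrides.push(overrideMap);
    }

    // Remove the most recently pushed override map.
    void pop() {
        overrides.pop();
    }

    // ArrayDeque iterates from the most recently pushed element to the oldest.
    Object get(String key) {
        for (Map<String, Object> m : overrides) {
            if (m.containsKey(key)) {
                return m.get(key);
            }
        }
        return defaults.get(key);
    }

    public static void main(String[] args) {
        OverrideStackSketch props = new OverrideStackSketch();
        props.setDefault("delaySeconds", 5);
        Map<String, Object> sheet = new HashMap<>();
        sheet.put("delaySeconds", 10);
        props.push(sheet);
        System.out.println(props.get("delaySeconds")); // overridden value
        props.pop();
        System.out.println(props.get("delaySeconds")); // default value
    }
}
```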
put(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Put the given CrawlURI in at the appropriate place.
putAllAtomicLongs(Map<String, AtomicLong>, JSONObject) - Static method in class org.archive.util.JSONUtils
 
putAllLongs(Map<String, Long>, JSONObject) - Static method in class org.archive.util.JSONUtils
 
putSheetOverlay(String, String, Object) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Add to named sheet an overlay of the given bean-path and new value.

Q

QNV - Static variable in class org.archive.util.UriUtils
 
qualifyRecordID(URI, String, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
QueueAssignmentPolicy - Class in org.archive.crawler.frontier
Establishes a mapping from CrawlURIs to String keys (queue names).
QueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.QueueAssignmentPolicy
 
queueCreated(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
queueCreated(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.QueuePrecedencePolicy
Set an appropriate initial precedence value on the given newly-created WorkQueue.
queueDb - Variable in class org.archive.bdb.StoredQueue
 
queuedUriCount() - Method in interface org.archive.crawler.framework.Frontier
Number of URIs queued up and waiting for processing.
queuedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
total URIs queued to be visited
queuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
(non-Javadoc)
queuedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
queueMap - Variable in class org.archive.bdb.StoredQueue
 
QueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
Superclass for QueuePrecedencePolicies, which set an integer precedence value on uri-queues inside the frontier when the uri-queue is first created, and before the uri-queue is placed on a new internal queue-of-queues.
QueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.QueuePrecedencePolicy
 
queueReevaluate(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
queueReevaluate(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.QueuePrecedencePolicy
Update the precedence value on the given already-existing WorkQueue as appropriate.
quickCache - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
cache of most recently seen FPs
quickContains(long) - Method in class org.archive.util.AbstractLongFPSet
Low-cost, non-definitive (except when true) contains test.
quickContains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
quickContains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Do a contains() check that doesn't require laggy activity (eg disk IO).
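The idea behind quickContains, a cheap check whose "true" is definitive but whose "false" is not, can be sketched as follows. The sketch assumes a small in-memory cache in front of a slower backing store (a HashSet stands in for disk-backed storage); QuickContainsSketch and its fields are hypothetical and do not mirror the actual LongFPSet implementations.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical two-tier fingerprint set; illustrative only.
class QuickContainsSketch {
    private static final int SLOTS = 1024; // power of two, so masking selects a slot
    private final long[] cache = new long[SLOTS];
    private final boolean[] used = new boolean[SLOTS];
    private final Set<Long> slowStore = new HashSet<>(); // stand-in for a disk-backed set

    void add(long fp) {
        slowStore.add(fp);
        int slot = (int) (fp & (SLOTS - 1));
        cache[slot] = fp; // may evict an earlier fingerprint from the cache
        used[slot] = true;
    }

    // Cheap check: true means definitely present, false means "unknown".
    boolean quickContains(long fp) {
        int slot = (int) (fp & (SLOTS - 1));
        return used[slot] && cache[slot] == fp;
    }

    // Definitive check: falls back to the slow store on a cache miss.
    boolean contains(long fp) {
        return quickContains(fp) || slowStore.contains(fp);
    }

    public static void main(String[] args) {
        QuickContainsSketch set = new QuickContainsSketch();
        set.add(5L);
        set.add(5L + SLOTS); // same slot, evicts 5 from the cache
        System.out.println(set.quickContains(5L)); // cache miss, not definitive
        System.out.println(set.contains(5L));      // definitive answer
    }
}
```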
quickContains(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
quickDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
quickDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
QuotaEnforcer - Class in org.archive.crawler.prefetch
A simple quota enforcer.
QuotaEnforcer() - Constructor for class org.archive.crawler.prefetch.QuotaEnforcer
 

R

raAppend(int, String) - Method in class org.archive.util.PaddingStringBuffer
Append a string, right-aligned to the given column.
raAppend(int, int) - Method in class org.archive.util.PaddingStringBuffer
Append an int right-aligned to the given column.
raAppend(int, long) - Method in class org.archive.util.PaddingStringBuffer
Append a long, right-aligned to the given column.
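Right-aligned appending, as described by the raAppend entries above, can be approximated with the sketch below. It assumes a single-line buffer and that the column argument is the position where the appended text should end; RightAlignSketch is a hypothetical class, not PaddingStringBuffer itself, and the real class may handle overflow or multiple lines differently.

```java
// Hypothetical single-line buffer with right-aligned appends; illustrative only.
class RightAlignSketch {
    private final StringBuilder buf = new StringBuilder();

    // Pad with spaces so that the appended text ends at the given column.
    // If the text does not fit, it is appended without padding.
    RightAlignSketch raAppend(int col, String s) {
        int pad = col - buf.length() - s.length();
        for (int i = 0; i < pad; i++) {
            buf.append(' ');
        }
        buf.append(s);
        return this;
    }

    RightAlignSketch raAppend(int col, long value) {
        return raAppend(col, Long.toString(value));
    }

    @Override
    public String toString() {
        return buf.toString();
    }

    public static void main(String[] args) {
        String line = new RightAlignSketch()
                .raAppend(5, "ab")
                .raAppend(10, 123L)
                .toString();
        System.out.println("[" + line + "]");
    }
}
```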
range - Variable in class org.archive.crawler.restlet.PagedRepresentation
position range [start-of-first-line, past-end-of-last-line] in file
RANGE - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
RANGE_PREFIX - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
RateLimitGuard - Class in org.archive.crawler.restlet
Guard that slows and logs failed authentication attempts, to make brute-force guessing attacks less feasible.
RateLimitGuard(Context, ChallengeScheme, String) - Constructor for class org.archive.crawler.restlet.RateLimitGuard
 
RateLimitGuard(Context, String, Collection<String>, String) - Constructor for class org.archive.crawler.restlet.RateLimitGuard
 
rateReport() - Method in class org.archive.crawler.framework.CrawlJob
 
rateReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
reachedState(Frontier.State) - Method in class org.archive.crawler.frontier.AbstractFrontier
The given state has been reached; if it is a new state, generate a notification to the CrawlController.
read() - Method in class org.archive.util.ms.BlockInputStream
 
read(byte[], int, int) - Method in class org.archive.util.ms.BlockInputStream
 
read(byte[]) - Method in class org.archive.util.ms.BlockInputStream
 
readLine() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
use #readLine(String)
readLine(String) - Method in class org.apache.commons.httpclient.HttpConnection
Reads up to "\n" from the (unchunked) input stream.
readObjectData(Kryo, ByteBuffer) - Method in class org.archive.net.UURI
 
readObjectFromFile(Class<T>, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readObjectFromFile(Class<T>, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
readPrefixes() - Method in class org.archive.modules.deciderules.surt.OnDomainsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes() - Method in class org.archive.modules.deciderules.surt.OnHostsDecideRule
Patch the SURT prefix set so that it only includes host-enforcing prefixes
readPrefixes() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
readResponse(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Reads the response from the given connection.
readResponseBody(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Read the response body from the given HttpConnection.
readResponseHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Reads the response headers from the given connection.
ReadSourceEditor<T> - Class in org.archive.io
PropertyEditor allowing Strings to become ConfigString instances (implementing ReadSource).
ReadSourceEditor() - Constructor for class org.archive.io.ReadSourceEditor
 
readStatusLine(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Read the status line from the given HttpConnection, setting my status code and status text.
readUuri(String) - Method in class org.archive.modules.CrawlURI
Read a UURI from a String, handling a null or URIException
readyClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues whose first item may be handed out.
readyQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Put the given queue on the readyClassQueues queue
realm - Variable in class org.archive.modules.credential.HttpAuthenticationCredential
Basic/Digest Auth realm.
receive(CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter.CrawlUriReceiver
 
receive(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Accept the given CrawlURI for scheduling, as it has passed the alreadyIncluded filter.
receive(CrawlURI) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
 
receiver - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
 
receiver - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
 
recheckThresholdKb - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
Available space via 'df' is rechecked after every increment of this much content (uncompressed) is observed.
reconsiderRetiredQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Accommodate any changes in retirement-determining settings (like total-budget or force-retire changes/overlays).
recordControlMessage(String, String) - Method in class org.archive.net.ClientFTP
 
recordDNS(CrawlURI, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
recorderInBufferBytes - Variable in class org.archive.crawler.framework.CrawlController
Size in bytes of in-memory buffer to record inbound traffic.
recorderOutBufferBytes - Variable in class org.archive.crawler.framework.CrawlController
Size in bytes of in-memory buffer to record outbound traffic.
recover - Variable in class org.archive.crawler.frontier.AbstractFrontier
Crawl replay logger.
recoveryCheckpoint - Variable in class org.archive.bdb.BdbModule
 
recoveryCheckpoint - Variable in class org.archive.crawler.framework.CheckpointService
 
recoveryCheckpoint - Variable in class org.archive.crawler.framework.CrawlController
 
recoveryCheckpoint - Variable in class org.archive.crawler.frontier.BdbFrontier
 
recoveryCheckpoint - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
recoveryCheckpoint - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
recoveryCheckpoint - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
recoveryCheckpoint - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
recoveryCheckpoint - Variable in class org.archive.modules.Processor
 
RecoveryLogMapper - Class in org.archive.crawler.util
 
RecoveryLogMapper(String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Normal constructor - if not-found seeds are encountered while loading recoverLogFileName, throws SeedUrlNotFoundException.
RecoveryLogMapper(String, String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
Constructor to use if you want to allow not-found seeds, logging them to seedNotFoundLogFileName.
RecrawlAttributeConstants - Interface in org.archive.modules.recrawl
 
recycle() - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
no longer supported and will be removed in the future version of HttpClient
RecyclingSerialBinding<K> - Class in org.archive.crawler.frontier
A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer for either repeated serializations or because of mid-serialization expansions.
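The buffer-recycling idea in the RecyclingSerialBinding description can be sketched with standard JDK classes. This sketch assumes a ThreadLocal ByteArrayOutputStream in place of the FastOutputStream named above, and a trivial string-to-bytes "serialization"; RecyclingBufferSketch is a hypothetical name and this is not the class's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical per-thread buffer reuse; illustrative only.
class RecyclingBufferSketch {
    // One output buffer per thread, created lazily on first use.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(256));

    static byte[] serialize(String value) {
        ByteArrayOutputStream out = BUFFER.get();
        out.reset(); // discards old content but keeps the allocated internal array
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray(); // copy out, so the shared buffer can be reused
    }

    public static void main(String[] args) {
        System.out.println(serialize("http://example.com/").length);
        System.out.println(serialize("abc").length); // reuses the same buffer
    }
}
```

Because reset() does not shrink the internal array, a buffer that has already grown for one large value is not reallocated for later values on the same thread.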
RecyclingSerialBinding(ClassCatalog, Class) - Constructor for class org.archive.crawler.frontier.RecyclingSerialBinding
Constructor.
reduce(long) - Method in class st.ata.util.FPGenerator
Return a value equal (mod polynomial) to fp and of degree less than degree.
reenqueued(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
reenqueueQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Enqueue the given queue to either readyClassQueues or inactiveQueues, as appropriate.
referentField - Static variable in class org.archive.util.ObjectIdentityBdbCache
Reference to the Reference#referent Field.
REFERER - Static variable in class org.archive.modules.fetcher.FetchHTTP
 
REFLECTION_FACTORY - Static variable in class org.archive.bdb.AutoKryo
 
refQueue - Variable in class org.archive.util.ObjectIdentityBdbCache
 
RegexRule - Class in org.archive.modules.canonicalize
General conversion rule.
RegexRule() - Constructor for class org.archive.modules.canonicalize.RegexRule
 
registeredClasses - Variable in class org.archive.bdb.AutoKryo
 
RejectDecideRule - Class in org.archive.modules.deciderules
 
RejectDecideRule() - Constructor for class org.archive.modules.deciderules.RejectDecideRule
 
releaseConnection() - Method in class org.apache.commons.httpclient.HttpConnection
Releases the connection.
releaseConnection() - Method in class org.apache.commons.httpclient.HttpMethodBase
Releases the connection being used by this HTTP method.
relocate(long, long, long) - Method in class org.archive.util.AbstractLongFPSet
 
relocate(long, long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
remember(String, ConfigPath) - Method in class org.archive.spring.ConfigPathConfigurer
 
remove() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
remove(long) - Method in class org.archive.util.AbstractLongFPSet
 
remove(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
remove(long) - Method in interface org.archive.util.fingerprint.LongFPSet
Remove a fingerprint from the set, if it is there
remove() - Method in class org.archive.util.iterator.CompositeIterator
 
removeAt(long) - Method in class org.archive.util.AbstractLongFPSet
Remove the value at the given index, relocating its successors as necessary.
removeDataPersistentMember(String) - Static method in class org.archive.modules.CrawlURI
Remove the key from those data map members persisted.
removeEldestEntry(Map.Entry<K, V>) - Method in class org.archive.util.LRU
 
removePropertyChangeListener(PropertyChangeListener) - Method in class org.archive.io.ReadSourceEditor
 
removePropertyChangeListener(PropertyChangeListener) - Method in class org.archive.spring.ConfigPathEditor
 
removeRequestHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Remove the request header associated with the given name.
removeRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Removes the given request header.
removeSheetOverlay(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Remove the given bean-path overlay in the named sheet.
removeSurtAssociation(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
renderFlashesHTML(Writer, Request) - Static method in class org.archive.crawler.restlet.Flash
 
reopen(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
Call after deserializing an instance of this class.
replicaLocation(int, int) - Method in class org.archive.util.LongToIntConsistentHash
 
replicasInstalledUpTo - Variable in class org.archive.util.LongToIntConsistentHash
 
Report - Class in org.archive.crawler.reporting
Abstract superclass for named crawl reports that need only a StatisticsTracker and can dump a plain-text representation to a PrintWriter.
Report() - Constructor for class org.archive.crawler.reporting.Report
 
report() - Method in class org.archive.modules.extractor.Extractor
 
report() - Method in class org.archive.modules.extractor.JerichoExtractorHTML
 
report() - Method in class org.archive.modules.fetcher.FetchHTTP
 
report() - Method in class org.archive.modules.Processor
 
report() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
reportClass - Variable in class org.archive.crawler.restlet.ReportGenResource
 
ReportGenResource - Class in org.archive.crawler.restlet
Restlet Resource which generates fresh reports and then redirects requests to the report in the filesystem.
ReportGenResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.ReportGenResource
 
reports - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
REPORTS_DIR_NAME - Static variable in class org.archive.crawler.framework.Engine
 
reportsDir - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
reportThread(Thread, PrintWriter) - Static method in class org.archive.crawler.framework.ToeThread
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
Compiles and returns a report on its status.
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
This method compiles a human readable report on the status of the frontier at the time of the call.
reportTo(PrintWriter) - Method in class org.archive.modules.CrawlURI
 
reportTo(PrintWriter) - Method in class org.archive.modules.fetcher.FetchStats
 
reportTo(PrintWriter) - Method in class org.archive.modules.ProcessorChain
Compiles and returns a human readable report on the active processors.
ReportUtils - Class in org.archive.util
 
ReportUtils() - Constructor for class org.archive.util.ReportUtils
 
represent(Variant) - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
represent(Variant) - Method in class org.archive.crawler.restlet.EngineResource
 
represent(Variant) - Method in class org.archive.crawler.restlet.JobResource
 
represent(Variant) - Method in class org.archive.crawler.restlet.ReportGenResource
 
represent(Variant) - Method in class org.archive.crawler.restlet.ScriptResource
 
requestCrawlCheckpoint() - Method in class org.archive.crawler.framework.CheckpointService
Run a checkpoint of the crawler
requestCrawlPause() - Method in class org.archive.crawler.framework.CrawlController
Stop the crawl temporarily.
requestCrawlResume() - Method in class org.archive.crawler.framework.CrawlController
Resume crawl from paused state
requestCrawlStart() - Method in class org.archive.crawler.framework.CrawlController
Operator requested crawl begin
requestCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestCrawlStop(CrawlStatus) - Method in class org.archive.crawler.framework.CrawlController
Operator requested for crawl to stop.
requestFlush() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Request that any pending items be added/dropped.
requestFlush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
requestFlush() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
requestLaunch(String) - Method in class org.archive.crawler.framework.Engine
 
requestState(Frontier.State) - Method in interface org.archive.crawler.framework.Frontier
Request the Frontier reach the given state as soon as possible.
requestState(Frontier.State) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
requiredPattern - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
rescheduleTime - Variable in class org.archive.modules.CrawlURI
A future time at which this CrawlURI should be reenqueued.
ReschedulingProcessor - Class in org.archive.crawler.postprocessor
The most simple forced-rescheduling step possible: use a local setting (perhaps overlaid to vary based on the URI) to set an exact future reschedule time, as a delay from now.
ReschedulingProcessor() - Constructor for class org.archive.crawler.postprocessor.ReschedulingProcessor
 
reset() - Method in class org.archive.util.PaddingStringBuffer
reset the buffer back to empty
resetAlertCount() - Method in class org.archive.crawler.reporting.AlertThreadGroup
 
resetAlertCount() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
resetConsecutiveConnectionErrors() - Method in class org.archive.modules.net.CrawlServer
 
resetDeferrals() - Method in class org.archive.modules.CrawlURI
Reset deferrals counter.
resetFetchAttempts() - Method in class org.archive.modules.CrawlURI
Reset fetchAttempts counter.
resetForRescheduling() - Method in class org.archive.modules.CrawlURI
Reset state that should not persist when a URI is rescheduled for a specific future time.
resetState() - Method in class org.archive.modules.extractor.PDFParser
Reinitialize the object as though a new one were created.
resetState(byte[]) - Method in class org.archive.modules.extractor.PDFParser
Reset the object and initialize it with a new byte array (the document).
resetState(String) - Method in class org.archive.modules.extractor.PDFParser
Reinitialize the object as though a new one were created, complete with a valid pointer to a document that can be read
resolve(String) - Method in class org.archive.crawler.framework.ToeThread
 
resolve(String) - Method in interface org.archive.modules.fetcher.HostResolver
 
ResourceLongerThanDecideRule - Class in org.archive.modules.deciderules
Applies configured decision for URIs with content length greater than a given threshold length value.
ResourceLongerThanDecideRule() - Constructor for class org.archive.modules.deciderules.ResourceLongerThanDecideRule
 
ResourceNoLongerThanDecideRule - Class in org.archive.modules.deciderules
Applies configured decision for URIs with content length less than or equal to a given threshold length value.
ResourceNoLongerThanDecideRule() - Constructor for class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
RESPONSE_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
responseBodyConsumed() - Method in class org.apache.commons.httpclient.HttpMethodBase
A response has been consumed.
ResponseCodeReport - Class in org.archive.crawler.reporting
The "Response Codes Report", tallies by response/disposition code.
ResponseCodeReport() - Constructor for class org.archive.crawler.reporting.ResponseCodeReport
 
ResponseContentLengthDecideRule - Class in org.archive.modules.deciderules
Decide rule that will ACCEPT or REJECT a uri, depending on the "decision" property, after it's fetched, if the content body is within a specified size range, specified in bytes.
ResponseContentLengthDecideRule() - Constructor for class org.archive.modules.deciderules.ResponseContentLengthDecideRule
 
RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
retire() - Method in class org.archive.crawler.framework.ToeThread
Request that this thread retire (exit cleanly) at the earliest opportunity.
retired - Variable in class org.archive.crawler.frontier.WorkQueue
 
retiredQueues - Variable in class org.archive.crawler.frontier.BdbFrontier
'retired' queues, no longer considered for activation.
retireQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Put the given queue on the retiredQueues queue
retryDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Return a suitable value to wait before retrying the given URI.
retryMethod(HttpMethod, IOException, int) - Method in class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
 
reversedOrder - Variable in class org.archive.crawler.restlet.PagedRepresentation
whether to display lines in reversed order (latest first)
ROBOTS_NOT_FETCHED - Static variable in class org.archive.modules.net.CrawlServer
 
robotsDenials - Variable in class org.archive.modules.fetcher.FetchStats
 
RobotsDirectives - Class in org.archive.modules.net
Represents the directives that apply to a user-agent (or set of user-agents)
RobotsDirectives() - Constructor for class org.archive.modules.net.RobotsDirectives
 
robotsFetched - Variable in class org.archive.modules.net.CrawlServer
 
RobotsPolicy - Class in org.archive.modules.net
RobotsPolicy represents the strategy used by the crawler for determining how robots.txt files will be honored.
RobotsPolicy() - Constructor for class org.archive.modules.net.RobotsPolicy
 
robotstxt - Variable in class org.archive.modules.net.CrawlServer
 
Robotstxt - Class in org.archive.modules.net
Utility class for parsing and representing 'robots.txt' format directives, into a list of named user-agents and map from user-agents to RobotsDirectives.
Robotstxt() - Constructor for class org.archive.modules.net.Robotstxt
 
Robotstxt(BufferedReader) - Constructor for class org.archive.modules.net.Robotstxt
 
Robotstxt(ReadSource) - Constructor for class org.archive.modules.net.Robotstxt
 
rootUriMatch(ServerCache, CrawlURI) - Method in class org.archive.modules.credential.Credential
Test whether the passed curi matches this credential's rootUri.
rotateForCheckpoint(Checkpoint) - Method in class org.archive.io.CrawlerJournal
Handle a checkpoint by rotating the current log to a checkpoint-named file and starting a new log.
rotateLogFiles() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
rotateLogFiles(String) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
rotateLogFiles(String, boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
rotationDigits - Variable in class org.archive.crawler.processor.CrawlMapper
Number of timestamp digits to use as prefix of log names (grouping all diversions from that period in a single log).
ruleAssociations - Variable in class org.archive.crawler.spring.SheetOverlaysManager
all SheetAssociations by DecideRule evaluation
rules - Variable in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
rules - Variable in class org.archive.spring.BeanFieldsPatternValidator
 
RulesCanonicalizationPolicy - Class in org.archive.modules.canonicalize
URI Canonicalization Policy
RulesCanonicalizationPolicy() - Constructor for class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
run() - Method in class org.archive.crawler.framework.ActionDirectory
Action taken at scheduled intervals
run() - Method in interface org.archive.crawler.framework.Frontier
Request that Frontier allow crawling to begin.
run() - Method in class org.archive.crawler.framework.ToeThread
(non-Javadoc)
run() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
run() - Method in class org.archive.crawler.reporting.StatisticsTracker
Do activity.
runCandidateChain(CrawlURI, CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
Run candidatesChain on a single candidate CrawlURI; if its reported status is nonnegative, schedule to frontier.
runTest() - Method in class org.archive.state.ModuleTestBase
 
RuntimeErrorFormatter - Class in org.archive.crawler.io
Runtime exception log formatter.
RuntimeErrorFormatter(boolean) - Constructor for class org.archive.crawler.io.RuntimeErrorFormatter
 
runtimeErrorsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
RuntimeLimitEnforcer - Class in org.archive.crawler.prefetch
A processor to enforce runtime limits on crawls.
RuntimeLimitEnforcer() - Constructor for class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
RuntimeLimitEnforcer.Operation - Enum in org.archive.crawler.prefetch
The action that the processor takes once the runtime has elapsed.
runtimeSeconds - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
The amount of time, in seconds, that the crawl will be allowed to run before this processor performs its 'end operation.'
runWhileEmpty - Variable in class org.archive.crawler.framework.CrawlController
whether to keep running (without pause or finish) when frontier is empty

S

S3URLConnection - Class in org.archive.net.s3
URLConnection for Amazon S3 objects.
S3URLConnection(URL) - Constructor for class org.archive.net.s3.S3URLConnection
Construct a new S3URLConnection.
S_BLOCKED_BY_CUSTOM_PROCESSOR - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Blocked by custom prefetcher processor.
S_BLOCKED_BY_QUOTA - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Blocked due to exceeding an established quota.
S_BLOCKED_BY_RUNTIME_LIMIT - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Blocked due to exceeding an established runtime.
S_BLOCKED_BY_USER - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
blocked from fetch by user setting.
S_CONNECT_FAILED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
HTTP connect failed
S_CONNECT_LOST - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
HTTP connect broken
S_DEEMED_CHAFF - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
'chaff' detection of traps/content of negligible value applied
S_DEEMED_NOT_FOUND - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
synthetic status, used when some other status (such as connection-lost) is considered by policy the same as a document-not-found
S_DEFERRED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
temporary status assigned URIs awaiting preconditions; appearance in logs is a bug
S_DELETED_BY_USER - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
deleted from frontier by user
S_DNS_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
DNS success
S_DOMAIN_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
DNS prerequisite failed, precluding attempt
S_DOMAIN_UNRESOLVABLE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
DNS lookup failed
S_GETBYNAME_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
InetAddress.getByName success
S_NOT_FOUND - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
HTTP 404 NOT FOUND
S_OTHER_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Other (non-DNS, non-robots) prerequisite failed, precluding attempt
S_OUT_OF_SCOPE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
out-of-scope upon reexamination (only when scope changes during crawl)
S_PREREQUISITE_UNSCHEDULABLE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Prerequisite could not be scheduled, precluding attempt
S_PROCESSING_THREAD_KILLED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Processing thread was killed
S_ROBOTS_PRECLUDED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
robots rules precluded fetch
S_ROBOTS_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Robots prerequisite failed, precluding attempt
S_RUNTIME_EXCEPTION - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Unexpected runtime exception; see runtime-errors.log
S_SERIOUS_ERROR - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
severe java 'Error' conditions (OutOfMemoryError, StackOverflowError, etc.) during URI processing
S_TIMEOUT - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
HTTP timeout (before any meaningful response received)
S_TOO_MANY_EMBED_HOPS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
overstepped embed/trans hops
S_TOO_MANY_LINK_HOPS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
overstepped link hops
S_TOO_MANY_RETRIES - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
multiple retries all failed
S_UNATTEMPTED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
fetch never tried (perhaps protocol unsupported or illegal URI)
S_UNFETCHABLE_URI - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
URI recognized as unsupported or illegal
S_UNQUEUEABLE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
URI could not be queued in Frontier; when URIs are properly filtered for format, should never occur
S_WHOIS_GENERIC_FINISHED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
Finished all fetches for serverless WHOIS url (whois:foo.org)
S_WHOIS_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
WHOIS success
sameProgressAs(CrawlStatSnapshot) - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
Return true if this snapshot shows no tangible progress in its URI counts over the supplied snapshot.
saveCookies(String, Map<String, Cookie>) - Static method in class org.archive.modules.fetcher.AbstractCookieStorage
 
saveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
saveCookiesMap(Map<String, Cookie>) - Method in interface org.archive.modules.fetcher.CookieStorage
 
saveHeader(String, HttpMethod, HashMap<String, Object>) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
Save a header from the given HTTP operation into the AList.
saveHeader(String, HttpMethod, ANVLRecord, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
Save a header from the given HTTP operation into the provider headers under a new name
saveHostStats(String, long) - Method in class org.archive.crawler.reporting.StatisticsTracker
Update some running-stats based on a URI success
saveJson(String, JSONObject) - Method in class org.archive.checkpointing.Checkpoint
 
saveSourceStats(String, String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
saveWriter(String, String) - Method in class org.archive.checkpointing.Checkpoint
 
scanActionDirectory() - Method in class org.archive.crawler.framework.ActionDirectory
Find any new files in the 'action' directory; process each in order.
scanJobLog() - Method in class org.archive.crawler.framework.CrawlJob
Refresh knowledge of total launched and last launch by scanning the job.log.
schedule(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
Schedules a CrawlURI.
schedule(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
Arrange for the given CrawlURI to be visited, if it is not already scheduled/completed.
schedule(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Arrange for the given CrawlURI to be visited, if it is not already enqueued/completed.
SchedulingConstants - Class in org.archive.modules
 
SchemeNotInSetDecideRule - Class in org.archive.modules.deciderules
Rule applies the configured decision (default REJECT) for any URI which has a URI-scheme NOT contained in the configured Set.
SchemeNotInSetDecideRule() - Constructor for class org.archive.modules.deciderules.SchemeNotInSetDecideRule
Usual constructor.
schemes - Variable in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
set of schemes against which the URI scheme is tested
scope - Variable in class org.archive.crawler.framework.Scoper
 
scope - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
Scoper - Class in org.archive.crawler.framework
Base class for Scopers.
Scoper() - Constructor for class org.archive.crawler.framework.Scoper
Constructor.
scratchDir - Variable in class org.archive.crawler.framework.CrawlController
Scratch directory for temporary overflow-to-disk
scratchDir - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
 
SCRIPT_SRC - Static variable in class org.archive.modules.extractor.HTMLLinkContext
 
ScriptedDecideRule - Class in org.archive.modules.deciderules
Rule which runs a JSR-223 script to make its decision.
ScriptedDecideRule() - Constructor for class org.archive.modules.deciderules.ScriptedDecideRule
 
ScriptedProcessor - Class in org.archive.modules
A processor which runs a JSR-223 script on the CrawlURI.
ScriptedProcessor() - Constructor for class org.archive.modules.ScriptedProcessor
Constructor.
ScriptingConsole - Class in org.archive.crawler.restlet
ScriptingConsole implements view-independent logic of scripting console.
ScriptingConsole(CrawlJob) - Constructor for class org.archive.crawler.restlet.ScriptingConsole
 
ScriptModel - Class in org.archive.crawler.restlet.models
 
ScriptModel(ScriptingConsole, String, Collection<Map<String, String>>) - Constructor for class org.archive.crawler.restlet.models.ScriptModel
 
ScriptResource - Class in org.archive.crawler.restlet
Restlet Resource which runs an arbitrary script, which is supplied with variables pointing to the job and appContext, from which all other live crawl objects are reachable.
ScriptResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.ScriptResource
 
scriptSource - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
 
scriptSource - Variable in class org.archive.modules.ScriptedProcessor
 
secret - Variable in class org.archive.net.s3.S3URLConnection
 
SeedAcceptDecideRule - Class in org.archive.modules.deciderules
Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true).
SeedAcceptDecideRule() - Constructor for class org.archive.modules.deciderules.SeedAcceptDecideRule
 
seedLine(String) - Method in class org.archive.modules.seeds.TextSeedModule
Handle a read line that is probably a seed.
SeedListener - Interface in org.archive.modules.seeds
Implemented by components which want notifications of seed list changes.
seedListeners - Variable in class org.archive.modules.seeds.SeedModule
 
SeedModule - Class in org.archive.modules.seeds
 
SeedModule() - Constructor for class org.archive.modules.seeds.SeedModule
 
SeedRecord - Class in org.archive.crawler.reporting
Record of all interesting info about the most-recent processing of a specific seed.
SeedRecord(CrawlURI, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
Create a record from the given CrawlURI and disposition string
SeedRecord(String, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
Constructor for when a CrawlURI is unavailable; such as when considering seeds not yet passed through as CrawlURIs.
SeedRecord(String, String, int, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
Create a record from the given URI, disposition, HTTP status code, and redirect URI.
seeds - Variable in class org.archive.crawler.framework.ActionDirectory
 
seeds - Variable in class org.archive.crawler.framework.CrawlController
 
seeds - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
seeds - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
 
seeds - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
seeds - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS - Static variable in class org.archive.crawler.postprocessor.CandidatesProcessor
 
seedsAsSurtPrefixes - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Should seeds also be interpreted as SURT prefixes.
seedsCrawled - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
SeedsReport - Class in org.archive.crawler.reporting
The "Seeds Report", results per provided seed.
SeedsReport() - Constructor for class org.archive.crawler.reporting.SeedsReport
 
seedsTotal - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
SeedUrlNotFoundException - Exception in org.archive.crawler.util
 
SeedUrlNotFoundException(String) - Constructor for exception org.archive.crawler.util.SeedUrlNotFoundException
 
seemsLoginForm() - Method in class org.archive.modules.forms.HTMLForm
For now, we consider a POST form with only 1 password field and 1 potential username field (type text or email) to be a likely login form.
sendCrawlStateChangeEvent(CrawlController.State, CrawlStatus) - Method in class org.archive.crawler.framework.CrawlController
Send crawl change event to all listeners.
sendToQueue(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Send a CrawlURI to the appropriate subqueue.
serialize(Object) - Static method in class org.archive.util.TestUtils
 
SERIALIZED_CLASS_SUFFIX - Static variable in class org.archive.crawler.util.CheckpointUtils
 
seriousError(String) - Method in class org.archive.io.CrawlerJournal
Note a serious error via a special log line
SERVER - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
serverCache - Variable in class org.archive.crawler.framework.CrawlController
 
serverCache - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
serverCache - Variable in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
serverCache - Variable in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
serverCache - Variable in class org.archive.crawler.postprocessor.DispositionProcessor
 
serverCache - Variable in class org.archive.crawler.prefetch.PreconditionEnforcer
 
serverCache - Variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
serverCache - Variable in class org.archive.crawler.reporting.StatisticsTracker
 
serverCache - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
serverCache - Variable in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
serverCache - Variable in class org.archive.modules.fetcher.FetchDNS
Used to do DNS lookups.
serverCache - Variable in class org.archive.modules.fetcher.FetchHTTP
Used to do DNS lookups.
serverCache - Variable in class org.archive.modules.fetcher.FetchWhois
 
ServerCache - Class in org.archive.modules.net
Abstract class for crawl-global registry of CrawlServer (host:port) and CrawlHost (hostname) objects.
ServerCache() - Constructor for class org.archive.modules.net.ServerCache
 
serverCache - Variable in class org.archive.modules.writer.Kw3WriterProcessor
The server cache to use.
serverCache - Variable in class org.archive.modules.writer.WriterPoolProcessor
 
serverInetAddr - Variable in class org.archive.modules.fetcher.FetchDNS
 
servers - Variable in class org.archive.modules.fetcher.DefaultServerCache
hostname[:port] -> CrawlServer.
sessionBudget - Variable in class org.archive.crawler.frontier.WorkQueue
Per-session 'budget' controlling activity duration
set - Variable in class org.archive.crawler.util.TopNSet
 
setAcceptCompression(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setAcceptHeaders(List<String>) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setAcceptNonDnsResolves(boolean) - Method in class org.archive.modules.fetcher.FetchDNS
 
setAction(String) - Method in class org.archive.modules.forms.HTMLForm
 
setActionDir(ConfigPath) - Method in class org.archive.crawler.framework.ActionDirectory
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setAdd(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setAlertsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setAllowByRegex(String) - Method in class org.archive.crawler.prefetch.Preselector
 
setAllowCreate(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
 
setAlsoCheckVia(boolean) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setApplicableSurtPrefix(String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.ActionDirectory
 
setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.CheckpointService
 
setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.CrawlController
 
setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setApplicationContext(ApplicationContext) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
setApplicationContext(ApplicationContext) - Method in class org.archive.modules.ScriptedProcessor
 
setApplicationContext(ApplicationContext) - Method in class org.archive.spring.ConfigPathConfigurer
Remember ApplicationContext, and if possible primary configuration file's home directory.
setAsText(String) - Method in class org.archive.io.ReadSourceEditor
 
setAsText(String) - Method in class org.archive.spring.ConfigPathEditor
 
setAt(long, long) - Method in class org.archive.util.AbstractLongFPSet
Set the stored value at the given slot.
setAt(long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
 
setAudience(String) - Method in class org.archive.modules.CrawlMetadata
 
setAuthenticationPreemptive(boolean) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpClientParams.setAuthenticationPreemptive(boolean), HttpClient.getParams().
setAvailableRobotsPolicies(Map<String, RobotsPolicy>) - Method in class org.archive.modules.CrawlMetadata
 
setBalanceReplenishAmount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setBase(ConfigPath) - Method in class org.archive.spring.ConfigPath
 
SetBasedUriUniqFilter - Class in org.archive.crawler.util
UriUniqFilter based on an underlying UriSet (essentially a Set).
SetBasedUriUniqFilter() - Constructor for class org.archive.crawler.util.SetBasedUriUniqFilter
 
setBasePrecedence(int) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
 
setBasePrecedence(int) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
setBaseURI(String) - Method in class org.archive.modules.CrawlURI
Set the (HTML) Base URI used for derelativizing internal URIs.
setBaseURI(UURI) - Method in class org.archive.modules.CrawlURI
 
setBdbModule(BdbModule) - Method in class org.archive.crawler.frontier.BdbFrontier
 
setBdbModule(BdbModule) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
setBdbModule(BdbModule) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setBdbModule(BdbModule) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setBdbModule(BdbModule) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
setBdbModule(BdbModule) - Method in class org.archive.modules.fetcher.FetchWhois
 
setBdbModule(BdbModule) - Method in class org.archive.modules.net.BdbServerCache
 
setBdbModule(BdbModule) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
setBdbModule(BdbModule) - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
setBeanFactory(BeanFactory) - Method in class org.archive.crawler.spring.SheetOverlaysManager
 
setBeanFactory(BeanFactory) - Method in class org.archive.spring.Sheet
 
setBeanName(String) - Method in class org.archive.crawler.frontier.BdbFrontier
 
setBeanName(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setBeanName(String) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
setBeanName(String) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setBeanName(String) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
setBeanName(String) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setBeanName(String) - Method in class org.archive.modules.Processor
 
setBeanName(String) - Method in class org.archive.spring.HeritrixLifecycleProcessor
 
setBeanName(String) - Method in class org.archive.spring.Sheet
 
setBit(long) - Method in class org.archive.util.BloomFilter64bit
Changes the bit with index bitIndex in local bitvector.
setBlockAll(boolean) - Method in class org.archive.crawler.prefetch.Preselector
 
setBlockAwaitingSeedLines(int) - Method in class org.archive.modules.seeds.TextSeedModule
 
setBlockByRegex(String) - Method in class org.archive.crawler.prefetch.Preselector
 
setBloomFilter(BloomFilter) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setCachePercent(int) - Method in class org.archive.bdb.BdbModule
 
setCacheSize(int) - Method in class org.archive.bdb.BdbModule
 
setCalculateRobotsOnly(boolean) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setCandidateChain(CandidateChain) - Method in class org.archive.crawler.framework.CrawlController
 
setCandidateChain(CandidateChain) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setCandidateUserAgents(List<String>) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
setCandidateUserAgents(List<String>) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
setCanonicalizationPolicy(UriCanonicalizationPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setCanonicalString(String) - Method in class org.archive.modules.CrawlURI
 
setCapacity(int) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
 
setCaseSensitiveFilesystem(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setCharacterMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setCheckOutlinks(boolean) - Method in class org.archive.crawler.processor.CrawlMapper
 
setCheckpointDir(ConfigPath) - Method in class org.archive.checkpointing.Checkpoint
 
setCheckpointIntervalMinutes(long) - Method in class org.archive.crawler.framework.CheckpointService
Period at which to create automatic checkpoints; -1 means no auto checkpointing.
setCheckpointsDir(ConfigPath) - Method in class org.archive.crawler.framework.CheckpointService
Checkpoints directory
setCheckUri(boolean) - Method in class org.archive.crawler.processor.CrawlMapper
 
setChmod(boolean) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setChmodValue(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setClassKey(String) - Method in class org.archive.modules.CrawlURI
 
setCollection(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setComment(String) - Method in class org.apache.commons.httpclient.Cookie
If a user agent (web browser) presents this cookie to a user, the cookie's purpose will be described using this comment.
setComment(String) - Method in class org.archive.modules.deciderules.DecideRule
 
setCompress(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setConditionalGetHeader(CrawlURI, HttpMethod, boolean, String, String) - Method in class org.archive.modules.fetcher.FetchHTTP
Set the given conditional-GET header, if the setting is enabled and a suitable value is available in the URI history.
setConfigPathConfigurer(ConfigPathConfigurer) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Autowire access to ConfigPathConfigurer
setConfigurer(ConfigPathConfigurer) - Method in class org.archive.spring.ConfigPath
 
setConnectionCloseForced(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets whether or not the connection should be force-closed when no longer needed.
setConnectionTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.setConnectionTimeout(int), HttpConnection.getParams().
setConnectTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
 
setConsoleHandler() - Static method in class org.archive.util.OneLineSimpleLogger
 
setContentDigest(byte[]) - Method in class org.archive.modules.CrawlURI
setContentDigest(String, byte[]) - Method in class org.archive.modules.CrawlURI
 
setContentDigestHistory(AbstractContentDigestHistory) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
setContentDigestHistory(AbstractContentDigestHistory) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
setContentLengthThreshold(long) - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
 
setContentLengthThreshold(long) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
setContentRegexes(Map<String, String>) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
A map of { name => regex }.
setContentSize(long) - Method in class org.archive.modules.CrawlURI
Sets the 'content size' for the URI, which is considered inclusive of all recorded material (such as protocol headers) or even material 'virtually' considered (as in material from a previous fetch confirmed unchanged with a server).
setContentType(String) - Method in class org.archive.modules.CrawlURI
Set a fetched uri's content type.
setContentTypeMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setCookiePolicy(int) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpMethodParams.setCookiePolicy(String), HttpMethod.getParams().
setCookiesLoadFile(ConfigFile) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
setCookiesMap(SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.HttpState
Replace the standard sorted map with an external implementation (such as one backed by a persistent store, like BDB's StoredSortedMap).
setCookiesSaveFile(ConfigPath) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
setCookieStorage(CookieStorage) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setCostAssignmentPolicy(CostAssignmentPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setCount() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setCount() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setCountryCode(String) - Method in class org.archive.modules.net.CrawlHost
Set country code for this host.
setCountryCodes(List<String>) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.framework.CheckpointService
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Autowire access to CrawlController
setCrawlController(CrawlController) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
setCrawlController(CrawlController) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setCrawlDelay(float) - Method in class org.archive.modules.net.RobotsDirectives
 
setCrawlerCount(long) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
setCrawlLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setCreateHostDirectory(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setCreatePortDirectory(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setCredentials(String, String, Credentials) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
use #setCredentials(AuthScope, Credentials)
setCredentials(AuthScope, Credentials) - Method in class org.apache.commons.httpclient.HttpState
Sets the credentials for the given authentication scope.
setCredentials(Map<String, Credential>) - Method in class org.archive.modules.credential.CredentialStore
 
setCredentialStore(CredentialStore) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setCredentialStore(CredentialStore) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setCustomRobots(ReadSource) - Method in class org.archive.modules.net.CustomRobotsPolicy
 
setDecision(DecideResult) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
 
setDefaultEncoding(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setDefaultUriPrecedencePolicy(UriPrecedencePolicy) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
setDeferredWrite(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
 
setDeferToPrevious(boolean) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
setDelayFactor(float) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setDelaySeconds(int) - Method in class org.archive.crawler.framework.ActionDirectory
 
setDescription(String) - Method in class org.archive.modules.CrawlMetadata
 
setDestination(UriUniqFilter.CrawlUriReceiver) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Receiver of uniq URIs.
setDestination(UriUniqFilter.CrawlUriReceiver) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setDestination(UriUniqFilter.CrawlUriReceiver) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchDNS
 
setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchFTP
 
setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchDNS
 
setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
 
setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setDir(ConfigPath) - Method in class org.archive.bdb.BdbModule
 
setDirectory(ConfigPath) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setDirectoryFile(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setDispositionChain(DispositionChain) - Method in class org.archive.crawler.framework.CrawlController
 
setDiversionDir(ConfigPath) - Method in class org.archive.crawler.processor.CrawlMapper
 
setDNSServerIPLabel(String) - Method in class org.archive.modules.CrawlURI
 
setDoAuthentication(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets whether or not the HTTP method should automatically handle HTTP authentication challenges (status code 401, etc.)
setDomain(String) - Method in class org.apache.commons.httpclient.Cookie
Sets the domain attribute.
setDomain(String) - Method in class org.archive.modules.credential.Credential
 
setDomainAttributeSpecified(boolean) - Method in class org.apache.commons.httpclient.Cookie
Indicates whether the cookie had a domain specified in a domain attribute of the Set-Cookie header.
setDoneDir(ConfigPath) - Method in class org.archive.crawler.framework.ActionDirectory
 
setDotBegin(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setDotEnd(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setDumpPendingAtClose(boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
 
setEarliestNextURIEmitTime(long) - Method in class org.archive.modules.net.CrawlHost
Set the earliest time a URI for this host could be emitted.
setEditFilter(IOFileFilter) - Method in class org.archive.crawler.restlet.EnhDirectory
 
setEnabled(boolean) - Method in class org.archive.modules.canonicalize.BaseRule
 
setEnabled(boolean) - Method in class org.archive.modules.deciderules.DecideRule
 
setEnabled(boolean) - Method in class org.archive.modules.Processor
 
setEngineName(String) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
setEngineName(String) - Method in class org.archive.modules.ScriptedProcessor
 
setError(String) - Method in class org.archive.modules.CrawlURI
 
setErrorPenaltyAmount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setExpectedConcurrency(int) - Method in class org.archive.bdb.BdbModule
 
setExpirationOperation(RuntimeLimitEnforcer.Operation) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
setExpiryDate(Date) - Method in class org.apache.commons.httpclient.Cookie
Sets expiration date.
setExtract404s(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setExtractAllForms(boolean) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
setExtractFromDirs(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
 
setExtractIndependently(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setExtractJavascript(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setExtractOnlyFormGets(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setExtractorJS(ExtractorJS) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setExtractorJS(ExtractorJS) - Method in class org.archive.modules.extractor.ExtractorSWF
 
setExtractorParameters(ExtractorParameters) - Method in class org.archive.modules.extractor.Extractor
 
setExtractParent(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
 
setExtractValueAttributes(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setFetchBeginTime(long) - Method in class org.archive.modules.CrawlURI
 
setFetchChain(FetchChain) - Method in class org.archive.crawler.framework.CrawlController
 
setFetchCompletedTime(long) - Method in class org.archive.modules.CrawlURI
 
setFetchStatus(int) - Method in class org.archive.modules.CrawlURI
Set the overall/fetch status of this CrawlURI for its current trip through the processing loop.
setFetchType(CrawlURI.FetchType) - Method in class org.archive.modules.CrawlURI
 
setFlashes(List<Flash>) - Method in class org.archive.crawler.restlet.models.ViewModel
 
setFollowRedirects(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets whether or not the HTTP method should automatically follow HTTP redirects (status code 302, etc.)
setForceFetch(boolean) - Method in class org.archive.modules.CrawlURI
Method to signal that this URI should be fetched even though it already has been crawled.
setForceQueueAssignment(String) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
 
setForceRetire(boolean) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setForceRetire(boolean) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setForceRetire(boolean) - Method in class org.archive.modules.CrawlURI
 
setForgetAllButLatest(boolean) - Method in class org.archive.checkpointing.Checkpoint
 
setForgetAllButLatest(boolean) - Method in class org.archive.crawler.framework.CheckpointService
True to save only the latest checkpoint, false to save all of them.
setFormat(String) - Method in class org.archive.modules.canonicalize.RegexRule
 
setFormat(String) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
setFormItems(Map<String, String>) - Method in class org.archive.modules.credential.HtmlFormCredential
 
setFpset(LongFPSet) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setFrequentFlushes(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setFrontier(Frontier) - Method in class org.archive.crawler.framework.ActionDirectory
 
setFrontier(Frontier) - Method in class org.archive.crawler.framework.CrawlController
 
setFrontier(Frontier) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setFrontier(Frontier) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setFrontier(Frontier) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
setFrontier(Frontier) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
setFrontierPreparer(FrontierPreparer) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setFullVia(CrawlURI) - Method in class org.archive.modules.CrawlURI
 
setGetBit(long) - Method in class org.archive.util.BloomFilter64bit
Sets the bit with index bitIndex in the local bit vector and returns the old value.
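
The set-and-return-old-value behavior described for setGetBit(long) can be sketched in a few lines. This is a hypothetical stand-in, not the BloomFilter64bit source: the class name BitVectorSketch, the constructor, and the long[] backing store are assumptions made for illustration.

```java
// Hypothetical illustration only; not the Heritrix BloomFilter64bit implementation.
class BitVectorSketch {
    private final long[] words;

    BitVectorSketch(long numBits) {
        // One 64-bit word per 64 bits, rounded up.
        words = new long[(int) ((numBits + 63) >>> 6)];
    }

    /** Sets the bit at bitIndex and returns the value it held before the call. */
    boolean setGetBit(long bitIndex) {
        int word = (int) (bitIndex >>> 6);   // which long holds the bit
        long mask = 1L << (bitIndex & 63);   // position of the bit inside that long
        boolean old = (words[word] & mask) != 0;
        words[word] |= mask;
        return old;
    }

    public static void main(String[] args) {
        BitVectorSketch bits = new BitVectorSketch(128);
        System.out.println(bits.setGetBit(70)); // false: bit was clear
        System.out.println(bits.setGetBit(70)); // true: bit was already set
    }
}
```

Returning the previous value lets a caller test and set a bit in a single pass over the vector, which is the usual reason for combining the two operations.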
setGroupMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setGroupMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setGroupMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setGroupMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setHarvester(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setHistoryDbName(String) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
setHistoryDbName(String) - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
setHistoryLength(int) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
setHolder(Object) - Method in class org.archive.modules.CrawlURI
Remember a 'holder' to which some enclosing/queueing facility has assigned this CrawlURI.
setHolderCost(int) - Method in class org.archive.modules.CrawlURI
Remember a 'holderCost' which some enclosing/queueing facility has assigned this CrawlURI.
setHolderKey(Object) - Method in class org.archive.modules.CrawlURI
Remember a 'holderKey' which some enclosing/queueing facility has assigned this CrawlURI.
setHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the host to connect to.
setHostConfiguration(HostConfiguration) - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
No longer applicable.
setHostMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setHostMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setHostMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setHostMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setHostMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setHttp11(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpMethodParams.setVersion(HttpVersion)
setHttpAuthChallenges(Map<String, String>) - Method in class org.archive.modules.CrawlURI
 
setHttpAuthChallenges(Map<String, String>) - Method in class org.archive.modules.net.CrawlServer
 
setHttpBindAddress(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setHttpConnectionManager(HttpConnectionManager) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the httpConnectionManager.
setHttpMethod(HttpMethod) - Method in class org.archive.modules.CrawlURI
 
setHttpMethod(HtmlFormCredential.Method) - Method in class org.archive.modules.credential.HtmlFormCredential
 
setHttpProxyHost(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setHttpProxyPassword(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setHttpProxyPort(int) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setHttpProxyUser(String) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.crawler.frontier.WorkQueue
 
setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.crawler.reporting.SeedRecord
 
setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.modules.net.CrawlHost
 
setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.modules.net.CrawlServer
 
setIdentityCache(ObjectIdentityCache<?>) - Method in interface org.archive.util.IdentityCacheable
 
setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.util.IdentityCacheableWrapper
 
setIgnoreCookies(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setIgnoreFormActionUrls(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setIgnoreUnexpectedHtml(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setIncrementCounts(String) - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
setInferRootPage(boolean) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
setInitialDelaySeconds(int) - Method in class org.archive.crawler.framework.ActionDirectory
 
setIntervalSeconds(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setIP(InetAddress, long) - Method in class org.archive.modules.net.CrawlHost
Set the IP address for this host.
setIpAddresses(Set<String>) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
setIpValidityDurationSeconds(int) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setIsolateThreads(boolean) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
setIsolateThreads(boolean) - Method in class org.archive.modules.ScriptedProcessor
 
setJobName(String) - Method in class org.archive.modules.CrawlMetadata
 
setKeepSnapshotsCount(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setLargestQueuesCount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setLastResponseInputStream(InputStream) - Method in class org.apache.commons.httpclient.HttpConnection
Set the state to keep track of the last response for the last request.
setListLogicalOr(boolean) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
setLiveHostReportSize(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setLocalAddress(InetAddress) - Method in class org.apache.commons.httpclient.HttpConnection
Set the local address used when creating the connection.
setLocalName(String) - Method in class org.archive.crawler.processor.CrawlMapper
 
setLocked(boolean) - Method in class org.apache.commons.httpclient.HttpConnection
Locks or unlocks the connection.
setLogExtraInfo(boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setLogFile(ConfigPath) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.framework.CrawlController
 
setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.framework.Scoper
 
setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setLoggerModule(SimpleFileLoggerProvider) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
setLoggerModule(UriErrorLoggerModule) - Method in class org.archive.modules.extractor.Extractor
 
setLoggerModule(UriErrorLoggerModule) - Method in class org.archive.modules.forms.FormLoginProcessor
 
setLogin(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
setLoginPassword(String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
setLoginUri(String) - Method in class org.archive.modules.credential.HtmlFormCredential
 
setLoginUsername(String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
setLogRejectsRule(DecideRule) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
setLogToFile(boolean) - Method in class org.archive.crawler.framework.Scoper
 
setLogToFile(boolean) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
setLookup(ExternalGeoLookupInterface) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
setLowerBound(Integer) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Sets the lower bound on the range of acceptable status codes.
setLowerBound(Integer) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Sets the lower bound on the range of acceptable status codes.
setLowerBound(long) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
The rule will apply if the URL has been fetched and the content body length is greater than or equal to this number of bytes.
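
The lower-bound condition stated above amounts to a simple inclusive comparison. The following is a minimal sketch under stated assumptions: ContentLengthBoundSketch and its applies(boolean, long) method are hypothetical names for illustration and are not part of ResponseContentLengthDecideRule.

```java
// Hypothetical illustration only; not the Heritrix ResponseContentLengthDecideRule.
class ContentLengthBoundSketch {
    private long lowerBound = 0;

    void setLowerBound(long bytes) {
        this.lowerBound = bytes;
    }

    /**
     * Applies only when the URL has been fetched and the body length
     * is greater than or equal to the configured lower bound.
     */
    boolean applies(boolean fetched, long contentLengthBytes) {
        return fetched && contentLengthBytes >= lowerBound;
    }
}
```

With a lower bound of 1024, a fetched 1024-byte body satisfies the condition, a fetched 1023-byte body does not, and an unfetched URL never does.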
setMap(Map<String, Object>) - Method in class org.archive.spring.Sheet
Set map of property full bean-path (starting with a target bean-name) to alternate values.
setMapPath(ConfigPath) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
setMapUri(String) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
setMaxAttributeNameLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setMaxAttributeValLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setMaxBytesDownload(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
setMaxDelayMs(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setMaxDocumentsDownload(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
setMaxElementLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setMaxFetchKBSec(int) - Method in class org.archive.modules.fetcher.FetchFTP
 
setMaxFetchKBSec(int) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setMaxFileSizeBytes(long) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setMaxFileSizeBytes(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setMaxHops(int) - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
 
setMaxLengthBytes(long) - Method in class org.archive.modules.fetcher.FetchFTP
 
setMaxLengthBytes(long) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setMaxOutlinks(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setMaxPathDepth(int) - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
 
setMaxPathLength(int) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setMaxPending(int) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setMaxPerHostBandwidthUsageKbSec(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setMaxQueuesPerReportCategory(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setMaxRepetitions(int) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
 
setMaxRetries(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setMaxSegLength(int) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setMaxSize(int) - Method in class org.archive.crawler.util.TopNSet
 
setMaxSizeToDigest(long) - Method in class org.archive.modules.extractor.HTTPContentDigest
 
setMaxSizeToParse(long) - Method in class org.archive.modules.extractor.ExtractorPDF
 
setMaxSizeToParse(long) - Method in class org.archive.modules.extractor.ExtractorUniversal
 
setMaxSpeculativeHops(int) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
setMaxTimeSeconds(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
 
setMaxToeThreads(int) - Method in class org.archive.crawler.framework.CrawlController
 
setMaxTotalBytesToWrite(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setMaxTransHops(int) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
 
setMaxWaitForIdleMs(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setMetadata(CrawlMetadata) - Method in class org.archive.crawler.framework.CrawlController
 
setMetadata(CrawlMetadata) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setMetadata(CrawlMetadata) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setMetadata(CrawlMetadata) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setMetadataProvider(CrawlMetadata) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setMethod(String) - Method in class org.archive.modules.forms.HTMLForm
 
setMethodRetryHandler(MethodRetryHandler) - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpMethodParams.
setMinDelayMs(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setMonitorConfigPaths(boolean) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
If enabled, all the paths returned by ConfigPathConfigurer.getAllConfigPaths() will be monitored in addition to any paths explicitly specified via DiskSpaceMonitor.setMonitorPaths(List).
setMonitorMounts(List<String>) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
setMonitorPaths(List<String>) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
 
setName(String) - Method in class org.archive.spring.ConfigPath
 
setName(String) - Method in class org.archive.spring.Sheet
 
setNavlinksOnly(boolean) - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
 
setNonfatalErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.CustomRobotsPolicy
 
setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
setOnlyStoreIfWriteTagPresent(boolean) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
 
setOperator(String) - Method in class org.archive.modules.CrawlMetadata
 
setOperatorContactUrl(String) - Method in class org.archive.modules.CrawlMetadata
 
setOperatorFrom(String) - Method in class org.archive.modules.CrawlMetadata
 
setOrder(int) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
setOrdinal(long) - Method in class org.archive.modules.CrawlURI
 
setOrganization(String) - Method in class org.archive.modules.CrawlMetadata
 
setOutlinkRule(DecideRule) - Method in class org.archive.crawler.processor.CrawlMapper
 
setOverlayMapsSource(OverlayMapsSource) - Method in class org.archive.modules.CrawlURI
 
setParallelQueues(int) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
setParams(HttpConnectionParams) - Method in class org.apache.commons.httpclient.HttpConnection
Assigns HTTP protocol parameters for this connection.
setParams(HttpMethodParams) - Method in class org.apache.commons.httpclient.HttpMethodBase
Assigns HTTP protocol parameters for this method.
setPassword(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
setPassword(String) - Method in class org.archive.modules.fetcher.FetchFTP
 
setPath(String) - Method in class org.apache.commons.httpclient.Cookie
Sets the path attribute.
setPath(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the path of the HTTP method.
setPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setPath(ConfigPath) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setPath(ConfigPath) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setPath(String) - Method in class org.archive.spring.ConfigPath
 
setPathAttributeSpecified(boolean) - Method in class org.apache.commons.httpclient.Cookie
Indicates whether the cookie had a path specified in a path attribute of the Set-Cookie header.
setPauseAtStart(boolean) - Method in class org.archive.crawler.framework.CrawlController
 
setPauseThresholdKb(int) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
setPauseThresholdMiB(long) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
Set the minimum amount of space that must be available on all monitored paths.
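
A check of the kind described above, where every monitored path must have at least the threshold available, could look roughly like this. It is a sketch under assumptions, not the DiskSpaceMonitor implementation: DiskSpaceCheckSketch and allPathsHaveSpace are hypothetical names, and the use of java.io.File.getUsableSpace() to measure free space is an assumption.

```java
import java.io.File;
import java.util.List;

// Hypothetical illustration only; not the Heritrix DiskSpaceMonitor.
class DiskSpaceCheckSketch {
    private long pauseThresholdMiB = 0;

    void setPauseThresholdMiB(long mib) {
        this.pauseThresholdMiB = mib;
    }

    /** Returns true only if every path has at least the threshold of usable space. */
    boolean allPathsHaveSpace(List<String> paths) {
        long thresholdBytes = pauseThresholdMiB * 1024L * 1024L; // MiB to bytes
        for (String path : paths) {
            // getUsableSpace() reports the bytes available to this JVM on the path's partition.
            if (new File(path).getUsableSpace() < thresholdBytes) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would typically pause work when allPathsHaveSpace returns false; how the real monitor reacts is not specified in this index.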
setPolitenessDelay(long) - Method in class org.archive.modules.CrawlURI
 
setPool(WriterPool) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setPoolMaxActive(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setPort(int) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the port to connect to.
setPrecedence(Integer) - Method in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
 
setPrecedence(int) - Method in class org.archive.modules.CrawlURI
 
setPrecedenceFloor(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setPrecedenceProvider(PrecedenceProvider) - Method in class org.archive.crawler.frontier.WorkQueue
 
setPreferenceDepthHops(int) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
setPreferenceDepthHops(int) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setPreferenceEmbedHops(int) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setPrefix(String) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setPreloadSource(ConfigPath) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
setPreloadSourceUrl(String) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
setPrerequisite(boolean) - Method in class org.archive.modules.CrawlURI
Set if this CrawlURI is itself a prerequisite URI.
setPrerequisiteUri(CrawlURI) - Method in class org.archive.modules.CrawlURI
Set a prerequisite for this URI.
setProcessErrorOutlinks(boolean) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setProcessors(List<Processor>) - Method in class org.archive.modules.ProcessorChain
 
setProfileLog(File) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
Set a File to receive a log for replay profiling.
setProfileLog(File) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
 
setProfileLog(File) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setProgressLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setProtocol(Protocol) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the protocol used to establish the connection.
setProxyCredentials(String, String, Credentials) - Method in class org.apache.commons.httpclient.HttpState
Deprecated.
Use HttpState.setProxyCredentials(AuthScope, Credentials).
setProxyCredentials(AuthScope, Credentials) - Method in class org.apache.commons.httpclient.HttpState
Sets the proxy credentials for the given authentication realm.
setProxyHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the host to proxy through.
setProxyPort(int) - Method in class org.apache.commons.httpclient.HttpConnection
Sets the port of the host to proxy through.
setQueryString(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the query string of this HTTP method.
setQueryString(NameValuePair[]) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the query string of this HTTP method.
setQueueAssignmentPolicy(QueueAssignmentPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setQueuePrecedencePolicy(QueuePrecedencePolicy) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setQueueTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setRealm(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
 
setRecheckScope(boolean) - Method in class org.archive.crawler.prefetch.Preselector
 
setRecheckThresholdKb(int) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
setRecorder(Recorder) - Method in class org.archive.modules.CrawlURI
Set the http recorder to be associated with this uri.
setRecorderInBufferBytes(int) - Method in class org.archive.crawler.framework.CrawlController
 
setRecorderOutBufferBytes(int) - Method in class org.archive.crawler.framework.CrawlController
 
setRecordIDGenerator(RecordIDGenerator) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
setRecoveryCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Used to inform a bean that it should restore its state from the given Checkpoint when launched (Lifecycle start()).
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CheckpointService
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
setRecoveryCheckpointByName(String) - Method in class org.archive.crawler.framework.CheckpointService
Given the name of a valid checkpoint subdirectory in the checkpoints directory, create a Checkpoint instance, and insert it into all Checkpointable beans.
setRecoveryLogEnabled(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setReducePrefixRegex(String) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
setRegex(Pattern) - Method in class org.archive.modules.canonicalize.RegexRule
 
setRegex(Pattern) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
 
setRegex(Pattern) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
setRegexList(List<Pattern>) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
 
setRemove(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
 
setRemoveTriggerUris(boolean) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
setReports(List<Report>) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setReportsDir(ConfigPath) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setRequestHeader(String, String) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the specified request header, overwriting any previous value.
setRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the specified request header, overwriting any previous value.
setRescheduleDelaySeconds(long) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
setRescheduleTime(long) - Method in class org.archive.modules.CrawlURI
 
setRespectCrawlDelayUpToSeconds(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setResponseStream(InputStream) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the response stream.
setRetired(boolean) - Method in class org.archive.crawler.frontier.WorkQueue
Set the retired status of this queue.
setRetryDelaySeconds(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setRobotsPolicyName(String) - Method in class org.archive.modules.CrawlMetadata
 
setRobotsValidityDurationSeconds(int) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setRotationDigits(int) - Method in class org.archive.crawler.processor.CrawlMapper
 
setRules(DecideRule) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
 
setRules(List<CanonicalizationRule>) - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
 
setRules(List<DecideRule>) - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
setRuntimeErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setRuntimeSeconds(long) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
setRunWhileEmpty(boolean) - Method in class org.archive.crawler.framework.CrawlController
 
setSchedulingDirective(int) - Method in class org.archive.modules.CrawlURI
 
setSchemes(Set<String>) - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
 
setScope(DecideRule) - Method in class org.archive.crawler.framework.Scoper
 
setScope(DecideRule) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setScratchDir(ConfigPath) - Method in class org.archive.crawler.framework.CrawlController
 
setScriptSource(ReadSource) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
 
setScriptSource(ReadSource) - Method in class org.archive.modules.ScriptedProcessor
 
setSecure(boolean) - Method in class org.apache.commons.httpclient.Cookie
Sets the secure attribute of the cookie.
setSeed(boolean) - Method in class org.archive.modules.CrawlURI
Set the isSeed attribute of this URI.
setSeedListeners(Set<SeedListener>) - Method in class org.archive.modules.seeds.SeedModule
 
setSeeds(SeedModule) - Method in class org.archive.crawler.framework.ActionDirectory
 
setSeeds(SeedModule) - Method in class org.archive.crawler.framework.CrawlController
 
setSeeds(SeedModule) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setSeeds(SeedModule) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setSeeds(SeedModule) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setSeeds(SeedModule) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setSeedsAsSurtPrefixes(boolean) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setSeedsRedirectNewSeeds(boolean) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setSeedsRedirectNewSeeds(boolean) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
setSendBufferSize(int) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.setSendBufferSize(int), HttpConnection.getParams().
setSendConnectionClose(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setSendIfModifiedSince(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setSendIfNoneMatch(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setSendRange(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setSendReferer(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setServerCache(ServerCache) - Method in class org.archive.crawler.framework.CrawlController
 
setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
 
setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
 
setServerCache(ServerCache) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
setServerCache(ServerCache) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
setServerCache(ServerCache) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setServerCache(ServerCache) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setServerCache(ServerCache) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
 
setServerCache(ServerCache) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
 
setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchDNS
 
setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchWhois
 
setServerCache(ServerCache) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
setServerCache(ServerCache) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setServerMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setServerMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setServerMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setServerMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
setSessionBudget(int) - Method in class org.archive.crawler.frontier.WorkQueue
Set the session 'activity budget' to the given value.
setSheetOverlaysManager(SheetOverlaysManager) - Method in class org.archive.crawler.frontier.AbstractFrontier
 
setSheetOverlaysManager(SheetOverlaysManager) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
setSheetsByName(Map<String, Sheet>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
Collect all Sheets, by beanName.
setShouldFetchBodyRule(DecideRule) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setShouldMasquerade(boolean) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
 
setShouldMasquerade(boolean) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
 
setShouldProcessRule(DecideRule) - Method in class org.archive.modules.Processor
 
setShouldReportAtEndOfCrawl(boolean) - Method in class org.archive.crawler.reporting.Report
 
setShouldReportDuringCrawl(boolean) - Method in class org.archive.crawler.reporting.Report
 
setSize(int) - Method in class org.archive.crawler.framework.ToePool
Change the number of ToeThreads.
setSizes(CrawlURI, Recorder) - Method in class org.archive.modules.fetcher.FetchHTTP
Update CrawlURI internal sizes based on current transaction (and in the case of 304s, history)
setSkipIdenticalDigests(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setSnoozeLongMs(long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setSocketTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
Sets SO_TIMEOUT value directly on the underlying socket.
setSortedDuplicates(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
 
setSoTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.setSoTimeout(int), HttpConnection.getParams().
setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchFTP
 
setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchWhois
 
setSourceTag(String) - Method in class org.archive.modules.CrawlURI
 
setSourceTagSeeds(boolean) - Method in class org.archive.modules.seeds.SeedModule
 
setSpecialQueryTemplates(Map<String, String>) - Method in class org.archive.modules.fetcher.FetchWhois
 
setSslTrustLevel(ConfigurableX509TrustManager.TrustLevel) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setStaleCheckingEnabled(boolean) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
Use HttpConnectionParams.setStaleCheckingEnabled(boolean), HttpConnection.getParams().
setStartNewFilesOnCheckpoint(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
Whether to close output files and start new ones on checkpoint.
setStatisticsTracker(StatisticsTracker) - Method in class org.archive.crawler.framework.CrawlController
 
setStatisticsTracker(StatisticsTracker) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
setStatusCodes(List<Integer>) - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
 
setStep(ToeThread.Step, String) - Method in class org.archive.crawler.framework.ToeThread
 
setStorePaths(List<ConfigPath>) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setStrictMode(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
Deprecated.
Use HttpParams.setParameter(String, Object) to exercise a more granular control over HTTP protocol strictness.
setStripRegex(String) - Method in class org.archive.modules.extractor.HTTPContentDigest
 
setSuccess(boolean) - Method in class org.archive.checkpointing.Checkpoint
 
setSuffixAtEnd(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setSupplementaryRule(DecideRule) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
setSurtPrefixes(List<String>) - Method in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
 
setSurtsDumpFile(ConfigFile) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setSurtsSource(ReadSource) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
setSurtsSourceFile(ConfigFile) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Deprecated. 
setTargetSheetNames(List<String>) - Method in class org.archive.crawler.spring.SheetAssociation
 
setTemplate(String) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
URI-building template.
setTemplate(String) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.EngineResource
 
setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.JobResource
 
setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.ScriptResource
 
setTextSource(ReadSource) - Method in class org.archive.modules.seeds.TextSeedModule
 
setThreadLogger(Logger) - Static method in class org.archive.crawler.reporting.AlertThreadGroup
set alternate temporary alert logger
setThreadNumber(int) - Method in class org.archive.modules.CrawlURI
Set the number of the ToeThread responsible for processing this uri.
setTimeoutSeconds(int) - Method in class org.archive.modules.fetcher.FetchFTP
 
setTimeoutSeconds(int) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setTooLongDirectory(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueue
Set the total expenditure level allowable before queue is considered inherently 'over-budget'.
setTotalBytesWritten(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setTrackSeeds(boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setTrackSources(boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
setTransactional(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
 
setTreatFramesAsEmbedLinks(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
 
setUnderscoreSet(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
setUnresolvable(CrawlURI, CrawlHost) - Method in class org.archive.modules.fetcher.FetchDNS
 
setup(File, boolean) - Method in class org.archive.bdb.BdbModule
 
setUp() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Sets up the ContentExtractorTestBase.extractor and ModuleTestBase.processorClass fields.
setUp() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
 
setUp() - Method in class org.archive.util.TmpDirTestCase
 
setupCheckpointTask() - Method in class org.archive.crawler.framework.CheckpointService
Set up checkpointTask according to the current interval.
setupCopyEnvironment(File) - Static method in class org.archive.modules.recrawl.PersistProcessor
 
setupCopyEnvironment(File, boolean) - Static method in class org.archive.modules.recrawl.PersistProcessor
 
setupGlobalProperties(int) - Method in class org.archive.crawler.Heritrix
Set up global system properties that may be of use elsewhere.
setupLogs() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setUpperBound(Integer) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
Sets the upper bound on the range of acceptable status codes.
setUpperBound(Integer) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
Sets the upper bound on the range of acceptable status codes.
setUpperBound(long) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
The rule will apply if the url has been fetched and content body length is less than or equal to this number of bytes.
setupPool(AtomicInteger) - Method in class org.archive.modules.writer.ARCWriterProcessor
 
setupPool(AtomicInteger) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
setupPool(AtomicInteger) - Method in class org.archive.modules.writer.WriterPoolProcessor
Set up pool of files.
setupServer(int, String, String, String, String) - Method in class org.archive.crawler.Heritrix
Create an HTTPS restlet Server instance matching the given parameters.
setupSimpleLog(String) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setupSimpleLog(String) - Method in interface org.archive.modules.SimpleFileLoggerProvider
 
setupToePool() - Method in class org.archive.crawler.framework.CrawlController
 
setURI(URI) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sets the URI for this method.
setUriErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
setUriPrecedencePolicy(UriPrecedencePolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
setUriRegex(String) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
Regular expression against which to match the URI.
setUriUniqFilter(UriUniqFilter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
setUseHardLinkCheckpoints(boolean) - Method in class org.archive.bdb.BdbModule
 
setUseHeaderLength(boolean) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
setUseHTTP11(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setUsePreset(MatchesFilePatternDecideRule.Preset) - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
 
setUsePublicSuffixesRegex(boolean) - Method in class org.archive.crawler.processor.HashCrawlMapper
 
setUserAgent(String) - Method in class org.archive.modules.CrawlURI
Set the user agent to use when crawling this URI.
setUserAgentProvider(UserAgentProvider) - Method in class org.archive.modules.fetcher.FetchHTTP
 
setUserAgentTemplate(String) - Method in class org.archive.modules.CrawlMetadata
 
setUsername(String) - Method in class org.archive.modules.fetcher.FetchFTP
 
setUseSharedCache(boolean) - Method in class org.archive.bdb.BdbModule
 
setValidDateFormats(Collection<String>) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Sets the Collection of date patterns used for parsing.
setValidDateFormats(Collection<String>) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
 
setValidDateFormats(Collection<String>) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Does nothing.
setValue(Object) - Method in class org.archive.io.ReadSourceEditor
 
setValue(Object) - Method in class org.archive.spring.ConfigPathEditor
 
setValue(String) - Method in class org.archive.spring.ConfigString
 
setVersion(int) - Method in class org.apache.commons.httpclient.Cookie
Sets the version of the cookie specification to which this cookie conforms.
setVia(UURI) - Method in class org.archive.modules.CrawlURI
 
setVirtualHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
no longer applicable
setWakeTime(long) - Method in class org.archive.crawler.frontier.WorkQueue
 
setWriteBufferSize(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
setWriteMetadata(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
setWriteRequests(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
setWriteRevisitForIdenticalDigests(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
setWriteRevisitForNotModified(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
sharedEngine - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
 
sharedEngine - Variable in class org.archive.modules.ScriptedProcessor
 
Sheet - Class in org.archive.spring
Collection of overrides: alternative values for object properties that should apply in some contexts.
Sheet() - Constructor for class org.archive.spring.Sheet
 
SheetAssociation - Class in org.archive.crawler.spring
Represents target Sheets that should be associated with some grouping of URIs.
SheetAssociation() - Constructor for class org.archive.crawler.spring.SheetAssociation
 
sheetNamesBySurt - Variable in class org.archive.crawler.spring.SheetOverlaysManager
 
sheetOverlaysManager - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
sheetOverlaysManager - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
 
SheetOverlaysManager - Class in org.archive.crawler.spring
Manager which marks up CrawlURIs with the names of all applicable Sheets, and returns overlay maps by name.
SheetOverlaysManager() - Constructor for class org.archive.crawler.spring.SheetOverlaysManager
 
sheetsByName - Variable in class org.archive.crawler.spring.SheetOverlaysManager
all sheets by (bean)name
shortMessage(BeansException) - Method in class org.archive.crawler.framework.CrawlJob
Return a short useful message for common BeansExceptions.
shortName - Variable in class org.archive.checkpointing.Checkpoint
 
shortReportLegend() - Method in class org.archive.crawler.framework.ToePool
 
shortReportLegend() - Method in class org.archive.crawler.framework.ToeThread
 
shortReportLegend() - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
shortReportLegend() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
shortReportLegend() - Method in class org.archive.crawler.frontier.WorkQueue
 
shortReportLegend() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
shortReportLegend() - Method in class org.archive.modules.CrawlURI
 
shortReportLegend() - Method in class org.archive.modules.fetcher.FetchStats
 
shortReportLegend() - Method in class org.archive.modules.ProcessorChain
 
shortReportLine() - Method in class org.archive.crawler.framework.ToeThread
 
shortReportLine() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
shortReportLine() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
shortReportLine() - Method in class org.archive.crawler.frontier.WorkQueue
 
shortReportLine() - Method in class org.archive.modules.CrawlURI
 
shortReportLine() - Method in class org.archive.modules.fetcher.FetchStats
 
shortReportLine(Reporter) - Static method in class org.archive.util.ReportUtils
Utility method to get a String shortReportLine from Reporter
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
 
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
 
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
 
shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
shortReportLineTo(PrintWriter) - Method in class org.archive.modules.CrawlURI
 
shortReportLineTo(PrintWriter) - Method in class org.archive.modules.fetcher.FetchStats
 
shortReportLineTo(PrintWriter) - Method in class org.archive.modules.ProcessorChain
 
shortReportMap() - Method in class org.archive.crawler.framework.ToePool
 
shortReportMap() - Method in class org.archive.crawler.framework.ToeThread
 
shortReportMap() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
shortReportMap() - Method in class org.archive.crawler.frontier.WorkQueue
 
shortReportMap() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
shortReportMap() - Method in class org.archive.modules.CrawlURI
 
shortReportMap() - Method in class org.archive.modules.fetcher.FetchStats
 
shortReportMap() - Method in class org.archive.modules.ProcessorChain
 
shouldCloseConnection(HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Tests if the connection should be closed after the method has been executed.
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Determines if otherwise valid URIs should have links extracted or not.
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorCSS
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorDOC
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorJS
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorPDF
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorSWF
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorUniversal
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorXML
 
shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.TrapSuppressExtractor
 
shouldLoad(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
Whether the current CrawlURI's state should be loaded
shouldMasquerade - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
whether to adopt the user-agent that is allowed for the fetch
shouldMasquerade - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
whether to adopt the user-agent that is allowed for the fetch
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
Deprecated.
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
shouldProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
Determines if links should be extracted from the given URI.
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTTP
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorURI
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.HTTPContentDigest
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchDNS
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchFTP
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
Whether this processor can fetch the given CrawlURI.
shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.forms.ExtractorHTMLForms
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.Processor
Determines whether the given uri should be processed by this processor.
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistStoreProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.ScriptedProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
 
shouldRetire() - Method in class org.archive.crawler.framework.ToeThread
Whether this thread should cleanly retire at the earliest opportunity.
shouldStore(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
Whether the current CrawlURI's state should be persisted (to log or direct to database)
shouldWrite(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
Whether the given CrawlURI should be written to archive files.
shutdown() - Method in class org.archive.crawler.framework.Engine
 
shutdownOutput() - Method in class org.apache.commons.httpclient.HttpConnection
Deprecated.
unused
SimpleCookieStorage - Class in org.archive.modules.fetcher
 
SimpleCookieStorage() - Constructor for class org.archive.modules.fetcher.SimpleCookieStorage
 
SimpleFileLoggerProvider - Interface in org.archive.modules
 
SimplePrecedenceProvider - Class in org.archive.crawler.frontier.precedence
The simplest precedence provider, wrapping a resettable integer value.
SimplePrecedenceProvider(int) - Constructor for class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
 
size() - Method in class org.archive.bdb.StoredQueue
 
size() - Method in class org.archive.crawler.util.TopNSet
 
size() - Method in class org.archive.modules.ProcessorChain
 
size() - Method in interface org.archive.util.BloomFilter
The number of character sequences in the filter (considered to be the number of add()s that returned 'true')
size - Variable in class org.archive.util.BloomFilter64bit
The number of elements currently in the filter.
size() - Method in class org.archive.util.BloomFilter64bit
The number of character sequences in the filter.
size() - Method in class org.archive.util.ObjectIdentityBdbCache
 
size() - Method in class org.archive.util.ObjectIdentityBdbManualCache
 
size() - Method in interface org.archive.util.ObjectIdentityCache
count of name-to-object mappings contained
size() - Method in class org.archive.util.ObjectIdentityMemCache
 
size() - Method in class org.archive.util.Transform
 
sizeTotalsReport() - Method in class org.archive.crawler.framework.CrawlJob
 
sizeTotalsReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
skip(int) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
 
skip(long) - Method in class org.archive.util.ms.BlockInputStream
 
skipIdenticalDigests - Variable in class org.archive.modules.writer.WriterPoolProcessor
Whether to skip the writing of a record when URI history information is available and indicates the prior fetch had an identical content digest.
slots - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
smallestKnownKey - Variable in class org.archive.crawler.util.TopNSet
 
smallestKnownValue - Variable in class org.archive.crawler.util.TopNSet
 
smear - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
 
sn - Variable in class org.archive.bdb.BdbModule
uniqueness serial number for temp map databases
snapshot - Variable in class org.archive.crawler.event.StatSnapshotEvent
 
snapshots - Variable in class org.archive.crawler.reporting.StatisticsTracker
snapshots of crawl tallies and rates
snapshotToLaunchDir(File) - Method in class org.archive.spring.ConfigPathConfigurer
 
snoozedClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
All per-class queues held in snoozed state, sorted by wake time.
snoozedOverflow - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
snoozedOverflowCount - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
 
snoozeLongMs - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
When a snooze target for a queue is longer than this amount, the queue will be "long snoozed" instead of "short snoozed".
socketFactory - Variable in class org.archive.modules.fetcher.FetchFTP
 
sortedDuplicates - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
sortShiftStatusCode() - Method in class org.archive.crawler.reporting.SeedRecord
 
sourceHostDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
Keep track of URL counts per host per seed
sourceOrderXmlDom - Variable in class org.archive.crawler.migrate.MigrateH1to3Tool
 
sourceTagSeeds - Variable in class org.archive.modules.seeds.SeedModule
Whether to tag seeds with their own URI as a heritable 'source' String, which will be carried-forward to all URIs discovered on paths originating from that seed.
SourceTagsReport - Class in org.archive.crawler.reporting
The "Source Report", tallies of source tags (usually seeds) by host.
SourceTagsReport() - Constructor for class org.archive.crawler.reporting.SourceTagsReport
 
specialQueryTemplates - Variable in class org.archive.modules.fetcher.FetchWhois
 
SPECULATIVE_MISC - Static variable in class org.archive.modules.extractor.LinkContext
Stand-in value for speculative/aggressively extracted urls without other context.
speculativeFixup(String, UURI) - Static method in class org.archive.util.UriUtils
Perform additional fixup of likely-URI Strings
splitH1userAgent(String, StringBuilder) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
 
st.ata.util - package st.ata.util
 
STANDARD_POLICIES - Static variable in class org.archive.modules.net.RobotsPolicy
 
start() - Method in class org.archive.bdb.BdbModule
 
start() - Method in class org.archive.crawler.framework.ActionDirectory
 
start() - Method in class org.archive.crawler.framework.CheckpointService
 
start() - Method in class org.archive.crawler.framework.CrawlController
 
start() - Method in class org.archive.crawler.framework.Scoper
 
start() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
start() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
start() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
start() - Method in class org.archive.crawler.processor.CrawlMapper
 
start() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
 
start() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
start() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
start() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
start() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
start() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
start() - Method in class org.archive.modules.fetcher.FetchHTTP
 
start() - Method in class org.archive.modules.fetcher.FetchWhois
 
start() - Method in class org.archive.modules.net.BdbServerCache
 
start() - Method in class org.archive.modules.Processor
 
start() - Method in class org.archive.modules.ProcessorChain
 
start() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
start() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
 
start() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
start() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
start() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
start() - Method in class org.archive.spring.PathSharingContext
 
startCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
 
startCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
Note a checkpoint is about to begin.
startCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
 
startCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
 
startCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
startCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
startCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
startCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
startCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
 
startCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
 
startCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
 
startCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
startContext() - Method in class org.archive.crawler.framework.CrawlJob
Start the context, catching and reporting any BeansExceptions.
startManagerThread() - Method in class org.archive.crawler.frontier.AbstractFrontier
Start the dedicated thread with an independent view of the frontier's state.
startNewFilesOnCheckpoint - Variable in class org.archive.modules.writer.WriterPoolProcessor
 
state - Variable in class org.archive.crawler.event.CrawlStateEvent
 
StatisticsLogFormatter - Class in org.archive.crawler.io
 
StatisticsLogFormatter() - Constructor for class org.archive.crawler.io.StatisticsLogFormatter
 
statisticsTracker - Variable in class org.archive.crawler.framework.CrawlController
Statistics tracking modules.
statisticsTracker - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
 
StatisticsTracker - Class in org.archive.crawler.reporting
This is an implementation of the AbstractTracker.
StatisticsTracker() - Constructor for class org.archive.crawler.reporting.StatisticsTracker
 
StatSnapshotEvent - Class in org.archive.crawler.event
ApplicationEvent published when the StatisticsTracker takes its sample of various statistics.
StatSnapshotEvent(StatisticsTracker, CrawlStatSnapshot) - Constructor for class org.archive.crawler.event.StatSnapshotEvent
 
STATUS_CODE_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
statusCodeDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
Keep track of fetch status codes
statusCodes - Variable in class org.archive.modules.deciderules.FetchStatusDecideRule
 
std24 - Static variable in class st.ata.util.FPGenerator
A standard 24-bit fingerprint generator using polynomials[0][24].
std32 - Static variable in class st.ata.util.FPGenerator
A standard 32-bit fingerprint generator using polynomials[0][32].
std40 - Static variable in class st.ata.util.FPGenerator
A standard 40-bit fingerprint generator using polynomials[0][40].
std64 - Static variable in class st.ata.util.FPGenerator
The standard 64-bit fingerprint generator using polynomials[0][64].
stop() - Method in class org.archive.bdb.BdbModule
 
stop() - Method in class org.archive.crawler.framework.ActionDirectory
 
stop() - Method in class org.archive.crawler.framework.CheckpointService
 
stop() - Method in class org.archive.crawler.framework.CrawlController
 
stop() - Method in class org.archive.crawler.framework.Scoper
 
stop() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
stop() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
stop() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
 
stop() - Method in class org.archive.crawler.processor.CrawlMapper
 
stop() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
 
stop() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
stop() - Method in class org.archive.crawler.util.BdbUriUniqFilter
 
stop() - Method in class org.archive.modules.deciderules.DecideRuleSequence
 
stop() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
 
stop() - Method in class org.archive.modules.fetcher.FetchHTTP
 
stop() - Method in class org.archive.modules.fetcher.FetchWhois
 
stop() - Method in class org.archive.modules.net.BdbServerCache
 
stop() - Method in class org.archive.modules.Processor
 
stop() - Method in class org.archive.modules.ProcessorChain
 
stop() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
stop() - Method in class org.archive.modules.recrawl.PersistLogProcessor
 
stop() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
 
stop() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
store - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
store(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractContentDigestHistory
Stores curi.getContentDigestHistory() for the key persistKeyFor(curi).
store - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
 
store(CrawlURI) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
 
store - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
 
storeDNSRecord(CrawlURI, String, CrawlHost, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
 
StoredQueue<E extends Serializable> - Class in org.archive.bdb
Queue backed by a JE Collections StoredSortedMap.
StoredQueue(Database, Class<E>, StoredClassCatalog) - Constructor for class org.archive.bdb.StoredQueue
Create a StoredQueue backed by the given Database.
storePaths - Variable in class org.archive.modules.writer.WriterPoolProcessor
Where to save files.
STRING_URI_DETECTOR - Static variable in class org.archive.util.UriUtils
 
STRING_URI_DETECTOR_EXCEPTIONS - Static variable in class org.archive.util.UriUtils
 
StringExtractorTestBase - Class in org.archive.modules.extractor
 
StringExtractorTestBase() - Constructor for class org.archive.modules.extractor.StringExtractorTestBase
 
StringExtractorTestBase.TestData - Class in org.archive.modules.extractor
 
StringExtractorTestBase.TestData(CrawlURI, Link) - Constructor for class org.archive.modules.extractor.StringExtractorTestBase.TestData
 
StripExtraSlashes - Class in org.archive.modules.canonicalize
Strip any extra slashes, '/', found in the path.
StripExtraSlashes() - Constructor for class org.archive.modules.canonicalize.StripExtraSlashes
 
StripSessionCFIDs - Class in org.archive.modules.canonicalize
Strip cold fusion session ids.
StripSessionCFIDs() - Constructor for class org.archive.modules.canonicalize.StripSessionCFIDs
 
StripSessionIDs - Class in org.archive.modules.canonicalize
Strip known session ids.
StripSessionIDs() - Constructor for class org.archive.modules.canonicalize.StripSessionIDs
 
stripToMinimal() - Method in class org.archive.modules.CrawlURI
Remove all attributes set on this uri.
StripUserinfoRule - Class in org.archive.modules.canonicalize
Strip any 'userinfo' found on http/https URLs.
StripUserinfoRule() - Constructor for class org.archive.modules.canonicalize.StripUserinfoRule
 
StripWWWNRule - Class in org.archive.modules.canonicalize
Strip any 'www[0-9]*' found on http/https URLs IF they have some path/query component (content after third slash).
StripWWWNRule() - Constructor for class org.archive.modules.canonicalize.StripWWWNRule
 
StripWWWRule - Class in org.archive.modules.canonicalize
Strip any 'www' found on http/https URLs, IF they have some path/query component (content after third slash).
StripWWWRule() - Constructor for class org.archive.modules.canonicalize.StripWWWRule
 
SUBARRAY_LENGTH_IN_LONGS - Static variable in class org.archive.util.BloomFilter64bit
number of longs in one subarray
SUBARRAY_MASK - Static variable in class org.archive.util.BloomFilter64bit
mask for lowest SUBARRAY_POWER_OF_TWO bits
SUBARRAY_POWER_OF_TWO - Static variable in class org.archive.util.BloomFilter64bit
power-of-two to use as maximum size of bitfield subarrays
subclasses(Collection<? extends Object>, Class<Target>) - Static method in class org.archive.util.Transform
Returns a transform containing only objects of a given class.
submitStatusFor(String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
subset(CrawlURI, Class<?>) - Method in class org.archive.modules.credential.CredentialStore
Return set made up of all credentials of the passed type.
subset(CrawlURI, Class<?>, String) - Method in class org.archive.modules.credential.CredentialStore
Return set made up of all credentials of the passed type.
substats - Variable in class org.archive.crawler.frontier.WorkQueue
Substats for all CrawlURIs in this group
substats - Variable in class org.archive.modules.net.CrawlHost
 
substats - Variable in class org.archive.modules.net.CrawlServer
 
subtract(Histotable<K>) - Method in class org.archive.util.Histotable
 
succeededFetchCount() - Method in interface org.archive.crawler.framework.Frontier
Number of successfully processed URIs.
succeededFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
 
succeededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
(non-Javadoc)
success - Variable in class org.archive.checkpointing.Checkpoint
 
SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
successBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
SuccessCountsQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
QueuePrecedencePolicy that sets a uri-queue's precedence to a configured base value, then lowers its precedence with each tier of successful URIs completed.
SuccessCountsQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
 
SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
 
suffixAtEnd - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If true, the suffix is placed at the end of the path, after the query (if any).
summary() - Method in class org.archive.crawler.util.CrawledBytesHistotable
 
SupplementaryLinksScoper - Class in org.archive.crawler.postprocessor
Run CrawlURI links carried in the passed CrawlURI through a filter and 'handle' rejections.
SupplementaryLinksScoper() - Constructor for class org.archive.crawler.postprocessor.SupplementaryLinksScoper
 
Supplier<V> - Class in org.archive.util
Class for optionally providing one instance of the parameterized type.
Supplier() - Constructor for class org.archive.util.Supplier
 
Supplier(V) - Constructor for class org.archive.util.Supplier
 
supports(Class<?>) - Method in class org.archive.crawler.framework.CheckpointValidator
 
supports(Class<?>) - Method in class org.archive.spring.BeanFieldsPatternValidator
 
supportsCustomEditor() - Method in class org.archive.io.ReadSourceEditor
 
supportsCustomEditor() - Method in class org.archive.spring.ConfigPathEditor
 
SurtAuthorityQueueAssignmentPolicy - Class in org.archive.crawler.frontier
SurtAuthorityQueueAssignmentPolicy based on the surt form of hostname.
SurtAuthorityQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
 
SurtPrefixedDecideRule - Class in org.archive.modules.deciderules.surt
Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set.
SurtPrefixedDecideRule() - Constructor for class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
surtPrefixes - Variable in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
 
surtPrefixes - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
 
SurtPrefixesSheetAssociation - Class in org.archive.crawler.spring
SheetAssociation applied on the basis of matching SURT prefixes.
SurtPrefixesSheetAssociation() - Constructor for class org.archive.crawler.spring.SurtPrefixesSheetAssociation
 
surtsDumpFile - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Dump file to save SURT prefixes actually used; useful for debugging SURTs.
surtsSource - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Text from which to infer SURT prefixes.
SURTTokenizer - Class in org.archive.surt
Provides iterative URL reduction for prefix matching, to find ever-coarser-grained URL-specific configuration.
SURTTokenizer(String) - Constructor for class org.archive.surt.SURTTokenizer
constructor
symlink(String, String) - Method in interface org.archive.util.CLibrary
 
sync() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
Method used by BdbFrontier during checkpointing.
sync() - Method in class org.archive.util.ObjectIdentityBdbCache
Sync all in-memory map entries to backing disk store.
sync() - Method in class org.archive.util.ObjectIdentityBdbManualCache
Sync all in-memory map entries to backing disk store.
sync() - Method in interface org.archive.util.ObjectIdentityCache
force the persistent backend, if any, to be updated with all live object state
sync() - Method in class org.archive.util.ObjectIdentityMemCache
 
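A minimal sketch of the behaviour described in the StripExtraSlashes and StripWWWNRule entries above, assuming plain regular expressions over the URL string. The class name CanonicalizeSketch, its method names, and both patterns are hypothetical illustrations; they are not taken from the Heritrix sources, and the real rules may differ in their edge cases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not the Heritrix implementation. */
class CanonicalizeSketch {
    // Group 1: scheme and authority; group 2: path; group 3: query and/or fragment.
    private static final Pattern SPLIT = Pattern.compile(
            "^(https?://[^/?#]*)([^?#]*)(.*)$", Pattern.CASE_INSENSITIVE);

    // Matches only when a 'www' + digits label is followed by a host,
    // a third slash, and at least one more character.
    private static final Pattern WWWN = Pattern.compile(
            "^(https?://)www[0-9]*\\.([^/]+/.+)$", Pattern.CASE_INSENSITIVE);

    /** Collapse runs of '/' in the path only; the query is left untouched. */
    static String stripExtraSlashes(String url) {
        Matcher m = SPLIT.matcher(url);
        if (!m.matches()) {
            return url; // not an http/https URL: leave as-is
        }
        String path = m.group(2).replaceAll("/{2,}", "/");
        return m.group(1) + path + m.group(3);
    }

    /** Drop a leading 'www', 'www2', ... label when content follows the third slash. */
    static String stripWwwN(String url) {
        Matcher m = WWWN.matcher(url);
        return m.matches() ? m.group(1) + m.group(2) : url;
    }
}
```

Under these assumptions, stripWwwN leaves a bare host such as http://www.example.com/ unchanged, because nothing follows the third slash.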

T

tagDefineButton(int, Vector) - Method in class org.archive.modules.extractor.CustomSWFTags
 
tagDefineButton2(int, boolean, Vector) - Method in class org.archive.modules.extractor.CustomSWFTags
 
tagDefineSprite(int) - Method in class org.archive.modules.extractor.CustomSWFTags
 
tagDoAction() - Method in class org.archive.modules.extractor.CustomSWFTags
 
tagDoInActions(int) - Method in class org.archive.modules.extractor.CustomSWFTags
 
tagPlaceObject2(boolean, int, int, int, Matrix, AlphaTransform, int, String, int) - Method in class org.archive.modules.extractor.CustomSWFTags
 
tail(String) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail' command
tail(String, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tail(RandomAccessFile, int) - Static method in class org.archive.crawler.util.LogReader
Implementation of a unix-like 'tail -n' command
tailFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
 
tailIndex - Variable in class org.archive.bdb.StoredQueue
 
tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.AbstractFrontier
Report CrawlURI to each of the three 'substats' accumulators (group/queue, server, host) for a given stage.
tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
 
tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
 
tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.WorkQueue
 
tally(CrawlURI, FetchStats.Stage) - Method in interface org.archive.modules.fetcher.FetchStats.CollectsFetchStats
 
tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.modules.fetcher.FetchStats
 
tally(K) - Method in class org.archive.util.Histotable
Record one more occurrence of the given object key.
tally(K, long) - Method in class org.archive.util.Histotable
Record count more occurrence(s) of the given object key.
tallyCurrentPause() - Method in class org.archive.crawler.reporting.StatisticsTracker
For a current pause (if any), add paused time to total and reset
tallySeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
 
targetSheetNames - Variable in class org.archive.crawler.spring.SheetAssociation
 
targetSize - Variable in class org.archive.crawler.framework.ToePool
 
targetState - Variable in class org.archive.crawler.frontier.AbstractFrontier
Frontier.state that manager thread should seek to reach
teardown() - Method in class org.archive.crawler.framework.CrawlJob
Ensure a fresh start for any configuration changes or relaunches, by stopping and discarding an existing ApplicationContext.
TempDirProvider - Interface in org.archive.modules.extractor
 
template - Variable in class org.archive.modules.writer.WriterPoolProcessor
Template from which a filename is interpolated.
terminate() - Method in class org.archive.crawler.framework.CrawlJob
 
terminate() - Method in interface org.archive.crawler.framework.Frontier
Notify Frontier that it should end the crawl, giving any worker ToeThread that asks for a next() an EndedException.
terminate() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
test(int) - Method in class org.archive.modules.deciderules.ResourceLongerThanDecideRule
 
test(int) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
 
test(BeanWrapperImpl, Errors) - Method in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
 
TEST_TMP_SYSTEM_PROPERTY_NAME - Static variable in class org.archive.util.TmpDirTestCase
Name of the system property that holds pointer to tmp directory into which we can safely write files.
testAdd() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that we can add fingerprints
testContains() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check that contains() does what we expect
testCount() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check count works ok
testExtraction() - Method in class org.archive.modules.extractor.StringExtractorTestBase
Tests each text/URI pair in the test data array.
testFinished() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Tests that a URI whose linkExtractionFinished flag has been set has no links extracted.
testRemove() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
test remove() works as expected
testSerialization(Object) - Static method in class org.archive.util.TestUtils
 
testSerializationIfAppropriate() - Method in class org.archive.state.ModuleTestBase
Tests that the module can be serialized.
TestUtils - Class in org.archive.util
Utility methods useful in testing situations.
TestUtils() - Constructor for class org.archive.util.TestUtils
 
testWithZero() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
check we can call add/remove/contains() with 0 as a value
testZeroContent() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
Tests that a URI with a zero content length has no links extracted.
TextSeedModule - Class in org.archive.modules.seeds
Module that announces a list of seeds from a text source (such as a ConfigFile or ConfigString), and provides a mechanism for adding seeds after a crawl has begun.
TextSeedModule() - Constructor for class org.archive.modules.seeds.TextSeedModule
 
textSource - Variable in class org.archive.modules.seeds.TextSeedModule
Text from which to extract seeds
threadBuffer - Variable in class org.archive.bdb.KryoBinding
 
threadCount() - Method in class org.archive.crawler.reporting.StatisticsTracker
Get the total number of ToeThreads (sleeping and active)
threadEngine - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
 
threadEngine - Variable in class org.archive.modules.ScriptedProcessor
 
threadLogger - Static variable in class org.archive.crawler.reporting.AlertThreadGroup
 
threadOverrides - Static variable in class org.archive.spring.KeyedProperties
ThreadLocal (contextual) collection of pushed override maps
threadReport() - Method in class org.archive.crawler.framework.CrawlJob
 
threadReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
timer - Variable in class org.archive.crawler.framework.CheckpointService
service for auto-checkpoint tasks at an interval
TIMER_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
 
TIMER_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
 
timestamp - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
timestamp_interval - Variable in class org.archive.io.CrawlerJournal
number of lines between timestamps
TLDs - Static variable in class org.archive.modules.extractor.ExtractorUniversal
Matches any string that begins with a TLD (no .) followed by a '/' slash or end of string.
tmpDir() - Static method in class org.archive.util.TestUtils
 
tmpDir() - Static method in class org.archive.util.TmpDirTestCase
 
TmpDirTestCase - Class in org.archive.util
Base class for TestCases that want access to a tmp dir for the writing of files.
TmpDirTestCase() - Constructor for class org.archive.util.TmpDirTestCase
 
TmpDirTestCase(String) - Constructor for class org.archive.util.TmpDirTestCase
 
toCheckpointJson() - Method in class org.archive.modules.extractor.Extractor
 
toCheckpointJson() - Method in class org.archive.modules.forms.FormLoginProcessor
 
toCheckpointJson() - Method in class org.archive.modules.Processor
Return a JSONObject of current state that can be consulted on recovery to restore necessary values.
toCheckpointJson() - Method in class org.archive.modules.writer.WARCWriterProcessor
 
toCheckpointJson() - Method in class org.archive.modules.writer.WriterPoolProcessor
 
toDatabaseConfig() - Method in class org.archive.bdb.BdbModule.BdbConfig
 
ToePool - Class in org.archive.crawler.framework
A collection of ToeThreads.
ToePool(AlertThreadGroup, CrawlController) - Constructor for class org.archive.crawler.framework.ToePool
Constructor.
ToeThread - Class in org.archive.crawler.framework
One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
ToeThread(ToePool, int) - Constructor for class org.archive.crawler.framework.ToeThread
Create a ToeThread
ToeThread.Step - Enum in org.archive.crawler.framework
 
ToeThreadsReport - Class in org.archive.crawler.reporting
Traditional report of all ToeThread call-stacks, as often consulted to diagnose live crawl issues.
ToeThreadsReport() - Constructor for class org.archive.crawler.reporting.ToeThreadsReport
 
toExternalForm() - Method in class org.apache.commons.httpclient.Cookie
Return a textual representation of the cookie.
tooLongDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If all the directories in the URI would exceed, or come close to exceeding, the file system maximum path length, then they are all replaced by this.
TooManyHopsDecideRule - Class in org.archive.modules.deciderules
Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold.
TooManyHopsDecideRule() - Constructor for class org.archive.modules.deciderules.TooManyHopsDecideRule
Usual constructor.
TooManyPathSegmentsDecideRule - Class in org.archive.modules.deciderules
Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold.
TooManyPathSegmentsDecideRule() - Constructor for class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
Usual constructor.
TopNSet - Class in org.archive.crawler.util
Counting Set which only remembers the 'top N' of all String values reported (with counts) to it.
TopNSet(int) - Constructor for class org.archive.crawler.util.TopNSet
 
toString() - Method in class org.apache.commons.httpclient.Cookie
Return a textual representation of the cookie.
toString() - Method in class org.apache.commons.httpclient.HttpState
Returns a string representation of this HTTP state.
toString() - Method in class org.archive.crawler.frontier.WorkQueue
 
toString() - Method in class org.archive.modules.CrawlURI
 
toString() - Method in class org.archive.modules.extractor.HTMLLinkContext
 
toString() - Method in class org.archive.modules.extractor.Link
 
toString() - Method in class org.archive.modules.extractor.LinkContext.SimpleLinkContext
 
toString() - Method in class org.archive.modules.forms.HTMLForm.FormInput
 
toString() - Method in class org.archive.modules.forms.HTMLForm
 
toString() - Method in class org.archive.modules.net.CrawlHost
 
toString() - Method in class org.archive.modules.net.CrawlServer
 
toString() - Method in class org.archive.util.ms.Piece
 
toString() - Method in class org.archive.util.PaddingStringBuffer
 
totalBudget - Variable in class org.archive.crawler.frontier.WorkQueue
Total to spend on this queue over its lifetime
totalBytes - Variable in class org.archive.modules.fetcher.FetchStats
 
totalCount() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
 
totalExpenditure - Variable in class org.archive.crawler.frontier.WorkQueue
Running tally of total expenditures on this queue
totalKiBPerSec - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
totalProcessedBytes - Variable in class org.archive.crawler.frontier.AbstractFrontier
Used when bandwidth constraints are in effect.
totalScheduled - Variable in class org.archive.modules.fetcher.FetchStats
 
trackSeeds - Variable in class org.archive.crawler.reporting.StatisticsTracker
Whether to maintain seed disposition records (expensive in crawls with millions of seeds)
trackSources - Variable in class org.archive.crawler.reporting.StatisticsTracker
Whether to maintain hosts-per-source-tag records; very expensive in crawls with large numbers of source-tags (seeds) or large crawls over many hosts
transactional - Variable in class org.archive.bdb.BdbModule.BdbConfig
 
TransclusionDecideRule - Class in org.archive.modules.deciderules
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI#getPathFromSeed()) ends with at least one, but not more than, the given number of non-navlink ('L') hops.
TransclusionDecideRule() - Constructor for class org.archive.modules.deciderules.TransclusionDecideRule
Usual constructor.
transform(File, File, boolean) - Method in class org.archive.io.Arc2Warc
 
transform(ARCReader, File) - Method in class org.archive.io.Arc2Warc
 
transform(File, File, String, String, boolean) - Method in class org.archive.io.Warc2Arc
 
transform(WARCReader, ARCWriter) - Method in class org.archive.io.Warc2Arc
 
Transform<Original,Transformed> - Class in org.archive.util
A transformation of a collection.
Transform(Collection<? extends Original>, Transformer<Original, Transformed>) - Constructor for class org.archive.util.Transform
Constructor.
transform(Original) - Method in interface org.archive.util.Transformer
Transforms the given object.
Transformer<Original,Transformed> - Interface in org.archive.util
Transforms objects from one thing into another.
TrapSuppressExtractor - Class in org.archive.modules.extractor
Pseudo-extractor that suppresses link-extraction of likely trap pages, by noticing when content's digest is identical to that of its 'via'.
TrapSuppressExtractor() - Constructor for class org.archive.modules.extractor.TrapSuppressExtractor
Usual constructor.
TRUNC_SUFFIX - Static variable in interface org.archive.modules.CoreAttributeConstants
Fetch truncation codes present in CrawlURI annotations.
TRUNC_SUFFIX - Static variable in class org.archive.modules.fetcher.FetchErrors
Fetch truncation codes present in ProcessorURI annotations.
tryAsScript(File, String) - Method in class org.archive.crawler.framework.ActionDirectory
Try the actionFile as a script, deducing the proper scripting language from its file extension.
tunnelCreated() - Method in class org.apache.commons.httpclient.HttpConnection
Instructs the proxy to establish a secure tunnel to the host.
type - Variable in class org.archive.modules.forms.HTMLForm.FormInput
 
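The two Histotable tally entries above ("record one more occurrence" and "record count more occurrences" of a key) can be pictured as a map of running counts. The following is a minimal sketch under that assumption; TallySketch and its get method are hypothetical names, and the real Histotable class may store and expose its counts differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-alone sketch of a tallying table; not the Heritrix Histotable. */
class TallySketch<K> {
    private final Map<K, Long> counts = new HashMap<>();

    /** Record one more occurrence of the given key. */
    void tally(K key) {
        tally(key, 1L);
    }

    /** Record 'count' more occurrences of the given key. */
    void tally(K key, long count) {
        counts.merge(key, count, Long::sum);
    }

    /** Current count for the key, or 0 if it has never been tallied. */
    long get(K key) {
        return counts.getOrDefault(key, 0L);
    }
}
```

A status-code distribution like the one StatisticsTracker keeps could be built this way by calling tally once per fetched URI with the status code as the key.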

U

ULTRA_SUFFIX_WHOIS_SERVER - Static variable in class org.archive.modules.fetcher.FetchWhois
 
unbind(String) - Method in class org.archive.crawler.restlet.ScriptingConsole
 
unbindObjectName(Context, ObjectName) - Static method in class org.archive.util.JndiUtils
 
UNCALCULATED - Static variable in class org.archive.modules.CrawlURI
 
underscoreSet - Variable in class org.archive.modules.writer.MirrorWriterProcessor
If a directory name appears (case-insensitive) in this list then an underscore is placed before it.
unescape(String) - Static method in class org.archive.util.JavaLiterals
 
UnitCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignment policy that uses a constant value of 1 for all CrawlURIs.
UnitCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.UnitCostAssignmentPolicy
 
unpause() - Method in interface org.archive.crawler.framework.Frontier
Resumes the release of URIs to crawl, allowing worker ToeThreads to proceed.
unpause() - Method in class org.archive.crawler.frontier.AbstractFrontier
 
unpeek(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Forgive the peek, allowing a subsequent peek to return a different item.
update(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
Update the given CrawlURI, which should already be present.
update(String, long) - Method in class org.archive.crawler.util.TopNSet
Update the given String key with a new total value, perhaps displacing an existing top-valued entry, and updating the fields recording max/min keys in any case.
updateBounds() - Method in class org.archive.crawler.util.TopNSet
After an operation invalidating the previous largest/smallest entry, find the new largest/smallest.
updateDescriptor(PropertyDescriptor) - Method in interface org.archive.crawler.restlet.DescriptorUpdater
 
updateGeneration(String) - Method in class org.archive.crawler.processor.CrawlMapper
Close and mark as finished all existing diversion logs, and arrange for new logs to use the new generation prefix.
updateHighestWaiting(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Recalculate the value of the highest-precedence queue waiting among inactive queues.
updateMetadataAfterWrite(CrawlURI, WARCWriter, long) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
updateRobots(CrawlURI) - Method in class org.archive.modules.net.CrawlServer
Update the robotstxt
updateWith(CrawlURI, String) - Method in class org.archive.crawler.reporting.SeedRecord
A later/repeat report of the same seed has arrived; update with latest.
uri - Variable in class org.archive.modules.extractor.StringExtractorTestBase.TestData
 
URI_HISTORY_DBNAME - Static variable in class org.archive.modules.recrawl.PersistProcessor
name of history Database
URIAuthorityBasedQueueAssignmentPolicy - Class in org.archive.crawler.frontier
SurtAuthorityQueueAssignmentPolicy based on the surt form of hostname.
URIAuthorityBasedQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
 
UriCanonicalizationPolicy - Class in org.archive.modules.canonicalize
URI Canonicalization Policy
UriCanonicalizationPolicy() - Constructor for class org.archive.modules.canonicalize.UriCanonicalizationPolicy
 
uriCount - Variable in class org.archive.modules.Processor
The number of URIs processed by this processor.
UriErrorFormatter - Class in org.archive.crawler.io
Formatter for 'uri-errors.log', of URIs so malformed they could not be instantiated.
UriErrorFormatter() - Constructor for class org.archive.crawler.io.UriErrorFormatter
 
UriErrorLoggerModule - Interface in org.archive.modules.extractor
 
uriErrorsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
 
UriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
Superclass for URI precedence policies, which set an integer precedence value on individual URIs when they are first submitted to a frontier for scheduling.
UriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.UriPrecedencePolicy
 
UriProcessingFormatter - Class in org.archive.crawler.io
Formatter for 'crawl.log'.
UriProcessingFormatter(boolean) - Constructor for class org.archive.crawler.io.UriProcessingFormatter
 
uriScheduled(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
 
uriScheduled(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.CostUriPrecedencePolicy
 
uriScheduled(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
 
uriScheduled(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.UriPrecedencePolicy
Add a precedence value to the supplied CrawlURI, which is being scheduled onto a frontier queue for the first time.
urisFetched - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
 
uriTotalsReport() - Method in class org.archive.crawler.framework.CrawlJob
 
uriTotalsReportData() - Method in class org.archive.crawler.framework.CrawlJob
 
UriUniqFilter - Interface in org.archive.crawler.datamodel
A UriUniqFilter passes URI objects to a destination (receiver) if the passed URI object has not been previously seen.
uriUniqFilter - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
The UriUniqFilter to use, tracking those UURIs which are already in-process (or processed), and thus should not be rescheduled.
UriUniqFilter.CrawlUriReceiver - Interface in org.archive.crawler.datamodel
URIs that pass the filter (are new / unique / not already-seen) are passed to this object, typically a frontier.
UriUtils - Class in org.archive.util
URI-related utilities.
UriUtils() - Constructor for class org.archive.util.UriUtils
 
URL_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
 
useAdhocKeystore(PrintStream) - Method in class org.archive.crawler.Heritrix
Perform preparation to use an ad-hoc, created-as-necessary certificate/keystore for HTTPS access.
useHardLinkCheckpoints - Variable in class org.archive.bdb.BdbModule
Whether to use hard-links to log files to collect/retain the BDB log files needed for a checkpoint.
UserAgentProvider - Interface in org.archive.modules.fetcher
 
useSharedCache - Variable in class org.archive.bdb.BdbModule
 
UURI - Class in org.archive.net
Usable URI.
UURI(String, boolean, String) - Constructor for class org.archive.net.UURI
 
UURI(UsableURI, UsableURI) - Constructor for class org.archive.net.UURI
 
UURI() - Constructor for class org.archive.net.UURI
 
UURIFactory - Class in org.archive.net
Factory that returns UURIs.
UURIFactory() - Constructor for class org.archive.net.UURIFactory
 
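The UriUniqFilter entries above describe a filter that hands a URI to a receiver (typically a frontier) only if that URI has not been seen before. A minimal in-memory sketch of the idea follows, assuming URIs are plain strings and the receiver is a java.util.function.Consumer. UniqFilterSketch and its add method are hypothetical names, not the Heritrix interface; implementations such as BdbUriUniqFilter keep their already-seen set on disk rather than in a HashSet.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical stand-alone sketch of an already-seen filter; not the Heritrix API. */
class UniqFilterSketch {
    private final Set<String> seen = new HashSet<>();
    private final Consumer<String> receiver;

    UniqFilterSketch(Consumer<String> receiver) {
        this.receiver = receiver;
    }

    /** Forward the URI to the receiver only the first time it is offered. */
    void add(String uri) {
        if (seen.add(uri)) { // Set.add returns false for duplicates
            receiver.accept(uri);
        }
    }

    /** Number of distinct URIs offered so far. */
    int count() {
        return seen.size();
    }
}
```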

V

VALID_DF_OUTPUT - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
Deprecated.
 
validate(String, int, String, boolean, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
Validate the cookie according to validation rules defined by the cookie specification.
validate(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
Performs the most common Cookie validation.
validate(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
Does nothing.
validate() - Method in class org.apache.commons.httpclient.HttpMethodBase
Returns true if the method is ready to execute, false otherwise.
validate(Object, Errors) - Method in class org.archive.crawler.framework.CheckpointValidator
 
validate(Pattern, String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
 
validate(Object, Errors) - Method in class org.archive.spring.BeanFieldsPatternValidator
 
validate() - Method in class org.archive.spring.PathSharingContext
 
validateConfiguration() - Method in class org.archive.crawler.framework.CrawlJob
Does the assembled ApplicationContext self-validate? Any failures are reported as WARNING log events in the job log.
VALIDATOR - Static variable in class org.archive.crawler.framework.CheckpointService
 
VALIDATOR - Static variable in class org.archive.modules.CrawlMetadata
 
VALIDITY_STAMP_FILENAME - Static variable in class org.archive.checkpointing.Checkpoint
Name of file written with timestamp into valid checkpoints
validRobots - Variable in class org.archive.modules.net.CrawlServer
 
value - Variable in class org.archive.crawler.util.BdbUriUniqFilter
 
value - Variable in class org.archive.io.ReadSourceEditor
 
value - Variable in class org.archive.modules.forms.HTMLForm.FormInput
 
value - Variable in class org.archive.spring.ConfigPathEditor
 
value - Variable in class org.archive.spring.ConfigString
 
valueOf(String) - Static method in enum org.archive.crawler.event.CrawlURIDispositionEvent.Disposition
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.framework.CrawlController.State
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.framework.CrawlStatus
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.framework.Frontier.State
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.framework.ToeThread.Step
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.prefetch.RuntimeLimitEnforcer.Operation
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.restlet.Flash.Kind
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.crawler.util.Logs
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.CrawlURI.FetchType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.deciderules.DecideResult
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.extractor.Hop
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.fetcher.FetchStats.Stage
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.fetcher.FetchWhois.UrlStatus
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.modules.ProcessResult.ProcessStatus
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum org.archive.util.ms.Entry.EntryType
Returns the enum constant of this type with the specified name.
values() - Static method in enum org.archive.crawler.event.CrawlURIDispositionEvent.Disposition
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.framework.CrawlController.State
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.framework.CrawlStatus
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.framework.Frontier.State
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.framework.ToeThread.Step
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.prefetch.RuntimeLimitEnforcer.Operation
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.restlet.Flash.Kind
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.crawler.util.Logs
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.CrawlURI.FetchType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.deciderules.DecideResult
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.extractor.Hop
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.fetcher.FetchStats.Stage
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.fetcher.FetchWhois.UrlStatus
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum org.archive.modules.ProcessResult.ProcessStatus
Returns an array containing the constants of this enum type, in the order they are declared.
values - Variable in class org.archive.util.fingerprint.MemLongFPSet
 
values() - Static method in enum org.archive.util.ms.Entry.EntryType
Returns an array containing the constants of this enum type, in the order they are declared.
verifySerialization(Object, byte[], Object, byte[]) - Method in class org.archive.state.ModuleTestBase
Verifies that serialization was successful.
VERY_LIKELY_RELATIVE_URI_PATTERN - Static variable in class org.archive.util.UriUtils
 
ViewModel - Class in org.archive.crawler.restlet.models
 
ViewModel() - Constructor for class org.archive.crawler.restlet.models.ViewModel
 
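The many valueOf(String) and values() entries above are the standard methods the Java compiler generates for every enum. A short sketch of how they behave follows, using a hypothetical enum `SketchState` whose constants are placeholders and not taken from any Heritrix enum. `valueOf` matches the name exactly and throws IllegalArgumentException when nothing matches, so a fallback wrapper is a common pattern.

```java
import java.util.Arrays;

final class EnumLookupSketch {
    /** Hypothetical enum; constant names are placeholders, not Heritrix's. */
    enum SketchState { READY, RUNNING, FINISHED }

    /** Returns the constant with the given name, or the fallback if none matches. */
    static SketchState parseOr(String name, SketchState fallback) {
        try {
            return SketchState.valueOf(name);   // exact, case-sensitive match
        } catch (IllegalArgumentException e) {
            return fallback;                    // unknown name
        }
    }

    /** Lists the constants in the order they are declared. */
    static String declaredOrder() {
        return Arrays.toString(SketchState.values());
    }
}
```

Note that `valueOf(null)` throws NullPointerException rather than IllegalArgumentException, so the wrapper above does not cover a null name.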

W

WagCostAssignmentPolicy - Class in org.archive.crawler.frontier
A CostAssignmentPolicy based on some wild guesses of kinds of URIs that should be deferred into the (potentially never-crawled) future.
WagCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.WagCostAssignmentPolicy
 
waitForAll() - Method in class org.archive.crawler.framework.ToePool
 
waitForNoRunningJobs(long) - Method in class org.archive.crawler.framework.Engine
Wait for all jobs to be in non-running state, or until timeout (given in ms) elapses.
wakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Wake any queues sitting in the snoozed queue whose time has come.
wakeTime - Variable in class org.archive.crawler.frontier.WorkQueue
Time to wake, if snoozed
Warc2Arc - Class in org.archive.io
Convert WARCs to (sort of) ARCs.
Warc2Arc() - Constructor for class org.archive.io.Warc2Arc
 
warcHeaderFor(String) - Method in class org.archive.modules.forms.FormLoginProcessor
 
WARCWriterProcessor - Class in org.archive.modules.writer
WARCWriterProcessor.
WARCWriterProcessor() - Constructor for class org.archive.modules.writer.WARCWriterProcessor
 
weight - Variable in class org.archive.util.BloomFilter64bit
The random integers used to generate the hash functions.
WHOIS_SERVER_REGEX - Static variable in class org.archive.modules.fetcher.FetchWhois
 
wildcardDirectives - Variable in class org.archive.modules.net.Robotstxt
 
withOverridesDo(OverlayContext, Runnable) - Static method in class org.archive.spring.KeyedProperties
 
WorkQueue - Class in org.archive.crawler.frontier
A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar)
WorkQueue(String) - Constructor for class org.archive.crawler.frontier.WorkQueue
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.BdbFrontier
 
workQueueDataOnDisk() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
Returns true if the WorkQueue implementation of this Frontier stores its workload on disk instead of relying on serialization mechanisms.
WorkQueueFrontier - Class in org.archive.crawler.frontier
A common Frontier base using several queues to hold pending URIs.
WorkQueueFrontier() - Constructor for class org.archive.crawler.frontier.WorkQueueFrontier
Constructor.
wrapped - Variable in class org.archive.util.IdentityCacheableWrapper
 
write(byte[]) - Method in class org.apache.commons.httpclient.HttpConnection
Writes the specified bytes to the output stream.
write(byte[], int, int) - Method in class org.apache.commons.httpclient.HttpConnection
Writes length bytes in data starting at offset to the output stream.
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.CrawlSummaryReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.FrontierNonemptyReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.FrontierSummaryReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.HostsReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.MimetypesReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.ProcessorsReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.Report
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.ResponseCodeReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.SeedsReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.SourceTagsReport
 
write(PrintWriter, StatisticsTracker) - Method in class org.archive.crawler.reporting.ToeThreadsReport
 
write(Writer) - Method in class org.archive.crawler.restlet.EditRepresentation
 
write(Writer) - Method in class org.archive.crawler.restlet.PagedRepresentation
Write the paged HTML.
write(WARCWriter, ARCRecord) - Method in class org.archive.io.Arc2Warc
 
write(CrawlURI, long, InputStream, String) - Method in class org.archive.modules.writer.ARCWriterProcessor
 
write(String, CrawlURI) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeArchiveInfoPart(String, CrawlURI, ReplayInputStream, OutputStream) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
writeBufferSize - Variable in class org.archive.modules.writer.WriterPoolProcessor
Size of buffer in front of disk-writing.
writeContentPart(String, CrawlURI, ReplayInputStream, OutputStream) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
writeDnsRecords(CrawlURI, WARCWriter, URI, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeFtpControlConversation(WARCWriter, String, URI, CrawlURI, ANVLRecord, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeFtpRecords(WARCWriter, CrawlURI, URI, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeHeaderPart(String, ReplayInputStream, OutputStream) - Method in class org.archive.modules.writer.Kw3WriterProcessor
 
writeHtml(Writer) - Method in class org.archive.crawler.restlet.BeanBrowseResource
 
writeHtml(Writer) - Method in class org.archive.crawler.restlet.EngineResource
 
writeHtml(Writer) - Method in class org.archive.crawler.restlet.JobResource
 
writeHtml(Writer) - Method in class org.archive.crawler.restlet.ScriptResource
 
writeHtmlTo(PrintWriter) - Method in class org.archive.crawler.framework.CrawlJob
 
writeHtmlTo(PrintWriter, String) - Method in class org.archive.crawler.framework.CrawlJob
 
writeHttpRecords(CrawlURI, WARCWriter, URI, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeJobPathFile(CrawlJob) - Method in class org.archive.crawler.framework.Engine
Writes a .jobpath file for the new CrawlJob, whose directory is outside the main Engine jobs directory.
writeLine(byte[]) - Method in class org.apache.commons.httpclient.HttpConnection
Writes the specified bytes, followed by "\r\n".getBytes(), to the output stream.
writeLine() - Method in class org.apache.commons.httpclient.HttpConnection
Writes "\r\n".getBytes() to the output stream.
writeLine(String...) - Method in class org.archive.io.CrawlerJournal
Write a line.
writeLine(MutableString) - Method in class org.archive.io.CrawlerJournal
Write a line.
writeLongUriLine(String, CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
 
writeMetadata(WARCWriter, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeMimeFile(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
The actual writing of the Kulturarw3 MIME-file.
writeObjectData(Kryo, ByteBuffer) - Method in class org.archive.net.UURI
 
writeObjectToFile(Object, File) - Static method in class org.archive.crawler.util.CheckpointUtils
Utility function to serialize an object to a file in current checkpoint dir.
writeObjectToFile(Object, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
 
writeReportFile(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
writeReportFile(Report, boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
 
writeReportLine(PrintWriter, Object...) - Method in class org.archive.crawler.reporting.HostsReport
 
writeRequest(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Sends the request via the given connection.
writeRequest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeRequestBody(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Writes the request body to the given connection.
writeRequestHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Writes the request headers to the given connection.
writeRequestLine(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
Writes the request line to the given connection.
writeResource(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeResponse(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeRevisitDigest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeRevisitDigest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord, long) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeRevisitNotModified(WARCWriter, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
writeRevisitUriAgnosticDigest(WARCWriter, String, String, URI, CrawlURI, ANVLRecord) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
WriterPoolProcessor - Class in org.archive.modules.writer
Abstract implementation of a file pool processor.
WriterPoolProcessor() - Constructor for class org.archive.modules.writer.WriterPoolProcessor
 
WriteTarget - Interface in org.archive.spring
Interface for objects that can provide a Writer for replacing or appending to their textual contents.
writeValidity(String) - Method in class org.archive.checkpointing.Checkpoint
 
writeWhoisRecords(WARCWriter, CrawlURI, URI, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
 
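The HttpConnection writeLine entries above describe writing bytes followed by "\r\n". A minimal sketch of that framing follows, assuming an in-memory buffer in place of a real connection. `CrlfLineSketch` and `line` are hypothetical names, not part of commons-httpclient.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class CrlfLineSketch {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Returns the given bytes with a trailing CR LF appended. */
    static byte[] line(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length + CRLF.length);
        out.write(data, 0, data.length);   // payload first
        out.write(CRLF, 0, CRLF.length);   // then the line terminator
        return out.toByteArray();
    }
}
```

Passing an empty array yields just the two terminator bytes, which corresponds to the no-argument writeLine() entry.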

X

XmlMarshaller - Class in org.archive.crawler.restlet
XmlMarshaller can be used to write data structures as simple xml.
XmlMarshaller() - Constructor for class org.archive.crawler.restlet.XmlMarshaller
 
xmlOkAt - Variable in class org.archive.crawler.framework.CrawlJob
 

Z

ZERO_LENGTH_ENTRY - Static variable in class org.archive.crawler.util.BdbUriUniqFilter
 
ZeroCostAssignmentPolicy - Class in org.archive.crawler.frontier
CostAssignmentPolicy considering all URIs costless -- essentially disabling budgeting features.
ZeroCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
 

_

_connectAction_() - Method in class org.archive.net.ClientFTP
 

Copyright © 2003-2014 Internet Archive. All Rights Reserved.