- A_ANNOTATIONS - Static variable in interface org.archive.modules.CoreAttributeConstants
-
shorthand string tokens indicating notable occurrences,
separated by commas
- A_CONTENT_DIGEST - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
content digest
- A_CONTENT_DIGEST_COUNT - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
number of times we've seen this content digest (1 original + n duplicates)
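The "1 original + n duplicates" wording implies a counter that reads 1 on the first sighting of a digest. The following is a minimal sketch, assuming an in-memory map keyed by digest string. The class and method names are hypothetical and are not part of the Heritrix API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts sightings per content digest.
// A count of 1 means "original only"; count - 1 is the number of duplicates.
class DigestSightings {
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one sighting and returns the updated total for this digest.
    int record(String digest) {
        return counts.merge(digest, 1, Integer::sum);
    }

    // Number of duplicates seen after the original (0 if never seen).
    int duplicates(String digest) {
        return Math.max(0, counts.getOrDefault(digest, 0) - 1);
    }
}
```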
- A_CONTENT_DIGEST_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
content digest history map
- A_CONTENT_TYPE - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Extracted MIME type of fetched content; should be
set immediately by fetching module if possible
(rather than waiting for a later analyzer)
- A_CREDENTIALS_KEY - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Key to get credential avatars from A_LIST.
- A_DELAY_FACTOR - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Multiplier of last fetch duration to wait before
fetching another item of the same class (e.g. host)
- A_DISTANCE_FROM_SEED - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_DNS_FETCH_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_DNS_SERVER_IP_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_ETAG_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
header name (and AList key) for ETag
- A_FETCH_BEGAN_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_FETCH_COMPLETED_TIME - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_FETCH_HISTORY - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
fetch history array
- A_FORCE_RETIRE - Static variable in interface org.archive.modules.CoreAttributeConstants
-
flag indicating the containing queue should be retired
- A_FORM_OFFSETS - Static variable in class org.archive.modules.extractor.ExtractorHTML
-
- A_FTP_CONTROL_CONVERSATION - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_FTP_FETCH_STATUS - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_HERITABLE_KEYS - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Key to an (optional) attribute specifying a list of keys whose
values are passed to CandidateURIs that 'descend' from (are
discovered via) this URI.
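The A_HERITABLE_KEYS entry describes keys whose values flow from a URI to the URIs discovered via it. A minimal sketch, assuming plain maps stand in for CrawlURI state; the key name and helper are hypothetical, and carrying the key list itself forward to grandchildren is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: copy a parent URI's "heritable" attributes to a
// newly discovered child. The maps and key name are illustrative stand-ins.
final class HeritableKeysSketch {
    static final String HERITABLE_KEYS = "heritableKeys"; // hypothetical key name

    private HeritableKeysSketch() {}

    @SuppressWarnings("unchecked")
    static Map<String, Object> childDataFrom(Map<String, Object> parentData) {
        Map<String, Object> child = new HashMap<>();
        Object keys = parentData.get(HERITABLE_KEYS);
        if (keys instanceof List) {
            for (String key : (List<String>) keys) {
                if (parentData.containsKey(key)) {
                    child.put(key, parentData.get(key));
                }
            }
            // Assumption: the key list itself is also inherited, so
            // grandchildren receive the same attributes.
            child.put(HERITABLE_KEYS, keys);
        }
        return child;
    }
}
```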
- A_HREF - Static variable in class org.archive.modules.extractor.HTMLLinkContext
-
- A_HTML_BASE - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_HTML_FORM_OBJECTS - Static variable in class org.archive.modules.forms.ExtractorHTMLForms
-
- A_HTTP_AUTH_CHALLENGES - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_HTTP_PROXY_HOST - Static variable in interface org.archive.modules.CoreAttributeConstants
-
local override of proxy host
- A_HTTP_PROXY_PORT - Static variable in interface org.archive.modules.CoreAttributeConstants
-
local override of proxy port
- A_LAST_MODIFIED_HEADER - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
header name (and AList key) for last-modified timestamp
- A_META_ROBOTS - Static variable in class org.archive.modules.extractor.ExtractorHTML
-
- A_MINIMUM_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Minimum delay before fetching another item of the
same class (e.g. host).
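Read together, A_DELAY_FACTOR and A_MINIMUM_DELAY suggest a per-class politeness wait. A minimal sketch, assuming the wait is the last fetch duration multiplied by the delay factor and floored at the minimum delay; the names are hypothetical and any further caps are omitted.

```java
// Hypothetical sketch of a politeness delay: last fetch duration times a
// delay factor, never less than a minimum delay. The combination rule is an
// assumption for illustration, not Heritrix's code.
final class PolitenessDelay {
    private PolitenessDelay() {}

    static long delayMs(long lastFetchDurationMs, double delayFactor, long minimumDelayMs) {
        long scaled = (long) (delayFactor * lastFetchDurationMs);
        return Math.max(scaled, minimumDelayMs);
    }
}
```

A slow 2-second fetch with a factor of 5 yields a 10-second wait, while a fast fetch falls back to the minimum.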
- A_MIRROR_PATH - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Defined for org.archive.crawler.writer.MirrorWriterProcessor.
- A_MIRROR_PATH - Static variable in class org.archive.modules.writer.MirrorWriterProcessor
-
- A_NONFATAL_ERRORS - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_ORIGINAL_DATE - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
date content payload was written
- A_ORIGINAL_URL - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
url that the content payload was written for
- A_PRECALC_PRECEDENCE - Static variable in interface org.archive.modules.CoreAttributeConstants
-
key to attribute containing pre-calculated precedence
- A_PREREQUISITE_URI - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_REFERENCE_LENGTH - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
reference length (content length or virtual length)
- A_RETRY_DELAY - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_RRECORD_SET_LABEL - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_RUNTIME_EXCEPTION - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_SOURCE_TAG - Static variable in interface org.archive.modules.CoreAttributeConstants
-
a 'source' (usu.
- A_STATUS - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
key for status (when in history)
- A_SUBMIT_DATA - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_VIA_DIGEST - Static variable in class org.archive.modules.extractor.TrapSuppressExtractor
-
AList attribute key for carrying forward the content digest from the 'via' URI
- A_WARC_FILE_OFFSET - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
offset into warc file of warc record with content payload
- A_WARC_FILENAME - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
warc filename containing the content payload
- A_WARC_RECORD_ID - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
warc record id of warc record with the content payload
- A_WARC_RESPONSE_HEADERS - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_WHOIS_SERVER_IP - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- A_WRITE_TAG - Static variable in interface org.archive.modules.recrawl.RecrawlAttributeConstants
-
Writer processors of all types are encouraged to put a 'writeTag'
(analogous to HTTP 'etag') in the CrawlURI state.
- abort() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Aborts the execution of this method.
- aboutToLog() - Method in class org.archive.modules.CrawlURI
-
Notify CrawlURI it is about to be logged; opportunity
for self-annotation
- ABS_HTTP_URI_PATTERN - Static variable in class org.archive.modules.extractor.ExtractorURI
-
- AbstractContentDigestHistory - Class in org.archive.modules.recrawl
-
Represents a store of information, presumably persistent, keyed by content
digest.
- AbstractContentDigestHistory() - Constructor for class org.archive.modules.recrawl.AbstractContentDigestHistory
-
- AbstractCookieStorage - Class in org.archive.modules.fetcher
-
- AbstractCookieStorage() - Constructor for class org.archive.modules.fetcher.AbstractCookieStorage
-
- AbstractFrontier - Class in org.archive.crawler.frontier
-
Shared facilities for Frontier implementations.
- AbstractFrontier() - Constructor for class org.archive.crawler.frontier.AbstractFrontier
-
- AbstractLongFPSet - Class in org.archive.util
-
Shell of functionality for a Set of primitive long fingerprints, held
in an array of possibly-empty slots.
- AbstractLongFPSet() - Constructor for class org.archive.util.AbstractLongFPSet
-
To support serialization
TODO: verify needed?
- AbstractLongFPSet(int, float) - Constructor for class org.archive.util.AbstractLongFPSet
-
Create a new AbstractLongFPSet with a given capacity and load factor.
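AbstractLongFPSet is described as a set of primitive long fingerprints held in an array of possibly-empty slots, sized by a capacity and load factor. The following is a minimal open-addressing sketch of that idea, assuming linear probing and a parallel boolean array that marks occupied slots (so any long, including 0, can be stored). It is not the Heritrix implementation, and the class name is hypothetical.

```java
// Hypothetical sketch: open-addressing set of primitive longs.
// A parallel boolean array marks occupied slots, so every long value
// (including 0) can be stored. Linear probing resolves collisions.
class LongOpenSet {
    private long[] values;
    private boolean[] used;
    private int size;
    private final float loadFactor;

    LongOpenSet(int capacity, float loadFactor) {
        if (capacity < 1 || loadFactor <= 0f || loadFactor >= 1f) {
            throw new IllegalArgumentException("bad capacity or load factor");
        }
        this.values = new long[capacity];
        this.used = new boolean[capacity];
        this.loadFactor = loadFactor;
    }

    // Fold the high bits into the low bits, then map into [0, length).
    private static int slotFor(long v, int length) {
        long h = v ^ (v >>> 32);
        return (int) Math.floorMod(h, (long) length);
    }

    // Index of v's slot, or of the first empty slot where it would go.
    // Terminates because the load factor keeps at least one slot empty.
    private int find(long v) {
        int i = slotFor(v, values.length);
        while (used[i] && values[i] != v) {
            i = (i + 1) % values.length;
        }
        return i;
    }

    boolean contains(long v) {
        return used[find(v)];
    }

    // Returns true if v was newly added, false if already present.
    boolean add(long v) {
        int i = find(v);
        if (used[i]) {
            return false;
        }
        values[i] = v;
        used[i] = true;
        size++;
        if (size > loadFactor * values.length) {
            grow();
        }
        return true;
    }

    int size() {
        return size;
    }

    // Double the table and reinsert every occupied slot.
    private void grow() {
        long[] oldValues = values;
        boolean[] oldUsed = used;
        values = new long[oldValues.length * 2];
        used = new boolean[oldValues.length * 2];
        for (int j = 0; j < oldValues.length; j++) {
            if (oldUsed[j]) {
                int i = find(oldValues[j]);
                values[i] = oldValues[j];
                used[i] = true;
            }
        }
    }
}
```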
- AbstractPersistProcessor - Class in org.archive.modules.recrawl
-
- AbstractPersistProcessor() - Constructor for class org.archive.modules.recrawl.AbstractPersistProcessor
-
- ac - Variable in class org.archive.crawler.framework.CrawlJob
-
- AcceptDecideRule - Class in org.archive.modules.deciderules
-
- AcceptDecideRule() - Constructor for class org.archive.modules.deciderules.AcceptDecideRule
-
- acceptNonDnsResolves - Variable in class org.archive.modules.fetcher.FetchDNS
-
If a DNS lookup fails, whether or not to fall back to InetAddress
resolution, which may use local 'hosts' files or other mechanisms.
- acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
- acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EngineResource
-
- acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.EnhDirectoryResource
-
Accept a POST used to edit or create a file.
- acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.JobResource
-
- acceptRepresentation(Representation) - Method in class org.archive.crawler.restlet.ScriptResource
-
- accepts(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
-
- accumulate(CrawlURI) - Method in class org.archive.crawler.util.CrawledBytesHistotable
-
- actionDir - Variable in class org.archive.crawler.framework.ActionDirectory
-
- ActionDirectory - Class in org.archive.crawler.framework
-
Directory watched for new files.
- ActionDirectory() - Constructor for class org.archive.crawler.framework.ActionDirectory
-
- actions - Variable in class org.archive.modules.extractor.CustomSWFTags
-
- activateInactiveQueue() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Activate an inactive queue, if any are available.
- active - Variable in class org.archive.crawler.frontier.WorkQueue
-
whether queue is active (ready/in-process/snoozed) or on a waiting queue
- actOn(File) - Method in class org.archive.crawler.framework.ActionDirectory
-
Process an individual action file found
- actOn(File) - Method in class org.archive.modules.seeds.SeedModule
-
- actOn(File) - Method in class org.archive.modules.seeds.TextSeedModule
-
Treat the given file as a source of additional seeds,
announcing to SeedListeners.
- add(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Add given uri, if not already present.
- add(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- add(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- add(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
-
- add(long) - Method in class org.archive.util.AbstractLongFPSet
-
Add the given value to this set
- add(CharSequence) - Method in interface org.archive.util.BloomFilter
-
Adds a character sequence to the filter.
- add(CharSequence) - Method in class org.archive.util.BloomFilter64bit
-
Adds a character sequence to the filter.
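The BloomFilter add(CharSequence) entries describe adding strings to a probabilistic set. A minimal Bloom filter sketch, assuming a java.util.BitSet and double hashing (h1 + i*h2), a common textbook technique; it is not necessarily how BloomFilter64bit hashes, and the class name is hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch of a Bloom filter over character sequences.
// k bit positions come from double hashing: h1 + i * h2 (mod numBits).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits < 1 || numHashes < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Second hash: 32-bit FNV-1a over the UTF-16 chars.
    private static int fnv1a(CharSequence s) {
        int h = 0x811C9DC5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h;
    }

    private int position(CharSequence s, int i) {
        int h1 = s.toString().hashCode();
        int h2 = fnv1a(s) | 1; // force odd so successive probes differ
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Sets k bits; returns true if any bit was previously clear,
    // i.e. the sequence was definitely not present before.
    boolean add(CharSequence s) {
        boolean changed = false;
        for (int i = 0; i < numHashes; i++) {
            int p = position(s, i);
            if (!bits.get(p)) {
                bits.set(p);
                changed = true;
            }
        }
        return changed;
    }

    // May return false positives, never false negatives.
    boolean mightContain(CharSequence s) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(s, i))) {
                return false;
            }
        }
        return true;
    }
}
```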
- add(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- add(long) - Method in interface org.archive.util.fingerprint.LongFPSet
-
Add a fingerprint to the set.
- add(Histotable<K>) - Method in class org.archive.util.Histotable
-
- add(Iterator<E>) - Method in class org.archive.util.iterator.CompositeIterator
-
Add an iterator to the internal chain.
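CompositeIterator's add(Iterator) appends to an internal chain of iterators. A minimal sketch of such a chaining iterator, assuming iterators are exhausted strictly in the order added; the class name is hypothetical and this is not the Heritrix source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: iterates each added iterator to exhaustion,
// in insertion order, before moving on to the next.
class ChainedIterator<E> implements Iterator<E> {
    private final Deque<Iterator<E>> chain = new ArrayDeque<>();

    // Append an iterator to the end of the internal chain.
    void add(Iterator<E> it) {
        chain.addLast(it);
    }

    @Override
    public boolean hasNext() {
        // Drop exhausted iterators from the front of the chain.
        while (!chain.isEmpty() && !chain.peekFirst().hasNext()) {
            chain.removeFirst();
        }
        return !chain.isEmpty();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return chain.peekFirst().next();
    }
}
```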
- addAllow(String) - Method in class org.archive.modules.net.RobotsDirectives
-
- addCap(byte[]) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Add a dummy 'cap' entry at the given insertion key.
- addCookie(Cookie) - Method in class org.apache.commons.httpclient.HttpState
-
Adds an HTTP cookie, replacing any existing equivalent cookies.
- addCookieRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates Cookie request headers for those cookies that match the given host, port and path.
- addCookies(Cookie[]) - Method in class org.apache.commons.httpclient.HttpState
-
- addCredential(Credential) - Method in class org.archive.modules.net.CrawlServer
-
Add an avatar.
- addDataPersistentMember(String) - Static method in class org.archive.modules.CrawlURI
-
Add the key of data map items you want to persist across
processings.
- addDisallow(String) - Method in class org.archive.modules.net.RobotsDirectives
-
- added(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- addedSeed(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
When notified of a seed via the SeedListener interface,
schedule it.
- addedSeed(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Create a seed record, even on initial notification (before
any real attempt/processing).
- addedSeed(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
If appropriate, convert seed notification into prefix-addition.
- addedSeed(CrawlURI) - Method in interface org.archive.modules.seeds.SeedListener
-
- addExternalPath(String) - Method in class org.archive.spring.KeyedProperties
-
Add a path by which the outside world can reach this map
- addExtraInfo(String, Object) - Method in class org.archive.modules.CrawlURI
-
- addField(String, String, String) - Method in class org.archive.modules.forms.HTMLForm
-
Add a discovered INPUT, tracking it as potential
username/password receiver.
- addFlash(Response, String) - Static method in class org.archive.crawler.restlet.Flash
-
- addFlash(Response, String, Flash.Kind) - Static method in class org.archive.crawler.restlet.Flash
-
- addForce(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Add given uri, all the way through to underlying destination, even
if already present.
- addForce(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- addForce(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- addGlobalVariable(String, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- addHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- addHeaderLink(CrawlURI, String, String) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- addHostRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates Host request header, as long as no Host request
header already exists.
- addIfNotBlank(ANVLRecord, String, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- addJobDirectory(File) - Method in class org.archive.crawler.framework.Engine
-
Adds a job directory to the Engine's known jobConfigs, if not already present.
- addLinkFromString(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- addLogger(Logger) - Method in class org.archive.crawler.reporting.AlertThreadGroup
-
- addNewFp(long) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- addNewFp(long) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
Add an FP (which may be an old or new FP) to the new complete
list.
- addNewFp(long) - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
-
- addNow(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Immediately add uri.
- addNow(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- addNow(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- addOutlink(CrawlURI, String, LinkContext, Hop) - Method in class org.archive.modules.extractor.Extractor
-
Create and add a 'Link' to the CrawlURI with given URI/context/hop-type
- addPersistentDataMapKey(String) - Method in class org.archive.modules.CrawlURI
-
- addPresentableNestedNames(Collection<Object>, Object, Set<Object>) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
Starting at (and including) the given object, adds nested Map
representations of named beans to the namedBeans
Collection.
- addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.io.ReadSourceEditor
-
- addPropertyChangeListener(PropertyChangeListener) - Method in class org.archive.spring.ConfigPathEditor
-
- addProxyConnectionHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates Proxy-Connection: Keep-Alive request header when
communicating via a proxy server.
- AddRedirectFromRootServerToScope - Class in org.archive.modules.deciderules
-
- AddRedirectFromRootServerToScope() - Constructor for class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
-
- addRefreshHeaderLink(CrawlURI, Header) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- addRelativeToBase(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
-
- addRelativeToVia(CrawlURI, int, String, LinkContext, Hop) - Static method in class org.archive.modules.extractor.Link
-
- addRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Adds the specified request header, NOT overwriting any previous value.
- addRequestHeader(String, String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Adds the specified request header, NOT overwriting any previous value.
- addRequestHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates all the required request headers to be submitted via the given connection.
- addResponseContent(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
This method populates curi with response status and content type.
- addResponseFooter(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Use this method internally to add footers.
- ADDRESS_BITS_PER_UNIT - Static variable in class org.archive.util.BloomFilter64bit
-
- addRuleAssociation(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
- addRuleAssociations(Set<DecideRuledSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Collect all rule-based SheetAssociations.
- addSeed(CrawlURI) - Method in class org.archive.modules.seeds.SeedModule
-
- addSeed(CrawlURI) - Method in class org.archive.modules.seeds.TextSeedModule
-
Add a new seed to scope.
- addSeedListener(SeedListener) - Method in class org.archive.modules.seeds.SeedModule
-
- addStats(Map<String, Map<String, Long>>) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- addSurtAssociation(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
- addSurtAssociations(List<SurtPrefixesSheetAssociation>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Collect all SURT-based SheetAssociations.
- addSurtsAssociation(SurtPrefixesSheetAssociation) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Add an individual surtsAssociation to the sheetNamesBySurt map.
- addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
Add a file to the manifest of files used/generated by the current
crawl.
- addToManifest(String, char, boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- addUserAgentRequestHeader(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates default User-Agent request header, as long as no
User-Agent request header already exists.
- addWhoisLink(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
-
- addWhoisLinks(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
Adds outlinks to whois:{domain} and whois:{ipAddress}
- afterPropertiesSet() - Method in class org.archive.checkpointing.Checkpoint
-
- afterPropertiesSet() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- afterPropertiesSet() - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
Initializer.
- afterPropertiesSet() - Method in class org.archive.modules.CrawlMetadata
-
- afterPropertiesSet() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- afterPropertiesSet() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- afterPropertiesSet() - Method in class org.archive.modules.ScriptedProcessor
-
- agentsToDirectives - Variable in class org.archive.modules.net.Robotstxt
-
- AggressiveExtractorHTML - Class in org.archive.modules.extractor
-
Extended version of ExtractorHTML with more aggressive JavaScript link
extraction, where JavaScript code is parsed first with the general HTML tags
regex, and then by the JavaScript speculative link regex.
- AggressiveExtractorHTML() - Constructor for class org.archive.modules.extractor.AggressiveExtractorHTML
-
- AlertHandler - Class in org.archive.crawler.reporting
-
Stub Handler, catching and relaying WARNING/SEVERE events to
AlertThreadGroup.
- AlertHandler() - Constructor for class org.archive.crawler.reporting.AlertHandler
-
- alertsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlController
-
- alertThreadGroup - Variable in class org.archive.crawler.framework.CrawlJob
-
- AlertThreadGroup - Class in org.archive.crawler.reporting
-
Parent thread group which lets all child threads find the right
'alert' error handler.
- AlertThreadGroup(String) - Constructor for class org.archive.crawler.reporting.AlertThreadGroup
-
- allBeans - Variable in class org.archive.spring.ConfigPathConfigurer
-
- allConfigPaths - Variable in class org.archive.spring.ConfigPathConfigurer
-
- allErrors - Variable in class org.archive.spring.PathSharingContext
-
- allFps - Variable in class org.archive.crawler.util.MemFPMergeUriUniqFilter
-
- allNonemptyReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Compact report of all nonempty queues (one queue per line)
- allowCreate - Variable in class org.archive.bdb.BdbModule.BdbConfig
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.CustomRobotsPolicy
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.IgnoreRobotsPolicy
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.ObeyRobotsPolicy
-
- allows - Variable in class org.archive.modules.net.RobotsDirectives
-
- allows(String) - Method in class org.archive.modules.net.RobotsDirectives
-
- allows(String, CrawlURI, Robotstxt) - Method in class org.archive.modules.net.RobotsPolicy
-
- allowsAll() - Method in class org.archive.modules.net.Robotstxt
-
Does this policy effectively allow everything? (No
disallows or timing (crawl-delay) directives?)
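The RobotsDirectives addAllow/addDisallow/allows(String) entries and Robotstxt.allowsAll() describe per-agent path rules. A minimal sketch, assuming the longest matching prefix wins and an Allow wins a tie (the rule RFC 9309 specifies); Heritrix's own matching may differ in detail, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one user-agent's robots.txt directives.
// Assumption: longest matching prefix wins; Allow wins ties (RFC 9309).
class DirectivesSketch {
    private final List<String> allows = new ArrayList<>();
    private final List<String> disallows = new ArrayList<>();
    private float crawlDelaySeconds = -1f; // negative means "not set"

    void addAllow(String prefix) {
        allows.add(prefix);
    }

    void addDisallow(String prefix) {
        // An empty Disallow line disallows nothing, so it is not recorded.
        if (!prefix.isEmpty()) {
            disallows.add(prefix);
        }
    }

    void setCrawlDelay(float seconds) {
        crawlDelaySeconds = seconds;
    }

    // Length of the longest prefix matching path, or -1 if none match.
    private static int longestPrefix(List<String> prefixes, String path) {
        int best = -1;
        for (String p : prefixes) {
            if (path.startsWith(p) && p.length() > best) {
                best = p.length();
            }
        }
        return best;
    }

    boolean allows(String path) {
        return longestPrefix(disallows, path) <= longestPrefix(allows, path);
    }

    // No disallows and no crawl-delay: nothing restricts the crawler.
    boolean allowsAll() {
        return disallows.isEmpty() && crawlDelaySeconds < 0;
    }
}
```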
- allowsEdit(File) - Method in class org.archive.crawler.restlet.EnhDirectory
-
- allowsPaging(File) - Method in class org.archive.crawler.restlet.EnhDirectory
-
- allQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
All known queues.
- allQueuesReportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Compact report of all nonempty queues (one queue per line)
- alreadySeen - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- analyze(CrawlURI, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
Run analysis: find form METHOD, ACTION, and all INPUT names/values.
Log as configured.
- ANNOTATION_UNWRITTEN - Static variable in class org.archive.modules.writer.WriterPoolProcessor
-
CrawlURI annotation indicating no record was written.
- announceSeeds() - Method in class org.archive.modules.seeds.SeedModule
-
- announceSeeds() - Method in class org.archive.modules.seeds.TextSeedModule
-
Announce all seeds from configured source to SeedListeners
(including nonseed lines mixed in).
- announceSeeds(CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
-
- announceSeedsFromReader(BufferedReader, CountDownLatch) - Method in class org.archive.modules.seeds.TextSeedModule
-
Announce all seeds (and nonseed possible-directive lines) from
the given Reader
- AntiCalendarCostAssignmentPolicy - Class in org.archive.crawler.frontier
-
CostAssignmentPolicy that further penalizes URIs with
calendar-suggestive strings in them, with an extra unit
of cost.
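AntiCalendarCostAssignmentPolicy adds one extra unit of cost to calendar-suggestive URIs. A minimal sketch, assuming a base cost of one unit per URI; the regular expression below is illustrative only and is not the CALENDARISH pattern Heritrix ships, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: one unit of cost per URI, plus one extra unit when
// the URI contains calendar-suggestive text. The pattern is illustrative,
// NOT Heritrix's CALENDARISH constant.
final class AntiCalendarCostSketch {
    private static final Pattern CALENDAR_LIKE = Pattern.compile(
            "(?i)(calendar|year|month|date|\\D(19|20)\\d\\d\\D)");

    private AntiCalendarCostSketch() {}

    static int costOf(String uri) {
        int cost = 1; // assumed base: one unit per URI
        if (CALENDAR_LIKE.matcher(uri).find()) {
            cost += 1; // extra unit for calendar-suggestive URIs
        }
        return cost;
    }
}
```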
- AntiCalendarCostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
-
- appCtx - Variable in class org.archive.crawler.framework.ActionDirectory
-
- appCtx - Variable in class org.archive.crawler.framework.CheckpointService
-
- appCtx - Variable in class org.archive.crawler.framework.CrawlController
-
- appCtx - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
- appCtx - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- appCtx - Variable in class org.archive.crawler.restlet.BeanBrowseResource
-
- appCtx - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
- appCtx - Variable in class org.archive.modules.ScriptedProcessor
-
- appCtx - Variable in class org.archive.spring.ConfigPathConfigurer
-
- append(String) - Method in class org.archive.util.PaddingStringBuffer
-
append a string directly to the buffer
- append(int) - Method in class org.archive.util.PaddingStringBuffer
-
append an int
to the buffer.
- append(long) - Method in class org.archive.util.PaddingStringBuffer
-
append a long
to the buffer.
- appendQueueReports(PrintWriter, String, Iterator<?>, int, int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Append queue report to general Frontier report.
- applyOverlaysTo(CrawlURI) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Apply the proper overlays (by Sheet beanName) to the given CrawlURI,
according to configured associations.
- applyQuota(CrawlURI, String, long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
Apply the quota specified by the given key against the actual
value provided.
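QuotaEnforcer.applyQuota compares a configured quota against an actual tallied value. A minimal sketch of that comparison, assuming a URI is blocked once the tally reaches the maximum and that a non-positive maximum means "no quota"; both conventions and the names are hypothetical.

```java
// Hypothetical sketch of checking one quota against an actual tally.
// Assumptions: maximum <= 0 disables the quota; reaching it blocks.
final class QuotaCheck {
    private QuotaCheck() {}

    static boolean exceeds(long maximum, long actual) {
        if (maximum <= 0) {
            return false; // quota disabled
        }
        return actual >= maximum;
    }
}
```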
- Arc2Warc - Class in org.archive.io
-
Convert ARCs to (sort of) WARCs.
- Arc2Warc() - Constructor for class org.archive.io.Arc2Warc
-
- ARCHIVE_TIME_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- ARCWriterProcessor - Class in org.archive.modules.writer
-
Processor module for writing the results of successful fetches (and
perhaps someday, certain kinds of network failures) to the Internet Archive
ARC file format.
- ARCWriterProcessor() - Constructor for class org.archive.modules.writer.ARCWriterProcessor
-
- ArrayLongFPCache - Class in org.archive.util.fingerprint
-
Simple long fingerprint cache using a backing array; any long maps to
one of 'smear' slots.
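ArrayLongFPCache is described as an array-backed cache where any long maps to one of 'smear' slots. A minimal sketch of that idea, assuming a neighborhood of 5 consecutive slots and overwrite of the home slot when the neighborhood is full; the constant, eviction choice, and class name are hypothetical.

```java
// Hypothetical sketch of a lossy fingerprint cache: each long has a home
// index and may live in any of SMEAR consecutive slots from there. A full
// neighborhood overwrites the home slot, so entries can be forgotten
// (false "not seen"), but contains() never reports an unstored value.
class SmearedLongCache {
    private static final int SMEAR = 5; // assumed neighborhood size
    private final long[] slots;
    private final boolean[] used;

    SmearedLongCache(int length) {
        slots = new long[length];
        used = new boolean[length];
    }

    private int home(long fp) {
        return (int) Math.floorMod(fp ^ (fp >>> 32), (long) slots.length);
    }

    boolean contains(long fp) {
        int start = home(fp);
        for (int i = 0; i < SMEAR; i++) {
            int idx = (start + i) % slots.length;
            if (used[idx] && slots[idx] == fp) {
                return true;
            }
        }
        return false;
    }

    void add(long fp) {
        if (contains(fp)) {
            return;
        }
        int start = home(fp);
        for (int i = 0; i < SMEAR; i++) {
            int idx = (start + i) % slots.length;
            if (!used[idx]) {
                slots[idx] = fp;
                used[idx] = true;
                return;
            }
        }
        // Neighborhood full: overwrite the home slot (simple eviction choice).
        slots[start] = fp;
    }
}
```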
- ArrayLongFPCache() - Constructor for class org.archive.util.fingerprint.ArrayLongFPCache
-
- asAnnotation() - Method in class org.archive.modules.forms.HTMLForm
-
Provide abbreviated annotation, of the form...
- asHttpClientDataWith(String, String) - Method in class org.archive.modules.forms.HTMLForm
-
Create the NameValuePair array expected by HttpClient, merging
username and password into the appropriate value slots.
- assertNoSideEffects(CrawlURI) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Asserts that the given URI has no URI errors, no localized errors, and
no annotations.
- assertNotOpen() - Method in class org.apache.commons.httpclient.HttpConnection
-
- assertOpen() - Method in class org.apache.commons.httpclient.HttpConnection
-
- AssignmentLevelSurtQueueAssignmentPolicy - Class in org.archive.crawler.frontier
-
Create a queueKey based on the SURT authority, reduced to the
public-suffix-plus-one domain (topmost assignable domain).
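AssignmentLevelSurtQueueAssignmentPolicy reduces a host to its public-suffix-plus-one domain. A minimal sketch, assuming a small caller-supplied suffix set stands in for a real public suffix list and rendering the result in a SURT-like reversed, comma-separated form; the exact queue-key format Heritrix produces may differ, and the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: find the public-suffix-plus-one domain of a host,
// then render it reversed and comma-separated (SURT-like).
final class AssignmentLevelKey {
    private AssignmentLevelKey() {}

    static String assignmentDomain(String host, Set<String> publicSuffixes) {
        String lower = host.toLowerCase(Locale.ROOT);
        String[] labels = lower.split("\\.");
        // Scan from the longest candidate suffix to the shortest; the first
        // public suffix found determines the assignable domain.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (publicSuffixes.contains(suffix)) {
                return String.join(".", Arrays.copyOfRange(labels, i - 1, labels.length));
            }
        }
        return lower; // no known suffix: use the whole host
    }

    static String surtKey(String domain) {
        String[] labels = domain.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.toString();
    }
}
```

With "co.uk" listed as public, www.news.bbc.co.uk and bbc.co.uk land in the same queue.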
- AssignmentLevelSurtQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
-
- atFinish() - Method in class org.archive.crawler.framework.CrawlController
-
Evaluate if the crawl should stop because it is finished,
without actually stopping the crawl.
- atProcessor(Processor) - Method in class org.archive.crawler.framework.ToeThread
-
- atProcessor(Processor) - Method in interface org.archive.modules.ProcessorChain.ChainStatusReceiver
-
- attach(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
Attach this credential's avatar to the passed curi.
- ATTR_MAX_BYTES_WRITTEN - Static variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Max size for each file. Key for the attribute giving the maximum ARC bytes to write.
- audience - Variable in class org.archive.modules.CrawlMetadata
-
- AUDIO_VIDEO_IMAGE_MIMETYPE_SET - Static variable in class org.archive.util.UriUtils
-
- AUDIO_VIDEO_IMAGE_MIMETYPES - Static variable in class org.archive.util.UriUtils
-
- authenticate(Request) - Method in class org.archive.crawler.restlet.RateLimitGuard
-
- authenticated(Credential, CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
Has the passed credential already been authenticated?
- AutoKryo - Class in org.archive.bdb
-
Extensions to Kryo to let classes control their own registration, suggest
other classes to register together, and use the same (Sun-JVM-only) trick
for deserializing classes without no-arg constructors.
- AutoKryo() - Constructor for class org.archive.bdb.AutoKryo
-
- autoregister(Class<?>) - Method in class org.archive.bdb.AutoKryo
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.modules.CrawlURI
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlHost
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.CrawlServer
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.RobotsDirectives
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.modules.net.Robotstxt
-
- autoregisterTo(AutoKryo) - Static method in class org.archive.util.IdentityCacheableWrapper
-
- AVAILABLE_EXTRACTOR - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- availableRobotsPolicies - Variable in class org.archive.modules.CrawlMetadata
-
Map of all available RobotsPolicies, by name, to choose from.
- averageDepth() - Method in interface org.archive.crawler.framework.Frontier
-
Average depth of the last URI in all eligible queues.
- averageDepth() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- averageDepth - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- cache - Variable in class org.archive.crawler.processor.CrawlMapper
-
- cache - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
-
- cachedFormat - Variable in class org.archive.crawler.io.UriProcessingFormatter
-
- cacheLength() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- cachePercent - Variable in class org.archive.bdb.BdbModule
-
- cacheSize - Variable in class org.archive.bdb.BdbModule
-
- calcOutputDirs() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- calcReverseSortedHostsDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Return a copy of the hosts distribution in reverse-sorted
(largest first) order.
- calcSchemeAuthorityKeyBytes(String) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
-
- calcSeedRecordsSortedByStatusCode() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- calculateInsertKey(CrawlURI) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Calculate the insertKey that places a CrawlURI in the
desired spot.
- calculateOriginKey(String) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Calculate the 'origin' key for a virtual queue of items
with the given classKey.
- calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
-
Calculate the precedence value for the given queue.
- calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
-
Calculate the precedence value for the given URI.
- calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
-
- calculatePrecedence(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- calculatePrecedence(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
-
- CALENDARISH - Static variable in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
-
- canary - Variable in class org.archive.util.ObjectIdentityBdbCache
-
- candidateChain - Variable in class org.archive.crawler.framework.CrawlController
-
Candidate chain
- candidateChain - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
-
Candidate chain
- CandidateChain - Class in org.archive.modules
-
- CandidateChain() - Constructor for class org.archive.modules.CandidateChain
-
- CandidateScoper - Class in org.archive.crawler.prefetch
-
Simple single-URI scoper, considers passed-in URI as candidate; sets
fetchstatus negative and skips to end of processing if out-of-scope.
- CandidateScoper() - Constructor for class org.archive.crawler.prefetch.CandidateScoper
-
- CandidatesProcessor - Class in org.archive.crawler.postprocessor
-
Processor which sends all candidate outlinks through the
CandidateChain, scheduling those with non-negative status
codes to the frontier.
- CandidatesProcessor() - Constructor for class org.archive.crawler.postprocessor.CandidatesProcessor
-
Usual no-argument constructor
- candidateUserAgents - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
-
list of user-agents to try; if any are allowed, a URI will be crawled
- candidateUserAgents - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
-
list of user-agents to try; if any are allowed, a URI will be crawled
- CanonicalizationRule - Interface in org.archive.modules.canonicalize
-
A rule to apply when canonicalizing a URL.
- canonicalize(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
Canonicalize passed CrawlURI.
- canonicalize(String) - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
-
Apply this canonicalization rule.
- canonicalize(String) - Method in class org.archive.modules.canonicalize.FixupQueryString
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.LowercaseRule
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.RegexRule
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
Run the passed URI string through the list of rules.
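A minimal sketch of running a URI through an ordered list of canonicalization rules, where each rule sees the previous rule's output. The two rules here are hypothetical stand-ins inspired by the `LowercaseRule` and `StripWWWRule` names above, not their actual implementations. `UnaryOperator<String>` stands in for the `CanonicalizationRule` interface.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class CanonicalizationSketch {
    // Hypothetical rule: lowercase the whole URI string.
    static final UnaryOperator<String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);

    // Hypothetical rule: drop a lowercase "www." immediately after "scheme://".
    static final UnaryOperator<String> STRIP_WWW =
            s -> s.replaceFirst("^([a-zA-Z][a-zA-Z0-9+.-]*://)www\\.", "$1");

    // Apply every rule in order, feeding each rule the previous result.
    static String canonicalize(String uri, List<UnaryOperator<String>> rules) {
        String result = uri;
        for (UnaryOperator<String> rule : rules) {
            result = rule.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("http://WWW.Example.COM/Path", List.of(LOWERCASE, STRIP_WWW)));
    }
}
```

Rule order matters: with `STRIP_WWW` first, the uppercase `WWW.` is not matched, and the result keeps `www.` after lowercasing.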
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripExtraSlashes
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionCFIDs
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripSessionIDs
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripUserinfoRule
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWNRule
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.StripWWWRule
-
- canonicalize(String) - Method in class org.archive.modules.canonicalize.UriCanonicalizationPolicy
-
- canonicalString - Variable in class org.archive.modules.CrawlURI
-
- capacityPowerOfTwo - Variable in class org.archive.util.AbstractLongFPSet
-
the capacity of this set, specified as the exponent of a power of 2
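A minimal sketch of why storing capacity as a power-of-two exponent is convenient. With capacity `2^p`, a slot index can be taken from the top `p` bits of a multiplicative hash, and linear probing can wrap with a bit mask instead of `%`. This is an illustrative open-addressing set, not AbstractLongFPSet's actual layout. The class name, the hash constant and the lack of deletion support are all assumptions.

```java
class PowerOfTwoLongSet {
    private final int capacityPowerOfTwo;  // capacity == 2^capacityPowerOfTwo
    private final long[] slots;
    private final boolean[] used;          // marks occupied slots (any long value is a valid fingerprint)
    private int count;

    PowerOfTwoLongSet(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo < 1 || capacityPowerOfTwo > 30) {
            throw new IllegalArgumentException("exponent must be in 1..30");
        }
        this.capacityPowerOfTwo = capacityPowerOfTwo;
        int capacity = 1 << capacityPowerOfTwo;
        this.slots = new long[capacity];
        this.used = new boolean[capacity];
    }

    long capacity() { return 1L << capacityPowerOfTwo; }

    int count() { return count; }

    // Multiplicative hashing: the top p bits of the product select a slot in [0, 2^p).
    private int home(long fp) {
        return (int) ((fp * 0x9E3779B97F4A7C15L) >>> (64 - capacityPowerOfTwo));
    }

    // Returns true if added, false if already present. Deletion is not supported in this sketch.
    boolean add(long fp) {
        int mask = slots.length - 1;
        for (int i = home(fp), probes = 0; probes < slots.length; i = (i + 1) & mask, probes++) {
            if (!used[i]) { used[i] = true; slots[i] = fp; count++; return true; }
            if (slots[i] == fp) return false;
        }
        throw new IllegalStateException("set is full");
    }

    boolean contains(long fp) {
        int mask = slots.length - 1;
        for (int i = home(fp), probes = 0; probes < slots.length; i = (i + 1) & mask, probes++) {
            if (!used[i]) return false;     // an empty slot ends the probe sequence
            if (slots[i] == fp) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PowerOfTwoLongSet set = new PowerOfTwoLongSet(4);
        set.add(12345L);
        System.out.println(set.contains(12345L) + " " + set.count() + "/" + set.capacity());
    }
}
```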
- caseSensitiveFilesystem - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
True if the file system is case-sensitive, like UNIX.
- catalog - Variable in class org.archive.modules.extractor.PDFParser
-
- characterMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
This list is grouped in pairs.
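A minimal sketch of reading a flat list "grouped in pairs" as consecutive key/value entries. It assumes, without confirmation from the index, that each pair is (original, replacement). The class and method names and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PairedListSketch {
    // Interprets [k1, v1, k2, v2, ...] as consecutive key/value pairs, keeping list order.
    static Map<String, String> pairsToMap(List<String> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("odd number of elements: " + flat.size());
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < flat.size(); i += 2) {
            map.put(flat.get(i), flat.get(i + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(pairsToMap(List.of("?", "%3F", "*", "%2A")));
    }
}
```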
- checkAvailableSpace(File) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
Probe via File.getUsableSpace to see if monitored paths have fallen below
the pause threshold.
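A minimal sketch of the probe described above, using the standard `java.io.File.getUsableSpace()`. The threshold parameter and the class and method names are hypothetical. DiskSpaceMonitor's actual configuration and pause logic are not shown.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class UsableSpaceCheck {
    // Returns the paths whose usable space (bytes available to this JVM) is below thresholdBytes.
    // getUsableSpace() returns 0 when the path does not name a partition, so a
    // nonexistent path counts as "below" any positive threshold.
    static List<File> belowThreshold(List<File> paths, long thresholdBytes) {
        List<File> low = new ArrayList<>();
        for (File path : paths) {
            if (path.getUsableSpace() < thresholdBytes) {
                low.add(path);
            }
        }
        return low;
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        long oneGiB = 1L << 30;
        System.out.println("below 1 GiB: " + belowThreshold(List.of(tmp), oneGiB));
    }
}
```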
- checkBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- checkForLimitsExceeded(CrawlStatSnapshot) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- checkForNull(String) - Method in class org.archive.crawler.io.UriProcessingFormatter
-
- checkForSeedPromotion(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
Check if the URI needs special 'discovered seed' treatment.
- checkFutures() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Check for any future-scheduled URIs now eligible for re-enqueuing.
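One way to implement such a check, sketched here as an assumption rather than WorkQueueFrontier's actual mechanism: keep future-scheduled URIs in a `PriorityQueue` ordered by eligibility time, then drain from the head while the head's time has arrived. `FutureUris`, `schedule` and the millisecond timestamps are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class FutureUris {
    // A URI paired with the earliest wall-clock time (ms) at which it may be re-enqueued.
    private static final class Scheduled {
        final String uri;
        final long eligibleAtMs;
        Scheduled(String uri, long eligibleAtMs) { this.uri = uri; this.eligibleAtMs = eligibleAtMs; }
    }

    // Head of the queue is always the soonest-eligible URI.
    private final PriorityQueue<Scheduled> futures =
            new PriorityQueue<>((a, b) -> Long.compare(a.eligibleAtMs, b.eligibleAtMs));

    void schedule(String uri, long eligibleAtMs) {
        futures.add(new Scheduled(uri, eligibleAtMs));
    }

    // Removes and returns every URI whose time has arrived, earliest first.
    List<String> checkFutures(long nowMs) {
        List<String> ready = new ArrayList<>();
        while (!futures.isEmpty() && futures.peek().eligibleAtMs <= nowMs) {
            ready.add(futures.poll().uri);
        }
        return ready;
    }

    public static void main(String[] args) {
        FutureUris f = new FutureUris();
        f.schedule("http://a.example/", 100L);
        f.schedule("http://b.example/", 50L);
        System.out.println(f.checkFutures(75L));
    }
}
```

Only the head needs inspecting, so a check with nothing eligible costs a single `peek()`.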
- checkMidfetchAbort(CrawlURI, HttpRecorderMethod, HttpConnection) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- checkNotUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- checkOutlinks - Variable in class org.archive.crawler.processor.CrawlMapper
-
Whether to apply the mapping to discovered outlinks, for example after
extraction has occurred.
- Checkpoint - Class in org.archive.checkpointing
-
Represents a single checkpoint, by its name and main store directory.
- Checkpoint() - Constructor for class org.archive.checkpointing.Checkpoint
-
- checkpoint - Variable in class org.archive.crawler.framework.CheckpointSuccessEvent
-
- Checkpointable - Interface in org.archive.checkpointing
-
Interface for objects that can checkpoint their state, possibly
but not necessarily into the provided Checkpoint instance, on
request.
- checkpointDir - Variable in class org.archive.checkpointing.Checkpoint
-
Checkpoints directory; either an absolute path, or relative to the
CheckpointService's checkpointsDirectory (which will be inserted as
the ConfigPath base before the Checkpoint is consulted).
- checkpointFailed(Exception) - Method in class org.archive.crawler.framework.CheckpointService
-
Note that a checkpoint failed
- checkpointFailed(String) - Method in class org.archive.crawler.framework.CheckpointService
-
- checkpointInProgress - Variable in class org.archive.crawler.framework.CheckpointService
-
- checkpointIntervalMinutes - Variable in class org.archive.crawler.framework.CheckpointService
-
- checkpointsDir - Variable in class org.archive.crawler.framework.CheckpointService
-
- CheckpointService - Class in org.archive.crawler.framework
-
Executes checkpoints, and offers convenience methods for enumerating
available Checkpoints and injecting a recovery-Checkpoint after
build and before launch (setRecoveryCheckpointByName).
- CheckpointService() - Constructor for class org.archive.crawler.framework.CheckpointService
-
Create a new Checkpointer
- CheckpointSuccessEvent - Class in org.archive.crawler.framework
-
Report success of a Checkpoint (so that it may be reported by the
CrawlJob to the job log).
- CheckpointSuccessEvent(CheckpointService, Checkpoint) - Constructor for class org.archive.crawler.framework.CheckpointSuccessEvent
-
- checkpointTask - Variable in class org.archive.crawler.framework.CheckpointService
-
- CheckpointUtils - Class in org.archive.crawler.util
-
Utilities useful for checkpointing.
- CheckpointUtils() - Constructor for class org.archive.crawler.util.CheckpointUtils
-
- CheckpointValidator - Class in org.archive.crawler.framework
-
- CheckpointValidator() - Constructor for class org.archive.crawler.framework.CheckpointValidator
-
- checkQuotas(CrawlURI, FetchStats.HasFetchStats, int) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
Check all quotas for the given substats and category (server, host, or
group).
- checkUri - Variable in class org.archive.crawler.processor.CrawlMapper
-
Whether to apply the mapping to a URI being processed itself, for example
early in processing (while its status is still 'unattempted').
- checkUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- checkXML() - Method in class org.archive.crawler.framework.CrawlJob
-
Is the primary XML config minimally well-formed?
- chmod - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Whether permissions should be changed for the newly created dirs.
- chmodValue - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
What the permissions should be set to.
- chosenEngine - Variable in class org.archive.crawler.restlet.ScriptResource
-
- circle - Variable in class org.archive.util.LongToIntConsistentHash
-
- cj - Variable in class org.archive.crawler.restlet.JobRelatedResource
-
- cj - Variable in class org.archive.crawler.restlet.JobResource
-
- classCatalog - Variable in class org.archive.util.bdbje.EnhancedEnvironment
-
- classCatalogDB - Variable in class org.archive.util.bdbje.EnhancedEnvironment
-
- classKey - Variable in class org.archive.crawler.frontier.WorkQueue
-
The classKey
- ClassKeyMatchesRegexDecideRule - Class in org.archive.crawler.deciderules
-
Rule applies configured decision to any CrawlURI class key -- i.e.
- ClassKeyMatchesRegexDecideRule() - Constructor for class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
-
Usual constructor.
- clazz - Variable in class org.archive.spring.BeanFieldsPatternValidator
-
- cleanup() - Method in class org.archive.crawler.framework.ToePool
-
- cleanupHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
-
Perform any final cleanup related to the HttpClient instance.
- cleanUpOldFiles(String) - Method in class org.archive.util.TmpDirTestCase
-
Delete any files left over from previous run.
- cleanUpOldFiles(File, String) - Method in class org.archive.util.TmpDirTestCase
-
Delete any files left over from previous run.
- clear() - Method in class org.apache.commons.httpclient.HttpState
-
Clears the state information (all cookies, credentials and proxy credentials).
- clear() - Method in class org.archive.crawler.io.UriProcessingFormatter
-
- clearAllOverrideContexts() - Static method in class org.archive.spring.KeyedProperties
-
- clearAt(long) - Method in class org.archive.util.AbstractLongFPSet
-
- clearAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- clearCookies() - Method in class org.apache.commons.httpclient.HttpState
-
Clears all cookies.
- clearCredentials() - Method in class org.apache.commons.httpclient.HttpState
-
Clears all credentials.
- clearOverridesFrom(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
-
- clearPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
-
Clear prerequisite, if any.
- clearProxyCredentials() - Method in class org.apache.commons.httpclient.HttpState
-
Clears all proxy credentials.
- CLibrary - Interface in org.archive.util
-
Interface to standard C library functions; initially just link().
- ClientFTP - Class in org.archive.net
-
Client for FTP operations.
- ClientFTP() - Constructor for class org.archive.net.ClientFTP
-
Constructs a new ClientFTP.
- close() - Method in class org.apache.commons.httpclient.HttpConnection
-
Closes the socket and streams.
- close() - Method in class org.archive.bdb.BdbModule
-
- close() - Method in class org.archive.bdb.StoredQueue
-
- close() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Close down any allocated resources.
- close() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- close() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Clean up.
- close() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Release resources only needed when running
- close() - Method in class org.archive.crawler.reporting.AlertHandler
-
- close() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- close() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- close() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- close() - Method in class org.archive.io.CrawlerJournal
-
Flush and close the underlying IO objects.
- close() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- close() - Method in class org.archive.modules.fetcher.DefaultServerCache
-
Called when shutting down the cache so that it can clean up.
- close() - Method in class org.archive.util.bdbje.EnhancedEnvironment
-
- close() - Method in class org.archive.util.ObjectIdentityBdbCache
-
- close() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- close() - Method in interface org.archive.util.ObjectIdentityCache
-
close/release any associated resources
- close() - Method in class org.archive.util.ObjectIdentityMemCache
-
- closeDatabase(Database) - Method in class org.archive.bdb.BdbModule
-
- closeDatabase(String) - Method in class org.archive.bdb.BdbModule
-
- closeDataConnection() - Method in class org.archive.net.ClientFTP
-
- closeIfStale() - Method in class org.apache.commons.httpclient.HttpConnection
-
Closes the connection if stale.
- closeLogFiles() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
Close all log files and remove handlers from loggers.
- closeSocketAndStreams() - Method in class org.apache.commons.httpclient.HttpConnection
-
Closes everything out.
- collect(CrawlController, StatisticsTracker) - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
-
Collect all relevant snapshot samples, from the given CrawlController
and StatisticsTracker (which also provides the previous snapshot
for rate calculations).
- collection - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Name of collection.
- COLLECTION_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- comment - Variable in class org.archive.modules.deciderules.DecideRule
-
- compactReportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
-
- compare(Object, Object) - Method in class org.apache.commons.httpclient.Cookie
-
Compares two cookies to determine order for cookie header.
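For reference, RFC 6265 section 5.4 recommends listing cookies with longer (more specific) paths before those with shorter paths in the `Cookie` header. The sketch below applies that ordering to bare path strings. It illustrates the RFC rule and is not necessarily what HttpClient's `Cookie.compare` does. The class and method names are hypothetical, and the input list is assumed to already be in creation order, so a stable sort preserves the RFC's tie-break.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CookieOrderSketch {
    // Longer paths first; equal lengths keep their input order because List.sort is stable.
    static final Comparator<String> BY_PATH_SPECIFICITY =
            Comparator.<String>comparingInt(String::length).reversed();

    static List<String> orderForHeader(List<String> paths) {
        List<String> sorted = new ArrayList<>(paths);
        sorted.sort(BY_PATH_SPECIFICITY);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(orderForHeader(List.of("/", "/a/b", "/a")));
    }
}
```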
- compareTo(CrawlJob) - Method in class org.archive.crawler.framework.CrawlJob
-
Sort for reverse-chronological listing.
- compareTo(Delayed) - Method in class org.archive.crawler.frontier.WorkQueue
-
- compareTo(DecideRuledSheetAssociation) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- compareTo(FPMergeUriUniqFilter.PendingItem) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
-
- compareTo(Link) - Method in class org.archive.modules.extractor.Link
-
- completePause() - Method in class org.archive.crawler.framework.CrawlController
-
- completeStop() - Method in class org.archive.crawler.framework.CrawlController
-
Called when the last toethread exits.
- component - Variable in class org.archive.crawler.Heritrix
-
- composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbCache
-
- composeCacheSummary() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- CompositeIterator<E> - Class in org.archive.util.iterator
-
An iterator that's built up out of any number of other iterators.
- CompositeIterator() - Constructor for class org.archive.util.iterator.CompositeIterator
-
Create an empty CompositeIterator.
- CompositeIterator(Iterator<E>, Iterator<E>) - Constructor for class org.archive.util.iterator.CompositeIterator
-
Convenience constructor for concatenating two iterators.
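A minimal sketch of an iterator built from any number of other iterators, yielding each source's elements in turn and skipping empty sources. `ConcatIterator` is a hypothetical name. It illustrates the idea and is not CompositeIterator's source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ConcatIterator<E> implements Iterator<E> {
    private final Iterator<Iterator<E>> sources;
    private Iterator<E> current;  // null until the first source is pulled

    ConcatIterator(List<Iterator<E>> iterators) {
        this.sources = new ArrayList<>(iterators).iterator();
    }

    @Override
    public boolean hasNext() {
        // Advance past exhausted (or empty) iterators until one has an element or none remain.
        while ((current == null || !current.hasNext()) && sources.hasNext()) {
            current = sources.next();
        }
        return current != null && current.hasNext();
    }

    @Override
    public E next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }

    public static void main(String[] args) {
        ConcatIterator<Integer> it = new ConcatIterator<>(List.of(
                List.of(1, 2).iterator(), Collections.<Integer>emptyIterator(), List.of(3).iterator()));
        while (it.hasNext()) System.out.print(it.next() + " ");
        System.out.println();
    }
}
```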
- compress - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Whether to gzip-compress files when writing to disk;
by default true, meaning do-compress.
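A minimal sketch of what a "compress" switch typically controls: wrapping the output stream in the standard `java.util.zip.GZIPOutputStream` when the flag is on. WriterPoolProcessor's real writer plumbing is not shown. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressSketch {
    // Writes payload either gzip-compressed or raw, depending on the flag.
    static byte[] write(byte[] payload, boolean compress) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream (via try-with-resources) writes the gzip trailer.
        try (OutputStream out = compress ? new GZIPOutputStream(sink) : sink) {
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompresses a complete gzip byte array (readAllBytes requires Java 9+).
    static byte[] gunzip(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("raw=" + write(data, false).length + " gz=" + write(data, true).length);
    }
}
```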
- concludedSeedBatch() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- concludedSeedBatch() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- concludedSeedBatch() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- concludedSeedBatch() - Method in interface org.archive.modules.seeds.SeedListener
-
- ConfigFile - Class in org.archive.spring
-
ConfigPath with added implication that it is an individual,
readable/writable File.
- ConfigFile() - Constructor for class org.archive.spring.ConfigFile
-
- ConfigFile(String, String) - Constructor for class org.archive.spring.ConfigFile
-
- ConfigFileEditor - Class in org.archive.spring
-
PropertyEditor allowing Strings to become ConfigFile instances.
- ConfigFileEditor() - Constructor for class org.archive.spring.ConfigFileEditor
-
- ConfigPath - Class in org.archive.spring
-
A filesystem path, as a bean, for the convenience of configuration
via Spring beans.xml or user interfaces to same.
- ConfigPath() - Constructor for class org.archive.spring.ConfigPath
-
- ConfigPath(String, String) - Constructor for class org.archive.spring.ConfigPath
-
- configPathConfigurer - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- ConfigPathConfigurer - Class in org.archive.spring
-
Bean to fixup all configuration-relative ConfigPath instances, and
maintain an inventory of referenced paths.
- ConfigPathConfigurer() - Constructor for class org.archive.spring.ConfigPathConfigurer
-
- ConfigPathEditor - Class in org.archive.spring
-
PropertyEditor allowing Strings to become ConfigPath instances.
- ConfigPathEditor() - Constructor for class org.archive.spring.ConfigPathEditor
-
- ConfigString - Class in org.archive.spring
-
A configuration string that provides its own reader via the ReadSource
interface, for convenient use in spring configuration where any of an
inline string, path to local file (ConfigPath), or any other
readable-text-source would all be equally welcome.
- ConfigString() - Constructor for class org.archive.spring.ConfigString
-
- ConfigString(String) - Constructor for class org.archive.spring.ConfigString
-
- configureHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- configureHttp(int, String, String, int, String, String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- configureMethod(CrawlURI, HttpMethod) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Configure the HttpMethod, setting options and headers.
- configurer - Variable in class org.archive.spring.ConfigPath
-
- congestionRatio() - Method in interface org.archive.crawler.framework.Frontier
-
Ratio of number of threads that would theoretically allow
maximum crawl progress (if each was as productive as current
threads), to current number of threads.
- congestionRatio() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- congestionRatio - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- conhash - Variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- connect() - Method in class org.archive.net.s3.S3URLConnection
-
Connect to S3 and get the object reference, but don't read any of
the object data yet.
- connectTimeoutMs - Variable in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- consecutiveConnectionErrors - Variable in class org.archive.modules.net.CrawlServer
-
- considerActive() - Method in class org.archive.crawler.frontier.WorkQueue
-
Begin an 'active' session, which begins when a queue first offers a
URI for crawling, and continues until it is deactivated (for example,
for session-budget reasons).
- considerDnsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- considerIfLikelyUri(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Consider whether a given string is URI-like.
- considerIncluded(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
Notify Frontier that it should consider the given UURI as if
already scheduled.
- considerIncluded(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- considerQueryStringValues(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Consider a query-string-like collection of key=value[&key=value]
pairs for URI-like strings in the values.
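A minimal sketch of scanning `key=value[&key=value]` pairs for URI-like values. The `looksLikeUri` test (absolute http(s) URI or root-relative path) is a hypothetical heuristic and is not ExtractorHTML's actual rule. The class and method names are also hypothetical. `URLDecoder.decode(String, Charset)` requires Java 10+ and throws `IllegalArgumentException` on malformed `%` escapes.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class QueryStringScan {
    // Hypothetical "URI-like" test.
    static boolean looksLikeUri(String s) {
        return s.startsWith("http://") || s.startsWith("https://") || s.startsWith("/");
    }

    // Splits on '&', takes the text after the first '=', URL-decodes it, keeps URI-like values.
    static List<String> uriLikeValues(String query) {
        List<String> found = new ArrayList<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;  // no value to consider
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            if (looksLikeUri(value)) found.add(value);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(uriLikeValues("id=7&next=http%3A%2F%2Fexample.com%2Fp&back=%2Fhome"));
    }
}
```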
- considerRobotsPreconditions(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
Consider the robots precondition.
- considerString(Extractor, CrawlURI, boolean, String) - Method in class org.archive.modules.extractor.ExtractorJS
-
- considerStringAsUri(String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
- considerStrings(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
-
- considerStrings(Extractor, CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorJS
-
- considerStrings(Extractor, CrawlURI, CharSequence, boolean) - Method in class org.archive.modules.extractor.ExtractorJS
-
- considerTimestamp() - Method in class org.archive.io.CrawlerJournal
-
Write a timestamp line if appropriate
- consistencyCheck() - Method in class org.archive.crawler.frontier.BdbFrontier
-
Run a self-consistency check over queue collections, queues-of-queues,
etc.
- consistencyMarkup(DisposableStoredSortedMap<String, String>, Iterable<?>, String) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- CONSTRUCTOR_CACHE - Static variable in class org.archive.bdb.AutoKryo
-
- constructRegex(int) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
-
- contains(long) - Method in class org.archive.util.AbstractLongFPSet
-
Does this set contain the given value?
- contains(CharSequence) - Method in interface org.archive.util.BloomFilter
-
Checks whether the given character sequence is in this filter.
- contains(CharSequence) - Method in class org.archive.util.BloomFilter64bit
-
Checks whether the given character sequence is in this filter.
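A minimal sketch of the `contains` contract for a Bloom filter: an element added is always reported present, while an absent element is usually, but not always, reported absent. This toy version derives `k` bit positions by double hashing two 32-bit hashes (`String.hashCode` and FNV-1a). It is an assumption for illustration and is not BloomFilter64bit's algorithm.

```java
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) throw new IllegalArgumentException("sizes must be positive");
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a over the UTF-16 chars of s.
    private static int fnv1a(CharSequence s) {
        int h = 0x811C9DC5;
        for (int j = 0; j < s.length(); j++) {
            h ^= s.charAt(j);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: position_i = (h1 + i * h2) mod numBits, with h2 forced odd.
    void add(CharSequence s) {
        int h1 = s.toString().hashCode(), h2 = fnv1a(s) | 1;
        for (int i = 0; i < numHashes; i++) bits.set(Math.floorMod(h1 + i * h2, numBits));
    }

    // True means "probably present" (false positives possible); false means definitely absent.
    boolean contains(CharSequence s) {
        int h1 = s.toString().hashCode(), h2 = fnv1a(s) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter bf = new SimpleBloomFilter(1 << 16, 5);
        bf.add("http://a.example/");
        System.out.println(bf.contains("http://a.example/"));
    }
}
```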
- contains(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- contains(long) - Method in interface org.archive.util.fingerprint.LongFPSet
-
Does this set contain a given fingerprint?
- contains(int) - Method in class org.archive.util.ms.Piece
-
- containsContentTypeCharsetDeclaration() - Method in class org.archive.modules.CrawlURI
-
- containsDataKey(String) - Method in class org.archive.modules.CrawlURI
-
- containsHost(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
-
- containsKey(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
-
- containsServer(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
-
- CONTENT_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- CONTENT_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- CONTENT_TYPE_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryLoader
-
- contentDigestHistory - Variable in class org.archive.modules.recrawl.ContentDigestHistoryStorer
-
- ContentDigestHistoryLoader - Class in org.archive.modules.recrawl
-
- ContentDigestHistoryLoader() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryLoader
-
- ContentDigestHistoryStorer - Class in org.archive.modules.recrawl
-
- ContentDigestHistoryStorer() - Constructor for class org.archive.modules.recrawl.ContentDigestHistoryStorer
-
- ContentExtractor - Class in org.archive.modules.extractor
-
Extracts links from the fetched content of a URI, as opposed to its headers.
- ContentExtractor() - Constructor for class org.archive.modules.extractor.ContentExtractor
-
- ContentExtractorTestBase - Class in org.archive.modules.extractor
-
Abstract base class for unit testing ContentExtractor implementations.
- ContentExtractorTestBase() - Constructor for class org.archive.modules.extractor.ContentExtractorTestBase
-
- ContentLengthDecideRule - Class in org.archive.modules.deciderules
-
- ContentLengthDecideRule() - Constructor for class org.archive.modules.deciderules.ContentLengthDecideRule
-
Usual constructor.
- contentSinceCheck - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- contentTypeMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
This list is grouped in pairs.
- ContentTypeMatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
DecideRule whose decision is applied if the URI's content-type
is present and matches the supplied regular expression.
- ContentTypeMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
-
- ContentTypeNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
DecideRule whose decision is applied if the URI's content-type
is present and does not match the supplied regular expression.
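The two content-type rules above share one shape: if the content type is present and the match (or non-match) condition holds, return the configured decision, otherwise abstain. A minimal sketch follows. The `Decision` enum, the `negate` flag and the use of a full `matches()` rather than `find()` are assumptions, not the classes' actual code.

```java
import java.util.regex.Pattern;

class ContentTypeRuleSketch {
    // Hypothetical decision type; NONE means "this rule has no opinion".
    enum Decision { ACCEPT, REJECT, NONE }

    // negate == false: apply `decision` when the type fully matches the regex.
    // negate == true:  apply `decision` when the type is present but does NOT match.
    static Decision decide(String contentType, Pattern regex, Decision decision, boolean negate) {
        if (contentType == null) return Decision.NONE;  // absent type: rule does not apply
        boolean matches = regex.matcher(contentType).matches();
        return (matches != negate) ? decision : Decision.NONE;
    }

    public static void main(String[] args) {
        Pattern html = Pattern.compile("text/html.*");
        System.out.println(decide("text/html; charset=UTF-8", html, Decision.ACCEPT, false));
        System.out.println(decide("image/png", html, Decision.REJECT, true));
    }
}
```

Note that neither variant fires when the content type is missing, so a missing type is never treated as a non-match.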
- ContentTypeNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
-
- controlConversation - Variable in class org.archive.net.ClientFTP
-
- controller - Variable in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
-
- controller - Variable in class org.archive.crawler.framework.CheckpointService
-
- controller - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- controller - Variable in class org.archive.crawler.framework.ToePool
-
- controller - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- controller - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- controller - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- controller - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- controller - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- Cookie - Class in org.apache.commons.httpclient
-
HTTP "magic-cookie" represents a piece of state information
that the HTTP agent and the target server can exchange to maintain
a session.
- Cookie() - Constructor for class org.apache.commons.httpclient.Cookie
-
Default constructor.
- Cookie(String, String, String) - Constructor for class org.apache.commons.httpclient.Cookie
-
Creates a cookie with the given name, value and domain attribute.
- Cookie(String, String, String, String, Date, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
-
Creates a cookie with the given name, value, domain attribute,
path attribute, expiration attribute, and secure attribute.
- Cookie(String, String, String, String, int, boolean) - Constructor for class org.apache.commons.httpclient.Cookie
-
Creates a cookie with the given name, value, domain attribute,
path attribute, maximum age attribute, and secure attribute.
- COOKIEDB_NAME - Static variable in class org.archive.modules.fetcher.BdbCookieStorage
-
- cookiesLoadFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
-
- CookieSpec - Interface in org.apache.commons.httpclient.cookie
-
Defines the cookie management specification.
- CookieSpecBase - Class in org.apache.commons.httpclient.cookie
-
Cookie management functions shared by all specifications.
- CookieSpecBase() - Constructor for class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Default constructor
- cookiesSaveFile - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
-
- CookieStorage - Interface in org.archive.modules.fetcher
-
- cookieStorage - Variable in class org.archive.modules.fetcher.FetchHTTP
-
- copy(CrawlJob, File, boolean) - Method in class org.archive.crawler.framework.Engine
-
Copy a job to a new location, possibly making a job
a profile or a profile a runnable job.
- copy(CrawlJob, String, boolean) - Method in class org.archive.crawler.framework.Engine
-
Copy a job to a new location, possibly making a job
a profile or a profile a runnable job.
- copyForwardWriteTagIfDupe(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
If this fetch is identical to the last written (archived) fetch, then
copy forward the writeTag.
- copyJob(String, boolean) - Method in class org.archive.crawler.restlet.JobResource
-
- copyPersistSourceToHistoryMap(File, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
Populates a given StoredSortedMap (history map) from an old
environment db or a persist log.
- copyPersistSourceToHistoryMap(URL, StoredSortedMap<String, Map>) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
Populates a given StoredSortedMap (history map) from an old persist log.
- CoreAttributeConstants - Interface in org.archive.modules
-
Attribute keys and constant strings used by the core crawler
classes.
- CostAssignmentPolicy - Class in org.archive.crawler.frontier
-
Calculate an integer 'cost' value for the given CrawlURI.
- CostAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.CostAssignmentPolicy
-
- costCount - Variable in class org.archive.crawler.frontier.WorkQueue
-
Total number of items charged against queue; with totalExpenditure
can be used to calculate 'average cost'.
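The 'average cost' mentioned above is simply total expenditure divided by the number of charged items. A minimal sketch follows. The class and method names are hypothetical, and returning 0 for an uncharged queue is an assumption.

```java
class QueueCostSketch {
    // Average cost per charged item; 0.0 when nothing has been charged yet (assumed convention).
    static double averageCost(long totalExpenditure, long costCount) {
        return costCount == 0 ? 0.0 : (double) totalExpenditure / costCount;
    }

    public static void main(String[] args) {
        System.out.println(averageCost(10, 4));
    }
}
```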
- costOf(CrawlURI) - Method in class org.archive.crawler.frontier.AntiCalendarCostAssignmentPolicy
-
- costOf(CrawlURI) - Method in class org.archive.crawler.frontier.CostAssignmentPolicy
-
- costOf(CrawlURI) - Method in class org.archive.crawler.frontier.UnitCostAssignmentPolicy
-
- costOf(CrawlURI) - Method in class org.archive.crawler.frontier.WagCostAssignmentPolicy
-
Add constant penalties for certain features of URI (and
its 'via') that make it more delayable/skippable.
- costOf(CrawlURI) - Method in class org.archive.crawler.frontier.ZeroCostAssignmentPolicy
-
- CostUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
-
UriPrecedencePolicy which sets a URI's precedence to its 'cost' -- which
simulates the in-queue sorting order in Heritrix 1.x, where cost
contributed the same bits to the queue-insert-key that precedence now does.
- CostUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.CostUriPrecedencePolicy
-
- count() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
- count - Variable in class org.archive.crawler.frontier.WorkQueue
-
Total number of stored items
- count - Variable in class org.archive.crawler.reporting.AlertThreadGroup
-
- count - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- count - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- count() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- count() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
-
- count() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- count - Variable in class org.archive.util.AbstractLongFPSet
-
The current number of elements in the set
- count() - Method in class org.archive.util.AbstractLongFPSet
-
Return the number of entries in this set.
- count - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
-
- count() - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- count() - Method in interface org.archive.util.fingerprint.LongFPSet
-
Get the number of elements in the set.
- count - Variable in class org.archive.util.ObjectIdentityBdbCache
-
- count - Variable in class org.archive.util.ObjectIdentityBdbManualCache
-
- countryCodes - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
Country code name.
- Cp1252 - Class in org.archive.util.ms
-
A fast implementation of code page 1252.
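Code page 1252 (windows-1252) agrees with ISO-8859-1 everywhere except bytes 0x80 to 0x9F, so a fast decoder can cast most bytes directly and use a 32-entry table for that range. A minimal sketch follows. It is not the Cp1252 class's code. Mapping the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD is an assumed policy, and the class name is hypothetical.

```java
class Cp1252Sketch {
    // Code points for bytes 0x80..0x9F; U+FFFD marks the five bytes the code page leaves undefined.
    private static final char[] HIGH = {
        '\u20AC', '\uFFFD', '\u201A', '\u0192', '\u201E', '\u2026', '\u2020', '\u2021',
        '\u02C6', '\u2030', '\u0160', '\u2039', '\u0152', '\uFFFD', '\u017D', '\uFFFD',
        '\uFFFD', '\u2018', '\u2019', '\u201C', '\u201D', '\u2022', '\u2013', '\u2014',
        '\u02DC', '\u2122', '\u0161', '\u203A', '\u0153', '\uFFFD', '\u017E', '\u0178'
    };

    static String decode(byte[] bytes) {
        char[] out = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;  // treat the byte as unsigned 0..255
            // Outside 0x80..0x9F the byte value equals the Unicode code point (Latin-1).
            out[i] = (b >= 0x80 && b <= 0x9F) ? HIGH[b - 0x80] : (char) b;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x41, (byte) 0x80, (byte) 0xE9}));
    }
}
```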
- crawlCheckpoint(Object, File) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- CrawlController - Class in org.archive.crawler.framework
-
CrawlController collects all the classes which cooperate to
perform a crawl and provides a high-level interface to the
running crawl.
- CrawlController() - Constructor for class org.archive.crawler.framework.CrawlController
-
- CrawlController.State - Enum in org.archive.crawler.framework
-
- CrawlController.StopCompleteEvent - Class in org.archive.crawler.framework
-
- CrawlController.StopCompleteEvent(Object) - Constructor for class org.archive.crawler.framework.CrawlController.StopCompleteEvent
-
- crawlDelay - Variable in class org.archive.modules.net.RobotsDirectives
-
- crawledBytes - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Tallies of byte sizes by category: novel, verified (same hash), and vouched (not modified).
- CrawledBytesHistotable - Class in org.archive.crawler.util
-
- CrawledBytesHistotable() - Constructor for class org.archive.crawler.util.CrawledBytesHistotable
-
- crawledBytesSummary() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawledURIDisregard(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawledURIFailure(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawledURINeedRetry(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawledURISuccessful(CrawlURI) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlEmpty(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlEnded(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- crawlEnded(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlEnding(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlEndTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
wall-clock time the crawl ended
- crawlerCount - Variable in class org.archive.crawler.processor.HashCrawlMapper
-
Number of crawlers among which to split up the URIs.
- CrawlerJournal - Class in org.archive.io
-
Utility class for a crawler journal/log that is compressed and
rotates by serial number at checkpoints.
- CrawlerJournal(String, String) - Constructor for class org.archive.io.CrawlerJournal
-
Create a new crawler journal at the given location
- CrawlerJournal(File) - Constructor for class org.archive.io.CrawlerJournal
-
Create a new crawler journal at the given location
- CrawlerLoggerModule - Class in org.archive.crawler.reporting
-
Module providing all expected whole-crawl logging facilities
- CrawlerLoggerModule() - Constructor for class org.archive.crawler.reporting.CrawlerLoggerModule
-
- CrawlHost - Class in org.archive.modules.net
-
Represents a single remote "host".
- CrawlHost(String) - Constructor for class org.archive.modules.net.CrawlHost
-
Create a new CrawlHost object.
- CrawlHost(String, String) - Constructor for class org.archive.modules.net.CrawlHost
-
Create a new CrawlHost object.
- CrawlJob - Class in org.archive.crawler.framework
-
CrawlJob represents a crawl configuration, including its
configuration files, instantiated/running ApplicationContext, and
disk output, potentially across multiple runs.
- CrawlJob(File) - Constructor for class org.archive.crawler.framework.CrawlJob
-
- CrawlJob.JobLogFormatter - Class in org.archive.crawler.framework
-
Formatter for job.log
- CrawlJob.JobLogFormatter() - Constructor for class org.archive.crawler.framework.CrawlJob.JobLogFormatter
-
- CrawlJobModel - Class in org.archive.crawler.restlet.models
-
- CrawlJobModel(CrawlJob, String) - Constructor for class org.archive.crawler.restlet.models.CrawlJobModel
-
- CrawlLimitEnforcer - Class in org.archive.crawler.framework
-
Bean to enforce limits on the size of a crawl in URI count,
byte count, or elapsed time.
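The kind of check a crawl-limit bean performs can be sketched as follows. This is a minimal sketch under stated assumptions, not CrawlLimitEnforcer's actual code. The class LimitCheck, its parameters, the returned reason strings, and the convention that a limit of 0 means "unlimited" are all hypothetical.

```java
// Hypothetical sketch of a crawl-size limit check; not Heritrix's API.
final class LimitCheck {
    /**
     * Returns a reason string if any configured limit has been reached,
     * or null if the crawl may continue. A limit of 0 means "no limit"
     * (an assumption made for this sketch).
     */
    static String exceeded(long urisFetched, long bytesFetched, long elapsedSeconds,
                           long maxUris, long maxBytes, long maxSeconds) {
        if (maxUris > 0 && urisFetched >= maxUris) {
            return "uri-count";
        }
        if (maxBytes > 0 && bytesFetched >= maxBytes) {
            return "byte-count";
        }
        if (maxSeconds > 0 && elapsedSeconds >= maxSeconds) {
            return "elapsed-time";
        }
        return null;
    }
}
```

A caller would run such a check periodically (for example after each URI finishes) and request a crawl stop when it returns a non-null reason.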
- CrawlLimitEnforcer() - Constructor for class org.archive.crawler.framework.CrawlLimitEnforcer
-
- crawlLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- CrawlMapper - Class in org.archive.crawler.processor
-
A simple crawl splitter/mapper, dividing up CrawlURIs
between crawlers by diverting some range of URIs to local log files
(which can then be imported to other crawlers).
- CrawlMapper() - Constructor for class org.archive.crawler.processor.CrawlMapper
-
Constructor.
- CrawlMetadata - Class in org.archive.modules
-
Basic crawl metadata, as consulted by functional modules and
recorded in ARCs/WARCs.
- CrawlMetadata() - Constructor for class org.archive.modules.CrawlMetadata
-
- crawlPaused(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlPauseStarted - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
wall-clock time of last pause, while pause in progress
- crawlPausing(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- crawlResuming(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- CrawlServer - Class in org.archive.modules.net
-
Represents a single remote "server".
- CrawlServer(String) - Constructor for class org.archive.modules.net.CrawlServer
-
Creates a new CrawlServer object.
- crawlStartTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
wall-clock time the crawl started
- CrawlStateEvent - Class in org.archive.crawler.event
-
- CrawlStateEvent(Object, CrawlController.State, String) - Constructor for class org.archive.crawler.event.CrawlStateEvent
-
- CrawlStatSnapshot - Class in org.archive.crawler.reporting
-
Frozen snapshot of a variety of crawl statistics.
- CrawlStatSnapshot() - Constructor for class org.archive.crawler.reporting.CrawlStatSnapshot
-
- CrawlStatus - Enum in org.archive.crawler.framework
-
- CrawlSummaryReport - Class in org.archive.crawler.reporting
-
The "Crawl Report", with summaries of overall crawl size.
- CrawlSummaryReport() - Constructor for class org.archive.crawler.reporting.CrawlSummaryReport
-
- crawlTotalPausedTime - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
duration tally of all time spent in paused state
- CrawlURI - Class in org.archive.modules
-
Represents a candidate URI and the associated state it
collects as it is crawled.
- CrawlURI(UURI) - Constructor for class org.archive.modules.CrawlURI
-
Create a new instance of CrawlURI from a UURI.
- CrawlURI(UURI, String, UURI, LinkContext) - Constructor for class org.archive.modules.CrawlURI
-
- CrawlURI.FetchType - Enum in org.archive.modules
-
- CrawlURIDispositionEvent - Class in org.archive.crawler.event
-
- CrawlURIDispositionEvent(Object, CrawlURI, CrawlURIDispositionEvent.Disposition) - Constructor for class org.archive.crawler.event.CrawlURIDispositionEvent
-
- CrawlURIDispositionEvent.Disposition - Enum in org.archive.crawler.event
-
- createCrawlURI(UURI, Link) - Method in class org.archive.modules.CrawlURI
-
Utility method for creating CandidateURIs from links found while
extracting from this CrawlURI.
- createCrawlURI(UURI, Link, int, boolean) - Method in class org.archive.modules.CrawlURI
-
Utility method for creating CandidateURIs from links found while
extracting from this CrawlURI.
- createdEnvironment - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbCache
-
- createDiskMap(Database, StoredClassCatalog, Class) - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- createFileLogger(File, String, Logger) - Static method in class org.archive.crawler.util.LogUtils
-
Creates a file logger that uses the heritrix.properties file logger
configuration.
- createFormSubmissionAttempt(CrawlURI, HTMLForm, String) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- createFp(CharSequence) - Static method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
Create a fingerprint from the given key
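A fingerprint is a fixed-width hash of the URI key that a uniqueness filter stores instead of the full string. The sketch below swaps in 64-bit FNV-1a purely for illustration; Heritrix's own filters are understood to use the Rabin-style st.ata.util.FPGenerator instead, so fingerprint values will differ. The class name Fingerprints is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative 64-bit fingerprint using FNV-1a (not Heritrix's Rabin fingerprint).
final class Fingerprints {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    /** 64-bit FNV-1a over the UTF-8 bytes of the key. */
    static long createFp(CharSequence key) {
        long hash = FNV_OFFSET_BASIS;
        for (byte b : key.toString().getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);   // fold in one byte
            hash *= FNV_PRIME;    // multiply modulo 2^64 (Java long overflow)
        }
        return hash;
    }
}
```

Storing 8-byte fingerprints instead of full URIs keeps the "already seen" set compact, at the cost of a small probability that two different URIs collide.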
- CreateHardLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
-
- createHostDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Create a subdirectory named for the host in the URI.
- createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- createInactiveQueueForPrecedence(int, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
-
Optionally reuse prior data, for use when resuming from a checkpoint
- createInactiveQueueForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Create an inactiveQueue to hold queue names at the given precedence
- createKey(CharSequence) - Static method in class org.archive.crawler.util.BdbUriUniqFilter
-
Create fingerprint.
- createMultipleWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
-
Create the single object (within which is one BDB database)
inside which all the other queues live.
- createNewJobWithDefaults(File) - Method in class org.archive.crawler.framework.Engine
-
Create a new job directory and copy the profile CXML into it as a non-profile CXML.
- createPortDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Create a subdirectory named for the port in the URI.
- createRecorder(String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Deprecated.
- createRecorder(String, String) - Static method in class org.archive.modules.extractor.ContentExtractorTestBase
-
- createRoot() - Method in class org.archive.crawler.restlet.EngineApplication
-
- createSocket() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- createSocket(String, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- createSocket(InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- createSocket(InetAddress, int, InetAddress, int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
- createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
Attempts to get a new socket connection to the given host within the
given time limit.
- createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
- createSocket(String, int, InetAddress, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- createSocket(String, int) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- createSocket(Socket, String, int, boolean) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- CreateSymbolicLinkA(String, String, FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES) - Method in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
-
- createUriSet() - Method in class org.archive.crawler.util.MemUriUniqFilter
-
- createUriSet() - Method in class org.archive.crawler.util.NoopUriUniqFilter
-
- Credential - Class in org.archive.modules.credential
-
Credential type.
- Credential() - Constructor for class org.archive.modules.credential.Credential
-
Constructor.
- credentialPrecondition(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
Consider credential preconditions.
- CredentialStore - Class in org.archive.modules.credential
-
Front door to the credential store.
- CredentialStore() - Constructor for class org.archive.modules.credential.CredentialStore
-
Constructor.
- CSS_BACKSLASH_ESCAPE - Static variable in class org.archive.modules.extractor.ExtractorCSS
-
- CSS_URI_EXTRACTOR - Static variable in class org.archive.modules.extractor.ExtractorCSS
-
CSS URL extractor pattern.
- curi - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
-
- curi - Variable in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
- current() - Static method in class org.archive.crawler.reporting.AlertThreadGroup
-
- current - Variable in class org.archive.crawler.util.BenchmarkUriUniqFilters
-
- currentDocsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- currentFps - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- currentIterator - Variable in class org.archive.util.iterator.CompositeIterator
-
- currentKiBPerSec - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- currentLaunchDir - Variable in class org.archive.spring.PathSharingContext
-
- currentLaunchId - Variable in class org.archive.spring.PathSharingContext
-
- currentLaunchJobLogHandler - Variable in class org.archive.crawler.framework.CrawlJob
-
- customRobots - Variable in class org.archive.modules.net.CustomRobotsPolicy
-
textual alternate robots.txt rules to follow
- CustomRobotsPolicy - Class in org.archive.modules.net
-
Follow a custom-written robots policy, rather than the site's own declarations.
Does not support overlays of different custom-robots; instead it is
recommended each custom policy be declared as a separate bean, with a
distinct name.
- CustomRobotsPolicy() - Constructor for class org.archive.modules.net.CustomRobotsPolicy
-
- customRobotstxt - Variable in class org.archive.modules.net.CustomRobotsPolicy
-
- CustomSWFTags - Class in org.archive.modules.extractor
-
Overrides the action tags that may hold URIs so that they use
CrawlUriSWFAction.
- CustomSWFTags(SWFActions) - Constructor for class org.archive.modules.extractor.CustomSWFTags
-
- d - Variable in class org.archive.util.BloomFilter64bit
-
The number of hash functions used by this filter.
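For a Bloom filter with m bits holding n elements, the standard choice of hash-function count is k = (m/n) ln 2, giving a false-positive rate of roughly (1 - e^(-kn/m))^k. The sketch below computes both; it illustrates the textbook formulas and is not taken from BloomFilter64bit itself. The class BloomSizing is hypothetical.

```java
// Textbook Bloom filter sizing formulas (illustrative; not BloomFilter64bit's code).
final class BloomSizing {
    /** Optimal number of hash functions k = (m / n) * ln 2, rounded, at least 1. */
    static int optimalHashCount(long bits, long expectedElements) {
        double k = ((double) bits / expectedElements) * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    /** Approximate false-positive probability (1 - e^(-k*n/m))^k. */
    static double falsePositiveRate(long bits, long elements, int hashCount) {
        return Math.pow(1 - Math.exp(-(double) hashCount * elements / bits), hashCount);
    }
}
```

For example, 1000 bits for 100 elements gives k = 7 and a false-positive rate of a little under 1%.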
- data - Variable in class org.archive.modules.CrawlURI
-
Flexible dynamic attributes list.
- data - Variable in class org.archive.modules.extractor.Link
-
Flexible dynamic attributes list.
- data - Variable in class org.archive.spring.PathSharingContext
-
- databaseConfig() - Static method in class org.archive.bdb.StoredQueue
-
A suitable DatabaseConfig for the Database backing a StoredQueue.
- dataSocket - Variable in class org.archive.net.ClientFTP
-
- db - Variable in class org.archive.bdb.DisposableStoredSortedMap
-
- db - Variable in class org.archive.util.ObjectIdentityBdbCache
-
The BDB JE database used for this instance.
- db - Variable in class org.archive.util.ObjectIdentityBdbManualCache
-
The BDB JE database used for this instance.
- deactivateQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Put the given queue on the inactiveQueues queue
- DEBUG - Static variable in class org.archive.util.BloomFilter64bit
-
- DecideResult - Enum in org.archive.modules.deciderules
-
The decision of a DecideRule.
- DecideRule - Class in org.archive.modules.deciderules
-
- DecideRule() - Constructor for class org.archive.modules.deciderules.DecideRule
-
- DecideRuledSheetAssociation - Class in org.archive.crawler.spring
-
SheetAssociation applied on the basis of DecideRules.
- DecideRuledSheetAssociation() - Constructor for class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- DecideRuleSequence - Class in org.archive.modules.deciderules
-
- DecideRuleSequence() - Constructor for class org.archive.modules.deciderules.DecideRuleSequence
-
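As we understand DecideRuleSequence, every rule is consulted in order and the last rule that returns something other than NONE determines the outcome, so later rules can override earlier ones. A minimal sketch of that evaluation order follows; the Decision enum stands in for DecideResult, and Decision and RuleSequence are hypothetical names.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for DecideResult.
enum Decision { ACCEPT, REJECT, NONE }

// Sketch of "last non-NONE answer wins" rule evaluation.
final class RuleSequence<T> {
    private final List<Function<T, Decision>> rules;

    RuleSequence(List<Function<T, Decision>> rules) {
        this.rules = rules;
    }

    /** Evaluates every rule in order; the last rule with an opinion decides. */
    Decision decide(T item) {
        Decision result = Decision.NONE;
        for (Function<T, Decision> rule : rules) {
            Decision d = rule.apply(item);
            if (d != Decision.NONE) {
                result = d;   // later opinions override earlier ones
            }
        }
        return result;
    }
}
```

A typical sequence starts with a broad rule (accept everything in scope) and follows it with narrower rules that reject exceptions.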
- decideToMapOutlink(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
-
- decisionFor(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
-
- decode(String) - Static method in class org.archive.util.Base32
-
Decodes the given Base32 String to a raw byte array.
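Base32 decoding maps each character to 5 bits and emits a byte whenever 8 bits have accumulated. The sketch below assumes the RFC 4648 alphabet (A-Z, 2-7) and treats '=' padding as the end of data; it is an illustration, not org.archive.util.Base32's implementation, and Base32Sketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Illustrative RFC 4648 Base32 decoder.
final class Base32Sketch {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    static byte[] decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;        // pending bits, right-aligned
        int bitsInBuffer = 0;
        for (char c : s.toUpperCase().toCharArray()) {
            if (c == '=') {
                break;         // padding marks the end of the data
            }
            int value = ALPHABET.indexOf(c);
            if (value < 0) {
                throw new IllegalArgumentException("not a Base32 character: " + c);
            }
            buffer = (buffer << 5) | value;
            bitsInBuffer += 5;
            if (bitsInBuffer >= 8) {
                out.write((buffer >> (bitsInBuffer - 8)) & 0xff);
                bitsInBuffer -= 8;
                buffer &= (1 << bitsInBuffer) - 1;   // keep only unused bits
            }
        }
        return out.toByteArray();   // leftover (< 8) bits are padding, discarded
    }
}
```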
- decode(int) - Static method in class org.archive.util.ms.Cp1252
-
Returns the Unicode character for the given Cp1252 byte.
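A fast single-byte decoder is usually just a 256-entry lookup table. The sketch below builds such a table once from the JDK's own "windows-1252" charset and then decodes by array indexing; this is a swapped-in technique for illustration, not the Cp1252 class's code, and Cp1252Sketch is a hypothetical name. Bytes that Cp1252 leaves undefined (such as 0x81) get whatever replacement character the JDK decoder produces.

```java
import java.nio.charset.Charset;

// Table-driven Cp1252 byte-to-char decoding, table built from the JDK charset.
final class Cp1252Sketch {
    private static final char[] TABLE = new char[256];

    static {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int i = 0; i < 256; i++) {
            TABLE[i] = new String(new byte[] { (byte) i }, cp1252).charAt(0);
        }
    }

    /** Returns the Unicode character for a Cp1252 byte (0-255, or a signed byte). */
    static char decode(int b) {
        return TABLE[b & 0xff];
    }
}
```

The upper half differs from ISO-8859-1 mainly in 0x80-0x9F, where Cp1252 places characters such as the euro sign (0x80) and curly quotes (0x93, 0x94).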
- decrementQueuedCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Note that a number of queued URIs have been deleted.
- deepestUri() - Method in interface org.archive.crawler.framework.Frontier
-
Ordinal position of the 'deepest' URI eligible
for crawling.
- deepestUri() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- deepestUri - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- DEFAULT_CAPACITY - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
-
- DEFAULT_CLASS_KEY - Static variable in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- DEFAULT_IP_WHOIS_SERVER - Static variable in class org.archive.modules.fetcher.FetchWhois
-
- DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Default lower bound
- DEFAULT_LOWER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Default lower bound
- DEFAULT_MAX_PENDING - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- DEFAULT_PARAMETERS - Static variable in class org.archive.modules.extractor.Extractor
-
- DEFAULT_REPLICAS - Static variable in class org.archive.util.LongToIntConsistentHash
-
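A consistent hash places several "replica" points per bucket on a ring of hash values; a key belongs to the bucket owning the first point at or after the key's hash, so adding a bucket only moves keys onto the new bucket. The sketch below shows the general technique under stated assumptions: the SplitMix64 finalizer as the mixing function and the class ConsistentHashSketch are choices made here, not LongToIntConsistentHash's internals.

```java
import java.util.Map;
import java.util.TreeMap;

// General consistent-hashing sketch with replica points on a ring.
final class ConsistentHashSketch {
    private final TreeMap<Long, Integer> ring = new TreeMap<>();

    /** Places `replicas` ring points for each bucket 0..buckets-1. */
    ConsistentHashSketch(int buckets, int replicas) {
        for (int bucket = 0; bucket < buckets; bucket++) {
            for (int r = 0; r < replicas; r++) {
                ring.put(mix(((long) bucket << 32) | r), bucket);
            }
        }
    }

    /** The bucket owning the first ring point at or after mix(key), wrapping around. */
    int bucketFor(long key) {
        Map.Entry<Long, Integer> e = ring.ceilingEntry(mix(key));
        if (e == null) {
            e = ring.firstEntry();   // wrap to the start of the ring
        }
        return e.getValue();
    }

    /** SplitMix64 finalizer: a bijective 64-bit mixer used to scatter points. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

More replicas per bucket spread each bucket's share of the ring more evenly, at the cost of a larger map.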
- DEFAULT_SMEAR - Static variable in class org.archive.util.fingerprint.ArrayLongFPCache
-
- DEFAULT_TEST_TMP_DIR - Static variable in class org.archive.util.TmpDirTestCase
-
Default test tmp.
- DEFAULT_TOE_PRIORITY - Static variable in class org.archive.crawler.framework.ToePool
-
run worker threads at slightly lower priority than usual
- DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Default upper bound
- DEFAULT_UPPER_BOUND - Static variable in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Default upper bound
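A status-code range rule compares the fetch status against a lower and upper bound; the "not matches" variant simply inverts the test. The sketch below assumes both bounds are inclusive; StatusCodeRange and its methods are hypothetical names, and the actual default bound values are not shown here.

```java
// Hypothetical sketch of status-code range matching (bounds assumed inclusive).
final class StatusCodeRange {
    /** True when lowerBound <= statusCode <= upperBound. */
    static boolean inRange(int statusCode, int lowerBound, int upperBound) {
        return statusCode >= lowerBound && statusCode <= upperBound;
    }

    /** A "matches" rule applies inside the range; a negated ("not matches") rule applies outside it. */
    static boolean ruleApplies(int statusCode, int lowerBound, int upperBound, boolean negate) {
        return inRange(statusCode, lowerBound, upperBound) != negate;
    }
}
```

For instance, bounds of 400 and 499 select client errors, and the negated form selects every other status.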
- DefaultBlockFileSystem - Class in org.archive.util.ms
-
Default implementation of the Block File System.
- DefaultBlockFileSystem(SeekInputStream, int) - Constructor for class org.archive.util.ms.DefaultBlockFileSystem
-
Constructor.
- DefaultServerCache - Class in org.archive.modules.fetcher
-
Server and Host cache.
- DefaultServerCache() - Constructor for class org.archive.modules.fetcher.DefaultServerCache
-
Constructor.
- DefaultServerCache(ObjectIdentityCache<CrawlServer>, ObjectIdentityCache<CrawlHost>) - Constructor for class org.archive.modules.fetcher.DefaultServerCache
-
- DefaultTempDirProvider - Class in org.archive.modules.net
-
- DefaultTempDirProvider() - Constructor for class org.archive.modules.net.DefaultTempDirProvider
-
- defaultUpdateDescriptor(PropertyDescriptor) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
- defaultURI() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Returns a CrawlURI for testing purposes.
- deferOrFinishGeneric(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchWhois
-
- deferredWrite - Variable in class org.archive.bdb.BdbModule.BdbConfig
-
- degree - Variable in class st.ata.util.FPGenerator
-
The number of bits in fingerprints generated by this FPGenerator.
- delaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
-
delay between scans of actionDirectory for new files
- delete(CrawlURI) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Delete the given CrawlURI from persistent store.
- deleted(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
Notify Frontier that a CrawlURI has been deleted outside of the
normal next()/finished() lifecycle.
- deleted(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Force logging, etc.
- deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.BdbWorkQueue
-
- deleteItem(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
-
Removes the given item from the queue.
- deleteJob(CrawlJob) - Method in class org.archive.crawler.framework.Engine
-
- deleteMatching(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
-
Delete URIs matching the given pattern from this queue.
- deleteMatchingFromQueue(String, String, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Delete all CrawlURIs matching the given expression.
- deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.BdbWorkQueue
-
- deleteMatchingFromQueue(WorkQueueFrontier, String) - Method in class org.archive.crawler.frontier.WorkQueue
-
Delete URIs matching the given pattern from this queue.
- deleteSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Delete a named sheet from all associations and the master named
sheets map.
- deleteURIs(String, String) - Method in interface org.archive.crawler.framework.Frontier
-
Delete any URI that matches the given regular expression from the list
of discovered and pending URIs.
- deleteURIs(String, String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- dequeue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
-
Removes the peekItem from the queue and adjusts the count.
- desc - Variable in enum org.archive.crawler.framework.CrawlStatus
-
- description - Variable in class org.archive.modules.CrawlMetadata
-
- DescriptorUpdater - Interface in org.archive.crawler.restlet
-
- destroy() - Method in class org.archive.bdb.BdbModule
-
- destroy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- destroy() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- destroy() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- detach(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
Detach this credential from passed curi.
- detachAll(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
Detach all credentials of this type from passed curi.
- determineRootRef(Request) - Method in class org.archive.crawler.restlet.EnhDirectory
-
- digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchDNS
-
Which algorithm (for example MD5 or SHA-1) to use to perform an
on-the-fly digest hash of retrieved content-bodies.
- digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchFTP
-
Which algorithm (for example MD5 or SHA-1) to use to perform an
on-the-fly digest hash of retrieved content-bodies.
- digestAlgorithm - Variable in class org.archive.modules.fetcher.FetchHTTP
-
Which algorithm (for example MD5 or SHA-1) to use to perform an
on-the-fly digest hash of retrieved content-bodies.
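"On-the-fly" digesting means the hash is updated as the body streams through, so no second pass over the content is needed. In standard Java this can be done with java.security.DigestInputStream, as in the sketch below; OnTheFlyDigest is a hypothetical name and this is not the fetchers' actual code (which digests inside their recording streams).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Digest a stream while reading it, using the JDK's DigestInputStream.
final class OnTheFlyDigest {
    /** Reads the whole stream and returns the lowercase hex digest. */
    static String digestWhileReading(InputStream body, String algorithm) {
        final MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);   // e.g. "MD5" or "SHA-1"
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown digest: " + algorithm, e);
        }
        try (DigestInputStream in = new DigestInputStream(body, md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // A real fetcher would also hand buf to its recorder here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```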
- dir - Variable in class org.archive.bdb.BdbModule
-
- directory - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
- directoryFile - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Implicitly append this to a URI ending with '/'.
- dirResource - Variable in class org.archive.crawler.restlet.EditRepresentation
-
- dirResource - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
wrapped EnhDirectoryResource; used to formulate self-links
- dirtyItems - Variable in class org.archive.util.ObjectIdentityBdbManualCache
-
- dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbCache
-
- dirtyKey(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- dirtyKey(String) - Method in interface org.archive.util.ObjectIdentityCache
-
force the persistent backend, if any, to eventually be updated with
live object state for the given key
- dirtyKey(String) - Method in class org.archive.util.ObjectIdentityMemCache
-
- disallows - Variable in class org.archive.modules.net.RobotsDirectives
-
- disconnect() - Method in class org.archive.net.ClientFTP
-
- discoveredUriCount() - Method in interface org.archive.crawler.framework.Frontier
-
Number of discovered URIs.
- discoveredUriCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- discoveredUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- DiskFPMergeUriUniqFilter - Class in org.archive.crawler.util
-
Crude FPMergeUriUniqFilter using a disk data file of raw longs as the
overall FP record.
- DiskFPMergeUriUniqFilter(File) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- DiskFPMergeUriUniqFilter.DataFileLongIterator - Class in org.archive.crawler.util
-
- DiskFPMergeUriUniqFilter.DataFileLongIterator(DataInputStream) - Constructor for class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
-
Construct a long iterator reading from the given
stream.
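The core of a fingerprint-merge filter is a sorted merge: pending fingerprints are sorted and de-duplicated, then merged against the sorted record of everything seen, and each pending value not already present is reported as new. The sketch below does this with in-memory arrays instead of a disk file of raw longs; FpMergeSketch and its signature are hypothetical.

```java
import java.util.Arrays;
import java.util.function.LongConsumer;

// In-memory sketch of merging pending fingerprints into a sorted "seen" record.
final class FpMergeSketch {
    /**
     * Reports each pending fingerprint not already in seenSorted to onNew
     * exactly once, and returns the new sorted, duplicate-free record.
     */
    static long[] merge(long[] seenSorted, long[] pending, LongConsumer onNew) {
        long[] batch = Arrays.stream(pending).sorted().distinct().toArray();
        long[] out = new long[seenSorted.length + batch.length];
        int i = 0, j = 0, n = 0;
        while (i < seenSorted.length || j < batch.length) {
            if (j == batch.length || (i < seenSorted.length && seenSorted[i] < batch[j])) {
                out[n++] = seenSorted[i++];           // already-known value
            } else if (i == seenSorted.length || batch[j] < seenSorted[i]) {
                onNew.accept(batch[j]);               // genuinely new value
                out[n++] = batch[j++];
            } else {                                  // equal: duplicate of a seen value
                out[n++] = seenSorted[i++];
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Batching makes the cost of each merge a sequential pass over the record, which is why this design suits a large on-disk file.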
- diskMap - Variable in class org.archive.util.ObjectIdentityBdbCache
-
The Collection view of the BDB JE database used for this instance.
- diskMap - Variable in class org.archive.util.ObjectIdentityBdbManualCache
-
The Collection view of the BDB JE database used for this instance.
- DiskSpaceMonitor - Class in org.archive.crawler.monitor
-
Monitors the available space on the paths configured.
- DiskSpaceMonitor() - Constructor for class org.archive.crawler.monitor.DiskSpaceMonitor
-
- DisposableStoredSortedMap<K,V> - Class in org.archive.bdb
-
DisposableStoredSortedMap remembers its backing Database, and offers
a dispose() method for closing/discarding the underlying Database.
- DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
-
- DisposableStoredSortedMap(Database, EntryBinding<K>, EntityBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
-
- DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, boolean) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
-
- DisposableStoredSortedMap(Database, EntryBinding<K>, EntryBinding<V>, PrimaryKeyAssigner) - Constructor for class org.archive.bdb.DisposableStoredSortedMap
-
- dispose() - Method in class org.archive.bdb.DisposableStoredSortedMap
-
- disposition - Variable in class org.archive.crawler.event.CrawlURIDispositionEvent
-
- dispositionChain - Variable in class org.archive.crawler.framework.CrawlController
-
Disposition chain
- DispositionChain - Class in org.archive.modules
-
- DispositionChain() - Constructor for class org.archive.modules.DispositionChain
-
- dispositionInProgressLock - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
lock allowing steps of outside processing that need to complete
all-or-nothing to signal their in-progress status
- dispositionPending - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
remembers a disposition-in-progress, so that extra endDisposition()
calls are harmless
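The two fields above describe a common pattern: a lock that marks an all-or-nothing step as in progress, plus a per-thread "pending" flag so that a redundant end call does nothing. The sketch below assumes a read/write lock (many dispositions may proceed concurrently; exclusive work such as a checkpoint takes the write lock) and a ThreadLocal flag; DispositionGuard and its methods are hypothetical names, not AbstractFrontier's code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of an idempotent begin/end guard around multi-step processing.
final class DispositionGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Remembers, per thread, whether a disposition is currently in progress.
    private final ThreadLocal<Boolean> pending = ThreadLocal.withInitial(() -> false);

    void beginDisposition() {
        lock.readLock().lock();   // shared: many dispositions can run at once
        pending.set(true);
    }

    /** Safe to call more than once: only the first call after begin releases the lock. */
    void endDisposition() {
        if (pending.get()) {
            pending.set(false);
            lock.readLock().unlock();
        }
    }

    /** Runs work that must not interleave with any half-finished disposition. */
    void runExclusive(Runnable work) {
        lock.writeLock().lock();
        try {
            work.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    int activeDispositions() {
        return lock.getReadLockCount();
    }
}
```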
- DispositionProcessor - Class in org.archive.crawler.postprocessor
-
A step, late in the processing of a CrawlURI, for marking-up the
CrawlURI with values to affect frontier disposition, and updating
information that may have been affected by the fetch.
- DispositionProcessor() - Constructor for class org.archive.crawler.postprocessor.DispositionProcessor
-
- disregardedUriCount() - Method in interface org.archive.crawler.framework.Frontier
-
Number of URIs that were scheduled at one point but have been
disregarded.
- disregardedUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
URIs that are disregarded (for example, because of robots.txt rules)
- disregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- diversionDir - Variable in class org.archive.crawler.processor.CrawlMapper
-
Directory to write diversion logs.
- diversionLogs - Variable in class org.archive.crawler.processor.CrawlMapper
-
Mapping of target crawlers to logs (PrintWriters)
- divertLog(CrawlURI, String) - Method in class org.archive.crawler.processor.CrawlMapper
-
Note the given CrawlURI in the appropriate diversion log.
- DNSJavaUtil - Class in org.archive.util
-
Utility methods based on DNSJava.
- doAbort(CrawlURI, HttpMethod, String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- Doc - Class in org.archive.util.ms
-
Reads .doc files.
- doCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
-
- doCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
-
Do the actual checkpoint.
- doCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
-
- doCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
Run checkpointing.
- doCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- doCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- doCheckpoint(Checkpoint) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- docsPerSecond - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- document - Variable in class org.archive.modules.extractor.PDFParser
-
- DOCUMENT_BUILDER - Static variable in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- documentReader - Variable in class org.archive.modules.extractor.PDFParser
-
- doJournalAdded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalEmitted(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalFinishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalFinishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalReenqueued(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- doJournalRelocated(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- domain - Variable in class org.archive.modules.credential.Credential
-
The root domain this credential goes against: E.g.
- DOMAIN_OVERBOUNDS - Static variable in class org.apache.commons.httpclient.Cookie
-
Character which, if appended to the end of a domain, will give a
boundary key that sorts past all Cookie sortKeys for the same
domain.
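An "overbounds" character makes a half-open range scan possible over a sorted map: if sort keys look like domain + SEP + ..., then every key for one domain lies in [domain + SEP, domain + OVERBOUNDS) whenever OVERBOUNDS sorts just after SEP. The sketch below assumes ';' as the separator and '<' as the overbounds character and uses a TreeMap; the key layout and both characters are hypothetical, not necessarily Cookie's actual sort key.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sort-key layout illustrating a per-domain range scan.
final class DomainRangeScan {
    static final char SEP = ';';          // assumed separator inside sort keys
    static final char OVERBOUNDS = '<';   // assumed: sorts immediately after SEP

    static String sortKey(String domain, String path, String name) {
        return domain + SEP + path + SEP + name;
    }

    /** All entries for exactly this domain: keys in [domain + SEP, domain + OVERBOUNDS). */
    static <V> NavigableMap<String, V> forDomain(TreeMap<String, V> store, String domain) {
        return store.subMap(domain + SEP, true, domain + OVERBOUNDS, false);
    }
}
```

Starting the range at domain + SEP (rather than at the bare domain) keeps longer names such as "example.com.evil" out of the scan for "example.com".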
- domainMatch(String, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Performs domain-match as defined by the cookie specification.
- domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Performs domain-match as implemented in common browsers.
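Browser-style domain matching accepts a host if it equals the cookie domain or ends with "." followed by it, so "www.example.com" matches "example.com" but "badexample.com" does not. The sketch below follows the host-name case of RFC 6265 section 5.1.3 (the IP-address restriction is omitted) and ignores a leading dot on the domain; DomainMatch is a hypothetical name and this is not CookieSpecBase's code.

```java
import java.util.Locale;

// Browser-style cookie domain matching (host-name case of RFC 6265 5.1.3).
final class DomainMatch {
    static boolean domainMatch(String host, String domain) {
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        if (d.startsWith(".")) {
            d = d.substring(1);   // a leading dot on the domain is ignored
        }
        // Exact match, or the host is a subdomain on a label boundary.
        return h.equals(d) || h.endsWith("." + d);
    }
}
```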
- domainMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- doneDir - Variable in class org.archive.crawler.framework.ActionDirectory
-
- doRecover() - Method in class org.archive.bdb.BdbModule
-
- doStripRegexMatch(String, String) - Method in class org.archive.modules.canonicalize.BaseRule
-
Run a regex that strips elements of a string.
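One way such a stripping helper can work, and the convention we assume here, is that the pattern must match the whole string and the result is group 1 followed by group 2, which drops whatever the pattern matched between them. The class StripRegex, its method, and the JSESSIONID pattern below are hypothetical illustrations, not the BaseRule implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keep group(1) + group(2), drop what lies between them.
final class StripRegex {
    // Illustrative pattern that removes a 32-character jsessionid parameter.
    static final Pattern JSESSIONID = Pattern.compile(
        "^(.*?)jsessionid=[0-9a-zA-Z]{32}(?:&(.*))?$", Pattern.CASE_INSENSITIVE);

    /** Returns group(1) + group(2) on a full match; otherwise the input unchanged. */
    static String stripRegexMatch(String s, Pattern p) {
        Matcher m = p.matcher(s);
        if (!m.matches()) {
            return s;
        }
        String g1 = m.group(1) == null ? "" : m.group(1);   // unmatched optional groups are null
        String g2 = m.group(2) == null ? "" : m.group(2);
        return g1 + g2;
    }
}
```

Canonicalization rules of this kind let URIs that differ only by a session ID be recognized as the same resource.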
- dotBegin - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
If a segment starts with '.', the '.' is replaced by this.
- doTeardown() - Method in class org.archive.crawler.framework.CrawlJob
-
- dotEnd - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
If a directory name ends with '.' it is replaced by this.
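Replacing leading and trailing dots keeps mirrored path segments such as ".", ".." or ".hidden" from escaping the mirror root or becoming hidden files. The sketch below applies both rules to any single segment; the processor description limits the trailing-dot rule to directory names, and MirrorSegments and the replacement strings used in the example ("%2E", "_", "-") are hypothetical.

```java
// Hypothetical sketch of leading/trailing dot replacement for one path segment.
final class MirrorSegments {
    static String fixDots(String segment, String dotBegin, String dotEnd) {
        String s = segment;
        if (s.startsWith(".")) {
            s = dotBegin + s.substring(1);               // leading '.' -> dotBegin
        }
        if (s.endsWith(".")) {
            s = s.substring(0, s.length() - 1) + dotEnd; // trailing '.' -> dotEnd
        }
        return s;
    }
}
```

With "%2E" for both replacements, the segment ".." becomes "%2E%2E", which is an ordinary directory name on disk.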
- doubleToString(double, int) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- downloadDisregards - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- downloadedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- downloadFailures - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- dropboxes - Static variable in class org.archive.crawler.restlet.Flash
-
- dumpAllPendingToLog() - Method in class org.archive.crawler.frontier.BdbFrontier
-
Dump all still-enqueued URIs to the crawl.log -- without actually
dequeuing.
- dumpPendingAtClose - Variable in class org.archive.crawler.frontier.BdbFrontier
-
- dumpReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Run the reports.
- dumpSurtPrefixSet() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Dump the current prefixes in use to configured dump file (if any)
- dupByHashBytes - Variable in class org.archive.modules.fetcher.FetchStats
-
- dupByHashUrls - Variable in class org.archive.modules.fetcher.FetchStats
-
- DUPLICATE - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
-
- DUPLICATECOUNT - Static variable in class org.archive.crawler.util.CrawledBytesHistotable
-
- duplicateCount - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- duplicatesAtLastSample - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- EDIT_FILTER - Static variable in class org.archive.crawler.restlet.JobResource
-
- EDIT_FILTER - Static variable in class org.archive.crawler.restlet.models.CrawlJobModel
-
- editFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
-
- EditRepresentation - Class in org.archive.crawler.restlet
-
Representation wrapping a FileRepresentation, displaying its contents
in a TextArea for editing.
- EditRepresentation(FileRepresentation, EnhDirectoryResource) - Constructor for class org.archive.crawler.restlet.EditRepresentation
-
- elapsedMilliseconds - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- elapsedReport() - Method in class org.archive.crawler.framework.CrawlJob
-
- elapsedReportData() - Method in class org.archive.crawler.framework.CrawlJob
-
- elementContext(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorHTML
-
Create a suitable XPath-like context from an element name and optional
attribute name.
- EMBED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
-
Stand-in value for embeds without other context.
- emitBumper(PrintWriter, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
-
Emit a "start" or "EOF" bumper as appropriate to prominently
indicate whether the page borders the start or end of the file.
- emitControls(PrintWriter) - Method in class org.archive.crawler.restlet.PagedRepresentation
-
Emit the navigational controls.
- emitted(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- EMPTY - Static variable in class org.archive.util.AbstractLongFPSet
-
A constant used to indicate that a slot in the set storage is empty.
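A sentinel like EMPTY is typical of open-addressing hash sets that store primitive longs. A slot holding the sentinel is free, so a lookup probes until it finds the value or reaches such a slot. The sketch below assumes -1 as the sentinel and uses linear probing. Both are illustrative choices and may not match what AbstractLongFPSet does.

```java
import java.util.Arrays;

// Hypothetical open-addressing set of long fingerprints with an EMPTY sentinel.
final class LongFpSetSketch {
    static final long EMPTY = -1L; // assumes -1 never occurs as a real fingerprint
    private final long[] slots;
    private int count;

    LongFpSetSketch(int log2Capacity) {
        slots = new long[1 << log2Capacity];
        Arrays.fill(slots, EMPTY); // every slot starts free
    }

    private int indexFor(long fp) {
        long h = fp ^ (fp >>> 32); // fold high bits into low bits
        return (int) (h & (slots.length - 1));
    }

    boolean add(long fp) {
        if (fp == EMPTY) throw new IllegalArgumentException("sentinel cannot be stored");
        // Always leave one free slot so probing loops terminate.
        if (count >= slots.length - 1) throw new IllegalStateException("table full");
        int i = indexFor(fp);
        while (slots[i] != EMPTY) {
            if (slots[i] == fp) return false; // already present
            i = (i + 1) & (slots.length - 1); // linear probe
        }
        slots[i] = fp;
        count++;
        return true;
    }

    boolean contains(long fp) {
        int i = indexFor(fp);
        while (slots[i] != EMPTY) {
            if (slots[i] == fp) return true;
            i = (i + 1) & (slots.length - 1);
        }
        return false; // reached a free slot
    }
}
```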
- empty - Variable in class st.ata.util.FPGenerator
-
Fingerprint of the empty string of bytes.
- encode(byte[]) - Static method in class org.archive.util.Base32
-
Encodes byte array to Base32 String.
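Base32 encoding packs input bits into 5-bit groups, each mapped to one symbol of a 32-character alphabet. The sketch below uses the RFC 4648 alphabet without "=" padding. That the Heritrix class uses exactly this alphabet and omits padding is an assumption, and the class name is hypothetical.

```java
// Hypothetical sketch of unpadded Base32 encoding with the RFC 4648 alphabet.
final class Base32Sketch {
    private static final char[] ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0;   // only the low bitsLeft bits are meaningful
        int bitsLeft = 0; // number of pending, not-yet-emitted bits
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bitsLeft += 8;
            while (bitsLeft >= 5) {
                // Emit the top 5 pending bits.
                out.append(ALPHABET[(buffer >>> (bitsLeft - 5)) & 0x1F]);
                bitsLeft -= 5;
            }
        }
        if (bitsLeft > 0) {
            // Pad the final partial group with zero bits on the right.
            out.append(ALPHABET[(buffer << (5 - bitsLeft)) & 0x1F]);
        }
        return out.toString();
    }
}
```

With this scheme, "foobar" encodes to "MZXW6YTBOI", which is the RFC 4648 test vector with its trailing padding removed.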
- EncodingUtil - Class in org.apache.commons.httpclient.util
-
The home for utility methods that handle various encoding tasks.
- encounteredReferences - Variable in class org.archive.modules.extractor.PDFParser
-
- endDisposition() - Method in interface org.archive.crawler.framework.Frontier
-
Inform the frontier that the processing signalled by an earlier pending
beginDisposition() call has finished.
- endDisposition() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- Engine - Class in org.archive.crawler.framework
-
Implementation for Engine.
- Engine(File) - Constructor for class org.archive.crawler.framework.Engine
-
- engine - Variable in class org.archive.crawler.Heritrix
-
- engine - Variable in class org.archive.crawler.restlet.EngineApplication
-
- EngineApplication - Class in org.archive.crawler.restlet
-
Restlet Application for a Heritrix crawl 'Engine', which is aware of
local job configurations/directories and can assemble/launch/monitor/
manage crawls.
- EngineApplication(Engine) - Constructor for class org.archive.crawler.restlet.EngineApplication
-
- EngineApplication.EngineStatusService - Class in org.archive.crawler.restlet
-
Customize Restlet error to include back button and full stack.
- EngineApplication.EngineStatusService() - Constructor for class org.archive.crawler.restlet.EngineApplication.EngineStatusService
-
- EngineModel - Class in org.archive.crawler.restlet.models
-
- EngineModel(Engine, String) - Constructor for class org.archive.crawler.restlet.models.EngineModel
-
- engineName - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
engine name; default "beanshell"
- engineName - Variable in class org.archive.modules.ScriptedProcessor
-
engine name; default "beanshell"
- EngineResource - Class in org.archive.crawler.restlet
-
Restlet Resource representing an Engine that may be used
to assemble, launch, monitor, and manage crawls.
- EngineResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.EngineResource
-
- EnhancedEnvironment - Class in org.archive.util.bdbje
-
Version of BDB_JE Environment with additional convenience features, such as
a shared, cached StoredClassCatalog.
- EnhancedEnvironment(File, EnvironmentConfig) - Constructor for class org.archive.util.bdbje.EnhancedEnvironment
-
Constructor
- EnhDirectory - Class in org.archive.crawler.restlet
-
Enhanced version of Restlet Directory, which allows the local
filesystem directory to be determined dynamically based on the
request details.
- EnhDirectory(Context, Reference) - Constructor for class org.archive.crawler.restlet.EnhDirectory
-
- EnhDirectory(Context, String) - Constructor for class org.archive.crawler.restlet.EnhDirectory
-
- EnhDirectoryResource - Class in org.archive.crawler.restlet
-
Enhanced version of Restlet DirectoryResource, adding ability to
edit some files.
- EnhDirectoryResource(EnhDirectory, Request, Response) - Constructor for class org.archive.crawler.restlet.EnhDirectoryResource
-
- enqueue(WorkQueueFrontier, CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueue
-
Add the given CrawlURI, noting its addition in running count.
- enqueueCount - Variable in class org.archive.crawler.frontier.WorkQueue
-
Total number of items ever enqueued
- enqueuedCounts - Variable in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- ensureStandardPoliciesAvailable() - Method in class org.archive.modules.CrawlMetadata
-
- ensureStaticInitialization() - Static method in class org.archive.crawler.reporting.AlertHandler
-
Simply to ensure static initialization (installing catchall
handler on topmost logger) is run.
- Entry - Interface in org.archive.util.ms
-
- Entry.EntryType - Enum in org.archive.util.ms
-
- entryString(Object) - Static method in class org.archive.util.Histotable
-
Utility method to convert a key->Long into
the string "count key".
- entryToObject(DatabaseEntry) - Method in class org.archive.bdb.KryoBinding
-
- equals(Object) - Method in class org.apache.commons.httpclient.Cookie
-
Two cookies are equal if the name, path and domain match.
- equals(Object) - Method in class org.archive.modules.extractor.Link
-
- equals(Object) - Method in class org.archive.modules.extractor.LinkContext
-
- equals(Object) - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
All instances of HeritrixProtocolSocketFactory are the same.
- equals(Object) - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- equals(Object) - Method in class org.archive.modules.net.CrawlHost
-
- equals(Object) - Method in class org.archive.modules.net.CrawlServer
-
- errorCount - Variable in class org.archive.crawler.frontier.WorkQueue
-
count of errors encountered
- errorMessage - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
-
- escape(String) - Static method in class org.archive.util.JavaLiterals
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.AddRedirectFromRootServerToScope
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeNotMatchesRegexDecideRule
-
Evaluate whether given object's string version does not match
configured regex (by reversing the superclass's answer).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
-
Evaluate whether given object's fetch status is equal to the configured status.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
-
Evaluate whether given object's FetchStatus does not match
configured regex (by reversing the superclass's answer).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HasViaDecideRule
-
Evaluate whether given object has a 'via', i.e. was discovered from
another URI rather than added directly (as a seed is).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
Evaluate whether given object's string version
matches configured regexes
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
-
Evaluate whether given object's string version
matches configured regex
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Returns "true" if the provided CrawlURI has a fetch status that falls
within this instance's specified range.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesFilePatternDecideRule
-
Evaluate whether given object's string version does not match
configured regex (by reversing the superclass's answer).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesListRegexDecideRule
-
Evaluate whether given object's string version does not match
configured regexes (by reversing the superclass's answer).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesRegexDecideRule
-
Evaluate whether given object's string version does not match
configured regex (by reversing the superclass's answer).
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Returns "true" if the provided CrawlURI has a fetch status that does not
fall within this instance's specified range.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
-
Evaluate whether given CrawlURI's content-digest exactly
matches that of preceding fetch.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
-
Evaluate whether given object's URI scheme is NOT in the configured
set of schemes.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnDomainsDecideRule
-
Evaluate whether given object's URI is NOT in the set of
domains -- simply reverse superclass's determination
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotOnHostsDecideRule
-
Evaluate whether given object's URI is NOT in the set of
hosts -- simply reverse superclass's determination
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.NotSurtPrefixedDecideRule
-
Evaluate whether given object's URI is NOT in the SURT
prefix set -- simply reverse superclass's determination
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Evaluate whether given object's URI is covered by the SURT prefix set
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
-
Evaluate whether given object is over the threshold number of
hops.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
-
Evaluate whether given object is over the threshold number of
path-segments.
- evaluate(CrawlURI) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
-
Evaluate whether given object is within the acceptable thresholds of
transitive hops.
- exactKey(String) - Static method in class org.archive.surt.SURTTokenizer
-
- execute(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Executes this method using the specified HttpConnection and HttpState.
- execute(ScriptEngine, String) - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- executor - Variable in class org.archive.crawler.framework.ActionDirectory
-
- executor - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- expectedConcurrency - Variable in class org.archive.bdb.BdbModule
-
Expected number of concurrent threads; used to tune nLockTables
according to JE FAQ
http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
- expectedInserts - Variable in class org.archive.util.BloomFilter64bit
-
The expected number of inserts; determines calculated size
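Textbook Bloom filter formulas show how an expected insert count n can determine a filter's size. For a target false-positive rate p, they give m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions. The sketch below applies them. It is a generic illustration, not BloomFilter64bit's own sizing logic, which may compute its size differently, and the class name is hypothetical.

```java
// Hypothetical sketch of standard Bloom filter sizing from an expected insert count.
final class BloomSizingSketch {
    /** Bits needed for n inserts at false-positive rate p: m = -n ln p / (ln 2)^2. */
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Hash count minimizing false positives for m bits and n items: k = (m/n) ln 2. */
    static int optimalHashes(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For 1000 expected inserts at a 1% false-positive rate, this gives 9586 bits and 7 hash functions.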
- expectedResult - Variable in class org.archive.modules.extractor.StringExtractorTestBase.TestData
-
- expend(int) - Method in class org.archive.crawler.frontier.WorkQueue
-
Decrease the internal running budget by the given amount.
- expenditureAtLastActivation - Variable in class org.archive.crawler.frontier.WorkQueue
-
Record of expenditures at last activation (session start)
- expirationOperation - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
The action that the processor takes once the runtime has elapsed.
- extend(long, byte) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding the low eight bits of "b".
- extend(long, char) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "v".
- extend(long, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "v".
- extend(long, long) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "v".
- extend(long, byte[], int, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding "n" bytes of "buf" starting from "buf[start]".
- extend(long, char[], int, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "n" characters of "buf" starting from "buf[i]".
- extend(long, CharSequence) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) the characters of "s".
- extend(long, int[], int, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "n" elements of "buf" starting from "buf[i]".
- extend(long, long[], int, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding (all bits of) "n" elements of "buf" starting from "buf[i]".
- extend8(long, String) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding the lower eight bits of the characters of "s".
- extend8(long, char[], int, int) - Method in class st.ata.util.FPGenerator
-
Extends fingerprint f by adding the lower eight bits of "n" characters of "buf" starting from "buf[i]".
- extend_byte(long, int) - Method in class st.ata.util.FPGenerator
-
Extends f with lower eight bits of v without full reduction.
- extend_char(long, int) - Method in class st.ata.util.FPGenerator
-
Extends f with lower sixteen bits of v.
- extend_int(long, int) - Method in class st.ata.util.FPGenerator
-
Extends f with (all bits of) v.
- extend_long(long, long) - Method in class st.ata.util.FPGenerator
-
Extends f with v.
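The FPGenerator extend methods grow a fingerprint incrementally, so extending by "ab" and then "c" matches extending by "abc" at once. The sketch below illustrates the idea with a bitwise Rabin-style fingerprint. Each input bit computes (f * x + bit) mod P over GF(2). The polynomial P = x^64 + x^4 + x^3 + x + 1 and the nonzero starting value are illustrative assumptions. FPGenerator's own polynomial, its table-driven implementation, and its empty fingerprint may all differ.

```java
// Hypothetical bitwise Rabin-style fingerprint; not FPGenerator's implementation.
final class RabinFingerprintSketch {
    // Low 64 coefficients of P(x) = x^64 + x^4 + x^3 + x + 1 (x^64 term implicit).
    private static final long POLY = 0x1BL;
    // Start from 1 rather than 0 so leading zero bytes still change the result.
    static final long EMPTY = 1L;

    /** Returns (f * x + bit) mod P. */
    private static long extendBit(long f, int bit) {
        boolean carry = f < 0;   // coefficient of x^63 is the sign bit
        f = (f << 1) | bit;      // multiply by x, add the new bit
        return carry ? f ^ POLY : f; // reduce the dropped x^64 term
    }

    /** Extends f by the eight bits of b, most significant bit first. */
    static long extend(long f, byte b) {
        for (int i = 7; i >= 0; i--) {
            f = extendBit(f, (b >>> i) & 1);
        }
        return f;
    }

    /** Extends f by n bytes of buf starting at buf[start]. */
    static long extend(long f, byte[] buf, int start, int n) {
        for (int i = start; i < start + n; i++) {
            f = extend(f, buf[i]);
        }
        return f;
    }
}
```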
- extendHopsPath(String, char) - Static method in class org.archive.modules.CrawlURI
-
Extend a 'hopsPath' (pathFromSeed string of single-character hop-type symbols),
keeping the number of displayed hop-types under MAX_HOPS_DISPLAYED.
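The sketch below shows one way to keep a hops path bounded. It keeps only the most recent hop symbols and records how many earlier hops were dropped in an "N+" prefix. That overflow encoding, the class name, and the demo value of MAX_HOPS_DISPLAYED are all hypothetical. The real CrawlURI constant and format may differ.

```java
// Hypothetical sketch of a bounded hops path with an "N+" overflow prefix.
final class HopsPathSketch {
    // Small demo value; the real constant's value is not given here.
    static final int MAX_HOPS_DISPLAYED = 5;

    static String extendHopsPath(String path, char hop) {
        int plus = path.indexOf('+');
        // Assumed encoding: "<droppedCount>+<most recent hops>".
        int dropped = plus < 0 ? 0 : Integer.parseInt(path.substring(0, plus));
        String hops = (plus < 0 ? path : path.substring(plus + 1)) + hop;
        if (hops.length() > MAX_HOPS_DISPLAYED) {
            int excess = hops.length() - MAX_HOPS_DISPLAYED;
            dropped += excess;
            hops = hops.substring(excess); // keep only the most recent hops
        }
        return dropped == 0 ? hops : dropped + "+" + hops;
    }
}
```

With a limit of 5, six 'L' hops yield "1+LLLLL", and one more 'E' hop yields "2+LLLLE".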
- ExternalGeoLocationDecideRule - Class in org.archive.modules.deciderules
-
A rule that can be configured to take alternate implementations
of the ExternalGeoLookupInterface.
- ExternalGeoLocationDecideRule() - Constructor for class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- ExternalGeoLookupInterface - Interface in org.archive.modules.deciderules
-
Interface used by ExternalGeoLocationDecideRule.
- externalPaths - Variable in class org.archive.spring.KeyedProperties
-
the alternate global property-paths leading to this map
TODO: consider if deterministic ordered list is important
- extract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
-
Extracts links
- extract(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
-
Extracts links from the given URI.
- extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Run extractor.
- extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
Perform usual extraction on a CrawlURI
- extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- extract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorURI
-
Perform usual extraction on a CrawlURI
- extract(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
Run extractor.
- extract(CrawlURI) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- extractImplied(CharSequence, Pattern, String) - Static method in class org.archive.modules.extractor.ExtractorImpliedURI
-
Utility method for extracting 'implied' URI given a source uri,
trigger pattern, and build pattern.
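A trigger pattern and a build pattern can work together as follows. When the trigger matches the whole source URI, the build pattern is expanded with the trigger's capture groups ($1, $2, ...) to produce the implied URI. The sketch below is a minimal version of that idea. Whether ExtractorImpliedURI requires a full match and returns null otherwise is an assumption, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of deriving an 'implied' URI from a trigger/build pair.
final class ImpliedUriSketch {
    /** Returns the implied URI, or null if the trigger does not match the whole URI. */
    static String extractImplied(CharSequence uri, Pattern trigger, String build) {
        Matcher m = trigger.matcher(uri);
        if (!m.matches()) {
            return null;
        }
        // The trigger matches the entire input, so replaceFirst yields just
        // the expanded build pattern ($1, $2, ... refer to trigger groups).
        return m.replaceFirst(build);
    }
}
```

For example, the trigger `^https?://[^/]+/redirect\?url=(.*)$` with build `$1` extracts the target of a simple redirect link.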
- extractLink(CrawlURI, Link) - Method in class org.archive.modules.extractor.ExtractorURI
-
Consider a single Link for internal URIs
- extractor - Variable in class org.archive.modules.extractor.ContentExtractorTestBase
-
An extractor created during the setUp.
- Extractor - Class in org.archive.modules.extractor
-
Extracts links from fetched URIs.
- Extractor() - Constructor for class org.archive.modules.extractor.Extractor
-
- ExtractorCSS - Class in org.archive.modules.extractor
-
This extractor parses URIs from CSS files.
- ExtractorCSS() - Constructor for class org.archive.modules.extractor.ExtractorCSS
-
- ExtractorDOC - Class in org.archive.modules.extractor
-
This class allows the caller to extract href-style links from Word 97-format Word documents.
- ExtractorDOC() - Constructor for class org.archive.modules.extractor.ExtractorDOC
-
- ExtractorHTML - Class in org.archive.modules.extractor
-
Basic link-extraction, from an HTML content-body,
using regular expressions.
- ExtractorHTML() - Constructor for class org.archive.modules.extractor.ExtractorHTML
-
- ExtractorHTMLForms - Class in org.archive.modules.forms
-
Extracts extra information about FORMs in HTML, loading this
into the CrawlURI (for potential later use by FormLoginProcessor)
and adding a small annotation to the crawl.log.
- ExtractorHTMLForms() - Constructor for class org.archive.modules.forms.ExtractorHTMLForms
-
- ExtractorHTTP - Class in org.archive.modules.extractor
-
Extracts URIs from HTTP response headers.
- ExtractorHTTP() - Constructor for class org.archive.modules.extractor.ExtractorHTTP
-
- ExtractorImpliedURI - Class in org.archive.modules.extractor
-
An extractor for finding 'implied' URIs inside other URIs.
- ExtractorImpliedURI() - Constructor for class org.archive.modules.extractor.ExtractorImpliedURI
-
Constructor.
- extractorJS - Variable in class org.archive.modules.extractor.ExtractorHTML
-
Javascript extractor to use to process inline javascript.
- ExtractorJS - Class in org.archive.modules.extractor
-
Processes Javascript files for strings that are likely to be
crawlable URIs.
- ExtractorJS() - Constructor for class org.archive.modules.extractor.ExtractorJS
-
- extractorJS - Variable in class org.archive.modules.extractor.ExtractorSWF
-
Javascript extractor to use to process inline javascript.
- ExtractorMultipleRegex - Class in org.archive.modules.extractor
-
An extractor that uses regular expressions to find strings in the fetched
content of a URI, and constructs outlink URIs from those strings.
- ExtractorMultipleRegex() - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex
-
- ExtractorMultipleRegex.GroupList - Class in org.archive.modules.extractor
-
- ExtractorMultipleRegex.GroupList(MatchResult) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.GroupList
-
- ExtractorMultipleRegex.MatchList - Class in org.archive.modules.extractor
-
- ExtractorMultipleRegex.MatchList(String, CharSequence) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
-
- ExtractorMultipleRegex.MatchList(ExtractorMultipleRegex.GroupList...) - Constructor for class org.archive.modules.extractor.ExtractorMultipleRegex.MatchList
-
- extractorParameters - Variable in class org.archive.modules.extractor.Extractor
-
- ExtractorParameters - Interface in org.archive.modules.extractor
-
Bean interface for parameters consulted by multiple Extractors, and
thus provided by some shared object.
- ExtractorPDF - Class in org.archive.modules.extractor
-
Allows the caller to process a CrawlURI representing a PDF
for the purpose of extracting URIs
- ExtractorPDF() - Constructor for class org.archive.modules.extractor.ExtractorPDF
-
- ExtractorSWF - Class in org.archive.modules.extractor
-
Extracts URIs from SWF (flash/shockwave) files.
- ExtractorSWF() - Constructor for class org.archive.modules.extractor.ExtractorSWF
-
- ExtractorSWF.CrawlUriSWFAction - Class in org.archive.modules.extractor
-
SWF action that handles discovered URIs.
- ExtractorSWF.CrawlUriSWFAction(CrawlURI, Extractor) - Constructor for class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
- ExtractorSWF.ExtractorTagParser - Class in org.archive.modules.extractor
-
TagParser customized to ignore SWFTags that
will never contain extractable URIs.
- ExtractorSWF.ExtractorTagParser(SWFTagTypes) - Constructor for class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- ExtractorUniversal - Class in org.archive.modules.extractor
-
A last-ditch extractor that looks at the raw bytes and tries to extract
anything that looks like a link.
- ExtractorUniversal() - Constructor for class org.archive.modules.extractor.ExtractorUniversal
-
Constructor.
- ExtractorURI - Class in org.archive.modules.extractor
-
An extractor for finding URIs inside other URIs.
- ExtractorURI() - Constructor for class org.archive.modules.extractor.ExtractorURI
-
Constructor
- ExtractorXML - Class in org.archive.modules.extractor
-
A simple extractor which finds HTTP URIs inside XML/RSS files,
inside attribute values and simple elements (those with only
whitespace + HTTP URI + whitespace as contents).
- ExtractorXML() - Constructor for class org.archive.modules.extractor.ExtractorXML
-
- extractQueryStringLinks(UURI) - Static method in class org.archive.modules.extractor.ExtractorURI
-
Look for URIs inside the supplied UURI.
- extractURIs() - Method in class org.archive.modules.extractor.PDFParser
-
Extract URIs from all objects found in a PDF document's catalog.
- extractURIs(PdfObject) - Method in class org.archive.modules.extractor.PDFParser
-
Parse a PdfDictionary, looking for URIs recursively and adding
them to foundURIs
- extraInfo - Variable in class org.archive.modules.CrawlURI
-
- F_ADD - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_DISREGARD - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_EMIT - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_FAILURE - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_INCLUDE - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_REENQUEUED - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- F_SUCCESS - Static variable in class org.archive.crawler.frontier.FrontierJournal
-
- FACTORIES - Static variable in class org.archive.crawler.restlet.ScriptResource
-
- failedFetchCount() - Method in interface org.archive.crawler.framework.Frontier
-
Number of URIs that failed to process.
- failedFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- failedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- fakeResponse(StatusLine, HeaderGroup, InputStream) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
This method is a dirty hack intended to work around a
current (2.0) design flaw that prevents the user from
obtaining the correct status code, headers and response body from the
preceding HTTP CONNECT method.
- fastOutputStreamHolder - Variable in class org.archive.crawler.frontier.RecyclingSerialBinding
-
Thread-local cache of reusable FastOutputStream
- fetch(CrawlURI, String, String) - Method in class org.archive.modules.fetcher.FetchWhois
-
- fetchChain - Variable in class org.archive.crawler.framework.CrawlController
-
Fetch chain
- FetchChain - Class in org.archive.modules
-
- FetchChain() - Constructor for class org.archive.modules.FetchChain
-
- fetchDisregards - Variable in class org.archive.modules.fetcher.FetchStats
-
- FetchDNS - Class in org.archive.modules.fetcher
-
Processor to resolve 'dns:' URIs.
- FetchDNS() - Constructor for class org.archive.modules.fetcher.FetchDNS
-
- FetchErrors - Class in org.archive.modules.fetcher
-
- FetchErrors() - Constructor for class org.archive.modules.fetcher.FetchErrors
-
- fetchFailures - Variable in class org.archive.modules.fetcher.FetchStats
-
- FetchFTP - Class in org.archive.modules.fetcher
-
Fetches documents and directory listings using FTP.
- FetchFTP() - Constructor for class org.archive.modules.fetcher.FetchFTP
-
Constructs a new FetchFTP.
- FetchFTP.SocketFactoryWithTimeout - Class in org.archive.modules.fetcher
-
- FetchFTP.SocketFactoryWithTimeout() - Constructor for class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- FetchHistoryProcessor - Class in org.archive.modules.recrawl
-
Maintain a history of fetch information inside the CrawlURI's attributes.
- FetchHistoryProcessor() - Constructor for class org.archive.modules.recrawl.FetchHistoryProcessor
-
- FetchHTTP - Class in org.archive.modules.fetcher
-
- FetchHTTP() - Constructor for class org.archive.modules.fetcher.FetchHTTP
-
Constructor.
- fetchNonResponses - Variable in class org.archive.modules.fetcher.FetchStats
-
- fetchResponses - Variable in class org.archive.modules.fetcher.FetchStats
-
- FetchStats - Class in org.archive.modules.fetcher
-
Collector of statistics for a 'subset' of a crawl,
such as a server (host:port), host, or frontier group
(eg queue).
- FetchStats() - Constructor for class org.archive.modules.fetcher.FetchStats
-
- FetchStats.CollectsFetchStats - Interface in org.archive.modules.fetcher
-
- FetchStats.HasFetchStats - Interface in org.archive.modules.fetcher
-
- FetchStats.Stage - Enum in org.archive.modules.fetcher
-
- FetchStatusCodes - Interface in org.archive.modules.fetcher
-
Constant flag codes to be used, in lieu of per-protocol
codes (like HTTP's 200, 404, etc.), when network/internal/
out-of-band conditions occur.
- fetchStatusCodesToString(int) - Static method in class org.archive.modules.CrawlURI
-
Takes a status code and converts it into a human-readable string.
- FetchStatusDecideRule - Class in org.archive.modules.deciderules
-
Rule applies the configured decision for any URI which has a
fetch status equal to the 'target-status' setting.
- FetchStatusDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusDecideRule
-
Usual constructor.
- FetchStatusMatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
- FetchStatusMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
-
Usual constructor.
- FetchStatusNotMatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
- FetchStatusNotMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.FetchStatusNotMatchesRegexDecideRule
-
Usual constructor.
- fetchSuccesses - Variable in class org.archive.modules.fetcher.FetchStats
-
- FetchWhois - Class in org.archive.modules.fetcher
-
WHOIS Fetcher (RFC 3912).
- FetchWhois() - Constructor for class org.archive.modules.fetcher.FetchWhois
-
- FetchWhois.UrlStatus - Enum in org.archive.modules.fetcher
-
- file - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
File
- fileLogger - Variable in class org.archive.crawler.framework.Scoper
-
- fileLogger - Variable in class org.archive.modules.deciderules.DecideRuleSequence
-
- filename - Variable in enum org.archive.crawler.util.Logs
-
- fileRepresentation - Variable in class org.archive.crawler.restlet.EditRepresentation
-
- fileRepresentation - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
wrapped FileRepresentation
- FilesystemLinkMaker - Class in org.archive.util
-
Wrapper for platform-dependent hard link creation.
- FilesystemLinkMaker() - Constructor for class org.archive.util.FilesystemLinkMaker
-
- FilesystemLinkMaker.Kernel32Library - Interface in org.archive.util
-
- FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES - Class in org.archive.util
-
- FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES() - Constructor for class org.archive.util.FilesystemLinkMaker.Kernel32Library.LPSECURITY_ATTRIBUTES
-
- fillWith(CrawlURI, String) - Method in class org.archive.crawler.reporting.SeedRecord
-
Fill instance with given values; skips makeDirty so may be used
on initialization.
- finalize() - Method in class org.archive.util.ObjectIdentityBdbCache
-
- finalize() - Method in class org.archive.util.ObjectIdentityBdbCache.LowMemoryCanary
-
When collected/finalized -- as should be expected in
low-memory conditions -- trigger an expunge and a
new 'canary' insertion.
- finalize() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- finalTasks() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Perform any tasks necessary before entering
FINISH frontier state/FINISHED crawl state
- finalTasks() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- find(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
-
Extracts prefixes of a given string from a SortedSet.
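Finding which set members are prefixes of a given string can be done simply, as the sketch below shows. It tests each prefix of the input, shortest first, for membership in the set. It does not exploit the set's ordering the way a more efficient implementation could, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical sketch: collect the members of set that are prefixes of input.
final class PrefixFinderSketch {
    static List<String> find(SortedSet<String> set, String input) {
        List<String> result = new ArrayList<>();
        // Check every prefix, from the empty string up to the whole input.
        for (int len = 0; len <= input.length(); len++) {
            String candidate = input.substring(0, len);
            if (set.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```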
- findAttributeValueGroup(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- findAvailableCheckpointDirectories() - Method in class org.archive.crawler.framework.CheckpointService
-
Returns a list of available, valid (contains 'valid' file)
checkpoint directories, as File instances, with the more
recently-written appearing first.
- findEligibleURI() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Find a CrawlURI eligible to be put on the outbound queue for
processing.
- findEligibleURI() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Return the next CrawlURI eligible to be processed (and presumably
visited/fetched) by a worker thread.
- findFirstLineBeginning(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
-
Return the line number of the first line in the
log/file that begins with the given string.
- findFirstLineBeginningFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
-
Return the line number of the first line in the
log/file that begins with the given string.
- findFirstLineContaining(String, String) - Static method in class org.archive.crawler.util.LogReader
-
Return the line number of the first line in the
log/file that matches a given regular expression.
- findFirstLineContaining(InputStreamReader, String) - Static method in class org.archive.crawler.util.LogReader
-
Return the line number of the first line in the
log/file that matches a given regular expression.
- findFirstLineContainingFromSeries(String, String) - Static method in class org.archive.crawler.util.LogReader
-
Return the line number of the first line in the
log/file that matches a given regular expression.
- findGroups(String, int, CharSequence) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- findJobConfigs() - Method in class org.archive.crawler.framework.Engine
-
Find all job configurations in the usual place -- subdirectories
of the jobs directory with files ending '.cxml', and from jobPathFiles
(previously added by user) found in the jobs directory
- findKeys(SortedMap<String, ?>, String) - Static method in class org.archive.util.PrefixFinder
-
- findTarget(Request, Response) - Method in class org.archive.crawler.restlet.EnhDirectory
-
- FINISH - Static variable in class org.archive.modules.ProcessResult
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
-
- finishCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
-
Cleanup/unlock; need not complete for a checkpoint to be valid.
- finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
-
- finishCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- finished(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
Report a URI being processed as having finished processing.
- finished(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Note that the previously emitted CrawlURI has completed
its processing (for now).
- finishedDisregard(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- finishedFailure(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- finishedSuccess(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- finishedUriCount() - Method in interface org.archive.crawler.framework.Frontier
-
Number of URIs that have finished processing.
- finishedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- finishedUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- finishFpMerge() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- finishFpMerge() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
Complete the merge of candidate and previously-known FPs (closing
files/iterators as appropriate).
- finishFpMerge() - Method in class org.archive.crawler.util.MemFPMergeUriUniqFilter
-
- FirstNamedRobotsPolicy - Class in org.archive.modules.net
-
Working from an ordered list of potential User-Agents, consisting of first
the regularly-configured User-Agent and then those in the candidateUserAgents
list, consider each potential agent in order.
- FirstNamedRobotsPolicy() - Constructor for class org.archive.modules.net.FirstNamedRobotsPolicy
-
- fixup(String) - Method in class org.archive.crawler.reporting.HostsReport
-
- fixupConfigPath(ConfigPath, String) - Method in class org.archive.spring.ConfigPathConfigurer
-
- fixupPaths(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
-
Find any ConfigPath properties in the passed bean; ensure that
if they have a null 'base', that is replaced with the job home
directory.
- FixupQueryString - Class in org.archive.modules.canonicalize
-
Strip any trailing question mark.
- FixupQueryString() - Constructor for class org.archive.modules.canonicalize.FixupQueryString
-
- Flash - Class in org.archive.crawler.restlet
-
Utility for including a brief last-action or background-action
message on web responses.
- Flash(String) - Constructor for class org.archive.crawler.restlet.Flash
-
Create an ACK flash of default styling with the given message.
- Flash(String, Flash.Kind) - Constructor for class org.archive.crawler.restlet.Flash
-
Create a Flash of the given kind with the given message and default styling.
- Flash.Kind - Enum in org.archive.crawler.restlet
-
Usual kinds of flash message.
- flattenH1Order(Document) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
Given a Document, return a Map of all non-blank simple text
nodes, keyed by the pseudo-XPath to their parent element.
- flattenVia() - Method in class org.archive.modules.CrawlURI
-
Returns a string version of this URI's referral (via) URI.
- flattenVia(CrawlURI) - Static method in class org.archive.modules.Processor
-
- flush() - Method in class org.archive.crawler.reporting.AlertHandler
-
- flush() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- flush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
Perform a merge of all 'pending' items to the overall fingerprint list.
- FLUSH_DELAY_FACTOR - Static variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- flushRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
-
Flushes the output request stream.
- forAllHostsDo(Closure) - Method in class org.archive.modules.fetcher.DefaultServerCache
-
NOTE: Should not mutate the CrawlHost instance so retrieved; depending on
the hostscache implementation, the change may not be reliably persistent.
- forAllHostsDo(Closure) - Method in class org.archive.modules.net.ServerCache
-
Utility for performing an action on every CrawlHost.
- forAllPendingDo(Closure) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Utility method to perform action for all pending CrawlURI instances.
- forceFetch() - Method in class org.archive.modules.CrawlURI
-
If this method returns true, this URI should be fetched even though
it has already been crawled.
- forceScarceMemory() - Static method in class org.archive.util.TestUtils
-
Temporarily exhaust memory, forcing weak/soft references to
be broken.
- forceWakeQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Utility method for advanced users/experimentation: force-wake all snoozed
queues, for example to return a crawl to busyness after connectivity problems
have put all queues into slow-retry snoozes.
- forget(String, CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Forget that the item was seen.
- forget(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Forget the given CrawlURI.
- forget(String, CrawlURI) - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- forget(String, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- forget(String, CrawlURI) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- forgetAllButLatest - Variable in class org.archive.checkpointing.Checkpoint
-
- forgetAllButLatest - Variable in class org.archive.crawler.framework.CheckpointService
-
- forgetAllSchemeAuthorityMatching(String) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
Forget all entries that match the scheme+host+port of the given url, so
that they can be crawled again if discovered again.
- format(LogRecord) - Method in class org.archive.crawler.framework.CrawlJob.JobLogFormatter
-
- format(LogRecord) - Method in class org.archive.crawler.io.NonFatalErrorFormatter
-
- format(LogRecord) - Method in class org.archive.crawler.io.RuntimeErrorFormatter
-
- format(LogRecord) - Method in class org.archive.crawler.io.StatisticsLogFormatter
-
- format(LogRecord) - Method in class org.archive.crawler.io.UriErrorFormatter
-
- format(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
-
- format(LogRecord) - Method in class org.archive.util.OneLineSimpleLogger
-
- formatBytes(Long) - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- formatCookie(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Create a "Cookie" header value for a single cookie.
- formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Return a string suitable for sending in a "Cookie" header
- formatCookie(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- formatCookieHeader(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Create a "Cookie" Header for an array of Cookies.
- formatCookieHeader(Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Create a "Cookie" Header for single Cookie.
- formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Create a "Cookie" Header containing all Cookies in cookies.
- formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Create a "Cookie" Header containing the Cookie.
- formatCookieHeader(Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- formatCookieHeader(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- formatCookies(Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Create a "Cookie" header value for an array of cookies.
- formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Create a "Cookie" header value containing all Cookies in cookies,
suitable for sending in a "Cookie" header.
- formatCookies(Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- formItems - Variable in class org.archive.modules.credential.HtmlFormCredential
-
Form items.
- FormLoginProcessor - Class in org.archive.modules.forms
-
A step, post-ExtractorHTMLForms, where a followup CrawlURI to
attempt a form submission may be synthesized.
- FormLoginProcessor() - Constructor for class org.archive.modules.forms.FormLoginProcessor
-
- formUrlEncode(NameValuePair[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Form-urlencoding routine.
- foundURIs - Variable in class org.archive.modules.extractor.PDFParser
-
- fp(byte[], int, int) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of "n" bytes of "buf" starting from
"buf[start]".
- fp(char[], int, int) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of (all bits of) "n" characters of "buf"
starting from "buf[i]".
- fp(CharSequence) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of (all bits of) the characters of "s".
- fp(int[], int, int) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of (all bits of) "n" elements of "buf"
starting from "buf[i]".
- fp(long[], int, int) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of (all bits of) "n" elements of "buf"
starting from "buf[i]".
- fp8(String) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of the lower eight bits of the characters
of "s".
- fp8(char[], int, int) - Method in class st.ata.util.FPGenerator
-
Compute fingerprint of the lower eight bits of "n" characters
of "buf" starting from "buf[i]".
- FPGenerator - Class in st.ata.util
-
This class provides methods that construct fingerprints of strings
of bytes via operations in GF[2^d] for 0 < d <= 64.
- FPMergeUriUniqFilter - Class in org.archive.crawler.util
-
UriUniqFilter based on merging FP arrays (in memory or from disk).
- FPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter
-
- FPMergeUriUniqFilter.PendingItem - Class in org.archive.crawler.util
-
Represents a long fingerprint and (possibly) its corresponding
CrawlURI, awaiting the next merge in a 'pending' state.
- FPMergeUriUniqFilter.PendingItem(long, CrawlURI) - Constructor for class org.archive.crawler.util.FPMergeUriUniqFilter.PendingItem
-
- fpset - Variable in class org.archive.crawler.util.FPUriUniqFilter
-
- FPUriUniqFilter - Class in org.archive.crawler.util
-
UriUniqFilter storing 64-bit UURI fingerprints, using an internal LongFPSet
instance.
- FPUriUniqFilter(LongFPSet) - Constructor for class org.archive.crawler.util.FPUriUniqFilter
-
Create an FPUriUniqFilter wrapping the given LongFPSet.
- FPUriUniqFilter() - Constructor for class org.archive.crawler.util.FPUriUniqFilter
-
- freeReserveMemory() - Method in class org.archive.crawler.framework.CrawlController
-
- frequentFlushes - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Whether to flush to underlying file frequently (at least after each
record), or not.
- fromCheckpointJson(JSONObject) - Method in class org.archive.modules.extractor.Extractor
-
- fromCheckpointJson(JSONObject) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- fromCheckpointJson(JSONObject) - Method in class org.archive.modules.Processor
-
Restore internal state from a JSONObject stored at an earlier
checkpoint.
- fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- fromCheckpointJson(JSONObject) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- fromHopsViaString(String) - Static method in class org.archive.modules.CrawlURI
-
- frontier - Variable in class org.archive.crawler.framework.ActionDirectory
-
autowired frontier for actions
- frontier - Variable in class org.archive.crawler.framework.CrawlController
-
The frontier to use for the crawl.
- Frontier - Interface in org.archive.crawler.framework
-
An interface for URI Frontiers.
- frontier - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
-
The frontier to use.
- frontier - Variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- frontier - Variable in class org.archive.crawler.processor.HashCrawlMapper
-
- frontier - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
-
- Frontier.FrontierGroup - Interface in org.archive.crawler.framework
-
Generic interface representing the internal groupings
of a Frontier's URIs -- usually queues.
- Frontier.State - Enum in org.archive.crawler.framework
-
Enumeration of possible target states.
- FrontierJournal - Class in org.archive.crawler.frontier
-
Helper class for managing a simple Frontier change-events journal which is
useful for recovering from crawl problems.
- FrontierJournal(String, String) - Constructor for class org.archive.crawler.frontier.FrontierJournal
-
Create a new recovery journal at the given location
- FrontierNonemptyReport - Class in org.archive.crawler.reporting
-
Report of all nonempty Frontier queues (as usually dumped at end of
crawl for reference).
- FrontierNonemptyReport() - Constructor for class org.archive.crawler.reporting.FrontierNonemptyReport
-
- FrontierPreparer - Class in org.archive.crawler.prefetch
-
Processor to preload a URI with as much precalculated policy-based
info as possible before it reaches frontier critical sections.
- FrontierPreparer() - Constructor for class org.archive.crawler.prefetch.FrontierPreparer
-
- frontierReport() - Method in class org.archive.crawler.framework.CrawlJob
-
- frontierReportData() - Method in class org.archive.crawler.framework.CrawlJob
-
- FrontierSummaryReport - Class in org.archive.crawler.reporting
-
Frontier summary report showing a limited number of queues of each
type -- as typically consulted during a crawl in progress.
- FrontierSummaryReport() - Constructor for class org.archive.crawler.reporting.FrontierSummaryReport
-
- fullVia - Variable in class org.archive.modules.CrawlURI
-
- futureUriCount() - Method in interface org.archive.crawler.framework.Frontier
-
- futureUriCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- futureUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- futureUriCount - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- futureUris - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
URIs scheduled to be re-enqueued at a future date.
- generateCrawlLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- generateFrom(ConfigPath, int) - Method in class org.archive.checkpointing.Checkpoint
-
Use immediately after instantiation to fill-in a Checkpoint
created outside Spring configuration.
- generateJobLogTail() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- generateReports() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- generateRequestLine(HttpConnection, String, String, String, String) - Static method in class org.apache.commons.httpclient.HttpMethodBase
-
Generates HTTP request line according to the specified attributes.
- generator - Variable in class org.archive.io.Arc2Warc
-
- generator - Variable in class org.archive.modules.writer.WARCWriterProcessor
-
Generator for record IDs
- get(Object) - Method in class org.archive.crawler.framework.BeanLookupBindings
-
- get(DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Get the next nearest item after the given key.
- get(String) - Static method in class org.archive.crawler.util.LogReader
-
Returns the entire file.
- get(InputStreamReader) - Static method in class org.archive.crawler.util.LogReader
-
Reads entire contents of reader, returns as string.
- get(String, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Gets a portion of a log file.
- get(InputStreamReader, int, int, long) - Static method in class org.archive.crawler.util.LogReader
-
Gets a portion of a log file.
- get(Object, String) - Method in class org.archive.modules.credential.CredentialStore
-
- get(CharSequence, CharSequence) - Static method in class org.archive.modules.extractor.HTMLLinkContext
-
Return an instance of HTMLLinkContext for attribute attr in element el.
- get(String) - Static method in class org.archive.modules.extractor.HTMLLinkContext
-
Return an instance of HTMLLinkContext for path path.
- get(String) - Method in class org.archive.spring.KeyedProperties
-
Get the given value, checking override maps if appropriate.
- get(Object) - Method in class org.archive.util.Histotable
-
Return 0 instead of null for absent keys.
- get() - Method in class org.archive.util.IdentityCacheableWrapper
-
- get(String) - Method in class org.archive.util.ObjectIdentityBdbCache
-
- get(String) - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- get(String) - Method in interface org.archive.util.ObjectIdentityCache
-
Get the object under the given key/name; the caller should not mutate
the returned object's state.
- get(String) - Method in class org.archive.util.ObjectIdentityMemCache
-
- get() - Method in class org.archive.util.Supplier
-
- getAcceptCompression() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getAcceptHeaders() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getAcceptNonDnsResolves() - Method in class org.archive.modules.fetcher.FetchDNS
-
- getAction() - Method in class org.archive.modules.forms.HTMLForm
-
- getActionDir() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getActiveToeCount() - Method in class org.archive.crawler.framework.CrawlController
-
- getActiveToeCount() - Method in class org.archive.crawler.framework.ToePool
-
- getAlertCount() - Method in class org.archive.crawler.framework.CrawlJob
-
- getAlertCount() - Method in class org.archive.crawler.reporting.AlertThreadGroup
-
- getAlertCount() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getAlertsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getAll() - Method in class org.archive.modules.credential.CredentialStore
-
- getAllConfigPaths() - Method in class org.archive.spring.ConfigPathConfigurer
-
- getAllErrors() - Method in class org.archive.spring.PathSharingContext
-
- getAllowByRegex() - Method in class org.archive.crawler.prefetch.Preselector
-
- getAlsoCheckVia() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getAnnotations() - Method in class org.archive.modules.CrawlURI
-
Get the annotations set for this uri.
- getApplicableSurtPrefix() - Method in class org.archive.modules.forms.FormLoginProcessor
-
- getAsciiBytes(String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the specified string to byte array of ASCII characters.
- getAsciiString(byte[], int, int) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the byte array of ASCII characters to a string.
- getAsciiString(byte[]) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the byte array of ASCII characters to a string.
- getAsText() - Method in class org.archive.io.ReadSourceEditor
-
- getAsText() - Method in class org.archive.spring.ConfigPathEditor
-
- getAt(long) - Method in class org.archive.util.AbstractLongFPSet
-
Get the stored value at the given slot.
- getAt(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- getAttributeEither(CrawlURI, String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Get a value either from inside the CrawlURI instance, or from
settings (module attributes).
- getAudience() - Method in class org.archive.modules.CrawlMetadata
-
- getAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use getHostAuthState() instead.
- getAuthScheme(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getAvailableGlobalVariables() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getAvailableRobotsPolicies() - Method in class org.archive.modules.CrawlMetadata
-
- getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getAvailableScriptEngines() - Method in class org.archive.crawler.restlet.ScriptResource
-
- getBalanceReplenishAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getBase() - Method in class org.archive.spring.ConfigPath
-
- getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
-
- getBasePrecedence() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
-
- getBaseURI() - Method in class org.archive.modules.CrawlURI
-
Get the (HTML) Base URI used for derelativizing internal URIs.
- getBdbSubDirectory(File) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getBeanName() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- getBeanName() - Method in class org.archive.modules.Processor
-
- getBeanName() - Method in class org.archive.spring.HeritrixLifecycleProcessor
-
- getBeanpathTarget(String) - Method in class org.archive.crawler.framework.CrawlJob
-
Utility method for getting a bean or any other object addressable
with a 'bean path' -- a property-path string (with dots and
[] indexes) starting with a bean name.
- getBeansRefPath() - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
- getBit(long) - Method in interface org.archive.util.BloomFilter
-
- getBit(long) - Method in class org.archive.util.BloomFilter64bit
-
Returns from the local bitvector the value of the bit with
the specified index.
- getBlockAll() - Method in class org.archive.crawler.prefetch.Preselector
-
- getBlockAwaitingSeedLines() - Method in class org.archive.modules.seeds.TextSeedModule
-
- getBlockByRegex() - Method in class org.archive.crawler.prefetch.Preselector
-
- getBloomFilter() - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- getBuiltJobs() - Method in class org.archive.crawler.restlet.EngineResource
-
- getByRealm(Set<Credential>, String, CrawlURI) - Static method in class org.archive.modules.credential.HttpAuthenticationCredential
-
Convenience method that looks up a credential in the passed set, using the realm as the key.
- getByRegex(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getByRegex(InputStreamReader, String, int, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getByRegex(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getByRegex(InputStreamReader, String, String, boolean, int, int, long) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getByRegexFromSeries(String, String, int, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getByRegexFromSeries(String, String, String, boolean, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Returns all lines in a log/file matching a given regular expression.
- getBytes(String, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the specified string to a byte array.
- getBytesPerFileType(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Returns the accumulated number of bytes from files of a given file type.
- getBytesPerHost(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Returns the accumulated number of bytes downloaded from a given host.
- getCacheMisses() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- getCachePercent() - Method in class org.archive.bdb.BdbModule
-
- getCacheSize() - Method in class org.archive.bdb.BdbModule
-
- getCalculateRobotsOnly() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getCandidateChain() - Method in class org.archive.crawler.framework.CrawlController
-
- getCandidateChain() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getCandidateUserAgents() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- getCandidateUserAgents() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- getCanonicalizationPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getCanonicalString() - Method in class org.archive.modules.CrawlURI
-
- getCaseSensitiveFilesystem() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getCharacterMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getCharPosLimit() - Method in class org.archive.util.ms.Piece
-
- getCharPosStart() - Method in class org.archive.util.ms.Piece
-
- getCheckOutlinks() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getCheckpoint() - Method in class org.archive.crawler.framework.CheckpointSuccessEvent
-
- getCheckpointDir() - Method in class org.archive.checkpointing.Checkpoint
-
- getCheckpointIntervalMinutes() - Method in class org.archive.crawler.framework.CheckpointService
-
- getCheckpointsDir() - Method in class org.archive.crawler.framework.CheckpointService
-
- getCheckpointService() - Method in class org.archive.crawler.framework.CrawlJob
-
Return the configured Checkpointer instance, if there is exactly
one, otherwise null.
- getCheckUri() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getChild() - Method in interface org.archive.util.ms.Entry
-
- getChmod() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getChmodValue() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getClassCatalog() - Method in class org.archive.bdb.BdbModule
-
- getClassCatalog() - Method in class org.archive.util.bdbje.EnhancedEnvironment
-
Return a StoredClassCatalog backed by a Database in this environment,
either pre-existing or created (and cached) if necessary.
- getClassCheckpointFile(File, String, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getClassCheckpointFile(File, Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getClassCheckpointFilename(Class<?>) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getClassCheckpointFilename(Class<?>, String) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getClassKey(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.AssignmentLevelSurtQueueAssignmentPolicy
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
-
Get the String key (name) of the queue to which the
CrawlURI should be assigned.
- getClassKey(CrawlURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- getClassKey() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getClassKey(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getClassKey() - Method in class org.archive.modules.CrawlURI
-
Get the token (usually the hostname + port) which indicates
what "class" this CrawlURI should be grouped with,
for the purposes of ensuring only one item of the
class is processed at once, all items of the class
are held for a politeness period, etc.
- getCollection() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getComment() - Method in class org.apache.commons.httpclient.Cookie
-
Returns the comment describing the purpose of this cookie, or
null if no such comment has been defined.
- getComment() - Method in class org.archive.modules.deciderules.DecideRule
-
- getComponent() - Method in class org.archive.crawler.Heritrix
-
- getCompoundName(String) - Static method in class org.archive.util.JndiUtils
-
- getCompoundName(ObjectName) - Static method in class org.archive.util.JndiUtils
-
Return the name to use as the JNDI name.
- getCompress() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getConfigPathConfigurer() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- getConfigPaths() - Method in class org.archive.crawler.framework.CrawlJob
-
Return all known ConfigPaths, as an aid to viewing or editing.
- getConfigurationFile() - Method in class org.archive.spring.PathSharingContext
-
- getConfigurationFilePath() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- getConnectTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- getContentCharSet(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the character set from the Content-Type header.
- getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getContentDeclaredCharset(CrawlURI, String) - Method in class org.archive.modules.extractor.ExtractorXML
-
- getContentDigest() - Method in class org.archive.modules.CrawlURI
-
Return the retained content-digest value, if any.
- getContentDigestHistory() - Method in class org.archive.modules.CrawlURI
-
- getContentDigestSchemeString() - Method in class org.archive.modules.CrawlURI
-
- getContentDigestString() - Method in class org.archive.modules.CrawlURI
-
- getContentLength() - Method in class org.archive.modules.CrawlURI
-
For completed HTTP transactions, the length of the content-body.
- getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
-
- getContentLengthThreshold() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- getContentRegexes() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- getContentSize() - Method in class org.archive.modules.CrawlURI
-
Get the size in bytes of this URI's recorded content, inclusive
of things like protocol headers.
- getContentType() - Method in class org.archive.modules.CrawlURI
-
Get the content type of this URI.
- getContentType() - Method in class org.archive.net.s3.S3URLConnection
-
XXX Not sure what this should be or if it even matters for our use.
- getContentTypeMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getContext() - Method in class org.archive.modules.extractor.Link
-
- getControlConversation() - Method in class org.archive.net.ClientFTP
-
- getController() - Method in class org.archive.crawler.framework.ToePool
-
- getController() - Method in class org.archive.crawler.framework.ToeThread
-
Get the CrawlController associated with this thread.
- getControlUri(long, int, boolean) - Method in class org.archive.crawler.restlet.PagedRepresentation
-
Construct navigational URI for given parameters.
- getCookiePolicy() - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use HttpMethodParams.getCookiePolicy(), HttpMethod.getParams().
- getCookies() - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
use getCookiesMap() // <- IA/HERITRIX CHANGE
- getCookies(String, int, String, boolean) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
use CookieSpec#match(String, int, String, boolean, Cookie)
- getCookiesLoadFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- getCookiesMap() - Method in class org.apache.commons.httpclient.HttpState
-
Returns a sorted map of cookies that this HTTP
state currently contains.
- getCookiesMap() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- getCookiesMap() - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- getCookiesMap() - Method in interface org.archive.modules.fetcher.CookieStorage
-
- getCookiesMap() - Method in class org.archive.modules.fetcher.SimpleCookieStorage
-
- getCookiesSaveFile() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- getCookieStorage() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getCoreKey(UURI) - Method in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
-
- getCoreKey(UURI) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
-
- getCoreKey(UURI) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- getCost(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
Return the 'cost' of a CrawlURI (how much of its associated
queue's budget it depletes upon attempted processing)
- getCostAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getCount() - Method in class org.archive.crawler.frontier.WorkQueue
-
Count of URIs in this queue.
- getCountryCode() - Method in class org.archive.modules.net.CrawlHost
-
Get country code of this host
- getCountryCodes() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- getCrawlController() - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
-
- getCrawlController() - Method in class org.archive.crawler.framework.CheckpointService
-
- getCrawlController() - Method in class org.archive.crawler.framework.CrawlJob
-
- getCrawlController() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- getCrawlController() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getCrawlController() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- getCrawlController() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- getCrawlController() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- getCrawlController() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getCrawlDelay() - Method in class org.archive.modules.net.RobotsDirectives
-
- getCrawlDuration() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Returns how long the current crawl has been running *including*
time paused (contrast with getCrawlElapsedTime()).
- getCrawledBytes() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getCrawlElapsedTime() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getCrawlerCount() - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- getCrawlExitStatus() - Method in class org.archive.crawler.framework.CrawlController
-
- getCrawlJob() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getCrawlJobShortName() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getCrawlJobUrl() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getCrawlLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getCrawlURI() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
-
- getCreateHostDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getCreatePortDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
use #getCredentials(AuthScope)
- getCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
-
Get the credentials for the given authentication scope.
- getCredentials() - Method in class org.archive.modules.CrawlURI
-
- getCredentials() - Method in class org.archive.modules.credential.CredentialStore
-
- getCredentials() - Method in class org.archive.modules.net.CrawlServer
-
- getCredentialStore() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getCredentialStore() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getCredentialTypes() - Static method in class org.archive.modules.credential.CredentialStore
-
- getCurrentLaunchDir() - Method in class org.archive.spring.PathSharingContext
-
- getCurrentLaunchId() - Method in class org.archive.spring.PathSharingContext
-
- getCurrentProcessorName() - Method in class org.archive.crawler.framework.ToeThread
-
- getCustomEditor() - Method in class org.archive.io.ReadSourceEditor
-
- getCustomEditor() - Method in class org.archive.spring.ConfigPathEditor
-
- getCustomRobots() - Method in class org.archive.modules.net.CustomRobotsPolicy
-
- getData() - Method in class org.archive.modules.CrawlURI
-
- getData() - Method in class org.archive.modules.extractor.Link
-
Attribute list
- getData() - Method in class org.archive.spring.PathSharingContext
-
- getDatabase(String) - Method in class org.archive.bdb.BdbModule
-
- getDatabaseConfig() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbCache
-
- getDatabaseName() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- getDataList(String) - Method in class org.archive.modules.CrawlURI
-
Convenience method: return (creating if necessary) the list at
the given data key.
- getDecision() - Method in class org.archive.modules.deciderules.PredicatedDecideRule
-
- getDefaultCharset() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getDefaultEncoding() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getDefaultMaxFileSize() - Method in class org.archive.modules.writer.ARCWriterProcessor
-
- getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getDefaultMaxFileSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getDefaultRules() - Static method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
A reasonable set of default rules to use, if no others are
provided by operator configuration.
- getDefaultStorePaths() - Method in class org.archive.modules.writer.ARCWriterProcessor
-
- getDefaultStorePaths() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getDefaultStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getDefaultUriPrecedencePolicy() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- getDeferrals() - Method in class org.archive.modules.CrawlURI
-
Get the deferral count.
- getDeferToPrevious() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
Whether to always defer to a previously-assigned key inside
the CrawlURI.
- getDelay(TimeUnit) - Method in class org.archive.crawler.frontier.WorkQueue
-
- getDelayFactor() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getDescription() - Method in enum org.archive.crawler.framework.CrawlStatus
-
- getDescription() - Method in class org.archive.modules.CrawlMetadata
-
- getDestination() - Method in class org.archive.modules.extractor.Link
-
- getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchDNS
-
- getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getDigestAlgorithm() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getDigestContent() - Method in class org.archive.modules.fetcher.FetchDNS
-
- getDigestContent() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getDigestContent() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getDir() - Method in class org.archive.bdb.BdbModule
-
- getDirectivesFor(String, boolean) - Method in class org.archive.modules.net.Robotstxt
-
Return the RobotsDirectives, if any, appropriate for the given User-Agent
string.
- getDirectivesFor(String) - Method in class org.archive.modules.net.Robotstxt
-
Return directives to use for the given User-Agent, resorting to wildcard
rules or the default no-directives if necessary.
- getDirectory() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getDirectoryFile() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getDisposition() - Method in class org.archive.crawler.event.CrawlURIDispositionEvent
-
- getDisposition() - Method in class org.archive.crawler.reporting.SeedRecord
-
- getDispositionChain() - Method in class org.archive.crawler.framework.CrawlController
-
- getDiversionDir() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getDiversionLog(String) - Method in class org.archive.crawler.processor.CrawlMapper
-
Get the diversion log for a given target crawler node.
- getDNSRecord(long, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
-
- getDNSServerIPLabel() - Method in class org.archive.modules.CrawlURI
-
- getDoAuthentication() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns true if the HTTP method should automatically handle HTTP
authentication challenges (status code 401, etc.), false otherwise.
- getDomain() - Method in class org.apache.commons.httpclient.Cookie
-
Returns domain attribute of the cookie.
- getDomain() - Method in class org.archive.modules.credential.Credential
-
- getDomDocument(File) - Method in class org.archive.crawler.framework.CrawlJob
-
Read a file to a DOM Document; return null if this isn't possible
for any reason.
- getDoneDir() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getDotBegin() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getDotEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getDumpPendingAtClose() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- getDupByHashBytes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getDupByHashUrls() - Method in class org.archive.modules.fetcher.FetchStats
-
- getEarliestNextURIEmitTime() - Method in class org.archive.modules.net.CrawlHost
-
Get the earliest time a URI for this host could be emitted.
- getEffectiveVersion() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the HTTP version used with this method (may be null
if undefined, that is, the method has not been executed)
- getEmbedHopCount() - Method in class org.archive.modules.CrawlURI
-
Get the embed hop count.
- getEnabled() - Method in class org.archive.modules.canonicalize.BaseRule
-
- getEnabled() - Method in interface org.archive.modules.canonicalize.CanonicalizationRule
-
- getEnabled() - Method in class org.archive.modules.deciderules.DecideRule
-
- getEnabled() - Method in class org.archive.modules.Processor
-
- getEngine() - Method in class org.archive.crawler.Heritrix
-
- getEngine() - Method in class org.archive.crawler.restlet.EngineApplication
-
- getEngine() - Method in class org.archive.crawler.restlet.EngineResource
-
- getEngine() - Method in class org.archive.crawler.restlet.JobRelatedResource
-
- getEngine() - Method in class org.archive.crawler.restlet.JobResource
-
- getEngine() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
Get the proper ScriptEngine instance -- either shared or local
to this thread.
- getEngine() - Method in class org.archive.modules.ScriptedProcessor
-
Get the proper ScriptEngine instance -- either shared or local
to this thread.
- getEngineName() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- getEngineName() - Method in class org.archive.modules.ScriptedProcessor
-
- getEnhDirectory() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
-
- getEntriesDescending() - Method in class org.archive.crawler.util.TopNSet
-
Get a descending-ordered list of (key, count) Entries.
- getEntry(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
-
Returns the entry with the given number.
- getEntryByFrequencySortedSet() - Static method in class org.archive.util.Histotable
-
Get a SortedSet that, when filled with (String key)->(long count)
Entry instances, sorts them by (count, key) descending, as is useful
for most-frequent displays.
- getErrorPenaltyAmount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getException() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getException() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getExpectedConcurrency() - Method in class org.archive.bdb.BdbModule
-
- getExpectedInserts() - Method in interface org.archive.util.BloomFilter
-
Report the number of expected inserts used at instantiation time to
calculate the bitfield size.
- getExpectedInserts() - Method in class org.archive.util.BloomFilter64bit
-
- getExpirationOperation() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- getExpiryDate() - Method in class org.apache.commons.httpclient.Cookie
-
Returns the expiration Date of the cookie, or null if none exists.
- getExtract404s() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getExtract404s() - Method in interface org.archive.modules.extractor.ExtractorParameters
-
Whether to extract links from responses with a 404 'not found' response
code.
- getExtractAllForms() - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- getExtractFromDirs() - Method in class org.archive.modules.fetcher.FetchFTP
-
Returns the extract.from.dirs attribute for this FetchFTP
and the given curi.
- getExtractIndependently() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getExtractIndependently() - Method in interface org.archive.modules.extractor.ExtractorParameters
-
Whether each extractor should make an independent decision as to whether
it can extract links from a URI's content (when value is true), or
whether a previous extractor's success (marking the URI as
hasBeenLinkExtracted) should cancel later extractors (when value is
false).
- getExtractJavascript() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getExtractOnlyFormGets() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getExtractorJS() - Method in class org.archive.modules.extractor.ExtractorSWF
-
- getExtractorParameters() - Method in class org.archive.modules.extractor.Extractor
-
- getExtractParent() - Method in class org.archive.modules.fetcher.FetchFTP
-
Returns the extract.parent attribute for this FetchFTP
and the given curi.
- getExtractValueAttributes() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getExtraInfo() - Method in class org.archive.modules.CrawlURI
-
- getFetchAttempts() - Method in class org.archive.modules.CrawlURI
-
Get the count of attempts (trips through the processing
loop) at getting the document referenced by this URI.
- getFetchBeginTime() - Method in class org.archive.modules.CrawlURI
-
- getFetchChain() - Method in class org.archive.crawler.framework.CrawlController
-
- getFetchCompletedTime() - Method in class org.archive.modules.CrawlURI
-
- getFetchDisregards() - Method in class org.archive.modules.fetcher.FetchStats
-
- getFetchDuration() - Method in class org.archive.modules.CrawlURI
-
- getFetchNonResponses() - Method in class org.archive.modules.fetcher.FetchStats
-
- getFetchResponses() - Method in class org.archive.modules.fetcher.FetchStats
-
- getFetchStatus() - Method in class org.archive.modules.CrawlURI
-
Return the overall/fetch status of this CrawlURI for its
current trip through the processing loop.
- getFetchSuccesses() - Method in class org.archive.modules.fetcher.FetchStats
-
- getFetchType() - Method in class org.archive.modules.CrawlURI
-
- getFile() - Method in class org.archive.spring.ConfigPath
-
- getFileDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Returns a HashMap that contains information about distributions of
encountered mime types.
- getFilename() - Method in class org.archive.crawler.reporting.CrawlSummaryReport
-
- getFilename() - Method in class org.archive.crawler.reporting.FrontierNonemptyReport
-
- getFilename() - Method in class org.archive.crawler.reporting.FrontierSummaryReport
-
- getFilename() - Method in class org.archive.crawler.reporting.HostsReport
-
- getFilename() - Method in class org.archive.crawler.reporting.MimetypesReport
-
- getFilename() - Method in class org.archive.crawler.reporting.ProcessorsReport
-
- getFilename() - Method in class org.archive.crawler.reporting.Report
-
- getFilename() - Method in class org.archive.crawler.reporting.ResponseCodeReport
-
- getFilename() - Method in class org.archive.crawler.reporting.SeedsReport
-
- getFilename() - Method in class org.archive.crawler.reporting.SourceTagsReport
-
- getFilename() - Method in class org.archive.crawler.reporting.ToeThreadsReport
-
- getFilename() - Method in enum org.archive.crawler.util.Logs
-
- getFilePos() - Method in class org.archive.util.ms.Piece
-
- getFileRepresentation() - Method in class org.archive.crawler.restlet.EditRepresentation
-
- getFirstARecord(Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
-
- getFirstKey() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
- getFlashes(Request) - Static method in class org.archive.crawler.restlet.Flash
-
- getFollowRedirects() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns true if the HTTP method should automatically follow HTTP redirects
(status code 302, etc.), false otherwise.
- getForceQueueAssignment() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
-
- getForceRetire() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getForceRetire() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getForgetAllButLatest() - Method in class org.archive.checkpointing.Checkpoint
-
- getForgetAllButLatest() - Method in class org.archive.crawler.framework.CheckpointService
-
- getFormat() - Method in class org.archive.modules.canonicalize.RegexRule
-
- getFormat() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- getFormItems() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- getFormProvince(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
-
Get the 'form province' - either the configured (applicableSurtPrefix)
or inferred (full current server) range of URIs that is considered
covered by one form login.
- getFpset() - Method in class org.archive.crawler.util.FPUriUniqFilter
-
- getFrequentFlushes() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getFrom(String, int, Pattern, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
- getFrom() - Method in class org.archive.modules.CrawlMetadata
-
- getFrom() - Method in interface org.archive.modules.fetcher.UserAgentProvider
-
- getFromSeries(String, int, int) - Static method in class org.archive.crawler.util.LogReader
-
Gets a portion of a log spread across a numbered series of files.
- getFrontier() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getFrontier() - Method in class org.archive.crawler.framework.CrawlController
-
- getFrontier() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getFrontier() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getFrontier() - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- getFrontier() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- getFrontierJournal() - Method in interface org.archive.crawler.framework.Frontier
-
- getFrontierJournal() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getFrontierPreparer() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getFrontierReportShort() - Method in class org.archive.crawler.framework.CrawlController
-
- getFullVia() - Method in class org.archive.modules.CrawlURI
-
- getGroup(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
Get the 'frontier group' (usually queue) for the given
CrawlURI.
- getGroup(CrawlURI) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- getGroupMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getGroupMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getGroupMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getGroupMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getHarvester() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getHashCount() - Method in interface org.archive.util.BloomFilter
-
Report the number of internal independent hash functions (and thus the
number of bits set/checked for each item presented).
- getHashCount() - Method in class org.archive.util.BloomFilter64bit
-
- getHeritrixHome() - Static method in class org.archive.crawler.Heritrix
-
Exploit -Dheritrix.home if available to us.
- getHeritrixVersion() - Method in class org.archive.crawler.framework.Engine
-
- getHistoryDbName() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- getHistoryDbName() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- getHistoryLength() - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
-
- getHolder() - Method in class org.archive.modules.CrawlURI
-
Return the 'holder' for the convenience of
an external facility.
- getHolderCost() - Method in class org.archive.modules.CrawlURI
-
Return the 'holderCost' for the convenience of an external facility (Frontier).
- getHolderKey() - Method in class org.archive.modules.CrawlURI
-
Return the 'holderKey' for convenience of
an external facility (Frontier).
- getHopChar() - Method in enum org.archive.modules.extractor.Hop
-
Returns a hop character suitable for display in logs.
- getHopCount() - Method in class org.archive.modules.CrawlURI
-
Get total hops from seed.
- getHopString() - Method in enum org.archive.modules.extractor.Hop
-
- getHopType() - Method in class org.archive.modules.extractor.Link
-
- getHost() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the host.
- getHostAddress(CrawlURI) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
from WriterPoolProcessor
- getHostAddress(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
Return IP address of given URI suitable for recording (as in a
classic ARC 5-field header line).
- getHostAddress(String) - Static method in class org.archive.util.DNSJavaUtil
-
Return an InetAddress for the passed host.
- getHostAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the target host authentication state
- getHostConfiguration() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
no longer applicable
- getHostFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
-
- getHostFor(String) - Method in class org.archive.modules.net.ServerCache
-
- getHostFor(UURI) - Method in class org.archive.modules.net.ServerCache
-
- getHostLastFinished(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Returns the time (in millisec) when a URI belonging to a given host was
last finished processing.
- getHostMap() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getHostMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getHostMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getHostMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getHostMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getHostName() - Method in class org.archive.modules.net.CrawlHost
-
Get the host name.
- getHrefPath(File, CrawlJob) - Static method in class org.archive.crawler.restlet.JobResource
-
Get a usable HrefPath, relative to the JobResource, for the given file.
- getHtmlOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getHtmlOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getHttp() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getHttpAuthChallenges() - Method in class org.archive.modules.CrawlURI
-
- getHttpAuthChallenges() - Method in class org.archive.modules.net.CrawlServer
-
- getHttpBindAddress() - Method in class org.archive.modules.fetcher.FetchHTTP
-
Local IP address or hostname to use when making connections (binding
sockets).
- getHttpConnectionManager() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the httpConnectionManager.
- getHttpMethod() - Method in class org.archive.modules.CrawlURI
-
- getHttpMethod() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- getHttpProxyHost() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getHttpProxyPassword() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getHttpProxyPort() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getHttpProxyUser() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getIgnoreCookies() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getIgnoreFormActionUrls() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getIgnoreUnexpectedHtml() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getImportedConfigs(File) - Method in class org.archive.crawler.framework.CrawlJob
-
Return all config files included via 'import' statements in the
primary config (or other included configs).
- getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- getInactiveQueuesByPrecedence() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Return a sorted map of all queues of WorkQueue keys, keyed by precedence
- getInactiveQueuesForPrecedence(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Get the queue of inactive uri-queue names at the given precedence.
- getIncrementCounts() - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
-
- getIndex() - Method in interface org.archive.util.ms.Entry
-
- getInferRootPage() - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- getInFromFile(String) - Method in class org.archive.modules.extractor.PDFParser
-
Read a file named 'doc' and store its bytes for later processing.
- getInitialDelaySeconds() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getInProcessCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
The number of CrawlURIs 'in process' (passed to the outbound
queue and not yet finished by returning through the inbound
queue).
- getInProcessCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getInputStream() - Method in class org.archive.net.s3.S3URLConnection
-
Get an InputStream for the object, connecting to S3 if connect()
hasn't been called yet.
- getInstance(String) - Static method in class org.archive.net.UURIFactory
-
- getInstance(UURI, String) - Static method in class org.archive.net.UURIFactory
-
- getIntervalSeconds() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getIP() - Method in class org.archive.modules.net.CrawlHost
-
Get the IP address for this host.
- getIpAddresses() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- getIpFetched() - Method in class org.archive.modules.net.CrawlHost
-
Get the time when the IP address for this host was last looked up.
- getIpTTL() - Method in class org.archive.modules.net.CrawlHost
-
Get the TTL value from the dns record for this host.
- getIpValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getIsolateThreads() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- getIsolateThreads() - Method in class org.archive.modules.ScriptedProcessor
-
- getIteratorOfURLsSuccessfullyCrawledFromSeedUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
-
- getJavaInitializationString() - Method in class org.archive.io.ReadSourceEditor
-
- getJavaInitializationString() - Method in class org.archive.spring.ConfigPathEditor
-
- getJeLogsFilter() - Static method in class org.archive.crawler.util.CheckpointUtils
-
- getJob(String) - Method in class org.archive.crawler.framework.Engine
-
- getJobConfigs() - Method in class org.archive.crawler.framework.Engine
-
- getJobContext() - Method in class org.archive.crawler.framework.CrawlJob
-
- getJobDir() - Method in class org.archive.crawler.framework.CrawlJob
-
- getJobDirectoryFrom(File) - Method in class org.archive.crawler.framework.Engine
-
Return the job directory File read from the supplied ".jobpath" file,
or null on any error.
- getJobLog() - Method in class org.archive.crawler.framework.CrawlJob
-
- getJobLogger() - Method in class org.archive.crawler.framework.CrawlJob
-
Get a logger to a distinguished file, job.log in the job's
directory, into which job-specific events may be reported.
- getJobName() - Method in class org.archive.modules.CrawlMetadata
-
- getJobsDir() - Method in class org.archive.crawler.framework.Engine
-
- getJobStatusDescription() - Method in class org.archive.crawler.framework.CrawlJob
-
- getJumpTarget() - Method in class org.archive.modules.ProcessResult
-
- getKeepSnapshotsCount() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getKey() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getKey() - Method in class org.archive.crawler.reporting.SeedRecord
-
- getKey() - Method in class org.archive.modules.credential.Credential
-
- getKey() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- getKey() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- getKey() - Method in class org.archive.modules.net.CrawlHost
-
- getKey() - Method in class org.archive.modules.net.CrawlServer
-
- getKey() - Method in interface org.archive.util.IdentityCacheable
-
- getKey() - Method in class org.archive.util.IdentityCacheableWrapper
-
- getKeyedProperties() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
-
- getKeyedProperties() - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
-
- getKeyedProperties() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
-
- getKeyedProperties() - Method in class org.archive.modules.canonicalize.BaseRule
-
- getKeyedProperties() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
- getKeyedProperties() - Method in class org.archive.modules.CrawlMetadata
-
- getKeyedProperties() - Method in class org.archive.modules.credential.CredentialStore
-
- getKeyedProperties() - Method in class org.archive.modules.deciderules.DecideRule
-
- getKeyedProperties() - Method in class org.archive.modules.Processor
-
- getKeyedProperties() - Method in class org.archive.modules.ProcessorChain
-
- getKeyedProperties() - Method in interface org.archive.spring.HasKeyedProperties
-
- getKind() - Method in class org.archive.crawler.restlet.Flash
-
- getKryo() - Method in class org.archive.bdb.KryoBinding
-
- getLargest() - Method in class org.archive.crawler.util.TopNSet
-
- getLargestQueuesCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Remember this many of the largest queues for reporting purposes; actual tracking
can be somewhat approximate when some queues shrink before the others'
sizes are noted again, or if this setting is adjusted mid-crawl.
- getLargestValue() - Method in class org.archive.util.Histotable
-
Return the largest value of any key that is larger than 0.
- getLastActivityTime() - Method in class org.archive.crawler.framework.CrawlJob
-
- getLastCacheMissDiff() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- getLastHop() - Method in class org.archive.modules.CrawlURI
-
Convenience accessor for the last hop character, as a string.
- getLastLaunch() - Method in class org.archive.crawler.framework.CrawlJob
-
- getLastLaunchTime() - Method in class org.archive.crawler.restlet.models.CrawlJobModel
-
- getLastResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the stream used to read the last response's body.
- getLastSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getLastSuccessTime() - Method in class org.archive.modules.fetcher.FetchStats
-
- getLaunchCount() - Method in class org.archive.crawler.framework.CrawlJob
-
- getLinesExecuted() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getLinesExecuted() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getLinkCount() - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
- getLinkHopCount() - Method in class org.archive.modules.CrawlURI
-
Get the link hop count.
- getListLogicalOr() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
- getLiveHostReportSize() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getLocalAddress() - Method in class org.apache.commons.httpclient.HttpConnection
-
Return the local address used when creating the connection.
- getLocalName() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getLogExtraInfo() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getLogFile() - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- getLogger() - Static method in class org.archive.crawler.util.RecoveryLogMapper
-
- getLoggerModule() - Method in class org.archive.crawler.framework.CrawlController
-
- getLoggerModule() - Method in class org.archive.crawler.framework.Scoper
-
- getLoggerModule() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getLoggerModule() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getLoggerModule() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getLoggerModule() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- getLoggerModule() - Method in class org.archive.modules.extractor.Extractor
-
- getLoggerModule() - Method in class org.archive.modules.forms.FormLoginProcessor
-
- getLogin() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- getLoginPassword() - Method in class org.archive.modules.forms.FormLoginProcessor
-
- getLoginUri() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- getLoginUsername() - Method in class org.archive.modules.forms.FormLoginProcessor
-
- getLogRejectsRule() - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- getLogToFile() - Method in class org.archive.crawler.framework.Scoper
-
- getLogToFile() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- getLookup() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- getLowerBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Returns the lower bound on the range of acceptable status codes.
- getLowerBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Returns the lower bound on the range of acceptable status codes.
- getLowerBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
- getMap() - Method in class org.archive.spring.Sheet
-
Return the map of full bean-paths (each starting with a target bean-name)
to the alternate value for that targeted property.
- getMap() - Method in class org.archive.util.ObjectIdentityMemCache
-
Offer raw map access for convenience of checkpoint/recovery.
- getMapPath() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- getMapUri() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- getMaxAttributeNameLength() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getMaxAttributeValLength() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getMaxBytesDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- getMaxDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getMaxDocumentsDownload() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- getMaxElementLength() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getMaxFetchKBSec() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getMaxFileSizeBytes() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getMaxFileSizeBytes() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getMaxHops() - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
-
- getMaxInWait() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Maximum amount of time to wait for an inbound update event before
giving up and rechecking on the ability to further fill the outbound
queue.
- getMaxInWait() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getMaxLengthBytes() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getMaxOutlinks() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getMaxOutlinks() - Method in interface org.archive.modules.extractor.ExtractorParameters
-
The maximum number of outlinks to discover from any URI's content.
- getMaxPathDepth() - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
-
- getMaxPathLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getMaxPerHostBandwidthUsageKbSec() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getMaxQueuesPerReportCategory() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getMaxRepetitions() - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
-
- getMaxRetries() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getMaxSegLength() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getMaxSize() - Method in class org.archive.crawler.util.TopNSet
-
- getMaxSizeToDigest() - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorPDF
-
- getMaxSizeToParse() - Method in class org.archive.modules.extractor.ExtractorUniversal
-
- getMaxSpeculativeHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
-
- getMaxTimeSeconds() - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- getMaxToeThreads() - Method in class org.archive.crawler.framework.CrawlController
-
- getMaxTotalBytesToWrite() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getMaxTransHops() - Method in class org.archive.modules.deciderules.TransclusionDecideRule
-
- getMaxWaitForIdleMs() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getMessage() - Method in class org.archive.crawler.event.CrawlStateEvent
-
- getMessage() - Method in class org.archive.crawler.restlet.Flash
-
- getMetadata() - Method in class org.archive.crawler.framework.CrawlController
-
- getMetadata() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getMetadata() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getMetadata() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getMetadata() - Method in class org.archive.modules.writer.ARCWriterProcessor
-
- getMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getMetadata() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getMetadataProvider() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getMethodRetryHandler() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpMethodParams.
- getMigrateMap() - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- getMinDelayMs() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getModuleClass() - Method in class org.archive.state.ModuleTestBase
-
Returns the class of the module to test.
- getMonitorConfigPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- getMonitorMounts() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- getMonitorPaths() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- getName() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Obtains the name of the HTTP method as used in the HTTP request line,
for example "GET" or "POST".
- getName() - Method in class org.archive.checkpointing.Checkpoint
-
- getName() - Method in class org.archive.modules.net.CrawlServer
-
- getName() - Method in class org.archive.spring.ConfigPath
-
- getName() - Method in class org.archive.spring.Sheet
-
- getName() - Method in interface org.archive.util.ms.Entry
-
- getNamedUserAgents() - Method in class org.archive.modules.net.Robotstxt
-
- getNavlinksOnly() - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
-
- getNext() - Method in interface org.archive.util.ms.Entry
-
- getNextBlock(int) - Method in interface org.archive.util.ms.BlockFileSystem
-
Returns the number of the block that follows the given block.
- getNextBlock(int) - Method in class org.archive.util.ms.DefaultBlockFileSystem
-
- getNextCheckpointNumber() - Method in class org.archive.crawler.framework.CheckpointService
-
- getNextNearestItem(DatabaseEntry, DatabaseEntry) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
- getNonfatalErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getNonfatalErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getNonFatalFailures() - Method in class org.archive.modules.CrawlURI
-
- getNotModifiedBytes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getNotModifiedUrls() - Method in class org.archive.modules.fetcher.FetchStats
-
- getNovelBytes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getNovelUrls() - Method in class org.archive.modules.fetcher.FetchStats
-
- getObjectCache(String, boolean, Class<V>) - Method in class org.archive.bdb.BdbModule
-
- getObjectCache(String, boolean, Class<V>, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
-
Get an ObjectIdentityCache, backed by a BDB Database of the given
name, with objects of the given valueClass type.
- getOIBCCache(String, boolean, Class<? extends V>) - Method in class org.archive.bdb.BdbModule
-
Get an ObjectIdentityBdbCache, backed by a BDB Database of the
given name, with the given value class type.
- getOnlyStoreIfWriteTagPresent() - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
-
- getOperator() - Method in class org.archive.modules.CrawlMetadata
-
- getOperatorContactUrl() - Method in class org.archive.modules.CrawlMetadata
-
- getOperatorFrom() - Method in class org.archive.modules.CrawlMetadata
-
- getOrCreateSheet(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Get a Sheet of the given name, or create it if it does not already
exist.
- getOrder() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- getOrder() - Method in class org.archive.spring.ConfigPathConfigurer
-
Act as late as possible.
- getOrdinal() - Method in class org.archive.modules.CrawlURI
-
Get the ordinal (serial number) assigned at creation.
- getOrganization() - Method in class org.archive.modules.CrawlMetadata
-
- getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbCache
-
- getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- getOrUse(String, Supplier<V>) - Method in interface org.archive.util.ObjectIdentityCache
-
Get the object under the given key/name, using (and remembering)
the object supplied by the supplier if no prior mapping exists;
the supplier should not mutate object state.
- getOrUse(String, Supplier<V>) - Method in class org.archive.util.ObjectIdentityMemCache
-
- getOutCandidates() - Method in class org.archive.modules.CrawlURI
-
Returns discovered candidate URIs.
- getOutlinkRule() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getOutLinks() - Method in class org.archive.modules.CrawlURI
-
Returns discovered links.
- getOverlayMap(String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Retrieve the named overlay Map.
- getOverlayMap(String) - Method in class org.archive.modules.CrawlURI
-
- getOverlayMap(String) - Method in interface org.archive.spring.OverlayContext
-
Get the map corresponding to the given overlay name.
- getOverlayMap(String) - Method in interface org.archive.spring.OverlayMapsSource
-
- getOverlayNames() - Method in class org.archive.modules.CrawlURI
-
- getOverlayNames() - Method in interface org.archive.spring.OverlayContext
-
Return a list of the names of overlay maps to consider.
- getOverrideKeys(String) - Method in class org.archive.spring.KeyedProperties
-
Compose the complete keys (externalPath + local key name) to use
for checking for contextual overrides.
- getParallelQueues() - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
The number of parallel queues to split a core key into.
- getParams() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns HTTP protocol parameters associated with this connection.
- getParams() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns HTTP protocol parameters associated with this method.
- getPassword() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- getPassword() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getPath() - Method in class org.apache.commons.httpclient.Cookie
-
Returns the path attribute of the cookie.
- getPath() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the path of this HTTP method.
- getPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getPath() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getPath() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getPath() - Method in class org.archive.spring.ConfigPath
-
- getPath() - Method in class org.archive.spring.ConfigPathConfigurer
-
- getPathFromSeed() - Method in class org.archive.modules.CrawlURI
-
- getPathQuery(CrawlURI) - Method in class org.archive.modules.net.RobotsPolicy
-
- getPattern() - Method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
-
- getPauseAtStart() - Method in class org.archive.crawler.framework.CrawlController
-
- getPauseThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- getPauseThresholdMiB() - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- getPersistentDataKeys() - Static method in class org.archive.modules.CrawlURI
-
Add the key of items you want to persist across
processing passes.
- getPersistentDataMap() - Method in class org.archive.modules.CrawlURI
-
- getPolicyBasisUURI() - Method in class org.archive.modules.CrawlURI
-
Get the UURI that should be used as the basis of policy/overlay
decisions.
- getPolitenessDelay() - Method in class org.archive.modules.CrawlURI
-
- getPool() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getPoolMaxActive() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getPort() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the port of the host.
- getPort() - Method in class org.archive.modules.net.CrawlServer
-
Get the port number for this server.
- getPrecedence() - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- getPrecedence() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- getPrecedence() - Method in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
-
- getPrecedence() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getPrecedence() - Method in class org.archive.modules.CrawlURI
-
- getPrecedenceFloor() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getPrecedenceProvider() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getPreferenceDepthHops() - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- getPreferenceDepthHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getPreferenceEmbedHops() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getPreferredVariant() - Method in class org.archive.crawler.restlet.BaseResource
-
If the client can accept text/html, always prefer it.
- getPrefix() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getPrefixClassKey(byte[]) - Static method in class org.archive.crawler.frontier.BdbWorkQueue
-
- getPreloadSource() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- getPreloadSourceUrl() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
Return the authentication URI, either absolute or relative, that serves
as a prerequisite for the passed curi.
- getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- getPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- getPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
-
Get the prerequisite for this URI.
- getPrevious() - Method in interface org.archive.util.ms.Entry
-
- getPrimaryConfig() - Method in class org.archive.crawler.framework.CrawlJob
-
- getPrimaryConfigurationPath() - Method in class org.archive.spring.PathSharingContext
-
- getProcessErrorOutlinks() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getProcessors() - Method in class org.archive.modules.ProcessorChain
-
- getProcessStatus() - Method in class org.archive.modules.ProcessResult
-
- getProfileCxmlResource() - Method in class org.archive.crawler.framework.Engine
-
- getProgressLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getProgressStamp() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getProgressStatisticsLine() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
-
Return one line of current progress statistics.
- getProgressStats() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getPropertyDescriptors(BeanWrapperImpl) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
Get and modify the PropertyDescriptors associated with the BeanWrapper.
- getProtocol() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the protocol used to establish the connection.
- getProxyAuthenticationRealm() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use getProxyAuthState().
- getProxyAuthState() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the proxy authentication state.
- getProxyCredentials(String, String) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use getProxyCredentials(AuthScope).
- getProxyCredentials(AuthScope) - Method in class org.apache.commons.httpclient.HttpState
-
Get the proxy credentials for the given authentication scope.
- getProxyHost() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the proxy host.
- getProxyPort() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the port of the proxy host.
- getPseudoXpath(Node) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
Given a node, give back an XPath-like string that addresses it.
- getQueryString() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the query string of this HTTP method.
- getQueueAssignmentPolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getQueueFor(String) - Method in class org.archive.crawler.frontier.BdbFrontier
-
Return the work queue for the given classKey, or null
if no such queue exists.
- getQueueFor(String) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Return the work queue for the given classKey, or null
if no such queue exists.
- getQueuePrecedencePolicy() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getQueueTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getRawInput() - Method in interface org.archive.util.ms.BlockFileSystem
-
Returns the raw input stream for this file system.
- getRawInput() - Method in class org.archive.util.ms.DefaultBlockFileSystem
-
- getRawOutput() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getRawOutput() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getReader() - Method in class org.archive.crawler.restlet.EditRepresentation
-
- getReader() - Method in class org.archive.crawler.restlet.PagedRepresentation
-
- getRealm() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- getRecheckScope() - Method in class org.archive.crawler.prefetch.Preselector
-
- getRecheckThresholdKb() - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- getRecordedFinishes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getRecordedSize() - Method in class org.archive.modules.CrawlURI
-
Get the size of the data recorded (transferred).
- getRecordedSize(CrawlURI) - Static method in class org.archive.modules.Processor
-
- getRecorder() - Method in class org.archive.modules.CrawlURI
-
Get the HTTP recorder associated with this URI.
- getRecorderInBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
-
- getRecorderOutBufferBytes() - Method in class org.archive.crawler.framework.CrawlController
-
- getRecordID() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getRecordIDGenerator() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getRecoverableExceptionCount() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
No longer used.
Returns the number of "recoverable" exceptions thrown and handled, to
allow for monitoring the quality of the connection.
- getRecoveryCheckpoint() - Method in class org.archive.crawler.framework.CheckpointService
-
- getRecoveryLogEnabled() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getRedirectUri() - Method in class org.archive.crawler.reporting.SeedRecord
-
- getReducePrefixRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- getReduceRegex(CrawlURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- getReference(ObjectName) - Static method in class org.archive.util.JndiUtils
-
- getRegex() - Method in class org.archive.modules.canonicalize.RegexRule
-
- getRegex() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
-
Use a preset if configured to do so.
- getRegex() - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
-
- getRegex() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- getRegexList() - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
- getRemaining() - Method in class org.archive.modules.fetcher.FetchStats
-
- getRemoveTriggerUris() - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- getReplyStrings() - Method in class org.archive.net.ClientFTP
-
- getReports() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getReportsDir() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getRepresentation(Status, Request, Response) - Method in class org.archive.crawler.restlet.EngineApplication.EngineStatusService
-
- getRequestCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the character encoding of the request from the Content-Type header.
- getRequestHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the specified request header.
- getRequestHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the header group storing the request headers.
- getRequestHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns an array of the request headers that the HTTP method currently has.
- getRequestHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- getRequestOutputStream() - Method in class org.apache.commons.httpclient.HttpConnection
-
- getRescheduleDelaySeconds() - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
-
- getRescheduleTime() - Method in class org.archive.modules.CrawlURI
-
- getResourceDir() - Method in class org.archive.state.ModuleTestBase
-
Returns the location of the Java resources directory for your project.
- getRespectCrawlDelayUpToSeconds() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getResponseBody() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the response body of the HTTP method, if any, as an array of bytes.
- getResponseBodyAsStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the response body of the HTTP method, if any, as an
InputStream.
- getResponseBodyAsString() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the response body of the HTTP method, if any, as a
String.
- getResponseCharSet() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the character encoding of the response from the Content-Type header.
- getResponseContentLength() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Return the length (in bytes) of the response body, as specified in a
Content-Length header.
- getResponseFooter(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the response footer associated with the given name.
- getResponseFooters() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns an array of the response footers that the HTTP method currently has
in the order in which they were read.
- getResponseHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the response header associated with the given name.
- getResponseHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the header group storing the response headers.
- getResponseHeaders(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- getResponseHeaders() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns an array of the response headers that the HTTP method currently has
in the order in which they were read.
- getResponseInputStream() - Method in class org.apache.commons.httpclient.HttpConnection
-
Return an InputStream suitable for reading the response.
- getResponseStream() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns a stream from which the body of the current response may be read.
- getResponseTrailerHeaderGroup() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Gets the header group storing the response trailer headers,
as per RFC 2616 section 3.6.1.
- getRetiredQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- getRetiredQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Return a queue of all retired queue names.
- getRetryDelaySeconds() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getReverseSortedCopy(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Sort the entries of the given Map in descending order by their
values, which must be longs wrapped with AtomicLong.
- getReverseSortedHostCounts(Map<String, AtomicLong>) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Return a copy of the hosts distribution in reverse-sorted (largest first)
order.
- getRobotsDenials() - Method in class org.archive.modules.fetcher.FetchStats
-
- getRobotsPolicy() - Method in class org.archive.modules.CrawlMetadata
-
Get the currently-effective RobotsPolicy, as specified by the
string name and chosen from the full available map.
- getRobotsPolicyName() - Method in class org.archive.modules.CrawlMetadata
-
- getRobotstxt() - Method in class org.archive.modules.net.CrawlServer
-
- getRobotsValidityDurationSeconds() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getRoot() - Method in interface org.archive.util.ms.BlockFileSystem
-
Returns the root entry of the file system.
- getRoot() - Method in class org.archive.util.ms.DefaultBlockFileSystem
-
- getRotationDigits() - Method in class org.archive.crawler.processor.CrawlMapper
-
- getRuleAssociations() - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
All DecideRuledSheetAssociations, in Ordered order
- getRules() - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- getRules() - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
- getRules() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- getRuntimeErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getRuntimeErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getRuntimeSeconds() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- getRunWhileEmpty() - Method in class org.archive.crawler.framework.CrawlController
-
- getSchedulingDirective(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
Calculate the coarse, original 'schedulingDirective' prioritization
for the given CrawlURI
- getSchedulingDirective() - Method in class org.archive.modules.CrawlURI
-
- getSchedulingFor(CrawlURI, Link, int) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
Determine scheduling for the curi.
- getSchemes() - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
-
- getScope() - Method in interface org.archive.crawler.framework.Frontier
-
- getScope() - Method in class org.archive.crawler.framework.Scoper
-
- getScope() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getScratchDir() - Method in class org.archive.crawler.framework.CrawlController
-
- getScratchDisk() - Method in interface org.archive.modules.extractor.TempDirProvider
-
- getScratchDisk() - Method in class org.archive.modules.net.DefaultTempDirProvider
-
- getScript() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getScript() - Method in class org.archive.crawler.restlet.ScriptingConsole
-
- getScriptSource() - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- getScriptSource() - Method in class org.archive.modules.ScriptedProcessor
-
- getSecure() - Method in class org.apache.commons.httpclient.Cookie
-
- getSeedCollection() - Method in class org.archive.crawler.util.RecoveryLogMapper
-
- getSeedForUrl(String) - Method in class org.archive.crawler.util.RecoveryLogMapper
-
Returns seed for urlString (null if seed not found).
- getSeedListeners() - Method in class org.archive.modules.seeds.SeedModule
-
- getSeeds() - Method in class org.archive.crawler.framework.ActionDirectory
-
- getSeeds() - Method in class org.archive.crawler.framework.CrawlController
-
- getSeeds() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getSeeds() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getSeedsAsSurtPrefixes() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getSeedsIterator() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Get a seed iterator for the job being monitored.
- getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getSeedsRedirectNewSeeds() - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- getSeedUrlToDiscoveredUrlsMap() - Method in class org.archive.crawler.util.RecoveryLogMapper
-
- getSendBufferSize() - Method in class org.apache.commons.httpclient.HttpConnection
-
Gets the socket's sendBufferSize.
- getSendConnectionClose() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSendIfModifiedSince() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSendIfNoneMatch() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSendRange() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSendReferer() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSerialNo() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getSerialNumber() - Method in class org.archive.crawler.framework.ToeThread
-
- getServerCache() - Method in class org.archive.crawler.framework.CrawlController
-
- getServerCache() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getServerCache() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
-
- getServerCache() - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
-
- getServerCache() - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- getServerCache() - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- getServerCache() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getServerCache() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getServerCache() - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- getServerCache() - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- getServerCache() - Method in class org.archive.modules.fetcher.FetchDNS
-
- getServerCache() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getServerCache() - Method in class org.archive.modules.fetcher.FetchWhois
-
- getServerCache() - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- getServerCache() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getServerFor(String) - Method in class org.archive.modules.fetcher.DefaultServerCache
-
- getServerFor(String) - Method in class org.archive.modules.net.ServerCache
-
- getServerFor(UURI) - Method in class org.archive.modules.net.ServerCache
-
- getServerKey(UURI) - Static method in class org.archive.modules.net.CrawlServer
-
Get key to use doing lookup on server instances.
- getServerMaxAllKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getServerMaxFetchResponses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getServerMaxFetchSuccesses() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getServerMaxSuccessKb() - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- getSessionBalance() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
-
Return current session 'activity budget balance'
- getSheetOverlaysManager() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- getSheetOverlaysManager() - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- getSheetsByName() - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Sheets, by name; starts with all autowired Sheets but others
may be added by other means (mid-crawl reconfiguration).
- getSheetsNamesBySurt() - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Sheet names, by the SURT prefix to which they should be applied.
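To illustrate what a "sheet names by SURT prefix" map makes possible, here is a hypothetical lookup that collects every sheet name whose SURT prefix matches a given SURT. The class, the method, the example SURT strings, and the sheet names are all invented for illustration; SheetOverlaysManager's actual matching and ordering rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class SurtSheetLookupSketch {
    // Collects sheet names whose SURT prefix is a prefix of the given SURT.
    // Any prefix of a string sorts at or before that string, so only the
    // headMap (keys <= surt) needs to be scanned.
    static List<String> sheetNamesFor(String surt, NavigableMap<String, List<String>> namesBySurt) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : namesBySurt.headMap(surt, true).entrySet()) {
            if (surt.startsWith(e.getKey())) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> bySurt = new TreeMap<>();
        bySurt.put("http://(org,", List.of("orgSheet"));
        bySurt.put("http://(org,example,", List.of("exampleSheet"));
        bySurt.put("http://(com,", List.of("comSheet"));
        System.out.println(sheetNamesFor("http://(org,example,)/page", bySurt));
    }
}
```

Because the map is sorted, broader prefixes come out before narrower ones; this sketch does not claim that the real manager applies sheets in that order.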
- getShortName() - Method in class org.archive.checkpointing.Checkpoint
-
- getShortName() - Method in class org.archive.crawler.framework.CrawlJob
-
- getShouldFetchBodyRule() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getShouldMasquerade() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- getShouldMasquerade() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- getShouldProcessRule() - Method in class org.archive.modules.Processor
-
- getShouldReportAtEndOfCrawl() - Method in class org.archive.crawler.reporting.Report
-
- getShouldReportDuringCrawl() - Method in class org.archive.crawler.reporting.Report
-
- getSizeBytes() - Method in interface org.archive.util.BloomFilter
-
The amount of memory in bytes consumed by the Bloom filter's
bitfield.
- getSizeBytes() - Method in class org.archive.util.BloomFilter64bit
-
- getSkipIdenticalDigests() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getSlotState(long) - Method in class org.archive.util.AbstractLongFPSet
-
Check the state of a slot in the storage.
- getSlotState(long) - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- getSmallest() - Method in class org.archive.crawler.util.TopNSet
-
- getSnapshot() - Method in class org.archive.crawler.event.StatSnapshotEvent
-
- getSnapshot() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getSnoozedCount() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getSnoozeLongMs() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getSocket() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns the connection socket.
- getSortedByCounts() - Method in class org.archive.util.Histotable
-
- getSortedByKeys() - Method in class org.archive.util.Histotable
-
- getSortedDuplicates() - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- getSortKey() - Method in class org.apache.commons.httpclient.Cookie
-
Create a 'sort key' for this Cookie that will cause it to sort
alongside other Cookies of the same domain (with or without leading
'.').
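A minimal sketch of the idea behind getSortKey: normalizing away a single leading '.' so that ".example.com" and "example.com" share a key and sort next to each other. The helper below works on a bare domain string and is hypothetical; the real Cookie.getSortKey may build its key from more than the domain.

```java
import java.util.ArrayList;
import java.util.List;

class CookieSortKeySketch {
    // Hypothetical helper: strips one leading '.' so ".example.com" and
    // "example.com" yield the same key and therefore sort together.
    static String sortKey(String domain) {
        if (domain == null) {
            return "";
        }
        return domain.startsWith(".") ? domain.substring(1) : domain;
    }

    public static void main(String[] args) {
        List<String> domains = new ArrayList<>(
                List.of("example.org", ".example.com", "archive.org", "example.com"));
        // List.sort is stable, so equal keys keep their original relative order.
        domains.sort((a, b) -> sortKey(a).compareTo(sortKey(b)));
        System.out.println(domains); // [archive.org, .example.com, example.com, example.org]
    }
}
```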
- getSoTimeout() - Method in class org.apache.commons.httpclient.HttpConnection
-
- getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getSoTimeoutMs() - Method in class org.archive.modules.fetcher.FetchWhois
-
- getSource() - Method in class org.archive.modules.extractor.Link
-
- getSourceCodeDir() - Method in class org.archive.state.ModuleTestBase
-
Returns the location of the source code directory for your project.
- getSourceTag() - Method in class org.archive.modules.CrawlURI
-
- getSourceTagSeeds() - Method in class org.archive.modules.seeds.SeedModule
-
- getSslTrustLevel() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getStackTrace() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- getStartNewFilesOnCheckpoint() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getState() - Method in class org.archive.crawler.event.CrawlStateEvent
-
- getState() - Method in class org.archive.crawler.framework.CrawlController
-
- getStaticRef(String) - Method in class org.archive.crawler.restlet.BaseResource
-
- getStaticRef(String) - Method in class org.archive.crawler.restlet.EditRepresentation
-
- getStatisticsTracker() - Method in class org.archive.crawler.framework.CrawlController
-
- getStatisticsTracker() - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- getStats() - Method in class org.archive.crawler.framework.CrawlJob
-
- getStatusCode() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the response status code.
- getStatusCode() - Method in class org.archive.crawler.reporting.SeedRecord
-
- getStatusCodeDistribution() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Return an objectCache representing the distribution of status codes for
successfully fetched curis, where each key -> value entry maps
(string) code -> (integer) count.
- getStatusCodes() - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
-
- getStatusLine() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Provides access to the response status line.
- getStatusText() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the status text (or "reason phrase") associated with the latest
response.
- getStep() - Method in class org.archive.crawler.framework.ToeThread
-
- getStoredMap(String, Class<K>, Class<V>, boolean, boolean) - Method in class org.archive.bdb.BdbModule
-
Creates a database-backed TempStoredSortedMap for transient
reporting requirements.
- getStoredQueue(String, Class<K>, boolean) - Method in class org.archive.bdb.BdbModule
-
- getStorePaths() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getString(byte[], int, int, String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the byte array of HTTP content characters to a string.
- getString(byte[], String) - Static method in class org.apache.commons.httpclient.util.EncodingUtil
-
Converts the byte array of HTTP content characters to a string.
- getString(CrawlURI) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
-
- getString(CrawlURI) - Method in class org.archive.modules.deciderules.ContentTypeMatchesRegexDecideRule
-
- getString(CrawlURI) - Method in class org.archive.modules.deciderules.FetchStatusMatchesRegexDecideRule
-
- getString(CrawlURI) - Method in class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
-
- getString(CrawlURI) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
-
- getStripRegex() - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- getSubContext(String) - Static method in class org.archive.util.JndiUtils
-
Get subcontext.
- getSubContext(CompoundName) - Static method in class org.archive.util.JndiUtils
-
Get subcontext.
- getSubqueue(UURI, int) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- getSubstats() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getSubstats() - Method in interface org.archive.modules.fetcher.FetchStats.HasFetchStats
-
- getSubstats() - Method in class org.archive.modules.net.CrawlHost
-
- getSubstats() - Method in class org.archive.modules.net.CrawlServer
-
- getSuccess() - Method in class org.archive.checkpointing.Checkpoint
-
- getSuccessBytes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getSuccessfullyCrawledUrls() - Method in class org.archive.crawler.util.RecoveryLogMapper
-
- getSuffixAtEnd() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getSupplementaryRule() - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- getSurtAuthority(String) - Method in class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
-
- getSurtPrefixes() - Method in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
-
- getSurtsDumpFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getSurtsSource() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getSurtsSourceFile() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- getTags() - Method in class org.archive.io.ReadSourceEditor
-
- getTags() - Method in class org.archive.spring.ConfigPathEditor
-
- getTargetSheetNames() - Method in class org.archive.crawler.spring.SheetAssociation
-
- getTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- getTemplate() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getTemplateConfiguration() - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
- getTemplateConfiguration() - Method in class org.archive.crawler.restlet.EngineResource
-
- getTemplateConfiguration() - Method in class org.archive.crawler.restlet.JobResource
-
- getTemplateConfiguration() - Method in class org.archive.crawler.restlet.ScriptResource
-
- getTestEnvironment(File) - Static method in class org.archive.util.bdbje.EnhancedEnvironment
-
Create a temporary test environment in the given directory.
- getText(String) - Static method in class org.archive.util.ms.Doc
-
Returns the text of the .doc file with the given file name.
- getText(File) - Static method in class org.archive.util.ms.Doc
-
Returns the text of the given .doc file.
- getText(SeekInputStream) - Static method in class org.archive.util.ms.Doc
-
Returns the text of the given .doc file.
- getText(BlockFileSystem, int) - Static method in class org.archive.util.ms.Doc
-
Returns the text for the given .doc file.
- getTextSource() - Method in class org.archive.modules.seeds.TextSeedModule
-
- getThreadNumber() - Method in class org.archive.modules.CrawlURI
-
Get the number of the ToeThread responsible for processing this uri.
- getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getTimeoutSeconds() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getTmpDir() - Method in class org.archive.util.TmpDirTestCase
-
- getToeCount() - Method in class org.archive.crawler.framework.CrawlController
-
- getToeCount() - Method in class org.archive.crawler.framework.ToePool
-
- getToePool() - Method in class org.archive.crawler.framework.CrawlController
-
- getToeThreadReport() - Method in class org.archive.crawler.framework.CrawlController
-
- getToeThreadReportShort() - Method in class org.archive.crawler.framework.CrawlController
-
- getToeThreadReportShortData() - Method in class org.archive.crawler.framework.CrawlController
-
- getTooLongDirectory() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getTopSet() - Method in class org.archive.crawler.util.TopNSet
-
Make internal map available (for checkpoint/restore purposes).
- getTotal() - Method in class org.archive.util.Histotable
-
Return the total of all tallies.
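A small sketch of "the total of all tallies", assuming (as the getSortedByCounts and getSortedByKeys entries suggest) that a Histotable keeps one count per key. TallySketch and its tally method are hypothetical stand-ins, not Histotable's API.

```java
import java.util.HashMap;
import java.util.Map;

class TallySketch {
    private final Map<String, Long> tallies = new HashMap<>();

    // Adds 'amount' to the tally kept for 'key'.
    void tally(String key, long amount) {
        tallies.merge(key, amount, Long::sum);
    }

    // Total of all tallies: the sum of every per-key count.
    long getTotal() {
        long total = 0;
        for (long count : tallies.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        TallySketch t = new TallySketch();
        t.tally("200", 3);
        t.tally("404", 1);
        t.tally("200", 2);
        System.out.println(t.getTotal()); // 6
    }
}
```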
- getTotalBytes() - Method in class org.archive.crawler.util.CrawledBytesHistotable
-
- getTotalBytes() - Method in class org.archive.modules.fetcher.FetchStats
-
- getTotalBytesWritten() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getTotalEligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Total of all URIs in inactive queues at precedences above the floor
- getTotalExpenditure() - Method in class org.archive.crawler.frontier.WorkQueue
-
Return the tally of all expenditures on this queue
- getTotalInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Total of all URIs in inactive queues at all precedences
- getTotalIneligibleInactiveQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Total of all URIs in inactive queues at precedences at or below the floor
- getTotalScheduled() - Method in class org.archive.modules.fetcher.FetchStats
-
- getTotalUrls() - Method in class org.archive.crawler.util.CrawledBytesHistotable
-
- getTrackSeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getTrackSources() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- getTransHops() - Method in class org.archive.modules.CrawlURI
-
Tally up the number of transitive (non-simple-link) hops at
the end of this CrawlURI's pathFromSeed.
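A sketch of the tally described for getTransHops, assuming the pathFromSeed is a string with one character per hop and that 'L' marks a simple navigational link (any other character counts as a transitive hop). The class name is hypothetical and the hop-character convention is an assumption here, not confirmed by this index.

```java
class TransHopsSketch {
    // Assumption: each character of pathFromSeed is one hop, and 'L' is a
    // simple link; every other hop character is treated as transitive.
    static int transHops(String pathFromSeed) {
        int count = 0;
        // Walk backwards from the end until the first simple link is found.
        for (int i = pathFromSeed.length() - 1; i >= 0; i--) {
            if (pathFromSeed.charAt(i) == 'L') {
                break;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(transHops("LLRE")); // 2
    }
}
```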
- getTreatFramesAsEmbedLinks() - Method in class org.archive.modules.extractor.ExtractorHTML
-
- getType() - Method in interface org.archive.util.ms.Entry
-
- getUnderscoreSet() - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- getUpperBound() - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Returns the upper bound on the range of acceptable status codes.
- getUpperBound() - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Returns the upper bound on the range of acceptable status codes.
- getUpperBound() - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
- getURI() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns the URI of the HTTP method
- getUri() - Method in class org.archive.crawler.reporting.SeedRecord
-
- getURI() - Method in class org.archive.modules.CrawlURI
-
- getURICount() - Method in class org.archive.modules.Processor
-
Returns the number of URIs this processor has handled.
- getUriErrors() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getUriErrorsLogPath() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getUriPrecedencePolicy() - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- getUriProcessing() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- getUriRegex() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- getURIs() - Method in class org.archive.modules.extractor.PDFParser
-
Get a list of URIs retrieved from the PDF during the
extractURIs operation.
- getURIsList(String, int, String, boolean) - Method in interface org.archive.crawler.framework.Frontier
-
Returns a list of all uncrawled URIs starting from a specified marker
until numberOfMatches is reached.
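A sketch of marker-based paging as described for getURIsList: start at a marker and stop once numberOfMatches items have been collected. A sorted set of strings stands in for the frontier's pending URIs, the marker is treated as inclusive, and the remaining parameters of the real signature are left out; all of these are assumptions, and MarkerPagingSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class MarkerPagingSketch {
    // Returns up to numberOfMatches entries, starting at 'marker' (inclusive),
    // from a sorted set standing in for the frontier's uncrawled URIs.
    static List<String> urisFrom(NavigableSet<String> pending, String marker, int numberOfMatches) {
        List<String> page = new ArrayList<>();
        for (String uri : pending.tailSet(marker, true)) {
            if (page.size() >= numberOfMatches) {
                break;
            }
            page.add(uri);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> pending = new TreeSet<>(List.of("http://a/1", "http://a/2", "http://a/3"));
        List<String> first = urisFrom(pending, "", 2);
        // The next marker is the entry just after the last one returned.
        String nextMarker = pending.higher(first.get(first.size() - 1));
        System.out.println(first + " then " + urisFrom(pending, nextMarker, 2));
    }
}
```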
- getURIsList(String, int, String, boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
-
Return list of urls.
- getUriUniqFilter() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- getURL(String, String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
Override handling of discovered URIs.
- getUseHardLinkCheckpoints() - Method in class org.archive.bdb.BdbModule
-
- getUseHeaderLength() - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- getUseHTTP11() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getUsePreset() - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
-
- getUsePublicSuffixesRegex() - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- getUserAgent() - Method in class org.archive.modules.CrawlMetadata
-
- getUserAgent() - Method in class org.archive.modules.CrawlURI
-
Get the user agent to use for crawling this URI.
- getUserAgent() - Method in interface org.archive.modules.fetcher.UserAgentProvider
-
- getUserAgentProvider() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- getUserAgentTemplate() - Method in class org.archive.modules.CrawlMetadata
-
- getUsername() - Method in class org.archive.modules.fetcher.FetchFTP
-
- getUseSharedCache() - Method in class org.archive.bdb.BdbModule
-
- getUURI() - Method in class org.archive.modules.CrawlURI
-
- getValidator() - Method in class org.archive.crawler.framework.CheckpointService
-
- getValidator() - Method in class org.archive.modules.CrawlMetadata
-
- getValidator() - Method in interface org.archive.spring.HasValidator
-
- getValidDateFormats() - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Returns the Collection of date patterns used for parsing.
- getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
- getValidDateFormats() - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- getValidTestData() - Method in class org.archive.modules.extractor.StringExtractorTestBase
-
Returns an array of valid test data pairs.
- getValue() - Method in class org.archive.io.ReadSourceEditor
-
- getValue() - Method in class org.archive.spring.ConfigFileEditor
-
- getValue() - Method in class org.archive.spring.ConfigPathEditor
-
- getValue() - Method in class org.archive.spring.ConfigString
-
- getVariants() - Method in class org.archive.crawler.restlet.EnhDirectoryResource
-
Add EditRepresentation as a variant when appropriate.
- getVersion() - Method in class org.apache.commons.httpclient.Cookie
-
Returns the version of the cookie specification to which this
cookie conforms.
- getVia() - Method in class org.archive.modules.CrawlURI
-
- getViaContext() - Method in class org.archive.modules.CrawlURI
-
- getVirtualHost() - Method in class org.apache.commons.httpclient.HttpConnection
-
Deprecated.
no longer applicable
- getWakeTime() - Method in class org.archive.crawler.frontier.WorkQueue
-
- getWhoisQuery(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
- getWhoisServer(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
- getWorkQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- getWriteBufferSize() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- getWriteMetadata() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getWriteRequests() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getWriteRevisitForIdenticalDigests() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getWriteRevisitForNotModified() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- getXmlWriter(Writer) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- groovyTemplate() - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- groovyTemplates - Variable in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- GROUP - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- gzipFile - Variable in class org.archive.io.CrawlerJournal
-
File we're writing journal to.
- handle401(HttpMethod, CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Server is looking for basic/digest auth credentials (RFC 2617).
- handlePrerequisite(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
The CrawlURI has a prerequisite; apply scoping and update
Link to CrawlURI in manner analogous to outlink handling.
- handleQueue(WorkQueue, boolean, long, long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Send an active queue to its next state, based on the supplied
parameters.
- Handler - Class in org.archive.net.s3
-
Handler for Amazon S3 URLs of the form
s3://id:secret@bucket/key
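A sketch of splitting a URL of the documented form s3://id:secret@bucket/key into its four parts with java.net.URI. It assumes the secret needs no percent-encoding and the bucket name is a valid hostname; S3UrlSketch is hypothetical and is not how org.archive.net.s3.Handler is necessarily written.

```java
import java.net.URI;

class S3UrlSketch {
    // Splits s3://id:secret@bucket/key into {id, secret, bucket, key}.
    static String[] parse(String url) {
        URI uri = URI.create(url);
        if (!"s3".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not an s3 URL: " + url);
        }
        String userInfo = uri.getUserInfo();              // "id:secret"
        int colon = (userInfo == null) ? -1 : userInfo.indexOf(':');
        if (colon < 0 || uri.getHost() == null) {
            throw new IllegalArgumentException("expected s3://id:secret@bucket/key: " + url);
        }
        String id = userInfo.substring(0, colon);
        String secret = userInfo.substring(colon + 1);
        String bucket = uri.getHost();                     // "bucket"
        String path = uri.getPath();                       // "/key"
        String key = path.startsWith("/") ? path.substring(1) : path;
        return new String[] {id, secret, bucket, key};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parse("s3://AKID:s3cret@my-bucket/dir/file.txt")));
    }
}
```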
- Handler() - Constructor for class org.archive.net.s3.Handler
-
- handleSeed(CrawlURI, String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
If the curi is a seed, we update the processedSeeds cache.
- handleUnregisteredClass(Class) - Method in class org.archive.bdb.AutoKryo
-
- harvester - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Name of the harvester that is used for the web harvesting.
- HARVESTER_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- hasApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
-
- hasAvailableCheckpoints() - Method in class org.archive.crawler.framework.CheckpointService
-
- hasBeenLinkExtracted() - Method in class org.archive.modules.CrawlURI
-
If true then a link extractor has already claimed this CrawlURI and
performed link extraction on the document content.
- hasBeenLookedUp() - Method in class org.archive.modules.net.CrawlHost
-
Return true if the IP for this host has been looked up.
- hasBeenUsed() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- hasContentDigestHistory() - Method in class org.archive.modules.CrawlURI
-
- hasCredentials() - Method in class org.archive.modules.CrawlURI
-
- hasCredentials() - Method in class org.archive.modules.net.CrawlServer
-
- hasData() - Method in class org.archive.modules.extractor.Link
-
- hasErrors - Variable in class org.archive.modules.net.Robotstxt
-
- hash(CharSequence, int, int) - Method in class org.archive.util.BloomFilter64bit
-
Hashes the given sequence with the given hash function.
- hash(CharSequence) - Method in class org.archive.util.LongToIntConsistentHash
-
- hashCode() - Method in class org.apache.commons.httpclient.Cookie
-
- hashCode() - Method in class org.archive.modules.extractor.Link
-
- hashCode() - Method in class org.archive.modules.extractor.LinkContext
-
- hashCode() - Method in class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
All instances of HeritrixProtocolSocketFactory have the same hash code.
- hashCode() - Method in class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
- hashCode() - Method in class org.archive.modules.net.CrawlHost
-
- hashCode() - Method in class org.archive.modules.net.CrawlServer
-
- HashCrawlMapper - Class in org.archive.crawler.processor
-
Maps URIs to one of N crawler names by applying a hash to the
URI's (possibly-transformed) classKey.
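A sketch of mapping a classKey to one of N crawler names by hashing. String.hashCode and plain modulo bucketing stand in for whatever hash and transformation HashCrawlMapper really applies, and naming crawlers "0" through "N-1" is an assumption; the class and method are hypothetical.

```java
class HashMapperSketch {
    // Hypothetical: maps a (possibly transformed) classKey to one of
    // crawlerCount crawler names "0" .. "crawlerCount-1".
    static String crawlerFor(String classKey, int crawlerCount) {
        if (crawlerCount <= 0) {
            throw new IllegalArgumentException("crawlerCount must be positive");
        }
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(classKey.hashCode(), crawlerCount);
        return Integer.toString(bucket);
    }

    public static void main(String[] args) {
        System.out.println(crawlerFor("org,example,", 4));
    }
}
```

The same classKey always lands on the same crawler, which is what lets several crawlers divide work without coordination; plain modulo reassigns most keys if N changes, so a real mapper might prefer another scheme.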
- HashCrawlMapper() - Constructor for class org.archive.crawler.processor.HashCrawlMapper
-
Constructor.
- hashSet - Variable in class org.archive.crawler.util.MemUriUniqFilter
-
- hasHttpAuthenticationCredential(CrawlURI) - Static method in class org.archive.modules.Processor
-
- hasIdenticalDigest(CrawlURI) - Static method in class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
-
Utility method for testing if a CrawlURI's last two history
entries (one being the most recent fetch) have identical
content-digest information.
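A sketch of the check described for hasIdenticalDigest, assuming the fetch history is a list ordered most-recent-first and that each entry stores its digest under a key named "content-digest". Both the list-of-maps representation and the key name are hypothetical, not CrawlURI's actual history format.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

class IdenticalDigestSketch {
    // Assumption: history.get(0) is the most recent fetch, history.get(1) the one before.
    static boolean hasIdenticalDigest(List<Map<String, String>> history) {
        if (history == null || history.size() < 2) {
            return false; // need the latest fetch plus one earlier fetch
        }
        String latest = history.get(0).get("content-digest");
        String previous = history.get(1).get("content-digest");
        return latest != null && Objects.equals(latest, previous);
    }

    public static void main(String[] args) {
        List<Map<String, String>> history = List.of(
                Map.of("content-digest", "sha1:ABC"),
                Map.of("content-digest", "sha1:ABC"));
        System.out.println(hasIdenticalDigest(history)); // true
    }
}
```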
- HasKeyedProperties - Interface in org.archive.spring
-
Interface indicating an object has an internal map of properties,
and thus at least partially amenable to sheet-based contextual
overriding of properties.
- hasNext() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
-
Test whether any items remain; loads next item into
holding 'next' field.
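The look-ahead pattern described for DataFileLongIterator.hasNext (test for a remaining item by loading it into a holding 'next' field) can be sketched over a DataInputStream of longs. LookAheadLongIterator and its over helper are hypothetical; the real iterator's file format and error handling are not described here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

class LookAheadLongIterator implements Iterator<Long> {
    private final DataInputStream in;
    private Long next; // holding field: the item hasNext() already loaded, or null

    LookAheadLongIterator(DataInputStream in) {
        this.in = in;
    }

    @Override
    public boolean hasNext() {
        if (next != null) {
            return true; // already loaded; repeated calls do not skip items
        }
        try {
            next = in.readLong(); // load the next item into the holding field
            return true;
        } catch (EOFException e) {
            return false; // no complete long remains
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Long result = next;
        next = null;
        return result;
    }

    // Convenience for demos and tests: an iterator over the given values.
    static LookAheadLongIterator over(long... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long v : values) {
                out.writeLong(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LookAheadLongIterator(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) {
        LookAheadLongIterator it = over(42L, 7L);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```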
- hasNext() - Method in class org.archive.util.iterator.CompositeIterator
-
- hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
- hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- hasPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- hasPrerequisiteUri() - Method in class org.archive.modules.CrawlURI
-
- hasRfc2617Credential() - Method in class org.archive.modules.CrawlURI
-
- hasStarted - Variable in class org.archive.crawler.framework.CrawlController
-
- hasStarted() - Method in class org.archive.crawler.framework.CrawlController
-
- hasValidApplicationContext() - Method in class org.archive.crawler.framework.CrawlJob
-
Did the ApplicationContext self-validate?
Returns true if validation passed without errors.
- HasValidator - Interface in org.archive.spring
-
- hasValidStamp(File) - Static method in class org.archive.checkpointing.Checkpoint
-
- HasViaDecideRule - Class in org.archive.modules.deciderules
-
Rule applies the configured decision for any URI which has a 'via'
(essentially, any URI that was not a seed or one of certain kinds of mid-crawl additions).
- HasViaDecideRule() - Constructor for class org.archive.modules.deciderules.HasViaDecideRule
-
Usual constructor.
- hasWriteTag(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
-
- haveOverlayNamesBeenSet() - Method in class org.archive.modules.CrawlURI
-
- haveOverlayNamesBeenSet() - Method in interface org.archive.spring.OverlayContext
-
Test whether this context has actually been configured with overlays
(even if in fact no overlays were added).
- haveSeen(int, int) - Method in class org.archive.modules.extractor.PDFParser
-
Indicates, based on a PDFObject's generation/id pair, whether
the parser has already encountered this object (or a reference to it),
so that we don't loop infinitely on cycles within the PDF.
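A sketch of cycle protection keyed on an object's generation/id pair: both numbers are packed into one long and kept in a set. The argument order, the packing, and the fact that this helper both checks and records in one call are assumptions; SeenObjectsSketch is not PDFParser's actual code.

```java
import java.util.HashSet;
import java.util.Set;

class SeenObjectsSketch {
    private final Set<Long> seen = new HashSet<>();

    // Packs the object id into the high 32 bits and the generation into the low
    // 32 bits, records the pair, and returns true if it had already been recorded.
    boolean haveSeen(int id, int generation) {
        long key = ((long) id << 32) | (generation & 0xFFFFFFFFL);
        return !seen.add(key); // Set.add returns false when the key was present
    }

    public static void main(String[] args) {
        SeenObjectsSketch s = new SeenObjectsSketch();
        System.out.println(s.haveSeen(12, 0)); // false: first visit
        System.out.println(s.haveSeen(12, 0)); // true: a cycle back to object 12 0
    }
}
```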
- HEADER_LENGTH_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- HEADER_MD5_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- HEADER_PREDICTS_MISSING - Static variable in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- HEADER_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- HEADER_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
-
- headSetInclusive(SortedSet<String>, String) - Static method in class org.archive.util.PrefixFinder
-
- heapReport() - Method in class org.archive.crawler.framework.Engine
-
- heapReportData() - Method in class org.archive.crawler.framework.Engine
-
- Heritrix - Class in org.archive.crawler
-
Main class for Heritrix crawler.
- Heritrix() - Constructor for class org.archive.crawler.Heritrix
-
- HeritrixHttpMethodRetryHandler - Class in org.archive.modules.fetcher
-
Retry handler that tries ten times to establish a connection; once the
connection is established, a GET method tries ten times to get a response
(a POST tries once only).
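The retry rule above can be sketched as a single decision function. This is a hypothetical illustration, not the HeritrixHttpMethodRetryHandler source; the parameter names, and the choice not to retry methods other than GET once connected, are assumptions:

```java
// Hypothetical sketch of the retry policy; not the real handler's code.
class RetryPolicySketch {
    static final int MAX_TRIES = 10;

    /**
     * @param method        HTTP method name, e.g. "GET" or "POST"
     * @param connected     true if the connection had been established
     * @param attemptsSoFar attempts already made (1 after the first failure)
     * @return true if another attempt should be made
     */
    static boolean shouldRetry(String method, boolean connected, int attemptsSoFar) {
        if (!connected) {
            // Establishing the connection: up to ten tries for any method.
            return attemptsSoFar < MAX_TRIES;
        }
        if ("POST".equalsIgnoreCase(method)) {
            // A POST that reached the server is tried once only.
            return false;
        }
        // GET: up to ten tries to obtain a response; other methods are not
        // retried in this sketch (an assumption).
        return "GET".equalsIgnoreCase(method) && attemptsSoFar < MAX_TRIES;
    }
}
```

Treating POST differently reflects that resending a POST after the server may have acted on it is not safe in general.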
- HeritrixHttpMethodRetryHandler() - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
-
Constructor.
- HeritrixHttpMethodRetryHandler(int) - Constructor for class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
-
Constructor.
- HeritrixLifecycleProcessor - Class in org.archive.spring
-
Stand-in LifecycleProcessor to avoid a full automatic start() when our
ApplicationContext (PathSharingContext) is built ('refreshed').
- HeritrixLifecycleProcessor() - Constructor for class org.archive.spring.HeritrixLifecycleProcessor
-
- HeritrixProtocolSocketFactory - Class in org.archive.modules.fetcher
-
Version of protocol socket factory that tries to get the IP from the Heritrix IP
cache, if it has been set into the HttpConnectionParameters.
- HeritrixProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixProtocolSocketFactory
-
Constructor.
- HeritrixSSLProtocolSocketFactory - Class in org.archive.modules.fetcher
-
Implementation of the commons-httpclient SSLProtocolSocketFactory so we
can return SSLSockets whose trust manager is ConfigurableX509TrustManager.
- HeritrixSSLProtocolSocketFactory() - Constructor for class org.archive.modules.fetcher.HeritrixSSLProtocolSocketFactory
-
Shutdown constructor.
- HIDDEN_PROPS - Static variable in class org.archive.crawler.restlet.JobRelatedResource
-
suppress problematic properties
- HIGH - Static variable in class org.archive.modules.SchedulingConstants
-
High scheduling priority.
- HIGHEST - Static variable in class org.archive.modules.SchedulingConstants
-
Highest scheduling priority.
- highestPrecedenceWaiting - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
- HighestUriQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
-
QueuePrecedencePolicy that sets a uri-queue's precedence to that of the
highest URI currently enqueued within itself, added to the configured
base-precedence.
- HighestUriQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
-
- HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider - Class in org.archive.crawler.frontier.precedence
-
Helper provider for maintaining the tracked distribution of included
URIs and calculating the queue precedence.
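A minimal sketch of such a provider. It assumes the tracked distribution is a count of enqueued URIs per precedence value and that a numerically lower value means a higher precedence. The class and method names are hypothetical, and returning the base precedence for an empty queue is an arbitrary choice:

```java
import java.util.TreeMap;

// Hypothetical sketch: counts enqueued URIs per precedence value and derives
// the queue precedence from the highest (numerically lowest, by assumption) one.
class HighestUriPrecedenceTracker {
    private final int basePrecedence;
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    HighestUriPrecedenceTracker(int basePrecedence) {
        this.basePrecedence = basePrecedence;
    }

    void enqueued(int uriPrecedence) {
        counts.merge(uriPrecedence, 1, Integer::sum);
    }

    void dequeued(int uriPrecedence) {
        // Decrement, removing the entry when its count drops to zero.
        counts.computeIfPresent(uriPrecedence, (k, v) -> v == 1 ? null : v - 1);
    }

    /** Base precedence plus the highest URI precedence currently enqueued. */
    int getPrecedence() {
        return counts.isEmpty() ? basePrecedence : basePrecedence + counts.firstKey();
    }
}
```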
- HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider(int) - Constructor for class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- HISTORY_DB_CONFIG - Static variable in class org.archive.modules.recrawl.PersistProcessor
-
- historyDb - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- historyDb - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- historyDb - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- historyDbConfig - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- historyDbConfig() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- historyDbName - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- historyDbName - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- historyLength - Variable in class org.archive.modules.recrawl.FetchHistoryProcessor
-
Desired history array length.
- Histotable<K> - Class in org.archive.util
-
Collect and report frequency information.
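A minimal frequency table in the same spirit, written as a hypothetical stand-in rather than the Histotable API (its class and method names here are assumptions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tally occurrences per key, then report keys by
// descending count.
class FrequencyTable<K> {
    private final Map<K, Long> counts = new HashMap<>();

    void tally(K key, long amount) {
        counts.merge(key, amount, Long::sum);
    }

    long get(K key) {
        return counts.getOrDefault(key, 0L);
    }

    /** Entries sorted from most to least frequent (ties in no particular order). */
    List<Map.Entry<K, Long>> sortedByCount() {
        List<Map.Entry<K, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<K, Long>comparingByValue().reversed());
        return entries;
    }
}
```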
- Histotable() - Constructor for class org.archive.util.Histotable
-
- holder - Variable in class org.archive.modules.CrawlURI
-
- holderCost - Variable in class org.archive.modules.CrawlURI
-
Spot for an integer cost to be placed by an external facility (the frontier).
- holderKey - Variable in class org.archive.modules.CrawlURI
-
- hookupDatabase(Database, Class<E>, StoredClassCatalog) - Method in class org.archive.bdb.StoredQueue
-
- Hop - Enum in org.archive.modules.extractor
-
The kind of "hop" from one URI to another.
- HopCrossesAssignmentLevelDomainDecideRule - Class in org.archive.modules.deciderules
-
Applies its decision if the current URI differs from its 'via' URI in that portion of
its hostname/domain that is assigned/sold by registrars, its
'assignment-level-domain' (ALD) (AKA 'public suffix' or, in previous
Heritrix versions, 'topmost assigned SURT').
- HopCrossesAssignmentLevelDomainDecideRule() - Constructor for class org.archive.modules.deciderules.HopCrossesAssignmentLevelDomainDecideRule
-
- HopsPathMatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
Rule applies configured decision to any CrawlURIs whose 'hops-path'
(string like "LLXE" etc.) matches the supplied regex.
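A sketch of the matching step, assuming the regex must match the whole hops-path string (the class name and full-match semantics are assumptions):

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a rule applies when the hops-path string fully
// matches a configured regex.
class HopsPathRegexSketch {
    private final Pattern pattern;

    HopsPathRegexSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    boolean applies(String hopsPath) {
        return pattern.matcher(hopsPath).matches();
    }
}
```

For example, `.*X.*` would catch any URI reached through at least one 'X' hop, and `L{3,}` only paths made entirely of three or more 'L' hops.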
- HopsPathMatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.HopsPathMatchesRegexDecideRule
-
Usual constructor.
- hopString - Variable in enum org.archive.modules.extractor.Hop
-
- HopsUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
-
UriPrecedencePolicy which assigns URIs a precedence equal to the number
of hops in their hops-path-from-seed (either all hops or just navlink ('L')
hops).
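The calculation can be sketched as counting characters in the hops-path string, one character per hop. The names are hypothetical, and the real policy may offset or clamp the value:

```java
// Hypothetical sketch: precedence equal to the number of hops in the
// hops-path, counting every hop or only navlink ('L') hops.
class HopsPrecedenceSketch {
    static int precedence(String hopsPath, boolean navlinksOnly) {
        if (!navlinksOnly) {
            return hopsPath.length(); // one character per hop
        }
        int count = 0;
        for (int i = 0; i < hopsPath.length(); i++) {
            if (hopsPath.charAt(i) == 'L') {
                count++;
            }
        }
        return count;
    }
}
```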
- HopsUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
-
- HOST - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- hostKeys() - Method in class org.archive.modules.fetcher.DefaultServerCache
-
- hostKeys() - Method in class org.archive.modules.net.ServerCache
-
- hostMap - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
This list is grouped in pairs.
- HostnameQueueAssignmentPolicy - Class in org.archive.crawler.frontier
-
QueueAssignmentPolicy based on the hostname:port evident in the given
CrawlURI.
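A sketch of deriving a queue key from the hostname:port of a URI, filling in the default port for http and https. The key format and class name are illustrative assumptions, not the actual class-key format used by the policy:

```java
import java.net.URI;

// Hypothetical sketch: build a "host:port" queue key, so all URIs on the
// same host and port share one queue.
class HostPortKeySketch {
    static String queueKey(String uri) {
        URI u = URI.create(uri);
        int port = u.getPort();
        if (port == -1) {
            // No explicit port: use the scheme's default.
            String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase();
            switch (scheme) {
                case "http": port = 80; break;
                case "https": port = 443; break;
                default: throw new IllegalArgumentException("unsupported scheme: " + uri);
            }
        }
        return u.getHost().toLowerCase() + ":" + port;
    }
}
```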
- HostnameQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
-
- HostResolver - Interface in org.archive.modules.fetcher
-
- hosts - Variable in class org.archive.modules.fetcher.DefaultServerCache
-
hostname -> CrawlHost.
- hostsBytesTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- hostsDistributionTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- hostsLastFinishedTop - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- HostsReport - Class in org.archive.crawler.reporting
-
The "Hosts Report", tallies by host.
- HostsReport() - Constructor for class org.archive.crawler.reporting.HostsReport
-
- HTML_TAGS - Static variable in class org.archive.util.UriUtils
-
- HTMLForm - Class in org.archive.modules.forms
-
Simple representation of a discovered HTML Form.
- HTMLForm() - Constructor for class org.archive.modules.forms.HTMLForm
-
- HTMLForm.FormInput - Class in org.archive.modules.forms
-
- HTMLForm.FormInput() - Constructor for class org.archive.modules.forms.HTMLForm.FormInput
-
- HtmlFormCredential - Class in org.archive.modules.credential
-
Credential that holds everything needed to do a GET/POST to an HTML form.
- HtmlFormCredential() - Constructor for class org.archive.modules.credential.HtmlFormCredential
-
Constructor.
- HTMLLinkContext - Class in org.archive.modules.extractor
-
XPath-like context for HTML discovered URIs.
- HTMLLinkContext(String) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
-
Constructor.
- HTMLLinkContext(CharSequence, CharSequence) - Constructor for class org.archive.modules.extractor.HTMLLinkContext
-
- HTTP_BIND_ADDRESS - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- HTTP_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- HttpAuthenticationCredential - Class in org.archive.modules.credential
-
A Basic/Digest HTTP Authentication (RFC2617) credential.
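For the Basic scheme of RFC 2617, the credential is sent as `Basic` followed by the Base64 encoding of `user:password`. A minimal sketch; the class name is hypothetical, encoding the user and password as UTF-8 is an assumption, and Digest authentication is not shown:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of the Basic scheme's Authorization header value.
class BasicAuthSketch {
    static String authorizationHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the RFC's own example credentials, `authorizationHeader("Aladdin", "open sesame")` yields `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.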
- HttpAuthenticationCredential() - Constructor for class org.archive.modules.credential.HttpAuthenticationCredential
-
Constructor.
- HttpConnection - Class in org.apache.commons.httpclient
-
- HttpConnection(String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host and port.
- HttpConnection(String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host and port
using the given protocol.
- HttpConnection(String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host with the virtual
alias and port using given protocol.
- HttpConnection(String, int, String, int) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host and port via the
given proxy host and port using the default protocol.
- HttpConnection(HostConfiguration) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host configuration.
- HttpConnection(String, int, String, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Deprecated.
Use HttpConnection(String, int, String, int, Protocol).
- HttpConnection(String, int, String, int, Protocol) - Constructor for class org.apache.commons.httpclient.HttpConnection
-
Creates a new HTTP connection for the given host with the virtual
alias and port via the given proxy host and port using the given
protocol.
- HTTPContentDigest - Class in org.archive.modules.extractor
-
A processor for calculating custom HTTP content digests in place of the
default (if any) computed by the HTTP fetcher processors.
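One way a custom digest can work is to strip volatile content before hashing, so that pages differing only in, say, an embedded timestamp get the same digest. The sketch below assumes a configurable strip regex and SHA-1 with hex output; these are illustrative choices, not necessarily what HTTPContentDigest does:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

// Hypothetical sketch: SHA-1 over content after removing regex matches.
class CustomDigestSketch {
    static String sha1Hex(String content, Pattern strip) {
        String normalized = strip == null ? content : strip.matcher(content).replaceAll("");
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }
}
```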
- HTTPContentDigest() - Constructor for class org.archive.modules.extractor.HTTPContentDigest
-
Constructor.
- httpMethod - Variable in class org.archive.modules.credential.HtmlFormCredential
-
GET or POST.
- HttpMethodBase - Class in org.apache.commons.httpclient
-
An abstract base implementation of HttpMethod.
- HttpMethodBase() - Constructor for class org.apache.commons.httpclient.HttpMethodBase
-
No-arg constructor.
- HttpMethodBase(String) - Constructor for class org.apache.commons.httpclient.HttpMethodBase
-
Constructor specifying a URI.
- HttpParser - Class in org.apache.commons.httpclient
-
This class exists solely for compatibility with httpclient.
The actual functionality is in LaxHttpParser.
- HttpParser() - Constructor for class org.apache.commons.httpclient.HttpParser
-
- HTTPS_SCHEME - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- HttpState - Class in org.apache.commons.httpclient
-
A container for HTTP attributes that may persist from request
to request, such as cookies and authentication credentials.
- HttpState() - Constructor for class org.apache.commons.httpclient.HttpState
-
Default constructor.
- id - Variable in class org.archive.net.s3.S3URLConnection
-
- IdenticalDigestDecideRule - Class in org.archive.modules.deciderules.recrawl
-
Rule applies configured decision to any CrawlURIs whose prior-history
content-digest matches the latest fetch.
- IdenticalDigestDecideRule() - Constructor for class org.archive.modules.deciderules.recrawl.IdenticalDigestDecideRule
-
Usual constructor.
- IdentityCacheable - Interface in org.archive.util
-
Common interface for objects held in ObjectIdentityCaches.
- IdentityCacheableWrapper<K> - Class in org.archive.util
-
Wrapper allowing other objects to be held in an ObjectIdentityCache.
- IdentityCacheableWrapper(String, K) - Constructor for class org.archive.util.IdentityCacheableWrapper
-
- IgnoreCookiesSpec - Class in org.apache.commons.httpclient.cookie
-
A cookie spec that does nothing.
- IgnoreCookiesSpec() - Constructor for class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- IgnoreRobotsPolicy - Class in org.archive.modules.net
-
Policy to ignore robots.
- IgnoreRobotsPolicy() - Constructor for class org.archive.modules.net.IgnoreRobotsPolicy
-
- IMG_SRC - Static variable in class org.archive.modules.extractor.HTMLLinkContext
-
- importRecoverFormat(File, boolean, boolean, boolean, String) - Method in interface org.archive.crawler.framework.Frontier
-
Import URIs from the given file (in recover-log-like format, with
a 3-character 'type' tag preceding a URI with optional hops/via).
- importRecoverFormat(File, boolean, boolean, boolean, String) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Import URIs from the given file (in recover-log-like format, with
a 3-character 'type' tag preceding a URI with optional hops/via).
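A sketch of parsing one such line. It assumes the 3-character tag is two tag characters plus a space, followed by a URI and then optional space-separated hops-path and via fields; both the field layout and the example tag `F+` are assumptions:

```java
// Hypothetical sketch of one recover-log-like line: tag, URI, [hops, [via]].
class RecoverLineSketch {
    final String tag, uri, hopsPath, via;

    private RecoverLineSketch(String tag, String uri, String hopsPath, String via) {
        this.tag = tag;
        this.uri = uri;
        this.hopsPath = hopsPath;
        this.via = via;
    }

    static RecoverLineSketch parse(String line) {
        if (line.length() < 4) {
            throw new IllegalArgumentException("too short: " + line);
        }
        String tag = line.substring(0, 3).trim();           // e.g. "F+"
        String[] fields = line.substring(3).trim().split("\\s+");
        String uri = fields[0];
        String hops = fields.length > 1 ? fields[1] : "";   // optional hops-path
        String via = fields.length > 2 ? fields[2] : null;  // optional via URI
        return new RecoverLineSketch(tag, uri, hops, via);
    }
}
```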
- importRecoverLog(JSONObject, Frontier) - Static method in class org.archive.crawler.frontier.FrontierJournal
-
Utility method for scanning a recovery journal and applying it to
a Frontier.
- importURIs(String) - Method in interface org.archive.crawler.framework.Frontier
-
Load URIs from a file, for scheduling and/or considered-included
status (if from a recovery log).
- importURIs(String) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- importURIsSimple(JSONObject) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Import URIs from either a simple (one URI per line) or crawl.log
format.
- inactiveQueuesByPrecedence - Variable in class org.archive.crawler.frontier.BdbFrontier
-
All 'inactive' queues, not yet in active rotation.
- included(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- includesRetireDirective() - Method in class org.archive.modules.CrawlURI
-
- incrementConsecutiveConnectionErrors() - Method in class org.archive.modules.net.CrawlServer
-
- incrementDeferrals() - Method in class org.archive.modules.CrawlURI
-
Increment the deferral count.
- incrementDiscardedOutLinks() - Method in class org.archive.modules.CrawlURI
-
- incrementDisregardedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Increment the running count of disregarded URIs.
- incrementFailedFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Increment the running count of failed URIs.
- incrementFetchAttempts() - Method in class org.archive.modules.CrawlURI
-
Increment the count of attempts (trips through the processing
loop) at getting the document referenced by this URI.
- incrementMapCount(ConcurrentMap<String, AtomicLong>, String) - Static method in class org.archive.crawler.reporting.StatisticsTracker
-
Increment a counter for a key in a given ConcurrentMap.
- incrementMapCount(ConcurrentMap<String, AtomicLong>, String, long) - Static method in class org.archive.crawler.reporting.StatisticsTracker
-
Increment a counter for a key in a given ConcurrentMap by an arbitrary amount.
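A thread-safe way to implement such an increment over a `ConcurrentMap<String, AtomicLong>` is `computeIfAbsent` followed by `addAndGet`. This sketch is not the StatisticsTracker source; returning the new value is an added convenience:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a thread-safe per-key counter.
class MapCounterSketch {
    static long incrementMapCount(ConcurrentMap<String, AtomicLong> map, String key, long amount) {
        // computeIfAbsent installs the counter at most once per key, even under
        // contention; addAndGet then updates it atomically.
        return map.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(amount);
    }

    static long incrementMapCount(ConcurrentMap<String, AtomicLong> map, String key) {
        return incrementMapCount(map, key, 1);
    }
}
```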
- incrementQueuedUriCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Increment the running count of queued URIs.
- incrementQueuedUriCount(long) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Increment the running count of queued URIs by the given amount.
- incrementSucceededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Increment the running count of successfully fetched URIs.
- INDEX_FORMAT - Static variable in class org.archive.checkpointing.Checkpoint
-
format for serial numbers
- indexOfCurrentIterator - Variable in class org.archive.util.iterator.CompositeIterator
-
- INFERRED_MISC - Static variable in class org.archive.modules.extractor.LinkContext
-
Stand-in value for inferred urls without other context.
- inferRootPage - Variable in class org.archive.modules.extractor.ExtractorHTTP
-
Should all HTTP URIs be used to infer a link to the site's root?
- inheritFrom(CrawlURI) - Method in class org.archive.modules.CrawlURI
-
Inherit (copy) the relevant keys-values from the ancestor.
- initAllQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- initAllQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Initialize the allQueues field in an implementation-appropriate
way.
- INITARGS - Static variable in class org.archive.bdb.AutoKryo
-
- initialDelaySeconds - Variable in class org.archive.crawler.framework.ActionDirectory
-
how long after crawl start to first scan action directory
- initialize(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
Method shared by constructors.
- initialize(File) - Method in class org.archive.io.CrawlerJournal
-
- initialize() - Method in class org.archive.modules.extractor.PDFParser
-
Initialize opens the document for reading.
- initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbCache
-
Call this method when you have created an instance with the
default constructor, or when you have a deserialized instance that you
want to reconnect with an extant BDB JE environment.
- initialize(Environment, String, Class, StoredClassCatalog) - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
Call this method when you have created an instance with the
default constructor, or when you have a deserialized instance that you
want to reconnect with an extant BDB JE environment.
- initializeFromReader(BufferedReader) - Method in class org.archive.modules.net.Robotstxt
-
- initInternalQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Initializes internal queues.
- initLaunchDir() - Method in class org.archive.spring.PathSharingContext
-
- initLaunchId() - Method in class org.archive.spring.PathSharingContext
-
- initLifecycleProcessor() - Method in class org.archive.spring.PathSharingContext
-
Initialize the LifecycleProcessor.
- initOtherQueues() - Method in class org.archive.crawler.frontier.BdbFrontier
-
- initOtherQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Initialize all other internal queues in an implementation-appropriate
way.
- initOutputStream(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
Get the OutputStream for the file to write to.
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.AcceptDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.PrerequisiteAcceptDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.RejectDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- innerDecide(CrawlURI) - Method in class org.archive.modules.deciderules.SeedAcceptDecideRule
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
-
Actually extracts links.
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorCSS
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorDOC
-
Processes a word document and extracts any hyperlinks from it.
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorJS
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorPDF
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorSWF
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorUniversal
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorXML
-
- innerExtract(CrawlURI) - Method in class org.archive.modules.extractor.TrapSuppressExtractor
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
Run candidates chain on each of (1) any prerequisite, if present;
(2) any outCandidates, if present; (3) all outlinks, if appropriate
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- innerProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.Extractor
-
Processes the given URI.
- innerProcess(CrawlURI) - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchDNS
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchFTP
-
Processes the given URI.
- innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.Processor
-
Actually performs the process.
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistStoreProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.ScriptedProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- innerProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
Notes a CrawlURI's content size in its running tally.
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- innerProcessResult(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
-
- innerProcessResult(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
- innerProcessResult(CrawlURI) - Method in class org.archive.modules.Processor
-
- innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.ARCWriterProcessor
-
Writes a CrawlURI and its associated data to a store file.
- innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
Writes a CrawlURI and its associated data to a store file.
- innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- innerRejectProcess(CrawlURI) - Method in class org.archive.modules.Processor
-
Invoked after a URI has been rejected.
- innerRejectProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- innerSaveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.SimpleCookieStorage
-
- inProcessQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
All per-class queues from which a URI is outstanding.
- insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbWorkQueue
-
- insertItem(WorkQueueFrontier, CrawlURI, boolean) - Method in class org.archive.crawler.frontier.WorkQueue
-
Insert the given curi, whether it is already present or not.
- insertKeyToString(DatabaseEntry) - Static method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
- installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
-
Install the appropriate provider helper object into the WorkQueue,
if necessary.
- installProvider(WorkQueue) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy
-
- installReplicasUpTo(int) - Method in class org.archive.util.LongToIntConsistentHash
-
Install necessary replicas, if not already present.
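Judging by its name, LongToIntConsistentHash maps long keys to int values by consistent hashing, with several replica points per value on a hash ring. The sketch below assumes that design; the SplitMix64 mixing function, replica count, and method names are all hypothetical:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of consistent hashing with replicas: each bucket owns
// several points on a ring of long values, and a key maps to the bucket at
// the first point at or after the key's hash, wrapping around at the end.
class ConsistentHashSketch {
    private final TreeMap<Long, Integer> ring = new TreeMap<>();
    private final int replicasPerBucket;

    ConsistentHashSketch(int replicasPerBucket) {
        this.replicasPerBucket = replicasPerBucket;
    }

    // SplitMix64 finalizer: a cheap, well-mixed 64-bit hash.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Installs replica points for buckets 0..upTo-1 that are not already present. */
    void installBucketsUpTo(int upTo) {
        for (int bucket = 0; bucket < upTo; bucket++) {
            for (int r = 0; r < replicasPerBucket; r++) {
                ring.putIfAbsent(mix(((long) bucket << 32) | r), bucket);
            }
        }
    }

    /** Bucket for a key; installBucketsUpTo must have been called first. */
    int bucketFor(long key) {
        Map.Entry<Long, Integer> e = ring.ceilingEntry(mix(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

The appeal of the design is that adding a bucket only moves keys onto the new bucket; every other key keeps its old assignment.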
- INSTANCE - Static variable in class org.archive.modules.net.IgnoreRobotsPolicy
-
- INSTANCE - Static variable in class org.archive.modules.net.ObeyRobotsPolicy
-
- INSTANCE - Static variable in interface org.archive.util.CLibrary
-
- INSTANCE - Static variable in interface org.archive.util.FilesystemLinkMaker.Kernel32Library
-
- instance - Variable in class org.archive.util.Supplier
-
- instanceMain(String[]) - Method in class org.archive.crawler.Heritrix
-
- instanceMain(String[]) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- instanceMain(String[]) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
-
- instantiateContainer() - Method in class org.archive.crawler.framework.CrawlJob
-
Can the configuration yield an assembled ApplicationContext?
- interpolate(String) - Method in class org.archive.spring.ConfigPathConfigurer
-
- intervalSeconds - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
The interval between writing progress information to log.
- invert(DecideResult) - Static method in enum org.archive.modules.deciderules.DecideResult
-
- invokeStatic(String, Class<?>, Class<?>[], Object[]) - Method in class org.archive.bdb.AutoKryo
-
- IP_ADDRESS - Static variable in class org.archive.modules.extractor.ExtractorUniversal
-
Matches any string that begins with http:// or https:// followed by
something that looks like an IP address (four numbers, none longer than
3 chars, separated by 3 dots).
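An illustrative pattern for the description above (a sketch, not the actual IP_ADDRESS value):

```java
import java.util.regex.Pattern;

// Illustrative pattern: "http://" or "https://" followed by four groups of
// 1-3 digits separated by three dots. The trailing \b stops a group such as
// "4567" from matching on its first three digits.
class IpUrlPatternSketch {
    static final Pattern IP_URL =
            Pattern.compile("https?://\\d{1,3}(?:\\.\\d{1,3}){3}\\b");

    static boolean containsIpUrl(CharSequence text) {
        return IP_URL.matcher(text).find();
    }
}
```

Like the description, it only checks digit counts, so `http://999.1.1.1` still matches; rejecting numbers above 255 would need a stricter pattern or a numeric check.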
- IP_ADDRESS_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- IP_ADDRESS_REGEX - Static variable in class org.archive.modules.fetcher.FetchWhois
-
- IP_NEVER_EXPIRES - Static variable in class org.archive.modules.net.CrawlHost
-
Flag value indicating always-valid IP
- IP_NEVER_LOOKED_UP - Static variable in class org.archive.modules.net.CrawlHost
-
Flag value indicating an IP has not yet been looked up
- IpAddressSetDecideRule - Class in org.archive.modules.deciderules
-
IpAddressSetDecideRule must be used with
Preselector.setRecheckScope(boolean) set to true, because it relies on
Heritrix's DNS lookup to establish the IP address for a URI before it can run.
- IpAddressSetDecideRule() - Constructor for class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- IPQueueAssignmentPolicy - Class in org.archive.crawler.frontier
-
Uses target IP as basis for queue-assignment, unless it is unavailable,
in which case it behaves as HostnameQueueAssignmentPolicy.
- IPQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.IPQueueAssignmentPolicy
-
- is2XXSuccess() - Method in class org.archive.modules.CrawlURI
-
- isAborted() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Tests whether the execution of this method has been aborted
- isActive() - Method in class org.archive.crawler.framework.CrawlController
-
Is this crawl actively able/trying to crawl? Includes both
states RUNNING and EMPTY.
- isActive() - Method in class org.archive.crawler.framework.ToeThread
-
Is this thread validly processing a URI, not paused, waiting for
a URI, or interrupted?
- isAllowCreate() - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- isARCType(String) - Method in class org.archive.io.Warc2Arc
-
- isAuthenticationPreemptive() - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use HttpClientParams.isAuthenticationPreemptive(), HttpClient.getParams().
- isCheckpointing() - Method in class org.archive.crawler.framework.CheckpointService
-
- isCheckpointRecovery - Variable in class org.archive.modules.fetcher.BdbCookieStorage
-
Are we a checkpoint recovery (in which case, reuse stored cookie data)?
- isCheckpointRecovery - Variable in class org.archive.modules.net.BdbServerCache
-
- isConnectionCloseForced() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Tests if the connection should be force-closed when no longer needed.
- isDisregarded(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- isDomainAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
-
Returns true if cookie's domain was set via a domain
attribute in the Set-Cookie header.
- isEmpty() - Method in class org.archive.bdb.StoredQueue
-
- isEmpty() - Method in interface org.archive.crawler.framework.Frontier
-
Returns true if the frontier contains no more URIs to crawl.
- isEmpty() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Frontier is empty only if all queues are empty and no URIs are in process.
- isEmpty() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Return whether frontier is exhausted: all crawlable URIs done (none
waiting or pending).
- isEveryTime() - Method in class org.archive.modules.credential.Credential
-
- isEveryTime() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- isEveryTime() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- isExpired() - Method in class org.apache.commons.httpclient.Cookie
-
Returns true if this cookie has expired.
- isExpired(Date) - Method in class org.apache.commons.httpclient.Cookie
-
Returns true if this cookie has expired according to the time passed in.
- isExpired() - Method in class org.archive.crawler.restlet.Flash
-
Indicate whether the Flash should persist.
- isFailure() - Method in class org.archive.crawler.restlet.models.ScriptModel
-
- isFinished() - Method in class org.archive.crawler.framework.CrawlController
-
- isHtmlExpectedHere(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Test whether this HTML is so unexpected (eg in place of a GIF URI)
that it shouldn't be scanned for links.
- isHttp11() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpMethodParams.getVersion()
- isHttpTransaction() - Method in class org.archive.modules.CrawlURI
-
Return true if this is an HTTP transaction.
- isInScope(CrawlURI) - Method in class org.archive.crawler.framework.Scoper
-
Schedule the given CrawlURI with the Frontier.
- isInScope(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- isIpExpired(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
Return true if the IP should be looked up.
- isLaunchable() - Method in class org.archive.crawler.framework.CrawlJob
-
Is it reasonable to offer a launch button?
- isLaunchInfoPartial - Variable in class org.archive.crawler.framework.CrawlJob
-
- isLaunchInfoPartial() - Method in class org.archive.crawler.framework.CrawlJob
-
- isLikelyFalsePositive(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isLikelyUriHtmlContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isLikelyUriJavascriptContextLegacy(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isLocation() - Method in class org.archive.modules.CrawlURI
-
- isLocked() - Method in class org.apache.commons.httpclient.HttpConnection
-
Tests if the connection is locked.
- isManaged - Variable in class org.archive.crawler.frontier.WorkQueue
-
Whether the queue is already in a lifecycle stage.
- isManaged() - Method in class org.archive.crawler.frontier.WorkQueue
-
Whether the queue is already in a lifecycle stage --
such as ready, in-progress, snoozed -- and thus should
not be redundantly inserted to readyClassQueues
- isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.CustomRobotsPolicy
-
- isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- isObeyMetaRobotsNofollow() - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- isolateThreads - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
Whether each ToeThread should get its own independent script
engine, or they should share synchronized access to one
engine.
- isolateThreads - Variable in class org.archive.modules.ScriptedProcessor
-
Whether each ToeThread should get its own independent script
engine, or they should share synchronized access to one
engine.
- isOpen - Variable in class org.apache.commons.httpclient.HttpConnection
-
Whether or not the connection is connected.
- isOpen() - Method in class org.apache.commons.httpclient.HttpConnection
-
Tests if the connection is open.
- isOverSessionBudget() - Method in class org.archive.crawler.frontier.WorkQueue
-
Check whether queue has temporarily (session) exceeded its budget.
- isOverTotalBudget() - Method in class org.archive.crawler.frontier.WorkQueue
-
Check whether queue has permanently (total) exceeded its budget.
- isPaintable() - Method in class org.archive.io.ReadSourceEditor
-
- isPaintable() - Method in class org.archive.spring.ConfigPathEditor
-
- isPathAttributeSpecified() - Method in class org.apache.commons.httpclient.Cookie
-
Returns true if cookie's path was set via a path attribute
in the Set-Cookie header.
- isPausable() - Method in class org.archive.crawler.framework.CrawlJob
-
- isPaused() - Method in class org.archive.crawler.framework.CrawlController
-
Tells whether the controller is paused.
- isPausing() - Method in class org.archive.crawler.framework.CrawlController
-
- isPersistent() - Method in class org.apache.commons.httpclient.Cookie
-
Returns false if the cookie should be discarded at the end
of the "session"; true otherwise.
- isPossibleUri(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isPost() - Method in class org.archive.modules.credential.Credential
-
- isPost() - Method in class org.archive.modules.credential.HtmlFormCredential
-
- isPost() - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- isPrerequisite() - Method in class org.archive.modules.CrawlURI
-
Returns true if this CrawlURI is a prerequisite.
- isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.Credential
-
- isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- isPrerequisite(CrawlURI) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- isProfile() - Method in class org.archive.crawler.framework.CrawlJob
-
Is this job a 'profile' (or template), meaning it may be edited
or copied to other jobs, but should not be launched.
- isProxied() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns true if the connection is established via a proxy,
false otherwise.
- isQuadAddress(CrawlURI, String, CrawlHost) - Method in class org.archive.modules.fetcher.FetchDNS
-
- isRequestSent() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns true if the HTTP request has been transmitted to the target
server in its entirety, false otherwise.
- isResponseAvailable() - Method in class org.apache.commons.httpclient.HttpConnection
-
Tests if input data is available.
- isResponseAvailable(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
Tests if input data becomes available within the given period of time, in milliseconds.
- isRetired() - Method in class org.archive.crawler.frontier.WorkQueue
-
- isRobotsExpired(int) - Method in class org.archive.modules.net.CrawlServer
-
Is the robots policy expired?
- isRunning - Variable in class org.archive.bdb.BdbModule
-
- isRunning() - Method in class org.archive.bdb.BdbModule
-
- isRunning() - Method in class org.archive.crawler.framework.ActionDirectory
-
- isRunning - Variable in class org.archive.crawler.framework.CheckpointService
-
- isRunning() - Method in class org.archive.crawler.framework.CheckpointService
-
- isRunning - Variable in class org.archive.crawler.framework.CrawlController
-
- isRunning() - Method in class org.archive.crawler.framework.CrawlController
-
- isRunning() - Method in class org.archive.crawler.framework.CrawlJob
-
- isRunning - Variable in class org.archive.crawler.framework.Scoper
-
- isRunning() - Method in class org.archive.crawler.framework.Scoper
-
- isRunning() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- isRunning() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- isRunning() - Method in class org.archive.crawler.processor.CrawlMapper
-
- isRunning - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- isRunning() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- isRunning - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- isRunning() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- isRunning - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- isRunning() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- isRunning - Variable in class org.archive.modules.deciderules.DecideRuleSequence
-
- isRunning() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- isRunning - Variable in class org.archive.modules.fetcher.AbstractCookieStorage
-
- isRunning() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- isRunning() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- isRunning() - Method in class org.archive.modules.fetcher.FetchWhois
-
- isRunning - Variable in class org.archive.modules.net.BdbServerCache
-
- isRunning() - Method in class org.archive.modules.net.BdbServerCache
-
- isRunning - Variable in class org.archive.modules.Processor
-
- isRunning() - Method in class org.archive.modules.Processor
-
- isRunning - Variable in class org.archive.modules.ProcessorChain
-
- isRunning() - Method in class org.archive.modules.ProcessorChain
-
- isRunning() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- isRunning() - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- isRunning() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- isSecure() - Method in class org.apache.commons.httpclient.HttpConnection
-
Returns true if the connection is established over
a secure protocol.
- isSeed() - Method in class org.archive.modules.CrawlURI
-
- isStale() - Method in class org.apache.commons.httpclient.HttpConnection
-
Determines whether this connection is "stale", which is to say that either
it is no longer open, or an attempt to read the connection would fail.
- isStaleCheckingEnabled() - Method in class org.apache.commons.httpclient.HttpConnection
-
- isStopComplete - Variable in class org.archive.crawler.framework.CrawlController
-
- isStopComplete() - Method in class org.archive.crawler.framework.CrawlController
-
- isStrictMode() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpParams.setParameter(String, Object)
to exercise a more granular control over HTTP protocol strictness.
- isSuccess() - Method in class org.archive.modules.CrawlURI
-
Ask this URI if it was a success or not.
- isSuccess(CrawlURI) - Static method in class org.archive.modules.Processor
-
- isTransactional() - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- isTransparent() - Method in class org.apache.commons.httpclient.HttpConnection
-
Indicates if the connection is completely transparent from end to end.
- isUnicode() - Method in class org.archive.util.ms.Piece
-
- isUnpausable() - Method in class org.archive.crawler.framework.CrawlJob
-
- isValidRobots() - Method in class org.archive.modules.net.CrawlServer
-
If true then valid robots.txt information has been retrieved.
- isVeryLikelyUri(CharSequence) - Static method in class org.archive.util.UriUtils
-
- isXmlOk() - Method in class org.archive.crawler.framework.CrawlJob
-
Is the primary config file legal XML?
- iterator() - Method in class org.archive.bdb.StoredQueue
-
- iterator() - Method in class org.archive.modules.ProcessorChain
-
- iterator - Variable in class org.archive.util.Iteratorable
-
- iterator() - Method in class org.archive.util.Iteratorable
-
- iterator() - Method in class org.archive.util.Transform
-
- Iteratorable<K> - Class in org.archive.util
-
Make an Iterator usable as an Iterable (and thus enable new-style
for-each loops).
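A minimal sketch of the adapter idea the Iteratorable entry describes: wrapping an Iterator so it can drive a for-each loop. The IteratorAsIterable class is hypothetical and is not the Heritrix source. One caveat of this approach: because iterator() hands back the same wrapped Iterator every time, the resulting Iterable can only be traversed once.

```java
import java.util.Iterator;
import java.util.List;

public class IteratorAsIterableDemo {
    /** Hypothetical adapter (not the Heritrix Iteratorable source): exposes an Iterator as an Iterable. */
    static final class IteratorAsIterable<K> implements Iterable<K> {
        private final Iterator<K> iterator;

        IteratorAsIterable(Iterator<K> iterator) {
            this.iterator = iterator;
        }

        @Override
        public Iterator<K> iterator() {
            // Returns the same underlying Iterator each time, so the Iterable is single-use.
            return iterator;
        }
    }

    /** Concatenates the elements using a for-each loop over the adapted Iterator. */
    static String join(Iterator<String> it) {
        StringBuilder sb = new StringBuilder();
        for (String s : new IteratorAsIterable<>(it)) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("a", "b", "c").iterator())); // prints "abc"
    }
}
```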
- Iteratorable(Iterator<K>) - Constructor for class org.archive.util.Iteratorable
-
- iterators - Variable in class org.archive.util.iterator.CompositeIterator
-
- m - Variable in class org.archive.util.BloomFilter64bit
-
The number of bits in this filter.
- main(String[]) - Static method in class org.archive.crawler.frontier.precedence.PrecedenceLoader
-
Utility main for importing a text file (first argument) with lines of
the form:
URI [whitespace] precedence
into a BDB-JE environment (second argument, created if necessary).
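The PrecedenceLoader entry describes input lines of the form `URI [whitespace] precedence`. The sketch below covers only parsing that line format and does not touch BDB-JE. The class and method names are hypothetical, and treating the precedence as an integer, skipping blank lines, and rejecting malformed lines are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrecedenceLineParser {
    /**
     * Parses lines of the form "URI [whitespace] precedence" into an ordered map.
     * Hypothetical helper: integer precedence, ignoring blank lines, and
     * rejecting malformed lines are assumptions, not confirmed Heritrix behavior.
     */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumed: blank lines are ignored
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected 'URI precedence': " + line);
            }
            result.put(parts[0], Integer.parseInt(parts[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> parsed = parse(List.of(
                "http://example.com/ 1",
                "",
                "http://example.org/page\t3"));
        System.out.println(parsed); // prints {http://example.com/=1, http://example.org/page=3}
    }
}
```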
- main(String[]) - Static method in class org.archive.crawler.Heritrix
-
Launches a local Engine and RESTful web interface given the
command-line options or defaults.
- main(String[]) - Static method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- main(String[]) - Static method in class org.archive.crawler.util.BenchmarkUriUniqFilters
-
Test the UriUniqFilter implementation (MemUriUniqFilter,
BloomUriUniqFilter, or BdbUriUniqFilter) named in first
argument against the file of one-per-line URIs named
in the second argument.
- main(String[]) - Static method in class org.archive.crawler.util.RecoveryLogMapper
-
- main(String[]) - Static method in class org.archive.io.Arc2Warc
-
Command-line interface to Arc2Warc.
- main(String[]) - Static method in class org.archive.io.Warc2Arc
-
Command-line interface to Warc2Arc.
- main(String[]) - Static method in class org.archive.modules.extractor.PDFParser
-
- main(String[]) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
Utility main for importing a log into a BDB-JE environment or moving a
database between environments (2 arguments), or simply dumping a log
to stderr in a more readable format (1 argument).
- main(String[]) - Static method in class org.archive.util.Base32
-
For testing, take a command-line argument in Base32, decode it, print it in hex,
then encode it again and print the result.
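The Base32 main entry describes a decode, print-as-hex, re-encode round trip. A self-contained sketch of that round trip, assuming the RFC 4648 alphabet (A-Z, 2-7) with no padding characters; whether org.archive.util.Base32 uses exactly this alphabet and padding behavior is an assumption, and the Base32Sketch class is hypothetical.

```java
import java.io.ByteArrayOutputStream;

public class Base32Sketch {
    // RFC 4648 Base32 alphabet (assumed; not confirmed for org.archive.util.Base32).
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    /** Encodes bytes as unpadded Base32, emitting one character per 5 bits. */
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0;
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the leftover bits in one final character.
            sb.append(ALPHABET.charAt((buffer << (5 - bits)) & 31));
        }
        return sb.toString();
    }

    /** Decodes unpadded Base32 (case-insensitive); trailing partial bits are dropped. */
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0;
        int bits = 0;
        for (char c : text.toCharArray()) {
            int value = ALPHABET.indexOf(Character.toUpperCase(c));
            if (value < 0) {
                throw new IllegalArgumentException("not a Base32 character: " + c);
            }
            buffer = (buffer << 5) | value;
            bits += 5;
            if (bits >= 8) {
                out.write((buffer >> (bits - 8)) & 0xFF);
                bits -= 8;
            }
        }
        return out.toByteArray();
    }

    /** Formats bytes as lowercase hex, two digits per byte. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "MZXW6YTBOI";
        byte[] decoded = decode(input);
        System.out.println(toHex(decoded)); // prints 666f6f626172 for the default input
        System.out.println(encode(decoded)); // prints MZXW6YTBOI for the default input
    }
}
```

The default input is the RFC 4648 test vector for "foobar" with its padding stripped, so the round trip can be checked by eye.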
- main(String[]) - Static method in class org.archive.util.FilesystemLinkMaker
-
- main(String[]) - Static method in class org.archive.util.JndiUtils
-
Testing code.
- main(String[]) - Static method in class org.archive.util.OneLineSimpleLogger
-
Test this logger.
- make(long, int) - Static method in class st.ata.util.FPGenerator
-
Return a fingerprint generator.
- makeBindings(Map<String, ExtractorMultipleRegex.MatchList>, String[], int) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- makeConsequentCandidate(String, LinkContext, Hop) - Method in class org.archive.modules.CrawlURI
-
Create a consequent CrawlURI from this one, given the
additional parameters.
- makeData(String, String) - Method in class org.archive.modules.extractor.StringExtractorTestBase
-
- makeDataModel() - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
Constructs a nested Map data structure with the information represented
by this Resource.
- makeDataModel() - Method in class org.archive.crawler.restlet.EngineResource
-
Constructs a nested Map data structure with the information represented
by this Resource.
- makeDataModel() - Method in class org.archive.crawler.restlet.JobResource
-
Constructs a nested Map data structure with the information represented
by this Resource.
- makeDataModel() - Method in class org.archive.crawler.restlet.ScriptResource
-
Constructs a nested Map data structure with the information represented
by this Resource.
- makeDirty() - Method in class org.archive.crawler.frontier.WorkQueue
-
- makeDirty() - Method in class org.archive.crawler.reporting.SeedRecord
-
- makeDirty() - Method in class org.archive.modules.net.CrawlHost
-
- makeDirty() - Method in class org.archive.modules.net.CrawlServer
-
- makeDirty() - Method in interface org.archive.util.IdentityCacheable
-
- makeDirty() - Method in class org.archive.util.IdentityCacheableWrapper
-
- makeExtractor() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Subclasses should return an Extractor instance to test.
- makeHardLink(String, String) - Static method in class org.archive.util.FilesystemLinkMaker
-
Wrapper over platform-dependent system calls to create a hard link.
- makeHeritable(String) - Method in class org.archive.modules.CrawlURI
-
Make the given key 'heritable', meaning its value will be
added to descendant CrawlURIs.
- makeLongFPSet() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
- makeModule() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
- makeModule() - Method in class org.archive.state.ModuleTestBase
-
Return an example instance of the module.
- makeNonHeritable(String) - Method in class org.archive.modules.CrawlURI
-
Make the given key non-'heritable', meaning its value will
not be added to descendant CrawlURIs.
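makeHeritable and makeNonHeritable describe per-key inheritance: only keys marked heritable are carried into descendant CrawlURIs. The sketch below models that rule with a plain map and a set of heritable key names. HeritableData and its methods are hypothetical and say nothing about how CrawlURI actually stores its data. Keeping an inherited key heritable in the child, so that it keeps propagating, is also an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class HeritableDataDemo {
    /** Hypothetical model of per-key inheritance; not the CrawlURI implementation. */
    static final class HeritableData {
        final Map<String, Object> data = new HashMap<>();
        final Set<String> heritableKeys = new HashSet<>();

        void makeHeritable(String key) {
            heritableKeys.add(key);
        }

        void makeNonHeritable(String key) {
            heritableKeys.remove(key);
        }

        /** Creates a child that receives only the heritable entries. */
        HeritableData child() {
            HeritableData c = new HeritableData();
            for (String key : heritableKeys) {
                if (data.containsKey(key)) {
                    c.data.put(key, data.get(key));
                    c.heritableKeys.add(key); // assumed: stays heritable in the child
                }
            }
            return c;
        }
    }

    public static void main(String[] args) {
        HeritableData parent = new HeritableData();
        parent.data.put("sessionId", "abc");
        parent.data.put("fetchTime", 123L);
        parent.makeHeritable("sessionId");
        System.out.println(parent.child().data); // prints {sessionId=abc}
    }
}
```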
- makeOne(String, boolean, String) - Method in class org.archive.net.UURIFactory
-
- makeOne(UsableURI, UsableURI) - Method in class org.archive.net.UURIFactory
-
- makePackageSuite(Class<?>) - Static method in class org.archive.util.TestUtils
-
- makePresentableMapFor(String, Object) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
Constructs a nested Map data structure of the information represented
by object.
- makePresentableMapFor(String, Object, String) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
Constructs a nested Map data structure of the information represented
by object.
- makePresentableMapFor(String, Object, HashSet<Object>, String) - Method in class org.archive.crawler.restlet.JobRelatedResource
-
Constructs a nested Map data structure of the information represented
by object.
- makeSpace() - Method in class org.archive.util.AbstractLongFPSet
-
Make additional space to keep the load under the target
loadFactor level.
- makeSpace() - Method in class org.archive.util.fingerprint.LongFPSetCache
-
- makeSpace() - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- makeSuite(File, File) - Static method in class org.archive.util.TestUtils
-
- makeSymbolicLink(String, String) - Static method in class org.archive.util.FilesystemLinkMaker
-
Wrapper over platform-dependent system calls to create a symbolic link.
- makeTempDir() - Static method in class org.archive.modules.net.DefaultTempDirProvider
-
- makeWhoisUrl(String, String) - Method in class org.archive.modules.fetcher.FetchWhois
-
- managementTasks() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Main loop of frontier's managerThread.
- MANAGER - Static variable in class org.archive.crawler.framework.ActionDirectory
-
shared ScriptEngineManager
- MANAGER - Static variable in class org.archive.crawler.restlet.ScriptResource
-
- managerThread - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
Distinguished frontier manager thread which handles all juggling
of URI queues and queues/maps of queues for proper ordering/delay of
URI processing.
- MANIFEST_CONFIG_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
abbreviation label for config files in manifest
- MANIFEST_LOG_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
abbreviation label for log files in manifest
- MANIFEST_REPORT_FILE - Static variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
abbreviation label for report files in manifest
- map(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
-
Look up the crawler node name to which the given CrawlURI
should be mapped.
- map(CrawlURI) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
Look up the crawler node name to which the given CrawlURI
should be mapped.
- map - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
-
Mapping of classKey ranges (as represented by their start) to
crawlers (by abstract name/filename)
- map(CrawlURI) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
Look up the crawler node name to which the given CrawlURI
should be mapped.
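The LexicalCrawlMapper map field keys each classKey range by its start, so looking up a crawler amounts to finding the greatest range start at or below the given key. The sketch below uses TreeMap.floorEntry for that lookup. It assumes the first range starts at the empty string, so every key falls into some range. The class name and example keys are hypothetical, and the sketch does not show how the real class loads its map file from mapPath or mapUri.

```java
import java.util.Map;
import java.util.TreeMap;

public class LexicalRangeMapDemo {
    /** Returns the crawler whose range start is the greatest key <= classKey, or null if none. */
    static String mapClassKey(TreeMap<String, String> rangeStarts, String classKey) {
        Map.Entry<String, String> e = rangeStarts.floorEntry(classKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> ranges = new TreeMap<>();
        ranges.put("", "crawlerA");  // keys sorting before "m"
        ranges.put("m", "crawlerB"); // keys from "m" up to (not including) "t"
        ranges.put("t", "crawlerC"); // keys from "t" onward
        System.out.println(mapClassKey(ranges, "com,example,")); // prints crawlerA
        System.out.println(mapClassKey(ranges, "org,archive,")); // prints crawlerB
        System.out.println(mapClassKey(ranges, "uk,co,bbc,"));   // prints crawlerC
    }
}
```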
- map - Variable in class org.archive.spring.Sheet
-
map of full property-paths (from BeanFactory to individual
property) and their changed value when this Sheet of overrides
is in effect
- map - Variable in class org.archive.util.ObjectIdentityMemCache
-
- mapPath - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
-
Path to map specification file.
- mapString(String, String, long) - Static method in class org.archive.crawler.processor.HashCrawlMapper
-
- mapUri - Variable in class org.archive.crawler.processor.LexicalCrawlMapper
-
URI to map specification file.
- markAsSeen(int, int) - Method in class org.archive.modules.extractor.PDFParser
-
Note that an object (id/generation pair) has been seen by this parser
so that it can be handled differently when it is encountered again.
- markPrerequisite(String) - Method in class org.archive.modules.CrawlURI
-
Do all actions associated with setting a CrawlURI
as requiring a prerequisite.
- marshal(XmlWriter, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- marshal(XmlWriter, String, Map<?, ?>) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- marshal(XmlWriter, String, Iterable<?>) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- marshal(XmlWriter, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- marshalAsElement(Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
- marshalBean(XmlWriter, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
Generate nested XML structure for a bean obj.
- marshalDocument(Writer, String, Object) - Static method in class org.archive.crawler.restlet.XmlMarshaller
-
Writes content as XML to writer.
- match(String, int, String, boolean, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Determines if a Cookie matches a location.
- match(String, int, String, boolean, Cookie[]) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Deprecated.
Use match(String, int, String, boolean, SortedMap).
- match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Determines which of an array of Cookies matches a location.
- match(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Return true if the cookie should be submitted with a request
with given attributes, false otherwise.
- match(String, int, String, boolean, Cookie[]) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Deprecated.
Use match(String, int, String, boolean, SortedMap).
- match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Return an array of Cookies that should be submitted with a
request with the given attributes.
- match(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- match(String, int, String, boolean, Cookie[]) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Returns an empty cookie array.
- match(String, int, String, boolean, SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- MatchesFilePatternDecideRule - Class in org.archive.modules.deciderules
-
Compares suffix of a passed CrawlURI, UURI, or String against a regular
expression pattern, applying its configured decision to all matches.
- MatchesFilePatternDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesFilePatternDecideRule
-
Usual constructor.
- MatchesFilePatternDecideRule.Preset - Enum in org.archive.modules.deciderules
-
- MatchesListRegexDecideRule - Class in org.archive.modules.deciderules
-
Rule applies configured decision to any CrawlURIs whose String URI
matches the supplied regexes.
- MatchesListRegexDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
Usual constructor.
- MatchesRegexDecideRule - Class in org.archive.modules.deciderules
-
Rule applies configured decision to any CrawlURIs whose String URI
matches the supplied regex.
- MatchesRegexDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesRegexDecideRule
-
Usual constructor.
- MatchesStatusCodeDecideRule - Class in org.archive.modules.deciderules
-
Provides a rule that returns "true" for any CrawlURIs which have a fetch
status code that falls within the provided inclusive range.
- MatchesStatusCodeDecideRule() - Constructor for class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Creates a new MatchesStatusCodeDecideRule instance.
- MAX_SIZE - Static variable in class org.archive.modules.net.Robotstxt
-
- MAX_SNOOZED_IN_MEMORY - Static variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
- maxBytesDownload - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
-
Maximum number of bytes to download.
- maxDocumentsDownload - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
-
Maximum number of documents to download.
- maxFileSizeBytes - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Max size for each file.
- maxFileSizeBytes - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Max size of each file.
- maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
-
- maximumNumberOfKeys() - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
-
Returns the maximum number of different keys this policy
can create.
- maxPathLength - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Maximum file system path length.
- maxPending - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
size at which to force flush of pending items
- maxQueuesPerReportCategory - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
truncate reporting of queues at this large but not unbounded number
- maxSegLength - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Maximum file system path segment length.
- maxsize - Variable in class org.archive.crawler.util.TopNSet
-
- maxTimeSeconds - Variable in class org.archive.crawler.framework.CrawlLimitEnforcer
-
Maximum amount of time to crawl (in seconds).
- maxToeThreads - Variable in class org.archive.crawler.framework.CrawlController
-
Maximum number of threads processing URIs at the same time.
- maxTotalBytesToWrite - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Total file bytes to write to disk.
- maxWaitForIdleMs - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Maximum time to wait on idle writer before (possibly) creating an
additional instance.
- MEDIUM - Static variable in class org.archive.modules.SchedulingConstants
-
Medium priority.
- MemFPMergeUriUniqFilter - Class in org.archive.crawler.util
-
Crude all-in-memory FP-merging UriUniqFilter.
- MemFPMergeUriUniqFilter() - Constructor for class org.archive.crawler.util.MemFPMergeUriUniqFilter
-
- MemLongFPSet - Class in org.archive.util.fingerprint
-
Open-addressing in-memory hash set for holding primitive long fingerprints.
- MemLongFPSet() - Constructor for class org.archive.util.fingerprint.MemLongFPSet
-
- MemLongFPSet(int, float) - Constructor for class org.archive.util.fingerprint.MemLongFPSet
-
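MemLongFPSet is described as an open-addressing in-memory hash set of primitive long fingerprints. makeSpace (AbstractLongFPSet) is described as growing the table to keep the load under loadFactor. The sketch below combines the two ideas using linear probing and table doubling. The probing scheme, the multiplicative hash constant, and the class name are all assumptions, not the Heritrix implementation.

```java
public class LongOpenHashSetSketch {
    private long[] slots;
    private boolean[] used;
    private int count;
    private final float loadFactor;

    /** initialCapacity is rounded up to a power of two; loadFactor must be in (0, 1). */
    LongOpenHashSetSketch(int initialCapacity, float loadFactor) {
        if (!(loadFactor > 0f && loadFactor < 1f)) {
            throw new IllegalArgumentException("loadFactor must be in (0, 1)");
        }
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        this.slots = new long[capacity];
        this.used = new boolean[capacity];
        this.loadFactor = loadFactor;
    }

    /** Spreads fingerprint bits, then masks to the power-of-two table size. */
    private static int indexFor(long fp, int mask) {
        long h = fp * 0x9E3779B97F4A7C15L;
        return (int) (h >>> 32) & mask;
    }

    boolean contains(long fp) {
        int mask = slots.length - 1;
        // Linear probing: scan forward until an empty slot ends the probe chain.
        for (int i = indexFor(fp, mask); used[i]; i = (i + 1) & mask) {
            if (slots[i] == fp) {
                return true;
            }
        }
        return false;
    }

    /** Adds fp; returns false if it was already present. */
    boolean add(long fp) {
        if (contains(fp)) {
            return false;
        }
        while (count + 1 > slots.length * loadFactor) {
            makeSpace();
        }
        insert(fp);
        count++;
        return true;
    }

    int size() {
        return count;
    }

    private void insert(long fp) {
        int mask = slots.length - 1;
        int i = indexFor(fp, mask);
        while (used[i]) {
            i = (i + 1) & mask;
        }
        slots[i] = fp;
        used[i] = true;
    }

    /** Doubles the table and re-inserts every fingerprint, keeping load under loadFactor. */
    private void makeSpace() {
        long[] oldSlots = slots;
        boolean[] oldUsed = used;
        slots = new long[oldSlots.length * 2];
        used = new boolean[oldUsed.length * 2];
        for (int i = 0; i < oldSlots.length; i++) {
            if (oldUsed[i]) {
                insert(oldSlots[i]);
            }
        }
    }

    public static void main(String[] args) {
        LongOpenHashSetSketch set = new LongOpenHashSetSketch(4, 0.75f);
        for (long fp = 0; fp < 1000; fp++) {
            set.add(fp * 31);
        }
        System.out.println(set.size());           // prints 1000
        System.out.println(set.contains(31 * 7)); // prints true
        System.out.println(set.add(0));           // prints false (duplicate)
    }
}
```

Because the load is always kept strictly below 1, at least one slot stays empty, which is what lets the probe loops in contains and insert terminate.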
- memMap - Variable in class org.archive.util.ObjectIdentityBdbCache
-
in-memory map of new/recent/still-referenced-elsewhere instances
- memMap - Variable in class org.archive.util.ObjectIdentityBdbManualCache
-
in-memory map of new/recent/still-referenced-elsewhere instances
- MemUriUniqFilter - Class in org.archive.crawler.util
-
A purely in-memory UriUniqFilter based on a HashSet, which remembers
every full URI string it sees.
- MemUriUniqFilter() - Constructor for class org.archive.crawler.util.MemUriUniqFilter
-
- merge(ConfigPath) - Method in class org.archive.spring.ConfigPath
-
To maintain ConfigPath's 'base' and object-identity, this merge
should be used to update ConfigPath properties in other beans,
rather than discarding the old value.
- mergeDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- mergeDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- mergePrior(CrawlURI) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
Merge any data from the Map stored in the URI-history store into the
current instance.
- message - Variable in class org.archive.crawler.event.CrawlStateEvent
-
- message - Variable in class org.archive.crawler.restlet.Flash
-
the message to show, if any
- META - Static variable in class org.archive.modules.extractor.HTMLLinkContext
-
- META_HREF - Static variable in class org.archive.modules.extractor.HTMLLinkContext
-
- metadata - Variable in class org.archive.crawler.framework.CrawlController
-
- metadata - Variable in class org.archive.crawler.postprocessor.DispositionProcessor
-
Auto-discovered module providing configured (or overridden)
User-Agent value and RobotsHonoringPolicy
- metadata - Variable in class org.archive.crawler.prefetch.PreconditionEnforcer
-
Auto-discovered module providing configured (or overridden)
User-Agent value and RobotsHonoringPolicy
- metadata - Variable in class org.archive.modules.extractor.ExtractorHTML
-
CrawlMetadata provides the robots honoring policy to use when
considering a robots META tag.
- MigrateH1to3Tool - Class in org.archive.crawler.migrate
-
Utility class which takes an H1 order.xml and creates a similar
H3 job directory, with as many simple settings converted over
(as top-of-crawler-beans overrides) as possible at this time.
- MigrateH1to3Tool() - Constructor for class org.archive.crawler.migrate.MigrateH1to3Tool
-
- mimeTypeBytes - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- mimeTypeDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Keep track of the file types we see (mime type -> count)
- MimetypesReport - Class in org.archive.crawler.reporting
-
The "Mimetypes Report", tallies by MIME type.
- MimetypesReport() - Constructor for class org.archive.crawler.reporting.MimetypesReport
-
- MIN_ROBOTS_RETRIES - Static variable in class org.archive.modules.net.CrawlServer
-
only check if robots-fetch is perhaps superfluous
after this many tries
- MirrorWriterProcessor - Class in org.archive.modules.writer
-
Processor module that writes the results of successful fetches to
files on disk.
- MirrorWriterProcessor() - Constructor for class org.archive.modules.writer.MirrorWriterProcessor
-
- ModuleTestBase - Class in org.archive.state
-
Base class for unit testing Module implementations.
- ModuleTestBase() - Constructor for class org.archive.state.ModuleTestBase
-
Magical constructor that attempts to auto-create static key field
descriptions for your module class.
- monitorConfigPaths - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- monitorMounts - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
List of filesystem mounts whose 'available' space should be monitored
via 'df' (if available).
- monitorPaths - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- MostFavoredRobotsPolicy - Class in org.archive.modules.net
-
Follow a most-favored robots policy -- allowing a URL if either the
conventionally-configured User-Agent, or any of a number of alternate
User-Agents (from the candidateUserAgents list) would be allowed.
- MostFavoredRobotsPolicy() - Constructor for class org.archive.modules.net.MostFavoredRobotsPolicy
-
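The MostFavoredRobotsPolicy description boils down to an "allowed for any of these agents" check across the configured User-Agent and the candidateUserAgents list. The sketch below uses a BiPredicate as a stand-in for robots.txt evaluation. The predicate, the helper name, and the toy rules are hypothetical and do not reflect Heritrix's Robotstxt API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class MostFavoredSketch {
    /**
     * Allows the path if the configured User-Agent, or any candidate User-Agent,
     * would be allowed. The rules predicate stands in for a parsed robots.txt (hypothetical).
     */
    static boolean allows(BiPredicate<String, String> rules,
                          String userAgent,
                          List<String> candidateUserAgents,
                          String path) {
        List<String> agents = new ArrayList<>();
        agents.add(userAgent);
        agents.addAll(candidateUserAgents);
        return agents.stream().anyMatch(agent -> rules.test(agent, path));
    }

    public static void main(String[] args) {
        // Toy rules: "mybot" is disallowed under /private/, every other agent is allowed.
        BiPredicate<String, String> rules =
                (agent, path) -> !(agent.equals("mybot") && path.startsWith("/private/"));
        System.out.println(allows(rules, "mybot", List.of(), "/private/x"));           // prints false
        System.out.println(allows(rules, "mybot", List.of("otherbot"), "/private/x")); // prints true
    }
}
```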
- PaddingStringBuffer - Class in org.archive.util
-
StringBuffer-like utility which can add spaces to reach a certain column.
- PaddingStringBuffer() - Constructor for class org.archive.util.PaddingStringBuffer
-
Create a new PaddingStringBuffer
- padTo(int) - Method in class org.archive.util.PaddingStringBuffer
-
Pad to a given column.
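PaddingStringBuffer pads output out to a column, which is useful for aligned plain-text reports. The sketch below assumes that columns are counted from 0 at the start of the current line, meaning the text after the last newline. The ColumnPadder class is hypothetical, and the real class's column semantics may differ.

```java
public class ColumnPadder {
    private final StringBuilder sb = new StringBuilder();
    private int lineStart = 0; // index in sb where the current line begins

    ColumnPadder append(String s) {
        sb.append(s);
        int nl = s.lastIndexOf('\n');
        if (nl >= 0) {
            // The current line starts just after the last newline in s.
            lineStart = sb.length() - s.length() + nl + 1;
        }
        return this;
    }

    /** Appends spaces until the current line is at least col characters long (0-based column). */
    ColumnPadder padTo(int col) {
        while (sb.length() - lineStart < col) {
            sb.append(' ');
        }
        return this;
    }

    ColumnPadder newline() {
        return append("\n");
    }

    @Override
    public String toString() {
        return sb.toString();
    }

    public static void main(String[] args) {
        ColumnPadder p = new ColumnPadder();
        p.append("name").padTo(10).append("value").newline();
        p.append("longername").padTo(10).append("v2");
        System.out.println(p);
    }
}
```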
- PagedRepresentation - Class in org.archive.crawler.restlet
-
Representation wrapping a FileRepresentation, displaying its contents
in batches of lines at a time, with forward and backward navigation.
- PagedRepresentation(FileRepresentation, EnhDirectoryResource, String, String, String) - Constructor for class org.archive.crawler.restlet.PagedRepresentation
-
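PagedRepresentation shows a file in batches of lines, with forward and backward navigation. The real class wraps a FileRepresentation, so the sketch below makes a simplifying assumption: it pages over an in-memory list of lines using a 0-based page index. The helper names are hypothetical.

```java
import java.util.List;

public class LinePagerSketch {
    /** Returns the lines of the given 0-based page (empty if past the end). */
    static List<String> page(List<String> lines, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be > 0 and pageIndex >= 0");
        }
        int from = (int) Math.min((long) pageIndex * pageSize, lines.size());
        int to = (int) Math.min((long) from + pageSize, lines.size());
        return lines.subList(from, to);
    }

    /** Backward navigation is possible from any page after the first. */
    static boolean hasPrevious(int pageIndex) {
        return pageIndex > 0;
    }

    /** Forward navigation is possible while lines remain beyond this page. */
    static boolean hasNext(List<String> lines, int pageSize, int pageIndex) {
        return ((long) pageIndex + 1) * pageSize < lines.size();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("l1", "l2", "l3", "l4", "l5");
        System.out.println(page(lines, 2, 1));    // prints [l3, l4]
        System.out.println(hasNext(lines, 2, 1)); // prints true
        System.out.println(page(lines, 2, 2));    // prints [l5]
        System.out.println(hasNext(lines, 2, 2)); // prints false
    }
}
```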
- pageFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
-
- pageOutStaleEntries() - Method in class org.archive.util.ObjectIdentityBdbCache
-
An incremental, poll-based expunger.
- paintValue(Graphics, Rectangle) - Method in class org.archive.io.ReadSourceEditor
-
- paintValue(Graphics, Rectangle) - Method in class org.archive.spring.ConfigPathEditor
-
- parse(String, int, String, boolean, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Parse the "Set-Cookie" header value into Cookie array.
- parse(String, int, String, boolean, Header) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Parse the "Set-Cookie" Header into an array of Cookies.
- parse(String, int, String, boolean, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Parses the Set-Cookie value into an array of Cookies.
- parse(String, int, String, boolean, Header) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Parse the "Set-Cookie" Header into an array of Cookies.
- parse(String, int, String, boolean, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Returns an empty cookie array.
- parse(String, int, String, boolean, Header) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Returns an empty cookie array.
- parseAttribute(NameValuePair, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Parse the cookie attribute and update the corresponding Cookie
properties.
- parseAttribute(NameValuePair, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Parse the cookie attribute and update the corresponding Cookie
properties.
- parseAttribute(NameValuePair, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Does nothing.
- parseDefineBits(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineBitsJPEG3(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineBitsLossless(InStream, int, boolean) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineButtonSound(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineFont(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineFont2(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineJPEG2(InStream, int) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineJPEGTables(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineShape(int, InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineSound(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseDefineSprite(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseFontInfo(InStream, int, boolean) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parsePlaceObject2(InStream) - Method in class org.archive.modules.extractor.ExtractorSWF.ExtractorTagParser
-
- parseRevision(String) - Static method in class org.archive.io.Warc2Arc
-
- password - Variable in class org.archive.modules.credential.HttpAuthenticationCredential
-
Password.
- path - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- path - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
Top-level directory for archive files.
- path - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
Top-level directory for mirror files.
- path - Variable in class org.archive.spring.ConfigPath
-
- path - Variable in class org.archive.spring.ConfigPathConfigurer
-
'home' directory for all other paths to be resolved
relative to; defaults to directory of primary XML config file
- PATH_DELIM - Static variable in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Path delimiter
- PATH_DELIM_CHAR - Static variable in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Path delimiting character
- pathMatch(String, String) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Performs path-match as defined by the cookie specification.
- pathMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Performs path-match as implemented in common browsers.
- pathMatch(String, String) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
- PathologicalPathDecideRule - Class in org.archive.modules.deciderules
-
Rule REJECTs any URI which contains an excessive number of identical,
consecutive path-segments (e.g. http://example.com/a/a/a/boo.html has 3
consecutive '/a' segments)
- PathologicalPathDecideRule() - Constructor for class org.archive.modules.deciderules.PathologicalPathDecideRule
-
Constructs a new PathologicalPathDecideRule.
- PathSharingContext - Class in org.archive.spring
-
Spring ApplicationContext extended for Heritrix use.
- PathSharingContext(String) - Constructor for class org.archive.spring.PathSharingContext
-
- PathSharingContext(String[], ApplicationContext) - Constructor for class org.archive.spring.PathSharingContext
-
- PathSharingContext(String[], boolean, ApplicationContext) - Constructor for class org.archive.spring.PathSharingContext
-
- PathSharingContext(String[], boolean) - Constructor for class org.archive.spring.PathSharingContext
-
- PathSharingContext(String[]) - Constructor for class org.archive.spring.PathSharingContext
-
- pause() - Method in interface org.archive.crawler.framework.Frontier
-
Notify Frontier that it should not release any URIs, instead
holding all threads, until instructed otherwise.
- pause() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- pauseAtStart - Variable in class org.archive.crawler.framework.CrawlController
-
whether to pause at crawl start
- pauseThresholdKb - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
When available space on any monitored mounts falls below this threshold,
the crawl will be paused.
- pauseThresholdMiB - Variable in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- PDFParser - Class in org.archive.modules.extractor
-
Supports PDF parsing operations.
- PDFParser(String) - Constructor for class org.archive.modules.extractor.PDFParser
-
- PDFParser(byte[]) - Constructor for class org.archive.modules.extractor.PDFParser
-
- peek() - Method in class org.archive.bdb.StoredQueue
-
- peek(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
-
Return the topmost queue item -- and remember it,
such that even later higher-priority inserts don't
change it.
- peekItem - Variable in class org.archive.bdb.StoredQueue
-
- peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.BdbWorkQueue
-
- peekItem - Variable in class org.archive.crawler.frontier.WorkQueue
-
The next item to be returned
- peekItem(WorkQueueFrontier) - Method in class org.archive.crawler.frontier.WorkQueue
-
Returns first item from queue (does not delete)
- pend(long, CrawlURI) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
Place the given FP/CrawlURI pair into the pending set, awaiting
a merge to determine if it's actually accepted.
- pendDupAtLast - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- pendDuplicateCount - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- pending() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Count of items added, but not yet filtered in or out.
- pending() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- pending() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- pendingSet - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
items awaiting merge
TODO: consider only sorting just pre-merge
TODO: consider using a fastutil long->Object class
TODO: consider actually writing items to disk file,
as in Najork/Heydon
- pendingUris - Variable in class org.archive.crawler.frontier.BdbFrontier
-
all URIs scheduled to be crawled
- percentOfDiscoveredUrisCompleted() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
-
This returns the number of completed URIs as a percentage of the total
number of URIs encountered (should be inverse to the discovery curve)
- persistKeyFor(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractContentDigestHistory
-
- persistKeyFor(CrawlURI) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
Return a preferred String key for persisting the given CrawlURI's
AList state.
- persistKeyFor(String) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
- PersistLoadProcessor - Class in org.archive.modules.recrawl
-
Loads CrawlURI attributes from previous fetch from persistent storage for
consultation by a later recrawl.
- PersistLoadProcessor() - Constructor for class org.archive.modules.recrawl.PersistLoadProcessor
-
- PersistLogProcessor - Class in org.archive.modules.recrawl
-
Log CrawlURI attributes from latest fetch for consultation by a later
recrawl.
- PersistLogProcessor() - Constructor for class org.archive.modules.recrawl.PersistLogProcessor
-
- PersistOnlineProcessor - Class in org.archive.modules.recrawl
-
Common superclass for persisting Processors which directly store/load
to persistence (as opposed to logging for batch load later).
- PersistOnlineProcessor() - Constructor for class org.archive.modules.recrawl.PersistOnlineProcessor
-
- PersistProcessor - Class in org.archive.modules.recrawl
-
Superclass for Processors which utilize BDB-JE for URI state
(including most notably history) persistence.
- PersistProcessor() - Constructor for class org.archive.modules.recrawl.PersistProcessor
-
- PersistStoreProcessor - Class in org.archive.modules.recrawl
-
Store CrawlURI attributes from latest fetch to persistent storage for
consultation by a later recrawl.
- PersistStoreProcessor() - Constructor for class org.archive.modules.recrawl.PersistStoreProcessor
-
- Piece - Class in org.archive.util.ms
-
- Piece(int, int, int, boolean) - Constructor for class org.archive.util.ms.Piece
-
- politenessDelay - Variable in class org.archive.modules.CrawlURI
-
- politenessDelayFor(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
Update any scheduling structures with the new information in this
CrawlURI.
- poll() - Method in class org.archive.bdb.StoredQueue
-
- polynomial - Variable in class st.ata.util.FPGenerator
-
The polynomial used by this FPGenerator to generate fingerprints.
- polynomials - Static variable in class st.ata.util.FPGenerator
-
Array of irreducible polynomials.
- poolMaxActive - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Maximum active files in pool.
- popOverridesContext() - Static method in class org.archive.spring.KeyedProperties
-
Remove last-added override map from the stack
- populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.Credential
-
- populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- populate(CrawlURI, HttpClient, HttpMethod, Map<String, String>) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- populatePersistEnv(String, File) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
Populates a new environment db from an old environment db or a persist
log.
- position - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
position in file around which to fetch lines
- position() - Method in class org.archive.util.ms.BlockInputStream
-
- position(long) - Method in class org.archive.util.ms.BlockInputStream
-
- postProcessAfterInitialization(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
-
Remember all beans for later fixup.
- postProcessBeforeInitialization(Object, String) - Method in class org.archive.spring.ConfigPathConfigurer
-
- power - Variable in class org.archive.util.BloomFilter64bit
-
if bitfield is an exact power of 2 in length, it is this power
- precedence - Variable in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
-
- precedenceFloor - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
precedence rank at or below which queues are not crawled
- PrecedenceLoader - Class in org.archive.crawler.frontier.precedence
-
Utility class for loading externally-created URI-precedence values
into the URI-history database.
- PrecedenceLoader() - Constructor for class org.archive.crawler.frontier.precedence.PrecedenceLoader
-
- PrecedenceProvider - Class in org.archive.crawler.frontier.precedence
-
Parent class for precedence-providers, stateful helpers that can be
installed in a WorkQueue to implement various queue-precedence policies.
- PrecedenceProvider() - Constructor for class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- precedenceProvider - Variable in class org.archive.crawler.frontier.WorkQueue
-
assigned precedence
- PreconditionEnforcer - Class in org.archive.crawler.prefetch
-
Ensures the preconditions for a fetch -- such as DNS lookup
or acquiring and respecting a robots.txt policy -- are
satisfied before a URI is passed to subsequent stages.
- PreconditionEnforcer() - Constructor for class org.archive.crawler.prefetch.PreconditionEnforcer
-
- PredicatedDecideRule - Class in org.archive.modules.deciderules
-
Rule which applies the configured decision only if a
test evaluates to true.
- PredicatedDecideRule() - Constructor for class org.archive.modules.deciderules.PredicatedDecideRule
-
- PREEMPTIVE_DEFAULT - Static variable in class org.apache.commons.httpclient.HttpState
-
Deprecated.
This field and feature will be removed following HttpClient 3.0.
- PREEMPTIVE_PROPERTY - Static variable in class org.apache.commons.httpclient.HttpState
-
Deprecated.
This field and feature will be removed following HttpClient 3.0.
- prefix - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
File prefix.
- PrefixFinder - Class in org.archive.util
-
Utility class for extracting prefixes of a given string from a SortedMap.
- PrefixFinder() - Constructor for class org.archive.util.PrefixFinder
-
- prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.OnDomainsDecideRule
-
- prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.OnHostsDecideRule
-
- prefixFrom(String) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- prefixKey(String) - Static method in class org.archive.surt.SURTTokenizer
-
- preformat(LogRecord) - Method in class org.archive.crawler.io.UriProcessingFormatter
-
- PreloadedUriPrecedencePolicy - Class in org.archive.crawler.frontier.precedence
-
UriPrecedencePolicy which assigns URIs a precedence from a value that
was preloaded for them into the uri-history database.
- PreloadedUriPrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- preloadSource - Variable in class org.archive.modules.recrawl.PersistLoadProcessor
-
A source (either log file or BDB directory) from which to copy history
information into the current store at startup.
- preloadSourceUrl - Variable in class org.archive.modules.recrawl.PersistLoadProcessor
-
A log file source url from which to copy history information into the
current store at startup.
- prepare(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
Apply all configured policies to CrawlURI
- prepareMap() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- prepareMap() - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- prepareMap() - Method in class org.archive.modules.fetcher.SimpleCookieStorage
-
- preparer - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- prepForFrontier(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- PREREQ_MISC - Static variable in class org.archive.modules.extractor.LinkContext
-
Stand-in value for prerequisite urls without other context.
- PrerequisiteAcceptDecideRule - Class in org.archive.modules.deciderules
-
Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in
the last hopsPath position).
- PrerequisiteAcceptDecideRule() - Constructor for class org.archive.modules.deciderules.PrerequisiteAcceptDecideRule
-
- Preselector - Class in org.archive.crawler.prefetch
-
If set to recheck the crawl's scope, gives a yes/no on whether
a CrawlURI should be processed at all.
- Preselector() - Constructor for class org.archive.crawler.prefetch.Preselector
-
Constructor.
- primaryConfig - Variable in class org.archive.crawler.framework.CrawlJob
-
- prime() - Method in class org.archive.spring.Sheet
-
Ensure any properties targeted by this Sheet know to
check the right property paths for overrides at lookup time,
and that the override values are compatible types for their
destination properties.
- print(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
- print(String, String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Writes the specified String (as bytes) to the output stream.
- printHelp() - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- printLine(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
- printLine(String, String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Writes the specified String (as bytes), followed by
"\r\n".getBytes() to the output stream.
- printLine() - Method in class org.apache.commons.httpclient.HttpConnection
-
Writes "\r\n".getBytes() to the output stream.
- PROCEED - Static variable in class org.archive.modules.ProcessResult
-
- process(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- process(CrawlURI) - Method in class org.archive.modules.Processor
-
Processes the given URI.
- process(CrawlURI, ProcessorChain.ChainStatusReceiver) - Method in class org.archive.modules.ProcessorChain
-
- processedSeedsRecords - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Record of seeds and latest results
- processEmbed(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- processEmbed(CrawlURI, CharSequence, CharSequence, Hop) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- processFinish(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Handle the given CrawlURI as having finished a worker ToeThread
processing attempt.
- processFinish(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Note that the previously emitted CrawlURI has completed
its processing (for now).
- processForm(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- processGeneralTag(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- processGeneralTag(CrawlURI, Element, Attributes) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- processingCleanup() - Method in class org.archive.modules.CrawlURI
-
Clean up after a run through the processing chain.
- processLink(CrawlURI, CharSequence, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Handle generic HREF cases.
- processMeta(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Process metadata tags.
- processMeta(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- Processor - Class in org.archive.modules
-
A processor of URIs.
- Processor() - Constructor for class org.archive.modules.Processor
-
- ProcessorChain - Class in org.archive.modules
-
Collection of Processors to run.
- ProcessorChain() - Constructor for class org.archive.modules.ProcessorChain
-
- ProcessorChain.ChainStatusReceiver - Interface in org.archive.modules
-
- ProcessorsReport - Class in org.archive.crawler.reporting
-
The "Processors Report", delegated through the CrawlController
to each Processor to dump whatever information it collects for
this purpose.
- ProcessorsReport() - Constructor for class org.archive.crawler.reporting.ProcessorsReport
-
- ProcessorTestBase - Class in org.archive.modules
-
Unit test for Processor.
- ProcessorTestBase() - Constructor for class org.archive.modules.ProcessorTestBase
-
- processResponseBody(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- processResponseHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- ProcessResult - Class in org.archive.modules
-
Returned by a Processor's process method to indicate the status of the
process.
- ProcessResult.ProcessStatus - Enum in org.archive.modules
-
- processScheduleAlways(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Schedule the given CrawlURI regardless of its already-seen status.
- processScheduleAlways(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Accept the given CrawlURI for scheduling, as it has
passed the alreadyIncluded filter.
- processScheduleIfUnique(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Schedule the given CrawlURI if not already-seen.
- processScheduleIfUnique(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Arrange for the given CrawlURI to be visited, if it is not
already scheduled/completed.
- processScript(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.AggressiveExtractorHTML
-
- processScript(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- processScript(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- processScriptCode(CrawlURI, CharSequence) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Extract the (java)script source in the given CharSequence.
- processStatusLine(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- processStyle(CrawlURI, CharSequence, int) - Method in class org.archive.modules.extractor.ExtractorHTML
-
Process style text.
- processStyle(CrawlURI, Element) - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- processStyleCode(Extractor, CrawlURI, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorCSS
-
- processXml(Extractor, CrawlURI, CharSequence) - Static method in class org.archive.modules.extractor.ExtractorXML
-
- profileCxmlPath - Variable in class org.archive.crawler.framework.Engine
-
- profileLog - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- profileLog(String) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- profileLog - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- profileLog(String) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- progressLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- progressStatisticsEvent() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
A method for logging current crawler state.
- progressStatisticsLegend(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
-
- progressStatisticsLegend() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- progressStatisticsLine(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
-
- propertyName - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
-
- protocolCommandSent(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
-
- protocolReplyReceived(ProtocolCommandEvent) - Method in class org.archive.net.ClientFTP
-
- publish(LogRecord) - Method in class org.archive.crawler.reporting.AlertHandler
-
Pass record to AlertThreadGroup.
- publish(LogRecord) - Method in class org.archive.crawler.reporting.AlertThreadGroup
-
Pass a record to all loggers registered with the
AlertThreadGroup.
- publishAddedSeed(CrawlURI) - Method in class org.archive.modules.seeds.SeedModule
-
- publishConcludedSeedBatch() - Method in class org.archive.modules.seeds.SeedModule
-
- publishCurrent(LogRecord) - Static method in class org.archive.crawler.reporting.AlertThreadGroup
-
- publishNonSeedLine(String) - Method in class org.archive.modules.seeds.SeedModule
-
- purgeExpiredCookies() - Method in class org.apache.commons.httpclient.HttpState
-
Removes all cookies in this HTTP state
that have expired according to the current system time.
- purgeExpiredCookies(Date) - Method in class org.apache.commons.httpclient.HttpState
-
Removes all cookies in this HTTP state
that have expired by the specified date.
- push(String) - Method in class org.archive.modules.extractor.ExtractorSWF.CrawlUriSWFAction
-
- pushOverrideContext(OverlayContext) - Static method in class org.archive.spring.KeyedProperties
-
Add an override map to the stack
- put(CrawlURI, boolean) - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Put the given CrawlURI in at the appropriate place.
- putAllAtomicLongs(Map<String, AtomicLong>, JSONObject) - Static method in class org.archive.util.JSONUtils
-
- putAllLongs(Map<String, Long>, JSONObject) - Static method in class org.archive.util.JSONUtils
-
- putSheetOverlay(String, String, Object) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Add to named sheet an overlay of the given bean-path and new value.
- raAppend(int, String) - Method in class org.archive.util.PaddingStringBuffer
-
Append a string, right-aligned to the given column.
- raAppend(int, int) - Method in class org.archive.util.PaddingStringBuffer
-
Append an int, right-aligned to the given column.
- raAppend(int, long) - Method in class org.archive.util.PaddingStringBuffer
-
Append a long, right-aligned to the given column.
- range - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
position range [start-of-first-line, past-end-of-last-line] in file
- RANGE - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- RANGE_PREFIX - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- RateLimitGuard - Class in org.archive.crawler.restlet
-
Guard that slows and logs failed authentication attempts, to make
brute-force guessing attacks less feasible.
- RateLimitGuard(Context, ChallengeScheme, String) - Constructor for class org.archive.crawler.restlet.RateLimitGuard
-
- RateLimitGuard(Context, String, Collection<String>, String) - Constructor for class org.archive.crawler.restlet.RateLimitGuard
-
- rateReport() - Method in class org.archive.crawler.framework.CrawlJob
-
- rateReportData() - Method in class org.archive.crawler.framework.CrawlJob
-
- reachedState(Frontier.State) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
The given state has been reached; if it is a new state, generate
a notification to the CrawlController.
- read() - Method in class org.archive.util.ms.BlockInputStream
-
- read(byte[], int, int) - Method in class org.archive.util.ms.BlockInputStream
-
- read(byte[]) - Method in class org.archive.util.ms.BlockInputStream
-
- readLine() - Method in class org.apache.commons.httpclient.HttpConnection
-
Deprecated.
use readLine(String)
- readLine(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Reads up to "\n" from the (unchunked) input stream.
- readObjectData(Kryo, ByteBuffer) - Method in class org.archive.net.UURI
-
- readObjectFromFile(Class<T>, File) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- readObjectFromFile(Class<T>, String, File) - Static method in class org.archive.crawler.util.CheckpointUtils
-
- readPrefixes() - Method in class org.archive.modules.deciderules.surt.OnDomainsDecideRule
-
Patch the SURT prefix set so that it only includes host-enforcing prefixes
- readPrefixes() - Method in class org.archive.modules.deciderules.surt.OnHostsDecideRule
-
Patch the SURT prefix set so that it only includes host-enforcing prefixes
- readPrefixes() - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- readResponse(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- readResponseBody(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- readResponseHeaders(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Reads the response headers from the given connection.
- ReadSourceEditor<T> - Class in org.archive.io
-
PropertyEditor allowing Strings to become ConfigString instances
(implementing ReadSource).
- ReadSourceEditor() - Constructor for class org.archive.io.ReadSourceEditor
-
- readStatusLine(HttpState, HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
- readUuri(String) - Method in class org.archive.modules.CrawlURI
-
Read a UURI from a String, handling a null or URIException
- readyClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
All per-class queues whose first item may be handed out.
- readyQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Put the given queue on the readyClassQueues queue
- realm - Variable in class org.archive.modules.credential.HttpAuthenticationCredential
-
Basic/Digest Auth realm.
- receive(CrawlURI) - Method in interface org.archive.crawler.datamodel.UriUniqFilter.CrawlUriReceiver
-
- receive(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Accept the given CrawlURI for scheduling, as it has
passed the alreadyIncluded filter.
- receive(CrawlURI) - Method in class org.archive.crawler.util.BenchmarkUriUniqFilters
-
- receiver - Variable in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- receiver - Variable in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- recheckThresholdKb - Variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
Available space via 'df' is rechecked after every increment of this much
content (uncompressed) is observed.
- reconsiderRetiredQueues() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Accommodate any changes in retirement-determining settings (like
total-budget or force-retire changes/overlays).
- recordControlMessage(String, String) - Method in class org.archive.net.ClientFTP
-
- recordDNS(CrawlURI, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
-
- recorderInBufferBytes - Variable in class org.archive.crawler.framework.CrawlController
-
Size in bytes of in-memory buffer to record inbound traffic.
- recorderOutBufferBytes - Variable in class org.archive.crawler.framework.CrawlController
-
Size in bytes of in-memory buffer to record outbound traffic.
- recover - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
Crawl replay logger.
- recoveryCheckpoint - Variable in class org.archive.bdb.BdbModule
-
- recoveryCheckpoint - Variable in class org.archive.crawler.framework.CheckpointService
-
- recoveryCheckpoint - Variable in class org.archive.crawler.framework.CrawlController
-
- recoveryCheckpoint - Variable in class org.archive.crawler.frontier.BdbFrontier
-
- recoveryCheckpoint - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- recoveryCheckpoint - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- recoveryCheckpoint - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- recoveryCheckpoint - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- recoveryCheckpoint - Variable in class org.archive.modules.Processor
-
- RecoveryLogMapper - Class in org.archive.crawler.util
-
- RecoveryLogMapper(String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
-
Normal constructor: if not-found seeds are encountered while loading
recoverLogFileName, throws SeedUrlNotFoundException.
- RecoveryLogMapper(String, String) - Constructor for class org.archive.crawler.util.RecoveryLogMapper
-
Constructor to use if you want to allow not-found seeds, logging them to
seedNotFoundLogFileName.
- RecrawlAttributeConstants - Interface in org.archive.modules.recrawl
-
- recycle() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
no longer supported and will be removed in the future
version of HttpClient
- RecyclingSerialBinding<K> - Class in org.archive.crawler.frontier
-
A SerialBinding that recycles a single FastOutputStream per
thread, avoiding reallocation of the internal buffer for
either repeated serializations or because of mid-serialization
expansions.
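The per-thread buffer recycling described above can be sketched with a `ThreadLocal` holding a reusable `ByteArrayOutputStream`. This is a minimal sketch, assuming plain Java serialization in place of the BDB `SerialBinding`/`FastOutputStream` pair; the class name `RecyclingSerializer` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration: not the RecyclingSerialBinding implementation.
final class RecyclingSerializer {
    // One reusable buffer per thread; its grown backing array survives across calls.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1024));

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = BUFFER.get();
        buf.reset(); // discard old contents but keep the internal array
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj); // closing a ByteArrayOutputStream is a no-op
        }
        return buf.toByteArray(); // copy of only the bytes just written
    }
}
```

Repeated calls on the same thread reuse the same buffer, so a buffer that had to grow once does not need to grow again.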
- RecyclingSerialBinding(ClassCatalog, Class) - Constructor for class org.archive.crawler.frontier.RecyclingSerialBinding
-
Constructor.
- reduce(long) - Method in class st.ata.util.FPGenerator
-
Return a value equal (mod polynomial) to fp and of degree less than degree.
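Reduction modulo a polynomial over GF(2) repeatedly XORs a shifted copy of the polynomial into the value to cancel its highest term. This is a minimal sketch, assuming a plain bit layout where bit i is the coefficient of x^i and the polynomial fits in a `long`; `FPGenerator` keeps its polynomial as instance state and may use a different (for example bit-reversed) representation, and the class `Gf2Reduce` is hypothetical.

```java
// Hypothetical illustration: not the FPGenerator implementation.
final class Gf2Reduce {
    private static int degreeOf(long v) {
        return 63 - Long.numberOfLeadingZeros(v); // -1 for v == 0
    }

    /** Return value mod poly over GF(2); poly's highest set bit is its degree (>= 1). */
    static long reduce(long value, long poly) {
        int degree = degreeOf(poly);
        // Cancel the leading term while value's degree is not below poly's degree.
        while (degreeOf(value) >= degree) {
            value ^= poly << (degreeOf(value) - degree);
        }
        return value;
    }
}
```

For example, with poly = x^3 + x + 1 (`0b1011`), reducing x^3 (`0b1000`) gives x + 1 (`3`), and reducing x^4 (`0b10000`) gives x^2 + x (`6`).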
- reenqueued(CrawlURI) - Method in class org.archive.crawler.frontier.FrontierJournal
-
- reenqueueQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Enqueue the given queue to either readyClassQueues or inactiveQueues,
as appropriate.
- referentField - Static variable in class org.archive.util.ObjectIdentityBdbCache
-
Reference to the Reference#referent Field.
- REFERER - Static variable in class org.archive.modules.fetcher.FetchHTTP
-
- REFLECTION_FACTORY - Static variable in class org.archive.bdb.AutoKryo
-
- refQueue - Variable in class org.archive.util.ObjectIdentityBdbCache
-
- RegexRule - Class in org.archive.modules.canonicalize
-
General conversion rule.
- RegexRule() - Constructor for class org.archive.modules.canonicalize.RegexRule
-
- registeredClasses - Variable in class org.archive.bdb.AutoKryo
-
- RejectDecideRule - Class in org.archive.modules.deciderules
-
- RejectDecideRule() - Constructor for class org.archive.modules.deciderules.RejectDecideRule
-
- releaseConnection() - Method in class org.apache.commons.httpclient.HttpConnection
-
Releases the connection.
- releaseConnection() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Releases the connection being used by this HTTP method.
- relocate(long, long, long) - Method in class org.archive.util.AbstractLongFPSet
-
- relocate(long, long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- remember(String, ConfigPath) - Method in class org.archive.spring.ConfigPathConfigurer
-
- remove() - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
-
- remove(long) - Method in class org.archive.util.AbstractLongFPSet
-
- remove(long) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- remove(long) - Method in interface org.archive.util.fingerprint.LongFPSet
-
Remove a fingerprint from the set, if it is there
- remove() - Method in class org.archive.util.iterator.CompositeIterator
-
- removeAt(long) - Method in class org.archive.util.AbstractLongFPSet
-
Remove the value at the given index, relocating its
successors as necessary.
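In an open-addressing table with linear probing, removing a slot must relocate later entries whose probe chains pass through it, or they become unreachable. This is a minimal sketch of that backward-shift deletion, assuming a fixed-capacity array of `long` fingerprints with `Long.MIN_VALUE` as the empty marker; the class `LinearProbeLongSet` is hypothetical and not the `AbstractLongFPSet` code.

```java
import java.util.Arrays;

// Hypothetical illustration: assumes Long.MIN_VALUE is never stored
// and that the table never becomes completely full.
final class LinearProbeLongSet {
    private static final long EMPTY = Long.MIN_VALUE;
    private final long[] slots;

    LinearProbeLongSet(int capacity) {
        slots = new long[capacity];
        Arrays.fill(slots, EMPTY);
    }

    private int home(long v) { return (int) Math.floorMod(v, (long) slots.length); }
    private int next(int i) { return (i + 1) % slots.length; }

    boolean add(long v) {
        int i = home(v);
        while (slots[i] != EMPTY) {
            if (slots[i] == v) return false;
            i = next(i);
        }
        slots[i] = v;
        return true;
    }

    boolean contains(long v) {
        for (int i = home(v); slots[i] != EMPTY; i = next(i)) {
            if (slots[i] == v) return true;
        }
        return false;
    }

    boolean remove(long v) {
        for (int i = home(v); slots[i] != EMPTY; i = next(i)) {
            if (slots[i] == v) { removeAt(i); return true; }
        }
        return false;
    }

    /** Empty slot i, shifting back successors whose probe chain crosses the hole. */
    private void removeAt(int i) {
        int hole = i;
        for (int j = next(i); slots[j] != EMPTY; j = next(j)) {
            int h = home(slots[j]);
            // The entry may stay only if its home lies cyclically in (hole, j].
            boolean stays = (hole <= j) ? (hole < h && h <= j) : (hole < h || h <= j);
            if (!stays) {
                slots[hole] = slots[j];
                hole = j;
            }
        }
        slots[hole] = EMPTY;
    }
}
```

With capacity 8, the values 0, 8, and 16 all probe from slot 0; removing 8 shifts 16 back one slot so that `contains(16)` still finds it.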
- removeDataPersistentMember(String) - Static method in class org.archive.modules.CrawlURI
-
Remove the key from those data map members persisted.
- removeEldestEntry(Map.Entry<K, V>) - Method in class org.archive.util.LRU
-
- removePropertyChangeListener(PropertyChangeListener) - Method in class org.archive.io.ReadSourceEditor
-
- removePropertyChangeListener(PropertyChangeListener) - Method in class org.archive.spring.ConfigPathEditor
-
- removeRequestHeader(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Remove the request header associated with the given name.
- removeRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Removes the given request header.
- removeSheetOverlay(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Remove the given bean-path overlay in the named sheet.
- removeSurtAssociation(String, String) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
- renderFlashesHTML(Writer, Request) - Static method in class org.archive.crawler.restlet.Flash
-
- reopen(Database) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
Call after deserializing an instance of this class.
- replicaLocation(int, int) - Method in class org.archive.util.LongToIntConsistentHash
-
- replicasInstalledUpTo - Variable in class org.archive.util.LongToIntConsistentHash
-
- Report - Class in org.archive.crawler.reporting
-
Abstract superclass for named crawl reports that need only a
StatisticsTracker and can dump a plain-text representation to a
PrintWriter.
- Report() - Constructor for class org.archive.crawler.reporting.Report
-
- report() - Method in class org.archive.modules.extractor.Extractor
-
- report() - Method in class org.archive.modules.extractor.JerichoExtractorHTML
-
- report() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- report() - Method in class org.archive.modules.Processor
-
- report() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- reportClass - Variable in class org.archive.crawler.restlet.ReportGenResource
-
- ReportGenResource - Class in org.archive.crawler.restlet
-
Restlet Resource which generates fresh reports and then redirects
requests to the report in the filesystem.
- ReportGenResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.ReportGenResource
-
- reports - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- REPORTS_DIR_NAME - Static variable in class org.archive.crawler.framework.Engine
-
- reportsDir - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- reportThread(Thread, PrintWriter) - Static method in class org.archive.crawler.framework.ToeThread
-
- reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
-
- reportTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
-
Compiles and returns a report on its status.
- reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
-
- reportTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
This method compiles a human readable report on the status of the frontier
at the time of the call.
- reportTo(PrintWriter) - Method in class org.archive.modules.CrawlURI
-
- reportTo(PrintWriter) - Method in class org.archive.modules.fetcher.FetchStats
-
- reportTo(PrintWriter) - Method in class org.archive.modules.ProcessorChain
-
Compiles and returns a human readable report on the active processors.
- ReportUtils - Class in org.archive.util
-
- ReportUtils() - Constructor for class org.archive.util.ReportUtils
-
- represent(Variant) - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
- represent(Variant) - Method in class org.archive.crawler.restlet.EngineResource
-
- represent(Variant) - Method in class org.archive.crawler.restlet.JobResource
-
- represent(Variant) - Method in class org.archive.crawler.restlet.ReportGenResource
-
- represent(Variant) - Method in class org.archive.crawler.restlet.ScriptResource
-
- requestCrawlCheckpoint() - Method in class org.archive.crawler.framework.CheckpointService
-
Run a checkpoint of the crawler
- requestCrawlPause() - Method in class org.archive.crawler.framework.CrawlController
-
Stop the crawl temporarily.
- requestCrawlResume() - Method in class org.archive.crawler.framework.CrawlController
-
Resume crawl from paused state
- requestCrawlStart() - Method in class org.archive.crawler.framework.CrawlController
-
Operator requested that the crawl begin.
- requestCrawlStop() - Method in class org.archive.crawler.framework.CrawlController
-
Operator requested that the crawl stop.
- requestCrawlStop(CrawlStatus) - Method in class org.archive.crawler.framework.CrawlController
-
Operator requested that the crawl stop.
- requestFlush() - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Request that any pending items be added/dropped.
- requestFlush() - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- requestFlush() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- requestLaunch(String) - Method in class org.archive.crawler.framework.Engine
-
- requestState(Frontier.State) - Method in interface org.archive.crawler.framework.Frontier
-
Request the Frontier reach the given state as soon as possible.
- requestState(Frontier.State) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- requiredPattern - Variable in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
-
- rescheduleTime - Variable in class org.archive.modules.CrawlURI
-
A future time at which this CrawlURI should be reenqueued.
- ReschedulingProcessor - Class in org.archive.crawler.postprocessor
-
The simplest possible forced-rescheduling step: use a local
setting (perhaps overlaid to vary based on the URI) to set an exact
future reschedule time, as a delay from now.
- ReschedulingProcessor() - Constructor for class org.archive.crawler.postprocessor.ReschedulingProcessor
-
- reset() - Method in class org.archive.util.PaddingStringBuffer
-
Reset the buffer to empty.
- resetAlertCount() - Method in class org.archive.crawler.reporting.AlertThreadGroup
-
- resetAlertCount() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- resetConsecutiveConnectionErrors() - Method in class org.archive.modules.net.CrawlServer
-
- resetDeferrals() - Method in class org.archive.modules.CrawlURI
-
Reset deferrals counter.
- resetFetchAttempts() - Method in class org.archive.modules.CrawlURI
-
Reset fetchAttempts counter.
- resetForRescheduling() - Method in class org.archive.modules.CrawlURI
-
Reset state that should not persist when a URI is
rescheduled for a specific future time.
- resetState() - Method in class org.archive.modules.extractor.PDFParser
-
Reinitialize the object as though a new one were created.
- resetState(byte[]) - Method in class org.archive.modules.extractor.PDFParser
-
Reset the object and initialize it with a new byte array (the document).
- resetState(String) - Method in class org.archive.modules.extractor.PDFParser
-
Reinitialize the object as though a new one were created, complete
with a valid pointer to a document that can be read
- resolve(String) - Method in class org.archive.crawler.framework.ToeThread
-
- resolve(String) - Method in interface org.archive.modules.fetcher.HostResolver
-
- ResourceLongerThanDecideRule - Class in org.archive.modules.deciderules
-
Applies configured decision for URIs with content length greater than
a given threshold length value.
- ResourceLongerThanDecideRule() - Constructor for class org.archive.modules.deciderules.ResourceLongerThanDecideRule
-
- ResourceNoLongerThanDecideRule - Class in org.archive.modules.deciderules
-
Applies configured decision for URIs with content length less than or equal
to a given threshold length value.
- ResourceNoLongerThanDecideRule() - Constructor for class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
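The two threshold rules above are complementary: one applies its decision when the content length exceeds the threshold, the other when it does not. This is a minimal sketch, assuming a local `Decision` enum and plain `long` inputs; the class `LengthThresholdRules` and its methods are hypothetical, not the Heritrix decide-rule API.

```java
// Hypothetical illustration of the comparison each rule performs.
final class LengthThresholdRules {
    enum Decision { ACCEPT, REJECT, NONE }

    // Like ResourceLongerThanDecideRule: apply 'decision' when length > threshold.
    static Decision longerThan(long contentLength, long threshold, Decision decision) {
        return contentLength > threshold ? decision : Decision.NONE;
    }

    // Like ResourceNoLongerThanDecideRule: apply 'decision' when length <= threshold.
    static Decision noLongerThan(long contentLength, long threshold, Decision decision) {
        return contentLength <= threshold ? decision : Decision.NONE;
    }
}
```

A length exactly equal to the threshold falls under the "no longer than" rule, so the pair never both fire for the same URI.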
-
- RESPONSE_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- responseBodyConsumed() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
A response has been consumed.
- ResponseCodeReport - Class in org.archive.crawler.reporting
-
The "Response Codes Report", tallies by response/disposition code.
- ResponseCodeReport() - Constructor for class org.archive.crawler.reporting.ResponseCodeReport
-
- ResponseContentLengthDecideRule - Class in org.archive.modules.deciderules
-
Decide rule that, after a URI is fetched, applies its "decision"
property (ACCEPT or REJECT) if the content body size falls within a
specified range, in bytes.
- ResponseContentLengthDecideRule() - Constructor for class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
- RESPONSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- retire() - Method in class org.archive.crawler.framework.ToeThread
-
Request that this thread retire (exit cleanly) at the earliest
opportunity.
- retired - Variable in class org.archive.crawler.frontier.WorkQueue
-
- retiredQueues - Variable in class org.archive.crawler.frontier.BdbFrontier
-
'retired' queues, no longer considered for activation.
- retireQueue(WorkQueue) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Put the given queue on the retiredQueues queue
- retryDelayFor(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Return a suitable value to wait before retrying the given URI.
- retryMethod(HttpMethod, IOException, int) - Method in class org.archive.modules.fetcher.HeritrixHttpMethodRetryHandler
-
- reversedOrder - Variable in class org.archive.crawler.restlet.PagedRepresentation
-
whether to display lines in reversed order (latest first)
- ROBOTS_NOT_FETCHED - Static variable in class org.archive.modules.net.CrawlServer
-
- robotsDenials - Variable in class org.archive.modules.fetcher.FetchStats
-
- RobotsDirectives - Class in org.archive.modules.net
-
Represents the directives that apply to a user-agent (or set of
user-agents)
- RobotsDirectives() - Constructor for class org.archive.modules.net.RobotsDirectives
-
- robotsFetched - Variable in class org.archive.modules.net.CrawlServer
-
- RobotsPolicy - Class in org.archive.modules.net
-
RobotsPolicy represents the strategy used by the crawler
for determining how robots.txt files will be honored.
- RobotsPolicy() - Constructor for class org.archive.modules.net.RobotsPolicy
-
- robotstxt - Variable in class org.archive.modules.net.CrawlServer
-
- Robotstxt - Class in org.archive.modules.net
-
Utility class for parsing and representing 'robots.txt' format
directives into a list of named user-agents and a map from user-agents
to RobotsDirectives.
- Robotstxt() - Constructor for class org.archive.modules.net.Robotstxt
-
- Robotstxt(BufferedReader) - Constructor for class org.archive.modules.net.Robotstxt
-
- Robotstxt(ReadSource) - Constructor for class org.archive.modules.net.Robotstxt
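The parsing described for Robotstxt, grouping directives under the user-agent lines that precede them, can be sketched as follows. This is a minimal sketch, assuming only `User-agent` and `Disallow` fields with simple prefix matching and a `*` fallback; the real class handles more directives, and `MiniRobots` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not org.archive.modules.net.Robotstxt.
final class MiniRobots {
    // lowercased user-agent -> Disallow path prefixes, in file order
    final Map<String, List<String>> disallows = new LinkedHashMap<>();

    MiniRobots(BufferedReader reader) throws IOException {
        List<String> currentAgents = new ArrayList<>();
        boolean lastWasAgent = false;
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines share one group of directives.
                if (!lastWasAgent) currentAgents = new ArrayList<>();
                String ua = value.toLowerCase();
                currentAgents.add(ua);
                disallows.computeIfAbsent(ua, k -> new ArrayList<>());
                lastWasAgent = true;
            } else {
                if (field.equals("disallow") && !value.isEmpty()) {
                    for (String ua : currentAgents) disallows.get(ua).add(value);
                }
                lastWasAgent = false;
            }
        }
    }

    /** True unless a Disallow prefix for this agent (or "*") matches the path. */
    boolean allows(String agent, String path) {
        List<String> rules = disallows.getOrDefault(agent.toLowerCase(), disallows.get("*"));
        if (rules == null) return true;
        for (String prefix : rules) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```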
-
- rootUriMatch(ServerCache, CrawlURI) - Method in class org.archive.modules.credential.Credential
-
Test whether the passed curi matches this credential's rootUri.
- rotateForCheckpoint(Checkpoint) - Method in class org.archive.io.CrawlerJournal
-
Handle a checkpoint by rotating the current log to a checkpoint-named
file and starting a new log.
- rotateLogFiles() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- rotateLogFiles(String) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- rotateLogFiles(String, boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- rotationDigits - Variable in class org.archive.crawler.processor.CrawlMapper
-
Number of timestamp digits to use as prefix of log names (grouping all
diversions from that period in a single log).
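Using a fixed number of leading timestamp digits as a log-name prefix groups every diversion from the same period into one file. This is a minimal sketch, assuming a 14-digit `yyyyMMddHHmmss` timestamp string; the class `LogRotation` and its method are hypothetical, not the CrawlMapper code.

```java
// Hypothetical illustration of prefix-based log grouping.
final class LogRotation {
    // Keep the first 'digits' characters of the timestamp; more digits means
    // finer-grained (shorter-period) logs.
    static String logNameFor(String timestamp14, int digits, String suffix) {
        return timestamp14.substring(0, Math.min(digits, timestamp14.length())) + suffix;
    }
}
```

With 10 digits (down to the hour), `logNameFor("20240315123456", 10, ".log")` yields `2024031512.log`, so every diversion in that hour shares one log.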
- ruleAssociations - Variable in class org.archive.crawler.spring.SheetOverlaysManager
-
all SheetAssociations by DecideRule evaluation
- rules - Variable in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- rules - Variable in class org.archive.spring.BeanFieldsPatternValidator
-
- RulesCanonicalizationPolicy - Class in org.archive.modules.canonicalize
-
URI Canonicalization Policy
- RulesCanonicalizationPolicy() - Constructor for class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
- run() - Method in class org.archive.crawler.framework.ActionDirectory
-
Action taken at scheduled intervals
- run() - Method in interface org.archive.crawler.framework.Frontier
-
Request that Frontier allow crawling to begin.
- run() - Method in class org.archive.crawler.framework.ToeThread
-
- run() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- run() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Do activity.
- runCandidateChain(CrawlURI, CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
Run candidatesChain on a single candidate CrawlURI; if its
reported status is nonnegative, schedule to frontier.
- runTest() - Method in class org.archive.state.ModuleTestBase
-
- RuntimeErrorFormatter - Class in org.archive.crawler.io
-
Runtime exception log formatter.
- RuntimeErrorFormatter(boolean) - Constructor for class org.archive.crawler.io.RuntimeErrorFormatter
-
- runtimeErrorsLogPath - Variable in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- RuntimeLimitEnforcer - Class in org.archive.crawler.prefetch
-
A processor to enforce runtime limits on crawls.
- RuntimeLimitEnforcer() - Constructor for class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- RuntimeLimitEnforcer.Operation - Enum in org.archive.crawler.prefetch
-
The action that the processor takes once the runtime has elapsed.
- runtimeSeconds - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
The amount of time, in seconds, that the crawl will be allowed to run
before this processor performs its 'end operation'.
- runWhileEmpty - Variable in class org.archive.crawler.framework.CrawlController
-
whether to keep running (without pause or finish) when frontier is empty
- S3URLConnection - Class in org.archive.net.s3
-
URLConnection for Amazon S3 objects.
- S3URLConnection(URL) - Constructor for class org.archive.net.s3.S3URLConnection
-
Construct a new S3URLConnection.
- S_BLOCKED_BY_CUSTOM_PROCESSOR - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Blocked by custom prefetcher processor.
- S_BLOCKED_BY_QUOTA - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Blocked due to exceeding an established quota.
- S_BLOCKED_BY_RUNTIME_LIMIT - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Blocked due to exceeding an established runtime limit.
- S_BLOCKED_BY_USER - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
blocked from fetch by user setting.
- S_CONNECT_FAILED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
HTTP connect failed
- S_CONNECT_LOST - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
HTTP connect broken
- S_DEEMED_CHAFF - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
'chaff' detection of traps/content of negligible value applied
- S_DEEMED_NOT_FOUND - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
synthetic status, used when some other status (such as connection-lost)
is considered by policy the same as a document-not-found
- S_DEFERRED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
temporary status assigned URIs awaiting preconditions; appearance in
logs is a bug
- S_DELETED_BY_USER - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
deleted from frontier by user
- S_DNS_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
DNS success
- S_DOMAIN_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
DNS prerequisite failed, precluding attempt
- S_DOMAIN_UNRESOLVABLE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
DNS lookup failed
- S_GETBYNAME_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
InetAddress.getByName success
- S_NOT_FOUND - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
HTTP 404 NOT FOUND
- S_OTHER_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Other prerequisite failed, precluding attempt
- S_OUT_OF_SCOPE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
out-of-scope upon re-examination (only when scope changes during
crawl)
- S_PREREQUISITE_UNSCHEDULABLE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Prerequisite could not be scheduled, precluding attempt
- S_PROCESSING_THREAD_KILLED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Processing thread was killed
- S_ROBOTS_PRECLUDED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
robots rules precluded fetch
- S_ROBOTS_PREREQUISITE_FAILURE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Robots prerequisite failed, precluding attempt
- S_RUNTIME_EXCEPTION - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Unexpected runtime exception; see runtime-errors.log
- S_SERIOUS_ERROR - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
severe java 'Error' conditions (OutOfMemoryError, StackOverflowError,
etc.) during URI processing
- S_TIMEOUT - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
HTTP timeout (before any meaningful response received)
- S_TOO_MANY_EMBED_HOPS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
overstepped embed/trans hops
- S_TOO_MANY_LINK_HOPS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
overstepped link hops
- S_TOO_MANY_RETRIES - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
multiple retries all failed
- S_UNATTEMPTED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
fetch never tried (perhaps protocol unsupported or illegal URI)
- S_UNFETCHABLE_URI - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
URI recognized as unsupported or illegal
- S_UNQUEUEABLE - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
URI could not be queued in Frontier; when URIs are properly
filtered for format, should never occur
- S_WHOIS_GENERIC_FINISHED - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
Finished all fetches for serverless WHOIS url (whois:foo.org)
- S_WHOIS_SUCCESS - Static variable in interface org.archive.modules.fetcher.FetchStatusCodes
-
WHOIS success
- sameProgressAs(CrawlStatSnapshot) - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
-
Return true if this snapshot shows no tangible progress in
its URI counts over the supplied snapshot.
- saveCookies(String, Map<String, Cookie>) - Static method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- saveCookiesMap(Map<String, Cookie>) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- saveCookiesMap(Map<String, Cookie>) - Method in interface org.archive.modules.fetcher.CookieStorage
-
- saveHeader(String, HttpMethod, HashMap<String, Object>) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
-
Save a header from the given HTTP operation into the AList.
- saveHeader(String, HttpMethod, ANVLRecord, String) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
Save a header from the given HTTP operation into the
provider headers under a new name
- saveHostStats(String, long) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Update some running-stats based on a URI success
- saveJson(String, JSONObject) - Method in class org.archive.checkpointing.Checkpoint
-
- saveSourceStats(String, String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- saveWriter(String, String) - Method in class org.archive.checkpointing.Checkpoint
-
- scanActionDirectory() - Method in class org.archive.crawler.framework.ActionDirectory
-
Find any new files in the 'action' directory; process each in
order.
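Processing new files "in order" can be sketched as listing the directory's regular files and sorting them by name. This is a minimal sketch, assuming name order is the intended processing order; how ActionDirectory tracks which files are new or already handled is not shown, and the class `ActionDirScan` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not the ActionDirectory implementation.
final class ActionDirScan {
    /** Return the regular files in dir, sorted by path so they are handled in order. */
    static List<Path> pendingFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) files.add(p); // skip subdirectories
            }
        }
        Collections.sort(files); // Path implements Comparable<Path>
        return files;
    }
}
```

Sorting matters because `DirectoryStream` makes no ordering guarantee.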
- scanJobLog() - Method in class org.archive.crawler.framework.CrawlJob
-
Refresh knowledge of total launched and last launch by scanning
the job.log.
- schedule(CrawlURI) - Method in interface org.archive.crawler.framework.Frontier
-
Schedules a CrawlURI.
- schedule(CrawlURI) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Arrange for the given CrawlURI to be visited, if it is not
already scheduled/completed.
- schedule(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Arrange for the given CrawlURI to be visited, if it is not
already enqueued/completed.
- SchedulingConstants - Class in org.archive.modules
-
- SchemeNotInSetDecideRule - Class in org.archive.modules.deciderules
-
Rule applies the configured decision (default REJECT) for any URI which
has a URI-scheme NOT contained in the configured Set.
- SchemeNotInSetDecideRule() - Constructor for class org.archive.modules.deciderules.SchemeNotInSetDecideRule
-
Usual constructor.
- schemes - Variable in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
-
Set of schemes against which the URI scheme is tested.
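The "scheme NOT in set" test can be sketched with `java.net.URI`. This is a minimal sketch, assuming a local `Decision` enum, case-insensitive scheme comparison, and that a URI without a scheme also triggers the decision; the class `SchemeFilter` is hypothetical, not the SchemeNotInSetDecideRule code.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; URI.create throws IllegalArgumentException
// for strings that are not legal URIs.
final class SchemeFilter {
    enum Decision { ACCEPT, REJECT, NONE }

    // Apply 'decision' (REJECT by default in the described rule) when the scheme
    // is absent from the configured set; otherwise abstain with NONE.
    static Decision evaluate(String uri, Set<String> schemes, Decision decision) {
        String scheme = URI.create(uri).getScheme();
        if (scheme == null || !schemes.contains(scheme.toLowerCase(Locale.ROOT))) {
            return decision;
        }
        return Decision.NONE;
    }
}
```

With the set `{http, https}`, an `ftp:` URI receives the decision while `HTTP://example.com/` does not.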
- scope - Variable in class org.archive.crawler.framework.Scoper
-
- scope - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- Scoper - Class in org.archive.crawler.framework
-
Base class for Scopers.
- Scoper() - Constructor for class org.archive.crawler.framework.Scoper
-
Constructor.
- scratchDir - Variable in class org.archive.crawler.framework.CrawlController
-
Scratch directory for temporary overflow-to-disk
- scratchDir - Variable in class org.archive.crawler.util.DiskFPMergeUriUniqFilter
-
- SCRIPT_SRC - Static variable in class org.archive.modules.extractor.HTMLLinkContext
-
- ScriptedDecideRule - Class in org.archive.modules.deciderules
-
Rule which runs a JSR-223 script to make its decision.
- ScriptedDecideRule() - Constructor for class org.archive.modules.deciderules.ScriptedDecideRule
-
- ScriptedProcessor - Class in org.archive.modules
-
A processor which runs a JSR-223 script on the CrawlURI.
- ScriptedProcessor() - Constructor for class org.archive.modules.ScriptedProcessor
-
Constructor.
- ScriptingConsole - Class in org.archive.crawler.restlet
-
ScriptingConsole implements view-independent logic of scripting console.
- ScriptingConsole(CrawlJob) - Constructor for class org.archive.crawler.restlet.ScriptingConsole
-
- ScriptModel - Class in org.archive.crawler.restlet.models
-
- ScriptModel(ScriptingConsole, String, Collection<Map<String, String>>) - Constructor for class org.archive.crawler.restlet.models.ScriptModel
-
- ScriptResource - Class in org.archive.crawler.restlet
-
Restlet Resource which runs an arbitrary script, which is supplied
with variables pointing to the job and appContext, from which all
other live crawl objects are reachable.
- ScriptResource(Context, Request, Response) - Constructor for class org.archive.crawler.restlet.ScriptResource
-
- scriptSource - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
- scriptSource - Variable in class org.archive.modules.ScriptedProcessor
-
- secret - Variable in class org.archive.net.s3.S3URLConnection
-
- SeedAcceptDecideRule - Class in org.archive.modules.deciderules
-
Rule which ACCEPTs all 'seed' URIs (those for which
isSeed is true).
- SeedAcceptDecideRule() - Constructor for class org.archive.modules.deciderules.SeedAcceptDecideRule
-
- seedLine(String) - Method in class org.archive.modules.seeds.TextSeedModule
-
Handle a read line that is probably a seed.
- SeedListener - Interface in org.archive.modules.seeds
-
Implemented by components which want notifications of
seed list changes.
- seedListeners - Variable in class org.archive.modules.seeds.SeedModule
-
- SeedModule - Class in org.archive.modules.seeds
-
- SeedModule() - Constructor for class org.archive.modules.seeds.SeedModule
-
- SeedRecord - Class in org.archive.crawler.reporting
-
Record of all interesting info about the most-recent
processing of a specific seed.
- SeedRecord(CrawlURI, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
-
Create a record from the given CrawlURI and disposition string
- SeedRecord(String, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
-
Constructor for use when a CrawlURI is unavailable, such
as when considering seeds that have not yet passed through
as CrawlURIs.
- SeedRecord(String, String, int, String) - Constructor for class org.archive.crawler.reporting.SeedRecord
-
Create a record from the given URI, disposition, HTTP status code,
and redirect URI.
- seeds - Variable in class org.archive.crawler.framework.ActionDirectory
-
- seeds - Variable in class org.archive.crawler.framework.CrawlController
-
- seeds - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- seeds - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- seeds - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- seeds - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- SEEDS_REDIRECT_NEW_SEEDS_MAX_HOPS - Static variable in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- seedsAsSurtPrefixes - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Whether seeds should also be interpreted as SURT prefixes.
- seedsCrawled - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- SeedsReport - Class in org.archive.crawler.reporting
-
The "Seeds Report", results per provided seed.
- SeedsReport() - Constructor for class org.archive.crawler.reporting.SeedsReport
-
- seedsTotal - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- SeedUrlNotFoundException - Exception in org.archive.crawler.util
-
- SeedUrlNotFoundException(String) - Constructor for exception org.archive.crawler.util.SeedUrlNotFoundException
-
- seemsLoginForm() - Method in class org.archive.modules.forms.HTMLForm
-
For now, we consider a POST form with only 1 password
field and 1 potential username field (type text or email)
to be a likely login form.
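The login-form heuristic above reduces to counting input types. This is a minimal sketch, assuming the form is modeled as its method plus a list of input `type` values (a missing type treated as `text`, as HTML does); the class `LoginFormHeuristic` is hypothetical, not the HTMLForm code.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the described heuristic.
final class LoginFormHeuristic {
    /** POST form with exactly one password field and exactly one text/email field. */
    static boolean seemsLoginForm(String method, List<String> inputTypes) {
        if (!"post".equalsIgnoreCase(method)) return false;
        int passwords = 0;
        int usernameCandidates = 0;
        for (String type : inputTypes) {
            String t = (type == null) ? "text" : type.toLowerCase(Locale.ROOT);
            if (t.equals("password")) passwords++;
            else if (t.equals("text") || t.equals("email")) usernameCandidates++;
        }
        return passwords == 1 && usernameCandidates == 1;
    }
}
```

Other input types such as `submit` or `hidden` are ignored, so a typical username/password/submit form qualifies while a form with two text fields does not.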
- sendCrawlStateChangeEvent(CrawlController.State, CrawlStatus) - Method in class org.archive.crawler.framework.CrawlController
-
Send crawl change event to all listeners.
- sendToQueue(CrawlURI) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
Send a CrawlURI to the appropriate subqueue.
- serialize(Object) - Static method in class org.archive.util.TestUtils
-
- SERIALIZED_CLASS_SUFFIX - Static variable in class org.archive.crawler.util.CheckpointUtils
-
- seriousError(String) - Method in class org.archive.io.CrawlerJournal
-
Note a serious error via a special log line
- SERVER - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- serverCache - Variable in class org.archive.crawler.framework.CrawlController
-
- serverCache - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- serverCache - Variable in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
-
- serverCache - Variable in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
-
- serverCache - Variable in class org.archive.crawler.postprocessor.DispositionProcessor
-
- serverCache - Variable in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- serverCache - Variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- serverCache - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
- serverCache - Variable in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- serverCache - Variable in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- serverCache - Variable in class org.archive.modules.fetcher.FetchDNS
-
Used to do DNS lookups.
- serverCache - Variable in class org.archive.modules.fetcher.FetchHTTP
-
Used to do DNS lookups.
- serverCache - Variable in class org.archive.modules.fetcher.FetchWhois
-
- ServerCache - Class in org.archive.modules.net
-
Abstract class for crawl-global registry of CrawlServer (host:port) and
CrawlHost (hostname) objects.
- ServerCache() - Constructor for class org.archive.modules.net.ServerCache
-
- serverCache - Variable in class org.archive.modules.writer.Kw3WriterProcessor
-
The server cache to use.
- serverCache - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
- serverInetAddr - Variable in class org.archive.modules.fetcher.FetchDNS
-
- servers - Variable in class org.archive.modules.fetcher.DefaultServerCache
-
hostname[:port] -> CrawlServer.
- sessionBudget - Variable in class org.archive.crawler.frontier.WorkQueue
-
Per-session 'budget' controlling activity duration
- set - Variable in class org.archive.crawler.util.TopNSet
-
- setAcceptCompression(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setAcceptHeaders(List<String>) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setAcceptNonDnsResolves(boolean) - Method in class org.archive.modules.fetcher.FetchDNS
-
- setAction(String) - Method in class org.archive.modules.forms.HTMLForm
-
- setActionDir(ConfigPath) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
-
- setAdd(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setAlertsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setAllowByRegex(String) - Method in class org.archive.crawler.prefetch.Preselector
-
- setAllowCreate(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- setAlsoCheckVia(boolean) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setApplicableSurtPrefix(String) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.CheckpointService
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.framework.CrawlController
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.modules.ScriptedProcessor
-
- setApplicationContext(ApplicationContext) - Method in class org.archive.spring.ConfigPathConfigurer
-
Remember the ApplicationContext and, if possible, the primary
configuration file's home directory.
- setAsText(String) - Method in class org.archive.io.ReadSourceEditor
-
- setAsText(String) - Method in class org.archive.spring.ConfigPathEditor
-
- setAt(long, long) - Method in class org.archive.util.AbstractLongFPSet
-
Set the stored value at the given slot.
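A "slot" here is a position in a hash table of 64-bit fingerprints. The following sketch shows where a `setAt`-style primitive fits in an open-addressing set. It assumes linear probing and reserves `Long.MIN_VALUE` as the empty-slot sentinel. Both choices are assumptions, and `OpenAddressingLongSet` is a hypothetical class, not AbstractLongFPSet's actual layout.

```java
import java.util.Arrays;

// Sketch of an open-addressing set of long fingerprints with linear probing.
final class OpenAddressingLongSet {
    // Assumption: this sentinel value is never stored as a real fingerprint.
    private static final long EMPTY = Long.MIN_VALUE;
    private final long[] slots;
    private int count;

    OpenAddressingLongSet(int log2Capacity) {
        slots = new long[1 << log2Capacity];
        Arrays.fill(slots, EMPTY);
    }

    /** Low-level write: store value at the given slot with no probing. */
    void setAt(int slot, long value) { slots[slot] = value; }

    long getAt(int slot) { return slots[slot]; }

    /** Returns the slot holding value, or the first empty slot on its probe path. */
    private int findSlot(long value) {
        int mask = slots.length - 1;
        int i = (int) (mix(value) & mask);
        while (slots[i] != EMPTY && slots[i] != value) {
            i = (i + 1) & mask; // linear probing, wrapping around the table
        }
        return i;
    }

    boolean add(long value) {
        if (value == EMPTY) throw new IllegalArgumentException("reserved sentinel");
        // Keep at least one empty slot so probing always terminates.
        if (count >= slots.length - 1) throw new IllegalStateException("table full");
        int slot = findSlot(value);
        if (slots[slot] == value) return false; // already present
        setAt(slot, value);
        count++;
        return true;
    }

    boolean contains(long value) {
        return value != EMPTY && slots[findSlot(value)] == value;
    }

    // 64-bit finalizer-style bit mixing so nearby fingerprints spread across slots.
    private static long mix(long v) {
        v ^= (v >>> 33);
        v *= 0xff51afd7ed558ccdL;
        v ^= (v >>> 33);
        return v;
    }
}
```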
- setAt(long, long) - Method in class org.archive.util.fingerprint.MemLongFPSet
-
- setAudience(String) - Method in class org.archive.modules.CrawlMetadata
-
- setAuthenticationPreemptive(boolean) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use HttpClientParams.setAuthenticationPreemptive(boolean), HttpClient.getParams().
- setAvailableRobotsPolicies(Map<String, RobotsPolicy>) - Method in class org.archive.modules.CrawlMetadata
-
- setBalanceReplenishAmount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setBase(ConfigPath) - Method in class org.archive.spring.ConfigPath
-
- SetBasedUriUniqFilter - Class in org.archive.crawler.util
-
UriUniqFilter based on an underlying UriSet (essentially a Set).
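A set-based uniqueness filter forwards a URI to its destination receiver (see setDestination) only the first time that URI is added. The following is a minimal sketch of that idea using `java.util.HashSet` and a `Consumer` in place of UriUniqFilter.CrawlUriReceiver. The class `SetUniqFilterSketch` is hypothetical and is not SetBasedUriUniqFilter's code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Sketch: a URI reaches the destination receiver only on its first add().
final class SetUniqFilterSketch {
    private final Set<String> seen = new HashSet<>();
    private Consumer<String> destination = uri -> { }; // no-op until set

    void setDestination(Consumer<String> receiver) { this.destination = receiver; }

    /** Forwards uri to the destination if it has not been seen before. */
    void add(String uri) {
        if (seen.add(uri)) {       // Set.add returns false for duplicates
            destination.accept(uri);
        }
    }

    long count() { return seen.size(); }
}
```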
- SetBasedUriUniqFilter() - Constructor for class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setBasePrecedence(int) - Method in class org.archive.crawler.frontier.precedence.BaseQueuePrecedencePolicy
-
- setBasePrecedence(int) - Method in class org.archive.crawler.frontier.precedence.BaseUriPrecedencePolicy
-
- setBaseURI(String) - Method in class org.archive.modules.CrawlURI
-
Set the (HTML) Base URI used for derelativizing internal URIs.
- setBaseURI(UURI) - Method in class org.archive.modules.CrawlURI
-
- setBdbModule(BdbModule) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- setBdbModule(BdbModule) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- setBdbModule(BdbModule) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setBdbModule(BdbModule) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setBdbModule(BdbModule) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- setBdbModule(BdbModule) - Method in class org.archive.modules.fetcher.FetchWhois
-
- setBdbModule(BdbModule) - Method in class org.archive.modules.net.BdbServerCache
-
- setBdbModule(BdbModule) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- setBdbModule(BdbModule) - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- setBeanFactory(BeanFactory) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
- setBeanFactory(BeanFactory) - Method in class org.archive.spring.Sheet
-
- setBeanName(String) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- setBeanName(String) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setBeanName(String) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- setBeanName(String) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setBeanName(String) - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- setBeanName(String) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setBeanName(String) - Method in class org.archive.modules.Processor
-
- setBeanName(String) - Method in class org.archive.spring.HeritrixLifecycleProcessor
-
- setBeanName(String) - Method in class org.archive.spring.Sheet
-
- setBit(long) - Method in class org.archive.util.BloomFilter64bit
-
Changes the bit with index bitIndex in local bitvector.
- setBlockAll(boolean) - Method in class org.archive.crawler.prefetch.Preselector
-
- setBlockAwaitingSeedLines(int) - Method in class org.archive.modules.seeds.TextSeedModule
-
- setBlockByRegex(String) - Method in class org.archive.crawler.prefetch.Preselector
-
- setBloomFilter(BloomFilter) - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- setCachePercent(int) - Method in class org.archive.bdb.BdbModule
-
- setCacheSize(int) - Method in class org.archive.bdb.BdbModule
-
- setCalculateRobotsOnly(boolean) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setCandidateChain(CandidateChain) - Method in class org.archive.crawler.framework.CrawlController
-
- setCandidateChain(CandidateChain) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setCandidateUserAgents(List<String>) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- setCandidateUserAgents(List<String>) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- setCanonicalizationPolicy(UriCanonicalizationPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setCanonicalString(String) - Method in class org.archive.modules.CrawlURI
-
- setCapacity(int) - Method in class org.archive.util.fingerprint.ArrayLongFPCache
-
- setCaseSensitiveFilesystem(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setCharacterMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setCheckOutlinks(boolean) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setCheckpointDir(ConfigPath) - Method in class org.archive.checkpointing.Checkpoint
-
- setCheckpointIntervalMinutes(long) - Method in class org.archive.crawler.framework.CheckpointService
-
Period at which to create automatic checkpoints; -1 means
no auto checkpointing.
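The checkpoint-interval setting above can be pictured with a standard `ScheduledExecutorService`. This is a minimal sketch, not CheckpointService's code, and `AutoCheckpointScheduler` is a hypothetical name. The source only says that -1 disables automatic checkpoints. Treating every non-positive interval as "off" is an added assumption, chosen because `scheduleAtFixedRate` rejects a zero or negative period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Sketch: schedule periodic checkpoints unless auto-checkpointing is disabled.
final class AutoCheckpointScheduler {
    /** Returns the scheduled task, or null when the interval disables checkpoints. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService exec,
                                       long intervalMinutes, Runnable checkpoint) {
        if (intervalMinutes <= 0) {  // -1 (assumed: any non-positive value) = off
            return null;
        }
        // First checkpoint after one interval, then every interval after that.
        return exec.scheduleAtFixedRate(checkpoint, intervalMinutes, intervalMinutes,
                                        TimeUnit.MINUTES);
    }
}
```

A caller would typically obtain the executor from `Executors.newSingleThreadScheduledExecutor()`. It should cancel the returned future when the crawl stops.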
- setCheckpointsDir(ConfigPath) - Method in class org.archive.crawler.framework.CheckpointService
-
Checkpoints directory
- setCheckUri(boolean) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setChmod(boolean) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setChmodValue(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setClassKey(String) - Method in class org.archive.modules.CrawlURI
-
- setCollection(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setComment(String) - Method in class org.apache.commons.httpclient.Cookie
-
If a user agent (web browser) presents this cookie to a user, the
cookie's purpose will be described using this comment.
- setComment(String) - Method in class org.archive.modules.deciderules.DecideRule
-
- setCompress(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setConditionalGetHeader(CrawlURI, HttpMethod, boolean, String, String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Set the given conditional-GET header, if the setting is enabled and
a suitable value is available in the URI history.
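In HTTP, a remembered ETag is sent back as If-None-Match, and a remembered Last-Modified date is sent back as If-Modified-Since. The following sketch shows the "set the header only if enabled and a value is available" logic described above.

It works with plain maps instead of CrawlURI and HttpMethod. Two details are hypothetical:

- the history keys (`"etag"`, `"last-modified"`);
- the class `ConditionalGetSketch`.

It is not FetchHTTP's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: copy a validator from fetch history into a conditional-GET header.
final class ConditionalGetSketch {
    /**
     * @param headers    outgoing request headers (modified in place)
     * @param enabled    whether this conditional header is enabled
     * @param history    values remembered from the previous fetch (assumed shape)
     * @param historyKey key of the remembered value, e.g. "etag"
     * @param headerName header to send, e.g. "If-None-Match"
     * @return true if the header was set
     */
    static boolean setConditionalGetHeader(Map<String, String> headers, boolean enabled,
            Map<String, String> history, String historyKey, String headerName) {
        if (!enabled || history == null) return false;
        String previous = history.get(historyKey);
        if (previous == null || previous.isEmpty()) return false; // nothing suitable
        headers.put(headerName, previous);
        return true;
    }
}
```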
- setConfigPathConfigurer(ConfigPathConfigurer) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
Autowire access to ConfigPathConfigurer
- setConfigurer(ConfigPathConfigurer) - Method in class org.archive.spring.ConfigPath
-
- setConnectionCloseForced(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets whether or not the connection should be force-closed when no longer
needed.
- setConnectionTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
- setConnectTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchFTP.SocketFactoryWithTimeout
-
- setConsoleHandler() - Static method in class org.archive.util.OneLineSimpleLogger
-
- setContentDigest(byte[]) - Method in class org.archive.modules.CrawlURI
-
- setContentDigest(String, byte[]) - Method in class org.archive.modules.CrawlURI
-
- setContentDigestHistory(AbstractContentDigestHistory) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
-
- setContentDigestHistory(AbstractContentDigestHistory) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
-
- setContentLengthThreshold(long) - Method in class org.archive.modules.deciderules.ContentLengthDecideRule
-
- setContentLengthThreshold(long) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- setContentRegexes(Map<String, String>) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
A map of { name => regex }.
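A `{ name => regex }` map is typically compiled once and applied to the fetched content, keeping the matches per name. The following sketch shows only that step with `java.util.regex`, and `NamedRegexMatcher` is a hypothetical class. It does not reproduce how ExtractorMultipleRegex turns the matches into outlinks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: apply a { name => regex } map to content, collecting matches per name.
final class NamedRegexMatcher {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    NamedRegexMatcher(Map<String, String> contentRegexes) {
        // Compile each regex once; iteration order follows the input map.
        contentRegexes.forEach((name, regex) -> patterns.put(name, Pattern.compile(regex)));
    }

    /** Returns, per name, every substring of content matched by that name's regex. */
    Map<String, List<String>> findAll(CharSequence content) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            List<String> hits = new ArrayList<>();
            Matcher m = e.getValue().matcher(content);
            while (m.find()) {
                hits.add(m.group());
            }
            result.put(e.getKey(), hits);
        }
        return result;
    }
}
```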
- setContentSize(long) - Method in class org.archive.modules.CrawlURI
-
Sets the 'content size' for the URI, which is considered inclusive of all
recorded material (such as protocol headers) or even material
'virtually' considered (as in material from a previous fetch
confirmed unchanged with a server).
- setContentType(String) - Method in class org.archive.modules.CrawlURI
-
Set a fetched uri's content type.
- setContentTypeMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setCookiePolicy(int) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use HttpMethodParams.setCookiePolicy(String), HttpMethod.getParams().
- setCookiesLoadFile(ConfigFile) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- setCookiesMap(SortedMap<String, Cookie>) - Method in class org.apache.commons.httpclient.HttpState
-
Replace the standard sorted map with an external implementation
(such as one backed by a persistent store, like BDB's StoredSortedMap).
- setCookiesSaveFile(ConfigPath) - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- setCookieStorage(CookieStorage) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setCostAssignmentPolicy(CostAssignmentPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setCount() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setCount() - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- setCount() - Method in class org.archive.crawler.util.FPUriUniqFilter
-
- setCount() - Method in class org.archive.crawler.util.MemUriUniqFilter
-
- setCount() - Method in class org.archive.crawler.util.NoopUriUniqFilter
-
- setCount() - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setCountryCode(String) - Method in class org.archive.modules.net.CrawlHost
-
Set country code for this host.
- setCountryCodes(List<String>) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.deciderules.ClassKeyMatchesRegexDecideRule
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.framework.CheckpointService
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
Autowire access to CrawlController
- setCrawlController(CrawlController) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- setCrawlController(CrawlController) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- setCrawlController(CrawlController) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setCrawlDelay(float) - Method in class org.archive.modules.net.RobotsDirectives
-
- setCrawlerCount(long) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- setCrawlLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setCreateHostDirectory(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setCreatePortDirectory(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setCredentials(String, String, Credentials) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use setCredentials(AuthScope, Credentials).
- setCredentials(AuthScope, Credentials) - Method in class org.apache.commons.httpclient.HttpState
-
Sets the credentials for the given authentication scope.
- setCredentials(Map<String, Credential>) - Method in class org.archive.modules.credential.CredentialStore
-
- setCredentialStore(CredentialStore) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setCredentialStore(CredentialStore) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setCustomRobots(ReadSource) - Method in class org.archive.modules.net.CustomRobotsPolicy
-
- setDecision(DecideResult) - Method in class org.archive.modules.deciderules.PredicatedDecideRule
-
- setDefaultEncoding(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setDefaultUriPrecedencePolicy(UriPrecedencePolicy) - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- setDeferredWrite(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- setDeferToPrevious(boolean) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- setDelayFactor(float) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setDelaySeconds(int) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setDescription(String) - Method in class org.archive.modules.CrawlMetadata
-
- setDestination(UriUniqFilter.CrawlUriReceiver) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Receiver of uniq URIs.
- setDestination(UriUniqFilter.CrawlUriReceiver) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- setDestination(UriUniqFilter.CrawlUriReceiver) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchDNS
-
- setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setDigestAlgorithm(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchDNS
-
- setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setDigestContent(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setDir(ConfigPath) - Method in class org.archive.bdb.BdbModule
-
- setDirectory(ConfigPath) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setDirectoryFile(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setDispositionChain(DispositionChain) - Method in class org.archive.crawler.framework.CrawlController
-
- setDiversionDir(ConfigPath) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setDNSServerIPLabel(String) - Method in class org.archive.modules.CrawlURI
-
- setDoAuthentication(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets whether or not the HTTP method should automatically handle HTTP
authentication challenges (status code 401, etc.)
- setDomain(String) - Method in class org.apache.commons.httpclient.Cookie
-
Sets the domain attribute.
- setDomain(String) - Method in class org.archive.modules.credential.Credential
-
- setDomainAttributeSpecified(boolean) - Method in class org.apache.commons.httpclient.Cookie
-
Indicates whether the cookie had a domain specified in a
domain attribute of the Set-Cookie header.
- setDoneDir(ConfigPath) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setDotBegin(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setDotEnd(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setDumpPendingAtClose(boolean) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- setEarliestNextURIEmitTime(long) - Method in class org.archive.modules.net.CrawlHost
-
Set the earliest time a URI for this host could be emitted.
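The earliest next emit time for a host usually comes from a politeness delay. The following sketch assumes, as the A_DELAY_FACTOR constant describes, that the delay factor multiplies the last fetch duration. The result is clamped by a maximum delay (compare setMaxDelayMs) and by an assumed minimum delay. The class `PolitenessSketch` and the `minDelayMs` parameter are hypothetical, and this is not DispositionProcessor's code.

```java
// Sketch: earliest next emit time = completion time + clamped politeness wait.
final class PolitenessSketch {
    static long earliestNextEmitTime(long fetchBeganMs, long fetchCompletedMs,
                                     float delayFactor, long minDelayMs, long maxDelayMs) {
        long duration = Math.max(0, fetchCompletedMs - fetchBeganMs);
        long wait = (long) (duration * delayFactor);          // factor x last duration
        wait = Math.max(minDelayMs, Math.min(maxDelayMs, wait)); // clamp to [min, max]
        return fetchCompletedMs + wait;
    }
}
```

For example, a 2-second fetch with a factor of 5 yields a 10-second wait. That is within a 3 to 30 second clamp, so the host becomes eligible 10 seconds after the fetch completed.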
- setEditFilter(IOFileFilter) - Method in class org.archive.crawler.restlet.EnhDirectory
-
- setEnabled(boolean) - Method in class org.archive.modules.canonicalize.BaseRule
-
- setEnabled(boolean) - Method in class org.archive.modules.deciderules.DecideRule
-
- setEnabled(boolean) - Method in class org.archive.modules.Processor
-
- setEngineName(String) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- setEngineName(String) - Method in class org.archive.modules.ScriptedProcessor
-
- setError(String) - Method in class org.archive.modules.CrawlURI
-
- setErrorPenaltyAmount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setExpectedConcurrency(int) - Method in class org.archive.bdb.BdbModule
-
- setExpirationOperation(RuntimeLimitEnforcer.Operation) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- setExpiryDate(Date) - Method in class org.apache.commons.httpclient.Cookie
-
Sets expiration date.
- setExtract404s(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setExtractAllForms(boolean) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- setExtractFromDirs(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setExtractIndependently(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setExtractJavascript(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setExtractOnlyFormGets(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setExtractorJS(ExtractorJS) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setExtractorJS(ExtractorJS) - Method in class org.archive.modules.extractor.ExtractorSWF
-
- setExtractorParameters(ExtractorParameters) - Method in class org.archive.modules.extractor.Extractor
-
- setExtractParent(boolean) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setExtractValueAttributes(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setFetchBeginTime(long) - Method in class org.archive.modules.CrawlURI
-
- setFetchChain(FetchChain) - Method in class org.archive.crawler.framework.CrawlController
-
- setFetchCompletedTime(long) - Method in class org.archive.modules.CrawlURI
-
- setFetchStatus(int) - Method in class org.archive.modules.CrawlURI
-
Set the overall/fetch status of this CrawlURI for
its current trip through the processing loop.
- setFetchType(CrawlURI.FetchType) - Method in class org.archive.modules.CrawlURI
-
- setFlashes(List<Flash>) - Method in class org.archive.crawler.restlet.models.ViewModel
-
- setFollowRedirects(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets whether or not the HTTP method should automatically follow HTTP redirects
(status code 302, etc.)
- setForceFetch(boolean) - Method in class org.archive.modules.CrawlURI
-
Method to signal that this URI should be fetched even though
it already has been crawled.
- setForceQueueAssignment(String) - Method in class org.archive.crawler.frontier.QueueAssignmentPolicy
-
- setForceRetire(boolean) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setForceRetire(boolean) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setForceRetire(boolean) - Method in class org.archive.modules.CrawlURI
-
- setForgetAllButLatest(boolean) - Method in class org.archive.checkpointing.Checkpoint
-
- setForgetAllButLatest(boolean) - Method in class org.archive.crawler.framework.CheckpointService
-
True to save only the latest checkpoint, false to save all of them.
- setFormat(String) - Method in class org.archive.modules.canonicalize.RegexRule
-
- setFormat(String) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- setFormItems(Map<String, String>) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- setFpset(LongFPSet) - Method in class org.archive.crawler.util.FPUriUniqFilter
-
- setFrequentFlushes(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setFrontier(Frontier) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setFrontier(Frontier) - Method in class org.archive.crawler.framework.CrawlController
-
- setFrontier(Frontier) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setFrontier(Frontier) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setFrontier(Frontier) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- setFrontier(Frontier) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- setFrontierPreparer(FrontierPreparer) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setFullVia(CrawlURI) - Method in class org.archive.modules.CrawlURI
-
- setGetBit(long) - Method in class org.archive.util.BloomFilter64bit
-
Sets the bit with index bitIndex in the local bit vector,
returning the old value.
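A bit vector stored as 64-bit words makes setBit and setGetBit cheap shift-and-mask operations. The following is a minimal sketch; the class `LongBitVector` is hypothetical and is not BloomFilter64bit's internal layout. In a Bloom filter, an add would be "new" if any of its setGetBit calls returned false.

```java
// Sketch of a bit vector backed by 64-bit words.
final class LongBitVector {
    private final long[] words;

    LongBitVector(long nbits) {
        words = new long[(int) ((nbits + 63) >>> 6)]; // round up to whole words
    }

    /** Sets the bit at bitIndex. */
    void setBit(long bitIndex) {
        words[(int) (bitIndex >>> 6)] |= 1L << (bitIndex & 63);
    }

    /** Sets the bit at bitIndex and returns its previous value. */
    boolean setGetBit(long bitIndex) {
        int w = (int) (bitIndex >>> 6);   // word holding the bit
        long mask = 1L << (bitIndex & 63); // position within that word
        boolean old = (words[w] & mask) != 0;
        words[w] |= mask;
        return old;
    }

    boolean getBit(long bitIndex) {
        return (words[(int) (bitIndex >>> 6)] & (1L << (bitIndex & 63))) != 0;
    }
}
```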
- setGroupMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setGroupMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setGroupMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setGroupMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setHarvester(String) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setHistoryDbName(String) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- setHistoryDbName(String) - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- setHistoryLength(int) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
-
- setHolder(Object) - Method in class org.archive.modules.CrawlURI
-
Remember a 'holder' to which some enclosing/queueing
facility has assigned this CrawlURI.
- setHolderCost(int) - Method in class org.archive.modules.CrawlURI
-
Remember a 'holderCost' which some enclosing/queueing
facility has assigned this CrawlURI.
- setHolderKey(Object) - Method in class org.archive.modules.CrawlURI
-
Remember a 'holderKey' which some enclosing/queueing
facility has assigned this CrawlURI.
- setHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the host to connect to.
- setHostConfiguration(HostConfiguration) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
no longer applicable
- setHostMap(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setHostMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setHostMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setHostMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setHostMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setHttp11(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpMethodParams.setVersion(HttpVersion)
- setHttpAuthChallenges(Map<String, String>) - Method in class org.archive.modules.CrawlURI
-
- setHttpAuthChallenges(Map<String, String>) - Method in class org.archive.modules.net.CrawlServer
-
- setHttpBindAddress(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setHttpConnectionManager(HttpConnectionManager) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the httpConnectionManager.
- setHttpMethod(HttpMethod) - Method in class org.archive.modules.CrawlURI
-
- setHttpMethod(HtmlFormCredential.Method) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- setHttpProxyHost(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setHttpProxyPassword(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setHttpProxyPort(int) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setHttpProxyUser(String) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.crawler.frontier.WorkQueue
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.crawler.reporting.SeedRecord
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.modules.net.CrawlHost
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.modules.net.CrawlServer
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in interface org.archive.util.IdentityCacheable
-
- setIdentityCache(ObjectIdentityCache<?>) - Method in class org.archive.util.IdentityCacheableWrapper
-
- setIgnoreCookies(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setIgnoreFormActionUrls(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setIgnoreUnexpectedHtml(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setIncrementCounts(String) - Method in class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
-
- setInferRootPage(boolean) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- setInitialDelaySeconds(int) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setIntervalSeconds(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setIP(InetAddress, long) - Method in class org.archive.modules.net.CrawlHost
-
Set the IP address for this host.
- setIpAddresses(Set<String>) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- setIpValidityDurationSeconds(int) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setIsolateThreads(boolean) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- setIsolateThreads(boolean) - Method in class org.archive.modules.ScriptedProcessor
-
- setJobName(String) - Method in class org.archive.modules.CrawlMetadata
-
- setKeepSnapshotsCount(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setLargestQueuesCount(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setLastResponseInputStream(InputStream) - Method in class org.apache.commons.httpclient.HttpConnection
-
Set the state to keep track of the last response for the last request.
- setListLogicalOr(boolean) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
- setLiveHostReportSize(int) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setLocalAddress(InetAddress) - Method in class org.apache.commons.httpclient.HttpConnection
-
Set the local address used when creating the connection.
- setLocalName(String) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setLocked(boolean) - Method in class org.apache.commons.httpclient.HttpConnection
-
Locks or unlocks the connection.
- setLogExtraInfo(boolean) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setLogFile(ConfigPath) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.framework.CrawlController
-
- setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.framework.Scoper
-
- setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setLoggerModule(CrawlerLoggerModule) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setLoggerModule(SimpleFileLoggerProvider) - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- setLoggerModule(UriErrorLoggerModule) - Method in class org.archive.modules.extractor.Extractor
-
- setLoggerModule(UriErrorLoggerModule) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- setLogin(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- setLoginPassword(String) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- setLoginUri(String) - Method in class org.archive.modules.credential.HtmlFormCredential
-
- setLoginUsername(String) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- setLogRejectsRule(DecideRule) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- setLogToFile(boolean) - Method in class org.archive.crawler.framework.Scoper
-
- setLogToFile(boolean) - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- setLookup(ExternalGeoLookupInterface) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- setLowerBound(Integer) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Sets the lower bound on the range of acceptable status codes.
- setLowerBound(Integer) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Sets the lower bound on the range of acceptable status codes.
- setLowerBound(long) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
The rule will apply if the url has been fetched and content body length
is greater than or equal to this number of bytes.
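The decide rules above reduce to simple range tests. The following sketch assumes an inclusive upper bound for the status-code range. The source documents only the lower bound, so that inclusive upper bound is an assumption. The sketch also models the "not matches" rule as the inverse of the range test, and models "has been fetched" as a plain boolean. `RangeRuleSketch` is a hypothetical class, not the Heritrix rules themselves.

```java
// Sketch of the status-code and content-length tests described above.
final class RangeRuleSketch {
    /** True when lowerBound <= status <= upperBound (inclusive upper bound assumed). */
    static boolean statusMatches(int status, int lowerBound, int upperBound) {
        return status >= lowerBound && status <= upperBound;
    }

    /** The "not matches" variant inverts the range test. */
    static boolean statusNotMatches(int status, int lowerBound, int upperBound) {
        return !statusMatches(status, lowerBound, upperBound);
    }

    /** Applies only to fetched URIs whose body length is at least lowerBound bytes. */
    static boolean contentLengthApplies(boolean fetched, long bodyLength, long lowerBound) {
        return fetched && bodyLength >= lowerBound;
    }
}
```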
- setMap(Map<String, Object>) - Method in class org.archive.spring.Sheet
-
Set the map from each property's full bean-path (starting with a target
bean-name) to its alternate value.
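A sheet's map associates a full bean-path such as "beanName.property" with an override value. The following sketch shows only the lookup side; the class `SheetSketch` and the example bean name `fetchHttp` are hypothetical. It does not show how Sheet applies overlays to live beans. When a lookup comes back empty, the bean's own configured value would presumably stay in effect.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of a sheet-style overlay: "beanName.property" -> alternate value.
final class SheetSketch {
    private final Map<String, Object> map = new HashMap<>();

    void setMap(Map<String, Object> overrides) {
        map.clear();
        map.putAll(overrides);
    }

    /** Returns the alternate value for beanName.property, if this sheet defines one. */
    Optional<Object> lookup(String beanName, String property) {
        return Optional.ofNullable(map.get(beanName + "." + property));
    }
}
```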
- setMapPath(ConfigPath) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- setMapUri(String) - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- setMaxAttributeNameLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setMaxAttributeValLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setMaxBytesDownload(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- setMaxDelayMs(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setMaxDocumentsDownload(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- setMaxElementLength(int) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setMaxFetchKBSec(int) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setMaxFetchKBSec(int) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setMaxFileSizeBytes(long) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setMaxFileSizeBytes(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setMaxHops(int) - Method in class org.archive.modules.deciderules.TooManyHopsDecideRule
-
- setMaxLengthBytes(long) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setMaxLengthBytes(long) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setMaxOutlinks(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setMaxPathDepth(int) - Method in class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
-
- setMaxPathLength(int) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setMaxPending(int) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- setMaxPerHostBandwidthUsageKbSec(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setMaxQueuesPerReportCategory(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setMaxRepetitions(int) - Method in class org.archive.modules.deciderules.PathologicalPathDecideRule
-
- setMaxRetries(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setMaxSegLength(int) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setMaxSize(int) - Method in class org.archive.crawler.util.TopNSet
-
- setMaxSizeToDigest(long) - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- setMaxSizeToParse(long) - Method in class org.archive.modules.extractor.ExtractorPDF
-
- setMaxSizeToParse(long) - Method in class org.archive.modules.extractor.ExtractorUniversal
-
- setMaxSpeculativeHops(int) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
-
- setMaxTimeSeconds(long) - Method in class org.archive.crawler.framework.CrawlLimitEnforcer
-
- setMaxToeThreads(int) - Method in class org.archive.crawler.framework.CrawlController
-
- setMaxTotalBytesToWrite(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setMaxTransHops(int) - Method in class org.archive.modules.deciderules.TransclusionDecideRule
-
- setMaxWaitForIdleMs(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setMetadata(CrawlMetadata) - Method in class org.archive.crawler.framework.CrawlController
-
- setMetadata(CrawlMetadata) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setMetadata(CrawlMetadata) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setMetadata(CrawlMetadata) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setMetadataProvider(CrawlMetadata) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setMethod(String) - Method in class org.archive.modules.forms.HTMLForm
-
- setMethodRetryHandler(MethodRetryHandler) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpMethodParams.
- setMinDelayMs(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setMonitorConfigPaths(boolean) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- setMonitorMounts(List<String>) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- setMonitorPaths(List<String>) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
- setName(String) - Method in class org.archive.spring.ConfigPath
-
- setName(String) - Method in class org.archive.spring.Sheet
-
- setNavlinksOnly(boolean) - Method in class org.archive.crawler.frontier.precedence.HopsUriPrecedencePolicy
-
- setNonfatalErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.CustomRobotsPolicy
-
- setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- setObeyMetaRobotsNofollow(boolean) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- setOnlyStoreIfWriteTagPresent(boolean) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
-
- setOperator(String) - Method in class org.archive.modules.CrawlMetadata
-
- setOperatorContactUrl(String) - Method in class org.archive.modules.CrawlMetadata
-
- setOperatorFrom(String) - Method in class org.archive.modules.CrawlMetadata
-
- setOrder(int) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- setOrdinal(long) - Method in class org.archive.modules.CrawlURI
-
- setOrganization(String) - Method in class org.archive.modules.CrawlMetadata
-
- setOutlinkRule(DecideRule) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setOverlayMapsSource(OverlayMapsSource) - Method in class org.archive.modules.CrawlURI
-
- setParallelQueues(int) - Method in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
-
- setParams(HttpConnectionParams) - Method in class org.apache.commons.httpclient.HttpConnection
-
Assigns HTTP protocol parameters for this connection.
- setParams(HttpMethodParams) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Assigns HTTP protocol parameters for this method.
- setPassword(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- setPassword(String) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setPath(String) - Method in class org.apache.commons.httpclient.Cookie
-
Sets the path attribute.
- setPath(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the path of the HTTP method.
- setPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setPath(ConfigPath) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setPath(ConfigPath) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setPath(String) - Method in class org.archive.spring.ConfigPath
-
- setPathAttributeSpecified(boolean) - Method in class org.apache.commons.httpclient.Cookie
-
Indicates whether the cookie had a path specified in a
path attribute of the Set-Cookie header.
- setPauseAtStart(boolean) - Method in class org.archive.crawler.framework.CrawlController
-
- setPauseThresholdKb(int) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- setPauseThresholdMiB(long) - Method in class org.archive.crawler.monitor.DiskSpaceMonitor
-
Set the minimum amount of space that must be available on all monitored paths.
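A minimal sketch of this kind of free-space check follows. It uses only `java.io.File.getUsableSpace()`. The class `FreeSpaceCheckSketch`, the `shouldPause` method, and the rule that a single path below the threshold is enough to pause are illustrative assumptions. This is not the `DiskSpaceMonitor` source.

```java
import java.io.File;
import java.util.List;

/** Hypothetical sketch of a minimum-free-space check; not the DiskSpaceMonitor source. */
public class FreeSpaceCheckSketch {
    private static final long BYTES_PER_MIB = 1024L * 1024L;

    /**
     * Returns true if any monitored path has less usable space than the threshold,
     * meaning the crawl should pause. (Assumption: one low path is enough to pause.)
     */
    public static boolean shouldPause(List<File> monitoredPaths, long pauseThresholdMiB) {
        long thresholdBytes = pauseThresholdMiB * BYTES_PER_MIB;
        for (File path : monitoredPaths) {
            // getUsableSpace() reports bytes available to this JVM, or 0 if the path names no partition
            if (path.getUsableSpace() < thresholdBytes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<File> paths = List.of(new File(System.getProperty("java.io.tmpdir")));
        System.out.println("pause at 8192 MiB threshold? " + shouldPause(paths, 8192));
    }
}
```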
- setPolitenessDelay(long) - Method in class org.archive.modules.CrawlURI
-
- setPool(WriterPool) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setPoolMaxActive(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setPort(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the port to connect to.
- setPrecedence(Integer) - Method in class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
-
- setPrecedence(int) - Method in class org.archive.modules.CrawlURI
-
- setPrecedenceFloor(int) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setPrecedenceProvider(PrecedenceProvider) - Method in class org.archive.crawler.frontier.WorkQueue
-
- setPreferenceDepthHops(int) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- setPreferenceDepthHops(int) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setPreferenceEmbedHops(int) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setPrefix(String) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setPreloadSource(ConfigPath) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- setPreloadSourceUrl(String) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- setPrerequisite(boolean) - Method in class org.archive.modules.CrawlURI
-
Set whether this CrawlURI is itself a prerequisite URI.
- setPrerequisiteUri(CrawlURI) - Method in class org.archive.modules.CrawlURI
-
Set a prerequisite for this URI.
- setProcessErrorOutlinks(boolean) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setProcessors(List<Processor>) - Method in class org.archive.modules.ProcessorChain
-
- setProfileLog(File) - Method in interface org.archive.crawler.datamodel.UriUniqFilter
-
Set a File to receive a log for replay profiling.
- setProfileLog(File) - Method in class org.archive.crawler.util.FPMergeUriUniqFilter
-
- setProfileLog(File) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setProgressLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setProtocol(Protocol) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the protocol used to establish the connection.
- setProxyCredentials(String, String, Credentials) - Method in class org.apache.commons.httpclient.HttpState
-
Deprecated.
Use setProxyCredentials(AuthScope, Credentials).
- setProxyCredentials(AuthScope, Credentials) - Method in class org.apache.commons.httpclient.HttpState
-
Sets the proxy credentials for the given authentication realm.
- setProxyHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the host to proxy through.
- setProxyPort(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the port of the host to proxy through.
- setQueryString(String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the query string of this HTTP method.
- setQueryString(NameValuePair[]) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the query string of this HTTP method.
- setQueueAssignmentPolicy(QueueAssignmentPolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setQueuePrecedencePolicy(QueuePrecedencePolicy) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setQueueTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setRealm(String) - Method in class org.archive.modules.credential.HttpAuthenticationCredential
-
- setRecheckScope(boolean) - Method in class org.archive.crawler.prefetch.Preselector
-
- setRecheckThresholdKb(int) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- setRecorder(Recorder) - Method in class org.archive.modules.CrawlURI
-
Set the HTTP recorder to be associated with this URI.
- setRecorderInBufferBytes(int) - Method in class org.archive.crawler.framework.CrawlController
-
- setRecorderOutBufferBytes(int) - Method in class org.archive.crawler.framework.CrawlController
-
- setRecordIDGenerator(RecordIDGenerator) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
-
- setRecoveryCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
-
Used to inform a bean that it should restore its state from
the given Checkpoint when launched (Lifecycle start()).
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CheckpointService
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
-
- setRecoveryCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
-
- setRecoveryCheckpointByName(String) - Method in class org.archive.crawler.framework.CheckpointService
-
Given the name of a valid checkpoint subdirectory in the checkpoints
directory, create a Checkpoint instance, and insert it into all
Checkpointable beans.
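The steps described above can be sketched as follows: resolve the named subdirectory, build a single Checkpoint, and pass it to every Checkpointable bean. All types here are hypothetical stand-ins defined inside the example, not the `org.archive.checkpointing` classes. Rejecting a missing directory with `IllegalArgumentException` is also an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of recovering all checkpointable beans from a named checkpoint. */
public class CheckpointRecoverySketch {

    /** Minimal stand-in for a checkpoint: just the directory it lives in. */
    public static final class Checkpoint {
        private final File checkpointDir;
        public Checkpoint(File checkpointDir) { this.checkpointDir = checkpointDir; }
        public File getCheckpointDir() { return checkpointDir; }
    }

    /** Stand-in for beans that restore their state from a checkpoint when launched. */
    public interface Checkpointable {
        void setRecoveryCheckpoint(Checkpoint checkpoint);
    }

    private final File checkpointsDir;
    private final List<Checkpointable> beans = new ArrayList<>();

    public CheckpointRecoverySketch(File checkpointsDir) { this.checkpointsDir = checkpointsDir; }

    public void register(Checkpointable bean) { beans.add(bean); }

    /** Validates the named subdirectory, then hands the same Checkpoint to every bean. */
    public Checkpoint setRecoveryCheckpointByName(String name) {
        File dir = new File(checkpointsDir, name);
        if (!dir.isDirectory()) {
            // assumption: an invalid name is rejected rather than silently ignored
            throw new IllegalArgumentException("no checkpoint directory: " + dir);
        }
        Checkpoint checkpoint = new Checkpoint(dir);
        for (Checkpointable bean : beans) {
            bean.setRecoveryCheckpoint(checkpoint);
        }
        return checkpoint;
    }
}
```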
- setRecoveryLogEnabled(boolean) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setReducePrefixRegex(String) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- setRegex(Pattern) - Method in class org.archive.modules.canonicalize.RegexRule
-
- setRegex(Pattern) - Method in class org.archive.modules.deciderules.MatchesRegexDecideRule
-
- setRegex(Pattern) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- setRegexList(List<Pattern>) - Method in class org.archive.modules.deciderules.MatchesListRegexDecideRule
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.BloomUriUniqFilter
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.FPUriUniqFilter
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.MemUriUniqFilter
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.NoopUriUniqFilter
-
- setRemove(CharSequence) - Method in class org.archive.crawler.util.SetBasedUriUniqFilter
-
- setRemoveTriggerUris(boolean) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- setReports(List<Report>) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setReportsDir(ConfigPath) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setRequestHeader(String, String) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Set the specified request header, overwriting any previous value.
- setRequestHeader(Header) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the specified request header, overwriting any previous value.
- setRescheduleDelaySeconds(long) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
-
- setRescheduleTime(long) - Method in class org.archive.modules.CrawlURI
-
- setRespectCrawlDelayUpToSeconds(int) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setResponseStream(InputStream) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the response stream.
- setRetired(boolean) - Method in class org.archive.crawler.frontier.WorkQueue
-
Set the retired status of this queue.
- setRetryDelaySeconds(int) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setRobotsPolicyName(String) - Method in class org.archive.modules.CrawlMetadata
-
- setRobotsValidityDurationSeconds(int) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setRotationDigits(int) - Method in class org.archive.crawler.processor.CrawlMapper
-
- setRules(DecideRule) - Method in class org.archive.crawler.spring.DecideRuledSheetAssociation
-
- setRules(List<CanonicalizationRule>) - Method in class org.archive.modules.canonicalize.RulesCanonicalizationPolicy
-
- setRules(List<DecideRule>) - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- setRuntimeErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setRuntimeSeconds(long) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- setRunWhileEmpty(boolean) - Method in class org.archive.crawler.framework.CrawlController
-
- setSchedulingDirective(int) - Method in class org.archive.modules.CrawlURI
-
- setSchemes(Set<String>) - Method in class org.archive.modules.deciderules.SchemeNotInSetDecideRule
-
- setScope(DecideRule) - Method in class org.archive.crawler.framework.Scoper
-
- setScope(DecideRule) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setScratchDir(ConfigPath) - Method in class org.archive.crawler.framework.CrawlController
-
- setScriptSource(ReadSource) - Method in class org.archive.modules.deciderules.ScriptedDecideRule
-
- setScriptSource(ReadSource) - Method in class org.archive.modules.ScriptedProcessor
-
- setSecure(boolean) - Method in class org.apache.commons.httpclient.Cookie
-
Sets the secure attribute of the cookie.
- setSeed(boolean) - Method in class org.archive.modules.CrawlURI
-
Set the isSeed attribute of this URI.
- setSeedListeners(Set<SeedListener>) - Method in class org.archive.modules.seeds.SeedModule
-
- setSeeds(SeedModule) - Method in class org.archive.crawler.framework.ActionDirectory
-
- setSeeds(SeedModule) - Method in class org.archive.crawler.framework.CrawlController
-
- setSeeds(SeedModule) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setSeeds(SeedModule) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setSeeds(SeedModule) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setSeeds(SeedModule) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setSeedsAsSurtPrefixes(boolean) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setSeedsRedirectNewSeeds(boolean) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setSeedsRedirectNewSeeds(boolean) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- setSendBufferSize(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
- setSendConnectionClose(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setSendIfModifiedSince(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setSendIfNoneMatch(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setSendRange(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setSendReferer(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.framework.CrawlController
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.BucketQueueAssignmentPolicy
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.frontier.IPQueueAssignmentPolicy
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setServerCache(ServerCache) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setServerCache(ServerCache) - Method in class org.archive.modules.deciderules.ExternalGeoLocationDecideRule
-
- setServerCache(ServerCache) - Method in class org.archive.modules.deciderules.IpAddressSetDecideRule
-
- setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchDNS
-
- setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setServerCache(ServerCache) - Method in class org.archive.modules.fetcher.FetchWhois
-
- setServerCache(ServerCache) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- setServerCache(ServerCache) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setServerMaxAllKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setServerMaxFetchResponses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setServerMaxFetchSuccesses(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setServerMaxSuccessKb(long) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- setSessionBudget(int) - Method in class org.archive.crawler.frontier.WorkQueue
-
Set the session 'activity budget' to the given value.
- setSheetOverlaysManager(SheetOverlaysManager) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- setSheetOverlaysManager(SheetOverlaysManager) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- setSheetsByName(Map<String, Sheet>) - Method in class org.archive.crawler.spring.SheetOverlaysManager
-
Collect all Sheets, by beanName.
- setShouldFetchBodyRule(DecideRule) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setShouldMasquerade(boolean) - Method in class org.archive.modules.net.FirstNamedRobotsPolicy
-
- setShouldMasquerade(boolean) - Method in class org.archive.modules.net.MostFavoredRobotsPolicy
-
- setShouldProcessRule(DecideRule) - Method in class org.archive.modules.Processor
-
- setShouldReportAtEndOfCrawl(boolean) - Method in class org.archive.crawler.reporting.Report
-
- setShouldReportDuringCrawl(boolean) - Method in class org.archive.crawler.reporting.Report
-
- setSize(int) - Method in class org.archive.crawler.framework.ToePool
-
Change the number of ToeThreads.
- setSizes(CrawlURI, Recorder) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Update CrawlURI internal sizes based on the current transaction (and, in the case of 304s, on history).
- setSkipIdenticalDigests(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setSnoozeLongMs(long) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setSocketTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
Sets the SO_TIMEOUT value directly on the underlying socket.
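The JDK-only example below shows what SO_TIMEOUT means in practice. It sets the option on a plain `java.net.Socket` that is connected over loopback to a peer that never writes. The read then gives up with `SocketTimeoutException` instead of blocking forever. The example does not use commons-httpclient, and `SoTimeoutDemo` and `readTimesOut` are illustrative names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Demonstrates SO_TIMEOUT on a plain java.net.Socket (JDK only, no httpclient). */
public class SoTimeoutDemo {

    /** Returns true if a read on a silent connection times out after timeoutMs. */
    public static boolean readTimesOut(int timeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket silentPeer = server.accept()) { // accepted side that never writes anything
            client.setSoTimeout(timeoutMs); // SO_TIMEOUT: max time a read() may block, in ms
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives, EOF, or the timeout elapses
                return false;
            } catch (SocketTimeoutException e) {
                return true; // only this read is abandoned; the socket itself remains valid
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```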
- setSortedDuplicates(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- setSoTimeout(int) - Method in class org.apache.commons.httpclient.HttpConnection
-
- setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setSoTimeoutMs(int) - Method in class org.archive.modules.fetcher.FetchWhois
-
- setSourceTag(String) - Method in class org.archive.modules.CrawlURI
-
- setSourceTagSeeds(boolean) - Method in class org.archive.modules.seeds.SeedModule
-
- setSpecialQueryTemplates(Map<String, String>) - Method in class org.archive.modules.fetcher.FetchWhois
-
- setSslTrustLevel(ConfigurableX509TrustManager.TrustLevel) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setStaleCheckingEnabled(boolean) - Method in class org.apache.commons.httpclient.HttpConnection
-
- setStartNewFilesOnCheckpoint(boolean) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
Whether to close output files and start new ones on checkpoint.
- setStatisticsTracker(StatisticsTracker) - Method in class org.archive.crawler.framework.CrawlController
-
- setStatisticsTracker(StatisticsTracker) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- setStatusCodes(List<Integer>) - Method in class org.archive.modules.deciderules.FetchStatusDecideRule
-
- setStep(ToeThread.Step, String) - Method in class org.archive.crawler.framework.ToeThread
-
- setStorePaths(List<ConfigPath>) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setStrictMode(boolean) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Deprecated.
Use HttpParams.setParameter(String, Object) to exercise more granular control over HTTP protocol strictness.
- setStripRegex(String) - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- setSuccess(boolean) - Method in class org.archive.checkpointing.Checkpoint
-
- setSuffixAtEnd(boolean) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setSupplementaryRule(DecideRule) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- setSurtPrefixes(List<String>) - Method in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
-
- setSurtsDumpFile(ConfigFile) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setSurtsSource(ReadSource) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- setSurtsSourceFile(ConfigFile) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Deprecated.
- setTargetSheetNames(List<String>) - Method in class org.archive.crawler.spring.SheetAssociation
-
- setTemplate(String) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
URI-building template.
- setTemplate(String) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.BeanBrowseResource
-
- setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.EngineResource
-
- setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.JobResource
-
- setTemplateConfiguration(Configuration) - Method in class org.archive.crawler.restlet.ScriptResource
-
- setTextSource(ReadSource) - Method in class org.archive.modules.seeds.TextSeedModule
-
- setThreadLogger(Logger) - Static method in class org.archive.crawler.reporting.AlertThreadGroup
-
Set an alternate temporary alert logger.
- setThreadNumber(int) - Method in class org.archive.modules.CrawlURI
-
Set the number of the ToeThread responsible for processing this uri.
- setTimeoutSeconds(int) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setTimeoutSeconds(int) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setTooLongDirectory(String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setTotalBudget(long) - Method in class org.archive.crawler.frontier.WorkQueue
-
Set the total expenditure level allowable before the queue is
considered inherently 'over-budget'.
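Below is a minimal sketch of the session and total budgets. It rests on three assumptions. A negative total budget means unlimited. A queue counts as over a budget only when its expenditure strictly exceeds that budget. The session counter resets when a new session starts. `QueueBudgetSketch` and its methods are hypothetical and are not the `WorkQueue` source.

```java
/** Hypothetical sketch of per-queue budget accounting; not the WorkQueue source. */
public class QueueBudgetSketch {
    private long totalBudget = -1;     // assumption: negative means "no total budget"
    private int sessionBudget = 0;     // activity allowed in one session before the queue yields
    private long totalExpenditure = 0;
    private long sessionExpenditure = 0;

    public void setTotalBudget(long totalBudget) { this.totalBudget = totalBudget; }
    public void setSessionBudget(int sessionBudget) { this.sessionBudget = sessionBudget; }

    /** Record the cost of one processed URI against both budgets. */
    public void expend(long cost) {
        totalExpenditure += cost;
        sessionExpenditure += cost;
    }

    /** Start a new session, for example when the queue becomes active again. */
    public void resetSession() { sessionExpenditure = 0; }

    public boolean isOverSessionBudget() { return sessionExpenditure > sessionBudget; }

    /** Once true this stays true, because total expenditure is never reset. */
    public boolean isOverTotalBudget() {
        return totalBudget >= 0 && totalExpenditure > totalBudget;
    }
}
```

In this sketch, exhausting the session budget is temporary and clears with `resetSession()`. Exceeding the total budget is permanent, which is one reading of "inherently 'over-budget'".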
- setTotalBytesWritten(long) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setTrackSeeds(boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setTrackSources(boolean) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- setTransactional(boolean) - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- setTreatFramesAsEmbedLinks(boolean) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- setUnderscoreSet(List<String>) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- setUnresolvable(CrawlURI, CrawlHost) - Method in class org.archive.modules.fetcher.FetchDNS
-
- setup(File, boolean) - Method in class org.archive.bdb.BdbModule
-
- setUp() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
- setUp() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
- setUp() - Method in class org.archive.util.TmpDirTestCase
-
- setupCheckpointTask() - Method in class org.archive.crawler.framework.CheckpointService
-
Set up checkpointTask according to the current interval.
- setupCopyEnvironment(File) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
- setupCopyEnvironment(File, boolean) - Static method in class org.archive.modules.recrawl.PersistProcessor
-
- setupGlobalProperties(int) - Method in class org.archive.crawler.Heritrix
-
Set up global system properties that may be of use elsewhere.
- setupLogs() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setUpperBound(Integer) - Method in class org.archive.modules.deciderules.MatchesStatusCodeDecideRule
-
Sets the upper bound on the range of acceptable status codes.
- setUpperBound(Integer) - Method in class org.archive.modules.deciderules.NotMatchesStatusCodeDecideRule
-
Sets the upper bound on the range of acceptable status codes.
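Here is a minimal sketch of the range test these lower- and upper-bound setters configure. It assumes both bounds are inclusive and that `NotMatchesStatusCodeDecideRule` simply inverts the result. `StatusCodeRangeSketch` is hypothetical, not Heritrix code.

```java
/** Hypothetical sketch of inclusive status-code range matching; not the Heritrix source. */
public class StatusCodeRangeSketch {
    private final int lowerBound;
    private final int upperBound;
    private final boolean negate; // true models the "NotMatches" variant (assumed simple inversion)

    public StatusCodeRangeSketch(int lowerBound, int upperBound, boolean negate) {
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
        this.negate = negate;
    }

    /** True when the status falls inside [lowerBound, upperBound], inverted if negate is set. */
    public boolean matches(int fetchStatus) {
        boolean inRange = fetchStatus >= lowerBound && fetchStatus <= upperBound;
        return negate ? !inRange : inRange;
    }

    public static void main(String[] args) {
        StatusCodeRangeSketch success = new StatusCodeRangeSketch(200, 299, false);
        StatusCodeRangeSketch notSuccess = new StatusCodeRangeSketch(200, 299, true);
        System.out.println(success.matches(204));    // true
        System.out.println(notSuccess.matches(404)); // true
    }
}
```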
- setUpperBound(long) - Method in class org.archive.modules.deciderules.ResponseContentLengthDecideRule
-
The rule applies if the URL has been fetched and its content body length is less than or equal to this number of bytes.
- setupPool(AtomicInteger) - Method in class org.archive.modules.writer.ARCWriterProcessor
-
- setupPool(AtomicInteger) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- setupPool(AtomicInteger) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
Set up pool of files.
- setupServer(int, String, String, String, String) - Method in class org.archive.crawler.Heritrix
-
Create an HTTPS restlet Server instance matching the given parameters.
- setupSimpleLog(String) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setupSimpleLog(String) - Method in interface org.archive.modules.SimpleFileLoggerProvider
-
- setupToePool() - Method in class org.archive.crawler.framework.CrawlController
-
- setURI(URI) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Sets the URI for this method.
- setUriErrorsLogPath(ConfigPath) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- setUriPrecedencePolicy(UriPrecedencePolicy) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- setUriRegex(String) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
Regular expression against which to match the URI.
- setUriUniqFilter(UriUniqFilter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- setUseHardLinkCheckpoints(boolean) - Method in class org.archive.bdb.BdbModule
-
- setUseHeaderLength(boolean) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- setUseHTTP11(boolean) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setUsePreset(MatchesFilePatternDecideRule.Preset) - Method in class org.archive.modules.deciderules.MatchesFilePatternDecideRule
-
- setUsePublicSuffixesRegex(boolean) - Method in class org.archive.crawler.processor.HashCrawlMapper
-
- setUserAgent(String) - Method in class org.archive.modules.CrawlURI
-
Set the user agent to use when crawling this URI.
- setUserAgentProvider(UserAgentProvider) - Method in class org.archive.modules.fetcher.FetchHTTP
-
- setUserAgentTemplate(String) - Method in class org.archive.modules.CrawlMetadata
-
- setUsername(String) - Method in class org.archive.modules.fetcher.FetchFTP
-
- setUseSharedCache(boolean) - Method in class org.archive.bdb.BdbModule
-
- setValidDateFormats(Collection<String>) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Sets the Collection of date patterns used for parsing.
- setValidDateFormats(Collection<String>) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
- setValidDateFormats(Collection<String>) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Does nothing.
- setValue(Object) - Method in class org.archive.io.ReadSourceEditor
-
- setValue(Object) - Method in class org.archive.spring.ConfigPathEditor
-
- setValue(String) - Method in class org.archive.spring.ConfigString
-
- setVersion(int) - Method in class org.apache.commons.httpclient.Cookie
-
Sets the version of the cookie specification to which this
cookie conforms.
- setVia(UURI) - Method in class org.archive.modules.CrawlURI
-
- setVirtualHost(String) - Method in class org.apache.commons.httpclient.HttpConnection
-
Deprecated.
No longer applicable.
- setWakeTime(long) - Method in class org.archive.crawler.frontier.WorkQueue
-
- setWriteBufferSize(int) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- setWriteMetadata(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- setWriteRequests(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- setWriteRevisitForIdenticalDigests(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- setWriteRevisitForNotModified(boolean) - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- sharedEngine - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
- sharedEngine - Variable in class org.archive.modules.ScriptedProcessor
-
- Sheet - Class in org.archive.spring
-
Collection of overrides: alternative values for object properties
that should apply in some contexts.
- Sheet() - Constructor for class org.archive.spring.Sheet
-
- SheetAssociation - Class in org.archive.crawler.spring
-
Represents target Sheets that should be associated with
some grouping of URIs.
- SheetAssociation() - Constructor for class org.archive.crawler.spring.SheetAssociation
-
- sheetNamesBySurt - Variable in class org.archive.crawler.spring.SheetOverlaysManager
-
- sheetOverlaysManager - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- sheetOverlaysManager - Variable in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- SheetOverlaysManager - Class in org.archive.crawler.spring
-
Manager which marks up CrawlURIs with the names of all applicable
Sheets, and returns overlay maps by name.
- SheetOverlaysManager() - Constructor for class org.archive.crawler.spring.SheetOverlaysManager
-
- sheetsByName - Variable in class org.archive.crawler.spring.SheetOverlaysManager
-
All sheets, keyed by bean name.
- shortMessage(BeansException) - Method in class org.archive.crawler.framework.CrawlJob
-
Return a short useful message for common BeansExceptions.
- shortName - Variable in class org.archive.checkpointing.Checkpoint
-
- shortReportLegend() - Method in class org.archive.crawler.framework.ToePool
-
- shortReportLegend() - Method in class org.archive.crawler.framework.ToeThread
-
- shortReportLegend() - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- shortReportLegend() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- shortReportLegend() - Method in class org.archive.crawler.frontier.WorkQueue
-
- shortReportLegend() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- shortReportLegend() - Method in class org.archive.modules.CrawlURI
-
- shortReportLegend() - Method in class org.archive.modules.fetcher.FetchStats
-
- shortReportLegend() - Method in class org.archive.modules.ProcessorChain
-
- shortReportLine() - Method in class org.archive.crawler.framework.ToeThread
-
- shortReportLine() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- shortReportLine() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- shortReportLine() - Method in class org.archive.crawler.frontier.WorkQueue
-
- shortReportLine() - Method in class org.archive.modules.CrawlURI
-
- shortReportLine() - Method in class org.archive.modules.fetcher.FetchStats
-
- shortReportLine(Reporter) - Static method in class org.archive.util.ReportUtils
-
Utility method to get the shortReportLine of a Reporter as a String.
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.framework.ToePool
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.framework.ToeThread
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueue
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.modules.CrawlURI
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.modules.fetcher.FetchStats
-
- shortReportLineTo(PrintWriter) - Method in class org.archive.modules.ProcessorChain
-
- shortReportMap() - Method in class org.archive.crawler.framework.ToePool
-
- shortReportMap() - Method in class org.archive.crawler.framework.ToeThread
-
- shortReportMap() - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- shortReportMap() - Method in class org.archive.crawler.frontier.WorkQueue
-
- shortReportMap() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- shortReportMap() - Method in class org.archive.modules.CrawlURI
-
- shortReportMap() - Method in class org.archive.modules.fetcher.FetchStats
-
- shortReportMap() - Method in class org.archive.modules.ProcessorChain
-
- shouldCloseConnection(HttpConnection) - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Tests if the connection should be closed after the method has been executed.
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
-
Determines if otherwise valid URIs should have links extracted or not.
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorCSS
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorDOC
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTML
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorJS
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorPDF
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorSWF
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorUniversal
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorXML
-
- shouldExtract(CrawlURI) - Method in class org.archive.modules.extractor.TrapSuppressExtractor
-
- shouldLoad(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
-
Whether the current CrawlURI's state should be loaded
- shouldMasquerade - Variable in class org.archive.modules.net.FirstNamedRobotsPolicy
-
whether to adopt the user-agent that is allowed for the fetch
- shouldMasquerade - Variable in class org.archive.modules.net.MostFavoredRobotsPolicy
-
whether to adopt the user-agent that is allowed for the fetch
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.CandidatesProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.DispositionProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LinksScoper
-
Deprecated.
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.ReschedulingProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.CandidateScoper
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.FrontierPreparer
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.PreconditionEnforcer
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.Preselector
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.QuotaEnforcer
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- shouldProcess(CrawlURI) - Method in class org.archive.crawler.processor.CrawlMapper
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ContentExtractor
-
Determines if links should be extracted from the given URI.
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorHTTP
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorImpliedURI
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorMultipleRegex
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.ExtractorURI
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.extractor.HTTPContentDigest
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchDNS
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchFTP
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchHTTP
-
Whether this processor can fetch the given CrawlURI.
- shouldProcess(CrawlURI) - Method in class org.archive.modules.fetcher.FetchWhois
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.forms.ExtractorHTMLForms
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.Processor
-
Determines whether the given URI should be processed by this
processor.
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryLoader
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.ContentDigestHistoryStorer
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.FetchHistoryProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.recrawl.PersistStoreProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.ScriptedProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.Kw3WriterProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- shouldRetire() - Method in class org.archive.crawler.framework.ToeThread
-
Whether this thread should cleanly retire at the earliest
opportunity.
- shouldStore(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractPersistProcessor
-
Whether the current CrawlURI's state should be persisted (to log or
direct to database)
- shouldWrite(CrawlURI) - Method in class org.archive.modules.writer.WriterPoolProcessor
-
Whether the given CrawlURI should be written to archive files.
- shutdown() - Method in class org.archive.crawler.framework.Engine
-
- shutdownOutput() - Method in class org.apache.commons.httpclient.HttpConnection
-
- SimpleCookieStorage - Class in org.archive.modules.fetcher
-
- SimpleCookieStorage() - Constructor for class org.archive.modules.fetcher.SimpleCookieStorage
-
- SimpleFileLoggerProvider - Interface in org.archive.modules
-
- SimplePrecedenceProvider - Class in org.archive.crawler.frontier.precedence
-
The simplest precedence provider, which just wraps a resettable
integer value.
- SimplePrecedenceProvider(int) - Constructor for class org.archive.crawler.frontier.precedence.SimplePrecedenceProvider
-
- size() - Method in class org.archive.bdb.StoredQueue
-
- size() - Method in class org.archive.crawler.util.TopNSet
-
- size() - Method in class org.archive.modules.ProcessorChain
-
- size() - Method in interface org.archive.util.BloomFilter
-
The number of character sequences in the filter (considered to be the
number of add()s that returned 'true')
- size - Variable in class org.archive.util.BloomFilter64bit
-
The number of elements currently in the filter.
- size() - Method in class org.archive.util.BloomFilter64bit
-
The number of character sequences in the filter.
- size() - Method in class org.archive.util.ObjectIdentityBdbCache
-
- size() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
- size() - Method in interface org.archive.util.ObjectIdentityCache
-
Count of name-to-object mappings contained.
- size() - Method in class org.archive.util.ObjectIdentityMemCache
-
- size() - Method in class org.archive.util.Transform
-
- sizeTotalsReport() - Method in class org.archive.crawler.framework.CrawlJob
-
- sizeTotalsReportData() - Method in class org.archive.crawler.framework.CrawlJob
-
- skip(int) - Method in class org.archive.crawler.util.DiskFPMergeUriUniqFilter.DataFileLongIterator
-
- skip(long) - Method in class org.archive.util.ms.BlockInputStream
-
- skipIdenticalDigests - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Whether to skip the writing of a record when URI history information is
available and indicates the prior fetch had an identical content digest.
- slots - Variable in class org.archive.util.fingerprint.MemLongFPSet
-
- smallestKnownKey - Variable in class org.archive.crawler.util.TopNSet
-
- smallestKnownValue - Variable in class org.archive.crawler.util.TopNSet
-
- smear - Variable in class org.archive.util.fingerprint.ArrayLongFPCache
-
- sn - Variable in class org.archive.bdb.BdbModule
-
uniqueness serial number for temp map databases
- snapshot - Variable in class org.archive.crawler.event.StatSnapshotEvent
-
- snapshots - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
snapshots of crawl tallies and rates
- snapshotToLaunchDir(File) - Method in class org.archive.spring.ConfigPathConfigurer
-
- snoozedClassQueues - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
All per-class queues held in snoozed state, sorted by wake time.
- snoozedOverflow - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
- snoozedOverflowCount - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
- snoozeLongMs - Variable in class org.archive.crawler.frontier.WorkQueueFrontier
-
When a snooze target for a queue is longer than this amount, the queue
will be "long snoozed" instead of "short snoozed".
- socketFactory - Variable in class org.archive.modules.fetcher.FetchFTP
-
- sortedDuplicates - Variable in class org.archive.bdb.BdbModule.BdbConfig
-
- sortShiftStatusCode() - Method in class org.archive.crawler.reporting.SeedRecord
-
- sourceHostDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Keep track of URL counts per host per seed
- sourceOrderXmlDom - Variable in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- sourceTagSeeds - Variable in class org.archive.modules.seeds.SeedModule
-
Whether to tag seeds with their own URI as a heritable 'source' String,
which will be carried forward to all URIs discovered on paths originating
from that seed.
- SourceTagsReport - Class in org.archive.crawler.reporting
-
The "Source Report", tallies of source tags (usually seeds) by host.
- SourceTagsReport() - Constructor for class org.archive.crawler.reporting.SourceTagsReport
-
- specialQueryTemplates - Variable in class org.archive.modules.fetcher.FetchWhois
-
- SPECULATIVE_MISC - Static variable in class org.archive.modules.extractor.LinkContext
-
Stand-in value for speculatively/aggressively extracted URLs without
other context.
- speculativeFixup(String, UURI) - Static method in class org.archive.util.UriUtils
-
Perform additional fixup of likely-URI Strings
- splitH1userAgent(String, StringBuilder) - Method in class org.archive.crawler.migrate.MigrateH1to3Tool
-
- st.ata.util - package st.ata.util
-
- STANDARD_POLICIES - Static variable in class org.archive.modules.net.RobotsPolicy
-
- start() - Method in class org.archive.bdb.BdbModule
-
- start() - Method in class org.archive.crawler.framework.ActionDirectory
-
- start() - Method in class org.archive.crawler.framework.CheckpointService
-
- start() - Method in class org.archive.crawler.framework.CrawlController
-
- start() - Method in class org.archive.crawler.framework.Scoper
-
- start() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- start() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- start() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- start() - Method in class org.archive.crawler.processor.CrawlMapper
-
- start() - Method in class org.archive.crawler.processor.LexicalCrawlMapper
-
- start() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- start() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- start() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- start() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- start() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- start() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- start() - Method in class org.archive.modules.fetcher.FetchWhois
-
- start() - Method in class org.archive.modules.net.BdbServerCache
-
- start() - Method in class org.archive.modules.Processor
-
- start() - Method in class org.archive.modules.ProcessorChain
-
- start() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- start() - Method in class org.archive.modules.recrawl.PersistLoadProcessor
-
- start() - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- start() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- start() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- start() - Method in class org.archive.spring.PathSharingContext
-
- startCheckpoint(Checkpoint) - Method in class org.archive.bdb.BdbModule
-
- startCheckpoint(Checkpoint) - Method in interface org.archive.checkpointing.Checkpointable
-
Note a checkpoint is about to begin.
- startCheckpoint(Checkpoint) - Method in class org.archive.crawler.framework.CrawlController
-
- startCheckpoint(Checkpoint) - Method in class org.archive.crawler.frontier.BdbFrontier
-
- startCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- startCheckpoint(Checkpoint) - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- startCheckpoint(Checkpoint) - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- startCheckpoint(Checkpoint) - Method in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- startCheckpoint(Checkpoint) - Method in class org.archive.modules.fetcher.BdbCookieStorage
-
- startCheckpoint(Checkpoint) - Method in class org.archive.modules.net.BdbServerCache
-
- startCheckpoint(Checkpoint) - Method in class org.archive.modules.Processor
-
- startCheckpoint(Checkpoint) - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- startContext() - Method in class org.archive.crawler.framework.CrawlJob
-
Start the context, catching and reporting any BeansExceptions.
- startManagerThread() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Start the dedicated thread with an independent view of the frontier's
state.
- startNewFilesOnCheckpoint - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
- state - Variable in class org.archive.crawler.event.CrawlStateEvent
-
- StatisticsLogFormatter - Class in org.archive.crawler.io
-
- StatisticsLogFormatter() - Constructor for class org.archive.crawler.io.StatisticsLogFormatter
-
- statisticsTracker - Variable in class org.archive.crawler.framework.CrawlController
-
Statistics tracking modules.
- statisticsTracker - Variable in class org.archive.crawler.prefetch.RuntimeLimitEnforcer
-
- StatisticsTracker - Class in org.archive.crawler.reporting
-
This is an implementation of the AbstractTracker.
- StatisticsTracker() - Constructor for class org.archive.crawler.reporting.StatisticsTracker
-
- StatSnapshotEvent - Class in org.archive.crawler.event
-
ApplicationEvent published when the StatisticsTracker takes its
sample of various statistics.
- StatSnapshotEvent(StatisticsTracker, CrawlStatSnapshot) - Constructor for class org.archive.crawler.event.StatSnapshotEvent
-
- STATUS_CODE_KEY - Static variable in interface org.archive.modules.writer.Kw3Constants
-
- statusCodeDistribution - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Keep track of fetch status codes
- statusCodes - Variable in class org.archive.modules.deciderules.FetchStatusDecideRule
-
- std24 - Static variable in class st.ata.util.FPGenerator
-
A standard 24-bit fingerprint generator using polynomials[0][24].
- std32 - Static variable in class st.ata.util.FPGenerator
-
A standard 32-bit fingerprint generator using polynomials[0][32].
- std40 - Static variable in class st.ata.util.FPGenerator
-
A standard 40-bit fingerprint generator using polynomials[0][40].
- std64 - Static variable in class st.ata.util.FPGenerator
-
The standard 64-bit fingerprint generator using polynomials[0][64].
- stop() - Method in class org.archive.bdb.BdbModule
-
- stop() - Method in class org.archive.crawler.framework.ActionDirectory
-
- stop() - Method in class org.archive.crawler.framework.CheckpointService
-
- stop() - Method in class org.archive.crawler.framework.CrawlController
-
- stop() - Method in class org.archive.crawler.framework.Scoper
-
- stop() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- stop() - Method in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- stop() - Method in class org.archive.crawler.frontier.WorkQueueFrontier
-
- stop() - Method in class org.archive.crawler.processor.CrawlMapper
-
- stop() - Method in class org.archive.crawler.reporting.CrawlerLoggerModule
-
- stop() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- stop() - Method in class org.archive.crawler.util.BdbUriUniqFilter
-
- stop() - Method in class org.archive.modules.deciderules.DecideRuleSequence
-
- stop() - Method in class org.archive.modules.fetcher.AbstractCookieStorage
-
- stop() - Method in class org.archive.modules.fetcher.FetchHTTP
-
- stop() - Method in class org.archive.modules.fetcher.FetchWhois
-
- stop() - Method in class org.archive.modules.net.BdbServerCache
-
- stop() - Method in class org.archive.modules.Processor
-
- stop() - Method in class org.archive.modules.ProcessorChain
-
- stop() - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- stop() - Method in class org.archive.modules.recrawl.PersistLogProcessor
-
- stop() - Method in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- stop() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- store - Variable in class org.archive.crawler.frontier.precedence.PreloadedUriPrecedencePolicy
-
- store(CrawlURI) - Method in class org.archive.modules.recrawl.AbstractContentDigestHistory
-
Stores curi.getContentDigestHistory() for the key persistKeyFor(curi).
- store - Variable in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- store(CrawlURI) - Method in class org.archive.modules.recrawl.BdbContentDigestHistory
-
- store - Variable in class org.archive.modules.recrawl.PersistOnlineProcessor
-
- storeDNSRecord(CrawlURI, String, CrawlHost, Record[]) - Method in class org.archive.modules.fetcher.FetchDNS
-
- StoredQueue<E extends Serializable> - Class in org.archive.bdb
-
Queue backed by a JE Collections StoredSortedMap.
- StoredQueue(Database, Class<E>, StoredClassCatalog) - Constructor for class org.archive.bdb.StoredQueue
-
Create a StoredQueue backed by the given Database.
- storePaths - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Where to save files.
- STRING_URI_DETECTOR - Static variable in class org.archive.util.UriUtils
-
- STRING_URI_DETECTOR_EXCEPTIONS - Static variable in class org.archive.util.UriUtils
-
- StringExtractorTestBase - Class in org.archive.modules.extractor
-
- StringExtractorTestBase() - Constructor for class org.archive.modules.extractor.StringExtractorTestBase
-
- StringExtractorTestBase.TestData - Class in org.archive.modules.extractor
-
- StringExtractorTestBase.TestData(CrawlURI, Link) - Constructor for class org.archive.modules.extractor.StringExtractorTestBase.TestData
-
- StripExtraSlashes - Class in org.archive.modules.canonicalize
-
Strip any extra slashes, '/', found in the path.
- StripExtraSlashes() - Constructor for class org.archive.modules.canonicalize.StripExtraSlashes
-
- StripSessionCFIDs - Class in org.archive.modules.canonicalize
-
Strip ColdFusion session IDs.
- StripSessionCFIDs() - Constructor for class org.archive.modules.canonicalize.StripSessionCFIDs
-
- StripSessionIDs - Class in org.archive.modules.canonicalize
-
Strip known session IDs.
- StripSessionIDs() - Constructor for class org.archive.modules.canonicalize.StripSessionIDs
-
- stripToMinimal() - Method in class org.archive.modules.CrawlURI
-
Remove all attributes set on this URI.
- StripUserinfoRule - Class in org.archive.modules.canonicalize
-
Strip any 'userinfo' found on http/https URLs.
- StripUserinfoRule() - Constructor for class org.archive.modules.canonicalize.StripUserinfoRule
-
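A minimal sketch of the rewrite StripUserinfoRule describes, assuming a single regular expression over the raw URL string. The class name, method name, and pattern are hypothetical illustrations, not Heritrix's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the Heritrix StripUserinfoRule source.
final class StripUserinfoSketch {
    // Scheme, then a run of characters that are not '/', '?', '#' or '@',
    // followed by '@'. That run is treated as the userinfo (e.g. "user:pass").
    private static final Pattern USERINFO =
        Pattern.compile("(?i)^(https?://)[^/?#@]*@(.*)$");

    static String canonicalize(String url) {
        Matcher m = USERINFO.matcher(url);
        // Keep the scheme and everything after '@'; other URLs pass through.
        return m.matches() ? m.group(1) + m.group(2) : url;
    }
}
```

Excluding '/', '?' and '#' from the userinfo run keeps an '@' that appears later in the path or query from being mistaken for a userinfo delimiter.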
- StripWWWNRule - Class in org.archive.modules.canonicalize
-
Strip any 'www[0-9]*' found on http/https URLs, if they have some
path/query component (content after the third slash).
- StripWWWNRule() - Constructor for class org.archive.modules.canonicalize.StripWWWNRule
-
- StripWWWRule - Class in org.archive.modules.canonicalize
-
Strip any 'www' found on http/https URLs, if they have some
path/query component (content after the third slash).
- StripWWWRule() - Constructor for class org.archive.modules.canonicalize.StripWWWRule
-
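The condition shared by StripWWWNRule and StripWWWRule (strip only when something follows the third slash) can be sketched with one regular expression. The pattern and names are assumptions for illustration, not Heritrix's source. Replacing `www[0-9]*` with `www` would give the StripWWWRule variant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the StripWWWNRule behavior described above.
final class StripWwwNSketch {
    // Group 1: scheme. Then "www", optional digits, and a dot (dropped).
    // Group 2: host, a slash, and at least one more character, so URLs with
    // nothing after the third slash are left unchanged.
    private static final Pattern WWWN =
        Pattern.compile("(?i)^(https?://)www[0-9]*\\.([^/]*/.+)$");

    static String canonicalize(String url) {
        Matcher m = WWWN.matcher(url);
        return m.matches() ? m.group(1) + m.group(2) : url;
    }
}
```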
- SUBARRAY_LENGTH_IN_LONGS - Static variable in class org.archive.util.BloomFilter64bit
-
number of longs in one subarray
- SUBARRAY_MASK - Static variable in class org.archive.util.BloomFilter64bit
-
mask for lowest SUBARRAY_POWER_OF_TWO bits
- SUBARRAY_POWER_OF_TWO - Static variable in class org.archive.util.BloomFilter64bit
-
power-of-two to use as maximum size of bitfield subarrays
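The three SUBARRAY_* constants describe a bitfield split into fixed-size subarrays of longs. The sketch below shows how such constants can map a bit index to a subarray and a word within it. The value 10 for the power of two is an assumption chosen to keep the example small; the index does not state BloomFilter64bit's real values, and the class is hypothetical.

```java
// Hypothetical bitfield split into subarrays, mirroring the constant names above.
final class SubarrayBitSetSketch {
    // Assumed value for illustration: each subarray holds 2^10 = 1024 bits.
    static final int SUBARRAY_POWER_OF_TWO = 10;
    // Longs per subarray: 2^POWER bits divided by 64 (= 2^6) bits per long.
    static final int SUBARRAY_LENGTH_IN_LONGS = 1 << (SUBARRAY_POWER_OF_TWO - 6);
    // Mask selecting the lowest SUBARRAY_POWER_OF_TWO bits of a bit index.
    static final long SUBARRAY_MASK = (1L << SUBARRAY_POWER_OF_TWO) - 1;

    private final long[][] bits;

    SubarrayBitSetSketch(long totalBits) {
        // Round up so every bit index below totalBits has a home.
        int subarrays = (int) ((totalBits + SUBARRAY_MASK) >>> SUBARRAY_POWER_OF_TWO);
        bits = new long[subarrays][SUBARRAY_LENGTH_IN_LONGS];
    }

    void set(long index) {
        bits[subarray(index)][word(index)] |= 1L << (index & 63);
    }

    boolean get(long index) {
        return (bits[subarray(index)][word(index)] & (1L << (index & 63))) != 0;
    }

    // High bits of the index choose the subarray.
    private static int subarray(long index) {
        return (int) (index >>> SUBARRAY_POWER_OF_TWO);
    }

    // Low bits, divided by 64, choose the long within that subarray.
    private static int word(long index) {
        return (int) ((index & SUBARRAY_MASK) >>> 6);
    }
}
```

Splitting the bitfield this way keeps each Java array below the size limit of a single array while still addressing bits with a long index.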
- subclasses(Collection<? extends Object>, Class<Target>) - Static method in class org.archive.util.Transform
-
Returns a transform containing only objects of a given class.
- submitStatusFor(String) - Method in class org.archive.modules.forms.FormLoginProcessor
-
- subset(CrawlURI, Class<?>) - Method in class org.archive.modules.credential.CredentialStore
-
Return set made up of all credentials of the passed type.
- subset(CrawlURI, Class<?>, String) - Method in class org.archive.modules.credential.CredentialStore
-
Return set made up of all credentials of the passed type.
- substats - Variable in class org.archive.crawler.frontier.WorkQueue
-
Substats for all CrawlURIs in this group
- substats - Variable in class org.archive.modules.net.CrawlHost
-
- substats - Variable in class org.archive.modules.net.CrawlServer
-
- subtract(Histotable<K>) - Method in class org.archive.util.Histotable
-
- succeededFetchCount() - Method in interface org.archive.crawler.framework.Frontier
-
Number of successfully processed URIs.
- succeededFetchCount - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
- succeededFetchCount() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- success - Variable in class org.archive.checkpointing.Checkpoint
-
- SUCCESS_KB - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- successBytes - Variable in class org.archive.modules.fetcher.FetchStats
-
- SuccessCountsQueuePrecedencePolicy - Class in org.archive.crawler.frontier.precedence
-
QueuePrecedencePolicy that sets a uri-queue's precedence to a configured
base value, then lowers its precedence with each tier of successful URIs
completed.
- SuccessCountsQueuePrecedencePolicy() - Constructor for class org.archive.crawler.frontier.precedence.SuccessCountsQueuePrecedencePolicy
-
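A hedged sketch of the tiered calculation that SuccessCountsQueuePrecedencePolicy describes: start at a base value and lower the queue's precedence once per completed tier of successes. It assumes that a numerically larger value means lower precedence and that the last tier size repeats once the list runs out. Both points are assumptions, and the names are hypothetical rather than the policy's real properties.

```java
import java.util.List;

// Hypothetical illustration of tiered precedence lowering.
final class TieredPrecedenceSketch {
    static int precedenceFor(int basePrecedence, List<Integer> tierSizes, long successes) {
        if (tierSizes.isEmpty()) {
            return basePrecedence;
        }
        int precedence = basePrecedence;
        long remaining = successes;
        int i = 0;
        while (true) {
            // Use the i-th tier size; the last size repeats indefinitely.
            int tier = tierSizes.get(Math.min(i, tierSizes.size() - 1));
            if (tier <= 0) {
                throw new IllegalArgumentException("tier sizes must be positive");
            }
            if (remaining < tier) {
                return precedence; // current tier not yet completed
            }
            remaining -= tier;
            precedence++; // one more completed tier: larger value, lower precedence
            i++;
        }
    }
}
```

With base 1 and tiers [100, 1000], a queue keeps precedence 1 until 100 successes, moves to 2, then to 3 after a further 1000, and so on.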
- SUCCESSES - Static variable in class org.archive.crawler.prefetch.QuotaEnforcer
-
- suffixAtEnd - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
If true, the suffix is placed at the end of the path, after the query (if
any).
- summary() - Method in class org.archive.crawler.util.CrawledBytesHistotable
-
- SupplementaryLinksScoper - Class in org.archive.crawler.postprocessor
-
Run CrawlURI links carried in the passed CrawlURI through a filter
and 'handle' rejections.
- SupplementaryLinksScoper() - Constructor for class org.archive.crawler.postprocessor.SupplementaryLinksScoper
-
- Supplier<V> - Class in org.archive.util
-
Class for optionally providing one instance of the parameterized
type.
- Supplier() - Constructor for class org.archive.util.Supplier
-
- Supplier(V) - Constructor for class org.archive.util.Supplier
-
- supports(Class<?>) - Method in class org.archive.crawler.framework.CheckpointValidator
-
- supports(Class<?>) - Method in class org.archive.spring.BeanFieldsPatternValidator
-
- supportsCustomEditor() - Method in class org.archive.io.ReadSourceEditor
-
- supportsCustomEditor() - Method in class org.archive.spring.ConfigPathEditor
-
- SurtAuthorityQueueAssignmentPolicy - Class in org.archive.crawler.frontier
-
Queue assignment policy based on the SURT form of the hostname.
- SurtAuthorityQueueAssignmentPolicy() - Constructor for class org.archive.crawler.frontier.SurtAuthorityQueueAssignmentPolicy
-
- SurtPrefixedDecideRule - Class in org.archive.modules.deciderules.surt
-
Rule applies configured decision to any URIs that, when
expressed in SURT form, begin with one of the prefixes
in the configured set.
- SurtPrefixedDecideRule() - Constructor for class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
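The matching test behind SurtPrefixedDecideRule ("begin with one of the prefixes") reduces to a prefix check against a set of SURT strings. This is a minimal sketch with a hypothetical class name. It uses a linear scan for clarity, and the example SURT strings in the usage below are assumed forms.

```java
import java.util.Collection;

// Hypothetical illustration of SURT prefix matching.
final class SurtPrefixMatchSketch {
    // True if the SURT-form URI starts with any configured prefix.
    static boolean matchesAnyPrefix(String surt, Collection<String> prefixes) {
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

For large prefix lists, a sorted structure would avoid scanning every prefix, but the decision logic stays the same: the rule's configured decision applies only when the check returns true.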
- surtPrefixes - Variable in class org.archive.crawler.spring.SurtPrefixesSheetAssociation
-
- surtPrefixes - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
- SurtPrefixesSheetAssociation - Class in org.archive.crawler.spring
-
SheetAssociation applied on the basis of matching SURT prefixes.
- SurtPrefixesSheetAssociation() - Constructor for class org.archive.crawler.spring.SurtPrefixesSheetAssociation
-
- surtsDumpFile - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Dump file to save the SURT prefixes actually used; useful for debugging SURTs.
- surtsSource - Variable in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
-
Text from which to infer SURT prefixes.
- SURTTokenizer - Class in org.archive.surt
-
Provides iterative URL reduction for prefix matching, to find ever
coarser-grained URL-specific configuration.
- SURTTokenizer(String) - Constructor for class org.archive.surt.SURTTokenizer
-
constructor
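To illustrate "iterative URL reduction", the sketch below turns one SURT string into progressively coarser prefixes: first drop the query, then path segments, then host labels. The exact sequence SURTTokenizer produces is not given in this index, so the steps, the scheme-less input form, and the names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of coarsening a SURT string step by step.
final class SurtCoarseningSketch {
    static List<String> coarser(String surt) {
        List<String> out = new ArrayList<>();
        out.add(surt);
        String s = surt;
        int q = s.indexOf('?');
        if (q >= 0) {               // 1. drop the query string
            s = s.substring(0, q);
            out.add(s);
        }
        int close = s.indexOf(")/");
        if (close < 0) {
            return out;             // not in the expected "(host,)/path" form
        }
        String root = s.substring(0, close + 2);
        String p = s;
        while (p.length() > root.length()) {   // 2. trim path segments
            int cut = p.endsWith("/") ? p.lastIndexOf('/', p.length() - 2) : p.lastIndexOf('/');
            p = p.substring(0, cut + 1);
            out.add(p);
        }
        String host = s.substring(0, close);   // 3. trim host labels
        out.add(host);
        int comma;
        while ((comma = host.lastIndexOf(',', host.length() - 2)) >= 0) {
            host = host.substring(0, comma + 1);
            out.add(host);
        }
        return out;
    }
}
```

For `(org,archive,www,)/a/b.html?x=1` this yields seven candidates, ending with `(org,archive,www,)/`, `(org,archive,www,`, `(org,archive,` and `(org,`, so a lookup can stop at the most specific match.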
- symlink(String, String) - Method in interface org.archive.util.CLibrary
-
- sync() - Method in class org.archive.crawler.frontier.BdbMultipleWorkQueues
-
Method used by BdbFrontier during checkpointing.
- sync() - Method in class org.archive.util.ObjectIdentityBdbCache
-
Sync all in-memory map entries to backing disk store.
- sync() - Method in class org.archive.util.ObjectIdentityBdbManualCache
-
Sync all in-memory map entries to backing disk store.
- sync() - Method in interface org.archive.util.ObjectIdentityCache
-
force the persistent backend, if any, to be updated with all
live object state
- sync() - Method in class org.archive.util.ObjectIdentityMemCache
-
- tagDefineButton(int, Vector) - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tagDefineButton2(int, boolean, Vector) - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tagDefineSprite(int) - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tagDoAction() - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tagDoInActions(int) - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tagPlaceObject2(boolean, int, int, int, Matrix, AlphaTransform, int, String, int) - Method in class org.archive.modules.extractor.CustomSWFTags
-
- tail(String) - Static method in class org.archive.crawler.util.LogReader
-
Implementation of a unix-like 'tail' command
- tail(String, int) - Static method in class org.archive.crawler.util.LogReader
-
Implementation of a unix-like 'tail -n' command
- tail(RandomAccessFile, int) - Static method in class org.archive.crawler.util.LogReader
-
Implementation of a unix-like 'tail -n' command
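A minimal sketch of 'tail -n' semantics, keeping only the last n lines in a bounded deque. This streams the whole input. The RandomAccessFile overload listed above suggests the real LogReader may seek from the end of the file instead, so treat this version and its names as hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of 'tail -n'; not LogReader's implementation.
final class TailSketch {
    // Returns the last n lines read from source, oldest first.
    static List<String> tail(Reader source, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        Deque<String> window = new ArrayDeque<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                window.addLast(line);
                // Keep only the most recent n lines in memory.
                if (window.size() > n) {
                    window.removeFirst();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ArrayList<>(window);
    }
}
```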
- tailFilter - Variable in class org.archive.crawler.restlet.EnhDirectory
-
- tailIndex - Variable in class org.archive.bdb.StoredQueue
-
- tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.AbstractFrontier
-
Report CrawlURI to each of the three 'substats' accumulators
(group/queue, server, host) for a given stage.
- tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.precedence.HighestUriQueuePrecedencePolicy.HighestUriPrecedenceProvider
-
- tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.precedence.PrecedenceProvider
-
- tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.crawler.frontier.WorkQueue
-
- tally(CrawlURI, FetchStats.Stage) - Method in interface org.archive.modules.fetcher.FetchStats.CollectsFetchStats
-
- tally(CrawlURI, FetchStats.Stage) - Method in class org.archive.modules.fetcher.FetchStats
-
- tally(K) - Method in class org.archive.util.Histotable
-
Record one more occurrence of the given object key.
- tally(K, long) - Method in class org.archive.util.Histotable
-
Record count more occurrence(s) of the given object key.
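Histotable's two tally methods describe a counting table. Each call adds one, or count, to the running total for a key. The sketch below shows the same counting idea using a plain HashMap and Map.merge. The class name CountingTable is hypothetical, and Histotable's real internals may differ.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingTable<K> {
    private final Map<K, Long> counts = new HashMap<>();

    /** Record one more occurrence of key. */
    public void tally(K key) {
        tally(key, 1L);
    }

    /** Record count more occurrences of key; merge adds to any existing tally. */
    public void tally(K key, long count) {
        counts.merge(key, count, Long::sum);
    }

    /** Current tally for key, or 0 if the key was never seen. */
    public long get(K key) {
        return counts.getOrDefault(key, 0L);
    }
}
```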
- tallyCurrentPause() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
For a current pause (if any), add paused time to total and reset
- tallySeeds() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
- targetSheetNames - Variable in class org.archive.crawler.spring.SheetAssociation
-
- targetSize - Variable in class org.archive.crawler.framework.ToePool
-
- targetState - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
Frontier.state that manager thread should seek to reach
- teardown() - Method in class org.archive.crawler.framework.CrawlJob
-
Ensure a fresh start for any configuration changes or relaunches,
by stopping and discarding an existing ApplicationContext.
- TempDirProvider - Interface in org.archive.modules.extractor
-
- template - Variable in class org.archive.modules.writer.WriterPoolProcessor
-
Template from which a filename is interpolated.
- terminate() - Method in class org.archive.crawler.framework.CrawlJob
-
- terminate() - Method in interface org.archive.crawler.framework.Frontier
-
Notify Frontier that it should end the crawl, giving
any worker ToeThread that asks for a next() an
EndedException.
- terminate() - Method in class org.archive.crawler.frontier.AbstractFrontier
-
- test(int) - Method in class org.archive.modules.deciderules.ResourceLongerThanDecideRule
-
- test(int) - Method in class org.archive.modules.deciderules.ResourceNoLongerThanDecideRule
-
- test(BeanWrapperImpl, Errors) - Method in class org.archive.spring.BeanFieldsPatternValidator.PropertyPatternRule
-
- TEST_TMP_SYSTEM_PROPERTY_NAME - Static variable in class org.archive.util.TmpDirTestCase
-
Name of the system property that holds a pointer to the tmp directory into
which we can safely write files.
- testAdd() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
check that we can add fingerprints
- testContains() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
check that contains() does what we expect
- testCount() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
check count works ok
- testExtraction() - Method in class org.archive.modules.extractor.StringExtractorTestBase
-
Tests each text/URI pair in the test data array.
- testFinished() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Tests that a URI whose linkExtractionFinished flag has been set has
no links extracted.
- testRemove() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
test remove() works as expected
- testSerialization(Object) - Static method in class org.archive.util.TestUtils
-
- testSerializationIfAppropriate() - Method in class org.archive.state.ModuleTestBase
-
Tests that the module can be serialized.
- TestUtils - Class in org.archive.util
-
Utility methods useful in testing situations.
- TestUtils() - Constructor for class org.archive.util.TestUtils
-
- testWithZero() - Method in class org.archive.util.fingerprint.LongFPSetTestCase
-
check we can call add/remove/contains() with 0 as a value
- testZeroContent() - Method in class org.archive.modules.extractor.ContentExtractorTestBase
-
Tests that a URI with a zero content length has no links extracted.
- TextSeedModule - Class in org.archive.modules.seeds
-
Module that announces a list of seeds from a text source (such
as a ConfigFile or ConfigString), and provides a mechanism for
adding seeds after a crawl has begun.
- TextSeedModule() - Constructor for class org.archive.modules.seeds.TextSeedModule
-
- textSource - Variable in class org.archive.modules.seeds.TextSeedModule
-
Text from which to extract seeds
- threadBuffer - Variable in class org.archive.bdb.KryoBinding
-
- threadCount() - Method in class org.archive.crawler.reporting.StatisticsTracker
-
Get the total number of ToeThreads (sleeping and active)
- threadEngine - Variable in class org.archive.modules.deciderules.ScriptedDecideRule
-
- threadEngine - Variable in class org.archive.modules.ScriptedProcessor
-
- threadLogger - Static variable in class org.archive.crawler.reporting.AlertThreadGroup
-
- threadOverrides - Static variable in class org.archive.spring.KeyedProperties
-
ThreadLocal (contextual) collection of pushed override maps
- threadReport() - Method in class org.archive.crawler.framework.CrawlJob
-
- threadReportData() - Method in class org.archive.crawler.framework.CrawlJob
-
- timer - Variable in class org.archive.crawler.framework.CheckpointService
-
service for auto-checkpoint tasks at an interval
- TIMER_TRUNC - Static variable in interface org.archive.modules.CoreAttributeConstants
-
- TIMER_TRUNC - Static variable in class org.archive.modules.fetcher.FetchErrors
-
- timestamp - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- timestamp_interval - Variable in class org.archive.io.CrawlerJournal
-
number of lines between timestamps
- TLDs - Static variable in class org.archive.modules.extractor.ExtractorUniversal
-
Matches any string that begins with a TLD (no .) followed by a '/' slash
or end of string.
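The TLDs entry describes a pattern anchored at the start of the string. The pattern matches a bare TLD (no leading dot) followed by either a slash or the end of input. Below is a sketch using java.util.regex. It assumes a deliberately tiny, hypothetical five-entry TLD list, because the real constant's list is not shown in this index. The class name TldPatternSketch is also hypothetical.

```java
import java.util.regex.Pattern;

public class TldPatternSketch {
    // Hypothetical five-entry TLD list; the real ExtractorUniversal.TLDs contents differ.
    static final Pattern TLD_PREFIX =
            Pattern.compile("^(?:com|org|net|edu|gov)(?:/.*)?$", Pattern.CASE_INSENSITIVE);

    /** True if s starts with a listed TLD (no leading dot) followed by '/' or end of string. */
    static boolean beginsWithTld(String s) {
        return TLD_PREFIX.matcher(s).matches();
    }
}
```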
- tmpDir() - Static method in class org.archive.util.TestUtils
-
- tmpDir() - Static method in class org.archive.util.TmpDirTestCase
-
- TmpDirTestCase - Class in org.archive.util
-
Base class for TestCases that want access to a tmp dir for the writing
of files.
- TmpDirTestCase() - Constructor for class org.archive.util.TmpDirTestCase
-
- TmpDirTestCase(String) - Constructor for class org.archive.util.TmpDirTestCase
-
- toCheckpointJson() - Method in class org.archive.modules.extractor.Extractor
-
- toCheckpointJson() - Method in class org.archive.modules.forms.FormLoginProcessor
-
- toCheckpointJson() - Method in class org.archive.modules.Processor
-
Return a JSONObject of current state that can be consulted
on recovery to restore necessary values.
- toCheckpointJson() - Method in class org.archive.modules.writer.WARCWriterProcessor
-
- toCheckpointJson() - Method in class org.archive.modules.writer.WriterPoolProcessor
-
- toDatabaseConfig() - Method in class org.archive.bdb.BdbModule.BdbConfig
-
- ToePool - Class in org.archive.crawler.framework
-
A collection of ToeThreads.
- ToePool(AlertThreadGroup, CrawlController) - Constructor for class org.archive.crawler.framework.ToePool
-
Constructor.
- ToeThread - Class in org.archive.crawler.framework
-
One "worker thread"; asks for CrawlURIs, processes them,
repeats unless told otherwise.
- ToeThread(ToePool, int) - Constructor for class org.archive.crawler.framework.ToeThread
-
Create a ToeThread
- ToeThread.Step - Enum in org.archive.crawler.framework
-
- ToeThreadsReport - Class in org.archive.crawler.reporting
-
Traditional report of all ToeThread call-stacks, as often consulted
to diagnose live crawl issues.
- ToeThreadsReport() - Constructor for class org.archive.crawler.reporting.ToeThreadsReport
-
- toExternalForm() - Method in class org.apache.commons.httpclient.Cookie
-
Return a textual representation of the cookie.
- tooLongDirectory - Variable in class org.archive.modules.writer.MirrorWriterProcessor
-
If all the directories in the URI would exceed, or come close to
exceeding, the file system maximum path length, then they are all
replaced by this.
- TooManyHopsDecideRule - Class in org.archive.modules.deciderules
-
Rule REJECTs any CrawlURIs whose total number of hops (length of the
hopsPath string, traversed links of any type) is over a threshold.
- TooManyHopsDecideRule() - Constructor for class org.archive.modules.deciderules.TooManyHopsDecideRule
-
Usual constructor.
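TooManyHopsDecideRule compares the length of the hopsPath string against a threshold, where each character records one traversed link. The minimal sketch below makes two assumptions. First, the hopsPath is available as a String. Second, a hypothetical two-value Decision enum stands in for the real DecideResult.

```java
public class HopsRuleSketch {
    // Hypothetical stand-in for a decide rule's outcome.
    enum Decision { REJECT, NONE }

    /** REJECT when the hop count (hopsPath length) exceeds maxHops; otherwise express no opinion. */
    static Decision decide(String hopsPath, int maxHops) {
        return hopsPath.length() > maxHops ? Decision.REJECT : Decision.NONE;
    }
}
```

In this sketch, returning NONE below the threshold (rather than an accept) leaves the final decision to other rules. That choice is an assumption of the sketch.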
- TooManyPathSegmentsDecideRule - Class in org.archive.modules.deciderules
-
Rule REJECTs any CrawlURIs whose total number of path-segments (as
indicated by the count of '/' characters not including the first '//')
is over a given threshold.
- TooManyPathSegmentsDecideRule() - Constructor for class org.archive.modules.deciderules.TooManyPathSegmentsDecideRule
-
Usual constructor.
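The path-segment count described above is the number of '/' characters, excluding the first '//'. For "http://example.com/a/b/c" there are five slashes, so the count is 5 - 2 = 3. Below is a minimal sketch of that counting rule. The class and method names are hypothetical.

```java
public class PathSegmentsSketch {
    /** Counts '/' characters, not counting the first "//" (e.g. the one after "http:"). */
    static int pathSegmentCount(String uri) {
        int slashes = 0;
        for (int i = 0; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                slashes++;
            }
        }
        // Exclude the two characters of the first "//", if there is one.
        return uri.indexOf("//") >= 0 ? slashes - 2 : slashes;
    }

    /** True when the count is over the given threshold. */
    static boolean tooManySegments(String uri, int maxSegments) {
        return pathSegmentCount(uri) > maxSegments;
    }
}
```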
- TopNSet - Class in org.archive.crawler.util
-
Counting Set which only remembers the 'top N' of all String values
reported (with counts) to it.
- TopNSet(int) - Constructor for class org.archive.crawler.util.TopNSet
-
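TopNSet is described as a counting set that remembers only the 'top N' String values reported to it, along with their counts. One simple way to get that behavior is to store the reported count for each value. Whenever more than N values are held, the sketch evicts the one with the smallest count. The class name TopNSketch and its methods are hypothetical and may not match TopNSet's API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TopNSketch {
    private final int maxSize;
    private final Map<String, Long> counts = new HashMap<>();

    public TopNSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Report the current count for a value; it is kept only while it ranks in the top N. */
    public void update(String value, long count) {
        counts.put(value, count);
        if (counts.size() > maxSize) {
            // Evict the entry with the smallest count (possibly the one just added).
            String smallest = Collections.min(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
            counts.remove(smallest);
        }
    }

    public Set<String> keys() {
        return Collections.unmodifiableSet(counts.keySet());
    }

    public long get(String value) {
        return counts.getOrDefault(value, 0L);
    }
}
```

The linear scan for the minimum is O(N) per eviction, which is acceptable for small N. A larger N would call for a sorted structure.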
- toString() - Method in class org.apache.commons.httpclient.Cookie
-
Return a textual representation of the cookie.
- toString() - Method in class org.apache.commons.httpclient.HttpState
-
Returns a string representation of this HTTP state.
- toString() - Method in class org.archive.crawler.frontier.WorkQueue
-
- toString() - Method in class org.archive.modules.CrawlURI
-
- toString() - Method in class org.archive.modules.extractor.HTMLLinkContext
-
- toString() - Method in class org.archive.modules.extractor.Link
-
- toString() - Method in class org.archive.modules.extractor.LinkContext.SimpleLinkContext
-
- toString() - Method in class org.archive.modules.forms.HTMLForm.FormInput
-
- toString() - Method in class org.archive.modules.forms.HTMLForm
-
- toString() - Method in class org.archive.modules.net.CrawlHost
-
- toString() - Method in class org.archive.modules.net.CrawlServer
-
- toString() - Method in class org.archive.util.ms.Piece
-
- toString() - Method in class org.archive.util.PaddingStringBuffer
-
- totalBudget - Variable in class org.archive.crawler.frontier.WorkQueue
-
Total to spend on this queue over its lifetime
- totalBytes - Variable in class org.archive.modules.fetcher.FetchStats
-
- totalCount() - Method in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- totalExpenditure - Variable in class org.archive.crawler.frontier.WorkQueue
-
Running tally of total expenditures on this queue
- totalKiBPerSec - Variable in class org.archive.crawler.reporting.CrawlStatSnapshot
-
- totalProcessedBytes - Variable in class org.archive.crawler.frontier.AbstractFrontier
-
Used when bandwidth constraints are in effect.
- totalScheduled - Variable in class org.archive.modules.fetcher.FetchStats
-
- trackSeeds - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Whether to maintain seed disposition records (expensive in
crawls with millions of seeds)
- trackSources - Variable in class org.archive.crawler.reporting.StatisticsTracker
-
Whether to maintain hosts-per-source-tag records; very expensive in
crawls with large numbers of source-tags (seeds) or large crawls
over many hosts
- transactional - Variable in class org.archive.bdb.BdbModule.BdbConfig
-
- TransclusionDecideRule - Class in org.archive.modules.deciderules
-
Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath'; see
CandidateURI#getPathFromSeed()) ends with between one and the given
maximum number of non-navlink (non-'L') hops.
- TransclusionDecideRule() - Constructor for class org.archive.modules.deciderules.TransclusionDecideRule
-
Usual constructor.
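The transclusion test above looks only at the end of the hopsPath. It counts the run of consecutive non-'L' hops at the end of the path, and accepts when that count is between one and the configured maximum. Below is a minimal sketch with hypothetical class and method names. In the test, 'E' is used only as an example of a non-navlink hop character.

```java
public class TransclusionSketch {
    /** Counts the run of non-'L' (non-navlink) hops at the end of hopsPath. */
    static int trailingNonNavlinkHops(String hopsPath) {
        int count = 0;
        for (int i = hopsPath.length() - 1; i >= 0 && hopsPath.charAt(i) != 'L'; i--) {
            count++;
        }
        return count;
    }

    /** Accept when the path ends with at least 1 and at most maxTransHops non-navlink hops. */
    static boolean accepts(String hopsPath, int maxTransHops) {
        int trailing = trailingNonNavlinkHops(hopsPath);
        return trailing >= 1 && trailing <= maxTransHops;
    }
}
```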
- transform(File, File, boolean) - Method in class org.archive.io.Arc2Warc
-
- transform(ARCReader, File) - Method in class org.archive.io.Arc2Warc
-
- transform(File, File, String, String, boolean) - Method in class org.archive.io.Warc2Arc
-
- transform(WARCReader, ARCWriter) - Method in class org.archive.io.Warc2Arc
-
- Transform<Original,Transformed> - Class in org.archive.util
-
A transformation of a collection.
- Transform(Collection<? extends Original>, Transformer<Original, Transformed>) - Constructor for class org.archive.util.Transform
-
Constructor.
- transform(Original) - Method in interface org.archive.util.Transformer
-
Transforms the given object.
- Transformer<Original,Transformed> - Interface in org.archive.util
-
Transforms objects from one thing into another.
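Transform and Transformer together describe a collection whose elements are produced by applying a per-element transformation to another collection. Below is a minimal sketch of that pattern as a lazy, read-only view over the wrapped collection. The nested types are hypothetical stand-ins that share the indexed names for readability. The real classes' behavior (for example, around null results) is not shown here and may differ.

```java
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Iterator;

public class TransformSketch {
    /** Hypothetical stand-in for a Transformer<Original, Transformed>. */
    public interface Transformer<O, T> {
        T transform(O original);
    }

    /** Read-only view that transforms each element of the wrapped collection during iteration. */
    public static class Transform<O, T> extends AbstractCollection<T> {
        private final Collection<? extends O> delegate;
        private final Transformer<O, T> transformer;

        public Transform(Collection<? extends O> delegate, Transformer<O, T> transformer) {
            this.delegate = delegate;
            this.transformer = transformer;
        }

        @Override
        public Iterator<T> iterator() {
            final Iterator<? extends O> it = delegate.iterator();
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public T next() {
                    return transformer.transform(it.next());
                }
            };
        }

        @Override
        public int size() {
            return delegate.size();
        }
    }
}
```

Because this view is lazy, changes to the wrapped collection appear on the next iteration. A copying implementation would trade that for a stable snapshot.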
- TrapSuppressExtractor - Class in org.archive.modules.extractor
-
Pseudo-extractor that suppresses link-extraction of likely trap pages,
by noticing when content's digest is identical to that of its 'via'.
- TrapSuppressExtractor() - Constructor for class org.archive.modules.extractor.TrapSuppressExtractor
-
Usual constructor.
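The trap check described above compares a fetched page's content digest with the digest of the page it was discovered from (its 'via'). If the two match, link extraction is suppressed. The minimal sketch below makes two assumptions. First, digests are SHA-1 hex strings. Second, the via's digest is passed in directly rather than looked up the way the real extractor does. The class name TrapSuppressSketch is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TrapSuppressSketch {
    /** Lowercase hex SHA-1 of content. */
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    /** True when this page's digest equals its via's digest, suggesting a likely trap. */
    static boolean shouldSuppressLinks(byte[] content, String viaDigest) {
        return viaDigest != null && viaDigest.equals(sha1Hex(content));
    }
}
```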
- TRUNC_SUFFIX - Static variable in interface org.archive.modules.CoreAttributeConstants
-
Fetch truncation codes present in CrawlURI annotations.
- TRUNC_SUFFIX - Static variable in class org.archive.modules.fetcher.FetchErrors
-
Fetch truncation codes present in ProcessorURI annotations.
- tryAsScript(File, String) - Method in class org.archive.crawler.framework.ActionDirectory
-
Try the actionFile as a script, deducing the proper scripting
language from its file extension.
- tunnelCreated() - Method in class org.apache.commons.httpclient.HttpConnection
-
Instructs the proxy to establish a secure tunnel to the host.
- type - Variable in class org.archive.modules.forms.HTMLForm.FormInput
-
- VALID_DF_OUTPUT - Static variable in class org.archive.crawler.postprocessor.LowDiskPauseProcessor
-
Deprecated.
- validate(String, int, String, boolean, Cookie) - Method in interface org.apache.commons.httpclient.cookie.CookieSpec
-
Validate the cookie according to validation rules defined by the
cookie specification.
- validate(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.CookieSpecBase
-
Performs the most common Cookie validation.
- validate(String, int, String, boolean, Cookie) - Method in class org.apache.commons.httpclient.cookie.IgnoreCookiesSpec
-
Does nothing.
- validate() - Method in class org.apache.commons.httpclient.HttpMethodBase
-
Returns true if the method is ready to execute, false otherwise.
- validate(Object, Errors) - Method in class org.archive.crawler.framework.CheckpointValidator
-
- validate(Pattern, String) - Method in class org.archive.modules.writer.MirrorWriterProcessor
-
- validate(Object, Errors) - Method in class org.archive.spring.BeanFieldsPatternValidator
-
- validate() - Method in class org.archive.spring.PathSharingContext
-
- validateConfiguration() - Method in class org.archive.crawler.framework.CrawlJob
-
Does the assembled ApplicationContext self-validate? Any failures
are reported as WARNING log events in the job log.
- VALIDATOR - Static variable in class org.archive.crawler.framework.CheckpointService
-
- VALIDATOR - Static variable in class org.archive.modules.CrawlMetadata
-
- VALIDITY_STAMP_FILENAME - Static variable in class org.archive.checkpointing.Checkpoint
-
Name of file written with timestamp into valid checkpoints
- validRobots - Variable in class org.archive.modules.net.CrawlServer
-
- value - Variable in class org.archive.crawler.util.BdbUriUniqFilter
-
- value - Variable in class org.archive.io.ReadSourceEditor
-
- value - Variable in class org.archive.modules.forms.HTMLForm.FormInput
-
- value - Variable in class org.archive.spring.ConfigPathEditor
-
- value - Variable in class org.archive.spring.ConfigString
-
- valueOf(String) - Static method in enum org.archive.crawler.event.CrawlURIDispositionEvent.Disposition
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.framework.CrawlController.State
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.framework.CrawlStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.framework.Frontier.State
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.framework.ToeThread.Step
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.prefetch.RuntimeLimitEnforcer.Operation
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.restlet.Flash.Kind
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.crawler.util.Logs
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.CrawlURI.FetchType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.deciderules.DecideResult
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.extractor.Hop
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.fetcher.FetchStats.Stage
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.fetcher.FetchWhois.UrlStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.modules.ProcessResult.ProcessStatus
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum org.archive.util.ms.Entry.EntryType
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum org.archive.crawler.event.CrawlURIDispositionEvent.Disposition
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.framework.CrawlController.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.framework.CrawlStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.framework.Frontier.State
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.framework.ToeThread.Step
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.prefetch.RuntimeLimitEnforcer.Operation
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.restlet.Flash.Kind
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.crawler.util.Logs
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.CrawlURI.FetchType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.deciderules.DecideResult
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.deciderules.MatchesFilePatternDecideRule.Preset
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.extractor.Hop
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.fetcher.FetchStats.Stage
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.fetcher.FetchWhois.UrlStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values() - Static method in enum org.archive.modules.ProcessResult.ProcessStatus
-
Returns an array containing the constants of this enum type, in
the order they are declared.
- values - Variable in class org.archive.util.fingerprint.MemLongFPSet
-
- values() - Static method in enum org.archive.util.ms.Entry.EntryType
-
Returns an array containing the constants of this enum type, in
the order they are declared.
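The valueOf(String) and values() entries above are the methods the Java compiler generates for every enum type. The sketch below shows their behavior on a hypothetical three-constant enum, which is not one of the indexed types. values() returns the constants in declaration order. valueOf(String) does an exact, case-sensitive lookup and throws IllegalArgumentException when no constant has that name.

```java
import java.util.Arrays;

public class EnumMethodsSketch {
    // Hypothetical enum for illustration only.
    public enum Result { ACCEPT, REJECT, NONE }

    public static void main(String[] args) {
        // values(): constants in declaration order
        System.out.println(Arrays.toString(Result.values())); // prints [ACCEPT, REJECT, NONE]
        // valueOf(String): exact, case-sensitive name lookup
        System.out.println(Result.valueOf("REJECT")); // prints REJECT
        try {
            Result.valueOf("reject");
        } catch (IllegalArgumentException e) {
            System.out.println("no constant named 'reject'");
        }
    }
}
```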
- verifySerialization(Object, byte[], Object, byte[]) - Method in class org.archive.state.ModuleTestBase
-
Verifies that serialization was successful.
- VERY_LIKELY_RELATIVE_URI_PATTERN - Static variable in class org.archive.util.UriUtils
-
- ViewModel - Class in org.archive.crawler.restlet.models
-
- ViewModel() - Constructor for class org.archive.crawler.restlet.models.ViewModel
-