Interface | Description |
---|---|
ExternalGeoLookupInterface | Interface used by ExternalGeoLocationDecideRule. |

Class | Description |
---|---|
AcceptDecideRule | |
AddRedirectFromRootServerToScope | |
ContentLengthDecideRule | |
ContentTypeMatchesRegexDecideRule | DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression. |
ContentTypeNotMatchesRegexDecideRule | DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression. |
DecideRule | |
DecideRuleSequence | |
ExternalGeoLocationDecideRule | A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface. |
FetchStatusDecideRule | Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting. |
FetchStatusMatchesRegexDecideRule | |
FetchStatusNotMatchesRegexDecideRule | |
HasViaDecideRule | Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or some kind of mid-crawl add). |
HopCrossesAssignmentLevelDomainDecideRule | Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars: its 'assignment-level-domain' (ALD), AKA 'public suffix' or, in previous Heritrix versions, 'topmost assigned SURT'. |
HopsPathMatchesRegexDecideRule | Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regex. |
IpAddressSetDecideRule | IpAddressSetDecideRule must be used with Preselector.setRecheckScope(boolean) set to true because it relies on Heritrix's DNS lookup to establish the IP address for a URI before it can run. |
MatchesFilePatternDecideRule | Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches. |
MatchesListRegexDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexes. |
MatchesRegexDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regex. |
MatchesStatusCodeDecideRule | Provides a rule that returns "true" for any CrawlURIs which have a fetch status code that falls within the provided inclusive range. |
NotMatchesFilePatternDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regex. |
NotMatchesListRegexDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regexes. |
NotMatchesRegexDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regex. |
NotMatchesStatusCodeDecideRule | Provides a rule that returns "true" for any CrawlURIs which have a fetch status code that does not fall within the provided inclusive range. |
PathologicalPathDecideRule | Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g. http://example.com/a/a/a/boo.html == 3 '/a' segments). |
PredicatedDecideRule | Rule which applies the configured decision only if a test evaluates to true. |
PrerequisiteAcceptDecideRule | Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position). |
RejectDecideRule | |
ResourceLongerThanDecideRule | Applies configured decision for URIs with content length greater than a given threshold length value. |
ResourceNoLongerThanDecideRule | Applies configured decision for URIs with content length less than or equal to a given threshold length value. |
ResponseContentLengthDecideRule | Decide rule that will ACCEPT or REJECT a URI, depending on the "decision" property, after it is fetched, if the content body is within a specified size range, given in bytes. |
SchemeNotInSetDecideRule | Rule applies the configured decision (default REJECT) for any URI which has a URI-scheme NOT contained in the configured Set. |
ScriptedDecideRule | Rule which runs a JSR-223 script to make its decision. |
SeedAcceptDecideRule | Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true). |
TooManyHopsDecideRule | Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold. |
TooManyPathSegmentsDecideRule | Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold. |
TransclusionDecideRule | Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath'; see CandidateURI#getPathFromSeed()) ends with at least one, but not more than the given number of, non-navlink hops (hops other than 'L'). |
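
Several of the rules above reduce to a simple test on the URI string or its hops-path, combined with a configured decision. The following is a minimal, self-contained sketch of that pattern, not Heritrix's actual code: the names `Decision`, `SketchUri`, `SketchRule`, and `DecideRuleSketch` are hypothetical stand-ins, the thresholds are arbitrary, and returning `NONE` when the test fails is an assumption made for this illustration.

```java
import java.util.function.Predicate;

/** Hypothetical stand-in for the package's DecideResult enum. */
enum Decision { ACCEPT, REJECT, NONE }

/** Hypothetical, stripped-down URI record: the URI string and its hops-path. */
final class SketchUri {
    final String uri;
    final String hopsPath;
    SketchUri(String uri, String hopsPath) { this.uri = uri; this.hopsPath = hopsPath; }
}

/** Predicated rule: yields the configured decision only if the test is true. */
final class SketchRule {
    private final Decision decision;
    private final Predicate<SketchUri> test;
    SketchRule(Decision decision, Predicate<SketchUri> test) {
        this.decision = decision;
        this.test = test;
    }
    Decision decide(SketchUri u) {
        // Assumption: a rule whose test fails expresses no opinion (NONE).
        return test.test(u) ? decision : Decision.NONE;
    }
}

final class DecideRuleSketch {
    /** Longest run of identical, consecutive, non-empty path segments. */
    static int maxRepeatedSegments(String uri) {
        int start = uri.indexOf("//");
        String rest = start >= 0 ? uri.substring(start + 2) : uri;
        int slash = rest.indexOf('/');
        if (slash < 0) return 0; // no path at all
        int best = 0, run = 0;
        String prev = null;
        for (String seg : rest.substring(slash + 1).split("/")) {
            if (seg.isEmpty()) continue; // ignore empty segments
            run = seg.equals(prev) ? run + 1 : 1;
            best = Math.max(best, run);
            prev = seg;
        }
        return best;
    }

    /** Count of '/' characters, not including the first "//". */
    static int countPathSlashes(String uri) {
        int count = 0;
        for (int i = 0; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') count++;
        }
        return uri.contains("//") ? count - 2 : count;
    }

    /** Number of hops at the end of the hops-path that are not 'L'. */
    static int trailingNonNavlinkHops(String hopsPath) {
        int n = 0;
        for (int i = hopsPath.length() - 1; i >= 0 && hopsPath.charAt(i) != 'L'; i--) {
            n++;
        }
        return n;
    }

    /** True if the last hop in the hops-path is 'P' (prerequisite). */
    static boolean isPrerequisite(String hopsPath) {
        return hopsPath.endsWith("P");
    }

    public static void main(String[] args) {
        SketchUri u = new SketchUri("http://example.com/a/a/a/boo.html", "LLXE");
        // Reject when more than 2 identical consecutive segments occur.
        SketchRule pathological =
            new SketchRule(Decision.REJECT, x -> maxRepeatedSegments(x.uri) > 2);
        // Reject when the hops-path is longer than 20 hops.
        SketchRule tooManyHops =
            new SketchRule(Decision.REJECT, x -> x.hopsPath.length() > 20);
        System.out.println(pathological.decide(u)); // REJECT
        System.out.println(tooManyHops.decide(u));  // NONE
    }
}
```

In this sketch the example URI from the PathologicalPathDecideRule description yields a run of 3 repeated segments, and the hops-path "LLXE" ends in 2 non-navlink hops. Heritrix's own implementations may compute these values differently (for instance with regular expressions), so treat the helpers as illustrations of the descriptions only.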

Enum | Description |
---|---|
DecideResult | The decision of a DecideRule. |
MatchesFilePatternDecideRule.Preset | |

Copyright © 2003-2014 Internet Archive. All Rights Reserved.