public abstract class Processor extends Object implements HasKeyedProperties, org.springframework.context.Lifecycle, org.springframework.beans.factory.BeanNameAware, Checkpointable
Modifier and Type | Field and Description |
---|---|
protected String | beanName |
protected boolean | isRunning |
protected KeyedProperties | kp |
protected Checkpoint | recoveryCheckpoint |
protected AtomicLong | uriCount: The number of URIs processed by this processor. |
Constructor and Description |
---|
Processor() |
Modifier and Type | Method and Description |
---|---|
void | doCheckpoint(Checkpoint checkpointInProgress): Do the actual checkpoint. |
void | finishCheckpoint(Checkpoint checkpointInProgress): Cleanup/unlock; need not complete for a checkpoint to be valid. |
static String | flattenVia(CrawlURI puri) |
protected void | fromCheckpointJson(org.json.JSONObject json): Restore internal state from JSONObject stored at earlier checkpoint-time. |
String | getBeanName() |
boolean | getEnabled() |
KeyedProperties | getKeyedProperties() |
static long | getRecordedSize(CrawlURI puri) |
DecideRule | getShouldProcessRule() |
long | getURICount(): Returns the number of URIs this processor has handled. |
static boolean | hasHttpAuthenticationCredential(CrawlURI puri) |
protected abstract void | innerProcess(CrawlURI uri): Actually performs the process. |
protected ProcessResult | innerProcessResult(CrawlURI uri) |
protected void | innerRejectProcess(CrawlURI uri): Invoked after a URI has been rejected. |
boolean | isRunning() |
static boolean | isSuccess(CrawlURI puri) |
ProcessResult | process(CrawlURI uri): Processes the given URI. |
String | report() |
void | setBeanName(String name) |
void | setEnabled(boolean enabled) |
void | setRecoveryCheckpoint(Checkpoint checkpoint): Used to inform a bean that it should restore its state from the given Checkpoint when launched (Lifecycle start()). |
void | setShouldProcessRule(DecideRule rule) |
protected abstract boolean | shouldProcess(CrawlURI uri): Determines whether the given uri should be processed by this processor. |
void | start() |
void | startCheckpoint(Checkpoint checkpointInProgress): Note a checkpoint is about to begin. |
void | stop() |
protected org.json.JSONObject | toCheckpointJson(): Return a JSONObject of current state that can be consulted on recovery to restore necessary values. |
protected KeyedProperties kp
protected String beanName
protected AtomicLong uriCount
protected boolean isRunning
protected Checkpoint recoveryCheckpoint
public KeyedProperties getKeyedProperties()
Specified by:
getKeyedProperties in interface HasKeyedProperties
public String getBeanName()
public void setBeanName(String name)
Specified by:
setBeanName in interface org.springframework.beans.factory.BeanNameAware
public boolean getEnabled()
public void setEnabled(boolean enabled)
public DecideRule getShouldProcessRule()
public void setShouldProcessRule(DecideRule rule)
public ProcessResult process(CrawlURI uri) throws InterruptedException
Processes the given URI. First checks #ENABLED and #DECIDE_RULES. If ENABLED is false, then nothing happens. If the DECIDE_RULES indicate REJECT, then the #innerRejectProcess(CrawlURI) method is invoked, and the process method returns.
Next, the #shouldProcess(CrawlURI) method is consulted to see if this Processor knows how to handle the given URI. If it returns false, then nothing further occurs.
FIXME: Should innerRejectProcess be called when ENABLED is false, or when shouldProcess returns false? The previous Processor implementation didn't handle it that way.
Otherwise, the URI is considered valid. This processor's count of handled URIs is incremented, and the #innerProcess(CrawlURI) method is invoked to actually perform the process.
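The control flow described above can be sketched with a simplified stand-in. This is not the Heritrix source: `SketchProcessor` and `HtmlOnlyProcessor` are hypothetical names, a `String` stands in for `CrawlURI`, a `Predicate` stands in for the `DecideRule`, and the `ProcessResult` return value and `InterruptedException` declarations are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical, simplified model of the flow described for process().
// Strings stand in for CrawlURI; a Predicate stands in for the DecideRule.
abstract class SketchProcessor {
    private boolean enabled = true;
    private Predicate<String> shouldProcessRule = uri -> true; // accept all by default
    private final AtomicLong uriCount = new AtomicLong();

    void setEnabled(boolean enabled) { this.enabled = enabled; }
    void setShouldProcessRule(Predicate<String> rule) { this.shouldProcessRule = rule; }
    long getURICount() { return uriCount.get(); }

    final void process(String uri) {
        if (!enabled) {
            return;                      // disabled: nothing happens
        }
        if (!shouldProcessRule.test(uri)) {
            innerRejectProcess(uri);     // the rule rejected the URI
            return;
        }
        if (!shouldProcess(uri)) {
            return;                      // this processor does not handle the URI
        }
        uriCount.incrementAndGet();      // counted only after all three checks pass
        innerProcess(uri);
    }

    protected abstract boolean shouldProcess(String uri);
    protected abstract void innerProcess(String uri);
    protected void innerRejectProcess(String uri) { /* no-op by default */ }
}

// Example subclass: handles only URIs that end in ".html".
class HtmlOnlyProcessor extends SketchProcessor {
    final List<String> handled = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    @Override protected boolean shouldProcess(String uri) { return uri.endsWith(".html"); }
    @Override protected void innerProcess(String uri) { handled.add(uri); }
    @Override protected void innerRejectProcess(String uri) { rejected.add(uri); }
}
```

In this sketch, a URI rejected by the rule reaches `innerRejectProcess`, while a URI that fails `shouldProcess` is dropped silently. Neither case increments the counter, which matches the behavior documented for `getURICount()`.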
Parameters:
uri - The URI to process
Throws:
InterruptedException - if the thread is interrupted
public long getURICount()
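Returns the number of URIs this processor has handled. The count does not include URIs that were rejected by the #ENABLED flag, by the #DECIDE_RULES, or by the #shouldProcess(CrawlURI) method.
protected abstract boolean shouldProcess(CrawlURI uri)
Determines whether the given uri should be processed by this processor.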
Parameters:
uri - the URI to test
protected ProcessResult innerProcessResult(CrawlURI uri) throws InterruptedException
Throws:
InterruptedException
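protected abstract void innerProcess(CrawlURI uri) throws InterruptedException
Actually performs the process. By the time this method is invoked, the given URI has passed the #ENABLED, the #DECIDE_RULES and the #shouldProcess(CrawlURI) tests.
Parameters:
uri - the URI to process
Throws:
InterruptedException - if the thread is interrupted
protected void innerRejectProcess(CrawlURI uri) throws InterruptedException
Invoked after a URI has been rejected.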
Parameters:
uri - the URI that was rejected
Throws:
InterruptedException - if the thread is interrupted
public static boolean isSuccess(CrawlURI puri)
public static long getRecordedSize(CrawlURI puri)
public static boolean hasHttpAuthenticationCredential(CrawlURI puri)
public String report()
public boolean isRunning()
Specified by:
isRunning in interface org.springframework.context.Lifecycle
public void start()
Specified by:
start in interface org.springframework.context.Lifecycle
public void stop()
Specified by:
stop in interface org.springframework.context.Lifecycle
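public void startCheckpoint(Checkpoint checkpointInProgress)
Description copied from interface: Checkpointable
Note a checkpoint is about to begin.
Specified by:
startCheckpoint in interface Checkpointable
Parameters:
checkpointInProgress - Checkpoint
public void doCheckpoint(Checkpoint checkpointInProgress) throws IOException
Description copied from interface: Checkpointable
Do the actual checkpoint.
Specified by:
doCheckpoint in interface Checkpointable
Parameters:
checkpointInProgress - Checkpoint
Throws:
IOException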
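protected org.json.JSONObject toCheckpointJson() throws org.json.JSONException
Return a JSONObject of current state that can be consulted on recovery to restore necessary values.
Throws:
org.json.JSONException
protected void fromCheckpointJson(org.json.JSONObject json) throws org.json.JSONException
Restore internal state from JSONObject stored at earlier checkpoint-time.
Parameters:
json - JSONObject
Throws:
org.json.JSONException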
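As a rough sketch of the round trip that toCheckpointJson() and fromCheckpointJson() describe, the example below saves a URI counter at checkpoint time and restores it into a fresh instance. To stay self-contained it uses a `Map<String, Object>` as a stand-in for `org.json.JSONObject`. The class name `CounterState` and the key `"uriCount"` are hypothetical and are not taken from the Heritrix source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a processor's checkpointable state.
// A Map replaces org.json.JSONObject so the sketch needs only the JDK.
class CounterState {
    final AtomicLong uriCount = new AtomicLong();

    // Capture the values that must survive a restart.
    Map<String, Object> toCheckpointJson() {
        Map<String, Object> json = new HashMap<>();
        json.put("uriCount", uriCount.get());
        return json;
    }

    // Restore values captured at an earlier checkpoint.
    void fromCheckpointJson(Map<String, Object> json) {
        uriCount.set(((Number) json.get("uriCount")).longValue());
    }
}
```

A subclass that keeps additional state would presumably add its own keys in the first method and read them back in the second, so the two methods stay symmetric.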
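public void finishCheckpoint(Checkpoint checkpointInProgress)
Description copied from interface: Checkpointable
Cleanup/unlock; need not complete for a checkpoint to be valid.
Specified by:
finishCheckpoint in interface Checkpointable
Parameters:
checkpointInProgress - Checkpoint
public void setRecoveryCheckpoint(Checkpoint checkpoint)
Description copied from interface: Checkpointable
Used to inform a bean that it should restore its state from the given Checkpoint when launched (Lifecycle start()).
Specified by:
setRecoveryCheckpoint in interface Checkpointable
Parameters:
checkpoint - Checkpoint
Copyright © 2003-2014 Internet Archive. All Rights Reserved.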