org.archive.io
Class ArchiveRecord

java.lang.Object
  extended by java.io.InputStream
      extended by org.archive.io.ArchiveRecord
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
ARCRecord, HeaderedArchiveRecord, WARCRecord

public abstract class ArchiveRecord
extends InputStream

Archive file Record.

Version:
$Date$ $Version$
Author:
stack

Field Summary
protected  MessageDigest digest
          Compute digest on what we read and add to metadata when done.
protected static long MIN_HTTP_HEADER_LENGTH
          Minimal http response or request header length.
 
Constructor Summary
ArchiveRecord(InputStream in)
          Constructor.
ArchiveRecord(InputStream in, ArchiveRecordHeader header)
          Constructor.
ArchiveRecord(InputStream in, ArchiveRecordHeader header, int bodyOffset, boolean digest, boolean strict)
          Constructor.
 
Method Summary
 int available()
          This available is not the stream's available.
 void close()
          Calling close on a record skips us past this record to the next record in the stream.
 void dump()
          Writes output on STDOUT.
 void dump(OutputStream os)
          Writes output on passed os.
protected  String getDigest4Cdx(ArchiveRecordHeader h)
           
 String getDigestStr()
           
 ArchiveRecordHeader getHeader()
           
protected  InputStream getIn()
           
protected  String getIp4Cdx(ArchiveRecordHeader h)
           
protected  String getMimetype4Cdx(ArchiveRecordHeader h)
           
protected  long getPosition()
           
protected  String getStatusCode4Cdx(ArchiveRecordHeader h)
           
 boolean hasContentHeaders()
          Is it likely that this record contains headers? This method will return true if the body is an HTTP response that includes HTTP response headers, or an HTTP request that includes request headers, etc.
protected  void incrementPosition()
           
protected  void incrementPosition(long incr)
           
protected  boolean isEor()
           
 boolean isStrict()
           
 boolean markSupported()
           
protected  String outputCdx(String strippedFileName)
           
 int read()
           
 int read(byte[] b, int offset, int length)
           
protected  void setBodyOffset(int bodyOffset)
           
protected  void setEor(boolean eor)
           
protected  void setHeader(ArchiveRecordHeader header)
           
 void setStrict(boolean strict)
           
 long skip(long n)
           
 
Methods inherited from class java.io.InputStream
mark, read, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN_HTTP_HEADER_LENGTH

protected static final long MIN_HTTP_HEADER_LENGTH
Minimal HTTP response or request header length. ARCs have been seen with a content length of 1 and no header.


digest

protected MessageDigest digest
Compute digest on what we read and add to metadata when done. Currently hardcoded as SHA-1. TODO: Remove once archive records carry their own digest, or else add a facility that lets the ARC reader compare the calculated digest with the one recorded in the ARC.

Protected instead of private so subclasses can update and complete the digest.
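
As an illustration of digesting content while it is read, the following minimal sketch feeds every byte it reads into a SHA-1 MessageDigest. It is not the ArchiveRecord implementation: the class name DigestWhileReadingSketch, the method sha1Hex, and the hex output encoding are assumptions made for this example (the encoding used by getDigestStr() is not stated on this page).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of org.archive.io.
class DigestWhileReadingSketch {

    // Reads the stream to its end, updating a SHA-1 digest with each chunk,
    // and returns the digest as lowercase hex (encoding chosen for the example).
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            digest.update(buffer, 0, n); // digest exactly the bytes just read
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(sha1Hex(in));
    }
}
```

Updating the digest inside the read loop means the content is traversed only once, which is also why skipping the digest saves CPU when it is not needed.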

Constructor Detail

ArchiveRecord

public ArchiveRecord(InputStream in)
              throws IOException
Constructor.

Parameters:
in - Stream cued up to the start of the record this instance is to represent.
Throws:
IOException

ArchiveRecord

public ArchiveRecord(InputStream in,
                     ArchiveRecordHeader header)
              throws IOException
Constructor.

Parameters:
in - Stream cued up to the start of the record this instance is to represent.
header - Header data.
Throws:
IOException

ArchiveRecord

public ArchiveRecord(InputStream in,
                     ArchiveRecordHeader header,
                     int bodyOffset,
                     boolean digest,
                     boolean strict)
              throws IOException
Constructor.

Parameters:
in - Stream cued up to the start of the record this instance is to represent.
header - Header data.
bodyOffset - Offset into the body. Usually 0.
digest - True if we're to calculate a digest for this record. Not digesting saves about 15% of CPU during an ARC parse.
strict - Be strict when parsing (parsing stops if the ARC is improperly formatted).
Throws:
IOException
Method Detail

markSupported

public boolean markSupported()
Overrides:
markSupported in class InputStream

getHeader

public ArchiveRecordHeader getHeader()
Returns:
Header data for this record.

setHeader

protected void setHeader(ArchiveRecordHeader header)

close

public void close()
           throws IOException
Calling close on a record skips us past this record to the next record in the stream. It does not actually close the stream. The underlying stream is probably being used by the next ARC record.

Specified by:
close in interface Closeable
Overrides:
close in class InputStream
Throws:
IOException
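
The close semantics described above can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the ArchiveRecord source: BoundedRecordSketch, its fields, and the fixed record length passed to its constructor are all hypothetical. The point it shows is that close() drains the unread remainder of the current record and leaves the shared stream open for the next one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a record; NOT the real ArchiveRecord.
class BoundedRecordSketch extends InputStream {
    private final InputStream in; // shared stream that holds many records
    private final long length;    // stated length of this record
    private long position = 0;    // bytes of this record consumed so far

    BoundedRecordSketch(InputStream in, long length) {
        this.in = in;
        this.length = length;
    }

    @Override
    public int read() throws IOException {
        if (position >= length) {
            return -1; // end of record, not necessarily end of stream
        }
        int b = in.read();
        if (b != -1) {
            position++;
        }
        return b;
    }

    // Skips whatever is left of this record; deliberately does not close 'in'.
    @Override
    public void close() throws IOException {
        while (read() != -1) {
            // discard the unread remainder
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream shared = new ByteArrayInputStream(
                "AAAABBBB".getBytes(StandardCharsets.US_ASCII));
        BoundedRecordSketch first = new BoundedRecordSketch(shared, 4);
        first.read();  // consume only one byte of the first record
        first.close(); // skip the remaining three bytes
        BoundedRecordSketch second = new BoundedRecordSketch(shared, 4);
        System.out.println((char) second.read()); // first byte of the second record
    }
}
```

Because the shared stream stays open, a caller iterating over records can close each record as soon as it loses interest and still read the next one from the correct offset.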

read

public int read()
         throws IOException
Specified by:
read in class InputStream
Returns:
Next byte in this record's content, or -1 if at end of record (EOR).
Throws:
IOException

read

public int read(byte[] b,
                int offset,
                int length)
         throws IOException
Overrides:
read in class InputStream
Throws:
IOException

available

public int available()
This available is not the stream's available. It's an available based on the stated archive record length minus what we've read to date.

Overrides:
available in class InputStream
Returns:
Number of bytes remaining in the record content.
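
The arithmetic behind this value can be sketched as follows. The helper below is hypothetical (RemainingBytesSketch and remaining are names invented for this example), and the clamping to the int range and to zero is an assumption about how a long difference might be fitted into an int return type, not a statement about the actual implementation.

```java
// Hypothetical helper, not part of org.archive.io.
class RemainingBytesSketch {

    // Remaining bytes = stated record length minus bytes read so far,
    // clamped to [0, Integer.MAX_VALUE] so it fits available()'s int result.
    static int remaining(long recordLength, long position) {
        long left = recordLength - position;
        if (left <= 0) {
            return 0;
        }
        return (int) Math.min(left, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(remaining(100L, 40L)); // 60
    }
}
```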

skip

public long skip(long n)
          throws IOException
Overrides:
skip in class InputStream
Throws:
IOException

isStrict

public boolean isStrict()
Returns:
True if strict parsing is enabled.

setStrict

public void setStrict(boolean strict)
Parameters:
strict - True to enable strict parsing.

getIn

protected InputStream getIn()

getDigestStr

public String getDigestStr()

incrementPosition

protected void incrementPosition()

incrementPosition

protected void incrementPosition(long incr)

getPosition

protected long getPosition()

isEor

protected boolean isEor()

setEor

protected void setEor(boolean eor)

getStatusCode4Cdx

protected String getStatusCode4Cdx(ArchiveRecordHeader h)

getIp4Cdx

protected String getIp4Cdx(ArchiveRecordHeader h)

getDigest4Cdx

protected String getDigest4Cdx(ArchiveRecordHeader h)

getMimetype4Cdx

protected String getMimetype4Cdx(ArchiveRecordHeader h)

outputCdx

protected String outputCdx(String strippedFileName)
                    throws IOException
Throws:
IOException

dump

public void dump()
          throws IOException
Writes output on STDOUT.

Throws:
IOException

dump

public void dump(OutputStream os)
          throws IOException
Writes output on passed os.

Throws:
IOException
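
Since a record is an InputStream, writing it to an OutputStream amounts to a copy loop. The sketch below assumes that dumping is essentially such a copy; what dump actually writes (for example, whether header data is included) is not stated on this page. DumpSketch and copy are hypothetical names, and a ByteArrayInputStream stands in for a record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.archive.io.
class DumpSketch {

    // Copies everything left in 'record' to 'os' and returns the byte count.
    // The output stream is flushed but left open for the caller to manage.
    static long copy(InputStream record, OutputStream os) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = record.read(buffer, 0, buffer.length)) != -1) {
            os.write(buffer, 0, n);
            total += n;
        }
        os.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream record = new ByteArrayInputStream(
                "record body".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(record, out);
        System.out.println(copied + " bytes: " + out.toString("US-ASCII"));
    }
}
```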

hasContentHeaders

public boolean hasContentHeaders()
Is it likely that this record contains headers? This method will return true if the body is an HTTP response that includes HTTP response headers, or an HTTP request that includes request headers, etc. Be aware that headers in content are distinct from ArchiveRecordHeader 'headers'.

Returns:
True if this record's content has headers.

setBodyOffset

protected void setBodyOffset(int bodyOffset)


Copyright © 2003-2012 Internet Archive. All Rights Reserved.