java.lang.Object org.archive.io.ArchiveReader
public abstract class ArchiveReader
Reader for an Archive file of ArchiveRecords.
Nested Class Summary | | |
---|---|---|
protected class | ArchiveReader.ArchiveRecordIterator | Inner ArchiveRecord Iterator class. |
Field Summary | | |
---|---|---|
protected InputStream | in | Archive file input stream. |
static int | MAX_ALLOWED_RECOVERABLES | Maximum number of recoverable exceptions in a row. |
Fields inherited from interface org.archive.io.ArchiveFileConstants |
---|
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY |
Constructor Summary | |
---|---|
protected | ArchiveReader() |
Method Summary | | |
---|---|---|
protected void | cdxOutput(boolean toFile) | |
protected void | cleanupCurrentRecord() | Clean out the current record, if there is one. |
void | close() | |
protected abstract ArchiveRecord | createArchiveRecord(InputStream is, long offset) | Return an ArchiveRecord homed on offset into is. |
protected ArchiveRecord | currentRecord(ArchiveRecord currentRecord) | |
abstract void | dump(boolean compress) | Dump this file to STDOUT. |
ArchiveRecord | get() | |
ArchiveRecord | get(long offset) | Get record at passed offset. |
protected ArchiveRecord | getCurrentRecord() | |
abstract ArchiveReader | getDeleteFileOnCloseReader(File f) | |
abstract String | getDotFileExtension() | |
abstract String | getFileExtension() | |
String | getFileName() | |
protected InputStream | getIn() | |
protected InputStream | getInputStream(File f, long offset) | Convenience method for constructors. |
protected Logger | getLogger() | |
protected static org.apache.commons.cli.Options | getOptions() | |
String | getReaderIdentifier() | |
String | getStrippedFileName() | |
static String | getStrippedFileName(String name, String dotFileExtension) | |
protected static boolean | getTrueOrFalse(String value) | |
String | getVersion() | |
protected abstract void | gotoEOR(ArchiveRecord record) | Skip over any trailing newlines at the end of the record so we're lined up ready to read the next. |
protected void | initialize(String i) | Convenience method used by subclass constructors. |
boolean | isCompressed() | |
boolean | isDigest() | |
boolean | isStrict() | |
boolean | isValid() | Test whether the Archive file is valid. |
Iterator<ArchiveRecord> | iterator() | Returns an ArchiveRecord iterator. |
void | logStdErr(Level level, String message) | Log on stderr. |
protected boolean | output(String format) | |
protected static void | outputRecord(ArchiveReader r, String format) | Output passed record using passed format specifier. |
boolean | outputRecord(String format) | Output passed record using passed format specifier. |
protected static long | positionForRecord(InputStream in) | |
protected void | setCompressed(boolean compressed) | |
void | setDigest(boolean d) | |
protected void | setIn(InputStream in) | |
protected void | setReaderIdentifier(String i) | |
void | setStrict(boolean s) | |
protected void | setVersion(String version) | |
protected static String | stripExtension(String name, String ext) | |
List<ArchiveRecordHeader> | validate() | Validate the Archive file. |
List<ArchiveRecordHeader> | validate(int numRecords) | Validate the Archive file. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected InputStream in
Set in the constructor. Should support mark/reset of at least 1 byte. Protected so that subclasses have access.
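The 1-byte mark/reset requirement can be illustrated with plain java.io streams. The sketch below is not taken from ArchiveReader itself: the class name MarkResetPeek and the helper peek are hypothetical, and it only shows how a stream that supports mark/reset lets a caller look at the next byte without consuming it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; not part of org.archive.io.
public class MarkResetPeek {

    /** Returns the next byte without consuming it. */
    public static int peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(1);         // remember the position, allowing 1 byte of read-ahead
        int b = in.read();  // read one byte
        in.reset();         // rewind to the marked position
        return b;
    }

    public static void main(String[] args) throws IOException {
        // BufferedInputStream adds mark/reset support to any underlying stream.
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream(new byte[] {'\n', 'A'}));
        System.out.println("peeked newline: " + (peek(in) == '\n'));
        System.out.println("still unread:   " + (in.read() == '\n'));
        System.out.println("next byte is A: " + (in.read() == 'A'));
    }
}
```

A stream that lacks mark/reset support can usually be wrapped in a BufferedInputStream before being handed to code with this requirement.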
public static final int MAX_ALLOWED_RECOVERABLES
Constructor Detail |
---|
protected ArchiveReader()
Method Detail |
---|
protected void initialize(String i)
i - Identifier for Archive file this reader goes against.
protected InputStream getInputStream(File f, long offset) throws IOException
f - File to read.
offset - Offset at which to start reading.
IOException - If we fail to open the file or fail to get a memory-mapped byte buffer on it.
public boolean isCompressed()
public ArchiveRecord get(long offset) throws IOException
Get record at passed offset.
offset - Byte index into file at which a record starts.
IOException
public ArchiveRecord get() throws IOException
IOException
public void close() throws IOException
IOException
protected void cleanupCurrentRecord() throws IOException
IOException
protected abstract ArchiveRecord createArchiveRecord(InputStream is, long offset) throws IOException
Return an ArchiveRecord homed on offset into is.
is - Stream to read Record from.
offset - Offset to find Record at.
IOException
protected abstract void gotoEOR(ArchiveRecord record) throws IOException
record
-
IOException
public abstract String getFileExtension()
public abstract String getDotFileExtension()
public String getVersion()
public List<ArchiveRecordHeader> validate() throws IOException
Assumes the stream is at the start of the file.
IOException
public List<ArchiveRecordHeader> validate(int numRecords) throws IOException
We start validation from wherever we are in the stream.
numRecords - Number of records expected. Pass -1 if the number is unknown.
IOException
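The -1 convention for numRecords can be sketched as a simple sentinel check. The page does not say what validate does when the count does not match, so the IOException thrown below is an assumption based only on the declared throws clause. The names ExpectedCountCheck, UNKNOWN, and checkCount are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the "-1 means unknown" convention;
// not the ArchiveReader implementation.
public class ExpectedCountCheck {

    /** Sentinel meaning the caller does not know how many records to expect. */
    public static final int UNKNOWN = -1;

    /**
     * Returns the headers unchanged when the count is acceptable.
     * Assumption: a mismatch is reported as an IOException.
     */
    public static <T> List<T> checkCount(List<T> headers, int numRecords)
            throws IOException {
        if (numRecords != UNKNOWN && headers.size() != numRecords) {
            throw new IOException("Expected " + numRecords
                    + " records but found " + headers.size());
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        List<String> headers = Arrays.asList("record-0", "record-1");
        System.out.println(checkCount(headers, UNKNOWN).size()); // count not checked
        System.out.println(checkCount(headers, 2).size());       // count matches
        try {
            checkCount(headers, 3);                               // count differs
        } catch (IOException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```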
public boolean isValid()
public boolean isStrict()
public void setStrict(boolean s)
s - The strict value to set.
public void setDigest(boolean d)
d - True if we're to digest.
public boolean isDigest()
protected Logger getLogger()
public Iterator<ArchiveRecord> iterator()
Returns an ArchiveRecord iterator. If strict is not set, this will usually succeed.
iterator in interface Iterable<ArchiveRecord>
protected void setCompressed(boolean compressed)
protected ArchiveRecord getCurrentRecord()
get()
protected ArchiveRecord currentRecord(ArchiveRecord currentRecord)
protected InputStream getIn()
protected void setIn(InputStream in)
protected void setVersion(String version)
public String getReaderIdentifier()
protected void setReaderIdentifier(String i)
public void logStdErr(Level level, String message)
level - Level to log message at.
message - Message to log.
protected static long positionForRecord(InputStream in)
protected static String stripExtension(String name, String ext)
public String getFileName()
public String getStrippedFileName()
public static String getStrippedFileName(String name, String dotFileExtension)
name - Name of ARCFile.
dotFileExtension - '.arc' or '.warc', etc.
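As a rough illustration of what stripping a file name might look like, the sketch below removes a trailing compression suffix and then the given dot extension. It is not the library's code. The class name StripNameSketch, the ".gz" value used for the compressed suffix, and the order in which the two suffixes are removed are all assumptions.

```java
// Hypothetical illustration; not the org.archive.io implementation.
public class StripNameSketch {

    // Assumed value; the page lists DOT_COMPRESSED_FILE_EXTENSION but not its content.
    static final String DOT_COMPRESSED = ".gz";

    /** Removes ext from the end of name if present, else returns name unchanged. */
    public static String stripExtension(String name, String ext) {
        return name.endsWith(ext)
                ? name.substring(0, name.length() - ext.length())
                : name;
    }

    /** Strips the assumed compression suffix first, then the archive extension. */
    public static String strippedFileName(String name, String dotFileExtension) {
        return stripExtension(stripExtension(name, DOT_COMPRESSED), dotFileExtension);
    }

    public static void main(String[] args) {
        System.out.println(strippedFileName("IAH-2006.arc.gz", ".arc"));
        System.out.println(strippedFileName("crawl.warc", ".warc"));
        System.out.println(strippedFileName("notes.txt", ".arc"));
    }
}
```

Names that end in neither suffix pass through unchanged, which keeps the helper safe to call on arbitrary file names.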
protected static boolean getTrueOrFalse(String value)
value - Value to test.
protected boolean output(String format) throws IOException, ParseException
format - Format to use when outputting.
IOException
ParseException
protected void cdxOutput(boolean toFile) throws IOException
IOException
public boolean outputRecord(String format) throws IOException
format - Format to use when outputting.
IOException
public abstract void dump(boolean compress) throws IOException, ParseException
compress - True if dumped output is compressed.
IOException
ParseException
public abstract ArchiveReader getDeleteFileOnCloseReader(File f)
protected static void outputRecord(ArchiveReader r, String format) throws IOException
r - ARCReader instance to output.
format - Format to use when outputting.
IOException
protected static org.apache.commons.cli.Options getOptions()