public class PDFParser extends Object
Modifier and Type | Field and Description |
---|---|
protected com.lowagie.text.pdf.PdfDictionary | catalog |
protected byte[] | document |
protected com.lowagie.text.pdf.PdfReader | documentReader |
protected ArrayList<ArrayList<Integer>> | encounteredReferences |
protected ArrayList<String> | foundURIs |
Constructor and Description |
---|
PDFParser(byte[] doc) |
PDFParser(String doc) |
Modifier and Type | Method and Description |
---|---|
ArrayList<String> | extractURIs() Extract URIs from all objects found in a PDF document's catalog. |
protected void | extractURIs(com.lowagie.text.pdf.PdfObject entity) Parse a PdfDictionary, recursively looking for URIs and adding them to foundURIs. |
protected void | getInFromFile(String doc) Read the file named 'doc' and store its bytes for later processing. |
ArrayList<String> | getURIs() Get the list of URIs retrieved from the PDF during the extractURIs operation. |
protected boolean | haveSeen(int generation, int id) Indicates, based on a PDF object's generation/id pair, whether the parser has already encountered this object (or a reference to it), so that the parser does not loop forever on cycles within the PDF. |
protected void | initialize() Opens the document for reading. |
static void | main(String[] argv) |
protected void | markAsSeen(int generation, int id) Record that an object (generation/id pair) has been seen by this parser so that it can be handled differently when it is encountered again. |
protected void | resetState() Reinitialize the object as though a new one were created. |
void | resetState(byte[] doc) Reset the object and initialize it with a new byte array (the document). |
void | resetState(String doc) Reinitialize the object as though a new one were created, complete with a valid pointer to a document that can be read. |
protected com.lowagie.text.pdf.PdfReader documentReader
protected byte[] document
protected com.lowagie.text.pdf.PdfDictionary catalog
public PDFParser(String doc) throws IOException
Throws: IOException
public PDFParser(byte[] doc) throws IOException
Throws: IOException
protected void resetState()
public void resetState(byte[] doc) throws IOException
Parameters: doc
Throws: IOException
public void resetState(String doc) throws IOException
Parameters: doc
Throws: IOException
protected void getInFromFile(String doc) throws IOException
Parameters: doc
Throws: IOException
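As an illustration of what "read a file and store its bytes" can look like, here is a minimal sketch. It is not the PDFParser source: the class `FileBytes` and the method `readAll` are hypothetical names, and the use of `java.nio.file.Files` is an assumption about one reasonable way to do it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of PDFParser): loads a whole file into memory. */
final class FileBytes {

    /** Reads every byte of the file named by doc; fails with IOException if it cannot be read. */
    static byte[] readAll(String doc) throws IOException {
        return Files.readAllBytes(Paths.get(doc));
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example does not depend on an existing PDF.
        Path tmp = Files.createTempFile("sample", ".bin");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.US_ASCII));
            byte[] document = readAll(tmp.toString());
            System.out.println(document.length); // prints 5
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Holding the document as a byte array fits the two constructors listed above: one takes the bytes directly, the other takes a file name that has to be turned into bytes first.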
protected boolean haveSeen(int generation, int id)
Parameters: generation, id
protected void markAsSeen(int generation, int id)
Parameters: generation, id
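The haveSeen/markAsSeen pair amounts to a visited set keyed by (generation, id). The sketch below shows that idea in isolation. It is an assumption-laden illustration, not the class's code: `SeenTracker` is a hypothetical name, and it swaps in a `HashSet<Long>` where PDFParser declares `encounteredReferences` as an `ArrayList<ArrayList<Integer>>`.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of PDFParser): remembers visited (generation, id) pairs. */
final class SeenTracker {

    // Each pair is packed into one long: generation in the high 32 bits, id in the low 32 bits.
    private final Set<Long> seen = new HashSet<>();

    private static long key(int generation, int id) {
        return ((long) generation << 32) | (id & 0xFFFFFFFFL);
    }

    /** Returns true if this exact pair was marked earlier. */
    boolean haveSeen(int generation, int id) {
        return seen.contains(key(generation, id));
    }

    /** Records the pair so a later haveSeen call returns true. */
    void markAsSeen(int generation, int id) {
        seen.add(key(generation, id));
    }

    public static void main(String[] args) {
        SeenTracker tracker = new SeenTracker();
        System.out.println(tracker.haveSeen(0, 12)); // prints false
        tracker.markAsSeen(0, 12);
        System.out.println(tracker.haveSeen(0, 12)); // prints true
        System.out.println(tracker.haveSeen(1, 12)); // prints false (different generation)
    }
}
```

A traversal would call haveSeen before following an indirect reference and markAsSeen once it does, which is what keeps a reference cycle from being walked more than once.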
public ArrayList<String> getURIs()
protected void initialize() throws IOException
Throws: IOException
public ArrayList<String> extractURIs()
protected void extractURIs(com.lowagie.text.pdf.PdfObject entity)
Parameters: entity
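The recursive search that extractURIs(entity) describes can be sketched without the PDF library. In the example below, plain `Map` and `List` values stand in for PDF dictionaries and arrays, and an identity-based visited set stands in for the generation/id bookkeeping. `UriWalker` and everything in it are hypothetical. The only PDF fact relied on is that a URI action dictionary stores its target under the `URI` key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in (not PDFParser): collects "URI" string values from nested maps and lists. */
final class UriWalker {

    private final List<String> foundURIs = new ArrayList<>();

    // Containers are tracked by identity so that cycles are visited only once.
    private final Set<Object> visited =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    List<String> getURIs() {
        return foundURIs;
    }

    void extractURIs(Object entity) {
        if (entity instanceof Map) {
            if (!visited.add(entity)) {
                return; // already walked this dictionary
            }
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) entity).entrySet()) {
                if ("URI".equals(entry.getKey()) && entry.getValue() instanceof String) {
                    foundURIs.add((String) entry.getValue());
                } else {
                    extractURIs(entry.getValue());
                }
            }
        } else if (entity instanceof List) {
            if (!visited.add(entity)) {
                return; // already walked this array
            }
            for (Object item : (List<?>) entity) {
                extractURIs(item);
            }
        }
        // Any other value (string, number, null) is a leaf and is ignored.
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("S", "URI");
        first.put("URI", "http://example.com/a");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("URI", "http://example.com/b");

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Annots", Arrays.asList(first, second));
        root.put("Parent", root); // deliberate cycle

        UriWalker walker = new UriWalker();
        walker.extractURIs(root);
        System.out.println(walker.getURIs()); // prints [http://example.com/a, http://example.com/b]
    }
}
```

Without the visited check, the `Parent` entry pointing back at `root` would recurse until the stack overflowed.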
public static void main(String[] argv)
Copyright © 2003-2014 Internet Archive. All Rights Reserved.