I think, therefore I blog

How to Test if a File is PDF (Java)

By Roger Keays, 26 February 2016

Here is a simple Java function to check whether some data represents a PDF file. It is adapted from a C# function posted on Stack Overflow by NinjaCross.

    /**
     * Test if the data in the given byte array represents a PDF file.
     * Only versions 1.3 and 1.4, with the exact terminators checked below,
     * are recognised.
     */
    public static boolean is_pdf(byte[] data) {
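        // At least 8 bytes are needed: "%PDF-" plus a three-byte version such
        // as "1.3". A shorter array would make data[5..7] throw below.
        if (data != null && data.length > 7 &&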
                data[0] == 0x25 && // %
                data[1] == 0x50 && // P
                data[2] == 0x44 && // D
                data[3] == 0x46 && // F
                data[4] == 0x2D) { // -

            // version 1.3 file terminator
            if (data[5] == 0x31 && data[6] == 0x2E && data[7] == 0x33 &&
                    data[data.length - 7] == 0x25 && // %
                    data[data.length - 6] == 0x25 && // %
                    data[data.length - 5] == 0x45 && // E
                    data[data.length - 4] == 0x4F && // O
                    data[data.length - 3] == 0x46 && // F
                    data[data.length - 2] == 0x20 && // SPACE
                    data[data.length - 1] == 0x0A) { // EOL
                return true;
            }

            // version 1.4 file terminator
            if (data[5] == 0x31 && data[6] == 0x2E && data[7] == 0x34 &&
                    data[data.length - 6] == 0x25 && // %
                    data[data.length - 5] == 0x25 && // %
                    data[data.length - 4] == 0x45 && // E
                    data[data.length - 3] == 0x4F && // O
                    data[data.length - 2] == 0x46 && // F
                    data[data.length - 1] == 0x0A) { // EOL
                return true;
            }
        }
        return false;
    }
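The function above only accepts versions 1.3 and 1.4 with those exact terminators, so a 1.7 file or one ending in a Windows line break is rejected. If you want something more forgiving, here is a rough sketch of a looser check. It is not part of the original C# function: the class name PdfSniffer, the method looksLikePdf, and the 1024-byte search window are all my own assumptions, and it is still only a heuristic, not a validator.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: a looser, version-agnostic PDF heuristic. */
final class PdfSniffer {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Assumption: only look for the marker in the last 1024 bytes.
    private static final int TAIL_WINDOW = 1024;

    /** True if data starts with "%PDF-" and has "%%EOF" near the end. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < HEADER.length + EOF_MARKER.length) {
            return false;
        }
        if (!matchesAt(data, 0, HEADER)) {
            return false;
        }
        // Scan backwards from the end, never overlapping the header.
        int earliest = Math.max(HEADER.length, data.length - TAIL_WINDOW);
        for (int i = data.length - EOF_MARKER.length; i >= earliest; i--) {
            if (matchesAt(data, i, EOF_MARKER)) {
                return true;
            }
        }
        return false;
    }

    /** Bounds-checked comparison of pattern against data at offset. */
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        if (offset < 0 || offset + pattern.length > data.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is that this accepts anything with the right first five bytes and a %%EOF somewhere near the end, so it will say yes to some data the stricter function would reject.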

The function takes a byte[] so that it can test data from both streams and files. To read a file into a byte array you can use java.nio.file.Files.readAllBytes, e.g.:

    assertTrue(is_pdf(Files.readAllBytes(Paths.get("output.pdf"))));

You'll have to handle the IOException that readAllBytes can throw too. To test streamed output, just render to a ByteArrayOutputStream and call its toByteArray() method after the output has rendered.
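Both ways of getting a byte[] can be wrapped in small helpers. This is only a sketch under my own assumptions: PdfBytes, readOrNull, capture and the Renderer interface are hypothetical names, and returning null on a read failure is just one way to deal with the exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helpers for getting a byte[] to pass to is_pdf. */
final class PdfBytes {

    /** Reads the whole file, or returns null if it cannot be read. */
    static byte[] readOrNull(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            // is_pdf checks for null first, so it returns false for this.
            return null;
        }
    }

    /** Stand-in for whatever writes your output to a stream. */
    interface Renderer {
        void renderTo(OutputStream out) throws IOException;
    }

    /** Renders into memory and returns the bytes that were written. */
    static byte[] capture(Renderer renderer) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        renderer.renderTo(buffer);
        return buffer.toByteArray();
    }
}
```

With these in place a file check becomes is_pdf(PdfBytes.readOrNull(Paths.get("output.pdf"))), and streamed output can be checked by passing your rendering code to PdfBytes.capture as a lambda.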

The unit tests for this function are below. Please feel free to contribute test cases, especially ones that fail.

    @Test
    public void test_valid_pdf_1_3_data_is_pdf() {
        assertTrue(is_pdf("%PDF-1.3 CONTENT %%EOF \n".getBytes()));
    }

    @Test
    public void test_valid_pdf_1_4_data_is_pdf() {
        assertTrue(is_pdf("%PDF-1.4 CONTENT %%EOF\n".getBytes()));
    }

    @Test
    public void test_invalid_data_is_not_pdf() {
        assertFalse(is_pdf("Hello World".getBytes()));
    }

About Roger Keays

Roger Keays is an artist, an engineer, and a student of life. Since he left Australia in 2009, he has been living as a digital nomad in over 40 different countries around the world. Roger is addicted to surfing. His other interests are music, psychology, languages, and finding good food.

Leave a Comment

Please visit https://RogerKeays.com/how-to-test-if-a-file-is-pdf-java to add your comments.
