Friday, July 17, 2009

Simple Digest to Test File Uniqueness

Here is a simple way to use the Groovy Digest class from the org.raincity.glib.crypto package. It is limited to reasonably small files, but it could also be applied to chunked sections of larger files. In any case, here is the test script:

import org.raincity.glib.crypto.Digest

void testStringDigest() {
    def algorithm = 'SHA-512'
    def s1 = 'this is test 1'
    def s2 = 'this is test 2'

    // hash each input with the same algorithm
    def hash1 = new Digest().createHashString( s1, algorithm )
    def hash2 = new Digest().createHashString( s2, algorithm )

    println "h1 = ${hash1}"
    println "h2 = ${hash2}"

    // different inputs should produce different hashes
    assert hash1 != hash2
}

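To compare files, let s1 be the contents of the first file and s2 the contents of the second. If the hashes differ, the files are definitely different. If the hashes match, the files are identical for all practical purposes (barring a hash collision), i.e., not unique.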
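
For files too large to hold in memory, the same comparison can be done by feeding the file to the hash in chunks. The following is only a sketch that uses the JDK's java.security.MessageDigest directly; hashFile is a hypothetical helper written for this example and is not part of the Digest class, and the 8 KB buffer size is an arbitrary choice.

```groovy
import java.security.MessageDigest

// Hypothetical helper (not part of org.raincity.glib.crypto): hashes a file
// in fixed-size chunks so the whole file never has to fit in memory.
String hashFile(File file, String algorithm = 'SHA-256') {
    def md = MessageDigest.getInstance(algorithm)
    file.withInputStream { stream ->
        byte[] buffer = new byte[8192]
        int read
        // update the running digest with each chunk until end of stream
        while ((read = stream.read(buffer)) != -1) {
            md.update(buffer, 0, read)
        }
    }
    // hex-encode the final digest bytes
    md.digest().encodeHex().toString()
}

// Usage: two temp files with the same content hash the same, then differ
def f1 = File.createTempFile('digest', '.txt')
def f2 = File.createTempFile('digest', '.txt')
f1.text = 'this is test 1'
f2.text = 'this is test 1'
assert hashFile(f1, 'SHA-512') == hashFile(f2, 'SHA-512')

f2.text = 'this is test 2'
assert hashFile(f1, 'SHA-512') != hashFile(f2, 'SHA-512')

f1.delete()
f2.delete()
```

Because the digest is updated incrementally, the result is the same as hashing the whole file in one call; only the memory footprint changes.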

Digest's default algorithm is SHA-256, but SHA-512 is available in Java > 1.4 and produces a digest twice as long (512 bits instead of 256), which makes it the stronger choice. The SHA-256 default was chosen to match the SHA-256 limit for Adobe/Flex.
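
To see the size difference between the two algorithms, here is a small sketch that again goes straight to java.security.MessageDigest rather than through Digest. The hexDigest helper is a made-up name for this example, and UTF-8 is an assumed encoding.

```groovy
import java.security.MessageDigest

// Hypothetical helper: hex-encoded digest of a string for a JDK algorithm name.
String hexDigest(String text, String algorithm) {
    MessageDigest.getInstance(algorithm)
                 .digest(text.getBytes('UTF-8'))
                 .encodeHex()
                 .toString()
}

// SHA-256 yields 64 hex characters (256 bits), SHA-512 yields 128 (512 bits)
['SHA-256', 'SHA-512'].each { name ->
    println "${name}: ${hexDigest('this is test 1', name).length()} hex characters"
}
```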

Thanks to Brad Rhoads for inspiring this post...