Friday, July 17, 2009

Simple Digest to Test File Uniqueness

Here is a simple way to use the Groovy Digest class from the org.raincity.glib.crypto package. It's limited to reasonably small files, but could be used with chunked sections of files. In any case, here is the test script:

void testStringDigest() {
    def algorithm = 'SHA-512'
    def s1 = 'this is test 1'
    def s2 = 'this is test 2'

    def hash1 = new Digest().createHashString( s1, algorithm )
    def hash2 = new Digest().createHashString( s2, algorithm )

    println "h1 = ${hash1}"
    println "h2 = ${hash2}"

    assert hash1 != hash2
}

Here s1 stands in for the byte stream from the first file and s2 for the byte stream from the second file. If the hashes match, then the files are, for all practical purposes, identical, i.e., not unique.

Digest's default algorithm is SHA-256, but SHA-512 is available in Java versions after 1.4 and produces a longer, stronger digest. The 256 default was designed to match the SHA-256 limit for Adobe/Flex.
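The chunked approach mentioned above can be sketched in plain Java with the JDK's own java.security.MessageDigest. This is not the raincity Digest class; the class and method names below (ChunkedDigestSketch, hashStream, hashString) are made up for illustration, and the 8 KB buffer size is an arbitrary assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of org.raincity.glib.crypto.
final class ChunkedDigestSketch {

    // Feed the stream to the digest in 8 KB chunks so the whole
    // file never has to sit in memory at once.
    static String hashStream(InputStream in, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            // Render the digest bytes as lowercase hex.
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("unknown algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for hashing a string as UTF-8 bytes.
    static String hashString(String s, String algorithm) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return hashStream(new ByteArrayInputStream(bytes), algorithm);
    }
}
```

To compare two files, pass each file's input stream (for example from java.nio.file.Files.newInputStream) to hashStream and compare the returned strings.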

Thanks to Brad Rhoads for inspiring this post...

Thursday, June 4, 2009

Amazon S3 Access with JetS3t

Back in my Ruby/Rails days, I always accessed Amazon S3 from the command line. Now that I'm coding in Groovy/Grails, I thought I would search for a Groovy solution. I found one, but it's not ready for prime time. I did, however, come across JetS3t, written in Java with examples in Groovy, so I thought I would give it a try.

After downloading, I opened the README and found that JetS3t comes with a nifty set of stand-alone and web-based UI applications and applets for accessing, browsing and uploading files. The application is called Cockpit, and I was able to use it easily both inside Firefox and as a standalone Java application.

Find out more about the JetS3t suite of applications and be sure to read Andrew Glover's tutorial.

Wednesday, April 22, 2009

Lame Flex IDE

OK, Adobe Flex, or AS3, is an OK language. But the commercial Flex IDE sucks. Adobe uses (abuses) the Eclipse framework for it, and charges big bucks for a somewhat visual editor that amounts to a bucket of shit.

After wrestling with it for over two years I'm tempted to go back to vi and the command line. Anyone else feel this way?

Monday, March 23, 2009

Parsing ISO 8601 Dates in Flex

Flex has many capabilities, but parsing dates is not one of them. As a member of the JSR-310 team I am focused on date formatting and parsing across many formats, so it comes as a bit of a surprise that Flex has very little support for this. I guess it's time to roll my own.

Unlike Java date formatters, Flex doesn't let you set the parse format to accept custom inputs. The formats must match one of the seven standard parse formats that Flex supports. Some third party examples try to use Date.parse() after massaging the input string, but they are a bit lame.

A good way to parse dates in Flex is to use RegExp to extract the numbers. Then, the numbers need to be validated prior to assignment. First, let's look at a few code snippets to see how to pull the numbers out of a local date (no time zone offset).

// ISO 8601 date time as YYYY-MM-DDThh:mm:ss.SSS
var dateTimeArray:Array = inputDate.split('T')
var dateString:String = dateTimeArray[0]
var timeString:String = dateTimeArray[1]

// parse the date from YYYY-MM-DD (anchored at both ends)
var pattern:RegExp = /^(\d{4})-(\d{2})-(\d{2})$/
var result:Object = pattern.exec( dateString )
if (result == null)
    throw(new Error("invalid date format"))

var year:uint = new uint( result[1] )
var month:uint = new uint( result[2] )
var day:uint = new uint( result[3] )

if (!validateDate(year, month, day))
    throw(new Error("invalid date format"))

// Flex months are zero based, so decrement before creating the Date
var date:Date = new Date(year, month - 1, day, 0, 0, 0, 0)

// now parse the time from hh:mm:ss.SSS (the dot is escaped so it only matches a literal '.')
pattern = /^(\d{2}):(\d{2}):(\d{2})\.(\d{3})$/
result = pattern.exec( timeString )
if (result == null)
    throw(new Error("invalid time format"))

var hour:uint = new uint( result[1] )
var minute:uint = new uint( result[2] )
var second:uint = new uint( result[3] )
var milli:uint = new uint( result[4] )

if (!validateTime(hour, minute, second, milli))
    throw(new Error("invalid time format"))

date.setHours(hour, minute, second, milli)

Lenient Dates and Times: It appears that in the Flex world, March 32nd == April 1st, and April 2nd at 25 hours == April 3rd at 1am. Is this an April Fools' joke? No, it's just the lenient way that Flex handles date parameters. That's no problem until you start parsing; then you need to be strict.

Strict Dates: Let's say a vendor sends you an electronic invoice with a payment due date of 2009-May-32. Does he want his money on June 1st? Probably not. The best response is to reply that his date is invalid and he must submit a valid date.

That's what being strict means. Unfortunately, Flex does not have a way of controlling strict dates, so you need to do this manually.

Validation: Without validation constraints, a Flex date can range from 0100-01-01T00:00:00.000 to beyond the year 10,000, and years earlier than 100 are coerced into the 1900s. This is clearly not what we want. For parsing purposes there should be a range of minimum and maximum acceptable dates. For discussion purposes, I'll set the range to start at the recognized Gregorian start date of 1582-10-15T00:00:00.000 and end at an arbitrary future date of 2999-12-31T23:59:59.999.

Now that we have a date range, the years are easy: 1582..2999. Months for Flex follow the old C convention of zero indexing, so months range from 0..11, but we will use 1..12 for validation and then decrement the month prior to creating the Flex Date object.

Days are a bit trickier, but let's use 31 as the default, then 30 for Apr, Jun, Sep, Nov and 28/29 for Feb, after determining the leap year. Here is the standard calculation for leap year lifted from the white book:

var leap = ((year % 4 == 0) && ( !(year % 100 == 0) || (year % 400 == 0)))

Hours, minutes, seconds and milliseconds are bound by 0..23, 0..59, 0..59, and 0..999. This ignores time zone changes and the occasional leap-second, but otherwise works fine.
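The validateDate() and validateTime() helpers called in the parsing snippet are not shown in this post. As a rough sketch of the rules just described, here is one way they could look, written in Java rather than ActionScript. The class name and constants are assumptions, and for simplicity it checks only the year bounds, not the exact October 1582 start day.

```java
// Hypothetical Java mirror of the validateDate/validateTime helpers;
// the real Flex versions live in the demo source, not in this post.
final class StrictDateValidator {
    static final int MIN_YEAR = 1582;
    static final int MAX_YEAR = 2999;

    // Same leap year rule as the one-liner above.
    static boolean isLeapYear(int year) {
        return (year % 4 == 0) && (year % 100 != 0 || year % 400 == 0);
    }

    // month is 1..12 here; 31 is the default length.
    static int daysInMonth(int year, int month) {
        switch (month) {
            case 4: case 6: case 9: case 11:
                return 30;
            case 2:
                return isLeapYear(year) ? 29 : 28;
            default:
                return 31;
        }
    }

    static boolean validateDate(int year, int month, int day) {
        if (year < MIN_YEAR || year > MAX_YEAR) return false;
        if (month < 1 || month > 12) return false;
        return day >= 1 && day <= daysInMonth(year, month);
    }

    // Ignores leap seconds, as noted in the text.
    static boolean validateTime(int hour, int minute, int second, int milli) {
        return hour >= 0 && hour <= 23
            && minute >= 0 && minute <= 59
            && second >= 0 && second <= 59
            && milli >= 0 && milli <= 999;
    }
}
```

With these rules a day of 32 in May or a 25th hour is rejected outright instead of rolling over into the next month or day.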

The Final Code: I created a demo application that allows you to plug in and parse dates and run the unit tests. The demo and source code are available here. I'm still working on a parser for time-zoned dates, but this is a good start.

Saturday, February 28, 2009

JSR-310 javax.time Periods

The proposed JSR-310 date/time API comes with many representations of date and time including Instant, Duration, LocalDate, LocalTime, LocalDateTime, OffsetDate, OffsetTime, OffsetDateTime, ZonedDateTime, MonthDay, YearMonth and others. One of the more interesting classes is Period. It represents a quantity of time that is not anchored to any point on the time line; it is just a quantity. This entry will discuss periods, how they are parsed, and examples of use.

Parsing: Periods are parsed from strings using formats that conform to the ISO-8601 duration format PnYnMnDTnHnMn.nS. Variations of this format are parsed to create Period objects. Parsing is done in PeriodParser, a standalone helper class that is easily accessed through the static method Period.parse(). Here are a few examples:

assert Period.parse("PT0S") == Period.ZERO
assert Period.parse("P1Y") == Period.years(1)
assert Period.parse("P10Y8M22DT3M") == Period.period(10, 8, 22, 3)
assert Period.parse("PT1M") == Period.minutes(1)
assert Period.parse("P-4Y") == Period.years(-4)

As you can see, the parsing scheme is robust and flexible. The main thing to keep in mind is that the Period object is more of a value container as opposed to a process unit. So, 60 minutes won't directly translate to 1 hour, and 24 months does not directly equal 2 years. But this aside, there are many uses for the Period class.
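The "value container" point can be illustrated with the java.time API that later shipped in the JDK as the final form of JSR-310. That API differs from the 2009 draft used above: its Period holds only years, months and days, and time-based amounts moved to Duration. So treat the following as a sketch of the idea rather than of the draft classes. The class and method names are made up for the example.

```java
import java.time.Period;

// Sketch against the final java.time.Period, not the 2009 draft class.
final class PeriodParseSketch {

    // Compares the parsed field values as-is: 24 months is not 2 years.
    static boolean equalAsParsed(String a, String b) {
        return Period.parse(a).equals(Period.parse(b));
    }

    // normalized() folds months into years, so the two become equal.
    static boolean equalAfterNormalizing(String a, String b) {
        return Period.parse(a).normalized().equals(Period.parse(b).normalized());
    }
}
```

Here equalAsParsed("P24M", "P2Y") is false, while equalAfterNormalizing("P24M", "P2Y") is true, because only the explicit normalization step converts 24 months into 2 years.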

For example, let's say I want to periodically archive old temp files from a system. I could set the period to minutes, hours, days, whatever, then based on an arbitrary start time, begin the archive sweep. In the same moment, I could compute the next archive sweep by adding my pre-defined period object to the current time.

A more elaborate implementation would be to self-adjust the scheduled periods based on some criteria. Sticking with the archive sweep, let's say I set the initial period to 2 hours, or better yet, 120 minutes to give a finer resolution. Then, at the end of the sweep, I tally the archived files. The smaller the number, the longer the period, on a sliding scale, up to 240 minutes. A large number would decrease the period to zero minutes, or a continual sweep, so I want to make sure that this is the very worst case scenario.

The Period class makes this easy to implement and maintain. To keep it simple, let's just use a linear equation. The formula for the number of minutes between sweeps reduces to period in minutes = mx + b, where b = 240 minutes (our maximum interval), x = the number of files, and m = the slope. Let's say the highest anticipated archive rate is 2 files per minute, so after 240 minutes we would have 480 files to archive. If that is our worst case, then m = -0.5 (delta y divided by delta x, or -240 / 480). The resulting line falls from 240 minutes at 0 files to 0 minutes at 480 files.

Now each time I do an archive sweep, I tally the files and calculate the next sweep interval using:

maxMinutes = 240
slope = -0.5
periodInMinutes = slope * fileCount + maxMinutes

Or, as a single groovy closure:

def periodToNextSweep = { fileCount, slope = -0.5, maxMinutes = 240 ->
    // cast to int because slope * fileCount yields a decimal value
    Period.minutes( (int)(slope * fileCount + maxMinutes) )
}

You might think there is a danger in allowing the returned period to be a negative number of minutes. But in all practicality this is acceptable because the objective is to determine the next instant when a sweep should occur. If this time is in the past, simply do it now. Of course you would design the slope to target the worst case, so a negative time should seldom if ever occur. And the good part is that the parameters are easy to modify to fit changing environments.
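The "if it is in the past, do it now" rule can be sketched in Java with java.time.Duration and Instant standing in for the draft Period and clock types. The class and method names are hypothetical, and rounding to whole minutes is an assumption of this sketch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; Duration stands in for the draft Period class.
final class SweepIntervalSketch {
    static final double SLOPE = -0.5;     // minutes per archived file
    static final int MAX_MINUTES = 240;   // interval when nothing was archived

    // period in minutes = m * x + b; may be negative for large counts.
    static Duration periodToNextSweep(int fileCount) {
        long minutes = Math.round(SLOPE * fileCount + MAX_MINUTES);
        return Duration.ofMinutes(minutes);
    }

    // A computed time in the past simply means "sweep now".
    static Instant nextSweep(Instant now, int fileCount) {
        Instant next = now.plus(periodToNextSweep(fileCount));
        return next.isBefore(now) ? now : next;
    }
}
```

With 600 files the raw period is -60 minutes, and nextSweep returns the current instant rather than a time an hour in the past.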

Testing with a Fixed Time Source: My previous entries have demonstrated using TimeSource and Clock tied to the System clock. But, for these tests, I think a fixed time source would be more appropriate. The syntax is like this:

millis = 1234920035991L // 2009-02-17T17:20:35.991-08:00 Tuesday...
timeSource = TimeSource.fixed( Instant.millisInstant( millis ) )
clock = Clock.clockDefaultZone( timeSource )

So now when I create a date, time or date/time object from the clock, the time is always the same. Not very meaningful for real life, but great for testing.

If I create a class that uses clock, I can inject the TimeSource based on the system clock. And for testing, I can inject a fixed TimeSource, run tests, and not have to worry about the specific time, but simply base my tests on a static source. Here is the class:

class SweepController {
    def clock = Clock.systemDefaultZone()
    def slope = -0.5
    def maxMinutes = 240

    def periodToNextSweep = { fileCount ->
        int x = (int)(slope * fileCount + maxMinutes)
        x < 0 ? Period.ZERO : Period.minutes( x )
    }

    def nextSweepTime = { fileCount ->
        def period = periodToNextSweep( fileCount )

        clock.offsetDateTime() + period
    }
}

And here is the test script:

millis = 1235721600000L // 2009-02-27T00:00-08:00
fixed = TimeSource.fixed( Instant.millisInstant( millis ) )
clock = Clock.clockDefaultZone( fixed )

sweep = new SweepController( clock:clock )
println "now -> ${clock.offsetDateTime()}"
source = [
    [ 480, '2009-02-27T00:00-08:00' ],
    [ 0, '2009-02-27T04:00-08:00' ],
    [ 240, '2009-02-27T02:00-08:00' ],
    [ 120, '2009-02-27T03:00-08:00' ],
]
source.each { count, value ->
    println "count: ${count} -> ${sweep.periodToNextSweep( count )}, ${sweep.nextSweepTime( count )}"
    assert value == sweep.nextSweepTime( count ).toString()
}

When I run the script, here is what I get:

now -> 2009-02-27T00:00-08:00
count: 480 -> PT0S, 2009-02-27T00:00-08:00
count: 0 -> PT240M, 2009-02-27T04:00-08:00
count: 240 -> PT120M, 2009-02-27T02:00-08:00
count: 120 -> PT180M, 2009-02-27T03:00-08:00

The main advantage is that I can run this independent of the current date, but still use the clock object without changing anything inside the class.
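For comparison, here is roughly how the same injection pattern looks in plain Java against the java.time API that eventually shipped from JSR-310. In that API, Clock.fixed() plays the role of the fixed TimeSource, and Duration is used here in place of the draft Period. The class name SweepControllerSketch is made up, and the fixed -08:00 offset is an assumption chosen to match the test data above.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.OffsetDateTime;

// Hypothetical Java counterpart of the Groovy SweepController above.
final class SweepControllerSketch {
    private final Clock clock;            // injected: system clock in production, fixed in tests
    private final double slope = -0.5;
    private final int maxMinutes = 240;

    SweepControllerSketch(Clock clock) {
        this.clock = clock;
    }

    // Never returns a negative amount; worst case is "sweep now".
    Duration periodToNextSweep(int fileCount) {
        int minutes = (int) (slope * fileCount + maxMinutes);
        return minutes < 0 ? Duration.ZERO : Duration.ofMinutes(minutes);
    }

    // "Now" comes from the injected clock, so tests are repeatable.
    OffsetDateTime nextSweepTime(int fileCount) {
        return OffsetDateTime.now(clock).plus(periodToNextSweep(fileCount));
    }
}
```

A test would build the controller with Clock.fixed(Instant.ofEpochMilli(1235721600000L), ZoneOffset.ofHours(-8)), while production code would pass Clock.systemDefaultZone().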

Conclusion: This was a quick look at Period and PeriodParser and how they fit into JSR-310 from the Groovy coder's perspective. Next time we'll look closer at Date, Time and DateTime math capabilities and how they work with Groovy plus/minus operator overloading.

Sunday, February 22, 2009

Adjusting Date and Time with javax.time

Java's JSR-310 date and time API, co-led by Michael Nascimento Santos and Stephen Colebourne, is a natural spinoff from the venerable Joda-Time. The implementation has many advantages over java.util's Date and Calendar. Compared to java.util.Date, the API has many types of date and time including LocalDate, LocalTime, LocalDateTime, OffsetDate, OffsetTime, OffsetDateTime, and ZonedDateTime.

The new JSR-310 date, time, and date/time classes are immutable and thread safe, unlike java.util.Calendar, but at the same time they offer many basic math, adjusters, and matchers that enable date and time calculations missing in java.util.Date. They are created from a TimeSource that can be tied to the System clock, fixed, or offset in time. This makes the classes extremely test friendly.

This entry discusses how the basic date/time math works and how the date and time adjusters can be used to solve common problems. Let's look first at the basic math.

Look Ma, no Setters: The JSR-310 date/time classes are immutable and thread safe. To accomplish this, the API doesn't include any setXX methods. Days, Hours, Years, etc are manipulated through "plus", "minus", and "with" methods that return new objects of the same type. Here are a few examples, first with date then time:

// tomorrow and 5 years from now
clock = Clock.systemDefaultZone()
today = clock.today()
assert clock.tomorrow() == today.plusDays(1)
fiveYearsFromNow = today.plusYears( 5 )
assert today.plusMonths( 60 ).year == fiveYearsFromNow.year

// two hours ago
now = clock.timeToSecond()
twoHoursAgo = now.minusHours( 2 )
assert now.minusMinutes( 120 ) == twoHoursAgo

// use offset datetime to get today at noon
noon = clock.offsetDateTime().withTime( 12, 0, 0 )
assert noon.hourOfDay == 12
assert noon.minuteOfHour == 0
assert noon.secondOfMinute == 0
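
The same no-setters style survives in the java.time API that later shipped in the JDK. The method names there differ from the 2009 draft (for example withHour() rather than withTime()), so the following is only a sketch of the idea. The class and helper names are made up for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Sketch against the final java.time API, not the 2009 draft classes.
final class ImmutableDateMathSketch {

    // Each with* call returns a new object; the argument is never modified.
    static LocalDateTime atNoon(LocalDateTime dt) {
        return dt.withHour(12).withMinute(0).withSecond(0).withNano(0);
    }

    // 60 months and 5 years land on the same date.
    static boolean sixtyMonthsEqualsFiveYears(LocalDate start) {
        return start.plusMonths(60).equals(start.plusYears(5));
    }
}
```

After calling atNoon(dt), the original dt still reports its original hour, which is what makes these objects safe to share between threads.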

The Adjusters: Here is a tricky problem: how do you compute the date of the nth weekday of a month, for example the 3rd Friday or 4th Tuesday? Bay Area residents know that not being able to calculate these simple problems can cost real money (street sweeping days). So here is how JSR-310 handles this:

dt = clock.offsetDate()

thirdFridayAdjuster = DateAdjusters.dayOfWeekInMonth( 3, DayOfWeek.FRIDAY )
fourthTuesdayAdjuster = DateAdjusters.dayOfWeekInMonth( 4, DayOfWeek.TUESDAY )

thirdFriday = dt.with( thirdFridayAdjuster )
println "third friday -> ${thirdFriday}, ${thirdFriday.toDayOfWeek()}, ${thirdFridayAdjuster}"
assert DayOfWeek.FRIDAY == thirdFriday.toDayOfWeek()

fourthTuesday = dt.with( fourthTuesdayAdjuster )
println "fourth tuesday -> ${fourthTuesday}, ${fourthTuesday.toDayOfWeek()}, ${fourthTuesdayAdjuster}"
assert DayOfWeek.TUESDAY == fourthTuesday.toDayOfWeek()
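
In the java.time API that eventually shipped, the equivalent adjuster lives in java.time.temporal.TemporalAdjusters instead of the draft DateAdjusters class. A minimal sketch, with a made-up class and method name:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Sketch against the final java.time API; the 2009 draft used DateAdjusters.
final class NthWeekdaySketch {

    // Returns the nth given weekday in the month that contains anyDayInMonth.
    static LocalDate nthWeekday(LocalDate anyDayInMonth, int n, DayOfWeek dayOfWeek) {
        return anyDayInMonth.with(TemporalAdjusters.dayOfWeekInMonth(n, dayOfWeek));
    }
}
```

For February 2009, nthWeekday(LocalDate.of(2009, 2, 22), 3, DayOfWeek.FRIDAY) gives 2009-02-20, and the fourth Tuesday is 2009-02-24.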

These examples just scratch the surface of the many date/time manipulation methods in JSR-310's javax.time package. The next entry will discuss how JSR-310 matchers help determine if a date lands on a leap year, leap day, last day of the month, etc. and how to use this in Groovy.

Saturday, February 21, 2009

JSR-310 javax.time and Groovy Tests

The development and implementation of JSR-310 continues to move ahead at a good pace. Once it is delivered (hopefully by JDK 7), the Java community will have a solid replacement for the outdated and comparatively buggy Date and Calendar classes. In the meantime, I have created a series of Groovy tests that exercise the JSR-310 capabilities.

Formal unit tests for JSR-310 are written in TestNG and cover all the low-level details including TimeZone resolving, serialization, etc. My tests simply verify a subset of the current API and demonstrate use with Groovy. The tests cover the basic API, adjusters, matchers, periods, formatters and parsers. This post demonstrates the basic tests.

Basic API Tests: These tests take advantage of groovy's invokeMethod() to test methods that take zero or more parameters. They demonstrate typical use of TimeSource, Clock, LocalDate, and LocalTime classes. With these basics it's easy to extend to LocalDateTime, OffsetDate, OffsetTime, or ZonedDateTime. Here is the output trace generated from running BasicTests.groovy:

TimeSource to Date/Calendar conversion tests ---------------------------------------
Timesource.system().instant() -> {ts.instant()}
Clock Method Tests ----------------------------------------------------------------
clock = Clock.systemDefaultZone() -> TimeSourceClock[SystemTimeSource, America/Los_Angeles]
Clock methods...
method: clock.getSource() -> SystemTimeSource
method: clock.getZone() -> America/Los_Angeles
method: clock.today() -> 2009-02-21
method: clock.tomorrow() -> 2009-02-22
method: clock.yesterday() -> 2009-02-20
method: clock.dateTime() -> 2009-02-21T08:56:35.512
method: clock.dateTimeToMinute() -> 2009-02-21T08:56
method: clock.dateTimeToSecond() -> 2009-02-21T08:56:35
method: clock.time() -> 08:56:35.515
method: clock.timeToMinute() -> 08:56
method: clock.timeToSecond() -> 08:56:35
method: clock.offsetDate() -> 2009-02-21-08:00
method: clock.offsetTime() -> 08:56:35.517-08:00
method: clock.offsetTimeToMinute() -> 08:56-08:00
method: clock.offsetTimeToSecond() -> 08:56:35-08:00
method: clock.offsetDateTime() -> 2009-02-21T08:56:35.518-08:00
method: clock.zonedDateTime() -> 2009-02-21T08:56:35.521-08:00 America/Los_Angeles
method: clock.zonedDateTimeToMinute() -> 2009-02-21T08:56-08:00 America/Los_Angeles
method: clock.zonedDateTimeToSecond() -> 2009-02-21T08:56:35-08:00 America/Los_Angeles
method: clock.year() -> Year=2009
method: clock.yearMonth() -> 2009-02

OffsetDate methods from offsetDate = clock.offsetDate() -> 2009-02-21-08:00
method: offsetDate.atMidnight() -> 2009-02-21T00:00-08:00
method: offsetDate.getDayOfMonth() -> 21
method: offsetDate.getDayOfWeek() -> DayOfWeek=SATURDAY
method: offsetDate.getDayOfYear() -> 52
method: offsetDate.getMonthOfYear() -> MonthOfYear=FEBRUARY
method: offsetDate.getYear() -> 2009
method: offsetDate.toDayOfMonth() -> DayOfMonth=21
method: offsetDate.toDayOfWeek() -> DayOfWeek=SATURDAY
method: offsetDate.toDayOfYear() -> DayOfYear=52
method: offsetDate.toLocalDate() -> 2009-02-21
method: offsetDate.toMonthOfYear() -> MonthOfYear=FEBRUARY
method: offsetDate.toYear() -> Year=2009
method: offsetDate.getDate() -> 2009-02-21

LocalDate methods from localDate = LocalDate.date(2019, 03, 01) -> 2019-03-01
method: localDate.atMidnight() -> 2019-03-01T00:00
method: localDate.getDayOfMonth() -> 1
method: localDate.getDayOfWeek() -> DayOfWeek=FRIDAY
method: localDate.getDayOfYear() -> 60
method: localDate.getMonthOfYear() -> MonthOfYear=MARCH
method: localDate.getYear() -> 2019
method: localDate.toDayOfMonth() -> DayOfMonth=1
method: localDate.toDayOfWeek() -> DayOfWeek=FRIDAY
method: localDate.toDayOfYear() -> DayOfYear=60
method: localDate.toLocalDate() -> 2019-03-01
method: localDate.toMonthOfYear() -> MonthOfYear=MARCH
method: localDate.toYear() -> Year=2019
method: localDate.getMonthDay() -> --03-01
method: localDate.getYearMonth() -> 2019-03
method: localDate.isLeapYear() -> false
method: localDate.toEpochDays() -> 17956
method: localDate.toModifiedJulianDays() -> 58543
method: localDate.toDateTimeFields() -> {ISO.Year=2019, ISO.MonthOfYear=3, ISO.DayOfMonth=1}

LocalTime methods, from localTime = clock.time() -> 08:56:35.549
method: localTime.getHourOfDay() -> 8
method: localTime.getMinuteOfHour() -> 56
method: localTime.getSecondOfMinute() -> 35
method: localTime.toDateTimeFields() -> {ISO.HourOfDay=8, ISO.MinuteOfHour=56, ISO.SecondOfMinute=35, ISO.NanoOfSecond=549000000}
method: localTime.toHourOfDay() -> HourOfDay=8
method: localTime.toLocalTime() -> 08:56:35.549
method: localTime.toMinuteOfHour() -> MinuteOfHour=56
method: localTime.toSecondOfMinute() -> SecondOfMinute=35
method: localTime.getNanoOfSecond() -> 549000000
method: localTime.toNanoOfSecond() -> NanoOfSecond=549000000
method: localTime.toNanoOfDay() -> 32195549000000
End basic tests...
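
BasicTests.groovy itself is not shown here. As a rough idea of how a trace like this can be produced, the following Java sketch uses reflection to call every zero-argument get/is/to method on an object and record the result. It is an analogy to the Groovy invokeMethod() approach, not the actual test code; the class name and the accessor-prefix filter are assumptions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reflection-based tracer, not the actual BasicTests.groovy.
final class MethodTraceSketch {

    // Invokes each public, non-static, zero-argument accessor and maps name -> result.
    static Map<String, String> trace(Object target) {
        Map<String, String> results = new TreeMap<>();
        for (Method m : target.getClass().getMethods()) {
            String name = m.getName();
            boolean accessor = name.startsWith("get") || name.startsWith("is") || name.startsWith("to");
            if (!accessor || m.getParameterCount() != 0) continue;
            if (Modifier.isStatic(m.getModifiers())) continue;
            if (m.getDeclaringClass() == Object.class) continue; // skip getClass() and friends
            try {
                results.put(name, String.valueOf(m.invoke(target)));
            } catch (ReflectiveOperationException e) {
                results.put(name, "failed: " + e);
            }
        }
        return results;
    }
}
```

Calling trace(java.time.LocalDate.of(2019, 3, 1)) and printing each entry as "method: name() -> value" gives output in the same spirit as the listing above, using the final java.time method names.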

Close inspection reveals that some methods in LocalDate are missing in OffsetDate. This isn't a huge problem, because OffsetDate can easily create a LocalDate object. But, it would be nice to have isLeapYear(), toEpochDays(), toModifiedJulianDays() and other methods in both classes. (This may already be in the works)...

Here is a simple way to access some of the missing methods:

offsetDate = clock.offsetDate()
offsetDate.toLocalDate().toEpochDays()
offsetDate.toLocalDate().toModifiedJulianDays()
offsetDate.toLocalDate().isLeapYear()
// or, even groovier
offsetDate.toLocalDate().leapYear
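
The same delegation trick carries over to the java.time API that later shipped. That API has no OffsetDate, so this sketch uses OffsetDateTime, and the modified Julian day comes from java.time.temporal.JulianFields rather than a toModifiedJulianDays() method. The class and method names are made up for the example.

```java
import java.time.OffsetDateTime;
import java.time.temporal.JulianFields;

// Sketch against the final java.time API; OffsetDateTime stands in for the draft OffsetDate.
final class LocalDateDelegationSketch {

    static long epochDays(OffsetDateTime odt) {
        return odt.toLocalDate().toEpochDay();
    }

    static long modifiedJulianDays(OffsetDateTime odt) {
        return odt.toLocalDate().getLong(JulianFields.MODIFIED_JULIAN_DAY);
    }

    static boolean isLeapYear(OffsetDateTime odt) {
        return odt.toLocalDate().isLeapYear();
    }
}
```

For 2019-03-01 these return 17956, 58543 and false, matching the LocalDate values in the trace above.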

Conclusion: What these tests really do is demonstrate the basic functionality of the new javax.time classes. It's obvious that JSR-310 offers the horsepower missing from the current Java implementations. And it will be a very useful addition to the Groovy toolkit as well.

Future posts will demonstrate the Adjusters, Matchers, Periods, Formatters and Parsers. Once the tests are cleaned up a bit, I'll post them on line.