I am a fan of design patterns. Patterns provide a common language and understanding of code-bases, and they help us solve problems in useful, understandable and maintainable ways.
Let's take the case I was working on a few months back. A co-worker and I were given a project to produce some reporting. The data being reported on came from three different data sources and needed to be normalized and scrubbed. A decision was made to create a data mart: data from the real-time production systems would be copied into it weekly so that managers could run large reports.
From discussions with managers during the discovery phase, one thing became clear: there was general disagreement on how to scrub the data. This data deals with time reporting, and sometimes records didn't make sense. For instance, workers reported 'starting' and 'ending' work 1 week apart. Clearly, they didn't work an entire week without rest or food. How, then, should this record be handled? Should the number be truncated? Should we assume the worker meant the SAME day? Should the record be thrown out? I know some of you out there are saying "Um, contact the worker and get him to fix it." In this case, that is not an option, and would likely be a post for a different site. I think you can imagine that managers could think of any number of ways to 'transform' the data before placing it into the data mart.
The Implementation
It is probably clear from the description above that a routine built from things like if/then statements is going to be error prone and difficult to maintain. First off, when the first change comes in to modify how a record is handled, the developer will have to find the correct if statement in a large chain of ifs. Second, the concerns are mixed together: every kind of error is detected and fixed in the same routine. In addition, if a NEW type of data error is discovered, it will require a new if statement. Also, if a new variant of an existing error is discovered, a duplicate if statement might be created by accident if the developer making the change doesn't realize that the new requirement is simply a variant of an existing error. Finally, given the amount of disagreement between the managers, it was clear that they were going to go back and forth on which transformations they wanted. With the if/then implementation, this means a second level of if statements to determine whether each transformation is needed, further obfuscating the intent of the code.
Design patterns to the rescue. After a short discussion, it was clear that what we wanted to use was the Chain of Responsibility (CoR). The CoR pattern defines a series of processing objects and the operations they can perform, along with a way to add new processing objects to the end of the chain.
Let me explain how the solution was constructed. Of course, I am not using the actual implementation from work, but a variation on it in order to demonstrate the pattern. First, we created the object that needs to be operated on; in our case, I'll call it a Time object. Then we created an abstract class called Transform that takes a Time object in its perform() method and the next Transform object in its constructor. Then we created Transform implementations for the logical transformations needed. Here are a few of the implementations we might have had:
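To make the problem concrete, here is a rough sketch of what the if/then version tends to look like. Everything in it is an assumption made for illustration: the IfChainScrubber class, the two boolean switches, and the specific fixes (swapping reversed times, capping an entry at 24 hours) are hypothetical and are not what the managers actually settled on.

```java
import java.util.Date;

// Minimal stand-in for a time record; the constructor is added for brevity.
class Time {
    private Date start;
    private Date end;
    Time(Date start, Date end) { this.start = start; this.end = end; }
    Date getStart() { return start; }
    Date getEnd() { return end; }
    void setStart(Date start) { this.start = start; }
    void setEnd(Date end) { this.end = end; }
}

class IfChainScrubber {
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;

    // Hypothetical switches, one per rule the managers may or may not want.
    boolean swapReversedTimes = true;
    boolean capMultiDayEntries = true;

    void scrub(Time time) {
        long span = time.getEnd().getTime() - time.getStart().getTime();
        if (span < 0) {
            // Second level of ifs: is this particular fix wanted at all?
            if (swapReversedTimes) {
                Date oldStart = time.getStart();
                time.setStart(time.getEnd());
                time.setEnd(oldStart);
            }
        } else if (span > ONE_DAY_MS) {
            if (capMultiDayEntries) {
                time.setEnd(new Date(time.getStart().getTime() + ONE_DAY_MS));
            }
        } else if (span == 0) {
            // Zero-length record: yet another branch, yet another switch...
        }
        // Every new error type, or variant of an old one, lands somewhere in this chain.
    }
}
```

Even with only three rules, detection, policy switches, and fixes are tangled into one method. Note also that a record that is both reversed and longer than a day only gets the first fix, which is the kind of interaction that is easy to miss in a long chain.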
- MultipleDayEntryTransformImpl
- ZeroTimeTransformImpl
- StartTimeAfterEndTimeImpl
- ExactlyFiveMinutesImpl
/*Here is the class on which the Transform classes will operate.
*/
import java.util.Date;
class Time {
    private Date start;
    private Date end;
    public Date getStart() { return start; }
    public Date getEnd() { return end; }
    public void setStart(Date start) { this.start = start; }
    public void setEnd(Date end) { this.end = end; }
}
/*Here is the abstract class that defines the interface and constructor
*/
abstract class Transform {
    /*The next link in the chain, or null if this is the last one
    */
    protected final Transform nextTransformation;
    public Transform(Transform nextTransformation) {
        this.nextTransformation = nextTransformation;
    }
    public abstract void perform(Time time);
}
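/*Here is a simple, sample implementation
*/
class ZeroTimeTransformImpl extends Transform {
    public ZeroTimeTransformImpl(Transform nextTransformation) {
        super(nextTransformation);
    }
    @Override
    public void perform(Time time) {
        if (time.getStart().equals(time.getEnd())) {
            //Do something because the record meets the criteria for this error
        }
        //Pass the record along to the next transformation, if there is one
        if (nextTransformation != null) {
            nextTransformation.perform(time);
        }
    }
}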
/*Here is a partial class that might use these...
*/
import java.util.ArrayList;
import java.util.List;
class Importer {
    public static void main(String[] args) {
        //create the transformers. this could be defined in
        //a DB, properties file, or made selectable through a GUI..
        //...any number of transform objects can be nested here; null ends the chain
        ZeroTimeTransformImpl zeroTime = new ZeroTimeTransformImpl(new ExactlyFiveMinutesImpl(null));
        //get Time objects, not implemented
        List<Time> times = new ArrayList<Time>();
        for (Time time : times) {
            //this will perform all the transformations in sequence
            zeroTime.perform(time);
        }
        //store the normalized time records...
    }
}
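For readers who want something they can compile and run, here is a compact, self-contained sketch of a two-link chain. It is a small variation on the code above, not our production code: the forwarding step is hoisted into the base class (via a hypothetical apply() method) so an implementation cannot forget to call the next link, Time gets a convenience constructor, and the fixes themselves (swap reversed times, cap an entry at 24 hours) are assumed policies chosen only to have something observable.

```java
import java.util.Date;

class Time {
    private Date start;
    private Date end;
    Time(Date start, Date end) { this.start = start; this.end = end; }
    Date getStart() { return start; }
    Date getEnd() { return end; }
    void setStart(Date start) { this.start = start; }
    void setEnd(Date end) { this.end = end; }
}

abstract class Transform {
    private final Transform nextTransformation;
    Transform(Transform nextTransformation) { this.nextTransformation = nextTransformation; }

    // Apply this link's fix, then always hand the record to the next link.
    final void perform(Time time) {
        apply(time);
        if (nextTransformation != null) {
            nextTransformation.perform(time);
        }
    }

    // Each implementation only detects and fixes its own kind of error.
    abstract void apply(Time time);
}

class StartTimeAfterEndTimeImpl extends Transform {
    StartTimeAfterEndTimeImpl(Transform next) { super(next); }
    @Override
    void apply(Time time) {
        if (time.getStart().after(time.getEnd())) {
            // Assumed policy: the worker entered the two values in the wrong order.
            Date oldStart = time.getStart();
            time.setStart(time.getEnd());
            time.setEnd(oldStart);
        }
    }
}

class MultipleDayEntryTransformImpl extends Transform {
    static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;
    MultipleDayEntryTransformImpl(Transform next) { super(next); }
    @Override
    void apply(Time time) {
        long span = time.getEnd().getTime() - time.getStart().getTime();
        if (span > ONE_DAY_MS) {
            // Assumed policy: truncate the entry to 24 hours.
            time.setEnd(new Date(time.getStart().getTime() + ONE_DAY_MS));
        }
    }
}

class ChainDemo {
    public static void main(String[] args) {
        Transform chain = new StartTimeAfterEndTimeImpl(new MultipleDayEntryTransformImpl(null));
        // A record that is both reversed and eight days long.
        Time time = new Time(new Date(8 * MultipleDayEntryTransformImpl.ONE_DAY_MS), new Date(0L));
        chain.perform(time);
        System.out.println(time.getStart() + " to " + time.getEnd());
    }
}
```

Because every link runs in order, the record in ChainDemo is first swapped and then capped, with neither class knowing about the other.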
As you can see from the code, the pattern has done a few things for us that are desirable when maintaining code:
- The types of transformations are declarative and clear
- The logic to determine whether a transformation is needed, and what that transformation is, is isolated from all other code in its own class, making it very clear to the developer what is going on
- Many new transformations can be added to the system without having to modify the main codebase at all. Write a new class, change the zeroTime instantiation, and you are done. (The creation of the Transform objects can also be abstracted, but that's a different pattern.)
- Business users can think up any number of transformations for the Time records, and the system should be able to handle it relatively easily.
- The work of implementing the transformations can easily be distributed among multiple developers without worrying about merges, since each transformation lives in its own class.
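Since the chain could be defined in a DB or properties file, here is one possible sketch of building it from a list of configured names, so that managers' changes of mind become configuration changes. The TransformChainBuilder class, its register() and build() methods, and the names "zeroTime" and "startAfterEnd" are all hypothetical; the swap in StartTimeAfterEndTimeImpl is an assumed policy, and the zero-time handling is left open as in the sample above.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class Time {
    private Date start;
    private Date end;
    Time(Date start, Date end) { this.start = start; this.end = end; }
    Date getStart() { return start; }
    Date getEnd() { return end; }
    void setStart(Date start) { this.start = start; }
    void setEnd(Date end) { this.end = end; }
}

abstract class Transform {
    protected final Transform nextTransformation;
    Transform(Transform nextTransformation) { this.nextTransformation = nextTransformation; }
    abstract void perform(Time time);
}

class ZeroTimeTransformImpl extends Transform {
    ZeroTimeTransformImpl(Transform next) { super(next); }
    @Override
    void perform(Time time) {
        if (time.getStart().equals(time.getEnd())) {
            // Policy deliberately left open in this sketch.
        }
        if (nextTransformation != null) nextTransformation.perform(time);
    }
}

class StartTimeAfterEndTimeImpl extends Transform {
    StartTimeAfterEndTimeImpl(Transform next) { super(next); }
    @Override
    void perform(Time time) {
        if (time.getStart().after(time.getEnd())) {
            // Assumed policy: swap the two values.
            Date oldStart = time.getStart();
            time.setStart(time.getEnd());
            time.setEnd(oldStart);
        }
        if (nextTransformation != null) nextTransformation.perform(time);
    }
}

class TransformChainBuilder {
    // Maps a configured name to a factory that wraps the rest of the chain.
    private final Map<String, Function<Transform, Transform>> registry = new HashMap<>();

    TransformChainBuilder register(String name, Function<Transform, Transform> factory) {
        registry.put(name, factory);
        return this;
    }

    // Builds back to front so the first name ends up as the head of the chain.
    Transform build(List<String> names) {
        Transform head = null;
        for (int i = names.size() - 1; i >= 0; i--) {
            Function<Transform, Transform> factory = registry.get(names.get(i));
            if (factory == null) {
                throw new IllegalArgumentException("Unknown transform: " + names.get(i));
            }
            head = factory.apply(head);
        }
        return head;
    }
}

class ConfiguredChainDemo {
    public static void main(String[] args) {
        // In a real system this list might come from a properties file or a DB table.
        List<String> configured = Arrays.asList("startAfterEnd", "zeroTime");
        Transform chain = new TransformChainBuilder()
                .register("zeroTime", ZeroTimeTransformImpl::new)
                .register("startAfterEnd", StartTimeAfterEndTimeImpl::new)
                .build(configured);
        Time time = new Time(new Date(5000L), new Date(1000L));
        chain.perform(time);
        System.out.println(time.getStart().getTime() + " to " + time.getEnd().getTime());
    }
}
```

With something like this in place, turning a transformation on or off, or reordering the chain, means editing the list of names rather than the importer.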