Gather, Aggregate, Decide, Act
Most software processes can be distilled down to four discrete steps:
- Gather data
- Aggregate to something manageable
- Examine the data and decide what actions must be taken
- Execute the operations
These steps, or phases, are true for both the smallest and largest processes. A process here is defined as “a series of actions or steps taken in order to achieve a particular end,” and this description holds true for both small pure functions and large business workflows.
More often than not, people don’t actually reflect on these issues and just interleave all the steps. There are, however, good reasons why you might want to extract these to distinct phases, or at least identify them and make use of their properties.
Before we go into the details of each step, we'll look at how a typical function might be constructed and identify where the boundaries between the steps are.
void sendVotingReminder(int userId, string whenCannotVote) {
    // userId and whenCannotVote are part of the gather phase, but we also need some more data
    var user = getUser(userId);

    // We might also need to calculate some data
    var name = user.firstName + " " + user.lastName;

    // Then we need to decide what to do
    // (age is assumed to be a field on the user)
    string msg;
    if (user.age < 18) {
        // We use null as a "don't send anything" message
        msg = null;
    } else if (!user.canVote) {
        msg = "Hi " + name + " " + whenCannotVote;
    } else {
        msg = "Hi " + name + ". Remember to vote!";
    }

    // Now decisions have been made, we need to take action
    if (msg == null) {
        return;
    } else {
        sendMail(msg);
    }
}
I wrote the function that way to make each step easily distinguishable, although people tend to shorten functions to avoid redundancy and would rather write something like the following.
void sendVotingReminder(int userId, string whenCannotVote) {
    var user = getUser(userId);
    if (user.age >= 18) {
        sendMail("Hi " + user.firstName + " " + user.lastName +
                 (user.canVote ? ". Remember to vote!" : " " + whenCannotVote));
    }
}
While it’s a lot terser, the different steps are interleaved, making it more difficult to extract just a portion of the process, identify the business rules, and so on. As the process becomes more complex, this can often result in code that is difficult to understand.
Now that we’ve looked at a concrete example, it’s time to describe the steps in more detail.
Gather/collect data: The process of retrieving all the data you need in order to decide what to do. These values can come from function parameters, or they can be fetched from external sources such as databases and remote APIs. For many processes, it can even be a combination of data from several sources.
Aggregate/calculate data: When we have all the data we need, we need to massage it to make it more manageable by combining datasets, filtering, converting, cleaning, calculating and so on. We do this to make the next step easier to read, write and reason about.
Decide: Once data is in a format that’s easy to process, we can look at it and decide what to do. These are the business rules, e.g. “if this then that”. This is the actual important part, while the other steps are just necessary cruft to make the decision possible and to put it into effect.
Act/execute: Given that we have decided what to do, we need to actually perform the operation. This typically involves writing to databases, sending emails, calling external APIs and so on.
Describing the above steps should come as no surprise, and most experienced developers will probably go “well, duh!”. Many will probably also state that it’s not that simple in the real world as we need to do error handling, performance optimization and so on, and I totally agree. The point of this post is to reflect on the distinct phases in a process and their properties to help us develop, refactor and test such processes.
We could describe any process with the following function:
let makeProcess
    (gather : 'a -> 'b)
    (aggregate : 'b -> 'c)
    (decide : 'c -> 'd)
    (act : 'd -> 'e)
    : 'a -> 'e =
    gather >> aggregate >> decide >> act
val makeProcess: gather: ('a -> 'b) -> aggregate: ('b -> 'c) -> decide: ('c -> 'd) -> act: ('d -> 'e) -> ('a -> 'e)
While most type systems don’t allow us to describe many properties of these functions, we can discuss them in prose.
gather
: Often a mix of pure and impure; things that are already fetched (like
parameters) are pure, while things we need to fetch from other sources are
impure. In general, we’ll say that this step is impure. On the other hand, it
will never mutate any data, only read data. You should be able to call this
function with all possible parameters, ignoring all results, and the world still
looks exactly the same.
aggregate
: Combines values to a more actionable format. Given the same data,
it will always return the same result. The step is pure and can, for instance,
be memoized. This is why I like to think of gather and aggregate as distinct
phases. Pure functions are easy to reason about and test, so the more you’re able
to encode as pure functions, the better.
decide
: Only looks at the aggregated data and never writes any data, and is
thus a pure function. This is also where most domain logic/business rules
reside. Reducing this to a single pure step makes it trivially testable. Nothing
has to be mocked, and the core of the domain becomes very understandable. As
this is the main part that is of interest to the outside, keeping it pure,
separate, tested and documented is great for communicating with users of the
system.
act
: Performs the decided operations and is definitely not pure. This is the
only part of the process which mutates data. It will only use data which is
added by the prior decision step, and it will execute the effect.
To summarize:
gather
: queries the world, no effects on the world
aggregate
: pure – no contact with the world
decide
: pure – no contact with the world
act
: doesn’t query the world, only executes decided effects, no other side-effects
Testing gather requires us to mock the sources it fetches data from. But it
might be easier to test gather >> aggregate rather than gather alone, and
that’s fine, since testing aggregate alone doesn’t always give much benefit.
Similarly, testing act requires us to mock the sources that are mutated.
Testing decide is “trivial” as it doesn’t read or write to the outside world.
Since aggregate and decide are pure functions, you might just have one
function which does both, or you might not have any of them at all… A function
which doesn’t do anything and just returns the value passed into it is called
the identity function. We can use this to “skip” steps where we don’t need to
look at or change the data.
You can run gather >> aggregate >> decide until hell freezes over, and you
won’t have had any effect on the world! This is a really nice property.
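To make that property concrete, here is a minimal sketch. Everything in it is made up for illustration: the hard-coded age table stands in for a database, the `sentMails` counter stands in for "the world", and `Decision` is a hypothetical type. Only act touches the counter, so composing the first three phases gives a dry run that can be repeated freely.

```fsharp
// Hypothetical "world": a counter of mails sent. Only act may change it.
let sentMails = ref 0

// Hypothetical set of decisions for this sketch.
type Decision =
    | Skip
    | Send of message : string

// gather: reads only (a hard-coded table stands in for a database).
let gather (userId : int) : string * int =
    let ages = Map.ofList [ (1, 17); (2, 42) ]
    (sprintf "user %d" userId, Map.find userId ages)

// aggregate and decide: pure, no contact with the world.
let aggregate (name : string, age : int) : string * bool =
    (name.ToUpper(), age >= 18)

let decide (name : string, isAdult : bool) : Decision =
    if isAdult then Send (sprintf "Hi %s" name) else Skip

// act: executes the decided effect and nothing else.
let act (decision : Decision) : unit =
    match decision with
    | Skip -> ()
    | Send _ -> sentMails.Value <- sentMails.Value + 1

// The first three phases composed: a dry run.
let dryRun = gather >> aggregate >> decide

for attempt in 1 .. 1000 do
    dryRun 2 |> ignore

printfn "%d" sentMails.Value // prints 0
```

Only when the decision is passed on to act, as in `act (dryRun 2)`, does the counter change.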
Let’s look at some silly examples to show that our makeProcess is able to
describe regular functions.
// We decide that + should be performed
let myadd = makeProcess id id (+) id
myadd 1 2
3
let perform = makeProcess id id id
let add a b = a + b
perform add 1 2
3
let mysum = makeProcess id id id (List.fold (+) 0)
mysum [1 .. 3]
6
let const' x = makeProcess (fun _ -> x) id id id
let const'' x = makeProcess id (fun _ -> x) id id
let const''' x = makeProcess id id (fun _ -> x) id
let const'''' x = makeProcess id id id (fun _ -> x)
val const': x: 'a -> ('b -> 'a)
val const'': x: 'a -> ('b -> 'a)
val const''': x: 'a -> ('b -> 'a)
val const'''': x: 'a -> ('b -> 'a)
Let’s look at how we could split these steps out of sendVotingReminder, but
first we need to convert it to F#.
let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId
    // aggregate
    let name = user.firstName + " " + user.lastName
    // decide (age is assumed to be a field on the user)
    let msg =
        if user.age < 18 then null
        else if not user.canVote then sprintf "Hi %s %s" name whenCannotVote
        else sprintf "Hi %s. Remember to vote!" name
    // act
    if isNull msg then () else sendMail msg
Encoding the possible decisions as a closed set is good both for documentation and robustness.
type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string
Remember that act shouldn’t query the outside world, so the only information
it has available is what is available in Action. We could drop the
NoActionBecauseUserTooYoung case by using an Option if we need 0 or 1 action, or
support 0 to many actions by returning a list of actions.
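As a rough sketch of the list variant, assuming a trimmed-down `ReminderAction` type and a decide that takes the already aggregated values as plain parameters (both are made up for this illustration, not the version used in the rest of this post): the empty list plays the role of the no-op case, and act simply runs whatever it is given. The `sendMail` parameter is also an assumption, so the sketch can run without a real mail system.

```fsharp
// Hypothetical variant of Action without a dedicated no-op case.
type ReminderAction =
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// decide returns zero or more actions; the empty list means "do nothing".
let decide (age : int) (canVote : bool) (name : string) (whenCannotVote : string) : ReminderAction list =
    if age < 18 then []
    elif not canVote then [ SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote) ]
    else [ SendReminder (sprintf "Hi %s. Remember to vote!" name) ]

// act executes every decided action, in order.
let act (sendMail : string -> unit) (actions : ReminderAction list) : unit =
    for action in actions do
        match action with
        | SendCannotVoteMessage message
        | SendReminder message -> sendMail message

// Usage: collect "sent" mails in memory instead of really sending them.
let sent = ResizeArray<string>()
decide 42 true "Ada Lovelace" "" |> act (fun message -> sent.Add message)
decide 12 false "Young User" "" |> act (fun message -> sent.Add message)
```

The trade-off is that the reason for doing nothing is no longer visible in the result, which is one argument for keeping an explicit no-op case.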
Sometimes it makes sense to let aggregate return more information to decide,
like the fact that a user is too young. But having a “no-op” case is often a
very useful feature (like the NullObject pattern in OOP, the identity function or
empty for Monoid), so we’ll leave it in.
let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId
    // aggregate
    let name = user.firstName + " " + user.lastName
    // decide
    let action =
        if user.age < 18 then
            NoActionBecauseUserTooYoung
        else if not user.canVote then
            SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote)
        else
            SendReminder (sprintf "Hi %s. Remember to vote!" name)
    // act
    match action with
    | NoActionBecauseUserTooYoung -> ()
    | SendCannotVoteMessage message -> sendMail message
    | SendReminder message -> sendMail message
We can start by creating inner functions for the parts we wish to extract:
type Gathered =
    { user : User
      whenCannotVote : string }

type Aggregated =
    { user : User
      whenCannotVote : string
      fullname : string }

let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let gather (userId : int) (whenCannotVote : string) : Gathered =
        // The first field is qualified because Aggregated has the same field names
        { Gathered.user = getUser userId; whenCannotVote = whenCannotVote }
    // aggregate
    let aggregate (gathered : Gathered) : Aggregated =
        let name = gathered.user.firstName + " " + gathered.user.lastName
        { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }
    // decide
    let decide (aggregated : Aggregated) : Action =
        if aggregated.user.age < 18 then
            NoActionBecauseUserTooYoung
        else if not aggregated.user.canVote then
            SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
        else
            SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)
    // act
    let act (action : Action) : unit =
        match action with
        | NoActionBecauseUserTooYoung -> ()
        | SendCannotVoteMessage message -> sendMail message
        | SendReminder message -> sendMail message
    gather userId whenCannotVote |> aggregate |> decide |> act
This is still the same function, and we can now reduce it to just its parts:
type Gathered =
    { user : User
      whenCannotVote : string }

type Aggregated =
    { user : User
      whenCannotVote : string
      fullname : string }

type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// gather takes a tuple so that it composes with >> inside makeProcess
let gather (userId : int, whenCannotVote : string) : Gathered =
    // The first field is qualified because Aggregated has the same field names
    { Gathered.user = getUser userId; whenCannotVote = whenCannotVote }

let aggregate (gathered : Gathered) : Aggregated =
    let name = gathered.user.firstName + " " + gathered.user.lastName
    { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }

let decide (aggregated : Aggregated) : Action =
    if aggregated.user.age < 18 then
        NoActionBecauseUserTooYoung
    else if not aggregated.user.canVote then
        SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
    else
        SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

let act (action : Action) : unit =
    match action with
    | NoActionBecauseUserTooYoung -> ()
    | SendCannotVoteMessage message -> sendMail message
    | SendReminder message -> sendMail message

let sendVotingReminder = makeProcess gather aggregate decide act
Just looking at the types, we can pretty much guess what’s going on. It’s pretty
easy to describe decide to business users, and pretty easy to test in
isolation. It’s actually pretty easy to test each part in isolation as
necessary if the impure steps accept functions for communication with their
dependencies.
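Here is a rough sketch of what that could look like. It is a simplified variant rather than the code above: the `User` fields, the `getUser` and `sendMail` parameters and the in-memory fakes are all assumptions made for the illustration, and aggregate is folded into decide to keep it short. The impure steps receive their dependencies as ordinary function arguments, so a test can pass in fakes, while decide needs nothing at all.

```fsharp
// Assumed shape of a user for this sketch.
type User = { firstName : string; lastName : string; age : int; canVote : bool }

type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// gather receives its data source as a function argument.
let gather (getUser : int -> User) (userId : int, whenCannotVote : string) : User * string =
    (getUser userId, whenCannotVote)

// decide stays pure and can be tested by calling it with plain values.
let decide (user : User, whenCannotVote : string) : Action =
    let name = user.firstName + " " + user.lastName
    if user.age < 18 then NoActionBecauseUserTooYoung
    elif not user.canVote then SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote)
    else SendReminder (sprintf "Hi %s. Remember to vote!" name)

// act receives the effect it should perform as a function argument.
let act (sendMail : string -> unit) (action : Action) : unit =
    match action with
    | NoActionBecauseUserTooYoung -> ()
    | SendCannotVoteMessage message
    | SendReminder message -> sendMail message

// Fakes for a test: a fixed user and an in-memory outbox.
let fakeGetUser (userId : int) : User =
    { firstName = "Ada"; lastName = "Lovelace"; age = 36; canVote = true }

let outbox = ResizeArray<string>()
let fakeSendMail (message : string) : unit = outbox.Add message

let sendVotingReminder = gather fakeGetUser >> decide >> act fakeSendMail
sendVotingReminder (1, "You are not registered.")
```

After the call, `outbox` holds the single message that would have been sent, and production code would wire in the real `getUser` and `sendMail` in the same way.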
We’ll look at a final example with just the end result.
open System
We create an API which returns dummy data for our example.
type User =
    { userId: int
      firstName: string
      lastName: string }

type Profile = { address: string }

type Post =
    { userId: int
      published : DateTime }

let getUser (userId : int) : User =
    { userId = userId
      firstName = sprintf "firstname %d" userId
      lastName = sprintf "lastname %d" userId }

let getProfile (userId : int) : Profile =
    { address = sprintf "address for %d" userId }

let getPosts () : Post list =
    [ { userId = 1
        published = DateTime.Today } ]
Now we’re ready to build our process, and the first step is to gather all the data needed.
let gather (userId : int) =
    let user = getUser userId
    let profile = getProfile userId
    let posts = getPosts ()
    (user, profile, posts)
After all data is gathered, we need to process it. It is often useful to create a new structure to hold our information. This is pure, so given the same arguments, it will always return the same result, and it will never have any effects on the outside world.
type TodayDigestInfo =
    { userId: int
      fullname: string
      address: string
      numBlogsToday: int }

let aggregate ((user, profile, blogs) : (User * Profile * Post list)) =
    { userId = user.userId
      fullname = sprintf "%s, %s" user.lastName user.firstName
      address = profile.address.ToUpper()
      numBlogsToday =
        blogs
        |> Seq.filter (fun b -> b.userId = user.userId && b.published.Date = DateTime.Today)
        |> Seq.length }
type TodayDigestInfo =
  { userId: int
    fullname: string
    address: string
    numBlogsToday: int }
val aggregate: User * Profile * Post list -> TodayDigestInfo
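One caveat worth noting: DateTime.Today reads the system clock, so strictly speaking the aggregate step above still has a small contact with the world. If that matters, for instance in tests that run around midnight, the current date can be treated as gathered data instead. The following is only a sketch of that idea. The `countPostsOn` helper and its `today` parameter are names made up for this illustration, and `Post` is repeated so the snippet stands on its own.

```fsharp
open System

type Post = { userId : int; published : DateTime }

// `today` is passed in (it would be read once in the gather phase),
// so the result depends only on the arguments.
let countPostsOn (today : DateTime) (userId : int) (posts : Post list) : int =
    posts
    |> List.filter (fun p -> p.userId = userId && p.published.Date = today.Date)
    |> List.length

// Fixed sample data makes the function testable without touching the clock.
let samplePosts =
    [ { userId = 1; published = DateTime(2024, 1, 15, 9, 30, 0) }
      { userId = 1; published = DateTime(2024, 1, 14, 23, 0, 0) }
      { userId = 2; published = DateTime(2024, 1, 15, 12, 0, 0) } ]

let postsToday = countPostsOn (DateTime(2024, 1, 15)) 1 samplePosts
printfn "%d" postsToday // prints 1
```

With this shape, gather would return DateTime.Today alongside the user, profile and posts, and aggregate would pass it along.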
When we have our data, we’re ready to make decisions about what to do. Making the decision, the important business logic, is pure, and all possible outcomes are typed in the result of the function.
type Action =
    | SendCongratulationCard of name : string * address: string * message : string
    | ShameUser of userId : int * why : string

let dailyDigest (info : TodayDigestInfo) : Action =
    if info.numBlogsToday = 0 then
        ShameUser (info.userId, "Booo. You didn't write any posts!")
    else
        SendCongratulationCard (info.fullname, info.address, (sprintf "You wrote %d posts" info.numBlogsToday))
Pure functions don’t actually “do” anything, so given our decisions, we need to modify the world. Everything we need to execute the decision should be stored in the data passed to our execute function from the decision.
let executeAction (action : Action) =
    match action with
    | SendCongratulationCard (name, address, message) -> sprintf "UPS.sendCard %A %A %A" name address message
    | ShameUser (userId, why) -> sprintf "Shaming %A -- %A" userId why
And finally, we’ll create our process. Our process will then have the type
userId: int -> actionResult: string
let sendDailyDigest = makeProcess gather aggregate dailyDigest executeAction
Let’s test our code
sendDailyDigest 1
val it: string = "UPS.sendCard "lastname 1, firstname 1" "ADDRESS FOR 1" "You wrote 1 posts""
sendDailyDigest 2
Shaming 2 -- "Booo. You didn't write any posts!"
All this might look like complete overkill, and in many cases it is. But
recognizing that processes, from the smallest + function to the largest
business processes, all share the same general steps with the same properties is
powerful knowledge. It makes it easier to extract parts that can be reused by
other processes, parts that should be tested more thoroughly and so on.
In many cases you only want to extract a single part for some reason, like the business logic. The important thing is to remember that these are common boundaries that are often quite natural to extract, and extracting them often yields benefits as processes become more complex. Just having these distinct blocks in functions can be beneficial, as the code becomes easier to reason about and less prone to turning into spaghetti.