Gather, Aggregate, Decide, Act

Most software processes can be distilled down to four discrete steps:

  1. Gather data
  2. Aggregate to something manageable
  3. Examine the data and decide what actions must be taken
  4. Execute the operations

These steps, or phases, apply to both the smallest and the largest processes. A process is defined here as “a series of actions or steps taken in order to achieve a particular end”, which includes both small pure functions and large business workflows.

More often than not, people don’t reflect on these phases and just interleave all the steps. There are, however, good reasons to extract them into distinct phases, or at least to identify them and make use of their properties.

Before we go into each step, it might be useful to see how a typical function might be constructed and where the boundaries between the steps are.

  void sendVotingReminder(int userId, string whenCannotVote) {
    // userId and whenCannotVote are part of the gather phase, but we also need some more data
    var user = getUser(userId);

    // We might also need to calculate some data
    var name = user.firstName + " " + user.lastName;

    // Then we need to decide what to do
    string msg;
    // Assumes the user carries an age field
    if (user.age < 18) {
      // We use null as a "don't send anything" message
      msg = null;
    } else if (!user.canVote) {
      msg = "Hi " + name + " " + whenCannotVote;
    } else {
      msg = "Hi " + name + ". Remember to vote!";
    }

    // Now decisions have been made, we need to take action
    if (msg == null) {
      return;
    } else {
      sendMail(msg);
    }
  }

Note that the function might also have been written as

  void sendVotingReminder(int userId, string whenCannotVote) {
    var user = getUser(userId);

    if (user.age < 18) {
      return;
    } else {
      var name = user.firstName + " " + user.lastName;
      if (!user.canVote) {
        sendMail("Hi " + name + " " + whenCannotVote);
      } else {
        sendMail("Hi " + name + ". Remember to vote!");
      }
    }
  }

While it’s a lot terser, the different steps are interleaved, which makes it more difficult to extract just a portion of the function, to identify the business rules, and so on. As the process becomes more complex, this often results in code that is difficult to understand, and eventually in spaghetti code.

Now that we’ve looked at a concrete example, it’s time to describe the steps in more detail.

Gather/collect data: The process of getting all the data you need in order to make decisions on what to do. These values can come from function parameters, or they can be fetched from external sources such as databases and remote APIs. Often it will be a combination of data from several sources.

Aggregate/calculate data: When we have all the information we need, we often need to combine data from various sources, extract only some of it, correct it, calculate derived values and so on. This makes the logic of choosing what to do easier.

Decide: Once data is in a format that’s easy to process, we can look at it and decide what to do. These are the business rules: “if this, then that”.

Act/execute: Given that we have decided what to do, we need to actually perform the operation. This is typically writing to databases, sending emails, calling external APIs and so on.

The steps described above should come as no surprise, and most developers would probably go “well, duh!”. Many would probably also state that it’s not that simple in the real world, as we need to do error handling, performance optimization and so on, and I totally agree. This post is a reflection on the distinct phases in a process and their properties, to help us develop, refactor and test such processes.

We could describe the process with the following function:

  let makeProcess (gather : 'a -> 'b) (aggregate : 'b -> 'c) (decide : 'c -> 'd) (act : 'd -> 'e) : 'a -> 'e =
    gather >> aggregate >> decide >> act
val makeProcess :
  gather:('a -> 'b) ->
    aggregate:('b -> 'c) -> decide:('c -> 'd) -> act:('d -> 'e) -> ('a -> 'e)

While most type systems don’t allow us to describe many properties of these functions, we can discuss them in prose.

gather: This is often a mix of pure and impure. Things that are already fetched (like parameters) are pure, while things we need to fetch from other sources are impure. In general, we’ll say that this step is impure. On the other hand, it will never mutate any data, only read data. You should be able to call this function with all possible parameters, ignoring all results, and the world still looks exactly the same.

aggregate: This will take several values and combine them. Given the same data, it will always return the same result. The step is pure, and can for instance be memoized. This is why I like to think of gather and aggregate as distinct phases. Pure functions are easy to reason about and test, so the more you’re able to encode as pure functions, the merrier.

decide: This will only look at the aggregated data and never write any data, and is thus a pure function. This is also where most domain logic/business rules reside. Reducing this to a single pure step makes it trivially testable. Nothing has to be mocked, and the core of the domain becomes very understandable.

act: This will act on the decided operation and is definitely not pure. This is the only part of the process which writes data. It will only read data which is sent into it, and it will execute the effect.
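The remark that a pure aggregate can be memoized can be sketched as follows. This is only an illustration: memoize, countedAggregate and the name-building aggregate are hypothetical helpers, not part of the process described in this post, and the call counter exists only to show how often the wrapped function actually runs.

```fsharp
open System.Collections.Generic

// Wrap a function in a cache keyed by its argument.
// This is only safe when f is pure: same input, same output, no effects.
let memoize (f : 'a -> 'b) : 'a -> 'b =
    let cache = Dictionary<'a, 'b>()
    fun x ->
        match cache.TryGetValue x with
        | true, cached -> cached
        | false, _ ->
            let result = f x
            cache.[x] <- result
            result

// Hypothetical pure aggregate step: build a full name from its parts.
let aggregate (firstName : string, lastName : string) : string =
    firstName + " " + lastName

// Instrumented wrapper, used only to count how often aggregate really runs.
let mutable calls = 0
let countedAggregate (args : string * string) : string =
    calls <- calls + 1
    aggregate args

let cachedAggregate = memoize countedAggregate

let first = cachedAggregate ("Ada", "Lovelace")
let second = cachedAggregate ("Ada", "Lovelace")
```

The second call is answered from the cache, so the wrapped function runs once. Doing the same to gather or act would be a bug, since the first may see stale data and the second would silently skip its effect.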

To summarize:

  1. gather: queries the world, no effects on the world
  2. aggregate: pure, no contact with the world
  3. decide: pure, no contact with the world
  4. act: doesn’t query the world, only executes the decided effects and nothing more

Testing gather requires us to mock the sources it fetches data from. But it might be easier to test gather >> aggregate rather than gather alone, and that’s fine, as testing aggregate alone doesn’t always make sense. Similarly, testing act requires us to mock the sources which are mutated. Testing decide is “trivial” as it doesn’t read or write to the outside world. It only decides what should be done.
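One way to make act testable without a mocking library is to pass the effect in as a function. The sketch below assumes a simplified two-case Action type and a hypothetical recordMail stand-in that collects messages in memory. None of these names come from the example above.

```fsharp
// Simplified, hypothetical action type for this sketch.
type Action =
    | NoAction
    | SendMail of message : string

// act receives the effect as a parameter instead of calling a global sendMail.
let act (sendMail : string -> unit) (action : Action) : unit =
    match action with
    | NoAction -> ()
    | SendMail message -> sendMail message

// Test double: records what would have been sent instead of sending it.
let sent = System.Collections.Generic.List<string>()
let recordMail (message : string) : unit = sent.Add message

act recordMail NoAction
act recordMail (SendMail "Hi Ada. Remember to vote!")
```

After these two calls, sent contains exactly one message. In production the same act would be given the real mail function.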

Since aggregate and decide are pure functions, you might just have one function which does both, or you might not have any of them at all… A function which doesn’t do anything and just returns the value passed into it is called the identity function. We can use this to “skip” steps where we don’t need to look at or change the data.

You can run gather >> aggregate >> decide until hell freezes over, and you won’t have had any effect on the world! This is a really nice property.
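This property can be used for a “dry run” that reports what a process would do without doing it. The following is a minimal sketch with hypothetical stand-ins for all four steps. The in-memory gather takes the place of a real database query.

```fsharp
type Action =
    | DoNothing
    | SendMail of message : string

// Reads only. An in-memory lookup stands in for a real query.
let gather (userId : int) : string * int =
    if userId = 1 then ("Ada", 36) else ("Tim", 12)

// Pure: turn the raw age into the fact that decide cares about.
let aggregate (name : string, age : int) : string * bool =
    (name, age >= 18)

// Pure: the business rule.
let decide (name : string, isAdult : bool) : Action =
    if isAdult then SendMail (sprintf "Hi %s. Remember to vote!" name)
    else DoNothing

// Impure: the only step that touches the world.
let act (action : Action) : unit =
    match action with
    | DoNothing -> ()
    | SendMail message -> printfn "sending: %s" message

// Stops before act, so it can be called any number of times.
let plan = gather >> aggregate >> decide

// The full process is the plan followed by act.
let run = plan >> act

let planned = [1; 2] |> List.map plan
```

Here planned is just a list of Action values that can be logged, diffed or reviewed. Nothing is sent until run is called.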

Let’s look at some silly examples to show that makeProcess is able to describe regular functions.

  // We decide to add
  let myadd = makeProcess id id (+) id
  myadd 1 2
val myadd : (int -> int -> int)
val it : int = 3
  let call = makeProcess id id id
  let add a b = a + b
  call add 1 2
val call : ((int -> int -> int) -> int -> int -> int)
val add : a:int -> b:int -> int
val it : int = 3
  let mysum = makeProcess id id id (List.fold (+) 0)
  mysum [1 .. 3]
val mysum : (int list -> int)
val it : int = 6
  let const' x = makeProcess (fun _ -> x) id id id
  let const'' x = makeProcess id (fun _ -> x) id id
  let const''' x = makeProcess id id (fun _ -> x) id
  let const'''' x = makeProcess id id id (fun _ -> x)
val const' : x:'a -> ('b -> 'a)
val const'' : x:'a -> ('b -> 'a)
val const''' : x:'a -> ('b -> 'a)
val const'''' : x:'a -> ('b -> 'a)

Let’s look at how we could split these steps out of sendVotingReminder. But first we need to convert it to F#.

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId

    // aggregate
    let name = user.firstName + " " + user.lastName;

    // decide
    let msg =
      if (user.age < 18)
      then null
      else if (not user.canVote)
      then sprintf "Hi %s %s" name whenCannotVote
      else sprintf "Hi %s. Remember to vote!" name

    // act
    if (isNull msg)
    then ()
    else sendMail msg

It’s often useful to first encode what the possible decisions are

  type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

Remember that act shouldn’t query the outside world, so the only information it has available is what is in Action. We could, for instance, drop NoActionBecauseUserTooYoung by using an Option instead. Sometimes it makes sense to let aggregate return more information to decide, like the fact that a user is too young. But having a “no-op” case is often a very useful feature (like the NullObject pattern in OOP, the identity function or empty for Monoid), so we’ll leave it.
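For comparison, the Option alternative mentioned above might look like the sketch below. The User record with age and canVote fields is an assumption made for this illustration, and sendMail is passed in as a parameter to keep the block self-contained.

```fsharp
// Hypothetical minimal user record for this sketch.
type User = { firstName : string; lastName : string; age : int; canVote : bool }

// Without the no-op case, every Action is something to send.
type Action =
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

// None now plays the role of NoActionBecauseUserTooYoung.
let decide (user : User) (whenCannotVote : string) : Action option =
    let name = user.firstName + " " + user.lastName
    if user.age < 18 then
        None
    elif not user.canVote then
        Some (SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote))
    else
        Some (SendReminder (sprintf "Hi %s. Remember to vote!" name))

// act still needs a branch for "nothing to do", it is just spelled None.
let act (sendMail : string -> unit) (action : Action option) : unit =
    match action with
    | None -> ()
    | Some (SendCannotVoteMessage message)
    | Some (SendReminder message) -> sendMail message

let child = { firstName = "Tim"; lastName = "Doe"; age = 12; canVote = false }
let adult = { firstName = "Ada"; lastName = "Lovelace"; age = 36; canVote = true }
```

The trade-off is that None no longer says why nothing happens, which the named case does. The rest of this post keeps the three-case Action type.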

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    // gather
    let user = getUser userId

    // aggregate
    let name = user.firstName + " " + user.lastName;

    // decide
    let action =
      if (user.age < 18)
      then NoActionBecauseUserTooYoung
      else if (not user.canVote)
      then SendCannotVoteMessage (sprintf "Hi %s %s" name whenCannotVote)
      else SendReminder (sprintf "Hi %s. Remember to vote!" name)

    // act
    match action with
    | NoActionBecauseUserTooYoung ->
      ()
    | SendCannotVoteMessage message ->
      sendMail message
    | SendReminder message ->
      sendMail message

We can start by creating inner functions for the parts we wish to extract

  type Gathered =
      { user : User
        whenCannotVote : string
      }

  type Aggregated =
      { user : User
        whenCannotVote : string
        fullname : string
      }

  let sendVotingReminder (userId : int) (whenCannotVote : string) =
    let gather (userId : int) (whenCannotVote : string) : Gathered =
      { user = getUser userId; whenCannotVote = whenCannotVote }

    let aggregate (gathered : Gathered) : Aggregated =
      let name = gathered.user.firstName + " " + gathered.user.lastName
      { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }

    let decide (aggregated : Aggregated) : Action =
      if (aggregated.user.age < 18)
      then NoActionBecauseUserTooYoung
      else if (not aggregated.user.canVote)
      then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
      else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

    // act
    let act (action : Action) : unit =
      match action with
      | NoActionBecauseUserTooYoung ->
        ()
      | SendCannotVoteMessage message ->
        sendMail message
      | SendReminder message ->
        sendMail message

    gather userId whenCannotVote
    |> aggregate
    |> decide
    |> act

This is still the same function, and we can now reduce it to just its parts

  type Gathered =
      { user : User
        whenCannotVote : string
      }

  type Aggregated =
      { user : User
        whenCannotVote : string
        fullname : string
      }

  type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

  // gather takes a single tuple so that it composes with >> inside makeProcess
  let gather (userId : int, whenCannotVote : string) : Gathered =
    { user = getUser userId; whenCannotVote = whenCannotVote }

  let aggregate (gathered : Gathered) : Aggregated =
    let name = gathered.user.firstName + " " + gathered.user.lastName
    { user = gathered.user; whenCannotVote = gathered.whenCannotVote; fullname = name }

  let decide (aggregated : Aggregated) : Action =
    if (aggregated.user.age < 18)
    then NoActionBecauseUserTooYoung
    else if (not aggregated.user.canVote)
    then SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
    else SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

  let act (action : Action) : unit =
    match action with
    | NoActionBecauseUserTooYoung ->
      ()
    | SendCannotVoteMessage message ->
      sendMail message
    | SendReminder message ->
      sendMail message

  // The resulting process has the type int * string -> unit
  let sendVotingReminder = makeProcess gather aggregate decide act

Just by looking at the types, we can pretty much guess what’s going on. decide is easy to describe to business users and easy to test in isolation. In fact, each part can be tested in isolation as necessary.
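To make the claim about testing decide concrete, here is a minimal sketch that exercises each branch with hand-built data. The User record (including its age and canVote fields) and the sample helper are assumptions made for this illustration. The decide function follows the one above. No mocks are involved because decide never touches the outside world.

```fsharp
// Hypothetical user record. The post does not define User for this example.
type User = { firstName : string; lastName : string; age : int; canVote : bool }

type Aggregated =
    { user : User
      whenCannotVote : string
      fullname : string }

type Action =
    | NoActionBecauseUserTooYoung
    | SendCannotVoteMessage of message : string
    | SendReminder of message : string

let decide (aggregated : Aggregated) : Action =
    if aggregated.user.age < 18 then
        NoActionBecauseUserTooYoung
    elif not aggregated.user.canVote then
        SendCannotVoteMessage (sprintf "Hi %s %s" aggregated.fullname aggregated.whenCannotVote)
    else
        SendReminder (sprintf "Hi %s. Remember to vote!" aggregated.fullname)

// Build the input by hand: no database, no mail server.
let sample (age : int) (canVote : bool) : Aggregated =
    { user = { firstName = "Ada"; lastName = "Lovelace"; age = age; canVote = canVote }
      whenCannotVote = "you are not registered."
      fullname = "Ada Lovelace" }

let tooYoung = decide (sample 17 true)
let cannotVote = decide (sample 40 false)
let reminder = decide (sample 40 true)
```

Each result is a plain value that can be compared with the expected Action, so every business rule can be covered by a one-line check.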

We’ll look at a final example with just the end result.

  open System

  // This is some existing API our process is going to use.

  type User =
      { userId: int
        firstName: string
        lastName: string
      }

  type Profile =
      { address: string
      }

  type Post =
      { userId: int
        published : DateTime
      }

  let getUser (userId : int) : User =
      { userId    = userId
        firstName = sprintf "firstname %d" userId
        lastName  = sprintf "lastname %d" userId
      }

  let getProfile (userId : int) : Profile =
      { address = sprintf "address for %d" userId }

  let getPosts () : Post list =
      [
          { userId = 1
            published = DateTime.Today
          }
      ]


  // Now we're ready to build our process.
  //
  // The first step is to gather all the data needed.
  // Note that we will not do anything other than query for data.
  // No mutation, no side-effects

  let gather (userId : int) =
    let user = getUser userId
    let profile = getProfile userId
    let posts = getPosts ()
    (user, profile, posts)

  // After all data is gathered, we need to process it
  // At this stage it's useful to create a new structure to hold our information.
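  // Note that this is pure apart from reading DateTime.Today: given the same
  // arguments on the same day, it will always return the same result, and it will
  // never have any effects on the outside world. Passing the current date in from
  // gather would make it fully pure.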

  type TodayDigestInfo = { userId: int; fullname: string; address: string; numBlogsToday: int }
  let aggregate ((user, profile, blogs) : (User * Profile * Post list)) =
    { userId = user.userId
      fullname = sprintf "%s, %s" user.lastName user.firstName
      address = profile.address.ToUpper()
      numBlogsToday = blogs |> Seq.filter (fun b -> b.userId = user.userId && b.published.Date = DateTime.Today) |> Seq.length
    }


  // When we have our data, we're ready to make decisions about what to do
  //
  // Note again that making the decision, the important business logic, is pure.
  // As it's pure, it is easily testable. Note that all possible outcomes are typed
  // in the result of the function.

  type Action =
    | SendCongratulationCard of name : string * address: string * message : string
    | ShameUser of userId : int * why : string

  let dailyDigest (info : TodayDigestInfo) : Action =
    if info.numBlogsToday = 0
    then ShameUser (info.userId, "Booo. You didn't write any posts!")
    else SendCongratulationCard (info.fullname, info.address, (sprintf "You wrote %d posts" info.numBlogsToday))

  // Of course, pure functions don't actually "do" anything, so given our decisions, we need to modify
  // the world. Note that we don't query anything, everything we need is encoded in our actions.

  let executeAction (action : Action) =
    match action with
    | SendCongratulationCard (name, address, message) ->
        sprintf "UPS.sendCard %A %A %A" name address message
    | ShameUser (userId, why) ->
        sprintf "Shaming %A -- %A" userId why

  // And finally, we'll create our process. Our process will then have the type
  // userId: int -> actionResult: string
  let sendDailyDigest = makeProcess gather aggregate dailyDigest executeAction
type User =
  {userId: int;
   firstName: string;
   lastName: string;}
type Profile =
  {address: string;}
type Post =
  {userId: int;
   published: DateTime;}
val getUser : userId:int -> User
val getProfile : userId:int -> Profile
val getPosts : unit -> Post list
val gather : userId:int -> User * Profile * Post list
type TodayDigestInfo =
  {userId: int;
   fullname: string;
   address: string;
   numBlogsToday: int;}
val aggregate : User * Profile * Post list -> TodayDigestInfo
type Action =
  | SendCongratulationCard of name: string * address: string * message: string
  | ShameUser of userId: int * why: string
val dailyDigest : info:TodayDigestInfo -> Action
val executeAction : action:Action -> string
val sendDailyDigest : (int -> string)
  sendDailyDigest 1
val it : string =
  "UPS.sendCard "lastname 1, firstname 1" "ADDRESS FOR 1" "You wrote 1 posts""
  sendDailyDigest 2
val it : string = "Shaming 2 -- "Booo. You didn't write any posts!""

All this might look like complete overkill, and in many cases it is. But recognizing that processes, from the smallest + function, to the largest business processes, all share the same general steps with the same properties is powerful knowledge. It makes it easier to extract parts that can be reused by other processes, parts that should be tested more thoroughly, parts that should be parameterized and so on.

In many cases you only wish to extract a single part for some reason. The important thing to remember is that these are common boundaries that are often quite natural to extract, and that extracting them often pays off as the process becomes more complex. Just having these distinct blocks within a function can be beneficial, as it makes the code easier to reason about and reduces spaghetti code.

Date: 2019-11-21 Thu 00:00

Author: Simen Endsjø

Created: 2022-03-23 Wed 21:45