High Level FP Architecture
March 07, 2020
In a previous post I argued for preferring the functional paradigm when programming because pure functions are easier to understand and help produce more correct code within larger code bases with less mental overhead. What does an FP first application look like - an application that trys to use more pure functions? When not using methods on objects, all functionality, pure or impure, is actually built using only functions. In that situation, we could count all functions in the project and build a metric around whether they are pure or impure - 59 pure, 22 impure etc. We could also take a count of pure vs impure lines of code. Either way we’d have a rough idea about how successful we have been at making the codebase pure. So, how do we improve this or similar metrics?
Impure functions can call pure functions or call other impure functions. They remain impure regardless of any pure functions called.
def add(x, y): return x + y
def sub(x, y): return x - y
def foo(x):
a = add(x, x)
global multiplicand
return sub(a, x*multiplicand)
Here foo
is impure because it references global data. We could also make it impure by printing. The trivial add
and sub
functions are pure. It doesn’t matter how many pure functions foo
calls though, the moment it does something impure it is forever impure. Pure functions can only call pure functions otherwise they become impure.
Programs that have zero side-effects, that is, they only have pure functions are useless. State and side-effects are required for a computer to be useful. Even a program that is one giant pure function composed of other functions needs to use side-effects to read inputs to the program and write/display/communicate the results somehow. So, we know that both pure and impure functions are required and pure functions are their own islands that can only use other pure functions.
The function call flow of the program can be visualised as a graph. The program call stack is a reflection of that.
If foo
is called by main
(or __main__
in python) this small graph as the program advances is:
1) main
2) main -> foo
3) main -> foo -> add
4) main -> foo -> sub
5) main -> foo
6) main
At points (3) and (4) we are using pure functions, everything else is impure. The entry point to the program will always be impure. It is realistic to expect other impure functions to exist in main that communicates the results of an otherwise pure program:
- main
- main -> pure_calculation
- main -> log_results
- main
When using a web framework with API endpoints there’s likely a large number of impure functions behind your endpoint,
not just a single main
function. The impure context is larger.
main -> framework ... -> endpoint_entry ...
The endpoint_entry
maybe complicated but its implementation (ideally) is not your concern, it’s a tool to use. To
maximise purity we try to put as many pure functions as high up in the call stack as possible.
- main
- main -> pure_calculation_1 (-> pure_calculation_implementation_1...)
- main -> pure_calculation_2 (-> pure_calculation_implementation_2...)
- main -> pure_calculation_3 (-> pure_calculation_implementation_3...)
- main -> log_results
- main
is better than
- main
- main -> pure_calculation_1 (-> pure_calculation_implementation_1...)
- main -> impure_2 (-> pure_2)
- main -> impure_3 (-> pure_3)
- main -> log_results
- main
The hope is that pure_calculation_implementation_n
constitutes a lot of the program. In the second program we prefer
that impure_2
and impure_3
mostly limit themselves to work around communicating the results of the program, not the
core logic. The impure functions may also, like the toy foo
function, be providing some re-usable functionality that
packages pure functions in an impure context.
In a simple program the ideal structure will be:
- main
- main -> impure_read_inputs
- main -> pure_calculation
- main -> impure_log_results
- main
The pure_calculation
could be significant - single pass compilers (albeit likely toy) compilers could be implemented
as a pure function over input passed via stdin
to main
.
Common Web Service Endpoint
Even a small web service will likely involve some sort of read and or write to a data persistence layer (i.e. database), possibly many.
- endpoint_entry
- endpoint_entry -> pure_read_inputs
- endpoint_entry -> pure_validate_inputs
- endpoint_entry -> pure_build_input_domain_types
- endpoint_entry -> impure_database_read
- endpoint_entry -> pure_validate_database_read
- endpoint_entry -> pure_build_domain_types_from_database
- endpoint_entry -> pure_calculations
- endpoint_entry -> impure_database_write
- endpoint_entry -> pure_calculate_api_response_types
- endpoint_entry -> pure_serialise_api_response_types
- endpoint_entry (impure_return_serialised_response_to_framework)
pure_read_inputs
may involve impurity if the request is not all in memory yet, otherwise the inputs are just there
for you. The web framework may have validated some properties of the inputs for you, but it is likely that some
validation pure_validate_inputs
is required. This is often combined with making your actual domain types
pure_build_input_domain_types
.
So far no side effects are required, except maybe logging, which as I’ve argued before, you could treat as an exception
to the definition of purity. To do one or more calculation(s) we often need to read a database impure_database_read
.
For convenience you might chain pure_build_domain_types_from_database
+ pure_validate_database_read
and the actual
impure_database_read
. The utility function that chains these together itself is impure, but should be easy to test
and understand if it only chains them instead of doing any interesting logic, at worst bailing out of the chain early
if there is an error:
utility -> impure_database_read
utility -> pure_validate_database_read
utility -> pure_build_domain_types_from_database
We could keep this utility
composition function pure by making the impure_database_read
yield (literally yield
in
python) the intent to read from the database and writing an impure interpreter to execute the side effect(s). The
FP interpreter pattern (a.k.a Free Monad) is something I want to cover at a later date in the context of
injecting dependencies.
The impure_database_write
is much the same as the read except we are feeding data in and probably don’t have any data that comes
back - if following CQRS (command query v.s. responsibility segregation). After that pure_calculate_api_response_types
and pure_serialise_api_response_types
are more pure functions that separates the domain types from the api response
types. When finally given back to the web framework, by function return, the framework takes over the job of doing
all the impure network actions, handling threads and new requests.
Small Scale Design
At the small scale, e.g. a module of functions, I hesitate to use the word architecture. Patterns are also a higher abstraction, and yes, FP has patterns much like e.g. OO programming does. Many of the original Design Patterns were documented in an OO context and have easy solutions in FP such that they are not really talked about or described as patterns, just features of the language. Many of the original design patterns in dynamic languages such as Python are also mostly irrelevant. Except maybe as some form of agreed terminology. FYI the book “Head First Design Patterns” seems to be the preferred modern book for those OO design patterns whilst the Enterprise Patterns books by Martin Fowler have been popular for OO large scale design. Some things in imperative programming such as state mutation are easy, but unless you allow local mutation in your FP program and can keep the mutation local to a function, then you may need something like the State monad (an FP pattern).
The small scale design is more about techniques. One technique is immutable data - local data aswell as function inputs. To “change” the local data we need to be able to opt in to mutation, e.g. with the mutable
keyword in F# or
we need to use recursion. So, we don’t have a stack overflow when recursing, FP languages often support
tail call optimisation meaning the compiler can rewrite it as a loop if the last instruction in the function is to
recursive call the same (or possibly different - mutual recursion) function. Functions are first class values, so
instead of writing the same recursive boilerplate code logic is re-used via higher order functions (e.g. map
,
filter
). As data is immutable it’s often accessible within a module, unless the internal and external representation
of the type is different - but that’s more about the public vs internal API. Efficient “changes” to big immutable data
introduces the need for persistent datastructures. As the
data is immutable we learn to pipeline data much like |
in a posix shell. Functions are building blocks so we
compose them, curry them or partially apply them to make new functions.
Conclusion
This FP architecture might be described as imperative shell and functional core. The pure functions are composed inside an impure context. Others have called the architecture an impure-pure(-impure) sandwich, or depending on how the side effecting imperative parts are implemented, maybe an OO-FP-OO sandwich. There is always an impure context, however it can often be reduced.
All the unfamiliar FP techniques exist to maximise the functional core and correspondingly reduce the impure context
to the bare minimum. Pure functions reduce the burden on the programmer to track state and mutation. The challenge
is that, apart from languages like Haskell, your language probably has little ability to help identify what functions
are pure. You could annotate them, e.g. @pure
in python, but whenever someone updates a function there is a burden
on the programmer to identify if a function is still pure.