Designing a graph-based payroll engine
The problem
Payroll looks simple until you meet a real enterprise ruleset. A single employee’s pay is a pile of interdependent line items:
```
base       = rank.base_salary * effort_ratio
overtime   = base * 1.5 * overtime_hours
income_tax = (base + overtime + bonuses) * tax_bracket_fn(...)
net        = base + overtime + bonuses - income_tax - social_insurance - ...
```
Each component can reference any other. HR changes formulas constantly. Worse: some companies have custom components — a “Tet bonus” formula that only exists for that client.
Hard-coding the calculation order meant every new rule was a code deploy. Lead time for a new payroll component was weeks.
The insight
These formulas form a directed acyclic graph. A component is a node. “Uses the value of X” is an edge from X to the current component. The order you must compute them in is a topological sort of that graph.
Once you see it that way, two things fall out:
- Configuration, not code. Store the formula text + dependencies in the database. Users change rules via UI.
- Parallelism for free. Nodes at the same topological level have no dependencies between them — compute them concurrently.
Kahn’s algorithm
Kahn’s is the classic O(V + E) topological sort. You keep a queue of nodes with no remaining incoming edges:
```go
type Node struct {
	ID        string
	Formula   string // "base * 1.5 * overtime_hours"
	DependsOn []string
}
```
```go
func topoLevels(nodes []*Node) [][]*Node {
	inDeg := make(map[string]int, len(nodes))
	children := make(map[string][]*Node, len(nodes))

	for _, n := range nodes {
		inDeg[n.ID] = len(n.DependsOn)
		for _, dep := range n.DependsOn {
			children[dep] = append(children[dep], n)
		}
	}

	var levels [][]*Node
	ready := make([]*Node, 0)
	for _, n := range nodes {
		if inDeg[n.ID] == 0 {
			ready = append(ready, n)
		}
	}

	for len(ready) > 0 {
		level := ready
		levels = append(levels, level)
		ready = nil
		for _, n := range level {
			for _, c := range children[n.ID] {
				inDeg[c.ID]--
				if inDeg[c.ID] == 0 {
					ready = append(ready, c)
				}
			}
		}
	}
	return levels
}
```

If the graph has a cycle (A depends on B, which depends on A), some nodes will never hit in-degree 0. Detect this by comparing the total count of processed nodes to the input length. We reject the ruleset at save time rather than at compute time.
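The save-time rejection can reuse the same counting trick. Here is a minimal sketch, using the same `Node` shape; `validateRuleset` is a name invented for illustration, not the engine's actual API:

```go
package main

import "fmt"

// Node mirrors the struct from the post.
type Node struct {
	ID        string
	Formula   string
	DependsOn []string
}

// validateRuleset runs Kahn's algorithm purely to count how many nodes can
// be ordered; if any node never reaches in-degree 0, the ruleset is cyclic.
func validateRuleset(nodes []*Node) error {
	inDeg := make(map[string]int, len(nodes))
	children := make(map[string][]string)
	for _, n := range nodes {
		inDeg[n.ID] = len(n.DependsOn)
		for _, dep := range n.DependsOn {
			children[dep] = append(children[dep], n.ID)
		}
	}
	var queue []string
	for id, d := range inDeg {
		if d == 0 {
			queue = append(queue, id)
		}
	}
	processed := 0
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		processed++
		for _, child := range children[id] {
			inDeg[child]--
			if inDeg[child] == 0 {
				queue = append(queue, child)
			}
		}
	}
	if processed != len(nodes) {
		return fmt.Errorf("cycle detected: only %d of %d components orderable", processed, len(nodes))
	}
	return nil
}

func main() {
	bad := []*Node{
		{ID: "a", DependsOn: []string{"b"}},
		{ID: "b", DependsOn: []string{"a"}}, // a <-> b is a cycle
	}
	fmt.Println(validateRuleset(bad) != nil) // true
}
```

Running this on every ruleset save keeps the compute path free of cycle handling entirely.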
Parallel calculation per level
The critical trick: within a level, nodes are independent. For 6,000 employees × ~30 components each, the naive sequential version took ~5 minutes. Level-parallel with sync.WaitGroup and a bounded goroutine pool collapsed it to ~30 seconds.
```go
func computeEmployee(emp *Employee, levels [][]*Node) (map[string]float64, error) {
	vals := make(map[string]float64)
	var mu sync.Mutex

	for _, level := range levels {
		// Snapshot the values computed so far: goroutines in this level read
		// the snapshot while writes go to vals under the mutex. Passing vals
		// directly would race (concurrent map read and write).
		snap := make(map[string]float64, len(vals))
		for k, v := range vals {
			snap[k] = v
		}

		var wg sync.WaitGroup
		errs := make(chan error, len(level))
		for _, node := range level {
			wg.Add(1)
			go func(n *Node) {
				defer wg.Done()
				v, err := evaluateFormula(n.Formula, emp, snap)
				if err != nil {
					errs <- fmt.Errorf("%s: %w", n.ID, err)
					return
				}
				mu.Lock()
				vals[n.ID] = v
				mu.Unlock()
			}(node)
		}
		wg.Wait()
		close(errs)
		for err := range errs {
			if err != nil {
				return nil, err
			}
		}
	}
	return vals, nil
}
```

In production we cap the pool: unlimited goroutines spawn thousands for a big company, thrashing the CPU and killing the formula interpreter cache.
What I’d do differently
- Cache compiled formulas. Parsing the expression per employee per component is 80% of the time. The DAG structure is shared across the whole company.
- Structural sharing for the vals map. For identical employees (same rank, same contract type), most components have identical values; don't recompute.
- Persistent levels. The level structure rarely changes. Recompute only on ruleset update.
The whole thing is ~800 lines of Go. Business users have added 40+ custom components in the last year. Zero deploys.