Introduction: The anatomy of research complexity

In the world of research, software emerges from the need for exploration. To test a new hypothesis or process a novel dataset, we prioritise flexibility and immediate feedback. In this environment, we naturally adopt tactical programming, a mindset focused on “getting it to work” to validate an idea quickly.

Complexity doesn’t arrive in one big explosion; it accumulates through hundreds of small, tactical decisions. When systems are built by domain experts who are brilliant at their specific craft but have limited exposure to system design, the software can quickly become a “cognitive trap.”

The result is a codebase characterised by:

  1. High cognitive load: Every small change requires understanding the entire, un-encapsulated mess.
  2. Poor encapsulation: Scattered logic creates “unknown dependencies” where a change in one place breaks another.
  3. Missing safety nets: Lacking unit tests or CI/CD pipelines makes refactoring a high-risk endeavour.

The LLM factor: Tool or tornado?

With Large Language Models (LLMs), the barrier to writing code has vanished. While LLMs can suggest sophisticated architectures, they often act as an accelerated “tactical tornado” if not guided by human intent.

LLMs are prone to the “local optimum” trap: they can generate a perfect, self-contained function in seconds, but without a broader architectural blueprint defining where the “brains” of the system should live, they will haphazardly scatter logic across the codebase, creating a patchwork of “shallow modules” that increase cognitive load rather than reduce it.

To counter this, we must move from simple code generation to Architectural Prompting. Before asking an LLM to write a single line of logic, we should first “feed” it our strategic blueprint (CLAUDE.md, skills, or similar), which represents our design philosophy for Deep Modules. By doing so, the AI ceases to be a chaotic force and instead becomes a skilled construction worker following a master architect’s plan. AI handles the how (implementation details), but the human must strictly govern the where and why.
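
What does such a blueprint look like in practice? Here is a hypothetical CLAUDE.md excerpt (the rules are examples of intent, not a prescribed format):

# CLAUDE.md (excerpt)
- All research logic lives in services/ as framework-free functions; models, views, and components stay thin.
- Prefer one deep entry point per pipeline over many exported helpers.
- Never scatter data transformations across views, serializers, or React components.
- Every new algorithm ships with a “golden result” regression test.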

Part 1: The mathematics of cognitive load

John Ousterhout’s core thesis in A Philosophy of Software Design is that complexity is the accumulation of cognitive load. We can visualise the total system complexity $C$ using this formula:

$$ C = \sum_{i=1}^{n} c_i \cdot w_i $$
  • $c_i$: The internal complexity (cognitive load) of component $i$.
  • $w_i$: The frequency with which a developer is forced to interact with that component.

This reveals a humble strategy: if we can’t simplify the core research logic ($c_i$), we must make the module Deep by hiding the complexity behind a simple interface to lower the interaction frequency ($w_i$).
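
A toy calculation shows the lever (the numbers are illustrative, not measured). Take a component with internal complexity $c_i = 8$. Exposed directly, every developer wades through it ($w_i = 0.9$); wrapped behind a simple interface, almost nobody does ($w_i = 0.1$):

$$ C_{\text{exposed}} = 8 \times 0.9 = 7.2 \qquad C_{\text{deep}} = 8 \times 0.1 = 0.8 $$

The internals are exactly as complex as before; the system simply stops charging everyone for them.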

This immediately reminded me of the “Fat Model” vs. “Service Layer” debate in Django. I’ve often felt that models get messy once the “science” part of the code gets deep, and Ousterhout’s ideas explain exactly why.

Part 2: Applying “Depth” to Django

The conflict: “Fat Models” vs. Services

Django has a long-standing tradition of “Fat Models, Skinny Views,” which means putting as much logic as possible into the model classes. For a standard web app, this is great. It keeps things tidy and centralised.

But in research software, your “logic” isn’t just basic validation; it might be a 500-line data-processing pipeline, a complex statistical model, or a custom text-analysis algorithm. Shoving that into a Django model turns it into a God Object that knows too much about both the database and the underlying research logic.

A Service Layer acts as a Strategic Boundary here, letting the core methodology evolve independently of the “web” part.
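
As a minimal sketch (the module layout, run_pipeline, and the stub logic are hypothetical stand-ins for a real pipeline): the service owns the methodology, and the model shrinks to a one-line bridge.

# services/analysis.py: the deep module, a simple interface over complex internals.
from dataclasses import dataclass

@dataclass
class AnalysisResult:
    score: float
    n_samples: int

def _clean(rows: list[dict]) -> list[float]:
    # Stand-in for multi-step filtering, normalisation, and outlier handling.
    return [float(r["value"]) for r in rows if r.get("value") is not None]

def run_pipeline(rows: list[dict]) -> AnalysisResult:
    """The single entry point callers need to know about."""
    cleaned = _clean(rows)
    score = sum(cleaned) / len(cleaned) if cleaned else 0.0
    return AnalysisResult(score=score, n_samples=len(cleaned))

# models.py: the model knows the database; the methodology lives elsewhere.
from django.db import models
from services.analysis import run_pipeline

class Experiment(models.Model):
    raw = models.JSONField()

    def analyse(self):
        return run_pipeline(self.raw)  # one-line bridge to the deep service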

When fat models are enough

For many projects, especially simple CRUD (Create, Read, Update, Delete) applications, a service layer is overkill.

If your logic is just saving a field or performing a basic validation, adding a service layer creates a shallow module, a thin wrapper that adds cognitive load without providing any real abstraction. In these cases, staying with “fat models” is the more pragmatic, humble choice.
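
For contrast, the CRUD case as a sketch (the Sample model is invented): the logic is one line, so a SampleService wrapper would add a hop, not an abstraction.

from django.db import models

class Sample(models.Model):
    label = models.CharField(max_length=100)
    archived = models.BooleanField(default=False)

    def archive(self) -> None:
        # One line of real logic: keep it on the model. A service wrapper
        # here would be exactly the shallow module described above.
        self.archived = True
        self.save(update_fields=["archived"])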

When to pivot: The “iceberg” test

When does a service layer become necessary? It depends on the depth of the logic. I use the “iceberg test”:

  • If the implementation is as simple as the interface, keep it in the model.
  • If the implementation is an “iceberg”, meaning there are 500 lines of complex research logic, external API calls, or multi-step data transformations hidden beneath a single intent, move it to a deep service.

The ultimate decoupling: Standalone packages

If the research logic is truly framework-independent, we can take this a step further: extract the service into a standalone Python package.

Moving the core algorithm to its own repository (for example, pip install my-research-core) is information hiding at its peak. The best research tools work this way. Libraries like scikit-learn or Haiku focus purely on the core logic, leaving “plumbing” tasks like data loading to the system that calls them.

The technical benefits are profound:

  • Pure Verification: In a Django environment, testing logic often requires heavy database configuration or mocking complex HttpRequest objects. A standalone package allows for pure unit testing: validating mathematical correctness at high speed with zero side effects (sketched after this list).
  • Reproducibility: A versioned package (v1.2.3) ensures that other researchers can reproduce your results without needing your entire web stack.
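
A sketch of both benefits (the package name my_research_core and its function are invented): the core stays framework-free, and its test needs neither a database nor a request object.

# my_research_core/metrics.py: no Django, no ORM, no HTTP. Just the science.
def normalised_score(values: list[float]) -> float:
    """Versioned, pip-installable core logic."""
    if not values:
        raise ValueError("need at least one value")
    lo, hi = min(values), max(values)
    return 0.0 if hi == lo else (sum(values) / len(values) - lo) / (hi - lo)

# tests/test_metrics.py: a pure unit test, fast and free of side effects.
from my_research_core.metrics import normalised_score

def test_mean_of_linear_ramp_is_midpoint():
    assert abs(normalised_score([1.0, 2.0, 3.0]) - 0.5) < 1e-9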

This isn’t just a service layer; it’s a durable module that transcends the framework entirely, solving the ultimate conflict between “code reuse” and “scientific reproducibility.”

Part 3: React and the “Deep” Hook

In research software, we build complex dashboards or data visualisation tools using React. The most common mistake here is building “shallow” components that are bloated with state management, API calls, and data transformation logic.

To apply Ousterhout’s philosophy here, your components should be a thin layer focused purely on rendering. The “brains” of your UI should be encapsulated in Custom Hooks, often following a Headless pattern.

A Headless module provides the logic and state management without imposing any specific UI. This is the ultimate form of a Deep Module in the frontend ecosystem. Think of libraries like TanStack Table or TanStack Query. They handle incredibly complex tasks, such as multi-column sorting, pagination, or asynchronous caching, but they don’t render a single HTML element.

// A Headless Hook: Complex logic, zero UI noise.
import { flexRender, getCoreRowModel, useReactTable } from '@tanstack/react-table';

function ResultsTable({ data, columns }) {
  const table = useReactTable({ data, columns, getCoreRowModel: getCoreRowModel() });
  // The component only cares about the final state, not the sorting math.
  return (
    <table>
      <thead>
        {table.getHeaderGroups().map(headerGroup => (
          <tr key={headerGroup.id}>
            {headerGroup.headers.map(header => (
              <th key={header.id}>{flexRender(header.column.columnDef.header, header.getContext())}</th>
            ))}
          </tr>
        ))}
      </thead>
    </table>
  );
}

Adopting a headless approach is really about stripping the logic away from the interface. When the “science” of how you filter or compute data sits in a hook that doesn’t care about the DOM, the UI code becomes significantly cleaner. It also means your core analysis can be tested properly without even needing a browser. This stops the component from turning into a “God Object”, one that’s trying to juggle CSS and data synchronisation at the same time.

Part 4: Infrastructure as a strategic blueprint

Complexity doesn’t stop at the code level. Many projects rely on manual setup procedures, a form of tactical programming that creates hidden dependencies and the “it works on my machine” syndrome.

The Declarative Approach: Docker and Terraform

If we want durable research, we must move from procedures (doing things manually) to declarations (stating what we want).

From a cognitive load perspective, manual configuration is high-frequency interaction ($w_i$). By adopting a declarative approach with tools like Docker and Terraform, we shift our cognitive load from the “process” to the “intent”:

  • Docker is a deep module for the environment. It hides the mess of OS dependencies and library versions behind a single image. A researcher just needs to docker run, and the “iceberg” of environment setup stays hidden below the waterline.
  • Terraform is a deep module for the infrastructure. It hides thousands of API calls behind a simple text file.

These aren’t just DevOps tools; they’re strategic blueprints that ensure the “experimental setup” is as reproducible as the code itself.

# One blueprint orchestrating a key-value store, a server, and DNS.
resource "cloudflare_workers_kv_namespace" "data" {
  title = "app-storage"
}

resource "cloudflare_worker_script" "app" {
  name    = "research-api"
  content = file("worker.js")
  kv_namespace_binding {
    name         = "DB"
    namespace_id = cloudflare_workers_kv_namespace.data.id
  }
}

resource "cloudflare_record" "web" {
  zone_id = var.zone_id
  name    = "api"
  value   = "research-api.workers.dev"
  type    = "CNAME"
  proxied = true
}

Part 5: Verification as a design tool (and safety net)

One of the biggest barriers to reducing complexity in research is the fear of breaking the science. When an algorithm is complex, we often treat it as a “black box” that we are too terrified to touch, even when we know the internals are a mess.

This is where testing transforms from a chore into a design tool. Without a safety net, you cannot have a truly Deep Module because you lack the confidence to refactor its messy internals.

In research software, this usually means implementing Regression Tests. You don’t necessarily need 100% code coverage; what you need is a set of “golden results”: known inputs whose outputs must remain consistent and correct.
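
With pytest, such a golden test can be as small as this sketch (it reuses the hypothetical run_pipeline service from Part 2; the golden numbers would come from a previously validated run):

# tests/test_regression.py: pin the findings, then refactor without fear.
from services.analysis import run_pipeline

GOLDEN_INPUT = [{"value": 1.0}, {"value": 2.0}, {"value": 3.0}]

def test_pipeline_matches_golden_result():
    result = run_pipeline(GOLDEN_INPUT)
    assert result.n_samples == 3
    assert abs(result.score - 2.0) < 1e-9  # the findings must stay consistent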

When you have an automated way to verify the output, the mental cost of interaction ($w_i$) drops significantly. You stop worrying about accidental breakages and start focusing on keeping the design clean. A system that is easy to verify is, by definition, a system that is easy to simplify.

Conclusion: From complexity to systems

Tactical programming is a necessary starting point for exploration, but it shouldn’t be the end state. For research to be durable, we need to know when to stop “hacking” and start designing.

Reducing complexity isn’t about reaching some abstract ideal of “clean code.” It’s about clarity. It’s about making sure that when you or a colleague open this repository six months from now, the science is visible, not buried under framework noise or manual setup steps.

By being intentional, using Deep Modules to hide the domain complexity and a Declarative Approach to hide the “plumbing”, we turn the software back into a tool for discovery, rather than a barrier to it.

Ultimately, good design is a gift to your future self. It ensures that your work doesn’t just end with a paper, but serves as a stable foundation for the next discovery.