Renovating Your House

Of all the weird and wonderful analogies to building software that I’ve heard, the one that sticks with me is building a house. No analogy is perfect, and I’d love to see the Pub/Sub aisle at Home Depot, but it works well enough. The general contractor picks the right tools and frameworks for a solid foundation, skilled craftsman frame the necessary modules and services, some great trades wire and pipe everything together, and then a designer works to make it look and feel like home. At this point the projects are usually well scoped, and even if most projects feel agile, they’re still waterfall-ish in nature. The goal is still to have a house at the end of those iterations. What interests me is what comes after.

Technical debt is a concept that is often cited and often poorly understood. Many definitions try to oversimplify the meaning of technical debt, limiting to just a handful of causes. However the reality is technical debt accumulates even while doing nothing. Frameworks age and fall out of popularity, industry-standard techniques are replaced or refined, deficits in the original design are discovered, the list goes on. On top of general age and rot, poor communication and planning can leave systems tightly coupled, and not up to standards. Time pressure only adds to the list, including cutting corners, writing poor or limited documentation, and skipping testing. All of these in various capacities contribute to the accumulation of debt over time.

How do you combat this kind of entropic chaos? Some lesser experienced technical leaders might try to pay debt all at once, or even worse not at all. Some project leads call for extreme system decomposition into dozens of microservices. The idea is that each one can be independently rewritten over time, but this introduces a whole host of new engineering complexity. Some try to intersperse improvements as independent tasks, but then which debts get paid and which continue to acrue interest? The truth lies somewhere in the midst of all of these options.

Leaning back into the house analogy, I think there is some good concepts to borrow from the general contracting industry. There are two basic principles that I have seen good builders apply over the years: structure first, and bring things up to code. The first concept is easy – don’t invest in anything relating to finishing until the structure is sound. This means that foundational upgrades such as critical security issues, framework upgrades, and tooling changes should be addressed first. Otherwise any other debt or even improvement tasks that get completed are just begging to be redone. The second concept is driven from a regulatory perspective, but I think applies as well. When a contractor opens up a wall, they need to bring whatever is behind that wall up to code before the inspector signs off on the finished changes. As a result, the estimated cost of bringing things up to code is budgeted as part of the renovation itself. When dealing with a reasonably well built newer house, the renovation cost is likely going to be low. When renovating an ancient DIY house, the contractor is going to be pulling asbestos and lead out of the walls, so it gets planned for with a contingency.

No reasonable individual or contractor would spend thousands of dollars renovating a kitchen in a house with no roof, so why invest in a new jQuery-based theme when a project goal may be to migrate to React or Svelte? Similar to the concept of bringing things up to code, I feel the key to successful technical debt management is to be continuously tackling technical debt. Larger foundational items such as changes in frameworks, languages, and tools should be discussed as major projects, but upgrading and patching should part of a healthy development and maintenance lifecycle. When a service has some major rennovation going on, such as adding new features, the developer should take the time to bring things around that new feature up to code. Migrating from class components to functional components in React? Rarely will it make sense to do that migration all at once, but while a developer is already rewriting and redefining classes in a service? It makes sense to bring that up to code before shipping it off.

I’m not advocating for free-for-all rewrites, every developer for themselves. What I am saying is every feature, every major bug, should take the time to consider current best practices and team standards. If the team decides that classes are just fine and never decide to adopt functional components, that’s not debt, that’s a decision. But a developer going into a module finding a major defect like unsanitized inputs, or a straightforward performance improvement should be empowered to bring the code up to current standards, and document the debt that they paid while working on the original feature.

For me, a great feature of this is the code base is continually improving, no single developer feels burdened doing nothing but paying technical debt, and the cost is negligible. Chances are your developers are paying in some form for debt right now – either quietly making improvements, or being burdened with the need to work around debt, potentially incurring even more debt. When planning we typically ask for an estimate for the task, which often gets padded out quietly. Instead, explicitly build in a contingency for bringing things up to code. An additional advantage is over time, the effort for a task versus the debt paid during that task can be monitored. If the debt ratio is rising, it can be an early indicator that it may be time to look at structual or systemic issues.

In short, these house renovation analogies can fit the software lifecycle well. It offers developers a deeper level of ownership, and it offers an early warning system for uncontrolled debt growth. A software system, much like a house, needs regular and thorough maintenance on its components. Budgeting for these issues, and correcting them early, ensures that the code base remains in peak performance.

On Software Quality

There are many metrics by which quality can be measured. We can take quantitative as well as qualitative approaches when evaluating systems for quality. As a software developer, I am particularly interested in how to effectively measure code quality. Quality is something that cannot be retroactively applied or evaluated, rather it must be designed into a system. As software developers, we have a responsibility in ensuring our code is written with quality in mind. Let’s explore some practical ways to improve code quality while maintaining velocity and reducing overhead work.

Write Readable Code

The key to writing high quality software is to ensure that any code you write can be read by another software developer at a later date. Unlike traditional engineering where finalized designs are rarely modified, software by it’s nature is intended to be re-written, improved, extended, and adapted. There is little or no performance cost when formatting your code for readability and providing comments for future developers. Another sure-fire way to improve code quality is to ensure that code reviews are built into your process. Constructive peer feedback allows developers to improve their own coding standards and habits. Implemented properly, it helps strengthen teams by allowing for open and honest conversation without fear of repercussion.

Technically there are several tools and practices that should be used by almost every team to manage a baseline for code quality. The first to enforce a coding standard across your project, and I don’t just mean spaces versus tabs. Things like brace placement (K&R style and the like), naming conventions, and deciding what work should be allowed in constructors. There are many existing style guides such as Google’s Style Guides which can be used as a base for your own style guides. Establishing a standard and a reference in-house also allows some of this work to be automated via critic and linting tools which can automatically format code and point out rules which may have been violated. Beyond code style there are best practices to follow for each language, and for each development style. An example of this would be in object oriented programming, regardless of language or style, concepts like classes having a single concern and subsystem architectures are solid foundations on which to build.

When it comes to personal experience, I’ve learned over my career that privacy and simplicity may require additional thought now, but definitely pays of later. Concepts like ensuring all loops (except main application loops) are properly bounded. There is almost no case for using unbounded while loops or goto statements. Simply by bounding your application entire classes of errors can be eliminated. Second, complex methods should always be broken down into smaller private methods. I don’t stick to the one-page rule-of-thumb, but I do recommend ensuring that your function is only responsible for one piece of logic. Complex methods that change behaviour based on input should likely be different methods and the caller should be responsible for making that distinction. Ensure that any variables and methods that don’t need to be exposed as public are explicitly marked as private or protected. It is easier to grant access to data than it is to revoke it. Verifying and tracing a system is much simpler when classes have a single concern and marking unnecessary variables and methods as private.

Keep Safety and Privacy In Mind

Developers must also consider the safety of their code, regardless of where it is running. Frequently drivers, services, and applications are compromised as a way to obtain access to restricted data. Modern systems are frequently written as online systems, exposing a rather large attack surface through things such as API endpoints, micro services, and web services. Frequently attackers will go after these for weak authentication (hard-coded credentials, default administrator accounts for various tools), SQL injection (not properly escaping or binding parameters), and out-of-date packages (outdated copies of WordPress, unpatched OpenSSL libraries). Within code, it’s worth ensuring that all external data is sanitized before being used, and that we don’t have access to more information than expected. For example, when writing an Oracle SQL statement, it’s safer to use binding than direct string injection. When you write your query, you can use placeholder binding like this: select * from foo where bar = ?. When you go to execute, you would specify your placeholder variables. This would ensure that the data is properly encoded by your driver and prevent one of the most common attack vectors. Even more subtle things, like auto-generated API endpoints that unintentionally expose variables because they are marked public instead of private. It is also important to avoid logging sensitive data as logs can be compromised as they often have more relaxed permissions on who can read them. Ensure things like session keys and passwords are not logged by your application.

Another paradigm of safety is ensuring that your classes are properly scoped. When other subsystems and services consume your code, they are encouraged to use anything marked as public or friendly. This can lead to unexpected behaviour if you unintentionally leave variables and methods public. When writing tight units of code, it is important to only mark methods you intend to be consumable as public. These include get/set methods, constructors, and action methods on your class. If you need to enable extensible code, consider offering interfaces which can be implemented, or base classes which can be extended. This ensures that your code retains control over key attributes and structure.

Cover Edge Cases

One key thing missed by many developers when coding are edge cases. While the happy path through the application is frequently well covered, unexpected or improperly formatted input is frequently missed. When writing methods, particularly those that interface with other subsystems, it’s worth ensuring that they are appropriately hardened by considering all potential input. This is accomplished via equivalence class partitioning which allows input to be partitioned into classes of equivalent data. Ideally you would then write one test case for each partition to cover the broad spectrum of input. For example, if a method accepts integers we would want to test for values at the minimum and maximum integer value marks. For strings we would want to test null, zero, one, and and maximum length strings to ensure full coverage. For common objects this could likely be generalized to a pattern since the class partitions would be common to the type. Complex input, such as objects more advanced partitions would need to be devised. When edge cases are covered and tested properly, we can be confident that our code is robust and correct.

Exceptions are also frequently mishandled in code. Methods that can throw exceptions aren’t trapped at the appropriate level, or worse they aren’t even correctly propagated upwards. This can lead to unstable or incorrect application behaviour. When writing code it is important to consider how exceptions should be handled. In some cases we may only want to handle a subset of errors. For example a configuration class may try to read a configuration file from disk and get a java.io.FileNotFoundException which we can handle gracefully by using application defaults rather than custom values from a file on disk. In another situation we may find a file on disk and try to parse it only to get an oracle.xml.parser.v2.XMLParseException which we may want to pass up to the GUI to let the user know their custom configuration file is corrupt. When writing exception classes, it’s worth being verbose so that other classes can decide if an exception should be caught or passed upwards. Well structured and handled exceptions can make entire subsystems more robust and facilitate integration with other system components.

Targeted Documentation

This is a fairly controversial topic. Few people like writing detailed documentation despite many developers wishing they had access to cleaner and clearer documentation. Some sources say that the number of lines of documentation should match lines of code. Others say that the code should serve as documentation. My preferred method is to write concise documentation only as needed. If you have followed the previous guidelines for code quality then documentation should act only to clarify complex logic and document API methods within classes and libraries. Examples of this would include covering a basic description for every public method and constructor, and commenting complex regular expressions. When writing, try to stick to unambiguous technical language and use consistent terminology throughout, especially when referring custom components or business-related entities.