On Software Quality

There are many metrics by which quality can be measured, and we can take both quantitative and qualitative approaches when evaluating systems. As a software developer, I am particularly interested in how to measure code quality effectively. Quality cannot be retroactively applied or evaluated; it must be designed into a system. As software developers, we have a responsibility to ensure our code is written with quality in mind. Let’s explore some practical ways to improve code quality while maintaining velocity and reducing overhead work.

Write Readable Code

The key to writing high-quality software is to ensure that any code you write can be read by another developer at a later date. Unlike traditional engineering, where finalized designs are rarely modified, software by its nature is intended to be rewritten, improved, extended, and adapted. There is little or no performance cost to formatting your code for readability and providing comments for future developers. Another sure-fire way to improve code quality is to build code reviews into your process. Constructive peer feedback allows developers to improve their own coding standards and habits. Implemented properly, it also strengthens teams by allowing open and honest conversation without fear of repercussion.

On the technical side, there are several tools and practices that almost every team should use to maintain a baseline of code quality. The first is to enforce a coding standard across your project, and I don’t just mean spaces versus tabs: brace placement (K&R style and the like), naming conventions, and what work should be allowed in constructors. There are many existing style guides, such as Google’s Style Guides, which can serve as a base for your own. Establishing a standard and an in-house reference also allows some of this work to be automated via critic and linting tools, which can automatically format code and flag rules that may have been violated. Beyond code style, there are best practices to follow for each language and each development style. In object-oriented programming, for example, regardless of language, concepts like single-concern classes and well-defined subsystem architectures are solid foundations on which to build.

From personal experience, I’ve learned over my career that privacy and simplicity may require additional thought now, but they definitely pay off later. First, ensure that all loops (except main application loops) are properly bounded. There is almost no case for unbounded while loops or goto statements; simply by bounding your loops, entire classes of errors can be eliminated. Second, complex methods should always be broken down into smaller private methods. I don’t stick to the one-page rule-of-thumb, but I do recommend ensuring that each function is responsible for only one piece of logic. Complex methods that change behaviour based on input should likely be separate methods, with the caller responsible for making that distinction. Finally, ensure that any variables and methods that don’t need to be exposed are explicitly marked as private or protected. It is easier to grant access to data than it is to revoke it. Verifying and tracing a system is much simpler when classes have a single concern and unnecessary variables and methods are marked private.
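Both habits can be sketched in a few lines. This is an illustrative Python example (the function names and the retry limit are hypothetical, not from the article): a bounded retry loop that fails fast instead of spinning forever, and a parser split into small single-purpose helpers.

```python
MAX_RETRIES = 3  # hypothetical bound; tune for your service

def fetch_with_retries(fetch, max_retries=MAX_RETRIES):
    """Bounded loop: after max_retries failures we raise instead of spinning."""
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_retries:
                raise  # give up; the caller decides what to do next
    # unreachable: every path above either returns or raises

# A "complex method" broken into private helpers, each owning one piece of logic.
def _strip_comments(line):
    return line.split("#", 1)[0].strip()

def _parse_pair(line):
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

def parse_config(text):
    """The public method only composes the helpers; each helper has one concern."""
    pairs = {}
    for line in text.splitlines():
        line = _strip_comments(line)
        if line:
            pairs.update([_parse_pair(line)])
    return pairs

print(parse_config("host = db1  # primary\nport = 1521"))
```

The leading underscores mark the helpers as internal by Python convention; in Java the same intent would be expressed with the private keyword.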

Keep Safety and Privacy In Mind

Developers must also consider the safety of their code, regardless of where it runs. Drivers, services, and applications are frequently compromised as a way to obtain access to restricted data. Modern systems are often written as online systems, exposing a rather large attack surface through API endpoints, microservices, and web services. Attackers will frequently go after these looking for weak authentication (hard-coded credentials, default administrator accounts for various tools), SQL injection (parameters that are not properly escaped or bound), and out-of-date packages (outdated copies of WordPress, unpatched OpenSSL libraries). Within code, it’s worth ensuring that all external data is sanitized before being used, and that we don’t expose more information than expected. For example, when writing an Oracle SQL statement, it’s safer to use binding than direct string injection. When you write your query, you can use placeholder binding like this: select * from foo where bar = ?. When you execute it, you specify your placeholder variables, ensuring that the data is properly encoded by your driver and preventing one of the most common attack vectors. Subtler issues exist too, such as auto-generated API endpoints that unintentionally expose variables because they are marked public instead of private. It is also important to avoid logging sensitive data: logs often have more relaxed read permissions and can be compromised. Ensure things like session keys and passwords are never logged by your application.
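Here is a small runnable sketch of placeholder binding. The article's example targets Oracle, but Python's sqlite3 module uses the same qmark ? placeholder style, so an in-memory database (with a made-up foo table) is enough to show the idea:

```python
import sqlite3

# In-memory database purely for demonstration; the table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (bar text)")
conn.execute("insert into foo values (?)", ("baz",))

# UNSAFE (shown as a comment only): concatenating user input into the SQL text
# would let the payload below rewrite the query.
#   conn.execute("select * from foo where bar = '" + user_input + "'")

# SAFE: the driver binds the value, so it is treated as data, never as SQL.
user_input = "baz' or '1'='1"  # a classic injection attempt
rows = conn.execute("select * from foo where bar = ?", (user_input,)).fetchall()
print(rows)  # → [] — the injection payload matched nothing
```

With binding, the malicious string simply fails to match any row; with string concatenation, the same payload would have returned every row in the table.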

Another aspect of safety is ensuring that your classes are properly scoped. When other subsystems and services consume your code, they are encouraged to use anything marked as public or friendly. This can lead to unexpected behaviour if you unintentionally leave variables and methods public. When writing tight units of code, it is important to mark only the methods you intend to be consumed as public: get/set methods, constructors, and the action methods on your class. If you need to enable extensible code, consider offering interfaces which can be implemented, or base classes which can be extended. This ensures that your code retains control over key attributes and structure.
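A minimal sketch of that last point, using Python (where privacy is by underscore convention rather than keyword — the class names here are hypothetical): the base class keeps control of the overall structure and exposes a single public method, while subclasses extend it only through a clearly named hook.

```python
class Report:
    """Base class others extend; the intended public surface is render() only."""

    def render(self):
        # The base class controls the structure and delegates one step
        # to subclasses, so extensions cannot break the overall layout.
        return f"{self._header()}\n{self.body()}"

    def _header(self):
        # Leading underscore: internal by convention, not part of the API.
        return "=== report ==="

    def body(self):
        # The one hook subclasses are expected to override.
        raise NotImplementedError

class SalesReport(Report):
    def body(self):
        return "sales: 42"

print(SalesReport().render())
```

In Java the same design would use an abstract base class with a private header method and a protected abstract body() hook.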

Cover Edge Cases

One key thing missed by many developers is edge cases. While the happy path through the application is frequently well covered, unexpected or improperly formatted input is frequently missed. When writing methods, particularly those that interface with other subsystems, it’s worth ensuring that they are appropriately hardened by considering all potential input. This is accomplished via equivalence class partitioning, which divides input into classes of equivalent data. Ideally you would then write one test case for each partition to cover the broad spectrum of input. For example, if a method accepts integers, we would want to test values at the minimum and maximum integer marks. For strings, we would want to test null, zero-length, one-character, and maximum-length strings to ensure full coverage. For common objects this can likely be generalized into a pattern, since the class partitions are common to the type. For complex input, such as objects, more advanced partitions would need to be devised. When edge cases are covered and tested properly, we can be confident that our code is robust and correct.
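As a concrete sketch, here is a hypothetical validator in Python with one assertion per equivalence class, plus the boundaries between classes (the function and its length limit are invented for illustration):

```python
MAX_LEN = 32  # hypothetical maximum length for this example

def valid_username(name):
    """Accept non-empty alphanumeric strings of at most MAX_LEN characters."""
    return bool(name) and name.isalnum() and len(name) <= MAX_LEN

# One check per partition, with extra checks at the partition boundaries.
assert valid_username(None) is False                  # null input
assert valid_username("") is False                    # zero-length string
assert valid_username("a") is True                    # minimum valid length
assert valid_username("a" * MAX_LEN) is True          # exactly at the maximum
assert valid_username("a" * (MAX_LEN + 1)) is False   # just past the maximum
assert valid_username("bad name!") is False           # invalid characters
```

Six short checks cover every partition; a thousand random "normal" strings would cover only one.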

Exceptions are also frequently mishandled in code. Methods that can throw exceptions aren’t trapped at the appropriate level, or worse, their exceptions aren’t correctly propagated upwards. This can lead to unstable or incorrect application behaviour. When writing code it is important to consider how exceptions should be handled. In some cases we may only want to handle a subset of errors. For example, a configuration class may try to read a configuration file from disk and get a java.io.FileNotFoundException, which we can handle gracefully by using application defaults rather than custom values from a file on disk. In another situation we may find a file on disk and try to parse it, only to get an oracle.xml.parser.v2.XMLParseException, which we may want to pass up to the GUI to let the user know their custom configuration file is corrupt. When writing exception classes, it’s worth being verbose so that other classes can decide whether an exception should be caught or passed upwards. Well-structured and well-handled exceptions can make entire subsystems more robust and facilitate integration with other system components.
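The configuration example above can be sketched in Python, substituting FileNotFoundError for java.io.FileNotFoundException and a JSON parse error for the XML one (the file format and default values are hypothetical):

```python
import json

DEFAULTS = {"theme": "light", "retries": 3}  # hypothetical application defaults

def load_config(path):
    """Catch only the errors we can recover from; propagate the rest."""
    try:
        with open(path) as f:
            text = f.read()
    except FileNotFoundError:
        # Recoverable: no custom file simply means we use the defaults.
        return dict(DEFAULTS)
    # Deliberately NOT caught here: a corrupt file raises json.JSONDecodeError,
    # which propagates up so the UI layer can tell the user their file is bad.
    return {**DEFAULTS, **json.loads(text)}

print(load_config("/no/such/config.json"))  # → the defaults
```

Note the asymmetry: the missing-file case is handled locally because this layer knows the right recovery, while the parse error is someone else's decision and is allowed to travel upwards.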

Targeted Documentation

This is a fairly controversial topic. Few people like writing detailed documentation, even though many developers wish they had access to cleaner and clearer documentation. Some sources say that the number of lines of documentation should match the number of lines of code. Others say that the code should serve as its own documentation. My preferred method is to write concise documentation only as needed. If you have followed the previous guidelines for code quality, then documentation should serve only to clarify complex logic and document the API methods of classes and libraries. Examples include a basic description for every public method and constructor, and comments on complex regular expressions. When writing, stick to unambiguous technical language and use consistent terminology throughout, especially when referring to custom components or business-related entities.
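As a small illustration of both habits — a documented public method and a commented regular expression — here is a hypothetical Python version parser (in Java the docstring would be a Javadoc block):

```python
import re

# Complex regexes deserve comments; re.VERBOSE lets them live in the pattern.
SEMVER = re.compile(r"""
    ^(\d+)        # major version
    \.(\d+)       # minor version
    \.(\d+)$      # patch version
""", re.VERBOSE)

def parse_version(text):
    """Return (major, minor, patch) for a plain "1.2.3" version string.

    Raises ValueError if the string is not a three-part numeric version.
    """
    m = SEMVER.match(text)
    if m is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in m.groups())

print(parse_version("1.2.3"))  # → (1, 2, 3)
```

The docstring documents the contract (inputs, outputs, failure mode) without narrating the implementation, which the code already states.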

How to Choose System Monitoring Tools

There are dozens of monitoring solutions available in the marketplace, and beyond the commercial tools there are many custom, smaller, or in-house tools. For example, I am the primary developer of a tool called Avail, which is developed at Pythian. Since we are primarily a managed services company, this tool needs to run on every imaginable platform. And since we specialize in databases and database applications, we need to be able to reliably and flexibly monitor a huge number of database products. When assessing your own monitoring needs, there are many things to consider.

What is your budget?

Unfortunately, monitoring is often the first item to be cut when budgets tighten. Some managers think that monitoring doesn’t contribute to the bottom line of the company, but I would argue that they’re wrong. Monitoring is a vital part of providing a service that is reliable and can be audited. Monitoring tools can help prevent security incidents through patch and compliance monitoring. They also assist in narrowing down service performance issues, verifying SLAs, and ensuring that your current infrastructure is meeting your needs. When 3 seconds of waiting can decrease customer satisfaction by about 16%, it’s important to ensure your service is reliable and responsive every time. I actively encourage customers to consider monitoring at development time – it’s significantly easier to instrument new configurations and code than to retrofit and find funding when you’re about to go live.

There are many commercial solutions available at many price points: SaaS solutions, on-premise solutions, or, depending on your infrastructure, integrated solutions. Many tools offer per-CPU or per-server licensing, but an increasing number of SaaS providers are offering different billing metrics, including by the hour, by the check, or by the instance. Beyond commercial offerings, there are also many free solutions available for different infrastructure setups. There are pros and cons to each infrastructure layout and each tool, which I will explore in depth in a future article.

What are you trying to monitor?

Traditionally, servers were owned by an organization and housed in its data center. This typically meant a fixed number of assets with total ownership. Beyond the load and health of your applications and services, you frequently had to monitor the health of the physical systems themselves: disk state, temperature, configuration, and load were all critical to ensuring a high quality of service. Virtualization abstracted away some of these issues, allowing load to be balanced and services to be shifted off troublesome hosts.

Many companies are now choosing cloud solutions for their applications, be it private clouds or public cloud utilities. This changes the nature of monitoring – no longer are you managing fixed assets; you are monitoring a service as it scales with demand. Virtual servers and application instances are spun up and down on demand, and you are no longer concerned with the health of a specific physical machine or even a virtual host. This dynamic nature means you need a monitoring solution that can be deployed automatically and that monitors the application and service metrics you care about. Some cloud providers, such as Amazon, have built out services like CloudWatch to monitor the status of your service, but these tools tend to scale in cost with your service and don’t provide the fine-grained integration that traditional monitoring tools have enjoyed. Services like NewRelic have addressed application performance and reliability monitoring, but come with high cost and high integration effort.

What are your monitoring metrics and goals?

Depending on the design and nature of your business, there are many things to consider when it comes to monitoring. There are three high-level metrics to consider when designing a monitoring solution for your systems: severity, risk, and preventability. Severity relates to the impact an outage would have on your business. Risk relates to how likely an issue is to occur. Preventability relates to whether an issue can be caught before it causes an outage. As a result, designing an effective monitoring solution can be extremely complex. When evaluating monitoring tools you need to consider what kinds of issues can be captured, and the cost in terms of time and software customization. Some things to consider when evaluating the need for, and complexity of, a monitoring solution:

  • Is your service online and responsive?
  • Are your SLAs being met in terms of backups, service availability, response times?
  • Are your security needs met in terms of patching, vulnerabilities, and configuration?
  • Are all key aspects of your service or system being monitored?
  • How are your thresholds configured and reviewed?
  • Do you have proactive monitoring in place to catch issues before they cause a failure?
  • Does the tool allow for auditing and historical analysis?
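The threshold question above is worth making concrete. Here is a minimal sketch, with entirely hypothetical metric names and limits, of how a monitoring check classifies a sampled value against reviewed warning and critical thresholds:

```python
# Hypothetical thresholds; real values come from review against your SLAs.
THRESHOLDS = {
    "response_ms":   {"warning": 1000, "critical": 3000},
    "disk_used_pct": {"warning": 80,   "critical": 95},
}

def evaluate(metric, value):
    """Classify a sampled value as ok / warning / critical."""
    levels = THRESHOLDS[metric]
    if value >= levels["critical"]:
        return "critical"
    if value >= levels["warning"]:
        return "warning"
    return "ok"

print(evaluate("response_ms", 3500))   # → critical
print(evaluate("disk_used_pct", 85))   # → warning
```

The point of the sketch is the review step, not the code: warning levels that are never revisited either page nobody or page everybody.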

What is the difference between agent and agentless monitoring?

Traditionally, monitoring a system meant installing an agent on it and configuring that agent to monitor a number of metrics, potentially with custom scripting. A popular alternative is agentless monitoring: a central service (SaaS or otherwise) connects using established protocols such as SSH on Unix and WMI on Windows, and runs queries and system commands against the system. The data is then parsed and managed remotely. The advantage of this approach is that you only need to configure permissions and connectivity to a system. While remote access may be ideal for configuration and deployment, the need to transfer data off-host may violate compliance with regulations such as PCI DSS or HIPAA. Other limitations may be platform dependent: you may not be able to access all application features and information on every OS platform, and a breach of the central monitoring service could result in a large data breach.
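The parse-the-command-output half of an agentless check looks roughly like this. The sample below is a captured df -P-style listing (the filesystems and numbers are made up); in practice the same parser would consume the text returned by something like ssh host df -P:

```python
# Captured sample of POSIX `df -P` output; values are invented for the demo.
SAMPLE = """\
Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         51474912 41179929  10294983      81% /
/dev/sdb1        103079215 92771293  10307922      91% /data
"""

def parse_df(text):
    """Return {mount point: capacity used %} from `df -P` output."""
    usage = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        usage[fields[5]] = int(fields[4].rstrip("%"))
    return usage

print(parse_df(SAMPLE))  # → {'/': 81, '/data': 91}
```

Everything else in an agentless design is transport and credentials; the monitoring logic itself is text parsing like this, which is exactly why the data-in-transit and central-credential concerns above matter so much.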

What kind of integration do you need with your applications?

You should also consider the needs of your organization. Application instrumentation and monitoring may be an essential part of your daily operations. Tools like NewRelic can integrate with your application, along with off-the-shelf software, to provide request- and resource-level monitoring. Tools like Nagios can be customized to integrate with your tools as well through custom scripting. Sometimes it is sufficient to monitor the system and high-level components such as the database and core services running on the system.