Building DBD::Oracle on macOS

As you may know, DBD::Oracle is one of the most challenging DB drivers to build and install. I recently switched to macOS Sierra and found myself needing to install DBD::Oracle within my Perlbrew-installed local Perl. I opted to use the latest Instant Client from Oracle (12.1) and the latest stable DBD::Oracle release (1.74) for my system.

Install Oracle Instant Client 12.1 Packages

The easiest way to get DBD::Oracle built is against the Instant Client. Download the following 3 packages (or the corresponding 3 for the client version you desire) to your system:

  • Basic (instantclient-basic) – the core client libraries
  • SQL*Plus (instantclient-sqlplus) – the command-line client, handy for testing connectivity
  • SDK (instantclient-sdk) – the headers needed to compile against the client

I opted to keep my install local within my /Users directory, but you can opt to install it to /Library/Oracle as well. Simply extract all 3 zips to the same path on your system.

Build DBD::Oracle 1.74 Module

On macOS Sierra I had to perform the following steps to ensure a successful build. You may need to adjust them based on the location of your Instant Client. The primary issue is that macOS Sierra does not play well with dynamic path linking via @rpath; the fix is to write the full library path directly into the binary with install_name_tool.

# setup the Oracle Instant Client environment
export ORACLE_HOME=/Library/Oracle/instantclient_12_1
export DYLD_LIBRARY_PATH=$ORACLE_HOME

# download and unpack DBD::Oracle
cd /tmp
wget http://search.cpan.org/CPAN/authors/id/P/PY/PYTHIAN/DBD-Oracle-1.74.tar.gz
tar -xzf DBD-Oracle-1.74.tar.gz
cd ./DBD-Oracle-1.74/

# build the module against 12.1
perl Makefile.PL
make

# fix for problem with dynamic linking on macOS Sierra
install_name_tool -change @rpath/libclntsh.dylib.12.1 \
    $ORACLE_HOME/libclntsh.dylib.12.1 \
    ./blib/arch/auto/DBD/Oracle/Oracle.bundle

# complete the install
make install
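
To confirm the build linked correctly, a quick sanity check is to load the module and print its version (this assumes DYLD_LIBRARY_PATH is still set in your shell):

# sanity check: the module should load and report 1.74
perl -MDBD::Oracle -e 'print "DBD::Oracle $DBD::Oracle::VERSION\n"'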

Post-Install Tips

After installing, you may want to do a few things to make your experience easier.

  1. Add permanent paths to your .bash_profile
    • export ORACLE_HOME=/Library/Oracle/instantclient_12_1
    • export DYLD_LIBRARY_PATH=$ORACLE_HOME
    • export PATH=$ORACLE_HOME:$PATH
  2. Add a tnsnames.ora for easier configuration (a sample entry follows below)
    • cd $ORACLE_HOME
    • mkdir -p network/ADMIN && cd network/ADMIN
    • touch tnsnames.ora
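
As a minimal example, a tnsnames.ora entry looks like the following (the host and service name here are placeholders for your own environment):

MYDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )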

On Software Quality

There are many metrics by which quality can be measured, and we can take both quantitative and qualitative approaches when evaluating systems. As a software developer, I am particularly interested in how to effectively measure code quality. Quality is not something that can be retroactively applied; it must be designed into a system from the start. As software developers, we have a responsibility to ensure our code is written with quality in mind. Let’s explore some practical ways to improve code quality while maintaining velocity and reducing overhead work.

Write Readable Code

The key to writing high-quality software is ensuring that any code you write can be read by another software developer at a later date. Unlike traditional engineering, where finalized designs are rarely modified, software by its nature is intended to be rewritten, improved, extended, and adapted. There is little to no performance cost in formatting your code for readability and providing comments for future developers. Another sure-fire way to improve code quality is to build code reviews into your process. Constructive peer feedback allows developers to improve their own coding standards and habits, and, implemented properly, it strengthens teams by allowing open and honest conversation without fear of repercussion.

On the technical side, there are several tools and practices that almost every team should use to maintain a baseline of code quality. The first is to enforce a coding standard across your project, and I don’t just mean spaces versus tabs: brace placement (K&R style and the like), naming conventions, and what work is allowed in constructors. There are many existing style guides, such as Google’s Style Guides, which can serve as a base for your own. Establishing a standard and an in-house reference also allows some of this work to be automated via critic and linting tools, which can automatically format code and flag violated rules. Beyond code style, there are best practices for each language and each development style. In object-oriented programming, for example, regardless of language, concepts like classes with a single concern and well-defined subsystem architectures are solid foundations on which to build.

From personal experience, I’ve learned over my career that privacy and simplicity may require additional thought now, but they definitely pay off later. First, ensure all loops (except main application loops) are properly bounded; there is almost no case for unbounded while loops or goto statements, and simply by bounding your loops, entire classes of errors can be eliminated. Second, complex methods should be broken down into smaller private methods. I don’t stick to the one-page rule of thumb, but I do recommend ensuring that each function is responsible for only one piece of logic; complex methods that change behaviour based on input should likely be separate methods, with the caller responsible for making that distinction. Finally, ensure that any variables and methods that don’t need to be public are explicitly marked private or protected: it is easier to grant access to data than it is to revoke it. Verifying and tracing a system is much simpler when classes have a single concern and unnecessary variables and methods are private.
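
Here is a minimal Perl sketch of these ideas (the fetch interface is hypothetical): a bounded retry loop, and a complex method split into a small internal helper. By Perl convention, a leading underscore marks a sub as private.

package Report;

use strict;
use warnings;

my $MAX_ATTEMPTS = 5;    # bound the loop up front

sub fetch_with_retry {
    my ($self, $source) = @_;
    for my $attempt (1 .. $MAX_ATTEMPTS) {    # bounded, never while (1)
        my $data = eval { $source->fetch() };
        return $self->_normalize($data) if defined $data;
    }
    die "fetch failed after $MAX_ATTEMPTS attempts\n";
}

# internal helper: one piece of logic, not part of the public API
sub _normalize {
    my ($self, $data) = @_;
    $data =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $data;
}

1;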

Keep Safety and Privacy In Mind

Developers must also consider the safety of their code, regardless of where it runs. Drivers, services, and applications are frequently compromised as a way to obtain access to restricted data. Modern systems are often written as online systems, exposing a rather large attack surface through API endpoints, microservices, and web services. Attackers frequently go after weak authentication (hard-coded credentials, default administrator accounts for various tools), SQL injection (parameters not properly escaped or bound), and out-of-date packages (outdated copies of WordPress, unpatched OpenSSL libraries). Within code, it’s worth ensuring that all external data is sanitized before being used, and that nothing has access to more information than expected. For example, when writing an Oracle SQL statement, it’s safer to use binding than direct string injection: write the query with placeholders, such as select * from foo where bar = ?, and supply the placeholder values at execute time. This ensures the data is properly encoded by your driver and prevents one of the most common attack vectors. There are subtler risks too, such as auto-generated API endpoints that unintentionally expose variables because they were marked public instead of private. It is also important to avoid logging sensitive data, as logs are often readable under more relaxed permissions and can be compromised; ensure things like session keys and passwords are never logged by your application.
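
A minimal sketch of placeholder binding with Perl’s DBI (the connection details, table, and input are illustrative):

use strict;
use warnings;
use DBI;

my $untrusted = $ARGV[0];    # e.g. data straight from a user

my $dbh = DBI->connect('dbi:Oracle:orcl', 'scott', 'tiger',
                       { RaiseError => 1 });

# placeholder binding: the driver encodes the value, preventing injection
my $sth = $dbh->prepare('select * from foo where bar = ?');
$sth->execute($untrusted);

while (my $row = $sth->fetchrow_hashref) {
    print "$row->{BAR}\n";
}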

Another paradigm of safety is ensuring that your classes are properly scoped. When other subsystems and services consume your code, they are encouraged to use anything marked as public or friendly. This can lead to unexpected behaviour if you unintentionally leave variables and methods public. When writing tight units of code, it is important to only mark methods you intend to be consumable as public. These include get/set methods, constructors, and action methods on your class. If you need to enable extensible code, consider offering interfaces which can be implemented, or base classes which can be extended. This ensures that your code retains control over key attributes and structure.

Cover Edge Cases

One key thing missed by many developers is edge cases. While the happy path through the application is frequently well covered, unexpected or improperly formatted input is often missed. When writing methods, particularly those that interface with other subsystems, it’s worth hardening them by considering all potential input. This can be accomplished via equivalence class partitioning, which divides input into classes of equivalent data; ideally you then write one test case per partition to cover the broad spectrum of input. For example, if a method accepts integers, we would want to test values at the minimum and maximum integer marks. For strings, we would want to test null, empty, single-character, and maximum-length strings to ensure full coverage. For common types this can likely be generalized into a pattern, since the partitions are common to the type; for complex input, such as objects, more advanced partitions would need to be devised. When edge cases are covered and tested properly, we can be confident that our code is robust and correct.
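
As a sketch, one test per partition in Perl with Test::More might look like the following (clamp_percent is a hypothetical function under test):

use strict;
use warnings;
use Test::More tests => 5;

# hypothetical function under test: clamp a value into the 0..100 range
sub clamp_percent {
    my ($n) = @_;
    return 0   if $n < 0;
    return 100 if $n > 100;
    return $n;
}

# one case per equivalence partition, plus the boundary values
is(clamp_percent(-5),  0,   'below-range partition');
is(clamp_percent(0),   0,   'lower boundary');
is(clamp_percent(42),  42,  'nominal partition');
is(clamp_percent(100), 100, 'upper boundary');
is(clamp_percent(150), 100, 'above-range partition');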

Exceptions are also frequently mishandled in code. Methods that can throw exceptions aren’t trapped at the appropriate level, or worse, the exceptions aren’t correctly propagated upwards. This can lead to unstable or incorrect application behaviour. When writing code, it is important to consider how exceptions should be handled; in some cases we may only want to handle a subset of errors. For example, a configuration class may try to read a configuration file from disk and get a java.io.FileNotFoundException, which we can handle gracefully by using application defaults rather than custom values from a file on disk. In another situation we may find a file on disk and try to parse it, only to get an oracle.xml.parser.v2.XMLParseException, which we may want to pass up to the GUI to let the user know their custom configuration file is corrupt. When writing exception classes, it’s worth being verbose so that other classes can decide whether an exception should be caught or passed upwards. Well-structured and well-handled exceptions make entire subsystems more robust and facilitate integration with other system components.
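
The same pattern in a hedged Perl sketch (load_defaults and parse_config are hypothetical helpers): recover from the error you expect, and propagate the one you cannot handle locally.

use strict;
use warnings;

sub load_config {
    my ($path) = @_;

    # recoverable: no custom file simply means we use the defaults
    open my $fh, '<', $path or return load_defaults();

    # not recoverable here: a corrupt file should be reported upwards
    my $config = eval { parse_config($fh) };
    die "corrupt configuration file '$path': $@" if $@;

    return $config;
}

sub load_defaults { return { retries => 3 } }    # illustrative defaults
sub parse_config  { ... }                        # stub for the real parser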

Targeted Documentation

This is a fairly controversial topic. Few people like writing detailed documentation, despite many developers wishing they had access to cleaner and clearer documentation. Some sources say that lines of documentation should match lines of code; others say the code should serve as its own documentation. My preferred method is to write concise documentation only as needed. If you have followed the previous guidelines for code quality, then documentation should serve only to clarify complex logic and document API methods within classes and libraries. Examples include a basic description for every public method and constructor, and comments on complex regular expressions. When writing, stick to unambiguous technical language and use consistent terminology throughout, especially when referring to custom components or business-related entities.

RESTful API Best Practices

Over the past few days I have read a number of articles outlining what their authors feel are best practices when designing RESTful APIs. I have put together some of the most common points that I feel directly impact good API design. This post is a work in progress and will evolve as I dive deeper into the world of good API design.

Nouns, not Verbs

A good API starts with its structure. The consumers of most APIs tend to be developers, so consistency and a clean interface go a long way. The first step in good API design is a good resource naming scheme. Too often people intermingle verbs with their resource names; a good API should be noun-based, with the verbs represented by HTTP methods.

Bad: GET /api/getProducts
Good: GET /api/products

API Versions

API versioning can be a point of contention. There are good and bad uses, and there are appropriate and inappropriate times to make use of versioning. I feel that a well-designed API should not require a version, and that it is beneficial to maintain a consistent API across versions. That said, there are many cases where versioning is necessary, such as major architectural changes, or paying customers who may not be able to keep their applications in lockstep with your own. If you are going to version your API, do so in the URL and not in the header: all versions should be explorable, which makes headers a non-ideal option. Minor revisions can be expressed through special headers, or by adding new fields while leaving existing fields unchanged.

URL: /api/v1/products/1

Plural or Singular

I have no strong feelings on plural versus singular naming; however, the communal wisdom dictates that all resource locators should be plural. Regardless of which format you choose, be consistent throughout the application.

Bad: /api/products/1, /api/item/2
Good: /api/products/1, /api/items/2

Nested Resources

Nested resources should only be created when there is a clear relationship between the two resources. Nesting can be used to package convenience methods such as commonly made queries, and for filtering or sub-selection. The important concept to note is that nested routing should always refine a resource.

Bad: /api/songs/1/albums/2
Good: /api/albums/2/songs/1

GET/HEAD/POST/PUT/DELETE

There is not much to say here: use the HTTP verbs whenever possible! They make your code easy to read and your API easy to consume. There are some caveats to consider, such as proxies that block requests other than GET or POST; for these, adopt a convention such as a method-override header on a POST (commonly X-HTTP-Method-Override) which your software can decode and route appropriately.

Action                         Method
Retrieve an existing resource  GET /api/products/1
Update an existing resource    PUT /api/products/1
Create a new resource          POST /api/products
Delete an existing resource    DELETE /api/products/1

Status Codes

Whenever dealing with resources, always make sure you return an appropriate status code that follows the standards. Most developers are used to returning HTTP 200 OK or HTTP 400 BAD REQUEST, but for little additional cost you can provide much more semantic meaning in your application.

201 CREATED – Whenever a new resource has been created. Include a URL to the new resource (typically in the Location header).
202 ACCEPTED – Let the user know their request has been accepted for processing.
204 NO CONTENT – Let the user know the request succeeded, but the server has nothing to say (e.g. when responding to a DELETE request).

Error Codes

Continuing the discussion about status codes: we also need to tell the user when things didn’t go so well. It is particularly important that you format your response consistently and correctly, and provide as much detail as possible. Oftentimes this is the only way developers can interact with your system and get meaningful information back. One possible error body is sketched after the list below.

400 BAD REQUEST – Malformed request payload such as invalid XML or JSON.
410 GONE – When a resource has been permanently removed, such as a deprecated endpoint.
415 UNSUPPORTED MEDIA TYPE – When a content type was either incorrectly provided or the server cannot handle that content type.
422 UNPROCESSABLE ENTITY – When the request fails server-side validation.
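
One possible shape for an error body (the field names here are illustrative, not a standard):

{
  "status": 422,
  "error": "Unprocessable Entity",
  "message": "Validation failed: 'price' must be a positive number."
}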

Payloads

In practical terms, payload type matters little. Almost every framework and language will support the most common options: JSON and XML. I personally recommend choosing JSON, as XML is unnecessarily verbose and has well-known parsing problems. If you are going to support multiple types, I recommend going the Rails route and appending the type to the URL, such as /api/products/1.json or /api/products/1.xml. This allows for browser explorability and simple visual inspection of the expected payload types.

Security

In this day and age, it is unacceptable to not be using SSL. The overhead cost of SSL is trivial, and the security it can provide is invaluable. Certificates are cheap, and allow for the secure transmission of tokens, authentication, and of course your data.

Night of the Living Dead – Tombstoning Your Code

As code bases grow in size, so does the amount of cruft accumulated from years of refactoring, improvement, or feature abandonment. Dead code can complicate troubleshooting, introduce potential security risks, and confuse the developers who continue to work in the code base. In an ideal world we would simply grep the code base for references to a piece of code and, if none come up, prune it. Unfortunately, in the real world this breaks down quickly, because people can do some tricky things that make a code path’s liveness non-obvious:

  • Dynamic code generation – generating path or dependency names automatically at run time (getters/setters, importing namespaces)
  • External dependencies – projects external to the code base that may depend on some legacy components

There is an interesting idea from David Schnepper of Box, who talks about placing “tombstones” in the code base. The core idea is that every method is instrumented with a simple call such as Tombstone(‘2014-10-31’); when a code path executes, its tombstone logs the hit, and the tombstone can then be removed because the path is proven live. Over time, only the tombstones in dead code remain, indicating that it is likely safe to remove that zombie code from the code base.
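
A minimal Perl sketch of the mechanism (the names are my own, not Schnepper’s): log the first hit per call site so that live paths identify themselves, leaving untouched tombstones to mark dead code.

package Tombstone;

use strict;
use warnings;

my %seen;    # call sites already reported during this run

sub mark {
    my ($date) = @_;
    my (undef, $file, $line) = caller();
    my $site = "$file:$line";
    return if $seen{$site}++;                # log each site only once
    warn "TOMBSTONE $date hit at $site\n";   # route to real logging in practice
}

1;

# usage inside a suspect method:
#   Tombstone::mark('2014-10-31');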

While this is a fantastic idea for deprecating dead code, I don’t think it is ideal for every production system. There are a number of things to consider when deprecating code in this manner:

  • What is the performance impact of adding a tombstone to my application?
  • How do you determine an acceptable tombstone purge date? A month? A year?
  • What risk does this introduce in third party integration or other dependent services?
  • How do you handle cross-project dependencies?

Applied correctly, though, I think tombstoning is an interesting idea, and one that could perhaps be automated with solid code instrumentation.