Parser Central: Microsoft .NET as a Security Component

During the past decade or so, a significant portion of the computer industry has set out on a quest for secure software. That this sizable force of smart people, with all their resources and market power, has not yet brought us a secure and safe computing experience should be an indication that this task is not something you can just turn around and do.

Securing the huge number of software stacks we are working with on a daily basis is a massive undertaking. It is somewhat similar to attempting to change the way we use natural resources and energy in order to prevent further global warming. Since you could be reading this in Utah, USA[1], let’s assume that global warming is an actual problem and is caused by humankind burning pretty much any fossil energy source we can find in order to produce energy.

Slowing down global warming is a tall order, let alone stopping or reversing it. We would need to gradually and globally reduce energy production methods that have carbon dioxide as a byproduct. This is not something you can change overnight. Since people depend on the energy your coal-fed power plant is delivering, you cannot simply turn it off and leave them in the cold. But every time you consider building a new power plant, you should be thinking about its carbon dioxide emissions and you certainly should consider other methods of energy production. The alternative, power generation using renewable sources, will at first appear too expensive and complicated. Primarily, it will seem to provide significantly lower performance, so that you cannot really consider it as an alternative to your coal power plant.

As with the energy problem, the performance argument is constantly pulled out of the bag and waved around when one recommends .NET as the runtime environment for a new software project. Before even the first sketches of a software design and architecture are made (hoping that there actually will be some design and architecture before coding), and a long time before the first line of code is written, someone will argue that whatever it is that’s to be developed must be written in C (or some other unmanaged language).

An insidious fact is that the most seasoned programmer in any team will likely be the one to present this performance argument against whoever proposed using .NET for the task at hand. This might be explained by the seasoned programmer being the one who’s least likely to implement unmanaged code in a way that it can become a security vulnerability. Maybe it is just the programmer’s old belief that anything not compiled into native platform code doesn’t perform well. However, the meritocracy among programmers and their managers causes the senior programmer’s statements to carry significantly more weight than everyone else’s, so everyone in the team will “learn” that, for performance reasons, they cannot use .NET.

The sad truth is that such repeated statements will cause software stacks to stay vulnerable to memory corruption and integer overflows for decades to come. Especially experienced people should know that, as Donald Knuth already stated: “premature optimization is the root of all evil.” William Allan Wulf took it even further by saying: “More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason – including blind stupidity.” Unfortunately, this is very close to the truth.

If you are an attacker or vulnerability researcher and you are trying to identify an easy attack on a software stack, the first thing you look for is parsers. Any code that handles or interacts with externally provided data that you can influence will be your primary target of interest. If this code is written in an unmanaged language, for example, C/C++, you are very likely to find what you are looking for in the parser before anything else. This is where most software breaks, either through parsing of file formats or protocol messages. In most cases, complex parsing happens before any authentication and authorization could even be performed, so the resulting attack will not only yield arbitrary code execution, but it will also be completely anonymous.

.NET provides almost all the security you need to implement parsers that do not result in security vulnerabilities in your code. Boundary errors do not lead to memory corruption, so the whole class of buffer overflow vulnerabilities goes away. Even better, boundary errors will throw very distinct exceptions, so your program can react to them specifically. The option to check for arithmetic overflows and underflows in your assemblies is the second mighty weapon, one that prevents exploitation of signedness issues and data-type conversion problems, although this checking is still not the default for new projects in Visual Studio 2010. By using safe code, written in any of the many .NET front-end languages, you can easily build very solid and robust parsers that ensure your input data is correct and that do not allow attackers to slip overly large integers past your checks. Let the attacker fuzz your input files until kingdom come.
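To make the two mechanisms above concrete, here is a minimal sketch in C#. The record format (a 4-byte big-endian count followed by that many 4-byte big-endian values) and the names RecordParser, Parse and ReadInt32 are made up for illustration; they are assumptions, not part of any real file format or library. The point is only to show where the runtime steps in: a read past the end of the array raises IndexOutOfRangeException, and the length computation inside the checked block raises OverflowException instead of silently wrapping around.

```csharp
using System;

public static class RecordParser
{
    // Hypothetical format (assumption for this sketch):
    //   [4-byte big-endian count][count * 4-byte big-endian values]
    public static int[] Parse(byte[] input)
    {
        // Arithmetic written inside this block is overflow-checked,
        // regardless of the project-wide compiler setting.
        checked
        {
            int count = ReadInt32(input, 0);
            if (count < 0)
                throw new FormatException("negative element count");

            // With unchecked arithmetic, a count such as 0x40000001 would
            // wrap around to a small number here and slip past the length
            // check below. In a checked context it throws OverflowException.
            int needed = 4 + count * 4;
            if (needed != input.Length)
                throw new FormatException("length does not match element count");

            int[] values = new int[count];
            for (int i = 0; i < count; i++)
                values[i] = ReadInt32(input, 4 + i * 4);
            return values;
        }
    }

    private static int ReadInt32(byte[] data, int offset)
    {
        // Indexing past the end of the array throws IndexOutOfRangeException
        // instead of reading or corrupting adjacent memory.
        return (data[offset] << 24) | (data[offset + 1] << 16)
             | (data[offset + 2] << 8) | data[offset + 3];
    }
}
```

Wrapping the arithmetic in a checked block, rather than relying on the project setting, keeps the parser safe even if someone later builds it with the default options. Note that checked only applies to expressions written inside the block, not to methods called from it.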

And what about the performance issues now?

First of all, how about writing secure code in the first place and conducting the performance measurements and optimizations afterwards? It is very likely that this one method, which iterates over your large data set in nested loops a couple of times, is eating most of the CPU time anyway, no matter what language it was written in. Algorithmic mistakes account for a much larger performance impact in almost any sufficiently large application regardless of the programming language used.

Secondly, the security-critical parsers are often invoked infrequently during the operation of the application. Review carefully how often your parsers are actually invoked. When you only read files upon user request, it is very unlikely that the user will actually notice any performance difference whether your parser is written in .NET or in unmanaged code, except for the case where a corrupt file is opened, whether it was corrupted intentionally or not.

In today’s multi-component, multi-tier application designs, it is easy to ensure a correct input data set by using a strictly written .NET parser and then handing the “normalized” and verified data to other code that performs the computationally expensive processing.

Last but not least, please keep in mind that .NET code is not executed on a virtual CPU but is actually compiled into platform-specific code before being executed. This Just-In-Time compilation is where the real performance is gained in any managed language. And some seriously smart people work on the JIT. When they find an optimization and roll out an updated JIT, all code suddenly runs faster, not just yours. But more importantly, you don’t have to do anything; it will happen behind the scenes.

I have had only the best experiences using .NET code for parsers in security-critical situations. It doesn’t relieve you of thinking about acceptable and non-acceptable formatting of your input data, but it massively simplifies the process of validating and checking the input data. If anything goes wrong, the exception will propagate up, and you can safely discard the input from the top-level code. In any other case, the cleaned and sanitized input data set can be used immediately, even in less fortified code.
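The "let it propagate and discard at the top" pattern can be sketched as follows. Everything here is hypothetical and chosen only for illustration: the "host:port" input format and the names Endpoint, EndpointLoader, ParseStrict and TryLoad are assumptions, not an existing API. The strict parser never tries to repair bad input; it throws. The single top-level method is the only place that catches, and it either hands out a fully validated object or nothing at all.

```csharp
using System;
using System.Globalization;

// Hypothetical validated result type (assumption for this sketch).
public sealed class Endpoint
{
    public readonly string Host;
    public readonly int Port;

    public Endpoint(string host, int port)
    {
        Host = host;
        Port = port;
    }
}

public static class EndpointLoader
{
    // Strict parser: malformed input is never repaired, it throws.
    private static Endpoint ParseStrict(string text)
    {
        string[] parts = text.Split(':');
        if (parts.Length != 2 || parts[0].Length == 0)
            throw new FormatException("expected host:port");

        // NumberStyles.None accepts decimal digits only (no sign, no
        // whitespace). int.Parse throws FormatException for anything else
        // and OverflowException for values that do not fit into an int.
        int port = int.Parse(parts[1], NumberStyles.None,
                             CultureInfo.InvariantCulture);
        if (port < 1 || port > 65535)
            throw new FormatException("port out of range");

        return new Endpoint(parts[0], port);
    }

    // Top-level entry point: any exception from the parser means the
    // input is discarded as a whole. Catching Exception is deliberately
    // broad for this sketch; real code may prefer to list the types.
    public static bool TryLoad(string text, out Endpoint result)
    {
        try
        {
            result = ParseStrict(text);
            return true;
        }
        catch (Exception)
        {
            result = null;
            return false;
        }
    }
}
```

Code further down the pipeline only ever sees an Endpoint that passed every check, which is what allows it to be less fortified.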

Returning to the analogy between global warming and securing software stacks, it should be clear that we cannot keep building things the way we did before if we care about changing anything in the future. Even if every parser written from today on is a safe and sound implementation in a managed language without any unsafe code invocations, it will take a long time before the old software is phased out. But if we continue to follow our old habits for dubious reasons, we will never actually get anywhere near our goal of secure and reliable computing.

Consider the capabilities of .NET a vital security component for future software projects.

I appreciate any feedback you may have, and if you happen to attend BlueHat Buenos Aires, I will see you there.


[1] “Climate Change Joint Resolution”, 2010 General Session, State Of Utah,