Are you Null?

Within the last couple of days Microsoft released a proposed update for the next major release of C# version 8.  Over the past several years, there has been a large debate on the existence and use of null in software development.  Allowing null has been heralded as the billion dollar mistake by the null reference inventor, Sir Tony Hare. With this, Microsoft has decided to help the C# community by adding functionality to the C# compiler to help point out where a null reference might occur.

With the release of C# 8, anything referencing an object (string, etc.) must explicitly declare itself as possibly being null, and if that variable isn’t explicitly checked before being used, the compiler generates a warning that a possible null reference might occur. So how does this work? By using the ? at the end of a reference type, it signifies the developer acknowledges null might occur.

This looks like it would be a breaking change, and all code written in a previous version will suddenly stop compiling. This would be true except for two things.

  1. You must use a compiler flag to enforce the rule.
  2. The flag will only generate warnings not errors.

So legacy code is safe in the upgrade process if it’s too difficult to convert.

With this, they are still working out a number of scenarios that prove tricky to handle. These are things like default array initialization (new string[2]). Their comments about all of these can be found on their blog on MSDN

I’ve added their code examples below of edge cases they are still working on:

Personally, I hoped the compiler would enforce these rules a little stronger. Some languages like F# strictly enforce variable immutability unless explicitly allowed, and other functional languages do not allow it at all.

It is possible to turn on “Warnings as errors” and have the compiler stop if it encounters a possible null exception, but this assumes the rest of the code has no other warnings that won’t stop compilation. Ideally, no warning flags should ever appear in code without being fixed, but that is a very difficult standard follow when dealing with legacy code from years past where no one followed that rule before you. Either way, the C# team was in a tight situation, and they did the best they could. They needed to make strides towards making null references easier to track, but they couldn’t break all of the legacy code using previous versions of C#.

Quirks with Pattern Matching in C# 7

With C# 7, Microsoft added the concept of pattern matching by enhancing the switch statement. Compared to functional languages (both pure and impure), this seems to be somewhat lacking in a feature by feature comparison, however it is still nice in allowing a cleaner format of code. With this, there are some interesting quirks, that you should be aware of before using. Nothing they’ve added breaks existing rules of the language, and with a thorough understanding how the language behaves their choices make sense, but there are some gotchas that on the surface looks like they should function one way, but act in a completely different manner.

Consider the following example.

Shows

C# 7 now allows the use of a switch statement to determine the type of a variable. It as also expanded the use of is to include constants including null.

is can show if something is null : shows true

With these two understandings, which line executes in the following code?

Shows default code executed.

Based on the previous examples, its a reasonable conclusion that the one of the first two case statements would execute, but they don’t.

The is operator

The is operator was introduced in C# 1.0, and its use has been expanded, but none of the existing functionality has changed. Up until C# 7, is has been used to determine if an object is of a certain type like so.

This outputs exactly as expected. The console prints “True” (Replacing string with var works the exactly the same. Remember that the object is still typed. var only tells the compiler to figure out what type the variable should be instead of explicitly telling it.)

Is Operator String: True

What happens if the string is null? The compiler thinks its a string. It will prevent you from being able to pass it to methods requiring another reference type even though the value is explicitly null.

Type is null

The is operator is a run time check not a compile time one, and since it is null, the runtime doesn’t know what type it is. In this example, the compiler could give flags to the runtime saying what type it actually is even though it’s null, but this would be difficult if not impossible for all scenarios, so for consistency, it still returns false. Consistency is key.

Printing out True and False is nice, but it’s not really descriptive. What about adding text to describe what is being evaluated.

Is Type With Question, Question doesn't appear

Why didn’t the question appear? It has to do with operator precedence. The + has a higher operator precedence than is and is evaluated first. What is actually happening is:

This becomes clear if the clause is flipped, because the compiler doesn’t know how to evaluate string when using the + operator.

Flipping clauses throws error.

Adding parenthesis around the jennysNumber is string fixes the issue, because parenthesis have a higher operator precedence than the + operator.

output of is operator and + flipped with parenthesis (shows both question and value)

Pattern Matching with Switch Statements

Null and Dealing with Types

Null is an interesting case, because as shown during the runtime, it’s difficult to determine what type an object is.

Base Example

This code works exactly as how you think it should. Even though the type is string, the runtime can’t define it as such, and so it skips the first case, and reaches the second.

Adding a type object clause works exactly the same way

shows object case works same way

What about var. Case statements now support var as a proposed type in the statement.

If you mouse over either var or the variable name, the compiler will tell you what type it is.
show compiler knows what type it is.

Shows var case statement doesn't know type

It knows what the type is, but don’t let this fool you into thinking it works like the other typed statements though. The var statement doesn’t care that the runtime can’t determine the type. A case statement with the var type will always execute provided there is no condition forbidding null values when (o != null). Like before, it still can’t determine the type inside the case statement statement.

Why determine object type at compile time?

At any point in time (baring the use of dynamic), the compiler knows the immediate type of the variable. It could use this to directly point the correct case concerning the type. If that were true, it couldn’t handle the following scenario, or any concerning inheritance of child types.

shows is string

Personally, I would like to see either a warning or an error, that it’s not possible for type cases to determine if the variable is null case string s when (s is null), but as long as the code is tested and developers knows about this edge case, problems can be minimized.

All the examples can be found on github: https://github.com/kemiller2002/StructuredSight/tree/master/PatternMatchingQuirks_Standard

It’s OK, My eval is Sandboxed (No It’s Not)

The idea of using eval has always been in interesting debate. Instead of writing logic which accounts for possibly hundreds of different scenarios, creating a string with the correct JavaScript and then executing it dynamically is a much simpler solution. This isn’t a new approach to programming and is commonly seen in languages such as SQL (stored procedures vs. dynamically generating statements). On one hand it can save a developer an immense amount of time writing and debugging code. On the other, it’s power is something which can be abused because of it’s high execution privileges in the browser.

The question is, “should ever be used?” It technically would be safe if there is a way of securing all the code it evaluates, but this limits its effectiveness and goes against its dynamic nature. So with this, is there a balance point where using it is secure, but also flexible enough to warrant the risk?

For example purposes, we’ll use the following piece of code to show the browser has been successfully exploited: alert(‘Be sure to drink your Ovaltine.’); If the browser is able to execute that code, then restricting the use of eval failed.

In the most obvious example where nothing is sanitized executing the alert is trivial:

eval will treat any input as code and execute it. So what if eval is restricted to only execute which will correctly evaluate to a complete statement?

Nope, this still successfully executes. In JavaScript all functions return something, so calling alert and assigning undefined to total is perfectly valid.

What about forcing a conversion to a number?

This still executes also, because the alert function fires when it is parsed and its return value is converted to a string and then parsed.

The following does stop the alert from firing,

But this is rather pointless, because eval isn’t necessary. It’s much easier to assign the value to the total variable directly.

What about overriding the global function alert with a local function?

This does work for the current scenario. It overrides the global alert function with the local one but doesn’t solve the problem. The alert function can still be called explicitly from the window object itself.

With this in mind, it is possible to remove any reference to window (or alert for that matter) in the code string before executing.

This works when the word ‘window’ is together, but the following code executes successfully:

Since ‘win’ and ‘dow’ are separated, the replacement does not find it. The code works by using the first eval to join the execution code together while the second executes it. Since replace is used to remove the window code, it’s also possible to do the same thing to eval like so:

That stops the code from working, but it doesn’t stop this:

It is possible to keep accounting for different scenarios whittling down the different attack vectors, but this gets extremely complicated and cumbersome. Further more, using eval opens up other scenarios besides direct execution which may not be accounted for. Take the following example:

This code bypasses the replace sanitations, and it’s goal wasn’t to execute malicious code. It’s goal is to replace the JSON.parse with eval and depending on the application might assume that malicious code is blocked, because JSON.parse doesn’t natively execute rogue code.

Take the following example:

The code does throw an exception at the end due to invalid parsing, but that isn’t a problem for the attacker, because eval already executed the rogue code. The eval statement was used to perform a lateral attack against the functions which are assumed not to execute harmful instructions.

Server Side Validation

A great extent of the time, systems validate user input on the server trying to ensure harmful information is never stored in the system. This is a smart idea, because removing before storing it tries to ensure everything accessing potentially harmful code doesn’t need to make certain it isn’t executing something it shouldn’t (you really shouldn’t and can’t make this assumption, but it is a good start in protecting against attacks). With eval, this causes a false sense of security, because code like C# does not handle strings the same way that JavaScript does. For example:

In the first example, the C# code successfully removed the word ‘window’, but in the second, it was unable to interpret this when presented with Unicode characters which JavaScript interprets as executable instructions. (In order to test the unicode characters, you need to place an @ symbol in front of the string so that it will treat it exactly as it is written. Without it, the C# compiler will convert it.) Worse yet, JavaScript can interpret strings which are a mixture of text and Unicode values making it more difficult to search and replace potentially harmful values.

Assuming the dynamic code passed into eval is completely sanitized, and there is no possibility of executing rogue code, it should be safe to use. The problem is that it’s most likely not sanitized, and at best it’s completely sanitized for now.

Just write it here, I’ll handle the rest

It’s pretty common knowledge in a .NET console application using the following command will produce the following result.

Console output

On the surface, this doesn’t look like it’s that helpful. Console applications can immediately output a message, but most applications don’t run in the console, and those that do, run in a process not visible to the user. However, System.Console has a method which makes it ideal for logging: Console.SetOut. Passing in a TextWriter object changes the system’s default output behavior and pushes the entries to a different location.

The following example outputs the contents to “C:\temp\logfile.txt” instead of the default console itself.

Text output

This allows the application to change the destination based on the need, and the destination can be anywhere. OverrideThe output class must inherit the TextWriter class, but this doesn’t mean that it has to write to a file, screen, or something which ultimately outputs to a stream. Override the appropriate methods in the TextWriter and handle the data in any manner necessary. (Here are two examples: 1. XmlLogWriter, 2. DbLogWriter)

The real advantage comes when writing larger applications where the solution consists of multiple projects requiring a way to communicate information about the system. When using a standard logging framework, every project must have knowledge of the framework to use it. This becomes cumbersome if core projects are used across multiple solutions, or if you want to change the logging framework at some point in time in the future. System.Console is available to all .NET assemblies by default, and by using it (and Console.Error.Write(Line)), the system components do not need to reference anything additional.

Along with this, the Console allows SetOut to be called at any time in the application’s lifetime, so it’s possible to change the output stream while the program is executing. This allows for the output to be changed on the fly should the need arise.

Under The Mattress (or Compiled Code) is Not a Good Place to Hide Passwords

The question comes up from time to time about storing passwords in code, and is it secure. Ultimately, it’s probably a bad idea to store passwords in code strictly from a change management perspective, because you are most likely going to need to change it at some point in the future. Furthermore, passwords stored in compiled code are easy to retrieve if someone ever gets a hold of the assembly.

Using the following code as an example:

IL Spy Showing Password

So what about storing the password in some secure location and loading it into memory? This requires the attacker to take more steps to achieve the same goal, but it is still not impossible. In order to acquire the contents in memory (assuming the attacker can’t just attach a debugger to the running assembly), something will have to force the program to move the memory contents to a file for analysis.

Marc Russonivich wrote a utility called ProcDump which easily does the trick. Look for the name of the process (my process is named LookForPasswords) in task manager and run the following command:

This creates a file akin to LookForPasswords.exe_140802_095325.dmp and it contains all the memory information of the running process. To access the file contents you can either use Visual Studio or something like WinDbg.

WinDbg:
Open Dump File

After you open the dump file, you’ll need to load the SOS.dll to access information about the .NET runtime environment in Windbg.

LoadBySos

Once this is loaded, you can search the dump file for specific object types. So to get the statistics on strings (System.String):

String Statistics

This command will display a lot of information about the methods stored in the memory table, where the information lives in memory, etc. What you need to know is where the string data itself lives in memory. To access the statictics of a specific object

For example:

Show String List

Show Memory Address

Show Offset

In a string object, the actual data we want to get is located at the memory address plus the offset value (which is c). You can see this by accessing the specifics of the String object by inputting the following:

or in the example

Doing this for each string in the program would be rather tedious and time consuming considering most applications are significantly larger than the example application. WinDbg solves this issue, by having a .foreach command and this loops through all the string objects and prints out the contents.

Show all strings

To solve the issue of attacking the program by causing a memory dump, Microsoft added the System.Security.SecureString datatype in .NET 2.0. Although effective, it has some drawbacks to it, mainly that to effectively use it, you have to use pinned objects and doing this requires to check the unsafe flag in projects.

Unsafe Compile

Most organizations won’t allow unsafe code execution, so it makes using the SecureString pretty much pointless. With this in mind, the safest route to take for securing information is to not have it in memory at all. This removes the problem entirely. If it must reside in memory, then you can at least encrypt it while it’s stored there. This won’t solve every problem (if unsecured contents existed in memory, they still might), but it will at least reduce the possibility of it getting stolen.

The contents for the above example can be located on GitHub

Can I at Least Get Your Name Before We do This?

As applications grow in size and age, they can become difficult to navigate and unruly to manage. Most organizations give little thought to spending extra time to maintain existing code to a certain level of quality, and even less thought to spend money to rewrite or even refactor code to keep it maintainable. Fortunately, in .NET 4.5 Microsoft added a significant piece of functionality to at least help gather data on application flow. In the System.Runtime.CompilerServices namespace, 3 new attributes were added to help log calling function information.

  • CallerFilePathAttribute
  • CallerMemberNameAttribute
  • CallerLineNumberAttribute

Add the attributes to the appropriate parameters in a method to use them.

Console Output

The great part about this is that it will populate the information without modification to the calling methods’ source code. You can add parameters with these attributes to existing methods, and they will automatically populate. There are a couple of quirks though.

Optional Parameters

In order for the application to compile, you must mark these as optional parameters. After looking at how the information is resolved at compile time, this makes sense. If you notice in ILSpy, the attribute is added to the called method parameter, but the compiler generates the parameter values in the calling method. This means the compiler accessing the called method must be aware the attribute has been added to the parameter (meaning this feature will require a recompile of the assemblies calling these methods), and it must understand it needs to add the caller method information when invoking. An optional parameter is required to signify to the compiler that the source file is not missing parameter information (which would result in an error), and that the calling method expects the compiler to add the correct default values.

In addition to this, it’s a smart idea anyway, because not all .NET languages understand these attributes. For example, calling the C# method from F# doesn’t give the expected values. It inserts the default ones. F# doesn’t understand that when invoking this C# method, it should replace the optional parameter values with the calling ones.

F# Console Output

Namespaces

When using CallerMemberNameAttribute namespaces aren’t provided. This means that if you have methods with the same name in different classes or namespaces, there is no differentiation between them in the information passed to the called method. You can get around this by using the other two method parameters, which will give you the file of the calling method, and even the line where the called function was invoked, but without them there is a level of uncertainty.

The parameters can be overridden

If you explicitly set the parameter’s value, the calling method’s information won’t be sent.

Depending on the scenario, this can be both an advantage and a disadvantage. If you need to override it and pass in different information you can, but this also means you can’t add this attribute to an existing parameter and have it override what the calling method is already passing.

Reflection

Reflection, doesn’t work at all with this feature. Calling parameters are a compile time feature, and reflection is a runtime one. If you call the method with these attributes directly, the compiler won’t know to add them, so you’re on your own to supply the information. (Reflection with optional parameters will work by sending Type.Missing, but it won’t get you the caller values).

Speed

Actually, there is no slow down when using this, because it’s all resolved at compile time. Using ILSpy reveals there is nothing like reflection or other runtime resolutions involved. (This actually makes a lot of sense, because reflection doesn’t have information like the code file path or line numbers of the calling function, so it has to be resolved at compile time.):

The new feature is great, but it has it’s limitations. Considering that it’s possible to add the attributes to existing methods, and as long as the invoking methods are recompiled and the compiler is aware of how to insert the information, a wealth of information is available with very little work. All around, it’s a significant win for helping maintain applications.

You can all the project contents here.