Quirks with Pattern Matching in C# 7

With C# 7, Microsoft added the concept of pattern matching by enhancing the switch statement. Compared to functional languages (both pure and impure), this seems to be somewhat lacking in a feature by feature comparison, however it is still nice in allowing a cleaner format of code. With this, there are some interesting quirks, that you should be aware of before using. Nothing they’ve added breaks existing rules of the language, and with a thorough understanding how the language behaves their choices make sense, but there are some gotchas that on the surface looks like they should function one way, but act in a completely different manner.

Consider the following example.

Shows

C# 7 now allows the use of a switch statement to determine the type of a variable. It as also expanded the use of is to include constants including null.

is can show if something is null : shows true

With these two understandings, which line executes in the following code?

Shows default code executed.

Based on the previous examples, its a reasonable conclusion that the one of the first two case statements would execute, but they don’t.

The is operator

The is operator was introduced in C# 1.0, and its use has been expanded, but none of the existing functionality has changed. Up until C# 7, is has been used to determine if an object is of a certain type like so.

This outputs exactly as expected. The console prints “True” (Replacing string with var works the exactly the same. Remember that the object is still typed. var only tells the compiler to figure out what type the variable should be instead of explicitly telling it.)

Is Operator String: True

What happens if the string is null? The compiler thinks its a string. It will prevent you from being able to pass it to methods requiring another reference type even though the value is explicitly null.

Type is null

The is operator is a run time check not a compile time one, and since it is null, the runtime doesn’t know what type it is. In this example, the compiler could give flags to the runtime saying what type it actually is even though it’s null, but this would be difficult if not impossible for all scenarios, so for consistency, it still returns false. Consistency is key.

Printing out True and False is nice, but it’s not really descriptive. What about adding text to describe what is being evaluated.

Is Type With Question, Question doesn't appear

Why didn’t the question appear? It has to do with operator precedence. The + has a higher operator precedence than is and is evaluated first. What is actually happening is:

This becomes clear if the clause is flipped, because the compiler doesn’t know how to evaluate string when using the + operator.

Flipping clauses throws error.

Adding parenthesis around the jennysNumber is string fixes the issue, because parenthesis have a higher operator precedence than the + operator.

output of is operator and + flipped with parenthesis (shows both question and value)

Pattern Matching with Switch Statements

Null and Dealing with Types

Null is an interesting case, because as shown during the runtime, it’s difficult to determine what type an object is.

Base Example

This code works exactly as how you think it should. Even though the type is string, the runtime can’t define it as such, and so it skips the first case, and reaches the second.

Adding a type object clause works exactly the same way

shows object case works same way

What about var. Case statements now support var as a proposed type in the statement.

If you mouse over either var or the variable name, the compiler will tell you what type it is.
show compiler knows what type it is.

Shows var case statement doesn't know type

It knows what the type is, but don’t let this fool you into thinking it works like the other typed statements though. The var statement doesn’t care that the runtime can’t determine the type. A case statement with the var type will always execute provided there is no condition forbidding null values when (o != null). Like before, it still can’t determine the type inside the case statement statement.

Why determine object type at compile time?

At any point in time (baring the use of dynamic), the compiler knows the immediate type of the variable. It could use this to directly point the correct case concerning the type. If that were true, it couldn’t handle the following scenario, or any concerning inheritance of child types.

shows is string

Personally, I would like to see either a warning or an error, that it’s not possible for type cases to determine if the variable is null case string s when (s is null), but as long as the code is tested and developers knows about this edge case, problems can be minimized.

All the examples can be found on github: https://github.com/kemiller2002/StructuredSight/tree/master/PatternMatchingQuirks_Standard

C# 7 Additions – Pattern Matching

C# 7 has started to introduce Pattern Matching. This is a concept found in functional programming, and although it isn’t fully implemented compared to F#, it is a step in that direction. Microsoft has announced they intend on expanding it in future releases.

Constant Patterns

The is keyword has been expanded to allow all constants on the right side of the operator instead of just a type. Previously, C#’s only valid syntax was similar to:

Now it is possible to compare a variable to anything which is a constant: null, a value, etc.

Behind the scenes, the is statement is converted to calling the Equals function in IL code. The following two functions produce roughly the same code (they call different overloads of the Equals function).

CheckIsNull

CheckEqualsNull

This can also be combined with other features allowing variable assignment through the is operator.

In Visual Studio Preview 4, the scoping rules surrounding variables assigned in this manner are more restrictive than in the final version. Right now, they can only be used within the scope of the conditional statement.

Switch Statements

The new pattern matching extensions have also extended and changed the use of case statements. Patterns can now be used in switch statements.

Like in previous versions, the default statement will always be evaluated last, but the location of the other case statements now matter.

In this example, case int n will never evaluate, because the statement above it will always be true. Fortunately, the C# compiler will evaluate this, determine that it can’t be reached and raise a compiler error.

The variables declared in patterns behave differently than others. Each variable in a pattern can have the same name without running into a collision with other statements. Just as before, in order to declare a variable of the same name inside the case statement, you must still explicitly enforce scope by adding braces ({}).

Pattern matching has a ways to go when compared to its functional language equivalent, but it is still a nice addition and will become more complete as the language evolves.

C# 7 Additions – Literals

A small, but nice chance in C# 7 is increased flexibility in literals. Previously, large numeric constants had no separator, and it was difficult to easily read a large number. For example, if you needed a constant for the number of stars in the observable universe (1,000,000,000,000,000,000,000), you’d have to do the following:

If you hadn’t caught the error, the constant is too short, and it’s difficult to tell looking at the numbers without a separator. In C# 7, it’s now possible to use the underscore (_) in between the numbers. So the previous example now becomes much easier to read, and it is easily recognizable the number is off.

The new version adds binary constants too. Instead of writing a constant in hex, or decimal, a constant can now be written like so:

C# 7 Additions – ref Variables

C# 7 expands the use of the ref keyword. Along with its previous use, it can now be used in return statements, and local variables can store a reference to the object as well. At first glance, the question is “What is the real difference between returning a ref variable, and setting it through an out parameter?” Previously you could set a variable passed into a function with ref (or out) to a different value. In C# 7, you can return the reference of a property, variable etc. and store that in a local variable for later use.

The following is an examples showing its expanded use.

As expected, the PersonInformation object is passed into the GetName function which returns a reference to the string property Name. This is then passed into the MakeCapitalized function which capitalizes the name “jenny” (making it “Jenny”) in the original PersonInformation object. Compare this to the example here showing how the previous version of C# would not allow the modification of the original property in the same scenario.

Classes vs Structs

If the PersonInformation is changed to be a struct (value type) instead of a class (reference type), the following code won’t work without a slight modification, but it is still completely possible.

Structs are passed by value meaning that passing a struct into a method creates a copy of it. Returning a reference to the struct’s property would return a reference to the copied struct and would go out of scope as soon as the method completes. There would be no point, and it would cause errors pointing to properties to objects which didn’t exist.

Caveats

With these new features there are some restrictions to it. Consider this. A string can be treated as an array of characters. With the new functionality, it should be possible to pass back a reference to a character location in that string and update it, because you have the reference to the character location in the string.

Fortunately, this isn’t allowed. The compiler prevents from it being a valid option, because if this were possible, it would break the string’s immutability and cause havoc with C#’s ability to intern strings.
ref string not allowed.

The compiler is also smart enough to not allow references to variables which fall out of scope. The following is also not allowed:

After the method exits someNumber no longer exists, and when another part of the application tries to access it, it won’t be available. (You could say this might not be the case if it were a reference type like a string, but it still wouldn’t matter, because all the reference has is a location to where the object is, not the actual object itself. This causes 2 problems: One, currently there is no way to get the value from the reference. Two, the object isn’t rooted, so it could still be garbage collected at any point in time.)

The compiler is also smart enough to trace the variable use through the calling methods. This is also not allowed:

C# 7 Additions – Out Variables

C# 7 removes the need for out variables to be predeclared before passing them into a function.

It also now allows the use of the var keyword to declare the variable type, because the compiler will infer the type based on the declared parameter type. This is not allowed when the compiler can’t infer the type because of method overloading. It would be nice if the compiler would attempt to infer it’s type based on the use later on in the method similar to F#’s inferred types, but this isn’t slated to be in the current release.

compiler confused because of method overloading.

In Visual Studio 15 Preview 4, the out variable isn’t working exactly as it will in the final release. Wild cards will hopefully be added so extraneous variables don’t need to be declared.

The following code won’t work until the scope restrictions on out variables is updated. (They have said they intend on doing this before the release.)

In this example, the scope is limited to the method call where the strings are set. To get it work currently, variable scope must be extended and can be like so:

The conditional statement wraps the variables and they can now be used in the Console.WriteLine. This will be corrected in the final release and won’t be necessary.

C# 7 Additions – Local Functions

In C# 7 it is now possible to create a function within a function termed a Local Function. This is for instances where a second function is helpful, but it’s not really needed in the rest of the class. It’s created just like regular functions except in the middle of another function.

Just like normal functions, you can create expression bodied members as well

Local variables in the outer functions are accessible, and it’s possible to embed local functions inside other local functions:

So how does it work? Looking at the IL code, the compiler has converted the internal function into a private static one inside the class.

IL Code showing private static function

The name is generated at compile time, so it is not accessible to other methods, but it is still possible to access it through reflection with the private and static binding flags.

reflection shows local function.

Someone I know asked what would be a good use case of Local Functions vs. Lambdas. Lambdas can’t contain enumerators, and by encasing an enumerations in a local function it allows others parts of the outer method to be eagerly evaluated. For example, if you have a method which takes a parameter and returns an enumeration, the evaluation of the parameter won’t occur until program starts to enumerate the collection. Encapsulating the enumeration in a local function allows the other parts of the outer function to be eagerly evaluated. You can find an example of the difference between using one and not using one here.

It’s OK, My eval is Sandboxed (No It’s Not)

The idea of using eval has always been in interesting debate. Instead of writing logic which accounts for possibly hundreds of different scenarios, creating a string with the correct JavaScript and then executing it dynamically is a much simpler solution. This isn’t a new approach to programming and is commonly seen in languages such as SQL (stored procedures vs. dynamically generating statements). On one hand it can save a developer an immense amount of time writing and debugging code. On the other, it’s power is something which can be abused because of it’s high execution privileges in the browser.

The question is, “should ever be used?” It technically would be safe if there is a way of securing all the code it evaluates, but this limits its effectiveness and goes against its dynamic nature. So with this, is there a balance point where using it is secure, but also flexible enough to warrant the risk?

For example purposes, we’ll use the following piece of code to show the browser has been successfully exploited: alert(‘Be sure to drink your Ovaltine.’); If the browser is able to execute that code, then restricting the use of eval failed.

In the most obvious example where nothing is sanitized executing the alert is trivial:

eval will treat any input as code and execute it. So what if eval is restricted to only execute which will correctly evaluate to a complete statement?

Nope, this still successfully executes. In JavaScript all functions return something, so calling alert and assigning undefined to total is perfectly valid.

What about forcing a conversion to a number?

This still executes also, because the alert function fires when it is parsed and its return value is converted to a string and then parsed.

The following does stop the alert from firing,

But this is rather pointless, because eval isn’t necessary. It’s much easier to assign the value to the total variable directly.

What about overriding the global function alert with a local function?

This does work for the current scenario. It overrides the global alert function with the local one but doesn’t solve the problem. The alert function can still be called explicitly from the window object itself.

With this in mind, it is possible to remove any reference to window (or alert for that matter) in the code string before executing.

This works when the word ‘window’ is together, but the following code executes successfully:

Since ‘win’ and ‘dow’ are separated, the replacement does not find it. The code works by using the first eval to join the execution code together while the second executes it. Since replace is used to remove the window code, it’s also possible to do the same thing to eval like so:

That stops the code from working, but it doesn’t stop this:

It is possible to keep accounting for different scenarios whittling down the different attack vectors, but this gets extremely complicated and cumbersome. Further more, using eval opens up other scenarios besides direct execution which may not be accounted for. Take the following example:

This code bypasses the replace sanitations, and it’s goal wasn’t to execute malicious code. It’s goal is to replace the JSON.parse with eval and depending on the application might assume that malicious code is blocked, because JSON.parse doesn’t natively execute rogue code.

Take the following example:

The code does throw an exception at the end due to invalid parsing, but that isn’t a problem for the attacker, because eval already executed the rogue code. The eval statement was used to perform a lateral attack against the functions which are assumed not to execute harmful instructions.

Server Side Validation

A great extent of the time, systems validate user input on the server trying to ensure harmful information is never stored in the system. This is a smart idea, because removing before storing it tries to ensure everything accessing potentially harmful code doesn’t need to make certain it isn’t executing something it shouldn’t (you really shouldn’t and can’t make this assumption, but it is a good start in protecting against attacks). With eval, this causes a false sense of security, because code like C# does not handle strings the same way that JavaScript does. For example:

In the first example, the C# code successfully removed the word ‘window’, but in the second, it was unable to interpret this when presented with Unicode characters which JavaScript interprets as executable instructions. (In order to test the unicode characters, you need to place an @ symbol in front of the string so that it will treat it exactly as it is written. Without it, the C# compiler will convert it.) Worse yet, JavaScript can interpret strings which are a mixture of text and Unicode values making it more difficult to search and replace potentially harmful values.

Assuming the dynamic code passed into eval is completely sanitized, and there is no possibility of executing rogue code, it should be safe to use. The problem is that it’s most likely not sanitized, and at best it’s completely sanitized for now.

Configuring Logic

This question talks about removing a switch statement so that every time the business logic changes concerning a multiplier value, the C# code itself doesn’t have to be changed and the application recompiled. I proposed loading the keys and multiplier values from a configuration file into a dictionary and accessing the data when needed. (The following example shows it loaded in the constructor for brevity.)

A comment in the answer mentioned the benefits of creating extra classes, and how the dictionary approach could not handle more advanced calculations should the need arise. With a slight modification, and some additional code, this no longer becomes a hinderance. Expression Trees allow the program to dynamically create functions and execute them as it would with compiled code.

Based on the question and the example above, the current equation has two parts, the travelModifier (which is determined by the mode of transportation) and the DistanceToDestination. These are multiplied together, and return a decimal. Completely abstracting this out into its own function (which then becomes the model to base the configurable functions from), would make the method look like:

Since the travel modifier already comes from the configuration file, it is unnecessary to pass that into the function, because when the application reads the configuration and creates the method, each entry will have the travelModifier value already coded into the function so that parameter can be removed, and an example function in C# would look like:

To accomplish this, each entry in the configuration file would need to have two parts, the method of travel (bicycle, bus, car, etc.), and the equation. The latter is a combination of the travelModifier constant, the distanceToDestination and operators (+,-,/,*). An entry in the file would look like this:

Before loading the configuration file, the dictionary which will hold the function and retrieve it based on the selected method of travel will need to be changed. Currently it has a string as the key and a double as the value:

Instead, it needs a function as the value.

Loading the contents from the configuration file has a few different steps. Retrieving and separating the parts, parsing the equation, and creating the method at runtime.

Loading the Configuration File and Separating the Parts

Parsing the Equation

It would be possible to parse the equation and immediately convert it to an Expression, but it’s normally easier to load it into an intermediate structure so data can be transformed and grouped into a usable structure first. The equation has three parts, and an enum can help distinguish between them.

and the class to hold the equation parts

In order to parse the equation, the program needs to determine what is an operator and what is a variable or constant and its execution order.

A Note About Math things

Execution Order

Consider the following: 2 + 4 / 2. At first glance, it looks like the answer is three, but that is incorrect. The multiplication and division have a higher operator order precedence and their calculations occur before addition and subtraction. This makes the actual answer 4. The C# compiler knows about order of operations and which happens first. When building the expression tree, the runtime doesn’t take this into account, and will execute each operation strictly from left to right. It is important to note this when creating and grouping the intermediate objects to form a tree with the execution order, so it is correct.

Making the Expression

The System.LINQ.Expressions.Expression is the class used to create the lambda expressions. The actual method to create the function is Expression.Lambda<T> and then call its compile function to turn it into a callable method.

The Lambda function requires two parameters, an Expression, and a ParameterExpression[]. The entries in the ParameterExpression[] are the parameters to the function and they are made by calling Expression.Parameter.

Expression Body

Each Expression object is a tree of Expression objects. The four methods used to create the operator functions (Expression.Add, Expression.Subtract, Expression.Multiply, and Expression.Divide) all take two Expression parameters (the left term and the right term), and each Expression can be one of three things, a constant (Expression.Constant), the supplied parameter (ParameterExpression), or another Expression.

With this, all that is necessary is to convert the EquationPart tree into an expression.

Additional Actions

It might be necessary to do additional actions in the expression, for example method’s output could be logged to the console. To do this, the Lambda Expression would now need to:

1. Calculate the result of the equation (calling the created equation).
2. Assign that value to a variable.
3. Write the variable contents out to the console.
4. Return the result stored in the variable.

Right now, the body of the Lambda Expression is the result of a single Expression object. All the actions culminate to a single result, but when adding logging, this changes. Calculating the result and logging it are separate unrelated actions. The Expression.Block groups Expressions together, and returns the value from the last executed Expression.

The first step is creating a variable using Expression.Variable it takes a Type and optionally a variable name.

Then assign the results of the body Expression to it:

Now the system can log the result, by using Expression.Call.

The Expression.Block method takes Expressions to be executed in the entered order. The only exception to this is the creation of the variable which much be passed into the method by a ParameterExpression[].

The full method with the console output looks like this:

If/Then

The methods use the double type resulting in the impossibility of a DivideByZeroException. Per the C# specification, it returns the value infinity.

To create a conditional statement use the Expression.Condition method which has three parameters (the Expression for the test, the true block, and the false block).

Test Condition

The test condition is an Expression, and the double type has a static method for checking for the infinity value. To use it, the Expression.Call method works just like it did with writing data to the Console.WriteLine.

True Block

If the condition is true (meaning that the value is infinity, then it should throw an exception indicating a problem. Expression has a method for throwing exceptions, Expression.Throw

Empty False Statement

A false statement isn’t necessary, because if the condition is false, it will continue to the next statement outside of the condition. The Expression.Condition will not allow null as the third parameter, so to have an empty false statement use Expression.Empty instead.

Try Catch

Instead of passing the exception to the calling method, a second option would be to log it first by wrapping the method contents in a try-catch block. The Expression.TryCatch method has two parameters: the expression which contains the body information in the try statement, and the CatchBlock. Expression.MakeCatchBlock has three parameters: the type of Exception the catch block is for, the ParameterExpression which allows the Expression to bind the Exception to a variable for use, and the Expression code inside the catch statement.

Expression.Rethrow

Expression.Rethrow has two method signatures. The first has not parameters, and the second has a parameter of type of Type. In this example, since it is the last statement in the catch block (the the statement in a block determines what is returned from the block), if you use Expression.Throw(), the application will return with this error: Body of catch must have the same type as body of try. This is saying that the the try and catch blocks must have the same return type. In the example, the try block returns type double, so the catch block must do the same. The overload for Expression.Throw(Type), tells the runtime “This catch statement will return this type if necessary.” Since it’s throwing the exception, it won’t ever return a value, but this tells the Expression generator this will be the intended behavior if an exception doesn’t occur.

Here are all the code examples.

Pushing Data

Consider the following two pieces of code:

and

Although they look like they are roughly the same, they produce two very different results. The first returns an enumeration of string and the second throw’s an InvalidOperationException.

The difference stems from how C# handles the yield statement. Both methods promise to return an object which implements IEnumerable<string>. In .NET 1 and 1.1, there was no yield statement, and to return a custom object (not an array) which implemented it, you had to create an object which satisfied the IEnumerable requirements. (Remember it’s not generic, because it was created before generics were added.)

When it was added in .NET 2.0, in order to satisfy the requirements of returning an object, the C# compiler turns the method which uses yield into an object with the IEnumerable<T> interface.

(Entire generated class).

The important thing to note is that the generated class implements IDispoable, and it is responsible for disposing of the SqlConnection and SqlCommand, not the original method. This means that as the IEnumerable<string> object returns through the calling methods, nothing will dispose them until the enumeration object itself is disposed. The other method does not do this, because the SqlConnection is injected and the using statement is outside of the GetNamesWithConnection’s control (and hence it is not included in the generated class the GetNamesWithConnection converts to). Once the enumerator returns from the method responsible for disposing the SqlConnection (in the above example InjectSqlConnection), the using statement’s scope exits and the SqlConnection’s Dispose method fires.

How to fix the InjectSqlConnection method?

The easiest solution is to ensure that all operations performed on the IEnumerable object which it retrieves should occur before the using statement completes and SqlConnection.Dispose fires. In many scenarios this isn’t and issue, but in the following example this isn’t a possibility.

This method is from a class which inherits the API Controller where the method’s job isn’t to act upon the data, but return the it so the invoking method can serialize and send it out the response stream.

In this scenario, the first option would be to take the the SqlConnection out of the using statement, and let the Garbage Collector (eventually) dispose of it. This is a bad idea, and really isn’t a viable solution. Not disposing of SqlConnections will eventually use up all the available ones in the connection pool.

The second option would be to manually enumerate through the returned enumeration.
change this:

to this:

This is a possible solution, but it is a lot more work on the part of writing the code (since you would have to do this to each method returning data) and more work on the part of the system (keeping track multiple enumerators (both the original and the new), etc.). It’s a viable option, but it’s not ideal.

A third option would be to convert the returned enumeration to an object which doesn’t require a SqlConnection to operate such as an Array or a List.

This works, because the ToList() method creates a List object which implements IEnumerable and the ToList method loops through the contents of the enumeration adding each item to the list. Lists (and Arrays) exist purely in memory and is why the SqlConnectionyield return statement which only loads the record it needs at that moment (this is referred to as Lazy Loading).

Request.RegiserForDispose

The ApiController object has a property named Request which returns an object of type HttpRequestMessage. This object has a method named RegisterForDispose which returns void and takes in an IDisposable object. Any object passed into this method will be disposed once the current request finishes which is after the contents have been added to the response stream.

Encasing the SqlConnection in a using statement no longer becomes necessary to ensure it gets disposed. This allows the SqlConnection’s creation and registration to dispose to be abstracted into it’s own method. As long as the system retrieves all connections from this method, it will be impossible to leave a connection undisposed.

Just write it here, I’ll handle the rest

It’s pretty common knowledge in a .NET console application using the following command will produce the following result.

Console output

On the surface, this doesn’t look like it’s that helpful. Console applications can immediately output a message, but most applications don’t run in the console, and those that do, run in a process not visible to the user. However, System.Console has a method which makes it ideal for logging: Console.SetOut. Passing in a TextWriter object changes the system’s default output behavior and pushes the entries to a different location.

The following example outputs the contents to “C:\temp\logfile.txt” instead of the default console itself.

Text output

This allows the application to change the destination based on the need, and the destination can be anywhere. OverrideThe output class must inherit the TextWriter class, but this doesn’t mean that it has to write to a file, screen, or something which ultimately outputs to a stream. Override the appropriate methods in the TextWriter and handle the data in any manner necessary. (Here are two examples: 1. XmlLogWriter, 2. DbLogWriter)

The real advantage comes when writing larger applications where the solution consists of multiple projects requiring a way to communicate information about the system. When using a standard logging framework, every project must have knowledge of the framework to use it. This becomes cumbersome if core projects are used across multiple solutions, or if you want to change the logging framework at some point in time in the future. System.Console is available to all .NET assemblies by default, and by using it (and Console.Error.Write(Line)), the system components do not need to reference anything additional.

Along with this, the Console allows SetOut to be called at any time in the application’s lifetime, so it’s possible to change the output stream while the program is executing. This allows for the output to be changed on the fly should the need arise.