It’s OK, My eval is Sandboxed (No It’s Not)

The idea of using eval has always been in interesting debate. Instead of writing logic which accounts for possibly hundreds of different scenarios, creating a string with the correct JavaScript and then executing it dynamically is a much simpler solution. This isn’t a new approach to programming and is commonly seen in languages such as SQL (stored procedures vs. dynamically generating statements). On one hand it can save a developer an immense amount of time writing and debugging code. On the other, it’s power is something which can be abused because of it’s high execution privileges in the browser.

The question is, “should ever be used?” It technically would be safe if there is a way of securing all the code it evaluates, but this limits its effectiveness and goes against its dynamic nature. So with this, is there a balance point where using it is secure, but also flexible enough to warrant the risk?

For example purposes, we’ll use the following piece of code to show the browser has been successfully exploited: alert(‘Be sure to drink your Ovaltine.’); If the browser is able to execute that code, then restricting the use of eval failed.

In the most obvious example where nothing is sanitized executing the alert is trivial:

eval will treat any input as code and execute it. So what if eval is restricted to only execute which will correctly evaluate to a complete statement?

Nope, this still successfully executes. In JavaScript all functions return something, so calling alert and assigning undefined to total is perfectly valid.

What about forcing a conversion to a number?

This still executes also, because the alert function fires when it is parsed and its return value is converted to a string and then parsed.

The following does stop the alert from firing,

But this is rather pointless, because eval isn’t necessary. It’s much easier to assign the value to the total variable directly.

What about overriding the global function alert with a local function?

This does work for the current scenario. It overrides the global alert function with the local one but doesn’t solve the problem. The alert function can still be called explicitly from the window object itself.

With this in mind, it is possible to remove any reference to window (or alert for that matter) in the code string before executing.

This works when the word ‘window’ is together, but the following code executes successfully:

Since ‘win’ and ‘dow’ are separated, the replacement does not find it. The code works by using the first eval to join the execution code together while the second executes it. Since replace is used to remove the window code, it’s also possible to do the same thing to eval like so:

That stops the code from working, but it doesn’t stop this:

It is possible to keep accounting for different scenarios whittling down the different attack vectors, but this gets extremely complicated and cumbersome. Further more, using eval opens up other scenarios besides direct execution which may not be accounted for. Take the following example:

This code bypasses the replace sanitations, and it’s goal wasn’t to execute malicious code. It’s goal is to replace the JSON.parse with eval and depending on the application might assume that malicious code is blocked, because JSON.parse doesn’t natively execute rogue code.

Take the following example:

The code does throw an exception at the end due to invalid parsing, but that isn’t a problem for the attacker, because eval already executed the rogue code. The eval statement was used to perform a lateral attack against the functions which are assumed not to execute harmful instructions.

Server Side Validation

A great extent of the time, systems validate user input on the server trying to ensure harmful information is never stored in the system. This is a smart idea, because removing before storing it tries to ensure everything accessing potentially harmful code doesn’t need to make certain it isn’t executing something it shouldn’t (you really shouldn’t and can’t make this assumption, but it is a good start in protecting against attacks). With eval, this causes a false sense of security, because code like C# does not handle strings the same way that JavaScript does. For example:

In the first example, the C# code successfully removed the word ‘window’, but in the second, it was unable to interpret this when presented with Unicode characters which JavaScript interprets as executable instructions. (In order to test the unicode characters, you need to place an @ symbol in front of the string so that it will treat it exactly as it is written. Without it, the C# compiler will convert it.) Worse yet, JavaScript can interpret strings which are a mixture of text and Unicode values making it more difficult to search and replace potentially harmful values.

Assuming the dynamic code passed into eval is completely sanitized, and there is no possibility of executing rogue code, it should be safe to use. The problem is that it’s most likely not sanitized, and at best it’s completely sanitized for now.

Project Oxford – Image Text Detection

Microsoft has a new set of services which use machine learning to extrapolate data from images and speech called Project Oxford. Each service has a public facing REST api allowing any program capable of communicating through HTTP to utilize it, and for smaller amounts of data, Project Oxford is free to use.

Microsoft classifies these services into three categories: vision, speech, and language. The vision section contains several different capabilities including facial recognition, facial emotion detection, video stabilization, and optical character recognition (OCR). It’s OCR api allows a system to send an image URL and in turn will return the text it detected. Since the service is system agnostic, it’s possible to create an HTML page (with a little help from a service like Imgur) which can upload an image and extract the text without using server side resources. (The completed example can be found here.)

How to create

Uploading the Image

The first step is uploading the image to a publicity accessible store. Imgur works well for this, because it’s easy to use and free. It requires an account and registering the “application”, but this is relatively quick and painless. The page to do so is found here, and with the api key, all that is left is to retrieve the image from the input and upload the file.

Uploading the File

HTML 5 has the input type: file. To upload the file when the user selects it, add an onchange event to the input.

Now the function uploadImage fires whenever the user select a new file.

With the help of the FileReader, the webpage can load the image into memory to send to the server. It has several methods for retrieving files such as readAsText and readAsBinaryString, but the one which allows the image to upload correctly is readAsDataURL as this uploads the image in the correct encoded format.

Reading the file is an asynchronous operation and requires a method to call when loading. There is a property in the FileReader object named onload and has event parameter, but it’s not necessary in this context, because the reader’s result object is available through closure, so omitting it is fine.

Sending the File to Imgur

Whether using JQuery, another framework, or plain JavaScript, the upload process is straight forward. This example is in JQuery.

The Imgur api exposes the https://api.imgur.com/3/image URL, and to upload an image, use POST and add the HTTP header Authorization with the client id: Client-ID application_client_id. The trickiest part concerns the file upload from the FileReader. The readAsDataURL returns more data than necessary and the contents look like

The comma and everything before it is extraneous and will cause an error, so it’s necessary to remove it. Using String.Split(‘,’) and picking the second item in the array is the easiest approach.

Posting the image to Imgur returns several pieces of data about the object, height, width, size, etc., but the key pieces are, link (which is the URL to the image), and deletehash (this will be used to remove the image once completed. The api documentation says in some cases the id can be used, but since this is an anonymous posting, the delete hash is necessary).

Data OCR

Project Oxford requires a login to grant the necessary api key. After signing up and logging in, the service portal lists a page with access to the various API keys (there is one for each service). The once necessary for OCR looks like:

get api key for ocr

To retrieve the image text information, POST to: https://api.projectoxford.ai/vision/v1/ocr. It requires the HTTP Header Ocp-Apim-Subscription-Key with the value being the client id retrieved from the key dashboard. The data contains the URL as the key, and there are two optional querystring parameters which the request can have: language and detectOrientation.

language

The parameter language specifies which language to use when performing the OCR. If it’s not provided, the service attempts a determination based on the image and return what it thinks is correct. This can be helpful, if the language is unknown, and goal of the OCR process is to extract the text and then translate it. (Unfortunately, Microsoft has disabled the features, mainly Cross Origin Request Sharing, for retrieving the authentication token necessary to create an application not having any server side processing in it’s Microsoft Translator API). Not including leaves the service up to guess and can provide incorrect results.

The parameter is the BCP-47 language code which follows ISO 639-1. (A lowercase two character code representing each language e.g. en for English, es for Spanish, etc.) At this time the API only supports about 20 languages, but Microsoft has said it is working to expand the list. The API documentation lists the supported languages under the language parameter.

detectOrientation

This is a boolean parameter indicating it should attempt to recognize the text and orient it to be parallel with the top of image’s bounding box. Sometimes this can help in how it groups the returned text.

Response

The service returns an object looking like this:

language

The language property is what the service thinks the language in the image is, and if there are multiple languages in the text, it still only returns one. The service makes its best determination and won’t necessarily pick the first it encounters.

TextAngle

TextAngle is the tilt the service thinks the image is set to, and orientation is the direction the top text is facing. If the text is facing 45 degrees to the right, the text angle would be “-45.000” and the orientation would be “Right”.

Regions

multilanguage The found text in the image is not lumped together when returned. There are separate entries for each section of text found. In the example to the right, Hola and Hello are found in two different areas, so there are two regions returned in the array. Each region has a property boundingBox which is a list of comma separated coordinates where the text region exists on the page. In each region there is a lines property which is an array of objects each with their own boundingBox and each line has a words property object array separating each word and also containing the boundingBox property.

Removing the Image

Removing the image is similar to the initial POST action. This time the action is DELETE and the URL has the deletehash appended to the end.

The Javascript code can be found here.

Minutes and Seconds

When dealing with dates in .NET, most applications use the System.DateTime struct to store and manipulate dates. It has a natural conversion from MSSQL Server’s DateTime datatype, and has little difficulty in translating into different data formats (JSON, XML, etc.)

For a while now JSON has been a standard in communicating between applications, web clients and servers, to and from database and so on, and serializing a DateTime object looks like this converting it into the ISO 8601 format:

and converting it back works exactly as expected:

Serializing the date and parsing it with JavaScript on some browsers (such as Chrome for Mac and Windows, and Safari) yields different results. The browser has moved the date 4 hours into the past, because JavaScript has assumed the time it parsed was in UTC.

Secondly, JavaScript based its interpretation off of an incorrect timezone assumption. In 1923, the majority of the United States stayed on standard time the entire year, and those that moved their clocks forward did so on April 29. Daylight Savings Time was not in effect, and throughout history, the use of it, and when it was applied, has varied. Even if the passed in date was meant to be in UTC, the conversion should have made it Mon March 19, 1923 19:00:00 GMT -0500 (EST).

This also means when serializing the date to JSON and sending it back to the server, it sends it as it’s assumed UTC date.

At first glance, this looks correct, but changing the date to 1923-05-20T00:00:00.000Z (adding the Z at the end, indicating the date is in UTC) yields 3/19/1923 7:00:00 PM showing that the .NET DateTime object is trying to apply the Time Zone Offset. Now the question is, “Why is .NET deserializing the date as 3/19/1923 7:00:00 PM and not 3/19/1923 8:00:00 PM like JavaScript does especially since the serialized date came from JavaScript after it interpreted the original date as UTC?” The answer is because JavaScript only interpreted the date as UTC and displayed it as the timezone the machine is set in. The date it had didn’t change, so when it serialized it to the ISO 8601 format, JavaScript only appended the Z thinking the date was already in UTC.

Sending the Time Zone Offset when serializing from .NET, fixes the inconsistencies in some instances, but not all of them.

Since the time zone wasn’t specified, .NET assumes it is the local time of the server and produces 2016-03-11T19:00:00-05:00. Loading that string into a JavaScript Date object produces the same results although in UTC:

One of the issues with creating DateTimes in .NET and not explicitly setting the timezone is that it can make incorrect assumptions. Setting the timezone to Indiana (East) and parsing the following date 1989-05-12T00:00:00.000Z yields 1989-05-11T20:00:00-04:00 showing the UTC conversion to Eastern Daylight Time. Indiana didn’t follow Daylight Savings Time until 2006, meaning the conversion is incorrect. (Even if it did do the conversion correctly, setting the Set Timezone Automatically flag in Windows 10 forces it to be Eastern Time)

Explicitly setting the timezone using the DateTimeOffset corrects the problem in .NET, but it doesn’t guarantee correctness across systems.

When the browser parses the date with the correct offset, it attempts to display the date with the offset it thinks it should be.

The data in the Date object is still correct, and exporting it will not alter it in any way. Its attempt to display the date in a local time is the issue. In the United States, the only dates that are affected are prior to 2007, because that is when the last time a change to the timezones occurred. (The start of Daylight Savings Time was moved ahead to the second Sunday in March, and then end extended to the first Sunday in November. If you load a date from the year 2004 into JavaScript in Chrome, the date will incorrectly show for all dates between March 14 and April 4.)

The real solution to handling dates is to use libraries specifically designed to handle timezones and Daylight Savings Time (MomentJs for JavaScript and NodaTime for .NET). These significantly reduce the inconsistencies found in using the built in date and time conversions found in .NET and in Browsers, but even these can have mistakes in them.

JSIooa nvrtai Sncgtpi -> Sorting In JavaScript

JavaScript has built in functionality to sort arrays, but it doesn’t always work the way you think it would.

Take the following array:

Using the Array.Sort function (which all arrays have because of inheritance)

Orders the array in the following order:

Default Sort Running Example

Even though the array is numerical, the default Array.Sort places the 15 first in the series. The default comparison function sorts values by their Unicode characters and not numerical ones. In short, it treats them like strings.

There is an easy fix, the Array.Sort method takes a function as a parameter which allows you to define how the comparison elements are treated:

Now the sort order comes out as:

Custom Sort Running Example

Great! Now the numbers are sorting by their numerical value. Now what if you want the undefined values to appear first in the array? This should be an easy change:

The sort order? It comes out exactly as before:

Put Undefined First Example

So what’s going on?

JavaScript Sort Order Specification

If you look at steps 11 and 12 in the sort order steps in the JavaScript (ECMA Script) specification:

If x is undefined, return 1.
If y is undefined, return −1.

JavaScript automatically handles undefined values when comparing elements in an array. There is no way to override this with a custom comparison function, because the undefined check happens before the comparison is called.

There is a way around this predicament, although it’s not ideal. The current comparison function sorts the numbers in ascending order, and automatically forces undefined values to be last. By changing the way the comparison to put the values in descending order, it forces the elements in the array to be in the correct position in regards to one another but in reverse. To fix the reverse order problem, simply call Array.Reverse() and the order comes out the following way:

This solution causes extra work, and with large datasets the amount of extra time could become noticable, but it does work in a pinch. The other solution would be to completely ignore the Array.sort method and write a custom one.

Sort In Reverse

Language Comparison Euler Problem One

Question:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

F#

F# Example One

or

F# Example Two

C#

C# Example One

or

C# Example Two

JavaScript

JavaScript Example One

SQL

or

PowerShell

or

Where’s The Scope

When I started programming, it was customary to declare the variables at the top of the method. Many people stated that it was clearer to declare everything at the beginning and in the order it was used (or alphabetically for easier searching), and some languages, such as Delphi (Turbo Pascal) even required it. As the years passed over the past fifteen years, this has been a debated topic.

From my experience, the shift has been towards only declaring a variable just before needing it. In truth this makes a lot of sense, it allows the compiler to enforce scope and allows programmers the luxury of having it not compile when a variable is used when it shouldn’t be. This comes with the drawback that variable declarations become more difficult to find when searching through. If it’s declared at the top, then it’s always in the same place, although in truth modern IDEs alleviate this problem as it’s normally a single key stroke to find its declaration.

So, what is the correct answer between the two styles? As a blanket rule, the answer is: they’re both wrong. Where to declare variables is significantly different than what to name them. Most languages don’t care about what a variable is named. As long as they follow certain rules (don’t start with a number in some, or must start with a $ in others), the compiler/interpreter doesn’t really do much else with it. Where they are declared though can cause significant changes to how the system functions, and most developers forget that between languages this can make a huge difference.
Take C# and JavaScript for example. They look somewhat the same. Are they? No, they aren’t really even close. In C# variables are created at the branch level:

Running Example

Non-running example

In JavaScript they are declared at the function level:

Running Example

The second alert shows “undefined” instead of “I’m Global” even though it’s at the top of the function, because JavaScript moves all variable declaration to be at the top of the method regardless of where they are declared. (Instantiating the variable when it’s declared in the branch doesn’t initialize it until the branch either. Only the declaration is moved to the top of the function).

Believing that all languages work the same, because one can write similar syntax with error can cause major problems. The language that trips a lot of people up and how they should be declared is SQL. Take the following example:

Here’s the output:

Notice how it only prints out 10 even though the counter is being updated each loop? If the @LocalVariable was actually initialized each time the loop executed, it would have counted down to 1 because it would be set back to NULL each time.

What you have to remember, is that even though you wrote it a certain way doesn’t mean that the compiler/interpreter will output the exact same instruction set you thought it would. Assuming a variable will behave a certain way without testing or explicitly setting it, can be the difference between working code, and spending several hours tracking down a problem.