HTML+JS injection hidden gotcha

When you build a website that lets visitors upload text, you always need to escape special characters (tags, ampersands, quotes, etc.), but this is quite common knowledge. Let's go a bit deeper... For this example I will use PHP, JavaScript and XHTML fragments (PHP has nothing to do with this bug).

Let's say we need to create a simple list of links that change the text in a div container depending on which link you click. This text was uploaded to our page by visitors some time ago, so it might contain any imaginable crap...

One way to solve this is to create a JS function in our XHTML script block that takes a single text argument and, when called, changes the text in that div container. Something like this:

function showText( text ) {
    var el = document.getElementById( 'div_id' );
    // innerHTML makes the browser parse the string as markup
    el.innerHTML = text;
}

Everything seems reasonable so far. Now we need to call this function from an HTML link, so we put an onclick event handler there:

.... onclick="showText( ---TEXT WILL BE HERE--- );" ....

The text we want to put between those parentheses is stored in our database, so we use PHP to get it out of there and, using a template engine or inline PHP code, we put it between the parentheses like this:

.... showText( '<?php echo htmlspecialchars( $text ); ?>' ); ....

Note that the htmlspecialchars() function was used to neutralize dangerous HTML characters. We expect our text to be escaped, and if we open the website in a browser, the source dump shows that the text has indeed been escaped successfully. Note that if you view the source in Firefox or some other popular browsers, you might see modified source code. In that case it is better to use curl or a similar program to see the actual source. I have found that the Konqueror browser also does not modify the source, so you may try using it.

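Everything seems just fine until... Until we click the link. The text that appears in the div container is no longer escaped! Why? Well, the browser itself unescaped it: the HTML parser decodes the entities in the onclick attribute value before the JS code runs, so showText() receives the raw text and innerHTML then parses it as markup. Cool, huh?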
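
The unescaping described above can be imitated outside a browser. The sketch below is only an approximation: escapeHtml stands in for PHP's htmlspecialchars() called with ENT_QUOTES, decodeAttribute stands in for what the browser does to an attribute value, and both helper names and the sample text are made up for illustration.

```javascript
// Rough stand-in for htmlspecialchars( $text, ENT_QUOTES ); not the real function.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Rough stand-in for the entity decoding a browser applies to an attribute
// value before the JS engine sees it. Only the five entities above are handled;
// &amp; is decoded last so that "&amp;lt;" ends up as "&lt;", not "<".
function decodeAttribute(value) {
  return value
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#039;/g, "'")
    .replace(/&amp;/g, '&');
}

// Hypothetical text uploaded by a visitor.
const visitorText = '<img src=x onerror=alert(1)>';

// What ends up in the page source (and what curl shows).
const inSource = "showText( '" + escapeHtml(visitorText) + "' );";

// What the JS engine actually runs when the link is clicked.
const seenByJs = decodeAttribute(inSource);

console.log(inSource); // showText( '&lt;img src=x onerror=alert(1)&gt;' );
console.log(seenByJs); // showText( '<img src=x onerror=alert(1)>' );
```

The second line is what reaches innerHTML, which is why the markup comes back to life.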

Now, why is this dangerous, you might ask. Well, it opens the way to inject links, make the XHTML (HTML) invalid, or inject JS that does nasty things on your website... JS may even steal your visitor's session id and help a hacker gain control over the user's account...
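
The JS injection case deserves a closer look, because it does not even need innerHTML: an escaped single quote is decoded back into a real quote and closes the string early. Here is a sketch, assuming the text was escaped with ENT_QUOTES so that ' became &#039; in the source. The stealSession name is hypothetical, and new Function is used only to imitate how a browser compiles an onclick attribute into a handler body.

```javascript
// What the page source contains for the visitor text:  '); stealSession(); //
const attributeInSource = "showText( '&#039;); stealSession(); //' );";

// The browser decodes the entity before compiling the handler.
const handlerCode = attributeInSource.replace(/&#039;/g, "'");
console.log(handlerCode); // showText( ''); stealSession(); //' );

// Imitate the click: compile the decoded attribute as a function body and run it
// with harmless stand-ins for showText and stealSession.
const calls = [];
const handler = new Function('showText', 'stealSession', handlerCode);
handler(
  function (text) { calls.push('showText:' + text); },
  function () { calls.push('stealSession'); }
);

console.log(calls); // [ 'showText:', 'stealSession' ]
```

The visitor's text was never displayed at all; part of it ran as code instead.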

I don't know whether this is a bug in many browsers or a feature (every bug that cannot be fixed is a feature in this case), but I would recommend not stepping into this sh...
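
One way to avoid stepping into it, as far as I can tell, is to escape once per layer: first turn the text into a JS string literal, then HTML-escape that literal for the attribute, and finally display it as plain text instead of markup. The sketch below shows the idea in JS; escapeHtml, buildOnclick and showTextIn are made-up names, and JSON.stringify is used here as a convenient way to produce a valid JS string literal.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical helper: build the value of the onclick attribute.
function buildOnclick(text) {
  // Layer 1: a JS string literal, with quotes and backslashes escaped.
  const jsLiteral = JSON.stringify(text);
  // Layer 2: HTML-escape the whole call so it is safe inside the attribute.
  return escapeHtml('showText( ' + jsLiteral + ' );');
}

// Variant of showText() that takes the target element as an argument and
// treats the text as plain text: textContent never parses markup.
function showTextIn(el, text) {
  el.textContent = text;
}

console.log(buildOnclick("'); stealSession(); //"));
```

When the browser decodes the attribute, what is left is a properly quoted JS string, so the quote can no longer close it early. On the PHP side the same layering would presumably be json_encode() followed by htmlspecialchars().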

Also, please comment on this post and correct me if I am wrong.
