Secure your websites: Sanitize user HTML!
HarperframeInc

Stop making insecure pages from user input

Huh. That sounds like a weird thing to worry about. Yet sanitizing user content is something many developers forget.

Ideally you would never render user-supplied HTML at all, but in case you have to, use sanitize-html!

This tool is literally your hero. Several JS Markdown parsers
suggest sanitize-html to clean up user input into something safe.

But why?

Did I not give you enough?

Okay, maybe not. But still - if you don't know, XSS (cross-site scripting) is a way attackers can capture people's cookies, inject code, and in general steal personal info and take over accounts. If user content isn't sanitized, an attacker can submit a script tag that scrapes visitors' cookies (including login tokens, personal info, etc.) and gets you in trouble. Even a simple <script>alert("Gimme 500$")</script> will run for everyone who views the page.
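To make the risk concrete, here is a minimal sketch of a page template that drops a comment straight into HTML. The `renderComment` helper and the sample strings are hypothetical, made up for illustration; they are not part of sanitize-html.

```javascript
// Hypothetical template helper: inserts the comment text into HTML as-is.
function renderComment(author, body) {
  return `<div class="comment"><b>${author}</b>: ${body}</div>`;
}

const harmless = renderComment('alice', 'Nice post!');
const malicious = renderComment('mallory', '<script>alert("Gimme 500$")</script>');

console.log(harmless);
console.log(malicious);
// The second string still contains a working <script> tag, so a browser
// that receives it as part of the page would run it.
```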

Can't I just remove the "<" character?

You could, but hand-rolled filters like that can be bypassed anyway. Plus the
sanitize-html approach is better because it can limit tags, so the <script> tag
can be disabled while <img> tags can still be used.
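As a rough illustration of why homemade filters fall short, the sketch below shows two of them being bypassed. `stripScriptTags` and `stripAngleBrackets` are hypothetical names invented for this example. This is not how sanitize-html works internally; it only shows the kind of input such filters miss.

```javascript
// Naive filter 1: delete every <script> and </script> it finds, in one pass.
function stripScriptTags(input) {
  return input.replace(/<\/?script>/gi, '');
}

// Naive filter 2: delete every "<" and ">" character.
function stripAngleBrackets(input) {
  return input.replace(/[<>]/g, '');
}

// Bypass 1: removing the inner tags glues the outer pieces back together.
const nested = '<scr<script>ipt>alert(1)</scr</script>ipt>';
const afterFilter1 = stripScriptTags(nested);
console.log(afterFilter1); // <script>alert(1)</script>

// Bypass 2: inside an attribute value a quote is enough, no "<" needed.
const altText = stripAngleBrackets('" onmouseover="alert(1)');
const imgTag = `<img src="/avatar.png" alt="${altText}">`;
console.log(imgTag); // <img src="/avatar.png" alt="" onmouseover="alert(1)">
```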

Okay, you're right, so how do I use sanitize-html?

It's really simple, but first of all you need to choose how
you are going to use sanitize-html.

Browser-side: If you want the browser to sanitize user HTML both on the way
in and on the way out, you can load sanitize-html from a CDN. I used this method for projects like Webby (website builder). If your web server is written in JS, though, I recommend sanitizing on the server side instead. If your web server is written in another language, use the browser-side method.

NodeJS: The recommended method. The server sanitizes the
message, removing unwanted tags and making the content safe for everyone. Use this
if you are writing your web server in NodeJS. The package is published on NPM as sanitize-html.

Browser-side

Okay, I'm assuming that you're writing a server in another language, like Python. If that's true, link the CDN build in your HTML page.
If you still want the server to sanitize HTML, look for a
package in that language that can sanitize HTML.

Both Browser-side and NodeJS methods have the same code.

NodeJS

Okay, I'm assuming that you're writing a server in NodeJS. If that's true, install the package via NPM.

Both Browser-side and NodeJS methods have the same code.

Let's get started

So here's how you sanitize HTML.

Step 1
Use the sanitizeHtml function.

var cleanHTML = sanitizeHtml(dirtyHTML);

and you're done!
Well... that was short. But that doesn't mean you should stop reading.

Specify which tags you do and don't want.

By default, sanitize-html uses this set of allowed tags and attributes (the exact list depends on the version you install).

allowedTags: [ 'h3', 'h4', 'h5', 'h6', 'blockquote', 'p', 'a', 'ul', 'ol',
  'nl', 'li', 'b', 'i', 'strong', 'em', 'strike', 'abbr', 'code', 'hr', 'br', 'div',
  'table', 'thead', 'caption', 'tbody', 'tr', 'th', 'td', 'pre', 'iframe' ],
disallowedTagsMode: 'discard',
allowedAttributes: {
  a: [ 'href', 'name', 'target' ],
  // We don't currently allow img itself by default, but this
  // would make sense if we did. You could add srcset here,
  // and if you do the URL is checked for safety
  img: [ 'src' ]
},
// Lots of these won't come up by default because we don't allow them
selfClosing: [ 'img', 'br', 'hr', 'area', 'base', 'basefont', 'input', 'link', 'meta' ],
// URL schemes we permit
allowedSchemes: [ 'http', 'https', 'ftp', 'mailto' ],
allowedSchemesByTag: {},
allowedSchemesAppliedToAttributes: [ 'href', 'src', 'cite' ],
allowProtocolRelative: true,
enforceHtmlBoundary: false

Some people may not like having these default tags.
You can simply change them using:

var cleanHTML = sanitizeHtml(dirtyHTML, {
  allowedTags: ["b", "i", "em", "strong", "a"],
  allowedAttributes: {
    'a': ["href"]
  }
});

Defaults are applied for any keys you leave out, so in this example allowedSchemes still uses the default list.
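A rough way to picture this is your options being laid over the defaults, key by key. The sketch below uses a trimmed-down, hypothetical `defaults` object and a hypothetical `resolveOptions` helper rather than the library itself, and it assumes a shallow merge, so check the documentation for the exact behavior in your version.

```javascript
// Stand-in for the library defaults (shortened for the example).
const defaults = {
  allowedTags: ['p', 'a', 'b', 'i', 'strong', 'em'],
  allowedAttributes: { a: ['href', 'name', 'target'] },
  allowedSchemes: ['http', 'https', 'ftp', 'mailto'],
};

// Hypothetical helper: a key you pass replaces the default for that key;
// a key you leave out keeps its default value.
function resolveOptions(userOptions) {
  return Object.assign({}, defaults, userOptions);
}

const resolved = resolveOptions({
  allowedTags: ['b', 'i', 'em', 'strong', 'a'],
  allowedAttributes: { a: ['href'] },
});

console.log(resolved.allowedAttributes); // { a: [ 'href' ] }
console.log(resolved.allowedSchemes);    // [ 'http', 'https', 'ftp', 'mailto' ]
```

Under that assumption, a key you pass replaces the whole default value for that key, so the `allowedAttributes` above no longer includes `name` or `target` for links.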


According to the documentation, you can simply add tags with:

var cleanHTML = sanitizeHtml(dirtyHTML, { allowedTags: sanitizeHtml.defaults.allowedTags.concat([ 'img' ]) });

Using sanitizeHtml.defaults, you can access the default settings
used by sanitizeHtml, which is nice if you simply want to remove
or add a few tags.
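Removing a tag works much like adding one, just with `filter` instead of `concat`. In this sketch, `defaultTags` is a shortened, hand-copied stand-in for `sanitizeHtml.defaults.allowedTags`, and `withoutTags` is a hypothetical helper, not part of the library.

```javascript
// Stand-in for sanitizeHtml.defaults.allowedTags (shortened copy).
const defaultTags = ['p', 'a', 'b', 'i', 'pre', 'iframe'];

// Hypothetical helper: returns a new list without the unwanted tags,
// leaving the original array untouched.
function withoutTags(tags, unwanted) {
  return tags.filter((tag) => !unwanted.includes(tag));
}

const allowedTags = withoutTags(defaultTags, ['iframe']);
console.log(allowedTags); // [ 'p', 'a', 'b', 'i', 'pre' ]

// With the real library this would look roughly like:
// sanitizeHtml(dirtyHTML, {
//   allowedTags: withoutTags(sanitizeHtml.defaults.allowedTags, ['iframe'])
// });
```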


Want all tags?

var cleanHTML = sanitizeHtml(dirtyHTML, { allowedTags: false });

Setting allowedTags to false lets sanitizeHtml know that you want to
allow all tags. Be careful: that includes <script>, so only do this with content you trust.


Don't want any tags?

var cleanHTML = sanitizeHtml(dirtyHTML, { allowedTags: [] });

As this suggests, an empty list means no tags are allowed, so the
tags are stripped and only the text content is left.


That's mostly it.

Like all tutorials, this one is here to help you understand
why you should protect your site against XSS attacks and how libraries
like sanitize-html can help.

sanitize-html is more advanced than you think. Read the documentation
to learn about more features such as allowed URL schemes, transform tags, and allowed CSS classes.
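As a small taste of those features, here is an options fragment that combines them. The option names (`allowedSchemes`, `transformTags`, `allowedClasses`) are the ones described in the documentation, but the values are made up for illustration, so treat this as a sketch and verify it against the version you install.

```javascript
// Options fragment only; pass it as the second argument to the sanitizer.
const options = {
  allowedTags: ['p', 'a', 'ul', 'li'],
  allowedAttributes: { a: ['href'] },
  // URL schemes permitted in link targets.
  allowedSchemes: ['https', 'mailto'],
  // Rewrite <ol> into <ul> instead of dropping it.
  transformTags: { ol: 'ul' },
  // Class names to keep on <p>; other classes are discarded.
  allowedClasses: { p: ['fancy', 'simple'] },
};

// const cleanHTML = sanitizeHtml(dirtyHTML, options);
```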

I created this tutorial to help people protect their websites
from such malicious attacks. Better safe than sorry!

Comments
KumABaker

Your post is really very useful. Thank you very much for sharing this information with us. I am a website developer, and my client shared the https://gamblingorb-be.com/online-gokken/ website with me and told me he wants to use sanitize-html, but I had never done that before, so I went looking for it online. Thanks for the short explanation. Now I know the whole process because you explained it very well.

JimmieLarson

Thanks for the information.

zplusfour

Very helpful!