PSA: You should sanitize user input
SixBeeps (5221)

If I had a nickel for every chat room I've used that simply doesn't sanitize, well, I'd have at least 15 cents. That's not a lot of nickels, but it's 3 nickels too many.

What even is sanitization?

When you have an application like a chat room, the user types in what they'd like to send, and you display it for everyone else to see. Usually this is plain text, but if it's left unsanitized, they could just as easily send HTML. That doesn't sound too bad until you realize they can send JavaScript too. That's not good.

So, to prevent this, the programmer should sanitize their user input. Sanitization is the process of turning HTML code into something that doesn't get parsed as HTML. There are many ways of doing this.
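To make that concrete, here's a minimal sketch of one common approach: swapping the characters HTML treats as special for their entities. The names escapeHtml and HTML_ENTITIES are made up for this example, and it assumes the result ends up in ordinary element content, not inside a script or style block.

```javascript
// Characters that are special in HTML, mapped to their entities.
// (HTML_ENTITIES and escapeHtml are illustrative names, not a library API.)
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace every special character so the browser shows it as text.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml('<script>alert("hi")</script>'));
// prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```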

The hacky way

The hacky way involves replacing characters in a user's message. There are characters that look like the greater-than and less-than signs (> and < respectively) that won't get turned into HTML tags. Feel free to copy and paste these into your project.
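A rough sketch of that replacement, assuming the fullwidth forms ＜ (U+FF1C) and ＞ (U+FF1E) as the lookalikes. That's just an illustrative pick, and any other lookalike pair would work the same way. The function name toLookalikes is made up for this example.

```javascript
// Assumed lookalikes: fullwidth less-than and greater-than signs.
const LOOKALIKE_LT = "\uFF1C"; // ＜
const LOOKALIKE_GT = "\uFF1E"; // ＞

// Swap the real angle brackets for characters the browser
// will never treat as the start or end of a tag.
function toLookalikes(message) {
  return message.replace(/</g, LOOKALIKE_LT).replace(/>/g, LOOKALIKE_GT);
}

console.log(toLookalikes("<b>hello</b>"));
// prints: ＜b＞hello＜/b＞
```

It's hacky because the message is no longer what the user typed: anyone copying it out of the chat gets the lookalikes, not the real signs.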


The efficient way

JS has three similar properties on elements: innerHTML, innerText, and textContent (thank you @AdCharity for telling me about this). If you can, use either innerText or textContent, because the browser treats the value as plain text and never parses it as HTML.

The only reason you wouldn't use this is if you weren't using innerHTML in the first place. I don't see any way to display a message other than these properties, but you never know ¯\_(ツ)_/¯
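A minimal sketch of the textContent approach, assuming your chat has some container element to append messages to. The function name appendMessage and the "chat" id are made up for this example, and the doc parameter is only there so the function can be tried outside a browser.

```javascript
// Append a chat message to `container` as plain text.
// `doc` defaults to the page's document when run in a browser.
function appendMessage(container, message, doc = globalThis.document) {
  const line = doc.createElement("p");
  // textContent stores the string as text, so "<script>" is shown
  // literally instead of being parsed as markup.
  line.textContent = message;
  container.appendChild(line);
  return line;
}

// In a browser, assuming an element with id "chat" exists:
// appendMessage(document.getElementById("chat"), "<b>hi</b>");
```

The risky version of the same code would assign the message to innerHTML, which is exactly where the browser starts parsing tags.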


For the last time, SANITIZE

Edit: Relevant xkcd

robm99x (2)

Shouldn't need to say this ... but server-side validation and sanitizing are required for all web application requests (e.g. POST, PUT, and more). Never trust any request data without validation/sanitizing. Ever.
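As a rough sketch of what that could look like for a chat message, assuming the request body has already been parsed into an object with a message field. The function name validateMessage, the field name, and the 500-character limit are all made up for this example.

```javascript
// Hypothetical limit for this sketch; pick whatever suits your app.
const MAX_MESSAGE_LENGTH = 500;

// Validate a chat message taken from a parsed request body.
// Returns { ok: true, message } or { ok: false, error }.
function validateMessage(body) {
  if (body === null || typeof body !== "object" || typeof body.message !== "string") {
    return { ok: false, error: "message must be a string" };
  }
  const message = body.message.trim();
  if (message.length === 0) {
    return { ok: false, error: "message is empty" };
  }
  if (message.length > MAX_MESSAGE_LENGTH) {
    return { ok: false, error: "message is too long" };
  }
  return { ok: true, message: message };
}
```

This only checks the shape of the data. The message still has to be escaped or written with textContent when it's displayed.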

jhash (9)

In chat room boxes? Wow, you could steal a lot of cookies lol


ooo, bobby tables

SixBeeps (5221)

@TaylorLiang yes, lil bobby tables, killer of sql databases


Sanitization 2: Fun in PHP


@SixBeeps php not fun to learn.


@SixBeeps PHP fun when u see a scammer website
but that it


@SixBeeps php not fun

AdCharity (1322)

I'd say the best form of sanitation is to use .textContent . Basically the html tags show but don't do crap.


@AdCharity sir, you are under arrest for using the word: "crap"

SixBeeps (5221)

@AdCharity textContent and innerText function the same with some minor differences, which can be seen here.

AdCharity (1322)

@ipastrano :/ what a sad way to go about life.

SixBeeps (5221)

@AdCharity ayyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

AdCharity (1322)

@SixBeeps you're going to surpass me soon cause I haven't been creating much content lately (or checking out the feed)

SixBeeps (5221)

@AdCharity Honestly I've just been farming the Ask board a bunch

oignons (68)

@AdCharity You and your cycles! Now I have to change the post!!!! (Kidding, not angry, it's cool that you have a lot of cycles)

SixBeeps (5221)

@AdCharity Guess who just reached 769?

AdCharity (1322)

@SixBeeps nah you've already surpassed me

StringentDev (229)

@SixBeeps Did not stop this one

<style>*{color:white; background-color:white; border:none; transform: skewY(90deg);} 
StringentDev (229)

A good service that can sanitise inputs and even block attackers is sqreen.com.
Beware: enable all the other options BUT CSP (Content Security Policy) unless you want no JS or CSS.

cs906941 (3)
LiamDonohue (294)



@SixBeeps has shown us an amazing way to sanitize lol

pyelias (2639)

One sanitation problem I've seen is using a regex like <.*?> (without DOTALL) to remove html tags. This doesn't match tags containing newlines, carriage returns, line separators, or paragraph separators. It also turns <abc def=">"> into "> which, while safe (as far as I know), can be unexpected.
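Both problems are easy to reproduce. This is a small sketch in JavaScript, where . skips those same four line-ending characters unless the s (dotAll) flag is set; the function names are made up for this example.

```javascript
// The naive tag stripper: lazy match from "<" to the next ">".
const stripTags = (input) => input.replace(/<.*?>/g, "");

// Same regex with the s (dotAll) flag, so "." also matches newlines.
const stripTagsDotAll = (input) => input.replace(/<.*?>/gs, "");

console.log(stripTags("<b>hi</b>"));     // prints: hi
console.log(stripTags('<abc def=">">')); // prints: ">

// A tag split across two lines slips through untouched...
const multiline = "<img\nsrc=x onerror=alert(1)>";
console.log(stripTags(multiline) === multiline); // prints: true

// ...unless dotAll is on, in which case the whole tag is removed.
console.log(stripTagsDotAll(multiline) === ""); // prints: true
```

The dotAll flag fixes the newline case, but the "> leftover still happens with it.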

SixBeeps (5221)

@pyelias I don't know too much RegEx, but is it possible to make the match continue past the " marks, so that it works?

AdCharity (1322)

@SixBeeps regex matches patterns regardless of content... there’s probably not a one size fits all regex but I would probably just replace script tags with their equivalent html symbol using regex


SixBeeps (5221)

@AdCharity The problem is that it can match an incomplete tag if there is a > in an attribute string. My question is, how can this be prevented?

AdCharity (1322)

@SixBeeps ... Replace all characters with their appropriate character entity. You could do so using:

function makeEntities(rawStr) {
    return rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
        return '&#'+i.charCodeAt(0)+';';
        // credits to... some guy on my team did it