Simple HTML sanitizer in Javascript [closed]
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 months ago.
The community reviewed whether to reopen this question 6 months ago and left it closed:
Original close reason(s) were not resolved.
I'm looking for a simple HTML sanitizer written in JavaScript. It doesn't need to be 100% XSS secure.
I'm implementing Markdown and the WMD Markdown editor (the SO master branch from GitHub) on my website. The problem is that the HTML shown in the live preview isn't filtered like it is here on SO. I am looking for a simple, quick HTML sanitizer written in JavaScript so that I can filter the contents of the preview window.
No need for a full parser with complete XSS protection. I'm not sending the output back to the server. I'm sending the Markdown to the server where I use a proper, full HTML sanitizer before I store the result in the database.
Google is being absolutely useless to me. I just get hundreds of (often incorrect) articles on how to filter out JavaScript from user-generated HTML in all kinds of server-side languages.
UPDATE
I'll explain a bit better why I need this. My website has an editor very similar to the one here on Stack Overflow. There's a text area to enter Markdown syntax and a preview window below it that shows you how it will look after you submit it.
When the user submits something, it is sent to the server in Markdown format. The server converts it to HTML and then runs an HTML sanitizer on it to clean up the HTML. Markdown allows arbitrary HTML, so I need to clean it up. For example, the user types something like this:
<script>alert('Boo!');</script>
The Markdown converter does not touch it since it's HTML. The HTML sanitizer will strip it, so the script element is gone.
But this is not what happens in the preview window. The preview window only converts Markdown to HTML and does not sanitize it, so the preview window will contain a script element. This means the preview window is different from the actual rendering on the server.
I want to fix this, so I need a quick-and-dirty JavaScript HTML sanitizer. Something simple with basic element/attribute blacklisting and whitelisting will do. It does not need to be XSS safe because XSS protection is done by the server-side HTML sanitizer.
This is just to make sure the preview window will match the actual rendering 99.99% of the time, which is good enough for me.
Can you help? Thanks in advance!
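A quick-and-dirty whitelist of the kind described in the question might look roughly like the sketch below. It is regex-based, not a real HTML parser: it assumes reasonably well-formed tags with no `>` inside attribute values, and the `ALLOWED` whitelist and the `sanitizePreview` name are hypothetical, so mirror whatever your server-side sanitizer actually allows. As the question says, it only aims to keep the preview close to the server's output; it is not XSS protection.

```javascript
// Hypothetical whitelist: tag name -> attributes allowed on that tag.
// Mirror whatever your server-side sanitizer allows so the preview matches.
const ALLOWED = new Map([
  ['a', ['href', 'title']],
  ['img', ['src', 'alt', 'title']],
  ['b', []], ['i', []], ['em', []], ['strong', []],
  ['p', []], ['br', []], ['hr', []],
  ['h1', []], ['h2', []], ['h3', []],
  ['ul', []], ['ol', []], ['li', []],
  ['pre', []], ['code', []], ['blockquote', []],
]);

// Elements removed together with everything inside them.
const DROP_WITH_CONTENT = ['script', 'style'];

// URL schemes rejected in href/src. A convenience only; easy to bypass.
const BAD_URL = /^\s*(javascript|vbscript|data):/i;

function sanitizePreview(html) {
  // 1. Drop HTML comments.
  let out = html.replace(/<!--[\s\S]*?-->/g, '');

  // 2. Drop script/style elements including their content.
  for (const tag of DROP_WITH_CONTENT) {
    const re = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}\\s*>`, 'gi');
    out = out.replace(re, '');
  }

  // 3. Rewrite each remaining tag: keep whitelisted tags with whitelisted
  //    attributes; strip any other tag but keep the text around it.
  return out.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>/g, (_match, slash, name, attrs) => {
    const tag = name.toLowerCase();
    const allowedAttrs = ALLOWED.get(tag);
    if (!allowedAttrs) return '';
    if (slash) return `</${tag}>`;

    const kept = [];
    const attrRe = /([a-zA-Z-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)/g;
    let m;
    while ((m = attrRe.exec(attrs)) !== null) {
      const attr = m[1].toLowerCase();
      const raw = m[2];
      // Strip surrounding quotes from quoted values.
      const value = raw[0] === '"' || raw[0] === "'" ? raw.slice(1, -1) : raw;
      if (!allowedAttrs.includes(attr)) continue;
      if ((attr === 'href' || attr === 'src') && BAD_URL.test(value)) continue;
      kept.push(` ${attr}="${value.replace(/"/g, '&quot;')}"`);
    }
    return `<${tag}${kept.join('')}>`;
  });
}
```

In the preview you would run the converter's HTML output through `sanitizePreview` before writing it into the preview element. Stripping unknown tags but keeping their text (rather than dropping it) is a guess at typical server-side behavior; adjust it if your server sanitizer does otherwise.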
We've developed a simple HtmlSanitizer and open-sourced it here: https://github.com/jitbit/HtmlSanitizer
Usage
var result = HtmlSanitizer.SanitizeHtml(input);
[Disclaimer! I'm one of the authors!]
Another hint: as of May 2021 there is an upcoming Sanitizer API in Firefox.
const inputString = 'Some text <b><i>with</i></b> <blink>tags</blink>, including a rogue script <script>alert(1)</script> def.';
const result = new Sanitizer().sanitizeToString(inputString);
console.log(result);
// Logs "Some text <b><i>with</i></b>, including a rogue script def."
(MDN example)
See: https://developer.mozilla.org/en-US/docs/Web/API/HTML_Sanitizer_API
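If this feature is accepted by other vendors as well, it might help us get rid of JavaScript sanitizer implementations. The API is still experimental and its methods have changed since this was written, so check the MDN page before relying on the example above.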
Here is a 2 KB Vue component (it depends on Snarkdown, a 1 KB Markdown renderer; replace it with whatever you need) that renders escaped Markdown and can optionally translate `<b>` and `<i>` tags into Markdown formatting for content that may include them:
<template>
  <div v-html="html"></div>
</template>
<script>
import Snarkdown from 'snarkdown'
export default {
  props: ['code', 'bandi'],
  computed: {
    html () {
      // Convert <b> and <i> tags to Markdown emphasis if flagged...
      const unsafe = this.bandi
        ? this.code
          .replace(/<b>/g, '**')
          .replace(/<\/b>/g, '**')
          .replace(/<i>/g, '*')
          .replace(/<\/i>/g, '*')
        : this.code
      // Escape the HTML first ('&' must be replaced first), then process the Markdown...
      return Snarkdown(unsafe
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#039;')
      )
    }
  }
}
</script>
For comparison, vue-markdown is over 100 KB. This component won't render math formulas and the like, but most people don't use it for those things, so I'm not sure why the most popular Markdown components are so bloated :(
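This is safe against XSS through raw HTML, because every tag is escaped before rendering, and it's very fast. A Markdown link such as `[x](javascript:alert(1))` is a separate vector, though, and depends on whether your renderer filters link URLs.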
Why did I use `&#039;` and not `&apos;`? Because of this: Why shouldn't `&apos;` be used to escape single quotes?
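If you want to test the component's pre-processing outside Vue, the same steps can be pulled out into plain functions. This is a sketch: `escapeHtml` and `toSafeMarkdown` are hypothetical names, and the returned string is what the component hands to Snarkdown.

```javascript
// Escape the characters that are special in HTML text and attribute values.
// '&' is replaced first; otherwise the '&' of the entities added below would be escaped again.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Hypothetical helper mirroring the component: optionally turn <b>/<i> into
// Markdown emphasis (like the `bandi` prop), then escape everything else.
function toSafeMarkdown(code, bandi = false) {
  const source = bandi
    ? code.replace(/<\/?b>/g, '**').replace(/<\/?i>/g, '*')
    : code;
  return escapeHtml(source);
}
```

With these, the component's `html ()` would reduce to `return Snarkdown(toSafeMarkdown(this.code, this.bandi))`. One side effect of escaping `>` before rendering is that Markdown blockquotes (`> quote`) are no longer recognized.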
And now for something completely different, but related...
Not sure why this hasn't been mentioned yet... but your browser can sanitize for you.
Here is a 3-line HTML sanitizer that is fast because it uses the native HTML parser built into your browser instead of one written in JavaScript. Frameworks such as Vue use the same trick. Note this does NOT escape HTML; it removes the tags and keeps only the text.
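// Use an inert document: innerHTML on a live-page element can still fire <img onerror=...>
const decoder = document.implementation.createHTMLDocument('').createElement('div')
decoder.innerHTML = YourXSSAttackHere
// Plain text with entities decoded (&lt; becomes <): insert it via textContent, not innerHTML
const sanitized = decoder.textContent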
For reference, Vue.js uses the same innerHTML/textContent pattern in its template compiler to decode HTML entities: https://github.com/vuejs/vue/blob/dev/src/compiler/parser/entity-decoder.js