A bug in the Sundown and Redcarpet markdown parsers may lead to XSS
Published: 2015-04-07
Preface
In early February 2015, I reported an XSS vulnerability in HackerOne itself. After some investigation, we determined that the vulnerability was due to a bug in version 3.2.2 of the Redcarpet markdown parser … which was due to a bug in the autolink feature in version 1.16.0 of the Sundown markdown parser that Redcarpet was based off of.
In short, text like this:
_danlec_@danlec.com
… isn't handled correctly, and would be rendered as something like
<em>dan<a href="mailto:_danlec_@danlec.com">_danlec_@danlec.com</a>
… and in certain conditions that bug could allow insertion of malicious tags and event handlers.
The bug was present in several other libraries that were based off of Sundown, so while HackerOne was able to mitigate the issue quickly, we were unable to disclose the issue until new library versions were available.
Since a fixed version of Redcarpet is now available, the report was publicly disclosed today.
Curious if a markdown library is potentially vulnerable?
Not all libraries are vulnerable (obviously); it's possible you have nothing to worry about. You can check by seeing how
_nothing_@danlec.com
is rendered.
If you get something like not_nothing_@danlec.com
(or no_nothing_@danlec.com
) then your parser has the
broken behavior that can cause the bug, and you should look into
updating it.
(If you're only using your library to format markdown you've written yourself, then there isn't as much cause for concern … but many sites use affected libraries to format markdown from untrusted sources, and they're the ones that should be worried.)
Show me the bug report!
Sure, it's right here
Cast of Characters
HackerOne is a "Security Response and Bug Bounty Platform" that was "[built] from the ground up with security as [a] top priority". The team that built it includes several security engineers and penetration testers, so I'm particularly excited when I'm able to find a vulnerability in their site. They use Redcarpet to render the markdown in the bugs submitted by security researchers.
Redcarpet is a popular Ruby library for Markdown processing. Internally it uses Sundown.
Sundown is a (deprecated) C markdown parser. It hasn't been touched in several years, but it has been used as the basis for several other markdown libraries
I'm a developer at Trello and I run Trello's bug bounty program on HackerOne. Occasionally I'll try to find security issues in the applications that I use.
Background
About a week before submitting this bug, I'd submitted the first XSS vulnerability found in HackerOne itself.
Finding the bug
Based on my earlier report, I knew that I was able to find a security issue with how links were processed, so I kept playing around with other things that were rendered as links.
One thing that is commonly linkified in markdown is an email address, so I was playing around with that. I quickly realized that HackerOne didn't actually link email addresses, but it did appear to recognize them.
For example, while <foo>
would render as
<foo>
, <daniel@example.com>
would render as daniel@example.com
.
While trying out different things, I noticed that something like
_danlec_@danlec.com
was being rendered as
<u>da_danlec_@danlec.com
That was … a completely unexpected output. Somehow that
da
part was being repeated, and the
<u>
tag wasn't being closed properly.
Investigating the bug
I tried several more inputs to see if I could figure out what was going on
_danlec_@danlec.com -> <u>da_danlec_@danlec.com
_danle_@danlec.com -> <u>da_danle_@danlec.com
_danl_@danlec.com -> <u>da_danl_@danlec.com
_dan_@danlec.com -> <u>da_dan_@danlec.com
_da_@danlec.com -> <u>da_da_@danlec.com
Okay, weird, but so far it seems like the output is the same as
the input, but with a <u>da
at the front.
(Actually, there were also surrounding <p>
tags,
but I've left those out because they aren't relevant.)
_d_@danlec.com -> <u>d<_d_@danlec.com
Wha!?
That of course seemed completely wrong, plus now there was the
start of a _d_@danlec.com
tag on the output.
I wish I could say that at this point I knew exactly what was wrong, but in reality I just tried a bunch of different similar inputs until I observed that
_http://danlec_@.1
rendered as
<p><u><a href="http://danlec">http://danlec<danlec_@.1</p>
and
_http://danlec_@.1 foo=bar
rendered as
<p><u><a href="http://danlec">http://danlec<danlec_@.1 foo=bar</p>
So at this point I was able to inject an arbitrary tag with arbitrary attributes. I didn't know how it worked, but I knew I could do something bad.
Weaponizing the bug
So maybe you've been thinking "there's no such thing as a
<danlec_@.1>
" element! And you're right …
however browsers will still go ahead and render unknown elements as
inline elements. They'll also use any styling or event handlers
they're given.
Since I could set the style
attribute on this
element, I could make it bigger, and I could use the
onmouseover
attribute to execute script.
The input I ended up with was
_http://danlec_@.1
style=background-image:url(... data uri ...);background-repeat:no-repeat;display:block;width:100%;height:100px;
onclick=alert(unescape(/Oh%20No!/.source));return(false);//
… which rendered as a big red box that said "Oh No!" and which could do an alert if you clicked on it … and happened to be using a browser that didn't have support for HackerOne's strict content security policy. (Unfortunately I didn't find a CSP bypass until later…)
I submitted the report, and HackerOne quickly responded, investigated and reported back that they'd "disabled [their] automatic linking functionality to address the issue".
I figured that a bounty and public disclosure were imminent … but it turned out I was overly optimistic, and several weeks went by without the issue being closed.
Digging deeper
After several weeks, I revisited the issue. Reading through HackerOne's response again, I realized that the issue was in the library that they were using to parse the markdown. Because of that, it wasn't as simple as fixing a bug in their own code.
Usually I treat websites I'm investigating as black boxes, and don't spend much time thinking about what's actually going on behind the scenes, but at this point I was pretty curious. I quickly determined (by googling "hackerone markdown") that HackerOne used Redcarpet to render markdown, so I downloaded the latest version of the library to play with.
I verified that Redcarpet did in fact have strange output for an
input like _danlec_@danlec.com
, and then I looked
through the source to see if I could figure out why.
After some digging, I realized the issue came from this function:
size_t
sd_autolink__email(
size_t *rewind_p,
struct buf *link,
uint8_t *data,
size_t max_rewind,
size_t size,
unsigned int flags)
{
size_t link_end, rewind;
int nb = 0, np = 0;
for (rewind = 0; rewind < max_rewind; ++rewind) {
uint8_t c = data[-rewind - 1];
if (isalnum(c))
continue;
if (strchr(".+-_", c) != NULL)
continue;
break;
}
if (rewind == 0)
return 0;
for (link_end = 0; link_end < size; ++link_end) {
uint8_t c = data[link_end];
if (isalnum(c))
continue;
if (c == '@')
nb++;
else if (c == '.' && link_end < size - 1)
np++;
else if (c != '-' && c != '_')
break;
}
if (link_end < 2 || nb != 1 || np == 0 ||
!isalpha(data[link_end - 1]))
return 0;
link_end = autolink_delim(data, link_end, max_rewind, size);
if (link_end == 0)
return 0;
bufput(link, data - rewind, link_end + rewind);
*rewind_p = rewind;
return link_end;
}
and how it was used by this function
static size_t
char_autolink_www(struct buf *ob, struct sd_markdown *rndr, uint8_t *data, size_t offset, size_t size)
{
struct buf *link, *link_url, *link_text;
size_t link_len, rewind;
if (!rndr->cb.link || rndr->in_link_body)
return 0;
link = rndr_newbuf(rndr, BUFFER_SPAN);
if ((link_len = sd_autolink__www(&rewind, link, data, offset, size, 0)) > 0) {
link_url = rndr_newbuf(rndr, BUFFER_SPAN);
BUFPUTSL(link_url, "http://");
bufput(link_url, link->data, link->size);
ob->size -= rewind;
if (rndr->cb.normal_text) {
link_text = rndr_newbuf(rndr, BUFFER_SPAN);
rndr->cb.normal_text(link_text, link, rndr->opaque);
rndr->cb.link(ob, link_url, NULL, link_text, rndr->opaque);
rndr_popbuf(rndr, BUFFER_SPAN);
} else {
rndr->cb.link(ob, link_url, NULL, link, rndr->opaque);
}
rndr_popbuf(rndr, BUFFER_SPAN);
}
rndr_popbuf(rndr, BUFFER_SPAN);
return link_len;
}
… when called by this function
/* parse_inline • parses inline markdown elements */
static void
parse_inline(struct buf *ob, struct sd_markdown *rndr, uint8_t *data, size_t size)
{
size_t i = 0, end = 0;
uint8_t action = 0;
struct buf work = { 0, 0, 0, 0 };
if (rndr->work_bufs[BUFFER_SPAN].size +
rndr->work_bufs[BUFFER_BLOCK].size > rndr->max_nesting)
return;
while (i < size) {
/* copying inactive chars into the output */
while (end < size && (action = rndr->active_char[data[end]]) == 0) {
end++;
}
if (rndr->cb.normal_text) {
work.data = data + i;
work.size = end - i;
rndr->cb.normal_text(ob, &work, rndr->opaque);
}
else
bufput(ob, data + i, end - i);
if (end >= size) break;
i = end;
end = markdown_char_ptrs[(int)action](ob, rndr, data + i, i, size - i);
if (!end) /* no action from the callback */
end = i + 1;
else {
i += end;
end = i;
}
}
}
I've marked up some bits that were interesting, but here's basically what it tries to do:
- Process any inline markdown (bold, italic, etc) or normal text and output the corresponding HTML
- If an
@
symbol is encountered, check to see if it's the @ in the middle of an email address. If it is, then remove the part of the email address that we had output before we got to the@
(i.e. "rewind"), and output an<a>
tag with amailto:
- If there's still more input, goto step 1
So you can imagine how that would play out for an input like this:
This **is** _a_ test@danlec.com
- First, we process
This
; it's plain text so addThis
to the output. - Now we handle the
**is**
, which is bold, so we add<strong>is</strong>
to the output. So far, we've outputThis <strong>is</strong>
. - Now the
_a_
is underlined, so we add<u>a</u>
to the output and haveThis <strong>is</strong> <u>a</u>
- Now we handle the
test
. It looks like plain text, so we add it to the output and end up withThis <strong>is</strong> <u>a</u> test
- When we encounter the
@
symbol, we realize that we're in the middle of an email address! Because the beginning of the email address is 4 characters long, we chop that off the output (leavingThis <strong>is</strong> <u>a</u>
) and then output a link to the email address, ending up withThis <strong>is</strong> <u>a</u> <a href="mailto:test@danlec.com">test@danlec.com</a>
But what happens when we're processing one of the inputs I
mentioned earlier, something like _http://danlec_@danlec.com
foo=bar
?
- First, we process the
_http://danlec_
. That looks like an underlined link, and we end up outputting<u><a href="http://danlec">http://danlec</a></u>
- Now we see the
@
. We decide that that's the middle of the email addressdanlec_@danlec.com
. The first part of the email address,danlec_
is 7 characters long, so we chop that many characters off the output (forgetting that some of the characters are parts of HTML tags), leaving<u><a href="http://danlec">http://danlec<
(uh oh!) and then add the link to the email address to get<u><a href="http://danlec">http://danlec<<a href="mailto:danlec_@danlec.com">danlec_@danlec.com</a>
- The
foo=bar
looks like normal text, so we output it directly and get<u><a href="http://danlec">http://danlec<<a href="mailto:danlec_@danlec.com">danlec_@danlec.com</a> foo=bar
In HackerOne's case however, they chose not to render email addresses as links, so the output is just
<u><a
href="http://danlec">http://danlec<danlec_@danlec.com
foo=bar
So, the root cause for the issue was that the rewind performed when processing an inline element was allowed to step back into previously output elements, and an assumption was being made that the rewind would only contain plain text.
Once I understood what the bug was, I was able to write a patch that addressed the issue, and included that in the report. (I didn't want to make a public pull request, since that would effectively disclose the issue…) The fix was to limit the rewinding so it would never move into text that had already been output as an element.
After that, it still took a while for the change to make it into Redcarpet, but it eventually did (and they even made sure I got credit for it!).
HackerOne's response
While HackerOne was quick to mitigate the issue, it took a couple months for the issue to be completely resolved; mostly due to external factors. Midway through, HackerOne took the unusual step of awarding a bounty for the bug without immediately disclosing it. (We agreed that we wouldn't disclose the issue until Redcarpet had an updated version available.)
HackerOne also did the additional work of reaching out to the Redcarpet maintainers.
About the author:
I'm Daniel LeCheminant, a developer at Trello Inc.
You can follow me on Twitter or e-mail me.
Most recent post:
- A bug in the Sundown and Redcarpet markdown parsers may lead to XSS
- XSS via a spoofed React element
- HackerOne's First XSS
- Hacking stackoverflow.com's HTML sanitizer
The most popular things I've written: