A Formal Spec for GitHub Flavored Markdown

jorams · on March 14, 2017

It's odd how neither this post, nor the spec, nor GitHub's "Mastering Markdown" help page[1], nor the more complete "Basic writing and formatting syntax" page[2], mentions the fact that GitHub treats every newline as a hard break.

CommonMark contains this little sentence to work around its specified behavior, which is left untouched in the GFM spec:

> A renderer may also provide an option to render soft line breaks as hard line breaks.

I'd say whether or not it does this is a rather important thing to mention. When I write Markdown documents for GitHub I have to change my editor settings, only because of this.

[1]: https://guides.github.com/features/mastering-markdown/

[2]: https://help.github.com/articles/basic-writing-and-formattin...

kivikakk · on March 14, 2017

GFM itself leaves that line unchanged because we don't actually change that option in our implementation — the reference implementation `cmark` (which we built upon) supplies the "hardbreaks" option, described as follows:

> --hardbreaks Treat newlines as hard line breaks

We turn this option on when rendering issues, issue comments and so on, but leave it off when rendering blobs (such as README.md). Both are GFM, one just uses this option to make it more conducive to communication.
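To make the difference concrete, here is a minimal Python sketch of what such an option changes for a single paragraph. It is a toy, not cmark: `render_paragraph` and its `hardbreaks` parameter are hypothetical names chosen for illustration, and only line breaks are handled.

```python
import html


def render_paragraph(text, hardbreaks=False):
    """Toy renderer for ONE paragraph; an illustration, not cmark's API."""
    lines = text.split("\n")
    pieces = []
    for index, raw in enumerate(lines):
        is_last = index == len(lines) - 1
        # CommonMark's explicit hard breaks: two or more trailing spaces,
        # or a backslash, right before the newline.
        explicit = (not is_last) and (raw.endswith("  ") or raw.endswith("\\"))
        body = raw[:-1] if explicit and raw.endswith("\\") else raw
        pieces.append(html.escape(body.strip(), quote=False))
        if not is_last:
            # With hardbreaks on, every newline becomes a <br />.
            pieces.append("<br />\n" if (hardbreaks or explicit) else "\n")
    return "<p>" + "".join(pieces) + "</p>"


print(repr(render_paragraph("one\ntwo")))                   # '<p>one\ntwo</p>'
print(repr(render_paragraph("one\ntwo", hardbreaks=True)))  # '<p>one<br />\ntwo</p>'
print(repr(render_paragraph("one  \ntwo")))                 # '<p>one<br />\ntwo</p>'
```

The same source text is valid in both modes; only the treatment of a bare newline differs.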

yuchi · on March 15, 2017

This is a very important bit of information.

Would you mind making sure this is reflected in the docs?

powerbook5300CS · on March 15, 2017

Good to know. Why is the setting different for issues than for blobs?

steveklabnik · on March 15, 2017

This is covered in the post, no?

> There is a fundamental difference between these two kinds of content: the user comments are stored in our databases, which means their Markdown syntax can be normalized (e.g. by adding or removing whitespace, fixing the indentation, or inserting missing Markdown specifiers until they render properly). The Markdown documents stored in Git repositories, however, cannot be touched at all, as their contents are hashed as part of Git’s storage model.

powerbook5300CS · on March 15, 2017

Sure, but I'm still confused as to why the display format setting has anything to do with the on-disk format. To me they seem completely orthogonal.

kivikakk · on March 16, 2017

In general, people have come to expect that hitting return once in a comment field on GitHub will produce a newline, since that has been the case for many years, so we try to preserve that expectation. Hence we keep the option on when rendering comments.

Conversely, they don't expect the same from Markdown files stored in their repository (e.g. I put each sentence in a paragraph on its own line for my blog, for easier diffing and editing). Additionally, we couldn't normalise these documents even if we wanted to (to prevent everything breaking by being over-vertically spaced). Hence we leave the option off in this case!

steveklabnik · on March 15, 2017

Because using the new display format with a non-normalized source would break the display.

y0ghur7_xxx · on March 14, 2017

> It's odd how neither this post, nor the spec, nor GitHub's "Mastering Markdown" help page[1], nor the more complete "Basic writing and formatting syntax" page[2], mentions the fact that GitHub treats every newline as a hard break.

Well, actually it does say that "Hard line breaks are for separating inline content within a block."

https://github.github.com/gfm/#hard-line-break

So two spaces at the end of the line for a <br>, and an empty line for <p>.

jorams · on March 14, 2017

Right, but that's not what GFM does. GFM turns any single newline into a <br>.

hashhar · on March 15, 2017

Depends on where you use it. All file backed content (ie. repo data) does not use it, while the communications systems (PRs and issues) use it.

kivikakk · on March 15, 2017

As of this announcement, file backed content _does_ use GFM (try a table!), but not the `hardbreaks` option.
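Put differently, the dialect is the same everywhere and only one rendering option varies by content type. A small Python sketch of that idea follows; `RenderOptions`, `options_for` and the content-kind strings are hypothetical names for illustration, not anything from GitHub's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderOptions:
    tables: bool      # GFM table extension enabled
    hardbreaks: bool  # treat every newline as a hard line break


def options_for(kind):
    """Pick render options per content kind (hypothetical kind names)."""
    if kind in {"issue", "issue_comment", "pull_request"}:
        # Communication content: GFM with hardbreaks on.
        return RenderOptions(tables=True, hardbreaks=True)
    if kind == "blob":
        # File-backed content such as README.md: GFM with hardbreaks off.
        return RenderOptions(tables=True, hardbreaks=False)
    raise ValueError(f"unknown content kind: {kind!r}")
```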

erlend_sh · on March 14, 2017

GitHub devs & Markdown enthusiasts at large, please consider contributing some brainpower to these last remaining issues that are blocking the v1.0 release of CommonMark:

https://talk.commonmark.org/t/issues-we-must-resolve-before-...

simplehuman · on March 14, 2017

This is great! A couple of years back, there was a failed attempt at standardizing this - http://www.vfmd.org/ and http://www.vfmd.org/vfmd-spec/specification/. GitHub, given its popularity, will surely have more success.

Ajedi32 · on March 14, 2017

GitHub's spec here is based on [CommonMark][1], which has been around for a while now, and [was originally authored][2] by a group of representatives from GitHub, Reddit, and Stack Exchange.

[1]: http://commonmark.org/

[2]: https://blog.codinghorror.com/standard-flavored-markdown/

avereveard · on March 14, 2017

based on, compatible with, or is common mark?

I'd hate to see them pushing a different spec around, that would solve nothing

Ajedi32 · on March 14, 2017

> based on, compatible with, or is common mark?

The first two. [The GFM spec][1] is literally just CommonMark with a few extra extensions added. They even highlighted the new sections green in the spec to make it clear where the GFM spec differs from CommonMark. Everything else is word-for-word identical.

[1]: https://github.github.com/gfm/

jjnoakes · on March 14, 2017

Some quick reading of the linked article says this spec provides a few optional, superset features on top of CommonMark, that it did not contain (like tables, etc).

zeveb · on March 14, 2017

There was also Common Mark (http://commonmark.org/), which failed IMHO mostly due to John Gruber taking offense at their first choice of name, Common Markdown. Will formalising this as 'GitHub Flavored Markdown' similarly cause offense?

Ajedi32 · on March 14, 2017

GitHub Flavored Markdown has been a thing for a while. The only difference is that now they have a formal spec for it.

Also, CommonMark failed? News to me. Last I heard it was still under active development, years after the drama with Gruber.

criddell · on March 14, 2017

IIRC, their first name choice was Standard Markdown. I don't blame Gruber for being upset at that.

RickHull · on March 14, 2017

> I don't blame Gruber for being upset at that.

I do, when he has abandoned his project's raggedy implementation yet defends the trademark viciously.

criddell · on March 14, 2017

He hasn't abandoned it.

nickez · on March 14, 2017

The last release is from 2004, and he is not at all interested in fixing the fact that it is severely underspecified.

criddell · on March 15, 2017

It's underspecified for what others want. If it didn't do what Gruber needed it to do, surely he would extend it, no?

nickez · on March 15, 2017

You have to go look at the source code of Gruber's implementation to figure out what Markdown actually is. Or do some empirical studies with different inputs. That is what I mean by underspecified. His specification is not detailed enough to implement a Markdown parser. So in reality it is abandonware.

derrickdirge · on March 14, 2017

TFA is literally all about how GFM is based on the CommonMark spec.

steveklabnik · on March 14, 2017

The Rust ecosystem has used Markdown for a long time, but is moving to CommonMark as we speak.

Given this news from GitHub, it's very exciting.

nerdponx · on March 14, 2017

It didn't fail. CommonMark is the standard implemented in Pandoc, and the projects share an author.

gregmac · on March 14, 2017

Did it 'fail'? It seems to have a decent amount of use. And it fixed a bunch of problems with the original Markdown spec (or lack thereof).

I suppose since one of their goals was to get Github and StackExchange 'out of the markdown business' but neither use CommonMark, and further, Github now has put work into creating their own spec, they failed in that aspect.

steveklabnik · on March 14, 2017

> Github now has put work into creating their own spec

A big aspect of the post is talking about how GFM is now a set of extensions to CommonMark. This is a huge win, not a failure.

Ajedi32 · on March 14, 2017

> one of their goals was to get Github and StackExchange 'out of the markdown business' but neither use CommonMark

FWIW, StackExchange [uses CommonMark][1] for their new StackOverflow Documentation site and has [been planning][2] to migrate Q&A to CommonMark for some time now.

[1]: https://meta.stackexchange.com/questions/125148/implement-st...

[2]: https://meta.stackexchange.com/q/238957/192171

mwfunk · on March 14, 2017

Who knows, but it'd be pretty irrational if it did. Whether it was fair or not, the "Common Markdown" name caused a kerfuffle because of the implication that it was claiming to be the One True Markdown. The name "GitHub Flavored Markdown" only implies that it's base Markdown plus GitHub extras, plus it's been known by that name for a long time now.

tracker1 · on March 15, 2017

IIRC, original name was "Standard Markdown" which the original author had issue with... "Common Mark" or "Github Flavored Markdown" not implying such.

pkamb · on March 14, 2017

Previous discussion on the name: https://news.ycombinator.com/item?id=8270771

legulere · on March 14, 2017

It's a specification based on the CommonMark specification. Neither is a formal spec. They are more like informal specifications with some edge cases listed (in contrast to the original Markdown specification, which has known unspecified edge cases).

UK-AL · on March 14, 2017

I really wish there was a concise formal spec for Markdown, rather than a multi-page essay. It makes it incredibly difficult for anyone trying to create something to parse it. There is no mechanical way to go from spec -> parser.

I think it's quite difficult to do though.

oblio · on March 15, 2017

Is there any common spec where you can mechanically go from spec to parser? HTTP, SMTP, DNS, HTML, CSS, Javascript, Ruby, Python, C, ... I basically know of nothing in widespread use with a spec that can actually be converted directly into working code.

hawski · on March 15, 2017

Does implementing a TCP/IP stack using the RFC (as in parsing diagrams straight from the RFC) count?

If so, it was done in OMeta [1].

Previous discussion: A full TCP/IP stack in under 200 LoC (and the power of DSLs) [2].

There's also a PNG parser (but it is not parsing any documentation) in 20 lines of OMeta [3].

[1] http://www.moserware.com/2008/04/towards-moores-law-software...

[2] https://news.ycombinator.com/item?id=846028

[3] http://joshondesign.com/2013/03/18/ConciseComputing

UK-AL · on March 15, 2017

For the parser itself, yes. There are parser generators that take a spec(BNF, PEG) and output a parser that can parse the language.

What you do with the parsed tree is up to you though.

adrianN · on March 14, 2017

Is it really useful to write a formal spec for Github Markdown? The software they use to parse and render it is open source. If you want to know how exactly something works, you can read the source.

scrollaway · on March 14, 2017

Is it really useful to write a formal spec for HTML? The software we use to parse and render it is open source. If you want to know how exactly something works, you can read the source.

;)

jws · on March 14, 2017

Having used and maintained a Swift translation of the StackOverflow .NET markdown processor, please, when there is a proposal to use source as a spec, burn it with fire. Scatter the ashes.

legulere · on March 14, 2017

Yes, if you also want to have more implementations, that can differ in e.g. license or programming language.

For instance it is the reason why there's no reimplementation of TeX.

adrianN · on March 14, 2017

There is pdflatex, luatex,...

Nobody stops you from translating the Ruby or whatever into your favorite language.

UK-AL · on March 14, 2017

Reading the sources and trying to understand them takes a lot longer than looking at a BNF and cranking out a recursive descent parser based on it, or using a parser generator.
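As an illustration of how directly a grammar can map to code when one exists, here is a minimal Python sketch of a recursive descent parser for a tiny arithmetic grammar. The grammar, `tokenize`, `Parser` and `evaluate` are all made up for this example and have nothing to do with Markdown; each grammar rule simply becomes one method.

```python
import re

# Grammar (EBNF-style), one method per rule below:
#   expr   ::= term   (("+" | "-") term)*
#   term   ::= factor (("*" | "/") factor)*
#   factor ::= NUMBER | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(source):
    """Split the input into number and operator tokens."""
    source = source.rstrip()
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(source):
    parser = Parser(tokenize(source))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # 11
```

This evaluates while parsing to stay short; building a tree instead would only change what each method returns.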

dreamcompiler · on March 14, 2017

It's not possible to write a BNF for Markdown. At least not an unambiguous, useful BNF.

http://roopc.net/posts/2014/markdown-cfg/

Ericson2314 · on March 15, 2017

I hate calling things formal which aren't. Also, I hope this puts to rest any idea of Markdown being "simple".

rcarmo · on March 14, 2017

I have to wonder why this isn't done in the form of a context-free grammar, like Hitman[0] uses. Specs in English are still too vague for my liking.

[0]: https://github.com/chameco/Hitman

daigoba66 · on March 14, 2017

Is it even possible to define a context-free grammar for markdown?

jws · on March 14, 2017

It is not, and not just because of a few idiosyncrasies like C which require context.

Fundamentally, markdown was specified as some pattern matching and English description. The original specification was not done thinking of productions and grammar rules, and you typically don't get there by accident.

See http://roopc.net/posts/2014/markdown-cfg/ for a detailed exposition of how the '*' character in markdown is sufficient to ruin any chance of a CFG.
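A rough Python sketch of why '*' is awkward: whether a given '*' is an emphasis marker or a literal asterisk depends on what comes later in the text, so a scanner has to look ahead arbitrarily far before it can classify the character. The rules below are a deliberate simplification (an opener must be followed by non-whitespace, a closer must be preceded by non-whitespace, no nesting, no `**`); they are not the CommonMark delimiter algorithm, and `find_closer` and `render_emphasis` are hypothetical names.

```python
def find_closer(text, start):
    """Index of the next '*' after `start` preceded by non-whitespace, or -1."""
    for j in range(start + 1, len(text)):
        if text[j] == "*" and not text[j - 1].isspace():
            return j
    return -1


def render_emphasis(text):
    out = []
    i = 0
    while i < len(text):
        can_open = (
            text[i] == "*"
            and i + 1 < len(text)
            and not text[i + 1].isspace()
        )
        # The meaning of this '*' is only known after scanning ahead.
        closer = find_closer(text, i + 1) if can_open else -1
        if closer != -1:
            out.append("<em>" + text[i + 1:closer] + "</em>")
            i = closer + 1
        else:
            out.append(text[i])  # literal character, including a lone '*'
            i += 1
    return "".join(out)


print(render_emphasis("*a* and *b"))  # <em>a</em> and *b
print(render_emphasis("2 * 3 * 4"))   # 2 * 3 * 4
```

In the first call the two leading asterisks look identical locally; only the presence or absence of a later closer decides how each is rendered.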

I have sometimes pondered how to make a markdown-like language with a simple production-based grammar. I have not succeeded and would appreciate any pointers. The criterion being that it has to have something like the minimal intrusion into the prose of markdown.

nerdponx · on March 14, 2017

Is there some kind of more general grammar that can encode the Markdown spec?

eslaught · on March 14, 2017

There was some discussion of this on the CommonMark forums shortly after the initial release of CommonMark:

https://talk.commonmark.org/t/commonmark-formal-grammar/46

See in particular these comments by maradydd:

https://talk.commonmark.org/t/commonmark-formal-grammar/46/1...

https://talk.commonmark.org/t/commonmark-formal-grammar/46/2...

I'm not sure they ever came to a conclusive answer (at least on this thread).

Edit: Here is JGM himself saying he doesn't know: https://talk.commonmark.org/t/commonmark-formal-grammar/46/3...

Ericson2314 · on March 15, 2017

If GitHub is normalizing comments anyway, I wonder if they could have adopted a CFG.

Overall this is a step in the right direction, but the whole saga is a perfect microcosm of our understructure cranking out poorly-understood stuff which comes back to bite us and cannot be tamed.

nerdponx · on March 15, 2017

Interesting. I wonder if there's a thesis in solving this problem.

legulere · on March 14, 2017

https://github.github.com/gfm/#disallowed-raw-html-extension...

Why this? This is not a working blacklist to prevent XSS (e.g. onload="...")

tanoku · on March 14, 2017

Hi there! As the spec explains, this is a Markdown-specific blacklist that prevents the tags that would otherwise "break" the content of the Markdown document.

A document that contains these tags will not be parsed properly by an HTML5 compliant parser; the parser will "swallow" other chunks of Markdown content that come after the tags. Hence, we disable the tags altogether.

This is a UX feature, not a security feature. XSS prevention, and a plethora of other security checks, are performed by our user content stack -- but this functionality is shared for all markup languages in GitHub (MD, RST, ASCIIDOC, ...), so it's not discussed in this spec.
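For readers who want to see the shape of that filter, here is a minimal Python sketch. The tag list and the replacement of the leading `<` with `&lt;` follow the "Disallowed Raw HTML" section of the GFM spec; the regex, the handling of closing tags, and the name `filter_disallowed` are assumptions of this sketch, not the reference implementation. As said above, this is not an XSS defence.

```python
import re

# Tag names listed in the GFM spec's "Disallowed Raw HTML" extension.
DISALLOWED = (
    "title", "textarea", "style", "xmp", "iframe",
    "noembed", "noframes", "script", "plaintext",
)

# Match a '<' that starts one of the listed tags (opening or closing form).
# The lookahead keeps the rest of the tag untouched. (Sketch assumption.)
_TAG_START = re.compile(
    r"<(?=/?(?:" + "|".join(DISALLOWED) + r")(?:[\s/>]|$))",
    re.IGNORECASE,
)


def filter_disallowed(html_text):
    """Neutralise disallowed tags by escaping only their leading '<'."""
    return _TAG_START.sub("&lt;", html_text)


print(filter_disallowed("<plaintext>rest of the page"))  # &lt;plaintext>rest of the page
print(filter_disallowed("<em>fine</em>"))                # <em>fine</em>
```

Because only the first character is escaped, the tag text stays visible in the rendered output instead of swallowing whatever follows it.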

scrollaway · on March 14, 2017

Wow, TIL about the <plaintext> tag. Here I thought I knew most of the corner cases of HTML.

dbbk · on March 14, 2017

What's bonkers to me is that there is no closing tag, everything after it is no longer parsed as HTML.

ptx · on March 15, 2017

If it disallowed certain words (i.e. treated them specially instead of just reproducing the text as written) such as "</plaintext>" it wouldn't be plain text.

jeffmcjunkin · on March 14, 2017

It's also a great way to check if user input's being parsed server-side! :)

Xylakant · on March 14, 2017

It's not meant as an xss prevention but as a safety to prevent rendering errors.

ericholscher · on March 14, 2017

This is great -- lack of a standard that was actually used (unlike CommonMark) was one of my main issues with Markdown (http://ericholscher.com/blog/2016/mar/15/dont-use-markdown-f...) -- It's really great to see GitHub leading in this department, and it gives me hope that one day we might actually have Markdown that is portable between implementations.

jotux · on March 14, 2017

The github spec is literally CommonMark with extensions.

https://github.github.com/gfm/

http://spec.commonmark.org/0.27/

zellyn · on March 14, 2017

This is actually very closely based on CommonMark, according to the article.

SEJeff · on March 14, 2017

Perhaps you missed Jeff Atwood (codinghorror)'s standardized markdown spec, which is about as close as you'll get:

http://commonmark.org/

ericholscher · on March 14, 2017

Yea, I mention commonmark in the post, and referred to it in the "standard that was used" part -- commonmark is great, but wasn't widely adopted.

Updated my original post to be more clear.

IgorPartola · on March 15, 2017

At the risk of starting a mini-flame war, is RST a more cohesive format? If one was to pick one of the two formats to start using for personal documentation, which format should one choose?

mdaniel · on March 15, 2017

I enjoyed this article http://eli.thegreenplace.net/2017/restructuredtext-vs-markdo... and since the article didn't provide supporting evidence for the Linux/OpenCV/LLVM assertion: https://www.kernel.org/doc/html/latest/doc-guide/sphinx.html and http://docs.opencv.org/2.4/doc/tutorials/introduction/how_to... and https://github.com/llvm-mirror/llvm/blob/master/docs/index.r... respectively

I do think rST is waaay more expressive, but I also recognize that in many of the instances one would want to use markup in a chat or PR situation, the expressiveness likely wouldn't be well received if the trade-off is verbosity.

This is something in life that I file away with competing regex standards: my brain just has to switch languages based on the app in which I'm typing (between markdown, pseudo-markdown (ahem, Slack), org-mode, rST, etc).

coddingtonbear · on March 15, 2017

For personal documentation (assuming you mean notes): follow your heart. Personally, though, I wouldn't use RST for any non-python public documentation at this point, but for my personal notes, it's hard to beat the extensibility of RST.

patrec · on March 14, 2017

Now if only org-mode could define a sane, parsable format.

alphapapa · on March 15, 2017

What do you mean?

jostmey · on March 15, 2017

HTML used to serve as a simple way to format a document. Now HTML is too complex for that purpose. Introduce markdown. In ten years, markdown will be too complex for formatting documents.

I am a big fan of markdown, in case I didn't make that clear. I love it.

hawkice · on March 15, 2017

It wasn't an issue of simplicity, it was about HTML being extremely hard to sanitize so it isn't turing complete. Having formatting (that can still line up with your sites styles) while knowing you can't get script injection is pretty useful.

wtetzner · on March 15, 2017

I don't think it's that HTML is too complex. It's more that Markdown allows documents to be marked up in a way that is "natural" and easy to read if you're reading the plain text version of it.

dreamcompiler · on March 14, 2017

This is great news! Does anybody have a recommendation for a Javascript parser for Formal GFM? (I know there are a million JS MD parsers; I'm looking for a good one that will let me serve GFM docs over HTTP and render them on the browser.)

steveklabnik · on March 14, 2017

https://www.npmjs.com/package/marky-markdown

dreamcompiler · on March 15, 2017

Looks good. Thanks.

charonn0 · on March 15, 2017

So that's why my project wikis have suddenly stopped rendering markdown properly. I've been trying to figure out WTF was going on since yesterday!

libeclipse · on March 14, 2017

I'm really happy to see this. It's actually quite frustrating that although markdown is so nice, it barely has a consistent standard. It's almost impossible to use it cross-service.

Hopefully now that Github has standardised their own flavour of it (and quite a nice flavour too), more people will start to use it.

Of course there is the obligatory XKCD: https://xkcd.com/927/

zzleeper · on March 14, 2017

I would argue that it is consistent now.

At the lowest level, you have commonmark. Then, you have extensions at the top, such as GFM.

If Pandoc/Github/Reddit/SO/kramdown switch, that accounts for almost all front- and back-end cases that I care about.

And given that the first four were actively involved with commonmark, I would take as given that they will support commonmark or a superset of it.

kccqzy · on March 14, 2017

Pandoc already supports different flavors of markdown (commonmark included) and you can add/remove extensions to base flavors.

tomcam · on March 15, 2017

web2py handled all of these issues and made its markup language extensible with its markmin specification: http://www.web2py.com/init/static/markmin.html

Siecje · on March 15, 2017

What's wrong with http://www.vfmd.org/ ???

strikedout · on March 14, 2017

Why no ~~strike out~~ in spec?

kivikakk · on March 14, 2017

Right here: https://github.github.com/gfm/#strikethrough-extension- :)

oneeyedpigeon · on March 15, 2017

Is there no corresponding equivalent of INS? I would have thought something like

    +new text+

could be used...

flippyhead · on March 14, 2017

Jeesh, about time!

harmonyinfotech · on March 14, 2017

Is it ok if I promote my domain here? I read the guidelines but it doesn't mention anything regarding self promotion. Sorry if it ain't appropriate but anyone looking for a relevant domain (markdown.in) please get in touch or any suggestion if it is better to develop it.