Should I use markdown for my site?

2026-06-15 · en automatic

In this episode of Search Off the Record,
we are talking about Markdown, LLMs, and
whether you should convert your content
into Markdown or even llms.txt files, or
whether that's maybe not so helpful. Is it
or is it not? Find out in this episode of
Search Off the Record.
Hello and welcome to a new episode of
Search Off the Record, the podcast coming
from the Google Search Relations team.
My name is Martin Splitt and with me here
is my boss. Hello Mr. Moo. Hello John.
How's it going?
>> Hi Martin. Great to be here with you.
>> Long time no see.
I have a question for you because I have
been asked this multiple times and I'm
pretty sure I have the right answer, but
I'd rather hear a little bit of your
perspective as well on this. Should I
convert my website into markdown so that
LLMs have an easier time figuring it
out?
>> Wow.
>> Oh, no. Okay. Okay. Hear me out. Hear me
out. In my opinion, Markdown is an
intermediate format, basically like HTML
is an intermediate step to how a website
looks. HTML is a structured kind of text
format. It's just annoyingly tedious to
write, and you can get the nesting wrong
and all of that. And Markdown is doing a
lot less to get more or less the same
structure into a text file. Like you can have
headlines and you can have bullet point
lists and you can have numbered lists
and you can have tables and blah blah
blah and links and images even and
whatever. But fundamentally that's just
it. And yes, it is easier to deal with
markdown than it is to deal with HTML.
But all the crawlers that exist today
had to deal with HTML for the breadth of
ingesting the web. Like the goal was to
get the web. And you can't just be like,
"Oh, we're just not going to get any of
the information that's out there. We're
just going to get the markdown files."
So they had to already solve the problem
of dealing with HTML. So I don't think
that's a problem that needs solving.
>> I don't know.
>> Okay. I don't know, Martin. You're
like a smart guy. You've learned
a lot about how search works, right?
>> Fair.
>> Okay.
>> I'm not sure where this is going now.
Now I feel like I'm being
investigated. Okay. Uh oh.
>> So, you know, like the best practices
for making good websites, right?
>> I think so. I hope so.
>> How do you write the content for your
website? Do you write HTML or do you
write Markdown?
>> Why am I having a job interview right
now? It's like something.
Okay, I do not write the HTML, actually.
Well, I have to admit that I
actually use a static site generator, and
because of that I'm writing in Markdown,
and I wrote my own static site
generator back in the day to do that.
>> Okay.
>> Yeah. Because I don't want to I don't
want to write all these angle
brackets and all that kind of stuff. I
just want to have like here's a link,
there's a headline, moving on.
>> Okay. It's easier. It's less typing.
That's That's why.
>> So, if people want to rank well in
search like you do, which I don't
know. I didn't actually look.
>> I don't think I rank so well
because I don't care. Okay. I think I'm
ranking reasonably well for what I
care about.
>> Oh. Oh, well then maybe they
shouldn't be using Markdown.
>> Oh, you think that's why? Aha.
Interesting. No, I don't think that's
the thing.
>> Okay. I I guess maybe we should take a
step back and yes briefly explain what
this markdown is like.
>> Okay.
>> Where does it come from? What what is
it?
>> Okay. Okay. Okay. Okay. Yes. Actually,
fun fact, I'm not sure where it comes
from. I saw it first, I think, on GitHub,
where you can add a README.md, a
Markdown file, and then it automatically
is kind of like the homepage of your
repository on GitHub.
But I guess it's older. I don't
know. Have you looked at the
history of it?
>> I looked it up because, I don't
know, I was preparing for this mock
interview about Markdown.
>> Okay. Pressure just went up again.
Great. Yeah.
>> Yeah. Actually, I asked an LLM, so
this is awkward. So, it's like maybe I
got something wrong. But anyway, I will
assume it's kind of correct. Apparently,
it was created in 2004. So, quite a long
time ago. I imagine we have listeners who
are younger than Markdown, which is not
surprising because we probably have
listeners younger than HTML or
JavaScript. I don't even know when
JavaScript was made. A long time ago.
Markdown was created by John Gruber and
Aaron Swartz. John Gruber, I think, is
still active online. With Aaron Swartz,
the whole history around it is a bit
tragic. He was one of the developers of
RSS and Creative Commons and one of the, I
think, co-owners or co-founders of Reddit.
So, a long time ago.
>> Wow.
>> Maybe like the Reddit connection is why
people assume this is good for AI
because AI loves Reddit. So therefore,
anything Reddit does must be good,
right?
>> Must be good. Okay. Ah, maybe that's
where it's coming from.
>> Actually, Reddit apparently was written
in Lisp originally. So maybe people
should be using Lisp to make their
websites if they want to be like Reddit.
>> Gosh, I wrote Lisp during my
university studies and
>> no. Oh my god.
>> I mean it's a pretty beautiful language
but no no thank you.
>> Gosh. Yeah. So the whole markdown thing
was created as a way to have a simple
plain text kind of English readable
style of creating content that's easy to
convert into HTML and easy to convert
back from HTML. So basically,
if you assume HTML exists, how
could you make it easier
for people to write and understand?
I think that maps kind of well to why
you're using it for your website, and I
use it for some of my websites as well,
because it's just a lot easier to
write, and a lot of the structures
are just kind of like a
natural text file.
>> Yeah. Yeah. And and that's how I see it.
Remember back in the days when you
had text files as part of copies of
computer games you found online?
You sometimes got text files with
ASCII art in them that looked pretty
fancy. And it's a way to style
and juice up a text file, basically.
Like it's a little more structured. It's
a little more readable than just like
having random text. It's like, oh, so
this is meant to be a headline. I can
pretend that this is a headline. And I
think that makes sense. And because of
the simplicity, you can use tools to
programmatically transform it into other
things. I've used Markdown to write a
book.
>> Wow.
>> And publish a book. Okay. So, yeah. And
that was over 10 years ago. So,
that's nothing new.
>> That's crazy. Wow. Huh.
>> But again, I didn't want to write all
the overhead of HTML, like all the
brackets and stuff. It's like, and then
you forget a closing tag and then
everything becomes a headline.
>> Yeah, true. And I think what I also find
kind of neat about markdown is that it's
almost by default like semantic markup.
>> Yeah.
>> Whereas it's like this is a heading.
It's not like this is a big piece of
text that could be a heading or it could
just be big text. It's just very clear:
this is a heading. This is bold.
This is a link. It's super
straightforward.
>> Yeah. And as as long as you're not
cheating. Because if you know that the
output format is going to be HTML, you
can take the programs that take
Markdown as their input and
produce HTML as the output and configure
them so that you can actually include
HTML. And yeah,
that's nice. If you want to
show a video or if you want to have a
widget in it that uses JavaScript, you
can do it that way. But if you do that
then you invite back all the complexity
of HTML. If you don't do that then by
definition you are separating style and
content, right? So the presentation
is separate, because it looks
kind of boring, black text on a white
background, by default if it's rendered
to HTML, and you need a stylesheet or
something that wraps stuff around it to
make it look a specific way. Whereas
in HTML, you can say like, "Oh, I want
this to just be like twice the size of
everything else." And then you kind of
sneak in presentation information into
the content. And here you kind of have
clean separation between actual content
and actual presentation. I think that's
that's a good thing.
>> Yeah, that's that's true. Now that you
mention it, with Markdown you
basically provide the structure and the
text,
>> but all of the styling information is
kind of separated out.
>> Yeah.
>> so you can take this text and repurpose
it and it's like oh I'll put it on my
website or I'll create some kind of PDF
with it or whatever and it's like the
text and the structure of the text is
transferable
>> and I think that's also why people think
it's good for LLMs, because you
have less stuff, fewer tokens. And if you
look at an HTML file without a browser
rendering it, if you just look at the
plain HTML in a text editor, basically,
then it's hard to read the content
because there's so much cruft, so much
stuff in it, right? There are all these
HTML tags and maybe even inline
styles and all that kind of stuff. But
if a Markdown renderer fails and you look
at the Markdown file in a text editor,
it still is structured and readable.
>> Yeah,
>> like a link is the link
text, the anchor text, in
square brackets, and then the URL in
normal brackets. It's probably what I
would do if text was all I had available,
right? If I was writing an email without
the possibility to actually link things,
I would probably mark up some sort
of link text and then put some sort of
way to say, and this is where you
need to go to actually see that. Yeah.
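
The link syntax described here, anchor text in square brackets followed by the URL in parentheses, is simple enough to convert mechanically. The following is only a minimal sketch of that one rule, not how any real Markdown tool works: the pattern, the function name markdown_links_to_html, and the example URL are assumptions made up for illustration, and a real converter handles many more cases (titles, nested brackets, reference links).

```python
import html
import re

# Matches Markdown inline links of the form [anchor text](url).
# Simplified on purpose: no titles, no nested brackets, no spaces in the URL.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def markdown_links_to_html(text):
    """Replace each [text](url) in the input with an HTML <a> element."""

    def replace(match):
        anchor_text = html.escape(match.group(1))
        url = html.escape(match.group(2), quote=True)
        return f'<a href="{url}">{anchor_text}</a>'

    return LINK_PATTERN.sub(replace, text)


# Hypothetical sample sentence and URL, for illustration only.
sample = "Read the [developer docs](https://example.com/docs) first."
converted = markdown_links_to_html(sample)
print(converted)
```

The Markdown line stays readable as plain text even if nothing ever converts it, which is the point being made in the conversation.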
>> And I think this minimalism is probably
what makes people think, yeah,
>> this is great for a machine that needs
to understand this content unlike HTML.
Yeah, I think the other difference is
also all of the stuff around the content,
things like headers and footers,
sidebars,
>> right?
>> Like all of that is basically gone. So
when you write your content in markdown,
you focus on the text and the links and
things like that. And then afterwards
the system goes, okay, I will put your
piece of text in the structure of a
website and create all of the cross
links to other categories and all of
these things.
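
As a rough sketch of that wrapping step: a static site generator takes the already-converted content and places it inside a page template that carries the navigation. Everything named here (SITE_NAV, PAGE_TEMPLATE, render_page, the three navigation entries) is a hypothetical illustration and not taken from any particular generator.

```python
from string import Template

# Hypothetical site-wide navigation: (label, URL) pairs.
SITE_NAV = [("Home", "/"), ("Blog", "/blog/"), ("About", "/about/")]

# Minimal page shell. The same navigation is repeated in header and footer,
# which is where crawlers typically find links to the rest of the site.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<header><nav>$nav</nav></header>
<main>$content</main>
<footer>$nav</footer>
</body>
</html>""")


def render_page(title, content_html):
    """Wrap an HTML content fragment in the site structure."""
    nav = " ".join(f'<a href="{url}">{label}</a>' for label, url in SITE_NAV)
    return PAGE_TEMPLATE.substitute(title=title, nav=nav, content=content_html)


# The content fragment stands in for the output of a Markdown converter.
page = render_page("Hello", "<h1>Hello</h1><p>First post.</p>")
print(page)
```

The author only ever touches the content fragment; the cross links come from the template, so they exist in the published HTML but not in the Markdown source.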
>> Okay. So then that all sounds very nice.
Should we just make
our websites in Markdown, basically,
or
>> I don't know. It's like, you already make
your website in Markdown.
It's this awkward cycle:
you turn it into HTML, and
now you're like, well, maybe we should
just turn it back into Markdown.
>> Just publish the Markdown with
no extra steps. I think the big thing is
that the web with HTML and everything
has been around for a really long time,
longer than Markdown. And all of the
crawlers out there have practice
with HTML, and converting HTML into
text is trivial. There are lots
of libraries out there that can do that
for you. So if you think about what an
average web crawler might look for or
might need to find on a page to be
able to understand it, then probably
that's just HTML.
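
To give a sense of how little is needed for the basic case, here is a minimal sketch of HTML-to-text conversion using only the html.parser module from the Python standard library. The class name TextExtractor, the helper html_to_text, and the sample markup are assumptions for illustration; production crawlers do far more (boilerplate detection, encoding handling, rendering).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the contents of script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, for illustration only.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Shoes</h1><p>Sizes <b>38</b> to 46.</p></body></html>"
)
text = html_to_text(sample)
print(text)
```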
>> Yeah. And I mean the other thing is yes
it's nice that Markdown is usually
focusing on a piece of content, but the
HTML with all the links and the navigation
and the headers and all that kind of
stuff, which kind of gets stripped out
in the Markdown files that make the
website, is important to understand the
structure and how this connects to the
rest of the site. So I guess that's also
a bad thing. If we were to lose this,
that's probably not so good for crawling
and discovery. Huh.
>> Oh, definitely. Yeah. Yeah.
>> Yeah.
>> I think when it comes to things like a
search engine or probably also a
generic LLM system, having a website
that uses normal HTML for the pages is
critical because a search engine or
crawler can just go to that page. it can
recognize all of the other links that
are within the website. And usually
those links are somewhere in the header
or in the footer or in the sidebar
somewhere where they say these are other
categories of content, maybe other pages
that are available on the website that
are not directly linked in the content.
And all of that is critical. So it's
almost like if you want to focus purely
on being discoverable in search and
being discoverable for these AI systems
so that they can use your content for
training then having normal HTML pages
is basically the main thing you can do.
It's almost, I don't know, maybe it's
even the primary thing that you need
to do as a prerequisite in order to be
crawled and indexed normally.
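
The discovery step described here, going to a page and recognizing the links in its header, footer, and sidebar, can be sketched with the Python standard library. This is an illustrative assumption of how such a step might look, not a description of how Google or any specific crawler does it; LinkExtractor, extract_links, and the example URLs are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> element as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(markup, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(markup)
    parser.close()
    return parser.links


# Hypothetical page with navigation outside the main content.
sample = """
<header><nav><a href="/shoes/">Shoes</a> <a href="/boots/">Boots</a></nav></header>
<main><p>See our <a href="sizing.html">sizing guide</a>.</p></main>
<footer><a href="https://example.org/contact">Contact</a></footer>
"""
links = extract_links(sample, "https://example.com/catalog/index.html")
for link in links:
    print(link)
```

Three of the four links in this sample live in the header and footer, which is exactly the part that a content-only Markdown file would not carry.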
Obviously, the web is super messy and
sometimes people put normal text files
online or PDFs and crawlers have to deal
with some of that as well. But they
definitely know how to deal with HTML
pages. That's kind of the foundation of
the web.
>> And I mean the other thing is
also, for users, you can't just publish a
set of Markdown documents, because we
like colors and images and stuff to
flow in a nice layout, and
Markdown doesn't support layouts
directly unless you put a
layout on it. So you
would have to have some sort of
mechanism, and you're basically recreating
the browser. You're recreating HTML
parsing in the end. So might as well use
HTML parsing because as you say that has
been around and has been tried and
tested for decades at this point. The
other thing is you would duplicate
things. If you were to acknowledge like
oh users don't want markdown they want
the full-fledged website and then I
create a version just for LLMs, then
you're kind of making twice the work or
having twice the work, no?
>> Yeah. I think that's always terrible on
the web and I understand where these
ideas come from in that a lot of web
pages are just terrible from a
structural point of view and hard to use
and it's tempting to say well users can
see this complex weird page and
automated systems they should have it
easy like you should just give them the
information that they're looking for. But
fundamentally, as soon as you have these
parallel versions of your content,
everything becomes so much more complex.
You have to maintain those multiple
versions. You have to make sure that
nothing breaks on a version that a user
doesn't see. Uh because users might
complain to you if your page doesn't
load properly. But if the LLM version of
a page doesn't load properly, then no
user is going to tell you that something
is broken. And a lot of these automated
systems, they might not even recognize
that something is broken because they
see, oh, it's like there's some text
here must be what they want us to index.
>> Yeah, I think we learned that lesson
with dynamic rendering, which was a nice
stopgap solution for a while, but we
found out in practice it oftentimes
caused more problems and was really hard
to debug because of this duality of the
two separate versions. And
yeah, that's not great. Okay,
while we are on the topic of markdown,
should I then just create a text version
like have like a text file that has all
the content in it for LLMs or is it kind
of like the same problem?
>> I think you mean the LLM's text file.
>> Oh,
>> no. The text file for LLMs. Yeah. Um,
so, so I I talked with I think one of
the people who who created that proposal
a while back and the idea was really not
to create something that makes it easier
for search engines or LLM systems to
discover all of your content, but almost
more that if an LLM already knows about
your site and wants to find out what
else is here, then that might be an
approach.
And I think the aspect of using this as
a way to optimize for discovery by AI
systems or discovery by search systems
that doesn't make any sense at all,
because you're basically telling
these systems, oh, I have the best
website ever and here are all of the
pages that everyone must go to and you
must buy all of my products, or whatever
you put in there. So an LLM system
basically, by design, can't trust what is
here as a way of differentiating between
different websites. If someone is
already on your website, maybe some kind
of automated system is helpful where if
they go like, I want to go to Martin
Splitt and buy a photograph, then the LLM
system can go to your website and can
look around like, how do we buy a
photograph? Like maybe he has some
guidelines for me as an agent for buying
photographs. That kind of makes sense.
But going off and saying like, I want to
buy a photograph. Which website has one?
the system is not going to go to your
website and five others and say like who
has some automated information but
rather they're going to try to
find the best website first.
>> Okay, makes sense. I think from that
point of view optimizing as a way of
being discovered that doesn't make sense
but like what happens when an agent is
on your website I think that also just
generally seems to be an open area for
discussion at the moment in that there's
llms.txt as a proposal, there are
different JSON files and well-known file
types that are in discussion, there's
WebMCP, which I think tries to do
something similar, where they say, well,
you're on this page now, but we have a
programmatic interface for this at a
specific URL or a specific mechanism.
>> I think those are then almost different
discussions.
So the generic SEO angle of how do I
find a website that sells me a
photograph is almost going to be
completely bound to HTML pages and
normal web pages. And then if a user
decides to go to a specific service then
within that service then there is a
little bit more room for maybe helping
an agent or an LLM system to find the
right approach. But what is interesting
of course is like lots of ideas and none
of these have basically crystallized as
the one thing that everyone will use. So
I'm sure over the next I don't know half
year or maybe longer it's going to take
a bit and some of these agentic systems
are going to kind of unify around some
some standard file type or mechanism or
something.
>> All right. So I guess
that should
settle the debate on whether we should just
go back to Markdown. I mean, for all
the websites that I have made
in the last, I don't know, 10 years at
least, actually, holy moly, that was like
2012 when I started using Markdown to
make my website. So that's at this point
14 years. For the last 14 years, I've
basically made Markdown websites, but no
one would have known because you look at
the HTML version. And I think that
continues to be fine.
>> That's fine.
>> Yeah. I think the one place where maybe
some markdown content on a website could
make sense is if you have something like
developer documentation
>> where again if the agent or the LLM
system already knows about your website
and the user says like how do I use this
API then if you give the LLM system a
markdown file it's a lot easier for it
to understand okay this is the the
mechanism here. So I suspect for more
code-focused websites, ones that provide
code samples and developer interaction,
having Markdown versions of the
technical documentation makes some
sense. And then you have that challenge
of course of having the parallel
versions because users are not going to
look at markdown because it's a text
file. It's not this nice
looking HTML page thing, but maybe for
agents that's something that makes
sense.
>> Well, I'm I mean there's a solution to
that, which is to publish your
repository with the documentation in
Markdown and then use that markdown
documentation to generate the HTML
version.
>> Exactly.
>> And then you don't have the drift. Yeah.
Yeah.
>> Exactly. I I think again for developer
content, I think that makes a lot of
sense. But if you're selling shoes, it's
like you're not going to have a markdown
version of your shoe catalog. Like, that
makes no sense at all. I
think the challenge is of course people
who are creating websites are developers
and using developer tools and they're
like, "Oh, I'm using the markdown
version of this API to understand how it
works. Therefore, maybe my shoe site
should also have a Markdown API." That
is kind of that bias, I think, that
developers just have: I do
it like this, therefore maybe everyone
does it like this. And probably that's
not the case.
>> and good news is that normally there's
more than just the developers involved
in making a website so hopefully
teamwork will make the dream work
>> that would be nice
yeah and the other thing I think is also
we've been talking about websites at
this point. But the web platform offers
more than just plain old websites like a
list of products. You could build
applications in there. You could build
interactivity in there. And markdown
itself doesn't support that. And I don't
think it should because again it's for
content. And so I guess the web will
continue to be this multitude of things,
this multitude of what a website could
be. It could be somewhere between
application and actual just like a
content document. And I guess markdown
is just one part of it and most likely
will stay just the middle bit of the
pipeline from thoughts in someone's mind
to website on the internet. Yeah.
>> Yeah. I think again like for all of the
SEO related things and discovery of
content, a normal HTML website is
>> it's the best
>> is like that's not going to go away. I
mean like who knows, but that seems very
unlikely that it'll go away. So that I
think is at least the baseline
requirement.
If you have developer content, doing
something markdown is fine. Try it out.
See if it actually brings some value.
But for everyone else, I think markdown
doesn't really make sense.
>> Yeah. All right. I think that makes
sense. And uh I think we've spoken
enough about markdown at this point. And
uh I hope that you all out there have a
better idea of why Markdown became so
popular recently for LLMs and what you
should do to make your websites and
maybe even use markdown to create the
HTML of your website. It's fine. And
trust me, right, John?
>> Right. Do it.
>> Excellent.
>> Or don't.
>> Yeah, we are not cops. We are just like
random people on the internet. Uh, well,
anyway, thank you all so much for
listening out there. I hope that it was
fun and useful. Let us know in the
comments below if you're using Markdown
for something and how you're using
Markdown. And um, in that case, thank
you, John.
>> Thank you, Martin. Great to be here.
>> Thank you for being here with me. And
bye-bye, everybody. Bye.
We've been having fun with these podcast
episodes. I hope you, the listener, have
found them both entertaining and
insightful, too. Feel free to drop us a
note on LinkedIn or chat with us at one
of our next events we go to. If you have
any thoughts, let us know. And of
course, do not forget to like and
subscribe. Thank you so much for
listening and goodbye.