Should I use Markdown for my site?
2026-06-15 · en automatic
In this episode of Search Off the Record, we are talking about Markdown and LLMs, and whether you should convert your content into Markdown or even llms.txt files, or if it's not that helpful maybe. Or is it, or is it not? Find out in this episode of Search Off the Record. Hello and welcome to a new episode of Search Off the Record, the podcast coming from the Google Search Relations team. My name is Martin Splitt and with me here is my boss. Hello, Mr. Mu. Hello, John. How's it going? >> Hi Martin. Great to be here with you. >> Long time no see. I have a question for you because I have been asked this multiple times and I'm pretty sure I have the right answer, but I'd rather hear a little bit of your perspective as well on this. Should I convert my website into Markdown so that LLMs have an easier time figuring it out? >> Wow. >> Oh, no. Okay. Okay. Hear me out. Hear me out. In my opinion, Markdown is an intermediate format, basically like HTML is an intermediate step to how a website looks. HTML is a structured kind of text format. It's just annoyingly tedious to write, and you can get the nesting wrong and all of that. And Markdown is doing a lot less to get more or less the same structure into a text file. You can have headlines, bullet point lists, numbered lists, tables, blah blah blah, and links and images even and whatever. But fundamentally that's just it. And yes, it is easier to deal with Markdown than it is to deal with HTML. But all the crawlers that exist today had to deal with HTML for the breadth of ingesting the web. The goal was to get the web. And you can't just be like, "Oh, we're just not going to get any of the information that's out there. We're just going to get the Markdown files." So they had to already solve the problem of dealing with HTML. So I don't think that's a problem that needs solving. >> I don't know. >> Okay. I don't know, Martin. You're like a smart guy. 
You've learned a lot about how search works, right? >> Fair. >> Okay. >> I'm not sure where this is going now. Now I feel like I'm being investigated. Okay. Uh oh. >> So, you know, like, the best practices for making good websites, right? >> I think so. I hope so. >> How do you write the content for your website? Do you write HTML or do you write Markdown? >> Why am I having a job interview right now? It's like something. Okay, I do not write the HTML actually. Well, I have to admit that I actually use a static site generator, and because of that I'm writing in Markdown, and I wrote my own static site generator back in the day to do that. >> Okay. >> Yeah. Because I don't want to write all these angle brackets and all that kind of stuff. I just want to have like, here's a link, there's a headline, moving on. >> Okay. It's easier. It's less typing. That's why. >> So, if people want to rank well in search like you do, which I don't know, I didn't actually look. >> I don't think I rank so well because I don't care. Okay. I think I'm ranking reasonably well for what I care for. >> Oh. Oh, well then maybe they shouldn't be using Markdown. >> Oh, you think that's why? Aha. Interesting. No, I don't think that's the thing. >> Okay. I guess maybe we should take a step back and, yes, briefly explain what this Markdown is. >> Okay. >> Where does it come from? What is it? >> Okay. Okay. Okay. Okay. Yes. Actually, fun fact, I'm not sure where it comes from. I saw it first, I think, on GitHub, where you can add a README.md, a Markdown file, and then it automatically is kind of like the homepage of your repository on GitHub. But I guess it's older. I don't know. Have you looked at the history of it? >> I looked it up because, I don't know, preparing for this mock interview about Markdown. >> Okay. Pressure just went up again. Great. Yeah. >> Yeah. 
Actually, I asked an LLM, so this is awkward. So maybe I got something wrong. But anyway, I will assume it's kind of correct. Apparently, it was created in 2004. So, quite a long time ago. I imagine we have listeners who are younger than Markdown, which is not surprising because we probably have listeners younger than HTML or JavaScript, which I don't even know when JavaScript was made. Long time ago. It was created by John Gruber and Aaron Swartz. John Gruber, I think, is still active online. Aaron Swartz, it's a bit tragic, the whole history around it. He was one of the developers of RSS and Creative Commons and one of the, I think, co-owners or co-founders of Reddit. So, long time ago. >> Wow. >> Maybe the Reddit connection is why people assume this is good for AI, because AI loves Reddit. So therefore, anything Reddit does must be good, right? >> Must be good. Okay. Ah, maybe that's where it's coming from. >> Actually, Reddit apparently was written in Lisp originally. So maybe people should be using Lisp to make their websites if they want to be like Reddit. >> Gosh, I've written Lisp during my university studies and >> no. Oh my god. >> I mean, it's a pretty beautiful language, but no, no thank you. >> Gosh. Yeah. So the whole Markdown thing was created as a way to have a simple, plain text, kind of English-readable style of creating content that's easy to convert into HTML and easy to convert back from HTML. So it's basically like, if you assume HTML exists, how could you make it so that it's easier for people to write and understand, which I think maps kind of well to why you're using it for your website. And I use it for some of my websites as well because it's just a lot easier to write, and a lot of the structures are just kind of like a natural text file. >> Yeah. Yeah. And that's how I see it. Remember back in the days when you had text files as part of copies of computer games you found online? 
You sometimes got text files with ASCII art in them that looked pretty fancy. And it's a way to style and juice up a text file, basically. It's a little more structured. It's a little more readable than just having random text. It's like, oh, so this is meant to be a headline. I can pretend that this is a headline. And I think that makes sense. And because of the simplicity, you can use tools to programmatically transform it into other things. I've used Markdown to write a book. >> Wow. >> And publish a book. Okay. So, yeah. And that was over 10 years ago. So, that's nothing new. >> That's crazy. Wow. Huh. >> But again, I didn't want to write all the overhead of HTML, like all the brackets and stuff. And then you forget a closing tag and then everything becomes a headline. >> Yeah, true. And I think what I also find kind of neat about Markdown is that it's almost by default semantic markup. >> Yeah. >> It's like, this is a heading. It's not like, this is a big piece of text that could be a heading or could just be big text. It's just very clear: this is a heading, this is bold, this is a link. It's super straightforward. >> Yeah. And as long as you're not cheating. Because if you know that the output format is going to be HTML, you can take the programs that take Markdown as their input and produce HTML as the output, and configure them so that you can actually include HTML. And yeah, that's nice. If you want to show a video or if you want to have a widget in it that uses JavaScript, you can do it that way. But if you do that then you invite back all the complexity of HTML. 
If you don't do that, then by definition you are separating style and content, right? So the presentation is separate, because it looks kind of boring, black text on white background, by default if it's rendered to HTML, and you need a stylesheet or something that wraps around it to make it look a specific way. Whereas in HTML, you can say, "Oh, I want this to just be twice the size of everything else." And then you kind of sneak presentation information into the content. And here you have clean separation between actual content and actual presentation. I think that's a good thing. >> Yeah, that's true. Now that you mention it, with Markdown you basically provide the structure and the text, >> but all of the styling information is kind of separated out. Yeah. >> So you can take this text and repurpose it, and it's like, oh, I'll put it on my website or I'll create some kind of PDF with it or whatever, and the text and the structure of the text are transferable. >> And I think that's also why people think it's good for LLMs, because you have less stuff, fewer tokens. And if you look at an HTML file without a browser rendering it, if you just look at the plain HTML in a text editor, basically, then it's hard to read the content because there's so much cruft, so much stuff in it, right? There are all these HTML tags and maybe even inline styles and all that kind of stuff. But if a Markdown renderer fails and you look at the Markdown file in a text editor, it still is structured and readable. >> Yeah. >> Like a link is the words of the link text, the anchor text, in square brackets, and then the URL in normal brackets. It's probably what I would do if text was all I had available, right? 
If I was writing an email without the possibility to actually link things, I would probably mark up some sort of link text and then put some sort of way to say, and this is where you need to go to actually see that. Yeah. >> And I think this minimalism is probably what makes people think, yeah, >> this is great for a machine that needs to understand this content, unlike HTML. Yeah, I think the other difference is also all of the stuff around the content, things like headers and footers, sidebars, >> right? >> All of that is basically gone. So when you write your content in Markdown, you focus on the text and the links and things like that. And then afterwards the system goes, okay, I will put your piece of text in the structure of a website and create all of the cross links to other categories and all of these things. >> Okay. So that all sounds very nice. Should we just make our websites in Markdown, basically, or >> I don't know. It's like, you already make your website in Markdown. It's this awkward cycle where you turn it into HTML and now you're like, well, maybe we should just turn it back into Markdown. >> Just publish the Markdown with no extra steps. I think the big thing is that the web with HTML and everything has been around for a really long time, longer than Markdown. And all of the crawlers out there have practice with HTML, and converting HTML into text is trivial. There are lots of libraries out there that can do that for you. So if you think about what an average web crawler might look for or might need to find on a page to be able to understand it, then probably that's just HTML. >> Yeah. 
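To make the remark about converting HTML into text concrete, here is a minimal sketch using only Python's standard-library html.parser module. The names TextExtractor, html_to_text, and SAMPLE are made up for this illustration, and the sample page is invented. Real crawlers do far more than this (rendering, boilerplate detection, encoding handling), so treat it as a toy under those assumptions, not as how any particular crawler works.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical helper: returns one text fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Invented sample page, for illustration only.
SAMPLE = """
<html><head><title>Prints</title><style>h1 { color: red; }</style></head>
<body><h1>Photo prints</h1><p>Buy a <a href="/shop">print</a> today.</p>
<script>console.log("ignored");</script></body></html>
"""

if __name__ == "__main__":
    print(html_to_text(SAMPLE))
```

The tags, the inline style, and the script all disappear, and only the readable text fragments remain, which is roughly the sense in which the conversion is described as trivial.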
And I mean, the other thing is, yes, it's nice that Markdown is usually focusing on a piece of content, but the HTML with all the links and the navigation and the headers and all that kind of stuff, which gets stripped out in the Markdown files that make up the website, is important for understanding the structure and how a page connects to the rest of the site. So I guess that's also a bad thing. If we were to lose this, that's probably not so good for crawling and discovery. Huh. >> Oh, definitely. Yeah. Yeah. >> Yeah. >> I think when it comes to things like a search engine, or probably also a generic LLM system, having a website that uses normal HTML for the pages is critical, because a search engine or crawler can just go to that page and recognize all of the other links within the website. And usually those links are somewhere in the header or in the footer or in the sidebar, where they say these are other categories of content, maybe other pages that are available on the website that are not directly linked in the content. And all of that is critical. So if you want to focus purely on being discoverable in search and being discoverable for these AI systems so that they can use your content for training, then having normal HTML pages is basically the main thing you can do. I don't know, maybe it's even the primary thing that you need to do as a prerequisite in order to be crawled and indexed normally. Obviously, the web is super messy and sometimes people put normal text files online, or PDFs, and crawlers have to deal with some of that as well. But they definitely know how to deal with HTML pages. That's kind of the foundation of the web. 
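The point about links in the header, footer, and sidebar can be sketched in a few lines. The following uses Python's standard-library html.parser and urllib.parse to collect every link on a page and resolve it against the page URL. LinkExtractor, extract_links, the sample markup, and the example.com URLs are all hypothetical, and this is only a rough picture of link discovery, not a description of how any search engine's crawler is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute URLs a crawler could fetch.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list:
    """Hypothetical helper: returns links in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Invented page: navigation and footer links surround the actual content.
PAGE = """
<header><nav><a href="/landscapes">Landscapes</a>
<a href="/portraits">Portraits</a></nav></header>
<main><p>See the <a href="prints/alps.html">Alps print</a>.</p></main>
<footer><a href="https://example.com/contact">Contact</a></footer>
"""

if __name__ == "__main__":
    for link in extract_links(PAGE, "https://example.com/shop/"):
        print(link)
```

Three of the four links here live in the navigation and footer rather than in the main content, which is exactly the material that is missing from a content-only Markdown file.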
And I mean, the other thing is also, for users you can't just publish a set of Markdown documents, because we like colors and images and stuff to flow in a nice layout, and Markdown doesn't support layouts directly unless you put a layout on it. So you would have to have some sort of mechanism, and you're basically recreating the browser. You're recreating HTML parsing in the end. So you might as well use HTML parsing because, as you say, that has been around and has been tried and tested for decades at this point. The other thing is you would duplicate things. If you were to acknowledge, oh, users don't want Markdown, they want the full-fledged website, and then you create a version just for LLMs, then you're kind of having twice the work. No? >> Yeah. I think that's always terrible on the web, and I understand where these ideas come from, in that a lot of web pages are just terrible from a structural point of view and hard to use, and it's tempting to say, well, users can see this complex weird page, and automated systems should have it easy, you should just give them the information that they're looking for. But fundamentally, as soon as you have these parallel versions of your content, everything becomes so much more complex. You have to maintain those multiple versions. You have to make sure that nothing breaks on a version that a user doesn't see. Because users might complain to you if your page doesn't load properly. But if the LLM version of a page doesn't load properly, then no user is going to tell you that something is broken. And a lot of these automated systems might not even recognize that something is broken, because they see, oh, there's some text here, must be what they want us to index. 
Yeah, I think we learned that lesson with dynamic rendering, which was a nice stopgap solution for a while, but we found out in practice it oftentimes caused more problems and was really hard to debug because of this duality of the two separate versions. And yeah, that's not great. Okay, while we are on the topic of Markdown, should I then just create a text version, like have a text file that has all the content in it for LLMs, or is it kind of the same problem? >> I think you mean the llms.txt file. >> Oh, >> no. The text file for LLMs. Yeah. So I talked with, I think, one of the people who created that proposal a while back, and the idea was really not to create something that makes it easier for search engines or LLM systems to discover all of your content, but almost more that if an LLM already knows about your site and wants to find out what else is here, then that might be an approach. And I think the aspect of using this as a way to optimize for discovery by AI systems or discovery by search systems doesn't make any sense at all. Because basically you're telling these systems, oh, I have the best website ever and here are all of the pages that everyone must go to and you must buy all of my products, or whatever you put in there. So an LLM system basically by design can't trust what is here as a way of differentiating between different websites. If someone is already on your website, maybe some kind of automated system is helpful, where if they go like, I want to go to Martin Splitt and buy a photograph, then the LLM system can go to your website and can look around, like, how do we buy a photograph? Maybe he has some guidelines for me as an agent for buying photographs. That kind of makes sense. But going off and saying, I want to buy a photograph, which website has one? 
The system is not going to go to your website and five others and ask who has some automated information; rather, it's going to try to find the best website first. >> Okay, makes sense. I think from that point of view, optimizing as a way of being discovered doesn't make sense. But what happens when an agent is on your website, I think that also just generally seems to be an open area for discussion at the moment, in that there's llms.txt as a proposal, there are different JSON files and well-known file types that are in discussion, there's WebMCP, which I think tries to do something similar, where they say, well, you're on this page now, but we have a programmatic interface for this at a specific URL or a specific mechanism. >> I think those are then almost different discussions. So the generic SEO angle of how do I find a website that sells me a photograph is almost going to be completely bound to HTML pages and normal web pages. And then if a user decides to go to a specific service, then within that service there is a little bit more room for maybe helping an agent or an LLM system to find the right approach. But what is interesting, of course, is there are lots of ideas and none of these have crystallized as the one thing that everyone will use. So I'm sure over the next, I don't know, half year or maybe longer, it's going to take a bit, some of these agentic systems are going to unify around some standard file type or mechanism or something. >> All right. So I guess that should settle the debate of whether we should just go back to Markdown. I mean, for all the websites that I have made in the last, I don't know, 10 years at least. Actually, holy moly, that was like 2012 when I started using Markdown to make my website. So that's at this point 14 years. For the last 14 years, I basically made Markdown websites, but no one would have known because you look at the HTML version. 
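The idea of writing in Markdown while visitors only ever see HTML can be illustrated with a toy converter. This sketch handles just headings, paragraphs, and inline links using Python's standard library. The function names and sample text are invented, it is nowhere near a complete or CommonMark-compliant implementation, and it is not the static site generator mentioned in the conversation. For example, it assumes blocks are separated by blank lines.

```python
import html
import re

# Matches inline links of the form [link text](url).
LINK = re.compile(r'\[([^\]]+)\]\(([^)\s"]+)\)')


def render_inline(text: str) -> str:
    """Escapes HTML special characters, then turns Markdown links into <a> tags."""
    escaped = html.escape(text, quote=False)
    return LINK.sub(r'<a href="\2">\1</a>', escaped)


def markdown_to_html(markdown: str) -> str:
    """Hypothetical helper: converts blank-line-separated blocks to HTML."""
    blocks = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        # Join wrapped lines so each block becomes a single line of text.
        line = " ".join(part.strip() for part in block.splitlines())
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if heading:
            level = len(heading.group(1))
            blocks.append(f"<h{level}>{render_inline(heading.group(2))}</h{level}>")
        else:
            blocks.append(f"<p>{render_inline(line)}</p>")
    return "\n".join(blocks)


# Invented source document, for illustration only.
SOURCE = """# Photo prints

Buy a [print](/shop) today.
"""

if __name__ == "__main__":
    print(markdown_to_html(SOURCE))
```

A real generator would then wrap this output in a page template with navigation, header, and footer, which is the step that adds back the site structure crawlers rely on.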
And I think that continues to be fine. >> That's fine. >> Yeah. I think the one place where maybe some Markdown content on a website could make sense is if you have something like developer documentation, >> where again, if the agent or the LLM system already knows about your website and the user says, how do I use this API, then if you give the LLM system a Markdown file it's a lot easier for it to understand, okay, this is the mechanism here. So I suspect for more code-focused websites, or websites that provide code samples and developer interaction, having Markdown versions of the technical documentation makes some sense. And then you have that challenge, of course, of having the parallel versions, because users are not going to look at Markdown because it's a text file. It's not this nice-looking HTML page thing, but maybe for agents that's something that makes sense. >> Well, I mean, there's a solution to that, which is to publish your repository with the documentation in Markdown and then use that Markdown documentation to generate the HTML version. >> Exactly. >> And then you don't have the drift. Yeah. Yeah. >> Exactly. I think again, for developer content, that makes a lot of sense. But if you're selling shoes, you're not going to have a Markdown version of your shoe catalog. That makes no sense at all. I think the challenge is, of course, people who are creating websites are developers and using developer tools, and they're like, "Oh, I'm using the Markdown version of this API to understand how it works. Therefore, maybe my shoe site should also have a Markdown API." 
which is kind of that bias, I think, that developers just have, that is like, I do it like this, therefore maybe everyone does it like this, and probably that's not the case. >> And the good news is that normally there's more than just the developers involved in making a website, so hopefully teamwork will make the dream work. >> That would be nice. Yeah, and the other thing, I think, is also that we've been talking about websites at this point. But the web platform offers more than just plain old websites like a list of products. You could build applications in there. You could build interactivity in there. And Markdown itself doesn't support that. And I don't think it should, because again, it's for content. And so I guess the web will continue to be this multitude of things, this multitude of what a website could be. It could be somewhere between an application and just a content document. And I guess Markdown is just one part of it and most likely will stay just the middle bit of the pipeline from thoughts in someone's mind to website on the internet. Yeah. >> Yeah. I think again, for all of the SEO-related things and discovery of content, a normal HTML website is >> it's the best >> it's like, that's not going to go away. I mean, who knows, but it seems very unlikely that it'll go away. So that, I think, is at least the baseline requirement. If you have developer content, doing something with Markdown is fine. Try it out. See if it actually brings some value. But for everyone else, I think Markdown doesn't really make sense. >> Yeah. All right. I think that makes sense. And I think we've spoken enough about Markdown at this point. And I hope that you all out there have a better idea of why Markdown became so popular recently for LLMs and what you should do to make your websites, and maybe even use Markdown to create the HTML of your website. It's fine. And trust me. Right, John? >> Right. Do it. >> Excellent. >> Or don't. >> Yeah, we are not cops. 
We are just random people on the internet. Well, anyway, thank you all so much for listening out there. I hope that it was fun and useful. Let us know in the comments below if you're using Markdown for something and how you're using Markdown. And in that case, thank you, John. >> Thank you, Martin. Great to be here. >> Thank you for being here with me. And bye-bye, everybody. Bye. We've been having fun with these podcast episodes. I hope you, the listener, have found them both entertaining and insightful, too. Feel free to drop us a note on LinkedIn or chat with us at one of the next events we go to. If you have any thoughts, let us know. And of course, do not forget to like and subscribe. Thank you so much for listening and goodbye.