Debugging the Internet: HTTP, TCP, and You
2025-05-15 · en automatic
[Music] Hello and welcome to a new episode of Search Off the Record, a podcast coming to you from the Google Search team where we talk about search and maybe have some fun along the way. My name's name, I'm a joke title, or at least that's what's in our planning doc. In reality, my name is Gary and I have no idea what I'm doing at Google, but I do stuff. Today, I'm joined by this wonderful person who I call Morty. I'm pretty sure that's not his real name. Um, is it? No. No. Okay. Well, let's not split hairs. Oh god. So today I'm joined by Martin Splitt. Everyone say hi. Or at least that's in my doc. Hi, Mo... No. Hi. You have to say hi. Hi. Thank you. Hello.
All right. So, I'm sure you're wondering why I have gathered you here today. Yeah. Why? What is happening? What have I done wrong? Oh, you are wondering, always. So I was talking to people. Okay. You were talking to people. Are these people in this room right now? No. Oh, but they are real? I hope so. Otherwise, I should really visit a psychotherapist or something. But one realization that I had, and I'm pretty sure that you're going to agree, is that even those who have been in the internet industry for long enough forgot about HTTP. As in, they use the term HTTP, but how it works and why it works and the different nuances and all the weird stuff that it's doing and what it's based on, maybe people don't know that much about it anymore. I agree. Yeah. And I have to say I keep forgetting things about HTTP and finding out new things. Well, it's evolving, so that's natural.
So, I was thinking that for a change, we don't have a script and the two of us are just going to chat about HTTP. All right. 200 OK. Wow. So geeky. Wow. 100 Continue. Oh god, this was a mistake. Let's talk about something else. 101 Switching Protocols. All right. So, HTTP, it is the thing that makes the internet happen. Would you agree? Oh, careful. The internet or the web?
Not the same thing. Nice. The web. Uh-huh. Okay. Yes, I would say yes. Okay. How about TCP? Because in one of my discussions, TCP and HTTP were used interchangeably. No, they're not. And I know that we're experimenting with something that is kind of like HTTP but is not using TCP. Should we explain the difference between HTTP and TCP? Well, yeah. Okay. Do you want to, or should I try? Okay. So I'm probably going to botch this, because it's been a while. I was working on networking, as in, like, 20 years ago. But TCP is transport something protocol, transport control protocol, I think. Mhm. It is the main protocol, I would say, of the old internet, of the old web, as in it is the transmission protocol that was used on HTTP versions prior to HTTP/3. Well, actually HTTP/3 still uses that, but let's not go there yet.
And it is coming from pretty much the grandfather of the internet, Vint Cerf. It was introduced a billion years ago, like 50 years maybe. I think it was introduced in the 70s, and they needed a protocol that was capable of doing packet switching between network nodes. Mhm. So basically you send out a packet and it could go one way or another way, plus it had to be able to negotiate connections and then reliably transfer the data packets from A to B. And there have been multiple approaches to this kind of stuff. Yeah. Oh yeah. And then I think it was also taken as the basis for other protocols later, like voice over IP, whatever that protocol was, RTP or something. It basically enables us to transfer data packets reliably. Mhm. It accounts for data loss. So if a packet is lost, for example, then the server or the client can re-request the packet. Mhm. Which, well, it's important. And then on top of that we have HTTP, right? Yeah. Well, HTTP and so many other things. Yeah. Mhm. Well yeah, but if we are talking about HTTP, then, yeah,
on top of this TCP thing, we have HTTP. Yeah, I think you could say that. And in general, it's layers upon layers in the way that things work. One of the lower layers is TCP, and then on top of that you can do things like HTTP or HTTPS or FTP or mail stuff, SMTP, POP, this kind of stuff. Mhm. Even pings, technically. You know when you type in the command line, like, ping something? You can actually send it over TCP. By default it's on ICMP, but you can switch it to TCP, which is usually not blocked, because usually ICMP is blocked by origins, because, like, why are you pinging me? Mhm. Traceroute, the program, is also by default working over ICMP, but you can switch it to TCP. You can also switch it to UDP, I think. Yeah. So that introduces more confusion. Let's not go there.
Yeah. I think we cannot avoid talking about UDP, because QUIC, like the new version of HTTP, HTTP/3, is making use of UDP a lot. Yeah. Or a UDP-like structure, right. I mean, that makes sense in some ways. Yeah. To understand the difference between them, you could think of one as establishing a connection. So like having a channel that you can pump things through, but that obviously requires some setup and some teardown cost, because you have to set up this quote-unquote connection. And as you said, if something gets lost, you know that it got lost, and then you actually, kind of by default, wait for it to be retransmitted before assembling the rest. But sometimes that doesn't make sense. If you think about it, if you were to, let's say, do video streaming, if you use a video streaming provider or if you do a video call, if I miss one frame out of the 25 frames per second, it doesn't matter. But I don't want the image to freeze until that old frame from a few minutes ago comes in.
So UDP is kind of fine with losing stuff, but it doesn't guarantee that everything makes its way across the network to the recipient, whereas TCP does. And I think with HTTP, in certain circumstances, it can make sense if we lose one quote-unquote frame in between. But QUIC does something weird to make that nice. Right. So UDP in general is not used on HTTP. I would say on HTTP you are just going TCP, Transmission Control Protocol. But UDP has the feature that you described: a packet is lost, don't care, let's go. UDP you might actually see in DNS, which, I'm not sure, maybe we talked about DNS in another episode. But DNS heavily relies on UDP.
And then the new kid on the block is QUIC, Q-U-I-C, which is more similar to UDP than TCP, I would say. Mhm. Okay. How so? So QUIC, actually, I'm not even sure that it's... it's more like a logic than an actual protocol, because it is relying on UDP when you need multiplexing. So basically, let's say in a simplified way, a bidirectional stream, which to some extent HTTP/2 already did with streaming, but it was weird, I guess. HTTP/3, which relies on QUIC heavily, can open these streams, where the streams can multiplex easily through QUIC. But then what QUIC does under the hood is just open a UDP connection between two points and then multiplex between those two points. Okay. And why... because a connection is more or less like a stream in TCP. Why can't we just use TCP for that? Good question. I have no idea. So I think what they are trying to solve is a different thing, which is the data rate throttling. And the other thing is that now we need to open another can of worms, and that's HTTPS. Oh right. Haha. So in HTTPS, the way that it works... So, we already... I think it's worthwhile explaining why there's layering.
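As a rough illustration of the TCP and UDP contrast discussed above, here is a minimal Python sketch using the standard `socket` module on the loopback interface. The function names `tcp_echo_once` and `udp_send_once` are made up for this example, and it only handles small payloads; it is not how a browser or a crawler does this.

```python
import socket
import threading


def tcp_echo_once(payload: bytes) -> bytes:
    """TCP: a connection is set up first, then data flows reliably and in order."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()  # returns once the TCP handshake is done
        with conn:
            conn.sendall(conn.recv(1024))  # echo back (small payloads only)

    worker = threading.Thread(target=serve)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply


def udp_send_once(payload: bytes) -> bytes:
    """UDP: no handshake, no delivery guarantee; a datagram is simply sent."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # a lost datagram would surface as a timeout
    port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(payload, ("127.0.0.1", port))
    data, _ = receiver.recvfrom(1024)
    sender.close()
    receiver.close()
    return data
```

The TCP version needs `listen`, `accept`, and a connect before any data moves, which is the setup cost mentioned in the conversation. The UDP version just fires a datagram at an address and hopes for the best.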
So if you think about it, if you want to talk from one computer to another, or from one phone to another, or whatever, you need some sort of transmission medium. That's a physical medium. That can be radio waves. That can be a cable. It can be fiber. It can be light signals. It can be messenger pigeons for all I care. So that's the physical medium. The problem is, if it's messenger pigeons, I just need you to receive a message. I'm not interested in feeding the pigeons. I'm not interested in raising them and nursing them back to health if they get sick and all that kind of stuff. So I want someone else to take care of that. And that's why we have this layering model, right? So someone takes care of the physical things. Then on top of that, we need to figure out how we group the bytes so that they go over this physical medium. Well, not bytes, the bits actually, and so on and so forth. And that's why we have this layer model, so that you don't need to worry about the layers that you're not interested in. Yeah.
So for a website, I actually most of the time don't even care about HTTPS too much, because I just have a document, a PDF or an HTML file or an image, and I just want that image to go from my server to your client on your phone, and that's it. And then I don't have to care. The lower level is how does it get there, and that's HTTP normally. For the web, anyway, it's HTTP. And then HTTP needs to somehow transmit these messages, and that's usually TCP, as we discussed. But none of this does encryption, none of it. It's just basically yelling over from one end of the building to the other, like, "Hey, I want this something-something.jpeg." "Okay, here's something-something.jpeg." And everyone could figure it out.
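To make the "yelling across the building" picture concrete, here is a small Python sketch of what an unencrypted HTTP/1.1 exchange looks like as bytes handed to the transport layer. The helper names `build_get_request` and `parse_status_line`, and the host and path used with them, are illustrative assumptions, not anything named in the episode.

```python
from typing import Tuple


def build_get_request(host: str, path: str) -> bytes:
    """Build the plain-text bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw_response: bytes) -> Tuple[str, int, str]:
    """Split the first response line into (version, status code, reason phrase)."""
    status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
    parts = status_line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason
```

Anyone who can see these bytes on the wire can read them, which is the gap that HTTPS closes by putting TLS underneath the same HTTP messages.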
That's why HTTPS was invented, and that is kind of HTTP. Like, HTTP doesn't change, but the lower level is now using what's called Transport Layer Security, if I'm not mistaken. Is that what... Yeah, it used to be Secure Sockets Layer, but that's dead and gone. It's been TLS for a long time now. And the problem, I think, was that to do an HTTP request-response, which all our websites are doing whenever someone requests something, you do a handshake. So it's kind of like, "Hi, I would like to open a TCP connection to you, Gary," and then you go, "Hi Martin, yes, please open a connection to me," and then I'm like, "I have opened a connection to you." That's the handshake that we have to do. And for TLS we have to do one more handshake, because then we need to figure out, like, "Hi, are you really Gary?" And then you say, "Yes, here's my certificate," and I'm like, "Okay, cool, I would like to use this encryption mechanism," and then you're like, "Okay, cool, here's my key," and then I'm like, "Cool, fantastic, here's my key." So we have to do two handshakes.
And with UDP, because it's this kind of, "Ah, whatever, I send you a thing, you may receive it or not, I don't care," you don't have a handshake there. And for QUIC, I believe it enforces TLS. So it's HTTPS; if it does not require TLS to be there, then it's HTTP. There you go. I think QUIC only uses TLS, right? A QUIC connection is always a TLS-based connection. Yeah. Right. So we are saving a handshake as well. So we have less network traffic. Okay. That's pretty cool. And because it's UDP... I don't know, have you heard about head-of-line blocking before? Yes. I think that's the problem they are trying to solve with switching to UDP. Yes, I think so. So in TCP... oh my god, these are topics that I haven't talked about in like 20 years. It's so nice to bring them back. I love this kind of stuff.
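The "two handshakes versus one" point can be put into rough numbers. The sketch below is a simplified model, assuming a fresh connection with no session resumption and counting only full network round trips before the first HTTP request can be sent. The table name, labels, and the 50 ms figure in the usage note are illustrative assumptions.

```python
# Round trips needed before the first HTTP request can go out
# (simplified: fresh connection, no resumption, no packet loss).
HANDSHAKE_ROUND_TRIPS = {
    "HTTP over TCP (no TLS)": 1,        # TCP handshake only
    "HTTPS over TCP + TLS 1.2": 1 + 2,  # TCP handshake, then a 2-round-trip TLS handshake
    "HTTPS over TCP + TLS 1.3": 1 + 1,  # TCP handshake, then a 1-round-trip TLS handshake
    "HTTP/3 over QUIC": 1,              # transport and TLS 1.3 handshakes are combined
}


def setup_delay_ms(stack: str, rtt_ms: float) -> float:
    """Estimate connection setup delay for a given round-trip time."""
    return HANDSHAKE_ROUND_TRIPS[stack] * rtt_ms
```

With a 50 ms round-trip time, this model gives 100 ms of setup for TCP plus TLS 1.3 and 50 ms for QUIC, which is the saved handshake mentioned above. Real connections vary with resumption, packet loss, and server configuration.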
Networking is my thing. So you send packets, right, from A to B, and then for whatever reason on the way, it could be on the B side, it could also in some cases be on the A side or close to the A side, the packets start queuing up. So basically, let's say the first packet is not letting through the follow-up packets. You have this weird blocking, essentially, where the packets can't reach their destination because one packet was somehow blocked. And when I say blocked, it's just gone missing, or MIA, or something. That first messenger pigeon you sent out met its untimely demise on the way, but the second one is fine. But then the second one has a problem, because it doesn't know where to land, because the first hasn't reached its destination. And in my brain, that's head-of-line blocking, which cannot happen with UDP. With UDP, basically you have a bunch of pigeons on a line, like tied to a line, and you pull them through to B from A. And then if the first pigeon falls to its demise from the line midway, it doesn't matter, because the line is still going to be pulled through to B. So that's nice.
I was listening to you and being fascinated, and I was thinking, how does this relate to site owners and SEOs and whatever? And that's a really good question. It actually does. Because every now and then you would get these weird messages in Search Console that there was something with the network, like, I don't know how it's phrased, but network blockage or connection issues or something like that. And that can actually happen in these layers that we are talking about, down in TCP/IP, and actually also in UDP. Like, why not a DNS problem? DNS is usually a UDP thing. Oh yeah. And definitely in QUIC as well. These are the things that are affected when we are reporting those things in Search Console.
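Head-of-line blocking, as described with the pigeons above, can be shown with a toy model in Python. This is only a sketch: the `Packet` type, the stream names, and the two delivery functions are invented for the illustration, and real QUIC still retransmits lost data rather than dropping it.

```python
from typing import Dict, List, NamedTuple


class Packet(NamedTuple):
    seq: int      # position within the whole connection
    stream: str   # logical stream the data belongs to
    index: int    # position within that stream
    payload: str


def in_order_prefix(items: Dict[int, str]) -> List[str]:
    """Return payloads 0, 1, 2, ... up to (not including) the first gap."""
    out = []
    n = 0
    while n in items:
        out.append(items[n])
        n += 1
    return out


def tcp_like_delivery(arrived: List[Packet]) -> List[str]:
    """One ordered byte stream: a single missing packet holds back everything after it."""
    return in_order_prefix({p.seq: p.payload for p in arrived})


def quic_like_delivery(arrived: List[Packet]) -> Dict[str, List[str]]:
    """Independent streams: a gap only holds back the stream it belongs to."""
    streams: Dict[str, Dict[int, str]] = {}
    for p in arrived:
        streams.setdefault(p.stream, {})[p.index] = p.payload
    return {name: in_order_prefix(items) for name, items in streams.items()}


sent = [
    Packet(0, "css", 0, "c0"),
    Packet(1, "img", 0, "i0"),
    Packet(2, "css", 1, "c1"),
    Packet(3, "img", 1, "i1"),
]
arrived = sent[1:]  # the very first packet went missing on the way
```

In this model `tcp_like_delivery(arrived)` can hand nothing to the application yet, while `quic_like_delivery(arrived)` can already deliver the whole `img` stream and only the `css` stream waits for the missing packet.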
Now if we step one up, we have HTTP. In the layers that you were describing, in your cake, on the top we would have HTTP and HTTPS. There you have more verbose reporting of the issues, because down in TCP and UDP there's not that much to know about what happened, unless you break out your Wireshark program and you start inspecting the packets and whatnot. Otherwise, you wouldn't know what happened, because there's no reporting whatsoever about what's happening and why. It is expected that from A something is going to reach B. But if the something hasn't reached B, then it's like, well, it was lost. Goodbye. And the upper layers usually swallow that. Like, the HTTP response doesn't come in, and your browser shows you, "Whoopsie, the server didn't respond." You get the Chrome dino or something. Well, actually you don't, because that's for no internet, but you would get an error message like connection error or connection refused or something. For example, if the wrong parts of the message got lost, then the server might refuse the connection, right? Yeah, like if the authentication layer is somehow corrupted in an HTTPS connection, then the server might outright just refuse the connection without explaining itself.
But then you have HTTP, where you actually get stuff out of it, because those HTTP servers can actually root-cause the issue or pinpoint the issue, like, what happened. Yeah. But that requires a lot of debugging in the lower levels. That's true. Oh, I'm actually thinking about the error messages, like HTTP 200 OK. That's the higher-level ones. Yeah, sorry. Okay, I didn't catch that. Yes. No, those are pretty descriptive. That's true. Yeah, everyone knows 404. That's exactly that. Yeah. And that's coming from the HTTP server that powers your site. Mhm. Like, basically the user typed in something.
Well, typed in as in probably clicked something, and that something doesn't exist on your site. So the server just returns a 404, and that's what it should do. I would argue in most of the cases it's just like, "Yeah, I have no idea about that URL." Yeah, I think that's a reasonable thing. And if it doesn't, then that's not necessarily a problem, unless you want that file or that URL to actually return something meaningful. Yeah. So that's... Fair. And then you have other reporting as well in HTTP. You have 100, which I have no idea what it stands for. What is it used for? Hold on. I saw that somewhere, but I'm not sure. It's Continue, but I'm not sure. I don't even know that. I know that we technically don't support it, as in we just don't see it. Mhm. We just pass through without even noticing that something was in the 100 range, and just notice the next non-100 status code.
Yeah. And I mean, there are complicated situations with WebSockets and stuff, where WebSockets are based on an HTTP connection but are not really HTTP themselves, and you get HTTP responses in the 100 range as well. So it's not always as simple, is it? I don't know. Sometimes it is. I don't even know what a WebSocket is. Oh. Aha. Okay. You can actually have a socket-like connection. So like a real-time channel between client and server where you can keep sending messages. Which later on also is kind of possible with HTTP/2 push (there's lots of reasons not to use that) or server-sent events. But it's one of the ways to have real-time communication between server and client. So if you do a chat, for instance, how would you do chat if you had HTTP? Because HTTP traditionally is: the client sends a request to the server, the browser sends a request to the web server, and the web server responds with something.
But if I build a chat program, then how would that work? Because at some point someone sends a message. So then I would have... Yes, exactly. Then you start polling. So every 5 seconds, every 10 seconds, every minute, I send a request to the server. The server says, "No, no new message." And then after a few minutes, I send a request to the server and it comes back with, "Yeah, here are 10 new messages." That's one way to do it. Then there's server-sent events, which, I can't remember how that works actually, to be honest. And then one of the ways to do this kind of thing is to establish a WebSocket connection. Because the TCP connection stays open anyway, you can use that to send specific messages, and in that case the browser just gets an event, like, "Hey, by the way, here's new data from the server," whenever there's a message. And for that you need to tell the browser and the server, "Hi, I want to use WebSockets," and that's when you get an HTTP 101 Switching Protocols. It's funky. I never heard of this. I should read up on it. There's new stuff where I was like, "Oh, whoa." Okay. Wow.
There's also the opportunity to do what's called early hints. Yeah. I don't know what they are specifically, but I know that they are also a feature in HTTP that I haven't been using much. If I remember correctly, early hints was something that Cloudflare came out with a few years ago, and people were very excited about it, and then we were talking to some folks about whether we need to support it. But because it's in the 1xx range and it doesn't actually benefit crawling that much, because we are just going to pass through anyway... Mhm. It's like, we don't know how to say that we don't support it, but then it's not that we don't support it. It's just that we completely ignore it. And then we were updating our documentation about it somehow. Oh, nice.
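For the "Hi, I want to use WebSockets" step, here is a minimal Python sketch of the server side of the opening handshake as defined in RFC 6455: the server hashes the client's `Sec-WebSocket-Key` together with a fixed GUID and answers with 101 Switching Protocols. The function names are made up for this example, and a real server would also validate the request headers before replying.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def switching_protocols_response(sec_websocket_key: str) -> bytes:
    """Build the 101 response that ends the HTTP part of the connection."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

After this response, the same TCP connection stays open and both sides exchange WebSocket frames instead of HTTP messages, so the client no longer has to poll.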
What other status codes? These status codes are actually important, I think, for site owners and SEOs, because they tell a story about what happened when a particular request came in. So we had 100, which to be honest I don't know when it would be used outside of WebSockets or early hints. We have the 200, which is just like, yeah, there was content, and here's the content, even if broken or something. Wait, wait. But in 200 you also have other stuff. You have 204, which... I remember 204 has to come back without a body. Ah yeah, it says No Content, and that's for caching, right? Yeah. Ah okay. So if I'm requesting something and then the server goes, "No, nothing has changed," then it doesn't need to... Okay. So it assumes, or it knows, that the client has already gotten the content of the page. So if I visit a website, if I visit, I don't know, the Wikipedia article on jam, that doesn't change as much, I guess, because there's not that much invention in the jam space as far as I'm aware. If I open that in my browser and it saves it onto my computer for a day or something, and I visit it again, I don't need to actually transfer all the information again, because I still have it on my computer. So that's where 204 comes in, right?
I mean, you can also describe that with 30-something, 304 I think. What was it? Yeah, 304 Not Modified. Not Modified. I'm not 100% certain when you would use 204 and when you would use 304, because the 204 that I've seen is an endpoint on google.com, I think it's google.com/generate_204. And it just does that, it generates a 204, like, no body, nothing. No body knows. No body knows. You get it? Yeah. No body. Nice. Anyway, so it generates a 204, and it was used, or it is used, for polling whether you have internet, like on phones and whatnot, Wi-Fi and stuff.
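On the 204 versus 304 question raised above: in the HTTP specification, 304 Not Modified is the answer to a conditional request when the cached copy is still current, while 204 No Content means the request succeeded and there is simply no body to send. The Python sketch below shows the 304 case with an `ETag` and `If-None-Match`. The function names `make_etag` and `respond` and the hashing scheme are assumptions for illustration, not how any particular server computes ETags.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the content (one possible scheme among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client's cached copy is current: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A first visit gets a 200 with the full page and an `ETag`. A revisit that sends that value back in `If-None-Match` gets a 304 with an empty body, so the jam article does not have to cross the network again.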
Yeah, on Wi-Fi and stuff as well. You can definitely use it for that. And then in 300... also, just for the record, we are going to run out of time. So we will need to continue this at some other point, because we have so much more stuff to talk about in the HTTP realm, and we haven't even gotten to HTTP/1, 2, and 3. Oh, yeah. We just managed to mention them in passing, but otherwise nothing. Then you have the 3xx ones. Yeah, which are mostly redirections. Yes. And there you also have something confusing, because you have 301, 302, and 300 itself. And then you have 307 and 308. Mhm. Which are also redirections. It's just that somehow it's different, but I have no idea. They carry very specific meaning, and not necessarily meaning that you always need, I believe. Yeah, I guess. But for Google Search specifically, it's just like, yeah, it was a redirection, whatever. We kind of care in canonicalization whether something was temporary or permanent, but otherwise it's just, it was a redirection, whatever.
Then you have... ah, 304 was the exception. 4xx also has an exception. Does it? I mean, most of it is the client made a mistake, right? What's the exception then? No, come on. You can do this, Martin. Oh, I'm a teapot, right? Isn't that a 4xx? Oh, it has two exceptions then. Oh, okay. The teapot one, I think that's super funny, and it just illustrates how the IETF works, because it was an April Fool's joke. I don't remember from whom, but they just made it into the standard, because why not? Mhm. But what other exception do you mean? 429. I don't think... that's Too Many Requests. No, I disagree that that's a difference or exception. Yeah. Do you want me to explain? Okay. So I think 400 means the client did something that it was not supposed to, or made some sort of mistake, right?
404, the client asked for a thing that I don't have. That's a mistake on the client side. Or, is it 401 or 403 where I'm not allowed? 401 is Unauthorized and the other one is Forbidden, and they have a significant... anyway. So again, I asked for something that I'm not supposed to get, these kinds of things. 410 Gone: something was there but no longer is. And 429 Too Many Requests just means I asked too often for something. Like, I still have resources to serve you, but I don't want to. Yeah. Or I don't have the resources to actually serve you. So instead of doing the work to serve you, I'll just tell you, "No, back off." Well, but for that you have all the 500s. Mhm. Like if you are out of resources. Oh, okay. That is very philosophical. But maybe it's not because I don't have the resources. Maybe it's just that I don't want you to ask me this often. It's like, "Are we there yet? Are we there yet? Are we there yet? Are we there yet?" I can answer, but I don't want to. Stop. I don't think that's a... But that's what I'm saying. I think what I'm saying is that I could serve you this particular URL, but you were asking so often that I really just don't want to. Okay. Yeah. So I think that's not an exception, because it's just that the client asked too often. Then the teapot is the exception. The teapot is definitely an exception. Okay. Because no one did something wrong there.
Like, 5xx is something on the server. Always a server error. Yeah, it's always a server error. And I don't think that there's an exception there, unless there's some esoteric 500 status code in the high ranges, like application-specific ones. WebDAV, for example, I remember used... Oh, why did I mention WebDAV? Now we have to explain it. Actually, you can just use your favorite search engine to look up what WebDAV is. DAV, WebDAV. Anyway, um, Martin. Yes.
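The status code families walked through above can be summarized in a few lines of Python. This is a small sketch: the function names are invented for the example, and the permanent or temporary split for redirects follows the HTTP specification, not any statement about how Google Search treats each code.

```python
from typing import Optional

STATUS_CLASSES = {
    1: "informational",  # e.g. 100 Continue, 101 Switching Protocols
    2: "success",        # e.g. 200 OK, 204 No Content
    3: "redirection",    # e.g. 301, 302, 304, 307, 308
    4: "client error",   # e.g. 404, 410, 429
    5: "server error",   # e.g. 500, 503
}

PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def status_class(code: int) -> str:
    """Map a status code to its family using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def redirect_kind(code: int) -> Optional[str]:
    """Say whether a redirect is permanent or temporary; None for anything else."""
    if code in PERMANENT_REDIRECTS:
        return "permanent"
    if code in TEMPORARY_REDIRECTS:
        return "temporary"
    return None  # e.g. 304 is in the 3xx range but is not a redirect
```

By this classification 429 lands in the client error family, which matches the position argued in the conversation, and 304 is the 3xx code that is not a redirect at all.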
Um, our producer was yelling at us for running over, and... Damn it. I am going to schedule one more of these and we'll finish talking about HTTP. Oh yes. In the meantime, can people listening to this please let us know if this kind of stuff is interesting or not? Because I'm not sure if this is just a weird noise fest or if people actually enjoy this. Yeah, please. That'd be interesting. We will keep doing it anyway, because we actually like it. But, like, you tell us. Man, this was great. Thank you so much. You're very welcome. I love it. That's it for this episode. Okay. If people want to find you to chat more, where can they do that, Martin? LinkedIn, Bluesky, Mastodon. Yeah. And I'm antisocial. Don't find me. Well, thank you folks for listening, and goodbye. Bye-bye. We've been having fun with these podcast episodes. I hope you, the listener, have found them both entertaining and insightful, too. Feel free to drop us a note on LinkedIn or chat with us at one of the next events that we go to if you have any thoughts. And of course, don't forget to like and subscribe. Thank you and goodbye. [Music]