Debugging the Internet: HTTP, TCP, and You

2025-05-15 · en automatic

[Music]
Hello and welcome to a new episode of
Search Off the Record, a podcast coming
to you from the Google Search team where
we talk about search and maybe have some
fun along the way. My name's name, I'm a
joke title, or at least that's what's in
our planning doc. In reality, my name is
Gary and I have no idea what I'm doing
at Google, but I do stuff. Today, I'm
joined by this wonderful person
who I call Morty. I'm pretty sure that's
not his real name. Um, is it? No.
No. Okay. No. No.
Well, let's not split hairs.
Oh god. So today I'm joined by Martin
Splitt. Everyone say hi. Or at least
that's in my doc. Hi. Mo. No. Hi. You
have to say hi. Hi.
Thank you. Hello. All right. So, I'm
sure you're wondering why I have
gathered you here today. Uh yeah. Yeah.
Why? What is what is happening? What
have I done wrong? Oh, you are wondering
always.
So I was talking to people. Yeah. Okay.
Let's just You were talking to people.
Are these people in the in this room
right now?
No. Oh, but they are real. I hope so.
Otherwise, I should really visit a psych
psychotherapist or something.
But one realization that I had and I'm
pretty sure that you're going to agree
is that even those who have been in the
internet industry for long enough forgot
about HTTP.
as in like
they use the term HTTP but like how it
works and why it works and the different
nuances and all the weird stuff that
it's doing and it's based on maybe
people don't know that much about it
anymore. I agree. Yeah. And I I have to
say I keep forgetting things about HTTP
and find out new things. Um well it's
evolving so that's natural. So, I was
thinking that for a change, we don't
have a script and the two of us are just
going to chat about HTTP. All right.
200. Okay.
Wow. So geeky.
Wow. 100. Continue.
Oh god, this was a mistake. Let's talk
about something else. 101. Switching
protocols.
All right. So, HTTP, it is the thing
that makes the internet happen. Would
you agree? Oh, careful. The internet or
the
web? Not the same thing. Nice. The web.
Uhhuh. Okay. Yes. I would say yes.
Okay. How about TCP? because in one of
my discussions
uh TCP and HTTP they were used
interchangeably. No, they're not. And I
know that we're experimenting with
something that uses or is kind of like
HTTP but it's not using TCP. Should we
explain the difference
of of HTTP versus TCP?
Well, yeah. Okay. Do you wanna should I
try? Okay. Okay. Um, so I I I would this
this I'm probably going to botch it. Um,
because uh it's been a while. I was uh
working on u networking as in like 20
years. Um
but TCP
is
transport something protocol transport
control protocol. I think Mhm. it
is the main protocol I would say of the
old internet of the old web as in like
it is the transmission protocol that was
used on
HTTP versions prior to
HTTP3. Well, actually HTTP3 still uses
that, but let's not go there yet. Um and
it was or it is
coming
from like the grandfather pretty much of
the internet, Vint Cerf. It it was
introduced a billion years ago like 50
years maybe. I think it I think it was
introduced in the 70s and they needed a
a protocol that
was capable of doing packet switching
between network nodes. Mhm. So basically
you send out a packet and it could
go like one way or other way plus it
could uh or it had to be able to
negotiate connections and then
reliably transfer the data packets from
A to B
and there has been
multiple approaches at this kind of
stuff. Yeah. Oh yeah. Yeah. And then I I
think it was like also taken as the
basis for other protocols later like um
voice over IP, whatever that protocol was,
RTP or something. It
basically enables us to transfer data
packets uh reliably. Mhm. It accounts
for data
loss. So if there's something like a a
packet is lost for example then the
server or the client can re-request the
packet. Mhm. Which well it's important
and then on top of that we have HTTP
right. Yeah. Well HTTP and so many other
things. Yeah. Mhm. Well yeah but like if
we are talking about HTTP then Yeah. uh
on top of this TCP thing, we have HTTP.
Yeah, I I think you could say that. And
in in general, like it's layers upon
layers of of the way that things work.
And
uh one of the lower layers is TCP and
then on top of that, you can do things
like HTTP or HTTPS or FTP or mail stuff,
SMTP, POP, this kind of stuff. Mhm. Even
pings technically you can send on like
you know when you type in command line
like ping something like you can
actually send it over TCP like by
default it's on ICMP protocol but uh you
can switch it to to TCP
um which is usually basically not blocked
because usually ICMP is blocked by
origins because like why why are you
pinging me? Mhm. Um, traceroute, the
program that is also normally or by
default is working over
ICMP but you can switch it to a TCP. You
can also switch it to UDP I think. Yeah.
So that that introduces more confusion.
Let's not go there.
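
[Editor's sketch] The layering described above, with HTTP sitting on top of a byte stream such as TCP, can be illustrated by how small an HTTP/1.1 message is on the wire. This is a minimal sketch, not anything shown in the episode: the function names `build_request` and `parse_status_line` are made up for illustration, and no network connection is opened. The bytes produced here are what a client would write into a TCP connection.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as the bytes sent over the transport."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # the one header HTTP/1.1 always requires
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(raw: bytes) -> tuple[str, int, str]:
    """Split the first line of a raw response into (version, code, reason)."""
    status_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(build_request("example.com"))
    print(parse_status_line(b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```

HTTP itself only defines these messages; getting the bytes from A to B reliably is the job of the layer underneath.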
Yeah. Um I I think we can't cannot avoid
talking about UDP because QUIC, like the
new version of of HTTP, HTTP3, is
making use of
UDP a lot. Yeah. Or a UDP-like structure,
right? I mean that makes sense in some
ways. Yeah. To understand the difference
between them, you could think of one is
like establishing a connection. So like
having like a channel that you can pump
things through, but that obviously
requires like some setup and some tear
down cost because you have to set up
this so quote unquote connection. And as
you said, like if something gets lost,
you know that it got lost and then you
actually kind of by default wait for it
to be retransmitted before assembling
the rest. But sometimes that doesn't
make sense. If you think about it, if
you were to like let's say do video
streaming, if you use a video streaming
provider or if you do like a video call,
if I miss one frame out of the 25 frames
per second, it doesn't matter. But I
don't want the the image to freeze until
that old frame from like a few minutes
ago comes comes in. So UDP is kind of
fine with losing stuff,
but it doesn't guarantee that everything
makes its way across the network to the
recipient, whereas TCP does. And I think
with HTTP, in certain circumstances, it
can make sense if we lose
one quote unquote frame in between.
But QUIC does something weird to make
that nice. Right.
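
[Editor's sketch] The difference described above, waiting for a retransmission before assembling the rest versus carrying on without the missing piece, can be simulated without any networking. This is a toy model under stated assumptions: `arrivals` is a made-up list of `(arrival_time, sequence_number)` pairs, and the two functions are illustrative names, not real TCP or UDP APIs.

```python
def deliver_in_order(arrivals):
    """TCP-like delivery: hand data to the application strictly in sequence.

    Returns (release_time, seq) pairs. Anything that arrives after a gap is
    buffered until the missing sequence number shows up.
    """
    buffered = set()
    next_seq = 0
    released = []
    for time, seq in sorted(arrivals):
        buffered.add(seq)
        # Release every consecutive packet we now hold.
        while next_seq in buffered:
            released.append((time, next_seq))
            next_seq += 1
    return released


def deliver_best_effort(arrivals):
    """UDP-like delivery: hand each datagram over as soon as it arrives."""
    return sorted(arrivals)


if __name__ == "__main__":
    # Frame 1 is lost and only arrives at t=9 after a retransmission.
    arrivals = [(1, 0), (9, 1), (3, 2), (4, 3)]
    print(deliver_in_order(arrivals))     # frames 2 and 3 are held until t=9
    print(deliver_best_effort(arrivals))  # frames 2 and 3 are usable at t=3 and t=4
```

In the video example from the conversation, an application on the best-effort path would simply drop the late frame rather than freeze the picture.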
Right. So
UDP it I I so UDP is in in general is
not used on on HTTP like I would say
like on on on HTTP you are just going
TCP like transmission control protocol
but UDP has the the feature that you
described that like a packet is lost
don't care let's go but like UDP you
might actually uh see in DNS which are
not maybe we talked about DNS in in a
another episode. But um but DNS heavily
relies on UDP. And then the the new
thing, the new kid on the block is uh
QUIC, Q-U-I-C,
um which is more similar to UDP than TCP
I would say. Mhm. Okay. How so? So
QUIC, actually I'm not even sure that
it's it's more like a logic than a than
an actual protocol because it is relying
on UDP when you need multiplexing. So
basically let's say in a simplified way
like a bidirectional uh stream which to
some extent HTTP2 already did with
streaming
but it it was weird I guess. uh
HTTP3 uh which relies on QUIC heavily
can open these streams where the streams
can multiplex easily through QUIC but
then QUIC what it does under the under
the hood is just opening a UDP
connection between two
points and then multiplexing between
those two points. Okay. And why why
can't you because you you have kind of a
connection is more or less like a stream
in TCP. Uh why can't we just use TCP for
that? Good question. Um I have no idea.
So I I think what they are trying to
solve is a different thing which is the
data rate throttling and the other thing
is
that now we need to open another can of
worms and that's HTTPS.
Oh right.
Haha. Um, so in HTTPS the way that it
works, so we already I think it's
worthwhile explaining why there's
layering. So if you think about it, if
you want to
talk from one computer to another or
from one phone to another or whatever,
you need some sort of transmission
medium. That's a physical medium. That
can be radio waves. That can be a cable.
Can be fiber. It can be whatever. Can be
light signals. It can be messenger
pigeons for all I care. So that's the
physical medium. The problem is if it's
messenger pigeons, I just need you to
receive a message. I'm not interested in
feeding the the pigeons. I'm not
interested in like raising them
and nursing them back to health if they
get sick and all that kind of stuff. So,
I want someone else to take care of
that. And that's why we have this
layering model, right? So, like someone
takes care of the physical things. Then
on top of that, we need to like figure
out how we group the bytes so that they
go over this physical medium. Well, not
bytes, the bits actually and and so on
and so forth. And that's why we have
this layer model so that you don't need
to worry about the layers that you're
not interested in. Yeah. So for a
website, I actually most of the time
don't even care about HTTPS too much
because I just have a document, a PDF or
an HTML file or image and I just want
that image to go from my server to your
client on your phone and that's it. And
and then I don't have to care. The lower
level is how does it get there and
that's HTTP normally
um for the web anyway it's HTTP and then
HTTP needs to somehow transmit these
messages and that's usually TCP as we
discussed but none of this does
encryption none of it like none of it
it's just like basically yelling
over from from one end of the building
to the other like hey I want this
something something.jpeg, okay here's
something something.jpeg and everyone
could figure it out. That's why HTTPS
was invented and that is kind of HTTP
like HTTP doesn't change but the lower
level is now using what's called uh
transport layer security if I'm not
mistaken is that what yeah used to be
secure socket layer but that's uh that's
that's dead and and gone it's now TLS
for a long time and the problem I think
was that to do an HTTP request response
which all our websites are doing
whenever someone requests um you do like
a handshake. So it's kind of like hi I
would like to open a TCP connection to
you Gary and then you go hi Martin yes
please open a connection to me and then
I'm like I have opened a connection to
you and then that's the handshake that
we have to do and for TLS we have to do
one more handshake because then we need
to figure out like are you hi are you
really Gary and then you say like yes
here's my certificate and I'm like okay
cool I would like to use this encryption
mechanism and then you're like okay cool
here's my key and then I'm like hi cool
fantastic here's my key. So, we have to
do two handshakes and with UDP because
it's this kind of like ah whatever I
send you a thing you may receive it or
not I don't care um you don't have a
handshake there and for QUIC I believe
it enforces TLS, so it's HTTPS. If it does
not require TLS to be there, then it's HTTP.
There you go. I think QUIC only uses TLS,
right? A QUIC connection is
always a TLS-based connection. Yeah.
Right. So, we are saving a handshake as
well. So, we have less network traffic.
Okay. That's pretty cool. And because
it's
UDP, I don't know. Have you heard about
uh head-of-line blocking before? Yes.
I think that's the problem they are
trying to solve with switching to UDP.
Uh yes, I think so. So in TCP um oh my
god these are like topics that I haven't
talked about in like 20 years. It's so
nice to bring them back. I love this
kind of stuff. Networking is my, so, my thing.
Like you keep sending packets, right,
uh from A to B and then for whatever
reason on the way it could be on the B
side it could be also in some cases on
the A side or close to the A side the
packets start queuing up for whatever
reason so basically the like let's say
the first packet is not letting through
the the follow follow-up packets so
basically you have like this weird
blocking essentially where the packets
can't reach because one packet was
somehow blocked. And when I say blocked,
it's just like gone missing or MIA or
something. That that first messenger
pigeon you sent
out met its untimely demise on the way,
but the second one is fine.
Um but then the second one has a problem
because it doesn't know where to land
because the first hasn't reached its
destination. Um and like in my brain
that's that's head-of-line blocking
which cannot happen with UDP. Like with
UDP basically you have a bunch of
pigeons on a line like tied to a line
and you pull them through to B from A.
And then if the first pigeon falls to
its demise from the line um midway, it
doesn't matter because the line is still
going to be pulled through uh to B. Um
so that's nice. I was listening to you
and being fascinated and I was thinking
like how does this relate to like site
owners and SEOs and whatever? And that's
a really good question. Um it it
actually does. It does because like
every now and then you would get these
weird messages in search console that
like there was something with the
network um like I don't know how it's
phrased but like
uh network blockage or connection issues
or something like that and that can
actually happen in these layers that we
are talking about like down in TCP/IP
and actually pro well actually also in
UDP like why not a DNS problem DNS is
usually a UDP thing. Oh yeah. Yeah. Um
and definitely in uh in in QUIC as well
like these are the
things that are affected when
when we are reporting those things in
search console. Now if we step one up,
we have HTTP like in your layers that
you were describing in in your cake like
on the top we would have HTTP,
HTTPS. There
you have more
verbose reporting of the issues because
down in
TCP UDP there's not that much to know
about like what happened unless you like
break out your Wireshark program and
you start inspecting the packets and
what not. Otherwise, you wouldn't know
what happened because there's no
reporting whatsoever about like what's
happening and why. It is expected that
from A something is going to reach B.
But if the something hasn't reached B,
then it's like, well, it was lost.
Goodbye.
Um, and the upper layers usually swallow
that. Like the HTTP response doesn't
come in, your browser shows you,
whoopsie, and server didn't respond. you
get the Chrome dino or something. Uh
well, actually you don't because that's
no internet, but you would get like an
error message like connection error or
uh connection refused or something like
for example if the wrong parts of the
message got lost then the server might
refuse the connection, right? Yeah, like
if authentication layer uh is somehow
corrupted
um in an HTTPS connection, then the
server might outright just refuse the
connection without explaining itself.
But then you have
HTTP where you actually get stuff out of
it because the servers that run the HTTP
servers they or those HTTP servers can
actually route the issue or pinpoint the
issue like what happened. Yeah. But that
requires a lot of uh debugging in the
lower levels. That's true.
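
[Editor's sketch] To make the point about layers concrete: a client can often tell which layer a failure came from by the kind of exception it gets, even though the lower layers say little about why. The mapping below is an assumption made for illustration, using Python's standard exception types; it is not how Search Console or any crawler classifies errors, and `failure_layer` is a hypothetical helper.

```python
import socket
import ssl
import urllib.error


def failure_layer(exc: BaseException) -> str:
    """Guess which layer a request failure belongs to, from the exception type."""
    # An HTTP-level answer arrived: the server spoke HTTP and gave a status code.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http ({exc.code})"
    # urllib wraps lower-level errors in URLError; unwrap to see the cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, socket.gaierror):
        return "dns"   # the name could not be resolved
    if isinstance(exc, ssl.SSLError):
        return "tls"   # certificate or handshake problem
    if isinstance(exc, (ConnectionRefusedError, ConnectionResetError,
                        TimeoutError, socket.timeout)):
        return "tcp"   # no connection, or it broke or stalled
    return "unknown"


if __name__ == "__main__":
    print(failure_layer(urllib.error.URLError(socket.gaierror(-2, "Name or service not known"))))
    print(failure_layer(ConnectionRefusedError()))
    print(failure_layer(urllib.error.HTTPError("http://example.com/x", 404, "Not Found", {}, None)))
```

Only the last case carries a status code, which is why the HTTP layer is the one with the more descriptive reporting.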
Oh, I'm actually thinking about the uh
error messages like HTTP 200. Okay,
that's the higher level ones. Yeah,
sorry. I I Okay, I didn't catch that.
Yes. No, those those are pretty
descriptive. That's true. Yeah, everyone
knows 404. That's exactly that. Yeah.
And that's coming from the HTTP server
that powers your site. Mhm. Like
basically the user typed in something.
well typed in as in like probably
clicked something and that something
doesn't exist on your site. So basically
you just or the server just returns a
404 and that's what it should do. Um I
would argue in most of the cases it's
just like yeah I have no idea about that
URL. Yeah, I think that's a reasonable
thing. And um and if it doesn't, then
that's not necessarily a problem unless
you want that file or that URL to
actually return something meaningful.
Yeah. Yeah. So that's And then you have
fair. And then you you you have other
reporting as well in HTTP like the you
have 100 which I have no idea what it
stands for. Um
what is it used for? Hold on. I I saw
that somewhere but I'm not sure. It's
continue, but I'm not sure.
I don't even know that. I I know that we
technically don't support it as in like
we just don't see it. Mhm. Like we just
like pass through uh without even
noticing that something was in the 100
range and just notice the next non-100
uh status code.
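
[Editor's sketch] "Passing through" the 100 range and noticing only the next status code can be pictured as a client that skips interim responses. This is a simplified illustration, not Google's crawler code: `final_status` is a hypothetical helper that works on a raw HTTP/1.1 byte stream, assumes interim responses carry no body, and for simplicity treats every 1xx code (including 101) as interim.

```python
def final_status(raw: bytes) -> int:
    """Return the first non-1xx status code found in a raw HTTP/1.1 response stream."""
    rest = raw
    while True:
        # Each response head ends with an empty line.
        head, sep, rest = rest.partition(b"\r\n\r\n")
        status_line = head.split(b"\r\n", 1)[0].decode("ascii")
        code = int(status_line.split(" ", 2)[1])
        if not 100 <= code < 200:
            return code
        if not sep or not rest:
            raise ValueError("stream ended after an interim response")


if __name__ == "__main__":
    stream = (
        b"HTTP/1.1 103 Early Hints\r\n"
        b"Link: </style.css>; rel=preload; as=style\r\n\r\n"
        b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi"
    )
    print(final_status(stream))
```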
Yeah. And I mean uh there are
complicated situations with like
websockets and stuff where websockets
are based on an HTTP connection but are
not HTTP
themselves really and you you get HTTP
responses in the 100 range as well. So
there's it's not always as simple is it?
I don't know. Sometimes it is. I don't
even know what a WebSocket is. Oh so
Aha. Okay. uh you can actually have a
socket-like connection. So like a a
real-time channel between client and
server where you can keep sending
messages. Which later on um also is kind
of kind of possible with the HTTP2
push. There's lots of reasons not to use
that or server-sent events. Um but it's one of
the ways to kind of have like a
real-time communication between server
and client.
So if you do like a chat for instance,
how would you do chat if you had HTTP?
Because HTTP traditionally is you send a
request to the server, client sends a
request to the server, browser sends a
request to web server and the web server
responds with something. But if I build
a chat program then how would that work?
Because you at some point someone sends
a message.
So then I would have Yes, exactly. Then
you start polling. So every 5 seconds,
every 10 seconds, every minute I send a
request to the server. The server says,
"No, no new message." And then after a
few minutes, I send a request to the
server and it comes back with, "Yeah,
here 10 new messages." That's one way to
do it. Then there's server-sent events,
which is I can't I can't remember how
that works actually to be
honest. And then one of the ways to do
this kind of thing is to establish a
websocket connection. So you get like a
direct because the TCP connection stays
open anyways. So you can kind of like
use that to send specific uh messages
and then in that case the browser just
gets an event like hey by the way here's
new data from the server whenever
there's a message. So that's and for
that you need to tell the browser and
the server like hi I want to use
websockets and that's when uh you get an
HTTP 101 switching protocols. It's
funky.
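
[Editor's sketch] The "hi, I want to use WebSockets" step is an ordinary HTTP request with an `Upgrade: websocket` header, and the 101 response proves the server understood it by echoing a value derived from the client's `Sec-WebSocket-Key`. The derivation below follows the WebSocket specification (RFC 6455); the function names are made up for illustration, and a real server would also validate the request headers.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def switching_protocols_response(sec_websocket_key: str) -> bytes:
    """Build the 101 response that ends the HTTP part of the connection."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    # Sample key taken from RFC 6455.
    print(switching_protocols_response("dGhlIHNhbXBsZSBub25jZQ==").decode("ascii"))
```

After this response, the same TCP connection stays open and carries WebSocket frames instead of HTTP messages.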
I never heard of this. I I should read
up on it. There's there's new stuff that
I I I was like, "Oh, whoa." Okay. Wow.
Um, there's also the opportunity to do
what's called uh early hints. Yeah.
Yeah. I I don't know what they are
specifically, but I know that they are
also a feature in HTTP that I haven't
been using much.
If if I remember correctly, early hints
was something that Cloudflare came out
with few years ago and uh people were
very excited about it and then we were
talking to some folks about whether we
need to support it but because it's like
in the 1xx range and it doesn't actually
benefit crawling that much because we
are just going to pass through anyway.
Mhm. It's like, yeah, like I we we don't
know how to say that we don't support
it, but then it's not that we don't
support it. It's just like we just
completely ignore it and then we were
updating our documentation about it
somehow. Oh, nice.
What What other status codes? Like these
status codes are actually important, I
think, for site owners uh and SEOs
because they tell a story about like
what happened when a particular request
came in.
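
[Editor's sketch] The "story" a status code tells starts with its first digit. The helper below is a small illustration using Python's standard `http.HTTPStatus` enum; `status_class` and `describe` are hypothetical names, and the class labels are the conventional ones from the HTTP specification.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class based on the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def describe(code: int) -> str:
    """Return a one-line description such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "unregistered"  # not a code the standard library knows
    return f"{code} {phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (100, 200, 204, 301, 304, 404, 429, 500):
        print(describe(code))
```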
So we had 100 which to be honest I don't
know when when it would be used outside
of websockets uh or early hints. We have
the 200 which is just like yeah there
was content. Yeah and here's the content
even if broken or something. Wait, wait.
But in 200 you also have other stuff
like you have 204
uh which
like like because that so I remember 204
has to come back without body. Ah yeah
it says no content and that's for
caching right? Yeah. Ah okay. So if I'm
requesting something and then the server
goes like no no nothing has changed then
it doesn't need to. Okay. So it assumes
or it knows that the client has already
gotten the content of the page. So if I
visit a website, if I visit I don't know
the Wikipedia article on jam, uh that
doesn't change as much I guess. Um
because there's not that much invention
in the jam space as far as I'm aware.
Um, if I open that in my web in my
browser and it saves it onto my computer
for like a day or something and I visit
it again, I don't need to actually
transfer all the information again
because I already have it on my computer
or still have it on my computer. So then
that's where 204 comes in, right? I
mean, you can also describe that with uh
something 30 something 304 I think. What
was it? Uh yeah, like 304 304 not
modified not modified. I'm not 100%
certain when you would use 204 and when
when you would use 304 because the 204
that I've seen that is an endpoint uh on
uh google.com I think is
google.com/generate_204. Um and it just
does that like it generates a 204, like
no body, know nothing. No body knows.
No body knows. You get it? Yeah. No body.
Nice.
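
[Editor's sketch] For reference, the status code that HTTP defines for "nothing has changed, use the copy you already have" is 304 Not Modified, sent in reply to a conditional request; 204 No Content means the request succeeded and there is simply no body to send. The sketch below shows the 304 case with an `ETag` and `If-None-Match` pair. It is a simplification under stated assumptions: `respond` is a hypothetical server-side function, and it compares a single ETag exactly, ignoring weak validators and lists.

```python
from typing import Optional


def respond(current_etag: str, body: bytes, if_none_match: Optional[str] = None):
    """Return (status, headers, body) for a GET that may be conditional."""
    headers = {"ETag": current_etag}
    # The client already holds this exact version: send no body at all.
    if if_none_match is not None and if_none_match == current_etag:
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    page = b"<h1>Jam</h1>"
    # First visit: no cached copy, so the full page is transferred.
    print(respond('"v1"', page))
    # Revisit with the cached version's ETag: only headers come back.
    print(respond('"v1"', page, if_none_match='"v1"'))
    # The page changed in the meantime: full transfer again.
    print(respond('"v2"', page, if_none_match='"v1"'))
```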
Um anyway, so it generates a 204 and uh
it was used or it is used for polling
whether you have internet like on um on
phones and whatnot, Wi-Fi and stuff.
Yeah, on Wi-Fi and stuff as well. But
you can definitely use it for that. And
then you in 300 also just for the
record, we are just going to run out of
time. So, we will need to continue this
at one other point because we have so
many more stuff to talk about in HTTP
realm and we haven't even gotten to like
HTTP 1, 2, and three. Oh, yeah. We just
like managed to like mention them in
passing but otherwise
nothing. Then you have the 3xx ones. Yeah,
that's which are mostly
mostly redirections. Yes. which there
you also have something confusing
because you have 301, 302 and 300
itself. Um and then you have 307 308.
Mhm. Which are also redirections. It's
just like somehow it's different.
But I have no idea. They carry very
specific meaning and not necessarily
meaning that you always need I believe.
Yeah, I guess. But like for for us it's
like for Google search specifically it's
just like yeah it was a redirection.
It's like whatever like we kind of care
about in canonicalization whether
something was do uh temporary or
permanent but like otherwise we just
it's just like it was a redirection
like whatever.
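
[Editor's sketch] The "somehow it's different" part of the redirect codes comes down to two questions: is the move permanent, and must the client keep the original request method? The table below summarizes the four codes mentioned, following the HTTP specification; `next_method` is a hypothetical helper that mirrors what browsers commonly do with a redirected POST.

```python
# status: (permanent?, method must be preserved?)
REDIRECTS = {
    301: (True, False),   # Moved Permanently
    302: (False, False),  # Found (temporary)
    307: (False, True),   # Temporary Redirect
    308: (True, True),    # Permanent Redirect
}


def is_permanent(status: int) -> bool:
    """True for redirects that signal a permanent move."""
    return REDIRECTS[status][0]


def next_method(status: int, method: str) -> str:
    """Method a client would typically use when following the redirect."""
    _, preserve = REDIRECTS[status]
    if preserve:
        return method
    # For 301 and 302, clients commonly turn a POST into a GET.
    return "GET" if method == "POST" else method


if __name__ == "__main__":
    for status in REDIRECTS:
        print(status, "permanent" if is_permanent(status) else "temporary",
              "POST ->", next_method(status, "POST"))
```

The permanent versus temporary column is the distinction the conversation says matters for canonicalization.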
Then you
have ah 304 was the exception.
4xx also has an exception. Does it?
I mean most of it is the client made a
mistake. Right. What's the exception
then? No. Come on. You can do this.
Martin. Oh, I'm a teapot,
right? Isn't that a force? Oh, it has
two ex has two exceptions then. Oh,
okay. The teapot one. That's uh I I
think that's super funny. Uh and just
illustrates how um uh the IETF works uh
because it was like an April Fool's
joke. I I don't remember from whom, but
and they just made it into the standard
because why not? Mhm. But what other
exception do you mean? 429.
I don't think that's too many requests.
I don't think No, I I disagree that
that's a difference or exception.
Yeah. Do you want me to explain? Okay.
Uh, so I think 400 means the client did
something that was not supposed to or
made some sort of mistake, right? 404,
the client asked for a thing that I
don't have. That's a mistake on the
client side. Uh, or 40, is it 401 or 403
where I'm not allowed. 401 is
unauthorized and the other one is
forbidden and they have a significant
anyway. So again, I asked for something
that I'm not supposed to to get to these
kind of things. 410 gone. Something was
there but no longer is. And 429 too many
requests just means I asked too often
for something. Like I still have
resources to serve you but I don't want
to. Yeah.
Or I don't have to I don't have
resources to actually serve you. So
instead of doing the work to serve you,
I'll just tell you no, back off. Well,
but for that you have all the all the
500s. M like if you are out of
resources. Oh, okay. That is very
philosophical. Okay. But maybe it's not
because I don't have the resources.
Maybe it's just I don't want you to ask
me this often. It's like are we there
yet? Are we there yet? Are we there yet?
Are we there yet? I can answer, but I
don't want to. Stop.
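
[Editor's sketch] On the client side, the polite reaction to a 429 is to back off, and servers can say for how long with a `Retry-After` header, which holds either a number of seconds or an HTTP date. The helper below is a sketch under stated assumptions: `retry_delay_seconds` and its 60-second fallback are made up for illustration and are not any particular crawler's policy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay_seconds(retry_after: Optional[str],
                        now: Optional[datetime] = None,
                        default: float = 60.0) -> float:
    """Turn a Retry-After header value into a number of seconds to wait."""
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)  # delay given directly in seconds
    try:
        when = parsedate_to_datetime(value)  # delay given as an HTTP date
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the assumed default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


if __name__ == "__main__":
    print(retry_delay_seconds("120"))
    fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
    print(retry_delay_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```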
I don't think that's a But that's what
I'm saying. But I I think that's what
I'm saying is that like I like I could
serve you this this particular URL, but
you were asking so often that I really
just don't want to. Okay. Yeah. So I I
think that's not an exception because
it's it's just the client asked too
often. Then the teapot is the exception.
The teapot is definitely an exception.
Okay. Because no one made something
wrong there. Like it's 5xx
uh something on the server. Always a
server error. Yeah, it's always a server
error. Yeah. Um, and I don't think that
there's an exception there unless
there's some
esoteric 500 status code like in the
high ranges. Um, like application
specific like WebDAV for example, I
remember used. Oh, why did I mention
WebDAV? Now we have to explain it.
Actually, you can just use your favorite
search engine to look up what WebDAV is.
Um,
DAV, WebDAV. Anyway, um Martin, yes. Um
what our producer was yelling at us for
um running over um and uh Damn it. I am
going to schedule one more of these and
we finish talking about HTTP. Oh yes. Uh
in the meantime, can people listening to
this please let us know if this kind of
stuff is interesting or not? Because I
I'm not sure if this is just a weird
noise fest or if people actually enjoy
this. Yeah, please. That'd be
interesting. We will keep doing it
anyway because we actually like it. But
like you tell us, man, this was great.
Thank you so much. You're very welcome.
I love it. That's it for this episode.
Okay. If people want to find you to chat
more, where can they do that, Martin?
LinkedIn, Bluesky,
Mastodon. Yeah. Yeah. And I'm antisocial.
Don't find me. Um, well, thank you folks
for listening and goodbye.
Bye-bye. We've been having fun with
these podcast episodes. I hope you, the
listener, have found them both
entertaining and insightful, too. Feel
free to drop us a note on LinkedIn or
chat with us at one of the next events
that we go to if you have any thoughts.
And of course, don't forget to like and
subscribe. Thank you and goodbye.
[Music]