I am working on a piece of Python code to handle the POST request method from an HTML server. The block of code below was provided to me in the answer to a question I asked earlier. I’ve been going through it line by line to make sure that I understand the logic at play here, but I’m hitting a snag on this line in particular.
user_input = post_data.split('=')[1]
So I’m going to outline what I think the logic is doing, then provide the actual code. I’m hoping someone can correct me where appropriate, and also explain why the line of code indicated above exists at all. I don’t understand why the split method is needed in this case.
Could post_data not simply be taken as is and printed out instead? Or would that cause problems for some reason?
This top section here I am providing for context, as it’s the rest of the program.
# Python 3 server example
from http.server import BaseHTTPRequestHandler, HTTPServer
import time #Why is time imported if it's not used? Hypothesis: the send response method on line 10 states among other things to send the current date. Thus time is needed to determine current date?

hostName = "localhost"
serverPort = 8080

class MyServer(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("<html><head><title>https://pythonbasics.org</title></head>", "utf-8"))
        self.wfile.write(bytes("<p>Request: %s</p>" % self.path, "utf-8"))
        self.wfile.write(bytes("<body>", "utf-8"))
        self.wfile.write(bytes("<p>Hello world! This is a webpage!</p>", "utf-8"))
        self.wfile.write(bytes("<p> And hello to you! Please enter your name below!</p>", "utf-8"))
        self.wfile.write(bytes("""<form action="/" method="post">
            <label for="name">Please enter your name:</label><br>
            <input type="text" id="name" name="name"><br>
            <input type = "submit" value = "Click me!">
            </form>
            """, "utf-8" ))
        self.wfile.write(bytes())
        self.wfile.write(bytes("</body></html>", "utf-8"))
This section below is the portion of the code I am wanting to verify my understanding of. What I think is happening here logically and some additional questions have been included in comments within the code.
    def do_POST(self):
        content_length = int(self.headers['Content-Length'])
        #The Content-Length string above is an html header responsible for declaring the length of the text being passed to the server via the POST method.
        #However, Content-Length is never declared a value that I can tell, not here or in the rest of the program. So, does HTML then by default take the entire document provided by the POST method if Content-Length is not assigned any value?
        # It is then coerced into an int and passed to the content_length variable. But what is the purpose of the ".headers" method?
        post_data = self.rfile.read(content_length).decode('utf-8')
        #utf-8 is the unicode format for encoding/decoding the given text. .read is being passed the length of the message and thus is reading the entire message. The message is then passed to .rfile, I am not sure why this is, why is the standard .read method not sufficient?
        user_input = post_data.split('=')[1]
        #I really have no idea why the split method is needed at all. Could post_data not be used as it is?
        self.send_response(200)
        #Sends the webpage the 200 response code indicating the server processed the request correctly.
        self.send_header('Content-type', 'text/html')
        #Declares that the data about to be sent is of the type text, and is written in HTML
        self.end_headers()
        self.wfile.write(bytes("<html><head><title>https://pythonbasics.org</title></head>", "utf-8"))
        #I don't understand the use of the bytes class in these lines. I'm assuming that html needs information passed to it to be encoded into bytes for the transfer? This class does so?
        self.wfile.write(bytes("<body>", "utf-8"))
        self.wfile.write(bytes(f"<p>Hello {user_input}!</p>", "utf-8"))
        #At the beginning of this string after the bytes class there is a single "f" present. Why is this? Is it something to do with html coding or python? Also am I right in thinking that {user_input} is the syntax in HTML for inserting a variable?
        self.wfile.write(bytes("</body></html>", "utf-8"))
2 Answers
The reason is that the post_data variable contains "name=<the_text_input>". If you want only the text input value, you have to split the string on "=" and take the last part.
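A minimal sketch of what that split does, assuming the form was submitted with the name Alice (the sample value is made up for illustration):

```python
# Hypothetical payload, as the browser would send it for the form field "name"
post_data = "name=Alice"

parts = post_data.split("=")  # a list with the text before and after the "=" sign
user_input = parts[1]         # the part after the "=" sign

print(parts)
print(f"<p>Hello {post_data}!</p>")   # what the page would say without the split
print(f"<p>Hello {user_input}!</p>")  # what the page says with the split
```

So post_data could be printed as is, but the page would then greet the user with the whole "name=Alice" string instead of just the name.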
One of the many things you develop when working as a software developer is the ability to try things out. What’s the worst thing that can happen? Errors, errors everywhere (read it with Buzz Lightyear’s voice if you get the reference)…
Do we enjoy running into bugs and errors? Well, usually not… but we’re not always sad either. An error is an opportunity to learn by cleaning your code, creating useful logs, learning about a new package functionality, etc.
My point here is: do not be afraid to test. If you’re facing someone else’s code and you’re unsure of what something does or why it’s even there, don’t be scared to remove it and rerun the code!
For example… you’ve noticed that the time module is imported but not used. You can remove it and see what happens! (Spoiler: It works just fine. That module wasn’t needed for your code to run at all.)
Another useful ability is to search and read package documentation. Yeah, it’s boring sometimes, but in most scenarios, it will save your day and even teach you a new thing or two.
For example… you’ve asked about the use of bytes inside the write method. Well, first, you need to understand that your server is based on a built-in class from the http.server module called BaseHTTPRequestHandler. More precisely, your class inherits from BaseHTTPRequestHandler, which means your class gets all of its attributes and methods.
(Tip: if you don’t understand the concept of inheritance, maybe you should take a step back and reinforce the concepts of Object-Oriented Programming before diving into examples such as this web server.)
It’s always a good idea to look at the docs when inheriting from an already coded class. I’ll save you a few clicks and put the link directly here.
There you can find information about the attribute wfile. It’s said that it is an "io.BufferedIOBase stream".
The docs are actually redirecting us to another documentation page, which is fine. It often happens. We now need to know what an "io.BufferedIOBase stream" is. Or, more specifically, we need to know what its write method does.
The docs are pretty straightforward and give us the answer in the first sentence: the method writes the given bytes-like object and returns the number of bytes written.
The puzzle has been solved. The write method needs a "bytes-like" object as input, so that’s why we use bytes all around when calling self.wfile.write.
These two skills (testing and reading docs) are core tools for you to develop to understand better how things work in the software-developing world. I felt that it was important to give you this background before going to your specific question. But now let’s go.
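The bytes requirement discussed above is easy to try out in isolation. The sketch below uses io.BytesIO as a stand-in for wfile. That is an assumption made for illustration: it is also an io.BufferedIOBase stream, but it is not the object the server actually uses.

```python
import io

stream = io.BytesIO()  # stand-in for self.wfile (assumption, not the real socket stream)

# Writing bytes works; bytes("...", "utf-8") gives the same result as "...".encode("utf-8")
written = stream.write(bytes("<body>", "utf-8"))

# Writing a plain str is rejected with a TypeError
try:
    stream.write("<body>")
    str_accepted = True
except TypeError:
    str_accepted = False

print(written, stream.getvalue(), str_accepted)
```

The same test-it-yourself approach applies to the server: removing one of the bytes(...) wrappers and rerunning should produce a similar error.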
Why the need of split?
I’ll take some time here to give you an overview of some other topics as well. I’m no expert, but hopefully, you’ll find this interesting.
When using HTML forms, it’s useful to understand how the information passed by the user is sent to the server. To do this, you can run your server code, open its page in any browser, and then open the "devtools" (check how to do it on this link). And finally, open the "Network" tab.
You’ll see it’s empty at first, but then just test your code: input your name and click the button. Now, a few requests may have appeared.
Look for the POST request, since it’s the one triggered by the button-click event. How do we know that? In your code the <form> tag has the following options: <form action="/" method="post">, which tells the browser to send a request of type POST to the server when that button is clicked.
After selecting the POST request, look for its headers. It has a lot of them, but only one is useful right now: the "Content-Type". It states how the input data from the user will be written in the request. See the print screen below.
The application/x-www-form-urlencoded value states that the information will be written as name-value pairs like name1=value1&name2=value2&name3=value3 etc. (There are some answers over SO that talk about this, such as this one.)
This starts to ring a bell, doesn’t it? But let’s dive some more. You can see the actual information sent to the server on the "Payload" tab. I’ve printed it for you as well.
You can guess by that I wrote a single X and clicked the button. But see, this is the information the browser is sending. The name of the property is "name" because of the name="name" attribute inside the <input> tag of your form. If you change it to something like <input type="text" id="name" name="banana"> we would see banana=X in the browser.
That’s how the browser sends the information to the server. From the server side, you need to read it somehow. This is achieved by the rfile.read method.
Now that you know the path to the docs, you’ll understand that this method needs an integer for the number of bytes to be read. Happily, this number of bytes is sent in the request header under the key "Content-Length".
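To see why the byte count matters, here is a rough sketch of the read step outside the server. The headers dict and the io.BytesIO stream are stand-ins made up for self.headers and self.rfile, and the trailing bytes simulate data that must not be consumed:

```python
import io

# Stand-ins (assumptions for illustration) for self.headers and self.rfile
headers = {"Content-Length": "6"}
rfile = io.BytesIO(b"name=X" + b"...anything that follows...")

content_length = int(headers["Content-Length"])         # header values arrive as text
post_data = rfile.read(content_length).decode("utf-8")  # read exactly that many bytes

print(post_data)
```

As far as I understand, on a real connection the stream stays open after the payload, so calling read() without a size would wait for more data instead of returning, which is why the length is passed in.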
You can check in the first print that in my case the "Content-Length" was 6: the name of the property (name) is 4 bytes long; the = sign is 1 byte long; and my answer (X) is also 1 byte long. That’s why the Content-Length is 6.
By now you can probably figure everything out: the code uses the content_length variable to get the size of the POST payload in number of bytes, reads that many bytes, and ends up with a payload of the form name=user_answer. Since you want to print "Hello {user_answer}" you need only the value, therefore you split the payload name=user_answer on the = sign and voilà.
This was a pretty long explanation of the "why" the code was written like this. I often enjoy this journey of reading docs, but it can be boring, I know.
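The byte count and the name=value format described above can both be checked with the standard library. This sketch uses urllib.parse, which the original code does not use; it is shown only as a possible alternative to the manual split, and the sample values are invented:

```python
from urllib.parse import parse_qs, urlencode

# The payload from the example above: field "name", answer "X"
payload = urlencode({"name": "X"})
print(payload, len(payload.encode("utf-8")))  # the payload and its size in bytes

# A value with a space shows where the manual split falls short
post_data = urlencode({"name": "John Smith"})
by_split = post_data.split("=")[1]         # still in its encoded form
by_parse = parse_qs(post_data)["name"][0]  # decoded back to the typed text
print(post_data, by_split, by_parse)
```

The split version keeps the encoded form of the value, while parse_qs decodes it, so the latter is probably the safer choice for anything beyond a single simple word.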
Another way to understand the "why" would be to change your code to print the entire post_data without the split. Then you would see the name=answer printed out, and you’d probably figure everything out by yourself.
I hope you’ve learned a thing or two. Feel free to ask more if you like. Happy coding to you (: