HTTP Fundamentals, Part 2: Communication Stack, HTTP Connections, and REST Style of Architecture

Communication Stack

HTTP is the topmost layer in the communication stack and is called an application layer protocol.  A message leaving the web browser travels down through a series of layers, and when it arrives at the web server, it travels back up through the same layers.  The layers that make up the communication stack are:

  • Application – An example would be HTTP.
  • Transport – Responsible for error detection, flow control, and overall reliability.  An example would be TCP (Transmission Control Protocol).
  • Network – Responsible for taking pieces of information and moving them through the various switches, routers, gateways, repeaters, and other devices that move information from one network to the next and all around the world.  An example would be IP (Internet Protocol).  This is where the IP address comes into play.
  • Data Link – This is where data have to travel over a piece of wire, a fibre optic cable, a wireless network, or a satellite link.  It’s focused more on 1s, 0s, and electric signals.  An example would be Ethernet.

HTTP relies on TCP to connect to the server.  The client opens a TCP socket by specifying the server address (host name) and port (80 by default).  With the socket open, the client can write the HTTP request into it and read the response from it when the server replies.
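
As a rough sketch of the paragraph above, the Python snippet below opens a TCP socket, writes a plain-text HTTP request into it, and reads back whatever the server sends.  The function names build_get_request and http_get are illustrative assumptions, not part of any library, and the Connection: close header is added only so the read loop knows when to stop.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the socket when done
        "",                   # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def http_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP socket, send the request, and read the full response."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling http_get with a reachable host name should return the raw status line, headers, and body as bytes; nothing is sent until that function is called.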

A free tool you can use to view HTTP, TCP, and IP packets is Wireshark.

 

HTTP Connections

Gone are the days of simple web pages.  Nowadays, a web page requires more than a single resource to render fully.  To cope with this without bogging down the Internet, several approaches are used with HTTP:

  • Parallel Connections – Browsers can open more than one connection to the same server to download several resources at the same time.  The browser caps the number of connections per host (commonly about six in current browsers), and a server may also limit how many it accepts.  This is still better than requesting resources serially, one by one.
  • Persistent Connections – Browsers can keep a connection to the server open and reuse it for several requests, which reduces the overhead of opening and closing TCP sockets and thus improves performance.  This is the default connection style in HTTP/1.1.  These connections stay open only for a period of time set by either the server or the browser.  The server can also decline to keep a connection open by sending the Connection: close header in its HTTP response messages.
  • Pipelined Connections – Not as widely used as parallel and persistent connections, this type of connection allows the browser to send multiple requests before waiting for the first response.
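
To make the idea of a persistent connection concrete, here is a small sketch using Python's standard http.client and http.server modules.  It starts a throwaway local server (the EchoHandler class and the fetch_over_one_connection function are hypothetical names made up for this example), sends several requests over one HTTPConnection object, and counts how many distinct client sockets were used.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(paths):
    """Request each path over a single connection; return bodies and socket count."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    client_ports = set()
    bodies = []
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            bodies.append(response.read().decode())
            # A new TCP socket would show up as a new local port number.
            client_ports.add(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies, len(client_ports)
```

If the connection really is reused, the second value returned should be 1 no matter how many paths are requested.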

 

REST Style of Architecture

HTTP lends itself to the REpresentational State Transfer (REST) style of architecture.  If you think of resources and URLs not just as files on a web server’s file system but as resource abstractions, you start to see the web as part of your application and as a flexible architectural layer you can build on.  The following are some RESTful aspects of a URL:

  • A URL cannot restrict the client or server to a specific type of technology.
  • A URL cannot force the server to store a resource using any particular technology.
  • A URL cannot specify the representation of a specific resource, and a resource can have multiple representations.  This is where content negotiation, discussed in Part 1 of this series, kicks in.
  • A URL cannot say what a user wants to do with a resource.  This is where the HTTP methods come in.

Because an HTTP message is a simple, plain-text, fully self-describing message, and because URLs provide a level of indirection, HTTP applications can rely on a number of services that add value as a message moves between the client application and the server application.  Examples of such services are:

  • Web server
    • Route message to proper application
    • Log message to a local file
    • Compress message if client supports it
  • Proxy server – A computer that sits between a client and server.  Can either be a forward proxy that sits closer to the client or a reverse proxy that sits closer to the server.  Note that a proxy server does not have to be a physical server.
    • Prevent messages from going out to specific servers
    • Remove confidential data in the message
    • Log message to create audit trails
    • Compress message
    • Forward message to one of several web servers (load balancing)
    • Encrypt and decrypt message (SSL acceleration)
    • Filter out potentially dangerous messages (cross-site scripting, SQL injection attacks)
    • Store copies of frequently accessed resources and respond to messages requesting those resources directly (caching)

These services can be layered into the network without impacting the application, and that is the beauty of HTTP.  It is scalable, simple, reliable, and loosely coupled.  In fact, REST was initially described in the context of HTTP.
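
The layering described above can be imitated in a few lines of Python.  This is only an analogy under stated assumptions: application, with_logging, and with_cache are hypothetical functions, and a real proxy or web server works on full HTTP messages, not on bare URL strings.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]  # takes a URL, returns a response body


def application(url: str) -> str:
    """Stand-in for the real web application (hypothetical)."""
    return f"content of {url}"


def with_logging(handler: Handler, log: List[str]) -> Handler:
    """Layer that records every request before passing it on."""
    def wrapped(url: str) -> str:
        log.append(url)
        return handler(url)
    return wrapped


def with_cache(handler: Handler) -> Handler:
    """Layer that answers repeated requests from a stored copy."""
    cache: Dict[str, str] = {}

    def wrapped(url: str) -> str:
        if url not in cache:
            cache[url] = handler(url)
        return cache[url]
    return wrapped


app_log: List[str] = []
# The cache sits in front of the logger, much like a caching proxy
# sitting in front of a web server that logs requests.
service = with_cache(with_logging(application, app_log))
```

Calling service("/mydevpage") twice returns the same content both times, but app_log records the URL only once, because the second call is answered by the cache layer.  The application function itself never changes, which is the point of the paragraph above.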

 

*This article is Part 2 of the HTTP Fundamentals series.  For Part 1, click here.


HTTP Fundamentals, Part 1: URL, Encoding, Request and Response

HTTP fundamentals a web developer needs to know:

  • HTTP address is called a URL (Uniform Resource Locator). Everything on the Internet is a resource.

     

    Example: http://mydevsite:1234/mydevpage?first=Rodan&last=Sotto#comment

     

    URL consists of the following parts:

     

    <scheme>://<host>:<port>/<path>?<query>#<fragment>

     

    • Scheme describes how to access a particular resource. In our example above, it’s HTTP. It can be HTTPS, FTP, or MAILTO.
    • Host is the name of the computer hosting the resource. In our example, it’s mydevsite.
    • Path is the path to the specific resource. In our example, it’s /mydevpage. A URL does not have to point to a specific file, like an image (*.jpg) or ASPX file (*.aspx). Nowadays, URLs are dynamic and, for search engine optimization (SEO), they usually contain descriptive keywords. See URL optimization for SEO.
    • Port is specified if the host is listening for HTTP requests on a port number other than 80, which is the default HTTP port number. It is usually specified when testing, debugging, or developing web sites. In our example, it’s 1234.
    • Query, or query string, comes after ? (the question mark) and contains name=value pairs separated by & (the ampersand). In our example, it’s first=Rodan&last=Sotto.
    • Fragment is the part after the # sign. This is processed by the browser to display the element identified by the fragment at the top of the screen. In our example, the comment section is displayed on top of the screen.
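
The URL parts listed above can be pulled apart with Python's standard urllib.parse module.  This is a small illustration using the article's example URL; the variable names are arbitrary.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://mydevsite:1234/mydevpage?first=Rodan&last=Sotto#comment"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # mydevsite
print(parts.port)      # 1234
print(parts.path)      # /mydevpage
print(parts.query)     # first=Rodan&last=Sotto
print(parts.fragment)  # comment

# parse_qs turns the query string into a dict of name -> list of values
params = parse_qs(parts.query)
print(params)          # {'first': ['Rodan'], 'last': ['Sotto']}
```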
  • URL encoding is the process of encoding unsafe characters found in the URL. The Internet standards list characters that are unsafe for URLs and thus they are encoded using % (the percent sign). One unsafe character is the space character and is usually encoded to %20. See URL unsafe characters.

     

    Example: http://mydevsite:1234/mydevpage/my%20file.txt
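
For a quick check of the encoding shown in the example, Python's urllib.parse offers quote and unquote.  The file name below is the one from the example URL.

```python
from urllib.parse import quote, unquote

encoded = quote("my file.txt")
print(encoded)           # my%20file.txt
print(unquote(encoded))  # my file.txt
```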

     

  • Content type is the MIME (Multipurpose Internet Mail Extensions) type that the server sends to the client so the requested resource can be displayed properly. The content type for an HTML resource, for example, is “text/html”, where “text” is the primary media type and “html” is the media subtype. If the client did not receive any content type information, it can guess the content type by scanning the first bytes of the response, and if that fails, will use the file extension instead. The client can also specify which content types it will accept when requesting a resource with multiple representations, a process called content type negotiation.
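
As a minimal sketch of how a client might read a content type, the helper below splits a Content-Type value into its primary type, subtype, and parameters.  The function parse_content_type is a hypothetical helper written for this example; mimetypes.guess_type is the standard-library call for guessing a type from a file extension.

```python
import mimetypes


def parse_content_type(header_value: str):
    """Split a Content-Type value into (primary type, subtype, parameters)."""
    media_type, *param_parts = [p.strip() for p in header_value.split(";")]
    primary, _, subtype = media_type.partition("/")
    params = {}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, value = part.partition("=")
        params[name.strip().lower()] = value.strip().strip('"')
    return primary.lower(), subtype.lower(), params


print(parse_content_type("text/html; charset=utf-8"))
# ('text', 'html', {'charset': 'utf-8'})

# Guessing from a file extension, the last-resort fallback mentioned above
print(mimetypes.guess_type("mydevpage.html")[0])  # text/html
```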
  • HTTP request and HTTP response form a single HTTP transaction. These two message types are carefully formatted, readable text messages that both server and client understand. Anyone who can send data over a network can participate, even with the good old command-line Telnet. Tools such as Fiddler can be used to inspect HTTP messages.
  • HTTP Request Methods:
    • GET to retrieve a resource
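    • PUT to store a resource at the given URL, creating or replacing it
    • DELETE to remove a resource
    • POST to submit data for the server to process, typically creating or updating a resource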
    • HEAD to retrieve the headers for a resource
  • Redirect response is sent by the server to the client to mean that the resource has moved to a new location and the client needs to send a request again to the new location. Redirect is also used to make sure all requests for resources from a server go through a specific location, a SEO practice known as URL canonicalization.
  • POST/Redirect/GET pattern is a common web design pattern employed by web applications when servicing POST requests so that the client is left with a response from a GET request. This avoids the problems that arise when the user refreshes or prints a page that was the response to a POST request.
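
The POST/Redirect/GET pattern can be sketched as a tiny request handler.  Everything here is hypothetical: the handle function, the /signup paths, and the response text are invented for illustration, and the handler returns plain tuples instead of talking to a real web framework.  It uses 302 to match the status code list in this article; 303 See Other is also commonly used for this purpose.

```python
def handle(method: str, path: str, form=None):
    """Return (status, headers, body) for a tiny hypothetical signup app."""
    if method == "POST" and path == "/signup":
        # ... the submitted form would be saved here ...
        # Redirect instead of rendering a page, so that a browser refresh
        # repeats the GET below and not the POST.
        return 302, {"Location": "/signup/thanks"}, ""
    if method == "GET" and path == "/signup/thanks":
        return 200, {"Content-Type": "text/html"}, "<p>Thanks for signing up.</p>"
    return 404, {}, ""
```

A client that POSTs to /signup receives the redirect, follows the Location header with a GET, and ends up on a page that is safe to refresh.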
  • 3 common types of HTTP requests:
    • GET request, for example when clicking a link.

       

      GET http://mydevsite:1234/mydevpage/mydefault.aspx HTTP/1.1
      Host: mydevsite:1234

       

    • POST request, when filling out a form whose method is POST. Form inputs go into the HTTP message body.

       

      POST http://mydevsite:1234/mydevpage HTTP/1.1
      Host: mydevsite:1234
      Content-Type: application/x-www-form-urlencoded
      Content-Length: 30

      firstName=Rodan&lastName=Sotto

       

    • Forms and GET request, when filling out a form whose method is GET. Form inputs go into the query string of the URL. Use this type of request if the operation does not require writing to the server, that is, if it is a safe retrieval operation. An example would be a search.

       

      GET http://mydevsite:1234/mydevpage?first=Rodan&last=Sotto HTTP/1.1
      Host: mydevsite:1234
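
The difference between a form submitted with GET and one submitted with POST can be seen by building both requests with Python's urllib.  This sketch only constructs the request objects and sends nothing; the form dictionary reuses the names from the examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"first": "Rodan", "last": "Sotto"}

# Form with method GET: inputs are appended to the URL as a query string
get_request = Request("http://mydevsite:1234/mydevpage?" + urlencode(form))

# Form with method POST: inputs travel in the message body instead
post_request = Request(
    "http://mydevsite:1234/mydevpage",
    data=urlencode(form).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_request.get_method(), get_request.full_url)
# GET http://mydevsite:1234/mydevpage?first=Rodan&last=Sotto
print(post_request.get_method(), post_request.data)
# POST b'first=Rodan&last=Sotto'
```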

       

  • HTTP request message consists of the following parts:

     

    [method] [URL] [version]
    [headers]
    [body]
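
To show how the three parts fit together on the wire, the sketch below parses a raw request string back into its pieces.  The parse_request function is a hypothetical helper, and it assumes the standard layout in which each line ends with CRLF and a blank line separates the headers from the body.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into its start line, headers, and body."""
    # A blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    method, url, version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, version, headers, body


raw = (
    "POST /mydevpage HTTP/1.1\r\n"
    "Host: mydevsite:1234\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 30\r\n"
    "\r\n"
    "firstName=Rodan&lastName=Sotto"
)
method, url, version, headers, body = parse_request(raw)
print(method, url, version)  # POST /mydevpage HTTP/1.1
print(headers["Host"])       # mydevsite:1234
print(body)                  # firstName=Rodan&lastName=Sotto
```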

     

  • HTTP request headers, the Host header among them, contain useful information that helps the server process a request. In HTTP/1.1, all request headers except Host are optional. Popular request headers are:
    • Referer – URL of the referring page
    • User-Agent – information on the client software making the request, usually the browser
    • Accept – content types the client is willing to accept; used for content type negotiation
    • Accept-Language – languages the client prefers
    • Cookie – cookie information
    • If-Modified-Since – date the client last retrieved the resource; requests the server to only send the resource if it’s been modified since that time
  • A full HTTP request might look like the one below. Note that some headers contain multiple values, like the Accept header. The * (asterisk) in one of the values, usually provided as the last value, means anything. The q value, a number between 0 and 1, represents the relative degree of preference, with 1.0 being both the highest and the default. In our example below, the Accept header tells us the client will accept any content type but likes HTML best.

     

    GET http://mydevsite/ HTTP/1.1
    Host: mydevsite
    Connection: keep-alive
    User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) Chrome/16.0.912.75 Safari/535.7
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Referer: http://www.google.com/url?&q=mydevsite
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
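
The q values in the Accept header can be ranked with a few lines of Python.  This is a simplified sketch: parse_accept is a hypothetical helper that only looks at the q parameter and ignores other details of the header grammar.

```python
def parse_accept(header_value: str):
    """Return (media type, q) pairs, most preferred first."""
    preferences = []
    for item in header_value.split(","):
        media_type, *params = [p.strip() for p in item.split(";")]
        q = 1.0  # default when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        preferences.append((media_type, q))
    # sorted() is stable, so equal q values keep their original order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_type, q in parse_accept(accept):
    print(media_type, q)
# text/html 1.0
# application/xhtml+xml 1.0
# application/xml 0.9
# */* 0.8
```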

     

  • HTTP response message consists of the following parts:

     

    [version] [status] [reason]
    [headers]
    [body]

     

  • A full HTTP response might look like the one below:

     

    HTTP/1.1 200 OK
    Cache-Control: private
    Content-Type: text/html; charset=utf-8
    Server: Microsoft-IIS/7.0
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET
    Date: Sat, 14 Jan 2012 04:00:08 GMT
    Connection: close
    Content-Length: 17151

    <html>
    <head>
    <title>My Development Site</title>
    </head>
    <body>
    ... content ...
    </body>
    </html>

     

  • HTTP Response Status Code Categories
    • 100-199 – Informational
    • 200-299 – Successful
    • 300-399 – Redirection
    • 400-499 – Client Error
    • 500-599 – Server Error
  • Common HTTP Response Status Codes
    • 200 – OK
    • 301 – Moved Permanently; redirect response used in URL canonicalization
    • 302 – Found (called Moved Temporarily in HTTP/1.0); redirect response used in the POST/Redirect/GET pattern
    • 304 – Not Modified; in response to the If-Modified-Since request header
    • 400 – Bad Request
    • 403 – Forbidden
    • 404 – Not Found
    • 500 – Internal Server Error; usually happens due to programming errors in a web application
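
The categories and common codes above map neatly onto Python's standard http.HTTPStatus enum.  In the sketch below, the CATEGORIES dictionary and the describe function are made up for this example; the reason phrases come from the standard library.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(status_code: int) -> str:
    """Return 'code reason (category)' for a standard status code."""
    reason = HTTPStatus(status_code).phrase
    category = CATEGORIES[status_code // 100]
    return f"{status_code} {reason} ({category})"


for code in (200, 301, 304, 404, 500):
    print(describe(code))
# 200 OK (Successful)
# 301 Moved Permanently (Redirection)
# 304 Not Modified (Redirection)
# 404 Not Found (Client Error)
# 500 Internal Server Error (Server Error)
```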