HTTP

HyperText Transfer Protocol

Web page by Kevin Harris of Homer IL


"I just had to take the hypertext idea and connect it to the TCP and DNS ideas and ta-da! ... the World Wide Web." - Sir Timothy John Berners-Lee


Hypertext Transfer Protocol (HTTP) is an application-level protocol for information systems. It is a generic and stateless request/response protocol. HTTP has evolved to be used for many tasks beyond its original use for hypertext, through extension of its request methods and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

In 1990, Tim Berners-Lee, with the help of Robert Cailliau, produced the first web browser and the first web server. It was put online in 1991 running on a NeXT computer at CERN. The first web page address was http://info.cern.ch/hypertext/WWW/TheProject.html. Tim Berners-Lee's specifications of URIs, HTTP and HTML were refined as the technology spread. The standards development of HTTP was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), culminating in the publication of a series of Requests for Comments (RFCs). The long-standing HTTP/1.1 reference was
IETF RFC 2616 - Hypertext Transfer Protocol HTTP/1.1, which has since been superseded by RFCs 7230-7235 and, more recently, RFCs 9110-9112.

Uniform Resource Locator

A Uniform Resource Locator (URL) is a formatted text string which contains the address of a resource. Resources are things you wish to interact with, such as web pages, images, and videos. Resources are also components of a web page, such as those which control how the page is formatted (CSS) or how it behaves (JavaScript).

An example of a URL is: http://kcshadow.net:80/dom.htm#events

  1. http:// - URL Scheme, specifies the use of http. Other common schemes include: https, ftp, file, mailto, and telnet.

  2. kcshadow.net - Host, provides a user friendly location of the server. A Domain Name Server converts the text string into an IP Address (e.g. 204.73.40.35).

  3. dom.htm - Path, specifies where on the server the resource is located. May be an actual path on the server, or a derived path containing descriptive keywords, which will typically yield a higher Search Engine Optimization (SEO) rating.

  4. :80 - Port Number, typically omitted when the default port 80 is used. (Browsers assume port 80 for the http scheme.) However, it is required when web servers are configured to listen on other ports, which is usually only done during development and testing.

  5. #events - Fragment, specifies a location within the HTML document for page positioning within the Browser (i.e. go to the location of the page marked with the fragment value).
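The breakdown above can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's urllib.parse module on the same example URL; it is only one of several ways to split a URL into its parts.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://kcshadow.net:80/dom.htm#events"
parts = urlsplit(url)

print("scheme:  ", parts.scheme)    # URL scheme
print("host:    ", parts.hostname)  # host name, resolved to an IP address via DNS
print("port:    ", parts.port)      # explicit port number (None when omitted)
print("path:    ", parts.path)      # path to the resource on the server
print("fragment:", parts.fragment)  # position within the document, handled by the browser
```

Note that the fragment is used by the browser only; it is not sent to the server as part of the request.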

Requests/Responses

HTTP involves a client computer sending a Request to a server. The server then creates a Response and sends it back to the client computer. HTTP is a stateless protocol where each Request/Response pair has no knowledge of any previous Request/Response. However, state can be simulated by resending values in hidden form fields or inside cookies.
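As a rough illustration of simulating state with cookies, the sketch below takes the value of a Set-Cookie response header and builds the Cookie request header a client would send back on its next request. The cookie name "session" and value "abc123" are made up for the example, and real browsers also apply path, domain, and expiry rules that are ignored here.

```python
from http.cookies import SimpleCookie

def cookie_header_from_set_cookie(set_cookie_value):
    """Build the Cookie request header value from a Set-Cookie response value."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # parses the name=value pair and its attributes
    # Only name=value is sent back; attributes such as Path stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

# Hypothetical value a server might send in a Set-Cookie header.
cookie_value = cookie_header_from_set_cookie("session=abc123; Path=/")
print("Cookie:", cookie_value)
```

Because the client resends the value with every request, the server can recognize the same visitor even though each request is independent.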

Some resource requests trigger additional requests to download resources. For example, a request for a web page may also cause requests for images, videos, CSS, and JavaScript files. So for a typical web page request, the browser makes multiple HTTP requests to get all the resources needed for the page. Browsers limit the number of concurrent requests allowed to a domain. Optimization techniques include bundling files (e.g. JavaScript files) and combining images into sprites to reduce the number of requests. Pulling files from a separate domain (such as a Content Delivery Network) is another optimization technique for getting around the concurrent request limit per domain.

Browsers create the requests and handle the responses. Below is an example of a GET request which was manually created with telnet.


telnet cathyharris.net 80
GET / HTTP/1.1
HOST: cathyharris.net

GET request using telnet

Request/Response Headers in Fiddler

Request Messages

Request messages to the server take the form of a start line consisting of a method, URL, and HTTP version number, followed by one or more headers (the Host header is required in HTTP/1.1), and an optional body. The method specifies the action required of the server. The URL is the address of the resource on the server. The version number is typically HTTP/1.1 for text-based exchanges such as the telnet example above; newer versions (HTTP/2 and HTTP/3) use a binary framing instead. The headers contain various information for the server. The body is typically not used with GET requests. The format of a request message is:

[method] [URL] [version]
[headers]
[body]
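To make the layout concrete, here is a small sketch that assembles an HTTP/1.1 request message as bytes. The function name build_request and its parameters are invented for this example; it only builds the message and does not send it anywhere.

```python
def build_request(method, target, host, headers=None, body=b""):
    """Assemble an HTTP/1.1 request: start line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # The server needs the body length to know where the message ends.
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

# Same request as the telnet example: GET / with only the Host header.
request = build_request("GET", "/", "cathyharris.net")
print(request.decode("ascii"))
```

The blank line after the headers is what tells the server the header section is finished, which is why the telnet session needs an extra press of Enter.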

Request Methods

Request methods, a.k.a. verbs, specify the action requested of the server; see W3 HTTP/1.1: Method Definitions. Web sites typically only use the GET and POST methods, while RESTful services also use the PUT and DELETE methods. There is no limit to the number of methods which can be added. This allows for future expansion without affecting current systems. For example, the WebDAV protocol (for file access and content management over the Internet) defined 7 new methods.

The GET and HEAD methods are categorized as safe methods because they do not affect the state of the server. That is, they only retrieve data and do not change any data on the server. The POST, PUT, and DELETE methods are categorized as unsafe methods because they do affect the state of the server. Browsers may handle unsafe methods differently than safe methods. For example, resubmitting a POST may cause a browser to display a confirmation message, while no confirmation is displayed when a GET is resubmitted. Pages which perform POST operations typically use the Post/Redirect/Get (PRG) pattern to leave the user on a page which can be refreshed without resubmitting the POST.

Some of the request methods are:

  1. GET - (HTTP/1.0) Retrieves the resource (entity) identified in the URL. Has no effect on the state of the server.

  2. POST - (HTTP/1.0) Actual function is determined by the server. In general it creates a new entity.

  3. HEAD - (HTTP/1.0) Same as a GET, except it only returns the headers and nothing in the message body.

  4. PUT - (HTTP/1.1) It replaces an entity.

  5. DELETE - (HTTP/1.1) Deletes the entity specified in the URL.

  6. TRACE - (HTTP/1.1) Creates an application-layer loop-back used for diagnostic purposes.

  7. OPTIONS - (HTTP/1.1) Requests information about communication options and requirements.

  8. CONNECT - (HTTP/1.1) Can be used with a proxy to dynamically switch to tunneling.

Request Headers

Request headers contain useful information for the server, see W3 HTTP/1.1: Header Field Definitions. Some of the request headers are for:

  1. identifying the host - specifies the host URL (the Host: header is required).

  2. content negotiation - identifies the client's preferred content type, e.g. the Accept headers. The q values in the Accept headers specify the relative degree of preference for a particular resource type. q values range from 0 to 1, with 1 being the most preferred.

  3. time stamp - identifies when the message was created.

  4. cache-control directives - specifies if and how caching is used.

  5. content metadata - e.g. content encoding, length, language, etc.

  6. implementation specific headers - e.g. Pragma.

  7. referring URL - identifies the URL which was used to get to the current URL.

  8. user agent - the browser type or other application used to send the request.

  9. custom headers - used for Ajax, load balancers, etc., commonly have X- prepended to the header name.
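The q values used in content negotiation can be illustrated with a short parser. The sketch below is a simplification: the function name parse_accept is invented here, the header value is a made-up example, and parameters other than q are ignored. An entry with no q value defaults to 1.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, most preferred first."""
    entries = []
    for item in header.split(","):
        pieces = [p.strip() for p in item.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        q = 1.0  # default preference when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption of this sketch: a malformed weight ranks last
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their original order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

ranked = parse_accept("text/html, application/xml;q=0.9, */*;q=0.8")
for media_type, q in ranked:
    print(media_type, q)
```

A server would walk this list from the top and return the first representation it can actually produce.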

Response Messages

Response messages are sent to the client from the server. The format of a response message is similar to the format of a request message. The response message has a start line which contains the status code, which identifies what happened when the request was processed. The start line also contains a reason, which is a text description of the status code. The response start line is followed by response headers which communicate information back to the client. At the end is the response body, which may contain HTML, a binary image, or another type of data.

The format of a response message is:


[version] [status code] [reason]
[headers]
[body]
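A response in this format can be taken apart with a few string operations. The following sketch uses an invented function name (parse_response) and a hand-written sample response; it ignores details a real client must handle, such as repeated headers and chunked bodies.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into version, status, reason, headers, body."""
    # An empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Start line: version, status code, reason (the reason may be missing).
    version, status_code, reason = (lines[0].split(" ", 2) + [""])[:3]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(status_code), reason, headers, body

# Hand-written sample response used only for illustration.
raw = (b"HTTP/1.1 404 Not Found\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"Not here.")
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)
print(headers)
print(body)
```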

Response headers contain important information about caching, content type (MIME types), and content length which is used by the client when displaying the content. Response status codes are grouped into five categories. The status codes provide information concerning what happened when the server processed the request. Some of the most common response codes are:


200 - OK, the request was processed successfully.
301 - Moved Permanently, the resource has permanently moved.
302 - Found (originally Moved Temporarily), the resource has temporarily moved, such as during maintenance.
304 - Not Modified, the resource has not changed (use the cached version).
400 - Bad Request, the request could not be understood, e.g. because of bad HTTP syntax.
401 - Unauthorized, access to the resource requires authentication.
404 - Not Found, the resource could not be found.
500 - Internal Server Error, could be an application error such as a required database field missing.
503 - Service Unavailable, the server is temporarily unable to service the request.

HTTP Status Codes
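The five categories follow from the first digit of the status code. As a small sketch, the function below (its name, describe, is made up for this example) maps a code to its category and looks up the reason phrase in Python's standard http.HTTPStatus enumeration, which names 302 "Found".

```python
from http import HTTPStatus

# The first digit of a status code determines its category.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code):
    """Return 'code reason (category)' for an HTTP status code."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unrecognized"  # code is not in the standard library's list
    return f"{code} {phrase} ({category})"

for code in (200, 301, 302, 304, 400, 401, 404, 500, 503):
    print(describe(code))
```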

HTTP Networking



Networking Protocol Layers

HTTP resides at the application level of the network protocol stack. Its requests/responses travel down to the IP level for transmission across the network and then travel back up to the application level to communicate with browsers and web servers. The TCP layer provides a reliable protocol which assures the transmission succeeds by retransmitting the data if needed. TCP also controls the handshaking between the endpoints to establish agreed-upon communication parameters. The network stack contains the following levels:

  1. HTTP - Application level that provides communications to software applications such as browsers and web servers.

  2. TCP - Transport level that provides hand-shaking, reliable transmission, and flow control.

  3. IP - Network level provides the movement of data through the hardware. Requires computers to have an IP address.

  4. Data Link - Moves the data through the transmission media (wires, air).

Wireshark Network Protocol Analyzer

Fiddler is a web proxy which shows only the HTTP communications. To view the activity at the lower network levels, a network protocol analyzer like Wireshark is required. Wireshark shows the communications activity up and down the entire network stack.

In the early days of the Web, each connection was closed immediately after the transmission was completed. Today browsers can use multiple concurrent connections per host, typically up to six. The HTTP/1.1 specification made persistent connections the default. Either the browser or the server can close a connection. A typical web server time-out value for closing an idle connection is five seconds. Servers can also be configured to not allow persistent connections. This will result in a "Connection: close" header.

Network Proxies

Proxy servers reside between the client and the web server. Proxy servers can be either hardware or software and are typically mostly transparent to the user. Proxy servers can perform a variety of functions, from acting as access control devices to optimization (load balancing and specialized routing). Forward Proxies are typically closer to the client than the server and perform services in a particular location. Forward proxies can provide access control, traffic logging, and filtering of confidential information.

Reverse Proxies are typically closer to the server than they are to the client. These are typically completely transparent to the user. Reverse proxies can be used for load balancing. A load balancing proxy forwards a request to one of multiple servers. Various factors can be used to determine the server which receives the request, such as current loads on individual servers or having servers which specialize in handling certain types of resources. Reverse proxies can also filter out messages which contain SQL injection or cross-site scripting attacks.

Public caches can also reside on proxy servers. The metadata inside the HTTP messages is used to determine when information should be retrieved from the source or served from a public cache. Browsers cache private data for the user. Browsers generally want to cache successful responses from safe requests (i.e. Status 200 OK in response to a GET request). However, responses from unsafe requests (e.g. POST, PUT, DELETE) are not cached. The headers in HTTP messages can control caching. Older cache headers exist, but the most recent and most widely supported cache header is the Cache-Control header. The header "Cache-Control: private, max-age=3600" allows the web browser to cache the response for up to 3600 seconds, but not shared caches. In versions of Chrome that support it, you can enter chrome://cache to list the cache entries stored by the browser.

Cache Stored in Chrome
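To show how a client might interpret the Cache-Control header from the example, here is a simplified sketch. The function names (parse_cache_control, may_reuse_without_revalidation) are invented for this illustration, and the freshness check covers only max-age, no-store, and no-cache; real caches follow considerably more rules.

```python
def parse_cache_control(value):
    """Turn 'private, max-age=3600' into {'private': True, 'max-age': '3600'}."""
    directives = {}
    for item in value.split(","):
        name, sep, arg = item.strip().partition("=")
        if not name:
            continue
        # Directives without '=' are flags; others keep their argument as text.
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives

def may_reuse_without_revalidation(directives, age_seconds):
    """Simplified check: is a stored response still fresh enough to reuse?"""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    return isinstance(max_age, str) and max_age.isdigit() and age_seconds < int(max_age)

directives = parse_cache_control("private, max-age=3600")
print(directives)
print(may_reuse_without_revalidation(directives, 60))    # stored one minute ago
print(may_reuse_without_revalidation(directives, 4000))  # stored more than an hour ago
```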


Secure HTTP

Secure HTTP involves encrypting all HTTP messages before they are transmitted. Secure HTTP uses https as the scheme in the URL, which causes the browser to use port 443 by default instead of port 80. Secure HTTP is often referred to by its scheme name: HTTPS. In the early Web years, HTTPS was used primarily for payment and other sensitive transactions. However, it wasn't long before HTTPS was widely used to protect all types of websites. For example, HTTPS is required when using Basic or Forms authentication to prevent the authentication credentials from being transmitted in clear text. Modern web browsers will alert the user when visiting sites that have invalid security certificates. With HTTPS, everything is encrypted with the exception of the host name. HTTP request and response messages and all cookies are encrypted.

The HTTPS certificate authenticates the server. With HTTPS you can be assured you are communicating with the certified server and not a "man-in-the-middle" who has used a proxy to intercept the traffic between you and the server. If you run a proxy, like Fiddler, while visiting an HTTPS site, some websites will detect the proxy and display a message indicating the connection may not be secure.

HTTPS works by adding an encryption layer, either Secure Sockets Layer (SSL) or Transport Layer Security (TLS), in the network protocol stack between the HTTP and TCP layers. SSL was developed by Netscape, and its final version (3.0) was released in 1996. TLS is a similar protocol developed by the IETF in an attempt to standardize the security protocol; TLS 1.0 was published in 1999. TLS 1.0 was closely related to SSL 3.0, but there are significant differences such that they do not interoperate. Today SSL 3.0 is no longer considered secure because of the POODLE attack, and TLS 1.0 has been deprecated as well. TLS v1.2 is widely deployed, and TLS v1.3 was published as RFC 8446 in 2018.



SSL/TLS Encryption Layer Added between HTTP and TCP Layers

  1. HTTP - Application level that provides communications to software applications such as browsers and web servers.

  2. SSL/TLS - Secure Sockets Layer or Transport Layer Security - Encrypts/decrypts all messages traveling up and down the stack.

  3. TCP - Transport level that provides hand-shaking, reliable transmission, and flow control.

  4. IP - Network level provides the movement of data through the hardware. Requires computers to have an IP address.

  5. Data Link - Moves the data through the transmission media (wires, air).
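The layering can be seen directly in code: a TCP connection is opened first, a TLS layer is wrapped around it, and the HTTP message is then written through the TLS layer. The sketch below uses Python's socket and ssl modules (Python 3.7 or later is assumed); the function names are invented for the example, and the choice of TLS 1.2 as the minimum version is an assumption rather than a requirement of HTTPS.

```python
import socket
import ssl

def make_tls_context():
    """Create a client-side TLS context that verifies the server certificate."""
    context = ssl.create_default_context()  # certificate and host name checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: refuse SSL 3.0 and early TLS
    return context

def fetch_status_line(host, port=443, timeout=10.0):
    """Send a HEAD request over TLS and return the response start line."""
    context = make_tls_context()
    request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")
    # TCP layer: plain connection to the HTTPS port.
    with socket.create_connection((host, port), timeout=timeout) as tcp_sock:
        # TLS layer: handshake, then everything sent or received is encrypted.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            # HTTP layer: the same text request as before, now sent through TLS.
            tls_sock.sendall(request)
            reply = tls_sock.recv(4096)
    return reply.split(b"\r\n", 1)[0].decode("iso-8859-1")

context = make_tls_context()
print(context.minimum_version)
```

Calling fetch_status_line with a host name requires network access, so it is not invoked here. The server_hostname argument is what lets the client check that the certificate matches the host it intended to reach.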

A web server requires an X.509 certificate to create a secure HTTP connection. On IIS, the IIS Manager is used to enter identity information into a certificate request form. The identity information contains details about the organization or person associated with the certificate. In a typical public key infrastructure (PKI), the certificate application is reviewed and, if approved, signed by a third-party certificate authority (CA). For purposes of testing HTTPS, a self-signed certificate can be created.

Managing SSL Certificate in IIS

Qualys SSL Labs has an SSL Server Test website which will evaluate the security certificate and protocols used by a website.

Qualys SSL Labs Security Evaluation


Privacy Message when Secure Website Detects Proxy



Virtual Private Networks

HTTPS is a protocol which allows for end-to-end encryption over the public Internet. Another common security technique is to use a Virtual Private Network (VPN) to encrypt a private channel between the network and the client. Traditional VPNs (IPsec) require the installation of client software in order to create the private channel. HTTPS provides message encryption on a per-web-site basis to all clients, while a VPN provides encryption for all messages traveling to and from a particular client.

Two popular VPN technologies are IPsec VPNs, which encrypt the messages at the IP packet level, and SSL VPNs, which encrypt the messages higher up the network stack at the SSL level. SSL VPNs were designed to work with mobile devices. IPsec is application-agnostic, but does not have the ability to restrict resources at a granular level. SSL VPNs do not require special software to be installed on the client and provide a granular level of control over resources. However, IPsec supports a number of legacy protocols and traditional client/server applications, which is not the case with SSL VPNs.


