Basic Guidelines for Validation and Processing of User Input to Web Applications

Here is another “Back to Basics” post to help establish, explain, and document a baseline architecture for web applications. The modern web is filled with phenomenal opportunities, and following a few basic engineering principles goes a long way toward keeping things moving in a positive direction.

By design, a web user agent (e.g. a common web browser) can directly submit arbitrary values for every QUERYSTRING variable, FORM post, COOKIE, and HTTP HEADER (such as the REFERER) that it sends to the server application. Secure web database applications treat all user input as inherently untrustworthy, or “tainted”. Taint is contagious: any value combined with tainted input or otherwise derived from it becomes tainted as well. Saving tainted input to persistent storage such as a database or a text file will not magically remove the taint or increase its trustworthiness. All inputs must be checked for proper formatting, boundary conditions, and, when possible, “white-lists” of allowable values. More importantly, special care must be taken in the use of all user input, even after it has passed all appropriate validation checks.
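The contaminating effect of taint can be made concrete with a small sketch. The `Tainted` class and `is_tainted` function below are hypothetical helpers, not part of any framework: a `Tainted` string marks untrusted input, and concatenating it with anything produces another `Tainted` value. The sketch is deliberately incomplete (other string methods such as `upper()` return a plain `str` and silently drop the mark), which is why hand-rolled tracking like this is only an illustration of the idea.

```python
class Tainted(str):
    """Hypothetical marker type for untrusted user input.

    Only the + operator propagates the mark; other str methods return plain str.
    """

    def __add__(self, other):
        # tainted + anything -> tainted
        result = str.__add__(self, other)
        if result is NotImplemented:
            return NotImplemented
        return Tainted(result)

    def __radd__(self, other):
        # anything + tainted -> tainted
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))


def is_tainted(value) -> bool:
    return isinstance(value, Tainted)


# Usage: a value derived from user input stays tainted.
name = Tainted("Robert'); DROP TABLE Students;--")
greeting = "Hello, " + name + "!"
print(is_tainted(greeting))
```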

Although most popular browsers do not, by default, allow direct manipulation of REFERER information, it is trivial to do so with special plug-ins or a few lines of code. COOKIE files are just text stored on the user’s file system and may be easily inspected and modified. FORM posts are easily submitted from any HTML page, not just the one you are expecting. QUERYSTRING manipulation requires only typing new information after the “?” character in the URL shown in a web browser’s address bar. The fact that spoofing the REFERER header takes slightly more effort than typing a new QUERYSTRING does not make the REFERER header significantly more secure.
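To illustrate how little effort this takes, the sketch below uses only Python’s standard `urllib` to build (but not send) a request whose QUERYSTRING, FORM body, COOKIE, and REFERER values are all chosen by the client. The target URL, field names, and values are hypothetical; passing the request to `urllib.request.urlopen` would transmit it exactly as written.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical target; nothing is sent in this sketch.
TARGET = "https://app.example/orders"

def build_forged_request() -> Request:
    # QUERYSTRING: whatever text follows the "?".
    query = urlencode({"id": "42 OR 1=1"})
    # FORM post: any page or script can submit these fields.
    body = urlencode({"role": "admin"}).encode("ascii")
    return Request(
        TARGET + "?" + query,
        data=body,
        headers={
            "Referer": "https://app.example/admin",  # spoofed REFERER header
            "Cookie": "prefs=dark; is_admin=true",   # edited COOKIE values
        },
        method="POST",
    )

req = build_forged_request()
print(req.get_method(), req.full_url)
print(req.get_header("Referer"))
```

A server that trusts any of these values simply because they “came from the browser” can be fooled by a script like this one.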

Client-side validations using JavaScript or VBScript cannot ensure the integrity, format, or trustworthiness of user input. The results of client-side validations must eventually be sent back to the server using untrustworthy FORM, QUERYSTRING, or COOKIE values, so every check must be repeated on the server.

  • Use encryption (e.g. SSL) to protect the transmission of login information such as passwords, as well as updates to sensitive data.
  • Use cookies only to store non-sensitive user preference information or a secure session token to facilitate server side logic.
  • Type check all program input and do additional boundary checks as required. Requests containing input that exceeds its bounds should generally be rejected rather than “sized to fit” (for example, silently truncated).
  • Systems that have not effectively implemented the principle of least privilege in securing their database are potentially vulnerable to the problem of SQL Injection []. Constructing dynamic SQL text from raw user input (either in whole or in part) may allow an attacker to run malicious code on the database server. All major programming languages and SQL databases support strongly typed parameters and procedures that can effectively eliminate most of these vulnerabilities, even when connecting as a user or process with elevated privileges.
  • Canonicalization vulnerabilities occur when a decision about user input is not made on its simplest (canonical) representation. An all too common example of this problem is the directory traversal vulnerability. It occurs when code intended to allow access to only a single folder does not first resolve user input into a canonical path name, and so fails to handle the common “..” parent-directory syntax as well as the subtler issues surrounding text encodings.
  • Cross-Site Request Forgeries (CSRF) can occur when a web application does not prevent a malicious third-party site from triggering sensitive actions while a user is logged on. There are a number of options for Mitigating Cross-Site Request Forgeries [], but they all have limitations. The MySpace “Samy” worm [] is one of the most famous examples of this type of attack and is interesting because, even though MySpace had some CSRF protections in place, they were defeated with Cross-Site Scripting.
  • Cross-Site Scripting (XSS) occurs when a web application outputs user input without properly encoding it. Displaying unencoded user input enables Malicious HTML Tags Embedded in Client Web Requests []. This is one of the oldest problems on the web. In a Cross-Site Scripting attack, malicious scripts may run automatically in the browsers of a site’s users and submit information back to untrustworthy third-party sites. All script-enabled web browsers are vulnerable to these attacks. Cross-Site Scripting mitigation techniques include properly declaring the character encoding of HTML documents as well as applying HtmlEncode or UrlEncode to user input, as appropriate for where it is output.
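For the cookie guideline, here is a minimal sketch of a secure session token using Python’s standard library. The token comes from `secrets.token_urlsafe`, the user data stays on the server (here in a hypothetical in-memory `SESSIONS` dictionary standing in for a real session store), and the cookie is marked `HttpOnly`, `Secure`, and `SameSite=Lax` so page scripts cannot read it, it is only sent over encrypted connections, and browsers that honor `SameSite` withhold it from most cross-site requests. The cookie name `session` and the attribute choices are assumptions, not requirements.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: session token -> user id.
SESSIONS = {}

def start_session(user_id: str) -> str:
    """Create a session and return the value for a Set-Cookie header."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
    SESSIONS[token] = user_id          # sensitive data never leaves the server

    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["path"] = "/"
    morsel["httponly"] = True   # not readable from page scripts
    morsel["secure"] = True     # only sent over HTTPS
    morsel["samesite"] = "Lax"  # withheld from most cross-site requests
    return morsel.OutputString()

def user_for(token: str):
    # The cookie value is still untrusted input: only look it up, never trust it.
    return SESSIONS.get(token)

header = start_session("alice")
print("Set-Cookie:", header)
```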
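The type and boundary checks from the list above might look like the following sketch. The limits `MAX_QUANTITY` and `MAX_COMMENT_LENGTH` and both function names are hypothetical; the point is that out-of-range input raises an error instead of being clamped or truncated into something the user never sent.

```python
MAX_QUANTITY = 100        # hypothetical business limit
MAX_COMMENT_LENGTH = 500  # hypothetical field limit

def parse_quantity(raw: str) -> int:
    # Type check: plain ASCII digits only (no sign, whitespace, or exponent).
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("quantity must be a whole number")
    value = int(raw)
    # Boundary check: fail the request rather than clamping to the limit.
    if not 1 <= value <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")
    return value

def check_comment(raw: str) -> str:
    # Reject oversized input instead of silently truncating it.
    if len(raw) > MAX_COMMENT_LENGTH:
        raise ValueError(f"comment must be at most {MAX_COMMENT_LENGTH} characters")
    return raw
```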
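For the SQL Injection guideline, the sketch below uses Python’s built-in `sqlite3` module, whose `?` placeholders pass the user’s value to the database as data rather than splicing it into the SQL text. The `users` table and the function names are hypothetical. Other database drivers use different placeholder styles, but the principle is the same.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
    return conn

def find_user(conn: sqlite3.Connection, username: str):
    # UNSAFE alternative (do not do this):
    #   conn.execute("SELECT ... WHERE username = '" + username + "'")
    # The "?" placeholder binds username as a value; it is never parsed as SQL.
    cur = conn.execute("SELECT id, username FROM users WHERE username = ?", (username,))
    return cur.fetchall()

conn = make_demo_db()
print(find_user(conn, "alice"))
print(find_user(conn, "' OR '1'='1"))
```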
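For the directory traversal case, one common approach is sketched below: decode the request, resolve it with `os.path.realpath` (which collapses “..” segments and follows symbolic links), and only then check with `os.path.commonpath` that the result is still inside the allowed folder. The folder `/srv/app/uploads` and the function name are hypothetical, and this sketch does not address every encoding subtlety (for example, Unicode normalization or platform-specific file name aliases).

```python
import os
from urllib.parse import unquote

# Hypothetical folder that users may read from.
BASE_DIR = os.path.realpath("/srv/app/uploads")

def resolve_user_path(requested: str, base_dir: str = BASE_DIR) -> str:
    """Return the canonical path for a requested file, or raise if it escapes base_dir.

    base_dir must itself already be canonical (passed through os.path.realpath).
    """
    # Decode percent-encoding first so "..%2f" is treated as "../".
    decoded = unquote(requested)
    if "\x00" in decoded:
        raise ValueError("NUL byte in path")
    # Canonicalize: join, collapse "..", and resolve symbolic links.
    candidate = os.path.realpath(os.path.join(base_dir, decoded))
    # Make the access decision on the canonical form only.
    if os.path.commonpath([base_dir, candidate]) != base_dir:
        raise PermissionError("requested path is outside the allowed folder")
    return candidate
```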
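One of the CSRF mitigation options, the synchronizer token pattern, can be sketched as follows. The server stores a random token per session, embeds it in each form as a hidden field, and rejects any state-changing request that does not echo it back; because the browser’s same-origin policy keeps a third-party site from reading the token, that site cannot forge a valid request. The in-memory `CSRF_TOKENS` store and the function names are hypothetical. As the Samy worm shows, this protection falls apart if the site also has a Cross-Site Scripting hole, because a script running on the site itself can read the token.

```python
import hmac
import secrets

# Hypothetical per-session storage: session id -> CSRF token.
CSRF_TOKENS = {}

def issue_csrf_token(session_id: str) -> str:
    """Create a token to embed in a hidden form field for this session."""
    token = secrets.token_urlsafe(32)
    CSRF_TOKENS[session_id] = token
    return token

def verify_csrf_token(session_id: str, submitted: str) -> bool:
    """Check the token echoed back by a POST before performing the action."""
    expected = CSRF_TOKENS.get(session_id)
    if expected is None or not submitted:
        return False
    # Constant-time comparison; encode so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))

token = issue_csrf_token("session-123")
# token_urlsafe output contains only URL-safe characters, so it needs no HTML escaping.
form_field = f'<input type="hidden" name="csrf_token" value="{token}">'
```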
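In Python, the closest standard-library counterparts to HtmlEncode and UrlEncode are `html.escape` and `urllib.parse.quote`. The sketch below (the `render_comment` function and the `/profile` URL are hypothetical) URL-encodes the user’s name where it becomes part of a query string, then HTML-encodes every piece of user text before it is placed in the page. The page itself should also declare its character encoding, for example with `<meta charset="utf-8">`, so the browser does not have to guess. Which encoding is needed depends on where the value ends up in the page.

```python
import html
from urllib.parse import quote

def render_comment(author: str, comment: str) -> str:
    """Build an HTML fragment from untrusted author and comment values."""
    # UrlEncode: the name becomes a query string parameter value.
    profile_url = "/profile?name=" + quote(author, safe="")
    # HtmlEncode: escape &, <, >, " and ' before inserting text into HTML
    # (element content or a double-quoted attribute value).
    return (
        '<p><a href="' + html.escape(profile_url) + '">'
        + html.escape(author) + "</a>: "
        + html.escape(comment) + "</p>"
    )

print(render_comment("Mallory", "<script>alert(document.cookie)</script>"))
```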

Even with all this in place, keep in mind that Content Hosting for the Modern Web [] is a non-trivial art. Simple features, such as allowing users to upload a picture, can have non-obvious but potentially serious consequences. And the more you scale up with success on the Internet, the more challenges you will face. So keep watching and learning, and good luck!