9.5. Control Character Encoding in Output

In general, a secure program must ensure that its clients are synchronized with any assumptions the program makes. One issue that often affects web applications is forgetting to specify the character encoding of their output. This isn't a problem if all data is from trusted sources, but if some of the data comes from an untrusted source, that source may sneak in data using a different encoding than the one the program expects. This opens the door to a cross-site malicious content attack; see Section 5.10 for more information.

CERT's tech tip on malicious code mitigation explains the problem of unspecified character encoding fairly well, so I quote it here:

Many web pages leave the character encoding ("charset" parameter in HTTP) undefined. In earlier versions of HTML and HTTP, the character encoding was supposed to default to ISO-8859-1 if it wasn't defined. In fact, many browsers had a different default, so it was not possible to rely on the default being ISO-8859-1. HTML version 4 legitimizes this - if the character encoding isn't specified, any character encoding can be used.

If the web server doesn't specify which character encoding is in use, it can't tell which characters are special. Web pages with unspecified character encoding work most of the time because most character sets assign the same characters to byte values below 128. But which of the values above 128 are special? Some 16-bit character-encoding schemes have additional multi-byte representations for special characters such as "<". Some browsers recognize this alternative encoding and act on it. This is "correct" behavior, but it makes attacks using malicious scripts much harder to prevent. The server simply doesn't know which byte sequences represent the special characters.

For example, UTF-7 provides alternative encoding for "<" and ">", and several popular browsers recognize these as the start and end of a tag. This is not a bug in those browsers. If the character encoding really is UTF-7, then this is correct behavior. The problem is that it is possible to get into a situation in which the browser and the server disagree on the encoding.
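To make the UTF-7 case concrete: UTF-7 encodes "<" as "+ADw-" and ">" as "+AD4-". A server that filters only the literal "<" and ">" bytes would pass untrusted input like the following through unchanged, yet a browser that decides the page is UTF-7 renders it as a live script tag:

+ADw-script+AD4-alert(1)+ADw-/script+AD4-

which decodes to <script>alert(1)</script>. The server never sees the byte values it was told to treat as special.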

Thankfully, though explaining the issue is tricky, its resolution in HTML is easy: simply specify the charset in the document's HEAD element, as in this example from CERT:

<HTML>
<HEAD>
<META http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<TITLE>HTML SAMPLE</TITLE>
</HEAD>
<BODY>
<P>This is a sample HTML page
</BODY>
</HTML>
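In HTML5, the same declaration has a shorter equivalent form, which current browsers also accept:

<meta charset="ISO-8859-1">

The older http-equiv form shown above remains valid and is understood by essentially all browsers, so it is the safer choice if very old clients matter.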

From a technical standpoint, an even better approach is to set the character encoding as part of the HTTP protocol output, though some libraries make this more difficult. This is technically better because it doesn't force the client to guess an encoding just to parse the document far enough to find the META declaration in the header. Of course, in practice a browser that couldn't read the META information given above and use it correctly would not succeed in the marketplace, but that's a different issue. In any case, this just means that the server needs to send, as part of the HTTP protocol, a "charset" parameter with the desired value. Unfortunately, it's hard to heartily recommend this (technically better) approach, because some older HTTP/1.0 clients did not deal properly with an explicit charset parameter. Although the HTTP/1.1 specification requires clients to obey the parameter, client support is shaky enough that you should use it as an adjunct to specifying the correct character encoding in the document itself, and not as your sole mechanism.
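Concretely, specifying the encoding at the HTTP level just means adding a charset parameter to the Content-Type header of the response, for example:

HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1

Most web servers and CGI libraries provide a way to set this header directly; when yours does, it is worth emitting it in addition to the META declaration above.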
