October 2, 2005

PHP and Unicode BOM

Filed under: PHP — Dimitris Giannitsaros @ 13:43

I was experimenting with some UTF-8 PHP files today and ran into the infamous "BOM not ignored" bug again. I hadn’t been bitten by it for more than a year and had forgotten what a PITA it is.

In case you are not familiar with it, the problem is that PHP doesn’t ignore the byte order mark (BOM) at the very beginning of a Unicode file. So whenever you include a Unicode file that starts with a BOM, PHP treats the BOM as regular output and sends it to the browser, sending the HTTP headers along the way. From that point on, header manipulation is no longer possible: calls such as header() or setcookie() fail because the headers have already been sent.
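In UTF-8 the BOM is the three bytes EF BB BF, so a file can be checked by comparing its first three bytes against them. A minimal sketch, assuming PHP 5 or later; `UTF8_BOM`, `starts_with_utf8_bom()` and `file_has_utf8_bom()` are hypothetical helpers, not part of PHP:

```php
<?php
// The UTF-8 encoding of the BOM (U+FEFF) is the byte sequence EF BB BF.
define('UTF8_BOM', "\xEF\xBB\xBF");

// Hypothetical helper: true when the string starts with a UTF-8 BOM.
function starts_with_utf8_bom($data)
{
    return substr($data, 0, 3) === UTF8_BOM;
}

// Hypothetical helper: true when the file at $path starts with a UTF-8 BOM.
// Only the first three bytes are read; an unreadable file counts as "no BOM".
function file_has_utf8_bom($path)
{
    $fh = @fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 3);
    fclose($fh);
    return $head === UTF8_BOM;
}
```

Running `file_has_utf8_bom()` over the files you include is a quick way to find the culprit when PHP complains that output started on line 1.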

This is fixed in PHP 5 (and PHP 6 will have a better solution for it), but for now there are only three workarounds that I know of:

  • Tell your editor not to save the BOM. I mostly use UltraEdit (version 10.xx), which has an option for this, but then it no longer recognizes the file as UTF-8.
  • Turn output buffering on. This solves the header problem but has other issues. One of them is that the BOM bytes (three of them for UTF-8) are still sent to the browser after all, which can produce invalid output or break server-generated JavaScript.
  • Turn buffering on before the include and discard the buffer contents after it. This only works when the included PHP file doesn’t produce any other output. Example:
    ob_start();                      // start capturing output, BOM included
    include 'unicode_with_bom.php';
    ob_end_clean();                  // throw away everything captured, BOM included
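If the editor can’t be trusted to drop the BOM, the files can also be cleaned before deployment. A minimal sketch, assuming PHP 5 or later; `strip_utf8_bom_from_file()` is a hypothetical helper that rewrites a file in place without its leading UTF-8 BOM:

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM from the file at $path.
// Returns true if a BOM was found and the file was rewritten, false otherwise.
function strip_utf8_bom_from_file($path)
{
    $data = file_get_contents($path);
    if ($data === false || substr($data, 0, 3) !== "\xEF\xBB\xBF") {
        return false; // unreadable file, or no BOM to remove
    }
    // Write back everything after the three BOM bytes.
    return file_put_contents($path, substr($data, 3)) !== false;
}
```

To clean a whole directory, call it in a loop such as `foreach (glob('*.php') as $file)`. Keep a backup, since the files are overwritten in place.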

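The third workaround can be relaxed so that it discards only the BOM and keeps the included file’s real output. A sketch, assuming PHP 5 or later; `include_without_bom()` is a hypothetical helper, not a PHP function:

```php
<?php
// Hypothetical helper: includes $path with output buffering on, then echoes
// the captured output minus a leading UTF-8 BOM. Since buffering is active
// during the include, header() calls made by the included file still work
// (as long as nothing was sent to the browser before this call).
function include_without_bom($path)
{
    ob_start();
    $result = include $path;
    $output = ob_get_clean();
    if (substr($output, 0, 3) === "\xEF\xBB\xBF") {
        $output = substr($output, 3); // drop only the BOM bytes
    }
    echo $output;
    return $result; // whatever the included file returned
}
```

Because the include happens inside a function, variables defined at the top level of the included file land in the function’s local scope rather than the global one (functions and classes are still global). Only a BOM at the very start of the captured output is removed; a BOM coming from a nested include would still slip through.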
1 Comment

  1. I was bitten yesterday: it took me quite some time to understand why header("Location : …") led to an error saying that output had begun on line 1!
    Boy… I didn’t even know I had a BOM. Worked way better after removing it!

    Comment by Serge Wautier — October 26, 2005 @ 22:51


