April 4, 2006

Parsing Mail with PHP

Filed under: Links, PHP — Dimitris Giannitsaros @ 22:47

eZ Systems has released a new PEAR package (currently in alpha) to parse email messages. This tutorial by Derick Rethans shows how to use it.

I’m really interested in this as I want to implement email integration for Magna CRM (although there are a lot of features ahead of this).

Via PHPDeveloper.org

December 22, 2005

Testing and installer

Filed under: MagnaCRM, PHP — Dimitris Giannitsaros @ 21:38

Testing is going well so far. I have tested on a wide variety of configurations, including:

IIS / PHP4 and PHP 5 / CGI and SAPI
Apache 1.x / PHP4 and PHP5
Apache 2.x / PHP4 and PHP5

Apache was tested on multiple Windows and Linux servers and I have tried various php.ini configurations. Now, combine these with 3 databases (Access, MySQL and SQL Server) and you understand it was no easy task. I now feel quite confident it will work correctly on the majority of configurations, but I still intend to test some more (lighttpd being one of them).

Currently I am finishing the second-to-last code-related task: the installer. Honestly I hate doing PHP installers. They usually end up being a mess, plus they are sensitive and error prone. Nevertheless it is a very important part of an application, so I’m trying to be careful.

The last code-related task will be a customer-only part of the web site, where customers will be able to download the application, check for available updates, find their serial numbers etc.

November 20, 2005

DB performance and ADODB

Filed under: MagnaCRM, PHP — Dimitris Giannitsaros @ 23:42

One of the things to take into consideration when developing a multi-user application is the Database performance.

When developing PHP applications, I always use ADODB to provide a Database abstraction layer. More specifically I use my own DB layer, built on top of ADODB, but I only do this to make it easy to replace ADODB if the need ever arises (e.g. if it stops being maintained).

ADODB has a great module (part of the standard ADODB package) called Performance Monitoring Library. This little gem provides some great functionality and I suggest you check it out.

An SQL logger and analyzer is provided, among other things. The SQL logger, when enabled, stores all executed queries in a DB table, along with some interesting data, like the time needed to execute each query. The SQL analyzer gives you 3 extremely useful reports:

  • Suspicious SQL: The queries with the highest execution time
  • Expensive SQL: The queries with the highest total execution times (number of executions * average execution time).
  • Invalid SQL: Queries that produced an error.
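To make the difference between the first two reports concrete, here is a minimal sketch that computes them from a plain array. The array stands in for the logger's DB table, and the function names and the use of the average time for the "suspicious" ranking are my assumptions, not the library's actual code.

```php
<?php
// Hypothetical stand-in for the logger's table: one row per executed query,
// with the time it took in seconds.
$log = array(
    array('sql' => 'SELECT * FROM notes WHERE body LIKE ?', 'seconds' => 0.40),
    array('sql' => 'SELECT * FROM notes WHERE body LIKE ?', 'seconds' => 0.60),
);
for ($i = 0; $i < 200; $i++) {
    $log[] = array('sql' => 'SELECT name FROM users WHERE id = ?', 'seconds' => 0.01);
}

// Group the log by SQL text and compute count, total and average time.
function summarize_queries(array $log)
{
    $stats = array();
    foreach ($log as $row) {
        $sql = $row['sql'];
        if (!isset($stats[$sql])) {
            $stats[$sql] = array('count' => 0, 'total' => 0.0);
        }
        $stats[$sql]['count']++;
        $stats[$sql]['total'] += $row['seconds'];
    }
    foreach ($stats as $sql => $s) {
        $stats[$sql]['average'] = $s['total'] / $s['count'];
    }
    return $stats;
}

// Return the SQL text with the highest value for the given field.
function top_query(array $stats, $field)
{
    $best = null;
    foreach ($stats as $sql => $s) {
        if ($best === null || $s[$field] > $stats[$best][$field]) {
            $best = $sql;
        }
    }
    return $best;
}

$stats = summarize_queries($log);
echo 'Suspicious: ', top_query($stats, 'average'), "\n"; // slow per execution
echo 'Expensive:  ', top_query($stats, 'total'), "\n";   // costly in total
```

The slow LIKE query is the suspicious one, while the cheap lookup that runs 200 times is the expensive one. The first calls for optimizing the query, the second for reducing the number of executions.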

During development I leave the SQL logger constantly on and use the analyzer reports to either optimize my queries or find ways to reduce the number of executions. Warning: Be careful not to leave the SQL logger on by default, because in real installations it can reduce performance. Also the analyzer will take a very long time to analyze more than ~500k queries. Anecdote: I once left the SQL logger enabled in a health-related project running in a hospital (~40 concurrent users, 24×7) for about a week. The analyzer never returned any results for the ~5 million records it had logged. So I had to discard the logged queries and leave it running for exactly 1 day.

Since I’ve made extended use of the Perf library with some really good results in the past, I have integrated it into Magna CRM. Through the use of a single flag (DEBUG_LEVEL) I can do various things like enabling the SQL logger or displaying the SQL analyzer screens to an administrator. I will leave this in, in case I ever need to find what’s wrong with a specific installation that has performance issues.
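One way a single flag can switch several things on and off is a bitmask. This is only a sketch of the idea: the DEBUG_LOG_SQL and DEBUG_SHOW_ANALYZER constants and the debug_enabled() helper are hypothetical names, and the real DEBUG_LEVEL may well work differently.

```php
<?php
// Hypothetical feature bits; each feature gets its own power of two.
define('DEBUG_LOG_SQL', 1);        // turn the SQL logger on
define('DEBUG_SHOW_ANALYZER', 2);  // show the analyzer screens to an administrator

// The single flag: here only SQL logging is enabled.
define('DEBUG_LEVEL', DEBUG_LOG_SQL);

// True when the given feature bit is set in the level.
function debug_enabled($feature, $level = DEBUG_LEVEL)
{
    return ($level & $feature) === $feature;
}

if (debug_enabled(DEBUG_LOG_SQL)) {
    echo "SQL logging is on\n"; // this is where the logger would be enabled
}
if (!debug_enabled(DEBUG_SHOW_ANALYZER)) {
    echo "Analyzer screens are hidden\n";
}
```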

November 14, 2005

Internationalization (i18n)

Filed under: PHP, Web development — Dimitris Giannitsaros @ 10:42

In order to offer localized versions of an application, there are two things to be taken care of:

  • Internationalization (i18n): this is more development centric and it’s all about designing your application to support translations, foreign character sets, timezones, different number / currency / date formats etc.
  • Localization (L10n): using the mechanism provided by i18n, this includes the actual translation and settings for a new language.

In this article I focus mainly on the translation part of the i18n process, although many other subjects are touched.

There are two approaches for translating strings:

  • Using a function to wrap your strings in your code. I call this the “gettext” approach (also check PHP gettext support). Of course it’s not the only tool that offers this functionality, but it supports many languages and it’s open source.
  • Using constants or variables instead of strings in your code. One or more language files contain the actual strings and these files are included by your application.
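A rough sketch of how the two approaches look in code. The t() function and its lookup table are hypothetical stand-ins for the wrapping style (this is not the gettext API itself), and the language file of the second approach is inlined for brevity.

```php
<?php
// Approach 1: wrap every string in a lookup function.
// t() and its table are hypothetical, in the spirit of gettext.
function t($text, array $table)
{
    return isset($table[$text]) ? $table[$text] : $text;
}

$greek = array('Save' => 'Αποθήκευση');
echo t('Save', $greek), "\n";    // translated
echo t('Cancel', $greek), "\n";  // no entry, falls back to the original text

// Approach 2: the code refers to a variable; an included language file
// (inlined here) assigns the actual string.
$lc_save = 'Αποθήκευση';
echo $lc_save, "\n";
```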

Let’s see some general facts on these two approaches:

The “gettext” approach:

  • Is more complex.
  • Is suitable for large projects with thousands of strings.
  • Allows strings to be stored in a DataBase or an efficient data structure.
  • May utilize a special translating program.
  • May offer a better management of deprecated and changed strings.
  • Offers better domain management (think of domains as logical areas of the application e.g. the administrator’s area, the user’s area etc)

The “language file” approach:

  • Is significantly simpler.
  • Is better for smaller projects.
  • Stores strings in text files, so editing is easy for anyone.
  • Doesn’t force any good habits upon you, so you have to be careful.

I’ve used both approaches in web and desktop applications (most notably Cheez, a free image cataloguing tool, currently translated in 17 languages). The “language file” approach is my favorite, so what follows is some advice on using this approach:


Language scope

For multiuser applications, you must consider whether all users will have the same language (so language is a system setting) or each user will choose his language of choice (so language is a user setting).

The same rules apply to both cases, but you’ll need to make some design decisions based on that.

Single file vs. many files

It’s better to use a single file instead of many files. If this is becoming a really big file, maybe you should check the “gettext” approach. Exception: if your application supports plug-ins (e.g. user created stuff), make sure you provide a mechanism where each plug-in has its own language file (preferably a single file per plug-in). You don’t want to litter your core language file with strings specific to plug-ins.

For an example of why multiple files can get out of hand check osCommerce, which uses about 80 files per language in many directories. Moreover each plug-in is allowed to have many language files, making things even worse. Note: Other than this inconvenience, osCommerce is a very good and popular shopping cart solution.

Images

Handling localized images can be a bit tricky. You have at least 3 options:

  • All images have the same name, but are placed in different directories (/english/, /greek/, /german/). Your code uses the right directory based on a string included in the language file.
  • Images have different names, using a standard prefix / suffix (icon_delete_ENGLISH.png, icon_delete_GREEK.png).
  • Images can have any name as long as it’s defined in the language file with the rest of the resource strings. Your code loads images using the appropriate string. Of course it’s a good idea to use a naming convention for images (e.g. a suffix).

Personally I prefer the 3rd option. This way your code doesn’t do anything different than for normal strings, plus translators know what images must be changed just by looking at the language file.
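A minimal sketch of the 3rd option. The variable names, the images/ directory and the localized_image() helper are made up for illustration.

```php
<?php
// Hypothetical entries from a Greek language file: the image name sits
// next to the ordinary resource strings.
$lc_delete = 'Διαγραφή';
$lc_img_delete = 'icon_delete_greek.png';

// Hypothetical helper: builds an <img> tag from language-file values.
function localized_image($file, $alt, $dir = 'images/')
{
    return '<img src="' . htmlspecialchars($dir . $file)
         . '" alt="' . htmlspecialchars($alt) . '">';
}

echo localized_image($lc_img_delete, $lc_delete), "\n";
```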

Other i18n settings

Be careful what i18n settings go into the language file. This can be a problem especially for web applications, where users can be anywhere in the world.

Many projects I’ve seen put things like the date / number / currency formats and timezone in the language file. It’s much better to have these as user settings: just because a user prefers a specific language e.g. Greek or German, it doesn’t mean he’s currently based in Greece or Germany.

Of course, which settings should go in the language file and which are made available as a user setting depends a lot on what your application does, what kind of users it has etc.


Charset

If your application supports Unicode you probably need only one charset (e.g. UTF-8), so you can skip this paragraph.

Charset is very important for two reasons: a) It allows users to correctly view localized characters. b) It allows users to correctly enter (and store) localized characters. The first thing that comes to mind is to put charset in the language file.

This is usually good enough, but has a small problem: Imagine your web application currently supports English. Inside your language file you’ve set a variable for the charset (e.g. $charset="ISO-8859-1") which you use for defining the charset of the HTML files. Imagine a Greek user installs your software and tries to use it. Although he knows English, he would also like to insert data in Greek. Since you have tied the charset (ISO-8859-1) to the interface language (English) he can’t! If he could change the charset to “ISO-8859-7” he would be able to enter and view Greek text (and of course the English strings of the UI would be displayed correctly).

I am not arguing that it’s always the right thing to offer charset as a user setting. Just remember that the language file’s purpose is to have the localized resource strings, without interfering with the way the application works.
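If you do decide to decouple the two, a sketch could look like this. The content_type_header() helper and the idea of an empty value meaning "no preference" are assumptions for the example.

```php
<?php
// Build the Content-Type header from a per-user charset setting,
// falling back to a default when the user has not chosen one.
function content_type_header($userCharset, $default = 'ISO-8859-1')
{
    $charset = ($userCharset !== null && $userCharset !== '') ? $userCharset : $default;
    return 'Content-Type: text/html; charset=' . $charset;
}

// English UI, but the user wants to enter and view Greek text.
echo content_type_header('ISO-8859-7'), "\n";

// In a real page this string would be passed to header() before any output.
```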

Good resource strings

Some general guidelines for creating good resource strings:

  • Be careful to use complete sentences as resource strings while coding. Concatenation must be kept to a minimum and special language functions must be used to format strings (printf(), sprintf() for PHP).

    So instead of

    $location . " contains " . $count . " files";

    which needs 2 resource strings and the translator doesn’t see this as a complete sentence, use

    sprintf("%s contains %d files", $location, $count);

    which needs 1 resource string and actually makes sense to the translator.

  • If support for argument ordering is available, use it (PHP has it). So the above example would become:

    sprintf('%1$s contains %2$d files', $location, $count);

    which needs 1 resource string and the translator can change the argument order e.g.

    "%2\$d files are contained by %1\$s"

  • Try to keep related sentences in one resource string. “Command failed. Abort or retry?” should be one resource string, not two.
  • Sometimes it’s best to use different resource strings for the same word / phrase. This is hard to get right, because as a developer you don’t know which words have many different meanings in other languages. One solution is to use a different resource string for every occurrence. So if your application uses the word “Save” 28 times, then you have 28 different resource strings for “Save”. This is extra work, both for you and the translator, but it makes a better translation quality possible.

    The best solution is somewhere in the middle. This way simple words (“yes”, “no”) can be mapped to a single resource string, while more complex words (“execute”) have a separate resource string for each occurrence.
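The argument ordering guideline is easy to try out. In this sketch the two variables play the role of the same resource string in two language files (the variable names are made up); the calling code stays identical.

```php
<?php
$location = '/var/www';
$count = 3;

// The same resource string as it might appear in two language files.
// Single quotes keep PHP from treating $s and $d as variables.
$lc_contains_original = '%1$s contains %2$d files';
$lc_contains_reordered = '%2$d files are contained by %1$s';

// The code passes the arguments in one fixed order;
// the translator decides where each one appears.
echo sprintf($lc_contains_original, $location, $count), "\n";
echo sprintf($lc_contains_reordered, $location, $count), "\n";
```

This prints "/var/www contains 3 files" and then "3 files are contained by /var/www".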


Resource strings naming convention

Obviously it’s a good idea to have a common prefix for resource strings (e.g. lc_). The rest of the name can be either an increasing number or a description:

$lc_res1
$lc_res2
$lc_res3
$lc_res4

or

$lc_yes
$lc_no
$lc_execute1
$lc_execute2

Although the 2nd group seems much clearer, after about 1000 strings it becomes difficult to think of good descriptive names and you end up with things like

$lc_warn_user_after_failed_sql_execution_offer_to_retry

Domains

If you want to logically separate the resource strings based on different areas / parts of your application you may be tempted to use multiple files. I believe it’s always better to keep to one file and just use some comments for domain separation. So you can have:

// Admin area

$lc_admin_res1 = "";
$lc_admin_res2 = "";

// User area

$lc_user_res1 = "";
$lc_user_res2 = "";

Versioning

This is the single most important piece of advice in this article.

After you release your first public version, you must never again change a resource string. Even if you find a typo or something bad you wrote about your boss or wife.

Both new and changed resource strings go at the end of the file. Moreover it’s good to keep a comment about each version:

// Version 1.0

$lc_yes = "Yed";
$lc_no = "No";

// Version 2.0

$lc_yes = "Yes"; // correction
$lc_new = "New";

// Version 3.0
// (v3.0 resource strings will go here)

Note that the $lc_yes resource string was corrected in the new version, while the old resource string stayed the same.

There are a number of reasons to uphold this policy:

  1. Translators have a much easier job with new versions. They just go to the end of the file to find new / changed strings. Changed strings are marked with “correction”, so they can find the old translation by searching. No need to use tools like diff, to try and find what has changed between versions.
  2. You can have a single file per language that works for all versions. If you translate the v2.0 file, you can send it to someone using v1.0 and he’ll have no problem.
  3. When you release a new version (e.g. v3.0) you may not want to wait for translators to translate the new strings. So you just copy-paste everything under “Version 3.0” from the original language file to all other language files and you’re good to go (so for foreign languages, old strings will remain translated while new strings will be untranslated).
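The policy works because a variable that is assigned twice simply ends up with the later value. A small sketch that demonstrates this, writing a stand-in language file to a temporary location so that it is self-contained (the temp file is only a device for the example):

```php
<?php
// Create a stand-in language file with a v1.0 typo and its v2.0 correction.
$file = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($file, <<<'PHP'
<?php
// Version 1.0
$lc_yes = "Yed";
$lc_no = "No";
// Version 2.0
$lc_yes = "Yes"; // correction
$lc_new = "New";
PHP
);

// Including the file runs the assignments top to bottom,
// so the correction at the end overrides the original entry.
include $file;
unlink($file);

echo $lc_yes, "\n"; // the corrected string
echo $lc_no, "\n";  // untouched since v1.0
```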

Used images are under a CC license. See here:
1st image, 2nd image, 3rd image, 4th image

I was writing an article on PHP and timezones (promised here), but then I changed my mind and decided to write and publish this one first. The one about timezones will be next.

November 3, 2005

Internationalization issues

Filed under: MagnaCRM, PHP — Dimitris Giannitsaros @ 13:23

One thing I had to take care of while developing Magna was i18n / L10n issues.

By i18n I mean the support for different time zones, date and number and currency formats (what Windows refers to as Regional Options). By L10n I mean the actual steps needed to offer a translated version.

I want to make sure Magna works in any environment (because your host may be located anywhere in the world) and with at least 3 different databases (Access, SQL Server and MySQL). So, I rolled my own solution, learning a lot in the process. The biggest problem was probably time zones (and DST, daylight saving time). The obvious solution was to keep things in the database in a common format and encode / decode values, based on user settings, when reading or writing to the database.
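A minimal sketch of the encode / decode idea for date-times. It assumes UTC as the common storage format and uses PHP's DateTime and DateTimeZone classes; the function names are made up and this is not Magna's actual code.

```php
<?php
// Encode: the user's local wall-clock time becomes UTC for storage.
function encode_datetime($local, $userTimezone)
{
    $dt = new DateTime($local, new DateTimeZone($userTimezone));
    $dt->setTimezone(new DateTimeZone('UTC'));
    return $dt->format('Y-m-d H:i:s');
}

// Decode: a stored UTC value becomes local time in the user's timezone.
function decode_datetime($utc, $userTimezone)
{
    $dt = new DateTime($utc, new DateTimeZone('UTC'));
    $dt->setTimezone(new DateTimeZone($userTimezone));
    return $dt->format('Y-m-d H:i:s');
}

// Athens is UTC+2 in November and UTC+3 in July (DST),
// and the timezone database handles the difference.
echo encode_datetime('2005-11-03 13:23:00', 'Europe/Athens'), "\n";
echo encode_datetime('2005-07-07 23:53:00', 'Europe/Athens'), "\n";
echo decode_datetime('2005-11-03 11:23:00', 'America/New_York'), "\n";
```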

The results seem good and now each user can select these regional options:

  • Timezone
  • Date format (DDMMYYYY, MMDDYYYY etc)
  • Date separator
  • Week first day (Sunday, Monday etc)
  • Number decimal symbol
  • Number grouping symbol
  • Number decimal digits
  • Currency symbol
  • Currency symbol placement (left / right)

For dates, numbers and currencies the solution was straightforward. A simple pair of functions for each was enough (encode / decode). Dates (and timezones) were more difficult to handle. I am writing an article on this issue and I hope to post it in the next few days. Localization issues are also a good subject for a separate article which I also hope to write soon.
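For numbers, such a pair of functions could look roughly like this. The settings array, the function names and the direction of "encode" (user input to stored value) versus "decode" (stored value to display) are assumptions for the sketch.

```php
<?php
// Hypothetical per-user regional settings (a Greek-style number format).
$settings = array(
    'decimal_symbol'  => ',',
    'grouping_symbol' => '.',
    'decimal_digits'  => 2,
);

// Decode: a stored number becomes a string formatted for this user.
function number_decode($stored, array $s)
{
    return number_format($stored, $s['decimal_digits'], $s['decimal_symbol'], $s['grouping_symbol']);
}

// Encode: the user's input becomes a plain float for storage.
function number_encode($input, array $s)
{
    $plain = str_replace($s['grouping_symbol'], '', $input); // drop grouping
    $plain = str_replace($s['decimal_symbol'], '.', $plain); // normalize decimal point
    return (float) $plain;
}

echo number_decode(1234567.5, $settings), "\n";
var_dump(number_encode('1.234.567,50', $settings));
```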

October 2, 2005

PHP and Unicode BOM

Filed under: PHP — Dimitris Giannitsaros @ 13:43

I was experimenting with some UTF-8 PHP files today and I ran into the infamous “BOM not ignored” bug again. I hadn’t been bitten by it for more than a year now and had forgotten what a PITA it is.

In case you are not familiar with it, the problem is that PHP doesn’t ignore the BOM bytes at the very beginning of a Unicode file. So whenever you include a Unicode file with a BOM, PHP thinks the BOM is valid output and sends it to the browser, sending the HTTP headers along the way. So no more header manipulation after this point.

This is fixed in PHP 5 (and PHP 6 will have a better solution for it) but for now only 3 workarounds exist (that I know of):

  • Direct your editor not to save the BOM. I mostly use UltraEdit (ver 10.xx) and it has an option for that, but then it doesn’t recognize the file as UTF-8.
  • Turn output buffering on. This solves the header problem, but has other issues. One of them is that the 2-3 BOM bytes do get sent to the browser after all and this can cause problems with invalid output or with server-created JavaScript.
  • Turn on buffering before the include and delete the buffer contents after the include. This works when the included PHP file doesn’t create any other output. Example:
    ob_start();
    include 'unicode_with_bom.php';
    ob_end_clean();
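Related to the first workaround, it helps to know which files actually carry a BOM. This is a small sketch for checking (and stripping) the UTF-8 BOM in a string of bytes, e.g. the result of file_get_contents(); the helper names are made up and this is an addition to the three workarounds above, not one of them.

```php
<?php
// The UTF-8 BOM is the three bytes EF BB BF at the very start of the data.
function has_utf8_bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Return the data without a leading UTF-8 BOM, if there is one.
function strip_utf8_bom($bytes)
{
    // The cast guards against substr() returning false on older PHP versions.
    return has_utf8_bom($bytes) ? (string) substr($bytes, 3) : $bytes;
}

$withBom = "\xEF\xBB\xBF" . 'hello';
var_dump(has_utf8_bom($withBom));
var_dump(strip_utf8_bom($withBom));
```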

July 7, 2005

Advanced PHP Programming

Filed under: General, PHP — Dimitris Giannitsaros @ 23:53

I read Advanced PHP Programming by George Schlossnagle.

While not a bad book, it left me with mixed feelings. It covers too many things, from coding style, design patterns and templates to caching, profiling and the Zend engine internals. I was expecting depth (since it’s supposed to be advanced) but the book just touches on too many concepts. A better title would be “Introduction to Advanced PHP Concepts”.

Anyway, every book has interesting things. My favorite parts were about profiling and benchmarking. I also enjoyed the Zend engine section, because I had never read about it.

June 21, 2005

PHP components I would buy

Filed under: PHP, Products and Services — Dimitris Giannitsaros @ 15:09

This is inspired by Ian Landsman’s comment:

I think developers are afraid to build components for languages like Perl, PHP, etc because of the risk of an open source competitor but I think there’s a good business in there someplace.

Off the top of my head, here are some PHP components / libraries / solutions I would gladly pay for:

  • As we know, all programs expand until they can read mail ;-) So I would love a library that does exactly that: integrate with mail accounts (supporting pop, imap, exchange, web mail services with no pop access, whatever else exists), retrieve mail, filter it for spam (with keywords, black/white lists and / or Bayesian Filtering), convert from weird / non standard encodings, handle html, detect quoted parts and give it to me in a nice format.
  • A special installer for PHP applications. This should probably be based on a well-established installer (e.g. NSIS or InnoSetup) for Windows and something similar for Linux / Unix / MacOS. It should of course support all these operating systems. It should detect if Apache / PHP is installed and use it, or detect and use IIS on Windows, installing PHP if it’s not installed. It should even install Apache / PHP / MySQL on a clean system (if this is allowed by the various licenses of course). Finally it should take care of databases. A mechanism should be provided to install / update the database, at least for the most popular databases (Access, MySQL, MS SQL, Postgres).
  • APIs to integrate with other services (which may or may not offer their own API). A good example of a service I would like to integrate with is Google Maps. The integration of craigslist and Google Maps is an exceptional example of what one can do. Flickr is another service I may want to integrate with. Unlike Google Maps, they offer a public API, but it requires a lot of work (and that’s why so many 3rd party API kits exist). Search engines (Google, Yahoo) offer APIs but they are incompatible. If I wanted to integrate with them I would have to do a lot of work. Amazon, Feedburner, Bloglines and many others provide APIs, but it’s difficult to integrate with each and every one of them. I would like one API for similar services (one for eShop sites, one for Search engines, one for RSS content, one for News portals etc). I would like the API I use to stay the same, even when the underlying APIs change (e.g. the Flickr API which is in beta changes from time to time).

Of course, if you happen to know any existing solutions for the above (free or commercial), drop me a line.

June 19, 2005

First things first

Filed under: MagnaCRM, PHP, ProductX, Web development — Dimitris Giannitsaros @ 23:43

Starting a new application from scratch is usually fun, but it also has some boring aspects. Here I write about some stuff I find rather boring, mostly because it gives no visual feedback. The first units I coded and tested for my CRM application (a couple of months ago) were:

  • Database abstraction layer: I discovered the excellent ADODB library some years ago. ADODB supports an impressive number of databases. It is stable, used in many projects and most importantly I have successfully deployed large and small applications using ADODB, in the past. What I did, was create a simple layer on top of ADODB and use that throughout my application. This way I can replace ADODB in the future, if I need to, without much difficulty.
  • Error handler: Using PHP’s set_error_handler() I route errors to a custom function. There I decide how to handle the error: whether to log the error, issue a warning to the user and continue or display an error message and halt execution. Obviously I use trigger_error() to… trigger errors.
  • Session handler: I built a custom session handler, based on an article by Matt Wade. This gives me the following advantages:
    1. I can store session data in the database. So I don’t have to care about some php.ini settings (e.g. session.save_path) and I can store the user id associated with each session.
    2. I have better control over things like the “Remember me” (cookie_lifetime) and session garbage collector settings.
    3. I can allow/disallow simultaneous use of the same username. This will be part of the simplified copy protection scheme.
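As an illustration of the error handler unit, here is a minimal sketch built on set_error_handler() and trigger_error(). The in-memory log, the handler name and the policy (halt on E_USER_ERROR, log and continue otherwise) are assumptions for the example; a real handler would write to a file or the database.

```php
<?php
// Hypothetical in-memory log, standing in for a log file or DB table.
$GLOBALS['error_log_lines'] = array();

// Custom handler: log everything, halt only on application-level fatal errors.
function app_error_handler($errno, $errstr, $errfile, $errline)
{
    $GLOBALS['error_log_lines'][] = "[$errno] $errstr ($errfile:$errline)";

    if ($errno === E_USER_ERROR) {
        echo "A serious error occurred. Please contact the administrator.\n";
        exit(1);
    }

    // Warnings and notices are logged and execution continues.
    // Returning true tells PHP not to run its internal handler as well.
    return true;
}

set_error_handler('app_error_handler');

// Somewhere in the application:
trigger_error('Could not load user preferences', E_USER_WARNING);

echo count($GLOBALS['error_log_lines']), " error(s) logged\n";
```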

After these were in place, I developed the user login / logout procedures, so I could both test all three units and do something that gives a sense of progress (hey, I can login and logout! It’s almost done ;-).

June 8, 2005

Ten years since PHP 1.0

Filed under: PHP, Web development — Dimitris Giannitsaros @ 22:27

PHP turned 10 years old today! Congratulations to everyone involved in this great project.

My first web “application” was written in 1995 using C (CGI) while attending uni, then it was ASP for some years and after a brief test with JSP & Java I discovered PHP about 4 years ago. For all its quirks, I truly enjoy using PHP.

