15 March 2013

Remove non-ASCII characters from a string

Hi guys

I had a problem removing non-UTF-8 characters from a string; they were not displaying properly. The bytes look like this (in hex): 0x97 0x61 0x6C 0x6F.

I am getting an encoded value from a URL. Let's assume it's an encoded email address and the decoded value is ankitchauhan22@gmail.com.
When I tried to find the length of this string like this:
  
$email = somefunction($encodedStringFromUrl);
$length = strlen($email);
print $length;
    
 
I was shocked. It printed 37 instead of 24. Then I printed the string character by character, but nothing visible appeared after the 24th character.

I used trim() to remove whitespace, but that didn't work.
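A quick way to see what is hiding in a string like this is to dump its bytes. The sketch below is only an illustration: the sample value, with 13 backspace bytes (0x08) appended, is an assumption standing in for whatever invisible bytes the real decoding step produced.

```php
<?php
// Hypothetical stand-in for the decoded value: the visible address followed by
// 13 invisible bytes (0x08 is an assumption, chosen only for illustration).
$email = "ankitchauhan22@gmail.com" . str_repeat("\x08", 13);

echo strlen($email), "\n";        // 37: strlen() counts bytes, visible or not
echo strlen(trim($email)), "\n";  // still 37: by default trim() only strips " \t\n\r\0\x0B" from the ends

// bin2hex() shows every byte as two hex digits, so the extra bytes become visible.
$tail = substr($email, 24);
echo bin2hex($tail), "\n";        // "08" repeated 13 times
```

This would explain why trim() has no effect: it only removes a fixed set of whitespace characters, and only at the ends of the string.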

Then I tried something which worked for me.

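This little snippet removes every character outside the printable ASCII range from a string. Note that it also strips control characters such as tabs and newlines.
  
$string = "ankitchauhan22@gmail.com รครณ";
// Remove every byte outside printable ASCII (0x20-0x7E), then trim the leftover space.
$string = trim(preg_replace('/[^\x20-\x7E]+/', '', $string));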
    
 
Now it prints 24.
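To reuse the idea, the replacement can be wrapped in a small helper. This is a minimal sketch: the function name `strip_non_printable_ascii` is made up for this example, and the input with 13 trailing 0x97 bytes is an assumed stand-in for the real decoded value. It deliberately drops everything outside 0x20-0x7E, so accented letters and other legitimate non-ASCII text are lost too.

```php
<?php
// Hypothetical helper (the name is not from any library): keep only printable
// ASCII bytes (0x20-0x7E) and trim surrounding whitespace.
function strip_non_printable_ascii(string $value): string
{
    // preg_replace() returns null on error; fall back to an empty string.
    $clean = preg_replace('/[^\x20-\x7E]+/', '', $value) ?? '';
    return trim($clean);
}

// Assumed sample input: the visible address plus 13 stray bytes.
$raw = "ankitchauhan22@gmail.com" . str_repeat("\x97", 13);
$email = strip_non_printable_ascii($raw);

echo strlen($raw), "\n";   // 37
echo strlen($email), "\n"; // 24
echo $email, "\n";         // ankitchauhan22@gmail.com
```

If the string can legitimately contain non-ASCII text, such as names with accents, this approach is too aggressive, and it is better to find out where the stray bytes come from.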
