The root cause?
There's also a pure-PHP option that, combined with the mb_* functions, gives you a U::toUtf8() method which attempts detection plus conversion.

What About Files? finfo vs mb_detect_encoding

Don't confuse file encoding (how the bytes map to characters) with MIME content type (what kind of data the file holds). finfo can report both. mb_detect_encoding only inspects a string you have already read into memory.
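A minimal sketch of the difference, assuming the fileinfo extension is enabled. `FILEINFO_MIME_TYPE` asks libmagic for the content type, while `FILEINFO_MIME_ENCODING` asks for its guess at the character encoding. The sample strings are illustrative. For files on disk, `finfo::file()` takes a path instead of a buffer.

```php
<?php
// Requires the fileinfo extension (bundled with PHP and enabled in most builds).
$finfo = new finfo();

$samples = [
    'ascii' => "Hello, world!\n",
    'utf8'  => "Caf\u{00E9} au lait\n", // "é" is stored as the UTF-8 bytes C3 A9
];

foreach ($samples as $label => $bytes) {
    // Content type: what kind of data this is (e.g. text/plain)
    $type = $finfo->buffer($bytes, FILEINFO_MIME_TYPE);
    // Character encoding: libmagic's guess at how the bytes map to characters
    $charset = $finfo->buffer($bytes, FILEINFO_MIME_ENCODING);
    echo "$label: $type; charset=$charset\n";
}
```

Both calls can return the same `text/plain` type for two strings whose encodings differ, which is exactly why the two questions should not be conflated.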
```php
// Double-check UTF-8 validity ($detected comes from an earlier mb_detect_encoding() call)
if ($detected === 'UTF-8' && !mb_check_encoding($string, 'UTF-8')) {
    return 'Windows-1252'; // common fallback
}
```
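Put together, the check might sit inside a small helper like the one below. This is a sketch: the name `detectWithFallback()` and its candidate list are assumptions for illustration, not taken from the original code.

```php
<?php
/**
 * Hypothetical helper: detect the encoding of $string, then re-validate
 * a UTF-8 verdict before trusting it.
 */
function detectWithFallback(string $string): string
{
    // Non-strict detection may return its closest match even when the
    // bytes are not actually valid in that encoding.
    $detected = mb_detect_encoding($string, ['UTF-8', 'Windows-1252'], false);

    // Double-check UTF-8 validity
    if ($detected === 'UTF-8' && !mb_check_encoding($string, 'UTF-8')) {
        return 'Windows-1252'; // common fallback
    }

    // mb_detect_encoding() returns false when nothing matched
    return $detected === false ? 'Windows-1252' : $detected;
}

// A lone 0xE9 byte followed by a space is not valid UTF-8
echo detectWithFallback("Caf\xE9 au lait"), "\n"; // Windows-1252
```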
```php
// Windows-1252 is tried before ISO-8859-1: ISO-8859-1 accepts every byte value,
// so any encoding listed after it would never be reached.
function smartEncodingDetect(string $string, array $priorities = ['UTF-8', 'Windows-1252', 'ISO-8859-1']): string
{
    foreach ($priorities as $encoding) {
        // For UTF-8, validate it strictly
        if ($encoding === 'UTF-8' && mb_check_encoding($string, 'UTF-8')) {
            return 'UTF-8';
        }
        // For others, attempt strict detection against this single candidate
        if (mb_detect_encoding($string, $encoding, true) === $encoding) {
            return $encoding;
        }
    }
    return 'UTF-8'; // safe fallback
}
```
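Once an encoding is known, converting to UTF-8 is a single `mb_convert_encoding()` call. A minimal sketch, assuming the same candidate order as above; `toUtf8Sketch()` is a hypothetical name, not a library API.

```php
<?php
/**
 * Hypothetical helper: return $string as UTF-8, converting from the first
 * candidate encoding in which the bytes are valid.
 */
function toUtf8Sketch(string $string, array $candidates = ['UTF-8', 'Windows-1252', 'ISO-8859-1']): string
{
    foreach ($candidates as $encoding) {
        // mb_check_encoding() answers "are these bytes valid in $encoding?"
        if (mb_check_encoding($string, $encoding)) {
            return $encoding === 'UTF-8'
                ? $string // already UTF-8, nothing to convert
                : mb_convert_encoding($string, 'UTF-8', $encoding);
        }
    }
    // No candidate matched: replace invalid sequences instead of failing
    return mb_scrub($string, 'UTF-8');
}

$legacy = "Caf\xE9 au lait"; // "Café au lait" saved as a single-byte encoding
echo toUtf8Sketch($legacy), "\n"; // Café au lait (now valid UTF-8)
```

Checking validity and then converting keeps the two steps separate, so a wrong guess shows up as mojibake in one place rather than being hidden inside the detection logic.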
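For serious work, mb_detect_encoding has limitations: it only checks byte patterns against a candidate list, and single-byte encodings such as ISO-8859-1 accept any input. Consider nelexa/encoding, or at least use mbstring's strict mode (the third argument to mb_detect_encoding). The gold standard for statistical detection is Mozilla's universalchardet. Note that symfony/polyfill-intl-normalizer provides Unicode normalization (the Normalizer class), not encoding detection, and jaybizzle/crawler-detect identifies bots from User-Agent strings, so neither is a charset detector.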
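The effect of strict mode is easiest to see on bytes that are invalid UTF-8. In this small comparison (the input string is illustrative), non-strict mode returns the closest match even when the bytes are not valid in it, while strict mode refuses.

```php
<?php
$bytes = "Caf\xE9 au lait"; // 0xE9 followed by a space is not valid UTF-8

// Non-strict: may still report the closest match
var_dump(mb_detect_encoding($bytes, ['UTF-8'], false));

// Strict: only encodings in which the bytes are valid can be returned
var_dump(mb_detect_encoding($bytes, ['UTF-8'], true)); // bool(false)

// Adding a single-byte candidate gives strict mode something valid to return
var_dump(mb_detect_encoding($bytes, ['UTF-8', 'ISO-8859-1'], true)); // string(10) "ISO-8859-1"
```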
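```php
$string = "Café";
$encoding = mb_detect_encoding($string);
echo $encoding; // UTF-8, provided this source file is saved as UTF-8
```

By default, mb_detect_encoding() tries the encodings in the current detection order (see mb_detect_order(); ASCII and UTF-8 unless configured otherwise) and runs in non-strict mode. You can pass a custom list of encodings: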
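The candidate list is the second argument, given either as an array or as a comma-separated string. A minimal sketch, using an illustrative Windows-1252 string and strict mode so that invalid UTF-8 is ruled out:

```php
<?php
$legacy = "Caf\xE9 au lait"; // "Café au lait" as saved by a Windows-1252 editor

// Candidates as an array...
$a = mb_detect_encoding($legacy, ['UTF-8', 'Windows-1252'], true);
// ...or as a comma-separated string
$b = mb_detect_encoding($legacy, 'UTF-8,Windows-1252', true);

echo $a, "\n"; // Windows-1252
echo $b, "\n"; // Windows-1252
```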