Terrestrial EPG foreign character corruption



alphacabbage1
13-06-19, 15:35
Hi All,

For a long time, EPG descriptions (and sometimes programme titles) have been showing corrupted “smart quotes”, foreign accents, etc.

I'm not sure if it's channel/Mux specific but it's most often seen on BBC HD broadcasts (probably where we hang out the most). I'm wondering if I'm missing a character-encoding/language setting somewhere which would sort it.

Vu+ Duo²
OpenViX 5.2.042 (2019-06-02)
Kernel / Drivers: 3.13.5 / 20190429
Terrestrial Freeview only [Vuplus DVB-T NIM(TT3L10) (MultiType DVB-C/T2) x4]
(Crystal Palace)

OpenWebIf: Settings > Automatic Language shows:
EPG Language selection 1: 'Original'
EPG Language selection 2: 'None'

Any ideas, maybe tuner set-up?

[Image attachments]

TIA!

alphacabbage1
20-06-19, 15:40
For anyone landing here, my best guess at an answer/workaround comes from a MediaPortal forum post (https://forum.team-mediaportal.com/threads/accents-diacritics-in-the-freeview-uk-ota-epg-dvb-t.106159/):

Freeview EPG encoding is erratic, with sources sometimes using non-UTF-8 encodings (ISO 6937, ISO-8859-*)
Assuming UTF-8 when the data is stored corrupts those characters
Scripts could clean up the data, but it's better dealt with at source (broadcaster) or on storage (e2/OpenViX)
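
To illustrate the "UTF-8 assumption" point, here is a minimal Python sketch (not anything taken from e2/OpenViX). It assumes the broadcaster sent ISO-8859-1 bytes; Python has no built-in ISO 6937 codec, so that case isn't covered. The function name decode_epg_text and the fallback choice are my own, purely for illustration.

```python
def decode_epg_text(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode EPG bytes, trying strict UTF-8 before a legacy fallback.

    The fallback encoding is an assumption; a real receiver would need
    to know (or be told) which table the broadcaster actually used.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A title as an ISO-8859-1 source would send it (e-acute is the single byte 0xE9).
latin1_title = "Les Misérables".encode("iso-8859-1")

# Blindly assuming UTF-8 loses the accented character...
naive = latin1_title.decode("utf-8", errors="replace")
print(naive)

# ...whereas trying UTF-8 first and falling back keeps it.
print(decode_epg_text(latin1_title))
```

The naive decode swaps the accented letter for the U+FFFD replacement character, which is the sort of damage that then gets stored.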

I'm not sure how to test or go about any of that but it might point the way for people who know.

birdman
20-06-19, 19:59
Freeview EPG encoding is erratic

Certainly true. Some of it seems to have passed through an ISO-8859-* (or, I suspect, a Windows-1252) to UTF-8 encoder twice.
In practice you can usually work out what was meant.
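
For anyone wanting to see what "through the encoder twice" looks like, a rough Python sketch follows. It assumes the wrong intermediate encoding really was Windows-1252 (only a suspicion above), and the names double_encode and repair are hypothetical helpers, not part of any EPG tool.

```python
def double_encode(text: str, wrong: str = "cp1252") -> str:
    """Simulate the fault: UTF-8 bytes misread as cp1252 (then re-encoded)."""
    return text.encode("utf-8").decode(wrong)


def repair(text: str, wrong: str = "cp1252") -> str:
    """Undo one round of double encoding; return the input if it doesn't apply."""
    try:
        return text.encode(wrong).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or the bytes aren't valid UTF-8:
        # treat the text as already clean.
        return text


broken = double_encode("Les Misérables")
print(broken)                    # Les MisÃ©rables
print(repair(broken))            # Les Misérables
print(repair("Les Misérables"))  # already clean, returned unchanged
```

A few characters (some curly quotes, for instance) produce bytes that Python's cp1252 codec leaves undefined, so this round trip won't cover every case.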

alphacabbage1
21-06-19, 11:59
you can usually work out what was meant.
Yup, it's not just humans though -- occasionally, it'll hit autotimer searches (esp. when the title's affected).

One day I'll try and script something to work through .meta files (possibly, .eit) so at least the archive is clean.
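
A rough idea of what such a script might look like in Python, as a sketch only. It assumes .meta files are small UTF-8 text files and that the damage is the Windows-1252 double encoding suspected above; binary .eit files are not handled. The function names and the example directory are hypothetical, and it defaults to a dry run so nothing is written until you ask.

```python
from pathlib import Path


def fix_line(line: str) -> str:
    """Reverse a cp1252/UTF-8 double encoding if the line round-trips cleanly."""
    try:
        return line.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return line  # leave anything that doesn't fit the pattern alone


def clean_meta_files(root: str, dry_run: bool = True) -> list:
    """Return the .meta files under root that need fixing; rewrite them unless dry_run."""
    changed = []
    for path in sorted(Path(root).rglob("*.meta")):
        try:
            original = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8 text, skip rather than guess
        fixed = "".join(fix_line(l) for l in original.splitlines(keepends=True))
        if fixed != original:
            changed.append(path)
            if not dry_run:
                path.write_bytes(fixed.encode("utf-8"))
    return changed


if __name__ == "__main__":
    # Hypothetical recordings directory; adjust to suit.
    for p in clean_meta_files("/media/hdd/movie", dry_run=True):
        print("would fix:", p)
```

Working line by line means one odd line doesn't stop the rest of the file being repaired, but back up the recordings folder first.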

I'm surprised the problem exists in this day and age -- it's almost like the UK's not part of Europe. ;)

birdman
22-06-19, 00:05
Yup, it's not just humans though -- occasionally, it'll hit autotimer searches (esp. when the title's affected).
Not that many searches on UK Freeview where you'd need a non-ASCII character.

The only one I can think of in the last few years was "Les Misérables", and that was OK. Well, it was OK for me - I've just looked again at the second image in #1.

Although the real oddity there is why BBC ONE HD was showing Baptiste, while BBC ONE London was showing "Les Misérables" - which had finished its series on Feb 3rd.

abu baniaz
22-06-19, 00:11
Not that many searches on UK Freeview where you'd need a non-ASCII character.

The only one I can think of in the last few years was "Les Misérables", and that was OK. Well, it was OK for me - I've just looked again at the second image in #1.

Although the real oddity there is why BBC ONE HD was showing Baptiste, while BBC ONE London was showing "Les Misérables",

London does not have a HD BBC channel.

abu baniaz
22-06-19, 00:18
The only one I can think of in the last few years was "Les Misérables", and that was OK. Well, it was OK for me.

You have a bug thread for this.

The thread below is not publicly viewable:
https://www.world-of-satellite.com/showthread.php?60850-quot-Accented-quot-characters-show-incorrectly-in-OpenWebIF-interface

birdman
22-06-19, 12:14
London does not have a HD BBC channel.
No, but its SD one shows as BBC London and, except when regional news is being broadcast, it broadcasts the same programmes as the HD channel. So it shouldn't have been broadcasting Les Misérables at 9pm on Sun 24 Feb 2019.

birdman
22-06-19, 12:17
You have a bug thread for this.
So I do.
However, that stated that the on-screen EPG was fine and AutoTimers worked OK - it was just the EPG view in OpenWebIF that was wrong.
FWIW, the name in the recording of the programme as seen via OpenWebIF is OK.

EnoSat
23-06-19, 19:07
use two-bit charset in epg, edit your encoding.conf

birdman
23-06-19, 21:24
use two-bit charset in epg, edit your encoding.conf
The problem is with what the broadcaster sends.
It's difficult to detect an incorrect, but valid, string without knowing what it should be in the first place and, if you knew that, there'd be no need to look at the incorrect one at all.
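
A small Python illustration of why "incorrect but valid" is hard to catch, offered as a sketch rather than a test of real Freeview data. The byte value and the three encodings are picked arbitrarily for the example.

```python
# One byte, as it might arrive in an EPG text field.
raw = b"\xe9"

# The same byte is a perfectly valid character in each of these tables,
# so decoding never raises an error; it just yields a different letter.
candidates = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]
decoded = {name: raw.decode(name) for name in candidates}

for name, char in decoded.items():
    print(name, repr(char))
```

Every decode succeeds, so nothing in the bytes themselves says which reading was intended; only knowing the expected text (or the broadcaster's actual encoding) settles it.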