
View Entry Info: Crash when folder contains filenames with Windows encoding

Category:
Possible Bug
What ViX Image build number are you using?
6.2.005
Have you tried a flash WITHOUT settings restore?
No
Have you tried a flash WITH settings restore?
No

Thread: Crash when folder contains filenames with Windows encoding

  1. #16
    abu baniaz's Avatar
    Title
    Moderator
    Join Date
    Sep 2010
    Location
    East London
    Posts
    23,362
    Thanks
    6,443
    Thanked 9,160 Times in 6,235 Posts
    This is how my one looks. I used your zip file.
    Attached Images

  2. #17
    twol's Avatar
    Title
    Moderator
    Join Date
    Apr 2012
    Posts
    8,419
    Thanks
    997
    Thanked 2,894 Times in 2,247 Posts
    Quote Originally Posted by ocean View Post
    pöllö.mp4 is just empty file. Attached zipped folder. Windows is Finnish locale. I'd guess this is Python3 related issue..
    Interesting in that when I unzip there are 2 files there: pöllö.mp4 and a 2nd with each ö replaced by hex 94. On copying to the movies folder, both appear, but the 2nd now shows as pll.mp4 with the hex characters stripped out. Both are accepted without a crash.
    Gigablue Quad 4K & UE 4K
    .........FBC Tuners:
    ------------------> GT-Sat unicable LNB to 1.5M dish(28.2E)
    ------------------> Gigablue unicable LNB to 80 cm dish(19.2E)
    .......................> FBC & DVB-S2X into 90cm dish (27.5W) Opticum robust Unicable LNB
    AX HD61, Edision Osmio 4K+, Zgemma H9Combo, Octagon SF8008 , gbtrio4k, h9se using unicable ports
    Zgemma H9 C/S into Giga4K

  3. #18
    birdman's Avatar
    Title
    Moderator
    Join Date
    Sep 2014
    Location
    Hitchin, UK
    Posts
    7,797
    Thanks
    237
    Thanked 1,659 Times in 1,307 Posts
    Quote Originally Posted by ocean View Post
    pöllö.mp4 is just empty file. Attached zipped folder. Windows is Finnish locale. I'd guess this is Python3 related issue..
    Or a Windows one, since filenames are supposed to be utf-16(?).
    That filename shows up as pФllФ.mp4 on my Linux systems. (The Ф is a Cyrillic Capital Letter EF - Unicode U+0424.)

    But it is a valid utf-8 string (even if it doesn't look as you'd expect). Not sure why the ö (o with diaeresis - U+00F6) should show up differently in different places, though.
    MiracleBox Prem Twin HD - 2@DVB-T2 + Xtrend et8000 - 5(incl. 2 different USBs)@DVB-T2[terrestrial - UK Freeview HD, Sandy Heath] - LAN/USB-stick/HDD

  4. #19
    birdman's Avatar
    Title
    Moderator
    Join Date
    Sep 2014
    Location
    Hitchin, UK
    Posts
    7,797
    Thanks
    237
    Thanked 1,659 Times in 1,307 Posts
    Quote Originally Posted by twol View Post
    interesting in that when I unzip there are 2 files there
    What unzip are you using?
    Mine finds one file and says so:

    Code:
    [parent]: unzip -l ../mymovies.zip  
    Archive:  ../mymovies.zip 
      Length      Date    Time    Name 
    ---------  ---------- -----   ---- 
            0  2022-06-23 16:44   mymovies/pФllФ.mp4 
    ---------                     ------- 
            0                     1 file
    And we disagree about the name which is there too.

    However, if I actually peep into the zip file the filename which is there is:
    p~ll~.mp4
    where both ~s are byte 0x94.
    Which in Unicode is a non-printable character. (CANCEL CHARACTER).
    Which might be what triggers the bugs.

  5. #20
    birdman's Avatar
    Title
    Moderator
    Join Date
    Sep 2014
    Location
    Hitchin, UK
    Posts
    7,797
    Thanks
    237
    Thanked 1,659 Times in 1,307 Posts
    Quote Originally Posted by birdman View Post
    However, if I actually peep into the zip file the filename which is there is:
    p~ll~.mp4
    where both ~s are byte 0x94.
    Which in Unicode is a non-printable character. (CANCEL CHARACTER).
    Which might be what triggers the bugs.
    On reflexion 0x94, whilst being perfectly legal in an ext4 filename, is NOT legal utf-8.
    That should be the 2-byte sequence 0xc2 0x94.

    But ö is U+00f6 in Unicode, which is 0xc3 0xb6 in utf-8.

    Anything that looks at filesystem names (or, as in console output, might echo this back) has to be able to cater for the result being non-utf8 when decoded.
    So this might be what contributes to triggering the bugs.
    Last edited by birdman; 23-06-22 at 20:42.
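The byte-level claims in this post can be checked with a few lines of Python. This is only a minimal sketch: the helper name `is_valid_utf8` and the variable `stored_name` are made up for the illustration, and the byte string is the file name as described in the zip.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# File name bytes as described above: both "~" positions are byte 0x94.
stored_name = b"p\x94ll\x94.mp4"

# A lone 0x94 is a continuation byte without a lead byte, so this is False.
print(is_valid_utf8(stored_name))

# U+0094 (the control character) needs two bytes in UTF-8: c2 94.
print("\u0094".encode("utf-8").hex())

# U+00F6 (o with diaeresis) is a different two-byte sequence: c3 b6.
print("\u00f6".encode("utf-8").hex())
```

So the name stored in the zip is neither the UTF-8 form of U+0094 nor the UTF-8 form of ö; it is a single raw byte that only makes sense under a legacy code page.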

  6. #21

    Title
    Member
    Join Date
    Jun 2022
    Posts
    70
    Thanks
    1
    Thanked 33 Times in 19 Posts
    Tested openatv 6.4 and everything works fine there. 0x94 = 148 = ö in codepage 850

    Just realised 6.4 still uses python2. Tested Openatv 7.0 and that is also crashing. I guess all python3 images have this issue?!
    Last edited by ocean; 29-06-22 at 08:59.
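The different renderings reported in this thread are consistent with the same byte 0x94 being read under different single-byte code pages. The sketch below decodes it three ways using Python's standard codecs. Which tool actually used which code page is an assumption; the function name `decode_candidates` is made up for the example.

```python
import unicodedata


def decode_candidates(raw, codecs=("cp850", "cp866", "latin-1")):
    """Decode the same bytes under several single-byte code pages."""
    return {codec: raw.decode(codec) for codec in codecs}


for codec, ch in decode_candidates(b"\x94").items():
    # Control characters such as U+0094 have no Unicode name, hence the default.
    label = unicodedata.name(ch, "<control>")
    print(f"{codec:8} U+{ord(ch):04X} {label} utf-8={ch.encode('utf-8').hex()}")
```

Code page 850 gives ö (U+00F6), code page 866 gives the Cyrillic Ф (U+0424) seen in the `unzip -l` listing, and Latin-1 gives the control character U+0094, whose UTF-8 form c2 94 is the same pair that `ls` prints in octal as \302\224 further down the thread.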

  7. #22
    Joe_90's Avatar
    Title
    Moderator
    Join Date
    Mar 2014
    Location
    Wicklow, Ireland
    Posts
    4,109
    Thanks
    1,275
    Thanked 1,122 Times in 884 Posts
    Quote Originally Posted by birdman View Post
    On reflexion 0x94, whilst being perfectly legal in an ext4 filename, is NOT legal utf-8.
    That should be the 2-byte sequence 0xc2 0x94.

    But ö is U+00f6 in Unicode, which is 0xc3 0xb6 in utf-8.

    Anything that looks at filesystem names (or, as in console output, might echo this back) has to be able to cater for the result being non-utf8 when decoded.
    So this might be what contributes to triggering the bugs.
    I just tried extracting the file on my linux mint system. This how it shows on the console:

    Code:
    joe@joe-desktop:~/Downloads/mymovies$ ls -al
    total 124
    drwx------  2 joe joe   4096 Jun 29 09:11  .
    drwxr-xr-x 25 joe joe 118784 Jun 29 09:11  ..
    -rw-rw-r--  1 joe joe      0 Jun 23 16:44 'p'$'\302\224''ll'$'\302\224''.mp4'
    ...and in the Nemo File Manager:

    Code:
    p”ll”.mp4
    Where the " are displayed as unicode 0094 symbols.
    Last edited by Joe_90; 29-06-22 at 09:22.
    GB Quad Plus, Mut@nt HD51, AX HD61, 80cm dish and Supreme Dark motor. Sony STR-DN 1060, Sony UHP-H1 Bluray, Odroid N2+ (CoreElec), Monitor Audio Bronze 5.1 speakers

  8. #23

    Title
    Member
    Join Date
    Jun 2022
    Posts
    70
    Thanks
    1
    Thanked 33 Times in 19 Posts
    Spent some time debugging the code and now know what is causing the crash.

    "def createPlaylist()" in file MovieSelection.py

    When there is a non utf-8 filename in the directory, item.getPath() contains surrogates for all non utf-8 characters. It throws the exception "std::logic_error" when the path is used.

    If you simply filter out the surrogates from item paths like this before appending to items:

    item.setPath(item.getPath().encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore'))

    -> No crash and directory loads normally.

    Of course you can't play these files, because the path is still incorrect.

    How did this work in python27? It should use bytes for file paths, not str.

    Anyway, hope this info helps to fix this properly. I did see another similar bug where opening the movie directory crashed, and it's probably this same issue.
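The surrogates mentioned in this post come from the `surrogateescape` error handler that Python 3 uses when decoding file names on POSIX systems (PEP 383). The sketch below reproduces the effect in plain Python, without Enigma2: the variable names are illustrative, and the byte string is the file name from this thread.

```python
raw = b"p\x94ll\x94.mp4"

# Each undecodable byte 0x94 becomes the lone surrogate U+DC94.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'p\udc94ll\udc94.mp4'

# A strict UTF-8 encode of that str is refused.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
print(restored == raw)  # True

# The filter from the post drops the bad bytes instead of restoring them.
filtered = restored.decode("utf-8", "ignore")
print(filtered)  # pll.mp4
```

This also shows why the filtered path cannot be played: `pll.mp4` no longer matches the name on disk, whereas the surrogate-escaped str still carries enough information to get the real bytes back.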

  9. The Following User Says Thank You to ocean For This Useful Post:

    abu baniaz (23-08-22)

  10. #24
    birdman's Avatar
    Title
    Moderator
    Join Date
    Sep 2014
    Location
    Hitchin, UK
    Posts
    7,797
    Thanks
    237
    Thanked 1,659 Times in 1,307 Posts
    Quote Originally Posted by ocean View Post
    How did this work in python27? It should use bytes for file paths, not str.
    Agreed. It should. The only time it needs to be "converted" to a string is for display (although even that might not be needed - Py3 can print bytes).
    Py2 didn't have the bytes/str distinction, so didn't have this problem.

  11. #25

    Title
    Member
    Join Date
    Jun 2022
    Posts
    70
    Thanks
    1
    Thanked 33 Times in 19 Posts
    If I may suggest adding a temporary workaround for the crash:

    Filter out non UTF-8 filenames from movielist. Currently there is no way to play the files anyway.

    MovieList.py line 741 has this comment:
    # OSX put a lot of stupid files ._* everywhere... we need to skip them

    Just add this code after that:

    Code:
    # Filter out non UTF-8 files. Remove this when movielist supports them.
    aname = name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')
    if aname != name:
        print("[MovieList] skipping non utf-8 filename: %s" % aname)
        continue
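The fragment above depends on the surrounding loop in MovieList.py. To try the same comparison outside Enigma2, it can be wrapped in a standalone function. This is a sketch only: `split_utf8_names` and the sample listing are made up for the illustration and are not part of the image's code.

```python
def split_utf8_names(names):
    """Split names into those that are clean UTF-8 and those that are not.

    Skipped names are returned in their cleaned form (bad bytes dropped),
    which is what the log message in the proposed workaround would print.
    """
    kept, skipped = [], []
    for name in names:
        cleaned = name.encode("utf-8", "surrogateescape").decode("utf-8", "ignore")
        if cleaned != name:
            skipped.append(cleaned)
        else:
            kept.append(name)
    return kept, skipped


# Sample listing: one plain name, one with escaped 0x94 bytes, one real "ö".
listing = ["movie.ts", "p\udc94ll\udc94.mp4", "p\u00f6ll\u00f6.mp4"]
kept, skipped = split_utf8_names(listing)
print(kept)
print(skipped)
```

A correctly encoded pöllö.mp4 passes through untouched; only the name carrying escaped bytes is skipped.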

  12. #26
    birdman's Avatar
    Title
    Moderator
    Join Date
    Sep 2014
    Location
    Hitchin, UK
    Posts
    7,797
    Thanks
    237
    Thanked 1,659 Times in 1,307 Posts
    Quote Originally Posted by ocean View Post
    If I may suggest adding a temporary workaround for the crash:

    Filter out non UTF-8 filenames from movielist. Currently there is no way to play the files anyway.
    The trash handler will still have an issue.
    Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.

  13. #27

    Title
    Member
    Join Date
    Jun 2022
    Posts
    70
    Thanks
    1
    Thanked 33 Times in 19 Posts
    Quote Originally Posted by birdman View Post
    The trash handler will still have an issue.
    Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.
    Yes I know, my suggestion was only meant to be temporary.

    But I think it makes sense, because I don't expect to see a fix for the filesystem handling any time soon. Unfortunately.

    The same problems are also in other images, so this is not a ViX-only problem. I wonder if other image developers are aware of this; at least openatv crashes the same way and its trashcan code also looks identical.

  14. #28
    Huevos's Avatar
    Title
    Administrator
    Join Date
    Jun 2010
    Location
    38.5N, 0.5W
    Posts
    13,632
    Thanks
    2,007
    Thanked 4,956 Times in 3,276 Posts
    Quote Originally Posted by birdman View Post
    The trash handler will still have an issue.
    Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.
    We are working in python. When we access the file system we use os.walk. Are you saying you think we should remove all python abstraction layers and work directly on the byte code on the hard drive?
    Help keep OpenViX servers online.Please donate!

  15. #29

    Title
    Member
    Join Date
    Jun 2022
    Posts
    70
    Thanks
    1
    Thanked 33 Times in 19 Posts
    Quote Originally Posted by Huevos View Post
    When we access the file system we use os.walk.
    I think using os.walk.. is fine.

    In the trashcan.py code the parameter "trashfolder" is a str, which means python3 converts everything to str automatically. Non utf-8 characters are replaced with surrogates -> problem.

    With the exact same "os.walk" where the "trashfolder" parameter is bytes, root, dirs, files and name are also bytes and no str conversion happens.

    That's exactly what we want. But the real problem comes from this part: enigma.eBackgroundFileEraser.getInstance().erase(fn)

    That function needs to accept fn as bytes too.

    It's imported from enigma.pyc, but I can't find enigma.py. I guess file is generated somehow from C, is there documentation how this works?

    file_eraser.cpp has: void eBackgroundFileEraser::erase(const std::string& filename)

    Possible solution is to add overloaded function. I have only minimal experience in C, maybe something like: void eBackgroundFileEraser::erase(const char* filename)

    That also likely means whole image needs to be compiled from source before you can even test.
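The os.walk behaviour described in this post can be tried outside Enigma2. The sketch below walks a temporary directory with a bytes path and deletes what it finds with `os.remove`, which accepts bytes paths. It assumes a POSIX file system that allows arbitrary bytes in names (ext4 does); the function names `list_files_as_bytes` and `demo` are made up for the example, and nothing here touches `eBackgroundFileEraser`.

```python
import os
import tempfile


def list_files_as_bytes(root):
    """Walk root with a bytes path so every returned path stays raw bytes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(root)):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found


def demo():
    """Create a file with a non UTF-8 name, list it, then delete it."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_name = b"p\x94ll\x94.mp4"
        target = os.path.join(os.fsencode(tmp), raw_name)
        try:
            with open(target, "wb"):
                pass
        except (OSError, UnicodeError):
            # Some platforms or file systems refuse non UTF-8 names.
            return None
        paths = list_files_as_bytes(tmp)
        for path in paths:
            os.remove(path)  # bytes in, no str conversion involved
        return [os.path.basename(path) for path in paths]


if __name__ == "__main__":
    print(demo())
```

Because the walk starts from bytes, no surrogates are ever created, and the name handed to the delete call is exactly the one stored on disk.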

  16. #30
    twol's Avatar
    Title
    Moderator
    Join Date
    Apr 2012
    Posts
    8,419
    Thanks
    997
    Thanked 2,894 Times in 2,247 Posts
    Quote Originally Posted by ocean View Post
    I think using os.walk.. is fine.

    In the trashcan.py code the parameter "trashfolder" is a str, which means python3 converts everything to str automatically. Non utf-8 characters are replaced with surrogates -> problem.

    With the exact same "os.walk" where the "trashfolder" parameter is bytes, root, dirs, files and name are also bytes and no str conversion happens.

    That's exactly what we want. But the real problem comes from this part: enigma.eBackgroundFileEraser.getInstance().erase(fn)

    That function needs to accept fn as bytes too.

    It's imported from enigma.pyc, but I can't find enigma.py. I guess file is generated somehow from C, is there documentation how this works?

    file_eraser.cpp has: void eBackgroundFileEraser::erase(const std::string& filename)

    Possible solution is to add overloaded function. I have only minimal experience in C, maybe something like: void eBackgroundFileEraser::erase(const char* filename)

    That also likely means whole image needs to be compiled from source before you can even test.
    Yes it's compiled into Enigma, but all the standard C++ calls in file_eraser such as delete, rename, trunc expect string

    building the image is simple, but also somewhat complex.... see bottom of https://github.com/OpenViX/enigma2

    appreciate suggestions....... Huevos and I are still looking at having a resolution ... in python
    Last edited by twol; 26-08-22 at 16:09.
