PDA

View Full Version : [OS-mio] Crash when folder contains filenames with Windows encoding



ocean
23-06-22, 08:21
Created folder "mymovies" and copied some mp4 files from Win10 to HDD. But now E2 crashes when I try to access folder /media/hdd/mymovies

Easy to reproduce. In WINDOWS: Create folder "mymovies" and add new empty file "pöllö.mp4" and FTP folder to HDD. Now try to access folder..

See attached screenshots. Box is Osmio, but I don't think it matters, SF8008 crashes also..

JonMMM
23-06-22, 08:54
Done exactly as you asked and it played fine

64030

64031

JonMMM
23-06-22, 09:03
Your MP4 file looks corrupted the size is 0

abu baniaz
23-06-22, 10:10
Can you please upload the crashlog instead of a screenshot of it. Default location is /home/root/logs

ocean
23-06-22, 11:04
Log attached



< 37.4687> 10:51:11.6917 [ActionMap] Keymap 'OkCancelActions' -> Action = 'ok'.
< 37.5573> 10:51:11.7803 [Trashcan] Debug path /media/hdd/mymovies/ => /media/hdd/.Trash
< 37.5620> 10:51:11.7850 [Pixmap] setPixmapNum(7) failed! defined pixmaps: [<enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf82de90> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf82df20> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf82dfc8> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf82df68> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf82de48> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf838020> >, <enigma.gPixmapPtr; proxy of <Swig Object of type 'ePtr< gPixmap > *' at 0xaf8380b0> >]
< 37.5651> 10:51:11.7881 PC: b5d9dae0
< 37.5652> 10:51:11.7881 Fault Address: 00000000
< 37.5652> 10:51:11.7882 Error Code: 0
< 37.5654> 10:51:11.7884 Backtrace:
< 37.5656> 10:51:11.7886 /usr/bin/enigma2(_Z17handleFatalSignaliP9siginfo_tPv) [0x7B6B0]
< 37.5657> 10:51:11.7887 /lib/libc.so.6(__default_rt_sa_restorer) [0xB5D64D30]
< 37.5658> 10:51:11.7888 /lib/libc.so.6(n/a) [0xB5D9DAE0]
< 37.5659> 10:51:11.7889 /lib/libc.so.6(raise) [0xB5D63EE0]
< 37.5660> 10:51:11.7890 /lib/libc.so.6(abort) [0xB5D51638]
< 37.5665> 10:51:11.7894 /usr/lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_ handlerEv) [0xB5FC4DC8]
< 37.5666> 10:51:11.7896 /usr/lib/libstdc++.so.6(n/a) [0xB5FC2E10]
< 37.5666> 10:51:11.7896 -------FATAL SIGNAL (6)


@JonMMM File content does not matter, it crashes when building the list.. And you need to use Windows (Latin-1 encoding), UTF8 ofcourse works

JonMMM
23-06-22, 11:16
i did use windows, right click new folder, rename, copy a mkv into it and upload via thumb zilla

ocean
23-06-22, 11:26
@JonMMM There are no special / scandinavian characters in your screenshot..

JonMMM
23-06-22, 11:40
@JonMMM There are no special / scandinavian characters in your screenshot..


I think you have just found your problem, try to recreate without special characters and see if you still get the error

ocean
23-06-22, 11:54
I think you have just found your problem, try to recreate without special characters and see if you still get the error
That's exactly what this bug report is about. Special characters in filenames crashes E2. Renaming files is just a workaround.. BTW no crash in older Openvix 5.2

JonMMM
23-06-22, 12:03
Sorry somehow I missed that

twol
23-06-22, 13:33
So I have created pöllö.mp4 on both windows & linux.... no issues.
So post the file here............

ocean
23-06-22, 14:01
So I have created pöllö.mp4 on both windows & linux.... no issues.
So post the file here............

Could you post screenshot what it looks like when you open that folder? (Don't create file with Linux, that works)

I can reproduce this in every box. Vu zero4K crashes same way.

See screenshot what folder looks like in Windows telnet:

twol
23-06-22, 14:29
Could you post screenshot what it looks like when you open that folder? (Don't create file with Linux, that works)

I can reproduce this in every box. Vu zero4K crashes same way.

See screenshot what folder looks like in Windows telnet:

so in my windows putty it shows as pöllö.mk4, so its obviously your windows language setup, which is why I need you to post the attachment here to see if I can repeat on my boxes - I need more than the crash log to be able to recreate

birdman
23-06-22, 14:46
The actual error message indicates that it has been asked to get a Pixmap that is outside the list of pixmaps it has.

My suspicion is that while this error is reported, the rest of the code assumes that the self.instance.setPixmap has been set; but it hasn't.

So there are probably two bugs here.


Why is it being asked to use something which doesn't exist?
What should the code be doing on such an error such that the caller doesn't crash after the failure?


My suggestion for the latter is that there should be a fallback, empty/blank pixmap that gets returned when the error is printed.

ocean
23-06-22, 14:53
so in my windows putty it shows as pöllö.mk4, so its obviously your windows language setup, which is why I need you to post the attachment here to see if I can repeat on my boxes - I need more than the crash log to be able to recreate

pöllö.mp4 is just empty file. Attached zipped folder. Windows is Finnish locale. I'd guess this is Python3 related issue..

abu baniaz
23-06-22, 18:44
This is how my one looks. I used your zip file.

twol
23-06-22, 19:36
pöllö.mp4 is just empty file. Attached zipped folder. Windows is Finnish locale. I'd guess this is Python3 related issue..

interesting in that when I unzip there are 2 files there pöllö.mp4 and a 2nd with the ö replaced by hex 94. On copying to the moves folder, both appear but the 2nd now shows as pll.mp4 with the hex characters stripped out. Both are accepted without a crash.

birdman
23-06-22, 20:14
pöllö.mp4 is just empty file. Attached zipped folder. Windows is Finnish locale. I'd guess this is Python3 related issue..Or a Windows one, since filenames are supposed to be utf-16(?).
That filename shows up as pФllФ.mp4 on my Linux systems. (The Ф is a Cyrillic Capital Letter EF - Unicode U+0424.)

But it is a valid utf-8 string (even if it doesn't look as you'd expect). Not sure why the ö (o with diaresis - U+00F6) should show up differently in different places, though.

birdman
23-06-22, 20:21
interesting in that when I unzip there are 2 files there What unzip are you using?
Mine finds one file and says so:


[parent]: unzip -l ../mymovies.zip
Archive: ../mymovies.zip
Length Date Time Name
--------- ---------- ----- ----
0 2022-06-23 16:44 mymovies/pФllФ.mp4
--------- -------
0 1 file
And we disagree about the name which is there too.

However, if I actually peep into the zip file the filename which is there is:
p~ll~.mp4
where both ~s are byte 0x94.
Which in Unicode is a non-printable character. (CANCEL CHARACTER).
Which might be what triggers the bugs.

birdman
23-06-22, 20:37
However, if I actually peep into the zip file the filename which is there is:
p~ll~.mp4
where both ~s are byte 0x94.
Which in Unicode is a non-printable character. (CANCEL CHARACTER).
Which might be what triggers the bugs.On reflexion 0x94, whilst being perfectly legal in an ext4 filename, is NOT legal utf-8.
That should be the 2-byte sequence 0xc2 0x94.

But ö is U+00f6 in Unicode, which is 0xc3 0xb6 in utf-8.

Anything that looks at filesystem names (or, as in console output, might echo this back) has to be able to cater for the result being non-utf8 when decoded.
So this might be what contributes to triggering the bugs.

ocean
29-06-22, 08:32
Tested openatv 6.4 and everything works fine there. 0x94 = 148 = ö in codepage 850

Just realised 6.4 still uses python2. Tested Openatv 7.0 and that is also crashing. I guess all pyhton3 images have this issue?!

Joe_90
29-06-22, 09:14
On reflexion 0x94, whilst being perfectly legal in an ext4 filename, is NOT legal utf-8.
That should be the 2-byte sequence 0xc2 0x94.

But ö is U+00f6 in Unicode, which is 0xc3 0xb6 in utf-8.

Anything that looks at filesystem names (or, as in console output, might echo this back) has to be able to cater for the result being non-utf8 when decoded.
So this might be what contributes to triggering the bugs.

I just tried extracting the file on my linux mint system. This how it shows on the console:


joe@joe-desktop:~/Downloads/mymovies$ ls -al
total 124
drwx------ 2 joe joe 4096 Jun 29 09:11 .
drwxr-xr-x 25 joe joe 118784 Jun 29 09:11 ..
-rw-rw-r-- 1 joe joe 0 Jun 23 16:44 'p'$'\302\224''ll'$'\302\224''.mp4'


...and in the Nemo File Manager:


p”ll”.mp4

Where the " are displayed as unicode 0094 symbols.

ocean
23-08-22, 20:28
Spent some time debuging code and now know what is causing the crash.

"def createPlaylist()" in file MovieSelection.py

When there is non utf-8 filename in directory, item.getPath() contains surrogates for all non utf-8 characters. It throws exception "std::logic_error" when path is used.

If simply filter out surrogates from item paths like this before appending to items:

item.setPath(item.getPath().encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore'))

-> No crash and directory loads normally.

Ofcourse you can't play these files, because path is still incorrect.

How did this work in python27? It should use bytes for file paths, not str.

Anyway, hope this info helps to fix this properly. I did see other similar bug where opening movie directory crashed and it's probably this same issue.

birdman
23-08-22, 21:21
How did this work in python27? It should use bytes for file paths, not str.Agreed. It should. The only time it needs to be "converted" to a string is for display (although even that might not be needed - Py3 can print bytes).
Py2 didn't have the bytes/str distinction, so didn't have this problem.

ocean
25-08-22, 08:06
If I may suggest adding temporary workaround for the crash:

Filter out non UTF-8 filenames from movielist. Currently there is no way to play the files anyway.

MovieList.py line 741 has this comment:
# OSX put a lot of stupid files ._* everywhere... we need to skip them

Just add this code after that:



# Filter out non UTF-8 files. Remove this when movielist supports them.
aname = name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')
if aname != name:
print("[MovieList] skipping non utf-8 filename: %s" % aname)
continue

birdman
25-08-22, 10:21
If I may suggest adding temporary workaround for the crash:

Filter out non UTF-8 filenames from movielist. Currently there is no way to play the files anyway.The trash handler will still have an issue.
Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.

ocean
25-08-22, 10:50
The trash handler will still have an issue.
Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.
Yes I know, my suggestion only meant to be temporary.

But I think it makes sence, because I don't expect to see fix for filesystem any time soon. Unfortunately.

Same problems are also in other images, so this not VIX only problem. I Wonder if other image developers are aware of this, atleast openatv crashes same way and trashcan code also looks identical.

Huevos
26-08-22, 14:21
The trash handler will still have an issue.
Basically the file-system code needs to use the file-system name when dealing with the file-system. This is a bytes array.

We are working in python. When we access the file system we use os.walk. Are you saying you think we should remove all python abstraction layers and work directly on the byte code on the hard drive?

ocean
26-08-22, 15:40
When we access the file system we use os.walk.

I think using os.walk.. is fine.

In trashcan.py code parameter "trashfolder" is str, that means python3 converts everything to str automatically. Non utf-8 characters are replaced with surrogates -> problem.

Exact same "os.walk" where "trashfolder" parameter is bytes, then root, dirs, files and name are also bytes and no str conversion happens.

That's exactly what we want. But real problem comes from this part: enigma.eBackgroundFileEraser.getInstance().erase(f n)

That function needs accept fn also as bytes.

It's imported from enigma.pyc, but I can't find enigma.py. I guess file is generated somehow from C, is there documentation how this works?

file_eraser.cpp has: void eBackgroundFileEraser::erase(const std::string& filename)

Possible solution is to add overloaded function. I have only minimal experience in C, maybe something like: void eBackgroundFileEraser::erase(const char* filename)

That also likely means whole image needs to be compiled from source before you can even test.

twol
26-08-22, 16:01
I think using os.walk.. is fine.

In trashcan.py code parameter "trashfolder" is str, that means python3 converts everything to str automatically. Non utf-8 characters are replaced with surrogates -> problem.

Exact same "os.walk" where "trashfolder" parameter is bytes, then root, dirs, files and name are also bytes and no str conversion happens.

That's exactly what we want. But real problem comes from this part: enigma.eBackgroundFileEraser.getInstance().erase(f n)

That function needs accept fn also as bytes.

It's imported from enigma.pyc, but I can't find enigma.py. I guess file is generated somehow from C, is there documentation how this works?

file_eraser.cpp has: void eBackgroundFileEraser::erase(const std::string& filename)

Possible solution is to add overloaded function. I have only minimal experience in C, maybe something like: void eBackgroundFileEraser::erase(const char* filename)

That also likely means whole image needs to be compiled from source before you can even test.
Yes it's compiled into Enigma, but all the standard C++ calls in file_eraser such as delete, rename, trunc expect string

building the image is simple, but also somewhat complex.... see bottom of https://github.com/OpenViX/enigma2

appreciate suggestions....... Huevos and I are still looking at having a resolution ... in python

ocean
26-08-22, 17:11
appreciate suggestions....... Huevos and I are still looking at having a resolution ... in python
I'm not sure filesystem issues are solvable with ONLY python changes.

In C++ string (std::string) is bytes and it's not required to be UTF-8 string.

C++ functions should be able take bytes, not sure why it's not working. I think if you know answer to this, you would be closed to solution.

birdman
26-08-22, 18:37
We are working in python. When we access the file system we use os.walk. Are you saying you think we should remove all python abstraction layers and work directly on the byte code on the hard drive?No.
I'm saying that when you get a filename (or pathname) you must always keep it as bytes. If you ever convert it to a str then you (may) no longer have anythng that corresponds to what is on disk.

Huevos
26-08-22, 22:49
I'm not sure filesystem issues are solvable with ONLY python changes.

In C++ string (std::string) is bytes and it's not required to be UTF-8 string.

C++ functions should be able take bytes, not sure why it's not working. I think if you know answer to this, you would be closed to solution.

Can we try experimenting in Trashcan.py?


for root, dirs, files in os.walk(trashfolder.encode(), topdown=False):
for name in files:
# Don't delete any per-directory config files from .Trash if the option is in use
if (config.movielist.settings_per_directory.value and name == ".e2settings.pkl"):
continue
fn = os.path.join(root, name)
st = os.stat(fn)
try:
import chardet
encoding = chardet.detect(fn)["encoding"]
fn = fn.decode(encoding=encoding)
except Exception as e:
print( type(e).__name__, e)
if st.st_ctime < self.ctimeLimit:
try:
enigma.eBackgroundFileEraser.getInstance().erase(f n)
bytesToRemove -= st.st_size
except Exception as e:
print("[Trashcan] Failed to erase file after stat.st_ctime selection:", name, " ", type(e).__name__, e)
else:
candidates.append((st.st_ctime, fn, st.st_size))
size += st.st_size
# Remove empty directories if possible

ocean
27-08-22, 05:11
Can we try experimenting in Trashcan.py?
It's possible that could work. I can test it later.

Small note, also "name" is bytes. name == ".e2settings.pkl" -> name == b".e2settings.pkl", and also in that debug print

Huevos
27-08-22, 08:59
It's possible that could work. I can test it later.

Small note, also "name" is bytes ... in that debug printThat's fine.


>>>
>>> print(b"abc")
b'abc'
>>>

ocean
27-08-22, 11:44
Looks like it's still not working. No crash, but it doesn't erase the file.

I modified it to erase always, if filename contains 'test' and added some test files to trash folder.

I guess next step would be to check old python2 image, if it erases the files. (I only know it doesn't crash)

Also if anyone else wants to test, it's easy to create files from terminal. (Just remember to remove st_ctime check from code to make sure it tries to erase)



fn = b'/media/hdd/movie/.Trash/' + b't\xe4hti.mpg'
f = open(fn, 'w')
f.write('')
f.close()

twol
27-08-22, 16:16
@Ocean - I believe that you are using latin-1?? I tried that with file generated in post #36 and it does same as Ascii.... no delete
So I made the mistake of modifying both file_eraser.cpp and .h to take char*, but image build fails because other C++ modules use this module directly and expect string.
Will look at what it will take to handle this change to include modded file_eraser, but starting to think the whole lot will have to be done in python.

birdman
28-08-22, 01:07
@Ocean - I believe that you are using latin-1??The file-system has no concept of latin-1, unicode, iso8859-16, etc..
It only deals in a byte string.
So what any user is using has to be ignored, except when displaying it to a user, when it might enable it to make sense to a person - but such a translation must not produce an error, which is the problem here.

Huevos
28-08-22, 14:27
@Ocean,

Is it possible to check with the current code in the repo:
1) it works with utf8 filenames that contain single byte chars > 127.
2) it works with utf8 filenames that contain multi byte chars.

ocean
28-08-22, 15:04
So I made the mistake of modifying both file_eraser.cpp and .h to take char*, but image build fails because other C++ modules use this module directly and expect string.
I said it could possible to add overloaded function, not to modify existing one.

I did try compile image from source and there were no errors, but I don't think my changes were included to image. There are no instructions how to compile enigma2 from local folder. Maybe I needed to clean something too. I modified ENIGMA2_URI in site.conf to point to git:///home/.. Could someone add some instructions to github enigma2 page?

My initial cpp and h are attached if you like to test those. Don't know if they compile or not.

Also added debug prints to original function where filename is empty or not found. I think you can commit 2 debug lines if they work, erase shouldn't normally be called with empty or non existing file.

If you get debug prints working you should see in log what filename contains in case it does not erase.

ocean
28-08-22, 15:06
Is it possible to check with the current code in the repo:

I'll test later..

Huevos
28-08-22, 15:58
I said it could possible to add overloaded function, not to modify existing one.

I did try compile image from source and there were no errors, but I don't think my changes were included to image. There are no instructions how to compile enigma2 from local folder. Maybe I needed to clean something too. I modified ENIGMA2_URI in site.conf to point to git:///home/.. Could someone add some instructions to github enigma2 page?

My initial cpp and h are attached if you like to test those. Don't know if they compile or not.

Also added debug prints to original function where filename is empty or not found. I think you can commit 2 debug lines if they work, erase shouldn't normally be called with empty or non existing file.

If you get debug prints working you should see in log what filename contains in case it does not erase.Just for your curiosity:

| components/file_eraser.cpp: In member function 'void eBackgroundFileEraser::erase(const char*)':
| components/file_eraser.cpp:87:21: error: declaration of 'std::string filename' shadows a parameter
| 87 | std::string filename(str(filename));
| | ^~~~~~~~
| components/file_eraser.cpp:85:47: note: 'const char* filename' previously declared here
| 85 | void eBackgroundFileEraser::erase(const char* filename)
| | ~~~~~~~~~~~~^~~~~~~~
| components/file_eraser.cpp:87:30: error: 'str' was not declared in this scope; did you mean 'std'?
| 87 | std::string filename(str(filename));
| | ^~~
| | std
| components/file_eraser.cpp:85:47: warning: unused parameter 'filename' [-Wunused-parameter]
| 85 | void eBackgroundFileEraser::erase(const char* filename)
| | ~~~~~~~~~~~~^~~~~~~~
| In file included from ../lib/components/file_eraser.h:5,
| from components/file_eraser.cpp:1:
| ../lib/base/message.h: In instantiation of 'eFixedMessagePump<T>::eFixedMessagePump(eMainloop*, int, const char*) [with T = eBackgroundFileEraser::Message]':
| components/file_eraser.cpp:16:2: required from here
| ../lib/base/message.h:187:51: warning: unused parameter 'mt' [-Wunused-parameter]
| 187 | eFixedMessagePump(eMainloop *context, int mt, const char *name):
| | ~~~~^~
| In file included from ../lib/service/service.h:6,
| from ../lib/dvb/idvb.h:12,
| from ../lib/dvb/dvb.h:8,
| from components/scan.cpp:1:
| ../lib/service/iservice.h: In member function 'virtual void iServiceInformation::getAITApplications(std::map<int, std::__cxx11::basic_string<char> >&)':
| ../lib/service/iservice.h:432:69: warning: unused parameter 'aitlist' [-Wunused-parameter]
| 432 | virtual void getAITApplications(std::map<int, std::string> &aitlist) {};
| | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
| make[4]: *** [Makefile:2703: components/file_eraser.o] Error 1
| make[4]: *** Waiting for unfinished jobs....

Huevos
28-08-22, 16:07
Could someone add some instructions to github enigma2 page?

Example of building the enigma2 binary. Modify paths/machine to suit what is on your server.

Paths:

enigma2 git location = /home/username/6.2/builds/developer/vusolo4k/tmp/work/vusolo4k-oe-linux-gnueabi/enigma2/enigma2-5.2+gitAUTOINC+f3fe89ab92-r0/git
enigma2 binary location = /home/username/6.2/builds/developer/vusolo4k/tmp/work/vusolo4k-oe-linux-gnueabi/enigma2/enigma2-5.2+gitAUTOINC+f3fe89ab92-r0/package/usr/bin/enigma2

Run at command line:

cd /home/username/6.2/builds/openvix/developer/vusolo4k
MACHINE=vusolo4k
MACHINEBUILD=vusolo4k
. env.source
bitbake -f -c clean enigma2
bitbake -f -c unpack enigma2

(transfer your modified files to "enigma2 git location")

bitbake -c package enigma2

(retrieve enigma2 binary from "enigma2 binary location")...

twol
28-08-22, 16:25
I said it could possible to add overloaded function, not to modify existing one.

I did try compile image from source and there were no errors, but I don't think my changes were included to image. There are no instructions how to compile enigma2 from local folder. Maybe I needed to clean something too. I modified ENIGMA2_URI in site.conf to point to git:///home/.. Could someone add some instructions to github enigma2 page?

My initial cpp and h are attached if you like to test those. Don't know if they compile or not.

Also added debug prints to original function where filename is empty or not found. I think you can commit 2 debug lines if they work, erase shouldn't normally be called with empty or non existing file.

If you get debug prints working you should see in log what filename contains in case it does not erase.

I have run out of time to run test ...I had
initial crash but that was my issue....
modded c++ files attached...... yes I changed filename for convert

LraiZer
28-08-22, 18:06
I did try compile image from source and there were no errors, but I don't think my changes were included to image. There are no instructions how to compile enigma2 from local folder. Maybe I needed to clean something too. I modified ENIGMA2_URI in site.conf to point to git:///home/..

I think different people have different ways to rebuild enigma2. Here is one way!

In the build-envoronment site.conf file comment out to not INHERIT "rm_work" before building.

DL_DIR = "/home/LraiZer/sources"
#INHERIT += "rm_work"

This leaves the enigma2 source code after a build to allow editing and recompile .cpp changes.

EG.
/home/lraizer/openvix/build-enviroment/builds/openvix/release/inihdp/tmp/work/xpeedlx3-oe-linux/enigma2/enigma2-6.2+gitAUTOINC+0cd13deda6-r0/temp

Edit your changes to the enigma2 .cpp source code.

Then execute the script "run.do_compile" file, this will recompile engima2 binary only with the files that have been edited, so is very fast.

Then copy the recompiled enigma2 binary to /tmp on the set top box and run it from there in debug mode to test.

For speedy edit, test, edit, test.. this process only takes about 60 seconds to edit, recompile and transfer the new enigma2 binary to STB /tmp if you put the commands to run the script and ftp binary to STB in a build.sh script ;)

ocean
28-08-22, 18:25
modded c++ files attached...... yes I changed filename for convert
Ok, thanks, modified files look good, just needed to use diffent variable. I was not sure if it was required to cast "char *" to string, not used C++ in 10 years.

If it still doesn't like bytes, then not sure what would be the correct type for bytes?! Also could add function with different name "erase2", then error message atleast must be different

birdman
28-08-22, 18:29
1) it works with utf8 filenames that contain single byte chars > 127.
2) it works with utf8 filenames that contain multi byte chars.Filenames are not utf8. They are an array of arbitrary bytes.
Until we get away from that viewpoint there will be problems.

Huevos
28-08-22, 19:07
Filenames are not utf8. They are an array of arbitrary bytes.
Until we get away from that viewpoint there will be problems.They are from a particular charset converted to bytes.

Huevos
28-08-22, 19:27
the recompiled enigma2 binary to /tmp on the set top box and run it from there in debug mode to test.Can you give more detail on this step?

twol
28-08-22, 19:48
Ok, thanks, modified files look good, just needed to use diffent variable. I was not sure if it was required to cast "char *" to string, not used C++ in 10 years.

If it still doesn't like bytes, then not sure what would be the correct type for bytes?! Also could add function with different name "erase2", then error message atleast must be different
Will try tomorrow….. if current doesn,t work will try erase2 idea.
you may have not used C++ for 10 years, but there are very few people with any C++ knowledge left to help with Enigma2 changes, so your input is invaluable.

LraiZer
28-08-22, 21:13
Can you give more detail on this step?
The enigma2 work folder does not get removed after a build since the commented out #INHERIT += "rm_work"
Copy the recompiled engima2 binary you find in the enigma2 work folder to Set Top Box /tmp folder and chmod 755 executable.

Then run and monitor in a test enviromant on STB via telnet (i use HyperTerminal with logging)
to stop current enigma2 and run the enigma2 test binary instead:

stop current enigma2
init 4

start the enigma2 binary in /tmp with debug level 4
ENIGMA_DEBUG_LVL=4 /tmp/enigma2

stop the test engima2 binary test
CTRL+C

restart original engima2 as normal
init 3

birdman
29-08-22, 01:38
They are from a particular charset converted to bytes.They are not.
I can put any bytes I want (except /and NUL) in any order I wish to into a file-name.

birdman
29-08-22, 01:42
Will try tomorrow….. if current doesn,t work will try erase2 idea.
you may have not used C++ for 10 years, but there are very few people with any C++ knowledge left to help with Enigma2 changes, so your input is invaluable.Not sure that any C++ code changes should be needed. I'd expect that to just be handing a filename name around, not trying to (wrongly) decode/encode it.

twol
29-08-22, 10:01
@Ocean, Huevos

so this is what I see when sending a byte string to the C++ code ( trying erase b'/media/hdd/movie/.Trash/20220827 1404 - ITV HD - ITV News & Weather.ts' )
Q: should a Null be added to the end of the byte string?? ------>since tried that didn't make difference
so finds both C++ functions, but dislikes the byte string going from python to C++, os stat accepts the string.
Maybe something to do with SWIG and the way our python.h or other python modules in E2 are defined?


07:56:02.2879 [RecordTimer] prepare ok, waiting for begin
07:56:02.2897 [RecordTimer] activating state 2 (Running)
07:56:02.2901 [Trashcan] probing folders
07:56:02.2935 [Notifications] AddPopup, id = None
07:56:02.2979 [Trashcan] looking in trashcan /media/hdd/movie/.Trash
07:56:02.3058 [Trashcan] /media/hdd/movie/.Trash: Size: 347,121,183
07:56:02.3063 [Trashcan] trying erase b'/media/hdd/movie/.Trash/20220827 1404 - ITV HD - ITV News & Weather.ts'
07:56:02.3102 [Screen] Warning: Skin is missing element 'icon' in <class 'Screens.MessageBox.MessageBox'>(A recording has been started:
Breakfast).07:56:02.3103
07:56:02.3117 [Pixmap] setPixmapNum(1) failed! defined pixmaps: []
07:56:02.3123 [MessageBox] Timeout set to 3 seconds.
07:56:02.3165 [eDVBServiceRecord] Recording to /media/hdd/movie/20220829 0756 - BBC One HD - Breakfast.ts...
07:56:02.3173 [eDVBServiceRecord] start recording...
07:56:02.3175 [eDVBServiceRecord] RECORD: have 1 video stream(s) (19c9), and 1 audio stream(s) (19ca), and the pcr pid is 19c9, and the text pid is ffffffff
07:56:02.3179 [eDVBServiceRecord] ADD PID: 0000
07:56:02.3181 [eDVBServiceRecord] ADD PID: 0004
07:56:02.3183 [eDVBServiceRecord] ADD PID: 19c9
07:56:02.3184 [eDVBServiceRecord] ADD PID: 19ca
07:56:02.3199 [setIoPrio] realtime level 7 ok
07:56:02.3200 [eFilePushThreadRecorder] THREAD START
07:56:02.3210 [RecordTimer] start recording on tuner: I
07:56:02.3331 [eInputDeviceInit] 0 160 (352) 1
07:56:02.3335 [eRCDeviceInputDev] emit: 0
07:56:02.3344 [InfoBarGenerics] Key: 352 (Break) KeyID='KEY_OK' Binding='('OK',)'.
07:56:02.3375 [Task] >>> Error: [<Components.Task.FailedPostcondition object at 0xb0622370>]
07:56:02.3381 [Task] job Components.Task.Job name=Cleaning Trashes #tasks=1 completed with [<Components.Task.FailedPostcondition object at 0xb0622370>] in Components.Task.Task name=Cleaning Trashes
07:56:02.3400 [Task] unrecoverable task failure
07:56:02.3404 Cleaning Trashes
Error: [Failure instance: Traceback: <class 'TypeError'>: in method 'eBackgroundFileEraser_erase', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'eBackgroundFileEraser_erase'.
Possible C/C++ prototypes are:
eBackgroundFileEraser::erase(std::string const &)
eBackgroundFileEraser::erase(char const *)

/usr/lib/python3.10/threading.py:1009:_bootstrap_inner
/usr/lib/python3.10/threading.py:946:run
/usr/lib/python3.10/site-packages/twisted/_threads/_threadworker.py:47:work
/usr/lib/python3.10/site-packages/twisted/_threads/_team.py:182:doWork
--- <exception caught here> ---
/usr/lib/python3.10/site-packages/twisted/python/threadpool.py:244:inContext
/usr/lib/python3.10/site-packages/twisted/python/threadpool.py:260:<lambda>
/usr/lib/python3.10/site-packages/twisted/python/context.py:117:callWithContext
/usr/lib/python3.10/site-packages/twisted/python/context.py:82:callWithContext
/usr/lib/enigma2/python/Tools/Trashcan.py:200:work
]07:56:02.3431
07:56:02.3435 [Task] not retrying job.
07:56:03.9599 [eDVBDemux] open demux /dev/dvb/adapter0/demux0
07:56:03.9601 [eDVBDemux][eDVBSectionReader] DMX_SET_FILTER pid=18
07:56:05.3470 [MessageBox] Timeout!
07:56:05.3613 [Screen] Warning: Skin is missing element 'icon' in <class 'Screens.MessageBox.MessageBox'>(Cleaning Trashes
Error: [Failure instance: Traceback: <class 'TypeError'>: in method 'eBackgroundFileEraser_erase', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'eBackgroundFileEraser_erase'.
Possible C/C++ prototypes are:
eBackgroundFileEraser::erase(std::string const &)
eBackgroundFileEraser::erase(char const *)

/usr/lib/python3.10/threading.py:1009:_bootstrap_inner
/usr/lib/python3.10/threading.py:946:run
/usr/lib/python3.10/site-packages/twisted/_threads/_threadworker.py:47:work
/usr/lib/python3.10/site-packages/twisted/_threads/_team.py:182:doWork
--- <exception caught here> ---
/usr/lib/python3.10/site-packages/twisted/python/threadpool.py:244:inContext
/usr/lib/python3.10/site-packages/twisted/python/threadpool.py:260:<lambda>
/usr/lib/python3.10/site-packages/twisted/python/context.py:117:callWithContext
/usr/lib/python3.10/site-packages/twisted/python/context.py:82:callWithContext
/usr/lib/enigma2/python/Tools/Trashcan.py:200:work
]).07:56:05.3614

ocean
29-08-22, 10:50
Maybe something to do with SWIG and the way our python.h or other python modules in E2 are defined?

Yes, that seems to be the issue. Somehow needs to tell SWIG it's okay to pass bytes to these functions.

Probably there are other projects with same issues after moving to py3, maybe worth to google errormessages and see if others have solved this

twol
29-08-22, 13:05
So this is what I get if I use str() on byte string in python, it appears to execute... but fails as cannot find the file

13:56:26.7858 [Trashcan] root = b'/media/hdd/movie/.Trash'
13:56:26.7859 [Trashcan] name = b'20220829 1054 - BBC One HD - Morning Live.ts.sc'
13:56:26.7860 [Trashcan] fn = b'/media/hdd/movie/.Trash/20220829 1054 - BBC One HD - Morning Live.ts.sc'
13:56:26.7861 [Trashcan] trying erase b'/media/hdd/movie/.Trash/20220829 1054 - BBC One HD - Morning Live.ts.sc'
13:56:26.7861 [eBackgroundFileEraser] filename b'/media/hdd/movie/.Trash/20220829 1054 - BBC One HD - Morning Live.ts.sc' not found: Function not implemented
13:56:26.7863 [Trashcan] /media/hdd/movie/.Trash: Size now: 0

where fn is byte string before and fn1 after ---> fn1 = str(fn)
then I specifically call the routine now erase2 in C++ code


st = stat(fn)
fn1 = str(fn)
print("[Trashcan] fn = ", fn)
print("[Trashcan] trying erase", fn1)
enigma.eBackgroundFileEraser.getInstance().erase2( fn1)

wondering if we need this if (rename(filename.c_str(), delname.c_str())<0) with the .c_str

ocean
29-08-22, 14:44
if I use str() on byte string in python, it appears to execute
I don't think doing str() to bytes results anything useful.

However there are questions like this: https://github.com/swig/swig/issues/752

Not sure if adding that "#define SWIG_PYTHON_STRICT_BYTE_CHAR" somewhere would help

twol
29-08-22, 14:49
I don't think doing str() to bytes results anything useful.

However there are questions like this: https://github.com/swig/swig/issues/752

Not sure if adding that "#define SWIG_PYTHON_STRICT_BYTE_CHAR" somewhere would help

I would agree except its passed to the C++ routine whereas previously the call failed

Also seen that post, but issue is it would apply all over and not sure we want to do that

Huevos
29-08-22, 14:50
Not sure if adding that "#define SWIG_PYTHON_STRICT_BYTE_CHAR" somewhere would helpThere are a lot of strings passed in and out of the c++ code. I think adding that would affect everything.

Huevos
29-08-22, 15:51
The enigma2 work folder
So I am here:
/home/openvix/6.1/builds/openvix/developer/vuultimo4k/tmp/work/vuultimo4k-oe-linux-gnueabi/enigma2/enigma2-6.2+gitAUTOINC+6ff9cbe887-r0

I commented out #INHERIT += "rm_work"

Ran run.do_compile.

What folder is the rebuilt binary now in?

twol
29-08-22, 16:15
So I am here:
/home/openvix/6.1/builds/openvix/developer/vuultimo4k/tmp/work/vuultimo4k-oe-linux-gnueabi/enigma2/enigma2-6.2+gitAUTOINC+6ff9cbe887-r0

I commented out #INHERIT += "rm_work"

Ran run.do_compile. So

What folder is the rebuilt binary now in?
I am guessing temp ….. because thats where I always look when the E2 compile fails

birdman
29-08-22, 18:20
There are a lot of strings passed in and out of the c++ code. I think adding that would affect everything.Except that we know we need to pass bytes here, not the str which is what we are doing.
It really depends on what that setting covers.

Huevos
29-08-22, 18:49
Except that we know we need to pass bytes here, not the str which is what we are doing.
It really depends on what that setting covers.

As far as I can see that is a global.

Huevos
29-08-22, 18:50
I am guessing temp ….. because thats where I always look when the E2 compile fails

No. That contains the logs and scripts.

LraiZer
29-08-22, 21:43
So I am here:
/home/openvix/6.1/builds/openvix/developer/vuultimo4k/tmp/work/vuultimo4k-oe-linux-gnueabi/enigma2/enigma2-6.2+gitAUTOINC+6ff9cbe887-r0

I commented out #INHERIT += "rm_work"

Ran run.do_compile.

What folder is the rebuilt binary now in?

So you commented out #INHERIT += "rm_work"
Then you have run your original bitbake commands you posted earlier to build the engima2 package.
You now have the git source code and compiled sources remaining to edit and recompile.

So for example you now edit:

/home/lraizer/openvix/build-enviroment/builds/openvix/release/inihdp/tmp/work/xpeedlx3-oe-linux/enigma2/enigma2-6.2+gitAUTOINC+6124ebc99d-r0/git/lib/components/file_eraser.cpp

you then execute the compile command, it only takes 5 seconds to recompile the file_eraser.cpp to create the new engima2 binary.

/home/lraizer/openvix/build-enviroment/builds/openvix/release/inihdp/tmp/work/xpeedlx3-oe-linux/enigma2/enigma2-6.2+gitAUTOINC+6124ebc99d-r0/temp/run.do_compile

New engima2 binary is in the same git/main/ folder location as the engima.cpp

/home/lraizer/openvix/build-enviroment/builds/openvix/release/inihdp/tmp/work/xpeedlx3-oe-linux/enigma2/enigma2-6.2+gitAUTOINC+6124ebc99d-r0/git/main/

Copy this engima2 binary to /tmp on STB and run from there.

birdman
29-08-22, 22:49
As far as I can see that is a global.Sorry, didn't mean whether it was local/specific/global but rather what it actually does (and doesn't) do.

ocean
31-08-22, 18:52
One solution is to add typemap. I don't know the exact typemap should be used, but here is minimal example how to reproduce same issue with your computer.

testbytes.i


%module test
%{
void erase(const char* filename) {
printf("Filename = %s\n", filename);
}
%}

//%typemap(in) (const char* filename) {
// Py_ssize_t len;
// PyBytes_AsStringAndSize($input, &$1, &len);
//}

void erase(const char* filename);

testbytes.py

from test import erase
erase(b't\xe4hti.mpg')
sudo apt install swig
sudo apt install clang
swig -python -py3 testbytes.i
clang -Wall -Wextra -Wpedantic -I /usr/include/python3.10/ -fPIC -shared testbytes_wrap.c -o _test.so
python3 testbytes.py


Traceback (most recent call last):
File "/home/ocean/testbytes.py", line 2, in <module>
erase(b't\xe4hti.mpg')
File "/home/ocean/test.py", line 66, in erase
return _test.erase(filename)
TypeError: in method 'erase', argument 1 of type 'char const *'

Now activate typemap by removing //

swig -python -py3 testbytes.i
clang -Wall -Wextra -Wpedantic -I /usr/include/python3.10/ -fPIC -shared testbytes_wrap.c -o _test.so
python3 testbytes.py
Filename = t�hti.mpg

ocean
31-08-22, 21:20
Ofcource possible to use gcc for that example.
gcc -O2 -fPIC -I/usr/include/python3.10 -shared testbytes_wrap.c -o _test.so

And even simpler typemap


%typemap(in) (const char* filename) {
$1 = PyBytes_AsString($input);
}

Next step would be to add that and test. But there are also other functions using same parameters (const std::string& filename) and (const char* filename)

ocean
01-09-22, 04:39
After reading documentation, typemap can also be added to class.

file_eraser.h has:

#ifdef SWIG
public:

Added typemap here and it's now WORKING! Also file was erased.

#ifdef SWIG
public:
%typemap(in) (const char* filename2) {
$1 = PyBytes_AsString($input);
}
Could not yet make (const std::string& filename) typemap compile, but (const char* filename2) works fine.

twol
01-09-22, 05:56
After reading documentation, typemap can also be added to class.

file_eraser.h has:

#ifdef SWIG
public:

Added typemap here and it's now WORKING! Also file was erased.

#ifdef SWIG
public:
%typemap(in) (const char* filename2) {
$1 = PyBytes_AsString($input);
}
Could not yet make (const std::string& filename) typemap compile, but (const char* filename2) works fine.

So with that filename2 change that now works….and if nothing additional is added, then the existing fiename works as before? …. Because that isneeded for the direct C++ calls from other routines to filename

ocean
01-09-22, 07:39
So with that filename2 change that now works….and if nothing additional is added, then the existing fiename works as before? …. Because that isneeded for the direct C++ calls from other routines to filenameYes, attached files I currently have and everything seems to be working. You can now call erase with bytes.

Ofcourse that whole function should not be copy/pasted. It should be possible to call eBackgroundFileEraser::erase(const std::string& filename) from eBackgroundFileEraser::erase(const char* filename2)

twol
01-09-22, 08:18
Yes, attached files I currently have and everything seems to be working. You can now call erase with bytes.

Ofcourse that whole function should not be copy/pasted. It should be possible to call eBackgroundFileEraser::erase(const std::string& filename) from eBackgroundFileEraser::erase(const char* filename2)

OK excellent, but maybe better to just change the 2nd calls name to prevent any other potential issues with other C++ calls and also then calling the 1st routine from the 2nd should be straight forward??

Have you had opportunity to look at playing these files??

ocean
01-09-22, 08:34
OK excellent, but maybe better to just change the 2nd calls name to prevent any other potential issues with other C++ calls and also then calling the 1st routine from the 2nd should be straight forward??

Have you had opportunity to look at playing these files??

Other function could be something like this for example:


void eBackgroundFileEraser::erase(const char* filename2)
{
std::string filename(filename2);
eBackgroundFileEraser *eraser = eBackgroundFileEraser::getInstance();
eraser->erase(filename);
}


Ofcourse if you want to be sure, just use different name for the function, "erase2" or whatever.

I have not looked at other places in code, but I guess it's just doing this same

twol
01-09-22, 08:46
Other function could be something like this for example:


void eBackgroundFileEraser::erase(const char* filename2)
{
std::string filename(filename2);
eBackgroundFileEraser *eraser = eBackgroundFileEraser::getInstance();
eraser->erase(filename);
}


Ofcourse if you want to be sure, just use different name for the function, "erase2" or whatever.

I have not looked at other places in code, but I guess it's just doing this same
The other calls from other C++ calls to the original function are not an issue because they are removing internally created files.
I plan to build today using your code, and then if everything looks OK will commit later.

twol
01-09-22, 15:21
@Ocean.

Currently using your 2 files as is.
Having a few hiccups, and run out of time today.
Basically in most cases file delete appears to work although I have had one crash where it was apparently doing a stat on a deleted file.
Biggest issue is deleting directories which is not happening but think that is an issue in Trashcan ... I need to follow up and sort out why, probably something very simple.... and stupid.

ocean
02-09-22, 17:02
Back to the original issue post #1 crash when loading movielist.

This was much harder to trace, because crash happens inside C++ code.

Problem is in movielist.py, def getItemDisplayName()

Function returns "name" in UTF-8. When filename contains other than UTF-8 characters they are escaped inside str.

def buildMovieListEntry() there is data.txt = getItemDisplayName(serviceref, info)

Now data.txt contains escaped str and when function returns SWIG_PYOBJECT to C-code it crashes.

Avoiding the crash is simple, for example modify getItemDisplayName() to return value: name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')

Now question is what name / characters it should display to user? But I think display problem is only minor issue here.

After displayname is fixed movielist loads fine. There is another problem in playing these files, but that is separate issue. I can look at it after this is fixed

Huevos
02-09-22, 17:39
@Ocean, can you tell me if you see a difference in the output for:

name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')
name.encode('UTF-8', 'ignore').decode('UTF-8')

ocean
02-09-22, 18:23
@Ocean, can you tell me if you see a difference in the output for:

name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')
name.encode('UTF-8', 'ignore').decode('UTF-8')

I think both do the same.

Maybe it would be possible to check coding and do this for latin-1: name.encode('UTF-8', 'surrogateescape').decode('latin-1')

It does display correctly, attached screencap.

twol
02-09-22, 20:48
Assuming we are still looking at filenames in some language encoding then…..I believe !

1. the utf8.encode.decode (huevos) will be the same whatever the original text, as if there are any surrogates you are throwing them away.
2. the 2nd(ocean) - if the original encoding is not utf-8, then the encode will restore to the original encoding, and the decode to latin-1 could contain surrogates ( and cause issues) if original encoding not latin-1.

Huevos
03-09-22, 01:09
So maybe... encode (with surrogates) to get a bytes string... then try to get the encoding with chardet... then use that encoding in the decode.

birdman
03-09-22, 01:31
Avoiding the crash is simple, for example modify getItemDisplayName() to return value: name.encode('UTF-8', 'surrogateescape').decode('UTF-8', 'ignore')
Why would the encode be needed? Isn't name starting as a bytes item (given that this is the only data-type which can contain all valid filenames)?


Now question is what name / characters it should display to user? But I think display problem is only minor issue here.What about:


name.decode(encoding='utf8', errors='backslashreplace')which at least shows what was there.

birdman
03-09-22, 01:32
So maybe... encode (with surrogates) to get a bytes string.....Why would you not already have a bytes string?

birdman
03-09-22, 01:34
Assuming we are still looking at filenames in some language encoding then…..We have no idea what "language" encoding may be used in a filename. A filename is just a series of bytes - any bytes. A file may have come from anywhere.

ocean
03-09-22, 05:24
So maybe... encode (with surrogates) to get a bytes string... then try to get the encoding with chardet... then use that encoding in the decode.
Yes, works for displayname. Maybe because data.txt is inside structure and not a function parameter.

For example this works for me:

s = 'p\udcf6ll\udcf6.mp4' #str
b = s.encode('UTF-8', 'surrogateescape') #bytes
import chardet
e = chardet.detect(b)['encoding']
s2 = b.decode(e)
print(s2) # Displays correctly for me

Btw, just noticed "getItemDisplayName" is imported also in MovieSelection.py and it uses it for file operations?? Function name "DisplayName" suggest it should be used only for the display.

It would be better to have another function "getItemFileName" and that return bytes, name.encode('UTF-8', 'surrogateescape') and adjust code in movieselection.

ocean
03-09-22, 05:29
Why would the encode be needed? Isn't name starting as a bytes item (given that this is the only data-type which can contain all valid filenames)?

Currently function getItemDisplayName return value "name" is STR, not bytes. You can convert it to bytes: name.encode('UTF-8', 'surrogateescape'). But then every code that uses "getItemDisplayName" should be adjusted for bytes. It is possible to do this also. Question is only what is the best way

ocean
03-09-22, 07:18
So maybe... encode (with surrogates) to get a bytes string... then try to get the encoding with chardet... then use that encoding in the decode.
And if you like to avoid doing "chardet" for utf-8, this should work (atleast works for me)


def getItemDisplayName(itemRef, info, removeExtension=None):
if itemRef.flags & eServiceReference.isGroup:
name = itemRef.getName()
elif itemRef.flags & eServiceReference.isDirectory:
name = info.getName(itemRef)
name = path.basename(name.rstrip("/"))
else:
name = info.getName(itemRef)
removeExtension = config.movielist.hide_extensions.value if removeExtension is None else removeExtension
if removeExtension:
fileName, fileExtension = path.splitext(name)
if fileExtension in KNOWN_EXTENSIONS:
name = fileName
aname = name.encode('UTF-8', 'surrogateescape')
if name != aname.decode('UTF-8', 'ignore'):
from chardet import detect
encoding = detect(aname)['encoding']
return aname.decode(encoding)
return name

twol
03-09-22, 09:55
And if you like to avoid doing "chardet" for utf-8, this should work (atleast works for me)


def getItemDisplayName(itemRef, info, removeExtension=None):
if itemRef.flags & eServiceReference.isGroup:
name = itemRef.getName()
elif itemRef.flags & eServiceReference.isDirectory:
name = info.getName(itemRef)
name = path.basename(name.rstrip("/"))
else:
name = info.getName(itemRef)
removeExtension = config.movielist.hide_extensions.value if removeExtension is None else removeExtension
if removeExtension:
fileName, fileExtension = path.splitext(name)
if fileExtension in KNOWN_EXTENSIONS:
name = fileName
aname = name.encode('UTF-8', 'surrogateescape')
if name != aname.decode('UTF-8', 'ignore'):
from chardet import detect
encoding = detect(aname)['encoding']
return aname.decode(encoding)
return name


Ah, the joys of Unicode!

birdman
03-09-22, 11:23
Currently function getItemDisplayName return value "name" is STR, not bytes.Yes,
But it should be given a bytes.
So if what is coming in is bytes why does it need to have an encode run on it?

birdman
03-09-22, 11:28
And if you like to avoid doing "chardet" for utf-8, this should work (atleast works for me)Ideally it should work for any filename.
So you can't assume that there is a valid encoding.
You have to get away from the idea that there is some (Windows) language encoding here and handle any arbitrary stream of bytes.

twol
03-09-22, 11:56
@Ocean - still working some of my changes to file_eraser and Trashcan, so have committed your original(working) changes to Git.
We can submit some of your suggestions to reduce code size when this change has been in use for a while and the getItemDisplayName changes added.

Huevos
03-09-22, 11:59
@ocean, are you able to build or would you like us to supply you with the develop image?

Huevos
03-09-22, 12:01
Why would you not already have a bytes string?

I think you are talking about a different problem.

ocean
03-09-22, 13:52
@ocean, are you able to build or would you like us to supply you with the develop image?
Thanks, I don't have buildenviroment with me now, but changes looks good and can test tomorrow.

In trashcan change name == ".e2settings.pkl" -> name == b".e2settings.pkl", otherwise comparison can never be true because name is bytes.

Huevos
03-09-22, 14:23
Thanks, I don't have buildenviroment with me now, but changes looks good and can test tomorrow.

In trashcan change name == ".e2settings.pkl" -> name == b".e2settings.pkl", otherwise comparison can never be true because name is bytes.

https://github.com/OpenViX/enigma2/commit/9624cf8a1df2eca5a11b6810b63f9b972e408076

birdman
03-09-22, 17:07
I think you are talking about a different problem.I'm talking about post 76 (https://www.world-of-satellite.com/showthread.php?65365-Crash-when-folder-contains-filenames-with-Windows-encoding&p=527648&viewfull=1#post527648) which refers to the code for getItemDisplayName().

name is a filename, so has to be bytes. So why would you run encode on it?

And why, when getting a display name, would you discard information rather than making it visible to the user?

twol
03-09-22, 17:17
I'm talking about post 76 (https://www.world-of-satellite.com/showthread.php?65365-Crash-when-folder-contains-filenames-with-Windows-encoding&p=527648&viewfull=1#post527648) which refers to the code for getItemDisplayName().

name is a filename, so has to be bytes. So why would you run encode on it?

And why, when getting a display name, would you discard information rather than making it visible to the user?

Because external is bytes, but internal is unicode (python 3 automatic conversion unless really forced to bytes … and the python 3 default is always utf-8) ……….. why else would we go through these changes otherwise??

birdman
04-09-22, 02:02
Because external is bytes, but internal is unicode (python 3 automatic conversion unless really forced to bytes … and the python 3 default is always utf-8) ……….. why else would we go through these changes otherwise??The name at this point should be a filename, hence should be bytes.

The filename should be kept as bytes throughout the code (and passed to C+ code as such). Otherwise it cannot cover all possible filenames.
The only time you need it to be anything else is for display, and then the decode should not discard information.

ocean
04-09-22, 05:01
The name at this point should be a filename, hence should be bytes.
The filename should be kept as bytes throughout the code (and passed to C+ code as such). Otherwise it cannot cover all possible filenames.
name is str in getItemDisplayName. But what is IMPORTANT here, "filename" is still intact inside str, it's just in different format. This str can always be converted to bytes without any characters lost: name.encode('UTF-8', 'surrogateescape'). Now we have original filename in bytes.

ocean
04-09-22, 05:09
Since "getItemDisplayName" is also used in MovieSelection.py for different purposes, maybe it would be best to use two functions instead. This way it's not required to handle possible other codings in MovieSelection.py


def getItemDisplayName(itemRef, info, removeExtension=None):
if itemRef.flags & eServiceReference.isGroup:
name = itemRef.getName()
elif itemRef.flags & eServiceReference.isDirectory:
name = info.getName(itemRef)
name = path.basename(name.rstrip("/"))
else:
name = info.getName(itemRef)
removeExtension = config.movielist.hide_extensions.value if removeExtension is None else removeExtension
if removeExtension:
fileName, fileExtension = path.splitext(name)
if fileExtension in KNOWN_EXTENSIONS:
name = fileName
return name


def getItemDisplayNameText(itemRef, info, removeExtension=None):
name = getItemDisplayName(itemRef, info, removeExtension)
aname = name.encode('UTF-8', 'surrogateescape')
if name != aname.decode('UTF-8', 'ignore'):
from chardet import detect
encoding = detect(aname)['encoding']
return aname.decode(encoding)
return name


And replace all 3 cases of "data.txt = getItemDisplayName(serviceref, info)" with "data.txt = getItemDisplayNameText(serviceref, info)"

Huevos
04-09-22, 09:45
So with the code changes already made is there still a problem getting these files to play?

birdman
04-09-22, 11:02
name is str in getItemDisplayName. But what is IMPORTANT here, "filename" is still intact inside str, it's just in different format.
A Python str is an array of Unicode points. How did you get a filename in there that might not have been valid utf8 in the first place?

Huevos
04-09-22, 12:43
A Python str is an array of Unicode points. How did you get a filename in there that might not have been valid utf8 in the first place?I don't know what you mean "How did you". We are working on the code in the repo. We haven't done anything to create that behaviour. It was already there.

ocean
04-09-22, 13:32
I got it working! Attatched all modifications I currently have and files are playing fine! (service.cpp function is copy/pasted, sorry)

Also have very good news. It's NOT required to change any python functions to use bytes. Escaped characters in str can be handled using typemaps.

Huevos
04-09-22, 14:25
Can the copy/paste be simplified?

Huevos
04-09-22, 14:28
It's NOT required to change any python functions to use bytes. Escaped characters in str can be handled using typemaps.Does that mean we don't need the changes in MovieList at all?

ocean
04-09-22, 17:11
Does that mean we don't need the changes in MovieList at all?
MovieList still has to be changed. I posted change earlier today and zip has this change included.

ocean
04-09-22, 17:22
Can the copy/paste be simplified?
Yes, but I don't have it. I tried few things, but it did not compile. What works in file_eraser.cpp does not work in service.cpp

Huevos
04-09-22, 17:29
MovieList still has to be changed. I posted change earlier today and zip has this change included.

We already committed this: https://github.com/OpenViX/enigma2/commit/6efb844aba07357c3285e120bde4dc048e2bbad8

What about the copy/paste? Is there a nicer way to do that? https://github.com/OpenViX/enigma2/commit/e3474fc6ae016b563ad0e56073f27c17b8d13785

Edit: sorry, it seems we posted at more or less the same time.

ocean
04-09-22, 19:00
We already committed this: https://github.com/OpenViX/enigma2/commit/6efb844aba07357c3285e120bde4dc048e2bbad8
Yes, but that change also affected MovieSelection.py, because there is "from Components.MovieList import getItemDisplayName".

movielist.py change was only intended for 3 cases "data.txt" in movielist.py, so I changed movielist.py little.



What about the copy/paste? Is there a nicer way to do that?
I also googled, but did not find obvious ways to do that. It seems it's not possible to call constructor from outside.

Also it should be possible to add these typemaps to original functions, then these extra char* functions would not be needed. (So far attempt has been unsuccessful)

Huevos
04-09-22, 19:44
So the problem with the copy/paste version is we now have duplicate code which would need maintaining in synchronisation. Bit of a nightmare. :(


eServiceReference::eServiceReference(const char* string2)
{
std::string string(string2);
eDebug("[eServiceReference][char]");
eServiceReference(string);
}


| service/service.cpp: In constructor 'eServiceReference::eServiceReference(const char*)':
| service/service.cpp:121:26: warning: unnecessary parentheses in declaration of 'string' [-Wparentheses]
| 121 | eServiceReference(string);
| | ^~~~~~~~
| service/service.cpp:121:26: note: remove parentheses
| 121 | eServiceReference(string);
| | ^~~~~~~~
| | - -
| service/service.cpp:121:27: error: conflicting declaration 'eServiceReference string'
| 121 | eServiceReference(string);
| | ^~~~~~

So looks like it doesn't understand we are calling the function.

Is this problem caused because someone thought it would be a good idea to give the class and the function the same name?

Huevos
04-09-22, 22:28
Ok, playing a bit more...

In iservice.h:

eServiceReference(int type, int flags, const std::string &path)
: type(type), flags(flags), path(path)
{
memset(data, 0, sizeof(data));
number = 0;
}
+ void eServiceReferenceBase(const std::string &string);
eServiceReference(const std::string &string);
eServiceReference(const char* string2);
std::string toString() const;
std::string toCompareString() const;

In service.cpp:

}
return res;
}

-eServiceReference::eServiceReference(const std::string &string)
+void eServiceReference::eServiceReferenceBase(const std::string &string)
{
const char *c = string.c_str();
int pathl = 0;

Then add this in service.cpp:

eServiceReference::eServiceReference(const std::string &string)
{
eDebug("[eServiceReference][std]");
eServiceReferenceBase(string);
}

eServiceReference::eServiceReference(const char* string2)
{
std::string string(string2);
eDebug("[eServiceReference][char]");
eServiceReferenceBase(string);
}

birdman
05-09-22, 00:44
I don't know what you mean "How did you". We are working on the code in the repo. We haven't done anything to create that behaviour. It was already there.It's an anonymous you. I could have written "How did one".

I'm intrigued as to how bytes end up in a str given things like:


>>> bstr = b'\x86\x87\x88'
>>> ustr = bstr.decode()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 0: invalid start byte

ocean
05-09-22, 05:31
I'm intrigued as to how bytes end up in a str given things like:

>>> bstr = b'\x86\x87\x88'
>>> ustr = bstr.decode()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 0: invalid start byte

bstr = b'\x86\x87\x88'
ustr = bstr.decode('UTF-8', 'surrogateescape')

Ofcourse this is python only string format. When this string is sent to SWIG, it sees escaped characters and throws exception. But they can be converted back to bytes using typemap.

Huevos
05-09-22, 09:18
Ok, playing a bit more...

In iservice.h:

eServiceReference(int type, int flags, const std::string &path)
: type(type), flags(flags), path(path)
{
memset(data, 0, sizeof(data));
number = 0;
}
+ void eServiceReferenceBase(const std::string &string);
eServiceReference(const std::string &string);
eServiceReference(const char* string2);
std::string toString() const;
std::string toCompareString() const;

In service.cpp:

}
return res;
}

-eServiceReference::eServiceReference(const std::string &string)
+void eServiceReference::eServiceReferenceBase(const std::string &string)
{
const char *c = string.c_str();
int pathl = 0;

Then add this in service.cpp:

eServiceReference::eServiceReference(const std::string &string)
{
eDebug("[eServiceReference][std]");
eServiceReferenceBase(string);
}

eServiceReference::eServiceReference(const char* string2)
{
std::string string(string2);
eDebug("[eServiceReference][char]");
eServiceReferenceBase(string);
}

By the way, I forgot to say, this does actually work.

Huevos
05-09-22, 10:10
So I have added this, based on ocean's pattern:
https://github.com/OpenViX/enigma2/commit/2c74d42983c6969c0d2ec87b3c48622ba0ff3a45

Huevos
05-09-22, 10:24
MovieList still has to be changed. I posted change earlier today and zip has this change included.

So what goes wrong without the MovieList change now we have the new code in iservice.h/service.cpp? Can you test with the old MovieList.py and new cpp code please?

birdman
05-09-22, 10:28
bstr = b'\x86\x87\x88'
ustr = bstr.decode('UTF-8', 'surrogateescape')

Ofcourse this is python only string format. When this string is sent to SWIG, it sees escaped characters and throws exception. But they can be converted back to bytes using typemap.Seems a convoluted (and hence error prone) way to keep data when it is always actually a bytes format.

Huevos
05-09-22, 10:37
Seems a convoluted (and hence error prone) way to keep data when it is always actually a bytes format.

We are just working with what python/SWIG developers dropped in our laps.

twol
05-09-22, 11:57
Seems a convoluted (and hence error prone) way to keep data when it is always actually a bytes format.

Huevos is correct its what the python 3 Developers have given us, bless them!
Also there are bytes and bytes - with Unicode you have be careful with text files and their manipulation, as a unicode character can in theory occupy upto 4 bytes in utf-8, so memory manipulation can typically occupy 2 x memory compared to python 2 and be slower with older python 2 methods of manipulation.

ocean
05-09-22, 13:44
So what goes wrong without the MovieList change now we have the new code in iservice.h/service.cpp? Can you test with the old MovieList.py and new cpp code please?
Without movielist changes it crashes same way as in post #1.

data.txt is inside larger object/structure, it's not a function parameter. Typemaps can only handle function parameters.

It should also be easy to test yourself too. Take any videofile, check that it plays fine. Edit filename to contain b'\xe4' for example.

ocean
05-09-22, 20:06
Crash happens in elistboxcontent.cpp, line 920


const char *string = (PyUnicode_Check(pstring)) ? PyUnicode_AsUTF8(pstring) : "<not-a-string>";


"pstring" contains escaped characters and PyUnicode_AsUTF8(pstring) fails. Changed it to this bellow and it doesn't crash anymore. (Note: there are atleast 5 similar checks in this file, possibly they all should be fixed same way)



const char *string = (PyUnicode_Check(pstring)) ? PyBytes_AsString(PyUnicode_AsEncodedString(pstring , "utf-8", "surrogateescape")) : "<not-a-string>";


But that only fixes the crash. GUI expects UTF-8. Movielist change "decode(encoding)" is still needed to display latin-1 characters correctly.

Huevos
05-09-22, 20:38
How do you trigger this crash?

Are the current changes in enigma repo of service.cpp working for you?

twol
05-09-22, 20:45
Crash happens in elistboxcontent.cpp, line 920


const char *string = (PyUnicode_Check(pstring)) ? PyUnicode_AsUTF8(pstring) : "<not-a-string>";


"pstring" contains escaped characters and PyUnicode_AsUTF8(pstring) fails. Changed it to this bellow and it doesn't crash anymore. (Note: there are atleast 5 similar checks in this file, possibly they all should be fixed same way)



const char *string = (PyUnicode_Check(pstring)) ? PyBytes_AsString(PyUnicode_AsEncodedString(pstring , "utf-8", "surrogateescape")) : "<not-a-string>";


But that only fixes the crash. GUI expects UTF-8. Movielist change "decode(encoding)" is still needed to display latin-1 characters correctly.

Is this crash still from movielist or another location?
if another issue how did you get there

ocean
05-09-22, 21:50
How do you trigger this crash?
That is the original crash without any changes to movielist. (You asked to test with old movielist)


Are the current changes in enigma repo of service.cpp working for you?
Yes, service.cpp changes are working fine.

Huevos
06-09-22, 01:07
@ocean, what are you using to FTP the file to the STB?

Is it possible that FTP is interfering with the filename. Windows is supposed to save filenames as UTF16.

ocean
06-09-22, 05:17
Is it possible that FTP is interfering with the filename. Windows is supposed to save filenames as UTF16.
No, it's not only FTP issue. If files are saved to USB drive in Windows. When you plug that USB drive to box, it has same issues.

But with current changes in repo it's already working. Just include latest movielist changes I posted earlier, then it doesn't affect anything in movieselection.py

birdman
06-09-22, 11:10
But that only fixes the crash. GUI expects UTF-8. Movielist change "decode(encoding)" is still needed to display latin-1 characters correctly.Linux filenames have no concept of Latin-1 (or of iso8859-1, Windows-1252, utf8 etc.) They are just an array of bytes.
The filename has to be kept (and passed around) as bytes while any display has to use a utf8 representation of this.

Huevos
06-09-22, 11:21
Linux filenames have no concept of Latin-1 (or of iso8859-1, Windows-1252, utf8 etc.) They are just an array of bytes.
The filename has to be kept (and passed around) as bytes while any display has to use a utf8 representation of this.

Gordon, we are talking about GUI display. We have to feed something to the GUI that it understands.

birdman
07-09-22, 01:22
Gordon, we are talking about GUI display. We have to feed something to the GUI that it understands.Yes. But you also need to retain the original name in bytes in order to pass it to the filesystem.
If you start with a filename you should only ever need to decode it (in some lossless way) to produce a display value.
If you start with a display string (such as setting up a timer/Autotimer) then you only ever need to encode it to get a filename.

So if you ever find yourself encoding and decoding the same value then something has gone wrong in the logic.

Huevos
07-09-22, 12:49
The current code works so far but open to any improvements you suggest.

ocean
07-09-22, 13:13
@Huevos Could you still add some fix for RedirectOutput.py? There are debug prints, one in navigation.py, line 110, print("[Navigation] playing ref", ref and ref.toString())

If you don't like try/except, maybe change this:


if isinstance(data, bytes):
data = data.decode(encoding="UTF-8", errors="ignore")


To this in RedirectOutput.py?



if isinstance(data, bytes):
data = data.decode(encoding="UTF-8", errors="ignore")
else:
data = data.encode("UTF-8", "ignore").decode()


Everything else seems working for me.

Huevos
08-09-22, 15:42
@Huevos Could you still add some fix for RedirectOutput.py? There are debug prints, one in navigation.py, line 110, print("[Navigation] playing ref", ref and ref.toString())

If you don't like try/except, maybe change this:


if isinstance(data, bytes):
data = data.decode(encoding="UTF-8", errors="ignore")


To this in RedirectOutput.py?



if isinstance(data, bytes):
data = data.decode(encoding="UTF-8", errors="ignore")
else:
data = data.encode("UTF-8", "ignore").decode()


Everything else seems working for me.

That crash was caused by something nasty in RecordTimer.py...

< 4037.8571> 17:38:38.4681 Traceback (most recent call last):
< 4037.8572> 17:38:38.4681 File "/usr/lib/enigma2/python/StartEnigma.py", line 224, in processDelay
< 4037.8581> 17:38:38.4690 callback(*retval)
< 4037.8581> 17:38:38.4691 File "/usr/lib/enigma2/python/Screens/InfoBarGenerics.py", line 3333, in recordQuestionCallback
< 4037.8585> 17:38:38.4694 File "/usr/lib/enigma2/python/Screens/InfoBarGenerics.py", line 3253, in startInstantRecording
< 4037.8588> 17:38:38.4697 File "/usr/lib/enigma2/python/RecordTimer.py", line 1275, in record
< 4037.8591> 17:38:38.4701 File "/usr/lib/enigma2/python/timer.py", line 229, in addTimerEntry
< 4037.8594> 17:38:38.4704 File "/usr/lib/enigma2/python/timer.py", line 268, in calcNextActivation
< 4037.8597> 17:38:38.4706 File "/usr/lib/enigma2/python/timer.py", line 363, in processActivation
< 4037.8601> 17:38:38.4710 File "/usr/lib/enigma2/python/RecordTimer.py", line 1009, in doActivate
< 4037.8607> 17:38:38.4717 File "/usr/lib/enigma2/python/RecordTimer.py", line 676, in activate
< 4037.8611> 17:38:38.4720 File "/usr/lib/enigma2/python/RecordTimer.py", line 485, in log_tuner
< 4037.8614> 17:38:38.4723 File "/usr/lib/enigma2/python/RecordTimer.py", line 284, in log
< 4037.8617> 17:38:38.4726 File "/usr/lib/enigma2/python/Tools/RedirectOutput.py", line 16, in write
< 4037.8620> 17:38:38.4729 TypeError: in method 'ePythonOutput', argument 1 of type 'char const *

What exactly is that bad output?

Huevos
08-09-22, 15:54
Python:

ePythonOutput(self.line, self.level)

C++:

void ePythonOutput(const char *, int lvl = lvlDebug);

const char * ... shouldn't we be feeding that bytes?

twol
08-09-22, 17:42
Python:

ePythonOutput(self.line, self.level)

C++:

void ePythonOutput(const char *, int lvl = lvlDebug);
const char * ... shouldn't we be feeding that bytes?

Thats in eerror.h but its also in eerror.cpp

void ePythonOutput(const char *string, int lvl)

Which confuses me - why in both

ocean
08-09-22, 17:54
... shouldn't we be feeding that bytes?
It's possible to change that to take bytes. But do you want log files to be other than UTF-8? In receiver menu there is option to view log, it most likely crashes if log is not UTF-8

ocean
08-09-22, 18:13
What exactly is that bad output?
That no longer happens and problem was debug print in traschan.py, it contained filename. I don't know why traceback said "RecordTimer.py", nothing was wrong there.

Current debug log crash is this and it just tries to print "playing.." and filename



< 1151.2274> 19:04:31.2122 Traceback (most recent call last):
< 1151.2275> 19:04:31.2123 File "/usr/lib/enigma2/python/StartEnigma.py", line 224, in processDelay
< 1151.2282> 19:04:31.2130 callback(*retval)
< 1151.2283> 19:04:31.2131 File "/usr/lib/enigma2/python/Screens/InfoBar.py", line 223, in movieSelected
< 1151.2291> 19:04:31.2139 File "/usr/lib/enigma2/python/Screens/InfoBar.py", line 228, in openMoviePlayer
< 1151.2295> 19:04:31.2143 File "/usr/lib/enigma2/python/StartEnigma.py", line 314, in open
< 1151.2301> 19:04:31.2149 dlg = self.current_dialog = self.instantiateDialog(screen, *arguments, **kwargs)
< 1151.2302> 19:04:31.2150 File "/usr/lib/enigma2/python/StartEnigma.py", line 251, in instantiateDialog
< 1151.2306> 19:04:31.2155 return self.doInstantiateDialog(screen, arguments, kwargs, self.desktop)
< 1151.2307> 19:04:31.2155 File "/usr/lib/enigma2/python/StartEnigma.py", line 274, in doInstantiateDialog
< 1151.2312> 19:04:31.2160 dlg = screen(self, *arguments, **kwargs)
< 1151.2312> 19:04:31.2161 File "/usr/lib/enigma2/python/Screens/InfoBar.py", line 289, in __init__
< 1151.2315> 19:04:31.2164 File "/usr/lib/enigma2/python/Navigation.py", line 110, in playService
< 1151.2318> 19:04:31.2166 File "/usr/lib/enigma2/python/Tools/RedirectOutput.py", line 16, in write
< 1151.2322> 19:04:31.2170 TypeError: in method 'ePythonOutput', argument 1 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'ePythonOutput'.

birdman
08-09-22, 18:32
That crash was caused by something nasty in RecordTimer.py...?Trying to log something.

The logging code should really trap this in case the caller doesn't (as they clearly all do not).

birdman
08-09-22, 18:36
Current debug log crash is this and it just tries to print "playing.." and filenameAnd that goes via a print statement to the log. If filename is not valid utf8 (and there is no reason at all why it has to be unless the logging code has turned it into valid Unicode) then you'll get a crash.
We reached here earlier in the thread.