Autotimers and Description Uniqueness

**ccs** · 10-09-20, 10:54

Originally Posted by spanner123

That all makes sense Birdman but why is it Sky boxes or Humax boxes etc never get it wrong?

Maybe because they use series CRID data.

Some progress on enigma2/CRIDs achieved in Australia a while ago ....

Code:

http://beyonwiz.com.au/forum/viewtopic.php?f=54&t=10773

**ccs** · 10-09-20, 11:32

.... this link looks quite interesting....

Code:

https://www.beyonwiz.com.au/forum/viewtopic.php?p=183023#p170895

@IanSav is probably the best bet for comments.

**BrianTheTechieSnail** · 10-09-20, 11:36

Originally Posted by birdman

You do seem to be missing the real point.
It's very simple for a human to read two similar texts and decide whether they are, in fact, the same - we've evolved over thousands of years to recognize patterns.
There is no way that simple computer code can achieve the same discrimination.
No matter what comparison code you put in place it will never come up with the "correct result" every time.
So live with that limitation and look for some other test to help you.

Oh come on. I never said my answer would be perfect.


**Joe_90** · 10-09-20, 11:46

I contributed to a few threads discussing CRIDs several years ago and actually raised a request here on Vix to see if CRID handling could be incorporated into the AutoTimer.py code, but that request thread went nowhere at the time. The beyonwiz team in Australia (as per @ccs's post) have had some success in using this CRID data, but I think the big issue is the lack of consistency in how the CRID information is structured or encoded. As @ccs points out, IanSav (and prl) have worked on this in the Australian environment. What might work for Freesat/Freeview might need workarounds or kludges for SKY or other broadcasters. I used the "Series Link" feature on SKY and on Humax Freesat receivers and found them excellent. I originally thought the AutoTimer mechanism on enigma to be a little crude in its operation but, over the years, I've adapted to its foibles and have found that it works for me 99% of the time. I rarely miss episodes I want to record and it generally works when a new series starts after being off the air for months (providing the broadcaster doesn't move it to a completely new channel or time slot). Worst case scenario is that I occasionally have multiple recordings of the same episode.
The CRID mechanism would eliminate the ambiguity of trying to match on title or description but I imagine it would need a complete overhaul of the program logic to keep tables of series and programme CRID info to determine if particular episodes need to be recorded or have already been recorded.

**ccs** · 10-09-20, 12:01

... I'm sure there would be room in the *.ts.meta files to store an extra word or two of crid details.

If it's blank/missing, use the existing system, if it's not, bingo.
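A rough sketch of that fallback idea, assuming each recording or EPG event can hand over an optional CRID string. The names `same_programme` and `fuzzy_match` are made up for illustration and are not part of AutoTimer.py:

```python
def same_programme(crid_a, crid_b, fuzzy_match):
    """Compare by CRID when both sides have one, else fall back."""
    # CRID present on both sides: an exact comparison settles it ("bingo").
    if crid_a and crid_b:
        return crid_a.strip() == crid_b.strip()
    # Blank or missing CRID on either side: use the existing
    # title/description test, passed in here as a callable.
    return fuzzy_match()


# Hypothetical CRID values, purely for demonstration.
print(same_programme("crid://example.tv/abc1", "crid://example.tv/abc1", lambda: False))
print(same_programme("", "crid://example.tv/abc1", lambda: True))
```

Passing the existing test in as a callable means it only runs when the CRID is unavailable.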

**adm** · 10-09-20, 12:38

Originally Posted by fat-tony

I contributed to a few threads discussing CRIDs several years ago and actually raised a request here on Vix to see if CRID handling could be incorporated into the AutoTimer.py code, but that request thread went nowhere at the time. The beyonwiz team in Australia (as per @ccs's post) have had some success in using this CRID data, but I think the big issue is the lack of consistency in how the CRID information is structured or encoded. As @ccs points out, IanSav (and prl) have worked on this in the Australian environment. What might work for Freesat/Freeview might need workarounds or kludges for SKY or other broadcasters.

This is part of the problem, in that any changes have to work for all broadcasters irrespective of where in the world they may be. Even in the UK the main channels may have a good record with CRID data, but there have in the past also been many instances on the "lesser" channels where strict transmitting of the correct CRID has been a bit lax.

I originally thought the AutoTimer mechanism on enigma to be a little crude in its operation but, over the years I've adapted to its foibles and have found that it works for me 99% of the time. I rarely miss episodes I want to record and it generally works when a new series starts after being off the air for months (providing the broadcaster doesn't move it to a completely new channel or time slot).

I've also found Autotimers to be reliable 99+% of the time and if anything "goes wrong" it tends to record too much rather than missing recordings. I can live with the occasional 2 or 3 copies of the repeat. I tend not to set limited time slots, so when a program does move time it tends to be captured three months down the line.

Note: all my autotimers are set to check in the title and short description only.

**BrianTheTechieSnail** · 10-09-20, 13:18

Originally Posted by BrianTheTechieSnail

The comparisons of the titles and descriptions seem to be done in function checkSimilarity which starts on line 838 of the file AutoTimer.py.
It uses a function SequenceMatcher from difflib, which you can find descriptions of on the web such as https://towardsdatascience.com/sequencematcher-in-python-6b1e6f3915fc

I don't think it's really the right function in this application because, for instance, a single character difference in the middle of a description counts as a huge difference while a single character difference near the beginning or end counts only as a small difference. Thus the change from (S01:E04) to (S01:E05) right at the end of a description is seen as something to ignore.

Okay I've done some tests and THIS IS WRONG.
It does not seem to see differences in the middle as more important than differences near the beginning and end.
The descriptions of the SequenceMatcher function I found seem over simple and use over simple examples so I didn't understand exactly what it does (and I still don't).
SORRY.

SequenceMatcher probably is a good choice except that numbers need to be given more importance, and I have an idea for that which I will try soon.
Other people are, as always, free to ignore what I write.
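For anyone who wants to repeat that sort of test, here is a small sketch. The descriptions are invented; only the `SequenceMatcher` call shape is copied from `checkSimilarity`. `ratio()` is documented as 2*M/T, where M is the number of matching characters and T the combined length of both strings, so the position of a single changed character should make little or no difference:

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Same call shape as in checkSimilarity: spaces are treated as junk.
    return SequenceMatcher(lambda x: x == " ", a, b).ratio()


# Invented descriptions: one character changed at the end, one in the middle.
base = "Sitcom about two brothers who swap city life for the country. (S01:E04)"
end_changed = "Sitcom about two brothers who swap city life for the country. (S01:E05)"
mid_changed = "Sitcom about two brothers who swap cite life for the country. (S01:E04)"

r_end = similarity(base, end_changed)
r_mid = similarity(base, mid_changed)
print(round(r_end, 3), round(r_mid, 3))

# Both comparisons are checked against the 0.7 threshold used for descriptions.
print(r_end > 0.7, r_mid > 0.7)
```

With a one-character change in a description of this length, both ratios sit far above 0.7, which is why an episode number changing from E04 to E05 is treated as "the same description".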

**adm** · 10-09-20, 13:56

Originally Posted by BrianTheTechieSnail

SequenceMatcher probably is a good choice except that numbers need to be given more importance, and I have an idea for that which I will try soon.
Other people are, as always, free to ignore what I write.

But don't forget there may be many numbers in the description that don't relate to the series or episode. I saw one description the other day it said something like "....post war britain between 1943 and 1952........" Perhaps just making numbers more important in the current tests is not the way to go.

Also don't forget that any solution for one problem cannot create another and break something the autotimer does well. Fixing a problem in less than 1% of descriptions cannot cause problems with, say, 3% of other descriptions.
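One way to illustrate the point: rather than weighting every digit, only look for an explicit series/episode marker. This is just a sketch, and the pattern assumes markers shaped like the "(S01:E04)" example mentioned earlier in the thread; broadcasters may well use other formats. `episode_id` is a made-up helper name:

```python
import re

# Assumed marker shape, based on the "(S01:E04)" example in this thread.
EPISODE_MARKER = re.compile(r"\(S(\d+)\s*:\s*E(\d+)\)", re.IGNORECASE)


def episode_id(description):
    """Return (series, episode) from an explicit marker, or None."""
    match = EPISODE_MARKER.search(description or "")
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(episode_id("Sitcom about two brothers. (S01:E04)"))              # (1, 4)
print(episode_id("Life in post war Britain between 1943 and 1952."))   # None
```

Years such as 1943 and 1952 are ignored because they are not inside a marker, so they cannot be mistaken for episode numbers.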

**BrianTheTechieSnail** · 10-09-20, 14:21

Originally Posted by adm

But don't forget there may be many numbers in the description that don't relate to the series or episode. I saw one description the other day it said something like "....post war britain between 1943 and 1952........" Perhaps just making numbers more important in the current tests is not the way to go.

Also don't forget that any solution for one problem cannot create another and break something the autotimer does well. Fixing a problem in less than 1% of descriptions cannot cause problems with, say, 3% of other descriptions.

Okay, let's confine ourselves to fixing the big logical error you described:

Originally Posted by adm

There are 3 separate tests working on 3 discrete bits of EPG.
i) the title data
ii) the short description data
iii) the extended description data

Test 1: only the title data is compared

Test 2: only the short description data is compared, but only if:
i) test 1 produced a match
ii) the menu option “title and short description” has been selected

Test 3: only the extended description data is compared but only if:
i) test 2 produced a match
ii) the menu option “title and all descriptions” has been selected

Test 2 is falling over on these problem EPG entries because it treats all programs with a generic description, where only the last few characters change, as similar enough to be identical.

Test 3 is falling over because there is no extended description data to check, but when checking this non-existent data it reports a difference every time. Garbage in = garbage out - or more correctly garbage in = the same result out every time. This overrides the result of test 2. Note: there may be extended descriptions from broadcasters in other countries, and maybe when the EPG information is obtained over the net.

The code is clearly not supposed to do this, there is a test for it, but it's screwed up. Maybe the code I posted before and then lost confidence in is the fix:

Code:

	def checkSimilarity(self, timer, name1, name2, shortdesc1, shortdesc2, extdesc1, extdesc2, force=False):
		foundTitle = False
		foundShort = False
		retValue = False
		if name1 and name2:
			foundTitle = ( 0.8 < SequenceMatcher(lambda x: x == " ",name1, name2).ratio() )
		# NOTE: only check extended & short if title is a partial match
		if foundTitle:
			if timer.searchForDuplicateDescription > 0 or force:
				if shortdesc1 and shortdesc2:
					# If the similarity ratio is higher than 0.7 it is a very close match
					foundShort = ( 0.7 < SequenceMatcher(lambda x: x == " ",shortdesc1, shortdesc2).ratio() )
					if foundShort:
						if timer.searchForDuplicateDescription == 2:
							if extdesc1 and extdesc2:
								# Some channels indicate replays in the extended descriptions
								# If the similarity ratio is higher than 0.7 it is a very close match
								retValue = ( 0.7 < SequenceMatcher(lambda x: x == " ",extdesc1, extdesc2).ratio() )
							else:			# Brian was here
								retValue = True	# Brian was here
						else:
							retValue = True
			else:
				retValue = True
		return retValue
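To try the change outside the box, here is a standalone sketch of the same logic. It is not the real method: `mode` stands in for `timer.searchForDuplicateDescription`, the `force` argument is left out, and `patched=False` assumes the original code is simply the above without the two lines marked "Brian was here".

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Same call shape as in the code above: spaces are treated as junk.
    return SequenceMatcher(lambda x: x == " ", a, b).ratio()


def check_similarity(mode, name1, name2, short1, short2, ext1, ext2, patched=True):
    # Test 1: titles must be a close match.
    if not (name1 and name2 and ratio(name1, name2) > 0.8):
        return False
    if mode == 0:
        return True
    # Test 2: short descriptions must both exist and be a close match.
    if not (short1 and short2 and ratio(short1, short2) > 0.7):
        return False
    if mode == 2:
        # Test 3: only meaningful when both extended descriptions exist.
        if ext1 and ext2:
            return ratio(ext1, ext2) > 0.7
        # No extended text: the patched version keeps the result of test 2,
        # the assumed original reports "no match".
        return patched
    return True


args = ("Two in Clover", "Two in Clover", "Sitcom.", "Sitcom.", "", "")
print(check_similarity(2, *args, patched=False))  # False
print(check_similarity(2, *args, patched=True))   # True
```

With empty extended descriptions and "title and all descriptions" selected, the unpatched path never reports a duplicate, which matches the behaviour described for test 3.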

**birdman** · 10-09-20, 16:27

Originally Posted by ccs

... I'm sure there would be room in the *.ts.meta files to store an extra word or two of crid details

Not where it's needed. You really want to know you've already recorded something even after you've deleted the timer for and the recording of it.
A sqlite database of all recorded CRIDs would be the thing to use.
With a configurable "forget after" time, so that any record older than this would be pruned.
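A minimal sketch of what such a store could look like, using Python's built-in `sqlite3` module. The table name, column names and function names are all invented for illustration; nothing like this exists in AutoTimer.py as far as this thread shows:

```python
import sqlite3
import time

SECONDS_PER_DAY = 86400


def open_store(path=":memory:"):
    # One row per recorded CRID, with the time it was recorded.
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS recorded ("
        "crid TEXT PRIMARY KEY, recorded_at REAL NOT NULL)"
    )
    return db


def remember(db, crid, when=None):
    stamp = time.time() if when is None else when
    db.execute(
        "INSERT OR REPLACE INTO recorded (crid, recorded_at) VALUES (?, ?)",
        (crid, stamp),
    )
    db.commit()


def already_recorded(db, crid):
    row = db.execute("SELECT 1 FROM recorded WHERE crid = ?", (crid,)).fetchone()
    return row is not None


def prune(db, forget_after_days, now=None):
    # Drop every record older than the configurable "forget after" time.
    now = time.time() if now is None else now
    cutoff = now - forget_after_days * SECONDS_PER_DAY
    db.execute("DELETE FROM recorded WHERE recorded_at < ?", (cutoff,))
    db.commit()


db = open_store()
remember(db, "crid://example.tv/ep1")  # hypothetical CRID value
print(already_recorded(db, "crid://example.tv/ep1"))
```

Because the table is keyed on the CRID alone, the record survives deleting both the timer and the recording.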

**BrianTheTechieSnail** · 10-09-20, 16:30

Originally Posted by birdman

Not where it's needed. You really want to know you've already recorded something even after you've deleted the timer for and the recording of it.
A sqlite database of all recorded CRIDs would be the thing to use.
With a configurable "forget after" time, so that any record older than this would be pruned.

That would be hard to edit if you accidentally deleted one of a series you were trying to collect.

**birdman** · 10-09-20, 16:33

Originally Posted by BrianTheTechieSnail

The code is clearly not supposed to do this, there is a test for it, but it's screwed up. Maybe the code I posted before and then lost confidence in is the fix:

It's a fix for something - although the actual fix should be more like:

Code:

 retValue = extdesc1 == extdesc2

with relevant handling of any Null values.
However, that something is only a small subset of cases - it's not going to have any effect on most of the cases in this thread.
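One possible reading of "relevant handling of any Null values", as a sketch only. It assumes a missing description (`None`) and an empty string should count as the same thing, which is a judgement call rather than anything confirmed in the code; `ext_desc_equal` is a made-up helper name:

```python
def ext_desc_equal(extdesc1, extdesc2):
    # Treat None and "" alike, so two missing descriptions compare equal
    # and a missing one never equals a non-empty one.
    return (extdesc1 or "") == (extdesc2 or "")


print(ext_desc_equal(None, ""))              # True
print(ext_desc_equal("Repeat.", "Repeat."))  # True
print(ext_desc_equal(None, "Repeat."))       # False
```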

**ccs** · 10-09-20, 16:43

Originally Posted by birdman

Not where it's needed. You really want to know you've already recorded something even after you've deleted the timer for and the recording of it.
A sqlite database of all recorded CRIDs would be the thing to use.
With a configurable "forget after" time, so that any record older than this would be pruned.

OK, but I was rambling on earlier in this thread suggesting that remembering recordings/timers somewhere after they have been deleted wasn't such a bad idea.

**BrianTheTechieSnail** · 10-09-20, 16:53

Originally Posted by birdman

It's a fix for something - although the actual fix should be more like:

Code:

 retValue = extdesc1 == extdesc2

with relevant handling of any Null values.
However, that something is only a small subset of cases - it's not going to have any effect on most of the cases in this thread.

First you say perfection is impossible so give up.
Now you say it's not perfect - so it's not worth bothering with.

**Old Codger** · 10-09-20, 16:58

I thought the following might be of some interest.

Having installed OpenPLi a few weeks ago, I decided to have a look at its version of Autotimer.

The 'Edit Autotimer' screen has several extra options not present in OpenViX -
1. Description - short equal extended for match (default 'no')
2. Do not skip match when not description (default 'no')
3. Percentage ratio for duplicate matches (from '50%' to '100%' in 10 percent steps, default '80%')

The 3rd sounded like it might be of use, so I did a test on 'Two in Clover' with this set to '100%' (and, as normal, no timespan specified). The result was much as I hoped - all unique episodes found, no duplicates. Changing to '90%' naturally didn't work, returning just one hit.

The 100% setting obviously wouldn't be of any help where, for example, one episode of a set of duplicates was prefixed 'NEW: ', or one had the 'sign language' indicator [SL]. Incidentally, this sort of thing was discussed at length in the 4-year-old topic referred to in my first post.

Also, this version of Autotimer doesn't have the 'all descriptions' problem.

I wonder if the following suggestions for the OpenViX version of 'Autotimer' might be worthy of consideration by the various experts here?
Please feel free to ignore this if you think it's nonsense (which it most likely is).

1. Add an extra option to the 'Edit Autotimer' screen 'Expert match' or similar, default 'no'.
If set to 'yes', a set of extra options would appear, for example 'Ignore any leading "NEW: " ', 'Ignore any "[SL]"', 'Percentage match', and anything else deemed useful.

2. Alternatively, change the matching algorithm to ignore spaces, punctuation (comma, period, brackets, dashes, colons, etc) and superfluous information such as 'NEW: ', '[S]', '[AD]', '[SL]' etc (assuming of course that the matching algorithm can actually be altered). This would hopefully reduce the possibility of mismatches a bit.
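To make suggestion 2 a bit more concrete, here is a rough sketch of stripping the superfluous bits before comparing, combined with a configurable percentage as in the OpenPLi option. The function names, the exact tag list and the punctuation set are assumptions for illustration, not code from either AutoTimer version:

```python
import re
from difflib import SequenceMatcher

# Assumed "superfluous" items, taken from the examples listed above.
LEADING_NEW = re.compile(r"^\s*NEW:\s*", re.IGNORECASE)
TAGS = re.compile(r"\[(?:S|AD|SL)\]")
# Spaces plus comma, period, dashes, colons, semicolons and brackets.
PUNCTUATION = re.compile(r"[\s,.\-:;()\[\]]+")


def normalise(text):
    text = LEADING_NEW.sub("", text or "")
    text = TAGS.sub("", text)
    return PUNCTUATION.sub("", text)


def is_duplicate(desc1, desc2, threshold=1.0):
    # threshold=1.0 corresponds to the "100%" setting: exact match
    # after normalisation.
    matcher = SequenceMatcher(None, normalise(desc1), normalise(desc2))
    return matcher.ratio() >= threshold


print(is_duplicate("NEW: Two friends, one farm. [SL]", "Two friends - one farm [S]"))  # True
print(is_duplicate("Farm life. (S01:E04)", "Farm life. (S01:E05)"))                    # False
```

Digits are deliberately left alone, so at 100% two descriptions that differ only in the episode number still count as different episodes.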
