    Invelos Forums->DVD Profiler: Contribution Discussion Page: 1... 10 11 12 13  Previous   Next
Can we ever stop copying from IMDb?
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
We have a system that works for universal data entry of other aspects of people's names, namely parsing. Use your native background knowledge and document your corrections.

I'm sorry, but that made me laugh out loud just now - to the point that I actually spilled some coffee over my keyboard. We have "a system that works", for parsing?!     

Let me show you the latest parsing thread: here. That's the kind of "system that works" we have: the latest dozen or so (if you want, I can provide links to all of them) have pretty much all resulted in more or less of a tie; there's no consensus at all, and what one person finds acceptable "documentation" for one particular way of parsing, is readily dismissed by the other half. If that's the kind of "system that works" you'd like to see for diacriticals, then thanks, but no thanks.

Still, it works for the vast majority of names (including double barrelled last names). Those forum threads are simply the exceptional ambiguous cases which prove the rule. And some of them are solved during those threads which often is not represented in the poll results because people tend to never correct their earlier vote.

The part that does not work that well is using the combination of 4 different fields as the key for linking.
DVD Profiler Unlimited RegistrantStar ContributorT!M
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting RHo:
Quote:
Still, it works for the vast majority of names

It does not. Not at all. Virtually every name that consists of more than two parts has separate, non-linking entries for the same person in the database. That's how well parsing works. It doesn't work at all. And the reason it goes wrong is, as with grammar, the different cultural backgrounds of the users. For example, German users have generally learned that a "middle name" is a second given name, and that maiden names are always part of the last name: once a last name, always a last name. And they will often enter names as such automatically, without even giving it a moment's thought. Many American users, though, are quite comfortable with moving a maiden name to the "middle name" field. And then there are many cultures in which the entire concept of a "middle name" doesn't exist. In short: as long as the users are allowed to let their cultural background decide how to enter data, we're never going to get on the same page.
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
Quoting CharlieM:
Quote:
And exactly how do you propose to issue a rule that can be consistently applied by all people across  "50 different languages"?

The same way we do it for title capitalisation or name parsing? Use your knowledge and document changes.

For title capitalisation we have a clear set of rules, so that's totally different, (...)

My point is that we have a rule that depends on individual rules of those "50 different languages", namely we handle every language correctly. It could be done for names as well.

BTW French title capitalisation rules are strangely complex.  A lot of people can't handle them correctly in their contribution of French titles on the first try. Nevertheless they can be easily convinced with a no vote and an explanation about the French rules.
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
Still, it works for the vast majority of names

It does not. Not at all. Virtually every name that consists of more than two parts has separate, non-linking entries for the same person in the database. That's how well parsing works. It doesn't work at all. And the reason it goes wrong is, as with grammar, the different cultural backgrounds of the users. For example, German users have generally learned that a "middle name" is a second given name, and that maiden names are always part of the last name: once a last name, always a last name. And they will often enter names as such automatically, without even giving it a moment's thought. Many American users, though, are quite comfortable with moving a maiden name to the "middle name" field. And then there are many cultures in which the entire concept of a "middle name" doesn't exist. In short: as long as the users are allowed to let their cultural background decide how to enter data, we're never going to get on the same page.

The maiden name argument would not explain your "virtually every" 3 part name assertion, because it is only valid for at most half of the names. You are massively overstating.
DVD Profiler Unlimited RegistrantStar ContributorT!M
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting RHo:
Quote:
It could be done for names as well.

It could not, no. The ongoing debate on Asian names, for instance, should give you an idea of the hurdles that would need to be taken. And over the last, say, five years, I've seen frighteningly little movement on that.
DVD Profiler Unlimited RegistrantStar ContributorT!M
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting RHo:
Quote:
You are massively overstating.

I'm sorry, but I am most certainly not. Not at all. You may well choose to ignore the problem, but that doesn't mean it doesn't exist. It exists, and it exists on a massive scale. If you have it, I suggest that you play with the CLTPlus-tool: I have, extensively, and that'll give you a horrifying view of what's actually going on in the database with this.
 Last edited: by T!M
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
You are massively overstating.

Not at all.

virtually 100% is not overstated versus at most 50%?
DVD Profiler Unlimited RegistrantStar ContributorT!M
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting RHo:
Quote:
Quoting T!M:
Quote:
Quoting RHo:
Quote:
You are massively overstating.

Not at all.

virtually 100% is not overstated versus at most 50%?

No, it's really not - honestly. Again, just have a look in the database - as I said, the CLTPlus-tool is extremely useful in showing you these differences - and see for yourself. I'm not just shouting something here - it comes from what I'm actually finding in the database on a daily basis.
 Last edited: by T!M
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
It could be done for names as well.

It could not, no. The ongoing debate on Asian names, for instance, should give you an idea of the hurdles that would need to be taken. And over the last, say, five years, I've seen frighteningly little movement on that.

Asian names have nothing at all to do with diacritics. There are two problems involved in Asian names. The first is romanisation, because there is more than one romanisation system. This could be solved by the rules. But I'm not sure that many users really do their own romanisation. For Asian credits which are already romanised there is the problem of parsing and field association. This could just as easily be solved in a way that would be acceptable to a majority, if the rules allowed the credited as field to be used for this purpose. Unfortunately the credited as field is only allowed for linking.
DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quoting RHo:
Quote:
Quoting T!M:
Quote:
Quoting RHo:
Quote:
You are massively overstating.

Not at all.

virtually 100% is not overstated versus at most 50%?

No, it's really not - honestly. Again, just have a look in the database - as I said, the CLTPlus-tool is extremely useful in showing you these differences - and see for yourself. I'm not just shouting something here - it comes from what I'm actually finding in the database on a daily basis.

Sorry, your maiden name argument is only valid for women's names. And then only if there is some doubt.

All men's names do not suffer from the maiden name problem. That's 50%.
And women's names consisting of 2 clear given names and a family name (e.g. Marie Jane Smith) have no problem as well. All last names with a hyphen do not have a problem.

How can you assert that virtually every 3 part name has a problem?

BTW Yes, nevertheless, I would prefer a unified single name field for those problems and other problems as well (like the Asian name problem).
DVD Profiler Unlimited RegistrantStar ContributorT!M
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,736
Quoting RHo:
Quote:
How can you assert that virtually every 3 part name has a problem?

Simple: by looking up the existing parsing variants for a hundred 3 part names, and seeing that literally each of them has differently parsed variants in the database. Again, it's not just a random assumption: it's what I actually found in the database.

Quote:
BTW Yes, nevertheless, I would prefer a unified single name field for those problems and other problems as well (like the Asian name problem).

DVD Profiler Unlimited RegistrantStar ContributorRHo
Registered: March 13, 2007
Posts: 2,759
Quoting T!M:
Quote:
Quote:
BTW Yes, nevertheless, I would prefer a unified single name field for those problems and other problems as well (like the Asian name problem).


But this still would not fix linking. We would still have the accents, capitalisation, spacing, and other data entry problems which result in name variants for the same credit.
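
To make the linking point concrete, here is a minimal Python sketch. It is not how DVD Profiler actually links credits: the four key fields are assumed to be first name, middle name, last name and birth year, and the names `link_key` and `normalised_key` are hypothetical. It only illustrates why an exact-match key splits one person into several entries, and what a key that ignores accents, capitalisation, spacing and parsing would collapse.

```python
import unicodedata


def link_key(first, middle, last, birth_year=None):
    """Exact-match key over four fields (assumed layout, for illustration only)."""
    return (first, middle, last, birth_year)


def normalised_key(first, middle, last, birth_year=None):
    """Hypothetical key that ignores parsing, accents, case and extra spaces."""
    # Join the non-empty parts so that field boundaries no longer matter.
    full = " ".join(part for part in (first, middle, last) if part)
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", full)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and collapse runs of whitespace.
    return (" ".join(no_accents.casefold().split()), birth_year)


# Data entry variants of one credit (accent, capitalisation, trailing space).
entry_variants = [
    ("André", "", "Maranne"),
    ("Andre", "", "Maranne"),
    ("andré", "", "Maranne "),
]

# Parsing variants of one credit (middle name versus double last name).
parsing_variants = [
    ("Marie", "Jane", "Smith"),
    ("Marie", "", "Jane Smith"),
]

for label, group in (("entry", entry_variants), ("parsing", parsing_variants)):
    exact = {link_key(*name) for name in group}
    loose = {normalised_key(*name) for name in group}
    print(label, "exact keys:", len(exact), "normalised keys:", len(loose))
```

The trade-off is that a key this loose would also merge two different people whose names differ only by an accent or by parsing, so it could at best suggest candidate links for a human to confirm.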
DVD Profiler Unlimited RegistrantStar ContributorAddicted2DVD
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 17,334
Quoting RHo:
Quote:

And women's names consisting of 2 clear given names and a family name (e.g. Marie Jane Smith) have no problem as well.


I just wanted to point out... don't be so sure about this... (at least here in the US) there is no rule as to what is a given name or a last name. Just looking at that name (Marie Jane Smith).... how do you know Jane is a given name?

White Pages showing several pages of people with the last name Jane: LINK

Just the fact that Jane is a proven last name as well shows it is just as possible for someone to have a double barreled last name... and it could very well be Marie / / Jane Smith.

It is just impossible to know for sure what the parsing is... which is one of the main reasons I am sorry that Ken doesn't want to go back to 1 field for names.
Pete
DVD Profiler Unlimited RegistrantStar ContributorCharlieM
Registered Sept 5 2005
Registered: May 20, 2007
Reputation: High Rating
United States Posts: 2,934
I think people look at this wrong. 

We are not talking about a Frenchman entering credits for a French film. Obviously that is easy. What I worry about is the differences when an American/Australian/Brazilian (or you take your pick) has a French name in a film for their local release, or when the French film is released outside of France (or any other combination of nations, I'm not picking on the French). It is these differences in entering the same name that are the problem.

If the system was different, and it was movie-centric instead of DVD-centric, the problem would not be as great. That way the name would be entered only once for a given movie instead of 100 times, once for each release.

Charlie
DVD Profiler Unlimited RegistrantStar Contributorsurfeur51
Since July 3, 2003
Registered: March 29, 2007
Reputation: Great Rating
France Posts: 4,479
Quoting T!M:
Quote:

It does not. Not at all. Virtually every name that consists of more than two parts has separate, non-linking entries for the same person in the database.


And virtually every accented name has separate, non-linking entries for the same person in the database. For you, one (parsing) is a big problem, and the second (accents) is not, as you state in the Contribution Rule Committee forum. I hardly see the difference. The André Maranne example clearly shows that even when the common name is proved per Invelos rules, the database keeps all the different non-linking entries.
Images from movies
DVD Profiler Unlimited RegistrantStar ContributorKathy
Registered: May 29, 2007
Reputation: Highest Rating
United States Posts: 3,475
I've read these discussions many times over the years. I understand and can appreciate both sides of the issue.

I clearly see what each side has to say - I don't think those points need to be brought to the table any more.

But, and please correct me if I'm wrong, it seems that the solution is not in our hands.

Why do I say this? Because the only solutions presented here that have the ability to satisfy both sides of this issue seem to require an overhaul of Ken's database.

Until that happens, the continual rehashing of the same arguments seems to me to be a waste of time and energy.

Is there a solution that satisfies both sides? Try to come up with one, looking at it from the others' point of view.

I know what everyone wants - please work on the solution.