Invelos Forums->General: Website Discussion |
goodguy's Credit Lookup Plus |
Registered: March 14, 2007 | Reputation: | Posts: 4,693 |
| Posted: | | | | Just a hunch - could it be different name parsing? There are three possible ways to add "Queen Elizabeth": Queen Elizabeth// Queen/Elizabeth/ Queen//Elizabeth | | | My freeware tools for DVD Profiler users. Gunnar |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting GSyren: Quote: Just a hunch - could it be different name parsing? There are three possible ways to add "Queen Elizabeth": Queen Elizabeth// Queen/Elizabeth/ Queen//Elizabeth Yeah, AiAustria told me before and I didn't get it. Now I do. I am matching CLT results now. Now to test and see if that fixes some of the other differences I found. Problem was that even though I was using the search field firstname to match on creditedAs, the CLT will also match to the first name in the database, if the middle and last names are null. So, I was missing "Queen Elizabeth" as a first name. So far, my code does not handle the parsing Queen/Elizabeth/. Should it? If so, it means the database is even more messed up than I thought. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | No sooner spoken, than smacked on the head. Just resolved another missing profile, due to the database having a blank firstname and using the middle and lastname for match to "zhang ziyi".
Hopefully now, that gets me there.
At least for the simple stuff. But I have found examples of things like two- and three-word middle names.
So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs.
It might be too late in the game for me to try and provide that parsing, without a rewrite, which I am not motivated to do until / unless I get the required results accuracy, now that I understand how the search needs to work. But, the variants can be loaded from a CSV file, so the combinations can be generated either by hand, or some other tool. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
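The "generate all possible parsings into three fields" idea from the post above could be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: the function name `name_variants` is hypothetical, and it assumes tokens keep their original order and that any of the three fields may be empty.

```python
from typing import List, Tuple


def name_variants(full_name: str) -> List[Tuple[str, str, str]]:
    """Return every (first, middle, last) split of a name, keeping token order.

    Hypothetical helper: it picks two cut points i <= j in the token list,
    so any of the three fields may end up empty.
    """
    tokens = full_name.split()
    n = len(tokens)
    variants = []
    for i in range(n + 1):          # tokens[:i] become the first name
        for j in range(i, n + 1):   # tokens[i:j] become the middle name
            first = " ".join(tokens[:i])
            middle = " ".join(tokens[i:j])
            last = " ".join(tokens[j:])
            variants.append((first, middle, last))
    return variants


if __name__ == "__main__":
    # Print each variant in the First/Middle/Last slash notation used earlier
    # in the thread, e.g. "Queen//Elizabeth".
    for first, middle, last in name_variants("Queen Elizabeth"):
        print(f"{first}/{middle}/{last}")
```

A name with n tokens yields (n+1)(n+2)/2 rows, so even a 6-token name produces only 28 variants, small enough to dump into a file for bulk loading.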
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Quoting mediadogg: Quote: ... So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs.... At least for the comparison it is not necessary to parse. It is easier to concatenate F/M/L to "F M L" and compare this string. That is what the CLT does. Concerning the Queen Elizabeth Problem: A little bit weird, the following titles can't be downloaded by Add-Multiple-UPC: 8-273770-005144 026359-924620 Although they can be found and selected for download manually... | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
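The concatenation approach described in the post above (join F/M/L into "F M L" and compare that string) could look something like this minimal Python sketch. The function names are hypothetical, and the extra check against creditedAs is taken from the earlier posts rather than from any documented CLT behaviour.

```python
def comparable(text: str) -> str:
    """Lower-case the text and collapse all whitespace runs to single blanks."""
    return " ".join(text.lower().split())


def credit_matches(query: str, first: str, middle: str, last: str,
                   credited_as: str = "") -> bool:
    """Return True if the query equals the joined F/M/L name or creditedAs.

    Empty fields vanish when the joined string is normalized, so it does not
    matter which of the three fields the name parts were typed into.
    """
    wanted = comparable(query)
    if not wanted:
        return False  # an empty query should never match an empty record
    joined = comparable(" ".join((first, middle, last)))
    return wanted in (joined, comparable(credited_as))


if __name__ == "__main__":
    print(credit_matches("queen elizabeth", "Queen", "", "Elizabeth"))
    print(credit_matches("queen elizabeth", "", "Queen", "Elizabeth"))
```

Because the comparison happens on the joined string, no parsing of the search text into separate fields is needed.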
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting AiAustria: Quote: Quoting mediadogg:
Quote: ... So, if you start with a single field, with say 5 or 6 tokens, I guess the trick would be to generate all possible parsings into three fields and check for those (case insensitive) matches, along with a match on creditedAs.... At least for the comparison it is not necessary to parse. It is easier to concatenate F/M/L to "F M L" and compare this string. That is what the CLT does.
Concerning the Queen Elizabeth Problem: A little bit weird, the following titles can't be downloaded by Add-Multiple-UPC: 8-273770-005144 026359-924620 Although they can be found and selected for download manually...
Well, of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I then have to generate profile IDs from the UPCs and search the database for credits. There is not much point in further scraping the CLT web pages; we already have CLTPlus.
Due to the inconsistent way people encode the data, you can find the whole name (as "queen elizabeth") in any one of those fields or split across them, totally dependent on how the user built the profile. The eleven credited profiles for queen elizabeth include:
- middle empty with "Queen Elizabeth" in credited as
- first empty with queen and elizabeth in the middle and last fields
- middle empty with first as queen and elizabeth as last
- middle and last empty with first as "queen elizabeth"
And I wouldn't be surprised to find other variations. And then there are cases of people putting in honorifics. I have seen a 3-word middle name. The database is a mess.
I wish I had implemented your idea of accepting a string and then constructing a table of all the possible variants for the user to pick from. Didn't get it, sorry. But I do have the table and the ability to bulk process, and a way to fill the table from a file. So, if you want to make a little script to go from a string to all the variants, that would work. For now, I am trying to get the thing accurate and stable, and then maybe faster. I wish I had the energy to start over, knowing what I know now ... | | | Thanks for your support. Free Plugins available here. 
Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | If there were an alternative web browser object, it would be so much easier and faster. Anybody find one, please let me know. Maybe we even could chip in together and buy it? Hey now! Free and Open Source, WinForms and WPF. Hmm .... look here. Anybody interested? I'll share all I know to get you a head start on the scraping. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Quoting mediadogg: Quote: Well of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I have to then generate profile IDs from the UPCs and then search the database for credits. How do you search the data base? I thought you have to fetch each profile with its UPC/locality before you can do anything further. If yes, what prevents you from concatenating F/M/L after fetching the data base record to get a comparable string? | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting AiAustria: Quote: Quoting mediadogg:
Quote: Well of course that's how I started, but it isn't that simple. You don't understand the database structure. As I have explained before, the database does not have a single string to compare to. It has separate fields for first, middle, last and creditedAs. I can give the concatenated string to the CLT, but after scraping the UPCs, I have to then generate profile IDs from the UPCs and then search the database for credits. How do you search the data base?
I thought you have to fetch each profile with its UPC/locality before you can do anything further.
If yes, what prevents you from concatenating F/M/L after fetching the data base record to get a comparable string?
One more time ... because there is no string in the database to compare with. Simple as that. To understand better what I am saying, look at the XML for any profile. How would you use a concatenated string to pull credits from the XML? (If you show me something I don't know, I will quickly and happily use it. Old dogg can learn new tricks.) The XML somewhat reflects the database structure. A plugin makes calls to the database and gets back either the XML or program objects that have pretty much the same structure:
<Credit FirstName="Alexander" MiddleName="" LastName="Payne" BirthYear="0" CreditType="Direction" CreditSubtype="Director" CreditedAs=""/>
| | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
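For readers following along without plugin access, here is a rough Python sketch of pulling credits out of XML shaped like the element quoted above and matching them against a single search string. It uses only the standard-library `xml.etree.ElementTree` module and does not touch the plugin API. The `<Profile>` wrapper and the `find_credits` name are assumptions made for the example, and the two sample elements are the ones quoted in this thread, placed in one document purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; the real profile XML has more structure.
SAMPLE = """<Profile>
  <Actor FirstName="" MiddleName="Zhang" LastName="Ziyi" BirthYear="0"
         Role="Hu Li" CreditedAs="" Voice="false" Uncredited="false"
         Puppeteer="false"/>
  <Credit FirstName="Alexander" MiddleName="" LastName="Payne" BirthYear="0"
          CreditType="Direction" CreditSubtype="Director" CreditedAs=""/>
</Profile>"""


def comparable(text: str) -> str:
    """Lower-case the text and collapse all whitespace runs to single blanks."""
    return " ".join(text.lower().split())


def find_credits(xml_text: str, query: str) -> list:
    """Return Actor/Credit elements whose joined name or CreditedAs equals query."""
    wanted = comparable(query)
    hits = []
    if not wanted:
        return hits
    for element in ET.fromstring(xml_text).iter():
        if element.tag not in ("Actor", "Credit"):
            continue
        parts = (element.get(key, "") for key in
                 ("FirstName", "MiddleName", "LastName"))
        joined = comparable(" ".join(parts))
        credited = comparable(element.get("CreditedAs", ""))
        if wanted in (joined, credited):
            hits.append(element)
    return hits


if __name__ == "__main__":
    for hit in find_credits(SAMPLE, "zhang ziyi"):
        print(hit.tag, hit.attrib)
```

The entry with the blank first name still matches, because the three attributes are joined before comparing.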
| Registered: May 19, 2007 | Reputation: | Posts: 5,715 |
| Posted: | | | | Quoting AiAustria: Quote: How do you search the data base? Maybe I am able to understand, if you can share the relevant part of the code... - ? | | | Complete list of Common Names • A good point for starting with Headshots (and v11.1) |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting AiAustria: Quote: Quoting AiAustria:
Quote: How do you search the data base? Maybe I am able to understand, if you can share the relevant part of the code... - ? I'll send you a PM. I am still honoring my contract with Invelos to not reveal the plugin API in public, although I'm not sure it matters these days. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Oh wait, I think I read your post backwards. What you are suggesting is in fact what I am doing. Maybe what I should also do is get rid of the variants table and just give a single line text field and leave it at that? (of course I would also compare with creditedAs, along with concatenated F/M/L). I'll try that. No more variants table? | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Ok, that did help by motivating me to simplify my code mish-mash. Now all the cases are covered, and it might be 1 nanosecond faster. Three brains are better than one. And no need to throw away the variant table, since each row gets concatenated, you can still put in variants any way you want. | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting mediadogg: Quote: If there were an alternative web browser object, it would be so much easier and faster. Anybody find one, please let me know. Maybe we even could chip in together and buy it?
Hey now! Free and Open Source, WinForms and WPF. Hmm .... look here.
Anybody interested? I'll share all I know to get you a head start on the scraping. If that thing is as advertised, one could replace CLTPlus without a plugin, using WinForms or WPF, and it would be fast and avoid IE. And if it produced XML that included the profile IDs, you could even run it through a small plugin that would give you the complete online XML to play with, also fast. I'm just sayin ... | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. | | | Last edited: by mediadogg |
| Registered: March 14, 2007 | Reputation: | Posts: 4,693 |
| Posted: | | | | Quoting mediadogg: Quote: - first empty with queen and elizabeth in the middle and last fields Hm, you're not supposed to be able to add cast/crew with an empty first name field, so I wonder how that got there? | | | My freeware tools for DVD Profiler users. Gunnar |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Quoting GSyren: Quote: Quoting mediadogg:
Quote: - first empty with queen and elizabeth in the middle and last fields Hm, you're not supposed to be able to add cast/crew with an empty first name field, so I wonder how that got there? Dunno, but as we have together found things like the uncredited entry in a different locality, and printer control characters in the Overview, people don't always follow the rules! Here is the profile: 4895024927206.21 <Actor FirstName="" MiddleName="Zhang" LastName="Ziyi" BirthYear="0" Role="Hu Li" CreditedAs="" Voice="false" Uncredited="false" Puppeteer="false"/> | | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. |
| Registered: March 18, 2007 | Reputation: | Posts: 6,463 |
| Posted: | | | | Couple of other matching tips that most programmers have run across, but a reminder:
- use .Trim() to remove leading and trailing blanks
- use .ToLower() (or ToUpper()) to remove case sensitivity
- convert double blanks ("  ") to single blanks (" ") to account for sloppy typing inside a field
| | | Thanks for your support. Free Plugins available here. Advanced plugins available here. Hey, new product!!! BDPFrog. |
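The three tips above translate to most languages. As a hedged illustration, here is a Python equivalent, with `strip()` and `lower()` standing in for `.Trim()` and `.ToLower()`; the name `normalize_field` is made up for the example. One detail worth noting: a single "replace double blank with single blank" pass leaves a blank too many when someone typed three in a row, so the sketch collapses whole runs with a regular expression instead.

```python
import re


def normalize_field(value: str) -> str:
    """Trim, collapse runs of blanks, and lower-case a name field."""
    trimmed = value.strip()                     # like .Trim()
    collapsed = re.sub(r" {2,}", " ", trimmed)  # any run of blanks -> one blank
    return collapsed.lower()                    # like .ToLower()


if __name__ == "__main__":
    messy = "  Queen   Elizabeth "
    print(repr(normalize_field(messy)))
    # A single replace pass is not enough for three blanks in a row:
    print(repr(messy.strip().replace("  ", " ")))
```

Applying the same function to both the search text and each database field keeps the comparison symmetric.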