Here's a puzzle. I have two databases of the same 50,000+ electronic products and I want to match products in one database to those in the other. However, the product names are not always identical. I've tried using the Levenshtein distance to measure string similarity, but this hasn't worked. For example:

- LG 42CS560 42-Inch 1080p 60Hz LCD HDTV
- LG 42 Inch 1080p LCD HDTV

These items are the same, yet their product names vary quite a lot. On the other hand:

- LG 42 Inch 1080p LCD HDTV
- LG 50 Inch 1080p LCD HDTV

These are different products with very similar product names. How should I tackle this problem?

1 Answer

The first thing you should do is parse the names into a description of features (company: LG, size: 42 inch, resolution: 1080p, type: LCD HDTV). Then you can match these descriptions against each other for compatibility: it's fine for one name to omit a product number, but not for the two names to list different sizes. A simple check that the shared attributes are compatible might be enough, or you might have to write or learn rules about how much each attribute is allowed to differ.

Depending on how many kinds of products you have and how much the listed names vary, I would start by manually defining a set of attributes, possibly just adding specific words or regexes to match them, then iteratively checking what hasn't been parsed so far and adding rules for that. I'd imagine there's little ambiguity in the sense of one vocabulary item belonging to multiple attributes, though without seeing your database I can't be sure.

If that isn't feasible, this extraction is analogous to semi-supervised part-of-speech tagging. It differs in that the vocabulary is probably much more limited than in typical parsing, and in that the space of product names is more hierarchical: a resolution tag only applies to certain kinds of products. I'm not very familiar with that literature, but there might be some ideas you could use.
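As a rough illustration of the manual-rules approach, here is a minimal Python sketch. The attribute names, the regexes in `PATTERNS`, the `BRANDS` set, and the helper functions `parse` and `compatible` are all hypothetical and cover only the TV examples from the question; a real catalogue would need many more rules, built up iteratively as described above.

```python
import re

# Hypothetical attribute patterns, written only for the TV examples in the question.
PATTERNS = {
    "size_inch": re.compile(r"\b(\d{2,3})[\s-]?inch\b", re.I),
    "resolution": re.compile(r"\b(\d{3,4}p)\b", re.I),
    "refresh_hz": re.compile(r"\b(\d{2,3})\s?hz\b", re.I),
    "type": re.compile(r"\b((?:lcd|led|plasma)\s+hdtv)\b", re.I),
}

# Hypothetical brand list; assumes the brand is the first token of the name.
BRANDS = {"lg", "samsung", "sony"}


def parse(name):
    """Turn a product name into a dict of the attributes we can recognise."""
    attrs = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(name)
        if match:
            attrs[key] = match.group(1).lower()
    tokens = name.split()
    if tokens and tokens[0].lower() in BRANDS:
        attrs["brand"] = tokens[0].lower()
    return attrs


def compatible(name_a, name_b):
    """True if every attribute found in BOTH names has the same value.

    An attribute missing from one name (e.g. refresh rate) is not a conflict.
    Names with no attribute in common are treated as not matching.
    """
    a, b = parse(name_a), parse(name_b)
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[k] == b[k] for k in shared)


full = "LG 42CS560 42-Inch 1080p 60Hz LCD HDTV"
short = "LG 42 Inch 1080p LCD HDTV"
other = "LG 50 Inch 1080p LCD HDTV"

print(parse(full))
print(compatible(full, short))   # same size, resolution, type, brand
print(compatible(short, other))  # sizes differ: 42 vs 50
```

With these rules the first pair is judged compatible, because the extra model number and refresh rate are simply ignored, while the second pair is rejected on size. Requiring exact equality on every shared attribute is the simplest possible rule; looser per-attribute tolerances would replace the equality test in `compatible`.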
