Benchmarks
Benchmarks
Dataset specifying the best configuration
D1 | D2 | D3 | D5 | D8 | D10 | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Recall | Precision | F1 | Recall | Precision | F1 | Recall | Precision | F1 | Recall | Precision | F1 | Recall | Precision | F1 | Recall | Precision | F1 | |
D1 | 1.000 | 0.788 | 0.881 | 1.000 | 0.263 | 0.416 | 0.989 | 0.281 | 0.438 | 1.000 | 0.322 | 0.488 | 0.933 | 0.417 | 0.576 | 1.000 | 0.263 | 0.416 |
D2 | 0.000 | 0.000 | 0.000 | 0.942 | 0.952 | 0.947 | 0.695 | 0.798 | 0.743 | 0.851 | 0.922 | 0.885 | 0.270 | 0.997 | 0.424 | 0.746 | 0.853 | 0.796 |
D3 | 0.037 | 0.539 | 0.069 | 0.431 | 0.354 | 0.389 | 0.674 | 0.584 | 0.625 | 0.486 | 0.425 | 0.454 | 0.092 | 0.545 | 0.158 | 0.523 | 0.472 | 0.496 |
D4 | 0.000 | 0.000 | 0.000 | 0.808 | 0.794 | 0.801 | 0.920 | 0.702 | 0.796 | 0.931 | 0.844 | 0.886 | 0.038 | 0.961 | 0.072 | 0.845 | 0.659 | 0.741 |
D5 | 0.096 | 0.812 | 0.172 | 0.856 | 0.288 | 0.431 | 0.673 | 0.235 | 0.348 | 0.792 | 0.330 | 0.466 | 0.753 | 0.671 | 0.709 | 0.859 | 0.302 | 0.446 |
D6 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.474 | 0.748 | 0.580 | 0.805 | 0.924 | 0.860 | 0.014 | 0.970 | 0.028 | 0.858 | 0.940 | 0.897 |
Datasets specs
Dataset | E1 | E2 | Entities E1 | Entities E2 | Duplicates |
---|---|---|---|---|---|
D1 | Restaurants 1 | Restaurants 2 | 339 | 2,256 | 89 |
D2 | Abt | Buy | 1,076 | 1,076 | 1,076 |
D3 | Amazon | Google Pr. | 1,354 | 3,039 | 1,104 |
D4 | IMDb | TMDb | 5,118 | 6,056 | 1,968 |
D5 | Walmart | Amazon | 2,554 | 22,074 | 853 |
D6 | IMDb | DBpedia | 27,615 | 23,182 | 22,863 |
Configurations specifics
Block Building | Blocking Cleaning | Comprison Cleaning | Entity Matching | Entity Clustering | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Method | Ratio | Pruning algorithm | Weighting Scheme | Algorithm | Representation Model | Similarity Function | Algorithm | Similarity Threshold | ||
D1 | Standard Blocking | Block Filtering | 0.050 | BLAST | ARCS | Profile Matcher | CHARACTER_BIGRAMS | COSINE_SIMILARITY | Unique Mapping Clustering | 0.90 |
D2 | Standard Blocking | Block Filtering | 0.900 | WEP | EJS | Profile Matcher | CHARACTER_TRIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.90 |
D3 | Standard Blocking | Block Filtering | 0.600 | WNP | ARCS | Profile Matcher | TOKEN_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.05 |
D4 | Standard Blocking | Block Filtering | 0.925 | CEP | ECBS | Profile Matcher | CHARACTER_FOURGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.85 |
D5 | Standard Blocking | Block Filtering | 0.075 | WEP | ARCS | Profile Matcher | CHARACTER_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.65 |
D6 | Standard Blocking | Block Filtering | 0.575 | BLAST | X2 | Profile Matcher | TOKEN_UNIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.25 |