Introduction – Big Data & Entity Resolution [Lecture 1]
Introduction to the techniques that will be presented in the lectures, and to the structure and organization of the course.
Web Data [Lecture 2]
Similarity Measures [Lecture 3]
Blocking for Entity Resolution [Lecture 4]
Instructions for the first assignment. Student presentations on Friday, January 25, Pinni B0016, 14.00-16.00.
Objectives of Entity Resolution [Lecture 5]
Assignment 1 – Student presentations – Friday, January 25, Pinni B0016, 14.00-16.00.
Assignment 1, student presentations, second part – Thursday, January 31, Pinni B0016, 14.00-16.00.
Blocking for Entity Resolution II [Lecture 6]
Meta-blocking for Entity Resolution [Lecture 7]
Please send your project teams (team members' names and student IDs) to firstname.lastname@example.org before Thursday, February 7.
Noisy-aware Entity Resolution [Lecture 8]
Student presentations – Friday, February 8, Pinni B0016, 14.00-16.00 – Assignment 2
Assignment 4 [Noisy-aware Entity Resolution]
This assignment refers to today's lecture. Please read the related article here carefully. Then update the workflow of the example in Figure 1 of the article, using the token blocking method instead of attribute clustering blocking. Please send the updated figure/workflow to email@example.com before February 25, 2019.
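To recall what changes in the workflow, here is a minimal sketch of token blocking for two clean (duplicate-free) collections, written under our own assumptions: values are lowercased and split on whitespace, and each collection is a Python dict mapping an entity ID to its attribute-value pairs. The function names and data layout are hypothetical and are not taken from the article. The key difference from attribute clustering blocking is that attribute names are ignored completely: every token defines one block, regardless of which attribute it came from.

```python
from collections import defaultdict


def tokenize(value):
    # Assumption: lowercase the value and split it on whitespace.
    return set(value.lower().split())


def token_blocking(source1, source2):
    """Token blocking for two clean (duplicate-free) entity collections.

    Every distinct token found in any attribute value defines one block;
    attribute names are ignored (schema-agnostic).
    Returns {token: (ids_from_source1, ids_from_source2)}.
    """
    blocks = defaultdict(lambda: ([], []))
    for side, source in enumerate((source1, source2)):
        for entity_id, description in source.items():
            # Collect each entity's tokens once, across all of its values.
            tokens = set()
            for value in description.values():
                tokens |= tokenize(value)
            for token in tokens:
                blocks[token][side].append(entity_id)
    # Keep only blocks containing entities from both sources; the others
    # yield no cross-source comparisons.
    return {
        token: (sorted(left), sorted(right))
        for token, (left, right) in blocks.items()
        if left and right
    }


def candidate_pairs(blocks):
    """Distinct cross-source comparisons implied by the blocks."""
    return {(a, b) for left, right in blocks.values() for a in left for b in right}
```

For example, with `a1 = {"name": "John Smith", "city": "Tampere"}` and `a2 = {"name": "Mary Jones", "city": "Helsinki"}` in the first source, and `b1 = {"fullname": "J. Smith", "location": "Tampere Finland"}` and `b2 = {"fullname": "Mary Jones", "location": "Espoo"}` in the second, the surviving blocks are `smith`, `tampere`, `mary` and `jones`, and `candidate_pairs` returns `{("a1", "b1"), ("a2", "b2")}`. The pair `("a1", "b1")` appears in two blocks; the set keeps it only once.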
Assignment 2, student presentations, second part – Thursday, February 14, Pinni B0016, 14.00-16.00.
Iterative Entity Resolution [Lecture 9]
Iterative Blocking [Lecture 10]
Student presentations – Thursday & Friday, February 21 & 22, Pinni B0016, 14.00-16.00 – Assignment 3
No lectures – free time to work on your project and project report. More details here.
The project work will include the implementation of algorithms that realize steps of the entity resolution process. Please work in groups of two. All groups will be examined on their project code on March 7 and March 8, 14.00-16.00 – Room: PINNI B2052 (more details will be announced later).
For testing, please bring either your laptop or your source code.
To test your code, please create two small, clean datasets, each containing around 100 to 150 entity descriptions with 2 to 4 attribute-value pairs per description. Make sure that some tokens are shared among the values of the entity descriptions.
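The dataset requirements above can be checked quickly before the examination. Below is a minimal sketch, assuming each dataset is a Python dict mapping a (hypothetical) entity ID to a dict of attribute-value pairs, and that tokens are obtained by lowercasing and splitting on whitespace. The function names are illustrative only; "shared tokens" is interpreted here as tokens appearing in both datasets.

```python
def tokens_of(dataset):
    # Assumption: lowercase each value and split it on whitespace.
    return {
        token
        for description in dataset.values()
        for value in description.values()
        for token in value.lower().split()
    }


def check_dataset(dataset, min_size=100, max_size=150, min_pairs=2, max_pairs=4):
    """Return a list of human-readable problems (empty if the dataset is fine)."""
    problems = []
    if not min_size <= len(dataset) <= max_size:
        problems.append(
            f"dataset has {len(dataset)} descriptions, expected {min_size}-{max_size}"
        )
    for entity_id, description in dataset.items():
        if not min_pairs <= len(description) <= max_pairs:
            problems.append(
                f"{entity_id} has {len(description)} attribute-value pairs, "
                f"expected {min_pairs}-{max_pairs}"
            )
    return problems


def shared_tokens(dataset1, dataset2):
    """Tokens that occur in at least one value of each dataset."""
    return tokens_of(dataset1) & tokens_of(dataset2)
```

If `shared_tokens` returns an empty set, a token-based blocking method will produce no cross-dataset blocks, so the rest of the pipeline has nothing to compare.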
The project will be accompanied by a short report (about 5 pages long) describing the algorithms and their implementations. Please send your report to firstname.lastname@example.org before March 9, 2019.
Projects examination (More details to be announced)
Send your project report (PDF files only) by email to email@example.com before Saturday, March 9.