KohoVolit.eu this week

Jaroslav posted on the KohoVolit.eu blog that their team has finished a data scraper for the Chamber of Deputies of the Czech Parliament:

“The official website will be scraped periodically to keep the data up to date. Three issues need to be addressed: inserting new information that has appeared on the official website into the database; updating information that is already stored in the database but has changed on the official website; and marking information in the database that is no longer present on the official website as obsolete. Obsolete information is marked rather than deleted, for the sake of historical references and statistics over time. The same is true for any database update: the original information is never deleted but is moved, together with the time interval during which it was valid, to the set of historical records.

Thus the scraper, which solely extracts the data from the source, is called by an updater that controls the process of updating the data in the database.”
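To make the three updater tasks from the quote concrete, here is a minimal sketch in Python. It is not the KohoVolit.eu code: the function name `update_records`, the in-memory dictionaries standing in for the database, and the field names (`since`, `until`, `obsolete`) are all assumptions made for illustration.

```python
from datetime import datetime


def update_records(current, history, scraped, now):
    """Apply one scraper run to an in-memory "database" (hypothetical layout).

    current: dict mapping a record key to
             {"data": ..., "since": datetime, "obsolete": bool}
    history: list of superseded versions, each with a validity interval
    scraped: dict mapping a record key to the data the scraper just extracted
    now:     timestamp of this update run
    """
    for key, data in scraped.items():
        existing = current.get(key)
        if existing is None:
            # 1. New information: insert it.
            current[key] = {"data": data, "since": now, "obsolete": False}
        elif existing["data"] != data or existing["obsolete"]:
            # 2. Changed (or reappeared) information: never delete the old
            #    version, move it to history with the interval it was valid.
            history.append({
                "key": key,
                "data": existing["data"],
                "since": existing["since"],
                "until": now,
            })
            current[key] = {"data": data, "since": now, "obsolete": False}

    for key, record in current.items():
        if key not in scraped and not record["obsolete"]:
            # 3. Information gone from the website: mark it, do not delete it.
            record["obsolete"] = True
            record["obsolete_since"] = now


if __name__ == "__main__":
    current, history = {}, []
    update_records(current, history, {"mp-1": {"club": "X"}}, datetime(2011, 1, 1))
    update_records(current, history, {"mp-1": {"club": "Y"}}, datetime(2011, 1, 8))
    print(current["mp-1"]["data"], len(history))
```

In this sketch the scraper's only job is to produce the `scraped` dictionary; deciding what to insert, archive, or mark obsolete is left entirely to the updater, which mirrors the split described in the quote.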

Next steps? A similar scraper for the Senate, and updaters for both! ;)