Datacamp – ETL

What problem are you solving?:

Extraction, transformation and loading (ETL) of data: “screen scraping” of web pages, gathering data from other sources, and cleansing and transforming that data.

Describe your idea:

The library should serve as a base for other transparency applications that need to extract data from external sources, transform it, and load it into an application database.

Provide a library and set of tools for:
* automating and scheduling data management tasks
* parallel, managed screen scraping
* remote control and configuration through a database
* easier development of data/information extractions
* data transformation and data cleansing (to make further analysis easier)
* parallel downloading (1M pages in under 2 hours)
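
To make the parallel downloading goal concrete: 1M pages in 2 hours means a sustained rate of roughly 139 pages per second (1,000,000 / 7,200 s). The following is a minimal Python sketch of that kind of parallel download, not the library's actual code. The names `fetch_page`, `download_all` and `required_rate`, the worker count and the timeout are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_page, max_workers=32):
    """Fetch many URLs in parallel using a thread pool.

    Returns (pages, errors): pages maps url -> content and errors maps
    url -> exception, so one failed page does not stop the whole batch.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return pages, errors


def required_rate(pages, hours):
    """Sustained pages per second needed to reach a throughput target."""
    return pages / (hours * 3600)
```

Passing the fetch function as a parameter keeps the download step easy to swap out or test without network access. Whether threads alone reach the stated target depends on the network and the remote servers.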

Objectives:
* simplify extraction development
* make task management and configuration easy
* give immediate, easily accessible feedback on the success or failure of tasks
* use a modular approach to tasks: each task is an independent module
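
The modular approach and the success/failure feedback could look roughly like the Python sketch below. It is an illustration under assumptions, not the library's interface: the `Task` base class, the `run_tasks` function and the report fields are hypothetical names.

```python
import traceback


class Task:
    """Hypothetical base class: each task is an independent module."""

    name = "task"

    def run(self):
        raise NotImplementedError


def run_tasks(tasks):
    """Run each task in isolation and return one status record per task.

    A failure in one task is recorded and does not prevent the
    remaining tasks from running.
    """
    report = []
    for task in tasks:
        try:
            result = task.run()
            report.append({"task": task.name, "status": "ok", "result": result})
        except Exception as exc:
            report.append({
                "task": task.name,
                "status": "failed",
                "error": str(exc),
                "trace": traceback.format_exc(),
            })
    return report
```

A concrete extraction would subclass `Task`, set its own `name` and implement `run()`. The returned report can then be stored or displayed to give immediate feedback on each task.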

STATUS

Currently used in the Datacamp application, a transparency application for sharing government data in Slovakia.

What country will this operate in?: Slovakia

Who are you?:

An analyst with a background in business intelligence, data warehousing and knowledge management.

Currently working on Datacamp, a Fair-play Alliance application for publishing government data in Slovakia in a “Web 2.0” way.