What problem are you solving?:
Extraction, transformation and loading of data: “Screen scraping” of web pages, gathering of data from other sources, cleansing and transformation of data.
Describe your idea:
The library should serve as a base for other transparency applications that need to extract data from other sources, transform it, and load it into an application database.
Provide a library and a set of tools for:
* automating and scheduling data management tasks
* parallel managed screen-scraping
* remote control and configuration through database
* easier development of data/information extractions
* data transformation and data cleansing (to make further analysis easier)
* parallel downloading (1M pages in under 2 hours)
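To give a sense of what parallel downloading involves: 1M pages in under 2 hours works out to roughly 139 pages per second sustained, which is more than a single sequential loop can usually reach. The following is only a minimal sketch of the idea using a thread pool from the Python standard library. It is not the library's actual code; the names `fetch_url` and `download_all` and the worker count are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Download one page and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_url, max_workers=32):
    """Download many URLs in parallel (illustrative, not the library's API).

    Returns (pages, errors): pages maps url -> content,
    errors maps url -> the exception raised for that url.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download up front; the pool limits concurrency.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:
                # One failed page must not stop the other downloads.
                errors[url] = exc
    return pages, errors
```

Failed downloads are collected rather than raised, so a single broken page does not abort the whole batch and the failures can be reported or retried afterwards.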
The objectives are to:
* simplify extraction development
* make task management and configuration easy
* give immediate, easy-to-get feedback about the success or failure of tasks
* use a modular approach for tasks: each task is an independent module
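The modular approach and the success/failure feedback could look roughly like the sketch below. It is only an illustration under stated assumptions: the `Task` base class, the `TaskResult` record and the `run_tasks` function are hypothetical names, not the library's real interface.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone


class Task:
    """Base class for one independent module (hypothetical interface)."""
    name = "task"

    def run(self, config):
        raise NotImplementedError


@dataclass
class TaskResult:
    """Feedback record for a single task run (hypothetical)."""
    name: str
    status: str            # "ok" or "failed"
    started_at: datetime
    finished_at: datetime
    message: str = ""      # traceback text when the task failed


def run_tasks(tasks, configs=None):
    """Run each task in isolation and report what happened to it."""
    configs = configs or {}
    results = []
    for task in tasks:
        started = datetime.now(timezone.utc)
        try:
            # Each task gets only its own configuration.
            task.run(configs.get(task.name, {}))
            status, message = "ok", ""
        except Exception:
            # A failing task is recorded; the remaining tasks still run.
            status, message = "failed", traceback.format_exc()
        finished = datetime.now(timezone.utc)
        results.append(TaskResult(task.name, status, started, finished, message))
    return results
```

Because every task is wrapped separately, the result list can be stored or displayed right after a run, which is one way to get the immediate feedback described above.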
STATUS
Currently used in the Datacamp application, a transparency application for sharing government data in Slovakia.
What country will this operate in?: Slovakia
Who are you?:
Analyst with a background in business intelligence, data warehousing and knowledge management.
Currently working on Datacamp, a Fair-play Alliance application for publishing government data in Slovakia in a “Web 2.0” way.