Note: this content has been automatically generated.
Data is often dirty: it can contain wrong information. According
to recent studies, data scientists spend up to eighty percent of
their time preparing dirty data before it can be useful for data analysis,
and therefore they take longer to generate insights.
The main reason for this is that data cleaning is a costly and tedious process, because data scientists
need to perform operations such as filling in missing values, fixing
errors, and applying all sorts of transformations.
At the same time, existing tools that try to automate the data cleaning procedure
either focus on a specific data cleaning operation or are slow.
Therefore, from a user's perspective, one has to use a
different, potentially inefficient tool for each category of errors.
So how can we support arbitrary data cleaning operations, specified by the user,
to manipulate the data, and be fast at the same time?
Our answer is that we need a cleaning language which is also coupled with an optimizing algebra.
We call it CleanM, and CleanM supports multiple types of
data cleaning operations and can be easily extended to support more.
It supports operations such as duplicate elimination, violations of integrity constraints such as
violations of functional dependencies, term validation using dictionaries, and so on.
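As a rough illustration of what two of these operations check, the Python sketch below detects violations of a functional dependency (here, an assumed `zip -> city` dependency) and validates a column's terms against a dictionary. The toy table, the column names, and the use of `difflib` similarity with a 0.8 cutoff are all assumptions made for illustration; this is not CleanM's syntax or its implementation.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical toy table: each row maps column name -> value.
ROWS = [
    {"id": 1, "zip": "1015", "city": "Lausanne"},
    {"id": 2, "zip": "1015", "city": "Lausane"},   # misspelled city
    {"id": 3, "zip": "8001", "city": "Zurich"},
    {"id": 4, "zip": "8001", "city": "Zurich"},
]

def fd_violations(rows, lhs, rhs):
    """Return {lhs value: set of rhs values} for every lhs value
    where the functional dependency lhs -> rhs does not hold."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[lhs]].add(row[rhs])
    return {key: values for key, values in groups.items() if len(values) > 1}

def validate_terms(rows, column, dictionary, cutoff=0.8):
    """For each row whose value is not in the dictionary, suggest the
    closest dictionary term (or None if nothing is similar enough)."""
    suggestions = {}
    for row in rows:
        value = row[column]
        if value not in dictionary:
            matches = get_close_matches(value, dictionary, n=1, cutoff=cutoff)
            suggestions[row["id"]] = matches[0] if matches else None
    return suggestions

if __name__ == "__main__":
    print(fd_violations(ROWS, "zip", "city"))
    print(validate_terms(ROWS, "city", ["Lausanne", "Zurich", "Geneva"]))
```

Here the zip code 1015 maps to two different cities, so the dependency is flagged, and row 2's "Lausane" is matched to the dictionary term "Lausanne".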
CleanM translates all the sub-queries into a common algebra in order
to be able to optimize them in a unified way.
The algebra is based on the monoid calculus. Monoids are algebraic
constructs that stem from category theory, and they are used
to represent aggregates and collection operators; for
example, min, max, and sum are classic examples of monoids.
We are using the monoid calculus because we can represent the
complex building blocks of data cleaning operations, such as clustering.
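To make the monoid idea concrete, here is a minimal Python sketch, not CleanM's calculus: a monoid is modeled as an identity element plus an associative merge function, min, max, and sum are instances of it, and a grouping monoid (merging maps of lists) stands in for the kind of collection-building step that clustering relies on. The `Monoid` class, `GROUPS`, and `group_by_key` are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Monoid(Generic[T]):
    zero: T                     # identity element
    merge: Callable[[T, T], T]  # associative binary operation

    def fold(self, values: Iterable[T]) -> T:
        # Combine all values, starting from the identity element.
        return reduce(self.merge, values, self.zero)

# Primitive (aggregate) monoids.
SUM = Monoid(0, lambda a, b: a + b)
MIN = Monoid(math.inf, min)
MAX = Monoid(-math.inf, max)

def _merge_groups(a, b):
    # Union of two key -> list maps; lists under the same key are concatenated.
    # Copies its inputs so the shared identity {} is never mutated.
    out = {k: list(v) for k, v in a.items()}
    for k, v in b.items():
        out.setdefault(k, []).extend(v)
    return out

# Collection monoid: groups of items indexed by a key.
GROUPS = Monoid({}, _merge_groups)

def group_by_key(items, key):
    # Lift each item to a singleton group, then fold with the grouping monoid.
    return GROUPS.fold({key(x): [x]} for x in items)
```

For example, `MIN.fold([3, 1, 2])` returns 1, and `group_by_key(["apple", "avocado", "banana"], key=lambda s: s[0])` returns `{"a": ["apple", "avocado"], "b": ["banana"]}`. Because each merge is associative and has an identity, partial results can be combined in any grouping, which is what makes such operators convenient to rewrite and parallelize.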
Then, monoid calculus expressions are translated into an algebraic plan, where
we can make optimization decisions, for example by exploiting work-sharing opportunities.
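One plausible form of work sharing, sketched here under our own assumptions: if a duplicate-detection step and a functional-dependency check both need the rows grouped on the same key, the plan can build that grouping once and feed both consumers. The function names, the toy data, and the exact-match duplicate test are hypothetical; the talk does not say how CleanM detects or applies such sharing.

```python
from collections import defaultdict

# Hypothetical toy table.
ROWS = [
    {"id": 1, "zip": "1015", "city": "Lausanne", "name": "A. Smith"},
    {"id": 2, "zip": "1015", "city": "Lausane", "name": "A. Smith"},
    {"id": 3, "zip": "8001", "city": "Zurich", "name": "B. Jones"},
]

def group_once(rows, key):
    """Single shared pass over the data: bucket rows by one key column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups

def duplicate_candidates(groups, compare_col):
    """Within each bucket, pair up rows whose compare_col values are equal
    (pairwise comparison is only done inside a bucket, not across the table)."""
    pairs = []
    for bucket in groups.values():
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                if bucket[i][compare_col] == bucket[j][compare_col]:
                    pairs.append((bucket[i]["id"], bucket[j]["id"]))
    return pairs

def fd_conflicts(groups, rhs):
    """Keys whose bucket holds more than one distinct rhs value."""
    return sorted(k for k, bucket in groups.items() if len({r[rhs] for r in bucket}) > 1)

if __name__ == "__main__":
    shared = group_once(ROWS, "zip")              # grouping built once...
    print(duplicate_candidates(shared, "name"))   # ...used by duplicate detection
    print(fd_conflicts(shared, "city"))           # ...and by the FD check
```

Run as a script, this prints `[(1, 2)]` and then `['1015']`. The design point is simply that the grouping pass, usually the expensive part, is paid once rather than once per cleaning operation.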
And finally, the algebraic plan gets translated into an optimized
physical plan, which can be executed in a scale-out fashion.
We have implemented and evaluated this three-level
optimization process, and we have observed that, compared to
existing data cleaning techniques, CleanM can support more data…



Conference program

Welcome address
Andreas Mortensen, Vice President for Research, EPFL
7 June 2018 · 9:49 a.m.
Introduction
Jim Larus, Dean of IC School, EPFL
7 June 2018 · 10 a.m.
The Young Software Engineer’s Guide to Using Formal Methods
K. Rustan M. Leino, Amazon
7 June 2018 · 10:16 a.m.
Safely Disrupting Computer Networks with Software
Katerina Argyraki, EPFL
7 June 2018 · 11:25 a.m.
Short IC Research Presentation 2: Gamified Rehabilitation with Tangible Robots
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 12:15 p.m.
Short IC Research Presentation 3: kickoff.ai
Lucas Maystre, Victor Kristof, EPFL (LCA)
7 June 2018 · 12:19 p.m.
Short IC Research Presentation 5: CleanM
Stella Giannakopoulou, EPFL (DIAS)
7 June 2018 · 12:25 p.m.
Short IC Research Presentation 6: Understanding Cities through Data
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:27 p.m.
Short IC Research Presentation 7: Datagrowth and application trends
Matthias Olma, EPFL (DIAS)
7 June 2018 · 12:31 p.m.
Short IC Research Presentation 8: Point Cloud, a new source of knowledge
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 12:34 p.m.
Short IC Research Presentation 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:37 p.m.
20s pitch 1: Cost and Energy Efficient Data Management
Utku Sirin, EPFL (DIAS)
7 June 2018 · 2:20 p.m.
20s pitch 2: Gamification of Rehabilitation
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 2:21 p.m.
20s pitch 4: Neural Network Guided Expression Transformation
Romain Edelmann, EPFL (LARA)
7 June 2018 · 2:21 p.m.
20s pitch 5: Unified, High Performance Data Cleaning
Stella Giannakopoulou, EPFL (DIAS)
7 June 2018 · 2:21 p.m.
20s pitch 6: Interactive Exploration of Urban Data with GPUs
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 7: Interactive Data Exploration
Matthias Olma, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 8: Efficient Point Cloud Processing
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 2:23 p.m.
20s pitch 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:24 p.m.
20s pitch 10: RaaSS Reliability as a Software Service
Maaz Mohiuddin, LCA2, IC-EPFL
7 June 2018 · 2:24 p.m.
20s pitch 11: Adversarial Machine Learning in Byzantium
El Mahdi El Mhamdi, EPFL (LPD)
7 June 2018 · 2:24 p.m.
Machine Learning: Alchemy for the Modern Computer Scientist
Erik Meijer, Facebook
7 June 2018 · 2:29 p.m.
