Note: this content has been automatically generated.
00:00:00
Data is often corrupted and dirty, that is, it contains wrong information. According
00:00:04
to recent studies, data scientists spend up to eighty percent of
00:00:08
their time preparing dirty data before it can be used for data analysis,
00:00:12
and therefore they take a longer time to generate insights.
00:00:16
The main reason for this is that data cleaning is a costly, iterative process, because data scientists
00:00:22
need to perform operations such as filling in missing values, fixing
00:00:26
erroneous values, and applying all sorts of transformations.
00:00:30
At the same time, existing tools that try to automate the data cleaning procedure
00:00:35
either focus on a specific data cleaning operation or are inefficient.
00:00:40
Therefore, from a user's perspective, one has to use a
00:00:43
different, potentially inefficient tool for each category of errors.
00:00:48
So how can we support arbitrary data cleaning operations, which are subjective to
00:00:52
the user and the data, and be fast at the same time?
00:00:56
The answer is that we need a cleaning language which is also coupled with an optimisable algebra.
00:01:04
So we call our language CleanM, and CleanM supports multiple types of
00:01:08
data cleaning operations and can be easily extended to support more.
00:01:13
It supports operations such as duplicate elimination, detecting violations of integrity constraints such as
00:01:19
functional dependencies, term validation using dictionaries, and so on.
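To make one of these operations concrete, here is a minimal Python sketch of detecting functional dependency violations. It is not CleanM syntax and not the speaker's implementation; the function name `fd_violations`, the column names, and the sample rows are all hypothetical, chosen only for illustration. A functional dependency `lhs -> rhs` is violated when the same `lhs` value appears with more than one `rhs` value.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return the lhs values that map to more than one rhs value.

    rows is an iterable of dicts; lhs and rhs are column names.
    (Hypothetical helper for illustration, not part of CleanM.)
    """
    seen = defaultdict(set)
    for row in rows:
        # Collect every distinct rhs value observed for this lhs value.
        seen[row[lhs]].add(row[rhs])
    # The dependency lhs -> rhs holds only if each group has one rhs value.
    return {key: values for key, values in seen.items() if len(values) > 1}


# Made-up sample data: the second row misspells the city for zip 1015.
rows = [
    {"zip": "1015", "city": "Lausanne"},
    {"zip": "1015", "city": "Lausane"},
    {"zip": "8001", "city": "Zurich"},
]
violations = fd_violations(rows, "zip", "city")
```

Here `violations` holds one entry, for zip `"1015"`, whose two conflicting city spellings would then be handed to a repair step.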
00:01:25
And CleanM maps all these operations into a common algebra in order
00:01:30
to be able to optimise them in a unified way.
00:01:34
So the algebra is based on the monoid calculus. Monoids
00:01:37
are constructs that stem from category theory, which are used
00:01:41
to represent aggregate and collection operators; for
00:01:45
example, min, max, and sum are classic examples of monoids.
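As a rough illustration of the monoid idea, the Python sketch below models a monoid as an identity element plus an associative merge function, and expresses min, max, and sum as instances. The `Monoid` class and its `fold` method are hypothetical names introduced here, not part of CleanM or of any library.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """An identity element (zero) and an associative binary operation (merge)."""

    zero: T
    merge: Callable[[T, T], T]

    def fold(self, items: Iterable[T]) -> T:
        # Combine all items, starting from the identity element.
        return reduce(self.merge, items, self.zero)


SUM = Monoid(0, lambda a, b: a + b)
MIN = Monoid(float("inf"), min)
MAX = Monoid(float("-inf"), max)

total = SUM.fold([3, 1, 2])     # 6
smallest = MIN.fold([3, 1, 2])  # 1

# Associativity lets partial results be merged in any grouping,
# which is what makes such aggregates easy to compute in parallel.
partial = SUM.merge(SUM.fold([3]), SUM.fold([1, 2]))  # 6
```

Because `merge` is associative and `zero` is its identity, a collection can be split into chunks, folded independently, and the partial results merged afterwards.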
00:01:49
So by using the monoid calculus we can represent the
00:01:52
complex building blocks of data cleaning operations, such as clustering.
00:01:57
And then monoid calculus expressions are translated into an algebraic plan, where
00:02:02
we can perform optimisation decisions by exploiting work-sharing opportunities, for example.
00:02:07
And finally, the algebraic plan gets translated into an optimised
00:02:12
physical plan, which can be executed in a scale-out fashion.
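The talk does not spell out what a work-sharing opportunity looks like, so the following Python sketch is only one assumed example: two functional dependency checks with the same left-hand side reuse a single grouping pass instead of scanning the data twice. The function `shared_fd_check`, the column names, and the rows are hypothetical, and this is not how CleanM itself is implemented.

```python
from collections import defaultdict


def shared_fd_check(rows, lhs, rhs_columns):
    """Check lhs -> c for every c in rhs_columns using one pass over rows.

    Returns, per rhs column, the set of lhs values that violate the
    dependency. (Hypothetical helper, for illustration only.)
    """
    # One grouping by lhs is shared by all the checks.
    groups = defaultdict(lambda: {c: set() for c in rhs_columns})
    for row in rows:
        for c in rhs_columns:
            groups[row[lhs]][c].add(row[c])
    return {
        c: {key for key, seen in groups.items() if len(seen[c]) > 1}
        for c in rhs_columns
    }


# Made-up sample data with one bad city and one bad canton.
rows = [
    {"zip": "1015", "city": "Lausanne", "canton": "VD"},
    {"zip": "1015", "city": "Lausane", "canton": "VD"},
    {"zip": "8001", "city": "Zurich", "canton": "ZH"},
    {"zip": "8001", "city": "Zurich", "canton": "ZG"},
]
result = shared_fd_check(rows, "zip", ["city", "canton"])
```

Running the two checks separately would group the rows by `zip` twice; sharing the grouping does the expensive step once, which is the kind of saving an optimiser working on a common algebra can look for.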
00:02:16
We have implemented and evaluated this three-level
00:02:20
optimisation process, and we have observed that, compared to
00:02:23
existing data cleaning techniques, CleanM can support more data

Conference program

Welcome address
Andreas Mortensen, Vice President for Research, EPFL
7 June 2018 · 9:49 a.m.
Introduction
Jim Larus, Dean of IC School, EPFL
7 June 2018 · 10 a.m.
The Young Software Engineer’s Guide to Using Formal Methods
K. Rustan M. Leino, Amazon
7 June 2018 · 10:16 a.m.
Safely Disrupting Computer Networks with Software
Katerina Argyraki, EPFL
7 June 2018 · 11:25 a.m.
Short IC Research Presentation 2: Gamified Rehabilitation with Tangible Robots
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 12:15 p.m.
Short IC Research Presentation 3: kickoff.ai
Lucas Maystre, Victor Kristof, EPFL (LCA)
7 June 2018 · 12:19 p.m.
Short IC Research Presentation 5: CleanM
Stella Giannakopoulou, EPFL (DIAS)
7 June 2018 · 12:25 p.m.
Short IC Research Presentation 6: Understanding Cities through Data
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:27 p.m.
Short IC Research Presentation 7: Datagrowth and application trends
Matthias Olma, EPFL (DIAS)
7 June 2018 · 12:31 p.m.
Short IC Research Presentation 8: Point Cloud, a new source of knowledge
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 12:34 p.m.
Short IC Research Presentation 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 12:37 p.m.
20s pitch 1: Cost and Energy Efficient Data Management
Utku Sirin, (DIAS)
7 June 2018 · 2:20 p.m.
20s pitch 2: Gamification of Rehabilitation
Arzu Guneysu Ozgur, EPFL (CHILI)
7 June 2018 · 2:21 p.m.
20s pitch 4: Neural Network Guided Expression Transformation
Romain Edelmann, EPFL (LARA)
7 June 2018 · 2:21 p.m.
20s pitch 5: Unified, High Performance Data Cleaning
Stella Giannakopoulou, EPFL (DIAS)
7 June 2018 · 2:21 p.m.
20s pitch 6: Interactive Exploration of Urban Data with GPUs
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 7: Interactive Data Exploration
Matthias Olma, EPFL (DIAS)
7 June 2018 · 2:22 p.m.
20s pitch 8: Efficient Point Cloud Processing
Mirjana Pavlovic, EPFL (DIAS)
7 June 2018 · 2:23 p.m.
20s pitch 9: To Click or not to Click?
Eleni Tzirita Zacharatou, EPFL (DIAS)
7 June 2018 · 2:24 p.m.
20s pitch 10: RaaSS Reliability as a Software Service
Maaz Mohiuddin, LCA2, IC-EPFL
7 June 2018 · 2:24 p.m.
20s pitch 11: Adversarial Machine Learning in Byzantium
El Mahdi El Mhamdi, EPFL (LPD)
7 June 2018 · 2:24 p.m.
Machine Learning: Alchemy for the Modern Computer Scientist
Erik Meijer, Facebook
7 June 2018 · 2:29 p.m.
