
Transcriptions

Note: this content has been automatically generated.
00:00:02
Yes, so you're talking about models like
00:00:06
GPT, models designed for running on GPUs. So
00:00:12
is anything being designed for running on a chip?
00:00:18
Are there things that would be good for the CS-2?
00:00:24
Ah, well, yes. With the CS-2, something that we're still exploring internally,
00:00:30
but have found to be quite powerful, is the ability for us to work with sparse models.
00:00:37
The way our architecture is designed, we actually don't spend any compute on sparse (zero) weights, so
00:00:44
you can scale models up by several orders of magnitude if you enforce sparsity in their training,
00:00:50
and that allows them to run in the same amount of time as a model that's several orders of magnitude smaller,
00:00:56
[unclear] while
00:01:00
achieving significantly higher performance, in terms of
00:01:04
performance on the target task. So this is something that is fundamentally special to our architecture,
00:01:11
and it's something that we're still working on, but it's something that
00:01:15
we plan to roll out to our stack, so that we can offer the ability for
00:01:21
people to train with sparsity by just, you know, specifying an additional command-line flag.
00:01:28
So that's basically [unclear]. Yes.
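The speaker's point that the architecture spends no compute on zero-valued weights can be illustrated in plain software. The sketch below is a hypothetical Python illustration, not Cerebras code and not a Cerebras API: it counts multiply-accumulate operations (MACs) in a dense matrix-vector product versus one that stores and visits only the nonzero weights. The function names, the matrix size, and the 10% density are arbitrary assumptions chosen for the example.

```python
"""Hypothetical illustration: dense vs. zero-skipping matrix-vector product.

Not Cerebras code; it only counts multiply-accumulates (MACs) to show that
skipping zero weights makes compute scale with the number of nonzeros.
"""
import random
from typing import List, Tuple

Matrix = List[List[float]]
CompressedRows = List[List[Tuple[int, float]]]


def random_sparse_matrix(n_rows: int, n_cols: int, density: float, seed: int) -> Matrix:
    """Build a matrix where roughly `density` of the entries are nonzero."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0 for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def dense_matvec(weights: Matrix, x: List[float]) -> Tuple[List[float], int]:
    """Multiply every weight, including zeros; return (result, MAC count)."""
    out, macs = [], 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1
        out.append(acc)
    return out, macs


def compress(weights: Matrix) -> CompressedRows:
    """Keep only (column index, value) pairs for the nonzero weights."""
    return [[(j, w) for j, w in enumerate(row) if w != 0.0] for row in weights]


def sparse_matvec(rows: CompressedRows, x: List[float]) -> Tuple[List[float], int]:
    """Visit only the stored nonzeros; return (result, MAC count)."""
    out, macs = [], 0
    for row in rows:
        acc = 0.0
        for j, w in row:
            acc += w * x[j]
            macs += 1
        out.append(acc)
    return out, macs


if __name__ == "__main__":
    W = random_sparse_matrix(256, 256, density=0.1, seed=0)
    x = [1.0] * 256
    dense_out, dense_macs = dense_matvec(W, x)
    sparse_out, sparse_macs = sparse_matvec(compress(W), x)
    print("dense MACs: ", dense_macs)   # 256 * 256 = 65536
    print("sparse MACs:", sparse_macs)  # roughly 10% of the dense count
    print("max |difference|:", max(abs(a - b) for a, b in zip(dense_out, sparse_out)))
```

Both versions produce the same result, but the zero-skipping one does work proportional to the number of nonzeros rather than to the full matrix size. Whether that saving shows up as real speed depends on the hardware, which is the distinction the speaker is drawing.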
00:01:34
Uh-huh.
00:01:42
So, big picture, [unclear], what
00:01:48
is the absolute highest priority for Cerebras, for GPT?
00:01:55
You know, I would say, so for us, it's something...
00:02:02
So sparsity is actually something that we're going full tilt on,
00:02:06
because I think that it fundamentally unlocks capabilities that are
00:02:11
really not possible with any other hardware. With everything else, we're really much faster, but it's
00:02:16
still things that are possible on other hardware. So sparsity is something that's incredibly important.
00:02:21
And something else that we're exploring quite a bit is the ability to also incorporate reinforcement
00:02:27
learning from human feedback directly on chip, on the whole system. This is also something that
00:02:34
has been proven to be incredibly powerful in exposing [unclear] capabilities in
00:02:39
the models, so for us that's also something that we are focusing on quite heavily.
00:02:46
Uh-huh. [unclear] Yeah, we'll get here.
00:02:53
Great, thank you. And so, it's very interesting, you know, the things
00:02:57
you were showing in the presentation, and the
00:03:00
examples you showed, these huge concrete examples.
00:03:04
Did you, or maybe, is there a cost comparison between running
00:03:10
on your machines versus
00:03:14
commercial GPUs? Because I think that's a major
00:03:17
factor for adoption: commercial GPUs are
00:03:20
produced at scale, and their cost is pretty low today.
00:03:24
Yeah, so just to give you an estimate, um, to
00:03:30
go back to your question: to train GPT-3 XL, um,
00:03:36
so the 1.3-billion-parameter model, will take about half a day on the CS-2
00:03:42
and will cost roughly 2,500. This is about the
00:03:45
same as it would cost on a public cloud as well,
00:03:49
[unclear], for example. But the key difference here is
00:03:53
that on NVIDIA it's going to take you much longer to achieve
00:03:57
full training completion. So even though the price is quite similar, in terms of what we're offering,
00:04:05
you can essentially iterate much more quickly on small models, and then finally, when you
00:04:11
decide "okay, this is the model that I want to train", you can have your trained model much faster.
00:04:18
Cool, right. So it's a comparable cost, but a much shorter training time. Yes, cool.
00:04:25
Uh-huh.
00:04:28
Um, I think we should have a [unclear]. Yeah, so one thing is training: you're
00:04:33
doing things fast. Do you still have, you know, I just, um,
00:04:37
[unclear]... is it also cheaper in terms of energy consumption, right?
00:04:43
Um, to be honest, I'm not entirely sure. Just because we're faster,
00:04:48
[unclear], I'm not sure if it turns out to be overall cheaper than training on GPUs.
00:04:55
But we also partner with cloud providers.
00:05:00
Um, for example, one of our cloud providers is
00:05:03
Green AI Cloud, which is based in Sweden, and they
00:05:06
are actually carbon negative, so when you train models you actually
00:05:10
get carbon credits that you generate from your training. So
00:05:14
in that sense, that's something that we do focus on, but
00:05:18
yeah, to answer your question, it's not necessarily something that the CS-2 is currently optimised for.
00:05:28
Um, [unclear]. Thank you.
00:05:32
I have just two questions. The first one is about the wafer: is it [unclear]
00:05:39
sixty-four chips, or one chip?
00:05:44
It's a single chip. Yes. And what is the cost of the CS-2?
00:05:51
The cost of the CS-2 is something, I
00:05:54
think, that's probably best discussed with our sales team,
00:05:59
but it's on the order of more than a million dollars.
00:06:03
Yes, uh-huh. But just about the wafer manufacturing process...
00:06:10
Yeah, yeah, yeah. It's basically the
00:06:15
entire wafer. We really just crop out the [unclear]
00:06:18
to make the square, and that's it. [unclear] Yeah, it's the whole wafer.


Conference Program

The Evolution of Large Language Models that led to ChatGPT (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 8:34 a.m.
664 views
Understanding Transformers
James Henderson, Idiap Research Institute
March 10, 2023 · 8:46 a.m.
369 views
Inference using Large Language Models (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:19 a.m.
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:45 a.m.
ChatGPT for Digital Marketing
Floris Keijser, N98 Digital Marketing
March 10, 2023 · 9:58 a.m.
Biomedical Inference & Large Language Models
Oskar Wysocki, University of Manchester
March 10, 2023 · 10:19 a.m.
Abstract Reasoning
Marco Valentino, Idiap Research Institute
March 10, 2023 · 10:38 a.m.
120 views
Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 10:58 a.m.
Round Table: Risks & Broader Societal Impact (Legal, Educational and Labor)
Lonneke van der Plas, Idiap Research Institute
March 10, 2023 · 2:07 p.m.
