Q&A: The Infrastructure to build Large Language Models (Vinay Pondenkandath, Cerebras Systems)

Transcriptions

Note: this content has been automatically generated.

00:00:02

Yes, so you're talking about models like

00:00:06

GPT, models designed for running on GPUs,

00:00:12

so very much designed for running on a chip.

00:00:18

i see that things would have been things that are good she age

00:00:24

Well, something that we're still exploring internally,

00:00:30

but have found to be quite powerful, is the ability for us to work with sparse models.

00:00:37

So the way our architecture is designed, we actually don't spend any compute on sparse weights, so

00:00:44

you can scale models to several billions of parameters and then enforce sparsity in their training,

00:00:50

and that allows them to run in the same amount of time as a model that is several orders of magnitude smaller,

00:00:56

while

00:01:00

achieving significantly higher performance, that is, in terms of

00:01:04

performance on the target task. So this is something that is fundamentally special to our architecture,

00:01:11

and it's something that we're still working on, but it's something that

00:01:15

we plan to bring into our stack such that we can offer the ability for

00:01:21

people to train with sparsity by just, you know, specifying a command-line flag.
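
To make the point about not spending compute on sparse weights concrete, here is a minimal software sketch. It is not the speaker's method or any Cerebras API: the hardware behaviour is only simulated by counting multiply-accumulates, magnitude pruning is swapped in as one common way to impose sparsity, and the names `prune_by_magnitude` and `matvec_skip_zeros`, the 64x64 layer and the 90% sparsity level are all assumptions made for illustration.

```python
import random


def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    Hypothetical helper: magnitude pruning is an illustrative choice,
    not something described in the talk.
    """
    rows, cols = len(weights), len(weights[0])
    # Order every (row, col) position by the absolute value of its weight.
    order = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: abs(weights[rc[0]][rc[1]]),
    )
    n_drop = int(rows * cols * sparsity)
    pruned = [row[:] for row in weights]
    for r, c in order[:n_drop]:
        pruned[r][c] = 0.0
    return pruned


def matvec_skip_zeros(weights, x):
    """Matrix-vector product that skips zero weights.

    Returns (y, macs), where `macs` counts the multiply-accumulates
    that were actually performed.
    """
    y, macs = [], 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w != 0.0:  # a zero weight costs no compute in this model
                acc += w * xi
                macs += 1
        y.append(acc)
    return y, macs


random.seed(0)
W = [[random.gauss(0, 1) for _ in range(64)] for _ in range(64)]  # assumed layer size
x = [random.gauss(0, 1) for _ in range(64)]

_, dense_macs = matvec_skip_zeros(W, x)
_, sparse_macs = matvec_skip_zeros(prune_by_magnitude(W, 0.9), x)  # assumed 90% sparsity
print(dense_macs, sparse_macs)
```

With the zeros skipped, the work scales with the number of non-zero weights rather than with the nominal layer size, which is the property the answer relies on.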

00:01:28

so that's basically leads to proceed in the way to the more yes

00:01:34

uh_huh

00:01:42

So, big picture, today or in the short term, what

00:01:48

is the absolute highest priority for Cerebras?

00:01:55

You know, I would say, so for us it's something,

00:02:02

so sparsity is actually something that we're going full tilt on,

00:02:06

because I think that fundamentally unlocks capabilities that are

00:02:11

really not possible with any other hardware. With everything else we're really much faster, but it's

00:02:16

still things that are possible on other hardware. So for us sparsity is something that's incredibly important,

00:02:21

and something else that we're exploring quite a bit is the ability to also incorporate reinforcement,

00:02:27

so I mean reinforcement learning from human feedback, directly on chip, onto the whole system. So this is also something that

00:02:34

has been proven to be incredibly powerful in exposing latent capabilities in

00:02:39

the models, so for us that's also something that we are focusing on quite heavily.

00:02:46

uh_huh it's just uh_huh capital if you can yeah we get here

00:02:53

Great, thank you. So, it's very interesting, you know, thanks

00:02:57

for the presentation and for the

00:03:00

examples you showed, these concrete examples.

00:03:04

Did you maybe do a cost comparison between running

00:03:10

on your machines with respect to

00:03:14

commercial GPUs? Because I think that's a major

00:03:17

factor for the adoption: commercial GPUs are

00:03:20

mass-produced and their cost is pretty low today.

00:03:24

Yeah, so just to give you an estimate,

00:03:30

to go back to that, to train a GPT XL model,

00:03:36

so the 1.3 billion parameter model, will take half a day on the CS-2

00:03:42

and will cost roughly two thousand five hundred. This is about the

00:03:45

same as it would cost on a public cloud as well,

00:03:49

for example. But the key difference here is

00:03:53

that on the public cloud it is going to take you much longer to achieve

00:03:57

full training completion. So even though the price is quite similar in terms of what we're offering,

00:04:05

you can essentially iterate much more quickly on small models, and then, when you finally

00:04:11

decide, okay, this is the model that I want to train, you can have your trained model much faster.

00:04:18

Cool, right, so it's comparable cost but a much shorter training time. Yes. Cool.
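
The trade-off in this answer (comparable cost per run, much shorter wall-clock time) can be worked through with a little arithmetic. The half-day duration and the figure of roughly 2,500 per run come from the answer above; the 10x slowdown for the alternative, the two-week window and the function name `runs_in_window` are hypothetical values chosen only to illustrate the reasoning, since the talk only says "much longer".

```python
def runs_in_window(window_days, days_per_run):
    """Number of complete, back-to-back training runs that fit in the window."""
    return int(window_days // days_per_run)


COST_PER_RUN = 2500        # from the talk; currency not stated there
FAST_DAYS_PER_RUN = 0.5    # from the talk: about half a day per run
ASSUMED_SLOWDOWN = 10      # hypothetical: the talk only says "much longer"
WINDOW_DAYS = 14           # hypothetical experimentation window

slow_days_per_run = FAST_DAYS_PER_RUN * ASSUMED_SLOWDOWN

fast_runs = runs_in_window(WINDOW_DAYS, FAST_DAYS_PER_RUN)
slow_runs = runs_in_window(WINDOW_DAYS, slow_days_per_run)

print(f"fast system: {fast_runs} runs, total spend {fast_runs * COST_PER_RUN}")
print(f"slow system: {slow_runs} runs, total spend {slow_runs * COST_PER_RUN}")
```

Under these assumptions the cost per experiment is identical, but the faster system completes many more sequential experiments in the same calendar time, which is the iteration advantage the speaker describes.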

00:04:25

uh_huh

00:04:28

um i think we should have a state yeah that's one train you're

00:04:33

doing things fast i you still have a you know i just um

00:04:37

nice sat completely times anything else a cheap in terms of convention right

00:04:43

To be honest, I'm not entirely sure. Just because we're faster than GPUs

00:04:48

and can efficiently do this, I'm not sure if it turns out to be overall cheaper than training on GPUs.

00:04:55

But we also partner with cloud providers.

00:05:00

Like, for example, one of our cloud providers is

00:05:03

Green AI, which is based in Sweden, and they

00:05:06

actually are net carbon negative, so when you train models you actually

00:05:10

have carbon credits that you generate from your training. So

00:05:14

in that sense that's something that we do focus on, but

00:05:18

yeah, to answer your question, it's not necessarily something that the CS-2 is currently optimised for.

00:05:28

i'm a recent addition thank you

00:05:32

I have just two questions. The first one is: you showed us an image of a wafer.

00:05:39

Is that sixty-four chips or one chip?

00:05:44

It's a single chip. Yes. And what is the cost of a CS-2?

00:05:51

The cost of the CS-2 is something, I

00:05:54

think, that is probably best discussed with our sales team,

00:05:59

but it's of the order of more than a million dollars.

00:06:03

Yes. But it's just about the wafer manufacturing process?

00:06:10

Yeah, it's basically the

00:06:15

entire wafer; we basically crop off the edges

00:06:18

to make it square and that's it. Yeah, it's the whole wafer.
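
As a rough geometric illustration of cropping a round wafer down to a square, the sketch below computes the largest square that fits inside a circle. The 300 mm diameter is an assumption (a common wafer size, not stated in the talk), the function name is hypothetical, and a real die need not be an exact inscribed square, so the numbers are indicative only.

```python
import math


def inscribed_square_side(diameter_mm):
    """Side length of the largest square that fits inside a circle.

    The square's diagonal equals the circle's diameter, so side = d / sqrt(2).
    """
    return diameter_mm / math.sqrt(2)


WAFER_DIAMETER_MM = 300.0  # assumption: not stated in the talk

side_mm = inscribed_square_side(WAFER_DIAMETER_MM)
square_area_mm2 = side_mm ** 2
wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
used_fraction = square_area_mm2 / wafer_area_mm2  # equals 2 / pi for any diameter

print(f"side: {side_mm:.1f} mm")
print(f"square area: {square_area_mm2:.0f} mm^2")
print(f"fraction of wafer used: {used_fraction:.1%}")
```

The fraction of the disc covered by an inscribed square is 2/pi regardless of diameter, so a little under two thirds of the wafer ends up in the square and the rest is the cropped edge.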

Conference Program

09:52

The Evolution of Large Language Models that led to ChatGPT (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 8:34 a.m.

664 views

30:48

Understanding Transformers
James Henderson, Idiap Research Institute
March 10, 2023 · 8:46 a.m.

369 views

25:22

Inference using Large Language Models (Andre Freitas, Idiap)
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:19 a.m.

12:41

Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 9:45 a.m.

19:12

ChatGPT for Digital Marketing
Floris Keijser, N98 Digital Marketing
March 10, 2023 · 9:58 a.m.

18:16

Biomedical Inference & Large Language Models
Oskar Wysocki, University of Manchester
March 10, 2023 · 10:19 a.m.

20:13

Abstract Reasoning
Marco Valentino, Idiap Research Institute
March 10, 2023 · 10:38 a.m.

120 views

15:42

Q&A
Andre Freitas, Idiap Research Institute
March 10, 2023 · 10:58 a.m.

18:35

The Risks Behind Large Language Models (Al Brown, Fujitsu)
Al Brown, Fujitsu
March 10, 2023 · 1:42 p.m.

05:07

Q&A: The Risks Behind Large Language Models (Al Brown, Fujitsu)
Al Brown, Fujitsu
March 10, 2023 · 2:01 p.m.

57:08

Round Table: Risks & Broader Societal Impact (Legal, Educational and Labor)
Lonneke van der Plas, Idiap Research Institute
March 10, 2023 · 2:07 p.m.

18:17

The Infrastructure to build Large Language Models (Vinay Pondenkandath, Cerebras Systems)
Vinay Pondenkandath, Cerebras Systems
March 10, 2023 · 3:12 p.m.

06:37

Q&A: The Infrastructure to build Large Language Models (Vinay Pondenkandath, Cerebras Systems)
Vinay Pondenkandath, Cerebras Systems
March 10, 2023 · 3:30 p.m.
