Torch 2

Transcriptions

Note: this content has been automatically generated.

00:00:00

Um small so these have the lunch break

00:00:04

use we just in front of this one also

00:00:08

we will have so the don't stop going on

00:00:11

tables on the should be useful suitable

00:00:13

for the of to people's movement usually

00:00:16

table for the user so just just you're

00:00:19

result was you really be in the cooking

00:00:22

Okay, now comes the more boring

00:00:29

talk. It's a bit more hands-on,

00:00:34

it's a deeper dive, and I don't expect

00:00:37

it to be as interesting for most of

00:00:40

you, so I wouldn't be surprised if I see some

00:00:44

sleeping people. All right, so in this

00:00:49

talk we're gonna talk about two things.

00:00:52

The first is how tensors and

00:00:55

storages work in Torch. And then

00:00:59

we'll also talk about how we use the

00:01:03

neural network packages.

00:01:08

This talk basically

00:01:10

will get you to a place where you kind

00:01:15

of know how most people use

00:01:18

Torch. And then in the next

00:01:21

talk I will

00:01:25

talk about more experimental stuff

00:01:29

that's a bit more interesting, so think

00:01:32

of this as the middle part, which gets

00:01:35

really boring. Yeah, yeah, they'll be

00:01:41

available online. I'll send you

00:01:45

the slides. Okay, so

00:01:49

coming to tensors and storages.

00:01:53

Torch tensors are, as I said before,

00:01:57

n-dimensional arrays. They are row-major

00:02:01

in memory. So basically what that

00:02:04

means is, let's say there's a

00:02:06

two-dimensional tensor that's illustrated

00:02:08

here in the middle. It has rows

00:02:15

of data, and if you actually look, in

00:02:20

the system memory, at how it's

00:02:23

represented, because the

00:02:25

memory is linear, the first row is laid

00:02:28

out as is, and then the second row is laid

00:02:31

out, and the third and the fourth.

00:02:33

This is called row-major, and if you do it

00:02:36

column-wise instead it's called

00:02:37

column-major. I think MATLAB

00:02:43

is column-major, and NumPy is row-major

00:02:47

as well, which is nice because

00:02:51

the layout, and to a large part the

00:02:55

format, that Torch and NumPy follow

00:02:57

is very similar. So

00:03:01

if you want to copy a Torch tensor into

00:03:05

a NumPy array or vice versa, you

00:03:08

don't actually have to do any memory

00:03:09

copy. So here in this example the tensor

00:03:19

here has four rows and six

00:03:23

columns. So the size of the tensor is

00:03:25

four by six. And there's another

00:03:28

thing that we

00:03:31

commonly use when we do any kind of

00:03:35

n-d arrays: it's called the stride.

00:03:38

What the stride is: in that dimension,

00:03:43

to access the next element, how

00:03:47

many elements forward

00:03:52

in memory do you have to go? As

00:03:55

an example, the orange block here, the

00:04:00

first row, first column, is at

00:04:04

memory location zero. Now to go to the

00:04:07

first yellow block, how many

00:04:11

memory accesses do I have to make,

00:04:14

how much ahead do I have to

00:04:16

go? I have to go six

00:04:18

elements forward, and that's basically

00:04:20

the stride. So the stride here is six

00:04:24

in the first dimension and one in the

00:04:26

second dimension. It's one in the

00:04:28

second dimension because if I want to

00:04:31

access the next column, going

00:04:34

from the first column to the second column, for

00:04:36

example, I just have to go one

00:04:40

forward in the actual memory that you

00:04:43

see. So why do we have the size and

00:04:49

stride? This actually gives us a very

00:04:52

powerful way of indexing sub-tensors,

00:04:57

basically choosing parts of tensors and

00:05:00

still operating on them without doing

00:05:04

expensive operations like memory

00:05:06

copies. As an example here, let's

00:05:10

say I have a select operation where

00:05:14

I'm selecting in the first

00:05:17

dimension the third element. What that

00:05:20

means is, in the first

00:05:22

dimension, that is over the rows, I want the

00:05:24

third row. That operation will have

00:05:28

to give me this third row. And to

00:05:32

do this operation I don't actually have

00:05:34

to create a new memory copy. The only

00:05:36

operation I have to do here is

00:05:39

create just another tensor structure

00:05:42

that changes the size and the stride

00:05:44

and the storage offset, that is, where

00:05:47

my tensor starts from

00:05:49

in memory, and it can still map to the

00:05:52

same underlying storage that

00:05:54

this tensor points to. So that also

00:05:58

would mean that if I change anything in

00:06:02

this sub-tensor, the values will also

00:06:07

change in the original tensor. I've

00:06:12

illustrated this here: where this

00:06:17

sub-tensor actually

00:06:19

points to in memory, and those are the

00:06:22

red memory locations.

00:06:26

Also, as you see, the

00:06:30

offset here now is thirteen, which is,

00:06:33

starting from the initial storage

00:06:34

location, how much further do I

00:06:37

have to go to start this

00:06:42

sub-tensor, this particular sub-tensor.

00:06:44

So let's now do this column-wise. If

00:06:49

we do this column-wise we can still

00:06:51

select the sub-tensor. What

00:06:54

you notice here is that the storage

00:06:58

is still contiguous in memory, which

00:06:59

means that in the tensor that you

00:07:02

just created by doing the select

00:07:04

operation here, every element in the

00:07:08

storage is just next to each other, and this

00:07:10

is called a contiguous tensor. And

00:07:14

that doesn't hold if you, for example, do

00:07:17

column-wise selection. So if I do the column-wise

00:07:20

selection, all of my columns here

00:07:23

are numbered, and let's say I select in the

00:07:26

second dimension, which is over columns,

00:07:29

the third column. So I ask for this

00:07:33

particular column. And that would

00:07:35

give me this particular tensor, but if

00:07:38

you see how it's actually mapped into

00:07:41

the raw memory, each of

00:07:46

these is not contiguous in memory. But

00:07:49

you can still construct a tensor that

00:07:52

maps to this particular sub-tensor

00:07:54

by changing the size and the stride.
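
The size, stride and storage offset described above can be sketched in a few lines. This is a minimal illustration in plain Python, not Torch's implementation: the `View` class and its method names are hypothetical, and indices are 0-based here, whereas Torch counts from 1 (so the third row's offset of 12 below corresponds to 13 in Torch's counting).

```python
from dataclasses import dataclass

@dataclass
class View:
    storage: list   # flat backing buffer, shared between views
    offset: int     # position in storage of the view's first element (0-based)
    size: tuple     # extent of each dimension
    stride: tuple   # elements to skip in storage per step in each dimension

    def get(self, *idx):
        # Map a multi-dimensional index to a position in the flat storage.
        pos = self.offset + sum(i * s for i, s in zip(idx, self.stride))
        return self.storage[pos]

    def select(self, dim, i):
        # Fix dimension `dim` at index `i`; only the metadata changes, no copy.
        size = self.size[:dim] + self.size[dim + 1:]
        stride = self.stride[:dim] + self.stride[dim + 1:]
        return View(self.storage, self.offset + i * self.stride[dim], size, stride)

    def is_contiguous(self):
        # Contiguous means the strides match a dense row-major layout.
        expected = 1
        for n, s in zip(reversed(self.size), reversed(self.stride)):
            if s != expected:
                return False
            expected *= n
        return True

storage = list(range(24))              # a 4 x 6 matrix laid out row by row
a = View(storage, 0, (4, 6), (6, 1))
row = a.select(0, 2)                   # third row
col = a.select(1, 2)                   # third column

print(row.offset, row.size, row.stride, row.is_contiguous())
print(col.offset, col.size, col.stride, col.is_contiguous())
```

The row view comes out with stride 1 and is contiguous; the column view keeps stride 6 and is not, yet both point into the same `storage` list.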

00:07:57

So the stride here is six, which means

00:08:00

that going from this orange

00:08:03

block to the yellow block,

00:08:05

I have to go six locations

00:08:09

forward and I will get my next element

00:08:11

in the tensor in this dimension. And

00:08:14

the offset just points to the fact that

00:08:17

this particular tensor starts from the

00:08:19

third element itself. And so you start

00:08:22

with the third element, and then for every

00:08:24

element you want next, you go six blocks

00:08:27

forward. And this is actually a very simple

00:08:32

way of mapping things, the fact that

00:08:35

you have a tensor and the storage, and

00:08:37

that tensors map to storage and you can

00:08:39

extract sub-tensors. But it's

00:08:42

extremely powerful. You can, for example,

00:08:45

ask for the first three channels of

00:08:50

your image when there are, say, a hundred

00:08:52

twenty-eight channels, and operate on

00:08:54

that separately without having to do

00:08:56

memory copies and so on. So that is

00:09:01

how tensors and storages work at

00:09:06

a low level. Now

00:09:11

looking at some syntax: in Torch,

00:09:15

if you want to load a package you

00:09:18

call the require... sorry, you call the

00:09:21

require function. So here we're just

00:09:24

requiring torch. The semicolon is

00:09:26

optional, but if you use IPython

00:09:29

notebooks... Torch actually has support

00:09:32

for IPython notebooks, where

00:09:35

you have an iTorch kernel that you

00:09:39

can use, which means that you can use

00:09:43

IPython if you're familiar with that,

00:09:46

and you can use it as you always use

00:09:48

that, with inline help, autocomplete

00:09:50

and so on. So you load torch; the

00:09:55

semicolon here means that in IPython

00:09:58

notebooks it wouldn't print out the

00:10:00

result, and that's all it does. So

00:10:05

to create a tensor, you have

00:10:09

the syntax... as part of the torch

00:10:11

package you have several types of

00:10:13

tensors: DoubleTensor, FloatTensor, ByteTensor,

00:10:16

LongTensor and so on. And

00:10:20

you create a tensor of size four times

00:10:24

six; it's just a matrix, a four by

00:10:26

six matrix. And tensors by default are

00:10:29

not initialised with any default values,

00:10:34

so it's the standard

00:10:38

uninitialised memory; it might contain

00:10:41

all kinds of weird values. So let's

00:10:43

fill it up with uniform noise. You can

00:10:46

do that with this call. This colon

00:10:49

here is just Lua syntax for "I

00:10:53

want to operate on this variable"; it's

00:10:57

equivalent to saying a.uniform(a).

00:11:00

So you will keep seeing this colon

00:11:03

operator. It's just like calling

00:11:06

a method of a class. So you call uniform

00:11:11

here, which fills the tensor with

00:11:13

uniform noise, mean zero, standard

00:11:15

deviation one. If you actually pass

00:11:17

arguments, then you can actually

00:11:20

change the mean and standard deviation. You

00:11:22

can print the tensor; it will print on

00:11:24

screen in a nice format. And that's

00:11:29

basically the same tensor that we

00:11:31

wanted to create. The tensor will have

00:11:33

an underlying storage that you can also

00:11:35

access using the :storage() call, and

00:11:40

that will actually return the underlying

00:11:42

storage, which you can directly manipulate,

00:11:45

for example. And the other

00:11:50

operation we did previously was select,

00:11:53

and similarly there's a select call

00:11:55

here: the dimension and the element. And

00:11:59

as you see, if you print out the sub-tensor

00:12:03

here, it has the same values

00:12:06

as the third row here. And

00:12:10

to illustrate that the underlying

00:12:12

storage is shared, I show that if you

00:12:16

fill b with some value, let's say

00:12:18

with the value three, the

00:12:22

original tensor a, in row three, also

00:12:26

changes values. And this is a

00:12:28

pretty important detail to remember

00:12:31

when you're working with tensors and

00:12:33

sub-tensors. Oh, you don't get a column, you

00:12:41

get a single vector when you

00:12:45

select a row; it's one-dimensional,

00:12:49

so the print there is just showing

00:12:53

it column-wise, but that's about it.
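
The shared-storage behaviour described here (filling the selected row also changes the original) can be reproduced with the Python standard library alone. This is only an analogy, assuming a flat `array` as the storage and a `memoryview` slice as the sub-tensor; it does not use Torch, and the indices are 0-based.

```python
from array import array

rows, cols = 4, 6
storage = array("d", range(rows * cols))   # flat row-major buffer of doubles
a = memoryview(storage)                    # a view over the whole buffer
b = a[2 * cols : 3 * cols]                 # third row: a view, not a copy

for i in range(len(b)):                    # analogous to filling the sub-tensor with 3
    b[i] = 3.0

print(storage[12:18].tolist())             # the original buffer sees the change
```

Writing through `b` modifies `storage` directly, which is the same aliasing the speaker warns about for tensors and sub-tensors.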

00:12:55

Okay, those are the basics of tensors.

00:13:04

I obviously won't cover the whole

00:13:07

tensor library because it has more

00:13:10

than a hundred and fifty tensor

00:13:11

functions: you have math

00:13:16

operations, convolutions, a lot of

00:13:18

BLAS calls, tensor manipulation

00:13:22

operations like narrowing, indexing,

00:13:25

masked selecting, scatter, gather, logical

00:13:28

operators and so on. And it's fully

00:13:32

documented; all the functions in

00:13:33

Torch are nicely documented with examples,

00:13:37

what you expect from a nice library, I

00:13:39

guess. And you also have inline help,

00:13:41

both in iTorch as well as in the

00:13:44

regular Torch interpreter. You can ask

00:13:48

for the help by saying question mark and

00:13:50

the function you're interested in, and

00:13:52

it will give you the help inline,

00:13:55

along with examples most of the time.

00:13:59

Coming to the next part: I talked

00:14:06

earlier about the GPU support in

00:14:08

Torch. It's extremely seamless;

00:14:12

it's exactly like using the CPU

00:14:14

package, except that

00:14:17

instead of using the torch.FloatTensor

00:14:21

or torch.DoubleTensor, you use

00:14:24

a new tensor, torch.CudaTensor.

00:14:26

And torch.CudaTensor is

00:14:29

a float tensor that sits on the GPU.

00:14:32

It also has all the math operations

00:14:34

defined on it; you can use it exactly

00:14:36

how you use the CPU tensor. But for

00:14:40

most of the operations, because

00:14:42

tensor operations on GPUs are usually

00:14:44

faster, almost all the operations that

00:14:47

you try to do are faster, and

00:14:52

the tensor size

00:14:55

you create is limited only by

00:14:57

the amount of GPU memory you have,

00:15:00

which is usually much smaller than the

00:15:04

amount of CPU memory you have on most

00:15:06

systems. Okay, so that's a basic overview

00:15:12

of Torch tensors. I didn't go

00:15:16

into a lot of detail because,

00:15:18

once you get through the basics,

00:15:21

it's mostly syntax, much like

00:15:23

how you use NumPy or MATLAB

00:15:26

matrices, for example: you just look at

00:15:28

the functions you're interested in and

00:15:29

then you use them. Next, I wanna

00:15:36

talk about training neural networks. So

00:15:40

neural networks, the way you train them,

00:15:44

there are a lot of moving parts when you

00:15:46

train neural networks.

00:15:50

I just created a figure; some or

00:15:56

most of the use cases can be mapped

00:15:59

into, for example, such a structure. You

00:16:03

have most modern datasets, and by modern

00:16:08

I mean large datasets that don't fit

00:16:11

in memory anymore. You have, let's say,

00:16:15

sixteen gigs or sixty-four gigs or, you

00:16:18

know, two fifty-six gigs of CPU memory,

00:16:21

and you can no longer load your datasets

00:16:23

into memory like you load MNIST,

00:16:25

even though, you know, research still

00:16:27

carries on on MNIST. For example,

00:16:31

ImageNet, that is, the ImageNet

00:16:34

1000-class dataset, is one point two

00:16:35

terabytes, and for us guys at

00:16:41

Facebook we consider that a small dataset.

00:16:43

So usually you have these datasets on

00:16:47

some kind of disk, either hard drive or

00:16:51

SSDs, or on some network file

00:16:55

system. And then you have a data loader

00:16:58

that loads this data. Basically,

00:17:02

you can ask for a mini-batch of samples,

00:17:05

and it would on the fly load this mini-batch

00:17:08

of samples, preprocess them, augment

00:17:11

them, do all kinds of colour jitters and

00:17:14

cropping and all that, and then it will

00:17:17

send that into some queue where your neural

00:17:21

network trainer can pull the mini-batches

00:17:25

off of that queue and then train

00:17:27

your neural network with the cost

00:17:29

function that you specified and the

00:17:31

optimisation algorithm that you specify,

00:17:33

like SGD or Adam or RMSProp, for

00:17:36

example. And usually these are

00:17:41

multi-threaded or multi-process. So the data

00:17:44

loader sits in a separate thread or

00:17:46

process, and your main thread

00:17:52

computes the neural network

00:17:56

process.
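
The loader-thread, queue and trainer arrangement just described can be sketched with Python's standard `threading` and `queue` modules. This is an illustration of the pattern only: the function names, batch contents and queue size are made up for the example, and the "training" step is a stand-in that just counts samples.

```python
import queue
import threading

def loader(n_batches, batch_size, out):
    # Producer: builds mini-batches and pushes them onto a bounded queue.
    for b in range(n_batches):
        batch = [b * batch_size + i for i in range(batch_size)]  # stand-in for load + augment
        out.put(batch)
    out.put(None)                          # sentinel: no more data

def train(inp):
    # Consumer: the main thread pulls mini-batches off the queue as they arrive.
    seen = 0
    while True:
        batch = inp.get()
        if batch is None:
            break
        seen += len(batch)                 # stand-in for forward, backward, update
    return seen

q = queue.Queue(maxsize=4)                 # bounded, so the loader cannot run far ahead
t = threading.Thread(target=loader, args=(5, 8, q))
t.start()
total = train(q)
t.join()
print(total)
```

The bounded queue is the design point: it lets loading overlap with training while keeping only a few mini-batches in memory at once.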

00:18:03

There are other varieties of this as well. For

00:18:06

example, if you are, let's say,

00:18:08

learning embeddings, you have much

00:18:13

smaller neural networks, and these are

00:18:17

not that much faster on the GPU

00:18:19

than on the CPU. So one common

00:18:23

way to train these things is via Hogwild,

00:18:25

and with Hogwild you would have

00:18:29

multiple of these neural networks

00:18:33

replicated, sharing the same parameters,

00:18:36

and they're trained simultaneously, in

00:18:38

parallel, asynchronously, with

00:18:43

no kind of synchronisation, and

00:18:45

that's Hogwild. And so in the

00:18:47

most common scenario you just have a

00:18:49

single thread and a single neural

00:18:52

network that you're training. I'm gonna

00:18:54

cover that first. Okay, so coming to how

00:19:05

these structures

00:19:08

actually map to Torch packages. So

00:19:13

the data loader, especially having

00:19:16

multiple data threads that have

00:19:19

callbacks once they're finished into the

00:19:21

main thread and so on, is

00:19:23

covered by the threads package; we will

00:19:26

go over that briefly. And you have the

00:19:31

trainer. In Torch itself there is no

00:19:36

notion of a trainer; the researcher

00:19:39

just writes their own training loop. This

00:19:42

is not common, but not uncommon as well.

00:19:46

I mean, in Theano, for example, everyone

00:19:49

just writes their own training loop, but in

00:19:51

Caffe you would have a trainer that

00:19:55

takes your protobuf of the neural

00:19:57

network and the solver, and it would

00:20:00

solve the whole thing. So Torch

00:20:04

also tries to maintain being very raw:

00:20:07

the researcher, in like fifteen, twenty

00:20:09

lines of code, writes their own training

00:20:12

loop, and that gives them the

00:20:13

flexibility to change it in weird ways

00:20:15

if needed. And in the third lecture we

00:20:18

will see how this kind of flexibility

00:20:20

would be very useful, for example when

00:20:22

you're training adversarial networks.

00:20:24

And the neural network and the cost

00:20:30

function are covered by the nn

00:20:32

package, and we will go over it briefly.

00:20:35

And the optimiser is covered by the

00:20:37

optim package; we will also go over

00:20:39

that, where we have a platter of

00:20:44

gradient-based optimisation algorithms.

00:20:47

So next, starting with the nn

00:20:50

package; I'll start with the nn package,

00:20:52

then go to optim, and then lastly

00:20:54

threads, because data loading is boring.

00:20:57

So the nn package, as I said in

00:21:03

the last lecture, where I just

00:21:05

briefly touched upon it: in Torch the

00:21:09

neural network package has this

00:21:11

notion of building your neural networks

00:21:13

as stacks of Lego blocks in various

00:21:17

structures. So what we have is what we call

00:21:25

containers, and we have modules.

00:21:29

Containers are these various structures

00:21:35

that stack your

00:21:39

modules in different ways; I have a

00:21:41

visualisation coming up for that in a

00:21:42

second. And modules are

00:21:45

basically the actual computations

00:21:50

that you want to run. For example, in this

00:21:52

case, a convolution over spatial

00:21:58

images, in 2D, with three input feature

00:22:03

maps, sixteen output feature maps, and a

00:22:05

five by five kernel; that is added to

00:22:08

this Sequential there. So that is

00:22:10

stacked, and then the tanh activation is

00:22:15

added right after that. Now with these

00:22:17

two, the input comes through the

00:22:20

convolution, and the output of the

00:22:22

convolution goes through the tanh

00:22:24

activation, and this goes into this max

00:22:26

pooling, and so on. So

00:22:28

the Sequential container is basically just

00:22:31

a linear container that passes the

00:22:33

input through all of these modules

00:22:36

and then gives you the output of the

00:22:37

last layer of this container. In this

00:22:42

example, this short example

00:22:46

implements this particular neural

00:22:48

network using just these lines

00:22:53

of code. And one thing: if you're

00:23:01

familiar with other packages and you come

00:23:03

to Torch, one thing to keep in mind is

00:23:05

that we implement the softmax as

00:23:11

LogSoftMax. We do have a SoftMax there,

00:23:14

but as most people would know, the

00:23:18

gradient computations for softmax

00:23:22

are unstable. So what most packages do

00:23:26

is they call the layer softmax and

00:23:28

they give the output as the softmax,

00:23:30

but they compute the gradients of the

00:23:33

log softmax. We actually wanted to be

00:23:36

more transparent from the beginning, so

00:23:38

you actually have a layer called SoftMax; if

00:23:41

you wanna shoot yourself in the foot, go

00:23:43

for it. But you have a layer called LogSoftMax

00:23:46

that does the right thing. We all,

00:23:49

for example, just use LogSoftMax, so

00:23:52

if you looked at the basic

00:23:54

examples or the basic tutorials, you would

00:23:57

pretty much just use that.
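
To make the stability point concrete, here is a small sketch of why working directly in log space is safer. It shows the general log-sum-exp trick in plain Python and is not Torch's LogSoftMax code; the function names are chosen for this example. The speaker's remark is about gradients, while this sketch only shows the related overflow problem in the forward computation.

```python
import math

def log_softmax(xs):
    # Log of softmax via the log-sum-exp trick: subtract the max before exponentiating.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]

def naive_log_softmax(xs):
    # Softmax first, then log: exp() overflows for large inputs.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

scores = [1000.0, 0.0]
stable = log_softmax(scores)
try:
    naive = naive_log_softmax(scores)
except (OverflowError, ValueError) as err:
    naive = type(err).__name__             # record which failure occurred

print(stable, naive)
```

With these inputs the stable version returns finite values, while the naive version fails inside `math.exp`.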

00:24:00

So each of these modules in the

00:24:05

nn package has a common

00:24:07

interface that it has to define, even

00:24:10

though... I mean,

00:24:12

these three

00:24:15

functions are essential functions that

00:24:17

it has to define, and it can have custom

00:24:20

functions that it uses in other ways.

00:24:24

So the three functions are (and

00:24:28

there's a typo here) updateOutput, updateGradInput

00:24:32

and accGradParameters.

00:24:35

updateOutput computes the output

00:24:40

given the input. So y equals f of x:

00:24:43

updateOutput does f of x and it gives out

00:24:48

y. And updateGradInput computes

00:24:55

dy by dx, basically the gradient with

00:25:02

respect to the input, the gradient of

00:25:06

the output with respect to the

00:25:07

input. Yeah, good. And then there is

00:25:14

accGradParameters. So

00:25:17

the module that you define can be

00:25:19

parametric or non-parametric. Max

00:25:22

pooling, for example, is a non-parametric

00:25:26

module; it doesn't have any parameters

00:25:28

that it learns. But convolution, for

00:25:31

example, in convolutional networks,

00:25:35

tries to change its filters in a

00:25:38

way that improves the loss of

00:25:40

your network, and this is a parametric

00:25:43

module. So it has a set of weights

00:25:45

and biases that are defined inside it.

00:25:47

So for a layer like max pooling you

00:25:50

don't need to define accGradParameters,

00:25:52

so you can just leave it as

00:25:54

is, but if you do have parameters in

00:25:57

your module, you would want to

00:26:00

define this function that computes the

00:26:03

gradients with respect to the

00:26:05

parameters that you have in your module.
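
The three-function module interface can be sketched as follows. The method names mirror Torch's updateOutput, updateGradInput and accGradParameters in Python naming style, but the classes themselves are hypothetical and operate on plain lists; they are a sketch of the contract, not the nn package's code.

```python
import math

class Linear:
    """Parametric module: y = W x + b on plain Python lists."""
    def __init__(self, weight, bias):
        self.weight, self.bias = weight, bias
        self.grad_weight = [[0.0] * len(row) for row in weight]
        self.grad_bias = [0.0] * len(bias)

    def update_output(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def update_grad_input(self, x, grad_output):
        # dL/dx_j = sum_i dL/dy_i * W[i][j]
        return [sum(g * row[j] for g, row in zip(grad_output, self.weight))
                for j in range(len(x))]

    def acc_grad_parameters(self, x, grad_output):
        # Accumulate dL/dW[i][j] = dL/dy_i * x_j and dL/db_i = dL/dy_i.
        for i, g in enumerate(grad_output):
            self.grad_bias[i] += g
            for j, xj in enumerate(x):
                self.grad_weight[i][j] += g * xj

class Tanh:
    """Non-parametric module: nothing to accumulate."""
    def update_output(self, x):
        self.output = [math.tanh(v) for v in x]
        return self.output

    def update_grad_input(self, x, grad_output):
        return [g * (1.0 - y * y) for g, y in zip(grad_output, self.output)]

    def acc_grad_parameters(self, x, grad_output):
        pass

layer = Linear([[1.0, 2.0], [0.0, -1.0]], [0.5, 0.0])
x = [3.0, 4.0]
y = layer.update_output(x)
grad_in = layer.update_grad_input(x, [1.0, 2.0])
layer.acc_grad_parameters(x, [1.0, 2.0])
print(y, grad_in, layer.grad_weight, layer.grad_bias)
```

Note that the parameter gradients are accumulated rather than overwritten, which is what the "acc" in the name suggests.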

00:26:07

And also part of the nn package

00:26:12

are loss functions, like mean squared

00:26:15

loss or negative log-likelihood loss

00:26:17

or margin loss and so on. And the

00:26:20

loss functions have a similar

00:26:23

interface. They have an updateOutput

00:26:27

and an updateGradInput; since loss

00:26:29

functions are not parametric, you don't

00:26:34

have the accGradParameters there. Now,

00:26:39

coming to the containers: the nn

00:26:42

package has several containers.

00:26:45

The most common, which you saw earlier, is

00:26:48

Sequential. The modules in white

00:26:51

there take the input,

00:26:54

which is of four channels, and it

00:26:57

just sends the input through and then

00:27:00

spits the output out. And then

00:27:04

there's the Concat container, which

00:27:08

takes the input and then sends the

00:27:12

same input to these two pipes that

00:27:19

it has. Concat basically

00:27:22

creates a pipe for every input. So what

00:27:24

this structure

00:27:27

actually is: this whole thing is a

00:27:29

Concat, which has two Sequentials,

00:27:33

and the Sequentials themselves have

00:27:35

four layers. So the Concat module here

00:27:39

has two pipes;

00:27:41

the same input

00:27:44

goes through these Concat pipes, and

00:27:47

then it has separate outputs that are

00:27:49

then concatenated together to give you

00:27:52

a single output. And then there's also

00:27:56

the Parallel container, where, let's

00:28:00

say you're given an input of two

00:28:01

channels, and you

00:28:04

have two Sequentials that are added

00:28:09

to it, two pipes; it gives each of these

00:28:13

channels to each of these separate

00:28:15

pipes, and then it gets outputs that it

00:28:17

concatenates together and sends to the

00:28:20

next layer. So as you've seen already

00:28:23

in these cases, a container can have

00:28:27

other containers inside it; you can

00:28:29

basically compose these things in a

00:28:31

very natural way. You can

00:28:36

compose complicated networks like ResNet

00:28:39

or GoogLeNet just using these

00:28:43

three. You can actually, just using

00:28:47

Sequential... yeah, just using these three

00:28:49

containers, you can actually create the

00:28:52

weird structure that GoogLeNet is,

00:28:55

and in very few lines of code.
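
The three containers can be imitated in a few lines to show how they compose. This is a simplified sketch on Python lists: the class names follow the talk, but the real nn containers also take dimension arguments for splitting and concatenating, which are left out here, and `double` and `inc` are made-up stand-ins for layers.

```python
class Sequential:
    """Passes the input through each module in order."""
    def __init__(self, *modules):
        self.modules = modules

    def __call__(self, x):
        for m in self.modules:
            x = m(x)
        return x

class Concat:
    """Feeds the same input to every branch and concatenates the outputs."""
    def __init__(self, *branches):
        self.branches = branches

    def __call__(self, x):
        out = []
        for branch in self.branches:
            out.extend(branch(x))
        return out

class Parallel:
    """Feeds the i-th part of the input to the i-th branch, then concatenates."""
    def __init__(self, *branches):
        self.branches = branches

    def __call__(self, xs):
        out = []
        for branch, x in zip(self.branches, xs):
            out.extend(branch(x))
        return out

def double(v):
    return [2 * e for e in v]

def inc(v):
    return [e + 1 for e in v]

seq_out = Sequential(double, inc)([1, 2])
cat_out = Concat(Sequential(double, inc), Sequential(inc, double))([1, 2])
par_out = Parallel(double, inc)([[1, 2], [10, 20]])
print(seq_out, cat_out, par_out)
```

Because a container is itself callable like a module, containers nest inside each other, which is the composition property the speaker points out.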

00:29:00

So getting to the CUDA backend of

00:29:08

the neural network package: as I

00:29:10

showed earlier, using CUDA for

00:29:13

Torch tensors was very natural;

00:29:16

you had to change one line.

00:29:20

Similarly, the nn package is also

00:29:23

equally natural to use. If you have a

00:29:26

model that you define, to actually

00:29:29

transfer the model to CUDA all you

00:29:31

have to do is call :cuda() on the

00:29:33

model, and it automatically now sits on

00:29:35

the GPU, and it expects inputs to itself

00:29:39

to be CUDA tensors that also

00:29:42

sit on the GPU. And now this model's

00:29:44

forward, which computes updateOutput,

00:29:48

is done on the GPU. So it's very easy to use,

00:29:55

very natural to use; you never feel

00:29:58

like you're doing something special for

00:29:59

the GPU. Next comes the nngraph

00:30:04

package. I only have one slide

00:30:06

on this because

00:30:08

there's not much to the nngraph

00:30:11

package, but it's very powerful. The

00:30:14

nngraph package introduces

00:30:17

composing neural networks in a

00:30:20

different way. Instead of composing them

00:30:22

in terms of containers and modules, all

00:30:25

you have to do is chain modules one

00:30:27

after the other. So an example is

00:30:31

probably the best way to showcase this.

00:30:33

In this example, let's say you have

00:30:36

an nngraph

00:30:41

where you want to create a two-layer

00:30:44

MLP with the tanh nonlinearity.

00:30:50

What you do is you create some dummy

00:30:52

input layer, just as a best practice;

00:30:55

this is the actual layer, nn.Identity,

00:30:57

open-close bracket. And nngraph

00:31:00

basically overloads

00:31:05

the call operator, so that the second

00:31:07

bracket tells you what it

00:31:09

is connected to. So the input here is

00:31:12

not connected to anything else; it is

00:31:14

the first layer in your neural

00:31:18

network. So it just has an empty

00:31:20

bracket. Now coming to the next part:

00:31:23

you create the first hidden layer, where

00:31:27

you have a tanh layer that is

00:31:31

connected to a linear layer that is

00:31:35

connected to the input, which is this

00:31:36

identity layer. And this whole thing is

00:31:39

created in one shot, and this is the

00:31:44

first hidden layer. And then you create

00:31:47

the next linear layer, which connects to

00:31:52

h1 here, which is the first hidden

00:31:54

layer, and that gives you the output.

00:31:57

And then when you want to create your

00:31:59

nngraph, you just define the

00:32:02

input and the output modules

00:32:06

that you want your nngraph to map

00:32:08

to. So you create what you call an

00:32:12

nn.gModule, which is short for graph

00:32:15

module, where in the first set of

00:32:18

parentheses you give all the inputs to

00:32:21

your neural network, and in the second

00:32:23

set of parentheses you give all

00:32:26

the outputs you want from the neural

00:32:28

network. And the MLP is created, and you

00:32:32

use it exactly like how you used the

00:32:35

previous nn modules. It has the

00:32:38

same interface; everything is the same.
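
The idea of overloading the call operator so that "calling a module on a node" records a connection can be sketched like this. All class names (`Node`, `Module`, `GraphModule`) and the toy layers are hypothetical and written for this illustration; nngraph's actual implementation and API are in Lua and differ in detail.

```python
import math

class Node:
    """One module application in the graph, remembering what feeds it."""
    def __init__(self, fn, parents):
        self.fn = fn
        self.parents = parents

class Module:
    """Wraps a function; calling the wrapper on nodes records a connection."""
    def __init__(self, fn):
        self.fn = fn

    def __call__(self, *parents):
        return Node(self.fn, list(parents))

class GraphModule:
    """Evaluates the recorded graph from the given inputs to the given outputs."""
    def __init__(self, inputs, outputs):
        self.inputs, self.outputs = inputs, outputs

    def forward(self, *values):
        cache = {id(n): v for n, v in zip(self.inputs, values)}

        def evaluate(node):
            if id(node) not in cache:
                args = [evaluate(p) for p in node.parents]
                cache[id(node)] = node.fn(*args)
            return cache[id(node)]

        results = [evaluate(o) for o in self.outputs]
        return results[0] if len(results) == 1 else results

identity = Module(lambda x: x)
linear1 = Module(lambda x: [2.0 * v + 1.0 for v in x])   # toy stand-in for a linear layer
tanh = Module(lambda x: [math.tanh(v) for v in x])
linear2 = Module(lambda x: [sum(x)])                     # toy stand-in for a second linear layer

inp = identity()                  # empty call: connected to nothing, it is the input
h1 = tanh(linear1(inp))           # first hidden layer, created in one expression
out = linear2(h1)
mlp = GraphModule([inp], [out])
result = mlp.forward([0.0, 0.0])
print(result)
```

The graph module only needs the input and output nodes; everything in between is discovered by following the recorded parent links.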

00:32:40

All it does is it basically

00:32:46

looks at what's connected to what and

00:32:48

it just creates a computation graph

00:32:51

there. And there's not much else to

00:32:57

nngraph, to be honest. I mean,

00:32:59

it has some useful things, like

00:33:02

using Graphviz you can

00:33:06

actually create a visualisation of your

00:33:08

graph, where if you have a very complex

00:33:11

graph it would be useful to see

00:33:14

what's going on in the graph. So you

00:33:17

create an SVG file that shows the

00:33:21

structure of your graph, with descriptions

00:33:23

of each layer in your graph and how

00:33:25

they're connected. And you also have

00:33:30

a mode where, if you have an error at

00:33:34

runtime in your neural network, nngraph

00:33:37

can automatically spit out

00:33:41

an SVG file with the whole graph

00:33:43

structure, marked

00:33:48

in a different colour for the node where the

00:33:53

error occurred, in case you want to

00:33:55

visually see which of your neural

00:33:57

network modules actually failed,

00:34:00

had some runtime error. Apart from

00:34:03

that, nngraph is very basic, but

00:34:04

very useful to create complicated

00:34:08

things like weird LSTMs or other

00:34:12

recurrent modules. Then I come to the

00:34:17

optim package. The optim package is

00:34:21

written in a way where it knows nothing

00:34:24

about neural networks. optim

00:34:26

basically just has a bunch of

00:34:28

optimisation algorithms, including

00:34:31

non-gradient-based algorithms

00:34:35

like line search algorithms. And it

00:34:38

basically wants a function y equals f

00:34:41

of (x, w), where w are the parameters of

00:34:47

your system and x is the input. Actually, it

00:34:50

doesn't even care about x, the input; it

00:34:53

just wants y equals f of w. So

00:34:56

in this small example here I'm just

00:34:58

showing the interface that optim

00:35:00

takes. You can have a config that

00:35:02

defines all the parameters of your

00:35:04

optimisation. And for each of your

00:35:08

training samples you can create a function

00:35:11

that computes f

00:35:17

of x. And then you can pass that

00:35:21

to optim.sgd, in this example

00:35:24

where you're doing stochastic gradient

00:35:26

descent: you pass the

00:35:29

function that computes f of x, you

00:35:31

pass an x, which is the parameters of

00:35:33

your system that you're trying to

00:35:35

optimise, and the configuration, and

00:35:38

optim.sgd will run on this

00:35:41

function.

00:35:44

it's decoupled from neural

00:35:47

networks for a very good reason: we wanted

00:35:49

to, like, write the optim package

00:35:52

to be very generic, like a black-box

00:35:55

optimiser that you can just plug into

00:35:59

other places as well. The optim package has

00:36:03

a wide range of algorithms implemented:

00:36:06

your standard stochastic gradient

00:36:09

descent, averaged SGD, L-BFGS, conjugate

00:36:12

gradients, Adadelta, Adamax, RMSprop,

00:36:16

FISTA with line search (this is an

00:36:20

interesting one), Nesterov SGD,

00:36:24

Rprop. And most recently CMA-ES; I

00:36:28

haven't really, I haven't figured out

00:36:32

what the full form is, but some guy

00:36:36

contributed this very recently. Um my

00:36:42

favourites here are SGD and Adam and

00:36:47

RMSprop; they're kind of nice.

00:36:50

Everything else I've only used in

00:36:52

passing. So how does the optim package

00:36:57

work for neural networks itself. So in

00:37:00

the nn package we have a

00:37:04

powerful function called getParameters.

00:37:08

What it does is: your network has

00:37:12

several modules, several modules, and

00:37:14

each of them can be parameterised. Let's

00:37:17

say you had three convolutional layers;

00:37:18

each of them has their own parameters

00:37:21

that map to separate memory regions,

00:37:24

since they call their own mallocs;

00:37:26

they're just sitting in different parts

00:37:27

of memory. What getParameters

00:37:31

does is, when you call it, it maps all

00:37:35

of the parameters, all the

00:37:38

parameters of your current neural

00:37:41

network, onto a single contiguous

00:37:44

storage. And then it remaps the tensors

00:37:48

of each of these layers onto that

00:37:51

storage using the offsets and the

00:37:54

strides. And what that would give you

00:37:57

is a single vector that you can pass to

00:38:04

your optimisation package, and the

00:38:07

optimisation package... oh, oh,

00:38:10

something happened. And the optimisation

00:38:16

package doesn't have to know whether

00:38:17

there's a neural network or anything

00:38:20

else. It just wants a vector of

00:38:22

parameters that it wants to optimise. And

00:38:24

so the nn package has this call,

00:38:28

getParameters, that will do that for

00:38:29

you: it will remap all your parameters

00:38:32

to a single vector that you can

00:38:34

then pass into the optim package. This

00:38:37

is probably the only hairy detail in the

00:38:41

whole nn and optim thing, but it's a

00:38:44

very important detail and several

00:38:46

people have shot themselves in the foot

00:38:48

in the past using this. Okay, now let's

00:38:56

actually look at how this example, the

00:38:59

same example I've given, is gonna map to

00:39:03

a neural network. So you want to define

00:39:07

this function f of x, where x are

00:39:10

not the input but the parameters of your

00:39:13

network. And you want to compute the

00:39:16

neural network gradients df/dx, which

00:39:22

are the gradients with respect to the

00:39:23

weights, and return them. And then the

00:39:26

SGD step is done after that. So, an

00:39:31

example here: let me call my

00:39:34

function feval, that basically computes

00:39:37

f of x comma w. Let's say it selects

00:39:43

a training example, it loads the training

00:39:46

example; let's say it selects the training

00:39:49

example at random. Right, over here it

00:39:52

actually selects the next

00:39:55

training example in this table

00:39:58

called data. And the inputs are in

00:40:08

sample of one and targets in

00:40:12

sample of two. Inputs are the input to your

00:40:14

neural network; targets are what you

00:40:15

want it to be, or what your

00:40:18

loss function expects to compute the

00:40:22

loss. So you first zero the gradients

00:40:25

with respect to your weights, because if you

00:40:28

had a previous optimisation step,

00:40:31

the gradients are sitting there

00:40:33

accumulated already. So you just zero

00:40:36

the gradients. And the gradients are

00:40:40

accumulated in all of our neural

00:40:42

networks to accommodate batch methods,

00:40:45

when you're not doing, when you can't

00:40:48

compute the batch in one shot. So

00:40:50

you just compute a large batch sample

00:40:53

by sample and the gradients get

00:40:54

accumulated there. And this is very

00:40:57

useful when you're doing memory-hungry

00:41:02

methods of optimisation especially. So

00:41:08

what you do here, after you reset the

00:41:10

gradients, is you call criterion colon

00:41:14

forward (criterion is your loss function) on

00:41:16

model colon forward inputs. Model colon

00:41:19

forward inputs, it returns the output.

00:41:22

So the loss function takes the

00:41:24

output of your neural network and the

00:41:25

target that you wanted it to be; it

00:41:28

computes a loss. And then you call

00:41:31

model colon backward inputs comma the

00:41:36

gradients. So model's backward, which

00:41:39

computes both updateGradInput and

00:41:42

accGradParameters in one shot, it

00:41:44

takes the inputs and the gradients with

00:41:47

respect to the output. So the gradients

00:41:49

with respect to the output that are given

00:41:51

from the backward call of your loss

00:41:53

function, those are passed in as the

00:41:55

second parameter, inputs as the first

00:41:57

parameter. That computes the, that

00:42:02

basically will accumulate into dl_dx the

00:42:06

gradients with respect to the weights.

00:42:08

And then you, sure, you return the

00:42:11

loss that you computed, and dl_dx,

00:42:15

which is the gradients with respect to the

00:42:18

weights. And then that closure that you

00:42:22

just defined is called feval. Right, so

00:42:26

you define your SGD parameters; in this

00:42:28

case, because you're doing SGD:

00:42:30

learning rate, learning rate decay, which

00:42:33

is how much the learning rate has to drop

00:42:35

off per sample, weight decay, momentum.
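
[Editor's sketch] To make the pattern described above concrete, here is a minimal sketch in plain Python rather than Torch's Lua API. It assumes a toy model y = w * inp + b. The names `FlatView`, `get_parameters`, `sgd`, `make_feval` and the config keys are illustrative stand-ins, and the update rule is a simplified SGD, not a copy of `optim.sgd`. It tries to show three things from the talk: layer parameters remapped as views onto one flat vector, a feval closure that zeroes the gradients, runs forward and backward and returns the loss and gradient, and an optimiser that only ever sees that closure and the flat vector.

```python
class FlatView:
    """Window (offset, size) into a shared flat list, like a tensor over a storage."""
    def __init__(self, storage, offset, size):
        self.storage, self.offset, self.size = storage, offset, size

    def __getitem__(self, i):
        return self.storage[self.offset + i]

    def __setitem__(self, i, value):
        self.storage[self.offset + i] = value


def get_parameters(layer_sizes):
    """Allocate one flat list for parameters and one for gradients,
    then give each layer a pair of views into them."""
    total = sum(layer_sizes)
    params, grads = [0.0] * total, [0.0] * total
    views, offset = [], 0
    for size in layer_sizes:
        views.append((FlatView(params, offset, size), FlatView(grads, offset, size)))
        offset += size
    return params, grads, views


def sgd(opfunc, x, config, state):
    """Simplified SGD step; opfunc(x) must return (loss, gradient w.r.t. x)."""
    loss, grad = opfunc(x)
    n = state.get("evals", 0)
    lr = config["learningRate"] / (1 + n * config.get("learningRateDecay", 0.0))
    wd = config.get("weightDecay", 0.0)
    mom = config.get("momentum", 0.0)
    velocity = state.setdefault("velocity", [0.0] * len(x))
    for i in range(len(x)):
        velocity[i] = mom * velocity[i] + grad[i] + wd * x[i]
        x[i] -= lr * velocity[i]  # in place, so the layer views see the update
    state["evals"] = n + 1
    return x, loss


# Toy "network": y = w * inp + b, with w and b owned by two separate "layers".
params, grad_params, views = get_parameters([1, 1])
(w, dw), (b, db) = views
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2 * inp + 1


def make_feval(sample):
    inp, target = sample

    def feval(x):
        # x is the same list object as params, so w and b already read from it.
        for i in range(len(grad_params)):    # zero the accumulated gradients
            grad_params[i] = 0.0
        output = w[0] * inp + b[0]           # forward through the model
        loss = 0.5 * (output - target) ** 2  # forward through the criterion
        d_output = output - target           # backward through the criterion
        dw[0] += d_output * inp              # backward through the model:
        db[0] += d_output                    # gradients are accumulated
        return loss, grad_params

    return feval


# Momentum and weight decay are left at 0 to keep the toy run simple.
sgd_config = {"learningRate": 0.05, "learningRateDecay": 1e-4,
              "weightDecay": 0.0, "momentum": 0.0}
sgd_state = {}
epoch_losses = []
for epoch in range(1000):
    total = 0.0
    for sample in data:
        _, loss = sgd(make_feval(sample), params, sgd_config, sgd_state)
        total += loss
    epoch_losses.append(total)
```

Because the update is done in place on the flat list, the per-layer views `w` and `b` see the new values without any copying, which is the point of the getParameters remapping.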

00:42:38

Um and for all epochs in your training

00:42:42

loop, for all the mini-batches you

00:42:47

have, I guess, you just call optim dot

00:42:51

sgd of that function, um, and the x, which

00:42:56

is the parameters of your network, and

00:43:00

the SGD configuration parameters

00:43:03

which specify learning rate and so on. And

00:43:07

that's it, and the return value here is,

00:43:11

one of them is the loss. And you

00:43:18

just accumulate that and print it out to

00:43:20

make sure that your model is going down

00:43:22

in loss; if your model goes up in loss,

00:43:24

that's not nice. Um that's the optim

00:43:29

package; it might be a little dense. But

00:43:32

it's really really powerful, and if you

00:43:35

don't understand that, at the end of,

00:43:38

like, two or three we will be pointing

00:43:40

you guys to links to three notebooks

00:43:44

that you can go home and work on in

00:43:47

your own time. They will have comments,

00:43:49

they will take you through the basic example

00:43:51

of how to do things. Um and lastly the

00:43:58

threads package. The threads package, um,

00:44:02

so don't laugh at the next slide;

00:44:07

there's a small thing there that's

00:44:09

funny. It's mostly an accident. So we

00:44:14

created the threads package and at

00:44:15

some point I was writing example code

00:44:18

for myself on how to do data loading

00:44:22

using the threads package. And I

00:44:25

called those, the thread pools, donkeys.

00:44:30

And I, like, first open-sourced it; I

00:44:35

never like actually looked into why I

00:44:37

called it donkeys. But many people in

00:44:41

the Torch community actually call data

00:44:43

loading threads donkeys. Um so the

00:44:49

examples here are just screenshots

00:44:51

from my example, so they might have a

00:44:55

variable called donkeys and, like, you

00:44:57

know. So basically the way that the

00:45:00

threads package works is it creates

00:45:02

thread pools; you can submit arbitrary

00:45:06

functions to this thread pool, and that

00:45:09

function will get executed in one

00:45:11

of the threads in that thread pool, and you

00:45:15

can also specify a return callback that

00:45:19

executes in the main thread once the

00:45:21

thread finishes its computation. So
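
[Editor's sketch] The job-plus-callback pattern just described can be illustrated in plain Python with the standard library. This is not the Torch threads API: the `Pool` class, its `add_job`, `synchronize` and `terminate` methods, and the `make_loader` and `train_batch` helpers are hypothetical names chosen for the illustration. One simplification to be aware of: here the callbacks run in submission order when `synchronize` is called, which may differ from how the real package schedules them. Python threads also share objects by default, so nothing in the sketch corresponds to a serialisation mode.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Pool:
    """Tiny thread pool: jobs run in worker threads, callbacks run in the caller."""

    def __init__(self, n_threads, initializer=None):
        # The optional initializer runs once in each worker when it starts.
        self.executor = ThreadPoolExecutor(max_workers=n_threads,
                                           initializer=initializer)
        self.pending = []  # (future, callback) pairs in submission order

    def add_job(self, job, callback=None):
        self.pending.append((self.executor.submit(job), callback))

    def synchronize(self):
        # Wait for every job, then hand its results to the callback
        # in the thread that called synchronize().
        for future, callback in self.pending:
            results = future.result()
            if callback is not None:
                callback(*results)
        self.pending = []

    def terminate(self):
        self.executor.shutdown()


def make_loader(batch_index, batch_size=4):
    def load_batch():
        # Stand-in for reading one mini-batch from disk in a worker thread.
        inputs = [float(batch_index * batch_size + i) for i in range(batch_size)]
        labels = [int(value) % 2 for value in inputs]
        return inputs, labels
    return load_batch


caller_id = threading.get_ident()
received = []


def train_batch(inputs, labels):
    # Callback: record the batch and whether we are back in the calling thread.
    received.append((inputs, labels, threading.get_ident() == caller_id))


donkeys = Pool(n_threads=2)
for batch_index in range(3):
    donkeys.add_job(make_loader(batch_index), train_batch)
donkeys.synchronize()
donkeys.terminate()
```

In a training script the callback would be the place to copy the batch to the device and run one optimisation step, while the workers keep loading the next batches.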

00:45:27

the way you create these threads is

00:45:28

actually very simple: you just ask for

00:45:34

as many threads as you want. You have some

00:45:37

initialisation functions that have to

00:45:38

be run when the thread is initialised;

00:45:41

this can be like loading the functions

00:45:43

that you will call later and so on. And

00:45:48

there is a mode called sharedserialize

00:45:50

which is very powerful in the threads

00:45:53

package. What this does is it shares all

00:45:59

the tensors between threads, between

00:46:02

the main thread and all the worker

00:46:04

threads. And this is really

00:46:08

powerful because, one, when you're

00:46:10

returning tensors from your thread,

00:46:14

thread pool, to your main thread, you

00:46:15

don't have to do a memcpy. It's all very

00:46:18

seamless; you don't have to do

00:46:19

serialisation or deserialisation. And

00:46:22

if you want to do Hogwild training you

00:46:25

can basically just create a thread pool,

00:46:27

pass in your neural network to each of

00:46:28

your threads, and the, the

00:46:32

network will automatically be shared

00:46:34

among all threads, and you can write

00:46:36

your training inside your thread and

00:46:40

it'll be asynchronous Hogwild, and

00:46:42

it's very fast, with like zero overhead;

00:46:45

you don't have to collect the parameters

00:46:48

to a parameter server, do the update and

00:46:50

send them back and so on. Um this is

00:46:54

creating the threads, and the slide

00:46:56

here showcases how you use the threads.

00:47:00

There's one function that's the most

00:47:02

important; it's called addjob. The addjob

00:47:05

function takes an arbitrary closure

00:47:09

that you can define. Um you just, you

00:47:13

just define a computation that you

00:47:15

want to do, and that's the first

00:47:19

argument, and the second argument is a

00:47:23

callback that is run in the main thread

00:47:25

once you finish doing this computation

00:47:29

in the thread, in the data

00:47:33

thread. And in the main thread, in this

00:47:36

case for example, what I did was, in

00:47:41

the data thread it's basically loading a

00:47:43

particular training sample of batch

00:47:46

size, and that sample, that function

00:47:49

here returns inputs and labels, and it

00:47:52

returns inputs and labels back to the

00:47:54

main thread. And in the main thread, the

00:47:59

closure that I defined separately, the

00:48:01

function, you have the inputs and the

00:48:04

labels that are sitting on the CPU that

00:48:07

come in, and these are just some data

00:48:11

logging: how much time it's taking to

00:48:14

load the data and so on. And

00:48:17

here inputs CPU, labels CPU are sitting on

00:48:20

the CPU, they're float tensors; inputs and

00:48:23

labels here are CUDA tensors. I copy

00:48:27

over the contents in the float tensors

00:48:28

over to the CUDA tensors to transfer

00:48:30

them to the GPU. I define my feval,

00:48:34

which is: zero the gradients with

00:48:37

respect to the parameters, forward the, uh,

00:48:39

that's forward the inputs to the model,

00:48:42

get the outputs, forward the outputs and

00:48:45

the labels to the loss function, get the

00:48:47

error, and then compute the gradients

00:48:50

with respect to the outputs of the neural

00:48:52

network, pass them into the neural network

00:48:55

itself. And then return the, the

00:48:58

grad parameters of the neural network and the

00:49:00

loss; and this is defined elsewhere: when

00:49:04

you call getParameters on your neural

00:49:07

network, the parameters and the grad

00:49:09

parameters are returned. And then finally

00:49:12

I call optim.sgd; here you can

00:49:14

replace SGD with your favourite

00:49:16

algorithm for optimisation. So that's

00:49:21

it, and with this basically we went through,

00:49:25

piece by piece, a complete training loop

00:49:28

for almost all cases of neural networks,

00:49:33

except like you know if you do weird

00:49:35

stuff like adversarial training or

00:49:38

something. Like, for almost all

00:49:40

supervised cases at least. Um, so we

00:49:48

went through all the examples and the

00:49:50

last slide there was basically what

00:49:53

we're gonna cover in the next lecture,

00:49:54

after Yoshua Bengio's

00:49:56

generative models. We will do a complete

00:49:59

example. It's only a hundred and sixty

00:50:02

lines-ish, so don't feel intimidated by

00:50:05

a complete example. We will do a

00:50:07

complete example of using nn, optim

00:50:09

and threads for image generation. We

00:50:13

will briefly look at the autograd

00:50:15

package and how to use it. And

00:50:21

then I will finally talk about torchnet,

00:50:23

which is a small helper framework

00:50:28

that is sitting on top of torch and

00:50:30

nn and all these things, to abstract

00:50:34

away common patterns that you do, like

00:50:37

for example data loading, data

00:50:39

augmentation. And all this stuff is

00:50:43

basically code that you copy-paste from

00:50:45

one script to another. And torchnet

00:50:47

kind of implements these for you in a

00:50:51

nice way, and I'll briefly talk about

00:50:54

that. That's the end of this session and

00:50:58

if you have questions I'll take them

00:51:00

now. I did promise you this lecture was

00:51:15

gonna be boring. There's a question there?

00:51:20

No? Okay. So I just wanted to know how

00:51:34

easy it is to use, maybe, layers you

00:51:38

created through nngraph,

00:51:41

I mean in either Sequential or

00:51:43

Parallel or Concat. So it's extremely

00:51:46

natural: when you create the gModule in

00:51:50

the nngraph package, so once you

00:51:53

create, wait, no, it's the next... okay,

00:51:57

once you create the gModule here,

00:52:00

this thing now can be added; it's a

00:52:02

standard layer, it can be added to

00:52:05

containers and so on. Okay, when you say

00:52:08

it's a standard layer, the

00:52:12

parameters are distinct if you add it

00:52:16

multiple times, or the parameters... If

00:52:18

you add, if you add the same

00:52:21

thing multiple times the parameters are

00:52:23

not distinct. Okay. But also the state is

00:52:27

not distinct, and usually you wouldn't

00:52:28

wanna do that. What you can do is

00:52:30

you can call clone on this thing and

00:52:33

that will create a replica. Okay, thank

00:52:36

you. Alright, and thanks for the talk. Um,

00:52:42

what do you normally do for testing and

00:52:44

debugging? So before, like, I used to just

00:52:51

use this debugger called MobDebug;

00:52:55

it's open source, it's installable via,

00:52:58

like, the package manager. Um these days

00:53:01

I'm using the fb.debugger package;

00:53:05

that's also open source. Um and

00:53:09

usually, the fb.debugger package has a

00:53:11

mode where if you hit an error it will

00:53:13

automatically go into the debugger.

00:53:16

And you can then, like, see what's

00:53:18

wrong for example. And that's very

00:53:21

useful. And for testing, I just write

00:53:24

unit tests for like all of my layers

00:53:27

as well. Okay, thanks, thanks a lot.

00:53:45

So I mean I'm a Theano user and what I

00:53:48

find cool is the transparency between

00:53:51

the CPU and GPU: you know, you can run

00:53:56

my code on CPU, just check for

00:53:58

dimensions and stuff, and then if you

00:54:00

have a heavy workload, ship it to the server

00:54:02

or anything like this, and it works. Or do you

00:54:05

just have a bunch of if statements? Um,

00:54:08

you don't have to do an if statement;

00:54:10

you can write one single function

00:54:12

called cast that will, like, cast the whole

00:54:17

neural network to the GPU. Or we have

00:54:23

another function called torch dot

00:54:27

setdefaulttensortype, so that you

00:54:34

can set the default tensor type that

00:54:35

you do operations in, and you can first

00:54:39

check it on CPU by setting the default

00:54:41

tensor type to, for example, float tensor.

00:54:43

And then you can switch it to GPU and

00:54:46

then like all the declarations that you

00:54:48

do with torch.Tensor, like by default

00:54:51

in the neural network, they get created

00:54:54

with the GPU tensor. Thanks.

00:54:58

But what about the memory management,

00:55:07

the transfer from the CPU to GPU? I

00:55:09

mean, is it the responsibility of the user?

00:55:11

Yes. Okay, it's, as I showed in the

00:55:14

last example, yeah, here it is, yeah,

00:55:21

yeah. So as I showed here, the CPU

00:55:24

tensors are not automatically, like, if

00:55:26

you give a CPU tensor as an input to

00:55:28

your GPU module, for example, it will

00:55:30

just error. You have to transfer them

00:55:35

yourself to the GPU. There's no

00:55:37

ambiguity, there are no, like, debugging

00:55:39

issues there. Okay, so, no more

00:55:48

question we're already doing so again

Conference Program

59:34

Deep Supervised Learning of Representations
Yoshua Bengio, University of Montreal, Canada
July 4, 2016 · 2:01 p.m.

2368 views

55:38

Hardware & software update from NVIDIA, Enabling Deep Learning
Alison B Lowndes, NVIDIA
July 4, 2016 · 3:20 p.m.

427 views

01:01:02

Day 1 - Questions and Answers
Panel
July 4, 2016 · 4:16 p.m.

331 views

55:14

Torch 1
Soumith Chintala, Facebook
July 5, 2016 · 10:02 a.m.

815 views

55:57

Torch 2
Soumith Chintala, Facebook
July 5, 2016 · 11:21 a.m.

342 views

01:08:04

Deep Generative Models
Yoshua Bengio, University of Montreal, Canada
July 5, 2016 · 1:59 p.m.

2156 views

49:29

Torch 3
Soumith Chintala, Facebook
July 5, 2016 · 3:28 p.m.

275 views

52:43

Day 2 - Questions and Answers
Panel
July 5, 2016 · 4:21 p.m.

151 views

45:40

TensorFlow 1
Mihaela Rosca, Google
July 6, 2016 · 10 a.m.

2659 views

52:33

TensorFlow 2
Mihaela Rosca, Google
July 6, 2016 · 11:19 a.m.

1704 views

01:05:51

AMD's Open Compute and Open Source cross platform solutions for Machine Learning
Mauricio Breternitz, AMD
July 6, 2016 · 1:59 p.m.

1406 views

01:04:41

TensorFlow 3 and Day 3 Questions and Answers session
Mihaela Rosca, Google
July 6, 2016 · 3:21 p.m.

2251 views
