02L – Modules and architectures

Course website: http://bit.ly/DLSP21-web
Playlist: http://bit.ly/DLSP21-YouTube
Speaker: Yann LeCun

Chapters
00:00:00 – Welcome to class
00:00:38 – Non-linear functions
00:14:34 – Q&A
00:28:09 – Softargmax and softargmin
00:38:10 – Logsoftargmax
00:47:14 – Cost functions
00:58:39 – Architectures: multiplicative interaction
01:09:48 – Mixture of experts
01:27:50 – Parameter transformations

#PyTorch #NYU #YannLeCun #DeepLearning #NeuralNetworks
Date: July 14, 2021

Comments

Comment @ 00:08:33: One of the reasons ReLU works better in deep networks than, say, sigmoid is that in the backward pass the gradient shrinks after each sigmoid non-linearity (it is multiplied by at most 0.25), whereas with ReLU-like non-linearities the gradient does not shrink from layer to layer (the derivative is exactly 1 on the positive side).
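The shrinking-gradient argument in this comment is easy to check numerically. A minimal sketch in plain Python (the function names and the 10-layer depth are mine, chosen for illustration):

```python
import math

def sigmoid_grad(x):
    # derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25 (attained at x = 0)
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    # derivative of ReLU: exactly 1 on the positive side, 0 otherwise
    return 1.0 if x > 0 else 0.0

# Gradient factor contributed by the non-linearities of a 10-layer stack,
# evaluated in the best case for sigmoid (x = 0) and at a positive input for ReLU.
depth = 10
sigmoid_factor = sigmoid_grad(0.0) ** depth  # 0.25 ** 10 ≈ 1e-6: vanishing
relu_factor = relu_grad(1.0) ** depth        # 1.0 ** 10 = 1.0: preserved

print(sigmoid_factor, relu_factor)
```

Even in the most favorable case, ten sigmoids multiply the gradient by about one millionth, while ten active ReLUs leave it untouched.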
Comment @ 00:59:02: @Alfredo Canziani ~ the topic here is "multiplicative modules", thank you.
Comment @ 01:14:38: Yann says we can do non-linear classification with a mixture of gated linear classifiers; isn't that still a linear classifier? What is it that makes the classification non-linear?
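One way to see the answer: each expert is linear in the input, but the gating weights themselves depend on the input, so their weighted combination is not linear. A toy 1-D sketch (the particular experts and gating scores are made up for illustration, not from the lecture):

```python
import math

def softargmax(zs):
    # turn gating scores into positive weights that sum to 1
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def gated_mixture(x):
    # two linear "experts": identity and negation
    experts = [x, -x]
    # the gating scores are also linear in x, but softargmax makes the
    # gate weights g(x) a non-linear function of the input
    g = softargmax([5.0 * x, -5.0 * x])
    return sum(gi * ei for gi, ei in zip(g, experts))

# If the mixture were linear, f(1) + f(-1) would equal f(0) = 0.
# Instead f(1) + f(-1) ≈ 2: the input-dependent gate makes the overall
# map behave roughly like |x|, which no single linear map can do.
print(gated_mixture(1.0) + gated_mixture(-1.0), gated_mixture(0.0))
```

Because the gate re-weights the experts differently in different regions of the input space, the mixture can realize decision boundaries that no single linear classifier can.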
Comment @ 01:31:21: What is the meaning of the update rule when the parameter vector is the output of a function? As the name implies, w is the output of a function, so how can you update the output?
Comment @ 01:33:00: Do the ideas for the update rules for w in this example come from $dw = \frac{\partial H}{\partial u} \cdot du$?
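Both questions point at the same mechanism: w is never updated directly; gradient descent runs on u, and w then changes as a consequence, via exactly the chain-rule relation the second question mentions. A sketch under that reading (here $H$ maps $u$ to $w$ as in the comments, $C$ is the cost, $\eta$ the learning rate, and $\partial H / \partial u$ the Jacobian):

```latex
% w = H(u) is not a free parameter, so descent is performed on u:
u \leftarrow u - \eta \left(\frac{\partial H}{\partial u}\right)^{\top} \frac{\partial C}{\partial w}

% To first order, the induced change in w follows dw = (dH/du) du:
\Delta w \approx \frac{\partial H}{\partial u}\,\Delta u
= -\eta\, \frac{\partial H}{\partial u} \left(\frac{\partial H}{\partial u}\right)^{\top} \frac{\partial C}{\partial w}
```

So the "update of the output" is only an effective update: it is whatever change in $w$ falls out of the step taken on the true parameter $u$.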

Channel: Alfredo Canziani