There is a bit of glossing over a detail here that I see confuse a number of people posting on Stack Overflow and Data Science Stack Exchange: you don't backpropagate the *error* value per se, but the gradient of the error with respect to each parameter. This is made more confusing for software devs implementing backpropagation because the usual design of neural nets cleverly pairs the loss function with the output layer's activation, so that the derivative is numerically equal to the error (but only at the pre-activation stage of the output layer). It really matters to understand the difference, because in the general case the two are not equal, and some developers end up "cargo culting" apparently magic manipulations of the error because they miss this small distinction. (00:04:40 - 00:05:29)
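To make the distinction concrete, here is a minimal sketch (my own illustration in Python/NumPy, not from the video) for a single sigmoid output unit. The gradient at the pre-activation stage collapses to the raw error only when the loss is matched to the activation (sigmoid with binary cross-entropy); swap in mean squared error and an extra sigmoid-derivative factor remains:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: pre-activation z, target y, prediction y_hat.
z, y = 0.3, 1.0
y_hat = sigmoid(z)

# Case 1: binary cross-entropy paired with a sigmoid output.
#   L = -(y*log(y_hat) + (1-y)*log(1-y_hat))
# The sigmoid' factor cancels algebraically, so dL/dz is just the error:
grad_bce = y_hat - y

# Case 2: mean squared error with the same sigmoid output.
#   L = 0.5 * (y_hat - y)**2
# No cancellation; the sigmoid' factor y_hat*(1-y_hat) stays:
grad_mse = (y_hat - y) * y_hat * (1.0 - y_hat)

print(grad_bce)  # equals the raw error y_hat - y
print(grad_mse)  # differs: the error scaled by sigmoid'(z)
```

In case 1 backpropagating the "error" happens to give the right answer; in case 2 it does not, which is exactly why the gradient, not the error, is the quantity that flows backward.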
Backpropagation in 5 Minutes (tutorial)
Siraj Raval