
Machine Learning - Coursera Andrew Ng's Machine Learning Course, Week 5 Study Notes

Date: 2024-06-01 20:20:56


Neural Network Cost Function

Definitions

L = total number of layers in the network

s_l = number of units in layer l (not counting the bias unit)

K = number of output units/classes

Cost function of ordinary (regularized) logistic regression:

J(θ) = -(1/m) Σ_{i=1..m} [ y^(i) log(h_θ(x^(i))) + (1 - y^(i)) log(1 - h_θ(x^(i))) ] + (λ/2m) Σ_{j=1..n} θ_j²

Cost function of a neural network:

J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log((h_Θ(x^(i)))_k) + (1 - y_k^(i)) log(1 - (h_Θ(x^(i)))_k) ] + (λ/2m) Σ_{l=1..L-1} Σ_{i=1..s_l} Σ_{j=1..s_{l+1}} (Θ_ji^(l))²

In the regularization term at the end, each Θ matrix has:

number of columns = number of nodes in the current layer (including the bias unit)
number of rows = number of nodes in the next layer (not including the bias unit)

For example, if the current layer has 10 units and the next layer has 5, the corresponding Θ matrix is 5 × 11.

Backpropagation

First do a forward pass:

a^(1) = x
z^(l+1) = Θ^(l) a^(l),   a^(l+1) = g(z^(l+1)),   for l = 1, 2, ..., L-1

Then work backwards to compute the "error" terms δ:

δ^(L) = a^(L) - y
δ^(l) = ((Θ^(l))' δ^(l+1)) .* g'(z^(l)),   for l = L-1, ..., 2
Δ^(l) := Δ^(l) + δ^(l+1) (a^(l))'

D: the matrix obtained from the Δ accumulators; it turns out to be exactly the partial derivative of J(Θ):

D^(l)_ij = (1/m) Δ^(l)_ij + (λ/m) Θ^(l)_ij,   if j ≠ 0
D^(l)_ij = (1/m) Δ^(l)_ij,                    if j = 0
∂J(Θ)/∂Θ^(l)_ij = D^(l)_ij

This is somewhat involved. For now, don't dig into every detail; learn how to use it first.

Backpropagation looks a lot like forward propagation, only running in the opposite direction.
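To make the two passes concrete, here is a minimal per-example Octave sketch for a 3-layer network, following the lecture formulas (sigmoid and sigmoidGradient are the helper functions from the assignment below; Delta1 and Delta2 are accumulators pre-initialized to zeros):

% Forward pass for one training example (x is a column vector, y is a one-hot column vector)
a1 = [1; x];                      % add the bias unit
z2 = Theta1 * a1;
a2 = [1; sigmoid(z2)];            % add the bias unit
z3 = Theta2 * a2;
a3 = sigmoid(z3);                 % = h_theta(x)

% Backward pass: error terms, from the output layer back towards the input
delta3 = a3 - y;
delta2 = (Theta2' * delta3) .* [1; sigmoidGradient(z2)];
delta2 = delta2(2:end);           % drop the bias component

% Accumulate the gradients over all m examples
Delta1 = Delta1 + delta2 * a1';
Delta2 = Delta2 + delta3 * a2';
% After the loop: D = Delta/m, plus (lambda/m)*Theta for the non-bias columns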

Implementing Backpropagation

Unrolling Parameters

Unroll the parameter matrices into a single vector to pass as one input:

thetaVector = [ Theta1(:); Theta2(:); Theta3(:) ]
deltaVector = [ D1(:); D2(:); D3(:) ]

Inside the function, reshape them back into the individual matrices:

Theta1 = reshape(thetaVector(1:110), 10, 11)
Theta2 = reshape(thetaVector(111:220), 10, 11)
Theta3 = reshape(thetaVector(221:231), 1, 11)

Its use in backpropagation looks as follows (see the sketch below).
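Roughly, the pattern is a cost-function wrapper that rebuilds the matrices from the unrolled vector, runs forward/backpropagation, and unrolls the resulting gradient matrices again before handing everything to an advanced optimizer. A sketch (fminunc is Octave's built-in optimizer; costFunction, D1–D3 and initialThetaVec are placeholder names; the sizes match the example above):

function [jVal, gradVec] = costFunction(thetaVec)
  % Rebuild the weight matrices from the unrolled parameter vector
  Theta1 = reshape(thetaVec(1:110),   10, 11);
  Theta2 = reshape(thetaVec(111:220), 10, 11);
  Theta3 = reshape(thetaVec(221:231), 1,  11);
  % ... forward/backpropagation to compute jVal and the gradient matrices D1, D2, D3 ...
  % Unroll the gradient matrices so the optimizer receives a vector
  gradVec = [D1(:); D2(:); D3(:)];
end

% The optimizer works entirely with unrolled vectors:
options = optimset('GradObj', 'on', 'MaxIter', 100);
optTheta = fminunc(@costFunction, initialThetaVec, options);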

Gradient Checking

Approximating the derivative numerically:

dJ(θ)/dθ ≈ ( J(θ + ε) - J(θ - ε) ) / (2ε)

For a θ vector with many components, the same approximation can be applied to each partial derivative in turn:

∂J(θ)/∂θ_j ≈ ( J(θ_1, ..., θ_j + ε, ..., θ_n) - J(θ_1, ..., θ_j - ε, ..., θ_n) ) / (2ε)

When ε is small enough (e.g. ε = 10^-4), this gives a good approximation of the true derivative.

Pseudocode for computing gradApprox:

epsilon = 1e-4;
for i = 1:n,
  thetaPlus = theta;
  thetaPlus(i) += epsilon;
  thetaMinus = theta;
  thetaMinus(i) -= epsilon;
  gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2*epsilon);
end;

gradApprox can then be compared against DVec (the unrolled gradient computed by backpropagation) to check that your own implementation is correct.

Gradient checking done this way is computationally expensive, so remember to turn it off before the actual training run.
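A rough sketch of the comparison (gradApprox from the loop above; DVec is the unrolled gradient from backpropagation):

% Relative difference between the numerical and the backprop gradient;
% a value around 1e-9 or smaller suggests the backprop code is correct.
diff = norm(gradApprox - DVec) / norm(gradApprox + DVec);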

Random Initialization of θ

If all the θ_ij values leaving each node of the previous layer are identical (for example, all zero), then what each node passes on to the next layer is identical too, so every activation a in the next layer ends up the same. Going further, the partial derivatives computed during backpropagation are also identical, so after every update the activations in that layer are still all equal. In effect, the later layers are left with only a single feature.

We need to break this symmetry by initializing every element of each θ matrix to a random value, e.g. uniformly in [-ε, ε]:

In pseudocode:

% If Theta1 is 10x11, Theta2 is 10x11 and Theta3 is 1x11:
Theta1 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta2 = rand(10,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
Theta3 = rand(1,11) * (2 * INIT_EPSILON) - INIT_EPSILON;
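One common heuristic for choosing INIT_EPSILON is to scale it with the sizes of the two layers the matrix connects (for this assignment's 400-unit input and 25-unit hidden layer this comes out to roughly 0.12):

% L_in and L_out are the numbers of units in the layers that this Theta matrix connects
INIT_EPSILON = sqrt(6) / sqrt(L_in + L_out);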

Overall Workflow

Architecture Design

Number of input units: the dimensionality of the features x
Number of output units: for multi-class classification, equal to the number of classes
Number of hidden units per layer: more usually works better, but costs more computation, so there is a trade-off
Default: 1 hidden layer; if there is more than 1 hidden layer, use the same number of units in every hidden layer

Training Procedure

1. Randomly initialize the weights (θ)
2. Forward propagate to compute h_θ(x)
3. Compute the cost function
4. Backpropagate to compute the partial derivatives
5. Use gradient checking to confirm that backpropagation is correct, then turn gradient checking off
6. Use gradient descent or an advanced optimization algorithm to minimize the cost function and find the best θ

Loop over the training examples, performing forward and backward propagation for each:

for i = 1:m,
  Perform forward propagation and backpropagation using example (x(i), y(i))
  (Get activations a(l) and delta terms d(l) for l = 2, ..., L)
end;

Putting it together, the overall training process of the neural network looks roughly as follows.
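A minimal Octave sketch of these steps put together, assuming X, y and the layer sizes from the assignment below are already in the workspace (fmincg is the optimizer provided with the exercise; nnCostFunction and randInitializeWeights are listed below):

% 1. Randomly initialize the weights and unroll them into a single vector
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];

% 2-4. Forward propagation, cost and backpropagation all live in nnCostFunction
lambda = 1;
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% 5. (Run gradient checking once here to verify the gradients, then disable it.)

% 6. Minimize the cost function with an advanced optimizer
options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Reshape the learned parameters back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));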

Assignment

The dimensions of each θ matrix are (number of new features) × (number of old features + 1), because θ's job is to compute the new features from the old ones plus the bias unit. For example:

% Theta1 has size 25 x 401
% Theta2 has size 10 x 26

The assignment is fairly challenging; the backpropagation part of the code referenced /everpeace/ml-class-assignments.

nnCostFunction.m

function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTON(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m
%
% Part 2: Implement the backpropagation algorithm to compute the gradients
%         Theta1_grad and Theta2_grad. You should return the partial derivatives of
%         the cost function with respect to Theta1 and Theta2 in Theta1_grad and
%         Theta2_grad, respectively. After implementing Part 2, you can check
%         that your implementation is correct by running checkNNGradients
%
%         Note: The vector y passed into the function is a vector of labels
%               containing values from 1..K. You need to map this vector into a
%               binary vector of 1's and 0's to be used with the neural network
%               cost function.
%
%         Hint: We recommend implementing backpropagation using a for-loop
%               over the training examples if you are implementing it for the
%               first time.
%
% Part 3: Implement regularization with the cost function and gradients.
%
%         Hint: You can implement this around the code for
%               backpropagation. That is, you can compute the gradients for
%               the regularization separately and then add them to Theta1_grad
%               and Theta2_grad from Part 2.
%

% Y = zeros(m, num_labels); % m x num_labels == 5000 x 10
% for i = 1:m,
%   Y(i, y(i)) = 1;
% end
Y = (1:num_labels) == y;   % m x num_labels == 5000 x 10

a1 = [ones(m, 1) X];       % 5000 x 401
z2 = a1 * Theta1';         % m x hidden_layer_size == 5000 x 25
a2 = sigmoid(z2);          % m x hidden_layer_size == 5000 x 25
a2 = [ones(m, 1), a2];     % 5000 x 26
z3 = a2 * Theta2';         % m x num_labels == 5000 x 10
a3 = sigmoid(z3);          % m x num_labels == 5000 x 10
h = a3;                    % m x num_labels == 5000 x 10

% calculate penalty
p = sum(sum(Theta1(:, 2:end).^2, 2)) + sum(sum(Theta2(:, 2:end).^2, 2));

% calculate J
J = sum(sum((-Y).*log(h) - (1-Y).*log(1-h), 2))/m + lambda*p/(2*m);  % scalar

% calculate sigmas
sigma3 = a3 - Y;                                                         % 5000 x 10
sigma2 = (sigma3*Theta2) .* sigmoidGradient([ones(size(z2, 1), 1) z2]);  % 5000 x 26
sigma2 = sigma2(:, 2:end);                                               % 5000 x 25

% accumulate gradients
delta_1 = (sigma2'*a1);    % 25 x 401
delta_2 = (sigma3'*a2);    % 10 x 26

% calculate regularized gradient
p1 = (lambda/m)*[zeros(size(Theta1, 1), 1) Theta1(:, 2:end)];
p2 = (lambda/m)*[zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
Theta1_grad = delta_1./m + p1;   % 25 x 401
Theta2_grad = delta_2./m + p2;   % 10 x 26

% -------------------------------------------------------------

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end

sigmoidGradient.m

function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).

g = sigmoid(z) .* (1 - sigmoid(z));

% =============================================================

end
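A quick sanity check: the sigmoid's gradient at z = 0 should be exactly 0.25, and the function works elementwise on vectors and matrices:

sigmoidGradient(0)           % ans = 0.25000
sigmoidGradient([-1 0 1])    % ans = 0.19661   0.25000   0.19661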

randInitializeWeights.m

function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias unit
%

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% =========================================================================

end
