I'm making a basic feedforward neural network to solve the XOR gate problem.
Standard settings: input layer + hidden layer + output layer, a constant learning rate of 0.01, and 500 epochs.
Sigmoid activation throughout. Training uses backpropagation with stochastic gradient descent.
The hidden layer has 2 neurons. The input and output data:
input = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
output = [[0.0], [1.0], [1.0], [0.0]]
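Written out in matrix form, the forward pass for one sample looks roughly like this. It is only a sketch, assuming the row-vector convention used in the code further down (one sample per row); the shapes follow from 2 inputs, 2 hidden neurons and 1 output, and the symbols $W_0, b_0, W_1, b_1$ stand for weights0, biases0, weights1, biases1:

```latex
\begin{aligned}
a_0 &\in \mathbb{R}^{1\times 2}, \quad
W_0 \in \mathbb{R}^{2\times 2}, \quad
b_0 \in \mathbb{R}^{1\times 2}, \quad
W_1 \in \mathbb{R}^{2\times 1}, \quad
b_1 \in \mathbb{R}^{1\times 1} \\
z_1 &= a_0 W_0 + b_0, \qquad a_1 = \sigma(z_1) \\
z_2 &= a_1 W_1 + b_1, \qquad a_2 = \sigma(z_2) \\
\sigma(x) &= \frac{1}{1 + e^{-x}}
\end{aligned}
```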
Now here's the problem: as I understand it, the bias is a (column) vector, and each cycle (forward + backward pass) is run on a single sample. When I train that way, the predictions look like this:
( 0.4954120458511844 )
( 0.5081637529087711 )
( 0.5153967874989785 )
( 0.5653967874989785 )
By comparison, when I make the bias a matrix (with input.rows rows) and train on the full data set in each cycle, the predictions are:
⎛ 0.18379659987542804 ⎞
⎜ 0.8220424701617579 ⎥
⎜ 0.8217815808742437 ⎥
⎝ 0.18653256456589742 ⎠
Which are the correct ones?
I can post the full code here, but I am fairly certain the problem comes from the biases; I just don't know why.
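To make the two bias layouts concrete, here is a minimal sketch in plain Swift (no Surge). It assumes the common convention that each neuron has one bias value shared by every sample, so a batch of N rows reuses the same 1 x n bias row instead of storing N separate rows. The helpers `addBias` and `biasGradient` are hypothetical names made up for this illustration, and the numbers are arbitrary:

```swift
// Hypothetical helper (not part of Surge): adds the same 1 x n bias row
// to every row of an N x n pre-activation matrix.
func addBias(_ z: [[Double]], _ bias: [Double]) -> [[Double]] {
    return z.map { row in
        zip(row, bias).map { pair in pair.0 + pair.1 }
    }
}

// Hypothetical helper: collapses an N x n matrix of per-sample bias
// gradients into a single 1 x n row by averaging over the N samples.
func biasGradient(_ dz: [[Double]]) -> [Double] {
    let n = Double(dz.count)
    var sum = [Double](repeating: 0.0, count: dz[0].count)
    for row in dz {
        for (k, value) in row.enumerated() {
            sum[k] += value
        }
    }
    return sum.map { $0 / n }
}

// 4 samples, 2 hidden neurons (arbitrary illustrative values).
let z: [[Double]] = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
let bias: [Double] = [0.5, -0.5]
let shifted = addBias(z, bias)
print(shifted)

let dz: [[Double]] = [[0.25, -0.5], [0.75, 0.5], [-0.25, 1.0], [0.25, 1.0]]
let db = biasGradient(dz)
print(db)
```

With this layout the bias keeps the shape 1 x neurons in both the per-sample and the full-batch case; only the gradient is reduced over the rows before the update.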
EDIT: As I said in the comments, the cause may be in the backpropagation part (stochastic gradient descent). Here's the full code (yes, it's in Swift, don't ask why); I am using the Surge matrix library.
It's long, though:
import Surge
// XOR TABLE DATA
let inputDataAsArray: [[Double]] = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let outputDataAsArray: [[Double]] = [[0.0], [1.0], [1.0], [0.0]]
let inputData: Matrix = Matrix(inputDataAsArray)
let outputData: Matrix = Matrix(outputDataAsArray)
var inputData_samples : Array<Matrix<Double>> = Array()
var outputData_samples : Array<Matrix<Double>> = Array()
for i in 0..<inputDataAsArray.count {
inputData_samples.append(Matrix<Double>([inputDataAsArray[i]]))
outputData_samples.append(Matrix([outputDataAsArray[i]]))
}
let size = inputData.rows
let neurons = 2 // NUMBER OF NEURONS IN HIDDEN LAYER
var weights0 : Matrix = random(rows: inputData.columns, columns: neurons)
var biases0 : Matrix = Matrix(rows: 1, columns: neurons, repeatedValue: 0.0)
var weights1 : Matrix = random(rows: neurons, columns: outputData.columns)
var biases1 : Matrix = Matrix(rows: 1, columns: outputData.columns, repeatedValue: 0.0)
print("Running...")
let alpha = 0.01
let loops = size * 500
var sampleIndex = 0
for i in 0..<loops {
let j = Int.random(in: ClosedRange<Int>(uncheckedBounds: (lower: 0, upper: size - 1)))
let a0 = inputData_samples[j]
let output = outputData_samples[j]
let z1: Matrix = a0 * weights0 + biases0
let a1: Matrix = sigmoidMatrix(x: z1)
// LAYER 2
let z2 : Matrix = a1 * weights1 + biases1
let a2 : Matrix = sigmoidMatrix(x: z2)
// let cost = cross_entropy(size: size, a: a2, y: output)
// BACKPROPAGATION
// LAYER 2
var dz2 : Matrix = subtractMatrix(x: a2, y: output)
let dw2 : Matrix = divideMatrix(x: transpose(a1) * dz2 , y: size)
let db2 : Matrix = divideMatrix(x: dz2, y: size)
// LAYER 1
dz2 = dz2 * transpose(weights1)
let dz1 : Matrix = sub(y: 1.0, x: a0)
* transpose(a0) * dz2 // multiply(x: part1, y: sub(y: 1.0, x: part2))
let dw1 : Matrix = divideMatrix(x: transpose(a0) * dz1 , y: size)
let db1 : Matrix = divideMatrix(x: dz1, y: size)
weights0 = subtractMatrix(x: weights0, y: mul(alpha, x: dw1))
biases0 = subtractMatrix(x: biases0, y: mul(alpha, x: db1))
weights1 = subtractMatrix(x: weights1, y: mul(alpha, x: dw2))
biases1 = subtractMatrix(x: biases1, y: mul(alpha, x: db2))
}
for sample in inputData_samples{
let z1: Matrix = sample * weights0 + biases0
let a1: Matrix = sigmoidMatrix(x: z1)
let z2 : Matrix = a1 * weights1 + biases1
let a2 : Matrix = sigmoidMatrix(x: z2)
print(a2.description)
}
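For comparison with the backpropagation section of the code, the textbook single-sample gradients for this architecture can be written as below. This is a sketch under two assumptions: the loss is binary cross-entropy (as the commented-out cost line suggests), and row vectors are used as in the code. Here $\delta_2, \delta_1$ correspond to dz2, dz1, $y$ is the target, $\alpha$ is the learning rate, and $\odot$ is the element-wise product:

```latex
\begin{aligned}
L &= -\bigl[\, y \log a_2 + (1 - y)\log(1 - a_2) \,\bigr] \\
\delta_2 &= a_2 - y \\
\frac{\partial L}{\partial W_1} &= a_1^{\top}\,\delta_2,
\qquad \frac{\partial L}{\partial b_1} = \delta_2 \\
\delta_1 &= \bigl(\delta_2\, W_1^{\top}\bigr) \odot a_1 \odot (1 - a_1) \\
\frac{\partial L}{\partial W_0} &= a_0^{\top}\,\delta_1,
\qquad \frac{\partial L}{\partial b_0} = \delta_1 \\
W_k &\leftarrow W_k - \alpha\,\frac{\partial L}{\partial W_k},
\qquad b_k \leftarrow b_k - \alpha\,\frac{\partial L}{\partial b_k},
\qquad k \in \{0, 1\}
\end{aligned}
```

The factor $a_1 \odot (1 - a_1)$ is the derivative of the sigmoid evaluated at the hidden layer, so it depends on the hidden activations rather than on the input $a_0$.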