How LSTM networks solve the problem of vanishing gradients