Reinforcement learning for optimization of variational quantum circuit architectures