System identification has a long history with several well-established methods, in particular for learning linear dynamical systems from input/output data. While the asymptotic properties of these methods are well understood as the number of data points goes to infinity or the noise level tends to zero, how well they estimate in the finite-data regime is relatively less studied. This talk will mainly focus on our analysis of the robustness of the classical Ho-Kalman algorithm and how it translates into non-asymptotic estimation error bounds as a function of the number of data samples. If time permits, I will also mention two other problems we study at the intersection of learning, control, and optimization: (i) learning constraints from demonstrations as an alternative to inverse optimal control, and (ii) learning Markov jump linear systems with a reduced discrete state-space size. I will conclude with some open problems and directions for future research.
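For readers unfamiliar with the Ho-Kalman algorithm mentioned above, here is a minimal sketch of the classical noiseless version: build a block Hankel matrix from the system's Markov parameters, take a rank-n SVD, and read off a balanced realization. The system matrices (A, B, C) and dimensions below are arbitrary illustrative choices, not from the talk, and the talk's subject (robustness under noisy, finite data) is not modeled here.

```python
import numpy as np

# Illustrative true system (assumed for this sketch, not from the talk)
n, m, p = 2, 1, 1                       # state, input, output dimensions
A = np.array([[0.9, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

# Markov parameters h_{k+1} = C A^k B (noiseless; the talk studies the noisy case)
T = 5
H = [C @ np.linalg.matrix_power(A, k) @ B for k in range(2 * T)]

# Block Hankel matrix and its one-step time shift
Hank  = np.block([[H[i + j]     for j in range(T)] for i in range(T)])
Shift = np.block([[H[i + j + 1] for j in range(T)] for i in range(T)])

# Rank-n truncated SVD gives balanced observability/controllability factors
U, s, Vt = np.linalg.svd(Hank)
U, s, Vt = U[:, :n], s[:n], Vt[:n, :]
Obs  = U * np.sqrt(s)                   # observability factor
Ctrl = np.sqrt(s)[:, None] * Vt        # controllability factor

# Recover a realization (unique only up to similarity transform)
A_hat = np.linalg.pinv(Obs) @ Shift @ np.linalg.pinv(Ctrl)
B_hat = Ctrl[:, :m]
C_hat = Obs[:p, :]

# Markov parameters are invariant under similarity, so compare those
H_hat = [C_hat @ np.linalg.matrix_power(A_hat, k) @ B_hat
         for k in range(2 * T)]
err = max(float(np.abs(h - hh).max()) for h, hh in zip(H, H_hat))
print(err < 1e-8)
```

With exact Markov parameters the recovery is exact up to numerical error; the talk's analysis concerns how this degrades when the Hankel matrix is formed from finitely many noisy samples.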