The excellent Susan Athey addresses that question on Quora. Here is one excerpt:
Machine learning is a broad term; I’m going to use it fairly narrowly here. Within machine learning, there are two branches, supervised and unsupervised machine learning. Supervised machine learning typically entails using a set of “features” or “covariates” (x’s) to predict an outcome (y). There are a variety of ML methods, such as LASSO (see Victor Chernozhukov (MIT) and coauthors who have brought this into economics), random forest, regression trees, support vector machines, etc. One common feature of many ML methods is that they use cross-validation to select model complexity; that is, they repeatedly estimate a model on part of the data and then test it on another part, and they find the “complexity penalty term” that fits the data best in terms of mean-squared error of the prediction (the squared difference between the model prediction and the actual outcome). In much of cross-sectional econometrics, the tradition has been that the researcher specifies one model and then checks “robustness” by looking at 2 or 3 alternatives. I believe that regularization and systematic model selection will become a standard part of empirical practice in economics as we more frequently encounter datasets with many covariates, and also as we see the advantages of being systematic about model selection.
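To make the cross-validation step concrete, here is a minimal sketch in plain Python; it is an illustration, not code from Athey's answer. It fits a LASSO by coordinate descent on simulated data and picks the complexity penalty with the lowest average held-out mean-squared error. The data-generating process, the penalty grid, the fold count, and every function name (`fit_lasso`, `cv_mse`, and so on) are assumptions made for the example.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso(X, y, penalty, n_sweeps=50):
    """Coordinate descent for (1/2n)*sum((y - X.b)^2) + penalty*sum(|b|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X.beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, penalty) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new_beta
    return beta

def predict(row, beta):
    return sum(x * b for x, b in zip(row, beta))

def cv_mse(X, y, penalty, k=5):
    """Average held-out mean-squared error over k folds."""
    n = len(X)
    fold_mses = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_idx = [i for i in range(n) if i not in test_idx]
        beta = fit_lasso([X[i] for i in train_idx], [y[i] for i in train_idx], penalty)
        sq_errors = [(y[i] - predict(X[i], beta)) ** 2 for i in test_idx]
        fold_mses.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_mses) / k

# Simulated data (an assumption): 10 covariates, only the first two matter.
rng = random.Random(0)
n_obs, n_features = 100, 10
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_obs)]
y = [2.0 * row[0] - 1.0 * row[1] + rng.gauss(0, 1) for row in X]

# Hypothetical grid of complexity penalties; keep the one with the lowest CV error.
penalty_grid = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
cv_scores = {lam: cv_mse(X, y, lam) for lam in penalty_grid}
best_penalty = min(cv_scores, key=cv_scores.get)
best_beta = fit_lasso(X, y, best_penalty)

print("CV error by penalty:", cv_scores)
print("selected penalty:", best_penalty)
print("coefficients at selected penalty:", best_beta)
```

The point of the sketch is the selection rule: the penalty is chosen by held-out prediction error alone, which contrasts with specifying one model and checking two or three alternatives by hand.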
…in general ML prediction models are built on a premise that is fundamentally at odds with a lot of social science work on causal inference. The foundation of supervised ML methods is that model selection (cross-validation) is carried out to optimize goodness of fit on a test sample. A model is good if and only if it predicts well. Yet, a cornerstone of introductory econometrics is that prediction is not causal inference, and indeed a classic economic example is that in many economic datasets, price and quantity are positively correlated. Firms set prices higher in high-income cities where consumers buy more; they raise prices in anticipation of times of peak demand. A large body of econometric research seeks to REDUCE the goodness of fit of a model in order to estimate the causal effect of, say, changing prices. If prices and quantities are positively correlated in the data, any model that estimates the true causal effect (quantity goes down if you change price) will not do as good a job fitting the data. The place where the econometric model with a causal estimate would do better is at fitting what happens if the firm actually changes prices at a given point in time—at doing counterfactual predictions when the world changes. Techniques like instrumental variables seek to use only some of the information that is in the data – the “clean” or “exogenous” or “experiment-like” variation in price—sacrificing predictive accuracy in the current environment to learn about a more fundamental relationship that will help make decisions about changing price. This type of model has not received almost any attention in ML.
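The price and quantity example can be simulated in a few lines. The sketch below is an illustration under stated assumptions, not Athey's own model: a linear demand curve with a true price effect of -1, an unobserved demand shock that firms respond to when setting prices, and a cost shock used as the instrument. All names and parameter values are hypothetical. It compares the ordinary least squares slope with the simple instrumental-variables (Wald) slope, both in sample and on counterfactual data where prices are set independently of demand.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def prediction_mse(slope, intercept, prices, quantities):
    return mean([(q - (intercept + slope * p)) ** 2 for p, q in zip(prices, quantities)])

rng = random.Random(0)
n = 5000
true_effect = -1.0  # assumed causal effect of price on quantity

# Observed data: firms charge more where (unobserved) demand is high.
demand, cost, price, quantity = [], [], [], []
for _ in range(n):
    d = rng.gauss(0, 1)  # unobserved demand shock, e.g. local income
    c = rng.gauss(0, 1)  # cost shock: moves price, unrelated to demand
    p = 1.5 * d + c + rng.gauss(0, 0.5)
    q = true_effect * p + 3.0 * d + rng.gauss(0, 0.5)
    demand.append(d); cost.append(c); price.append(p); quantity.append(q)

# Predictive slope (OLS) versus causal slope (IV, cost as the instrument).
ols_slope = cov(price, quantity) / cov(price, price)
iv_slope = cov(cost, quantity) / cov(cost, price)
ols_intercept = mean(quantity) - ols_slope * mean(price)
iv_intercept = mean(quantity) - iv_slope * mean(price)

# Fit in the current environment: OLS wins by construction.
mse_in_ols = prediction_mse(ols_slope, ols_intercept, price, quantity)
mse_in_iv = prediction_mse(iv_slope, iv_intercept, price, quantity)

# Counterfactual: the firm now sets prices independently of demand.
new_price, new_quantity = [], []
for _ in range(n):
    d = rng.gauss(0, 1)
    p = rng.gauss(0, 1.87)  # similar spread to observed prices
    new_price.append(p)
    new_quantity.append(true_effect * p + 3.0 * d + rng.gauss(0, 0.5))

mse_cf_ols = prediction_mse(ols_slope, ols_intercept, new_price, new_quantity)
mse_cf_iv = prediction_mse(iv_slope, iv_intercept, new_price, new_quantity)

print("OLS slope:", ols_slope, "IV slope:", iv_slope)
print("in-sample MSE  OLS:", mse_in_ols, "IV:", mse_in_iv)
print("counterfactual MSE  OLS:", mse_cf_ols, "IV:", mse_cf_iv)
```

Under these assumptions the IV line fits the observed data worse than the OLS line, yet it predicts better once prices change for reasons unrelated to demand, which is the trade-off the excerpt describes.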
The answer is interesting, though difficult, throughout. Here are various Susan Athey writings on machine learning. Here are other Susan Athey answers on Quora, all recommended. Here is her answer on whether machine learning is “just prediction.”