Neural Network Feature Importance and Feature Effect with a Simple Scientific Trick
Introduction
So you built your neural network, and, based on its holdout and/or out-of-time performance metrics, it’s looking pretty good. Now you need to “sell it” to your business partners, and for that, you need to be able to explain what is happening under the hood. A lot of modelers will skip that part and say “it’s a black box and it’s difficult to really know how the network does it. I can use eli5 and SHAP to get an idea, but it’s hard to explain how it does it.”
While it is true that there is a lot going on inside a neural network (hundreds of weights and biases, each passed through activation functions), that does not mean we cannot come up with a business explanation of how our network works.
In this article, I am going to show you a simple trick that scientists use all the time to understand and explain the natural world around us, called “Ceteris Paribus”, which translates as “other things held constant.” It is how we distinguish causation from mere correlation. We will explore how to leverage Ceteris Paribus in Python to understand how our neural networks work.
Building Our Neural Network: The Boston Dataset
I have already published an article on building a neural network for predicting house prices. You can find it here. For the purposes of this article, I am going to pick up right where I left off. The referenced article will provide you with all the details you need as background for the remainder of this story.
Feature Effect on Predictions: Ceteris Paribus
To truly understand how one feature affects our predictions, we need to hold all input values constant and only vary the feature that we want to study and understand. By measuring the outcome on our prediction, we can draw a clear relationship between input and prediction.
A good analogy for this is studying plant growth. If you want to really know what causes a plant to grow taller, greener, or produce more fruits, you need to isolate each individual growth factor, vary it, then measure the output and compare to the variation in input. This will give you a good idea of how that factor affects the desired outcome.
Now, in our housing example, let’s examine the effect of “# of Rooms” against our target outcome, Median Home Value. Based on our correlation matrix and our sns.pairplot, “# of Rooms” came out as highly correlated with our outcome, so it would be interesting to see how variations in the “# of Rooms”, with everything else held constant (Ceteris Paribus), affect house prices.
First, we need to create our constant values. What values do we pick? Generally, I recommend using the median values for each input. The median will allow you to avoid extreme inputs should you decide to select a row instead. In Python, you can do so by simply doing:
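A minimal sketch of that step is below, assuming `df` is the feature DataFrame from the referenced article (the exact variable name there may differ). We compute each feature’s median and keep the result as a plain Python list so that individual values can be swapped out later.

# a minimal sketch, assuming `df` holds the input features from the previous article
medians = df.median()                 # median of every input feature
median_values = list(medians.values)  # plain list so we can swap single values later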
Now, let’s create all the variations in “# of Rooms”. For that, we need to know what the minimum and maximum values of “# of Rooms” is in our dataset, then create an array of variation between the min and max for submitting into our model.
# create variables to store the min and max values for "# of Rooms"
minimum, maximum = df["# of Rooms"].min(), df["# of Rooms"].max()
# create an evenly spaced array of 100 values between min and max
arr = np.arange(minimum, maximum, (maximum - minimum) / 100)
Now that we have generated all of our variations, let’s create the actual inputs to the model: each variation of “# of Rooms” plus the median values for all the other factors / features, held constant.
Since we are only interested in “# of Rooms” for now, you will notice that it is located on the 6th row, which is index 5 of a Python list (Python lists are zero-indexed). So to vary “# of Rooms”, we will modify the value at index 5 of our Python list, in the following fashion:
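Here is a quick sketch of that step, assuming the `median_values` list and the `arr` of “# of Rooms” variations created above.

# build one input row per variation: all medians, with index 5 swapped out
lst = []
for value in arr:
    row = median_values.copy()  # start from the constant median values
    row[5] = value              # vary only "# of Rooms" (index 5)
    lst.append(row)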
Great! Our list “lst” contains every variation of “# of Rooms” while everything else is held constant, at the median. Ceteris Paribus! Now, let’s submit our input “lst” into our model and check out the output.
# scale our values
lst = scaler.transform(lst)
# submit to model
predictions = model.predict(lst)
Et voilà, we now have a prediction for each variation of “# of Rooms”. Let’s plot this out to better understand what’s going on.
# plot the predictions (red line) against the "# of Rooms" variations
plt.figure(figsize=(12, 6))
sns.lineplot(arr, predictions.flatten(), color='r')
plt.xlabel("# of Rooms")
plt.ylabel("Median House Price (Red)")
plt.ylim(0, 400000)
# overlay the distribution of "# of Rooms" on a secondary y-axis
plt.twinx()
sns.distplot(df['# of Rooms'], kde=False, bins=10)
plt.ylabel("Distribution of # of Rooms (blue histogram)", labelpad=7)
plt.title("Impact of # of Rooms on Median House Price", pad=13);
How do we interpret the above graph?
The red line measures the variation in Median House Price (y-axis) when varying “# of Rooms” (x-axis). The red line clearly shows that the higher the number of rooms, the higher the house price. In fact, it looks like the Median House Price can go from a modest $100,000 for a 1 BR to a whopping $360,000 for a 9-room house, everything else held constant, which is a $260,000 swing. Pretty impressive for just one variable!
The light blue distribution plot will help you see the actual distribution of the number of rooms in the Boston dataset. This gives you perspective as to how often each value occurs. It looks like the # of Rooms follows a normal distribution, with 6 being the most common value. Please note that “# of Rooms” in our case means the number of rooms in the house excluding bathrooms. For example, a Living/Dining room (1) + a Kitchen (1) + a Master Bedroom (1) makes a 3-room apartment.
Let’s apply this to all features
Our next step is to examine the effect of each feature on our target variable, Median House Price. The easiest way to do this is to loop over each feature and perform the same steps we did above. Since we have 12 features, let’s graph them on a 3 x 4 grid.
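Below is a rough sketch of that loop, assuming `df` contains only the 12 input features and that `scaler` and `model` come from the earlier steps. It stores each feature’s variations and predictions in a dictionary `dct` (reused in the feature importance section below) and draws the 3 x 4 grid.

# a sketch of the per-feature loop, assuming `df`, `scaler` and `model` from earlier steps
import numpy as np
import matplotlib.pyplot as plt

median_values = list(df.median().values)
dct = {}

fig, axes = plt.subplots(3, 4, figsize=(18, 10), sharey=True)

for i, (feature, ax) in enumerate(zip(df.columns, axes.flatten())):
    # 100 evenly spaced variations of the current feature
    minimum, maximum = df[feature].min(), df[feature].max()
    arr = np.arange(minimum, maximum, (maximum - minimum) / 100)

    # hold everything else at the median, vary only the current feature
    lst = []
    for value in arr:
        row = median_values.copy()
        row[i] = value
        lst.append(row)

    predictions = model.predict(scaler.transform(lst))
    dct[feature] = {"x": arr, "y": predictions.flatten()}

    ax.plot(arr, predictions.flatten(), color="r")
    ax.set_title(feature)

plt.tight_layout();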
You will notice I kept the same y-scale for all graphs, so that the effect is not relative but absolute. As expected, not all features have the same effect on our target variable. Some have mild inverse relationships, some have direct positive relationships.
You will want to show all the feature effects to your business partners so they can help you put the story together. This step of model validation is critical to ensure your model makes sense from a business / real-world perspective.
If you find anything funky in the relationship between your features and your outcome, you may want to revisit your data, as there might be something weird going on there. If your data is correct, however, you might want to question what you or your business partners think you know! That is the fun part of modeling for sure.
Feature Importance and Feature Impact
The graphs above can help us tell a great story, but they do not answer this important question: which feature matters more than the others when it comes to house pricing?
To compare importance across features, we will simply measure the maximum effect that each variable has on our target output, Median House Price, and rank the features from highest to lowest. Here is a simple way to do it in Python.
impact_dct = {}
for feature, data in dct.items():
    df_feature = pd.DataFrame.from_dict(data)
    # impact = full swing in predicted price across all variations of this feature
    impact = abs(df_feature['y'].max() - df_feature['y'].min())
    impact_dct[feature] = impact
Notice the maximum impact is simply measured as the difference between the maximum house price and minimum house price for all of the variations of each feature. Let’s graph it up and rank them.
pd.DataFrame.from_dict(impact_dct, orient='index', columns=['Impact']).sort_values(by="Impact").plot(kind='barh', colormap='viridis')
plt.xlabel("Impact on Median Home Value")
plt.legend().remove()
plt.title("Feature Importance on Median Home Value", pad=12);
Now we know which features are most important when assessing house prices! It looks like “# of Rooms” indeed has the highest impact on house prices according to our neural network, followed closely by “% Lower Income”, which represents the “% lower income of the population” per the official documentation. Notice that in the latter case the relationship is inverse: the higher the % of lower income, the lower the house price.
You can use feature importance in combination with feature effect to tell the complete story of your neural network. This will go a long way toward putting your business partners at ease, since they can now understand how your model prioritizes each feature and how it assesses house prices.
Conclusion
I hope you enjoyed reading this article as much as I enjoyed writing it! I teach data science on the side at www.thepythonacademy.com, so if you’d like further training, or even want to learn it all from scratch, feel free to contact us on the website. I also plan to publish many articles on Machine Learning and AI on here, so feel free to follow me as well. Please share, like, connect, and comment, as I always love hearing from you. Thank you!