F1 Manager Part Selection with Pandas and Linear Regression

William Boler
13 min read · Feb 20, 2021

Introduction

For those unfamiliar, F1 Manager, developed by Hutch Games, is a popular Formula One team management game available on several mobile platforms, such as Google Play and the Apple App Store. In this game, a player must assemble the best combination of drivers and car parts, and use the optimal pitting and tire strategy for each track under probabilistic wet/dry conditions to beat an online opponent. Rather than control the car, you control the driver and vehicle setup. The races are short but downright entertaining, and a player can lose hours competing against various online opponents.

Problem Identification

One of the most frustrating parts of this game is the selection of equipment parts for your vehicle. Equipment can only be obtained through parts points awarded via Crates, which are unlocked or bought with in-game Bucks. After reaching a certain number of points for a given part, the player can choose to upgrade it using the mostly-useless coins earned through race wins. Parts are separated by types, such as Stock, Common, Rare, and Epic, which is almost irrelevant to a part's performance (Common equipment can beat out Epic). Each type has a maximum upgrade level, and each part has a maximum set of stats that can be attained.

Each part has a set of stats that contribute to the vehicle's overall stats: Power, Aero, Grip, Reliability, and Average Pit Stop Time. As different parts are obtained and upgraded over time, the player is challenged with finding which combination offers the best reward: overall wins in standard Duel races, or beating the Qualifying, Opening, and Final rounds in weekly Grand Prix Events.

The average person may develop a set of rules and mythos to guide their selection of parts and racing strategies, based on limited and one-sided observations. If you're a scientist, you may be wondering: how can I approach this problem from an unbiased standpoint that results in the most wins possible?

Today, I want to walk you through one of my experiments that I found fun, interesting, and rewarding. Pit strategies and driver selection are outside the scope of this article: pit strategies could fill an entire article of their own, and driver selection is fairly obvious. Instead, I will walk you through how I choose the best set of parts using Linear Regression, in an attempt to minimize my own bias in parts selection and increase my win rate.

Data and Code Setup

To keep the article simple, I will hold off on full code snippets and refer to my Github repo for the source code (cleanup in progress). Short snippets will be provided as needed below. Granted, I'm a researcher, so the code won't look as pretty as it should, and my Pandas-fu is subpar. I also left in much of my experimentation in raw detail. Nevertheless, I'm providing it free of charge!

First, we’ll want to generate a CSV of all our available parts. We’ll want to list every detail that we can, including the name, type, class, current part level (to help us track changes), and all the part’s stats for power, aero, grip, reliability, and average pit stop time in seconds. This information is provided at the parts selection menu in-game.

Figure 1. The master CSV of all available parts, entered by hand.

From here, we'll use a bit of Python and Pandas to generate a list of all possible combinations. My inefficient code for this is found here (do better, be better). Thankfully, with this small set of parts, the enumeration is tractable and results in only 8⁶ = 262,144 total combinations. By hand, this would be horrible to fill out, but luckily our computers handle this sort of math and data wrangling easily. It takes only a few seconds to generate all combinations with efficient programming, or minutes if you're sloppy (like me…).

Figure 2. The combination of part names and stats, generated by our program

Given our list of all potential name combinations for each type, and our master lookup table of part stats, we can generate an output CSV file of all combinations and overall vehicle stats. Due to the generation time of this combination list, it makes sense to offload it to a file for easier future access. But as parts become available and are upgraded, we will want to regenerate this file.
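As a rough sketch of that step (column names like slot, name, and pit_time are my assumptions, not necessarily what the repo or the game uses), assuming the overall vehicle stat is a simple sum of the part stats:

```python
import itertools

import pandas as pd

parts = pd.read_csv("parts.csv")  # the hand-entered master list (Figure 1)
stat_cols = ["power", "aero", "grip", "reliability", "pit_time"]

# One list of candidate parts per equipment slot; with 8 options per slot
# and 6 slots, itertools.product yields 8**6 = 262,144 combinations.
slots = [grp.to_dict("records") for _, grp in parts.groupby("slot")]

rows = []
for combo in itertools.product(*slots):
    row = {f"{p['slot']}_name": p["name"] for p in combo}
    for stat in stat_cols:
        # Overall vehicle stat as a simple sum of part stats (an assumption;
        # the game may combine pit stop time differently).
        row[stat] = sum(p[stat] for p in combo)
    rows.append(row)

combos = pd.DataFrame(rows)
combos.to_csv("combinations.csv", index=False)  # cache for future runs
```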

Fitness Model and Metric Selection

Given the available stats to choose from, there are several ways to think about how to choose parts. Do we want to maximize a single stat, such as Power or Grip? Do we want to maximize the sum of all the stats (subtracting pit stop time, where lower is better)? Do we want to minimize pit time? Is there some combination of all the stats that will overall produce a winning combination? This is where we want our model to do the heavy lifting for us, and not be subjected to our own bias.

I want to make this as unbiased as possible, and select an objective metric to evaluate my solution fitness. When we talk about fitness, fitness can be any value that represents our world model while abiding by a particular set of rules. It's basically a way of saying how "good" our model is. In our case, what rules do we want our fitness to abide by? Do we want to maximize the total number of wins on each track? Do we want to optimize the points? Do we care about winning more on wet races vs dry? Maybe we only care about winning on 4 particular tracks for our Duel Series?

Furthermore, we may also want to consider other factors: what type of tracks am I typically racing at my level? How often does it rain? Are the tracks curvy or have long straights? Is there a short or long pit lane? How does the player match-up system tie into race wins? These can all play into how different vehicle stats can result in a player’s ability to win on particular tracks. It also informs what type of data we may need to collect. But with the right experiment setup, we can make many of these factors irrelevant.

Aggregate statistics can cancel out irrelevant details. For example, rain probability, track curviness, and pit-lane size are all tied into track metrics. Track- and opponent-based factors can be averaged out across aggregate runs. We can simply bypass a lot of these details by looking at a simple metric, like aggregate race wins and points across tracks.

In selecting a fitness, I chose to examine four metrics:

  1. average number of points won overall
  2. average number of wins overall
  3. average of the average number of points for each track
  4. average of the average number of wins for each track

Take care to understand the differences. A win only tells us how well we did against another player. Points tell us how well we performed not just against our opponent, but also among the AI. There may also be some objectivity behind points that carries across races and players. Furthermore, by taking the overall average, we may be hiding bias, such as how often a particular track was randomly chosen, or how often our race was affected by rain (tracks have different wet/dry and temperature probabilities). Therefore, we may wish to weight each track equally by taking per-track averages first, and then averaging across the track aggregates. We'll examine all four by the end of this article.
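As a sketch of how these four metrics could be computed with Pandas (column names like setup, track, win, and points are my assumptions about the results CSV):

```python
import pandas as pd

races = pd.read_csv("results.csv").dropna()  # one row per race (Figure 3)

by_setup = races.groupby("setup")
overall_points = by_setup["points"].mean()  # metric 1: overall average points
overall_wins = by_setup["win"].mean()       # metric 2: overall average wins

# Average within each track first, then across tracks, so every track counts
# equally no matter how often it was randomly drawn.
per_track = races.groupby(["setup", "track"])[["points", "win"]].mean()
track_points = per_track["points"].groupby(level="setup").mean()  # metric 3
track_wins = per_track["win"].groupby(level="setup").mean()       # metric 4
```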

Data Collection

With our metrics chosen, we can collect a bit of data by simply racing with random or predefined setups. My obvious first step was to select a few racing setups that I thought might be the best, given my experience with this season. The not-so-obvious step that many may miss is to also try out a few off-the-wall setups, which add variety for our algorithm to optimize against.

Given that we currently do not know which set of stats gives us the best edge against other players, we will want to choose the most diverse set of setups that gives us coverage over the solution space. For instance, I may have some idea of what a “good” setup is, but I also want to provide some “bad” setups, including some races with Stock parts. There are many sampling and data collection techniques for this, such as grid sampling, stratification, Monte Carlo, etc. We can also choose to train our model on a few base points, let the model decide our next setup, and provide the inference as a data point. In this case, we’d be doing Online Learning. I chose a combination of grid-sampling to initialize the model, and online learning for the selection of the rest of my points.

Figure 3 shows a snippet of some of the data I collected. I chose to record the vehicle setup by part name, the track name, whether the qualification laps or race laps had rain (which I don't use), whether or not I won the race, and the total points I achieved for that race. Factors I ignore include pit and tire strategy, individual lap times, individual lap wet/dry conditions, etc. For now, these details are too fine-grained and hard to scale and track appropriately. They would also over-complicate my initial model.

Figure 3. A CSV of my win results, with spaces between setups for ease on eyes (Pandas has df.dropna())

Linear Regression

I don’t want to give a full explanation on linear regression, because Wikipedia and other sources can do that for us. Furthermore, we can use SciKit-Learn modeling work for us. Instead, what I will provide is a bit of motivation for why I chose a simple linear model. First off, linear models are typically one of the first things one can do for a problem. They’re very fast and efficient for both optimization and inference, and can offer quick iterations over varying models, features, and data. They do have several key issues: not every problem is linear, and they are heavily affected by outliers in the data. With that said, one should try a linear model first before trying anything fancy, like a neural network or random forest regression.

How can we apply linear regression to our problem? Let's first spell out what we're trying to do. We want to model fitness over our solution space: given a selection of parts and their performance stats, predict the overall vehicle fitness. We can select a fitness value, such as win probability or average points for each setup, and apply a linear set of weights to our vehicle stats:

Equation 1. Our fitness function: a linear combination of vehicle stats.
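The equation image doesn't reproduce here; reconstructed from the description above, it reads:

```latex
f(\text{setup}) = \alpha_1 P + \alpha_2 A + \alpha_3 G + \alpha_4 R + \alpha_5 T
```

where P, A, G, R, and T are the vehicle's power, aero, grip, reliability, and average pit stop time, and the alphas are the weights to be learned.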

Another option, for example, is a polynomial approximation that captures more complex relationships. This is a step I did not take this time, but I am providing it as a reference for potential future work. Note that the model remains linear in the coefficients, so the same machinery applies. When modeling this way, we want to make sure we are not overfitting the model to the data, and apply Occam's Razor when reasonable.

Equation 2. Example of other combinations that could be tried.
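One illustrative possibility (my example, not necessarily the one in the original image):

```latex
f(\text{setup}) = \alpha_1 P + \alpha_2 P^2 + \alpha_3 P G + \alpha_4 A + \dots
```

The squared and cross terms are nonlinear in the stats, but the model is still linear in the alphas, so least squares applies unchanged.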

Looking at our linear equation, we can consider this in matrix form.

Equation 3. Matrix representation of data-points, coefficients, and fitness for our fitness function.
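Reconstructed from the surrounding description, the matrix form stacks one race per row:

```latex
\underbrace{\begin{bmatrix}
P_1 & A_1 & G_1 & R_1 & T_1 \\
P_2 & A_2 & G_2 & R_2 & T_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
P_n & A_n & G_n & R_n & T_n
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \alpha_5
\end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix}
f_1 \\ f_2 \\ \vdots \\ f_n
\end{bmatrix}}_{b}
```

where row i of A holds the vehicle stats used in race i, and f_i is that race's fitness value.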

From here, I hope you can appreciate the form that we are taking. If not, don’t worry too much about it:

Equation 4. Least-Squares Linear Regression definition
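The standard least-squares definition, for reference:

```latex
\hat{x} = \arg\min_{x} \lVert Ax - b \rVert_2^2 = (A^{\top} A)^{-1} A^{\top} b
```

(the closed form on the right assumes A has full column rank).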

Using linear algebra, we can solve for our fitness function. By representing the data points and observed fitness values as matrices, we can use least squares to solve for the coefficients of our fitness function. For a given race, we have the overall setup stats, based on the equipment we chose, for power, aero, grip, reliability, and pit stop time, plus a metric calculated from either wins or points. Matrix A provides the empirical values, one row per race, representing the components of our fitness function. Matrix b provides the fitness value calculated from our metric for each race. By solving for x-hat, we approximate the set of coefficients (the alphas) that can be applied to any given setup to estimate its fitness.

A snippet from the optimization part of the code is shown here. More of the convoluted code can be seen on Github.

Code 1. Snippet of linear algebra application
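Since that snippet is an image, here is a minimal reconstruction of the least-squares step using NumPy, assuming the race log has already been joined with each setup's vehicle stats (column names are my assumptions, not the repo's exact names):

```python
import numpy as np
import pandas as pd

races = pd.read_csv("results.csv").dropna()
stat_cols = ["power", "aero", "grip", "reliability", "pit_time"]

A = races[stat_cols].to_numpy()  # one row of vehicle stats per race
b = races["points"].to_numpy()   # the chosen fitness metric for each race

# Solves min ||Ax - b||^2; numerically safer than forming (A^T A)^-1 directly.
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
```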

Based on this fitness function, we can evaluate each of the 8⁶ combinations to calculate their fitness and choose the best fitness overall. Thanks to Python, Numpy, and Pandas, this calculation is straightforward. I’ll provide a snapshot of this particular part of the code that calculates the fitness of all combinations in a dataframe and selects the best equipment combination.

Code 2. Snippet of coefficient application and fitness calculations.
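Again as a hedged sketch rather than the repo's exact code, the scoring step might look like:

```python
import pandas as pd

stat_cols = ["power", "aero", "grip", "reliability", "pit_time"]
combos = pd.read_csv("combinations.csv")  # the 8**6 enumerated setups

# coeffs: the alphas learned by np.linalg.lstsq in the previous snippet.
combos["fitness"] = combos[stat_cols].to_numpy() @ coeffs
best = combos.loc[combos["fitness"].idxmax()]
print(best)  # the recommended equipment combination and its stats
```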

Results

Over the course of 82 races, I examined results for metrics based on race points vs race wins, and evaluated each across overall averages vs per-track averages. Finally, I compared my own best choice, chosen prior to optimization, as a control. For each of the four metrics, plus the control, I took the selected setup and raced a Series 12 Duel five times. I calculated the wins and average points from each set of races, and placed the results in Tables 1–3 below:

Table 1. My best choice prior to optimization, vs the approximate solutions generated by each metric. The total number of wins out of 5 attempts is shown, along with the average points across each set (47 points maximum).
Table 2. The vehicle's overall stats given the options chosen in Table 1. The fitness is the value under each metric's own fitness model, and thus cannot be compared across rows. This is also why my best choice has no fitness value.
Table 3. Given the configurations from Table 1 and the fitness model generated for each metric, these are the coefficients learned by the linear regression. What can be gleaned is which stats each metric considers important (large absolute value) vs unimportant (close to zero). For instance, Overall Wins saw pit stop time as most important, whereas both Points-based metrics placed near-equal weight on power and grip.

Discussion

We obtain some interesting results. Starting with Table 1, we can see that the same set of equipment was chosen by both Points-based metrics. The Wins-based metrics produced slightly different options, differing only in the selection of Gearbox. Tables 2–3 provide some reasoning behind this. For Points-based metrics, the coefficients emphasized Grip (first) and Power (second) over all other stats. Aero came in third, with Reliability surprisingly beating out Pit Stop Time in significance (this challenged my own bias). Oddly enough, the Track Average Points model found pit stop time to have a positive, albeit low, correlation, despite a lower pit stop time improving a racer's results, leading me to believe not enough data was collected for this metric.

For Wins-based metrics, a high emphasis was found on Grip and Pit Stop Time. Oddly enough, the Wins-based models found little, even negative, correlation between winning and Power, which in my own experience and bias sounds wrong. Power, albeit with a low weight, appeared more important in the overall-wins model, whereas Grip and Pit Stop Time were more important in the track-aggregate-wins model.

All models unanimously chose Compressor for the Suspension part, but there was much disagreement over the other parts. FX was also selected by consensus among the models, with my own personal setup being the only disagreement. Still, my own pre-optimization best setup drew from the same pool of components chosen by the models, which increases my confidence in each of them.

How does all this stack up against the control, which relied on my own experience, memory, reasoning, and side experiments? My personal best had only 4/5, or 80%, wins and an average of 43.6 points. Compared to that, the points-based optimization selected a combination of parts that beat my selected components by 25% on wins and by 1.376% on average points.

Equation 5. Error calculation for average points gained.
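The equation image is missing, but working backward from the control's 43.6 average points and the quoted 1.376% gain, the points-based setup must have averaged about 44.2 points:

```latex
\text{gain} = \frac{\bar{p}_{\text{model}} - \bar{p}_{\text{control}}}{\bar{p}_{\text{control}}} \times 100\% \approx \frac{44.2 - 43.6}{43.6} \times 100\% \approx 1.376\%
```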

Not bad.

It appears that our Points-based metrics produced the highest win percentage and average points, signifying that a fitness based on points is the better option. But since averaging across tracks produced a positive coefficient for pit stop time, the first metric, the overall average of points without equal emphasis on tracks, appears to give the best fitness model. Evaluating on wins alone does not appear to afford any benefit, although the track-based win aggregate did beat the control. To me, the almost obvious response is "not enough data" with an N=5 sample size, but this is just a game.

Caveats

With this being just a game, I didn't apply as much rigor to this work as I would have liked. I could increase the 5 evaluation races to a larger N to make a deeper statement about the overall performance of the models. Furthermore, my 82 training races could probably have been better chosen: more attempts at equal selection of tracks, a discussion and data collection on player match-ups, random sampling of equipment at different times of day, and even some work on understanding the effects of changing equipment on the F1 Manager match-up system. From there, I might consider more features than a linear combination of stats, such as stat interactions, higher-degree polynomials, rain-based stats, etc. I could also run PCA on each component, or visualize the stats and feature spaces, since many people like to see graphs.

There’s so much more that can be done here, but since this is just a game and my time is limited, I left a lot of this alone. Feel free to try your own thing.
