Some topics (and data) to be considered for research

As I mentioned before, coming up with a real-world problem is quite challenging, since it requires some expertise in the research area and some critical thinking. In the following posts, I will try to come up with ideas that could solve problems in my areas of interest, and look for data that I could use for training or evaluation.

Here is one of them:

Optimal strategy selection (or race outcome prediction given the strategies) for Baku City Circuit Formula 1

One of the major challenges of Baku City Circuit is its wind, which almost none of the drivers had experienced before coming to race in Baku. The circuit also has the longest straight in F1 history so far, at 2.2 kilometers (1.37 miles), which allows speeds of over 350 km/h with DRS active. But F1 is not only about being fast: it is also about technology, planning, strategy, and precision.

So I'd like to develop a system that identifies the best strategy for the race when the driver positions and weather conditions are known. Here, strategy usually means the combination of tyre compound choices and pit stop timing. A variation of this problem would be to predict the outcome of the race, with a certain accuracy, given the strategies.
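To make the idea of a strategy concrete, here is a minimal sketch of how one could be represented in code. The names `Stint` and `build_strategy` are my own, and the 51-lap race length in the usage note is only an example value; this is one possible encoding, not a fixed format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stint:
    compound: str   # e.g. "soft", "medium" or "hard"
    start_lap: int  # first lap driven on this set of tyres
    end_lap: int    # last lap driven on this set (inclusive)


def build_strategy(start_compound: str,
                   stops: List[Tuple[int, str]],
                   total_laps: int) -> List[Stint]:
    """Turn a starting compound and a list of (pit lap, new compound)
    pairs into consecutive stints that cover the whole race."""
    stints = []
    compound, first_lap = start_compound, 1
    for pit_lap, new_compound in sorted(stops):
        # A stop must fall inside the current stint and before the final lap.
        if not first_lap <= pit_lap < total_laps:
            raise ValueError(f"invalid pit lap: {pit_lap}")
        stints.append(Stint(compound, first_lap, pit_lap))
        compound, first_lap = new_compound, pit_lap + 1
    stints.append(Stint(compound, first_lap, total_laps))
    return stints
```

For example, `build_strategy("soft", [(12, "hard")], 51)` describes a one-stop race: softs for laps 1 to 12, then hards for laps 13 to 51. A list of stints like this could serve either as the output of a strategy selector or as the input to an outcome predictor.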

It has been 5 years since Baku first hosted a race of the F1 season, so there have been only 5 races so far. The challenge, therefore, is that there might not be enough data to work with. On an extended scale, however, data can be taken from previous years and applied in the same manner.

Data collection

For data collection, there is a great source called the Ergast API. It contains almost all the data about F1 races from the 1950s until today. The documentation can be found here. To try it out, I used Postman to send requests to the API and looked up the latest Baku race, held in 2021. To do that, I sent a GET request to the following path: http://ergast.com/api/f1/2021/6, which returned the details of that race.
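The same request can also be sent from a script instead of Postman. Below is a minimal sketch using only the Python standard library. It assumes that appending `.json` to the path makes the API answer in JSON (the default is XML) and that the race is found under `MRData.RaceTable.Races`; the helper names `race_url` and `fetch_race` are my own.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://ergast.com/api/f1"


def race_url(season: int, round_no: int, fmt: str = "json") -> str:
    """Build the URL for a single race, e.g. season 2021, round 6."""
    return f"{BASE_URL}/{season}/{round_no}.{fmt}"


def fetch_race(season: int, round_no: int) -> dict:
    """Download one race entry and return it as a dict (needs network access)."""
    with urlopen(race_url(season, round_no), timeout=10) as resp:
        payload = json.load(resp)
    # Assumed response layout: MRData -> RaceTable -> Races (a list).
    races = payload["MRData"]["RaceTable"]["Races"]
    if not races:
        raise LookupError(f"no race found for {season}, round {round_no}")
    return races[0]
```

Calling `fetch_race(2021, 6)` should then give a dict for the Baku race, with fields such as the race name, the date and the circuit.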

For more detailed information, we can add an extra variable to the request, for example pit stops.

In this case, the output is a list of all pit stops with the relevant information, such as lap number, time, and duration of each stop.
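As a rough sketch of how such a pit stop list could be turned into typed records, the code below parses a JSON response. It assumes the pit stops are served at http://ergast.com/api/f1/2021/6/pitstops.json and that each entry has the string fields `driverId`, `lap`, `stop`, `time` and `duration`; the `PitStop` class and the sample payload are made up for illustration and are not real race data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PitStop:
    driver_id: str
    lap: int           # lap on which the stop was made
    stop: int          # 1 for the driver's first stop, 2 for the second, ...
    time_of_day: str   # clock time of the stop, as reported by the API
    duration_s: float  # duration of the stop in seconds


def parse_duration(text: str) -> float:
    """Convert "23.426" or "1:05.300" (minutes:seconds) into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_pitstops(payload: dict) -> List[PitStop]:
    """Extract all pit stops from a decoded JSON response."""
    races = payload["MRData"]["RaceTable"]["Races"]
    if not races:
        return []
    return [
        PitStop(
            driver_id=p["driverId"],
            lap=int(p["lap"]),
            stop=int(p["stop"]),
            time_of_day=p["time"],
            duration_s=parse_duration(p["duration"]),
        )
        for p in races[0].get("PitStops", [])
    ]


# Made-up sample in the assumed response shape (not real race data).
sample = {"MRData": {"RaceTable": {"Races": [{"PitStops": [
    {"driverId": "driver_a", "lap": "12", "stop": "1",
     "time": "16:21:05", "duration": "21.500"},
    {"driverId": "driver_b", "lap": "33", "stop": "1",
     "time": "17:02:40", "duration": "1:05.300"},
]}]}}}

stops = parse_pitstops(sample)
```

Note that the API returns results in pages (30 entries by default, as far as I know), so a query parameter such as `?limit=100` is probably needed to get all the pit stops of a race in one response.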

Another challenge is that many race regulations have changed since 1950, and so have the vehicle specs and tyre compounds. I guess that is why they are missing from the data sources, even though they play a very important role. Starting from 2019, F1 moved to a new naming scheme in which, for the sake of simplicity, the compounds are called hard, medium, and soft. I believe that for the last several years this information could be obtained from official F1 sources, such as the official F1 website (by web scraping), or from livestreams.

Another remaining challenge is the weather, which also plays a critical role on the way to victory. Unfortunately, the API doesn't provide weather details for the race, but given the exact date and location, they could be looked up through other services, for example via their APIs.
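As a sketch of that idea, the date and coordinates needed for a weather lookup can be pulled out of a race record. This assumes the record has a `date` field and a `Circuit.Location` object with `lat` and `long` given as strings; `WeatherQuery` and `weather_query_for` are hypothetical names, the sample record only illustrates the assumed shape, and the weather service itself is left open.

```python
from datetime import date
from typing import NamedTuple


class WeatherQuery(NamedTuple):
    day: date
    latitude: float
    longitude: float


def weather_query_for(race: dict) -> WeatherQuery:
    """Collect what a historical weather service would need for one race."""
    location = race["Circuit"]["Location"]
    return WeatherQuery(
        day=date.fromisoformat(race["date"]),
        latitude=float(location["lat"]),
        longitude=float(location["long"]),
    )


# Illustrative record in the assumed shape; the coordinates are approximate.
sample_race = {
    "date": "2021-06-06",
    "Circuit": {"Location": {"lat": "40.3725", "long": "49.8533"}},
}

query = weather_query_for(sample_race)
```

The resulting `query` could then be passed to whichever weather service is chosen, and the answer stored next to the race data.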
