LLMs are great geo-aware predictors
[Paper: GeoLLM: Extracting Geospatial Knowledge from Large Language Models]
Summary by Adrian Wilkins-Caruana
Have you ever gotten lost while traveling? Losing your bearings in a new place is a common problem, one that rarely happens somewhere familiar like your hometown. That’s because your brain builds up a sense of direction over time, which lets you navigate your surroundings with ease. This sense of place is a cognitive map that helps you understand many things about your environment and the people in it. Is this sense of direction and place unique to humans? Do LLMs have a sense of place? A new study by researchers at Stanford suggests that they do: the researchers found that LLMs can answer a wide array of geospatial questions about topics like population, asset wealth, and demographics.
The researchers first worked to determine what kind of geospatial location formats LLMs can understand. If I were to ask you, "Where are you right now?" you would probably answer with an address like "42 Wallaby Way, Sydney." To make sense of this, you’d need to know where in the world Sydney is, and then where in Sydney 42 Wallaby Way is. Another way to answer this question is with a set of latitude and longitude coordinates. (According to Google Maps, the fictional dental practice from Finding Nemo has the coordinates –33.8690005, 151.2091858.) Unless you spend a lot of time looking at maps, you're unlikely to recognize those coordinates as being located in Sydney but, with some understanding of latitude and longitude, you can probably determine that they must point somewhere in the southern and eastern hemispheres.
The researchers used an additional format for location that shares information about some nearby locations. They devised a way to generate these nearby locations automatically using a free API. For example, here are the locations of places near the Calyon Building on 6th Avenue in New York City:
Nearby Places:
"
0.6 km South-West: Theater District
0.7 km North: Columbus Circle
0.7 km East: Midtown East
0.9 km South-West: Midtown
1.0 km West: Hell’s Kitchen
1.2 km North: Lincoln Square
1.3 km South-West: Garment District
1.4 km South-East: Turtle Bay
1.4 km South: Jan Karski Corner
1.4 km South: Midtown South
"
The researchers then turned their attention to how LLMs can use these location formats to answer questions about various places, and how to measure their accuracy. They used several datasets with information about population density, home values, mean income, infant mortality rates, and more. To retrieve this kind of information from the LLM, using a method they dubbed GeoLLM, the researchers constructed a prompt that looks like this: "<Location information>. What is the <Question> on a scale from 0.0 to 9.9?" For example, to ask about the population density of Sydney, the prompt would be "Coordinates: -33.8690005, 151.2091858, Address: ..., Nearby places: ..., What is the population density on a scale from 0.0 to 9.9?" They also restricted GeoLLM's answers to three tokens, which forced it to give a numerical answer like "7.2" or "3.5."
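Assembling that prompt is just string formatting. Here’s a hedged sketch, assuming the field order shown in the summary (the paper’s exact template may differ, and the address is the article’s fictional example):

```python
def build_geollm_prompt(coords, address, nearby_places, question):
    # Assemble a GeoLLM-style prompt from the three location formats
    # described in the summary, plus the question on a 0.0-9.9 scale.
    return (
        f"Coordinates: {coords[0]}, {coords[1]}\n"
        f"Address: {address}\n"
        f"{nearby_places}\n"
        f"What is the {question} on a scale from 0.0 to 9.9?"
    )

prompt = build_geollm_prompt(
    (-33.8690005, 151.2091858),
    "42 Wallaby Way, Sydney",  # fictional address used earlier in the article
    "Nearby Places:\n0.6 km South-West: Theater District",  # placeholder block
    "population density",
)
```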
To measure the accuracy of GeoLLM's answers, the researchers had to map the actual data onto the 0.0–9.9 scale that GeoLLM uses. To do this, they scaled the data's values to this range. (If the original data wasn’t uniformly distributed, they spread the values evenly across the range, so each part of the scale covers roughly the same number of places.) Once they’d done that, they could ask GeoLLM a large number of these questions and measure how well its answers correlated with the actual data using the Pearson correlation coefficient. They then fine-tuned several LLMs so that each became specialized at answering a specific geospatial question.
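As a rough illustration of that preprocessing and evaluation, here’s a sketch of a rank-based scaling onto the 0.0–9.9 range plus a from-scratch Pearson correlation. The function names are mine, and the rank-based mapping is one plausible reading of "distributed the values evenly," not necessarily the paper's exact transformation:

```python
import math
import statistics

def quantile_scale(values):
    # Rank-based mapping of raw values onto the 0.0-9.9 scale, so
    # non-uniform data ends up evenly spread across the range.
    # Assumes at least two values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    scaled = [0.0] * n
    for rank, i in enumerate(order):
        scaled[i] = round(9.9 * rank / (n - 1), 1)
    return scaled

def pearson(xs, ys):
    # Pearson correlation coefficient between predictions and ground truth.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With this setup, the model's three-token answers (e.g. "7.2") can be parsed as floats and compared against the scaled ground truth with `pearson`.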
A fine-tuned GPT-3.5 outperformed other LLMs (Llama 2, GPT-2, and RoBERTa) and several non-LLM baselines. Across nine tasks, GPT-3.5's correlation coefficient was between 0.55 and 0.87, which is really quite impressive! The researchers also ran an ablation study showing that the Nearby Places portion of the prompt is crucial to the LLM's performance: without it, the LLM’s answers don't correlate as well with the actual data. (The addresses were the next most helpful piece of info and, unsurprisingly, the coordinates were the least helpful.) You can see how well GeoLLM works in the figure below: the colors (green is good) show the absolute error on quantitative questions (e.g., what’s the population density?) for each given place. The XGBoost model does much worse than the GPT-based model, showing that fine-tuned LLMs can serve as best-in-class predictors for geospatial tasks.
This study highlights just how much knowledge an LLM accumulates during its training. Even though the LLM’s predictions were rarely perfect, they were still quite good, and they show that LLMs have some understanding of the world we live in. One interesting application of this work would be using the model to fill in missing values in datasets, or to provide a rough estimate of a geospatial quantity when the actual data isn't available. This could be especially helpful where data is scarce or difficult to collect, like in remote areas or in countries with limited resources.