Home / Design Lab 13 (When You Wish Upon A*)

Design Lab 13: When You Wish Upon A*

The questions below are due on Thursday May 09, 2019; 09:55:00 PM.
 
Partners: You have not yet been assigned a partner for this lab.
You are not logged in.

If you are a current student, please Log In for full access to this page.

music for this problem

Files

This lab description is available as a PDF file here. Code is available here, or by running `athrun 6.01 getFiles`.

Goals: In this lab, you will experiment with A* and heuristics to solve a complicated search problem.

1) Getting Started

Do this lab with your assigned partner. You and your partner will need a lab laptop, or you may use your personal machine, provided you have installed the lib601 software. Get today's files by running:

athrun 6.01 getFiles

or from the link above.

2) Preparation

Last week, we used remarkably concise search algorithms to find paths through complicated mazes. Today we will use slight variations of those search algorithms to solve an even larger problem: planning routes for driving across the USA.

This lab will make use of publicly-available highway data from the U.S. Federal Highway Administration1.

Our Python representation for these data consists of two main classes (defined in highways.py and imported into designLab13.py):

  • Location: An instance of this class represents a single point on a map. Each instance includes a unique identifier id_number, and latitude and longitude coordinates (latitude and longitude, respectively). Instances also contain an attribute name, which contains a (more-or-less) human-readable string identifier. Some locations are not named; for these locations, name will contain None.
  • Link: An instance of this class represents a bidirectional highway between two locations. Each instance contains attributes begin and end, which hold instances of Location representing the two ends of the link. Some instances also contain an attribute name, which contains a (more-or-less) human-readable string identifier. In addition, each instance has an attribute max_speed, which contains the maximum speed you are allowed to drive along it.

The following structures and procedures are also defined in highways.py and imported into designLab13.py:

  • location\_from\_id: a dictionary that maps each location's unique id number to the corresponding instance of Location
  • location\_from\_name: a dictionary that maps the name of a location to the corresponding instance of Location
  • links: a list containing each Link in the database.
  • distance(loc1, loc2): a function that takes two ID numbers and returns an estimate of the distance between the locations they represent, in miles, incorporating an approximation for the curvature of the earth. This will serve as our cost function.
  • locations\_matching\_name(state, name): a method that takes two strings as input, and returns the names and ids of matching locations as a list of tuples. For example, locations\_matching\_name('IL','PRINCETON') returns [('ILPRINCETON C-N', 17001075), ('ILPRINCETON E', 17001089), ('ILPRINCETON', 17001088)].

2.1) Helpers

To find routes through these maps, it will be convenient to know which locations are neighbors (i.e., which Locations are connected by single Links). Since locating neighbors is slow, we will find the neighbors once, and record them in a dictionary.

Open the file designLab13.py and complete the definition of the procedure make_neighbors_dict, which takes no arguments and returns a dictionary that maps each location's unique ID number to a list of ID numbers corresponding to its neighbors.

Eventually, we will also want an easy way to compute the total cost of a path. Write a function path_cost that takes a list of id numbers as input, and returns the total cost of that path in miles (the sum of the distances between locations). A skeleton has been provided in designLab13.py.

2.2) Representation

Use the methods and variables from designLab13.py and highways.py to answer the following. Decimal answers must be accurate to four decimal places.

Scale

How many locations exist in the database?

How many unique location names exist in the database?

Data

What is the ID number of the location whose name is 'ILPEORIA CBD'?

Enter (as a Python string) the name of the location whose ID number is 8000302:

Enter the Python list containing the IDs of the neighbors of location 23000331:

What is the cost of the following path, in miles?
[17001088, 17001089, 17001106, 17001147, 17001168]

3) Search

Now we want to use the search procedures we developed over the last couple of weeks to find paths in the map. As usual, we will need to implement a successor function to specify this search domain.

Check Yourself 1:

What is a good representation for state in this search problem?

How would you implement your successors function, given this representation?

Implement the highway_successors function.

As a test, we will use search to run a BFS to find a path from a place near the geographic center of the U.S. (Smith Center, KS; ID number 20000071), to Cambridge (ID number 25000502).

Make a note of the number of states in the path found by BFS, and also of the total cost of the path found.

3.1) Uniform Cost

Now we will update our code to use uniform cost search (which is implemented as a function uc_search in designLab13.py). Note that this function is different from the one in SL13, as it only returns the path, rather than a (path, cost) tuple.

Since UC search is trying to find the path with smallest cost, we will need to represent the cost of the links in our successor function. We will do this by making our successor function to return tuples of the form (state, cost).

Check Yourself 2:
Look at the code for the uc_search function, and make sure you understand it before moving forward.

Define a new successor function for use with uc_search, or modify your old successor function so that it can be used with uc_search..

As a test, use uc_search to find a path between Smith Center, KS, and Cambridge, MA (the same two locations we just tested with BFS).

Please answer the following questions about the path you found from 20000071 to 25000502. Decimal answers must be accurate to one decimal place.

How many states were in the "shortest path" found using BFS?

What was the total cost of the "shortest path" found using BFS, in miles?

How many states were in the "shortest path" found using UC Search?

What was the total cost of the "shortest path" found using UC Search, in miles?

3.2) Viewing Your Paths

Looking at lists of ID numbers isn't very informative. As an alternative, we can view our paths on a map to get an idea of how good the results are.

highways.py contains a function print_kml(path) which takes a list of node IDs as input and prints a KML string, as well as a function save_kml(path, filename), which saves the corresponding KML string in a file.

You can view your path by copying/pasting the KML string into:
http://display-kml.appspot.com/

Use the above tool to visualize your BFS and UC paths. We'll discuss them during the checkoff.

Checkoff 1:
Discuss your results with a staff member. How do the paths returned by BFS and UC Search differ? Why is this?

4) Evaluation

We will now try running uniform cost search in a few different trials to try to get a feel for how it performs in different circumstances. In order to understand and measure the performance of uc_search, you are free to modify or augment the implementation provided in designLab13.py.

4.1) Specifics

Run your search through the following scenarios, and make note of the total cost of the path found, as well as the number of states expanded, in each case.

  • Find a path from Pittsfield, MA (ID 25000379) to Cambridge (25000502), without a heuristic.
  • Find a path from Altoona, PA (ID 42001780) to Cambridge, without a heuristic.
  • Find a path from Smith Center, KS (ID 20000071) to Cambridge, without a heuristic.

Answer these tutor questions about your results.

Pittsfield, MA (25000379)

How many states were expanded by uniform cost search with no heuristic?

What was the total cost of the path found with no heuristic?

Altoona, PA (42001780)

How many states were expanded by uniform cost search with no heuristic?

What was the total cost of the path found with no heuristic?

Smith Center, KS (20000071)

How many states were expanded by uniform cost search with no heuristic?

What was the total cost of the path found with no heuristic?

4.2) Generalizing

In this section, we will try to find a general trend between the cost of the path returned by a uniform-cost search starting at the center of the US, and the number of states expanded over the course of that search.

Consider the plot below, which shows the number of states expanded over the course of a uniform cost for paths of various lengths (all of which start at Smith Center, KS):

Check Yourself 3:

Consider the geometry of the space over which we are searching and the cost function we are using. Does this graph match your expectations? Explain the relationship between the number of states expanded and the cost of the final path found.

Why does the graph seem to grow faster than linearly for small paths, then appear to grow linearly, and then remain constant for very long paths?

It may help to note that the height of the contiguous United States (the distance from the northern border to the southern border) is approximately 1500 miles, and that the width of the contiguous Unites States (from east to west) is approximately 2800 miles. How do these numbers manifest in the graph above?

5) Heuristics

In uniform-cost search, we select the next node to expand that minimizes F(n)=path_cost(n)F(n) = path\_cost(n).

In this section, we will consider A* (uniform cost search with a heuristic function), in which we select the node that minimizes F(n)=path_cost(n)+H(n)F(n) = path\_cost(n) + H(n), where HH is a heuristic function that estimates the distance from nn to the goal.

Repeat your specific searches from the previous section, this time using the geodesic distance between the current state and the goal (which can be computed with the distance function) as a heuristic.

Pittsfield, MA (25000379)

What fraction of the states expanded with no heuristic are expanded with distance as a heuristic?

What was the total cost of the path found with distance as a heuristic?

Altoona, PA (42001780)

What fraction of the states expanded with no heuristic are expanded with distance as a heuristic?

What was the total cost of the path found with distance as a heuristic?

Smith Center, KS (20000071)

What fraction of the states expanded with no heuristic are expanded with distance as a heuristic?

What was the total cost of the path found with distance as a heuristic?

Check Yourself 4:
How does this heuristic affect the cost of the path returned?

Check Yourself 5:
How does the effectiveness of the heuristic relate to the total cost of the path found by the search? Why?

Checkoff 2:
Discuss your results with a staff member.

6) A Weighty Problem

It is interesting to explore a range of methods that "interpolate" between Uniform Cost Search and A*, and then go beyond A* in terms of how much they are affected by distance to goal.

A weighted version of A* has an additional parameter α\alpha that ranges between 0 and 1, and selects the next node to expand by finding the one that minimizes

F(s)=(1α)path_cost(s)+αH(s)F(s) = (1 - \alpha) \cdot path\_cost(s) + \alpha \cdot H(s)
where H(s)H(s) represents the heuristic evaluated at ss, and path_cost(s)path\_cost(s) represents the cost of the path from the starting state to ss.

For what value of α\alpha is this equivalent to Uniform Cost Search?

α=\alpha =  

For what value of α\alpha is this equivalent to A*?

α=\alpha =  

Check Yourself 6:
When α=1\alpha = 1, would you expect the algorithm to find the least-cost solution?
Check Yourself 7:
When α=1\alpha = 1, would you expect to find that the algorithm expands more or fewer nodes than A*?

Modify the uc_search procedure in designLab13.py to implement weighted A*. Experiment with a range of values values of $\alpha$ on a search that goes from Hollister, CA (id 6002971) to Cambridge, MA (id 25000502). In the box below, define two variables:
  • costs should be a list containing the cost of the path found from Hollister to Cambridge as α\alpha is varied from 0 to 1 (inclusive) with a step size of 0.01 (i.e., costs[0] should be the cost of the path found when α=0.0\alpha=0.0; costs[1] should be the cost of the path found when α=0.01\alpha=0.01; etc). Note that costs should be a list of length 101.
  • expanded should be a list containing the number of states expanded when searching from Hollister to Cambridge as α\alpha is varied form 0 to 1 (inclusive) with a step size of 0.01. Note that expanded should be a list of length 101.

Check Yourself 8:
Look at the graphs (via the "Show Detailed Results" button). Do these plots make sense?

Checkoff 3:
Discuss your results with a staff member.

7) Need for Speed (Limits)

So far, our planning has been based purely on distance, but oftentimes, when planning a route between two locations, we are actually interested in the amount of time it will take to move from one location to another. Modify your highway_successors function (and/or your make_neighbors_dict function) so that they output a cost in terms of time, rather than distance (you may assume that a person can travel at a link's speed limit for the entire length of the link).

Check Yourself 9:
Will using the distance function as a heuristic work under this new cost model? What is an example of an effective admissible heuristic under this cost model?

Run your updated search (with an effective, admissible heuristic) to find the minimum time path between Smith Center, KS (20000071) and Cambridge, MA (25000502).

What is the total time associated with this path, in hours?

What is the total length of this path, in miles?

What is the total time associated with your original UC Search result, in hours?

What is the total length of your original UC Search result, in miles?

Checkoff 4:
Discuss your results, including your choice of heuristic, with a staff member.

Use the path visualizer to visualize the UC search paths, with distance as cost and with time as cost. What is different about them? How can you explain this difference?


 
Footnotes

1http://www.fhwa.dot.gov/planning/processes/tools/nhpn/ (click to return to text)