Saturday, May 14, 2016

Why Tuples, when Python already has Lists ?

One of my students very pertinently asked me, "Why do we need Tuples when we already have a data structure called List? Tuples are almost the same as Lists, except that they are immutable. So what is the big deal about being immutable?"

This question inspired me to write this post and add some clarity on when to prefer "Tuples" over "Lists" in Python. I am assuming that the reader already has a basic understanding of what a Tuple and a List are, but is unable to decide which one to choose. To help answer the question, I intend to concentrate on situations where it makes more programming sense to use "Tuples" rather than "Lists".

A few distinctions between "Tuples" and "Lists" are :

  1. A List is a data structure for storing and manipulating a collection of items. A Tuple is not a data structure in the same sense: it also holds a collection of items, but as a fixed record to be read rather than manipulated.
  2. Lists, by convention, are used for storing collections of items of the same type [ homogeneous ], whereas Tuples, by convention, are used for storing collections of items of different types [ heterogeneous ]. Beware that this is only a coding style preference; the language itself does not enforce any such restriction.
  3. Individual items in a List can be modified and updated. Tuples are immutable, so their individual items cannot be replaced or updated.
  4. A List supports many more methods than a Tuple, since a List lets us manipulate its collection of items and needs methods to support those manipulations.
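Points 3 and 4 can be checked quickly in an interpreter. The following is only a minimal sketch in Python 3; the variable names and sample values are made up for illustration.

```python
# Hypothetical sample data, used only to illustrate points 3 and 4.
items_list = [10, 20, 30]
items_tuple = (10, 20, 30)

# A list item can be replaced in place.
items_list[0] = 99

# The same assignment on a tuple raises TypeError.
try:
    items_tuple[0] = 99
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

# Public (non-underscore) methods offered by each type.
list_methods = [name for name in dir(list) if not name.startswith("_")]
tuple_methods = [name for name in dir(tuple) if not name.startswith("_")]

print(items_list)          # [99, 20, 30]
print(tuple_was_modified)  # False
print(tuple_methods)       # ['count', 'index']
print(len(list_methods) > len(tuple_methods))  # True
```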



Consider a scenario where we need to find out the cheapest flight for travelling from city A to city B.

At some point in our calculation, we would want to store all the flight fares in a collection that we can sort. The best data structure for this scenario is a List. It collects all the flight fares in one single structure, and it lets us move the fares around within the collection so that we can rearrange them into sorted order. Also, going by convention, since the items in this collection are all of the same type [ flight fares as integers ], we should prefer a "List" over a "Tuple" here.

Ex : flight_fares = [ 4500, 2300, 3570, 5990, 1900 ]

This list can now grow as and when we need to add more flight fares to the collection. "Lists" in Python provide a simple "append" method to achieve this.
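As a small sketch of this scenario in Python 3, the snippet below appends one more fare and then sorts the list to find the cheapest one. The appended value 2750 is a made-up fare, not part of the original data.

```python
flight_fares = [4500, 2300, 3570, 5990, 1900]

# A newly found fare (hypothetical value) is added at the end.
flight_fares.append(2750)

# sort() rearranges the list in place, cheapest first.
flight_fares.sort()

cheapest = flight_fares[0]
print(flight_fares)  # [1900, 2300, 2750, 3570, 4500, 5990]
print(cheapest)      # 1900
```

Neither of these two operations is available on a tuple, which is why a list fits this scenario.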

Consider another scenario, where we need to store the flight passenger's seat allocation information.

We need to store the passenger's name, flight number and seat number to distinguish one passenger from another. Using a "List" to represent this data would be overkill, since after creating this collection we intend neither to modify it nor to add any more items to it. Using a "List", which comes with a built-in repository of methods that we never intend to use, would be like "using a cannon to kill a mosquito".

It makes more programming sense to use a "Tuple" to represent this data. Also, going by the convention of using a "Tuple" to store heterogeneous items, it makes perfect sense to store a string that holds the passenger's name, an integer that holds the flight number and another string that holds the seat number. Remember that being heterogeneous is not the only criterion for using a "Tuple". There is no harm in storing items that are all of type "string" [ homogeneous ] in a "Tuple".

The main criterion that tilts the balance in favour of a "Tuple" is the arrangement of the items in the container. We need a container that stores specific items at specific positions, does not allow those items to be modified, and does not allow extra items to be added to it.

Ex : passenger_info = ("Amit", 119, "12A")

In this tuple, the position of each item has a specific meaning for the programmer. The first item represents the passenger's name, the second represents the flight number, and the third represents the seat number. We can use the index operator to fetch these values individually, or use Tuple unpacking to assign them to 3 separate variables as below :

Ex : name, flight_num, seat_num = passenger_info

or

Ex : name, flight_num, seat_num = ("Amit", 119, "12A")
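The index operator and the "no modification, no extra items" behaviour can be sketched as follows in Python 3. The replacement seat "14C" is a hypothetical value used only to trigger the error.

```python
passenger_info = ("Amit", 119, "12A")

# Positional access with the index operator.
name = passenger_info[0]
seat_num = passenger_info[2]

# Tuples have no append method and reject item assignment.
has_append = hasattr(passenger_info, "append")
try:
    passenger_info[2] = "14C"  # hypothetical new seat
    seat_changed = True
except TypeError:
    seat_changed = False

print(name, seat_num)  # Amit 12A
print(has_append)      # False
print(seat_changed)    # False
```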

A good analogy for the difference between "Lists" and "Tuples" is to compare them with "Arrays" and "structs" in the C programming language. "Lists" are more analogous to "Arrays", and "Tuples" are more analogous to "structs": a fixed set of fields, each with its own meaning. [ A C "Union" is a weaker fit, since its members share the same storage and only one of them holds a valid value at a time. ]

To extend the above flight example, suppose we need to store the information of all the passengers who have checked in. We would once again use a "List". Each item in this "List" would be a "Tuple" that represents a single passenger's information. Notice that all the items in this "List" are still homogeneous [ all are Tuples ].

Ex : checked_in = [ ("Amit", 119, "12A"), ("Sanjay", 119, "20W"), ("Dinkar", 119, "19F"), ]
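A short Python 3 sketch of working with such a list of tuples is given here. The fourth passenger is a made-up entry, added only to show that the list can grow while each item keeps its fixed shape.

```python
checked_in = [
    ("Amit", 119, "12A"),
    ("Sanjay", 119, "20W"),
    ("Dinkar", 119, "19F"),
]

# The list can grow; each new entry is still one fixed-shape tuple.
checked_in.append(("Meera", 119, "7C"))  # hypothetical passenger

# Unpack each tuple directly in the loop header.
seats = []
for name, flight_num, seat_num in checked_in:
    seats.append(seat_num)
    print(name, flight_num, seat_num)

print(seats)  # ['12A', '20W', '19F', '7C']
```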

Now, let's take a different scenario and try to understand how various data structures can be combined to form a more meaningful solution.

Let's consider the following runs scored by a cricket player in the IPL against various other IPL teams.


To represent all the runs scored in a data structure [ and probably add more runs when he plays against other teams ], we can consider a List.

runs = [7, 9, 40, 80, 4 ]

The guideline here is that the data being stored is homogeneous [ integers ]. We can easily use the "append" method of the list to add more items when this player scores runs against other teams.

To help commentators announce against which team he scored the maximum and minimum number of runs, we can create a "dictionary" that maps his runs scored to the team names as shown pictorially below :
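A rough code equivalent of such a dictionary is sketched here in Python 3. Only the "RCB" entry is taken from this post; the other team names are placeholders, and the name runs_to_team is made up for this sketch.

```python
# Runs come from the list above. "RCB" appears in this post; the other
# team names are placeholders, not real data.
runs_to_team = {
    7: "RCB",
    9: "Team B",
    40: "Team C",
    80: "Team D",
    4: "Team E",
}

best = max(runs_to_team)   # highest score among the keys
worst = min(runs_to_team)  # lowest score among the keys

print(best, runs_to_team[best])    # 80 Team D
print(worst, runs_to_team[worst])  # 4 Team E
```

One caveat of using the runs as keys: two innings with the same score would overwrite each other in the dictionary.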


To better analyse his performance, we might be interested in his strike rate [ number of runs scored per 100 balls ]. The data for this is available as shown below :


Let's extract only the strike rates [ all floats ] and team names from the above data. This data too can be stored in a dictionary to help the commentators announce against which teams he had the best and worst strike rates. This dictionary is pictorially shown below :

To print this information, we can write a simple "for" loop as below :

for str_rate in sorted(player_dict):
    # keys are strike rates, values are team names
    print(str_rate, player_dict[str_rate])


Looking at the strike rates, it might be tempting to say that he is a very good player. A more detailed analysis would require us to look at the strike rates of only those innings where he faced at least 10 balls.

This requires us to store his runs scored and balls faced in one single unit of data, and then maybe postpone the calculation of his strike rate to a later time. The code would look something like this :

for runs, balls in some_list:
    if balls > 9:
        # 100.0 keeps the fractional part: 80 runs off 95 balls is 84.21..., not 84
        str_rate = 100.0 * runs / balls
        print(runs, balls, str_rate)

To keep the runs scored together with the corresponding balls faced, we store them as the two members of a Tuple: ( runs, balls ). The first item of each tuple is the number of runs scored, and the second item is the number of balls faced in that innings. This kind of data storage is analogous to a "struct" in the C programming language, whereas a list is analogous to an "array".
 
runs_scored = [ (7, 2), (9, 2), (40, 40), (80, 95), (4, 1) ]

runs_scored is a list of such tuples. We can "append" a new tuple for every other match in which the player scores runs. We can very easily iterate over this list of tuples as follows :

for runs, balls in runs_scored:
    print(runs, balls)
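Putting the list of tuples together with the "at least 10 balls" rule, a minimal Python 3 sketch could look like this. The name strike_rates and the rounding to two decimals are choices made for this sketch.

```python
runs_scored = [(7, 2), (9, 2), (40, 40), (80, 95), (4, 1)]

strike_rates = []
for runs, balls in runs_scored:
    if balls > 9:  # only innings of at least 10 balls
        strike_rates.append(round(100 * runs / balls, 2))

print(strike_rates)  # [100.0, 84.21]
```

Only two of the five innings pass the filter, which gives a more sober picture than the raw strike rates.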

In fact, to make things even easier, we can store these tuples directly as keys of our earlier dictionary!

We can have statements like :

player_dict[(7, 2)] = "RCB" .. and so on.

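Dictionary keys must be "hashable", which in practice covers built-in "immutable" types such as numbers, strings and tuples. Since a tuple of integers is "immutable" all the way through, we can very easily use it as a dictionary key. [ A tuple that contains a mutable item, such as a list, cannot be used as a key. ]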
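The rule about dictionary keys can be tried out with a few lines of Python 3. This is only a sketch; the flags list_key_ok and nested_key_ok are names made up here to record what happened.

```python
player_dict = {}
player_dict[(7, 2)] = "RCB"  # a tuple of integers works as a key

try:
    player_dict[[7, 2]] = "RCB"  # a list is mutable, so it is rejected
    list_key_ok = True
except TypeError:
    list_key_ok = False

try:
    player_dict[([7], 2)] = "RCB"  # a tuple holding a list is rejected too
    nested_key_ok = True
except TypeError:
    nested_key_ok = False

print(player_dict)    # {(7, 2): 'RCB'}
print(list_key_ok)    # False
print(nested_key_ok)  # False
```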

This is pictorially depicted below :



This dictionary can be easily accessed as below :

for runs, balls in player_dict:
    if balls > 9:
        str_rate = 100.0 * runs / balls
        # the (runs, balls) tuple itself is the dictionary key
        print(str_rate, player_dict[(runs, balls)])


This shows another use case where we can use Tuples as keys in dictionaries. Hopefully, these examples should give a good guideline for understanding when to prefer using Tuples over Lists. If you still find this confusing, then please post your questions in the comments below.

