{
  "$type": "site.standard.document",
  "canonicalUrl": "https://rednafi.com/python/deduplicate-iterables-while-preserving-order/",
  "description": "Master techniques to remove duplicates from Python iterables while maintaining original order using sets, OrderedDict, and nested deduplication.",
  "path": "/python/deduplicate-iterables-while-preserving-order/",
  "publishedAt": "2023-05-01T00:00:00.000Z",
  "site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
  "tags": [
    "Python"
  ],
  "textContent": "Whenever I need to deduplicate the items of an iterable in Python, my usual approach is to\ncreate a set from the iterable and then convert it back into a list or tuple. However, this\napproach doesn't preserve the original order of the items, which can be a problem if you\nneed to keep the order unscathed. Here's a naive approach that works:\n\nThis code snippet defines a function dedup that takes an iterable it as input and\nreturns a new list containing the unique items of the input iterable in their original\norder. The function uses a set seen to keep track of the items that have already been\nseen, and a list result to store the unique items.\n\nThen it iterates over all the items of the input iterable using a for loop. For each item,\nthe function checks if it has already been seen (i.e., if it's in the seen set). If the\nitem hasn't been seen, it's added to both the seen set and the result list. The final\nresult list contains the unique items of it in their original order.\n\nThis can be made a little terser by using listcomp as follows:\n\nDedup with ordered dict\n\nSimilar to the first snippet, this also defines dedup that takes an iterable it as input\nand returns a new list containing the unique items of it in their original order. The\nfunction uses the OrderedDict.fromkeys() method to create a new ordered dict with the\nitems of it as keys and None as values.\n\nSince an ordered dict maintains the insertion order of its keys, this effectively removes\nany duplicate items from the iterable without affecting the order of the remaining ones. 
The\niterable containing the keys of the resulting ordered dict is then converted into a list\nusing the list() function to obtain a list of the unique items in their original order.\n\nWhile this is quite terse and does the job in O(n) time, neither this nor the\nprevious solution would work for compound iterables as follows:\n\nThe next solution works on one-level nested iterables.\n\nDedup by any element of an item in a nested iterable\n\nConsider this one-level nested iterable:\n\nWe want to write a dedup function that'll allow us to deduplicate the iterable based on a\nparticular element of an item. Here, (1, 1) and (2, 1) are items of the iterable it, and\n1 is the second element of item (2, 1). Here's how we can modify the first dedup to\nallow deduplication by nested elements.\n\nThis time, the dedup function takes in an iterable of tuples it, an element index\nindex, and a boolean lazy (defaulting to True) as arguments. The function returns a\nlist or generator of the unique items from the input iterable based on the specified element\nindex.\n\nJust as before, the function first creates an empty set seen and binds its add method to\na variable seen_add. It then creates a generator expression that iterates over it and\nyields each item if its element at the specified index isn't already present in the seen\nset. If item[index] isn't present in seen, it's added to the set using the seen_add method.\n\nIf lazy is True, the function returns the generator as is. Otherwise, it\nreturns a list created from the generator expression.\n\nIn the example provided, the function is called with arguments it, 1, and False. This\nmeans that it will deduplicate the input iterable based on the second element of each tuple\nand return a list. 
The result is [(1, 1), (1, 3)].\n\nFurther reading\n\n- [How do I remove duplicates from a list while preserving order]\n\n[how do i remove duplicates from a list while preserving order]:\n    https://stackoverflow.com/questions/480214/how-do-i-remove-duplicates-from-a-list-while-preserving-order",
  "title": "Deduplicating iterables while preserving order in Python"
}