{
  "$type": "site.standard.document",
  "canonicalUrl": "https://rednafi.com/python/string-interning/",
  "description": "Learn how Python's string interning optimizes memory by caching strings and using sys.intern() for custom string caching to improve performance.",
  "path": "/python/string-interning/",
  "publishedAt": "2022-01-05T00:00:00.000Z",
  "site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
  "tags": [
    "Python",
    "Performance"
  ],
  "textContent": "I was reading the [reference implementation] of [PEP-661: Sentinel Values] and discovered an\noptimization technique known as String interning. Modern programming languages like\nJava, Python, PHP, Ruby, Julia, etc, performs _string interning_ to make their string\noperations more performant.\n\nString interning\n\n> String interning makes common string processing operations time and space-efficient by\n> caching them. Instead of creating a new copy of string every time, this optimization\n> method dictates to keep just one copy of string for every appropriate immutable distinct\n> value and use the pointer reference wherever referred.\n\nConsider this example:\n\nRunning this will print True on the console. The is operator in Python is used to check\nwhether two objects refer to the same memory location or not. If it returns True, it\nmeans, the two objects surrounding the operator are actually the same object.\n\nThis is kind of neat if you think about it. In the above snippet, instead of creating a new\ncopy when y is assigned to a string that has the same value as x, internally, Python\npoints to the same string that is assigned to x. This is only true for smaller strings;\nlarger strings will create individual copies as usual. The exact length that determines\nwhether a string will be interned or not depends on the implementation and you shouldn't\nrely on this implicit behavior if your code needs interning. See this example:\n\nThis will print False on the console and the strings aren't interned.\n\nExplicit string interning\n\nPython's sys module in the standard library has a routine called intern that you can use\nto intern even large strings. For example:\n\nHere, the strings are interned and running the snippet will print True on the console.\n\nWhat strings are interned?\n\nCPython performs string interning on constants such as Function Names, Variable Names,\nString Literals, etc. 
This [PyObject snippet] from the CPython codebase suggests that when a\nnew code object is created, the interpreter interns all the compile-time constants,\nnames, and literals. Dictionary keys and object attributes are also interned. Notice this:\n\nIn both of the above cases, the print statement will print True on the console,\nconfirming that dictionary keys and object attributes are interned. Having interned\nattributes and keys means that access is faster, since the string comparison\ncan be reduced to a pointer comparison.\n\nWhen might explicit string interning come in handy?\n\nOne use case that I've found is interning large dictionary keys. Dictionary keys are,\nin general, interned automatically. However, if the key is large (something like a 4097-byte\nhash value), Python can choose not to intern it. Here's an example:\n\nThis will print False, as the key in this case won't be interned. Dictionary value access\nis slower if the key isn't interned. Let's test that out:\n\nThis prints:\n\nThe above script creates and accesses a dictionary with interned and non-interned keys 10000\ntimes. The time difference is huge. 
Non-interned dict creation and access are, in\nfact, 33 times slower than their interned counterparts.\n\nWe can work around this limitation by using explicit string interning via the sys\nmodule as follows:\n\nThis prints:\n\nHere, implicitly and explicitly interned dict creation and key access are almost equally\nfast.\n\nFurther reading\n\n- [String interning in Python]\n- [Python docs: sys.intern]\n\n\n\n\n[reference implementation]:\n    https://github.com/taleinat/python-stdlib-sentinels/blob/main/sentinels/sentinels.py\n\n[pep-661: sentinel values]:\n    https://hugovk-peps.readthedocs.io/en/latest/pep-0661/\n[pyobject snippet]:\n    https://github.com/python/cpython/blob/7d7817cf0f826e566d8370a0e974bbfed6611d91/Objects/codeobject.c#L537\n\n[string interning in python]:\n    https://arpitbhayani.me/blogs/string-interning-python/\n\n[python docs: sys.intern]:\n    https://docs.python.org/3/library/sys.html#sys.intern",
  "title": "String interning in Python"
}