To ban or not to ban, that’s the pickle
While Hugging Face supports machine learning (ML) models in various formats, Pickle is among the most prevalent thanks to the popularity of PyTorch, a widely used ML library written in Python that uses Pickle serialization and deserialization for models. Pickle is an official Python module for object serialization, which in programming means turning an object into a byte stream. The reverse process is called deserialization. In Python terminology, the two are known as pickling and unpickling.
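As a minimal sketch of what pickling and unpickling look like in practice, the snippet below round-trips a plain dictionary through the standard `pickle` module. The `weights` dictionary is a hypothetical stand-in for model data, not an actual PyTorch model.

```python
import pickle

# Hypothetical stand-in for model data; any picklable object would do.
weights = {"layer1": [0.1, 0.2], "bias": 0.5}

blob = pickle.dumps(weights)    # pickling: object -> byte stream
restored = pickle.loads(blob)   # unpickling: byte stream -> new object

print(type(blob).__name__)      # bytes
print(restored == weights)      # True: same contents, but a separate object
```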
The process of serialization and deserialization, especially of input from untrusted sources, has been the cause of many remote code execution vulnerabilities in a variety of programming languages. Similarly, the Python documentation for Pickle has a big red warning: “It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.”
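To illustrate why that warning exists, here is a deliberately harmless sketch. It relies on `__reduce__`, the documented hook through which an object tells Pickle which callable to invoke when it is rebuilt. The `Payload` class is hypothetical, and the expression it evaluates is simple arithmetic. A real attacker would substitute something destructive.

```python
import pickle

class Payload:
    """Hypothetical class whose pickle form runs code when loaded."""

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call eval('6 * 7')".
        # The callable and its arguments are chosen by whoever creates the file.
        return (eval, ("6 * 7",))

blob = pickle.dumps(Payload())

# Unpickling calls eval() on the embedded string. No Payload instance is
# created; the result is simply whatever the callable returned.
result = pickle.loads(blob)
print(result)
```

The victim only has to call `pickle.loads`. Nothing else in their code needs to touch the resulting object for the embedded call to run.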
That poses a problem for an open platform like Hugging Face, where users openly share model data and have to unpickle it. On one hand, this opens the potential for abuse by ill-intentioned individuals who upload poisoned models, but on the other, banning this format would be too restrictive given PyTorch’s popularity. So Hugging Face chose the middle road, which is to attempt to scan and detect malicious Pickle files.
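The general idea of scanning a Pickle file without loading it can be sketched with the standard `pickletools` module, which walks the opcode stream without executing it. This is a naive illustration, not Hugging Face's actual scanner: the `SUSPICIOUS` denylist and the `find_imports` and `is_suspicious` helpers are assumptions made for this example, and a heuristic this simple can be evaded.

```python
import pickle
import pickletools

# Assumed denylist, for illustration only; a real policy would be far broader.
SUSPICIOUS = {
    ("builtins", "eval"), ("builtins", "exec"),
    ("os", "system"), ("posix", "system"), ("nt", "system"),
    ("subprocess", "Popen"),
}

def find_imports(data: bytes):
    """List (module, name) pairs the pickle would import, without unpickling."""
    found = []
    strings = []  # string arguments seen so far, consumed by STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Older protocols carry "module name" directly in the opcode.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ pushes module and name as two strings first.
            # Heuristic: assume they are the two most recent string arguments.
            found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

def is_suspicious(data: bytes) -> bool:
    return any(pair in SUSPICIOUS for pair in find_imports(data))

class Evil:
    """Hypothetical poisoned object, used only to produce test bytes."""
    def __reduce__(self):
        return (eval, ("6 * 7",))

print(is_suspicious(pickle.dumps({"a": [1, 2]})))  # False
print(is_suspicious(pickle.dumps(Evil())))         # True
```

Because the scanner reads opcodes instead of calling `pickle.loads`, the payload never runs during inspection. The trade-off is that a denylist only catches what it already knows about, which is why scanning is a middle road and not a guarantee.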