In BitTorrent v2, each file has a "pieces root" key (a string) that holds the file's root SHA-256 hash in binary form. The documentation says:
"pieces root" is the the root hash of a merkle tree with a branching factor of 2, constructed from 16KiB blocks of the file. The last block may be shorter than 16KiB. The remaining leaf hashes beyond the end of the file required to construct upper layers of the merkle tree are set to zero. As of meta version 2 SHA2-256 is used as digest function for the merkle tree. The hash is stored in its binary form, not as human-readable string.
I need to extract this hash for my torrent tracker, so that the info page can show users the original hashes of the files in a torrent. How do I do that?
How do I decode that binary string? I also don't know whether it is a concatenation of all the piece hashes.
PHP or C is preferred, or pointers to some docs.
I’m a noob regarding encoding, so please explain thoroughly.
Thanks a ton!!
I tried the unpack() function, but I'm missing something.
2 Answers
I wrote a Windows command-line tool for extracting and calculating Merkle root hashes.
It can be used, for example, to search trackers that support BitTorrent v2 for specific files, in order to find seeds for reviving dead torrents.
Usage:
Open a command prompt and do the following:
The tool will output all root hashes of files with their names and sizes. Feel free to give feedback.
To build it, I looped through the "file tree" dictionary, concatenating directory and file names into full paths, converted each file's "pieces root" value to hex with bin2hex(), and compiled the code for Windows.
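The loop described above can be sketched roughly as follows. This is a Python illustration rather than the tool's actual code: it assumes the torrent has already been bencode-decoded into nested dicts with byte-string keys (the decoding step is not shown), and the function name `iter_pieces_roots` and the sample tree are made up for the example. In a v2 "file tree", a file is marked by an empty-string key whose value holds "length" and, for non-empty files, "pieces root". Python's `bytes.hex()` plays the role of PHP's bin2hex() here.

```python
def iter_pieces_roots(file_tree, parents=()):
    """Yield (path, length, hex_root) for every file in a decoded v2 "file tree"."""
    for name, node in file_tree.items():
        if name == b"":
            # An empty-string key marks a file; its value holds the file's properties.
            root = node.get(b"pieces root")  # may be absent, e.g. for empty files
            hex_root = root.hex() if root else None
            yield "/".join(parents), node[b"length"], hex_root
        else:
            # Any other key is a directory or file name: descend one level.
            yield from iter_pieces_roots(node, parents + (name.decode("utf-8"),))


# Hand-built stand-in for info["file tree"]; the 32-byte value is made up.
example_tree = {
    b"docs": {
        b"readme.txt": {b"": {b"length": 5, b"pieces root": bytes(range(32))}},
    },
    b"empty.bin": {b"": {b"length": 0}},
}

for path, length, hex_root in iter_pieces_roots(example_tree):
    print(path, length, hex_root)
```

Each yielded tuple carries the full path, the file size, and the 64-character hex form of the root hash, which is what a tracker info page would display.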
The hash as stored in the torrent file is not encoded; it is in the native representation that computers deal in: a sequence of bytes. For SHA2-256 that is 32 bytes (256 bits).
If you need it representable in text then you’ll have to encode it. There are many ways to do this. Hexadecimal is a common choice, also frequently used to display the infohash of a torrent.
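As a small illustration of "raw bytes versus text", here is a Python sketch that takes a 32-byte SHA-256 digest and renders it two ways. The digest of `b"abc"` is only a stand-in for a real "pieces root" value, and Base64 is shown merely as one alternative to hex. In PHP, bin2hex() does the same job as `bytes.hex()` below.

```python
import base64
import hashlib

# Stand-in for a "pieces root" value: any SHA-256 digest is 32 raw bytes.
raw = hashlib.sha256(b"abc").digest()

hex_text = raw.hex()                              # 64 hexadecimal characters
b64_text = base64.b64encode(raw).decode("ascii")  # another possible text encoding

print(len(raw), hex_text)
print(b64_text)

# Encoding is reversible: the original bytes can be recovered from the text.
assert bytes.fromhex(hex_text) == raw
assert base64.b64decode(b64_text) == raw
```

Whichever encoding is chosen, it only changes how the 32 bytes are displayed, not the hash itself.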
As the BEP says, the pieces root is the root hash of a merkle tree; it cannot be obtained by concatenating the individual block hashes.
It can only be computed from the torrent contents themselves. So if you don't have the data, you can't recompute it; you can only extract it from the torrent file. But since it uses a fixed construction (independent of the piece size), the pieces root is always the same for files with identical content.
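For readers who do have the file data, the construction quoted in the question can be sketched in Python like this. It is a minimal reading of that description, not a reference implementation: it holds the whole file in memory, the name `pieces_root` is made up for the example, and treating a single-block file's root as simply that block's hash is an assumption that follows from a one-leaf tree.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # 16 KiB leaf blocks, as in the quoted description


def pieces_root(data: bytes) -> bytes:
    """Return the 32-byte merkle root for the given (non-empty) file contents."""
    if not data:
        raise ValueError("empty files have no pieces root")

    # Leaf layer: SHA-256 of each 16 KiB block (the last block may be shorter).
    layer = [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]

    # Pad the leaf layer with all-zero hashes up to the next power of two.
    width = 1
    while width < len(layer):
        width *= 2
    layer += [bytes(32)] * (width - len(layer))

    # Hash adjacent pairs until a single root remains (branching factor 2).
    while len(layer) > 1:
        layer = [
            hashlib.sha256(layer[i] + layer[i + 1]).digest()
            for i in range(0, len(layer), 2)
        ]
    return layer[0]


print(pieces_root(b"hello world").hex())
```

Because nothing in the function depends on the torrent's piece size, two files with identical content give the same root, which is the property described above.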