Bitcoin multisig hardware signing performance (2020)
Editor's Note: Our team performed an updated series of tests. View the latest report below.
As CTO of Casa it's my job to examine every possible aspect of our system's architecture and understand which components are suboptimal so that we can make plans to implement improvements. No system is perfect, but by exploring the limits of the system we can design our user experience in order to steer clients down "happy paths" and away from rough edges.
Casa is built upon a foundation of several pillars: geographically distributed multisig, dedicated hardware devices to secure keys, thoughtfully designed user experience, and high quality client services. The dedicated hardware devices are a unique aspect of our architecture in that Casa has no control over them - and that's the point! This lack of control is crucial to Casa's security model, but it results in a far more challenging job for our engineers.
Casa supports a diversity of hardware devices and we'd like to eventually support any device that we believe meets our standards. But what are our standards? This question is what spawned the following research.
The following article covers a series of stress tests I performed to determine hardware wallet capabilities when signing multisignature transactions of varying complexity. It's important for multisig software providers to understand the limits of hardware so that we can decide which hardware to recommend, which hardware to support, and so that we can customize our wallet software to handle hardware edge cases as gracefully as possible.
The hardware
Casa is currently compatible with all of the following hardware except the BitBox devices; I happened to have those on hand and Electrum supports them, so I figured I might as well test them too.
Trezor One - firmware 1.9.2
Trezor Model T - firmware 2.3.2
Ledger Nano S - firmware 1.6.1, BTC app 1.4.7
Ledger Nano X - firmware 1.2.4-4, BTC app 1.4.7
Coinkite Coldcard Mk2 - firmware 3.1.9
Coinkite Coldcard Mk3 - firmware 3.1.9
ShiftCrypto BitBox01 - firmware 7.1.0
ShiftCrypto BitBox02 - firmware 9.1.1
It's worth noting that the Coldcard Mk2 and BitBox01 are no longer for sale; they are considered obsolete.
Why bother?
From a network perspective, any bitcoin transaction that is less than 100 kB in virtual size (less than 400,000 weight units) should be accepted and relayed by nodes as long as it's valid and pays above the minimum relay fee. I'd consider this to be the minimum level of complexity that hardware devices ought to support. All of the test transactions created during my research were smaller than this.
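To make the relationship between virtual size and weight units concrete, here is a minimal sketch of the arithmetic. It assumes the BIP 141 weight formula and Bitcoin Core's default policy limit of 400,000 weight units; the function names are my own for illustration and do not come from any wallet library.

```python
import math

# Assumption: Bitcoin Core's default relay policy limit on transaction weight.
MAX_STANDARD_TX_WEIGHT = 400_000
WITNESS_SCALE_FACTOR = 4


def tx_weight(base_size: int, total_size: int) -> int:
    """Weight as defined in BIP 141: base (non-witness) size * 3 + total size."""
    return base_size * 3 + total_size


def virtual_size(weight: int) -> int:
    """Virtual size in vbytes: weight divided by 4, rounded up."""
    return math.ceil(weight / WITNESS_SCALE_FACTOR)


def within_relay_limit(weight: int) -> bool:
    """True if the weight does not exceed the assumed standardness limit."""
    return weight <= MAX_STANDARD_TX_WEIGHT


if __name__ == "__main__":
    weight = 102_515  # weight quoted later in this article for the 8-of-8 case
    print(virtual_size(weight), within_relay_limit(weight))
```

Because witness bytes are counted once while every other byte is counted four times, a signature-heavy segwit transaction can be much larger in raw bytes than its virtual size suggests.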
However, the above size limits can be reached in a variety of ways. The bulk of a transaction's size / weight comes from the signatures on the transaction's inputs. Thus you can achieve a large transaction size by having many single signature inputs or by having somewhat fewer inputs but with multiple signatures per input.
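As a rough illustration of how signatures dominate, the sketch below estimates the witness size of a single native segwit input for single signature (P2WPKH) and m-of-n multisig (P2WSH) spends. It assumes 72-byte signatures (including the sighash byte), 33-byte compressed public keys, and n of at most 16 keys so that the key count fits in a one-byte opcode. Treat the outputs as estimates, not exact figures for any particular wallet.

```python
SIG_BYTES = 72       # assumption: DER signature plus sighash byte
PUBKEY_BYTES = 33    # compressed public key
# Non-witness bytes per input: outpoint (36) + empty scriptSig length (1) + sequence (4)
INPUT_BASE_BYTES = 36 + 1 + 4


def varint_len(n: int) -> int:
    """Length in bytes of Bitcoin's variable-length integer encoding."""
    if n < 0xFD:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFFFFFF:
        return 5
    return 9


def p2wpkh_witness_bytes() -> int:
    """Witness for a single signature input: item count, signature, pubkey."""
    return 1 + (1 + SIG_BYTES) + (1 + PUBKEY_BYTES)


def p2wsh_multisig_witness_bytes(m: int, n: int) -> int:
    """Witness for an m-of-n input: count, dummy, m signatures, witness script."""
    if not 1 <= m <= n <= 16:
        raise ValueError("this sketch only handles 1 <= m <= n <= 16")
    # OP_m + n * (push opcode + pubkey) + OP_n + OP_CHECKMULTISIG
    script_len = 1 + n * (1 + PUBKEY_BYTES) + 1 + 1
    item_count = 1 + m + 1
    return (varint_len(item_count)
            + 1                              # empty dummy element for CHECKMULTISIG
            + m * (1 + SIG_BYTES)            # length byte + signature, m times
            + varint_len(script_len) + script_len)


def input_weight(witness_bytes: int) -> int:
    """Weight units for one input: base bytes count 4x, witness bytes 1x."""
    return INPUT_BASE_BYTES * 4 + witness_bytes


if __name__ == "__main__":
    print("P2WPKH:", input_weight(p2wpkh_witness_bytes()))
    for m, n in [(2, 3), (3, 5), (1, 8), (8, 8)]:
        print(f"{m}-of-{n}:", input_weight(p2wsh_multisig_witness_bytes(m, n)))
```

Under these assumptions an 8-of-8 input weighs several times as much as a single signature input, which is why fewer multisig inputs are needed to approach the same size limits.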
The multisig setups
For my testing I used Electrum's 4.0.2 AppImage on Debian Linux. Linux is a bit tricky to get working with hardware devices; you have to install Python libraries and set up udev rules to allow your computer to communicate with them.
I created a variety of P2WSH (native segwit) multisig wallets with the hardware devices plugged in via USB. Naturally these wallets are all using bitcoin's testnet because that's what it's for! Then I funded each wallet with a deposit of 100 UTXOs.
I also discovered and filed a variety of Electrum issues while performing this research; hopefully my struggles will result in Electrum becoming more robust.
2-of-3 multisig
I started off by creating three different 2-of-3 multisig wallets:
Wallet 1: Trezor One, Ledger Nano S, Coinkite Coldcard Mk2
Wallet 2: Trezor Model T, Ledger Nano X, Coinkite Coldcard Mk3
Wallet 3: Trezor Model T, ShiftCrypto BitBox01, ShiftCrypto BitBox02
Then 8 different times I created a 100 input transaction that was 10.5 kB unsigned. The unsigned PSBT was 485 kB. The final raw signed transaction size was 59 kB.
If you're unfamiliar with how (most) hardware devices work when it comes to signing a bitcoin transaction, there are normally two steps:
- The transaction gets loaded onto the device, it parses the details and displays them on the screen for user confirmation. These details are generally the address(es) to which funds are being sent, the amount(s) being sent, and the fee being paid.
- Upon user confirmation, the device signs each transaction input and then returns the signed transaction to the wallet software.
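The two steps above are also how I timed each device: one stopwatch for loading and parsing, another for signing. Here is a minimal sketch of that flow. FakeDevice and its parse and sign methods are hypothetical stand-ins I made up for illustration; they are not any vendor's API and no real signing happens.

```python
import time
from typing import Callable, List, Optional, Tuple


class FakeDevice:
    """Hypothetical stand-in for a hardware wallet; not any vendor's API."""

    def parse(self, tx: dict) -> dict:
        # Step 1: the device parses the transaction and works out what to display.
        fee = sum(tx["input_amounts"]) - sum(amount for _, amount in tx["outputs"])
        return {"outputs": tx["outputs"], "fee": fee}

    def sign(self, tx: dict) -> List[str]:
        # Step 2: one signature per input (placeholder strings, not real signatures).
        return ["sig-%d" % i for i in range(len(tx["input_amounts"]))]


def format_mmss(seconds: float) -> str:
    """Format a duration the way the timings in this article are written."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return "%d:%02d" % (minutes, secs)


def sign_with_timing(
    device: FakeDevice, tx: dict, confirm: Callable[[dict], bool]
) -> Tuple[Optional[List[str]], float, float]:
    """Run both steps, returning (signatures, parse_seconds, sign_seconds)."""
    start = time.perf_counter()
    details = device.parse(tx)
    parse_seconds = time.perf_counter() - start

    # The user checks addresses, amounts and fee on the device screen.
    if not confirm(details):
        return None, parse_seconds, 0.0

    start = time.perf_counter()
    signatures = device.sign(tx)
    sign_seconds = time.perf_counter() - start
    return signatures, parse_seconds, sign_seconds


if __name__ == "__main__":
    tx = {"input_amounts": [5000, 7000], "outputs": [("tb1q-example", 11000)]}
    sigs, parse_s, sign_s = sign_with_timing(FakeDevice(), tx, lambda d: d["fee"] == 1000)
    print(sigs, format_mmss(parse_s), format_mmss(sign_s))
```

Measuring the two phases separately matters because, as the results show, parsing a large transaction can take longer than signing it.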
Older hardware devices (such as the BitBox01 and original Ledger HW.1) did not have screens and thus did not perform the first step of loading and parsing the transaction details. These are less secure because you are blindly signing data without verifying it on dedicated hardware first.
I had no issues with the Trezors and Ledgers, but the Coldcards threw this error:
/usr/lib/python3.7/site-packages/electrum/plugins/coldcard/coldcard.py, line 222, in sign_transaction_start
    assert 20 <= len(raw_psbt) < MAX_TXN_LEN, 'PSBT is too big'
What can we see here? The max payload for the Coldcard wire protocol is 384kB. This unsigned transaction is only 10.5 kB thus we should be well under that, right? Well, the Coldcard library notes that "a PSBT might contain a full txn for each input." That's how the unsigned PSBT in this case ends up being 485 kB!
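A quick back-of-the-envelope check makes the bloat obvious. This sketch mirrors the assertion from the error above and assumes the limit is 384 * 1024 bytes and that "kB" in my measurements means 1,000 bytes. The helper names are my own, and the "maximum inputs" figure is only a rough extrapolation from this one wallet's average.

```python
# Assumption: the 384 kB limit expressed in bytes.
MAX_TXN_LEN = 384 * 1024


def psbt_fits(psbt_len: int, max_len: int = MAX_TXN_LEN) -> bool:
    """Same bounds check as the assertion shown in the error above."""
    return 20 <= psbt_len < max_len


def per_input_overhead(psbt_len: int, unsigned_tx_len: int, n_inputs: int) -> float:
    """Average PSBT bytes per input beyond the unsigned transaction itself."""
    return (psbt_len - unsigned_tx_len) / n_inputs


def rough_max_inputs(psbt_len: int, n_inputs: int, max_len: int = MAX_TXN_LEN) -> int:
    """Extrapolate how many similar inputs would stay under the limit."""
    average_per_input = psbt_len / n_inputs
    return int((max_len - 1) // average_per_input)


if __name__ == "__main__":
    psbt_len, unsigned_len, n_inputs = 485_000, 10_500, 100
    print("fits:", psbt_fits(psbt_len))
    print("overhead per input:", per_input_overhead(psbt_len, unsigned_len, n_inputs))
    print("rough max inputs:", rough_max_inputs(psbt_len, n_inputs))
```

In other words, each input drags along several kilobytes of supporting data, so the PSBT is dozens of times larger than the transaction it describes.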
BitBox01 - took 8 presses to sign, about 2 minutes in total. Perhaps it signs inputs in batches? Electrum displayed a warning about the transaction size.
What was the final performance breakdown?
5 to 10 minutes to sign a single transaction? That's pretty rough!
3-of-5 multisig
Next I created two different 3-of-5 wallets composed of:
Wallet 1: Trezor One, Ledger Nano S, Coinkite Coldcard Mk2, ShiftCrypto BitBox01, ShiftCrypto BitBox02
Wallet 2: Trezor Model T, Ledger Nano X, Coinkite Coldcard Mk3, ShiftCrypto BitBox01, ShiftCrypto BitBox02
I generated a 100 input transaction that was 14 kB unsigned. The unsigned PSBT was 505 kB. The fully signed raw transaction size was 87 kB.
Trezor Model T - took 2:20 to get halfway through loading the transaction and then threw an error.
  File "/tmp/.mount_electrsT6YLa/usr/lib/python3.7/site-packages/trezorlib/transport/bridge.py", line 44, in call_bridge
    raise TransportException(error_str)
trezorlib.transport.TransportException: trezord: release/151 failed with code 400: session not found
Unsure why this happened as I only had the one wallet opened; I tried again and it took 3:50 to parse the transaction and display the details for confirmation. It took another 3:10 to sign all the inputs.
BitBox02 - I authorized the transaction and was met with no progress indicators in Electrum or on the device screen. After 10 minutes I gave up. I tried a second time and after 20 minutes I gave up. When I unplugged the BitBox02, Electrum threw a "read error" so I guess it was still doing something, but it's unclear if it would ever finish.
I came back to this wallet a few days later to retry in case I had somehow gotten into a bad state. It succeeded, taking 4:40 to parse the transaction and 1:47 to sign it.
Both Coinkite Coldcard Mk2 and Mk3 threw a "PSBT is too big" error, which makes sense given the PSBT was 505 kB.
1-of-8 multisig
I created a transaction with 100 inputs that was 13 kB unsigned. This resulted in a PSBT that was 418 kB. The fully signed raw transaction was 79 kB.
At this point I started experiencing failures with the Ledgers.
Ledger Nano S: silently fails after 5:40
Ledger Nano X: invalid status error after 5:25
Traceback (most recent call last):
  File "/tmp/.mount_electrIYmilV/usr/lib/python3.7/site-packages/electrum/plugins/ledger/ledger.py", line 475, in sign_transaction
    chipInputs, redeemScripts[inputIndex], version=tx.version)
  File "/tmp/.mount_electrIYmilV/usr/lib/python3.7/site-packages/btchip/btchip.py", line 265, in startUntrustedTransaction
    self.dongle.exchange(bytearray(apdu))
  File "/tmp/.mount_electrIYmilV/usr/lib/python3.7/site-packages/btchip/btchipComm.py", line 127, in exchange
    raise BTChipException("Invalid status %04x" % sw, sw)
btchip.btchipException.BTChipException: Exception : Invalid status 6f01
I | plugins.ledger.ledger | Exception : Invalid status 6f01
Just for fun I tried creating a 17 input transaction to see if the Coldcard Mk2 could handle a moderately complex transaction that was under the max PSBT size. It didn't like it.
Next I tried a 1 input transaction that was 183 bytes unsigned. Every device did well except I hit the "read error" again with Coldcard Mk2.
The final results for signing 100 input transactions in a 1 of 8 multisig:
8-of-8 multisig
I created a transaction with 100 inputs that was 26 kB unsigned. This resulted in a PSBT that was 418 kB. I wasn't able to add 8 signatures but by my calculation the fully signed transaction would have been 112 raw kB / 26 virtual kB / 102,515 weight units.
Trezor One: After 68 seconds I got a timeout.
BitBox02 - made me walk through the process of setting up the multisig wallet and verifying all 8 cosigners. I then authorized the transaction and was met with no progress indicators in Electrum or on the device screen. After 10 minutes I gave up. I tried a second time and after 5 minutes it loaded the transaction and asked me to confirm the transaction details on the device. Took 2 minutes to sign.
Ledger Nano S - After 8 minutes Electrum threw an "invalid status" error. Retrying gave the same result.
Ledger Nano X - Same as the S, at the 8 minute mark Electrum threw the invalid status error. Retrying gave the same result.
Both Coldcards predictably threw "PSBT is too big" errors.
The final signing performance results for a 100 input 8 of 8 multisig transaction:
Next I decided to try signing a 1 input transaction in the 8 of 8 wallet. This resulted in a transaction that was 311 bytes unsigned; the PSBT with all the xpubs was only 7 kB. Every device was able to sign except for the Coldcard Mk2, which seemed to have a data handling error during the transaction validation step.
E | plugins.coldcard.coldcard.Coldcard_KeyStore |
Traceback (most recent call last):
  File "/tmp/.mount_electrlyEgC9/usr/lib/python3.7/site-packages/electrum/plugins/coldcard/coldcard.py", line 383, in sign_transaction
    resp = client.sign_transaction_poll()
  File "/tmp/.mount_electrlyEgC9/usr/lib/python3.7/site-packages/electrum/plugins/coldcard/coldcard.py", line 233, in sign_transaction_poll
    return self.dev.send_recv(CCProtocolPacker.get_signed_txn(), timeout=None)
  File "/tmp/.mount_electrlyEgC9/usr/lib/python3.7/site-packages/ckcc/client.py", line 139, in send_recv
    buf = self.dev.read(64, timeout_ms=(timeout or 0))
  File "hid.pyx", line 123, in hid.device.read
OSError: read error
Specter Desktop
I had already spent several days running through all these tests, but it did bother me that, due to the opaqueness of the software-hardware integrations, I couldn't be sure that it wasn't simply Electrum choking in some cases. So I decided to set up a 3-of-5 multisig wallet on Specter Desktop to see if the results were the same. I didn't do an M of 8 because Specter didn't support BitBox at the time of this test.
I created a 10 input transaction; signing performance (in seconds) was as follows:
I created a 100 input transaction and first chose to sign with Coldcard Mk3. Specter asked if I wanted to sign via HWI or PSBT, so I chose the former. After 1:50 I got a "bad txn len" error which sounds similar to "PSBT is too big."
Otherwise, the Ledger and Trezor signing went smoothly:
How do the numbers compare for signing a 100 input 3 of 5 multisig transaction?
We can see that the Trezor performance results are in the same ballpark, though Ledger is almost 50% slower with Specter. Why might this be the case? Well, Electrum is a Python application that directly uses the Python USB libraries from hardware vendors. Specter is also a Python application, though it uses HWI, which is an abstraction layer that sits on top of the vendor libraries. There could be some differences between how HWI uses the libraries and how Electrum uses them.
Final thoughts
Remember that these tests were for ONE particular trait of hardware devices: multisig signing performance. There are many other traits around which one can judge hardware devices. For example, BitBox01's results are unfairly fast because it doesn't even parse transaction details for user verification, which is a major security flaw.
It's clear that hardware devices perform well for small simple transactions. But as you increase transaction complexity, you're going to start having a bad time.
After spending several days repeating these exercises I came to really dislike hardware devices that don't show progress indicators for loading and signing. As such, I strongly prefer Coldcard and Trezor in this respect. BitBox and Ledger are anxiety-inducing because you have no idea if anything is actually happening. I hope they will look into adding progress indicators.
Note that while Coldcard Mk3 can't sign a 100 input transaction due to size limits, that doesn't mean it's unusable for multisig wallets. You can always break up a send into multiple smaller transactions that are below its limits - I just think it's a suboptimal user experience to do so. I spoke with Coinkite about this issue and was told that it's a hardware limitation that will be fixed in the next iteration of their product.
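Breaking up a send could look something like the sketch below, which greedily packs inputs into batches whose estimated PSBT size stays under a limit. The per-input cost, the fixed overhead and the function name are illustrative assumptions of mine, not something taken from Electrum or Coinkite's software, and the limit reuses the 384 kB figure mentioned earlier.

```python
from typing import List

# Assumption: the 384 kB PSBT limit discussed earlier, in bytes.
PSBT_LIMIT = 384 * 1024


def split_into_batches(
    input_costs: List[int], fixed_overhead: int, limit: int = PSBT_LIMIT
) -> List[List[int]]:
    """Group input indexes so each batch's estimated PSBT size is below limit.

    input_costs[i] is the estimated number of PSBT bytes that input i adds;
    fixed_overhead covers outputs, global fields and the like.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    size = fixed_overhead
    for index, cost in enumerate(input_costs):
        if fixed_overhead + cost >= limit:
            raise ValueError("input %d cannot fit in any batch" % index)
        if current and size + cost >= limit:
            # Adding this input would hit the limit: start a new transaction.
            batches.append(current)
            current, size = [], fixed_overhead
        current.append(index)
        size += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # 100 inputs at an assumed 4,850 PSBT bytes each, 1,000 bytes of overhead.
    batches = split_into_batches([4850] * 100, fixed_overhead=1000)
    print([len(batch) for batch in batches])
```

Each batch becomes its own transaction with its own fee and its own round of confirmations on every signing device, which is exactly why I consider this a suboptimal experience.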
The length of time to process complex transactions can get pretty bad - I'd suggest that if your device locks itself from inactivity because it took so long to process a request, you've got some work to do. At the very least it seems like device manufacturers should disable the screen lock timeout while the device is actively parsing or signing a transaction. To give a comparison of just how computationally slow these devices are: signing a 100 input 15 of 15 multisig transaction with my laptop CPU takes 3 seconds.
The USB bridge interfaces are not as robust as I'd like. Sometimes the communication layer or session would get screwed up and Electrum would not be able to detect the device until I restarted the app. Not sure if this is something that could be improved by the hardware manufacturers or the software wallet developers.
PSBT is great for compatibility but is really bloated. I think all hardware devices should support PSBT and should be designed to handle the entire range of possible valid multisig transactions. However, I expect that in order to do so, hardware manufacturers will need to significantly bump up their specs. For reference, an unsigned 15-of-15 PSBT with 100 inputs is 586 kB.
In general it seems like multisig is an afterthought for many hardware manufacturers. They tend to spend their resources working on (single signature) full stack solutions for their customers. I believe it's time for hardware manufacturers to start acting like platform providers and ensure that they are providing robust platforms that can be used to build a wide variety of solutions.
As Casa continues to evaluate which hardware devices to support for our users, I expect that this suite of tests will be added to our internal standards.