Casa Blog - Bitcoin Security Made Easy

Casa is built upon a foundation of several pillars: multiple geographically distributed keys, dedicated hardware devices to secure them, a thoughtfully designed user experience, and high-quality client services. The dedicated hardware devices are a unique aspect of our architecture in that Casa has no control over them, and that's the point! This lack of control is crucial to Casa's security model, but it makes providing a smooth user experience far more challenging for our team.

Casa supports a diverse range of hardware devices, and we'd like to support any device that we believe meets our standards. But what are our standards? This question is what spawned the following research.

The following article covers a series of stress tests I performed to determine how capable hardware wallets are when signing multisignature transactions of varying complexity. Multisig software providers need to understand the limits of hardware for two reasons. First, it tells us which devices to recommend and which to support. Second, it lets us customize our wallet software to handle hardware edge cases as gracefully as possible.


The hardware

Blockstream Jade, Firmware 1.0.27 No-Radio
Cobo Vault, V2.6.1 Bitcoin-Only Firmware
Coinkite Coldcard Mk4, Firmware 5.2.2
Coinkite Coldcard Q, Firmware 1.0.0Q
Foundation Passport Founder’s Edition, Firmware 2.3.0
Keystone Pro, B-3.4 Bitcoin-Only Firmware
Keystone 3 Pro, v1.1.0 Bitcoin-Only Firmware
Ledger Nano S, Firmware 2.1.0, BTC app 2.1.3
Ledger Nano S Plus, Firmware 1.1.1, BTC app 2.1.3
SeedSigner, Firmware 0.7.0
Shift Crypto BitBox02, Firmware 9.16.0
Specter DIY, Firmware v1.8.3
Trezor Model One, Universal 1.12.1
Trezor Model T, Universal 2.6.4 Firmware
Trezor Safe 3, Universal 2.6.4 Firmware

It's worth noting that the Cobo Vault, Keystone Pro, and Ledger Nano S are no longer for sale; they are considered obsolete and have been replaced by newer versions.

Why didn’t you test device _____?

People will inevitably ask, “Why didn’t you test device X?” The reason is simple: I need all of the tested devices to work interoperably with Electrum / Sparrow / Specter. Devices that aren’t integrated into a variety of multisig software wallets aren’t interesting to us.

Why is this a requirement? Because Casa is dedicated to eliminating single points of failure for our members, and that includes Casa itself. To uphold our principle of Sovereign Recovery, Casa users must be able to spend their funds from other multisig software wallets in the (unlikely) event that Casa ceases operating for any reason.

To ensure that I could test a device fairly, it needed to support at least one of the following:

  • USB communication via Bitcoin HWI (Hardware Wallet Interface)
  • A QR code data transmission standard like UR 2.0 or BBQr
  • Sneakernet transfer of PSBT files via microSD card
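For the microSD route, the file on the card is a PSBT as defined in BIP 174. Its binary serialization always begins with the magic bytes `psbt` followed by `0xff`. Depending on the software, the file may hold that raw binary or a base64 text encoding of it. The minimal sketch below accepts either form. The function name `load_psbt_bytes` is hypothetical, and it only checks the magic prefix; it does not parse or validate the key-value maps that follow.

```python
import base64
import binascii

# BIP 174: a serialized PSBT always begins with "psbt" followed by 0xff.
PSBT_MAGIC = b"psbt\xff"


def load_psbt_bytes(data: bytes) -> bytes:
    """Return raw PSBT bytes from a file that holds binary or base64 PSBT data.

    Only the magic prefix is checked; the key-value maps are not parsed.
    """
    if data.startswith(PSBT_MAGIC):
        return data  # already binary
    try:
        decoded = base64.b64decode(data.strip(), validate=True)
    except binascii.Error as exc:
        raise ValueError("not a binary or base64 PSBT") from exc
    if not decoded.startswith(PSBT_MAGIC):
        raise ValueError("base64 data does not decode to a PSBT")
    return decoded


if __name__ == "__main__":
    # Truncated, made-up example: just the magic bytes plus a 0x00 separator.
    raw = PSBT_MAGIC + b"\x00"
    print(load_psbt_bytes(raw) == raw)                    # binary file contents
    print(load_psbt_bytes(base64.b64encode(raw)) == raw)  # base64 file contents
```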

The software

The following tests were conducted with Sparrow Wallet 1.8.2 and Electrum 4.5.3 (Ledger only) on Ubuntu 22.04. Note that the wallet software itself can make a performance difference. For example, I saw that signing with Trezor was around 40% faster on Electrum, while Ledger was 10X to 100X faster on Electrum. How could this be? It’s due to differences in the code libraries used to communicate with the devices over USB.

Why bother?

From a network perspective, any bitcoin transaction of up to 100,000 virtual bytes (vB), which is the same as 400,000 weight units, should be accepted and relayed by nodes as long as it's valid and pays at least the minimum relay fee. I'd consider this the minimum complexity that hardware devices ought to support. All of the test transactions created during my research were smaller than this.
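The two figures in that paragraph are one limit expressed in two units. BIP 141 defines weight as three times the transaction's size without witness data plus its full size with witness data. Virtual size is weight divided by 4, rounded up, so 400,000 weight units is exactly 100,000 vB.

The sketch below assumes Bitcoin Core's default policy constant `MAX_STANDARD_TX_WEIGHT` (400,000). Nodes can be configured differently, and the helper names are illustrative.

```python
# Bitcoin Core's default standardness limit (policy constant MAX_STANDARD_TX_WEIGHT).
MAX_STANDARD_TX_WEIGHT = 400_000


def tx_weight(base_size: int, total_size: int) -> int:
    """BIP 141 weight: 3 x size without witness data + size with witness data."""
    return base_size * 3 + total_size


def vsize(weight: int) -> int:
    """Virtual size in vbytes: weight / 4, rounded up."""
    return (weight + 3) // 4


def is_standard_weight(weight: int) -> bool:
    """Core treats weight above the limit as non-standard; exactly 400,000 passes."""
    return weight <= MAX_STANDARD_TX_WEIGHT


if __name__ == "__main__":
    w = tx_weight(base_size=1_000, total_size=1_500)  # hypothetical sizes
    print(w, vsize(w), is_standard_weight(w))  # 4500 1125 True
    print(vsize(MAX_STANDARD_TX_WEIGHT))      # 100000
```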

However, these size limits can be reached in a variety of ways. The bulk of a transaction's size / weight usually comes from its inputs. For a single-signature input that is chiefly the signature. For multisig inputs it is the signatures plus the witness script, which lists every public key. Thus you can reach a large transaction size either with many single-signature inputs or with somewhat fewer inputs that each carry multiple signatures.
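To see how heavy multisig inputs get, here is a rough weight estimate for a single P2WSH m-of-n input. It assumes:

  • compressed 33-byte public keys
  • m and n of at most 16
  • 72-byte signatures (a DER signature plus sighash flag, which is an upper bound with low-S signatures), so real inputs may be a few bytes lighter

The helper names are made up for illustration.

```python
def compact_size_len(n: int) -> int:
    """Bytes needed for Bitcoin's CompactSize (varint) encoding of n."""
    if n < 0xFD:
        return 1
    if n <= 0xFFFF:
        return 3
    if n <= 0xFFFF_FFFF:
        return 5
    return 9


def p2wsh_multisig_input_weight(m: int, n: int, sig_len: int = 72) -> int:
    """Estimated weight of one P2WSH m-of-n OP_CHECKMULTISIG input.

    Assumes 33-byte compressed keys, m <= n <= 16, and sig_len-byte
    signatures (DER + sighash flag).
    """
    # Non-witness bytes weigh 4 WU each: outpoint (36) + empty scriptSig (1) + sequence (4).
    non_witness = (36 + 1 + 4) * 4
    # witnessScript: OP_m, n x (push opcode + 33-byte key), OP_n, OP_CHECKMULTISIG.
    script_len = 1 + n * (1 + 33) + 1 + 1
    # Witness stack: empty dummy item, m signatures, then the witnessScript.
    witness = (
        compact_size_len(1 + m + 1)                    # number of stack items
        + 1                                            # empty dummy (CHECKMULTISIG quirk)
        + m * (compact_size_len(sig_len) + sig_len)    # signatures
        + compact_size_len(script_len) + script_len    # witnessScript
    )
    return non_witness + witness  # witness bytes weigh 1 WU each


if __name__ == "__main__":
    for m, n in [(2, 3), (3, 5)]:
        w = p2wsh_multisig_input_weight(m, n)
        print(f"{m}-of-{n}: {w} WU per input, {100 * w} WU for 100 inputs")
```

Under these assumptions, a 2-of-3 input weighs 418 WU and a 3-of-5 input weighs 559 WU. The 100-input transactions therefore come to roughly 41,800 and 55,900 WU of inputs, leaving plenty of room under 400,000 WU for outputs and overhead. The witness script accounts for a large share of each input, alongside the signatures.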

The transaction process

  1. Construct unsigned transaction (instant)
  2. Transfer unsigned transaction to device
    1. USB (fastest)
    2. MicroSD (medium)
    3. Animated QR Code (slowest)
  3. Load & Parse Transaction (10 sec - 10 min)
  4. Sign Transaction (instant - hours)
  5. Transfer signed transaction off device
    1. USB (fastest)
    2. MicroSD (medium)
    3. Animated QR Code (slowest)
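Animated QR codes are the slowest transfer path because a large PSBT must be split across many frames, and the device's camera captures them one at a time. Transfer time therefore grows roughly linearly with PSBT size.

The back-of-the-envelope sketch below gives an idealized lower bound. It assumes no frame is ever missed and ignores encoding overhead. Both `bytes_per_frame` and `frames_per_second` are hypothetical parameters; real values depend on the encoding (such as UR 2.0 or BBQr), the QR density, and the camera.

```python
def min_qr_transfer_seconds(payload_bytes: int, bytes_per_frame: int,
                            frames_per_second: float) -> float:
    """Idealized lower bound for an animated-QR transfer.

    Assumes every frame is captured on its first showing and ignores any
    encoding overhead; both rate parameters are hypothetical inputs.
    """
    frames = (payload_bytes + bytes_per_frame - 1) // bytes_per_frame  # ceiling division
    return frames / frames_per_second


if __name__ == "__main__":
    # Made-up numbers, purely to show the linear relationship.
    for size in (10_000, 100_000):
        print(size, min_qr_transfer_seconds(size, bytes_per_frame=250, frames_per_second=5))
```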

For the purposes of these tests I’ll focus on step 4, though I’ll note cases where other steps are particularly slow.

The multisig setups

For my testing I mostly used Sparrow on Ubuntu. Linux is a bit tricky to get working with hardware devices; you have to set up udev rules in order to allow your computer to communicate with them.
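A udev rule is a one-line match on a device's USB vendor ID that grants your user access to it. The sketch below renders generic rule lines in Python. It is not an official rules file: it assumes a `plugdev` group and the `uaccess` tag. Hardware vendors and the HWI project publish their own rule files, which may match more precisely (for example on product IDs), and those should be preferred. `2c97` is Ledger's USB vendor ID; other devices use different IDs.

```python
def udev_rules(vendor_id: str, group: str = "plugdev") -> str:
    """Render generic udev rules giving non-root access to one USB vendor's devices.

    Sketch only: the group name and permissions are assumptions, and official
    rule files from the vendor or the HWI project may match more precisely.
    """
    if len(vendor_id) != 4:
        raise ValueError("expected a 4-hex-digit USB vendor ID")
    int(vendor_id, 16)  # raises ValueError if not hexadecimal
    match = f'ATTRS{{idVendor}}=="{vendor_id.lower()}"'
    perms = f'MODE="0660", GROUP="{group}", TAG+="uaccess"'
    return "\n".join([
        f'SUBSYSTEMS=="usb", {match}, {perms}',   # raw USB access
        f'KERNEL=="hidraw*", {match}, {perms}',   # HID access used by many wallets
    ])


if __name__ == "__main__":
    # 2c97 is Ledger's USB vendor ID. Save the output as e.g.
    # /etc/udev/rules.d/20-hw-wallet.rules (hypothetical file name), then run:
    #   sudo udevadm control --reload-rules && sudo udevadm trigger
    print(udev_rules("2c97"))
```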

I created a variety of P2WSH (native segwit) multisig wallets with the hardware devices I was testing. Naturally these wallets all use bitcoin's testnet, because that's what it's for; otherwise testing would get pretty expensive! Then I funded each wallet with 100 deposits to create 100 UTXOs.
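One compact way to describe such a wallet is a wallet policy in the style of BIP 388, which grew out of Ledger's wallet policies. A policy consists of a descriptor template such as `wsh(sortedmulti(2,@0/**,@1/**,@2/**))` plus a list of key origins.

The sketch below makes a few assumptions:

  • The wallet uses `sortedmulti`.
  • Keys follow BIP 48 testnet paths (`48'/1'/0'/2'`, where the final `2'` denotes native segwit P2WSH).
  • The function name is made up.
  • The fingerprints and `tpub` strings in the example are placeholders, not real keys.

```python
def multisig_wallet_policy(m: int, key_infos: list[str]) -> tuple[str, list[str]]:
    """Build a BIP 388-style wallet policy for a P2WSH sortedmulti m-of-n wallet.

    key_infos are key-origin strings such as "[fingerprint/48'/1'/0'/2']tpub...".
    """
    n = len(key_infos)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    # @i stands for key_infos[i]; "/**" is shorthand for "/<0;1>/*",
    # i.e. the receive (0) and change (1) chains.
    keys = ",".join(f"@{i}/**" for i in range(n))
    return f"wsh(sortedmulti({m},{keys}))", list(key_infos)


if __name__ == "__main__":
    # Placeholder key origins for a testnet 2-of-3 wallet (not real keys).
    infos = [f"[0000000{i}/48'/1'/0'/2']tpubPLACEHOLDER{i}" for i in range(3)]
    template, keys = multisig_wallet_policy(2, infos)
    print(template)  # wsh(sortedmulti(2,@0/**,@1/**,@2/**))
```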

2-of-3 multisig

I started off by creating a bunch of 2-of-3 multisig wallets from my available hardware devices.

Then, after funding each wallet with 100 deposits, I created 2 transactions: one that spent 10 UTXOs and one that spent all 100 UTXOs. Here are the signing times in HOURS:MINUTES:SECONDS format.

[Table: 2-of-3 multisig signing times (HH:MM:SS) per device for the 10-input and 100-input transactions]

We can see that all devices performed well when signing 10 inputs. But 100 inputs becomes problematic. The Passport doesn't have enough memory to complete the operation, while Ledgers using HWI are so slow that it's effectively a failure. Nearly 2 hours to sign a transaction is pretty ridiculous!

3-of-5 multisig

Next I created several different 3-of-5 wallets and repeated the process. Here you can see all of the signing times in one spreadsheet.

[Table: signing times (HH:MM:SS) for all of the 3-of-5 wallets, 10-input and 100-input transactions]

I must say I was completely unprepared to spend five and a half hours sitting at my computer watching a Ledger sign a transaction. I'd like to never have to do that again.

Scaling performance

Although these tests compare all of the devices against each other, we can also compare each device against itself. How so? Because the testing involved 4 different transactions per device: two with 10 inputs and two with 100 inputs (one of each for the 2-of-3 and the 3-of-5 setups).

If a device scales linearly, then the 100-input transaction should take 10X as long as the 10-input transaction. If it scales well, it will take less than 10X as long; if it scales poorly, it will take more than 10X as long.
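The comparison is simple arithmetic on the signing times. The sketch below does three things:

  • converts HOURS:MINUTES:SECONDS strings to seconds
  • computes the ratio of 100-input time to 10-input time
  • classifies that ratio against the 10X linear baseline

The helper names and the example timings are hypothetical, not measured values.

```python
def to_seconds(hms: str) -> int:
    """Convert an HOURS:MINUTES:SECONDS string into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scaling_factor(t10: str, t100: str) -> float:
    """Ratio of 100-input to 10-input signing time (10.0 means linear)."""
    return to_seconds(t100) / to_seconds(t10)


def classify(factor: float) -> str:
    """Label a scaling factor relative to the 10X linear baseline."""
    if factor < 10:
        return "scales well (sub-linear)"
    if factor > 10:
        return "scales poorly (super-linear)"
    return "scales linearly"


if __name__ == "__main__":
    # Hypothetical timings, not measured values.
    for t10, t100 in [("0:00:30", "0:04:00"), ("0:01:00", "0:20:00")]:
        f = scaling_factor(t10, t100)
        print(f"{t10} -> {t100}: {f:.1f}X, {classify(f)}")
```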

[Table: how each device's signing time scales from 10 to 100 inputs]

It's quite obvious that Ledger with HWI scales terribly. More details on that in a bit.

Device usability notes

Specter DIY’s camera is basically worthless for scanning QR codes because it doesn’t actually display what the camera is seeing on the screen, so you have no way of knowing if you are pointing it at the QR code accurately. I wasn't able to successfully transfer a PSBT via this method, so I stuck to microSD card data transfers.

Coldcard Q is slightly better because it projects a red laser line for centering on the QR code. However, you don't get a clear sense of the camera's actual field of view, so you’re partially blind as to how accurately you’re aiming. I think a simple UX improvement here would be to have the laser project either the outline of a box or a cross, so that you have a better idea of the camera's boundaries along each axis.

It's worth noting that SeedSigner's amazing signing performance results are overshadowed by a major performance downside: scanning large PSBTs with SeedSigner was excruciating. For a 100-input 2-of-3 PSBT it would get to 99% after 8 minutes and then hang, likely because it had missed a few frames. It took a total of 14 minutes for me to get that unsigned transaction fully scanned. At the extreme end, it took me 48 minutes to scan the 100-input 3-of-5 PSBT.

It’s unfortunate that SeedSigner can’t load PSBTs via microSD card; that would make signing huge transactions nearly painless. Also, since SeedSigner is stateless and doesn’t store your seed, it’s very important to create a SeedQR of your seed so that you don’t have to type it in every time you turn the device on.

Another slight annoyance is that SeedSigner and Specter DIY are the only QR-code-based devices without an on-board battery, so you have to keep them plugged into some other power source via USB cable the entire time you’re using them.

An annoyance with the Keystone 3 Pro is that it’s quite difficult to pop the microSD card out; I had to use tweezers. Also, I have 3 versions of Keystone hardware on hand, and it’s odd that they only support multisig wallets with the bitcoin-only firmware. Even then, it’s rather difficult to get a multisig testnet wallet working.

I was able to export the extended public keys into Sparrow from the Cobo Vault Pro and Keystone Pro, but ONLY via microSD export of a JSON file and NOT via QR code. Unfortunately, the Keystone 3 Pro doesn't seem to have the same "export all public keys to file" option that worked for me on the older devices. It will also only export the single-signature P2WPKH public key for testnet. I created a multisig wallet using that key, but then the Keystone 3 Pro wouldn’t import the wallet descriptor. The folks at Keystone tell me that multisig testnet support is still in the works. As such, the Keystone 3 Pro was the only device I wasn’t able to test; I’ll update this article once firmware with multisig testnet support is released.

After spending several days repeating these exercises, I came to really dislike hardware devices that don't show progress indicators for loading and signing. Ledger is particularly anxiety-inducing in this respect because you have no idea whether anything is actually happening. The "processing" screen doesn't even animate, which is especially frustrating when signing a transaction that takes tens of minutes.

Ledger’s performance slowdown

You probably noticed that Ledger is over 10 times slower when you use it with an HWI-based wallet.

When I sign a 100-input transaction, I have to register the wallet on the Ledger NINE TIMES. As a result, signing a 3-of-5, 100-input transaction requires FOUR HUNDRED clicks on the Ledger to register all of the public keys 9 times. I thought it was odd how much slower Ledger was compared to my tests 4 years ago, so I reached out for an explanation.

I was told that it’s actually due to how different libraries handle Ledger’s new wallet policy framework. Since each wallet policy represents an "account," the app independently re-derives the script of each input to make sure it's internal to the "account." These are essentially the same kind of checks that are performed on change addresses to ensure they belong to your wallet, but now the checks are also performed on each input.
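Conceptually, the per-input check takes four steps:

  1. Derive the wallet's public keys at the input's derivation path.
  2. Rebuild the multisig witness script from those keys.
  3. Hash the script into a P2WSH scriptPubKey.
  4. Compare the result with the output being spent.

The sketch below is not Ledger's code and makes several assumptions:

  • The wallet uses `sortedmulti` (BIP 67 key ordering).
  • Keys are compressed.
  • BIP 32 derivation has already happened, since implementing it would require an elliptic-curve library.

All function names are hypothetical.

```python
import hashlib


def sortedmulti_witness_script(m: int, pubkeys: list[bytes]) -> bytes:
    """OP_m <keys in BIP 67 order> OP_n OP_CHECKMULTISIG, for m <= n <= 16."""
    n = len(pubkeys)
    if not 1 <= m <= n <= 16:
        raise ValueError("need 1 <= m <= n <= 16")
    script = bytes([0x50 + m])              # OP_1..OP_16 are 0x51..0x60
    for key in sorted(pubkeys):             # BIP 67: lexicographic byte order
        if len(key) != 33:
            raise ValueError("expected 33-byte compressed public keys")
        script += bytes([33]) + key         # push the 33-byte key
    return script + bytes([0x50 + n, 0xAE])  # OP_n, OP_CHECKMULTISIG


def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    """BIP 141 P2WSH output script: OP_0 followed by a push of SHA256(witnessScript)."""
    return b"\x00\x20" + hashlib.sha256(witness_script).digest()


def input_belongs_to_wallet(m: int, derived_pubkeys: list[bytes],
                            spent_script_pubkey: bytes) -> bool:
    """Rebuild the expected scriptPubKey and compare it with the one being spent.

    derived_pubkeys must already be derived at this input's path (for example
    .../0/5); the BIP 32 derivation step is deliberately left out.
    """
    expected = p2wsh_script_pubkey(sortedmulti_witness_script(m, derived_pubkeys))
    return expected == spent_script_pubkey
```

Because the keys are sorted, the order in which the wallet lists them does not affect the result. A different threshold or a foreign key, however, makes the check fail.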

It turns out that it's up to the software wallet to store the HMAC that the device returns after registering the wallet policy; the device doesn't store anything itself. The current version of HWI provides no way of getting the HMAC, thus you have to keep re-registering the same wallet policy. Hopefully this pull request to add better wallet policy support to HWI will get merged and drastically improve the performance and UX.
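The HMAC mechanism can be illustrated with a toy model. The device holds a secret and returns an HMAC over the policy when the user approves it. Later it accepts that HMAC as proof of approval without storing anything itself. If the host wallet keeps the HMAC, the user approves the policy once; if the wallet discards it, the user must approve again. The classes below are hypothetical and simplified; they do not reproduce Ledger's actual protocol or key derivation.

```python
import hashlib
import hmac
import secrets


class ToyDevice:
    """Stand-in for a signer that stores no wallet policies itself.

    It proves an earlier approval with an HMAC keyed by a device secret.
    """

    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)
        self.registrations = 0  # each one means another round of button presses

    def register(self, policy: str) -> bytes:
        self.registrations += 1  # the user would review and approve here
        return self._mac(policy)

    def check(self, policy: str, proof: bytes) -> bool:
        return hmac.compare_digest(self._mac(policy), proof)

    def _mac(self, policy: str) -> bytes:
        return hmac.new(self._secret, policy.encode(), hashlib.sha256).digest()


class CachingWallet:
    """Host software that keeps the returned HMAC instead of discarding it."""

    def __init__(self, device: ToyDevice) -> None:
        self.device = device
        self._proofs: dict[str, bytes] = {}

    def proof_for(self, policy: str) -> bytes:
        proof = self._proofs.get(policy)
        if proof is None or not self.device.check(policy, proof):
            proof = self.device.register(policy)
            self._proofs[policy] = proof
        return proof


if __name__ == "__main__":
    device = ToyDevice()
    wallet = CachingWallet(device)
    policy = "wsh(sortedmulti(2,@0/**,@1/**,@2/**))"
    for _ in range(9):  # repeated signing requests reuse one approval
        wallet.proof_for(policy)
    print(device.registrations)  # 1
```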

Thus a slowdown is expected, but the security model is stronger, as the device now guarantees that all the inputs are spending from that exact account that you registered.

In short: the performance slowdowns are not a result of Ledger's hardware getting worse, but rather due to inefficiencies in how some types of wallets interact with the device.

On a different note: Ledger Stax hasn’t started shipping yet, but I’m told that it has the same secure element as the Nano S Plus, thus the performance should be the same.

Final thoughts

Remember that these tests were for ONE particular trait of hardware devices: multisig signing performance. There are many other traits around which one can judge hardware devices.

It's clear that hardware devices perform well for small, simple transactions. But as you increase transaction complexity, you're going to start having a bad time.

Processing times for complex transactions can get pretty bad. To give a sense of just how computationally slow these devices are: fully signing a 100-input 15-of-15 multisig transaction with my laptop CPU takes a mere 3 seconds.
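The laptop figure is easy to put in per-signature terms. In a 15-of-15 wallet, all 15 keys sign each of the 100 inputs, so 3 seconds covers 1,500 signatures, or about 2 ms each. That average includes whatever else the software does per signature (computing sighashes, for example), so treat it as a rough figure rather than a benchmark.

```python
inputs = 100
signers_per_input = 15  # 15-of-15: every key signs every input
laptop_seconds = 3      # figure from the paragraph above

total_signatures = inputs * signers_per_input
ms_per_signature = laptop_seconds * 1000 / total_signatures
print(total_signatures, ms_per_signature)  # 1500 2.0
```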

It's been 4 years since my last round of testing, and multisig still feels like an afterthought for many hardware manufacturers. They tend to spend their resources working on (single signature) full stack solutions for their customers. Perhaps multisig users are too rare for hardware companies to spend much time thinking about them, but the flip side of this is that poor hardware UX could actually turn people off from using multisig.



