Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DJTAG2 protocol discussion #77

Closed
jeanthom opened this issue Jan 11, 2021 · 28 comments
Closed

DJTAG2 protocol discussion #77

jeanthom opened this issue Jan 11, 2021 · 28 comments

Comments

@jeanthom
Copy link
Collaborator

In a discussion with @phdussud he suggested an evolution of the current USB protocol to allow for better performance. This GitHub issue is here to discuss how we should implement the next iteration of the DirtyJTAG USB protocol.

The current protocol is nicknamed "DJTAG1". You can actually query this with CMD_INFO (source code).

We should ideally make it retrocompatible by making DJTAG2 a superset of DJTAG1. Newer software should CMD_INFO to query whether or not they can use DJTAG2 commands or if they're stuck with DJTAG1.

Proposal 1: increase maximum transfer size

@phdussud wrote in #76:

One last thing is that I believe the biggest improvement that can be made after this is increasing the packet length from 32 to 64. This requires changing the cmd structure (2 bytes for the length), therefore clients will have to change. This is the reason I didn't do it.

Personnal comment: USB bulk transaction for full-speed devices (like DirtyJTAG) are limited to 64 bytes. For a transfer we would need:

  • 1 command identification byte
  • 2 bytes for transfer length

That would push the maximum transfer size to 61 bytes.

Proposal 2: specify if transfers should return data

@phdussud wrote in #76:

Another simple substantial improvement: we can introduce a transfer that does not send back a USB packet. Most of the programming involves only a write operation to the device. I implemented it and my timing improves quite a bit. With the best clock rate (SPI) it takes 253us for each 30byte of transfer. Only 53us is actually spend on the Jtag bus. The rest is USB overhead.
If I implement the transfer with no read, the time goes down to 172us. In my example, the fastest wall clock time for the end to end programming is 2.9sec with normal transfers and 2s with no read transfer
This is with OpenFPGALoader which already knows when it does not need to read back.

Proposal 3: add bootloader support

Using DirtyJTAG with a bootloader is currently very limited. Adding a command for jumping to bootloader would allow seamless upgrades. For dapboot we have to set magic values into the RTC backup registers before resetting the MCU.

@phdussud
Copy link
Contributor

For Proposal 1 and 2:
May I propose something a little different:
We use a full byte for the command ID and we have very few commands. As a result, the upper 4 bits of the command byte is always 0. I propose that we only support 16 commands coded in the lower 4 bits and use the upper 4 bits as a command modifiers. This way we don't have to duplicate code when the commands are very close like the ones we have been discussing in proposal 1 and 2.
For the transfer command, I propose that the bit 0x10 means no read is necessary and 0x20 means the transfer is 32 bytes longer than the length coded in the transfer length.

@phdussud
Copy link
Contributor

Sorry I got confused between bytes and bits for a moment.
For the transfer command, I propose that the bit 0x10 means no read is necessary and 0x20 means the transfer is 256 bits longer than the length coded in the transfer length.

@jeanthom
Copy link
Collaborator Author

That sounds better. So we'd have something like this:

  • CMD_XFER, acts like the legacy DJTAG1 transfer command: for transfers with variable lengths up to 256 bits
  • CMD_XFER|NO_READ, like CMD_XFER but without replying to the host
  • CMD_XFER|EXTEND_LENGTH: variable lengths from 256 to 496 bits
  • CMD_XFER| EXTEND_LENGTH|NO_READ: same as above but without replying to the host

Internally (in DirtyJTAG firmware) everything would be implemented as a single transfer function in jtag.c and the command parsing code in cmd.c would take care of calling jtag_transfer() with the right parameters.

Does that sound alright to you?

@phdussud
Copy link
Contributor

Exactly!
Now that I think more about it, I would like NO_READ to be 0x80 and EXTEND_LENGTH to be 0x40. This way are not not locked in into 16 commands max, we can still use 64 commands. We can make the decision later to use yet other bits in the high 4 bits or leave them for more commands.

@jeanthom
Copy link
Collaborator Author

Yep that would be better for later.

Also I remembered that ST's genuine ST-Link adapter (the "white pod" one) has configurable I/O levels, we should add a command to take care of that.

@jeanthom
Copy link
Collaborator Author

Undergoing work on DJTAG2 is happening in the #78 PR.

@phdussud
Copy link
Contributor

I have an implementation of the new protocol in another branch of my fork. However the EXTEND_LENGTH seems to have a problem. I also implemented the required code in OpenFPGALoader and when I pass larger packets I get a CRC failure at the end of programming. I would appreciate some eye balls on the larger packet changes. The rest works (no read, SPI support and no timer loops). See https://github.com/phdussud/DirtyJTAG/tree/spi
Thank you very much!
Patrick

@phdussud
Copy link
Contributor

@jeanthom Sorry I didn't notice your work on most of the new commands attribute. I will pick up your code and integrate in my branch, as the code changes I have are largely about making the jtag_transfer command go faster.

@phdussud
Copy link
Contributor

phdussud commented Jan 15, 2021

OK. I integrated jeanthom changes in my spi branch. I would prefer that we don't send back 64 bytes all the time in jtag_transfer, as it is more expensive.

@phdussud
Copy link
Contributor

The issue with EXTEND_LENGTH and openFPGALoader was my fault, on the openFPGALoader side. It is now fixed and my tests pass.
@jeanthom It seems to me that you probably want to tag the latest commit as being the last of version 1. Unless you want to create a branch for version 1 but fixes.
My changes (in https://github.com/phdussud/DirtyJTAG/tree/spi ) will be for version 2. because it requires a change in the pinout for bluepill board to be the maximum performance.
Thoughts?

@phdussud
Copy link
Contributor

A note about performance:
Before my first PR openFPGALoader and bluepill was taking 13-14s to load a bit file into a ulx3s board. With https://github.com/phdussud/DirtyJTAG/tree/spi it now takes 1.3s (10x faster). This is with changes to openFPGALoader to take advantage of the version 2.
For reference, a FT232H will load the same bit file in 0.9s

@phdussud
Copy link
Contributor

phdussud commented Apr 3, 2021

@jeanthom About CMD_SETSIG/CMD_GETSIG mentioned in #88. I didn't suggest changing or removing CMD_SETSIG/CMD_GETSIG. I was pointing out that the JTAG interface will ignore TMS, TDI pins levels unless they are sampled by a rising TCK. This lead me to think that accomplishing this with CMD_SETSIG isn't the most performant and does not lend itself to a good timing of the signal transitions with respect to TCK edges. It is even slower if we need to capture the state of TDO during the rising edge of TCK. I think that the idea of CMD_CLK with a modifier to capture TDO is an excellent idea. I would suggest that this modifier should be allowed only when the clock count is 1.

@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 4, 2021

@phdussud What is your opinion on eb80bec?

@phdussud
Copy link
Contributor

phdussud commented Apr 4, 2021 via email

@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 4, 2021

I tried to fix this in 7060127

I wanted to avoid doing lots of checks inside the loop but I'm not sure that's the right way to do it. What do you think?

@phdussud
Copy link
Contributor

phdussud commented Apr 4, 2021 via email

@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 5, 2021

CMD_GOTOBOOTLOADER was tested OK on an Olimex STM32-H103 with dapboot as its bootloader.

@phdussud
Copy link
Contributor

phdussud commented Apr 9, 2021

Do you have access to a Raspberry Pico board? If yes, let me know... I have a preliminary implementation of DJtag2 on it. It needs testing with other software than OpenFPGALoader.

@phdussud
Copy link
Contributor

phdussud commented Apr 9, 2021

Another question... Why stop cmd processing when the command produces an output? If we didn't we could send a cmd request like CMD_SETSIG, CMD_SETSIG, CMD_GETSIG, CMD_SETSIG and it would be able to efficiently set values, generate a rising clock get TDO, and generate a falling clock. I may be missing something.

@zoobab
Copy link
Collaborator

zoobab commented Apr 9, 2021

I do have a picoboard, if you want it, shoot me an email at zoobab at Gmail.

@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 12, 2021

Another question... Why stop cmd processing when the command produces an output? If we didn't we could send a cmd request like CMD_SETSIG, CMD_SETSIG, CMD_GETSIG, CMD_SETSIG and it would be able to efficiently set values, generate a rising clock get TDO, and generate a falling clock. I may be missing something.

I think this design choice finds its roots in my extreme laziness 😄 I wanted to avoid keeping track of which commands I sent were supposed to return data. With that said, I'm not exactly sure if this is really a feature that would boost performance? What do you think?

@phdussud
Copy link
Contributor

phdussud commented Apr 12, 2021 via email

@zoobab
Copy link
Collaborator

zoobab commented Apr 13, 2021

"Anything that cuts down the number of USB packets is a performance improvement."

Packet batching, that's what Zeromq is using for optimising TCP performance.

@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 13, 2021

I would be ok if we sent one packet per response, which won't materially change the structure of programs taking advantage of it, since they already need to do a USB read per response.

Do you think we should introduce a command flag/command list header in DJTAG2 for packet buffering?

@phdussud
Copy link
Contributor

My proposal would be to change the DJTAG2 protocol to allow adding commands after one that returns a value. There would be one USB IN packet per return. I am not sure if allowing batching of returned values into one USB IN packet makes a great deal of sense because many times, the host program needs to read the first value returned in order to issue other types of commands, beyond "epilogue" commands that are part of the JTAG state machine. I also would specify that you can't add a CMD after a CMD_XFER since the rest of the buffer is assumed to be data. If there are a lot of short transfers, we could lift this restriction, at the expense of a more complicated implementation on the device side. We can do all of this without a new flag or command as long as we can break compatibility with DJTAG2 (we assume that all DJTAG2 devices have the new capability).

@jeanthom jeanthom pinned this issue Apr 24, 2021
@jeanthom
Copy link
Collaborator Author

jeanthom commented Apr 24, 2021

DJTAG2 hasn't been merged into master yet, we still can do huge changes to DJTAG2 as long as we don't break compatibility with DJTAG1.

@phdussud
Copy link
Contributor

phdussud commented Jul 6, 2021

Ok, then I propose this. CMD_XFER is terminating: The command buffer will not be read beyond the last value to transfer. The value returning commands are not terminating: The next command in the command buffer will be processed. This means that all command buffers except for the CMD_XFER need to be terminated by CMD_STOP.
I verified that openFPGALoader already complies.
I will implement this shortly for the PICO implementation of DJTAG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants