Andrew N. Sloss, Dominic Symes, and Chris Wright, "ARM System Developer's Guide", Elsevier,

Published by Demo 1, 2021-07-03 06:41:10

680 Index Instruction set (continued) unsigned 64-bit by 64-bit multiply with data processing instructions 128-bit result, 209–210 arithmetic instructions, 53–55 barrel shifter. see Barrel shifter normalization of comparison instructions, 56–57 on ARMv4, 213–215 logical instructions, 55–56 on ARMv5 and above, 212–213 move instructions, 50 description of, 212 multiply instructions, 57–58 description of, 26, 47–50, 48t–49t overflow of, 265 Jazelle, 26–27, 27t Intel XScale loading constants, 78–79 load-store instructions D-cache cleaning in, 435–438 multiple-register transfer. see digital signal processing on, 278–280 Multiple-register transfer instruction cycle timings, 659–660 single-register load-store addressing Intel XScale SA-110, 453–456 modes, 61–63 Interrupt(s) single-register transfer, 60–61 assigning of, 324–325 swap instruction, 72–73 description of, 33, 317 program status register instructions, 75–76 software, 324 16-bit, 6 Interrupt controller registers, 349t software interrupt instruction, 73–75 Interrupt controllers, 12 Thumb Interrupt handler ARM-Thumb interworking, 90–92 nested, 325, 333, 336–342 branch instructions, 92–93 nonnested, 333–336 code density, 87, 88f prioritized direct, 333, 356–359 data processing instructions, 93–95 prioritized group, 333, 359–363 decoding, 88f, 639–641 prioritized simple, 333, 346–352 description of, 26, 27t prioritized standard, 333, 352–356 encodings, 638–644 reentrant, 333, 342–346 list of, 89t Interrupt handling schemes, 317 load and store offsets, 132t Interrupt latency, 325–326 multiple-register load-store instructions, Interrupt masks, 27 97–98 Interrupt request overview of, 87–89 assigning of, 324 register usage, 89–90 description of, 318t, 322 single-register load-store instructions, exceptions, 326–329 96–97 stack design and implementation, 329–333 software interrupt instruction, 99 Interrupt request mode, 23–24, 26t, 27 stack instructions, 98–99 Interrupt request vector, 33 Interrupt stack, 343 Integer 
Inverted logical relations, 183 double-precision multiplication .irp, 634 description of, 208 long long multiplication, 208–209 J signed 64-bit by 64-bit multiply with 128-bit result, 211–212 J bit, 22 Jazelle, 26–27, 27t JTAG, 38

Index 681 K base-two, 242–244 calculation of, 242f KEEP, 629 Logarithmic indexing, 190–191 Logarithmic representation of digital signal, 263 L Logical cache, 406, 407f, 458 Logical instructions, 55–56 L1 translation table base address, 503–504 Long long multiplication, 208–209 Latency, 30 Loop(s) LCLA, 629 counted LCLL, 629 LCLS, 629 decremented, 183–184 LDC instruction, 583–584 types of, 190–191 LDM instruction, 65, 164, 584–586 unrolled, 184–187 LDMIA instruction, 66, 67f, 97 with fixed number of iterations, 113–116 LDR instruction, 60, 63, 64t, 78, 96, 106t, 164, nested example of, 176 319, 586–589 multiple, 187–190 LDRB instruction, 60, 96, 106t unrolling, 117–120, 184–187 LDRD instruction, 106t with variable number of iterations, 116–117 LDRH instruction, 60, 96, 106t, 109 writing for, 120 LDRSB instruction, 60, 96, 106t Loop counter, 114–115 LDRSH instruction, 60, 96, 106t Loop overhead, 118–119 Least recently used, 422 LS1, 165 Left shifts, saturation of, 253–254 LS2, 165 Level 1 page table entry, 501–503 LSL instruction, 94, 589 Level 2 page table entry, 504–505 LSR instruction, 94, 589–590 Link register LTORG, 629 description of, 22, 121t M offsets, 322–324 Little-endian mode, 137, 138t Machine independent layer, 370 Load instructions scheduling MACRO, 629 overview of, 167–168 .macro, 634 by preloading, 168–169 MACRO directive, 202 by unrolling, 169–171 MAP (alias ^), 630 Loading constants, 78–79 MCR instruction, 590 Load-store architecture, 5, 19–20 MCRR instruction, 590 Load-store instructions Memory multiple-register transfer. see cache. see Cache Multiple-register transfer content addressable, 414 single-register load-store description of, 9 dynamic random access. 
see DRAM description of, 61–63 fetching instructions for, 10t Thumb instruction set, 96–97 hierarchy of, 9–10, 404f single-register transfer, 60–61 main swap instruction, 72–73 Local variable data types, 107–110 cache and, relationship between, 410–412 Locality of reference, 407, 457 description of, 405 Lock bits, for cache lockdown, 450–453 management of, 35–36 Logarithm

682 Index Memory (continued) definition of, 506 nonprotected, 35 functions of, 506 random access. see RAM hit, 506 read-only. see ROM lockdown registers, 510t remapping of, 14, 14f miss, 506 secondary, 405 operations, 509–510 size of, 10 single-step page table walk, 507–508 static random access. see SRAM two-step page table walk, 508–509 synchronous dynamic random access. see write buffer, 512–513 DRAM Memory protection units tightly coupled, 35, 36f, 405 access permission for, 470–474 types of, 10–11 description of, 35, 461–462 virtual. see Virtual memory system initializing of width of, 10 access permission, 470–474 cache attributes, 474–477 Memory controllers, 11 demonstration of, 481–482, 485–486 Memory management units enabling of regions, 477–478 region size and location, 466–470 access permission, 510–512 write buffer attributes, 474–477 ARM, 501 protected regions attributes of, 492–493, 493t access permission for, 470–474 caches, 512–513 assigning of, 479–481 coprocessor 15 and, 513–515 background regions, 464–465 definition of, 491 configuring of, 482–485 description of, 35–36, 406–408, 462 enabling of, 477–478 domains, 510–512 governing rules for, 463–464 fast context switch extension initializing of, 482–485 location of, 466–470 definition of, 515 overlapping regions, 464 domains used by, 518–519 size of, 466–470 features of, 515–516 sample demonstration of hints for, 519–520 context switch, 486 page tables used by, 518–519 description of, 478 schematic diagram of, 517f initializing, 481–482 virtual addresses modified by, 516 memory map for assigning regions, functions of, 491 multitasking and, 497–499 479–481 page tables mpuSLOS, 487 activation of, 497 system requirements, 479 architecture of, 501–502 MEND, 629 context switch activation of, 497 MEXIT, 629 definition of, 495 Miss rate, 417 L1 translation table base address, 503–504 Mixed-endianness support, 560 types of, 502t MLA multiply instruction, 57–58, 590–591 regions, 492 MMU. 
see Memory management unit simple little operating system, 545 mmuSLoS, 492 tasks in, 493 Modified virtual address, 516 translation lookaside buffer CP15:c7 commands, 509t, 509–510

Index 683 Most significant word multiplies, 558–559 integer normalization for, 212 MOV instruction, 94, 591–592 Q15 fixed-point, 233–235 Move instructions, 50 Q31 fixed-point, 235–237 MPU. see Memory protection unit unsigned 32/32-bit, 225–230 mpuSLOS, 487 square root by, 240–250 MRC instruction, 592 NOFP, 630 MRRC instruction, 592 Nonnested interrupt handler, 333–336 MRS instruction, 75–76, 592 Nonprivileged mode, 23 MSR instruction, 75–76, 592–593 Nonprotected memory, 35 MUL multiply instruction, 57–58, 94, 593–594 NOP instruction, 595 Multiple-register transfer Normalization, integer on ARMv4, 213–215 description of, 63 on ARMv5 and above, 212–213 stack operations, 70–72 description of, 212 Thumb instruction set, 97–98 Multiplication O double-precision integer One-cycle interlock, 166, 166f signed 64-bit by 64-bit multiply with Operating systems, 14–15 128-bit result, 211–212 OPT, 630 Optional expressions, 570 unsigned 64-bit by 64-bit multiply with ORR instruction, 55, 94, 595–596 128-bit result, 209–210 P repeated divisions converted into, 143–145 Multiply instructions, 57–58 Packing Multiply-accumulate unit, 20 fixed-width bit-field, 191–192 Multiprocessing synchronization primitives, of variable-width bitstreams, 192–194 560–562 Page Multitasking, 497–499 definition of, 494 MVN instruction, 94, 594–595 regions defined using, 495–497 N Page frame definition of, 494 NEG instruction, 94, 595 mapping pages to, 496f Negative indexing, 190 Nested interrupt handler, 325, 333, 336–342 Page size, 505–506 Nested loops Page table(s) example of, 176 access permission, 512 multiple, 187–190 activation of, 497 Network order, 192 architecture of, 501–502 Newton-Raphson iteration context switch activation of, 497 division by definition of, 495 demonstration of, in virtual memory system applications of, 223–224 on ARM9E, 217 activation of, 539–540 description of, 223–225 data structures, 525–529 fractional values defining of, 525 filling of, with translations, 531–538 
initial estimate for, 231 initializing of, in memory, 529–531 iteration accuracy, 232 locating of, 525 overview of, 230 theory of, 231

684 Index Page table(s) (continued) POP instruction, 70, 98, 597 fast context switch extension use of, 518–519 Postindex, 62–63 L1 translation table base address, 503–504 Prefetch abort, 318t, 322 types of, 502t Prefetch abort vector, 33 Preindex, 62–63, 96 Page table control block, 527 Preindex with writeback, 62 Page table entry Primitives definition of, 495 definition of, 207 Level 1, 501–503 double-precision integer multiplication Level 2, 504–505 page size selection, 505–506 description of, 208 Page table walk long long multiplication, 208–209 single-step, 507–508 signed 64-bit by 64-bit multiply with two-step, 508–509 Periodic interrupt, 382 128-bit result, 211–212 Peripheral component interconnect bus, 8 unsigned 64-bit by 64-bit multiply with Peripherals description of, 11 128-bit result, 209–210 function of, 7 multiprocessing synchronization, 560–562 interrupt controllers, 12 permutations, 250t memory controllers, 11 Prioritized direct interrupt handler, 333, Permutations bit 356–359 Prioritized group interrupt handler, 333, description of, 249t, 249–250 examples of, 251–252 359–363 macros, 250–251 Prioritized simple interrupt handler, 333, description of, 249t Physical addresses, 492 346–352 Physical cache, 406, 407f, 458 Prioritized standard interrupt handler, 333, Pipeline definition of, 29 352–356 description of, 4 Priority mask table, 352 executing characteristics, 31–32 Privileged mode, 23 filling of, 30 PROC. see FUNCTION five-stage, 31f Process control block, 385 schematic diagram of, 30f Profiler, 163 six-stage, 31f Profiling, 163 three-stage, 30, 30f Program status registers Pipeline bubble, 166 Pipeline flush, 167 current. 
see Current program status register Pipeline hazard, 165 decode, 645 Pipeline interlock, 165, 208 instructions, 75–76 PKH instruction, 596 schematic diagram of, 23f Platform operating systems, 14 Protected regions, for memory protection PLD instruction, 596–597 Pointer aliasing, 127–130 units Polling, 382–383 access permission for, 470–474 assigning of, 479–481 background regions, 464–465 configuring of, 482–485 enabling of, 477–478 governing rules for, 463–464 initializing of, 482–485 location of, 466–470

Index 685 overlapping regions, 464 current. see Current program status register size of, 466–470 decode, 645 Pseudoinstructions, 78–79 instructions, 75–76 Pseudorandom numbers, 255 schematic diagram of, 23f Pseudorandom replacement, 419, 458 special-purpose, 22 PUSH instruction, 70, 98, 597 Thumb, 89–90 types of, 22 Q in user mode, 21f, 21–22 Register allocation Q representation, 264 C compilers, 120–122 Q15 fixed-point division, by Newton-Raphson description of, 171 maximizing the available registers, division, 233–235 Q31 fixed-point division, by Newton-Raphson 177–180 variables division, 235–237 QADD instruction, 81, 597–599 allocation to register numbers, 171–175 QDADD instruction, 81, 597–599 more than 14 local variables, 175–177 QDSUB instruction, 81, 597–599 Register file, 20, 405 QSUB instruction, 81, 597–599 Register numbers, 171–175 Register postindex, 63, 64t R Register set, 24f Repeated divisions converted into Race condition, 342 Radix-2 fast Fourier transform, 304–305 multiplications, 143–145 Radix-4 fast Fourier transform, 305–313 Repeated unsigned division with remainder, RAM 142–143 description of, 11 .rept, 634 dynamic, 11 .req, 634 Random number generation, 255 Reset exception, 390 Rd, 20 Reset vector, 33, 385 Read-allocate, 422 Return stack, 662 Read-write-allocate, 422 REV instruction, 599–600 Real-time operating systems, 14 Reverse subtract instruction, 54 RedBoot, 371–372 RFE instruction, 600 Reduced instruction set computer design. see RISC Right shift, rounded, 254, 264 RISC design design Reentrant interrupt handler, 333, 342–346 CISC vs., 4f Register(s) philosophy of, 4–5 RLIST, 630–631 argument, 172 Rm, 20 banked, 23–26 RN, 20, 630–631 function of, 4–5 ROM general-purpose, 21–22 description of, 10 link flash, 11 ROR instruction, 94, 600 description of, 22, 121t Round-robin algorithm, 383 offsets, 322–324 Round-robin replacement, 419 maximizing of, 177–180 ROUT, 631 names, 570–571 program status

686 Index SHADD instruction, 604–605 Shift operations, 572–573 RSB instruction, 54, 600–601 Signed 64-bit by 64-bit multiply with 128-bit RSC instruction, 54, 601 result, 211–212 S Signed data type, 112–113 Signed division by a constant, 147–149 SADD instruction, 601–603 Simple cache, 408, 409f Sandstone Simple little operating system code structure, 373–378 context switch, 396–398 description of, 372 device driver framework, 398–400 directory layout of, 372–373, 373f directory layout, 384–385 execution flow, 373t exceptions handling hardware initialization, 375, 377 remap memory, 375–377 description of, 389 reset exception, 374 IRQ exception, 393–394 Saturated arithmetic, 80–81 reset exception, 390 Saturation SWI exception, 390–393 absolute, 254 initialization, 385–389 ARMv6, 555–556 interrupts, 389 function of, 253 memory management unit, 545 left shift, 253–254 memory model, 389 32 bits to 16 bits, 253 memory protection units, 487 32-bit addition and subtraction, 254 mmuSLOS, 545 Saturation instructions, 81t mpuSLOS, 487 SBC instruction, 54, 94, 603 overview of, 383–384 SC100, 43 periodic timer, 388 Scaled register postindex, 63 scheduler, 394–396 Scheduler, 394–396 service routines, 384 Scheduling of instructions sin, 245 description of, 30, 163–167 Single instruction multiple data arithmetic load instructions operations, 550–554 overview of, 167–168 Single issue multiple data processing, 178 by preloading, 168–169 Single-register load-store instructions by unrolling, 169–171 SDRAM, 11 addressing modes, 61–63, 96 .section, 634 description of, 61–63 SEL instruction, 603–604 Thumb instruction set, 96–97 .set, 635 Single-register transfer, 60–61 Set associativity SMLA instruction, 605–607 description of, 412–414 SMLAL multiply instruction, 57–58 four-way, 413f, 414, 415f SMLALxy instruction, 82t increasing of, 414–416 SMLAWy instruction, 82t Set index, 412 SMLAxy instruction, 82t Set of defines, 339 SMLS instruction, 605–607 SETA, 631 SMMLA instruction, 607 SETEND 
instruction, 604 SMMLS instruction, 607 SETL, 631 SMMUL instruction, 607 SETS, 631 SMUA instruction, 608–609

Index 687 SMUL instruction, 608–609 STRB instruction, 60, 96, 106t SMULL instruction, 57–58 STRD instruction, 106t SMULWy instruction, 82t STRH instruction, 60, 64t, 96, 106t SMULxy instruction, 82t StrongARM SMUS instruction, 608–609 Software, 12–16 description of, 43 Software interrupt exception, 321 digital signal processing on, 274–275 Software interrupt instruction StrongARM1 instruction cycle timings, ARM, 73–75 655–656 Thumb, 99 SUB instruction, 54, 94, 615–616 Software Interrupt vector, 33 Subroutine, 160 .space, 635 Subtraction. see Trial subtraction SPACE (alias %), 631 Sum of absolute differences instructions, Spatial locality, 408 Spilled variables, 120 556–557 Split cache, 408, 424, 458 Supervisor mode, 23, 26t Square root Supervisor mode stack, 332 description of, 238 Swap instruction, 72–73 fixed-point representation signal, Swapped out variables, 120 SWI exception, 390–393 267–268 SWI instruction, 99, 616 by Newton-Raphson iteration, 240–250 Switches by trial subtraction, 238–239 SRAM, 11 on a general value x, 199–200 SRS instruction, 609 efficient, 197–200 SSAT instruction, 609 function of, 197 SSUB instruction, 609–610 on the range of 0 ≤ x ≤ N, 197–199 Stack base, 72 SWP instruction, 72, 616–617 Stack frame, 338, 341 SWPB instruction, 72 Stack instructions SXT instruction, 617–618 ARM, 70–72 SXTA instruction, 617–618 Thumb, 98–99 Synthesizable, 38 Stack limit, 72 System control coprocessor, 77 Stack operations, 70–72 System mode, 23–24, 26t Stack overflow, 329 System-on-chip architecture, 560 Stack overflow error, 72 Stack pointer, 72, 121t T Static predictor, 661 Static random access memory. 
see SRAM TEQ comparison instruction, 56, 618 Static task, 382 Test-clean command, for D-cache cleaning, Status bits, 408–409 STC instruction, 610 428t, 434–435 STM instruction, 65, 610–612 32-bit STMED instruction, 71 STMIA instruction, 97 addition, 254 STMIB instruction, 68 subtraction, 254 STR instruction, 60, 96, 106t, 612–615 32-bit interrupt controller register, 350f 32-bit/32-bit divide, unsigned by Newton-Raphson divide, 225–230 by trial subtraction, 218–220 32-bit/15-bit divide by trial subtraction, 220–222

688 Index Thrashing Truncation error, 228 definition of, 411, 412f TrustZone, 563–565 ways for reducing, 412 TST comparison instruction, 56, 94, 618–619 Thumb-2, 565 U Thumb instruction set UADD instruction, 619 ARM-Thumb interworking, 90–92 UHADD instruction, 619 branch instructions, 92–93 UHSUB instruction, 619 code density, 87, 88f UMAAL instruction, 619 data processing instructions, 93–95 UMLAL multiply instruction, 57–58, 620 decoding, 88f, 639–641 UMULL multiply instruction, 57–58, 620 description of, 26, 27t Unaligned data encodings, 638–644 list of, 89t description of, 136–140 load and store offsets, 132t handling of, 201–203 multiple-register load-store instructions, Undefined instruction, 318t, 321 Undefined instruction vector, 33 97–98 Undefined mode, 23, 26t overview of, 87–89 Underflow error, 72 register usage, 89–90 Unified cache, 408 single-register load-store instructions, 96–97 Unique identification number, 398 software interrupt instruction, 99 Unknown_condition routine, 362 stack instructions, 98–99 Unpacking Tightly coupled memory, 35, 36f, 405 fixed-width bit-field, 191–192 Trailing zeros, counting of, 215–216 variable-width bitstreams, 195–197 Transcendental functions Unrolled counted loops, 184–187 base-two exponentiation, 244–245 Unrolling base-two logarithm, 242–244 load instructions scheduling by, 169–171 description of, 241–242 of loop, 117–120, 184–187 trigonometric operations, 245–248 Unsigned 64-bit by 64-bit multiply with 128-bit Translation lookaside buffer CP15:c7 commands, 509t, 509–510 result, 209–210 definition of, 506 Unsigned 64/31-bit divide, by trial subtraction, functions of, 506 hit, 506 222–223 lockdown registers, 510t Unsigned 32-bit/32-bit divide miss, 506 operations, 509–510 by Newton-Raphson divide, 225–230 single-step page table walk, 507–508 by trial subtraction, 218–220 two-step page table walk, 508–509 Unsigned 32-bit/15-bit divide, by trial Trial subtraction, division by description of, 217–218 subtraction, 
220–222 nonrestoring, 218 Unsigned data type, 112–113 restoring, 218 Unsigned division unsigned 64/31-bit divide by, 222–223 unsigned 32-bit/15-bit divide by, 220–222 by a constant, 145–147 unsigned 32-bit/32-bit divide by, 218–220 repeated, with remainder, 142–143 Trigonometric operations, 245–248 UQADD instruction, 620 UQSUB instruction, 620 USAD instruction, 620 USAT instruction, 620 User mode, 23–24, 26t

Index 689 User mode stack, 332 initializing of, in memory, 529–531 USMLAL macro, 211 locating of, 525 USUB instruction, 620 region data structures, 525–529 UXT instruction, 620 regions in physical memory, 522–525 UXTA instruction, 620 virtual memory maps, 522, 524f fixed mapping in, 499–500 V mechanism of, 493–495 memory organization in, 499–501 Variables, 171–175 modified, 516 Variable-width bitstream packing, 192–194 task mapping in, 494f Variable-width bitstream unpacking, 195–197 task switching, 499 Vector floating point accelerator, 149 volatile, 154 Vector floating-point, 37 Von Neumann architecture, 34, 34f, Vector interrupt controller, 12 Vector interrupt controller PL190 based 408 interrupt service routine, 333, W 363–364 Vector table, 33t, 33–34, 319–320 Way and set index addressing, for D-cache Veneer, 90 cleaning, 428t, 431–434 VIC PL190 based interrupt service routine, 333, 363–364 Ways, 412 Victim, 419, 458 WEND, 631 Victim reset value, 445 WHILE, 631 Virtual address, 516 .word, 635 Virtual addresses, 492 Write buffer Virtual memory system components of, 495f description of, 403, 416–417 definition of, 491 initializing of, 465–466 demonstration of memory management units, 512–513 context switch procedure, 544 region attributes, 474–477 fixed system software regions, 521–522 Write collapsing, 417 memory management unit initialization Write combining, 417 activation of page table, 539–540 Write merging, 417 assigning of domain access, 541–542 Writeback, 418–419 overview of, 529 Writethrough, 418 page tables filled with translations, 531–538 X page tables initialized in memory, 529–531 XScale, 43 overview of, 520–521 page tables Z activation of, 539–540 data structures, 525–529 Zeros defining of, 525 count leading, 215–216 filling of, with translations, 531–538 count trailing, 215–216 Zero-wait-state memory, 164 z-transform, 295

