2455 Hilgard Avenue #23, Berkeley,
|My technical interests include computer processor design, computer arithmetic, signal processing, image processing, graphics rendering, operating system kernels, device drivers, computer languages, compilers, and software support libraries.|
University of California
Ph.D. in Computer Science (specializing in computer architecture, with a minor in mathematics).
M.S. in Computer Science.
University of Colorado
One year of graduate study.
North Carolina State University
B.S. in Computer Engineering (mostly electrical engineering).
B.S. in Computer Science.
University of California
Part-time development of floating-point software (mostly Berkeley SoftFloat) and hardware floating-point units (becoming Berkeley HardFloat).
Hardware and software development for Bluechip Systems’ computer contained in a microSD package (
C-code optimization of one of Droplet’s image codecs.
Design and implementation of 3Plus1’s CoolEngine, plus various programming tools.
(Technology acquired by iCelero, which later spawned Bluechip Systems.)
Berkeley Design Technology, Inc. (BDTI)
Senior DSP Engineer (part-time).
University of California
Various Graduate Student Instructor and Graduate Student Researcher positions.
International Computer Science Institute (ICSI)
Software programming for ICSI’s
Implementation of quadruple-precision floating-point in software.
|Computer architecture, hardware design|
Made various contributions to the definition of the new
|2010–2014||At iCelero, tracked down bugs in hardware units provided by various suppliers, and devised workarounds both in hardware and software.|
|2003–2011||For 3Plus1 and iCelero, defined and implemented nearly all of the CoolEngine, a multiprocessor subsystem designed primarily for streaming tasks such as video processing. Each CoolEngine processor can execute multiple operations (scalar and SIMD) per clock cycle from a compressed VLIW machine code. A CoolEngine combines several such processors with an intelligent multichannel DMA for I/O and DRAM access. Was responsible for every aspect of the CoolEngine processors, including the programming architecture, instruction caches, instruction fetch, decode, and dispatch units, datapaths, and all functional units; plus the DMA unit and other CoolEngine components. Implementation was in Verilog and a proprietary language (that was machine-translated to Verilog). Physical area and timing were optimized using feedback from Synopsys standard-cell synthesis.|
|2002||At BDTI, advised a major processor design company on specific SIMD-style extensions that could be made to an existing architecture to improve its performance for fixed-point digital signal processing.|
|1994–2000||For my doctoral dissertation, examined the use of an FPGA-like device as an additional microprocessor functional unit. Defined a novel processor architecture named Garp, and constructed programming tools and a cycle-accurate simulator. Implementation feasibility was studied through SPICE circuit simulations and partial VLSI layout. My research advisor was John Wawrzynek.|
|System software, firmware|
|2011–2016||Developed all of the on-chip ROM firmware used for booting Bluechip Systems’ microSD-package computer, in a mixture of C and ARM assembly code. Also created several tools and libraries for similar “bare metal” programming of the platform. With some experimentation, found a working configuration for the device’s complex DRAM controller. Wrote the basic boot code that configures the DRAM controller and MMU and then loads a program from flash memory through an internal SD bus interface. Ported Express Logic’s ThreadX library for multithreading. Wrote the initial program code to load Linux and pass control to it, including an optional cryptographic test of the integrity of the Linux code that, for speed, is performed on the device’s CoolEngine.|
|2006–2008||For 3Plus1, defined and implemented libraries and related software tools for interprocessor communication within a CoolEngine multiprocessor subsystem.|
Created a Solaris device driver to interface with ICSI’s
|Programming languages, compilers, other software development tools|
|2013||Created a complete and robust GDB bridge to support debugging software on Bluechip Systems’ computer in a microSD package, for code running on both the device’s ARM processor and its CoolEngine.|
|2004–2008||For 3Plus1, created tools and libraries to provide a practical alternative platform for testing and debugging CoolEngine software, by allowing CoolEngine source code (in C and assembly language) to be compiled and linked to run efficiently on an ordinary desktop computer. In this foreign environment, the CoolEngine’s multiple processors are simulated by transparently spawning multiple processes, and CoolEngine assembly code is handled by transparently invoking a CoolEngine processor simulator whenever an assembly-language function is called.|
|2004–2005||Invented the CoolEngine’s complex assembly language, and created the first assembler.|
Represented BDTI within the
ISO working group
Implemented a basic-block instruction scheduler within the GNU assembler
|1991–1994||For my Master’s degree, examined the need for exception-handling features in programming languages, and critiqued the main kinds of exception mechanisms that have been implemented or proposed over the years. Special attention was given to the efficient handling of arithmetic exceptions on high-speed processors.|
For U.C., Berkeley, completely rewrote the
SoftFloat and TestFloat
software packages, adding new features and optimizations.
Adaptations of SoftFloat are used in a number of projects worldwide, including
Linux for older ARM processors.
Also developed hardware functional units for the basic floating-point
arithmetic operations, implemented in both Verilog and
The hardware floating-point units have been used in several fabricated
Participated in the working group revising the IEEE standard for floating-point
Created a C++ library for BDTI that fully
implements parameterized fixed-point types.
Fixed-point formats are specified using C++ template parameters, and the
standard arithmetic operators,
Released and updated the original SoftFloat and TestFloat packages, both grown
out of work originally done for ICSI (see below).
At the time of its release, TestFloat found small flaws in the floating-point
of several commercial processors, including a flaw in the Intel Pentium Pro
that was rediscovered the next year and dubbed
Implemented floating-point and other arithmetic functions for ICSI’s
|1992||At Silicon Graphics, coded IEEE-compliant quadruple-precision floating-point in MIPS assembly language.|
|1992||Discovered an oversight in Digital Equipment Corporation’s Alpha architecture concerning floating-point subnormal numbers. The discovery resulted in a last-minute fix by Digital before the first Alpha machines were shipped.|
|Image processing, digital signal processing (DSP)|
|2015–2017||For Bluechip Systems, demonstrated the performance of the CoolEngine by crafting optimized implementations of a selection of image processing functions, notably a standard histogram of oriented gradients (HOG) reduction used for object detection, and some image filters including a Gaussian filter and a 3×3 median filter.|
|2010||For Droplet Technology, overhauled the C source code of a proprietary image codec to optimize its performance on an ARM processor.|
|2003–2009||Designed 3Plus1’s CoolEngine for efficient performance of many common DSP functions (such as FFT).|
|1999–2009||For various BDTI customers, defined and coded numerous DSP functions, and also improved the performance of several DSP applications through profiling and the recoding of critical functions, usually in hand-optimized assembly language. Speed improvements in some cases were as much as a factor of ten. For 3Plus1, performed the same services to demonstrate the CoolEngine’s capabilities.|
|1999–2003||At BDTI, helped evaluate the DSP performance of a number of processors, either with respect to specific customer needs or in accordance with BDTI’s proprietary benchmarking methodology. Participated in ongoing efforts within BDTI to refine and extend the company’s benchmarking methods.|
|2001||Together with a colleague at BDTI, converted a customer-supplied software MP3 decoder entirely from floating-point to fixed-point in order to port the decoder to processors supporting only integer operations in hardware.|
|1989||Helped crack the encoding of Adobe Type-1 fonts, before the format was publicly documented by Adobe. My contributions included expanding the set of known byte-code operators and deducing much of the font hinting mechanism.|
|I have also done some unpublished research on geometric interpolation (splines), on image scaling (changing size and/or resolution), and on dithering color images to a limited palette.|
Linux, older Microsoft Windows, Solaris,
POSIX, Linux, UNIX
|Software development tools:||
|Processor architectures and assembly languages:||
|Hardware description languages:||
|Hardware development tools:||Verilator, Vivado (Non-Project Mode scripting only), Icarus Verilog, OpenOCD, some GTKWave.|
|FPGA architectures:||Xilinx 7 Series.|
SD and MMC (for SD cards, etc.), AXI, SPI,
|Document definition languages:||LaTeX, TeX, simple HTML, some PostScript.|
|Source control tools:||Git, Subversion, CVS.|
“The SFRA: A Corner-Turn FPGA Architecture.”
Nicholas Weaver, John Hauser, and John Wawrzynek.
Proceedings of the 2004 ACM/SIGDA 12th International Symposium on
Field-Programmable Gate Arrays
“The Garp Architecture and
“A Fixed-Point Recursive Digital Oscillator for Additive Synthesis.”
Todd Hodes, John Hauser, John Wawrzynek, Adrian Freed, and David Wessel.
Proceedings of the IEEE International Conference on Acoustics, Speech,
and Signal Processing
“Garp: A MIPS Processor with a Reconfigurable Coprocessor.”
“Handling Floating-Point Exceptions in Numeric Programs.”