Get Your YouTube Summary:
- Copy the video link.
- Paste it into the input field.
- Click 'Summarize' to get your summary with timestamps.
Important Note on Subtitles:
- Automatic summary generation requires English subtitles on the video.
- If the video has no English subtitles, automatic transcript download from the link will fail.
- Manual Alternative: You can still get a summary!
- Find the transcript on YouTube (usually below the video description when viewed on a desktop browser).
- Copy the entire transcript text manually. (Need help finding/copying? Watch the 'Demo Video' linked at the top right of this page).
- (Optional) Add any additional instructions after the transcript (e.g., 'Translate the summary to German.', 'Add a glossary of medical terms and jargon to the summary.').
For videos longer than 20 minutes:
- Select a Pro model for automatic summarization. Note that Pro usage is limited daily.
- If the Pro limit is reached (or if you prefer using your own tool), use the Copy Prompt button, paste the prompt into your AI tool, and run it there.
Comparing Two Atomic Clocks: Stability Test for a Relativity Experiment

Abstract: This video documents an attempt to measure the relative frequency stability of two vintage HP 5061A cesium atomic clocks as a preliminary step towards potentially observing relativistic time dilation effects, similar to the Hafele-Keating experiment. The measurement setup utilizes HP timing generators to derive 1-pulse-per-second (1PPS) signals from each clock's 5 MHz output, an HP 5334B counter to measure the time interval between these pulses, and a restored HP 5087A distribution amplifier with a frequency doubler providing the necessary 10 MHz reference for the counter. Allan Variance is discussed as the relevant stability metric. Initial measurements revealed significant relative drift (tens of nanoseconds overnight), far exceeding the stability required for relativistic measurements. Attempts were made to improve stability by finely adjusting the C-field (magnetic field) of one clock. A vintage HP-IB logging system, comprising an HP 85 computer, HP 59501B DAC, and HP 7132A chart recorder, was implemented to track the drift over time. Despite the successful operation of the all-HP measurement chain, the logged data showed continued drift and discrete jumps in the time difference, indicating persistent instability likely originating from one of the clocks. The experiment concludes that further clock repair is necessary before achieving the required stability, while highlighting the successful performance of the vintage test equipment setup.

- 0:00:32 Goal: The primary aim is to compare two HP 5061A atomic clocks to assess their relative stability, as a precursor to attempting a measurement of relativistic time dilation (inspired by the Hafele-Keating experiment).
- 0:01:02 Initial Check: The first step is confirming the clocks' stability relative to each other while stationary in the lab, aiming for sub-nanosecond drift over days.
- 0:01:50 Measurement Setup: Comparing two imperfect clocks means only their relative difference can be measured. The setup aims to precisely track this difference.
- 0:02:13 Clock Outputs: The HP 5061A units are frequency standards providing 5 MHz, 1 MHz, and 100 kHz outputs, unlike the time-of-day clocks used in the original Hafele-Keating experiment.
- 0:02:43 1PPS Generation: HP 59308A Timing Generators are used to divide the 5 MHz output from each clock down to a precise 1 pulse-per-second (1PPS) signal.
- 0:03:31 Time Interval Measurement: An HP 5334B counter measures the time interval between the 1PPS signals from the two clocks with 0.1 ns resolution. The counter's math function zeros the initial arbitrary delay.
- 0:04:04 Counter Reference Challenge: The counter requires a 10 MHz reference, but the clocks output 5 MHz.
- 0:04:17 Frequency Doubling Solution: A restored HP 5087A Distribution Amplifier with its frequency doubler option converts the 5 MHz output from one clock to the required 10 MHz reference for the counter.
- 0:04:56 Stability Metric (Allan Variance): Allan Variance is introduced as the standard measure for clock stability over different time scales. The target stability for the relativity experiment (around 1 part in 10^13) is noted as challenging for these specific clocks (a sketch of the computation follows this list).
- 0:06:52 First Overnight Result: The initial overnight test shows a significant relative drift of ~35 nanoseconds, far too unstable for the intended experiment.
- 0:08:31 Second Overnight Result: After restarting, the drift is lower (~6.7 nanoseconds) but still too high.
- 0:08:58 C-Field Adjustment: An attempt is made to finely tune the frequency of one clock relative to the other using its C-field adjustment (tuning the magnetic field affecting the cesium resonance), aiming for minimal relative drift.
- 0:10:31 Characterization vs. Tuning: It's noted that the original Hafele-Keating experiment characterized clock drift against a master standard rather than attempting to tune the clocks perfectly relative to each other.
- 0:11:04 Long-Term Logging: A vintage, all-HP logging system is assembled to record the drift continuously.
- 0:11:08 Logging Components: The system uses an HP 7132A chart recorder driven by an HP 59501B DAC, controlled by an HP 85 computer via the HP-IB bus. An HP 3478A DVM monitors the DAC output.
- 0:11:56 Control Program: A simple 13-line BASIC program on the HP 85 reads the time interval from the counter, scales it, and sends the value to the DAC for plotting (a modern equivalent is sketched after this list).
- 0:14:10 Third Result (Logged): The chart recording shows ~14 nanoseconds of drift overnight.
- 0:14:20 Drift Instability: Crucially, the recording reveals the drift isn't smooth but includes significant, abrupt jumps, indicating underlying instability.
- 0:14:44 Conclusion: The clocks (or at least one of them) are currently too unstable, exhibiting both drift and jumps ("atomic gremlins").
- 0:15:03 Measurement System Success: Despite the clock instability, the entire vintage HP measurement and logging system performed flawlessly.
- 0:15:11 Next Steps: More atomic clock repair work is required before the desired stability measurements and the subsequent relativity experiment can be successfully performed.
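As a companion to the Allan Variance bullet (0:04:56), here is a minimal Python sketch of the overlapping Allan deviation computed from phase (time-difference) samples like the nanosecond intervals logged in the video. The function name and sampling setup are illustrative, not taken from the video:

```python
import numpy as np

def allan_deviation(phase_s, tau0_s, m):
    """Overlapping Allan deviation from phase (time-error) samples.

    phase_s: clock time differences in seconds, one sample every tau0_s seconds
    m:       averaging factor, so the analysis interval is tau = m * tau0_s
    """
    x = np.asarray(phase_s, dtype=float)
    tau = m * tau0_s
    # The second difference of the phase record at lag m isolates frequency changes.
    d2 = x[2 * m:] - 2.0 * x[m:-m] + x[:-2 * m]
    avar = np.sum(d2**2) / (2.0 * tau**2 * len(d2))
    return np.sqrt(avar)

# Scale check: a clock pair holding 1 part in 10^13 over one day accumulates
# 86_400 s * 1e-13 ≈ 8.6 ns of time difference, the regime the video targets.
```

And for the control program (0:11:56), a present-day sketch of the same read-scale-clamp-write loop using PyVISA. The GPIB addresses follow the video's 703 (counter) and 706 (DAC) HP-IB addresses; the exact counter query, DAC command format, and polling period are assumptions:

```python
import time
import pyvisa

rm = pyvisa.ResourceManager()
counter = rm.open_resource("GPIB0::3::INSTR")  # HP 5334B counter (address 703)
dac = rm.open_resource("GPIB0::6::INSTR")      # HP 59501B DAC (address 706)

while True:
    interval = float(counter.read())      # time-interval reading, in seconds
    value = int(interval * 1e9) + 500     # scale to nanoseconds, center on mid-scale
    value = max(0, min(999, value))       # clamp to the DAC's valid 0-999 range
    dac.write(f"2{value:03d}")            # leading "2" selects the -10..+10 V range
    time.sleep(60)                        # pacing for the chart recorder (assumed)
```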
- 0:00:47 Why Create Mojo? Mojo wasn't the initial goal of Modular. It emerged from the need to program next-generation hardware (GPUs, accelerators) effectively, leveraging the MLIR compiler framework, and requiring a user-friendly syntax, leading them to build upon Python.
- 0:01:21 Learning from Swift: Creating a new language syntax is complex and time-consuming. Mojo prioritizes powerful semantics, compiler infrastructure for performance, and meeting the large existing Python community where they are.
- 0:03:55 Python Implementation Limitations: CPython interpreter isn't very fast, lacks modern compiler techniques, has packaging challenges, and fundamentally cannot target GPUs or embedded systems requiring minimal dependencies.
- 0:04:56 Importance of API Design: Good API design is crucial, including what it disallows. Swift's string API (bytes vs. grapheme clusters) is praised for simplifying by removing confusing concepts. The goal is to make safe defaults easy to reach for.
- 0:09:29 Mojo as "Python++": Aims to be a superset of Python over time, allowing use of the existing ecosystem. It adds new capabilities (like
struct
for value types) alongside Python's features (class
). The goal is to allow gradual adoption and upgrading of Python code, potentially replacing complex Python+C extensions with pure Mojo. - 0:17:48 Against Magic Optimization: Mojo deliberately avoids trying to magically speed up unmodified Python code. Such approaches (like PyPy, Codon) create unpredictable performance cliffs and reduce programmer control. Mojo instead provides explicit tools (types, structs,
fn
) to unlock performance, working backward from hardware capabilities. - 0:25:09 Compiler Strategy (MLIR + LLVM): Mojo uses MLIR for high-level, parallelizable optimizations (like its own inliner) and selectively uses LLVM for backend code generation, disabling parts like LLVM's auto-vectorizer to ensure predictability.
- 0:28:15 Balancing Control and Dynamism: Mojo offers
def
for Python-like dynamic behavior (defaulting to object types) andfn
for systems programming requiring explicit types and offering more control and performance predictability. This caters to different user needs within one language. - 0:39:03 Compile-Time Metaprogramming: Inspired by languages like Zig, Mojo uses the same language for compile-time and runtime code, simplifying macros, templates, etc., enabling powerful library abstractions.
- 0:39:41 First-Class SIMD: Mojo treats SIMD types as fundamental, defined in the standard library using compile-time parameters for element type and width. Scalar types (like
Float32
) are defined as single-element SIMD vectors (SIMD<Float32, 1>
), allowing numeric libraries (likesin
,cos
) to be written generically for both scalars and vectors. - 0:43:47 Hardware Abstraction: Low-level hardware intrinsics (Neon, AVX, etc.) are exposed. Libraries can use compile-time checks (
if target == ...
) to use specific hardware features, abstracting hardware differences away from the end-user. Mojo pushes complexity from the compiler into libraries. - 0:52:51 Functional Programming Influence & Mutation: Mojo embraces value semantics (like Swift) to prevent spooky action at a distance but allows efficient in-place mutation where appropriate (e.g., via copy-on-write or explicit unsafe code in library internals), unlike pure functional languages that can suffer performance penalties.
- 1:02:42 Integer Overflow Handling: Mojo consistently uses wrapping (two's complement) semantics for integer overflow for both signed and unsigned types. This differs from C++ (UB for signed) and Swift (traps by default) but is necessary for SIMD performance and predictability. It distinguishes between Python's arbitrary-precision
int
(heap object) and Mojo's fixed-widthInt
(struct, fast). - 1:13:35 Type Inference Approach: Mojo avoids complex Hindley-Milner type inference (which caused issues in Swift due to interaction with overloading, literals, etc.). It uses a simpler, more localized type checking approach consistent with Python's dynamic feel, aiming for predictable compile times and clear error messages. Function signatures are generally required for
fn
functions. - 1:29:11 Operator Overloading Strategy: Mojo currently allows overloading existing Python operators via Dunder methods (
__add__
, etc.). It avoids adding custom operators (like Swift did) to keep the language simpler initially, recognizing that the complexity might not pay for itself. This allows libraries to define numeric types (likeComplex
,BFloat16
) without needing compiler changes. - 1:38:50 Complex Numbers & Hardware: While Mojo doesn't privilege complex numbers, its library-centric design and compile-time features allow a
Complex
type's multiply operator to internally check the target hardware and use specialized instructions (if available on platforms like IBM mainframes) for acceleration. - 1:41:37 Arbitrary Bitwidth Integers: MLIR supports arbitrary bitwidths. While not exposed in Mojo's standard library yet, it could be added purely as a library feature without compiler changes.
- 1:43:43 Allocator Philosophy: Mojo currently doesn't have a pervasive allocator parameter system like Zig. Instead, it focuses on providing different types with different allocation strategies (e.g., a
List
type potentially taking a parameter for inline storage likeSmallVector
) rather than making allocators a universal generic parameter. - 1:47:32 Slices and Ownership: Mojo has an ownership system, allowing efficient, safe slices that refer back to the original data structure's lifetime, often using copy-on-write semantics.
- 1:48:23 Community and Status: Mojo is still early but has an active community on Discord. It's being used for various applications beyond AI, including web development and GUIs. Users are encouraged to explore the documentation and get involved.
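Three short Python sketches tie back to the points above. First, for first-class SIMD (0:39:41): NumPy ufuncs show the "one definition serves scalars and vectors" idea dynamically, which Mojo expresses statically with a scalar as the width-1 SIMD case. This is plain NumPy, not Mojo:

```python
import numpy as np

# np.sin is written once and accepts a scalar or a vector, loosely analogous
# to a Mojo routine written against a SIMD type parameterized on width.
print(np.sin(0.5))                        # scalar: the "width 1" case
print(np.sin(np.array([0.0, 0.5, 1.0])))  # vector: the wide case
```

Second, for integer overflow (1:02:42), a sketch contrasting two's-complement wrapping (the behavior described for Mojo's fixed-width `Int`) with Python's arbitrary-precision `int`. The helper is illustrative, not Mojo's API:

```python
def wrap_add_i32(a: int, b: int) -> int:
    """32-bit signed addition with two's-complement wraparound."""
    s = (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    return s - 0x1_0000_0000 if s >= 0x8000_0000 else s  # reinterpret as signed

print(wrap_add_i32(2**31 - 1, 1))  # -2147483648: wraps instead of trapping
print(2**31 - 1 + 1)               #  2147483648: Python's int simply grows
```

Third, for operator overloading (1:29:11), a toy `Complex` type built on the same dunder methods Mojo reuses; Mojo's actual library type will differ:

```python
class Complex:
    def __init__(self, re: float, im: float):
        self.re, self.im = re, im

    def __add__(self, other: "Complex") -> "Complex":
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other: "Complex") -> "Complex":
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)

    def __repr__(self) -> str:
        return f"({self.re} + {self.im}i)"

print(Complex(1, 2) + Complex(3, 4))  # (4 + 6i)
print(Complex(1, 2) * Complex(3, 4))  # (-5 + 10i)
```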
- 0:00:51 Event Kick-off: Welcome to the first Modular Community Meetup, outlining talks by Chris Lattner (CEO) on Mojo/Max and Jack Clayton (Engineer) with a Mojo GPU programming demo.
- 0:03:30 Chris Lattner: The Problem with GPU Software: AI's importance is hampered by poor, complex software stacks, particularly around CUDA, which has become a "rickety moat" and a "swamp."
- 0:07:12 CUDA's Shortcomings: Identified issues include vendor lock-in (Nvidia only), hardware limitations (GPU-only, not even Nvidia CPUs), proprietary nature hindering innovation, poor Python integration, C++ complexities, and massive container sizes (15-50GB).
- 0:09:43 Flawed Solutions - Fixed Function: Approaches like NIMS optimize specific models but sacrifice the flexibility and innovation core to AI's value ("dystopian").
- 0:12:26 DeepSeek as a Catalyst: Highlighted performance gains possible by bypassing abstractions (going to PTX) but showed this requires immense resources, making it inaccessible for most ("dystopian"). Democratization is needed.
- 0:14:32 Why Alternatives Failed (OpenCL, Compilers, Triton, MLIR):
- OpenCL: Suffered from committee-driven slowness vs. CUDA's focused drive.
- AI Compilers (XLA/TVM): Hit performance ceilings and bottlenecked on compiler expertise; struggled to adapt to GenAI's complexity (e.g., FlashAttention).
- Triton: Improved usability (Pythonic, tile-based) but sacrifices performance (~20%), often rewritten in CUDA C++, and lacks portability/ASIC support. Validated need for usability but can't lose performance.
- MLIR: Successful infrastructure but doesn't provide a full AI stack; ecosystem projects often replicated XLA's limitations.
- 0:23:47 Modular's Approach: Rebuilding the Stack: A multi-year research effort to create new abstractions (Mojo language, Max engine) to fundamentally change the game.
- 0:25:17 The Modular Stack:
- Mojo: A Pythonic systems programming language designed for heterogeneous compute (CPUs, GPUs, ASICs), providing full hardware access and performance without C++ pain points. Crucially, Modular's GPU code is written in Mojo, entirely replacing CUDA, cuDNN, CUTLASS, etc.
- Max: An AI engine (initially inference-focused) built on Mojo, offering Python APIs, predictability, GenAI design, kernel fusion, and serving components.
- Enterprise Platform: Higher-level tools for managing large-scale GPU deployments.
- 0:31:43 Deployment & Portability: Simple Docker deployment. Key Reveal: The same compiled binary currently runs across both Nvidia (A100/H100) and AMD (MI300X) GPUs.
- 0:32:25 Major Announcements: Licensing & Open Source:
- New Modular Community License: Mojo & Max are free for non-commercial use, AND free for commercial/production use on x86/Nvidia hardware (with attribution). Goal: remove barriers to adoption.
- Impending Open Source: ~250,000 lines of high-performance Mojo GPU kernels (matmul, FlashAttention, etc.) for multiple GPUs coming in ~1-2 weeks under Apache 2.0 license.
- Mojo Compiler Open Source: Still planned (target tentative: 2026).
- Pip Support: Coming soon for easier integration.
- 0:39:09 Resources: `builds.modular.com` provides access to hundreds of optimized models and easy deployment commands. Recipes for common AI tasks are available.
- 0:40:45 Q&A Highlights (Chris Lattner): Mojo is general-purpose but currently optimized for AI use cases; Mac GPU support is targeted for summer; C interop exists, and Rust is possible via C; Mojo supports various accelerators (if they have a program counter); Mojo uses powerful metaprogramming (inspired by Zig's `comptime` and dependent types) to build high-level abstractions (like tile-based programming) as libraries rather than hardcoding them in the compiler, offering more power and flexibility than Triton.
- 1:01:03 Jack Clayton: Mojo GPU Programming Demo: Introduction to writing GPU kernels in Mojo.
- 1:01:31 Mandelbrot Kernel: Live coding and execution of a Mandelbrot kernel on an Nvidia A100 GPU using Mojo syntax (`gpu_kernel`, `thread_idx`, `LayoutTensor`); the sketch after this list shows the underlying escape-time iteration.
- 1:03:20 Assembly Inspection: Demonstrated dumping the generated PTX assembly code directly from Mojo.
- 1:03:53 Custom Ops for Max: Showcased creating Mojo GPU kernels (grayscale, brightness, blur) and integrating them as custom operations within a Python script using the Max engine for an image processing pipeline.
- 1:08:48 Cross-Vendor Portability Demo: Key Demo: Ran the exact same Mojo code (Mandelbrot, image pipeline, even low-level warp shuffles) successfully on an AMD MI300X GPU, demonstrating true write-once-run-anywhere capability between vendors. Showed dumping AMD GPU assembly.
- 1:09:47 Demo Resources & Challenge: Provided links to GPU guides, example repositories, and issued a challenge to refactor a kernel for swag.
- 1:11:28 Q&A Highlights (Jack Clayton): Apple Metal support is not yet available (planned); confirmed the AMD GPU was an MI300X; Colab support is being investigated; graphics/video are possible but not specifically explored yet.
- 1:14:11 Closing: Call for hiring announcements, mention of upcoming hackathon and community forum.
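As a companion to the Mandelbrot kernel demo (1:01:31), here is a minimal NumPy sketch of the escape-time iteration such a kernel evaluates per pixel; the Mojo demo instead maps one GPU thread to each pixel via `thread_idx`. The image bounds and iteration cap are arbitrary:

```python
import numpy as np

def mandelbrot_counts(width=80, height=40, max_iter=50):
    xs = np.linspace(-2.0, 0.7, width)
    ys = np.linspace(-1.2, 1.2, height)
    c = xs[None, :] + 1j * ys[:, None]  # one complex coordinate per "pixel"
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=int)
    for _ in range(max_iter):
        alive = np.abs(z) <= 2.0        # points that have not escaped yet
        z[alive] = z[alive] ** 2 + c[alive]
        counts += alive                 # escape time = iterations survived
    return counts

print(mandelbrot_counts().max())  # interior points reach the max_iter ceiling
```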
identifier: 2106
model: gemini-2.5-pro-exp-03-25
input-price: 1.25
output-price: 10.0
max-context-length: 128_000
host: 217.15.157.112
https://www.youtube.com/watch?v=RzOQHazjXvE
include_comments: None
include_timestamps: 1
include_glossary: None
output_language: en
cost: 0.0305525
Below, I will provide input for an example video (comprising of title, description, and transcript, in this order) and the corresponding abstract and summary I expect. Afterward, I will provide a new transcript that I want you to summarize in the same format. **Please give an abstract of the transcript and then summarize the transcript in a self-contained bullet list format.** Include starting timestamps, important details and key takeaways. Example Input: Fluidigm Polaris Part 2- illuminator and camera mikeselectricstuff 131K subscribers Subscribed 369 Share Download Clip Save 5,857 views Aug 26, 2024 Fluidigm Polaris part 1 : • Fluidigm Polaris (Part 1) - Biotech g... Ebay listings: https://www.ebay.co.uk/usr/mikeselect... Merch https://mikeselectricstuff.creator-sp... Transcript Follow along using the transcript. Show transcript mikeselectricstuff 131K subscribers Videos About Support on Patreon 40 Comments @robertwatsonbath 6 hours ago Thanks Mike. Ooof! - with the level of bodgery going on around 15:48 I think shame would have made me do a board re spin, out of my own pocket if I had to. 1 Reply @Muonium1 9 hours ago The green LED looks different from the others and uses phosphor conversion because of the "green gap" problem where green InGaN emitters suffer efficiency droop at high currents. Phosphide based emitters don't start becoming efficient until around 600nm so also can't be used for high power green emitters. See the paper and plot by Matthias Auf der Maur in his 2015 paper on alloy fluctuations in InGaN as the cause of reduced external quantum efficiency at longer (green) wavelengths. 4 Reply 1 reply @tafsirnahian669 10 hours ago (edited) Can this be used as an astrophotography camera? Reply mikeselectricstuff · 1 reply @mikeselectricstuff 6 hours ago Yes, but may need a shutter to avoid light during readout Reply @2010craggy 11 hours ago Narrowband filters we use in Astronomy (Astrophotography) are sided- they work best passing light in one direction so I guess the arrows on the filter frames indicate which way round to install them in the filter wheel. 1 Reply @vitukz 12 hours ago A mate with Channel @extractions&ire could use it 2 Reply @RobertGallop 19 hours ago That LED module says it can go up to 28 amps!!! 21 amps for 100%. You should see what it does at 20 amps! Reply @Prophes0r 19 hours ago I had an "Oh SHIT!" moment when I realized that the weird trapezoidal shape of that light guide was for keystone correction of the light source. Very clever. 6 Reply @OneBiOzZ 20 hours ago given the cost of the CCD you think they could have run another PCB for it 9 Reply @tekvax01 21 hours ago $20 thousand dollars per minute of run time! 1 Reply @tekvax01 22 hours ago "We spared no expense!" John Hammond Jurassic Park. *(that's why this thing costs the same as a 50-seat Greyhound Bus coach!) Reply @florianf4257 22 hours ago The smearing on the image could be due to the fact that you don't use a shutter, so you see brighter stripes under bright areas of the image as you still iluminate these pixels while the sensor data ist shifted out towards the top. I experienced this effect back at university with a LN-Cooled CCD for Spectroscopy. The stripes disapeared as soon as you used the shutter instead of disabling it in the open position (but fokussing at 100ms integration time and continuous readout with a focal plane shutter isn't much fun). 
12 Reply mikeselectricstuff · 1 reply @mikeselectricstuff 12 hours ago I didn't think of that, but makes sense 2 Reply @douro20 22 hours ago (edited) The red LED reminds me of one from Roithner Lasertechnik. I have a Symbol 2D scanner which uses two very bright LEDs from that company, one red and one red-orange. The red-orange is behind a lens which focuses it into an extremely narrow beam. 1 Reply @RicoElectrico 23 hours ago PFG is Pulse Flush Gate according to the datasheet. Reply @dcallan812 23 hours ago Very interesting. 2x Reply @littleboot_ 1 day ago Cool interesting device Reply @dav1dbone 1 day ago I've stripped large projectors, looks similar, wonder if some of those castings are a magnesium alloy? Reply @kevywevvy8833 1 day ago ironic that some of those Phlatlight modules are used in some of the cheapest disco lights. 1 Reply 1 reply @bill6255 1 day ago Great vid - gets right into subject in title, its packed with information, wraps up quickly. Should get a YT award! imho 3 Reply @JAKOB1977 1 day ago (edited) The whole sensor module incl. a 5 grand 50mpix sensor for 49 £.. highest bid atm Though also a limited CCD sensor, but for the right buyer its a steal at these relative low sums. Architecture Full Frame CCD (Square Pixels) Total Number of Pixels 8304 (H) × 6220 (V) = 51.6 Mp Number of Effective Pixels 8208 (H) × 6164 (V) = 50.5 Mp Number of Active Pixels 8176 (H) × 6132 (V) = 50.1 Mp Pixel Size 6.0 m (H) × 6.0 m (V) Active Image Size 49.1 mm (H) × 36.8 mm (V) 61.3 mm (Diagonal), 645 1.1x Optical Format Aspect Ratio 4:3 Horizontal Outputs 4 Saturation Signal 40.3 ke− Output Sensitivity 31 V/e− Quantum Efficiency KAF−50100−CAA KAF−50100−AAA KAF−50100−ABA (with Lens) 22%, 22%, 16% (Peak R, G, B) 25% 62% Read Noise (f = 18 MHz) 12.5 e− Dark Signal (T = 60°C) 42 pA/cm2 Dark Current Doubling Temperature 5.7°C Dynamic Range (f = 18 MHz) 70.2 dB Estimated Linear Dynamic Range (f = 18 MHz) 69.3 dB Charge Transfer Efficiency Horizontal Vertical 0.999995 0.999999 Blooming Protection (4 ms Exposure Time) 800X Saturation Exposure Maximum Date Rate 18 MHz Package Ceramic PGA Cover Glass MAR Coated, 2 Sides or Clear Glass Features • TRUESENSE Transparent Gate Electrode for High Sensitivity • Ultra-High Resolution • Board Dynamic Range • Low Noise Architecture • Large Active Imaging Area Applications • Digitization • Mapping/Aerial • Photography • Scientific Thx for the tear down Mike, always a joy Reply @martinalooksatthings 1 day ago 15:49 that is some great bodging on of caps, they really didn't want to respin that PCB huh 8 Reply @RhythmGamer 1 day ago Was depressed today and then a new mike video dropped and now I’m genuinely happy to get my tear down fix 1 Reply @dine9093 1 day ago (edited) Did you transfrom into Mr Blobby for a moment there? 2 Reply @NickNorton 1 day ago Thanks Mike. Your videos are always interesting. 5 Reply @KeritechElectronics 1 day ago Heavy optics indeed... Spare no expense, cost no object. Splendid build quality. The CCD is a thing of beauty! 1 Reply @YSoreil 1 day ago The pricing on that sensor is about right, I looked in to these many years ago when they were still in production since it's the only large sensor you could actually buy. Really cool to see one in the wild. 2 Reply @snik2pl 1 day ago That leds look like from led projector Reply @vincei4252 1 day ago TDI = Time Domain Integration ? 1 Reply @wolpumba4099 1 day ago (edited) Maybe the camera should not be illuminated during readout. 
From the datasheet of the sensor (Onsemi): saturation 40300 electrons, read noise 12.5 electrons per pixel @ 18MHz (quite bad). quantum efficiency 62% (if it has micro lenses), frame rate 1 Hz. lateral overflow drain to prevent blooming protects against 800x (factor increases linearly with exposure time) saturation exposure (32e6 electrons per pixel at 4ms exposure time), microlens has +/- 20 degree acceptance angle i guess it would be good for astrophotography 4 Reply @txm100 1 day ago (edited) Babe wake up a new mikeselectricstuff has dropped! 9 Reply @vincei4252 1 day ago That looks like a finger-lakes filter wheel, however, for astronomy they'd never use such a large stepper. 1 Reply @MRooodddvvv 1 day ago yaaaaay ! more overcomplicated optical stuff ! 4 Reply 1 reply @NoPegs 1 day ago He lives! 11 Reply 1 reply Transcript 0:00 so I've stripped all the bits of the 0:01 optical system so basically we've got 0:03 the uh the camera 0:05 itself which is mounted on this uh very 0:09 complex 0:10 adjustment thing which obviously to set 0:13 you the various tilt and uh alignment 0:15 stuff then there's two of these massive 0:18 lenses I've taken one of these apart I 0:20 think there's something like about eight 0:22 or nine Optical elements in here these 0:25 don't seem to do a great deal in terms 0:26 of electr magnification they're obiously 0:28 just about getting the image to where it 0:29 uh where it needs to be just so that 0:33 goes like that then this Optical block I 0:36 originally thought this was made of some 0:37 s crazy heavy material but it's just 0:39 really the sum of all these Optical bits 0:41 are just ridiculously heavy those lenses 0:43 are about 4 kilos each and then there's 0:45 this very heavy very solid um piece that 0:47 goes in the middle and this is so this 0:49 is the filter wheel assembly with a 0:51 hilariously oversized steper 0:53 motor driving this wheel with these very 0:57 large narrow band filters so we've got 1:00 various different shades of uh 1:03 filters there five Al together that 1:06 one's actually just showing up a silver 1:07 that's actually a a red but fairly low 1:10 transmission orangey red blue green 1:15 there's an excess cover on this side so 1:16 the filters can be accessed and changed 1:19 without taking anything else apart even 1:21 this is like ridiculous it's like solid 1:23 aluminium this is just basically a cover 1:25 the actual wavelengths of these are um 1:27 488 525 570 630 and 700 NM not sure what 1:32 the suffix on that perhaps that's the uh 1:34 the width of the spectral line say these 1:37 are very narrow band filters most of 1:39 them are you very little light through 1:41 so it's still very tight narrow band to 1:43 match the um fluoresence of the dies 1:45 they're using in the biochemical process 1:48 and obviously to reject the light that's 1:49 being fired at it from that Illuminator 1:51 box and then there's a there's a second 1:53 one of these lenses then the actual sort 1:55 of samples below that so uh very serious 1:58 amount of very uh chunky heavy Optics 2:01 okay let's take a look at this light 2:02 source made by company Lumen Dynamics 2:04 who are now part of 2:06 excelitas self-contained unit power 2:08 connector USB and this which one of the 2:11 Cable Bundle said was a TTL interface 2:14 USB wasn't used in uh the fluid 2:17 application output here and I think this 2:19 is an input for um light feedback I 2:21 don't if it's regulated or just a measur 2:23 measurement facility and the uh fiber 2:27 assembly 2:29 
Square Inlet there and then there's two 2:32 outputs which have uh lens assemblies 2:35 and this small one which goes back into 2:37 that small Port just Loops out of here 2:40 straight back in So on this side we've 2:42 got the electronics which look pretty 2:44 straightforward we've got a bit of power 2:45 supply stuff over here and we've got 2:48 separate drivers for each wavelength now 2:50 interesting this is clearly been very 2:52 specifically made for this application 2:54 you I was half expecting like say some 2:56 generic drivers that could be used for a 2:58 number of different things but actually 3:00 literally specified the exact wavelength 3:02 on the PCB there is provision here for 3:04 385 NM which isn't populated but this is 3:07 clearly been designed very specifically 3:09 so these four drivers look the same but 3:10 then there's two higher power ones for 3:12 575 and 3:14 520 a slightly bigger heat sink on this 3:16 575 section there a p 24 which is 3:20 providing USB interface USB isolator the 3:23 USB interface just presents as a comport 3:26 I did have a quick look but I didn't 3:27 actually get anything sensible um I did 3:29 dump the Pi code out and there's a few 3:31 you a few sort of commands that you 3:32 could see in text but I didn't actually 3:34 manage to get it working properly I 3:36 found some software for related version 3:38 but it didn't seem to want to talk to it 3:39 but um I say that wasn't used for the 3:41 original application it might be quite 3:42 interesting to get try and get the Run 3:44 hours count out of it and the TTL 3:46 interface looks fairly straightforward 3:48 we've got positions for six opto 3:50 isolators but only five five are 3:52 installed so that corresponds with the 3:54 unused thing so I think this hopefully 3:56 should be as simple as just providing a 3:57 ttrl signal for each color to uh enable 4:00 it a big heat sink here which is there I 4:03 think there's like a big S of metal 4:04 plate through the middle of this that 4:05 all the leads are mounted on the other 4:07 side so this is heat sinking it with a 4:09 air flow from a uh just a fan in here 4:13 obviously don't have the air flow 4:14 anywhere near the Optics so conduction 4:17 cool through to this plate that's then 4:18 uh air cooled got some pots which are 4:21 presumably power 4:22 adjustments okay let's take a look at 4:24 the other side which is uh much more 4:27 interesting see we've got some uh very 4:31 uh neatly Twisted cable assemblies there 4:35 a bunch of leads so we've got one here 4:37 475 up here 430 NM 630 575 and 520 4:44 filters and dcro mirrors a quick way to 4:48 see what's white is if we just shine 4:49 some white light through 4:51 here not sure how it is is to see on the 4:54 camera but shining white light we do 4:55 actually get a bit of red a bit of blue 4:57 some yellow here so the obstacle path 5:00 575 it goes sort of here bounces off 5:03 this mirror and goes out the 520 goes 5:07 sort of down here across here and up 5:09 there 630 goes basically straight 5:13 through 5:15 430 goes across there down there along 5:17 there and the 475 goes down here and 5:20 left this is the light sensing thing 5:22 think here there's just a um I think 5:24 there a photo diode or other sensor 5:26 haven't actually taken that off and 5:28 everything's fixed down to this chunk of 5:31 aluminium which acts as the heat 5:32 spreader that then conducts the heat to 5:33 the back side for the heat 5:35 sink and the actual lead packages all 5:38 look fairly 
similar except for this one 5:41 on the 575 which looks quite a bit more 5:44 substantial big spay 5:46 Terminals and the interface for this 5:48 turned out to be extremely simple it's 5:50 literally a 5V TTL level to enable each 5:54 color doesn't seem to be any tensity 5:56 control but there are some additional 5:58 pins on that connector that weren't used 5:59 in the through time thing so maybe 6:01 there's some extra lines that control 6:02 that I couldn't find any data on this uh 6:05 unit and the um their current product 6:07 range is quite significantly different 6:09 so we've got the uh blue these 6:13 might may well be saturating the camera 6:16 so they might look a bit weird so that's 6:17 the 430 6:18 blue the 575 6:24 yellow uh 6:26 475 light blue 6:29 the uh 520 6:31 green and the uh 630 red now one 6:36 interesting thing I noticed for the 6:39 575 it's actually it's actually using a 6:42 white lead and then filtering it rather 6:44 than using all the other ones are using 6:46 leads which are the fundamental colors 6:47 but uh this is actually doing white and 6:50 it's a combination of this filter and 6:52 the dichroic mirrors that are turning to 6:55 Yellow if we take the filter out and a 6:57 lot of the a lot of the um blue content 7:00 is going this way the red is going 7:02 straight through these two mirrors so 7:05 this is clearly not reflecting much of 7:08 that so we end up with the yellow coming 7:10 out of uh out of there which is a fairly 7:14 light yellow color which you don't 7:16 really see from high intensity leads so 7:19 that's clearly why they've used the 7:20 white to uh do this power consumption of 7:23 the white is pretty high so going up to 7:25 about 2 and 1 half amps on that color 7:27 whereas most of the other colors are 7:28 only drawing half an amp or so at 24 7:30 volts the uh the green is up to about 7:32 1.2 but say this thing is uh much 7:35 brighter and if you actually run all the 7:38 colors at the same time you get a fairly 7:41 reasonable um looking white coming out 7:43 of it and one thing you might just be 7:45 out to notice is there is some sort 7:46 color banding around here that's not 7:49 getting uh everything s completely 7:51 concentric and I think that's where this 7:53 fiber optic thing comes 7:58 in I'll 8:00 get a couple of Fairly accurately shaped 8:04 very sort of uniform color and looking 8:06 at What's um inside here we've basically 8:09 just got this Square Rod so this is 8:12 clearly yeah the lights just bouncing 8:13 off all the all the various sides to um 8:16 get a nice uniform illumination uh this 8:19 back bit looks like it's all potted so 8:21 nothing I really do to get in there I 8:24 think this is fiber so I have come 8:26 across um cables like this which are 8:27 liquid fill but just looking through the 8:30 end of this it's probably a bit hard to 8:31 see it does look like there fiber ends 8:34 going going on there and so there's this 8:36 feedback thing which is just obviously 8:39 compensating for the any light losses 8:41 through here to get an accurate 8:43 representation of uh the light that's 8:45 been launched out of these two 8:47 fibers and you see uh 8:49 these have got this sort of trapezium 8:54 shape light guides again it's like a 8:56 sort of acrylic or glass light guide 9:00 guess projected just to make the right 9:03 rectangular 9:04 shape and look at this Center assembly 9:07 um the light output doesn't uh change 9:10 whether you feed this in or not so it's 9:11 clear not doing any internal Clos 
Loop 9:14 control obviously there may well be some 9:16 facility for it to do that but it's not 9:17 being used in this 9:19 application and so this output just 9:21 produces a voltage on the uh outle 9:24 connector proportional to the amount of 9:26 light that's present so there's a little 9:28 diffuser in the back there 9:30 and then there's just some kind of uh 9:33 Optical sensor looks like a 9:35 chip looking at the lead it's a very 9:37 small package on the PCB with this lens 9:40 assembly over the top and these look 9:43 like they're actually on a copper 9:44 Metalized PCB for maximum thermal 9:47 performance and yeah it's a very small 9:49 package looks like it's a ceramic 9:51 package and there's a thermister there 9:53 for temperature monitoring this is the 9:56 475 blue one this is the 520 need to 9:59 Green which is uh rather different OB 10:02 it's a much bigger D with lots of bond 10:04 wise but also this looks like it's using 10:05 a phosphor if I shine a blue light at it 10:08 lights up green so this is actually a 10:10 phosphor conversion green lead which 10:12 I've I've come across before they want 10:15 that specific wavelength so they may be 10:17 easier to tune a phosphor than tune the 10:20 um semiconductor material to get the uh 10:23 right right wavelength from the lead 10:24 directly uh red 630 similar size to the 10:28 blue one or does seem to have a uh a 10:31 lens on top of it there is a sort of red 10:33 coloring to 10:35 the die but that doesn't appear to be 10:38 fluorescent as far as I can 10:39 tell and the white one again a little 10:41 bit different sort of much higher 10:43 current 10:46 connectors a makeer name on that 10:48 connector flot light not sure if that's 10:52 the connector or the lead 10:54 itself and obviously with the phosphor 10:56 and I'd imagine that phosphor may well 10:58 be tuned to get the maximum to the uh 5 11:01 cenm and actually this white one looks 11:04 like a St fairly standard product I just 11:06 found it in Mouse made by luminous 11:09 devices in fact actually I think all 11:11 these are based on various luminous 11:13 devices modules and they're you take 11:17 looks like they taking the nearest 11:18 wavelength and then just using these 11:19 filters to clean it up to get a precise 11:22 uh spectral line out of it so quite a 11:25 nice neat and um extreme 11:30 bright light source uh sure I've got any 11:33 particular use for it so I think this 11:35 might end up on 11:36 eBay but uh very pretty to look out and 11:40 without the uh risk of burning your eyes 11:43 out like you do with lasers so I thought 11:45 it would be interesting to try and 11:46 figure out the runtime of this things 11:48 like this we usually keep some sort 11:49 record of runtime cuz leads degrade over 11:51 time I couldn't get any software to work 11:52 through the USB face but then had a 11:54 thought probably going to be writing the 11:55 runtime periodically to the e s prom so 11:58 I just just scope up that and noticed it 12:00 was doing right every 5 minutes so I 12:02 just ran it for a while periodically 12:04 reading the E squ I just held the pick 12:05 in in reset and um put clip over to read 12:07 the square prom and found it was writing 12:10 one location per color every 5 minutes 12:12 so if one color was on it would write 12:14 that location every 5 minutes and just 12:16 increment it by one so after doing a few 12:18 tests with different colors of different 12:19 time periods it looked extremely 12:21 straightforward it's like a four 
bite 12:22 count for each color looking at the 12:24 original data that was in it all the 12:26 colors apart from Green were reading 12:28 zero and the green was reading four 12:30 indicating a total 20 minutes run time 12:32 ever if it was turned on run for a short 12:34 time then turned off that might not have 12:36 been counted but even so indicates this 12:37 thing wasn't used a great deal the whole 12:40 s process of doing a run can be several 12:42 hours but it'll only be doing probably 12:43 the Imaging at the end of that so you 12:46 wouldn't expect to be running for a long 12:47 time but say a single color for 20 12:50 minutes over its whole lifetime does 12:52 seem a little bit on the low side okay 12:55 let's look at the camera un fortunately 12:57 I managed to not record any sound when I 12:58 did this it's also a couple of months 13:00 ago so there's going to be a few details 13:02 that I've forgotten so I'm just going to 13:04 dub this over the original footage so um 13:07 take the lid off see this massive great 13:10 heat sink so this is a pel cool camera 13:12 we've got this blower fan producing a 13:14 fair amount of air flow through 13:16 it the connector here there's the ccds 13:19 mounted on the board on the 13:24 right this unplugs so we've got a bit of 13:27 power supply stuff on here 13:29 USB interface I think that's the Cyprus 13:32 microcontroller High speeded USB 13:34 interface there's a zyink spon fpga some 13:40 RAM and there's a couple of ATD 13:42 converters can't quite read what those 13:45 those are but anal 13:47 devices um little bit of bodgery around 13:51 here extra decoupling obviously they 13:53 have having some noise issues this is 13:55 around the ram chip quite a lot of extra 13:57 capacitors been added there 13:59 uh there's a couple of amplifiers prior 14:01 to the HD converter buffers or Andor 14:05 amplifiers taking the CCD 14:08 signal um bit more power spy stuff here 14:11 this is probably all to do with 14:12 generating the various CCD bias voltages 14:14 they uh need quite a lot of exotic 14:18 voltages next board down is just a 14:20 shield and an interconnect 14:24 boardly shielding the power supply stuff 14:26 from some the more sensitive an log 14:28 stuff 14:31 and this is the bottom board which is 14:32 just all power supply 14:34 stuff as you can see tons of capacitors 14:37 or Transformer in 14:42 there and this is the CCD which is a uh 14:47 very impressive thing this is a kf50 100 14:50 originally by true sense then codec 14:53 there ON 14:54 Semiconductor it's 50 megapixels uh the 14:58 only price I could find was this one 15:00 5,000 bucks and the architecture you can 15:03 see there actually two separate halves 15:04 which explains the Dual AZ converters 15:06 and two amplifiers it's literally split 15:08 down the middle and duplicated so it's 15:10 outputting two streams in parallel just 15:13 to keep the bandwidth sensible and it's 15:15 got this amazing um diffraction effects 15:18 it's got micro lenses over the pixel so 15:20 there's there's a bit more Optics going 15:22 on than on a normal 15:25 sensor few more bodges on the CCD board 15:28 including this wire which isn't really 15:29 tacked down very well which is a bit uh 15:32 bit of a mess quite a few bits around 15:34 this board where they've uh tacked 15:36 various bits on which is not super 15:38 impressive looks like CCD drivers on the 15:40 left with those 3 ohm um damping 15:43 resistors on the 15:47 output get a few more little bodges 15:50 around here some of 
15:52 the and there's this separator the 15:54 silica gel to keep the moisture down but 15:56 there's this separator that actually 15:58 appears to be cut from piece of 15:59 antistatic 16:04 bag and this sort of thermal block on 16:06 top of this stack of three pel Cola 16:12 modules so as with any Stacks they get 16:16 um larger as they go back towards the 16:18 heat sink because each P's got to not 16:20 only take the heat from the previous but 16:21 also the waste heat which is quite 16:27 significant you see a little temperature 16:29 sensor here that copper block which 16:32 makes contact with the back of the 16:37 CCD and this's the back of the 16:40 pelas this then contacts the heat sink 16:44 on the uh rear there a few thermal pads 16:46 as well for some of the other power 16:47 components on this 16:51 PCB okay I've connected this uh camera 16:54 up I found some drivers on the disc that 16:56 seem to work under Windows 7 couldn't 16:58 get to install under Windows 11 though 17:01 um in the absence of any sort of lens or 17:03 being bothered to the proper amount I've 17:04 just put some f over it and put a little 17:06 pin in there to make a pinhole lens and 17:08 software gives a few options I'm not 17:11 entirely sure what all these are there's 17:12 obviously a clock frequency 22 MHz low 17:15 gain and with PFG no idea what that is 17:19 something something game programmable 17:20 Something game perhaps ver exposure 17:23 types I think focus is just like a 17:25 continuous grab until you tell it to 17:27 stop not entirely sure all these options 17:30 are obviously exposure time uh triggers 17:33 there ex external hardware trigger inut 17:35 you just trigger using a um thing on 17:37 screen so the resolution is 8176 by 17:40 6132 and you can actually bin those 17:42 where you combine multiple pixels to get 17:46 increased gain at the expense of lower 17:48 resolution down this is a 10sec exposure 17:51 obviously of the pin hole it's very uh 17:53 intensitive so we just stand still now 17:56 downloading it there's the uh exposure 17:59 so when it's 18:01 um there's a little status thing down 18:03 here so that tells you the um exposure 18:07 [Applause] 18:09 time it's this is just it 18:15 downloading um it is quite I'm seeing 18:18 quite a lot like smearing I think that I 18:20 don't know whether that's just due to 18:21 pixels overloading or something else I 18:24 mean yeah it's not it's not um out of 18:26 the question that there's something not 18:27 totally right about this camera 18:28 certainly was bodge wise on there um I 18:31 don't I'd imagine a camera like this 18:32 it's got a fairly narrow range of 18:34 intensities that it's happy with I'm not 18:36 going to spend a great deal of time on 18:38 this if you're interested in this camera 18:40 maybe for astronomy or something and 18:42 happy to sort of take the risk of it may 18:44 not be uh perfect I'll um I think I'll 18:47 stick this on eBay along with the 18:48 Illuminator I'll put a link down in the 18:50 description to the listing take your 18:52 chances to grab a bargain so for example 18:54 here we see this vertical streaking so 18:56 I'm not sure how normal that is this is 18:58 on fairly bright scene looking out the 19:02 window if I cut the exposure time down 19:04 on that it's now 1 second 19:07 exposure again most of the image 19:09 disappears again this is looks like it's 19:11 possibly over still overloading here go 19:14 that go down to say say quarter a 19:16 second so again I think there might be 19:19 
some Auto gain control going on here um 19:21 this is with the PFG option let's try 19:23 turning that off and see what 19:25 happens so I'm not sure this is actually 19:27 more streaking or which just it's 19:29 cranked up the gain all the dis display 19:31 gray scale to show what um you know the 19:33 range of things that it's captured 19:36 there's one of one of 12 things in the 19:38 software there's um you can see of you 19:40 can't seem to read out the temperature 19:42 of the pelta cooler but you can set the 19:44 temperature and if you said it's a 19:46 different temperature you see the power 19:48 consumption jump up running the cooler 19:50 to get the temperature you requested but 19:52 I can't see anything anywhere that tells 19:54 you whether the cool is at the at the 19:56 temperature other than the power 19:57 consumption going down and there's no 19:59 temperature read out 20:03 here and just some yeah this is just 20:05 sort of very basic software I'm sure 20:07 there's like an API for more 20:09 sophisticated 20:10 applications but so if you know anything 20:12 more about these cameras please um stick 20:14 in the 20:15 comments um incidentally when I was 20:18 editing I didn't notice there was a bent 20:19 pin on the um CCD but I did fix that 20:22 before doing these tests and also 20:24 reactivated the um silica gel desicant 20:26 cuz I noticed it was uh I was getting 20:28 bit of condensation on the window but um 20:31 yeah so a couple of uh interesting but 20:34 maybe not particularly uh useful pieces 20:37 of Kit except for someone that's got a 20:38 very specific use so um I'll stick a 20:42 I'll stick these on eBay put a link in 20:44 the description and say hopefully 20:45 someone could actually make some uh good 20:47 use of these things Example Output: **Abstract:** This video presents Part 2 of a teardown focusing on the optical components of a Fluidigm Polaris biotechnology instrument, specifically the multi-wavelength illuminator and the high-resolution CCD camera. The Lumen Dynamics illuminator unit is examined in detail, revealing its construction using multiple high-power LEDs (430nm, 475nm, 520nm, 575nm, 630nm) combined via dichroic mirrors and filters. A square fiber optic rod is used to homogenize the light. A notable finding is the use of a phosphor-converted white LED filtered to achieve the 575nm output. The unit features simple TTL activation for each color, conduction cooling, and internal homogenization optics. Analysis of its EEPROM suggests extremely low operational runtime. The camera module teardown showcases a 50 Megapixel ON Semiconductor KAF-50100 CCD sensor with micro-lenses, cooled by a multi-stage Peltier stack. The control electronics include an FPGA and a USB interface. Significant post-manufacturing modifications ("bodges") are observed on the camera's circuit boards. Basic functional testing using vendor software and a pinhole lens confirms image capture but reveals prominent vertical streaking artifacts, the cause of which remains uncertain (potential overload, readout artifact, or fault). **Exploring the Fluidigm Polaris: A Detailed Look at its High-End Optics and Camera System** * **0:00 High-End Optics:** The system utilizes heavy, high-quality lenses and mirrors for precise imaging, weighing around 4 kilos each. * **0:49 Narrow Band Filters:** A filter wheel with five narrow band filters (488, 525, 570, 630, and 700 nm) ensures accurate fluorescence detection and rejection of excitation light. 
* **2:01 Customizable Illumination:** The Lumen Dynamics light source offers five individually controllable LED wavelengths (430, 475, 520, 575, 630 nm) with varying power outputs. The 575nm yellow LED is uniquely achieved using a white LED with filtering. * **3:45 TTL Control:** The light source is controlled via a simple TTL interface, enabling easy on/off switching for each LED color. * **12:55 Sophisticated Camera:** The system includes a 50-megapixel Kodak KAI-50100 CCD camera with a Peltier cooling system for reduced noise. * **14:54 High-Speed Data Transfer:** The camera features dual analog-to-digital converters to manage the high data throughput of the 50-megapixel sensor, which is effectively two 25-megapixel sensors operating in parallel. * **18:11 Possible Issues:** The video creator noted some potential issues with the camera, including image smearing. * **18:11 Limited Dynamic Range:** The camera's sensor has a limited dynamic range, making it potentially challenging to capture scenes with a wide range of brightness levels. * **11:45 Low Runtime:** Internal data suggests the system has seen minimal usage, with only 20 minutes of recorded runtime for the green LED. * **20:38 Availability on eBay:** Both the illuminator and camera are expected to be listed for sale on eBay. Here is the real transcript. Please summarize it: 00:00:08 two clocks are functioning and locked, 00:00:14 we have lots of beam current on both. 00:00:17 and they hover about a few nanoseconds. 00:00:22 see if it can maintain that measurement. 00:00:32 Hello and welcome back. If you follow the channel, 00:00:34 clock. The goal is to compare two atomic clocks, 00:00:41 theory of relativity for ourselves. 00:00:46 Hafele and Keating experiments of 1971, 00:00:51 of time dilation as predicted by 00:00:55 HP 5061A clocks around the world. 00:01:02 check that our tired eBay clocks still 00:01:06 when next to each other, at rest in the lab. 00:01:13 to measure a shift of about 20 00:01:16 at elevated altitude over a few days, 00:01:22 original experiment. So our two clocks 00:01:28 few days to give us a fighting chance. 00:01:33 in the series, one of the clocks failed, 00:01:38 about to measure it. Fortunately, 00:01:43 we think, so let's try this measurement again. 00:01:50 But first, let's look at the measurement setup. 00:01:56 clocks, which is man's worst nightmare. Which 00:02:02 answer is neither, and all we can do is 00:02:08 us an idea of their ultimate stability. 00:02:13 and Keating's clocks were equipped with 00:02:19 generated a sharp tick every second, 00:02:24 clock face on the front panel. Unfortunately, 00:02:30 Ours are merely frequency standards: 00:02:35 signals at 5 MHz, 1 MHz and 100 kHz. 00:02:43 two HP 59308A Timing Generators. We'll 00:02:50 atomic clock, and do the same with the 00:02:55 programmable digital dividers. We'll set 00:03:01 microseconds, representing each of the clocks. 00:03:09 we can observe on a scope. The interval between 00:03:15 will depend on when the dividers randomly started. 00:03:22 constant, atomically stable. 00:03:26 measure any changes on the scope itself. 00:03:31 pulses, we'll use an HP 5334B counter instead. 00:03:39 measuring the interval from the first tick to 00:03:45 function, we can zero out the mean delay, 00:03:52 give us a resolution of 0.1 nanoseconds. Hopefully 00:03:59 00:04:04 we will also slave it to one of the atomic 00:04:09 counter can only take a 10 MHz reference input, 00:04:17 MHz output. Not to worry! 
We'll use 00:04:23 restored and upgraded in a previous episode, 00:04:30 its newly installed frequency doubler option, 00:04:36 the resulting 10 MHz for the counter reference. 00:04:44 our previous HP restoration efforts pay off, 00:04:49 equipment that works. Let's measure atomic time! 00:04:56 other Time Nuts in the comments, 00:05:00 characterize the frequency stability 00:05:04 Variance or its modified alternatives. 00:05:10 but in essence, this is the variance of the 00:05:16 It's a statistical measure of the likely clock 00:05:22 being a day in our case. It is expressed in 00:05:29 so it's a dimensionless quantity. A day being 00:05:35 to the 13th nanoseconds, if we targeted a 00:05:43 we'd want a clock with Allan variance of about 00:05:51 plots provided by HP, that is stretching what our 00:05:59 It should be within the reach of an HP 5061A 00:06:07 what a standard tube can do. Plus now, we have 00:06:14 know it's going to be dicey. But we'll try anyhow. 00:06:21 our pile of HP measurement instruments. 00:06:28 repaired clock B on the right, 00:06:31 nanoseconds difference between each other. 00:06:36 see if it stays there. If that's the case, 00:06:41 relativity for ourselves. 00:06:46 I'm not too sure about that, we'll see... 00:06:49 (The next morning...) 00:06:52 experiment half succeeded and half failed. 00:07:00 locked. They are both on green, 00:07:06 would unlock before the repair. 00:07:11 They drifted 34, 35 nanoseconds, 00:07:22 a short duration experiment. 00:07:29 200 nanoseconds. But in our experiment, 00:07:36 around the world in opposite directions. 00:07:40 which will be less than minus 34 nanoseconds. 00:07:48 not warm enough, maybe something happened. 00:07:52 So I'm going to restart it. 00:08:00 So right now they, are 598 apart. 00:08:06 So I'll enter this, 598... 00:08:12 Minus 9... Enter... Enter... 00:08:27 let it run for another day. 00:08:31 (The following day...) 00:08:33 and I have invited Einstein to the party, 00:08:39 it's only about 6.7 nanoseconds of drift. 00:08:51 but much better than what we had before. 00:08:58 And what I have done is, 00:08:59 same direction, and always the same value. 00:09:05 the clocks, so that both match. 00:09:11 graduation, or half a graduation of the 00:09:17 is already a fine fine fine pot over here. 00:09:25 time is atomic time and that is it, 00:09:30 seen in the Zeeman alignment episode, 00:09:36 atomic line frequency due to the magnetic field, 00:09:42 Zeeman and that bears his name to this day. Here 00:09:48 way. Therefore, there is quite an elaborate 00:09:54 in the Cesium tube, known as the C-Field, 00:10:00 here, we are pushing the clocks to their limits, 00:10:06 to put the clocks right on top of each other. 00:10:12 experiment, they didn't even bother with that. 00:10:16 to each other. But they characterized 00:10:21 much larger national metrology lab master clock. 00:10:29 [Marc] It's moving to this left. 00:10:30 [Eric] Yeah. 00:10:31 adjust the results afterwards. And by the way, 00:10:36 Yes, they flew not just one, but 4 clocks. 00:10:42 to a better master clock, 00:10:45 clocks are not drifting vs. each other 00:10:49 stability directly from this simple experiment. 00:10:56 clocks to more or less track each other, 00:11:00 over long periods of time, 00:11:04 a little bit more of HP equipment. 00:11:08 7132A as our chart recorder. It can record for 00:11:16 analog instrument, so we'll need the HP 00:11:21 conveniently repaired two episodes ago. 
We'll 00:11:29 good measure, we'll also add an HP 3478A DVM, 00:11:37 course, to monitor the DAC output voltage. 00:11:43 to serve as our HP-IB controller. It will 00:11:50 send it over to the DAC over the HP-IB bus. 00:11:56 a simple BASIC program, and our 00:11:59 100% vintage HP metrology setup is good to go. 00:12:03 by the minute, or the nanosecond here. 00:12:09 newest of my oldest computers. 00:12:15 the nanosecond tracking of the two clocks. 00:12:27 is 14 lines of vintage BASIC - no OS to load, 00:12:33 no drivers, no version of Python to check, 00:12:36 all this natively, out of the box. 00:12:43 just realized we don't use anywhere, 00:12:47 so make this 13 lines of BASIC. 00:12:50 bus, in case we get stuck for more 00:12:55 500 and clear the bus and start again. 00:13:00 from the counter at address 703. 00:13:05 Line 40 scales it to get rid of the exponent, 00:13:08 around 500, so it fits our DAC scale. 00:13:15 the valid DAC values, from 0 to 999. 00:13:23 line towards the bottom of the screen, 00:13:28 so we can check it's doing the right thing. 00:13:31 to the DAC at address 706, 00:13:36 preceded by a 2 to indicate the -10 to +10V range. 00:13:41 And that's pretty much it. 00:13:42 a voltage out of it, changes that into a 00:13:49 A transforms it into a real voltage, 00:13:58 recorder, so we can follow the trends. 00:14:04 one direction, or if it's getting big skips. 00:14:10 (The next morning...) 00:14:12 great. We have drifted almost 14 nanosecond. 00:14:20 And not only there is drift, but the way it 00:14:23 here. I have this big jumps. 00:14:32 there's one there, there's some here, 00:14:39 is not going to get us there. 00:14:44 is going all over the place, and even jumps, 00:14:50 strongly suspect that it's just one of the clocks, 00:14:56 are more atomic gremlins hiding in that one. 00:15:03 However, there is one piece of really good news. 00:15:05 of vintage HP equipment, works perfectly. 00:15:11 We just need more work before we can check on 00:15:16 more atomic clock repairs in our near future.
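The stability requirement quoted around 0:04:56 to 0:05:43 in the transcript above survives only as caption fragments. Below is a worked version of the arithmetic the narration implies, using the roughly 20 ns shift over a couple of days mentioned near 0:01:13; the exact drift budget is my assumption, and sigma_y here denotes the Allan deviation (the square root of the Allan variance), a dimensionless fractional-frequency measure:

```latex
% One day expressed in nanoseconds:
\tau = 1\,\mathrm{day} = 86{,}400\,\mathrm{s} = 8.64\times 10^{13}\,\mathrm{ns}

% To resolve a ~20 ns relativistic shift accumulated over ~2 days,
% the clocks' own relative drift must stay at or below that level:
\sigma_y(\tau) \lesssim \frac{\Delta t}{\tau}
  = \frac{20\,\mathrm{ns}}{1.728\times 10^{14}\,\mathrm{ns}}
  \approx 1.2\times 10^{-13}
```

That is the "about 1 part in 10^13" target the narration calls dicey for a standard HP 5061A tube.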
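The passage from 0:12:50 to 0:13:41 walks through the 13-line HP BASIC logger line by line: read the counter at HP-IB address 703, scale the reading to get rid of the exponent, clamp it to the DAC's valid 0 to 999 range, print a running value, and send it to the DAC at address 706 with a leading 2 selecting the -10 to +10 V range, falling back to a bus clear when stuck. A minimal modern re-sketch in Python with PyVISA is below; the resource names, scale factor, and DAC command format are assumptions for illustration, not the original HP BASIC:

```python
# Hypothetical PyVISA re-sketch of the 13-line HP BASIC logger described in
# the video (counter at HP-IB address 703, DAC at 706). Resource names, the
# scale factor, and the DAC command format are assumptions, not HP BASIC.
import time
import pyvisa

rm = pyvisa.ResourceManager()
counter = rm.open_resource("GPIB0::3::INSTR")   # HP 5334B counter ("703")
dac = rm.open_resource("GPIB0::6::INSTR")       # HP 59303A D/A converter ("706")

while True:
    try:
        reading = float(counter.read())         # interval in seconds, e.g. 5.98e-07
        counts = int(reading * 1e9)             # drop the exponent: ~598 ns
        counts = max(0, min(999, counts))       # clamp to the DAC's valid 0..999
        print(counts)                           # running readout, as on screen
        dac.write(f"2{counts:03d}")             # leading '2' selects the +/-10 V range
        time.sleep(1)                           # one sample per 1PPS comparison
    except pyvisa.errors.VisaIOError:
        dac.clear()                             # like BASIC line 500: clear the bus
```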
identifier: 2105
model: gemini-2.5-pro-exp-03-25| input-price: 1.25 output-price: 10.0 max-context-length: 128_000
host: 194.230.160.66
https://youtu.be/ENviIxDTmUA?feature=shared
include_comments: None
include_timestamps: 1
include_glossary: None
output_language: en
cost: 0.09736
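The cost field can be reproduced from the per-token prices on the model line above and the token counts reported in the summary footer below (65,072 input, 1,602 output). A quick check, assuming the prices are USD per one million tokens:

```python
# Reproducing the logged cost from the model line and the token counts in the
# summary footer, assuming prices are USD per one million tokens.
input_tokens, output_tokens = 65_072, 1_602
input_price, output_price = 1.25, 10.00
cost = input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price
print(f"{cost:.5f}")   # 0.09736, matching the cost field
```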
Okay, here is the abstract and summary for the provided transcript featuring Chris Lattner discussing Mojo. *Abstract:* This interview features Chris Lattner, creator of Swift, LLVM, Clang, and the Mojo programming language. Lattner discusses the motivations behind creating Mojo: primarily to address performance challenges in AI/ML and effectively utilize modern hardware like GPUs and accelerators, while integrating seamlessly with the existing Python ecosystem. He contrasts Mojo's approach ("Python++" superset focusing on adding capabilities) with attempts to simply optimize existing Python code, emphasizing Mojo's design philosophy of providing explicit control and predictability for performance-critical tasks via features like structs, static typing (`fn` functions), and first-class SIMD support. The discussion covers lessons learned from Swift's design, particularly regarding type system complexity and operator overloading. Lattner details Mojo's use of MLIR alongside LLVM for its compiler infrastructure, enabling advanced optimizations and features like compile-time metaprogramming. He also touches upon specific design choices around numeric types (including SIMD and integer overflow behavior) and memory management, positioning Mojo as a systems language that empowers library developers and provides flexibility for different programmer personas. *Mojo Language Design and Philosophy with Chris Lattner* * *0:00:47 Why Create Mojo?* Mojo wasn't the initial goal of Modular. It emerged from the need to program next-generation hardware (GPUs, accelerators) effectively, leveraging the MLIR compiler framework, and requiring a user-friendly syntax, leading them to build upon Python. * *0:01:21 Learning from Swift:* Creating a new language syntax is complex and time-consuming. Mojo prioritizes powerful semantics, compiler infrastructure for performance, and meeting the large existing Python community where they are. * *0:03:55 Python Implementation Limitations:* CPython interpreter isn't very fast, lacks modern compiler techniques, has packaging challenges, and fundamentally cannot target GPUs or embedded systems requiring minimal dependencies. * *0:04:56 Importance of API Design:* Good API design is crucial, including what it *disallows*. Swift's string API (bytes vs. grapheme clusters) is praised for simplifying by removing confusing concepts. The goal is to make safe defaults easy to reach for. * *0:09:29 Mojo as "Python++":* Aims to be a superset of Python over time, allowing use of the existing ecosystem. It adds new capabilities (like `struct` for value types) alongside Python's features (`class`). The goal is to allow gradual adoption and upgrading of Python code, potentially replacing complex Python+C extensions with pure Mojo. * *0:17:48 Against Magic Optimization:* Mojo deliberately avoids trying to magically speed up unmodified Python code. Such approaches (like PyPy, Codon) create unpredictable performance cliffs and reduce programmer control. Mojo instead provides explicit tools (types, structs, `fn`) to unlock performance, working backward from hardware capabilities. * *0:25:09 Compiler Strategy (MLIR + LLVM):* Mojo uses MLIR for high-level, parallelizable optimizations (like its own inliner) and selectively uses LLVM for backend code generation, disabling parts like LLVM's auto-vectorizer to ensure predictability. 
* *0:28:15 Balancing Control and Dynamism:* Mojo offers `def` for Python-like dynamic behavior (defaulting to object types) and `fn` for systems programming requiring explicit types and offering more control and performance predictability. This caters to different user needs within one language. * *0:39:03 Compile-Time Metaprogramming:* Inspired by languages like Zig, Mojo uses the same language for compile-time and runtime code, simplifying macros, templates, etc., enabling powerful library abstractions. * *0:39:41 First-Class SIMD:* Mojo treats SIMD types as fundamental, defined in the standard library using compile-time parameters for element type and width. Scalar types (like `Float32`) are defined as single-element SIMD vectors (`SIMD<Float32, 1>`), allowing numeric libraries (like `sin`, `cos`) to be written generically for both scalars and vectors (a toy model of this appears after this summary). * *0:43:47 Hardware Abstraction:* Low-level hardware intrinsics (Neon, AVX, etc.) are exposed. Libraries can use compile-time checks (`if target == ...`) to use specific hardware features, abstracting hardware differences away from the end-user. Mojo pushes complexity from the compiler into libraries. * *0:52:51 Functional Programming Influence & Mutation:* Mojo embraces value semantics (like Swift) to prevent spooky action at a distance but allows efficient in-place mutation where appropriate (e.g., via copy-on-write or explicit unsafe code in library internals), unlike pure functional languages that can suffer performance penalties. * *1:02:42 Integer Overflow Handling:* Mojo consistently uses wrapping (two's complement) semantics for integer overflow for both signed and unsigned types. This differs from C++ (UB for signed) and Swift (traps by default) but is necessary for SIMD performance and predictability. It distinguishes between Python's arbitrary-precision `int` (heap object) and Mojo's fixed-width `Int` (struct, fast); a wrapping sketch appears after this summary. * *1:13:35 Type Inference Approach:* Mojo avoids complex Hindley-Milner type inference (which caused issues in Swift due to interaction with overloading, literals, etc.). It uses a simpler, more localized type checking approach consistent with Python's dynamic feel, aiming for predictable compile times and clear error messages. Function signatures are generally required for `fn` functions. * *1:29:11 Operator Overloading Strategy:* Mojo currently allows overloading existing Python operators via Dunder methods (`__add__`, etc.). It avoids adding custom operators (like Swift did) to keep the language simpler initially, recognizing that the complexity might not pay for itself. This allows libraries to define numeric types (like `Complex`, `BFloat16`) without needing compiler changes (a `Complex` sketch appears after this summary). * *1:38:50 Complex Numbers & Hardware:* While Mojo doesn't privilege complex numbers, its library-centric design and compile-time features allow a `Complex` type's multiply operator to internally check the target hardware and use specialized instructions (if available on platforms like IBM mainframes) for acceleration. * *1:41:37 Arbitrary Bitwidth Integers:* MLIR supports arbitrary bitwidths. While not exposed in Mojo's standard library *yet*, it could be added purely as a library feature without compiler changes. * *1:43:43 Allocator Philosophy:* Mojo currently doesn't have a pervasive allocator parameter system like Zig.
Instead, it focuses on providing different *types* with different allocation strategies (e.g., a `List` type potentially taking a parameter for inline storage like `SmallVector`) rather than making allocators a universal generic parameter. * *1:47:32 Slices and Ownership:* Mojo has an ownership system, allowing efficient, safe slices that refer back to the original data structure's lifetime, often using copy-on-write semantics. * *1:48:23 Community and Status:* Mojo is still early but has an active community on Discord. It's being used for various applications beyond AI, including web development and GUIs. Users are encouraged to explore the documentation and get involved. I used gemini-2.5-pro-exp-03-25| input-price: 1.25 output-price: 10.0 max-context-length: 128_000 on rocketrecap dot com to summarize the transcript. Cost (if I didn't use the free tier): $0.10 Input tokens: 65072 Output tokens: 1602
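The 0:39:41 bullet's scalar-as-width-1-vector idea is easier to see in code. Below is a toy Python model of the pattern: one generic elementwise type, with scalars as the one-element case, so a numeric function is written once for both. Mojo expresses the width as a compile-time parameter and compiles to real vector registers; this sketch uses a runtime tuple instead, and every name in it (`SIMD`, `Float32`, `sin`) is illustrative only, not Mojo's actual API:

```python
# Toy model of "scalars are 1-element SIMD vectors": one generic elementwise
# type, so numeric functions are written once for scalars and vectors alike.
import math
from dataclasses import dataclass

@dataclass
class SIMD:
    elems: tuple  # the vector width is len(elems); Mojo makes this compile-time

    def __mul__(self, other: "SIMD") -> "SIMD":
        return SIMD(tuple(a * b for a, b in zip(self.elems, other.elems)))

def sin(v: SIMD) -> SIMD:
    # Written once, usable for scalars and vectors alike.
    return SIMD(tuple(math.sin(e) for e in v.elems))

Float32 = lambda x: SIMD((x,))          # a "scalar" is just a width-1 vector
print(sin(Float32(0.0)))                # SIMD(elems=(0.0,))
print(sin(SIMD((0.0, math.pi / 2))))    # SIMD(elems=(0.0, 1.0))
```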
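The 1:02:42 bullet contrasts Python's arbitrary-precision `int` with Mojo's fixed-width, wrapping `Int`. The snippet below illustrates two's-complement wrapping in plain Python; the 32-bit width is chosen only for illustration (Mojo's `Int` is machine-width), and the helper name `wrap32` is mine:

```python
# Two's-complement wrapping at a fixed width, versus Python's unbounded int.
def wrap32(x: int) -> int:
    x &= 0xFFFF_FFFF                                # keep the low 32 bits
    return x - (1 << 32) if x >= (1 << 31) else x   # reinterpret as signed

print(wrap32(2**31 - 1 + 1))    # -2147483648: signed overflow wraps, no trap, no UB
print(wrap32(-1) & 0xFFFF_FFFF) # 4294967295: the same bits viewed as unsigned
print(2**31 - 1 + 1)            # 2147483648: Python's heap int just keeps growing
```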
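The 1:29:11 and 1:38:50 bullets describe overloading Python's existing operators via dunder methods, so a library can define a `Complex` type with no compiler changes. A minimal Python sketch of that pattern follows; Mojo's actual struct syntax differs, and the hardware-dispatch remark is only gestured at in a comment:

```python
# Minimal dunder-method sketch of the pattern described at 1:29:11: a
# library-defined Complex type gets '+' and '*' with no compiler support.
from dataclasses import dataclass

@dataclass
class Complex:
    re: float
    im: float

    def __add__(self, other: "Complex") -> "Complex":
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other: "Complex") -> "Complex":
        # Per the 1:38:50 bullet, a Mojo library could branch here on the
        # compile-time target and use specialized instructions; plain math
        # stands in for that here.
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)

print(Complex(1, 2) * Complex(3, 4))   # Complex(re=-5, im=10)
```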
Here is the real transcript. Please summarize it: 00:00:01 welcome to Software Unscripted I'm your 00:00:01 host Richard Feldman today I'm talking 00:00:03 with Chris Lattner creator of Swift LLVM 00:00:06 Clang and most recently the Mojo 00:00:08 programming language at Modular 00:00:10 Incorporated we talk about a lot of 00:00:12 different topics including Mojo language 00:00:14 design and API design performance both 00:00:16 compiler performance and runtime 00:00:17 performance and ecosystem 00:00:19 Evolution I want to give a massive thank 00:00:21 you to everyone who's been supporting 00:00:22 Software Unscripted on Patreon if you 00:00:25 enjoy these episodes and you'd like to 00:00:26 become a supporter too check out 00:00:33 and now Mojo with Chris 00:00:33 Lattner all right Chris welcome awesome 00:00:36 Richard I'm very excited to be here it's 00:00:38 it's super funny I hear you talking to 00:00:40 my ears all the time and so it's it's 00:00:41 great to be able to talk with you well 00:00:44 likewise I've seen a bunch of your talks 00:00:45 and uh so so now you're making this 00:00:47 language called Mojo uh what's it all 00:00:49 about uh great question and like why why 00:00:52 is it a good idea to make a new 00:00:53 programming language have you ever 00:00:54 wondered that for your own I've never 00:00:56 been asked that yeah like everybody that 00:01:00 programming languages are a bad idea why 00:01:01 would you ever do this right and so in 00:01:03 the case of Mojo it's it's a combination 00:01:05 of uh it's it's basically just a way to 00:01:08 solve a problem and so in in my 00:01:10 background uh as you probably know I've 00:01:12 built like the Clang C++ implementation 00:01:15 and the Swift programming language and a 00:01:16 bunch of other stuff that's less uh 00:01:18 well-known but um what I learned from 00:01:21 that is that building a programming 00:01:22 language is really expensive really hard 00:01:24 and you have to have really good reason 00:01:25 to do this and so when we started 00:01:27 Modular our goal wasn't to create a new 00:01:30 programming language what we wanted to 00:01:31 do is we wanted to solve a bunch of very 00:01:34 hard tech let's go make GPUs go brrr
let's go make accelerators that are even 00:01:40 weirder than gpu's work let's make it so 00:01:42 we can program this entirely new next 00:01:43 generation of hardware and we had some 00:01:46 tools we had this ml compiler framework 00:01:48 thingy had a whole bunch of new 00:01:50 algorithms and stuff like this but then 00:01:52 we decided we needed to have a way to 00:01:53 actually express it and for normal 00:01:55 programmers to be able to participate 00:01:57 and so coming out of that we decided to 00:01:59 make Mojo which is a solution to a 00:02:02 series of equations of we have this 00:02:04 programming model that we need to expose 00:02:06 how where do we get 00:02:08 syntax that's that's where that's where 00:02:10 Mojo came from actually 00:02:12 originally right so you're like we've 00:02:14 decided that we have problems where the 00:02:16 solution is in the shape of a 00:02:17 programming language ml is a big part of 00:02:20 that and now we need some sort of syntax 00:02:22 to put on top of this now you could have 00:02:24 just gone with like let's let's make up 00:02:26 a new syntax that's fun it's you bik 00:02:28 shed for for that b b there done that 00:02:31 with Swift and and and what I learned 00:02:34 and I I mean I could just share the 00:02:35 experience there is that it's a lot of 00:02:37 fun like being able to go through all 00:02:39 the the Lessons Learned on all these 00:02:41 different things integrate what you like 00:02:42 about all these different things into a 00:02:43 new system like that's a lot of fun it's 00:02:45 also really hard also takes a lot of 00:02:47 time you spend a lot of your energy on I 00:02:50 don't know people would say bike 00:02:51 shedding but I would say it's it's 00:02:52 design which is which is a really 00:02:55 difficult skill um and what we wanted to 00:02:57 spend our time on is two things one is 00:03:01 the actual semantics and the execution 00:03:03 platform the compiler stuff like how it 00:03:06 actually runs fast and how we express 00:03:08 the full capabilities of Hardware that's 00:03:10 what we wanted to spend our time on and 00:03:11 second we wanted to meet people where 00:03:13 they are and in our case in the AI and 00:03:16 and other space everybody's in Python 00:03:19 right and so we thought okay well I mean 00:03:21 I know Swift pretty well like should we 00:03:23 just use Swift and hack Swift up or 00:03:24 should I hack up C++ or hack up a domain 00:03:27 specific embedded language in Scola or 00:03:29 or whatever did not really think about 00:03:31 that by the way but um uh but I love 00:03:35 schola but yeah what we realized was we 00:03:37 realize that because the entire 00:03:39 Community was around python what we 00:03:41 wanted is we wanted Mojo to be a strong 00:03:43 member of the Python family and look and 00:03:44 feel and allow transferable skill set so 00:03:46 programmers didn't have to like retach 00:03:48 themselves how to overload the plus 00:03:50 operator but we didn't like any of the 00:03:52 implementation of python and so fast 00:03:55 enough well it's it's there's I mean 00:03:59 should I be uh positive about python CU 00:04:02 I love it uh well so I love python the 00:04:05 language but the implementation of 00:04:06 cpython is both not very fast it's an 00:04:09 interpreter it uh is not even using 00:04:11 state-of-the-art interpreter techniques 00:04:13 like Justin time compilers and things 00:04:14 like this um it also has certain like 00:04:18 software dependency problems which 
00:04:19 people feel in packaging like that 00:04:23 that's a problem there's also no way 00:04:25 that python would ever run on a on a GPU 00:04:27 M right and and for for example example 00:04:30 like you've done some great work in rock 00:04:31 where you need to be able to build down 00:04:34 to effectively an embedded system like 00:04:36 you have your whole platform abstraction 00:04:38 and things like this right and so and so 00:04:41 that you have to design for that and if 00:04:43 you design for that then you can achieve 00:04:46 that you can make zero dependence you 00:04:47 can make you know 40 kilobyte hello 00:04:49 world or something like that but but 00:04:51 python can never do that right just 00:04:52 because the entire system was never 00:04:54 built for that well also there's I mean 00:04:56 something you've said before is like API 00:04:58 design is kind of the main thing it's 00:05:00 like the biggest thing and I think uh 00:05:02 this is sort of an underrated concept I 00:05:04 think people think a lot about languages 00:05:05 and a lot about like libraries and stuff 00:05:07 like that in terms of like what can they 00:05:09 do what do they let you do but I think 00:05:12 it's just as important like what don't 00:05:13 they let you do like what do they rule 00:05:15 out a thing that people ask me all the 00:05:17 time is like hey can we just compile 00:05:19 Rock to JavaScript and that sounds like 00:05:21 a really easy thing to say until you 00:05:23 think about like our standard Library 00:05:25 it's like when you ask for the length of 00:05:26 the list we give you a 64-bit unsigned 00:05:28 integer like what's that going to be in 00:05:30 JavaScript is that going to be a big int 00:05:31 are we going to convert that to like be 00:05:33 really slow if you do yeah right it's 00:05:35 going to be super slow Heap allocated 00:05:36 and then like if we convert it to float 00:05:38 which 64-bit float which is what sort of 00:05:40 JavaScript likes well now we have 00:05:42 different lengths and we have like there 00:05:43 there's a whole new set of problems 00:05:45 there well and when you get really big 00:05:46 you start dropping precision and that's 00:05:48 not also super great either right yeah 00:05:50 so there's there's a lot of potential 00:05:52 problems there something I really like 00:05:53 about Swift's API design is that when 00:05:57 you have um like a the string library 00:06:00 for example like you have this concept 00:06:02 of like we have the btes that are 00:06:04 underneath the string and they're utf8 00:06:06 and then we have uh graphine clusters or 00:06:08 extended graphine clusters and those are 00:06:11 sort of the two units like I haven't 00:06:13 seen any other standard library that 00:06:14 does that that that sort of draws that 00:06:16 distinction and says like we don't have 00:06:18 a thing called like a character in the 00:06:20 sense of like a code code point or 00:06:24 because those are not useful unless 00:06:25 you're writing like an implementation of 00:06:27 graphine clustering or something like 00:06:29 that 99% of the time yeah it's super 00:06:32 funny you mention that I mean I I 00:06:33 haven't thought about that for a long 00:06:34 time but um but the way that emerged is 00:06:36 that you know one of the benefits of 00:06:38 building a new language or a system is 00:06:39 you get to reevaluate age-old problems 00:06:42 right and so Swift is like 14 years old 00:06:44 now right and so you go back to some of 00:06:46 these 
Swift is about 14 years old now, so you go back to some of these decisions, and they were made around 2012. But even at that time it was very clear. Java, for example, and Windows had started by saying: internationalization is important, so we can't use 8-bit characters; let's use 16-bit characters. Then: okay, wait, that's not good enough, now we need 32-bit characters. Then: wait a second, 32-bit characters also don't make sense, because you have combining characters. You have the "a" and then you have the hat, and they're two different Unicode code points. And one of the things that was, and is, amazing about Apple is that they're very end-user-centric. People are building applications for normal humans, not programmers, and you want all this stuff to just work. When you highlight some text in a little window, you don't want the selection to grab the hat without the "a," or something like that. So grapheme clusters are actually the human-visible, normal, default thing people should think about. And as for the question of 8 bits, 16 bits, variable-width, this, that, and the other thing: well, UTF-8 is actually strictly better than any of the alternatives, so cool, use that. But that was really just a consequence of (a) caring, and (b) learning from the mistakes of the past.

Yeah. And from an API design perspective, what's amazing to me is that Swift's design subtracts. It's not saying: you've got characters that mean this, and characters in that sense, and characters in that other sense. It says: no, you have extended grapheme clusters, which make sense, and you have bytes, which you need. The intermediate units, if you know what you're doing, you can use, but they're not made the default vocabulary of the standard library, because working at that level is a really, really niche thing to be doing, and people trip over it all the time thinking it's the right thing to reach for. And unfortunately, as it happens, if you reach for scalars or code points or something like that, and your test cases are simple enough, you won't realize you have lingering bugs; everything will look fine. That sort of subtraction is the only way to get the programmer benefit, in my mind.

Well, and I think a lot of your point comes back to the design aspect: make the thing people naturally reach for be the safe default. You don't have to prevent people from having low-level access and bit-banging or something like that, but you want that to be something you opt into, not something you accidentally get without thinking about it.
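To make the grapheme-cluster point concrete, a small Python illustration (not from the conversation): one user-perceived character built from two code points.

```python
# One user-perceived character ("a" with a hat), built from TWO
# Unicode code points: the letter plus a combining circumflex.
# Python's str works in code points, so naive indexing can split
# the grapheme apart, exactly the lingering-bug risk described.
s = "a\u0302"
print(len(s))              # 2: code points, not "characters"
print(s[0])                # "a", and the hat is gone
print(s.encode("utf-8"))   # b'a\xcc\x82': 3 bytes underneath
```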
Right, and I think that's a big thing. Bringing that back to Mojo: we were talking about the Python ecosystem and so on, where on the one hand there's lots of really useful Python code out there, and if you can access it immediately, that gives people a really nice bootstrapped ecosystem. On the other hand... so let's talk a bit about some of the API design trade-offs you have to deal with if you want Python compatibility. I believe you've said that one of Mojo's goals is to be a superset of Python. Is that right?

Yeah, over time. You can think of it like a "Python++": the same way C++ is a superset of C, you can think of Python and Mojo as kind of co-evolving over time, but the goal of Mojo is to be a Python++. Now, smart people like you will often give me a hard time and say: okay, but you took a new keyword, therefore you're not a superset, et cetera, et cetera. Well, so did C++. C++ took a whole bunch of new keywords, and it's not like C stopped evolving just because C++ came on the scene. What happens naturally is that you get an effective superset, even though technically there are little things on the edges that have issues.

It's a superset unless you happen to have named a variable "class," which is a keyword now. That type of thing.

Exactly. Though an important distinction there is that we also have ways to solve for that, too. But sorry, keep going.

Cool, okay. I'm curious to hear about that later, maybe. One of the things that's interesting to me about the superset approach is that it raises this intriguing trade-off in my mind. You have this big ecosystem, and a lot of it is written using Python idioms. I'm not going to say they're necessarily fast or slow in Python, but they make sense in Python. On the one hand you want to access that ecosystem; on the other hand, like you said at the beginning, the big motivation for Mojo is to solve these really hard, especially performance-oriented, problems. So how do you think about using that ecosystem, but also, I'm guessing, evolving replacement parts that are written in lower-level ways?

You're a very smart guy; that's exactly what we're doing. Let me give you an example. Python has a dataclass, right? A dataclass is roughly an aggregation of different fields. In Mojo, sure, you can have classes someday, and you can have all these different things, and that's fine; we don't want to remove them. But we can also have a struct. A struct gives you an inline collection of fields, and that's what you actually want most of the time.
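A small Python illustration of the dataclass side of this (not from the conversation): dataclass instances are heap objects with reference semantics, which is the compatibility behavior described, whereas the struct alternative is described as an inline value. The `Point` type is hypothetical.

```python
from dataclasses import dataclass

# Python dataclass instances are heap objects with reference
# semantics: variables alias them. This shows only the Python side;
# the struct described above is an inline value type instead.
@dataclass
class Point:
    x: int
    y: int

a = Point(1, 2)
b = a          # b aliases the same heap object as a
b.x = 99
print(a.x)     # 99: the mutation is visible through both names
```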
So our approach is not to remove the dataclass. We'll support it; it can be compatible; it can have the same reference-semantics behavior and the things you'd expect. That's awesome, it's not a bad thing. But we can add a new thing, and it has these other advantages: it's inline, it's faster, it's more predictable, et cetera.

Let me zoom out and generalize from my past history. When we brought up Swift, we were doing exactly the same thing to Objective-C that Mojo is doing to Python. It's different in many ways, and it was much harder for Swift, because Swift has a very different syntax from Objective-C. But one of the things we were doing was bringing a new language that had, for example, enums and pattern matching into an ecosystem that didn't have them. Objective-C, just like Python, has a whole bunch of different ways to kind of work around the limitations of not having those things: massively object-oriented programming patterns and so on. So what we did with Swift was say: okay, guess what, all of you Objective-C programmers. In that case we were changing the syntax completely, because that was part of the job to be done, but we needed all the iOS SDKs to work the same way. So when Swift 1 launched, we said: all of the patterns you've already developed, all of your API knowledge, all of that comes over, and now you can build an app in Swift, or you can build one feature in Swift inside your Objective-C app. All of that knowledge comes over, but you have to learn new syntax. And Swift also has enums, for example, and a lot of Objective-C programmers went through a journey: okay, learning how to not use square brackets is kind of weird... but now I get it. Oh wow, this is easy. Oh wow, I discovered enums; this is really cool, I've never seen pattern matching before. To me, that's success. And some other people would stay writing the classical object-oriented patterns for a long time, and those also still worked. So it wasn't that we were trying to brainwash you or force you into a thing; it's that we were giving you access to pretty well-understood technology, from a PL perspective, in a way that makes your life better.

A lot of that is what we're doing with Mojo, and it's the journey I see a lot of Python programmers go through when they come to Mojo. They say: okay, right now Mojo is not a full superset; it's missing a bunch of Python's features. But it's still really useful for a bunch of different things.
And one of the goals we're building toward this fall is to make it super easy to build a Python package from Mojo. So you can say: I'm actually going to take a Python package, I'm going to upgrade part of it, and instead of it being split between Python and C, I just want to write Mojo. What you're doing is subtracting all the complexity of interoperating with C and all that different kind of stuff, while providing the same performance or better than you get from C or C++. But now it's just way easier to build one of these packages, and that's a thing where suddenly it's very valuable and useful for people. Also, if there's new PL technology available, people can learn it in place; they don't have to learn a new way to overload the plus operator, things like that.

Right. So it sounds like, just to put the Python part aside, if you look at something like TypeScript, you could say TypeScript is taking the Ship of Theseus approach to the npm package ecosystem. At first nothing is in TypeScript, then there's TypeScript here and there, and then it also affects new APIs: when version 2.0 or 3.0 of something comes out, they do it with TypeScript in mind. Eventually you end up with npm being a primarily, or maybe even completely, TypeScript ecosystem, even though it started at zero.

Yeah, the Ship of Theseus, that's what you're referring to: you rebuild the whole thing one board at a time, and you get to the same ship, but all the parts are different. That's kind of equivalent to what happened to iOS. iOS was completely Objective-C, and over time they've been rewriting big chunks of it in Swift. Ship of Theseus, I guess. But more broadly, if you look at PL communities, it's super funny you talk about JavaScript: JavaScript was not built for Node.js. It was designed for onclick handlers and things like that. Obviously JavaScript is amazing and has come a long way, but the natural thing that happens with successful language communities is that programmers want to bring their skill sets forward, and as they do this, they bring the language into adjacent domains they want to apply the technology to. That's a natural thing that happens with scaling communities, because it's a community aspect, not a technology aspect; it's a consequence of the community. And that's what Mojo is really built for. At Modular we're building Mojo because we care a lot about AI and GPU and accelerator stuff, but a lot of the folks in the Mojo community are very happy building GUI libraries,
or "I want a faster FastAPI," or whatever kind of thing. It's amazing to see that, and it's one of the things where, if you get these fundamental lower-level technologies right, you can get something that's very generally useful, and then you can scale into it. And when you're writing a web server, for example, there are a kajillion packages out there that are really, really important for you to be able to talk to. Fixing the cold-start problem, so you can just import any of them and they just work today in Mojo, is actually extremely valuable. And then if you want to Ship-of-Theseus something, you can totally do that over time.

Have you thought about, talking about that use case of developing Python packages in Mojo, doing things like optimizing, where you import a Python package as a dependency, but through static inference you can figure out that they're only using this dataclass in a way where it could have been a struct, so you just pretend they said struct and move on from there?

So, are you asking whether I've thought about that, or what I think about it?

Either. Do you plan to do it, or does it seem out of scope?

That is a very common idea. I think it's a terrible idea. No offense! So, I get asked about this all the time, and again, I'm being provocative to try to be funny: it's not a terrible idea. But I do get asked about this all the time. There are probably thirty different things in the Python ecosystem of the form "let's take Python, which is dynamic, and try to make it static and therefore faster." There's PyPy, there's Codon, there was Unladen Swallow back in the day; there are a million of these different things. And here's the challenge with this. (By the way, I identify as an old-fashioned compiler guy who's learned a lot over the years.) The challenge is that if you build that sufficiently smart compiler that will make your code fast, what actually happens is that you optimize for some benchmark. You make some marketing claim; you make it so that some combination of the source code you have today, plus the magic compiler, gives you an outcome that you like, and maybe it's a 10x speedup. The challenge with that is that now you go fix a bug in the code, it breaks the compiler's pattern, and you get a 10x slowdown.

Sure.

So what you've built is a demo: something that can look really good on a marketing slide or something like that.
And yeah, it's probably useful; PyPy, for example, provides something like a 20x speedup over Python code. But for a programmer, the problem is you have no control. Now you have to understand both your code, which you keep simple because it's all dynamic, and the compiler: where it's sufficiently smart, and where it falls off a cliff. You have to know all of that complexity in order to get the outcome you actually care about, which is "way faster." So Mojo's approach says: again, there's so much stuff in the Python community that's been about making unmodified Python go fast, and you can get 20x speedups and things like that. But Mojo is saying: let's not work forward from Python and try to make Python a little bit better. Let's work backwards from the speed of light of the hardware and unlock the full power of the hardware, which isn't just ints being fast and unboxed, but also accelerators and SIMD and all these kinds of things. And again, it depends on your workload, but we have a blog post showing Mojo can be 65,000 times faster than Python. The usual pushback on that is: well, you would never write numeric code that's doing math and dense arithmetic in Python. But that's literally the point of Mojo: to make it so that all the code you would never write in Python, you can write in a language that's coherent and consistent with Python, so you don't have to switch languages.

Right, and you can write it right next to your Python code, in the same file.

That's right, and that's a big deal. So a lot of this comes back to... anyway, I was being provocative and mean to you there, but a more neutral way of saying it is: instead of prioritizing magic ("look, we're instantly a better Python"), what we're saying is: we give you tools. We give you structs, we give you type annotations; these are very powerful things. If you want something to be an inline 64-bit value on your stack, write ": Int" on it. If you want it to be a boxed object, don't. It's fine. You choose.
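For a sense of the workload class being described (not from the conversation): a dense numeric loop in plain Python, where every iteration allocates boxed floats and dispatches dynamically. The `saxpy` name is borrowed from BLAS convention, purely for illustration.

```python
# The class of code you'd "never write" in pure Python: a dense
# numeric loop. Every iteration boxes floats and dispatches
# dynamically, which is why interpreters sit far from peak hardware.
def saxpy(a, xs, ys):
    out = []
    for x, y in zip(xs, ys):
        out.append(a * x + y)   # each op allocates a boxed float
    return out

print(saxpy(2.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [6.0, 9.0, 12.0]
```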
Something this reminds me of: that example where you write the code and it gets optimized, great, amazing performance, 10x faster, and then later on you make one slight tweak, and it de-optimizes by 10x, and now everything's terrible. That actually reminds me of optimization in general. We use LLVM for Roc, thank you for creating that, by the way; I like how fast it makes our compiled Roc programs. But at the end of the day we do have those same kinds of considerations, where in a lot of cases, as we're writing the code, we're thinking: okay, what's LLVM going to do with this? And that's somewhat behind the scenes. It seems to me there is somewhat of an inherent spectrum and trade-off along the lines of explicitness versus how much the compiler is able to help you. On the one extreme end of explicitness: hand-write your assembly, and don't optimize it at all.

Right. And as we've known since Fortran, around the 1950s, compilers can do better than that. If you think you're going to outdo a compiler's optimizer by just hand-writing everything, you're probably wrong. Human brains are good at a scope of 10 or 15 machine instructions or so, but when you get into tens of thousands, you just don't have the attention span to do that.

But then on the other end, well, not the extreme, but the other end of that: when I'm writing Rust code, for example, I'm thinking about cmovs all the time. For those who aren't familiar with cmov instructions: you have an if and an else, and there's a certain processor instruction, a conditional move, that has significantly better performance characteristics if you can get it. The way you get it is that what you do inside the if branch and what you do inside the else branch have to be very, very restricted, and if they're restricted in exactly the right way, the optimizer will emit a cmov instruction instead of a normal branching instruction, and life is good, performance-wise. (To oversimplify a huge number of things, of course.)

cmov is also really bad if you have a very predictable branch.

Yeah, that's one of the many things I'm oversimplifying here; it can be a de-optimization. But generally speaking, as I understand it, optimizers will pretty much always turn it into a cmov if they have the ability to, because in the typical case it's so much better. And that's an example of the type of thing where you don't have to go all the way down to assembly language. If you wanted to, you could have a language-level construct of "I want a cmov" versus "I want a branch," and maybe you have a linter that tells you, "hey, this could have been a cmov, are you sure?" There's this huge amount of language design space in there. I would say on Roc's side we're much more on the side of "pretend there is no optimizer": don't give you a lot of low-level knobs to tweak; we'll optimize things and hopefully do a good enough job for the type of language Roc is. I think that makes sense for us. But it sounds like Mojo is more in the space of: you have access to higher-level constructs, in the form of Python things, but you also have access to more explicit low-level things, and the expectation is that if you want to become a really expert Mojo programmer, you can. There's more and more you can learn that's not just "here's what the optimizer happens to do"; it's "I can be more and more explicit about the exact instructions I want to come out," or the MLIR, I guess, that I'm feeding to LLVM. Is that right?
Well, so, the way I would separate this out into a couple of different pieces: if you're a compiler nerd implementing an optimizer, you have a bunch of design trade-offs to make when you're doing that. I gave a talk at the CGO conference twelve years ago or so, a long time ago, about the perils of heroic compiler optimizations, and the TL;DR is this thing we're talking about: if you give a 10x performance improvement and then take it away when the code changes, that's not actually great UX. Instead, what you can do is build static analysis tools, and you can tell the programmer: hey, change your array of structs into a struct of arrays, or whatever the thing is, and now you get a 10x performance improvement (there's a sketch of that rewrite below). Then the programmer can put that in their code, and they own it; it's explicit. It's the predictability that, in my opinion, really matters. So how you design a compiler optimizer really affects this. Now, LLVM is a very useful one. It has some good things, it has some bad things; it's also over 20 years old. Glad it's useful. In the case of Mojo, we use some of it, but we've also disabled a lot of the optimizer.

Interesting.

We gave a talk this last fall at the LLVM developer meeting about how we use LLVM: the parts of it.

Oh, and, if you don't mind, just let me write down the list of the exact optimizations you've enabled, and we'll make sure to use those.

We basically use LLVM as a per-function optimizer and code generator. We disable the vectorizer; there's a bunch of stuff... anyway.

But do you do your own inlining first?

Yeah, we have our own inliner.

Okay, got it; that makes sense.

And we've reimplemented a ton of the compiler pipeline in MLIR, which is a higher-level compiler stack. One problem LLVM has, for example, is that LLVM IR cannot be optimized multi-threaded. So, have you ever been on a computer with more than one core? Maybe it's happened. Yeah, it turns out multicore is no longer "the future," so being able to parallelize your compilation is actually a pretty big deal. MLIR can do that; LLVM is not super great at that. There are ways to work around this and get it to work, but it's really important for build times. Anyway, there's a bunch of stuff that goes into this.
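A minimal sketch of the array-of-structs to struct-of-arrays rewrite mentioned above (not from the conversation; in a compiled language the win comes from memory layout, which Python lists and dicts can only gesture at):

```python
# Array-of-structs vs struct-of-arrays: the rewrite a static
# analysis tool might suggest. Layouts are illustrative only.
aos = [{"x": 1.0, "y": 2.0}, {"x": 3.0, "y": 4.0}]  # AoS: one record per point
soa = {"x": [1.0, 3.0], "y": [2.0, 4.0]}            # SoA: one array per field

# Summing one field touches a single contiguous array in the SoA
# layout, instead of striding through every record:
print(sum(rec["x"] for rec in aos))  # 4.0 (AoS)
print(sum(soa["x"]))                 # 4.0 (SoA)
```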
But where I'm going is that when you're designing an optimizer from scratch, there are a bunch of trade-offs you can choose to make. We use a huge amount of LLVM, and, for better or worse, it makes a bunch of decisions about cmovs and things like that. As a consequence of what LLVM chooses to do, we have to kind of live with it, work around it, decide to disable it, or hack on it. You're working with a useful pile of stuff that has trade-offs. Then, from that, you build your language semantics on top, and you ask: okay, does a language implementer have to know a lot about the compiler end-to-end and how all the pieces compose? Do your users? Well, that's a choice. What we've decided to do with Mojo, and the contract we want to provide to our users, is: if you're writing a high-performance matrix multiplication or something really low-level, or you're writing GPU code and want to get peak FLOPS out of your H100 and use the tensor memory accelerator and blah blah blah, okay, you're pretty hardcore, and what you want is control. If you're writing dynamic code, and you're kind of a Python programmer, and you know the 80/20 rule (most code doesn't matter for performance), or you're hacking out a script, or you're a researcher or whatever, well, you shouldn't have to care about any of that. It's fine. So the balance is really: how do you get dynamism and flexibility without cutting against control? And this is where types, for example, are pretty cool, because if you opt into types, then you can have a lot of control, and you get things like inline allocation. If you choose not to, well, you get boxing and heap and all the usual stuff. And it's super obvious: it's easy to control, it's easy to reason about, and so that's great.

Yeah. I do think about the static analysis route you mentioned, where it's like: hey, let me just tell you, inline in your code, about the optimizations you could do to make it run faster. I think that's a really interesting concept, and in my mind it's more or less useful depending on how low-level your code sort of wants to be, if that makes sense. On the one hand, to use your example: if you're writing a matrix multiply, really complicated, low-level, "I want really, really precise control," then telling me what to do gives me even more control. That's awesome. If I'm writing a web server, I probably don't want my code to look like that, because now refactoring it means I have to refactor all the low-level stuff that the static analysis told me to write it into. So it's an interesting balance. I'm generally a default-explicit person; I like explicitness, I'm not one for "let's hide this and that." But at the same time, I appreciate that explicitness does have downsides, and that seems like one of them.
Well, but pick the web server example. What you're presumably doing, and I assume this, so tell me if I'm off base, is using a framework, right?

Potentially, yeah.

Right. And if you're using a framework, then what you're doing is bringing in its threading model, its concurrency patterns, all the different ways you set up and respond to requests, and all of that. The framework provides that for you, and you don't really have so much of a choice. Now, in the case of Roc, maybe you're going all the way down and building a TCP/IP stack or something, or you're putting something down on an FPGA; there are different extremes. But in that case it's really about how to use the API.

That's a great point, yeah.

And so what these tools need to do is map to the abstraction the programmer is thinking in, and if they can do that, they can be very effective. When I complain about this, really, it's not that heroic optimizations are a bad thing; it's that they get used contextually, when you can't change the code. So, for example, for C compilers there's this benchmark suite called SPEC, and SPECint is the way people would test CPUs against each other, but also compilers against each other and things like that. Is GCC faster than LLVM? You go try SPEC. But because SPEC is an industry consortium, they want it very hardcoded: the source code is fixed. So you incentivize the compiler people, myself in a former life, to go make the compiler take the code and, without modification, make it go fast. But the observation is that if you're an app programmer, or you're building anything where performance really matters, you can change the code. And it's often way easier to change the code than it is to go build that cleverness into a compiler and do all that kind of stuff. So it's not that it's bad; it's just designed for a specific thing, and when you're designing a language, you can decide where on that spectrum you want to be and what trade-offs you want. In the case of Mojo, we have a lot of ability to choose, because we're replacing a lot of LLVM and a lot of this kind of stuff, and we're not building on the Python implementation. So we have a lot of choice there, and it gives us a lot of freedom, but not full freedom, because we are still building on parts of LLVM. And cmovs, you picked a very interesting example. cmovs are a very simple case
where LLVM will flatten sufficiently small things, just like you're saying, and it gives you almost no control over that as a language author. There's no way to express "don't do that" to LLVM, and that can be really bad for very narrow but very specific things. There are some workarounds, but it's not great.

Yeah. Well, an even bigger example is inlining. Inlining in general: anytime a language gives me an option to control inlining, like "inline always" or "inline never" or something like that, one of the things I always think about (because a question I've had for Roc is: should we introduce something like that? We don't currently have it; should we?) is that every single time I write a function, a little part of me thinks: should I try to control the inlining of this? If it's a one-line function, I'm like: it's going to get inlined anyway, it's fine. But past a certain point I'm like: do I want to make sure it's inlined, or do I want it never inlined? And then you get people who just put "always inline" on everything, and now your compile time is ridiculous and your code size is ridiculous, and they're just carrying it forward because they saw it somewhere else.

Oh yeah, that could totally happen.

But in a lot of cases I assume it's probably harmless, in the sense that the optimizer was going to inline it anyway. I've heard the joke, I think about LLVM actually, that the inlining heuristic is "yes": basically, always inline.

Yeah, I'm sure that's not quite true, obviously, but I can see how you could feel that way. So, in the space of nerdery: in a zero-cost-abstraction language, you're composing very simple things. Mojo, for example, is a zero-cost-abstraction language built on top of MLIR at the bottom. It's turtles all the way down, in terms of structs and functions and things like that, kind of similar to Rust or C++. And at the very bottom, in our case, Int, for example, is not built into the language. Int is just a struct.

Okay, so Int is a struct, Tuple is a struct, Array, obviously, is like a struct, and things like that.

Yeah. Int is not built into the Mojo language at all. There's nothing there.

Oh, I see. So it's like a struct of bytes, or bits even? What's inside of it?

What's inside of it is an MLIR magic thingy. What Mojo is, is syntactic sugar for MLIR at the bottom of that stack.

I see. So MLIR will see that this is a one-field struct, and MLIR will recognize that by "struct" you mean a 32-bit integer?
Well, so, there's an MLIR magic hook that gives you direct access to anything in MLIR. So what Int is, is a struct, but it also implements all the operators you'd expect an integer to have, like plus, and it implements them in terms of all the MLIR magic. It's just like how, in a C-ish language, you'd have a builtin popcount or something like that. It's way fancier than that, but it's the same kind of idea. And you don't want to sprinkle builtin popcount calls through all of your code; what you'd do is write some BigInt class or something, and it has a popcount method (see the sketch below). Same kind of idea. But anyway, in terms of inlining, there's a small bit of nerdery you might want to think about for Roc: if you're at the very bottommost level of the stack, you don't want people to see these abstractions. Plus on Int, you don't want to step into that in a debugger. So a thing you might want to look at is not just "always inline," but "always inline and throw away the debug info." LLVM can do that, and this is something where, that way, you don't step into plus on your integer, because that's not something anybody actually wants to do. It can make for better UX, as well as making sure plus actually gets inlined.

Right, yeah, we've thought about that. We call them builtins, our standard library essentially, just because it's baked into the compiler, and we have them implemented in a mix of Roc and lower-level magic things that only the compiler has access to. If you look at some of the builtin files, sometimes there's just a function with a type annotation and no implementation, and that's because the compiler just magically inserts one. For things like that we certainly have to say: yeah, no debug info past this point, stop here. But then there are other ones where we've thought about maybe not doing debug info on them just because we want to reserve the right to change the implementation. We kind of want it to appear like a black box: look, don't assume that this is what's actually happening just because it happens to be right now; someday it might become one of those internal-only things.

Yeah, that's right. Well, it's a fun thing that you get to design and control a lot of these different things, and there isn't one right answer, so you can iterate.
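A hedged sketch of the "wrap the intrinsic in a method" idea (not from the conversation, and not Mojo or Roc code): Python's `int.bit_count()`, available in 3.10+, stands in for a builtin popcount, and the `UInt64` wrapper type is hypothetical.

```python
# Wrap the low-level "intrinsic" in a type's method rather than
# sprinkling a builtin everywhere. int.bit_count() (Python 3.10+)
# plays the intrinsic's role; UInt64 is a hypothetical wrapper.
class UInt64:
    __slots__ = ("value",)
    MASK = (1 << 64) - 1

    def __init__(self, value: int):
        self.value = value & self.MASK

    def __add__(self, other: "UInt64") -> "UInt64":
        return UInt64(self.value + other.value)  # wraps modulo 2**64

    def popcount(self) -> int:
        return self.value.bit_count()  # delegate to the "intrinsic"

print(UInt64(0b1011).popcount())                 # 3
print((UInt64(2**64 - 1) + UInt64(1)).value)     # 0: wraps like hardware
```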
So, speaking of numbers: there are two aspects of Mojo's number system I'm really curious about. One is that in the previous talk you gave about Mojo, you sort of hinted at hardware-accelerated complex numbers, which I'm kind of curious about. And the other thing is that you made reference to the fact that numbers are sort of all vectorized by default, in some sense, and the scalars are sort of a special case of that. Like, an 8-bit integer is a one-lane 8-bit vector, something like that. So, yeah, I'd just love to know more about those things.

Yeah. So let me frame this by explaining something that, for example, C++ programmers will have seen. In C++, again, a language I've literally written millions of lines of code in, so I am allowed to have an opinion on this, it suffers from a lot of history. In C++, for example, int and float are built into the language; they have magic promotion rules; there's a whole bunch of really weird stuff that happens like this. SIMD still isn't really first-class: there are various extensions, for example Clang and GCC both support vector extensions, but it's not really first-class. C++ also has a fairly horrible, though getting better, metaprogramming system around constexpr and templates and things like that. Okay, so, to frame this: Mojo solves all of these problems. First, metaprogramming. Mojo learns from languages like Zig and says: let's not use a different language for metaprogramming than we use for programming. Don't say you write "void foo(...)" to write a function, but "template <...>" to write a metaprogram. Instead, just write a function, and now you can use that function both at comp time and at runtime. So Mojo does that, and it unifies a huge amount of stuff that often hangs out in macro land or preprocessor land or template land into one very nice thing. The second thing we do is say: all of the numbers and values and things like that are defined in the standard library. Mojo's standard library is open source; you can go check out struct Int and Float and things like that, and see all the MLIR low-level stuff inside. As a consequence of that, we can choose how we want to implement this. So when we implement SIMD: we have a SIMD type, and SIMD takes an element type (are you a float32 or a float64, or a bfloat16, or a float8, or a float4? There's lots of complexity these days), but then also, as you say, a width: how big is the SIMD vector? Do you have four elements, or eight, or 64, or whatever? So having SIMD as a struct in Mojo, using parametric metaprogramming, now means that the lane dimension, how wide it is, is parameterized at comp time, and the element type, the dtype, is parameterized at comp time. And it has very nice methods for swizzling and all the things you'd want; it's all very natural, it all feels very first-class. And, oh, by the way, all processors have had SIMD since the late '90s, so why no language has embraced modern computers is unclear to me.
But what that means, then: think about this from a library ecosystem perspective. If you're a numeric programmer and you want to use SIMD, well, sure, you want to do plus and minus and things like that, but you also want to do things like sine and cosine. As a C++ programmer, it's like: cool, I have sin and cos, they come from libm. Where do I get a 4-wide float sine? Oh, now you're in the mode where there are 497 different libraries, none of which are compatible, and now there's this fragmentation at the bottom of the stack. And now the stuff built on top of sine and cosine, like an AI framework, or a ray tracer, or whatever, gets built on top of different libraries, and now the whole software ecosystem is not unified, because SIMD at the very bottom is a mess. So what Mojo does is a couple of things. One is: have a comp-time metaprogramming system. Second is: put all these core types in the library, and use the comp time. But then, what you're referring to: we go one step further, and we say our scalar types are just a special case of the SIMD types. So if you go look up Float32, for example, Float32 is just a type alias for SIMD of float32, comma, 1. And what that means is that we can now write things like sine and cosine parametrically across the length of the vector. So it's just a parameter you pass in, and now we have one implementation of sine that works on vectors, and it also works on scalars. And what you're incentivizing is this: people writing numeric algorithms are generally already thinking about genericity, because they care about both float32 and float64. So what you do is you say: okay, if it's an element-wise op, it's trivial to make it work for any length. And now your whole ecosystem at the bottom becomes simpler, and this just defines away a ton of complexity for the software library ecosystem.

So does that mean you can have vector operations defined for scalars? For example, can you swizzle an 8-bit integer?

Yes, but the index space is only zero, so you can swizzle zero with itself.

Yes, okay, yeah, that makes sense.
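A hedged Python sketch of the design idea only, not Mojo's actual API (not from the conversation): a SIMD value parameterized by element type and lane count, a single element-wise `sin` that works for any lane count, and a scalar defined as the one-lane special case.

```python
import math

# Illustrative only: a SIMD value parameterized by element type and
# lane count, with scalars as the one-lane special case.
class SIMD:
    def __init__(self, dtype, lanes, values):
        assert len(values) == lanes
        self.dtype, self.lanes = dtype, lanes
        self.values = [dtype(v) for v in values]

    def __add__(self, other):
        return SIMD(self.dtype, self.lanes,
                    [a + b for a, b in zip(self.values, other.values)])

def sin(v):
    # One generic, element-wise definition covers every lane count,
    # scalars included: the sine/cosine point made above.
    return SIMD(v.dtype, v.lanes, [math.sin(x) for x in v.values])

def Float32(x):
    return SIMD(float, 1, [x])   # scalar = one-lane vector

print(sin(Float32(0.5)).values)                    # one lane
print(sin(SIMD(float, 4, [0, 1, 2, 3])).values)    # four lanes at once
```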
So, one of the other things I've thought about: I haven't written that much SIMD code, certainly compared to you, and compared to plenty of other people, but one of the things I have noticed is that often, if I'm writing a SIMD algorithm that's going to do something fairly complicated or fairly tricky, and then I implement it for Neon versus for x64, the two versions end up using, in some cases, pretty fundamentally different instructions that have different APIs. So is that something you think about? Do you expose different intrinsics for different architectures, or do you try to unify those and patch over the differences?

Yeah, great question. This is a really important question, and you're also touching on complex numbers and this whole space of accelerating block operations: a lot of processors these days have little matrix-multiplication operations, or you have tensor cores on GPUs, and things like that. Our approach on that is multi-layered. One is: expose the full power of the hardware. So if you want to use a Neon thing, or AltiVec, or VIS on SPARC, or MMX on Intel, whatever, you can totally go nuts and do whatever that chip wants, at the very low level, and then it's specific to that chip.

So you have target-specific compilation; you're like, "I'm building this Mojo code just for Intel"?

Yep. And on top of that, at parameter time, at comp time, you can say: if I'm on this target, use this code path (there's a sketch of the idea below). So think of it as ifs, but way cooler. And now, if you do that, it allows library developers to build higher-level abstractions. Let's do a blend, which is a generally understood thing, or let's do a linear interpolation; you can define these higher-level library abstractions, and then what that allows you to do is compose one into the other. So if I'm building a matrix multiplication, for example, you can implement it in terms of scalar multiplies and adds at the very bottommost level of the stack, but the API you give out to your users at the top level is: go apply matmul to a big block of memory, an array or a tensor. That whole algorithm has recursive decomposition and a whole bunch of complexity within it, and what you allow is for different layers within that implementation to fork off to hardware-specific implementations. This way, you let library developers build the hardware abstraction, instead of having the compiler do it. And one of the major things Mojo does in general is force complexity out of the compiler and into the libraries. The rationale for this is that very few people can even work on compilers (as you've noticed, it's kind of a different art), but lots of people can build libraries. And particularly if you make it super easy to build these things, if you make the language super consistent and simple by making everything be a library, then what you're doing is encouraging very high-powered, highly expressive libraries to be built, and then you let the ecosystem figure out the abstractions that make sense for it.
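A run-time Python stand-in for the parameter-time dispatch just described (not from the conversation; Mojo is described as resolving the target check at comp time, and the function names here are hypothetical):

```python
import platform

# "If I'm on this target, use this code path," approximated at run
# time. A comp-time system would resolve the check during compilation.
def _blend_generic(a, b, t):
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def _blend_x86(a, b, t):
    # Imagine a hand-tuned SSE/AVX implementation here.
    return _blend_generic(a, b, t)

blend = _blend_x86 if platform.machine() in ("x86_64", "AMD64") else _blend_generic
print(blend([0.0, 0.0], [1.0, 2.0], 0.5))  # [0.5, 1.0]
```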
Even subdomains, graphics versus ray tracing, say, can have different things they care about, and different abstractions that make sense for them.

Yeah, yeah, that makes a lot of sense. I've thought about it as almost like a funnel. You have application developers, and there's a huge number of application developers out there. Then a subset of application developers are also library developers. (I guess theoretically you could have someone who's exclusively a library developer; that's really hard to find.) Then within that, you get a much smaller percentage who are framework authors, and then below that you have compiler authors, as the sliver among all application authors. Yeah, it definitely gets increasingly niche as you go further and further down those layers.

And a lot of my quest with Mojo is about... so, I've built a lot of compilers, I know a lot of compiler engineers across my career, and I love them all; they're very special. But that pyramid, or funnel, or however you look at it, is really interesting, because compiler engineers are special and beautiful and wonderful, and I love them in their own way, but they also don't know a lot about the domain problems. Right? It's not that compiler engineers are smarter or better-looking than the others... I mean... I mean... yeah. But what you get, and I think about this as an ecosystem of talent: when you're programming a GPU, there's this whole range of, recently, all these special compilers for ML, compilers for machine learning, and a lot of it is "let's go put these algorithms into a compiler." But the compiler engineers don't know the best way to get the full power of the GPU. They don't know all the numeric tricks; they don't know how to cheat in the case where the low-precision whatever-it-is applies, or the fancy swizzle the GPU does internally, and stuff like that. Making that happen in a very general way is very difficult. Whereas a numeric kernel programmer knows GPUs really well but doesn't know compilers. They know all the tricks; they know how the profiler works; sometimes they designed the hardware themselves. So I look at it as: my quest with Mojo is really about unlocking those people and giving them superpowers, rather than the trend I've seen over the last five or ten years of "put the sophistication into the compiler, lock it up, throw away the key, and trust us, us compiler people have got it." Because what I've seen is that that's not actually really true. Instead, what I see is that compiler folk, library folk, numeric-kernel folk, application-level and domain people, they all come at the problem and they bring different
skill sets to bear. A compiler is rarely going to give you a 10x improvement, but somebody working at the application domain totally can, because they know the application and can use the right tool for the job.

Yeah, and that seems to be especially true when it comes to SIMD: auto-vectorization historically has had a really low success rate in terms of approaching what a human programmer can do.

Let me make fun of us as an industry again. I hope you know that I love compiler people, I love PL people, but let me make fun of us as a category. It was the late '90s, early 2000s, at least for those of us working on PCs and things like that, when it was: oh, multicore is coming, multicore is the future, parallel computing is going to be here, we need to have these fancy abstractions so we can do parallel compute. A lot of people talked about it; there were things like Cilk, obviously OpenMP and MPI, and things like that. Fast-forward to today: people are still using pthreads, and the other models are still obscure. I mean, all this stuff really hasn't penetrated, and so much code is still single-threaded. LLVM, for example, but also a lot of other code. The progress we've made has been mostly in, well, there's actually a dual world: there's the really structured HPC-style compute, and then there are the actor models and things like that, which are very good at decentralized asynchronous compute. And it's super funny when you look at that: auto-parallelizing compilers didn't solve the problem. What we did is we built higher-level things, and then we allowed experts to be able to express them. That's where progress came from, not Cilk or magic solutions like that.

Well, and I think, I mean, this is where a lot of people listening are thinking, "this is why functional programming will ride to the rescue." Because there was also, I think around the '90s, the paper that always comes to mind is John Hughes' "Why Functional Programming Matters," and he talks about Moore's law coming to an end, and more cores being the future, and therefore we need to be able to write pure functions that can be distributed across cores without data races and so on and so forth. A classic example people give of this is quicksort. And when they say quicksort, they don't mean the classic quicksort, the "here's a flat array, and we're going to go through and swap elements in the array back and forth" version. What ends up happening in the Haskell implementation of quote-unquote quicksort is that it's doing something like five allocations per iteration, and then that gets potentially worse, depending on how the data is arranged. It's a totally different algorithm. But it's three lines of code, and it looks very elegant.
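For reference (not from the conversation), the two algorithms being contrasted, both rendered in Python: the textbook in-place quicksort, and the elegant-but-allocating copying version that the famous three-liner corresponds to.

```python
# The two "quicksorts" being contrasted: textbook in-place swapping
# versus the elegant functional-style version that copies.
def quicksort_in_place(xs, lo=0, hi=None):
    if hi is None:
        hi = len(xs) - 1
    if lo >= hi:
        return
    pivot, i = xs[hi], lo
    for j in range(lo, hi):          # Lomuto partition: swap in place
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]
    quicksort_in_place(xs, lo, i - 1)
    quicksort_in_place(xs, i + 1, hi)

def quicksort_copying(xs):
    # The "three-liner": pretty, but it allocates fresh lists at
    # every level instead of swapping elements within one array.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    return (quicksort_copying([x for x in rest if x < pivot]) + [pivot]
            + quicksort_copying([x for x in rest if x >= pivot]))
```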
[00:51:34] And it looks much nicer than quicksort, which perhaps is because it's not quicksort. I just look at that and I'm very frustrated, because when I see people talk about this, it's: oh, look, let me show you this beautiful graph; you can take this algorithm and, unlike traditional quicksort, run it across N cores, and it just keeps getting faster and faster, it's amazing. And I'm like: well, how does it compare to actual regular quicksort if you run them side by side? I already know the answer to that: it's going to get absolutely obliterated. If you put actual quicksort on the graph, it's going to be one pixel at the bottom, because this is so much slower, no matter how many cores you throw at it. That sort of thing is, to me, where there's a bit of a mismatch in the functional programming community between what the observable benefits of a functional programming style are (maximizing pure functions and things like that) and the theoretical benefits, which I think are a lot harder to achieve in practice than people admit. There's a lot of hand-waving going on, I guess is what I'm trying to say, and I think it's unfair; it's not reasonable.

[00:52:37] Well, how can I react to that? Do you want me to...

You can push back and say, "Richard, you're totally wrong about that," if that's what you believe. But I have a suspicion you probably agree already.

[00:52:51] So what I would say is that the definition of functional programming is extremely controversial.

True.

Scala programmers think they're functional programmers even though they have massive reference semantics and mutable state all over the place, and it's more of a branding than it is a reality. And if you talk to a Haskell programmer, they'd say OCaml is not a functional programming language because it's strict, so that's not functional. So the religious wars can be fought over that; those are big things that I don't care about. I'll tell you what I do care about: I love functional programming because it defines away spooky action at a distance. And if I were to pick camps, in terms of a reasonable language I would like to use to build things that's on the functional spectrum, I would prefer OCaml over Haskell, just because OCaml is strict; it fits better, gives more control, things like this. And OCaml achieves its lack of spooky action at a distance mostly because of things that really have nothing to do with strictness. But now the challenge is, if you take it all the way to its extreme, and you say, okay, I'm not going to allow mutation (some people define functional programming as no mutation),
then one of the challenges you get into is that, as you say, quicksorting an array means you cannot do in-place mutation of the array, so every change to every element ends up being an O(n) operation, because you have to reallocate the n elements, or the n buckets. You're destroying algorithmic complexity for the sake of purity. And this is where, hey, I heard there's this cool language called Roc, but there are also other languages, like Swift and things like this, that give you the benefit of functional programming but allow in-place mutation, and that's way better for the actual machine.

[00:54:48] What Swift calls this is value semantics: look, you can define first-class values, and first-class values define away spooky action at a distance. You can have an array of arrays of dictionaries of strings of whatever, and it's purely value-semantic, just like it would be in a functional programming language. But because of behavior in the language and in the library and how it's implemented, you get in-place mutation. You get the benefit of the functional programming style without the curse of having to reallocate the entire tree just because you want to change one of the leaves.

[00:55:18] Yeah. And as far as I know, Roc is unique among functional languages that are not just research languages (there are research languages that have done this, but we're trying to actually be a commercially used language) in that there is no language-level, user-facing concept of mutation. It's purely an optimization. The really simple example of this is: if the reference count is one, we're not cloning; we're just going to use the one you've got, in place. We do automatic reference counting, but then we also have static analysis that says you don't even need to check the reference count at run time; we just know. I gave a talk (I don't know if you've seen it) at Strange Loop in 2021, called "Outperforming Imperative with Pure Functional Languages" or something like that, and the punchline of the talk (I'm going to spoil the ending for those who haven't seen it) is that we do a quicksort benchmark that's actual quicksort, with C++, Roc, Go, Java, and JavaScript. We didn't put Swift in there; I don't know how Swift would have done. The Roc implementation looks much worse than the others, because you have to do recursion and it's relying on tail-call optimization and stuff like that, but at the end, with all of Roc's optimizations, we were second only to C++. C++ was faster, no surprise there, but we actually managed to be faster than Go. Not a lot, but slightly faster than Go.
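Swift exposes that same uniqueness check to library authors, which makes the pattern easy to sketch. A hedged, minimal copy-on-write illustration (not how either standard library is actually implemented):

```swift
// Backing storage is a class, so it is reference counted.
final class Storage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

// The public type is a struct with value semantics.
struct Vec {
    private var storage: Storage
    init(_ values: [Int]) { storage = Storage(values) }

    subscript(i: Int) -> Int {
        get { storage.values[i] }
        set {
            // The "refcount == 1" check: copy only if the buffer is shared.
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(storage.values)
            }
            storage.values[i] = newValue   // now safe to mutate in place
        }
    }
}

var a = Vec([1, 2, 3])
var b = a     // no copy yet; both values share one Storage
b[0] = 99     // storage is shared, so b copies, then mutates; a is untouched
```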
[00:56:37] Which, of course, all the other ones were doing with explicit, direct in-place mutation. And, by the way, that's not going to be parallelizable in the same way, for free, that the Haskell one is.

Fair enough, but it's not trying to be. That's the point, in my mind.

But to your point, you're trading off mutation for the ability to use multiple cores.

[00:56:59] Yeah. Well, but I think there is a relevant distinction there, in that one of the things I was asked as recently as earlier this week, by somebody on Roc's Zulip, was: isn't Rust essentially a functional programming language? And having spent a lot of time using Rust, it doesn't feel like a functional programming language to me in any way. But again, this gets back to arguing definitions of what functional programming means.

[00:57:25] Right. So with Roc, I'm a huge fan, by the way. I haven't had a chance to use it, but I'm a huge fan of your work; I think you're working on very fundamental things, and I love that.

Thank you.

I'm not aware of other languages that have done that. Swift's approach on that was to say mutation is explicit, but when you do something that would require a copy, it works the same way, with refcount equal to one, basically: don't make the copy; if the refcount is greater than one, make the copy. So it's a slightly different bite at the same idea: it's not written in a functional style, it's written in an imperative style, but it gives you functional behavior. So it's a different trade-off, and I agree with you, it's a very different thing. But one of the benefits of this is that it enables you to beat C++ at things like this, and it allows you to parallelize things, because you can now say: hey, give me a view on these elements. One of the problems with any of these languages, if you compare against C++, is that for just indexing into an array, reasonable languages want to bounds-check that index. So one of the things Swift can do is let you say: okay, give me a view on this array, an unsafe view. I'm going to write generic code, a generic algorithm, that works on any sequence or any collection, and you can apply your generic algorithm either to the array or to an unsafe view on the same array. So then, if you decide you care about the performance of bounds checking for some reason, with one line of code you can swap from the safe thing to the unsafe thing, and you don't have to rewrite the entire algorithm, if it was written in a generic way.
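A small Swift sketch of that idea (hedged: the algorithm is deliberately trivial, and `withUnsafeBufferPointer` is the standard-library way to get such a view): one generic function, two ways to run it.

```swift
// One generic algorithm, written against Sequence.
func sum<S: Sequence>(_ xs: S) -> Int where S.Element == Int {
    var total = 0
    for x in xs { total += x }
    return total
}

let numbers = Array(1...1_000)

// Safe path: the ordinary array.
let a = sum(numbers)

// One-line swap to an unsafe view over the same storage,
// where element accesses are not bounds-checked.
let b = numbers.withUnsafeBufferPointer { view in sum(view) }
```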
[00:59:08] Our version of that (because this is kind of the style in functional languages anyway) is to have standard library operations that use unsafe behind the scenes but provide only a safe API on top. A great example of this is map. With List.map you say: I want to transform every element in this. Under the hood we're doing a for loop with unsafe pointer bumps, but you never see that, and you don't pay for the bounds check either.

[00:59:30] I mean, Swift and Mojo do the same thing. In our case we don't put it in the language, we don't have special built-ins; we just put it in the library.
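The map example might look like the following sketch in Swift (the name `mapFast` and the whole implementation are hypothetical; a real standard library is far more careful): an unchecked pointer walk on the inside, a perfectly safe signature on the outside.

```swift
extension Array {
    // Hypothetical: a map whose loop bumps a raw pointer instead of
    // doing indexed, bounds-checked element accesses.
    func mapFast<T>(_ transform: (Element) -> T) -> [T] {
        withUnsafeBufferPointer { (buf) -> [T] in
            guard var p = buf.baseAddress else { return [] }
            var out: [T] = []
            out.reserveCapacity(buf.count)
            for _ in 0..<buf.count {
                out.append(transform(p.pointee))   // unchecked read
                p += 1                             // pointer bump
            }
            return out
        }
    }
}

let doubled = [1, 2, 3].mapFast { $0 * 2 }   // [2, 4, 6]
```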
[00:59:38] Rust is a very interesting community and culture, because a lot of people don't like unsafe code, but Rust has a very weird, UB-filled definition of what unsafe means.

I know. There's a very sharp tool being held there.

But in Swift, the observation is: Array, which is built into the library, not into the compiler, is implemented with effectively malloc and free and unsafe-pointer stuff on the inside, but the API to Array is safe, even though the implementation is unsafe. So a lot of the idea, just like you're talking about, is that you use unsafe constructs to build a safe API; it's an API design problem. But again, with Rust it's super interesting how they have this culture and theory of "let's get unsafe out of the packages entirely," and that's just a very different cultural position.

[01:00:33] It is, but I actually see it as a cultural dysfunction of the Rust community, and I can say that because I spend a huge amount of time writing Rust code; I do it in my day job at Zed. I always imagine the Drake meme, where he's looking at one thing like "oh no" and the other like "oh yeah, that's great." When he looks at the thing and says "oh no," it's the unsafe keyword being used in my code base. The "oh yeah, this is great" is literally cutting that code out of my code base, putting it into a package, and publishing the package. Now it's great, as long as it's behind a separately published package somewhere else.

[01:01:09] Well, I agree with you. I wouldn't say it that strongly, but what I would say is that different programmers balance religion versus pragmatism differently. If you want to call into an existing C library, it's going to be unsafe, because God knows what the C code is doing. And there's a lot of really useful code out there; it's dusty-deck code, and you don't want to have to change it, so it's very pragmatic to build on top of unsafe code. Malloc is one example: it's unsafe. Or free; malloc may be safe, but free... But that doesn't mean malloc and free are bad. What it means is that you don't want to expose them. Like you were talking about earlier: the first thing you reach for should be a type-safe array that is memory-safe, but that array can be built out of unsafe components. In the case of Rust, they hard-coded the array into the language, which they had to because of some of the language design decisions they made, I guess, or maybe it was an accident of history; I can't speak to that. But in the case of Mojo, we said: no, arrays are not special. (They're called List, actually, because of the Python DNA.) And actually, there are lots of kinds of arrays: there are inline arrays, fixed-size arrays, heap-allocated thingies. We don't want just one kind of array; there's a big world out there, and we want them all to be first-class, or to feel first-class, to library developers. So hard-coding any one of them into the library or into the language would be a little bit weird.
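A toy Swift version of "a safe array built out of unsafe components" (hedged: `SafeBuffer` is made up for illustration, and a real array type would add growth, copy-on-write, and generic elements): raw allocation on the inside, a bounds-checked API on the outside.

```swift
// A class, so that deinit can release the raw allocation.
final class SafeBuffer {
    private let base: UnsafeMutablePointer<Int>
    let count: Int

    init(repeating value: Int, count: Int) {
        self.count = count
        base = .allocate(capacity: count)        // the "malloc"
        base.initialize(repeating: value, count: count)
    }

    subscript(i: Int) -> Int {
        get {
            precondition(i >= 0 && i < count, "index out of range")
            return base[i]                       // unchecked underneath
        }
        set {
            precondition(i >= 0 && i < count, "index out of range")
            base[i] = newValue
        }
    }

    deinit {
        base.deinitialize(count: count)
        base.deallocate()                        // the "free"
    }
}

let buf = SafeBuffer(repeating: 0, count: 4)
buf[2] = 7        // fine
// buf[9] = 1     // would trap on the precondition, not corrupt memory
```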
[01:02:42] Yeah. There's this whole spectrum of concepts around error handling, and it's sort of weird to me that the way we think about error handling and performance in particular has some strange trade-offs. Some examples. If I'm talking about scalar integer addition, that can overflow, and if it overflows it wraps around by default at the CPU level, and that can cause all sorts of horrendous bugs. I was asking people about overflow bugs they've seen, and somebody talked about a router or something having an overflow bug that resulted in just total disaster. But at the same time, for reference, Swift traps on overflow.

Yeah, which totally makes sense. Roc does the same thing.

[01:03:26] Now, what Rust does in that space is say: in debug builds we trap on overflow, but in release builds we don't. And I believe that's what C++ does. I don't know what Mojo does, but I thought about that design decision, we talked about it, and the thing I don't like about it is that this is exactly the type of bug that is unlikely to be hit by any of your tests. If you don't do the check at run time, you're most likely going to see it after you've deployed to production and somebody hits it with a really weird data set; that's when it's going to hit the overflow. So it feels like you might as well just acknowledge that you're going to have those bugs, but get better performance. Actually, I don't even know: in Rust, in a release build, is it UB when you do that, or does it wrap?

[01:04:11] I don't know what Rust does. I will tell you the pit of hell that is C++, I'll tell you what we did in Swift, and I'll tell you what we did in Mojo, and they're all different, and I'll tell you why, because I think this is illustrative of language design. It's a simple and accessible thing, but these decisions really matter, and there isn't one right answer. So, in C++ (please don't do this if you're out there building a language): if you have an unsigned number, it is defined to wrap with two's complement on overflow. This is not actually literally true by the language spec, as far as I know, because there are theoretically ones'-complement machines and stuff, but in practice GCC and Clang and all reasonable C compilers will guarantee two's-complement overflow. And then it is UB if you overflow a signed integer. What this allows is for a C compiler to say: if you take an int and you multiply it by two and then divide it by two, it can just delete that, and say i * 2 / 2 is equal to i, because if it would have overflowed, it would have been UB.

I've heard about this. That is crazy.

[01:05:29] And as far as I can tell, it comes down to a practical reason that C compilers need to be able to cheat on this: on 64-bit targets, lots of people use int, which is 32 bits, as the induction variable in a for loop, and without that latitude you get all these sign extensions from 32-bit to 64-bit, and they tank performance on loop algorithms. So there's a reason to do it. The problem that C and C++ have is that the standard integer type is not size_t. In Swift, Int and UInt are size_t-sized. We cared a lot about correctness (I think Swift massively jumped the shark on correctness; it tried way too hard, but that's a different topic), so it traps if you overflow either an unsigned or a signed int, and then it has specific operators to do two's-complement math: you use + to get trapping-on-overflow math and &+ to get two's-complement math, so you can opt into it. That's the way it went. Now, as you're saying, the consequence of this is that all code everywhere does the checks (there are crazy flags that very few people use to turn this stuff off, but roughly nobody does), so very few people hit these errors, and when they do, it's not clear they're going to hit them in debug; they're going to hit them mostly in production. And even then, what does it mean to trap? Is that actually great? It's unclear.
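The Swift behavior just described, sketched in a few lines (the commented-out lines would trap at run time if enabled):

```swift
let x: UInt8 = 255
// let y = x + 1      // default +: overflow is checked, so this traps
let y = x &+ 1        // &+ explicitly requests two's-complement wrap: 0

let big = Int.max
// let boom = big + 1 // signed overflow traps too
let wrap = big &+ 1   // wraps to Int.min, by explicit request
```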
[01:07:00] Now, when we got to Mojo, there's actually a bigger problem, which comes back to SIMD: integers in SIMD registers, int8 for example. The whole reason you're doing SIMD math is that you care about performance; that's actually pretty important. So you can't say that every time you do plus on a SIMD vector of int8, you're going to do an overflow check.

Yep. Not going to happen.

That completely destroys the entire purpose of doing this, and we want consistency between scalars and vectors.

[01:07:35] I don't think SIMD even offers a single-instruction way to say "did any of these overflow," at least not that I could find.

It can be done, maybe not in a single cycle. There's a design point where you could choose to enforce overflow checking at extreme cost, but I agree with you, it would be ridiculous. And roughly all SIMD programmers everywhere are used to thinking in two's complement anyway, so it's just a mess. So we had this choice: do we want different behavior for a scalar of width one, or do we want it consistent with SIMD? And like we were talking about before, we decided to make it consistent. We could have (even within our design, because we have fancy parametric metaprogramming stuff) said that a scalar of width one has special behavior, but then you have inconsistency in your software ecosystem. So what we decided to do is: everything is two's complement. Super consistent, it's fast, and running in debug mode isn't different from release mode. You're not going to be able to catch these classes of errors, but guess what, you can't catch all classes of errors anyway.

[01:08:42] I think that's a totally reasonable design, and I could imagine a world in which that's what we go with in Roc. It would be a world specifically where we've decided SIMD matters much more; right now we basically don't have any SIMD support, and that might turn out not to be acceptable someday. It might just be the way the world's going: the compiler is not smart enough to use these instructions for you, and even if it were, you still couldn't have the overflow checking and get any reasonable performance; it would be a deoptimization.

And with Swift, that's the thing: an auto-vectorizer can't vectorize code that is doing overflow checks.

Yeah, which makes total sense. Another thing I think about a lot is: what are the errors people are actually cognizant of and thinking about handling? For example, something we experimented with early on was: let's try to make floating-point division return an error type, and you have to check it every time, because, hey, division by zero can cause all these problems; we can define away all those nasty edge cases around not-a-number not being equal to itself and so forth at the hardware level.
[01:09:53] But what we found very quickly is that what that means in practice is that whenever anyone does division, they immediately say "with default zero": I don't want to worry about it; if it would have caused a problem, just make it zero and ignore it. In my mind that's a worse outcome, because managing the error explicitly is just not worth it; nobody wants to do that. So in my mind it's better to say: look, there are these usual floating-point error cases. Now, in our case we do offer a fixed-point decimal in the standard library that you can use instead, so in a lot of use cases that's going to be better. But if you actually are doing floating-point math, the point is: here there be dragons. You need to actually be aware of this and think about these edge cases. And in some cases I think the best API is to not try to hide it, and just say (well, maybe you say it's UB, or maybe you say it always wraps, but the point is you say it): this is an edge case you need to be aware of and thinking about whenever you're doing this type of arithmetic.

[01:10:54] Yeah. Well, I think it really comes down to what the goal of your language is. You don't have to use IEEE floating point; to your point, you could use rational numbers, or decimals, or lots of other things. You could invent your own floating-point format; you could define away NaN somehow. The reason you use IEEE is that you get hardware acceleration; soft-float is not fast. And what I'm saying is: Mojo cares about SIMD. Why? Because effectively every CPU has a 4-wide float unit on it, so if you're doing float arithmetic and you're not using SIMD, you're at one quarter of the utilization of the machine. Sure, maybe the vectorizer will pick it up for you in some cases, but 4x is a pretty big deal in terms of power consumption, latency, everything. So being native to that, not forcing people to use it but making it easy, is pretty valuable. Similarly with threads: make parallelism easy, and then people use it. Now, you and I want practically useful, in-production kinds of languages, and that's where we're biased, but there's also a huge space for research languages, and I love to see the "intended for research, don't actually care about performance" exploration of this stuff. And even more so, what I think is cool about Mojo is that you don't have to have a different language to explore these things. You can just define your own float type.
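For instance, here is a sketch of that "numbers are just ordinary structs" idea in Swift, with several division policies side by side. `MyFloat`, `divOrZero`, and `divWithError` are invented names for illustration; they track the options discussed next: plain IEEE behavior, defaulting to zero, or surfacing the error explicitly.

```swift
struct MyFloat {
    var value: Double

    enum DivError: Error { case divisionByZero }

    // Policy 1: plain IEEE-style division, dragons included.
    static func / (a: MyFloat, b: MyFloat) -> MyFloat {
        MyFloat(value: a.value / b.value)
    }

    // Policy 2: the "with default zero" behavior people reach for.
    func divOrZero(_ b: MyFloat) -> MyFloat {
        b.value == 0 ? MyFloat(value: 0) : self / b
    }

    // Policy 3: make the caller handle division by zero explicitly.
    func divWithError(_ b: MyFloat) throws -> MyFloat {
        guard b.value != 0 else { throw DivError.divisionByZero }
        return self / b
    }
}
```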
[01:12:30] Right. And in the case of what you were talking about (should divide return zero, should it clamp, should it throw an error, whatever), you can just define the divide operator to have behavior X, and then have a method that has behavior Y. Because floats are just structs, you can add methods to them; it's fine. You can say "div with error," whatever, so you can enable people to opt into the behavior they want.

[01:12:59] Yeah, that makes a lot of sense. Speaking of the relationship between research and industrial languages and stuff like that: we were talking a little bit before we started recording, and I mentioned this, and I guess you hadn't heard of it, but I think it was Hacker News or Reddit or something where somebody latched onto a comment you made pretty offhandedly in an actually-company-internal presentation about Mojo at Modular. You mentioned there was an internal uprising of people saying "we should totally publish this," and you were like: okay, I would have prepared a little bit better if that were the case. Anyway, the comment people were talking about was about type inference. You said something like: we could have done what we did with Swift, which was Hindley-Milner, bidirectional type checking, really fancy full type inference, yada yada, but there were two problems you saw with that in Swift. One was that the compile times were a problem, and second, that the error messages were a problem, because they could get really non-local; it wasn't clear where the error came from. And the fight I saw was from the pro-Hindley-Milner defenders saying Hindley-Milner is great, look at Elm: Elm's Hindley-Milner has the best error messages ever and compiles really fast; and then the anti-Hindley-Milner people saying, ah, that's just a bunch of academic stuff, of course if you want to do real programming you've got to... So I'm curious to have you expand on your thoughts on that some more, based on your experiences with Swift.

[01:14:32] Yeah, well, okay. So maybe I misspoke, and it came across not like I intended.

Where's the fun in that? Let's just pretend you said it exactly like that.

But here's what I meant to say: using Hindley-Milner in Swift was a mistake, in my opinion. You can still get "expression is too complex to type check" in Swift, on pretty reasonable things, and today Swift programmers are taught to break long lines into separate statements just to work around compile times.

Interesting.

And a thing I could say is: today Swift is like fourteen years old. It's not like it's never been worked on; in fact, engineers have spent years of effort trying to make this work better.
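An illustration of the search space behind that diagnostic (hedged: whether any particular expression trips the limit varies by compiler version). Every numeric literal in Swift can be inferred to several types, and every operator is overloaded, so candidate typings multiply at each node of the expression.

```swift
// Each literal could be Int, Double, Float, ... and each + has many
// overloads, so the solver explores combinations at every node. This
// line happens to solve quickly (everything unifies as Float), but
// longer chains in this style are the classic trigger for "the
// compiler is unable to type-check this expression in reasonable
// time", and the folk fix is exactly the one described above:
// split the statement.
let total = 1 + 2.5 + Float(3) + 4
```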
[01:15:14] So I could say I think that was a mistake in Swift. Now, reasonable people can give me a hard time about it. By the way, I say the wrong thing all the time, so it's not unusual, and I read all those really angry comments with love, if I see them.

I'm impressed. That's not an easy thing to do.

Well, I try to. This is the benefit of having a thick skin and having been around programmers for a long time. But the mistake in the statement is that it's not Hindley-Milner's fault. The actual fault is the fact that Swift has other things, like function overloading, and literals that can be inferred to different types, and many other things, that now make type-checking this thing an exponential-time problem. And if you make type checking exponential-time, sure, you can push on it, you can make certain things work better, but the actual bad thing that's happened there is an exponential-time algorithm. What that means is twofold. One is that compile times can tank and become unpredictable for users, and they have to learn patterns to work around it. But it also means that if it's taking exponential time just to decide whether this thing can be type-checked at all, then of course the errors are not going to be great, because you have this exponential permutation of different things that somebody could have been trying to express, and trying to infer that and explain it back is fraught.

[01:16:45] So I don't know. I'm very impressed with Elm and the quality of the work they put into the error messages; I think it's amazing. I don't know the set of trade-offs and decisions, and how they intersect, that go into that. But Rust, for example, doesn't have function overloading, so it decided not to do that part of this, and therefore it solved the problem in a different way. And what Mojo is doing is saying: okay, let's not use Hindley-Milner at all. What's the benefit of that? Well, it makes error reporting really easy. That's great, because it means it's very predictable; the contract with the programmer is very predictable. It's roughly what C++ has been doing (of course we have our own twist on it), and therefore it's proven. It doesn't require years of engineering and R&D to try to make it work, meaning we're putting our time into things that actually matter.
[01:17:40] It's super funny if you think about Python. Python people will tell you it's untyped or something; I would say it's dynamically typed. If you say x plus y, and then, in parentheses, times z, the x + y gets type checked when you run the plus, and it doesn't know about the times-z that's coming afterwards. So all Python programmers are already exposed to a system that can only work inside-out on the expression tree anyway. And if that's the case, then why not make the static type checker work the same way? That way it's consistent, and if you make it different, then there's a burden of: why make it different? So there are a lot of things that lead to a very simple and predictable programming model. It also makes compile times deterministically reasonable, and things like this. So I think that's just the set of decisions we decided to make, and it's not Hindley's or Milner's fault.

[01:18:34] Okay, so the function overloading... you think the intersection of Hindley-Milner and function overloading in Swift is kind of the main culprit, in your mind? Is that accurate?

Well, there are a bunch of things Swift does. Swift is a very fancy language, by the way; it has a bunch of magic built into it and a ton of complexity. There's a whole bunch of stuff with protocols (its equivalent of traits), overload resolution, all this kind of stuff, and the literals are very fancy. If you're interested in more of what makes Swift interesting, but also very complicated, there's a cool series of blog posts written by a friend of mine named Doug Gregor, in Doug's Compiler Corner. The blog posts are called "Swift for C++ Practitioners."

Nice.

He kind of explains how Swift, a language built by a bunch of people who were very experienced with C++ by the way, is different and better. For example, in Swift you can define your own operators, you can define your own precedence groups, and you can relate this operator to other existing precedence groups, and stuff like this. Swift is a very, very, very fancy language. Our goal for Mojo is: don't have any of that fanciness. Just keep it simple; it's fine. I don't want it to be nearly as complicated as Swift ended up, in the end.

[01:19:55] So I assume this means that in Mojo there's some point where type annotations are required, such as functions having to be type-annotated. Is that right?

The way it works is that if you don't put a type annotation on, we default to object, and then you get basically Python behavior, so it's all dynamic and stuff like this. And this is actually really important for Python compatibility. Let me give you some intuition on this; again, to tie a loop back around to sufficiently smart compilers and things like this: in Python, Python's integers are big integers.
[01:20:31] You can make an arbitrarily sized integer, and it's a heap-allocated thing, and it's reference semantics, all this stuff; they have identity. And this is pretty interesting from an "I need to implement this in a fast way on a GPU" perspective, where a GPU can't do malloc, so how do you do a bigint? And we want to be compatible; we want to be a good superset of Python over time, so we have to support this behavior. On the other hand, we don't want it to be the first thing you reach for if you care about performance. So what we do is keep the names. It's generally known as lowercase int in the Python world (Python has type annotations, and they're kind of cursed in their own way, and there are a bunch of challenges with the linters that use these things), and the list in Python is generally known as lowercase list. So our approach in building the standard library is to say: okay, we can now have a capital-I Int. Capital Int is a struct. Capital Int is inline on your stack, and it's size_t, you know, the size of your pointer. And it has a very, very similar but slightly different operator set. For example, with a Python int, if you divide, you get a float back out.

Oh, sure. Yeah, makes sense. How's that for solving a problem?

[01:21:51] Right. For us, we say: okay, you get an integer back out when you divide integers, and there's a different operation to do the float thing. So there are a bunch of different design trade-offs, but it's compatible with existing Python, because if you use lowercase int you still get an object on the heap that's dynamically sized; it's an arbitrary-precision integer. And you have to opt into "I want capital Int." If you use that, you get way better performance, you get better predictability, it will run on a GPU, etc., etc. And it's not about one being better than the other; it's about a trade-off you can make. And similarly with list: lowercase list is an array of objects, it's untyped, it's all dynamic. Okay, cool. Capital List takes a type parameter; it has an element type, and it's a homogeneous array, inline, with none of the fanciness. It's more like a std::vector in C++. And then you as a programmer can use the right tool for the job.

[01:22:46] That makes a lot of sense. Yeah, a thing that we have to contend with in Roc is: if I just put in the number five, and I don't put any type annotations anywhere, what type is it? We have a default, but in practice the default almost never comes up outside the REPL; in the REPL, if you're just playing around, it can totally come up. Actually, the more interesting one is what happens if you have decimals, so you say 5.1 or something like that. Because let's say I put in...
[01:23:08] ...5.1 instead of five. Now, what we do is actually use our standard library's built-in fixed-point decimal system. So what that means is that by default we're kind of prioritizing correctness: if you put 0.1 plus 0.2 into the REPL, it gives you back 0.3. But at the end of the day, most of the time it's just going to be inferred from the surrounding context of the types that are there. So, I'm curious what you think about this. We talked a little bit about type inference in past episodes, and one of the things that comes up is: even if you have the ability (which I'm not sure Swift does or not) to say, I'm not going to annotate even any of my top-level functions, just no type annotations anywhere at all...

[01:23:50] It doesn't go that far. Okay, so: Swift requires you to have signatures, and Mojo as well. You have to have a statically understandable signature, so you don't do whole-program type inference.

Okay, so that's where the bidirectional part comes in; you require that, so you can...

Yeah. And Swift is even limited to just inference within a statement; it doesn't go across statements in a function.

Interesting. All right, good to know. Okay, so that explains why it would fail: even so, it still can't always solve individual statements.

[01:24:18] Well, so in Roc, the whole thing is that annotations are completely optional, and we don't even have a compile error that says "hey, you need to annotate this." Basically, the consequence of that is that if you want to use Roc for scripting, in the sense of a bash replacement, you can do that, and in that world you kind of don't want to annotate your functions, usually. Or you can start by writing your whole program without annotations and then add them later, or ask the editor to infer them for you, things like that. I've always thought that's pretty cool, but I have heard a counter-argument: that it's so important that you annotate your types that being required to, all the time, is actually a feature, as opposed to it just being a cultural norm, which is what it is in Elm and Haskell and so forth.

[01:24:59] Well, I think it's really about your programmers' expectations. If they're intending to write types, then it's useful to help people who intend to write types. If they intend not to write types, or they don't care, then it's nice to not hassle them.

Yeah. Although I do like the ability to write the implementation and then ask the editor to infer the types for you, so you don't have to write out the types up front before you can try it out, if that makes sense. That's how I used to write Haskell: the REPL would say, okay, well, it's t -> t -> t, and you're like, awesome, copy and paste that, put it in the source code.
[01:25:33] And then when I'm reading the code later, I don't have to keep running it through the REPL to understand what the signature is.

Yeah. Now, I have heard (I think this was OCaml, but people can correct me on this) that there are some cultural norms where you just don't annotate the top-level functions at all, which to me is wild. That sounds gross to me. You don't annotate any of them? No, you should. It's really helpful information, not just because it helps the error messages get better and stuff like that, but because it's easier to read and understand the expectations of the interface.

Yeah, for sure. Again, I think there's personal preference here, but I'm in the same camp as you, for what it's worth. That's my natural proclivity.

[01:26:11] Yeah. So, Mojo has a couple of problems to solve. One is that we need to be a good Python, and Python doesn't really have types; it has type annotations, but it doesn't really have static types, and the type annotations are cursed. The other problem we have to solve is that we have to run on a GPU. If you're on a GPU, you want full control; and here, talk about GPUs, embedded systems, whatever: there's a class of people who want full control and low-level predictability. We also want to be able to write low-level libraries, like sine and cosine and Int and stuff like this, and have them composable by people who don't want to care about this stuff. And so, to your point, the way we solve that: x = 5 in Python means create an object on the heap with five in it (or maybe a tagged integer or something, but create an object), or you say "foo" in quotes and you get a Python string object thingy. And we want a fancy inline thing, closer to what C++ has, too. So the way we handle this is that we have two different keywords for defining functions. We have def, which is Python: in defs, you don't need types on any of the arguments, and they get inferred to being the Python object type. You can put a type on, but if you don't, then it's just the object type, so everything statically has a type; it just defaults to object, and then things default to their Python behavior. And then we have fn. If you choose to define your functions as fn, then you have to have types, and things default in a different way. And then what you can do is self-identify by the kind of code you're writing: do I want to be more scripty, or do I want to be more systemsy? You're telling the compiler which world you want to be in, and without judgment: you can choose where you want to be, and that's fine. And that also helps both the Python compatibility and the persona issue.

That makes a lot of sense.
[01:28:14] So I guess that also means that, given that you have a different top-level keyword for "I'm in fn land now," if you wanted to, you could also change the syntax of fn function declarations to use your own type annotation syntax. I don't know if you do that or not.

We just use the same syntax, because it's fine; it's nice. But you could choose to do that. The only trade-off then is that now you have a two-world problem, and we want two worlds when it comes to some of this behavior, but we don't want more than that. So our generics use square brackets, because that's how Python works, and things like this; you don't want to have angle brackets like C++ in the same language. Or, I mean, a different language could choose to do this, but you could say def uses indentation for scoping and fn uses curly braces and semicolons. You could theoretically do that, but that's not what we've chosen to do. We've chosen to be consistent, except for providing this one big fork that allows us to serve two different kinds of use cases.

[01:29:11] Yeah, that makes sense. So, going back to numbers, and sort of hardware-accelerated complex numbers: I'd just love to learn more about that, because something that people have asked about in Roc is that currently (I'm actually leaning towards someday adding this) we don't support operator overloading. If we do in the future, it'll be basically the way Rust does it, because we have an equivalent of Rust's traits. But the question is: should we do it at all? Java famously didn't support it, after C++ did, and there are trade-offs. One of the things people brought up is: well, what if I want to implement my own complex numbers, or I want to implement a bigint or something like that? And one of my questions is always: what's the performance of that even going to be? Like, how good could it be? And hardware-accelerated complex numbers is just not something I'd heard about at all, so I'd love to learn more about how that fits into Mojo. Is that something there's demand for in the scientific computing community? How does that work?

[01:30:05] Let me unwind this a little bit and talk about operator overloading. I can't tell you what the right thing is for Roc, but I can give and share some experiences and lessons learned. I've used C++ quite a bit, but also Haskell and other languages like that, which allow you to define your own operators. C++ was always very frustrating to me, because it had a limited operator set and people wanted to overload them, so you got shift-left for IO: IO streams' shift-left. And so with Swift, what we decided to do is much more of a Haskell-ish thing, where you can...
[01:30:37] ...define an arbitrary, or nearly arbitrary, symbol: a punctuation character can be an operator, you can define that, you can define a precedence level, and then you can go overload it to your heart's content. Swift even allows you to use Unicode, so you can use set inclusion and things like this as operators and go nuts with it. That, I think, is useful theoretically, and one of the ways I would always defend it is: in C++, you see these types that go overload shift-left or plus or whatever, and you don't know something weird is happening. So part of my thesis was: okay, well, if I see some crazy, weird Unicode operator, I don't know what it is at first glance, maybe, but at least I know it's not plus. At least I know it's something weird, and I can go do a Google search and try to figure out what's happening; I know that I don't know.

[01:31:32] In practice, nobody in the Swift ecosystem picked up Unicode operators, so that was very theoretical. And in practice very few people define their own operators at all, so it seems like it's not very valuable. So in Mojo we're just saying, at least for now: just allow the Python-style stuff, keep the grammar fixed, keep the language simple. Yes, it could be solved, or it could be extended someday, but the burden should be: why do we want to make the language more complicated, and does it pay for its complexity cost in the language? Versus in Swift, it was much more of a "hey, wow, this thing could be useful, and wouldn't this be fun and cool," and we added it very early and then spent a bunch of time trying to polish it and make it better. So, you know, those are some different opinions and theories over time about operators, and it's not that one is right or wrong; they're just different things. But I'm generally pro operator overloading, by the way.
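For reference, a minimal sketch of the Swift machinery being described: declare a new operator (the Unicode bullet here is this example's own choice, not anything standard), give it a precedence, then overload it.

```swift
// A brand-new operator with its own precedence level.
precedencegroup DotProductPrecedence {
    associativity: left
    higherThan: AdditionPrecedence
}

infix operator •: DotProductPrecedence

struct Vec2 { var x, y: Double }

// Overload the new operator to taste.
func • (a: Vec2, b: Vec2) -> Double {
    a.x * b.x + a.y * b.y   // dot product
}

let d = Vec2(x: 1, y: 2) • Vec2(x: 3, y: 4)   // 11.0
```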
[01:32:26] Yeah. I mean, well, the feedback I've heard is that there are certain use cases... I remember, I think it was Casey Muratori mentioning (he does a lot of game development, obviously) that he basically likes C better than C++, but he uses a C++ compiler just to get operator overloading, because it's that important if you're doing that much matrix math and stuff like that. And those are the types of use cases people have asked for in Roc. But I'm pretty dead set against adding custom operators, because Elm had them and then removed them, after seeing what they got used for in a lot of cases.

[01:33:04] Well, okay. I mean, a lot of people... so Swift also allows you to have Unicode function names and stuff like this, and yes, there are the memes where you have the pile-of-poop function and stuff like this, but there are also people who don't speak English, and they want to use their native language, and go right ahead.

[01:33:21] Yeah, well, this was more about operators built out of ASCII characters, just six ASCII characters in a row, because you're like: no, really, I made this beautiful DSL. And it's like: this is not nice code to read anymore.

Yes. But thinking about overloading, the downside there is exactly what you said about C++, where people use it to make DSLs that are not math-related, where right-bit-shift means IO or something like that. Do you worry about that in Mojo, or is it just the cost of doing business, we've got to have it?

[01:33:54] Again, so, we're not making forever decisions here; we're making a set of starting-point decisions, and the starting-point decision is that complexity has to pay for itself. Now, you're right, I've seen the bad things. I've also solved the bad things in Swift, and in my opinion, having lived with it for fourteen years, the cure was worse than the curse (or the poison, or whatever). So I don't think it actually was the right trade-off. Now, some beautiful things were built with it, and some problems I'm sure were avoided, but it also added a ton of complexity and makes the language more complicated for people who don't care. So my current position with Mojo (and part of this is forced by Python) is: allow the goofy double-underscore "dunder" style syntax, so you can overload operators the same way Python does, and I think that's good enough.

[01:34:47] But let me tell you the benefit to you as a language designer if you choose to add operator overloading, and Swift and Mojo take it quite far (Swift took it quite far; Mojo takes it even further). The benefit is that you don't have to have privileged types in your language. Actually, I don't even know: is Roc a zero-overhead language, or is it not?

As much as possible, yeah. I mean, the only overhead that really comes in, in the standard library, is that we do the small-string optimization; you could call that overhead, depending...

I mean, if you define a struct or a method, it gets inlined, and then it's the same?

Yeah.

Okay. And you're using LLVM and stuff, so the benefit of that is that you can say: look, I don't have to privilege types. In C, they had to add complex numbers to C99, and they built it into the language because they had no ability to define an std::complex-style number, and that made the language more complicated, and they had to have all the overload resolution rules. C99 actually even added imaginary numbers; I don't know if you knew that.

I didn't know that.

[01:35:54] So what ends up happening is: if you can say that libraries can be expressive, then you put back-pressure on this, and you allow the language to be simpler, which then makes library developers able to define their own expressive abstractions for their domain.
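A sketch of the library-defined alternative, in Swift since it permits the same move: a complex number as a plain struct, zero indirection, with * overloaded as the textbook expansion that (per the discussion below) a backend could map to a dedicated hardware complex-multiply instruction where one exists.

```swift
// No language support needed: just a struct and an operator overload.
struct Complex {
    var re, im: Double

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i:
    // four multiplies and two adds, as described later on.
    static func * (a: Complex, b: Complex) -> Complex {
        Complex(re: a.re * b.re - a.im * b.im,
                im: a.re * b.im + a.im * b.re)
    }
}

let z = Complex(re: 1, im: 2) * Complex(re: 3, im: -1)   // (5, 5)
```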
And so Mojo 01:36:10 again takes this extremely far and makes 01:36:11 `int` not be built into the language, and 01:36:14 things like this. And that's actually 01:36:16 great, because in the domain that we 01:36:19 work in there are all kinds of numeric 01:36:20 data types. There's float and double, 01:36:23 for example, fine, but there's also bfloat16 01:36:26 and float8 — and a couple of 01:36:29 float8s with different mantissa and 01:36:31 exponent widths — and so there's this 01:36:33 huge range of innovation that's 01:36:34 happening in numeric data types, and 01:36:36 having to rev the compiler every time 01:36:38 somebody wants to go do something 01:36:40 would be very 01:36:42 annoying. So instead we can just have 01:36:44 people define these things in libraries, 01:36:45 and that's actually great, and it reduces 01:36:47 pressure on the language from having to 01:36:48 churn with all the hardware. Roc is 01:36:50 probably in a different space, since you 01:36:52 don't care about float4 or something. 01:36:54 Yeah, I mean, not today, certainly not 01:36:56 on our radar. Java is an interesting 01:36:59 example, where essentially what they did 01:37:01 — which is what Roc is doing today, but 01:37:03 could change in the future — is 01:37:05 essentially to say, yeah, you can define 01:37:06 your own complex number in user space. I 01:37:08 don't know how much 01:37:10 overhead there might or might not be in 01:37:12 Java, but you don't get custom operators 01:37:14 for it: you need to call `.add` if 01:37:16 you want to add two of these things, and 01:37:17 `.divide`, and obviously that has ergonomic 01:37:19 downsides if you're trying to do 01:37:22 arithmetic. Well, but Java also 01:37:24 doesn't have zero-cost abstractions, 01:37:25 right? It requires you to box 01:37:27 things and stuff like this, and so 01:37:29 practically speaking you could define a 01:37:31 complex number, but it's going to be slow 01:37:33 anyway, or unpredictable. I mean, JVMs 01:37:35 have a lot of optimizations to work 01:37:37 around this. To go back to heroic 01:37:40 optimizations and when they're important: 01:37:43 I haven't done Java in a 01:37:44 long time, but the JVM didn't used to 01:37:46 have two-dimensional arrays; it only had 01:37:48 one-dimensional arrays, really. Yes. 01:37:51 And so if you want to allocate an array 01:37:54 of arrays, you would have an array, and 01:37:57 then each of the rows of the array can 01:37:58 be a different length. Yeah, right — I 01:38:02 did know that, because that was true 01:38:03 back when I did Java. Yeah, and maybe 01:38:06 they've changed that now, but 01:38:08 one of the challenges with that is, if 01:38:10 you write arrays of arrays, that's way 01:38:11 slower than having an actual proper 2D 01:38:13 array. And so this is again where a lot 01:38:15 of Java implementations, particularly for 01:38:17 benchmarks, would go try to 01:38:19 pattern-match that code, do these heroic 01:38:22 static-analysis thingies to understand "I 01:38:24 can prove the length of each of these 01:38:25 subarrays is the same," and then 01:38:27 solve it that way. And again, that's cool, 01:38:30 some engineers worked really hard on that; it's 01:38:33 heroic.
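A quick Python sketch of the same distinction: an array-of-arrays is an outer array of independent row allocations (which is exactly why rows can have different lengths), while a "proper" 2D array is one contiguous buffer plus index arithmetic.

```python
# Array-of-arrays: each row is its own allocation and may have its own
# length, so the runtime cannot assume a rectangular shape.
jagged = [[1, 2, 3], [4, 5], [6]]
print(jagged[1][0])  # 4 -- two indirections: outer list, then row

# Flat "proper 2D" layout: one contiguous buffer plus index arithmetic.
# The shape is known up front, access is predictable, no per-row
# pointer chasing.
import array
rows, cols = 2, 3
flat = array.array("i", [1, 2, 3, 4, 5, 6])

def at(r: int, c: int) -> int:
    return flat[r * cols + c]

print(at(1, 0))  # 4
```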
But wow, it's so much better to 01:38:35 have the two-dimensional array actually be inline 01:38:39 and predictable. Two-dimensional 01:38:40 arrays are actually a pretty valuable 01:38:42 thing, and relying on compiler nerds to 01:38:44 go crazy to give you the thing that 01:38:46 you could have asked for in the first 01:38:48 place is pretty rough. I see. Okay, 01:38:50 so in Mojo it's not that there are 01:38:52 specific hardware optimizations 01:38:54 that come up for complex numbers, but 01:38:55 rather that you get zero-overhead 01:38:57 complex numbers, if that makes sense? Okay — 01:38:59 so sorry, you're right, I got 01:39:01 distracted on operator overloading. So, 01:39:03 just in terms of complex numbers: I'm 01:39:05 not an expert on this, but I'll tell you 01:39:06 my understanding, which is that 01:39:08 certain instruction sets, including 01:39:10 IBM mainframes and a bunch of other 01:39:12 stuff, kind of climbed this tower of: 01:39:14 let's have scalar floats; now let's have 01:39:16 SIMD — 01:39:18 so back in the day 01:39:21 that would be four floats or two doubles, 01:39:24 kind of a thing — and then you build 01:39:26 on this, and then some of these 01:39:28 chip people decide that, okay, well, 01:39:30 complex math is really important. And so 01:39:33 there are certain operations, like 01:39:34 complex multiply, that actually include, 01:39:37 I think, four multiplies and two 01:39:40 adds plus some other goop, that all come 01:39:44 together, and they say: I want to have one 01:39:45 hardware instruction that does a complex 01:39:47 multiply; it takes two complex 01:39:50 numbers in SIMD registers, or something 01:39:52 like this, right? And so once somebody 01:39:54 puts something into hardware and says, 01:39:55 wow, this is 10x faster than doing 01:39:57 individual multiplies and adds, you kind 01:39:59 of have to say: okay, well, a 10x 01:40:00 improvement because they put it into 01:40:02 silicon — how do we expose that, and 01:40:04 then make it so people can use the stuff 01:40:07 without having to know about it? And so 01:40:08 this is where — and there are many 01:40:11 different solutions to this — the benefit 01:40:13 of what we do in Mojo is you can just 01:40:14 have a complex struct. And you say, again, 01:40:17 a complex struct, well, what is it? It's just 01:40:19 a struct; it has a real and an imaginary float, it 01:40:20 has simple behavior, it has a bunch of 01:40:22 methods, including multiply. And then 01:40:25 inside the multiply you can say, 01:40:26 effectively, ifdef-style at compile time: 01:40:29 are we on that IBM thing? 01:40:32 And now you say, okay, well, use the one 01:40:34 instruction, and then boom, everybody's 01:40:36 complex numbers can be accelerated; they 01:40:37 don't have to worry about it. And 01:40:39 then somebody who open-codes a 01:40:42 multiplication can be told: just use star — 01:40:44 it's better for the amount of code 01:40:46 you write and things like this, but you 01:40:47 also get it accelerated. Now, the 01:40:49 traditional way of 01:40:51 doing this is you say: okay, well, teach 01:40:53 something like LLVM to pattern-match it. 01:40:55 And again this gets back into: okay, 01:40:57 now you have somewhat flaky behavior, and 01:40:59 when you have a partially constant 01:41:02 thing it constant-folds and you can't 01:41:04 pattern-match it, and there are 01:41:05 all these other problems that come in. 01:41:07 So this is where making it more 01:41:08 predictable and building it at the 01:41:12 library level enables it to be way more predictable, and it 01:41:13 works better.
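The compile-time selection described here is Mojo-specific, but the shape of it can be sketched in Python with an ordinary flag; `fused_complex_mul` and `HAS_FUSED_COMPLEX_MUL` below are hypothetical placeholders standing in for the hardware path, not a real API.

```python
# Rough stand-in for the dispatch described above. In Mojo the selection
# happens at compile time against the real target; here a plain runtime
# flag and a hypothetical fused_complex_mul() mark the idea.
HAS_FUSED_COMPLEX_MUL = False  # assumption: set per target

def fused_complex_mul(a: complex, b: complex) -> complex:
    # Placeholder for a single hardware instruction that multiplies two
    # packed complex values; not a real call.
    raise NotImplementedError

def complex_mul(a: complex, b: complex) -> complex:
    if HAS_FUSED_COMPLEX_MUL:
        return fused_complex_mul(a, b)
    # Open-coded fallback: four multiplies plus an add and a subtract.
    return complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)

print(complex_mul(1 + 2j, 3 + 4j))  # (-5+10j)
```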
Generally in 01:41:17 optimization you want to be lowering; you 01:41:19 don't want to be raising. And so this is 01:41:21 where taking something that's at a higher 01:41:23 level of abstraction and then 01:41:26 decomposing it across the optimizer as 01:41:28 you go becomes more predictable 01:41:30 and structurally sound than starting 01:41:32 with something that's scalar and then 01:41:33 trying to vectorize, trying to find 01:41:35 parallelism, and building it up that 01:41:37 way. Totally. An interesting example of 01:41:39 that — something that is number 01:41:41 related, also a feature that I like about 01:41:42 Zig, and it's comptime 01:41:47 related — is that they have 01:41:50 non-standard bit-width integers as a 01:41:52 built-in thing: you say I want an 01:41:54 `i7`, for example. When I first heard 01:41:57 about that I thought, that's just kind 01:41:58 of cute, you know, I'd never 01:41:59 actually use that — and ever since I found 01:42:01 out about it, I run into wanting it, 01:42:03 in Rust, on a surprisingly regular basis. 01:42:05 In some cases it's because 01:42:07 I'm doing some bit-packing thing 01:42:09 where, okay, I've got a 32-bit 01:42:11 integer but I'm actually only using 23 01:42:13 of those bits, because seven of them are 01:42:14 being used for something else. One case that 01:42:16 comes up for me is: I have a number I know 01:42:18 is small, and then I want to put it in an 01:42:19 optional, and I want the tag to not cost 01:42:22 another 01:42:24 slot. Yeah, right, that's a great 01:42:26 example. Does Mojo have a concept 01:42:29 like that, where you can say: I 01:42:32 want something that is going to end up 01:42:34 taking up, you know, eight bits, but I'm 01:42:36 only using fewer than that? So Mojo 01:42:38 builds on top of MLIR, and MLIR has arbitrary- 01:42:41 precision integers, so yeah, you can 01:42:42 have a 1,023-bit integer if you'd like 01:42:45 to. We haven't wrapped it up 01:42:48 into libraries, so we haven't gone so 01:42:50 far as to say "I have int-of-N" — 01:42:53 like an APInt with some bit-width 01:42:58 as a parameter — but somebody could 01:42:59 totally do that, and that's something 01:43:01 that Mojo allows you to express directly 01:43:03 in a library, so we don't have to add 01:43:05 compiler support. Again, this is a 01:43:07 cool thing: you know, for 01:43:09 some security AES crypto thingy, 01:43:13 maybe having 01:43:15 256-bit accelerated stuff — I mean, actually, 01:43:18 there are definitely AES instructions in 01:43:20 all these processors — and so being able 01:43:21 to directly access that kind of stuff is 01:43:22 super valuable, and being able to do that 01:43:24 for the domain-specific, expert-only 01:43:28 kind of territory is really important. 01:43:30 And then people like me, who don't 01:43:33 understand any of this stuff, can just 01:43:34 build on top of the libraries written by 01:43:35 somebody else who does know how it 01:43:37 works, and I can just take it for 01:43:39 granted. That's how most of software 01:43:40 engineering works.
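The bit-packing case is easy to sketch in Python; the 23-bit payload and 7-bit tag split below mirrors the example given (leaving two spare bits of the 32-bit word), and a built-in `i23`-style type would turn these hand-written masks into a type-level guarantee.

```python
# Manual bit packing: a 23-bit payload and a 7-bit tag share one
# 32-bit word (two bits are left spare).
PAYLOAD_BITS = 23
TAG_BITS = 7

PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1

def pack(payload: int, tag: int) -> int:
    assert payload <= PAYLOAD_MASK and tag <= TAG_MASK
    return (tag << PAYLOAD_BITS) | payload

def unpack(word: int) -> tuple[int, int]:
    return word & PAYLOAD_MASK, (word >> PAYLOAD_BITS) & TAG_MASK

w = pack(123_456, 42)
print(unpack(w))  # (123456, 42)
```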
Yeah. So I have one 01:43:43 last question about API design and 01:43:45 Mojo. In Zig there is a 01:43:50 standard library convention that when 01:43:52 you call a function that allocates on 01:43:54 the heap, you have to pass it an 01:43:55 allocator. There's no concept of a 01:43:57 built-in — I mean, I guess there 01:44:00 is a global allocator available, 01:44:01 but culturally the way it's done is you 01:44:03 pass in allocators for everything. I 01:44:05 really miss that in Rust; I really like 01:44:07 that. I appreciate that if Rust had 01:44:09 that, there would be certain categories 01:44:11 of use cases that would be a lot less 01:44:13 ergonomic, and maybe Rust wouldn't be as 01:44:15 popular as it is, but for the type of 01:44:18 programming I do, I actually love that, 01:44:19 and I wish that Rust had made that 01:44:22 decision — because you can 01:44:23 totally build it yourself, and that 01:44:24 kind of is what we do a little bit in 01:44:25 the Roc compiler. But I'm kind of 01:44:27 curious how you think about the concept 01:44:29 of allocators, and bump allocators 01:44:30 and stuff like that, in the 01:44:33 context of Mojo, especially given that 01:44:35 you have this coupling with 01:44:38 Python. Well, the Python stuff is 01:44:40 kind of unrelated, because Python will do 01:44:42 what Python does, and fair enough. 01:44:45 But great question. My today 01:44:48 opinion — and I reserve the right to 01:44:50 change my opinion — 01:44:52 is that what we need to do is build 01:44:57 different kinds of types instead. 01:45:02 Having an allocator abstraction is 01:45:05 not super appealing to me. It can be 01:45:08 done — the C++ STL, for example, has a fairly 01:45:12 old-school allocator abstraction, and 01:45:15 you could make every type parameterized 01:45:16 on an allocator, and then you could 01:45:18 choose to default it or make it explicit 01:45:20 or whatever. We haven't done that, just 01:45:22 because I haven't seen the value 01:45:24 be great. Often, if you want to allocate 01:45:27 things with bump pointers and stuff like 01:45:28 this, what you have is a bump- 01:45:30 pointer abstraction, and then you want to 01:45:32 be able to allocate instances of types 01:45:34 out of it. What you get from a 01:45:36 bump pointer is fast allocation, but then 01:45:38 you don't run the destructors, right? 01:45:42 Because you want to just blow the 01:45:43 whole thing away at once. And so it ends 01:45:45 up being a very particular kind of use 01:45:47 case that isn't actually that generic, 01:45:49 and so I haven't seen it 01:45:53 actually work super well, and I'd 01:45:56 rather keep the complexity out. Now, there 01:45:58 are other things — so I'll give you an 01:46:00 example of 01:46:01 this. C++, for example, has the 01:46:06 `std::vector` data type. With `std::vector`, as soon as 01:46:09 you push one element onto the vector, it 01:46:10 has to go malloc some 01:46:12 space, okay? There's another class in 01:46:15 the LLVM code base called 01:46:17 `SmallVector`, where you say: I want to 01:46:19 have some number of inline elements on 01:46:21 the stack, and then if I exceed that, 01:46:23 I go malloc something. The idea 01:46:26 is that many vectors actually have one 01:46:27 or two elements in them, and if that's 01:46:30 true, then just putting them on the stack 01:46:31 saves a malloc, saves indirection, 01:46:33 saves cache misses — it's all 01:46:35 win, right? And so it's very 01:46:39 unfortunate that C++ had `std::vector`, and 01:46:42 now there's a whole zoo of different 01:46:45 things made by different library vendors 01:46:47 trying to solve this problem, right?
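A toy Python version of the bump-pointer trade-off described above: allocation is just bumping an offset into one preallocated buffer, and "freeing" is resetting the whole arena at once, so per-object destructors never run.

```python
class BumpArena:
    """Toy bump allocator: carve ranges out of one preallocated buffer."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.offset = 0

    def alloc(self, n: int) -> memoryview:
        # Allocation is just bumping an offset -- no free list, no malloc.
        if self.offset + n > len(self.buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self.buf)[self.offset:self.offset + n]
        self.offset += n
        return view

    def reset(self) -> None:
        # "Free" everything at once; individual finalizers never run,
        # which is exactly the limitation discussed above.
        self.offset = 0

arena = BumpArena(1024)
a = arena.alloc(16)
b = arena.alloc(32)
arena.reset()  # both allocations are gone in one step
```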
And so our approach — which hasn't been 01:46:52 implemented yet, but I think folks are 01:46:53 working on it — is to say: well, Mojo has a 01:46:55 list type. `List` is basically like 01:46:57 `std::vector`, but it can take a parameter which 01:47:00 is the number of inline elements; the parameter 01:47:02 happens at compile time. And so then you can 01:47:04 have one data type, make it consistent, 01:47:06 and then you can make the decision of "do 01:47:09 you want inline elements or not" 01:47:11 separate from "okay, are you an 01:47:13 array or a dictionary or things like 01:47:14 this," and so you get unification 01:47:17 at that level without trying to unify 01:47:19 the allocators. Do you do the small 01:47:21 string optimization, then? Because that's 01:47:22 another example of that, too. Yeah, 01:47:24 exactly. And strings are 01:47:26 very magic — you're talking about 01:47:27 grapheme clusters and things like this — 01:47:29 and you want to be able to do all that kind 01:47:30 of stuff too. Yeah. What about 01:47:32 sharing? So, something that we do in Roc 01:47:34 — and Java is the only other language I 01:47:36 know of that does it — is what we call 01:47:38 seamless slices. If you do an 01:47:40 operation like splitting a list, 01:47:41 you get back a bunch of elements 01:47:44 within it that are 01:47:45 not a whole new heap allocation; 01:47:46 rather, it's just bumping the 01:47:47 reference count. That's right. That has a 01:47:49 whole bunch of trade-offs, but it's 01:47:50 worked out generally very well for us. Is 01:47:52 that something you're also looking into 01:47:53 in Mojo? Yes — Mojo has a whole 01:47:56 ownership system, so you can get slices, 01:47:58 and they all refer to the lifetime 01:48:00 of the array that you sliced out of, 01:48:02 and so you can do all that kind of stuff. 01:48:04 And yes, a 01:48:07 slice is a separate type, but it 01:48:09 conforms to the same traits that other 01:48:10 collection types do. Makes sense. Awesome.
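Python's `memoryview` gives a loose analogy for this: slicing one doesn't copy the underlying buffer, it just creates another reference-counted view into it, which is roughly the seamless-slice behavior being described.

```python
# Zero-copy slicing: both views reference the same underlying buffer,
# kept alive by reference counting -- no new heap allocation per slice.
data = bytearray(b"hello world")
head = memoryview(data)[:5]
tail = memoryview(data)[6:]

data[0] = ord("H")    # mutation is visible through the views
print(bytes(head))    # b'Hello'
print(bytes(tail))    # b'world'
```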
Wow, we covered a lot of ground — 01:48:13 this 01:48:15 was a great conversation. Thank you so 01:48:18 much, Chris, I really appreciate you 01:48:19 taking the time to talk to me about all 01:48:21 this. Yeah — well, if folks 01:48:23 are interested in Mojo, I'd really 01:48:25 encourage them to check out our web page: 01:48:26 we have a ton of documentation, the 01:48:27 community is amazing, we have a Discord 01:48:29 channel — I think we have 20,000 people on 01:48:32 Discord all talking about different 01:48:33 stuff, building things. Mojo is also 01:48:35 not just an AI language; it's also being 01:48:37 used to build web servers and GUI 01:48:40 libraries and all kinds of stuff by the 01:48:41 community, and so we'd love for people to 01:48:42 get involved. Mojo is still pretty 01:48:45 early, and so we're still adding core 01:48:47 capabilities and building out the 01:48:48 libraries, but we have a huge 01:48:51 community of people who are really 01:48:52 enthusiastic, and we'd love for people to get 01:48:54 involved. Nice. Well, thanks so much 01:48:57 again, and also thank you for all of your 01:48:58 great contributions to programming over 01:49:00 the 01:49:01 years. Well, thank you for having such 01:49:03 interesting people on the show — I've 01:49:04 learned a 01:49:06 lot. That's it for this one, I hope you 01:49:08 liked it. You can find links to some of 01:49:10 the things we talked about in the show 01:49:11 notes, and if you've been enjoying these 01:49:12 episodes, consider supporting the show on
identifier: 2104
model: gemini-2.5-pro-exp-03-25 | input-price: 1.25 | output-price: 10.0 | max-context-length: 128_000
host: 194.230.160.66
https://www.youtube.com/live/uul6hZ5NXC8?feature=shared
include_comments: None
include_timestamps: 1
include_glossary: None
output_language: en
cost: 0.07025625
Okay, here is the abstract and summary for the provided transcript based on the Modular Community Meetup.

*Abstract:*

This transcript captures the first Modular Community Meetup, featuring CEO Chris Lattner and engineer Jack Clayton. Chris Lattner outlines the significant challenges and complexity plaguing the current AI and GPU software ecosystem, heavily criticizing CUDA's limitations (vendor lock-in, proprietary nature, lagging innovation, usability issues) and explaining why alternatives like OpenCL, AI compilers (XLA, TVM), Triton, and MLIR haven't fully solved the problem. He introduces Modular's approach to rebuilding the stack with the Mojo programming language and the Max AI engine, emphasizing their design for performance, usability, portability across heterogeneous hardware, and independence from CUDA. Key announcements include a new, permissive "Modular Community License," the imminent open-sourcing of ~250,000 lines of Mojo GPU kernel code (Apache 2.0), upcoming pip support, and the revelation that Modular's stack currently runs the same binary on both Nvidia and AMD GPUs. Jack Clayton follows with a live demo showcasing Mojo for GPU programming, including writing basic kernels (Mandelbrot), inspecting generated PTX/assembly, creating custom ops for the Max engine via Python, and demonstrating the cross-vendor portability by running the identical Mojo code on both Nvidia A100 and AMD MI300X GPUs.

*Modular Community Meetup: Reshaping GPU Development with Mojo and Max*

* *0:00:51 Event Kick-off:* Welcome to the first Modular Community Meetup, outlining talks by Chris Lattner (CEO) on Mojo/Max and Jack Clayton (Engineer) with a Mojo GPU programming demo.
* *0:03:30 Chris Lattner: The Problem with GPU Software:* AI's importance is hampered by poor, complex software stacks, particularly around CUDA, which has become a "rickety moat" and a "swamp."
* *0:07:12 CUDA's Shortcomings:* Identified issues include vendor lock-in (Nvidia only), hardware limitations (GPU-only, not even Nvidia CPUs), proprietary nature hindering innovation, poor Python integration, C++ complexities, and massive container sizes (15-50GB).
* *0:09:43 Flawed Solutions - Fixed Function:* Approaches like NIMs optimize specific models but sacrifice the flexibility and innovation core to AI's value ("dystopian").
* *0:12:26 DeepSeek as a Catalyst:* Highlighted performance gains possible by bypassing abstractions (going to PTX) but showed this requires immense resources, making it inaccessible for most ("dystopian"). Democratization is needed.
* *0:14:32 Why Alternatives Failed (OpenCL, Compilers, Triton, MLIR):*
  * OpenCL: Suffered from committee-driven slowness vs. CUDA's focused drive.
  * AI Compilers (XLA/TVM): Hit performance ceilings and bottlenecked on compiler expertise; struggled to adapt to GenAI's complexity (e.g., FlashAttention).
  * Triton: Improved usability (Pythonic, tile-based) but sacrifices performance (~20%), often rewritten in CUDA C++, and lacks portability/ASIC support. Validated the need for usability, but performance can't be lost.
  * MLIR: Successful infrastructure but doesn't provide a full AI stack; ecosystem projects often replicated XLA's limitations.
* *0:23:47 Modular's Approach: Rebuilding the Stack:* A multi-year research effort to create new abstractions (the Mojo language, the Max engine) to fundamentally change the game.
* *0:25:17 The Modular Stack:*
  * *Mojo:* A Pythonic systems programming language designed for heterogeneous compute (CPUs, GPUs, ASICs), providing full hardware access and performance without C++ pain points. *Crucially, Modular's GPU code is written in Mojo, entirely replacing CUDA, cuDNN, CUTLASS, etc.*
  * *Max:* An AI engine (initially inference-focused) built on Mojo, offering Python APIs, predictability, GenAI design, kernel fusion, and serving components.
  * *Enterprise Platform:* Higher-level tools for managing large-scale GPU deployments.
* *0:31:43 Deployment & Portability:* Simple Docker deployment. *Key Reveal:* The *same compiled binary* currently runs across both Nvidia (A100/H100) and AMD (MI300X) GPUs.
* *0:32:25 Major Announcements: Licensing & Open Source:*
  * *New Modular Community License:* Mojo & Max are free for non-commercial use, AND free for commercial/production use on x86/Nvidia hardware (with attribution). Goal: remove barriers to adoption.
  * *Impending Open Source:* ~250,000 lines of high-performance Mojo GPU kernels (matmul, FlashAttention, etc.) for multiple GPUs coming in ~1-2 weeks under the Apache 2.0 license.
  * *Mojo Compiler Open Source:* Still planned (tentative target: 2026).
  * *Pip Support:* Coming soon for easier integration.
* *0:39:09 Resources:* `builds.modular-dot-com` provides access to hundreds of optimized models and easy deployment commands. Recipes for common AI tasks are available.
* *0:40:45 Q&A Highlights (Chris Lattner):* Mojo is general-purpose but currently optimized for AI use cases; Mac GPU support targeted for summer; C interop exists, Rust possible via C; Mojo supports various accelerators (if they have a program counter); Mojo uses powerful metaprogramming (inspired by Zig's `comptime`, dependent types) to build high-level abstractions (like tile-based programming) *as libraries*, not hardcoded in the compiler, offering more power and flexibility than Triton.
* *1:01:03 Jack Clayton: Mojo GPU Programming Demo:* Introduction to writing GPU kernels in Mojo.
* *1:01:31 Mandelbrot Kernel:* Live coding and execution of a Mandelbrot kernel on an Nvidia A100 GPU using Mojo syntax (`gpu_kernel`, `thread_idx`, `LayoutTensor`).
* *1:03:20 Assembly Inspection:* Demonstrated dumping the generated PTX assembly code directly from Mojo.
* *1:03:53 Custom Ops for Max:* Showcased creating Mojo GPU kernels (grayscale, brightness, blur) and integrating them as custom operations within a Python script using the Max engine for an image processing pipeline.
* *1:08:48 Cross-Vendor Portability Demo:* *Key Demo:* Ran the *exact same Mojo code* (Mandelbrot, image pipeline, even low-level warp shuffles) successfully on an AMD MI300X GPU, demonstrating true write-once-run-anywhere capability between vendors. Showed dumping AMD GPU assembly.
* *1:09:47 Demo Resources & Challenge:* Provided links to GPU guides, example repositories, and issued a challenge to refactor a kernel for swag.
* *1:11:28 Q&A Highlights (Jack Clayton):* Apple Metal support not yet available (planned); confirmed the AMD GPU was an MI300X; Colab support being investigated; graphics/video possible but not specifically explored yet.
* *1:14:11 Closing:* Call for hiring announcements, mention of the upcoming hackathon and community forum.

I used gemini-2.5-pro-exp-03-25 | input-price: 1.25 | output-price: 10.0 | max-context-length: 128_000 on rocketrecap dot com to summarize the transcript.
Cost (if I didn't use the free tier): $0.07
Input tokens: 43053
Output tokens: 1644
Below, I will provide input for an example video (comprising of title, description, and transcript, in this order) and the corresponding abstract and summary I expect. Afterward, I will provide a new transcript that I want you to summarize in the same format. **Please give an abstract of the transcript and then summarize the transcript in a self-contained bullet list format.** Include starting timestamps, important details and key takeaways. Example Input: Fluidigm Polaris Part 2- illuminator and camera mikeselectricstuff 131K subscribers Subscribed 369 Share Download Clip Save 5,857 views Aug 26, 2024 Fluidigm Polaris part 1 : • Fluidigm Polaris (Part 1) - Biotech g... Ebay listings: https://www.ebay.co.uk/usr/mikeselect... Merch https://mikeselectricstuff.creator-sp... Transcript Follow along using the transcript. Show transcript mikeselectricstuff 131K subscribers Videos About Support on Patreon 40 Comments @robertwatsonbath 6 hours ago Thanks Mike. Ooof! - with the level of bodgery going on around 15:48 I think shame would have made me do a board re spin, out of my own pocket if I had to. 1 Reply @Muonium1 9 hours ago The green LED looks different from the others and uses phosphor conversion because of the "green gap" problem where green InGaN emitters suffer efficiency droop at high currents. Phosphide based emitters don't start becoming efficient until around 600nm so also can't be used for high power green emitters. See the paper and plot by Matthias Auf der Maur in his 2015 paper on alloy fluctuations in InGaN as the cause of reduced external quantum efficiency at longer (green) wavelengths. 4 Reply 1 reply @tafsirnahian669 10 hours ago (edited) Can this be used as an astrophotography camera? Reply mikeselectricstuff · 1 reply @mikeselectricstuff 6 hours ago Yes, but may need a shutter to avoid light during readout Reply @2010craggy 11 hours ago Narrowband filters we use in Astronomy (Astrophotography) are sided- they work best passing light in one direction so I guess the arrows on the filter frames indicate which way round to install them in the filter wheel. 1 Reply @vitukz 12 hours ago A mate with Channel @extractions&ire could use it 2 Reply @RobertGallop 19 hours ago That LED module says it can go up to 28 amps!!! 21 amps for 100%. You should see what it does at 20 amps! Reply @Prophes0r 19 hours ago I had an "Oh SHIT!" moment when I realized that the weird trapezoidal shape of that light guide was for keystone correction of the light source. Very clever. 6 Reply @OneBiOzZ 20 hours ago given the cost of the CCD you think they could have run another PCB for it 9 Reply @tekvax01 21 hours ago $20 thousand dollars per minute of run time! 1 Reply @tekvax01 22 hours ago "We spared no expense!" John Hammond Jurassic Park. *(that's why this thing costs the same as a 50-seat Greyhound Bus coach!) Reply @florianf4257 22 hours ago The smearing on the image could be due to the fact that you don't use a shutter, so you see brighter stripes under bright areas of the image as you still iluminate these pixels while the sensor data ist shifted out towards the top. I experienced this effect back at university with a LN-Cooled CCD for Spectroscopy. The stripes disapeared as soon as you used the shutter instead of disabling it in the open position (but fokussing at 100ms integration time and continuous readout with a focal plane shutter isn't much fun). 
12 Reply mikeselectricstuff · 1 reply @mikeselectricstuff 12 hours ago I didn't think of that, but makes sense 2 Reply @douro20 22 hours ago (edited) The red LED reminds me of one from Roithner Lasertechnik. I have a Symbol 2D scanner which uses two very bright LEDs from that company, one red and one red-orange. The red-orange is behind a lens which focuses it into an extremely narrow beam. 1 Reply @RicoElectrico 23 hours ago PFG is Pulse Flush Gate according to the datasheet. Reply @dcallan812 23 hours ago Very interesting. 2x Reply @littleboot_ 1 day ago Cool interesting device Reply @dav1dbone 1 day ago I've stripped large projectors, looks similar, wonder if some of those castings are a magnesium alloy? Reply @kevywevvy8833 1 day ago ironic that some of those Phlatlight modules are used in some of the cheapest disco lights. 1 Reply 1 reply @bill6255 1 day ago Great vid - gets right into subject in title, its packed with information, wraps up quickly. Should get a YT award! imho 3 Reply @JAKOB1977 1 day ago (edited) The whole sensor module incl. a 5 grand 50mpix sensor for 49 £.. highest bid atm Though also a limited CCD sensor, but for the right buyer its a steal at these relative low sums. Architecture Full Frame CCD (Square Pixels) Total Number of Pixels 8304 (H) × 6220 (V) = 51.6 Mp Number of Effective Pixels 8208 (H) × 6164 (V) = 50.5 Mp Number of Active Pixels 8176 (H) × 6132 (V) = 50.1 Mp Pixel Size 6.0 m (H) × 6.0 m (V) Active Image Size 49.1 mm (H) × 36.8 mm (V) 61.3 mm (Diagonal), 645 1.1x Optical Format Aspect Ratio 4:3 Horizontal Outputs 4 Saturation Signal 40.3 ke− Output Sensitivity 31 V/e− Quantum Efficiency KAF−50100−CAA KAF−50100−AAA KAF−50100−ABA (with Lens) 22%, 22%, 16% (Peak R, G, B) 25% 62% Read Noise (f = 18 MHz) 12.5 e− Dark Signal (T = 60°C) 42 pA/cm2 Dark Current Doubling Temperature 5.7°C Dynamic Range (f = 18 MHz) 70.2 dB Estimated Linear Dynamic Range (f = 18 MHz) 69.3 dB Charge Transfer Efficiency Horizontal Vertical 0.999995 0.999999 Blooming Protection (4 ms Exposure Time) 800X Saturation Exposure Maximum Date Rate 18 MHz Package Ceramic PGA Cover Glass MAR Coated, 2 Sides or Clear Glass Features • TRUESENSE Transparent Gate Electrode for High Sensitivity • Ultra-High Resolution • Board Dynamic Range • Low Noise Architecture • Large Active Imaging Area Applications • Digitization • Mapping/Aerial • Photography • Scientific Thx for the tear down Mike, always a joy Reply @martinalooksatthings 1 day ago 15:49 that is some great bodging on of caps, they really didn't want to respin that PCB huh 8 Reply @RhythmGamer 1 day ago Was depressed today and then a new mike video dropped and now I’m genuinely happy to get my tear down fix 1 Reply @dine9093 1 day ago (edited) Did you transfrom into Mr Blobby for a moment there? 2 Reply @NickNorton 1 day ago Thanks Mike. Your videos are always interesting. 5 Reply @KeritechElectronics 1 day ago Heavy optics indeed... Spare no expense, cost no object. Splendid build quality. The CCD is a thing of beauty! 1 Reply @YSoreil 1 day ago The pricing on that sensor is about right, I looked in to these many years ago when they were still in production since it's the only large sensor you could actually buy. Really cool to see one in the wild. 2 Reply @snik2pl 1 day ago That leds look like from led projector Reply @vincei4252 1 day ago TDI = Time Domain Integration ? 1 Reply @wolpumba4099 1 day ago (edited) Maybe the camera should not be illuminated during readout. 
From the datasheet of the sensor (Onsemi): saturation 40300 electrons, read noise 12.5 electrons per pixel @ 18MHz (quite bad). quantum efficiency 62% (if it has micro lenses), frame rate 1 Hz. lateral overflow drain to prevent blooming protects against 800x (factor increases linearly with exposure time) saturation exposure (32e6 electrons per pixel at 4ms exposure time), microlens has +/- 20 degree acceptance angle i guess it would be good for astrophotography 4 Reply @txm100 1 day ago (edited) Babe wake up a new mikeselectricstuff has dropped! 9 Reply @vincei4252 1 day ago That looks like a finger-lakes filter wheel, however, for astronomy they'd never use such a large stepper. 1 Reply @MRooodddvvv 1 day ago yaaaaay ! more overcomplicated optical stuff ! 4 Reply 1 reply @NoPegs 1 day ago He lives! 11 Reply 1 reply Transcript 0:00 so I've stripped all the bits of the 0:01 optical system so basically we've got 0:03 the uh the camera 0:05 itself which is mounted on this uh very 0:09 complex 0:10 adjustment thing which obviously to set 0:13 you the various tilt and uh alignment 0:15 stuff then there's two of these massive 0:18 lenses I've taken one of these apart I 0:20 think there's something like about eight 0:22 or nine Optical elements in here these 0:25 don't seem to do a great deal in terms 0:26 of electr magnification they're obiously 0:28 just about getting the image to where it 0:29 uh where it needs to be just so that 0:33 goes like that then this Optical block I 0:36 originally thought this was made of some 0:37 s crazy heavy material but it's just 0:39 really the sum of all these Optical bits 0:41 are just ridiculously heavy those lenses 0:43 are about 4 kilos each and then there's 0:45 this very heavy very solid um piece that 0:47 goes in the middle and this is so this 0:49 is the filter wheel assembly with a 0:51 hilariously oversized steper 0:53 motor driving this wheel with these very 0:57 large narrow band filters so we've got 1:00 various different shades of uh 1:03 filters there five Al together that 1:06 one's actually just showing up a silver 1:07 that's actually a a red but fairly low 1:10 transmission orangey red blue green 1:15 there's an excess cover on this side so 1:16 the filters can be accessed and changed 1:19 without taking anything else apart even 1:21 this is like ridiculous it's like solid 1:23 aluminium this is just basically a cover 1:25 the actual wavelengths of these are um 1:27 488 525 570 630 and 700 NM not sure what 1:32 the suffix on that perhaps that's the uh 1:34 the width of the spectral line say these 1:37 are very narrow band filters most of 1:39 them are you very little light through 1:41 so it's still very tight narrow band to 1:43 match the um fluoresence of the dies 1:45 they're using in the biochemical process 1:48 and obviously to reject the light that's 1:49 being fired at it from that Illuminator 1:51 box and then there's a there's a second 1:53 one of these lenses then the actual sort 1:55 of samples below that so uh very serious 1:58 amount of very uh chunky heavy Optics 2:01 okay let's take a look at this light 2:02 source made by company Lumen Dynamics 2:04 who are now part of 2:06 excelitas self-contained unit power 2:08 connector USB and this which one of the 2:11 Cable Bundle said was a TTL interface 2:14 USB wasn't used in uh the fluid 2:17 application output here and I think this 2:19 is an input for um light feedback I 2:21 don't if it's regulated or just a measur 2:23 measurement facility and the uh fiber 2:27 assembly 2:29 
Square Inlet there and then there's two 2:32 outputs which have uh lens assemblies 2:35 and this small one which goes back into 2:37 that small Port just Loops out of here 2:40 straight back in So on this side we've 2:42 got the electronics which look pretty 2:44 straightforward we've got a bit of power 2:45 supply stuff over here and we've got 2:48 separate drivers for each wavelength now 2:50 interesting this is clearly been very 2:52 specifically made for this application 2:54 you I was half expecting like say some 2:56 generic drivers that could be used for a 2:58 number of different things but actually 3:00 literally specified the exact wavelength 3:02 on the PCB there is provision here for 3:04 385 NM which isn't populated but this is 3:07 clearly been designed very specifically 3:09 so these four drivers look the same but 3:10 then there's two higher power ones for 3:12 575 and 3:14 520 a slightly bigger heat sink on this 3:16 575 section there a p 24 which is 3:20 providing USB interface USB isolator the 3:23 USB interface just presents as a comport 3:26 I did have a quick look but I didn't 3:27 actually get anything sensible um I did 3:29 dump the Pi code out and there's a few 3:31 you a few sort of commands that you 3:32 could see in text but I didn't actually 3:34 manage to get it working properly I 3:36 found some software for related version 3:38 but it didn't seem to want to talk to it 3:39 but um I say that wasn't used for the 3:41 original application it might be quite 3:42 interesting to get try and get the Run 3:44 hours count out of it and the TTL 3:46 interface looks fairly straightforward 3:48 we've got positions for six opto 3:50 isolators but only five five are 3:52 installed so that corresponds with the 3:54 unused thing so I think this hopefully 3:56 should be as simple as just providing a 3:57 ttrl signal for each color to uh enable 4:00 it a big heat sink here which is there I 4:03 think there's like a big S of metal 4:04 plate through the middle of this that 4:05 all the leads are mounted on the other 4:07 side so this is heat sinking it with a 4:09 air flow from a uh just a fan in here 4:13 obviously don't have the air flow 4:14 anywhere near the Optics so conduction 4:17 cool through to this plate that's then 4:18 uh air cooled got some pots which are 4:21 presumably power 4:22 adjustments okay let's take a look at 4:24 the other side which is uh much more 4:27 interesting see we've got some uh very 4:31 uh neatly Twisted cable assemblies there 4:35 a bunch of leads so we've got one here 4:37 475 up here 430 NM 630 575 and 520 4:44 filters and dcro mirrors a quick way to 4:48 see what's white is if we just shine 4:49 some white light through 4:51 here not sure how it is is to see on the 4:54 camera but shining white light we do 4:55 actually get a bit of red a bit of blue 4:57 some yellow here so the obstacle path 5:00 575 it goes sort of here bounces off 5:03 this mirror and goes out the 520 goes 5:07 sort of down here across here and up 5:09 there 630 goes basically straight 5:13 through 5:15 430 goes across there down there along 5:17 there and the 475 goes down here and 5:20 left this is the light sensing thing 5:22 think here there's just a um I think 5:24 there a photo diode or other sensor 5:26 haven't actually taken that off and 5:28 everything's fixed down to this chunk of 5:31 aluminium which acts as the heat 5:32 spreader that then conducts the heat to 5:33 the back side for the heat 5:35 sink and the actual lead packages all 5:38 look fairly 
similar except for this one 5:41 on the 575 which looks quite a bit more 5:44 substantial big spay 5:46 Terminals and the interface for this 5:48 turned out to be extremely simple it's 5:50 literally a 5V TTL level to enable each 5:54 color doesn't seem to be any tensity 5:56 control but there are some additional 5:58 pins on that connector that weren't used 5:59 in the through time thing so maybe 6:01 there's some extra lines that control 6:02 that I couldn't find any data on this uh 6:05 unit and the um their current product 6:07 range is quite significantly different 6:09 so we've got the uh blue these 6:13 might may well be saturating the camera 6:16 so they might look a bit weird so that's 6:17 the 430 6:18 blue the 575 6:24 yellow uh 6:26 475 light blue 6:29 the uh 520 6:31 green and the uh 630 red now one 6:36 interesting thing I noticed for the 6:39 575 it's actually it's actually using a 6:42 white lead and then filtering it rather 6:44 than using all the other ones are using 6:46 leads which are the fundamental colors 6:47 but uh this is actually doing white and 6:50 it's a combination of this filter and 6:52 the dichroic mirrors that are turning to 6:55 Yellow if we take the filter out and a 6:57 lot of the a lot of the um blue content 7:00 is going this way the red is going 7:02 straight through these two mirrors so 7:05 this is clearly not reflecting much of 7:08 that so we end up with the yellow coming 7:10 out of uh out of there which is a fairly 7:14 light yellow color which you don't 7:16 really see from high intensity leads so 7:19 that's clearly why they've used the 7:20 white to uh do this power consumption of 7:23 the white is pretty high so going up to 7:25 about 2 and 1 half amps on that color 7:27 whereas most of the other colors are 7:28 only drawing half an amp or so at 24 7:30 volts the uh the green is up to about 7:32 1.2 but say this thing is uh much 7:35 brighter and if you actually run all the 7:38 colors at the same time you get a fairly 7:41 reasonable um looking white coming out 7:43 of it and one thing you might just be 7:45 out to notice is there is some sort 7:46 color banding around here that's not 7:49 getting uh everything s completely 7:51 concentric and I think that's where this 7:53 fiber optic thing comes 7:58 in I'll 8:00 get a couple of Fairly accurately shaped 8:04 very sort of uniform color and looking 8:06 at What's um inside here we've basically 8:09 just got this Square Rod so this is 8:12 clearly yeah the lights just bouncing 8:13 off all the all the various sides to um 8:16 get a nice uniform illumination uh this 8:19 back bit looks like it's all potted so 8:21 nothing I really do to get in there I 8:24 think this is fiber so I have come 8:26 across um cables like this which are 8:27 liquid fill but just looking through the 8:30 end of this it's probably a bit hard to 8:31 see it does look like there fiber ends 8:34 going going on there and so there's this 8:36 feedback thing which is just obviously 8:39 compensating for the any light losses 8:41 through here to get an accurate 8:43 representation of uh the light that's 8:45 been launched out of these two 8:47 fibers and you see uh 8:49 these have got this sort of trapezium 8:54 shape light guides again it's like a 8:56 sort of acrylic or glass light guide 9:00 guess projected just to make the right 9:03 rectangular 9:04 shape and look at this Center assembly 9:07 um the light output doesn't uh change 9:10 whether you feed this in or not so it's 9:11 clear not doing any internal Clos 
Loop 9:14 control obviously there may well be some 9:16 facility for it to do that but it's not 9:17 being used in this 9:19 application and so this output just 9:21 produces a voltage on the uh outle 9:24 connector proportional to the amount of 9:26 light that's present so there's a little 9:28 diffuser in the back there 9:30 and then there's just some kind of uh 9:33 Optical sensor looks like a 9:35 chip looking at the lead it's a very 9:37 small package on the PCB with this lens 9:40 assembly over the top and these look 9:43 like they're actually on a copper 9:44 Metalized PCB for maximum thermal 9:47 performance and yeah it's a very small 9:49 package looks like it's a ceramic 9:51 package and there's a thermister there 9:53 for temperature monitoring this is the 9:56 475 blue one this is the 520 need to 9:59 Green which is uh rather different OB 10:02 it's a much bigger D with lots of bond 10:04 wise but also this looks like it's using 10:05 a phosphor if I shine a blue light at it 10:08 lights up green so this is actually a 10:10 phosphor conversion green lead which 10:12 I've I've come across before they want 10:15 that specific wavelength so they may be 10:17 easier to tune a phosphor than tune the 10:20 um semiconductor material to get the uh 10:23 right right wavelength from the lead 10:24 directly uh red 630 similar size to the 10:28 blue one or does seem to have a uh a 10:31 lens on top of it there is a sort of red 10:33 coloring to 10:35 the die but that doesn't appear to be 10:38 fluorescent as far as I can 10:39 tell and the white one again a little 10:41 bit different sort of much higher 10:43 current 10:46 connectors a makeer name on that 10:48 connector flot light not sure if that's 10:52 the connector or the lead 10:54 itself and obviously with the phosphor 10:56 and I'd imagine that phosphor may well 10:58 be tuned to get the maximum to the uh 5 11:01 cenm and actually this white one looks 11:04 like a St fairly standard product I just 11:06 found it in Mouse made by luminous 11:09 devices in fact actually I think all 11:11 these are based on various luminous 11:13 devices modules and they're you take 11:17 looks like they taking the nearest 11:18 wavelength and then just using these 11:19 filters to clean it up to get a precise 11:22 uh spectral line out of it so quite a 11:25 nice neat and um extreme 11:30 bright light source uh sure I've got any 11:33 particular use for it so I think this 11:35 might end up on 11:36 eBay but uh very pretty to look out and 11:40 without the uh risk of burning your eyes 11:43 out like you do with lasers so I thought 11:45 it would be interesting to try and 11:46 figure out the runtime of this things 11:48 like this we usually keep some sort 11:49 record of runtime cuz leads degrade over 11:51 time I couldn't get any software to work 11:52 through the USB face but then had a 11:54 thought probably going to be writing the 11:55 runtime periodically to the e s prom so 11:58 I just just scope up that and noticed it 12:00 was doing right every 5 minutes so I 12:02 just ran it for a while periodically 12:04 reading the E squ I just held the pick 12:05 in in reset and um put clip over to read 12:07 the square prom and found it was writing 12:10 one location per color every 5 minutes 12:12 so if one color was on it would write 12:14 that location every 5 minutes and just 12:16 increment it by one so after doing a few 12:18 tests with different colors of different 12:19 time periods it looked extremely 12:21 straightforward it's like a four 
bite 12:22 count for each color looking at the 12:24 original data that was in it all the 12:26 colors apart from Green were reading 12:28 zero and the green was reading four 12:30 indicating a total 20 minutes run time 12:32 ever if it was turned on run for a short 12:34 time then turned off that might not have 12:36 been counted but even so indicates this 12:37 thing wasn't used a great deal the whole 12:40 s process of doing a run can be several 12:42 hours but it'll only be doing probably 12:43 the Imaging at the end of that so you 12:46 wouldn't expect to be running for a long 12:47 time but say a single color for 20 12:50 minutes over its whole lifetime does 12:52 seem a little bit on the low side okay 12:55 let's look at the camera un fortunately 12:57 I managed to not record any sound when I 12:58 did this it's also a couple of months 13:00 ago so there's going to be a few details 13:02 that I've forgotten so I'm just going to 13:04 dub this over the original footage so um 13:07 take the lid off see this massive great 13:10 heat sink so this is a pel cool camera 13:12 we've got this blower fan producing a 13:14 fair amount of air flow through 13:16 it the connector here there's the ccds 13:19 mounted on the board on the 13:24 right this unplugs so we've got a bit of 13:27 power supply stuff on here 13:29 USB interface I think that's the Cyprus 13:32 microcontroller High speeded USB 13:34 interface there's a zyink spon fpga some 13:40 RAM and there's a couple of ATD 13:42 converters can't quite read what those 13:45 those are but anal 13:47 devices um little bit of bodgery around 13:51 here extra decoupling obviously they 13:53 have having some noise issues this is 13:55 around the ram chip quite a lot of extra 13:57 capacitors been added there 13:59 uh there's a couple of amplifiers prior 14:01 to the HD converter buffers or Andor 14:05 amplifiers taking the CCD 14:08 signal um bit more power spy stuff here 14:11 this is probably all to do with 14:12 generating the various CCD bias voltages 14:14 they uh need quite a lot of exotic 14:18 voltages next board down is just a 14:20 shield and an interconnect 14:24 boardly shielding the power supply stuff 14:26 from some the more sensitive an log 14:28 stuff 14:31 and this is the bottom board which is 14:32 just all power supply 14:34 stuff as you can see tons of capacitors 14:37 or Transformer in 14:42 there and this is the CCD which is a uh 14:47 very impressive thing this is a kf50 100 14:50 originally by true sense then codec 14:53 there ON 14:54 Semiconductor it's 50 megapixels uh the 14:58 only price I could find was this one 15:00 5,000 bucks and the architecture you can 15:03 see there actually two separate halves 15:04 which explains the Dual AZ converters 15:06 and two amplifiers it's literally split 15:08 down the middle and duplicated so it's 15:10 outputting two streams in parallel just 15:13 to keep the bandwidth sensible and it's 15:15 got this amazing um diffraction effects 15:18 it's got micro lenses over the pixel so 15:20 there's there's a bit more Optics going 15:22 on than on a normal 15:25 sensor few more bodges on the CCD board 15:28 including this wire which isn't really 15:29 tacked down very well which is a bit uh 15:32 bit of a mess quite a few bits around 15:34 this board where they've uh tacked 15:36 various bits on which is not super 15:38 impressive looks like CCD drivers on the 15:40 left with those 3 ohm um damping 15:43 resistors on the 15:47 output get a few more little bodges 15:50 around here some of 
15:52 the and there's this separator the 15:54 silica gel to keep the moisture down but 15:56 there's this separator that actually 15:58 appears to be cut from piece of 15:59 antistatic 16:04 bag and this sort of thermal block on 16:06 top of this stack of three pel Cola 16:12 modules so as with any Stacks they get 16:16 um larger as they go back towards the 16:18 heat sink because each P's got to not 16:20 only take the heat from the previous but 16:21 also the waste heat which is quite 16:27 significant you see a little temperature 16:29 sensor here that copper block which 16:32 makes contact with the back of the 16:37 CCD and this's the back of the 16:40 pelas this then contacts the heat sink 16:44 on the uh rear there a few thermal pads 16:46 as well for some of the other power 16:47 components on this 16:51 PCB okay I've connected this uh camera 16:54 up I found some drivers on the disc that 16:56 seem to work under Windows 7 couldn't 16:58 get to install under Windows 11 though 17:01 um in the absence of any sort of lens or 17:03 being bothered to the proper amount I've 17:04 just put some f over it and put a little 17:06 pin in there to make a pinhole lens and 17:08 software gives a few options I'm not 17:11 entirely sure what all these are there's 17:12 obviously a clock frequency 22 MHz low 17:15 gain and with PFG no idea what that is 17:19 something something game programmable 17:20 Something game perhaps ver exposure 17:23 types I think focus is just like a 17:25 continuous grab until you tell it to 17:27 stop not entirely sure all these options 17:30 are obviously exposure time uh triggers 17:33 there ex external hardware trigger inut 17:35 you just trigger using a um thing on 17:37 screen so the resolution is 8176 by 17:40 6132 and you can actually bin those 17:42 where you combine multiple pixels to get 17:46 increased gain at the expense of lower 17:48 resolution down this is a 10sec exposure 17:51 obviously of the pin hole it's very uh 17:53 intensitive so we just stand still now 17:56 downloading it there's the uh exposure 17:59 so when it's 18:01 um there's a little status thing down 18:03 here so that tells you the um exposure 18:07 [Applause] 18:09 time it's this is just it 18:15 downloading um it is quite I'm seeing 18:18 quite a lot like smearing I think that I 18:20 don't know whether that's just due to 18:21 pixels overloading or something else I 18:24 mean yeah it's not it's not um out of 18:26 the question that there's something not 18:27 totally right about this camera 18:28 certainly was bodge wise on there um I 18:31 don't I'd imagine a camera like this 18:32 it's got a fairly narrow range of 18:34 intensities that it's happy with I'm not 18:36 going to spend a great deal of time on 18:38 this if you're interested in this camera 18:40 maybe for astronomy or something and 18:42 happy to sort of take the risk of it may 18:44 not be uh perfect I'll um I think I'll 18:47 stick this on eBay along with the 18:48 Illuminator I'll put a link down in the 18:50 description to the listing take your 18:52 chances to grab a bargain so for example 18:54 here we see this vertical streaking so 18:56 I'm not sure how normal that is this is 18:58 on fairly bright scene looking out the 19:02 window if I cut the exposure time down 19:04 on that it's now 1 second 19:07 exposure again most of the image 19:09 disappears again this is looks like it's 19:11 possibly over still overloading here go 19:14 that go down to say say quarter a 19:16 second so again I think there might be 19:19 
some Auto gain control going on here um 19:21 this is with the PFG option let's try 19:23 turning that off and see what 19:25 happens so I'm not sure this is actually 19:27 more streaking or which just it's 19:29 cranked up the gain all the dis display 19:31 gray scale to show what um you know the 19:33 range of things that it's captured 19:36 there's one of one of 12 things in the 19:38 software there's um you can see of you 19:40 can't seem to read out the temperature 19:42 of the pelta cooler but you can set the 19:44 temperature and if you said it's a 19:46 different temperature you see the power 19:48 consumption jump up running the cooler 19:50 to get the temperature you requested but 19:52 I can't see anything anywhere that tells 19:54 you whether the cool is at the at the 19:56 temperature other than the power 19:57 consumption going down and there's no 19:59 temperature read out 20:03 here and just some yeah this is just 20:05 sort of very basic software I'm sure 20:07 there's like an API for more 20:09 sophisticated 20:10 applications but so if you know anything 20:12 more about these cameras please um stick 20:14 in the 20:15 comments um incidentally when I was 20:18 editing I didn't notice there was a bent 20:19 pin on the um CCD but I did fix that 20:22 before doing these tests and also 20:24 reactivated the um silica gel desicant 20:26 cuz I noticed it was uh I was getting 20:28 bit of condensation on the window but um 20:31 yeah so a couple of uh interesting but 20:34 maybe not particularly uh useful pieces 20:37 of Kit except for someone that's got a 20:38 very specific use so um I'll stick a 20:42 I'll stick these on eBay put a link in 20:44 the description and say hopefully 20:45 someone could actually make some uh good 20:47 use of these things Example Output: **Abstract:** This video presents Part 2 of a teardown focusing on the optical components of a Fluidigm Polaris biotechnology instrument, specifically the multi-wavelength illuminator and the high-resolution CCD camera. The Lumen Dynamics illuminator unit is examined in detail, revealing its construction using multiple high-power LEDs (430nm, 475nm, 520nm, 575nm, 630nm) combined via dichroic mirrors and filters. A square fiber optic rod is used to homogenize the light. A notable finding is the use of a phosphor-converted white LED filtered to achieve the 575nm output. The unit features simple TTL activation for each color, conduction cooling, and internal homogenization optics. Analysis of its EEPROM suggests extremely low operational runtime. The camera module teardown showcases a 50 Megapixel ON Semiconductor KAF-50100 CCD sensor with micro-lenses, cooled by a multi-stage Peltier stack. The control electronics include an FPGA and a USB interface. Significant post-manufacturing modifications ("bodges") are observed on the camera's circuit boards. Basic functional testing using vendor software and a pinhole lens confirms image capture but reveals prominent vertical streaking artifacts, the cause of which remains uncertain (potential overload, readout artifact, or fault). **Exploring the Fluidigm Polaris: A Detailed Look at its High-End Optics and Camera System** * **0:00 High-End Optics:** The system utilizes heavy, high-quality lenses and mirrors for precise imaging, weighing around 4 kilos each. * **0:49 Narrow Band Filters:** A filter wheel with five narrow band filters (488, 525, 570, 630, and 700 nm) ensures accurate fluorescence detection and rejection of excitation light. 
* **2:01 Customizable Illumination:** The Lumen Dynamics light source offers five individually controllable LED wavelengths (430, 475, 520, 575, 630 nm) with varying power outputs. The 575nm yellow LED is uniquely achieved using a white LED with filtering. * **3:45 TTL Control:** The light source is controlled via a simple TTL interface, enabling easy on/off switching for each LED color. * **12:55 Sophisticated Camera:** The system includes a 50-megapixel Kodak KAF-50100 CCD camera with a Peltier cooling system for reduced noise. * **14:54 High-Speed Data Transfer:** The camera features dual analog-to-digital converters to manage the high data throughput of the 50-megapixel sensor, which is effectively two 25-megapixel sensors operating in parallel. * **18:11 Possible Issues:** The video creator noted some potential issues with the camera, including image smearing. * **18:11 Limited Dynamic Range:** The camera's sensor has a limited dynamic range, making it potentially challenging to capture scenes with a wide range of brightness levels. * **11:45 Low Runtime:** Internal data suggests the system has seen minimal usage, with only 20 minutes of recorded runtime for the green LED. * **20:38 Availability on eBay:** Both the illuminator and camera are expected to be listed for sale on eBay. Here is the real transcript. Please summarize it: 00:00:51 Hello everyone, we're about to get started. 00:00:55 Welcome again to Modular headquarters, and thank you so much for joining us for our very first Modular Community Meetup. 00:01:03 We are so excited to see all of you here in person with us tonight, and a special shout-out to everyone tuning in virtually from LinkedIn and YouTube. 00:01:13 So before we get started, here's what we've got planned for you tonight. We'll kick it off with a talk from Chris Lattner, our co-founder and CEO, who will share how Max and Mojo are reshaping GPU development, and our mission to democratize AI compute. 00:01:31 Next we'll hand it off to Jack Clayton, Mojo standard library engineer, for an awesome live demo of GPU programming in Mojo. 00:01:40 We'll have time for a couple of questions after each segment, so start thinking about what you might want to ask. 00:01:46 Then at the end we'll open the floor, and if you're hiring we'll give you the chance to grab the mic and let us know what you're hiring for, so that folks can come and find you during the networking portion of tonight's event. 00:01:58 Afterwards we'll shift gears into networking, mingling, and just hanging out until around 9:00 p.m. or so. 00:02:06 So without further ado, please help me give a very warm welcome to the legendary Chris Lattner. 00:02:21 Awesome, thank you Caroline. Welcome everyone, super excited to see you all; this is an amazing turnout. I'm really glad that you've joined us here, and also on the livestream. 00:02:29 We have never done this before, so prepare yourself: this could be a catastrophic fail, but even if it is, hopefully it's fun. We'll see how it goes, and if it goes well then maybe we'll do it again. 00:02:40 So today, as Caroline said, I'm going to be talking about two things. One is this whole compute thing, right?
00:02:49 So I've been writing this blog post series, and it takes a lot of my time. Is anybody reading this? Has anybody heard of it? All right, that's awesome; sometimes you're shouting into the void and you don't actually know. 00:03:00 So I'll talk a little bit about this in condensed form, but then I'll also share what Modular is doing about it, which the blog post series has conveniently not talked about yet; it's building up to it, and we'll get there. 00:03:14 And then we'll also talk about some things we've not shared with the world yet that are pretty exciting: how we're opening things up. So with that I'll just jump in. Also, a warning: I've never given this talk before, so be nice. 00:03:28 So what are we here to do? Well, we want to rebuild GPU software. AI is super important to me; I've been working on this for years, and AI is so, so important to the world, but all the software is so bad. 00:03:42 And so one of the things the blog series tries to do is dissect this and understand why something so important, with so much money, so much talent, so many people, and so much energy and change and thrash, still has software that is so problematic. 00:03:57 What we as an industry have seen is that with all the different use cases, with ChatGPT, with all of this, there's a huge amount of demand. The applications are endless; the innovations are endless. 00:04:12 And what we've got as a consequence is that all the people building hardware want to solve this problem. There's so much excitement, so many new ideas, so many ways to build a chip, so many chips you can build, because it turns out, yeah, data center is a big deal, but so are cell phones, so is embedded IoT, so are all the different use cases where you might want to use AI. This is technology that can touch everything. 00:04:37 And of course, while we'll say "AI" a million times in this talk, it's actually about heterogeneous compute. AI to me is also just an excuse to tackle this long-standing problem of heterogeneous compute: CPUs and GPUs and ASICs, this entire realm of accelerated compute. 00:04:55 And so what we're doing is we really want to crack this open, because there hasn't been software to do it. 00:05:02 Now, there's a flip side of this: there is a winner in this category, and their contribution is amazing. I mean, AlexNet happened on CUDA, for example. It's amazing that this happened, but we are kind of at an inflection point; we're kind of stuck a little bit as an industry. 00:05:21 So tonight we're going to dig into this. I'll briefly talk about what CUDA is and why nothing else has worked, but the more interesting part is: okay, Modular, you've been working on this thing for years, what does it mean? So we'll talk about that.
00:05:40 So the first thing I wanted to say is that CUDA is an amazing thing, and I have respect for it. I think we as an industry need to give CUDA a lot of its deserved respect for catalyzing AI. 00:05:52 It made TensorFlow and PyTorch possible; a lot of these things happened really because of CUDA. AlexNet, again, catalyzed the deep learning revolution, and tons of stuff has been built on top of CUDA. 00:06:02 But there is the flip side, which is that CUDA is actually not that great. CUDA made sense 18 years ago, or whatever it was, but the world has moved on and CUDA hasn't. 00:06:15 What CUDA has done is pile up more and more stuff to scale into new use cases, but now it's this rickety moat; it's a moat, but it's also a swamp that is surrounding us. 00:06:27 And you can feel this in the systems we have today. The systems we have, not to pick on any one, are a cobbled-together pile of different technologies that came together over the course of many years. 00:06:40 Many really smart people have worked on these systems; this is not negative about any individual engineer or product manager. But everybody has always looked at their part of the problem: given all the stuff that exists, how do I have impact by making my thing better, given that everything else is somewhat unfortunate but convenient? 00:07:01 And as you do this, as you iterate, as AI changes, as hardware changes, you get complexity. Complexity is the enemy. 00:07:12 Okay, so what else is wrong with CUDA? Obviously it runs on one vendor's hardware. The other funny thing about CUDA is that it only runs on GPUs. Even Nvidia is building CPUs, this whole Grace thing they're very proud of, and CUDA doesn't run on that. Super interesting. 00:07:25 It's the complexity, the closedness, the proprietariness, and it blocks innovation around the broader ecosystem, because if we want things to move fast, we need that one vendor to go do it. 00:07:40 Also, with all due respect to CUDA, it is not very Pythonic. I find it very exciting, very flattering, that Nvidia is now adding tons of Python-y things, trying to fit everything that is Python together with Nvidia-specific stuff, but core CUDA is not Pythonic, and a lot of the ecosystem is not using the technologies they're introducing. So if you want state-of-the-art performance, best TCO, the full power of the hardware, you have to use CUDA C++. 00:08:09 C++ is not very AI-centric and not very Pythonic either, and I think I'm entitled to say this: C++ sucks. It's not awesome, compile times are miserable; if you don't understand that, ask me offline and I can explain. 00:08:27 And this whole stack just requires deep, intricate knowledge of all these things, which are poorly documented.
00:08:34 So what does this mean if you're actually building systems? It means that cuDNN, for example, is a massive library ecosystem. You put all this stuff together, all of which is kind of cobbled together, and you get these big containers. The containers vary, but 15 gigs is not unusual; sometimes you get 50 gigs. This packaging and dependency system just becomes a mess. 00:08:56 And it's not just me that sees these problems. We just picked a random collection of the pain people are suffering out there: getting CUDA versions to line up with PyTorch is a nightmare. If you want to do something on AMD, well, you don't have CUDA, so you have another nightmare, or at least a different set of problems. 00:09:15 On the one hand the demo looks really appealing: you can follow the little tutorial, and if you stay on the happy path you get to success. But then you try something slightly off the path and everything breaks, the magic is gone, and now you have to go debug everything in the software stack and figure out what assumption at what layer didn't compose with something else. And now you're stuck, because the beautiful demo never failed. So this is AI development today.
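To make that version-mismatch pain concrete, here is a minimal sanity check (my illustration, not something shown in the talk) for the classic "does my PyTorch wheel match my CUDA driver" problem; it assumes only that PyTorch is installed:

```python
# Minimal check for the "getting CUDA versions to line up with PyTorch"
# nightmare described above. Illustrative sketch, not Modular's code.
import torch

print("PyTorch version:    ", torch.__version__)
print("Built against CUDA: ", torch.version.cuda)      # None on CPU-only wheels
print("GPU usable:         ", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device 0:", torch.cuda.get_device_name(0))
else:
    # Common failure mode: the wheel's CUDA build and the installed driver
    # don't match, so the GPU silently disappears from PyTorch's view.
    print("No usable GPU; check that the driver supports this wheel's CUDA build.")
```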
00:09:41 So there are different theories on how to fix this. One approach, which lots of people take, for example NIMs, says: okay, all this software, all this ecosystem, is such a nightmare that nobody can get it to work, but there are very specific use cases. If you want Llama 3, we can go tackle all this complexity, get Llama 3 in this one config to be super optimal and perfect, and put it in a box. And that works. 00:10:08 But the problem with that is you've destroyed a lot of the value of AI. A lot of the value of AI is that it's not about one thing; there's a wide variety of use cases. You might care about latency, not just throughput; you might care about a slightly different model; you might want to put in a LoRA adapter or something. A lot of the power and magic of AI is about the innovation, the research, the velocity with which people can move. 00:10:33 So I find this make-everything-fixed-function solution fairly dystopian; it destroys a lot of what I believe in in terms of innovation and velocity and progress, the real meaning of what has made AI successful. 00:10:45 So given all these problems, why hasn't anything worked? Well, the blog post series goes into this. First you have to understand why CUDA succeeded. Different people look at different technologies in different ways; what I try to do is recognize that it's not an accident that things become successful, and go understand what happened. 00:11:12 CUDA succeeded for a number of reasons, including luck, but not just luck. They built an amazing gaming ecosystem; they built on their install base; they had PCs with laptop GPUs and things like that, so it enabled people to learn how to program CUDA on inexpensive devices and then scale up. That's actually a really big deal, and it's something you lose out on if, for example, you're building only a data center product line. 00:11:40 Of course they invested early, even when it was almost fiscally irresponsible to do so. The GeForce 3, I'm super impressed by: it was a huge bet for the company on programmability. They made really big bets before it was obvious, which I have tons of respect for. 00:11:56 But then what they did is say, as AI is evolving, as TensorFlow and PyTorch are evolving, apparently in good faith: go make it go fast on our hardware. The consequence is that all these frameworks were co-designed against and locked to their hardware, to CUDA, and therefore it was very difficult for anything else to come in. PyTorch and TensorFlow both, you know, accidentally knighted CUDA. 00:12:23 So DeepSeek was a big deal, and that was the excuse for starting the blog post series. To me, DeepSeek was just another day in the neighborhood of AI: you wake up and now things go 10x faster than they used to, because of new research. But I think DeepSeek was more than that for the world. Both the geopolitical ramifications and the technological ramifications caused a huge wake-up call, and a lot of people started talking about compute in a really new way. 00:12:52 One of the things that was super interesting to me is that they got a lot of attention by, air quotes, "going down to PTX." This is something that, if you're a GPU programmer, you kind of have to routinely do, because tensor cores require you to. And I think the world didn't realize just how problematic the CUDA ecosystem is: it's not just DeepSeek that does this, lots of people have to, and this is a bug, not a feature. 00:13:19 What I believe to be true, and others as well, is that efficiency gains like what DeepSeek brought lead to increased consumption: more use cases, more demand, AI going to more places. As the cost curve changes, as you make things less expensive, suddenly you can apply that technology everywhere; as Satya pointed out, this is Jevons paradox, which is a very interesting way to look at it. 00:13:43 But the problem is, if you go look at DeepSeek (and again, I'm super impressed with their work; they've been opening up a lot, publishing a lot, a massive contribution to the AI ecosystem), they are so advanced. They, as well as some of the top research labs, are doing things at such an advanced and incredible level. But you have to be at that incredible level, where you can afford to pay hundreds of researchers and do this kind of work, to be able to unlock the power of the silicon.
00:14:11 And to me, on the one hand it's amazing what can be done, but it's also, again, kind of dystopian if only the top five research groups in the industry can afford to do it. I don't think that's a path to success for us as an industry. So this is where "democratizing" has many different meanings, but let's get more people able to participate. 00:14:30 Now, there have been lots of attempts to solve this problem; I'm going to go through them at lightning speed. OpenCL is the one that should have won. It was the industry standard, and it's a great technology; I worked on it much earlier in my career. But the problem is it turned into a committee project. It moved at committee speed, and things like tensor cores exposed a lot of problems that prevented it from actually scaling into the AI ecosystem. 00:14:58 In contrast, look at CUDA: CUDA had a strong leader who said, you know what, I'm going to solve the most important use cases, in their opinion, and they could drive it much more strongly in a specific direction, which you may or may not agree with, but it showed a lot of velocity. I think this is the problem I see with a lot of community or standards-body type approaches. 00:15:20 What about compilers? Hey, I'm a fan of compilers; does anybody else like compilers here? Compilers are good. So, TVM is, I think, fading away now. XLA is something I happened to work on, so I know a lot more about it. AI compilers are a thing because of generalization: kernels don't generalize. In AI you have many different kernels and you want to fuse them, and if you have to write all permutations of all fused kernels, you quickly run out of kernel engineers who can do the work: tuning for all permutations of kernels, then all generations of hardware, then all the dtypes, then all the different use cases. You just can't scale. So compilers are awesome (go compilers!), and XLA solved this by making it possible to synthesize a novel kernel in an automated way using compiler techniques. 00:16:14 Now, the problem with this is twofold. One is that many of these projects never could deliver the peak performance of a kernel author, so a lot of them ended up using the CUDA libraries for matrix multiplication, for example, and just did kernel fusion for the easy cases.
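As a concrete picture of what "kernel fusion" buys, here is a sketch of the concept in NumPy (mine, not XLA's or Modular's actual machinery): unfused, each op makes a full pass over a materialized intermediate, while a fused kernel applies the epilogue before the result ever leaves fast memory.

```python
# Conceptual sketch of kernel fusion, using a matmul + ReLU epilogue.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 512)).astype(np.float32)
B = rng.standard_normal((512, 512)).astype(np.float32)

# Unfused: matmul materializes C, then ReLU makes a second full pass over it.
C = A @ B
out_unfused = np.maximum(C, 0.0)

# "Fused" (approximated here with an in-place op): a real fused GPU kernel
# would apply max(x, 0) in registers before writing the matmul result to
# memory, saving an entire round trip through DRAM.
out_fused = A @ B
np.maximum(out_fused, 0.0, out=out_fused)

assert np.allclose(out_unfused, out_fused)
```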
00:16:29 That's one problem, but there's a much bigger one. The much bigger problem (and compiler folk, I love you) is that we make fun of the kernel folk because they don't scale, but it turns out compiler folk don't scale either. If you need a compiler engineer in the loop to innovate in algorithms, you quickly run out of compiler engineers. 00:16:49 Or you run into the problem where you may have an amazingly smart compiler, but the compiler engineer doesn't hold the cross product of the compiler algorithms, the hardware, the dtypes, the crazy new research paper that just got published, and the hack in the hardware that lets you cut three cycles out of the inner loop of the thingy, all in one brain. What I learned from XLA is that it doesn't scale: when I was there, there were like two or three humans who could do some of this work, and it was a huge bottleneck for TPUs. It was a huge problem. 00:17:22 Now, these technologies also have a different problem, and that problem is called GenAI. A lot of these technologies were designed for traditional AI, and they made sense in that context. But when GenAI came along, it really changed things: it wasn't about putting a ReLU onto a matrix multiplication anymore, it was about flash attention. When these things happen, when you get MLA and these much more advanced algorithms, suddenly the world gets so much more complicated that this category of compiler approaches just didn't scale. 00:17:55 So again we found ourselves as an industry writing all the kernels, and when you're writing all the kernels, typically those kernels got written in CUDA, because that's where all the installed hardware was. And then, again, everything got locked onto one vendor's platform. 00:18:08 What about Triton? I'm a big fan of Triton; people like Triton, and Triton's a good thing too. Triton is one of these things that said: okay, CUDA C++ is not that great, it's really hard to use, it's very low level. Phil and the other Triton folks did a really good job of up-leveling the programming paradigm: let's get the world into Python, let's get people off of individual threads and start thinking about a much higher-level, tile-based programming model.
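To show what that tile-based model looks like in practice, here is essentially Triton's own introductory vector-add kernel (reproduced from memory, so treat the details as approximate; it needs an NVIDIA GPU with `torch` and `triton` installed). Note how you program one block of elements at a time rather than individual threads:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance owns one BLOCK_SIZE tile, not one thread/element.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard the ragged final tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)       # one program per tile
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```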
00:18:36 And I think this is a huge step forward. It has been very popular and very successful in the industry, and it really took hold when PyTorch 2 came out with Inductor and Dynamo, a technology stack that really endorsed Triton as the solution for PyTorch. 00:18:52 Now, there are some problems with it. First of all, it's again a fancy compiler, and with compilers you get a lot of productivity, but you also give up (it varies, but somewhere around) 20% of the performance of your chip. 20% of the cost of a GPU is a lot. 00:19:10 And so what I've been told by many folks is that Triton isn't really used for state-of-the-art GenAI inference. It's been very popular for training and similar settings, where iteration velocity and research are really important, and I've heard of many labs that use Triton for research and prototyping, but then have to rewrite in CUDA C++. That's value, that's contribution, it's awesome that they're making a use case better; but it's really deeply unfortunate when it amounts to "I prototyped in Visual Basic and now I have to rewrite in C++," right? 00:19:47 I think Triton also has another hidden secret, which I don't know if everybody in this room will agree with me on; I'm sure some of you work on ASICs. Triton was not designed for ASICs. The programming model, the way the memory and pointer model work, all of this is very GPU-specific. So while it's appealing to think "I can just write this code," the inner secret of Triton is that you don't even get portability across Nvidia generations, and you definitely don't get portability off of Nvidia. 00:20:18 And then you end up in a realm where you can't express the full power of the hardware for your ASIC. If you're building a chip, that's really quite problematic, because giving up more than 20% of performance means your chip is not competitive. It's hard enough to keep up with Nvidia on just flops and features in the hardware; if you're also holding back the hardware's potential with software like this, it becomes an even bigger challenge. 00:20:43 So what I learned from this is really quite important: it is incredible validation that CUDA C++ is too hard, and the world wants something much easier to use. It has taught me a tremendous amount. But the flip side is that we can't give up what makes CUDA great, and that's performance. 00:21:01 What about MLIR? Well, I'm also a fan of MLIR; does anybody else like MLIR here? Is this a good thing? The blog post talks about this, because people always ask me in particular: why didn't MLIR solve this? Isn't it supposed to be the thing that solves all this stuff? And this is where I get very sad, because it's also very personal.
00:21:21 MLIR is amazing. I'm very excited to see how wide it's gone; so many people use it for so many things, and it has been widely successful at enabling a tremendous number of chips. I know some of them, which is amazing; I don't know all of them. 00:21:36 But the problem is it doesn't provide an end-to-end AI stack. There are things out in the ecosystem, there's IREE and many others that have come up, but they haven't really solved this problem. What happened is that a lot of the MLIR ecosystem tried to build a better XLA, which is probably the easiest way to explain it, but XLA got left in the dust by GenAI. 00:22:01 A lot of the technologies, a lot of the dialects, a lot of the stuff in the MLIR ecosystem is really cool, but it's like a Tinkertoy set: some assembly required. So I think MLIR is an amazing thing, and I'm obviously very proud of many of the people who helped make it possible, but it's not solving the CUDA problem; it's enabling people to build systems in this space. 00:22:21 Okay, what about hardware companies? Again, hardware makers are awesome. The challenge with innovating in hardware is that the software teams just don't have a chance; it's so hard to do this work, and everything ends up stacked against you. Not only is Nvidia seeding all of the software ecosystem and making it CUDA-specific, but GenAI is changing so rapidly, all the technology is moving so fast, that there isn't really a stable thing to build into. 00:22:48 It depends on the company, but hardware leadership often doesn't understand how hard it is to build software, because hardware leaders were brought up, say, 10 or 15 years ago, typically building a CPU or a networking product or some other ecosystem. And if you're building a CPU, enabling software means building a GCC or LLVM back end, and then I get Linux, I get web browsers, I get email, I get everything, with a constrained investment in plugging into one thing. 00:23:24 AI doesn't work that way, and this is true for everybody. As a consequence, everybody signs up for a problem where it seems easy to get one thing to work, but when you get to the general case it suddenly gets exponentially more difficult. 00:23:44 So, what if? This is a long way of saying: this is what the world has been dealing with. When we started Modular we knew a lot of this; a lot of the details were different, but a lot of the lessons were already learned, and I had experience with TPUs and many other things. So: what if we could build this? What if we could really crack this? What if we could change the trajectory?
00:24:08 What if we could do something that could be a fundamental contribution and help solve some of these long-standing problems? I think this would be pretty cool; it would be a good thing. 00:24:17 But it's also really hard. We can't expect to just do the same thing that didn't work before and have it work this time. We have to do something quite different; we have to redefine the rules. And this is what the crazy folks at Modular have been doing for years: building many of the different pieces. We've been going through years of research, because pure research is required to build new abstractions, new concepts, new technologies, new syntax, new things that can all come together in a way we think can actually deal with this. 00:24:51 Now, we're still very early. This is not a small problem, it's a big problem, and we have not solved it today. But I am really, really excited: the amount of progress we've made just this quarter, what I know is going to happen next quarter, and through the rest of this year, it's all coming together. So I'm really happy to talk about what our approach is and share it with you, and obviously if you're interested, we'd love to talk. 00:25:14 So what is Modular building? Well, we want to build a new world without CUDA. And I love Nvidia; they're an amazing company and their hardware is amazing. The challenge I see is with CUDA, and what I want is something that can run really well on Nvidia and also run in other places, because that hardware is very important to the world. It's not really Nvidia's fault; it's just what CUDA has done to us. So we want to build something that can scale into this. 00:25:45 How do we do this? This is a big problem, an industry-scale problem, so we broke it down into pieces. First, there's a language aspect. You run squarely into the problem that domain-specific languages are difficult to use: they have bad tooling, bad debugging, all the stuff the Triton blog post talked about. On the other hand, the existing languages, like Rust and Swift and C++, are not really designed for GPU compute; some of them can be used in some ways, but they're not designed for it. 00:26:19 One of the core assumptions we made is that AI is important and will gain in importance, that hardware is going to get more weird, and that we need to be able to scale into all of it. That squarely ran us into the language problem (I tried to resist it, but it did), and then, dot dot dot, Mojo comes out, which I think many of you know. 00:26:39 Mojo is a really cool thing; you'll see some demos later, and I'm very excited about the pace and the progress Mojo has made. What it enables you to do is get the full power of the hardware, have a good development experience, and be able to scale
and run on wide ranges of different devices. 00:26:57 It's designed for this heterogeneous world: to talk to weird tensor cores and all the different goofy things that hardware people keep coming up with, because that's inherent to the problem. So it's not the whole solution, but it's a really important part of the solution. And when we talk about higher levels in the stack: all of the code we use that runs on the GPU is written in Mojo. This is what we mean by not using CUDA. We don't use cuDNN or vendor libraries or CUTLASS or any of that stuff; we've rebuilt that entire ecosystem. 00:27:30 Now take a step out: cool, you can write GPU kernels, but how do you actually use them? Today I'd love for people to write GPU kernels, and you'll learn more about that later, but a lot of people work at the AI layer. They say: hey, I have matrix multiplications and convolutions, I have a transformer, I have a dense layer; I want a module, I want to think about things in terms of AI. 00:27:53 And what we've built is a very tiny (it's not fair to call it an AI framework yet, but it is a very simple) inference-focused AI framework in Python that lets you develop models that are super predictable, with super good control, designed for GenAI, so we can solve these problems and get the full power of the hardware, which I think is really, really important. 00:28:18 This framework, which again is very early, so it's exciting but still coming into existence, is called Max. It has serving components, it has the kernels, it has a bunch of different things in its ecosystem, and it also gives you an auto-fusing graph component and a lot of quite advanced technologies that even the existing systems don't have yet, which dramatically simplifies things. 00:28:41 And designing the abstractions (this is why Modular being a research project for three years was so key) so that they compose: getting this thing simple at the bottom enables us to make a scalable interface going up. 00:28:53 We're just now bringing in even bigger, higher-level pieces, and we've only just started talking about this. Once you've solved for the code that runs on a GPU, and you start building pipelines, and you start doing Max's very fancy KV cache optimizations and all that kind of stuff, suddenly you get to the problem of: hey, I have 200 GPUs, or 2,000 GPUs, how do I manage them? 00:29:18 So our technology there, which we're not going to talk much about today (but ask if you're curious), says: if you're an enterprise with a lot of GPUs and you're running a platform team and need reliability, you want your product team to be able to innovate in models, and the platform team shouldn't have to chase AI research; they should be able to manage the GPUs. And so this is a natural set of concentric goals, where everything builds together and builds on top of each other, and it's actually very simple.
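For readers unfamiliar with the KV cache optimizations mentioned above, here is a toy NumPy sketch of the underlying idea (mine, not MAX's implementation): during autoregressive decoding you append each new token's key and value once and reuse them, instead of recomputing attention inputs for the whole prefix at every step.

```python
# Toy single-head KV cache: append per decoded token, attend over the cache.
import numpy as np

d = 8                               # head dimension
k_cache = np.empty((0, d))          # grows by one row per decoded token
v_cache = np.empty((0, d))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(q, k_new, v_new):
    global k_cache, v_cache
    # A real implementation preallocates (often paged) buffers instead of
    # reallocating; vstack is used here purely for clarity.
    k_cache = np.vstack([k_cache, k_new])
    v_cache = np.vstack([v_cache, v_new])
    scores = softmax(q @ k_cache.T / np.sqrt(d))   # attend over all cached keys
    return scores @ v_cache

rng = np.random.default_rng(0)
for _ in range(5):                  # five decode steps
    out = decode_step(rng.standard_normal(d),
                      rng.standard_normal(d),
                      rng.standard_normal(d))
print(out.shape)                    # (8,): one attended vector per step
```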
00:29:45 So let's dig into each of these; I'll give you the whirlwind tour. Mojo: I assume most of you are familiar with it, and I'm not going to talk too much about it because Jack will later. Mojo, by definition, hopefully you know, is a Pythonic language. It is not Python, but it is designed to be very familiar to people working in Python. 00:30:06 By the way, Mojo-Python interop is coming along really well, so maybe in a month or so there will be some really cool announcements: you'll be able to extend existing Python packages in Mojo, and we'll make that super easy. But our focus has been on GPUs, because that's the core problem we need to solve: we need GPUs to go fast. 00:30:23 There's an advanced type system (for example, Mojo has dependent types), full power of the hardware, full access to MLIR, full access to things like inline assembly and PTX for tensor cores, all of this. And then it interoperates with the existing world. This is the bottom of our stack, but a really, really important component of it. 00:30:44 Max: I think I already talked about this. This is the framework level. It gives you something like vLLM: you get OpenAI-compatible APIs if you're serving, but if you want, you can go down and say, hey, just run a subgraph on a GPU, and you get a very nice GPU programming experience. You can use this from Mojo, but the preferred API is all Python; the reason is simply that lots of people in this space are already used to these technologies in Python. 00:31:09 Mojo is required for getting GPU performance; Python is fine for "I'm building a graph." So we said, let's use Python for what it's good at. 00:31:23 By the way, you might say building a new framework sounds insane, and it is kind of insane. But this is not something where we have, like, ResNet-50 and nothing else; we're now at the point where we have over 500 models and variants in our native stack, running on a wide variety of hardware, which is pretty cool. We'll talk about where you can find these later. 00:31:43 Deployment is also pretty important. We totally believe in meeting people where they are, so we make it super easy to get a Docker container, and we'll give you preconfigured ones. They're super small, which is one simple way to understand what "without CUDA" means. 00:31:58 And these small containers: by the way, it's kind of an open secret, hush hush, don't tell anybody, but the same binary runs on Nvidia and AMD GPUs today. We haven't announced that, so please don't tell the world yet. That's actually a pretty big deal; it's a thing nobody has ever done before. We're only just starting to tell the world this story, but there's more of it coming.
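Since the serving layer exposes OpenAI-compatible APIs, a client can talk to it the same way it talks to any such server. A minimal sketch follows; the URL, port, and model name are placeholders I chose, not values from the talk:

```python
# Generic client for any OpenAI-compatible endpoint (MAX serve, vLLM, ...).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",     # assumed local server
    json={
        "model": "placeholder-model-id",              # assumed model name
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```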
00:32:22 So let's talk about the license. This is something where, I'll just be straight with you, I don't think we've done a great job. I'll tell you the thinking, but then I'll also tell you why I'm very happy it's time for new thinking. 00:32:36 The thinking was: we're building a research project. It turns out it took three years to go from "we have an idea that we could build something great" to "I believe it's working." The "I believe it's working" moment was really this December release that we called 24.6, where we could show state-of-the-art performance on an A100 with one model, and that required the entire tech stack to line up and work. 00:33:05 In those three years of development, the decision we made, which you can agree or disagree with, was: the risk is whether we can achieve our goal at all. So we'd been wavering between "do we want people to adopt this or not," "is it a distraction, is it helping us, or is it holding us back?" And I take full ownership of this; I've made bad decisions. We tried to keep the license weird so adoption didn't grow too fast, because if you grow too fast you get all these people and dependencies, and that's bad while you're still changing things. But we also wanted to be open, and the result was super confusing. So we're fixing that. 00:33:42 We're fixing it for a number of reasons, but the biggest reason is that it's the right time. And I'm happy; I mean, I'm never fully happy, everything can always be better, but I'm getting happier: we're really close. This is where I'm very glad we get to actually talk to people about the license and availability pieces. 00:34:04 Also, and I've said this before, our plan is to open this stuff up over time. In about a week, maybe two, we're about to open up about a quarter million lines of Mojo code, which I think will be pretty interesting. This is all the flash attentions and matrix multiplications and all the high-performance kernels, for at least three GPUs, all open, Apache licensed. Very excited; this will be a big moment. 00:34:49 And I'm so proud of the team, by the way; what they're able to do is just ridiculous. So this will be a big deal, stay tuned. We're also still going to open-source the Mojo compiler. That's coming too; not quite there yet (let's get through this release and see if it crushes the team or not), but we're still on track. I think 2026 may be conservative, but we'll see. 00:35:12 And we have a new license. The license on Max has been super confusing, and before I tell you what we're doing, I'll tell you why it was confusing. Early on, when we first came out of stealth and were starting to talk about what we were doing,
and this is me taking responsibility, 00:35:33 I said: Mojo is a technology, we want it to be free and to take over the world and for people to use it; Max is a commercial product, and that's a different thing. I drew this line between the two because that's how we were thinking about it 18 months ago or so. But what we want is for everybody to use Mojo and Max, and we can't do that if it has a weird license. 00:35:58 So what we've done is define a new license, which is way better: the Max community license, the Modular one. What it means is that you can use both Mojo and Max for whatever you want for non-production, non-commercial use; if you're a researcher, all of that is super cool. 00:36:18 Oh, and by the way, if you're in production, commercial use: you can also use it for anything you want on x86 and Nvidia, which is the only hardware we support right now. For other hardware we'll figure it out (we need to actually support it first), and we may relax this over time; we're going to be a little bit conservative. The only requirement for commercial use is: please tell us, and let us use your logo. It's a small thing to ask. Scale it on 10,000 Nvidia GPUs and do amazing things, that would be awesome; just tell us, so we know, and so we can talk about it and get some credit for it. 00:36:54 This is a big deal. What we're doing, just so you know, is trying to take away any reason anybody would want to use CUDA. 00:37:08 CUDA: proprietary, closed, closed-source kernels; you cannot understand what's happening, and if it breaks you get a weird stack trace. In two weeks, our kernels will be open source: you can look at them, you can change them, you can enhance them, you can tell us our engineers are not as smart as your engineers, or whatever; let's work together to fix things and make them better. 00:37:30 If you want to build a scalable LLM service and compete, fine, you can do that. That would be annoying, but this is the way the world works, and we want to see the tech go far and wide and see what you can build with it. So this, I think, is a really big moment for us. It has always been part of the plan (I don't think we've communicated that well, but it has), and I'm really, really excited we're here. This is a very special moment for me. 00:37:57 If you're interested in using it commercially and you want to scale, we have a very fancy enterprise edition, so please let us know if you have hundreds or thousands of GPUs, or if you want commercial support; yes, you can pay us for commercial support, that's fine. We have other things that go with this, including cluster management and other products like that. But we want this layer of the technology, call it the vLLM level and down, to be free for people to go use
it and go do really cool stuff with. 00:38:27 My core hope, my core belief, is that as we unlock this industry, as we break down these barriers, new research will happen, and I'm really, really excited about that. 00:38:39 Okay, so what does this mean? There are lots of things. Another thing that's about to drop is pip support. This is another ongoing quest in figuring out packaging and how we distribute our stuff. Pip support: I think this command technically works if you know the right URL; you can bug other people about it, because I don't actually understand how pip works. This really matters because it integrates with all your normal developer flows, so it's another big deal. 00:39:09 If you're interested in models, we have this thing called builds.modular.com. This is where you can find hundreds of models and variants, and the cool thing is you can click on each of them and there are like four commands to run to install the stuff and get going. 00:39:26 This is the benefit of building a vertical stack. Again, my number one arch-nemesis is complexity, and by building this thing the right way we can delete a lot of that complexity and make stuff that actually just works. That's a lot of the value of what we're doing here. 00:39:44 There are also recipes and other examples you can follow. The recipes are cool because you can say: hey, show me an example of a RAG solution, or show me how to build an agent using this stuff, and we make it super easy.
And 00:39:57 the cool thing about this is the stuff 00:39:59 plugs together really nicely and it 00:40:00 builds on standards like AI endpoint 00:40:03 standard um and you can get going really 00:40:05 quickly which 00:40:07 is so that is all I wanted to ramble 00:40:10 about i hope that's interesting you 00:40:12 teach a little bit about things that are 00:40:13 coming soon um we have obviously way way 00:40:15 way more stuff coming that'll come over 00:40:17 the next month but I'd love for Jack to 00:40:19 share yeah thank you so much Chris 00:40:31 uh before we let Chris go entirely uh 00:40:31 not getting away yet do I get Diet Coke 00:40:33 you do get a Diet Coke but you also get 00:40:35 some questions if there are any 00:40:37 questions in the audience yeah we'll 00:40:38 start over here 00:40:45 so you mentioned Triton is like 00:40:45 overoptimized for OpenAI's needs um do 00:40:49 you feel like you've made any 00:40:51 assumptions in the development of Mojo 00:40:53 that make it overoptimized for I mean 00:40:56 maybe it's overoptimized for AI and you 00:40:58 know I can't run molecular dynamics 00:41:01 simulations anymore um Mojo right now is 00:41:04 overoptimized 00:41:07 for let me be very honest with you sugar 00:41:09 coat it this is the problem with 00:41:11 building a language or something as 00:41:12 horizontal as Mojo is that um and again 00:41:16 I know this annoys people but um case 00:41:19 has been to build AI stuff it's really 00:41:23 hard and so what is constantly happened 00:41:25 is um there's several very smart mojo 00:41:28 engineers in the room here um uh you 00:41:32 know they're working on like okay I want 00:41:33 to 00:41:34 make lists work amazingly awesome and 00:41:37 fancy like Python and then they get 00:41:38 really pulled into how to split 00:41:40 compilation work again how do I make 00:41:42 compile time go 00:41:43 faster very fancy meta stuff for 00:41:46 tilebased layout okay we need to make 00:41:48 the compiler 10x faster right and so um 00:41:51 as far as I know there is nothing in the 00:41:53 software stack that is AI specific and 00:41:57 so if you want to do biochemistry or oil 00:41:58 and gas or something like this or game 00:42:00 of life like awesome go nuts you can 00:42:02 totally do that but the consequence of 00:42:05 our approach which by the way I think is 00:42:07 the right approach for building a 00:42:08 technology always dog food and build 00:42:10 something yourself is also why I wanted 00:42:11 in research mode so we could iterate 00:42:14 rapidly and be very focused um but the 00:42:16 consequence of that is that we haven't 00:42:18 built out all all of the features that 00:42:20 we want Mojo to have and so what you'll 00:42:22 see across even this year um allegedly 00:42:25 the Mojo team is publishing a road very 00:42:26 soon uh and what you'll see they're now 00:42:29 finally getting time to invest in 00:42:31 broader ecosystem enablement features 00:42:33 for example Mojo Python interrupting 00:42:34 things like this 00:42:42 hi uh thanks for Mojo it's really 00:42:42 awesome i love it so far um 00:42:45 uh when I run it on my Mac obviously it 00:42:48 doesn't have GPU support yet right so I 00:42:52 was trying to dig into how I could 00:42:54 possibly implement that myself i don't I 00:42:56 don't think those tools are available 00:42:58 yet is is that right that's correct so I 00:43:01 can't commit to anything but our hope is 00:43:03 that we'll enable that 00:43:05 summer so and then can help 00:43:14 hi this 
00:43:14 Audience: Hi, this is Mahi. We use Mojo, primarily for the language; I mean, we love it, and it gives great productivity. We're an end-user app, and we run on iOS and Android. Any future plans for edge computing, or iOS or Android? 00:43:30 Chris: This is a bit outside my area. I mean, I have some experience with iOS, but I haven't spent a lot of time on this the last few years. My understanding is that it runs fine on iOS and Macs, and people put it on Raspberry Pis and things like that, but the best place to ask is the Discourse forum. There may be some weird API thing or something that I don't know about. 00:44:06 Audience: Hey, Sean here. I've got a question about interop with native code beyond sys.ffi; in particular, if you want to reach into Rust ecosystem stuff, like Apache Arrow or a bunch of the great work that's been done there, and you want to do a lot of data processing and pull that in. I'm curious where some of that is going. 00:44:23 Chris: So, I don't know of any work specifically around Mojo-Rust interop. That's super cool; I'd love to explore it. Mojo does have good C interop, like you said with the FFI module, and I assume Rust talks to C and has a cdecl type thing; if that's the case, Mojo can talk to it and we can meet in the middle. I don't know specifically how that works, and I also don't know how you'd get the ownership systems to line up optimally, but I would suggest starting with something like a glue generator; that would be an interesting way to just synthesize some bindings.
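The "meet in the middle over the C ABI" idea in that answer is language-agnostic. Here is the same pattern sketched with Python's ctypes on a POSIX system (my illustration, not from the talk): one side exports C-ABI symbols, the other declares and calls them.

```python
# Calling a C-ABI symbol from another language: the pattern behind
# Mojo <-> Rust interop via C, shown here with Python's ctypes and libm.
import ctypes
import ctypes.util

libm_path = ctypes.util.find_library("m")   # the C math library (POSIX assumed)
libm = ctypes.CDLL(libm_path)
libm.cos.argtypes = [ctypes.c_double]       # declare the C signature explicitly
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))                        # 1.0
```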
0:45:11 Jeremy: Jeremy here. You've mentioned a lot about GPUs; what about non-GPU accelerators, exotic dataflow architectures, those sorts of things? How will those play into the Mojo and Modular ecosystem?

Chris: I love all the chips, and that includes things that are way over my planning horizon, like quantum and analog; I'm just such a nerd. FPGAs... there are so many things out there. The way to think about Mojo is that it's useful for anything that has a program counter, so it's not going to help you with an FPGA. Could it? You could theoretically do high-level-synthesis-type stuff; I'm not a fan of that, and if you want, I can talk about it over a beer. But it's really good for anything with a program counter. So if you have a weird ASIC, something with a gigantic systolic array, or a RISC-V core that's controlling a video encoder, it's very useful for that kind of thing. We haven't opened up the ability to add new backends yet; that's something we're interested in doing, but it's not on the next-three-months time horizon.

0:46:25 Audience: You mentioned DeepSeek. I've only found one article that basically said they decoded the binaries and found a bunch of special opcodes that Nvidia only quasi-documents. The DeepSeek programmers obviously had some serious brains, to take PTX and say "this stuff is crap, we're going to generate the real lower-level machine code." So my question is: has anybody decoded all of the kernel binaries to find out which translations Nvidia did were less than optimal? Effectively you're saying these guys were smarter than Nvidia and did a better transformation from PTX to the raw hardware, so Nvidia ought to be able to figure that out themselves and say, "hey, they set a bar that we're missing."

Chris: The lesson I take from DeepSeek is slightly different. What I hear you saying is that the DeepSeek programmers figured out how to use Nvidia's hardware better than Nvidia did. I don't think that's actually what happened. What actually happened is that they invented new algorithms that Nvidia hadn't hardcoded into CUDA yet, and because they invented new algorithms, they had to go down to the same primitives that Nvidia and many other CUTLASS programmers have to use. The problem with the CUDA approach, again, is that Nvidia's way is to find the important use case and then hardcode it into the libraries, and that only works if you're chasing research; it doesn't help if you're doing research. This is where, in just a week or so, you can go pore over more Mojo code than you ever wanted to see (by the way, please train your LLMs on it): you can look and see that we go directly to PTX too, because you have to in order to get the highest performance. So I don't think that part was the novel thing; it was more about...

0:48:22 Audience: [partially inaudible follow-up about the special opcodes and a combined algorithm described in the paper]

Chris: That's beyond my knowledge; I'll own that I don't know the full details of what the DeepSeek team did. But again, there's no magic; my fundamental philosophy is that there's no magic. There is an optimal set of instructions that will make any individual chip go fast, and you could theoretically write everything in assembly; if you care about exactly one point solution, you can do that. But that's not what AI is. AI is an ever-evolving set of use cases and an evolving set of algorithms; there are research innovations happening every day, a new paper every day. And so the challenge...
0:49:16 Chris (continuing): ...to me, the way I define success is making it so you can get the full power of what, for example, a DeepSeek engineer did, but also get the software-engineering benefit, the scalability, and the leverage of compilers, of abstraction, of software engineering, so that you're not having to literally write everything in assembly language. Going back to me making fun of compiler engineers for making fun of kernel engineers: my whole philosophy, and the philosophy behind Mojo, is that kernel engineers are amazing and should be empowered with compiler algorithms. What Mojo does is take much of the algorithmic smarts and put it back in the kernels, and then allow the kernel engineers to have superpowers: they get the power of generalization, the power of metaprogramming, the power of what compilers are great at, and it's in "user space", in the Mojo source code, instead of being hacked and hardcoded into the compiler. I love compilers and I love compiler engineers, obviously, but the way to factor this world is to make it so more people can participate in the space and so their work can scale. I think that's fundamentally the contribution of what we're building.
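As a flavor of what "metaprogramming in user space" means in Mojo, here is a minimal sketch (my illustration, not code from the talk): `use_max` is a compile-time parameter, so the `@parameter if` branch is folded away and each instantiation compiles to a straight-line, specialized function, the kind of choice that would otherwise be hardcoded inside a compiler.

```mojo
fn combine[use_max: Bool](a: Float32, b: Float32) -> Float32:
    # Resolved at compile time: each instantiation of `combine`
    # contains only one of the two paths.
    @parameter
    if use_max:
        return max(a, b)
    return a + b

fn main():
    print(combine[True](3, 4))   # specialized to max -> 4.0
    print(combine[False](3, 4))  # specialized to add -> 7.0
```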
0:50:31 Audience: Can you elaborate on the self-hosted K8s clause in your commercial license?

Chris: Kubernetes, yes. Our general approach is that you can use the vLLM/TRT-LLM-like subset of the stack for free, deploy it yourself on your own Kubernetes setup, scale it, and do whatever you want. We can make that much easier if you want to use our management layer and our cluster solution.

0:51:04 Audience: I have a question about abstraction levels. In your slides you mentioned OpenAI's Triton library, and you said it's not as fast as CUDA. But Triton, at least the Triton I wrote, is quite low-level; in a way it's more low-level than CUDA. First of all, I was surprised there's a gap, so that's something I need to learn more about. But second, and excuse me if you already explained this: do you think Mojo is a higher abstraction level than OpenAI Triton or a lower one? And if it's higher, how can it be faster?

Chris: Great question, and this is the secret. I'll try to explain it in a slightly different way; I will write more blog posts, time permitting, though it's turning into an epic book at this point. The problem with the Triton approach (it's not just Triton; there are many of these systems and tons of DSLs) is that it says: let me give you a programming model, the tile-based programming model Triton has; let me provide a very fancy compiler that does loop pipelining or whatever set of transformations; and let me expose that to you as a programmer to make a common case simpler. Triton is really nice if you're writing attention blocks; the vast majority of what Triton gets used for is writing attention blocks. But there's a problem: the compiler goes and does stuff, and sometimes it does the right thing and sometimes it doesn't; sometimes performance is good, sometimes it isn't; maybe you're an expert, I don't know. The 20% number came from Phil Tillet, the author of Triton, so I wasn't trying to make an inflammatory claim; I was just repeating what he has said publicly. Mojo has a fundamentally different approach. Mojo the language is lower-level: you can write thread_idx.x, and you can go all the way down; if you're building an ASIC you can write assembly against whatever your tensor core is; you get the full power of the hardware. So it's a low-level language, but it has a very powerful metaprogramming system, and that system lets you build, for example, tile-based algorithms in Mojo code, so they're not hardcoded into the compiler. The thing we're open-sourcing includes not only all the kernels but all of our abstractions for building kernels. We have a thing called LayoutTensor; LayoutTensor is roughly the API used to talk to tensor cores, and it talks to multiple tensor cores that have different layouts, has an index algebra, and a whole very fancy set of abstractions. They're all built in Mojo, so they're libraries, which means that if you want to, you can totally use them. They're very nice; I think they should be better, and we'd love help. They're getting there; they're not done, they're not perfect, but they're close enough for us to publish. And if you don't like them, ignore them, throw them away: it's a library. This means you can write a ray tracer, you can write bioinformatics, you can write whatever, not just transformer blocks; you can write anything you want with the full power of a GPU, and if you like the abstractions for tile-based programming, you can totally use them. You get high expressivity, a lot of power, and portability at a coarser grain. That's the secret: taking it out of the compiler (tiles are not in our compiler at all) and putting it in the library.
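To illustrate how a tile-based model can live entirely in a library, here is a deliberately tiny, hypothetical sketch; this is not Modular's LayoutTensor, which is far richer. The tile shape is a compile-time parameter of an ordinary Mojo struct, so changing the tiling strategy is a library edit, not a compiler change.

```mojo
struct Tile[rows: Int, cols: Int]:
    var data: List[Float32]

    fn __init__(out self):
        # The shape is known at compile time, so the size is a constant.
        self.data = List[Float32](capacity=rows * cols)
        for _ in range(rows * cols):
            self.data.append(0)

    fn __getitem__(self, r: Int, c: Int) -> Float32:
        return self.data[r * cols + c]

    fn __setitem__(mut self, r: Int, c: Int, value: Float32):
        self.data[r * cols + c] = value

fn main():
    # A 16x16 tile, specialized at compile time; a kernel author could
    # expose the shape as a tunable parameter rather than waiting on a
    # compiler feature.
    var t = Tile[16, 16]()
    t[0, 0] = 1.5
    print(t[0, 0])
```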
0:54:55 Audience: [inaudible question about cross-compilation]

Chris: You can cross-compile, yes. Again, it's not obvious; usually you'd need to actually get the target hardware and compile on it, so it's actually kind of cool.

Audience: So you can compile, say, on a normal non-GPU machine?

Chris: Seriously, have you ever heard of one binary that runs on an H100 and an A100 and an MI300? What I love right now is that I can see virtual brains splattering against the wall, because your head just exploded. And by the way, you can try all of this yourself; great claims can be justified with actual proof. For example, all of our kernels are JITed, and because they're all JITed, yes, you get cross-compilation. The way Mojo works was designed for this; again, this is the mad science it took to build this stuff.

Audience: What about warm-up?

Chris: No, warm-up is fast. It depends on the model, but if you build, say, a Llama 8B model, which is something like a thousand-plus ops in the graph, and you JIT it, it JITs and fuses down to eighty-odd CUDA launches, kernel launches, something like that, and the whole end-to-end JIT takes about a minute. The codegen produces hundreds of thousands of lines of PTX, completely novel kernels. Seriously: we built an entirely new language, the whole thing is parallelized, and we worked around all the problems with LLVM. This is not easy; it took a lot of work from a lot of very smart people. But I love the brains; splattered brains are what we're going for.

0:56:50 Host: We'll take two more questions, and then we're going to let Chris enjoy the rest of his Diet Coke. All right, I'm going to try to pass this back here.

0:57:00 Audience: You mentioned dependent types. Did that grow out of something specific? Is it just helping to keep the metaprogramming structured somehow? What's going on there?

Chris: Dependent types, for those who don't know, allow you to do computation in the type domain, basically. Mojo has a very powerful metaprogramming system. It's not completely novel; it's directly inspired by Zig. Zig has what they call comptime, which lets you use arbitrary (or nearly arbitrary) runtime code at compile time, and it unifies the language and the metalanguage. Mojo does literally that: it unifies the language and the metalanguage, which makes the whole thing simpler, way faster, etc. The consequence is that you can now use arbitrary values, what we call parameters, in the type domain. It may blow your mind, but go look at the nightlies: integer literals, for example, do all of their work in type algebra, so they can be infinite-precision, and those infinite-precision literals work even for floating point, which is really cool. Again, this is very mad-science stuff that most people don't know anything about, which is fine, but it then gets materialized into different integer widths and SIMD types and so on, and it's all made possible by this kind of machinery. Innovation in PL, in programming languages, is not something lots of people talk about, but it's not about syntax; it's about being able to express powerful libraries. That, to me, is the key.

Audience: And not having to wait for C++ templates to compile.

Chris: Both. It takes an hour to compile just the flash-attention repository.
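A tiny sketch of the Zig-style comptime unification Chris is describing (my example, not his): the same ordinary Mojo function can be evaluated by the compiler to produce an `alias`, or executed at run time. There is no separate metalanguage.

```mojo
fn fib(n: Int) -> Int:
    # Plain runtime code; nothing special marks it as compile-time-capable.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

fn main():
    alias F20 = fib(20)  # evaluated at compile time; the binary holds 6765
    print(F20)
    print(fib(20))       # the very same function, executed at run time
```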
0:58:52 Audience: Hey Chris, thanks for the talk and for answering all our questions. I work at a hardware vendor; we're RISC-V, so we're not Arm or Intel, and we have our own accelerator, so we're not AMD or Nvidia. When is a good time for us to meet with you to build our own backend for Mojo? I was at the talk about a year ago in San Francisco, and obviously that was way too early; GPU support was so early then. But I'd love to get this into our hands internally, because I feel like it would accelerate a lot of our kernel engineers and a lot of the work we do.

Chris: Amazing. The answer is probably this fall, is my best guess. I'll share my problem with you: as I was saying earlier, we really want to get this into everybody's hands, but we also want to do things in the right steps. And I'm really worried about open-sourcing the kernels; I'll share my anxiety: we have a small team, I think the stuff we're doing is really interesting, and I don't want them to get completely annihilated. [partially inaudible] I can't guarantee we'll have the right answer there, but it feels ballpark to where we could do some interesting things together, potentially. How about this: definitely not until fall. So maybe fall, but definitely not before then.

1:00:18 Host: All right, let's give another round of applause for Chris. If you have any other burning questions for him, he'll be around during the networking portion of tonight's event, or you can always drop your questions in our community forum at forum.modular.com; the team is constantly monitoring it, and Chris himself responds to a lot of what's posted there, so it's definitely a great place to ask. Up next we have Jack Clayton, Mojo standard library engineer, who is going to do a live demo of GPU programming with Mojo.

1:01:03 Jack: Testing. All right, cool. We've gone over time a little, so I'm going to rush through this a bit. Can I get a quick show of hands: how many people have done GPU programming with CUDA, written a CUDA kernel before? Okay, sweet. And has anyone here written Mojo? Cool. All right, I'll give a brief explanation of GPU programming as I go.
Jack (continuing): This is all Mojo code; it's just a Mandelbrot kernel. You can see there are some parameters up at the top here; `alias` means it's a compile-time parameter. This DeviceContext gets a device context for the GPU. I'm on an Nvidia machine at the moment; you can see when I do this that I get an A100. Then this device buffer: it's enqueuing the creation of a device buffer using the size of this layout. As Chris was talking about before, we have this parameter system, so at compile time it's actually computing these figures; you can see the Mandelbrot grid size is 25 by 60. It creates a buffer of that size and then wraps it in a LayoutTensor. LayoutTensor is a really powerful abstraction, based on Nvidia's CuTe library, and it lets you do a lot of powerful things. Then this enqueues the Mandelbrot function to run. If you look down here, this is the Mandelbrot kernel, which is just what you would write if you were programming a CUDA kernel. I won't go over how Mandelbrot works, but you can see it gets the thread index on the y axis and the x axis, runs the Mandelbrot algorithm, and puts the result back into the tensor. At the end it draws the Mandelbrot set. So if I run this... you can see the Mandelbrot set come back; that's running on the GPU.

1:03:15 Jack: And something really cool: you can actually see the PTX that's being generated. If I go in here and dump the assembly (this is running on a remote instance with an Nvidia GPU), it dumps the assembly here, and if I open it up, you can see what the PTX looks like, the actual instructions being generated. Because it's just a Mandelbrot kernel, it's quite simple.
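For readers following along, here is a trimmed-down sketch in the spirit of what Jack is showing, not his exact demo code: create a DeviceContext, enqueue a device buffer, and launch a kernel that computes its global thread index. The names follow Modular's published GPU basics material, but exact signatures may drift between releases.

```mojo
from gpu import block_dim, block_idx, thread_idx
from gpu.host import DeviceContext
from math import ceildiv
from memory import UnsafePointer

alias SIZE = 1024
alias BLOCK = 256

fn fill_index(data: UnsafePointer[Float32]):
    # Global thread index, exactly as in a CUDA kernel.
    var i = Int(block_idx.x * block_dim.x + thread_idx.x)
    if i < SIZE:
        data[i] = Float32(i)

def main():
    var ctx = DeviceContext()  # picks up the attached GPU
    var buf = ctx.enqueue_create_buffer[DType.float32](SIZE)
    ctx.enqueue_function[fill_index](
        buf.unsafe_ptr(),
        grid_dim=ceildiv(SIZE, BLOCK),
        block_dim=BLOCK,
    )
    ctx.synchronize()  # wait for the kernel to finish
```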
Jack (continuing): So this runs directly from Mojo; you can create a binary and run things eagerly. But you can also do things with Python. If you want to create custom ops that run with the MAX inference engine, you can write your own. An example is this grayscale op; I'll bring it up. You can see this is the Mojo grayscale kernel, doing the same kind of thing: it enqueues the GPU kernel to run, the block dims set up a 16-by-16 block, and the grid dims use ceiling division to create a grid across the entire dimensions of the image. What we'll do here: we've got this image of two dogs, loaded with imread as just a NumPy array. Then we define the tensor shapes; we have a color tensor and a gray tensor. The input is the color tensor, this image of the two dogs, and the ops are a grayscale, a brightness, and a blur; these are just wrapper functions to make building the graph a bit simpler. The data goes to the GPU, stays on the GPU as it runs through one op at a time, then gets copied back to the CPU and written out; you can see that code down here. So if we go into that folder and run the image pipeline with magic (magic is the CLI tool that gives you Mojo and all the GPU programming access, from Python as well)... on the first run it's just compiling the three kernels for the custom ops, and then we can bring up the image. We've got the dogs out, and you can see it's blurred them, turned them grayscale, and brightened the image a bit. You could inject this into an AI pipeline: instead of running the preprocessing eagerly, you put it into the pipeline itself so you get better performance across different batch sizes and so on.

1:06:27 Jack: I'll quickly show you the kernels and how they work. The grayscale one here, you can see it's just looping... (What's that, sorry? Oh, the comma. Yeah, I deleted it, thank you.) It takes an input image with color and outputs a grayscale image. The next two kernels: this brightness one uses a sort of higher-level abstraction, `foreach`, which runs across both CPU and GPU; the one up at the top here runs only on GPU via the enqueued function, but the `foreach` version can run on CPU or GPU. I put a challenge in here: if anyone wants to go into the repo (I'll show a link at the end) and convert this to use a `foreach` loop so it runs on CPU and GPU, you get some special Mojo swag, which we'll send out to you. The blur kernel down at the bottom is a little more complicated; you can't really use a `foreach` for it, because it takes the eight surrounding pixels and averages them so it can blur across the entire image. It's just an example of a slightly more complicated kernel. And like Chris said, we'll be open-sourcing all the really complicated kernels soon, so you can see how to do the super low-level performance work.
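Here is a hand-rolled sketch of the grayscale-kernel shape Jack walks through, not the repo's actual custom-op code: one GPU thread per pixel, with a bounds check so a ceiling-division grid can safely overhang the image edges. The Rec. 601 luma weights are my assumption; the demo may weight channels differently.

```mojo
from gpu import block_dim, block_idx, thread_idx
from memory import UnsafePointer

fn grayscale_kernel(
    rgb: UnsafePointer[Float32],   # H x W x 3, row-major
    gray: UnsafePointer[Float32],  # H x W
    height: Int,
    width: Int,
):
    # One thread per pixel, indexed in two dimensions.
    var x = Int(block_idx.x * block_dim.x + thread_idx.x)
    var y = Int(block_idx.y * block_dim.y + thread_idx.y)
    if x < width and y < height:
        var base = (y * width + x) * 3
        # Standard Rec. 601 luma weights (an assumption here).
        gray[y * width + x] = (
            0.299 * rgb[base] + 0.587 * rgb[base + 1] + 0.114 * rgb[base + 2]
        )
```

Under the launch scheme Jack describes, this would be enqueued with something like `grid_dim=(ceildiv(width, 16), ceildiv(height, 16))` and `block_dim=(16, 16)`: the ceiling division guarantees the grid covers the whole image, and the bounds check discards the overhang.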
Jack (continuing): There's also an example of top-k in there, and it uses a low-level warp operation: shuffling down, sharing data between threads within the same warp on the GPU. Any CUDA engineers will be familiar with that, and Mojo is fully capable of all the low-level things you'd expect. But the very special thing about Mojo, and this hasn't been announced publicly yet, is that everything I've shown you actually works on AMD GPUs as well. This is an AMD instance here, and it does the same thing, so I can run the Mandelbrot... you can see it running there, and you can actually see the, not the PTX, sorry, the assembly. You can see the machine instructions on the AMD side, so you can compare what the difference is between Nvidia and AMD. It's been quite amazing, really: low-level stuff like warp shuffles, I didn't expect to work, but when I tried it on AMD GPUs it just worked out of the box. Same with all that image-pipeline stuff; it worked on the first try.
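For the curious, here is a sketch of the warp-level trick mentioned for top-k: a shuffle-down reduction that sums a value across the lanes of a warp without touching shared memory. I'm assuming a `shuffle_down` function in Mojo's `gpu.warp` module with a (value, offset) shape; treat the exact name and signature as unverified.

```mojo
from gpu.warp import shuffle_down

fn warp_sum(value: Float32) -> Float32:
    # Assumes a 32-lane warp (Nvidia); AMD wavefronts can be 64 lanes,
    # which is part of why "it just worked on AMD" is nontrivial.
    var v = value
    var offset = 16
    while offset > 0:
        # Each lane adds the value held by the lane `offset` positions over;
        # after the loop, lane 0 holds the sum of the whole warp.
        v += shuffle_down(v, UInt32(offset))
        offset //= 2
    return v
```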
1:09:47 Jack: If you're interested in doing this kind of thing, I have three links here; I'll zoom in. There's the GPU basics guide, which introduces GPU programming and gives you a rundown of what it is, though it's still in the early stages; only chapter one is there so far. If you hit any problems with the tutorial, we have a forum thread: post your questions or feedback there, that would be great. This is the repo with the eager GPU functions you can run straight from Mojo; you can make contributions there. And this custom-ops one: tomorrow the image pipeline will be there with the two challenges, and if you can get everything running on CPU as well as GPU, you get some special Mojo swag. So that's essentially it. We have lots of good stuff coming; we realize we have to teach the world how to program GPUs with Mojo, so there's a lot on the way: things like GPU puzzles, extending the manual, and releasing all the open-source code so everyone can get up to speed on programming GPUs with Mojo. Any questions?

1:11:28 Todd: Thank you, Jack. I'm Todd. We mentioned Nvidia and AMD today, and there was a side mention of Apple, but I have an Apple NPU in this laptop; have you had any experience with that?

Jack: We haven't started work on Apple Metal yet, but as Chris was saying, towards the end of the year, in the summer, we'll be opening things up so people can start working on that themselves; then they'll be able to contribute and get things running on Apple Metal just like it runs on AMD and Nvidia. So that will be coming at a later date.

1:12:13 Audience: What was the AMD model? Or the CPU... sorry, GPU?

Jack: Oh yeah, an MI300X. I've got it here with rocm-smi; it doesn't show the model number here, but it's an MI300X. I keep these red bars and green bars so I know which instance I'm on; otherwise I get confused.

1:12:46 Audience: [inaudible question about running without a local GPU]

Jack: At the moment, no; Colab is something people are looking at, so for now you need access to GPUs. But we want to fix that; people are looking at Google Colab integration or something similar.

1:13:02 Audience: Quick question. When you showed the pipeline, it reminded me of two things: one was a GStreamer-pipeline kind of situation, where you build end to end, and Nvidia does support that; the other was basically a graphics pipeline. Do you think it's easy enough, where the Mojo GPU library stands today, to do some basic graphics work, or to build an application for, say, real-time video streaming? Have you played around with that?

Jack: No, I haven't seen anyone play with video yet, but all the tools are there; you have access to all the low-level things you'd use through CUDA, so it's just a matter of someone motivated wanting to get it working; they could do that. Hopefully as we release more learning materials, people will go out and build that kind of thing, because we're really excited to see it.

1:14:02 Host: Any other questions for Jack? No? All right, let's give another hand. Thank you, Jack; that was awesome. Before we kick off the networking portion of the night, I want to give anyone who's hiring in the audience a moment with the mic to share what you're hiring for, so folks can find you during the mingling. No one? Well, we're hiring: there are a few QR codes scattered around here, and there are printouts of four different roles we're hiring for up at the front desk, so definitely check those out. And if you had a good time with us tonight, we also have an upcoming Modular GPU kernel hackathon at AGI House in Hillsboro; you can scan this QR code right here to attend. You can also scan that QR code to join our community forum and drop any questions for the team there. That's the...