313 I'm sure they'll wait for that dude to finish
Paul had a blast at HellmouthCon! The boys talk about DubDub, and Paul is worried about his NAS. Drew gives an update on his local LLM setup and his never ending quest for more VRAM.
Listen to This Episode
Download Episode MP3 (44.0 MB)
This transcript was generated by AI and may contain errors.
Drew 00:00
Hello.
Paul 00:00
Hello. Oh.
Drew 00:02
Buddy, I have been craving this podcast. Okay.
Paul 00:07
Craving this podcast to hear about your big trip. Yeah, uh I mean let's not bury the lead there. Uh my daughter and I went to Los Angeles to go to HellmouthCon. And it was awesome, Drew. It was awesome.
Drew 00:24
It was awesome.
Paul 00:25
That's what I really wanted to know.
Drew 00:27
I didn't want you to be disappointed.
Paul 00:28
No, it was good. It was good. It was a good time.
Drew 00:31
Okay, so walk walk me through it. Give me the whole like give me like everything you guys did. Okay. Uh It's your camera's following your head. It's great. It's pretty cool, right?
Paul 00:46
Fancy. Yeah. Uh so anywho, uh it was held at Torrance High School, which is where part of the like first three episodes, first three seasons, excuse me, were filmed It's also the high school they use for like Beverly Hills Nighter 210. It's a very h famous high school. They use it for a lot of exterior shots and things like that. Uh we stayed like within walking distance of the high school, so we got up and it was Saturday and Sunday. We went uh it was a little very disorganized at first first like they could have used like somebody in charge like getting our getting our badges and getting our wristbands were just it was just it was a that was a nightmare uh but once we got in uh We basically stuck to the the the cast member panels, which are like held on the auditorium and they'd have a couple people from the cast and they would ask them questions and open up the fan questions. Uh all of those were really, really good. Uh we had also paid for a photo op on Saturday with Charisma Carpenter. Uh, Chrisma Carpenter plays Cordelia Chase. Chris McCain's daughter's favorite character. And who she's named after. Her father's middle name is Cordelia. So we got to we got a photo up with her. Uh Lots we bought some but they had a pretty good size like little like vendor hall too people selling stuff anywhere from like actual like buffy collectibles. I got a couple things that have been on my list to get for a while
Drew 02:20
Okay. Alright, pause there.
Paul 02:22
Mm-hmm.
Drew 02:22
What kind of merch?
Paul 02:24
Okay, so it's very much for nerds, right? So it's there was some booth there was like one booth in particular and all they had was like Buffy like Like things that were sold when Buffy was popular, right? So they had like calendars and action figures. I bought a lunchbox from them. Uh um like the old school like metal like Buffy the Vampire Slayer lunchbox. Uh they had some people had made like handcrafted things. Like one table was selling like crocheted versions of the characters, uh lots of like buffy themed jewelry. There was also just kind of just general like Go to an art show kind of crafts there for people to buy. Uh there was uh one booth that was selling all Firefly stuff, also a Josh Whedon show. Uh So, yeah, just lots of just random things. There was a couple authors there that were like vaguely, you know, monster-related vampire books kind of stuff. Uh I mean it wasn't huge. I think maybe like there was like I'm gonna say somewhere between like five and eight hundred people there. Uh so uh So Saturday, yeah, we did some panels. Uh we did the photo opportunity uh with Charisma Carpenter. The highlight of Saturday was uh So one of the episodes of Buffy the Vampire Slayer is called Once More With Feeling. It is a musical episode. There there were some actors there that did a live performance of Once More With Feeling. And it was It was incredible. Just absolutely incredible. I loved every second of it. I was smiling ear to ear. I was singing along, Drew. Like it was it was great. Absolutely great.
Drew 04:22
Okay.
Paul 04:22
Uh S the highlight Sunday is we paid for a uh cast member brunch, which is basically they serve you the food was very mediocre. It was crappy eggs and potatoes, whatever. But the highlight of it was like you were sitting at a table with like eight other people and they had like six six was six or seven cast members and they would yeah exactly six seven and they uh would spend ten minutes at each table so you got time to talk to them and and meet them. So, uh, yeah. I mean I I won't I won't bore everybody with all the cast members there, but uh do you know who Doug Jones is, Drew? I do not Chances are, if you have seen a movie with a tall, skinny person in full makeup, head to toe, it is probably Doug Jones. He was the fish in the way of water. He was the fish man in Hellboy. He was the dude with the hands on his hands in Pan's Labyrinth. He was uh one of the characters in Hocus Pocus in full makeup, like Billy Butcherson. He's the Baron in what we do in the shadows.
Drew 05:40
Oh, okay.
Paul 05:41
Okay. I would die for that man. He is the most like polite, nice, just happy to be there person I have ever met. Just incredibly just an awesome awesome, awesome dude. Uh yeah. Okay. Yeah. Yeah. I should probably put a IMDB for Doug Jones. I think I actually have it open.
Drew 06:02
I would have never I would have never known that
Paul 06:04
Yeah. Yeah, I mean dude, he's been in like I said, chances are if it's a tall, skinny dude in full like head-to-toe makeup, it is probably Doug Jones. Okay. Uh he's done so so many things. Um plus you know plus he's the Baron and what we do in the shadows, and I love the Baron. Uh And the reason he was there is he played a monster in one of the episodes, Hush. So he was he was there because of that, you know. So we got to we got to meet with all of the cast members. It was just a good time The panels were great, hearing the the actors like talk about the show, generally they they seem to actually like like each other. We kept hearing over and over again how excited they were for things like this because they get to catch up with people and talk to them and and whatnot. Uh Andrew, it was it was really good. Then we spent a day going to like museums and stuff and then we flew home. Okay. Museums and shopping is what we did. Yeah. Okay. Yeah. But Helmholtz Con was really good. It was a lot of fun. I really enjoyed it.
Drew 07:09
Now, Arabelle had a good time, I assume.
Paul 07:12
She did. Yes. Yes, she did. Yeah.
Drew 07:14
Now, did you guys go to any good restaurants while you were there? Uh you guys were you guys were at an Airbnb, right?
Paul 07:22
Yes, we've stayed in Airbnb. Yeah, we found some decent p like we didn't do anything extravagant. She's not a she's not a fancy kind of meal person, so The first night, like our Uber driver recommended going down to like the pier. Uh, there was like a pier close to Torrance. I can't remember what it's called. It's not one of the famous ones. He's basically said, like, it's a little less touristy. And we found a good restaurant there. Uh one of the nights it was Saturday night. We actually went to a uh Was it Saturday night or Sunday? Sun it was Sunday because the Stanley Cup finals were on. Uh and we w uh went to this great, great little like hole-in-the-wall pizza place in Torrance. Amazing pizza, awful service. Well uh well uh in their defense, I think they were way busier than they expected they were gonna be. Uh because they it was they had the Stanley Cup finals on and also the UFC fight at the White House. Yeah. And the place was the place was packed. Luckily we got a table, but uh they were just they were just overwhelmed. But the pizza was great. Uh Yeah. Yeah. Okay. It was it was a good time. Good time. Okay, so flights were easy. Yeah.
Drew 08:33
So you did it. You you did the thing. We did it. We did it. Would you do it again?
Paul 08:39
Yes. Yes. Okay. I don't know if I'd go every year, but depending on the cast that is there, I would definitely go again. Yeah. Okay. Yeah, 100%. 100% It was good. It was good. Yeah, I mean it's not a huge convention, you know. Like I said, it's I mean, maybe eight hundred I I would guess probably around eight hundred people, because that auditorium to get pretty packed in a couple of of the the panels and stuff. But, you know, it's everybody's there. They had they had people like be pretending to be Sunny Dale Sunnydale residents, so there was people, there was cheerleaders going around. There was like like watchers and vampires and stuff. And And like we avoided a like a lot of like the interactive kind of things. We mostly just sat like in like panel after panel after panel. Maybe if it went again, I would try to do it a little bit differently. But for us just being able like we learned that just like listening to the cast members talk about the show and asking questions and and having fun with each other. That that's what we enjoyed. We just sat there and just and listened to them talk and enjoyed it. So Okay. Yeah.
Drew 09:48
Yeah. All right. Now here's the question that I think was up in the air when we talked about this last time. Did your daughter decide to cosplay?
Paul 09:56
She did not. She did not. But we did talk about what we would do if we went again. So she has some ideas. So yeah. Yeah. And the and I will say a lot of the cosplay there was really good too. A lot of people did some really good like deep cut costumes and stuff. So it was it was good. I it's f you know, nine out of ten stars. Would recommend to others. Yeah.
Drew 10:22
Okay. Anything about the experience that you particularly didn't like?
Paul 10:26
I wish they only had like three food trucks there. Which I get it. It's only like 800 people. The food the and that was kind of meh. But Honestly, like it was a after the the whole shenanigans of actually like getting our badge and getting in, after that it was very well run. Uh You know, like some of the panels were a little bit late, but mostly it was because of the talent, like they would stay longer. 'Cause like you also like in the merch hall you could also stand in line to get like autographs and stuff of people. We didn't do any of that. We were like What we're gonna do with an autograph, nothing. Uh but uh yeah, it was good. It was really good. I really I really enjoyed it.
Drew 11:10
Okay, buddy. I'm I'm thrilled. I I was I was hoping that you had a good time.
Paul 11:15
And it sounds like it sounds like you and Arabell did. I did. I did. I'm still I'm still kind of buzzing from it. It was good. When did you get home Uh Tuesday evening. Okay. Alright. Yeah. Yeah. So I t I had I basically took a day off before and after the trip. And so that was it back at I was back at work today. So, yeah, you can imagine how that was.
Drew 11:38
Yeah, I can imagine. I can imagine. Are you off tomorrow?
Paul 11:41
No. No. Work tomorrow. Yeah. Yep. So yeah.
Drew 11:46
Oh good. Uh all right. A plus A plus plus.
Paul 11:49
A yeah, it was good. Good time. Would I would I would do it again
Drew 11:53
Okay. Mm-hmm. Okay. Um, all right. I don't have any other questions. I just was super excited to hear about your trip. I know you've been excited for a while, so it's really good. Really good. Yeah. So this next topic. I'm I'm mad. I'm mad You're mad. You're mad. Why are you mad? Uh because I'm really sick of Mar Keasley Brownlee's bullshit
Paul 12:16
Yeah, I haven't watched I've been s out, I haven't watched any of his reaction videos or anything to dub dub. I should probably look those up.
Drew 12:23
So I know that dubdub was happening. I wasn't able to watch anything because I've been in like slammed with meetings for work. I didn't get to watch any of it. I did read a couple snippets of things, but it's just like The the move now is for people to just like slam on anything Apple does, and I'm just sick of it. Like you know, like you can not like Apple. That's fine.
Paul 12:43
Yeah, yeah, 100%.
Drew 12:44
A lot of people don't. But find something legitimate to complain about. Like WWDC is not for the normies. It's not for it is yeah.
Paul 12:55
Outside like there is a polished tight like 60 to 70 minutes. that is for public consumption. And and then even then sometimes like they also announced Swift during one of those like big keynotes. It's a developer conference, right? They're there to talk about the upcoming features in the operating system so developers can get their apps ready for that the next upcoming year. So yes, I I'm with you. It is it is a developer conference.
Drew 13:22
Yep. Alright, so the only the only thing I really know that got play um that that kind of like hit my news feed and hit TikTok. was all of the AI stuff. Like Siri AI.
Paul 13:39
That was most of the keynote, to be honest. It was like 90% Siri AI.
Drew 13:44
And their AI play. Everybody's knee-jerk reaction to that is, oh, it won't work on my phone.
Paul 13:52
Yeah, yeah. I mean, but then again if you have an old Android phone, you're not running Gemini on your stuff anyway, right? I know. Right. I know.
Drew 13:59
So it doesn't bother me. It does not bother me. No. Doesn't bother me. No. Um I I I purposely haven't upgraded my phone the last Two years? Because I knew that we were about due for a pretty major update. So like I'm okay. I don't care about other people. I've never cared about other people. Um Yeah, I that was the only one that I really know. And I and you said you have a list kind of in here. Give me give me like your big ones that you feel strongly about.
Paul 14:25
Okay, so I mean this I'll pick th the things that I'm most excited about. Uh they made some changes to liquid glass. Now there's a slider, you can make it less glassy. Uh they kind of f fixed uh macOS They fixed like a lot of like the sidebars and the tool bars and the corner radiuses of windows. Uh you know, they did have this one like And it so basically like there's one new feature, it's Siri AI. But outside of that, they made lots of small improvements. Apparently, like things like airdrop is faster, moving files on the iPad is faster. uh the cam the uh images that you take with your camera show up faster in the photos app now. Right? So just a lot of like little like improvements Apparently uh one of the big thing I think people will notice is I'm sure everyone has had this where like you walk away from a Wi-Fi signal. using your phone and then there's like five seconds where you can't use your phone because it hasn't switched over. Apparently they fix that. So like things like that. Uh There's also some like little things that I think are very interesting. Uh they spent time talking about resizing iPhone apps. Uh-huh. Folding phone, right? Bigger bigger th and even things like if you have the betas, you have the Golden Gate Mac beta and you have the iOS twenty-seven beta on your phone. and you do like uh iPhone mirroring, you actually can like make that mirror bigger and pretend it's like a phone that's opened up. It's like your phone, but you can make it bigger. Uh so they're talking about like, hey, you usually make your iPhone apps respond to any resolution. And here's some tools in Xcode to help you do that. Wink. Uh things like that. Uh And then like most of it is like the Siri AI stuff. And I've installed the Mac OS beta on my Mac Mini, because that's what it's for. That's what I bought it for with beta's. I haven't really played around with it much. It does look nicer. Uh but everything I've heard about the iOS 27 betas, A, I've heard it's a really stable beta. It feels more like a beta 3 than a beta 1. So that's actually I may install it at some point here. Uh and everyone, you have to join a wait list to get access to Siri. And uh they also changed They basically redid spotlight and the spotlight index. Uh so apparently it can take like a couple of weeks for that index to fully rebuild. But apparently once that's done and you got Siri AI Everyone says it's really good.
Drew 17:13
So instead of just doing spotlight, you can just like talk to it? Yeah, exactly.
Paul 17:18
Yeah. Okay. Yeah. And there's there's more tools for developers to like they call it like donating your data to the spotlight index so it can show up. They have more stories around uh app intents. Uh As an end user, you maybe you you you see the results of App Intense because everything that shows up in shortcuts is an app intent, something that the application has exposed. for the thing to do. So apparently Siri AI can see those and and do things inside of your apps if that is exposed and if the data is there. Yeah, so like everyone is saying like hey Siri it's Siri is good now. That's what they say. Siri is good. It sounds better, it's more responsive. Yeah, exactly. Yeah. Well what they s here's here's what they have said. So they basically they they did some tech talks about this. So they claim it is none of it is Gemini. None of it. It sounds like they distilled their cut model custom models with Gemini. Got it. So they basically fine-tuned Gemini. Yeah, ex yeah, they they took their models and fine t used Gemini to fine-tune them and and get them. It does a lot of it does run on so private cloud compute used to be Apple servers and Apple data centers. Now private cloud compute is NVIDIA GPUs and Google data centers. Yep, yep.
Drew 18:49
They they base they basically have adopted GCP as their cloud infrastructure.
Paul 18:52
Yes, a hundred percent. Yep. And and they were not shy about it. They're basically like they they came out right and said it like, hey, we've partnered with Google. Google helped us do this. Like You know, Gemini was the foundation that we we were able to achieve these things with. Uh they have some new on-device models. Uh which one you get depends on how much RAM you have. So if you only have an 8 gig phone, you get a smaller model that uses more of private cloud compute. If you have 12 gigs or more, I guess. Uh it's the larger model, so it does more things on device, but it still can go to private cloud compute uh for the the bigger things. So, I mean I I'm I may install the beta here this weekend and join the wait list. Maybe by next week I'll have an opinion on it. Uh But it looks very everything they've shown looks very promising. Uh I've watched a little bit of like the developer story and uh one one of ooh this is one interesting thing that I think is really smart. uh that Apple did. So they they've always had these things. They called them like the foundation uh like foundation model like an API against the foundation models. But what they have done is they've kind of made it extensible. So you can plug in any AI provider and use their API like use one API over top of any you know, whether it's a local model or external to allow you to switch to be able to plug things in or, you know, I think it's just real smart. Like Apple's kind of, I think, realized that like a lot of like research and stuff happens on Macs now, so they're kind of just embracing that and making it easier. Like you know, you can bring your own model and plug it in and use the same set of developer APIs for your application. Yeah. Uh so
Drew 20:46
Yeah. Huh. So a as a developer, is there any of the developer stuff like the Swift things or anything about that that excite you?
Paul 20:59
Uh y Yeah, because before as a developer you really didn't have access to the stuff that ran in private cloud compute. It was just a local on-device model. And the like the one on in iOS 26, it's a very small, very limited model. You can still do some very interesting stuff with it. In fact, like uh Marco Arment, uh developer of Overcast is using it to do transcription for his his podcast app. It's vr it's very capable, but but it it is limited, right? So now as a software developer you can get access to these larger Apple models and actually like integrate them into your app, which I think is very interesting. Now there is one thing. So To get access to private cloud compute as a developer, you cannot have any application in your developer account that has had more than two million downloads. If you have two million downloads for an app or more, you are cut off from private cloud compute. Period. Hard cut. All of your apps. Your whole account. So like basically go find your own. Yeah. Now this could be simply because Apple doesn't have a way to bill for it yet. Maybe they don't fully understand. what kind of capacity they're gonna need and how much demand they have. So they're kind of just hedging their bets. So maybe that will loosen up as c if you know if there's enough capacity to deal with it. Maybe it's simply just a matter of we don't know how to bill you for this yet And they will figure that out and open it up. So like a lot of the large apps, any app of kind of like any moderate success is not gonna have access to private cloud compute as of today. And it's one of those things that like if you what they're saying is like if your app is shipped using private cloud compute and you are successful and you hit that two million download mark, you're just cut off. Boom. So yeah, at that point you you'd have to bring your own model uh into the app to keep that functionality. So I guess that is kind of a a gotcha there, but uh You know, I I'm again I was out last week. I really didn't have a lot of time to spend with it. There are some like WWDC videos I am interested in watching just to kind of see where things are. Uh but Yeah, it looks I mean and everyone's saying like okay so Apple's caught up to where everyone is two years ago. Well yes, you're absolutely right, but they needed to catch up. Like the step one was catching up. Right. So that looks like what they have done. You know, all the the the things about like, you know, like asking it like, hey, uh when does my you know, when does my mom arrive? And it goes and goes, oh, looks in your email and finds that there's the flight information for your mom and then tells you that you can do things, what's good to eat around there. And it will like You know, all of those LLM videos you've seen, Siri will be able to do that in iOS 27. And it looks like so So if obviously you remember two years ago when they announced some of these features and they never shipped them, right? It it was it was all vaporware. And a lot of people were very upset about the videos because like none of the demos were kind of shown live. It was very much like cut, cut, cut. If you watch the WWDC videos. You can tell they were aware of that and like leaned into it because anytime anyone did with anything with WDC, it was a two-camera setup. One was pointing at them and one was over their shoulder pointing at the camera. And the wait times are still in there. If the like there's one video where the guy made a typo and that's in there Like there was no cuts, it was like live shots one to one from both. You see his hand move in one, you can see it in the other. They did not speed it up. Like if it took five seconds to get the response You were awkwardly waiting with that dude for five seconds on can on on screen for it to come. So they were really aware, like like they wanted to show this is real. And like I said, they didn't shy away from saying that they get got Google's help. You know, th they right on the keynote, they're like, hey, we partnership with Google. We use like Google Gemini to to refine these models and It looks like it worked, whatever they're paying them. I guess the rumor is like a billion dollars a year. Probably worth it. Yeah. It's probably not cheap. No, no. What I've heard is basically the billion dollars Google gives Apple to be the default search provider and Safari is just going right back to Google. A lot of this. Yeah. Yep. Like you give me that money and then I'll give it right back. Uh which is very trendy these days in AI.
Drew 25:56
Uh yeah. Yeah. Gamer uh dude, Steve from Gamers Nexus has been on his shit, dude. Have you seen any of his recent videos? Uh-uh. No. Oh my god. You gotta you just gotta hover to that channel and see what they've been making over there, man. It's crazy. It's crazy. Okay. Uh speaking of tech other tech stuff, sounds like you had uh a death in the family.
Paul 26:22
My Synology is Well well for starters, it's on the floor right here beside me and that's not usually where it goes. Uh I went to access it one uh one morning before I left for uh LA. I'm like, huh My Synology's not online. That's weird. And I went over to it and I started looking at the UPS and sort of let's see if it was plugged in. It's like, oh, it's just not running. Oh, it won't turn on. Okay. So I has okay. I've been through many stages of Of my recognizing that my Synology. Now granted, it's from like 2018, I think, is when I bought it. It's been around. It's been on same original hard drives, pretty much on 24-7. I am not surprised that I I knew I knew this day would come. Okay? So I have started looking and doing some research. So oddly enough, true. If you start doing enough research, you will find a lot of YouTubers who have my exact same model, whose Synology has died recently, and apparently the biggest cause of failure is the power supply, which is an easy thing. I was gonna say Yeah. I have a power supply on order. It'll be here tomorrow. So I will know whether or not my Synology still works. Tomorrow my power supply arrives.
Drew 27:51
I mean if it if it just doesn't turn on, that's the first thing I would look at.
Paul 27:55
Yeah, yeah. So That is coming, but it has made me kind of just generally aware of the age of my Synology and my Naz, and I will be replacing it with something at some point. Another stage of this whole thing was, oh my god, our hard drive's fucking expensive right now.
Drew 28:16
That too.
Paul 28:17
Like, oh my god. And then like at the same time, like literally the day after Mike died, a someone I work with reached out for advice because they had a NAS where a four a four-ray NAS, they lost a drive. They put a new win to rebuild it and during the rebuild they lost another drop.
Drew 28:39
That's not that's not good. That's not it's yeah, that's that is the ultimate doomsday scenario.
Paul 28:44
Yeah, so now I'm like, okay, granted, like The the the files that are not replaceable, like my DVD rips, whatever, I don't have a lot of that stuff backed up. But the stuff that I cannot replace I have been syncing it to B2 and Backblaze. So I have a off-site backup of my NAS. So my data is fine. In fact, like all of the podcast episodes. All the raw recordings, everything is all in B2. It's all backed up in B2. We're good. We did not lose any of that. Okay. But I have started thinking about like, well, I'm gonna have to replace it with something. What is that? So I've been I have not decided. I think if I uh If I had to buy something today, I'd probably buy one of the Ugreen NASS. U-green. Tell me more. So they make a lot of like charging equipment and power supplies and they and they've they're the latest newcomer into into the market. Uh they do not have all of the features of the other uh Like it's not feature parity with Synology yet, but the price for the amount of compute and Bayes beats everybody else. And they are very like tinkerer friendly. So like if you want to do something like like the open source, like TrueNAS or free or TrueNAS and what's it you uh Unraid, some of the open source. Yeah. They're perfectly like absolutely like we will support that. Your warranty's still good. In fact, like our OS comes in a little uh you know M2 SSD on the board if you want to open that up and put your own OS drive in there Great, we support you. Go for it. Right? So they're very like the they basically saw the backlash of Synology and said, I bet you if we did the opposite thing, people would like us. And it turns out People are liking them. Yeah. Yeah. Uh like I said, it's not quite feature complete with the Synology stuff, but like It's Linux. They give you SSH access. You know, it's very like they're very big on like, hey, bring your Docker containers, we'll run them on on these things pretty easily. So if you want to do something like cloud sync, you can just pull down like R clone or R Sync and use that. And which I'm perfectly comfortable with. You know, terminals don't scare me. Uh so I'm I'm kind of thinking about that, but you know, I'm thinking like, well I probably should get a six bay so I can have two disc redundancy, and then it's like you start pricing how much six hard drive costs and you're like Hmm. This is at least gonna be $3,000.
Drew 31:21
I mean I'm looking I'm looking at Micro Center right now and they have eight terabyte 7200 RPM ironwolf drives for 300 bucks.
Paul 31:30
Yeah, yeah.
Drew 31:31
But as bad as I thought, but still.
Paul 31:35
Now I ha I have done research and apparently uh Seagates are cheaper than Wester Digital, but they're louder.
Drew 31:42
Uh my NAS is not in the same room as me. Yeah.
Paul 31:46
Mine usually is. Usually it sits right here. Usually you can't hear it, but like among like, well, if the price isn't that much, maybe I should just get The Western Digitals, because you know, they're a little less quiet. But I I haven't done anything besides price shopped and looked around and watched some YouTube videos, but keep this this'll be our ongoing Paul's NAS corner, I guess. This will be an ongoing story as I figure out what I'm going to do. Hopefully this thing comes back to life and I can get like a few more months out of it, but it's just yeah.
Drew 32:19
The price of the NAS doesn't have to be a good idea. I'm not familiar with this uh with this U-green company. Huh?
Paul 32:29
Yeah, it's really like You know, Synology, QNAP, U Green, or Build Your Own. That's those are pretty much the four big options today. I I don't know. I mean, maybe build your own is Something I'll consider. I mean maybe there's a future Paul takes Drew to Micro Center.
Drew 32:59
Well speaking of Micro Center Uh, I've been on my bullshit here too. Okay. So yeah. Did you buy anything since I bought a lot of things and I returned a lot of things. Okay. Let's talk about it.
Paul 33:14
Okay, let's talk about it. Okay.
Drew 33:15
So last episode I was talking about potentially looking at a DGX Spark. which is Nvidia's. Basically they call it a desktop supercomputer where it's got an ARM processor and it has 128 gigs of shared memory. Uh the idea is it's for local AI development, local machine learning development, and it just has a fuck ton of RAM. Okay. So I hemmed it hard after we recorded, like, should I just go get one and see how it does? Because to level set in my current server machine here for lack of a better term. I have two graphics cards in there. I have my 5090, which it came with. And I have my old 3090. Now, a 3090, by the way, like a used 3090 are impossible to find. They're still very quick. They have 24 gigs of RAM. They're not super power hungry. So they're still a very, very good card for local inference, image generation, content creation, whatever. So 32 plus 24 is what 50 some gigs of VRAM. Yeah. Right. And then there's a little bit of overhead for like the OS. you know because i'm i am running linux desktop i'm not running like a true linux server type scenario all right so i i have been looking at different large language models And the problem is you cap out pretty quick. Even on a 32 gig card, you're really only looking at anywhere from 8 to 20 billion parameter models. All the cool models are 80, 120, 200, 300, 800 billion parameters, but Once you start getting up into those crazy numbers, you need hundreds of gigabytes of video memory, which as a consumer I'm just not going to have access to unless I were to like go buy a bunch of video cards. U uh you can do RPC between you know, you can you can basically build a compute cluster and configure the runtime to know that it's multi
Paul 35:25
cluster aware and all this other stuff, but it's just like connected with like thunderbolts so you get fast connection all the way through. Yeah.
Drew 35:31
Yeah. Right. Um actually here's a fun fact. So the DGX Spark is designed to run in a cluster. They don't use Firewire or I'm sorry, they don't use Thunderbolt or USB C4 to connect to each other. They actually use SFP. Okay. Yeah. Yeah. For the bandwidth. So like there's actually SFP ports on the back to plug in. Yeah. So microcenter cells sparks. So last weekend um I had a drink at dinner and I thought to myself Let's just go get one. Let's see what happens. I'll put it through its paces. Worst case, I'll return it if it doesn't do what I want. Because I watch some YouTube videos and While it has a lot of memory, it's not very fast. So unless you're running a bunch of different models or high concurrency workloads where you have multiple people hitting it at the same time. People are like, it works, but it's never gonna be as fast as a discrete GPU. Vito's very upset about something up there. He is. So I went and picked one up. Um it after taxes, it was about $5,000. So pricey. They they've they've gone up in price recently. So I took it home. Uh and then I basically spent all day Sunday with it. And I got it plugged in, I got it configured, I installed some models on it, and I wasn't impressed. And I was like, I was like, I'm just really not getting the performance that I want out of this.
Paul 37:00
Especially for that amount of money, right? You want something that honestly probably wows you a little bit.
Drew 37:05
I I could load very large models on it. That worked fine. That worked absolutely great. But I could not actually get decent performance. With a large model and a large context. So now I wanna I wanna go on a slight detour here real quick. And I wanna bitch about benchmarks and I wanna bitch about Reddit and I wanna bitch about everybody who tries to tell you what is good enough for different models and different hardware. So you have to understand is that everybody's local setup is different. You have some people that have really really powerful video cards, but not a lot of system RAM and CPU. You have people that have the opposite. You have people that uh want to run the smallest possible models on the smallest possible hardware and they will say, I got this great token per second performance on this one model. And it's like a four billion parameter model, right? Yeah. So it's very, very small. And You will see people talk about tokens per second, time to first token, uh context window size, all this other stuff. So there's just What I'm trying to say is all these models are utter bullshit. You're uh really what you need to understand is that model big Model need RAM, differ different frameworks, different memory types, different manufacturers. They all perform differently. There can be differences in performance between uh one NVIDIA 5090 and another NVIDIA 5090 just because like one is a founders edition and one is some aftermarket one that is overclocked.
Paul 38:40
So yeah, yeah.
Drew 38:41
Everything is bullshit. Every benchmark you look at, okay Now that said, there are legitimate benchmarks that measure model performance in terms of real world tasks. So like for example I don't know if you're familiar with the website SWE Bench.
Paul 38:58
I am not.
Drew 38:59
Okay, so let me just put this in here real quick because this is like kind of important to the conversation. So what this is, is this is a leaderboard that measures the effectiveness of different software engineering models. Okay. And basically like out of a hundred percent, how good at they are are they at resolving tasks? Now you'll notice that obviously Claude and Gemini are up there. Um, but in the in the drop down you can pick open source only. And change it to all OSS agents. And you can see that like these are the ones that you could run.
Paul 39:35
Okay.
Drew 39:35
Okay. And some of them are pretty good. Are they as good as Claude? No, like Claude is like sitting at like a 76, 77% on the SWE bench. Yeah. But like some of these are pushing 60, 63, 65, 68%. Okay. Now here's the thing. These are not very small models. Okay. So like the GLM models, pretty fucking big. Um minimax is like a hundred and eighty gig model. So like you're not gonna fit that onto most local things, okay? Okay. So if you're willing to compromise a little bit, like some of these are actually not bad. And we're gonna come back to this a little bit later Okay. But all but suffice to say that like when you go online, like you'll go to you'll go to like the local LLM subreddits and be like, is this a good setup to run Quen? Is this a good setup to run whatever? And everybody's like Everybody has very big opinions about it. And and and and and look, everybody's wrong. Everybody's wrong. Here's the thing. You are never going to know how this is going to perform for you because you need to understand the task you're trying to do, the appropriate model that you would need to do it with. the amount of hardware to run that model and how complex of a task are you asking it to do and how long is it going to take? Because like Everything I just said, like time to first token, reasoning, thinking time, tokens per second on the return, all of that matters when it comes to local inference performance. Okay. So I part of the reason that I did not like the DGX Spark is yes, I could load a very large coding model into it. Yes, I could host it on my network and I could connect to it with a local coding agent. Yes, I could submit a task. The performance was abysmal. Like, there was a threshold that I was willing to accept, but like just getting it to like spit out a a framework of a website I was like, yeah, I don't I don't know about this.
Paul 41:39
Yeah. It's okay. Especially when Claude can do it in like a couple like yeah. Yes. You can one-shot an application in a few minutes. Like, yeah. Yes.
Drew 41:48
And and that's just the thing. I could pay $30 a month to Claude or $5,000 for this, and eventually I'll make my money back, question mark, because I'm not constrained on tokens.
Paul 42:00
Right.
Drew 42:01
But I mean that's the that's the thing about local inference is like I'm not constrained by cloud, I'm not constrained by tokens, I don't have a monthly subscription fee. Those are all things. Okay. So I returned the DGX spark Um but I but I I yearn. I yearn for more video memory.
Paul 42:19
Okay. Now to buy another video card
Drew 42:23
So I thought about it. Oh I thought about it. But but here's the thing. Realistically Uh the NVIDIA 5090, which is the t which is faster than video cards with more memory right now, the going rate for a 5090 right now is like four grand. So it's damn near what I would have paid for the spark. And I just, I you know me, Paul. I love spending money. I love being financially irresponsible. I just, I really felt bad about buying the DJX Park and spending that much money. So I said, there's gotta be another way to do this. So I was thinking to myself, I was thinking to myself, self, I got this really nice motherboard, and it has a lot of PCIe lanes. Okay. What can I do with that? Now I know I just said I wasn't going to go buy another 5090, but cost wasn't the only consideration there. Okay. Because those video cards are big. They're big and they're hot. Big and hot. They're big. Like they are three or four slot cards, meaning they take up a lot of real estate in your case. They also cover up other slots on your motherboard.
Paul 43:41
Yes, you can't use all the slots.
Drew 43:43
You can't use all the slots. And that's and guess what? That's that's that's where the memory goes. So I wasn't about to go out and spend that much more money. In fact, even these smaller video cards, because remember, I have the 3090. I've got 24 more Gigs of VRAM right there. Right there. So but I want more. I want more. More. Yeah. So I did so I did some thinking Okay. And I found something interesting that when I realized the implications of it, it was perfect for me.
Paul 44:22
Okay. Alright, so we have a link here. This is a link to AMD's website, AMD Radeon AI Pro. Graphics. Scalable performance that performs. I'm preparing to send you a picture. Okay.
Drew 44:42
But you keep reading.
Paul 44:43
Okay, so I see them. There are four stacked in there.
Drew 44:49
So are these are these are obviously the full slot video cards that have 32 gigs of RAM. Okay. Okay. Alright. Now so 32 is more than 24. So after I return after I return the DGX Spark. I went over to where uh by the way, Micro Center is currently being remodeled. Nothing is where it is supposed to be. It really gives me a lot of anxiety. Okay.
Paul 45:21
I I have seen the advertisements that they're open during remodeling. I have not been there, but yeah.
Drew 45:26
Uh it's much brighter and smells better in there now. Okay. That's all I really have to say about it. Will it be nicer when they're done? I think so. Okay. I think so. They got rid of all the crummy carpet. It's gonna be it's gonna be way better lit. Let's put it that way. Okay. Okay. Um anyway, so I went over there and I was like, I want one of these cards. Now Again, slight deviation here. Let's talk about how your computer takes a prompt and submits it to an LLM running locally. There is there is a software layer in between there. The most common one that you'll hear, because they are the market leaders, is NVIDIA has CUDA. Yeah. CUDA is the runtime that basically handles all of the compute and memory interactions between a software interface that like hosts the model. It takes care of loading it into memory, memory management, context management. Um You know, ca you know, encoding the encoding and decoding the tokens and sending them back and forth between the hardware. That is usually referred to as a runtime, and CUDA is just one runtime. For example, you can also have pure CPU runtimes. They're very, very slow, right? Because they're not running on GPU hardware, but that's another choice. And there is a um There is a interface here called uh Llama CPP. Right And what this is, is this is sort of the granddaddy of what is used for talking to LLMs. Now the way you typically get this to work is you would go grab the source code and depending on what you want to use it with, you compile it. So if you're using it with an NVIDIA graphics card, you will compile it with NVIDIA flags. If you were going to use this on a Mac, if with metal, you could tell it to compile for metal. If you wanted to use it with AMD, you can have it compile with Vulcan or you can have it compile with their Rock M architecture, which is like their competitor to CUDA. Okay. Um the problem is I have mixed graphics cards.
Paul 47:54
Yes.
Drew 47:55
I have an AMD. And I have an NVIDIA card now. Okay. So I get it home, get it all plugged in, get it all booted up, get all the drivers installed, get everything ready to go. How do I get 64 gigs of video RAM? Well, now I'm gonna give you a plug for my favorite utility, and I've probably talked about this before, and that is this program called LM Studio.
Paul 48:21
Yes, we I remember this. Yes. Yeah.
Drew 48:23
This program is so freaking good. It's so good. It takes care of managing all of the dependencies. It takes care of downloading and organizing your models. It also lets you effortlessly swap between runtimes.
Paul 48:39
Okay.
Drew 48:39
And Vulcan supports NVIDIA and AMD. Okay. Is it as fast as CUDA? No. But I can now load models greater than, you know, 32 gigs or 48 gigs into LM Studio using Vulcan and bam. I now have 64 gigs of total LLM memory to use
Paul 49:04
Okay, so how much so okay, two questions. One, how much do these AMD Radian AI Pro graphics cards cost? And two Could you put more of these in your computer?
Drew 49:14
Okay. So let's take that let's take that one at a time. Okay. So right now, the one that I bought at Micro Center is this guy. Now I told you that a 5090 goes for about four grand.
Paul 49:25
Mm-hmm. Okay, hold on, kick it by opening it now. Oh That's only fifteen hundred dollars. And that's one of the more expensive models.
Drew 49:40
Okay. They're anywhere between fourteen hundred dollars and twelve hundred dollars. Okay. Okay. So thirty-two gigs of GDDR6, PCIe five dot oh, thirty-two gigs of RAM. I uh all I'm trying to do is just memory max. Okay. So again, slight detour here. When you load a model for inference, the total size of the model is only one part of the equation. Do you know what else is important?
Paul 50:09
I don't.
Drew 50:11
Context size.
Paul 50:12
Okay, I was I was I was gonna guess context. So you need to you need to s basically allocate memory to be able to hold the context. Correct. Okay.
Drew 50:20
So for example, if I want to do coding tasks, the size of my context matters.
Paul 50:25
Yes. Yeah.
Drew 50:26
So if I want to load up a a particular model, let's say the model itself Is 48 gigs. That assumes that I have a very, very small context window. If I want to have a 200,000 token context window. It might not be 48 gigs. It might be 70 gigs. It might be 80 gigs, depending on the model architecture. And like I'm not gonna go off on a tangent between quantizations and fp six and npv4. All the different types of models. Like go watch YouTube. That's not like I'm not smart enough to describe it properly, but just having enough to fit the model in memory isn't enough. Context matters, and you need more memory. So let's go back to your second question.
Paul 51:14
Can you put more of these into your computer? Am I gonna get a picture here? Got my phone. Okay, here we go. There is two of those suckers in there isn't.
Drew 51:27
So it's a tight fit. These things are designed to butt up against each other because if you look at the picture on Micro Center Yeah. It is not a card like the NVIDIA card that has the fans in the huge heat sinks. This is a blower card. So the fan spins and it blows it straight out. The NVIDIA card has fans that blow onto the heat so you can try to vent it out. Now, if you look at that picture in there, there's probably something you notice. I got a lot of shit crammed in there.
Paul 51:54
You have a lot of shit crammed in there, yeah.
Drew 51:57
Not a lot of space for that heat to go.
Paul 51:58
Yeah, I don't think yeah, yeah. So Is it open on that all the time for heat?
Drew 52:04
No. No, okay. So here's the thing. Um when I'm just doing regular inference, it really doesn't get that hot. But what I what I can now do is because I have 64 total gigs between the AMD cards and 32 on the NVIDIA. I can do really cool stuff with inference to where I can have it do image generation at the same time. Mmm. So I can be chatting with my LLM and I could be modal. I could I could submit a picture of like, I want my website to look like this. And the NVIDIA card can do all the vision stuff. The AMD cards can do all the inference. Okay. So now we're rocking with between all three cards, 96 gigs of Okay. Uh or act wait, yes, 96 gigs of total VRAM in my PC now. Okay. And it works beautifully. It works beautifully. So Now that I got it all set up and I put it through its paces, it was time to do a real world test to see is this going to be all worth it? Because right now, out of pocket. Between the two video cards, I'm out of pocket about $2,800 instead of the five grand for the spark. Okay? I have not as much video memory. But I spent considerably less and it's faster. Okay? Yeah. Yeah. Yeah. So I wanted to do a real world task. So what I did is in LM Studio, I fired up this model. So this is Quen3 Coder Next. This is kind of regul this is kind of regarded as one of the best, if not the best, open source coding models out there. And it is an 80 billion parameter model. Okay. So I loaded that bad boy up. It fit on both of my AMD cards. I gave it max context, which is about 250,000 tokens.
Paul 53:59
Okay.
Drew 53:59
Okay. Now I need a way to actually have an agent that writes code against it. Yes. Well, open source to the rescue. So what a lot of coding agents can do is they can point to basically any open AI compatible endpoint. Right.
Paul 54:20
Okay.
Drew 54:20
But I decided to give open code a spin. Okay. Okay. So what's open code? Uh it's Claude, right? It is basically a CLI. They they they do have a desktop app in beta. And this can also talk to Claude. It can talk to any model you want, whether it's a Frontier model, local model, whatever, where it basically does the same thing. So what I did. is today when I was supposed to be working. I I I I got all this loaded up, I put this in plan mode, and I basically described to it a website that I wanted to build. What I wanted was just a very, very simple website built in JavaScript. And I wanted it to basically have a status bar that showed the different services running on my computer and whether or not they were reachable. So for example, LM Studio, right? One of the nice things about LM Studio is not only is it a GUI for talking to an LLM. It also has a server component. And that's what I use to host the LLM. So open code talks to that. You basically go to developer mode, start the server. Generate a key, load a model through it, and it exposes open AI compatible endpoint. So I load all that up. Um Tied that to and and like I described what I wanted to it. It asked me it like and the model asked me a bunch of questions. Well, how do you want to handle this? And do you want to do this? This is some reasoning. Yeah. It's reasoning. It's asking me questions. So It used by the time I was all said and done, it had used less than 50% of that total context. It took about 90 minutes for me to get a working app. And when I say that, it spit out the application pretty quick, but it didn't work right away. Yeah, it's iterate a few times. I had to iterate a few times because like first it wasn't rendering anything and I was like and I went to the model and I'm like the screen is blank, nothing is rendering. Can you check for errors? And it like pulled it up. It was like, oh yeah, I see. I don't like the C like the CSS wasn't attached, right?
Paul 56:21
That happens with Claude. Like you're like you you're not done. Did you try you test this? It doesn't work.
Drew 56:27
Oh it it built it built every API back end route, it built everything, but it was just like blank screen. I'm like, all right. So, you know, had to iterate a few times, had to type in a few more prompts. Uh I did run into an error with open code where it like just Forgot how to write files for a minute, but that's okay. We got past that, figured out what's wrong with that. And uh now I got a working app to where I can add services like Plex, LM Studio. Um my Docker containers that run different services. And it's a nice little dashboard and it basically tells me, hey, is that service up or not? And that took me about 90 minutes today.
Paul 57:02
That's awesome.
Drew 57:03
Yeah, so this is this is pretty this is pretty rad. Yeah. Now don't be like me Right? I'm not doing this enough that this is cost effective or is gonna earn me all my money back in anytime soon. Absolutely not. But I think that everybody is starting to realize that token maxing can be expensive. And if you want the privacy, if you want the security, And you have some money to spend where you're willing to do a little bit of outlay, you can get Claude at home. You can do this, right? And like if I were to rip out my 50-90 out of my case. I could slam in 64 more gigs of these cards. I'm not sure I want to. Okay. Okay. I'm not sure I want to because Having the 5090 for image generation, which is still something I mess with, right? Not quite ready to give that up yet. Now, I could take the 5090 out of there and put it in my gaming PC, because I don't have a 5090 in there. So it's like it's gonna go to waste. Right, right. But I'm not not thinking that Okay. Yeah So these have all been very, very good tools. Now if you go back, excuse me, if you go back to that SWE leader bench. Mm-hmm. Some of these really big models, like the LingZ and Kimmy K2 and um some of the Minimax stuff like I could load those. It would be some it would be some CPU offloading. It'd be dog shit slow. But I could do it. I I could do it. And I am also playing with some other models like GLM5, uh Deep Seek apparently can do some of this too. Like there's a couple Deep Seek models that I can run. So I'm gonna play with some other models, but I had a really, really good time with the Quen Coder next. That that worked really, really well.
Paul 59:00
Awesome.
Drew 59:01
Yeah. So that that's kind of what I that's the kind of the bullshit I've been up to. Now there is a footnote to this story. Okay. You might be wondering, Drew. Drew. You said you had a 3090. Hey, where did that go? Where did that go? Where did that go, Drew? You sort of you sort of uh you sort of stole my thunder a little bit. Okay, I'm sorry.
Paul 59:35
Oh, you got an external GPU enclosure. So is that hooked up to your machine too? Like Not yet. Not yet.
Drew 59:47
So I I have this. I have it all. So first of all, tell people what this is.
Paul 59:54
This is a Razer Core X V2 External Graphics Enclosure EGPU. It's designed, you put a GPU in there. I was briefly toying around with these at part of my handheld phase. It never really worked out for me. But basically it's an enclosure. You thought it's put a power supply or sometimes they have power supply in them, a GPU, hook it all up, and then it connects to your device, usually via thunderbolt for the bandwidth, though there are some proprietary like doesn't razor have like their own Option like sometime. So no. This one this one is Thunderbolt. Then about four and five. Yeah.
Drew 1:00:31
Okay. So I got this. I got it all hooked up. I haven't played with it yet, but there's there's one thing that I'm concerned about. So my motherboard supports USB four Okay. Which is not the same as Thunderbolt. It's still twice. It's still pretty high bandwidth. I don't know how that's gonna work. If it's gonna be performant or not. But I threw so it does require a PSU. You basically throw that in there and turn it on. And you actually have to hook up like your ATX uh power to it and another like eight pin connector and then the connectors for the graphics card. Um but then you turn it on and it connects via USB or Thunderbolt. Like I said my my motherboard does not have thunderbolt
Paul 1:01:18
Yeah, it looks like Thunderbolt four and USB four have the same bandwidth forty gigs. Yeah. F yeah, exactly. Up. Thunderbolt increases that up to 80, but looks like there's also 120 if it's unidirectional. Yeah.
Drew 1:01:34
Now put that in perspective, like native system bus is hundreds of gigabytes.
Paul 1:01:40
Exactly. Way, way faster. Yeah.
Drew 1:01:43
So the real question is, will I get decent performance? Will it be worth it? Now, I didn't pay this much for it. On Amazon, it's $349. I think I paid a little less than $300 at Micro Center for it. Okay. So maybe this is sunk cost, but I would love to be able to get access to that extra 24 gigs if I could get it. Okay. Yeah. Otherwise I'll otherwise I'll throw it at another computer and do something else with it. But yeah. Yeah. So that that to be continued because I haven't really played with this sufficiently yet.
Paul 1:02:17
So yeah, I'm glad you found those AMD cards. That looks like a pretty good like performance to price.
Drew 1:02:23
Yeah. I mean it's ridiculous, right? Like for 32 like t I basically added 64 gigs of video RAM to my computer. For less than three grand. And again, using the Vulcan runtime at LM Studio, like I was getting forty or fifty tokens per second with that Quen model. Like it was it was plenty fast. It was plenty fast for what I was doing. That's awesome. So yeah. Yeah. So that's been that's been my big science experiment. I'm I'm I'm pretty well pleased. I'm pretty well pleased.
Paul 1:02:51
Nice, nice. That's excellent.
Drew 1:02:53
Yep. Cool. Well, that's a show, huh?
Paul 1:02:56
We did a show.
Drew 1:02:57
We made it.
Paul 1:02:58
Yeah. You know what I'm gonna say? Doing the best. com. You can find all the show notes, all the links. And hey, thanks for listening.