2025-12-24

Where is this AI stuff going?

There is so much money tied up in AI these days that it seems impossible that all of these AI companies are going to survive. Eventually the venture capital money is going to dry up, and who is going to be left holding the bag?

At the same time the progress on models seems to be slowing down. When I use Cursor I’ve got a variety of different models to pick from and I don’t see a super meaningful difference between them. Perhaps I’m just oblivious, but they’re all pretty spectacular. And that’s part of the problem: changing models comes at almost zero cost. Today I can use Gemini and tomorrow GPT-5, so what moat do these companies have other than capacity? Even if I’ve got an application that is using an AI API, switching to a different provider is trivial.

Meanwhile running models locally is getting easier. Sure, they’re not as good as the big models, but for many applications they’re good enough. And running locally means no ongoing costs, no API limits and no data privacy issues. If you’re a small or medium business, buying a few GPUs to act as a farm for employees to use is a one-time cost that might make more sense than paying for API calls forever.

I honestly believe that the future is going to be a hybrid approach: some operations will run on local models and fewer will be farmed out to a larger cloud model. But the number of applications which make sense to run on a cloud model is going to shrink as local models get better and better. Summarizing emails, writing cover letters, cleaning up data and reports are all things that can be done with small models. That’s going to really undercut the market for OpenAI and others.

But here is where things get a little interesting: I was reading that OpenAI is purchasing something like 40% of all the memory being produced by 2030. And that’s just one company; if Microsoft, Google, Amazon and Meta all jump on this bandwagon, what memory will be left for everyone else? What if that’s the intention? We’re already seeing consumer memory prices spike.

Here are the average prices for DDR5-6000 2x32GB kits over the last 18 months (from PC PartPicker):
[Chart: DDR5-6000 2x32GB kits, average price in CAD over the last 18 months]

That looks like a hockey stick graph to me. And this memory is used in everything from desktops to laptops to phones to TVs. I bet stupid toasters and fridges use memory now too. I’m reminded that during COVID a lot of tech companies were hiring anybody they could, not because they had work for them, but because they wanted to hoard talent and keep it away from competitors. What if the same is happening with memory and compute? What if these big AI companies are buying up all the resources they can get their hands on to keep them away from competitors and startups? And perhaps they’re buying them to keep them out of the hands of consumers, so they cannot run models locally?

That seems like a terrible thing for everybody. This strategy, if it is indeed a strategy and not a wild conspiracy theory, is going to cost billions. And just how long is venture capital going to pay for hoarding resources? At some point these companies are going to have to make money, and if they’re spending all their money on hoarding resources then there won’t be anything left for R&D, marketing or sales.

2025-11-05

Jujutsu Cheat Sheet

I’ve started playing around a bit with the source control tool Jujutsu, which is commonly referred to as jj. Git has been my go-to tool for what seems like decades now, but in the before times I worked as a release engineer and made use of a huge stable of source control tools, since our code base was spread over many versions and had been assembled by purchasing lots of other companies. For a while there I was working on a daily basis with:

  • ClearCase
  • Perforce
  • Subversion
  • CVS
  • Visual Source Safe
  • Mercurial
  • Git
  • CCC/Harvest

I’m using Jujutsu at my day job now because it just layers transparently on top of git so I don’t need to go seeking permission. I’m only a few days into using it and I’m not thoroughly convinced yet that it is better than git but I’m willing to keep trying.

Here are some of the commands I’m using so far:

Get the latest version of the code from a central repository locally

jj git fetch

Start new work from the latest mainline

jj new main@origin -m "Whatever I'm going to work on"

Bookmark the work with a name I’m going to use as a branch in git

jj bookmark create my-feature-branch

Push my work up to Github

jj git push --allow-new

Create a new commit before my current one that I can squash into

jj new -B @ -m "Some description of the work"

Push individual files into the parent change

jj squash path/to/file1 path/to/file2
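
Putting those together, here is a rough end-to-end sketch of how the commands string into a single change. The description, bookmark name and remote are just placeholders:

# Grab the latest changes from the remote
jj git fetch

# Start a new change off the remote main and do the work
jj new main@origin -m "Add the new widget endpoint"

# Name the change so git sees it as a branch, then push it up
jj bookmark create my-feature-branch
jj git push --allow-new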

I’ll keep expanding this document with new commands as I uncover them.

2025-07-12

Postgres Data Masking and Anonymization

Predicting the performance of a web application is always a little bit difficult. If it’s a question of how the site will perform under load then you can use things like Artillery to throw requests at it. But sometimes problems arise from increasing amounts of data. This was the case for a site I helped develop a little while back.

It had been in production for a couple of years and was starting to have problems with some of the queries. I profiled some of them and found some query optimizations which cleared it up. But it was annoying that this hadn’t been caught earlier and that it was difficult to replicate in lower environments because they simply didn’t have enough data.

We could have generated data, but for any sort of complex data model it’s difficult to create realistic data. Fortunately I knew of a place we could get really well structured data which would have the same performance profile as production: production. But this data contains sensitive information like phone numbers, addresses, names and salaries. I didn’t want to just copy that over to the lower environments, so I needed a way to clean up the data.

Initially I started by just throwing FakerJS at the data, but the performance of updating every row with generated values was not great. After some research I found the Postgres extension PostgreSQL Anonymizer, which looked like it would fit the bill.

PostgreSQL Anonymizer

PostgreSQL Anonymizer is a Postgres extension which lets you mask or anonymize data in a variety of ways. It has a number of built-in functions for common tasks like replacing names with random names, addresses with random addresses and so on.

Getting it installed on Azure Postgres Flexible Server was a little tricky and I’ll probably post on that in a separate article. But once it was installed it was easy to use.

There are a couple of modes it can run in: dynamic and static. In dynamic mode it leaves the data in the table untouched but applies rules, based on the current user’s role, to mask the data when it is queried. That’s pretty handy and you could use it for something like masking SSNs for any user other than an admin. Static mode will actually update the data in the table to new values, per your rules. This is what I opted for as, by default, the masking isn’t deterministic.
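
If you do go the dynamic route, a minimal sketch looks something like this, assuming the 1.x series of the extension; the qa_reader role and the employee.ssn column are purely illustrative:

-- Mark a role as masked so its queries go through the masking rules
CREATE ROLE qa_reader LOGIN;
SECURITY LABEL FOR anon ON ROLE qa_reader IS 'MASKED';

-- Keep the real SSNs in the table but show masked values to that role
SECURITY LABEL FOR anon ON COLUMN employee.ssn
  IS 'MASKED WITH FUNCTION anon.partial(ssn, 0, $$XXX-XX-$$, 4)';

-- Turn on dynamic masking
SELECT anon.start_dynamic_masking();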

Using PostgreSQL Anonymizer

The script I put together to run after I had restored a backup into the test environment looked like this:

-- Start up the extension
CREATE EXTENSION anon;

-- Rules
SECURITY LABEL FOR anon ON COLUMN household.contact_email
  IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN household.contact_phone
  IS 'MASKED WITH FUNCTION anon.random_int_between(10000000,90000000)';

SECURITY LABEL FOR anon ON COLUMN address.street1
  IS 'MASKED WITH FUNCTION anon.fake_address()';
SECURITY LABEL FOR anon ON COLUMN address.street2
  IS 'MASKED WITH FUNCTION anon.fake_address()';

...

-- Anonymize the database statically
SELECT anon.anonymize_database();

You can see the sections there: first enable the extension, then create a series of rules which will replace the data in those columns with fake data. Finally, SELECT anon.anonymize_database(); will actually kick off the changes. A few seconds of crunching later the data is all faked up, and we can hand the database over to QA or developers without having to worry about sensitive data leaking.
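
For reference, the whole refresh boils down to two steps. Here’s a rough sketch assuming a plain pg_restore style copy; the host, database and file names are made up:

# Restore the production backup into the test database
pg_restore --clean --no-owner -h test-db.example.com -d myapp prod_backup.dump

# Apply the masking rules and statically anonymize everything
psql -h test-db.example.com -d myapp -f anonymize.sql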

2025-03-30

Open API Generator for C#

Every once in a while I run into the need to generate a C# client for some API which has been nice enough to provide me with OpenAPI specifications. But it’s one of those things that I do so infrequently that I always forget how to do it. So I thought I would document it here.

The first thing to do is to install the OpenAPI generator. It’s written in Java, which is obviously a decision I’m solidly against. As I run a Mac most of the time I prefer to use brew, since it’s easier than trying to figure out how to build stuff with Maven.

brew install openapi-generator

Now comes the fun part: figuring out the options to use. There are a bunch of different generators for different languages, and on top of that each generator has its own options. The C# generator is called csharp and its options are described in some detail at https://openapi-generator.tech/docs/generators/csharp. In my case I was looking to generate a client for a US Government API whose owners seem really snippy about people getting their hands on documentation without going through a heap of hoops, so we’ll just call it USGovAPI because this administration is not one I want to be on the wrong side of.
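
If you’d rather poke around locally than read the website, the CLI can list the generators and their options itself:

# List every available generator
openapi-generator list

# Show the options for the csharp generator
openapi-generator config-help -g csharp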

In addition I wanted to target .NET 8 because the project is still running on the long-term support version of .NET. So the command ended up being:

openapi-generator generate -i swagger.json -g csharp -o out/usgoveapi --additional-properties=packageName=USGovAPI,targetFramework=net8.0

And with that we get a nice little C# client that we can use to call the USGovAPI. It even includes some unit tests to go along with it.
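
As a quick sanity check of what comes out, usage looks roughly like the following. The RecordsApi class and GetRecordsAsync method are made-up stand-ins for whatever operations your spec actually defines; the generator names the API classes after the spec’s tags and puts them under USGovAPI.Api:

// Hypothetical usage of the generated client; the API class and method
// names are placeholders that depend entirely on the OpenAPI spec.
using USGovAPI.Api;
using USGovAPI.Client;

var config = new Configuration { BasePath = "https://api.example.gov" };
var api = new RecordsApi(config);          // stand-in for a generated *Api class
var records = await api.GetRecordsAsync(); // stand-in for a generated operation
Console.WriteLine(records);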