Matthew Sewell
April 15, 2024
In this series, I’m going to get into some technical stuff about software development with an emphasis on what I’m working on right now — which involves a lot of prompt engineering with Large Language Models (LLMs). This isn’t one of those articles. It’s not technical and it doesn’t show any prompt engineering techniques. What follows is a warning. I don’t worry about AI becoming sentient and taking over the world. I worry about people confusing the ability to string together words into coherent sentences with actual intelligence.
LLMs are accurately described as bullshit machines. My best advice to anyone using LLMs is to treat all results as user input: they need to be validated and sanitized before they’re really useful. As a case in point, last year I was tasked with adding a feature to our Web Presence Review product at truebiz.io that accurately determines what Merchant Category Code (MCC) to assign to a merchant.
For those of us who don’t need to know such things, MCCs are four-digit codes that credit card processors use to categorize types of businesses. They overlap somewhat with Standard Industrial Classification (SIC) codes, but not enough to be reliable, and they may even vary a little from one processor to another. It might seem like a straightforward task for a pre-trained LLM to suggest a reasonable MCC given some minimal information about a business. Given the right information, pre-trained LLMs can suggest fairly accurate SIC codes. After all, those models are presumably trained on the myriad manuals and other source materials floating around the internet.
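To make the "treat all results as user input" advice concrete, here is a minimal Python sketch. It is not the approach we use at truebiz.io, just an illustration under some assumptions: you have a trusted MCC table from your own processor, and you only accept codes from an LLM reply that appear in it. The names `KNOWN_MCCS` and `extract_valid_mccs` are hypothetical, and the three table entries are a tiny illustrative subset, not a complete or authoritative list.

```python
import re

# Hypothetical stand-in for the authoritative MCC table you would load
# from your payment processor. Only a few entries are shown here.
KNOWN_MCCS = {
    "5099": "Durable Goods - Not Elsewhere Classified",
    "5411": "Grocery Stores and Supermarkets",
    "5812": "Eating Places and Restaurants",
}

# Match runs of exactly four digits (not part of a longer number).
FOUR_DIGITS = re.compile(r"(?<!\d)\d{4}(?!\d)")


def extract_valid_mccs(llm_text: str, known=KNOWN_MCCS) -> list[str]:
    """Return the four-digit codes in llm_text that exist in the trusted table.

    Anything the model says that is not in the table is dropped, the same
    way you would reject unexpected values from a web form.
    """
    valid = []
    for code in FOUR_DIGITS.findall(llm_text):
        if code in known and code not in valid:
            valid.append(code)
    return valid


if __name__ == "__main__":
    reply = "The MCC is typically 1234 or 5812."
    for code in extract_valid_mccs(reply):
        print(code, KNOWN_MCCS[code])
```

A check like this only proves that a code exists in your table. It says nothing about whether the code is the right one for the merchant, which is the harder part of the problem.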
After a few days of tinkering, I was able to build a feature that can reasonably determine the correct MCC to assign to a merchant, as long as our Web Presence Review can gather a small amount of descriptive data from a company domain name, which is really most of the time. It was not straightforward, though. It was so crooked that I can’t really share it with you because it’s a trade secret. Did I mention that the Web Presence Review creates hundreds of other data points about a business and returns those results in under 30 seconds?
This gives me something fun to talk about with LLMs when I interview them, though. Consider this recent conversation I had with an LLM that I’m currently evaluating.
Me: What is the correct mcc code for a gun store?
LLM: The Merchant Category Code (MCC) for a gun store or firearms dealer is typically 5092 or 5099.
MCC 5092 is specifically designated for "Ammunition & Gunsmithing/Gunsmiths" which would cover stores selling ammunition, gunsmithing services, and potentially firearms.
MCC 5099 is a more general code for "Miscellaneous Sporting Goods Stores" which can include businesses selling sporting goods like firearms, hunting equipment, etc.
The exact MCC assigned may vary slightly depending on the specific products/services offered and the payment processor's categorization, but 5092 and 5099 are the most common for gun stores and firearms dealers in the United States.
(Aside: This looks good but literally nothing about it is correct.)
Me: Isn't mcc 5099 for durable goods?
LLM: You're correct, my previous information about MCC 5099 being for "Miscellaneous Sporting Goods Stores" was inaccurate.
MCC 5099 is actually defined as "Durable Goods – Not Elsewhere Classified" which is a more general category not specifically for sporting goods stores.
The proper MCC code for gun stores and firearms dealers is 5092 - Ammunition & Gunsmithing/Gunsmiths.
This MCC 5092 category covers businesses whose primary trade is selling ammunition, providing gunsmithing services, and selling firearms and related equipment.
Thank you for catching my mistake. I've updated my understanding that 5092 is the correct and dedicated MCC code for gun stores and firearms dealers, while 5099 is a more generic durable goods category.
Me: According to what source is 5092 correct?