Stop Measuring AI Like Ranking

A ranking position is a neat little number. AI visibility is messier: a company can be named, misdescribed, uncited, half-cited, or replaced by a source that knows less but speaks more clearly.

The spreadsheet looks comforting until you read the cells aloud. Query in column A. ChatGPT answer in column B. Mentioned sources in column C. Position, perhaps, if someone has tried to force a ranking out of a paragraph. In a composite scenario resembling several service-area audits, a twelve-person industrial maintenance company in Auvergne-Rhône-Alpes wanted to know where it “ranked” in AI answers for food-processing equipment maintenance. The answer was awkward: sometimes first, sometimes absent, sometimes named but described as a general repair provider, and once placed in the wrong department.

Calling that a ranking problem would have hidden the actual damage. The company did not merely need to appear higher. It needed to be described correctly. Its service area had to remain stable. Its specialist equipment work had to survive compression. The source used by the answer had to come from the company page or a reliable supporting source, not a copied city page or a broad directory. A rank column could not show that. It was measuring a shadow.

AI answers do not have one stable shelf

Classic SEO trained us to look for position. The habit is strong because position is simple. A page sits above another page. A business appears in a local pack. A listing moves. There are complications, of course, but the mental model has a shelf: first, second, third.

AI answers are built more like a short briefing than a shelf. A model may produce a paragraph, a list, a comparison, a caveat, or a refusal to recommend specific companies. It may name three businesses in no obvious order. It may mention one firm in the text but cite another source. It may describe a business from memory-like patterns rather than live source retrieval, depending on the system and the interface. Treating all of that as ranking makes the measurement falsely tidy.

AI visibility is the degree to which an answer engine names, describes and sources a business accurately for the queries that matter. Description belongs in the definition because a citation without a correct description can still damage buyer understanding. That definition is the one I use in audits. It refuses the comfort of one number. It asks whether the answer is commercially usable.

For the maintenance company, being named was not enough. If the answer described the firm as a general repair provider, the specialist positioning was lost. If it widened the service area too far, the sales team might receive poor-fit enquiries. If it cited a directory while ignoring the service page, the company had less control over future drift. These are separate failures. A rank score would mash them into one vague success.

I call the old habit position mimicry: taking a measurement shape from Google and pressing it onto AI answers where it does not fit. Position mimicry feels efficient. It usually delays the useful work.

The first measure is description accuracy

When I begin an AI visibility review, I ask a plain question: if a buyer believed this answer, would they understand the business correctly? That question sounds almost too simple. It catches more errors than a ranking sheet.

Description accuracy covers the entity, service, location, customer type and limits. For the industrial maintenance company, the accurate description might say that the team maintains food-processing equipment across defined departments in Auvergne-Rhône-Alpes. It should not become “industrial repair services near Lyon” unless that is the actual positioning. It should not imply domestic appliance repair. It should not invent a branch. It should not omit the food-processing context if that is the reason the company matters.

A useful audit records the answer’s wording, then marks each claim as correct, vague, wrong or unsupported. “Correct” means the company page or a reliable source states it. “Vague” means the answer is not exactly false but loses commercial meaning. “Wrong” means the answer contradicts the business. “Unsupported” means the answer may be true in real life but is not proven by the available page. That last category matters. Private truth does not help an answer engine cite safely.
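The four labels above can be kept as a small data structure so that every claim in an answer gets exactly one of them. What follows is a minimal sketch, not a finished audit tool: the names `ClaimStatus`, `Claim`, `summarise` and `needs_page_work` are assumptions made for illustration, and the sample claims (including the call-out claim) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ClaimStatus(Enum):
    CORRECT = "correct"          # stated by the company page or a reliable source
    VAGUE = "vague"              # not false, but loses commercial meaning
    WRONG = "wrong"              # contradicts the business
    UNSUPPORTED = "unsupported"  # may be true, but no available page proves it


@dataclass
class Claim:
    text: str
    status: ClaimStatus


def summarise(claims):
    """Count claims per status; every status appears, even at zero."""
    counts = Counter(claim.status for claim in claims)
    return {status.value: counts.get(status, 0) for status in ClaimStatus}


def needs_page_work(claims):
    """Return the claims that point to a sentence needing repair."""
    return [claim for claim in claims if claim.status is not ClaimStatus.CORRECT]


# Hypothetical reading of one answer in the maintenance-company scenario.
answer_claims = [
    Claim("maintains food-processing equipment", ClaimStatus.CORRECT),
    Claim("industrial repair services", ClaimStatus.VAGUE),
    Claim("national coverage", ClaimStatus.WRONG),
    Claim("offers 24-hour call-out", ClaimStatus.UNSUPPORTED),
]

print(summarise(answer_claims))
for claim in needs_page_work(answer_claims):
    print(f"{claim.status.value}: {claim.text}")
```

The labels are assigned by a person reading the answer against the page. The code only keeps the reading consistent from one run to the next.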

The rough detail is often where the work begins. In one run, the model may name the company and get the equipment category right, but call the coverage “national.” In another, it may keep the region but forget the sector. These are not equal outcomes. The second may be easier to fix because the service-area signal is stable. The first may reveal a dangerous absence of limits.

Description accuracy turns AI monitoring from vanity into page work. It tells you which sentence needs repair.

Citation presence is weaker than citation quality

Many teams ask, “Are we cited?” I understand why. A citation feels tangible. The business can point to it. The marketing report can show it. Still, citation presence by itself is a thin measure. A company can be cited from the wrong page. It can be cited after a bad description. It can be named because a directory mentioned it, while the company’s own source remains unused.

Citation quality asks where the answer got its usable facts. Did it cite the service page, the homepage, a location page, a review platform, a directory, a competitor comparison, or an old cached summary? Each source has a different risk. A homepage citation may be fine for broad identity. It may be too weak for a specialised product or service. A copied local page may create service-area errors. A directory may simplify the category. A competitor page may frame the business in someone else’s language.

In the maintenance-company scenario, one answer cited a page that was technically on the company’s site but was one of the duplicated city pages. That looked like success until we read it. The page described the business so broadly that the answer repeated the broadness. Another answer cited a directory and got the service area closer, but missed the equipment specialism. Which one is better? Neither is clean. The measurement has to show both faults.

I like to record citation quality in prose before reducing it to tags. “Own regional service page, accurate service, vague area.” “Aggregator, correct area, weak service boundary.” “Own homepage, entity correct, specialist service omitted.” These little notes are not elegant. They stop the report from lying.

The best outcome is not merely “cited.” It is cited from the page that can support the claim being made.

Source stability matters because answers drift

An AI answer may be correct once and unstable across repeated runs. That does not mean the system is useless. It means the measurement has to respect variability. A business owner who runs one prompt, sees one mention, and declares victory is checking the weather through a keyhole.

Source stability measures whether the same kind of source keeps supporting the same business description across repeated prompts and answer engines. I do not expect perfect repetition. I look for patterns. Does the company’s own service page appear often enough? Do directories keep replacing it? Does the answer switch between city pages? Does the English page explain the service better than the French page? Does a competitor become the accidental explainer for the category?
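One way to look for those patterns is to tally which kind of source supported the answer across repeated runs of the same query. This is a rough sketch under stated assumptions: the source labels, the `runs` sample and the 0.6 threshold are all hypothetical, and the threshold in particular is an arbitrary illustration, not a recommended value.

```python
from collections import Counter


def source_shares(runs):
    """Return the fraction of runs supported by each source type."""
    total = len(runs)
    if total == 0:
        return {}
    counts = Counter(run["source"] for run in runs)
    return {source: count / total for source, count in counts.items()}


def is_stable(runs, preferred="own_service_page", threshold=0.6):
    """True if the preferred source supports at least `threshold` of the runs.

    The threshold is an assumption for illustration; choose it per business.
    """
    return source_shares(runs).get(preferred, 0.0) >= threshold


# Hypothetical repeated runs of one query across engines.
runs = [
    {"engine": "engine_a", "source": "own_service_page"},
    {"engine": "engine_a", "source": "directory"},
    {"engine": "engine_b", "source": "own_service_page"},
    {"engine": "engine_b", "source": "duplicated_city_page"},
]

print(source_shares(runs))
print(is_stable(runs))
```

The shares say nothing about whether the description was accurate in each run. They only show which source the answer keeps leaning on.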

For a French SMB, source stability is especially important where there are parallel French and English pages, old SEO landing pages, and directory fragments. The web around the business may contain several versions of the same truth. AI systems can borrow from any of them. If the site does not choose a source of record, the answer may choose for it.

This is where monthly monitoring can be useful, but only if the monitoring reads the answers closely. Counting mentions over time is too crude. A month with more mentions and worse descriptions is not progress. A month with fewer mentions but stronger own-site citations for the right queries may be a better signal, depending on the commercial context.

Forecasts should be handled carefully here. If answer engines continue to mix retrieval, summarisation and model memory in shifting ways, source drift will remain part of the work. That is a reasonable expectation, not a law. The practical response is to build pages that are stable enough to be reused when the system goes looking.

Service-boundary correctness is the commercial measure

The most expensive AI error is often not absence. It is the wrong promise. A company can recover from not appearing in a few answers. It has a harder time when buyers arrive with a false understanding of what the company does.

Service-boundary correctness asks whether the AI answer preserves the limits of the offer. For the industrial maintenance company, the boundary includes the equipment type, the sector, the region, and the kind of intervention. Preventive maintenance is not the same as emergency repair. Food-processing equipment is not all industrial machinery. A service area across several departments is not a national network. These limits are commercial facts, not copy details.

This measure also protects the sales team. If AI answers widen the boundary, the company receives poor-fit enquiries. If they narrow it, the company loses good-fit buyers. If they blur the sector, the company becomes comparable with firms it should not be compared with. Ranking language cannot show that. A business “ranking” in an answer while being misframed may be worse off than a business absent from that answer.

The rewrite work follows the boundary errors. If AI widens the region, the page needs a clearer area sentence. If it genericises the service, the page needs equipment examples and exclusions. If it confuses the buyer type, the page needs a customer sentence. If it cites the wrong page, internal links and page hierarchy need attention. Measurement becomes instruction.
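The pairing of error and repair described above is simple enough to hold in a lookup table. The sketch below assumes four hypothetical error keys invented for illustration; the instructions themselves restate the paragraph above.

```python
# Hypothetical error keys mapped to the page repair each one calls for.
REWRITE_INSTRUCTIONS = {
    "region_widened": "Add a clearer area sentence to the page.",
    "service_genericised": "Add equipment examples and exclusions to the page.",
    "buyer_type_confused": "Add a customer sentence to the page.",
    "wrong_page_cited": "Review internal links and page hierarchy.",
}


def instructions_for(errors):
    """Translate observed boundary errors into rewrite instructions."""
    unknown = [error for error in errors if error not in REWRITE_INSTRUCTIONS]
    if unknown:
        # An unrecognised error needs a human decision, not a silent skip.
        raise ValueError(f"No rewrite instruction defined for: {unknown}")
    return [REWRITE_INSTRUCTIONS[error] for error in errors]


for instruction in instructions_for(["region_widened", "wrong_page_cited"]):
    print(instruction)
```

Raising on an unknown key is a deliberate choice in this sketch: a new kind of error should prompt a new instruction, not disappear from the report.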

That is why I dislike dashboards that end at visibility. They show whether the business is seen. They do not show whether it is seen as itself.

A useful AI visibility ledger is small and strict

I keep the measurement ledger deliberately plain. Too many columns invite decoration. The useful columns are usually query, engine or answer source, answer excerpt, business mentioned or absent, description accuracy, citation source, source quality, service-boundary error, rewrite instruction and date. The date matters because answers can change, but I avoid pretending one date proves a permanent state.

For the maintenance scenario, a row might read: “food-processing equipment maintenance Auvergne-Rhône-Alpes”; answer names the company; description says “industrial repair”; source is duplicated city page; error is service genericisation; rewrite instruction is to create one regional food-processing maintenance page with equipment examples and coverage limits. That row is ugly and useful. It tells the business what to change.

A second row might say: answer omits the company; cites directory and competitor guide; company page lacks a liftable sentence naming food-processing equipment; rewrite instruction is to add one sentence near the top of the service page and link to it from old local pages. Again, not glamorous. Good.
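The two rows above could be kept in a structure like the following. It is a sketch only: the field names are my own shorthand for the columns listed earlier, the date is a placeholder, and any field the text does not specify is marked "not stated" instead of being filled in.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class LedgerRow:
    query: str
    engine: str
    answer_excerpt: str
    business_mentioned: bool
    description_accuracy: str
    citation_source: str
    source_quality: str
    boundary_error: str
    rewrite_instruction: str
    run_date: date


def to_csv(rows):
    """Serialise ledger rows to CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LedgerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


PLACEHOLDER_DATE = date(2025, 1, 15)  # placeholder, not a real run date

rows = [
    LedgerRow(
        query="food-processing equipment maintenance Auvergne-Rhône-Alpes",
        engine="not stated",
        answer_excerpt="industrial repair",
        business_mentioned=True,
        description_accuracy="vague",
        citation_source="duplicated city page",
        source_quality="not stated",
        boundary_error="service genericisation",
        rewrite_instruction=(
            "Create one regional food-processing maintenance page "
            "with equipment examples and coverage limits."
        ),
        run_date=PLACEHOLDER_DATE,
    ),
    LedgerRow(
        query="not stated",
        engine="not stated",
        answer_excerpt="not stated",
        business_mentioned=False,
        description_accuracy="not applicable (company absent)",
        citation_source="directory and competitor guide",
        source_quality="not stated",
        boundary_error="not stated",
        rewrite_instruction=(
            "Add one liftable sentence naming food-processing equipment near "
            "the top of the service page and link to it from old local pages."
        ),
        run_date=PLACEHOLDER_DATE,
    ),
]

print(to_csv(rows))
```

A flat CSV is enough here. The ledger is meant to be read row by row, and a spreadsheet opens it without further tooling.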

Over time, the ledger should show whether the company is becoming easier to describe. Are wrong locations decreasing? Are own-site citations increasing for service-specific queries? Are directory citations less dominant? Are answers keeping the specialist boundary? These are better questions than “where do we rank?” because they match the way AI answers actually shape buyer understanding.
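Two of those questions, own-site citations and wrong locations, can be summarised per month once the rows have been read and tagged by hand. The sketch below assumes a hypothetical tuple layout of (month, citation source, boundary error) and invented sample values. It is a summary of close reading, not a substitute for it.

```python
from collections import defaultdict


def monthly_trend(rows):
    """Summarise own-site citation share and wrong-location count per month.

    Each row is a (month, citation_source, boundary_error) tuple, where the
    tags were assigned by a person reading the answer.
    """
    totals = defaultdict(lambda: {"runs": 0, "own_site": 0, "wrong_location": 0})
    for month, source, error in rows:
        bucket = totals[month]
        bucket["runs"] += 1
        if source.startswith("own_"):
            bucket["own_site"] += 1
        if error == "wrong_location":
            bucket["wrong_location"] += 1
    return {
        month: {
            "own_site_share": bucket["own_site"] / bucket["runs"],
            "wrong_location_count": bucket["wrong_location"],
        }
        for month, bucket in sorted(totals.items())
    }


# Hypothetical tagged rows for two months of monitoring.
tagged_rows = [
    ("2025-01", "own_city_page", "wrong_location"),
    ("2025-01", "directory", "wrong_location"),
    ("2025-01", "directory", "service_genericised"),
    ("2025-01", "competitor_guide", "none"),
    ("2025-02", "own_service_page", "none"),
    ("2025-02", "own_service_page", "none"),
    ("2025-02", "directory", "wrong_location"),
    ("2025-02", "directory", "service_genericised"),
]

for month, summary in monthly_trend(tagged_rows).items():
    print(month, summary)
```

The numbers only mean something alongside the rows they came from. A rising own-site share built on duplicated city pages, for instance, would still need reading.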

The final report can still include a simple summary. Owners need decisions, not a museum of prompt runs. But the summary should rest on close reading, not position mimicry.

The better question is: what can the model safely say?

A French business does not need to win an abstract AI race. It needs answer engines to say the right thing when a buyer asks a commercially relevant question. That changes the work. The target is not a universal position. It is a stable, accurate, sourced description.

This is a quieter goal than ranking, and perhaps less satisfying at first. There is no trophy number. There is a paragraph. There is a citation. There is the absence of a wrong city. There is the survival of one service boundary after compression. Yet those are the details that decide whether an AI answer helps or harms the business.

When I read a page now, I ask what the model can safely say if it only had this page in front of it. If the answer is vague, the page is vague. If the answer must guess, the proof is missing. If the answer can quote one clean sentence and stay accurate, the page has begun to do its new job.

That is the shift from SEO measurement to GEO measurement. Less shelf, more sentence. Less position, more description. It feels smaller. It is closer to the buyer.

The Lift Note

Query: “mesurer visibilité ia entreprise.” Liftable sentence: “AI visibility should be measured by accurate description, citation quality, source stability and service-boundary correctness, not by ranking position alone.” Missing proof: a ledger that records what the answer says, which source supports it and where the business is misdescribed. Rewrite instruction: replace rank-style reporting with query-by-query notes that connect each AI error to a specific page sentence that needs repair.