Talking Into The Ether

On how and what we pay attention to

🔷 Subscribe to get breakdowns of the most important developments in AI in your inbox every morning.

Here’s today at a glance:

💻 Mindshifts in Software Engineering

If there is a generative AI person in your circle of friends, yes, they’ve probably felt that they were going slightly crazy in the last year. Only a few generative AI systems have truly succeeded in production, and they’ve been hard to build. This wonderful paper from Microsoft and GitHub, the first organization outside OpenAI to unveil a large-scale system, makes the challenges and opportunities clear:

  • Time-consuming process of trial and error

    • “Early days, we just wrote a bunch of crap to see if it worked. Experimenting is the most time-consuming [thing] if you don’t have the right tools. We need to build better tools”

  • Wrangling prompt output

    • “It would make up objects that didn’t conform to that JSON schema, and we’d have to figure out what to do with that”

    • “if the model is kind of inherently predisposed to respond with a certain type of data, we don’t try to force it to give us something else because that seems to yield a higher error rate” - on eventually accepting that file trees would be better generated as ASCII output and then parsed

  • Prompt management

    • it’s “a mistake doing too much with one prompt”

    • “So we end up with a library of prompts and things like that.”

  • Every test is a flaky test

    • “that’s why we run each test 10 times”

    • “If you do it for one scenario no guarantee it will work for another scenario”

    • “[manually curated spreadsheets with hundreds of] input/output examples” - on how they managed testing

  • Creating benchmarks and reaching testing adequacy

    • “especially for more qualitative output than quantitative, it might just be humans in the loop saying yes or no [but] the hardest parts are testing and benchmarks [still]”

    • “most of these, like each of these tests, would probably cost 1-2 cents to run, but once you end up with a lot of them, that will start adding up anyway”

    • “Where is that line that clarifies we’re achieving the correct result without overspending resources and capital to attain perfection?”

  • Safety and privacy

    • “We have telemetry, but we can’t see user prompts, only what runs in the back end, like what skills get used. For example, we know the explain skill is most used but not what the user asked to explain.”

    • “telemetry will not be sufficient; we need a better idea to see what’s being generated.”

  • Mindshifts in software engineering

    • “So, for someone coming into it, they have to come into it with an open mind, in a way, they kind of need to throw away everything that they’ve learned and rethink it. You cannot expect deterministic responses, and that’s terrifying to a lot of people. There is no 100% right answer. You might change a single word in a prompt, and the entire experience could be wrong. The idea of testing is not what you thought it was. There is no, like, this is always 100% going to return that yes, that test passed. 100% is not possible anymore”
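The testing practices above - validating loosely structured model output against a schema and running each test many times with a pass-rate threshold - can be sketched in a few lines. This is a minimal illustration, not the paper's actual code: the model call is stubbed, and `min_pass_rate` is an assumed threshold.

```python
import json

def passes(output: str) -> bool:
    """Check one model output against an expected shape: valid JSON
    that is a dict containing a 'files' list. Malformed output (the
    'wrangling prompt output' problem) simply fails the check."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("files"), list)

def flaky_test(model, prompt: str, runs: int = 10, min_pass_rate: float = 0.8) -> bool:
    """Run the same test `runs` times and pass if enough runs succeed.
    Mirrors the 'run each test 10 times' approach to nondeterminism."""
    successes = sum(passes(model(prompt)) for _ in range(runs))
    return successes / runs >= min_pass_rate

# Stubbed "model": deterministic here, but in practice each call may differ.
def fake_model(prompt: str) -> str:
    return '{"files": ["a.py", "b.py"]}'

print(flaky_test(fake_model, "List the project files as JSON"))  # prints True
```

The point is that the unit of testing shifts from a single assertion to a success rate over repeated runs, which is exactly the mindshift the interviewees describe.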

The whole paper is interesting as an oral history of the first contact between humanity and an alien intelligence. Hat Tip @vboykis

📱 Tweets over Peer Review

You’ve done everything right. Waited 18 months for peer review, answered nonsense comments, submitted to a prestigious conference… Then you find out that submitting your paper to a preprint server and getting tweeted by an “academic influencer” would have been a better use of your time. Or at least that’s what this paper says:

Measuring citation count, it finds:

  • Mentions by Twitter users @_akhaliq (who does this professionally for Hugging Face as of last year) and @arankomatsuzaki (of EleutherAI) were associated with a 2x-3x increase in citation count

  • They serve as curators for the entire ML/AI research space

I often feel like current-day academia engages in cargo cult rituals around fundamental information creation, quality control, curation, and dissemination. It’s good to start to see these rituals decay. The paper’s authors recommend:

  • Re-evaluation of traditional models of paper selection and review - OK

  • Evolve - towards what?

The emergent system here seems to be to:

  • push out research quickly to preprint servers

  • include datasets and other materials so that reviewers can analyze data for themselves rather than just relying on your conclusions

  • publicize amongst the research community on social media and other platforms

  • receive and respond to feedback in the open in real time

The advantages this system has are in speed, lack of fashionable gatekeepers, and greater attention and scrutiny to important papers with meaningful results. The disadvantages are in clear Pareto winner-take-most dynamics and a Gartner hype cycle around research that may obscure how useful it really is.

Meanwhile, the Mamba (potentially the next big architecture after the current one) authors get rejected from ICLR 2024:

As Yann LeCun had said before:

In general, the problems around curation, around knowing what is important and what to pay attention to, only become ever more pressing as the flood of information ascends the elbow of the exponential curve. We are finally at the stage where the existing gatekeepers have been completely overwhelmed, and new ways of allocating attention are required.

🗞️ Things Happen

  • The MidJourney Discord has 18,502,417 users. At $8 per month x 20% who did not churn x 12 months ≈ $400 million ARR. At current AI multiples of 70x revenue, that’s a valuation of $28 billion. Yes, that’s my back-of-envelope math… what’s 5-10% more churn amongst friends.

  • The students and researchers who came up with the shorter embeddings that underpin OpenAI’s recent update chased OpenAI down on Twitter and got an acknowledgement. It’s pretty clear at this point that AI firms are productizing published research as quickly as possible, but are under no pressure to disclose or cite the technologies used. Some noblesse oblige would be nice, perhaps.

  • FTC starts inquiries into AI firm-big tech partnerships. I invest $100 million in your company at a $1 billion valuation; then you buy $100 million of my cloud services. With my public-company P/E ratio of 30, that’s worth $3 billion in market cap for me. That was the dotcom-era story of round-tripping, which may be echoing in AI right now.
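The two valuation sketches above reduce to a few lines of arithmetic. A quick check, using only the newsletter's own assumed inputs (none of these are reported financials):

```python
# MidJourney ARR estimate: users x assumed retained share x price x 12 months
users = 18_502_417
paying_share = 0.20          # assumed: 20% did not churn
price_per_month = 8
arr = users * paying_share * price_per_month * 12
print(f"ARR ~ ${arr / 1e6:.0f}M")  # ~$355M, rounded up to ~$400M in the text
print(f"Valuation at 70x revenue ~ ${arr * 70 / 1e9:.1f}B")

# Round-tripping: $100M of cloud revenue at a P/E of 30
# (treating the revenue as pure profit, as the back-of-envelope does)
revenue = 100e6
pe_ratio = 30
print(f"Market-cap effect ~ ${revenue * pe_ratio / 1e9:.0f}B")  # $3B
```

The exact product comes out closer to $355 million ARR; the text's $400 million and $28 billion figures are the generous round-up of the same arithmetic.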

🖼️ AI Artwork of The Day

Girl With A Pearl Earring in the style of Grand Theft Auto - @ARTiV3RSE on X

That’s it for today! Become a subscriber for daily breakdowns of what’s happening in the AI world:
