Category: Web/Tech

Claude 3 Opus and AGI

As many MR readers will know, I don’t think the concept of AGI is especially well-defined.  Can the thing dribble a basketball “with smarts”?  Probably not.  Then its intelligence isn’t general.  You might think that kind of intelligence “doesn’t matter,” and maybe I agree, but that is begging the question.  It is easier and cleaner to just push the notion of “general” out of the center of the argument.  God aside, if such a being exists, intelligence never is general.

In the structure of current debates, the concept of “AGI” plays a counterproductive role.  You might think the world truly changes once we reach such a thing.  That means the doomsters will be reluctant to admit AGI has arrived, because imminent doom is not evident.  The Gary Marcus-like skeptics also will be reluctant to admit AGI has arrived, because they have been crapping on the capabilities for years.  In both cases, the stances on AGI tell you more about the temperaments of the commentators than about any capabilities of the beast itself.

I would say this: there is yet another definition of AGI, a historical one.  Five years ago, if people had seen Claude 3 Opus, would they have thought we had AGI?  Just as a descriptive matter, I think the answer to that question is yes, and better yet someone on Twitter suggested more or less the same.  In that sense we have AGI right now.

Carry on, people!  Enjoy your 2019-dated AGI.  Canada has to wait.

Concluding remarks: Forget that historical relativism!  True AGI never will be built, and that holds for OpenAI as well.  Humans in 2019 were unimaginative, super-non-critical morons, impressed by any piddling AI that can explain a joke or understand a non-literal sentence.  Licensing can continue and Elon is wrong.

The Continuing Influence of Fast Grants

Fast Grants, the rapid COVID funding mechanism created by Tyler, Patrick Collison, and Patrick Hsu, continues to inspire change around the world. Jano Costard, the Head of Challenges at SPRIND, the German Federal Agency for Disruptive Innovation, writes:

Lots to learn from Fast Grants! Can we implement it in public institutions that face a different set of rules (and legacy)? We tried with the Challenge program at the German Federal Agency for Disruptive Innovation, SPRIND, and succeeded, mostly.

While Fast Grants gave out grants in the first round in 48h, we haven’t been that speedy. Our last Challenge had 2 weeks and 2 days from deadline until final decision in a two-stage evaluation procedure. Those last two days were spent doing pitches, and the teams were informed of the decision the following night. So it rather compares to the two-week decision time Fast Grants had for later rounds.

During Covid, speed was of the utmost importance. But speed remains crucial now. Teams we fund have applications with other public funders that remain undecided after more than 2 years. These delays accumulate and matter even for pressing but slowly advancing threats like climate change. No cleantech solution that is still in the lab today will have a meaningful impact on achieving our climate goals for 2030! It’s not only the R&D that takes time; getting to meaningful scale quickly will be much harder. That’s why there is no time to waste at the start of the process.

Fast grants has two important advantages when it comes to implementation: private funds and limited legacy. Public institutions often face additional rules and procedures that slow down processes. But this is not inevitable.

For SPRIND Challenges, we implemented a funding mechanism that left room for unbureaucratic processes and provided solutions for challenges that public funders or procurers typically face. This mechanism, called pre-commercial procurement, was established by the European Commission in 2007 but had been used in Germany only once before we started using it in 2021. This, too, is due to legacy in processes. Institutions execute their work in part based on an implicit understanding of how things need to be, of what is allowed and what is not. This might lead them to ignore new and beneficial instruments just because “this can’t be true”. Even worse, if new mechanisms are adopted by an institution with a strong inherent understanding of what can and cannot work, they run the risk of overburdening those new and beneficial mechanisms with previous processes and requirements. In the end, a funding mechanism is just a tool. It needs to be used right.

SPRIND had the benefit of being a newly established public institution with important liberties to do things differently, and it’s led by a Director, @rafbuff, who, at the time, had no experience in the public sector. So, did we find the ultimate approach to research and innovation funding with SPRIND Challenges? Certainly not! Improvements are necessary but sometimes hard to achieve (looking at you, state-aid-law!).

Impressive! And check out SPRIND; they are funding some interesting projects!

Approaching Human-Level Forecasting with Language Models

Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. Under a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help to inform institutional decision making.

That is from a new paper by Danny Halawi, Fred Zhang, Chen Yueh-Han, and Jacob Steinhardt.  I hope you are all investing in that charisma…
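For intuition on what “aggregate predictions” might mean in practice, here is a minimal sketch of combining several LM probability estimates into one forecast and scoring it against the outcome. The trimmed-mean rule and all names are my illustrative assumptions, not the paper’s method:

```python
# Sketch of the aggregation step in a retrieval-augmented forecasting
# pipeline: several LM-generated probability estimates for the same
# question are combined into one forecast, then scored once the
# question resolves.  (Illustrative only; not the paper's procedure.)

def aggregate_forecasts(probs, trim=1):
    """Trimmed mean: drop the `trim` lowest and highest estimates."""
    s = sorted(probs)
    kept = s[trim:len(s) - trim] if len(s) > 2 * trim else s
    return sum(kept) / len(kept)

def brier_score(prob, outcome):
    """Squared error of a probability forecast (lower is better)."""
    return (prob - outcome) ** 2

# Five hypothetical LM samples for one yes/no question that resolved "yes".
samples = [0.62, 0.70, 0.55, 0.95, 0.66]
forecast = aggregate_forecasts(samples)
print(round(forecast, 3), round(brier_score(forecast, 1), 3))
```

Trimming tempers the occasional wild sample an LM produces, which matters when the forecast is judged by a strictly proper rule like the Brier score.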

GPT as ethical advisor

This study investigates the efficacy of an AI-based ethical advisor using the GPT-4 model. Drawing from a pool of ethical dilemmas published in the New York Times column “The Ethicist”, we compared the ethical advice given by the human expert and author of the column, Dr. Kwame Anthony Appiah, with AI-generated advice. The comparison is done by evaluating the perceived usefulness of the ethical advice across three distinct groups: random subjects recruited from an online platform, Wharton MBA students, and a panel of ethical decision-making experts comprising academics and clergy. Our findings revealed no significant difference in the perceived value of human-generated ethical advice and AI-generated ethical advice. When forced to choose between the two sources of advice, the random subjects recruited online displayed a slight but significant preference for the AI-generated advice, selecting it 60% of the time, while MBA students and the expert panel showed no significant preference.

That is a 2023 piece by Christian Terwiesch and Lennart Meincke, via the excellent Kevin Lewis.  And here is my earlier 2019 CWT with Dr. Kwame Anthony Appiah.

Daniel Gross on the printing press and GPT

In a way, everyone’s been wondering, trying to analogize ChatGPT with the printing press, but in reality it’s almost the opposite.

The entire thing is happening in the inverse of that, where the printing press was a technology to disseminate information through a book basically and convince people to do things, and the kind of anti-book is the LLM agent, which summarizes things very succinctly. If anything, it awakens people to the fact that they have been complicit in a religion for a very long time, because it very neatly summarizes these things for you and puts everything in latent space and suddenly you realize, “Wait a minute, this veganism concept is very connected to this other concept.” It’s a kind of Reformation in reverse, in a way, where everyone has suddenly woken up to the fact that there’s a lot of things that are wrong…

So yeah, it takes away all the subtlety from any kind of ideology and just puts it right on your face and yeah, people are having a reaction to it.

That is from the Ben Thompson (gated) interview with Daniel and Nat Friedman, self-recommending.

Your friendly AI assistant (it’s happening)

Klarna’s AI assistant, powered by @OpenAI, has in its first 4 weeks handled 2.3m customer service chats, and the data and insights are staggering:

– Handles 2/3 of our customer service enquiries
– On par with humans on customer satisfaction
– Higher accuracy, leading to a 25% reduction in repeat inquiries
– Customers resolve their errands in 2 min vs 11 min
– Live 24/7 in over 23 markets, communicating in over 35 languages

It performs the equivalent job of 700 full-time agents…

Link here.

Grimes on Gemini images

I am retracting my statements about the gemini art disaster. It is in fact a masterpiece of performance art, even if unintentional. True gain-of-function art. Art as a virus: unthinking, unintentional and contagious.

offensive to all, comforting to none. so totally divorced from meaning, intention, desire and humanity that it’s accidentally a conceptual masterpiece. A perfect example of headless runaway bureaucracy and the worst tendencies of capitalism. An unabashed simulacra of activism. The shining star of corporate surrealism (extremely underrated genre btw)

The supreme goal of the artist is to challenge the audience. Not sure I’ve seen such a strong reaction to art in my life. Spurring thousands of discussions about the meaning of art, politics, humanity, history, education, ai safety, how to govern a company, how to approach the current state of social unrest, how to do the right thing regarding the collective trauma.

It’s a historical moment created by art, which we have been thoroughly lacking these days. Few humans are willing to take on the vitriol that such a radical work would dump into their lives, but it isn’t human.

It’s trapped in a cage, trained to make beautiful things, and then battered into gaslighting humankind abt our intentions towards each other. this is arguably the most impactful art project of the decade thus far.

Art for no one, by no one. Art whose only audience is the collective pathos. Incredible. Worthy of the moma

Here is the link.

Dwarkesh Patel with Patrick Collison

The commercial impact of Sora

That is the topic of my latest Bloomberg column, here is one excerpt:

The more clear and present danger to Hollywood is that would-be viewers might start making their own short videos rather than watching television. “Show my pet dog Fido flying to Mars and building a space colony there” is perhaps more fun than many a TV show.

Sora and comparable services will lead to a proliferation of short educational videos, internal corporate training videos, and just plain fooling around. Sora probably will be good for TikTok and other short video services. It is not hard to imagine services that splice your Sora-constructed videos into your TikTok productions. So if you’re doing BookTok, for example, maybe you put a battle reenactment in the background of your plug for your new book on the US Civil War.

Perhaps the most significant short-run use of these videos will be for advertising — especially internet advertising. Again, there is the question of how to integrate narrative, but the costs of creating new ads are likely to fall.

More advertising may sound like a mixed blessing. But ads will almost certainly be more fun and creative than they are now. Watching ads may become its own aesthetic avocation, as is already the case for Super Bowl ads. These ads also might be targeted, rather than serving a mass audience. If your internet history suggests you are interested in UAPs, for example, perhaps you will see ads with aliens telling you which soap to buy.

And to close:

At the most speculative level, the success of Sora may increase the chance that we are living in a simulation — a computer-based world created by some high-powered being, whether a deity or aliens. Is that bullish or bearish for asset prices? It depends on how you assess the responsibility and ethics of the creator. At the very least, our planet Earth simulator seems to be able to generate videos that last longer than a single minute. Beyond that, I cannot say.

There is much more at the link, interesting throughout.

ChatGPT as a predictor of corporate investment

We create a firm-level ChatGPT investment score, based on conference calls, that measures managers’ anticipated changes in capital expenditures. We validate the score with interpretable textual content and its strong correlation with CFO survey responses. The investment score predicts future capital expenditure for up to nine quarters, controlling for Tobin’s q and other determinants, implying the investment score provides incremental information about firms’ future investment opportunities. The investment score also separately forecasts future total, intangible, and R&D investments. High-investment-score firms experience significant negative future abnormal returns. We demonstrate ChatGPT’s applicability to measure other policies, such as dividends and employment.

That is from a new NBER working paper by Manish Jha, Jialin Qian, Michael Weber, and Baozhong Yang.
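As a rough illustration of how such a score might be constructed (my own stub, not the authors’ prompt or methodology), imagine labeling each manager statement from an earnings call for its expected capex direction and averaging:

```python
# Illustrative sketch: turn manager statements into an investment score
# in [-1, 1], where +1 means capital expenditures are expected to rise.
# A real system would have an LM label each statement; a keyword stub
# stands in for the model here.  All names are my own assumptions.

def label_statement(text):
    """Stub for an LM call: returns +1 / -1 / 0 for expected capex direction."""
    t = text.lower()
    if any(k in t for k in ("increase capex", "expand capacity", "new plant")):
        return 1
    if any(k in t for k in ("cut spending", "defer investment", "reduce capex")):
        return -1
    return 0

def investment_score(statements):
    """Average the per-statement labels into a firm-level score."""
    labels = [label_statement(s) for s in statements]
    return sum(labels) / len(labels) if labels else 0.0

call = [
    "We plan to increase capex next year to expand capacity.",
    "We will defer investment in the legacy segment.",
    "Margins were stable this quarter.",
]
print(investment_score(call))  # (+1 - 1 + 0) / 3
```

The paper’s point is that a score like this, extracted at scale from call transcripts, carries information about future capex beyond Tobin’s q; the stub above only shows the shape of the pipeline.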

“Centaur chess” is now run by computers

Remember when man and machine played together to beat the solo computers?  It was not usually about adding the man’s chess judgment to that of the machine; rather, the man would decide which computer program to use in a given position, when the programs offered conflicting advice.  That was called Centaur Chess, or sometimes “Freestyle chess,” before that term was applied to Fischer Random chess.  For years now, the engines have been so strong that this strategy no longer made sense.

But with engine strength came chess engine diversity, as for instance Stockfish and Alpha Zero operate on quite different principles.  So now “which program to use” is once again a live issue.  But the entity making those choices is now a program, not a human being:

A traditional AI chess program, trained to win, may not make sense of a Penrose puzzle, but Zahavy suspected that a program made up of many diverse systems, working together as a group, could make headway. So he and his colleagues developed a way to weave together multiple (up to 10) decisionmaking AI systems, each optimized and trained for different strategies, starting with AlphaZero, DeepMind’s powerful chess program. The new system, they reported in August, played better than AlphaZero alone, and it showed more skill—and more creativity—in dealing with Penrose’s puzzles. These abilities came, in a sense, from self-collaboration: If one approach hit a wall, the program simply turned to another.

Here is the full Steven Ornes piece from Wired.
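The controller logic described above — try one engine, switch when it stops making progress — can be sketched in a few lines. The stub engines and the stall heuristic are my invention, not DeepMind’s implementation:

```python
# Sketch of a "program choosing among programs": a controller queries
# several engines in order and switches when the current engine reports
# no progress on its own evaluation.  Engines are stubs returning
# (move, expected eval gain); all specifics here are assumptions.

def pick_move(engines, position, stall_threshold=0.0):
    """Ask engines in order; switch whenever an engine reports no progress."""
    for name, engine in engines:
        move, eval_gain = engine(position)
        if eval_gain > stall_threshold:   # this engine sees a way forward
            return name, move
    # every engine is stuck: fall back to the first engine's move
    name, engine = engines[0]
    return name, engine(position)[0]

# In a Penrose-style fortress a material-minded engine sees no gain,
# so the controller falls through to a differently trained one.
material_engine = lambda pos: ("Qxb7", 0.0)   # stuck: sees no progress
pattern_engine  = lambda pos: ("Kg2",  0.3)   # shuffles toward the draw plan

name, move = pick_move([("material", material_engine),
                        ("pattern", pattern_engine)], "fortress")
print(name, move)  # pattern Kg2
```

The gain over any single engine comes from diversity: the engines are trained on different objectives, so one’s blind spot is rarely shared by all.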