Category: Web/Tech
Competition
way back in 2022, the best model in the world was text-davinci-003. it was much, much worse than this new model. it cost 100x more.
That is from Sam Altman. And here is a good picture.
Andrej Karpathy is founding Eureka Labs, an AI education company
Here is the tweet, which connects to various links of relevance.
Website: eurekalabs.ai, GitHub: github.com/EurekaLabsAI, 𝕏: @EurekaLabsAI
Managing AI Employees
Lattice is a platform for managing employees. I believe this announcement from the Lattice CEO is real:
Today, Lattice made history: We became the first company to give digital workers official employee records in Lattice. This marks a huge moment in the evolution of AI technology and of the workplace. It takes the idea of an “AI employee” from concept to reality — and marks the start of a new journey for Lattice, to lead organizations forward in the responsible hiring of digital workers. Within Lattice’s people platform, AI employees will be securely onboarded, trained, and assigned goals, performance metrics, appropriate systems access, and even a manager. We know this process will raise a lot of questions and we don’t yet have all the answers, but we want to help our customers find them. So we will be breaking ground, bending minds, and leaning into learning. And we want to bring everyone along on this journey with us.
On the one hand, this is wild. On the other hand, it makes perfect sense to use our current system of managing employees–performance reviews, training, feedback, yearly bonuses and so on–to manage AIs. In essence, human management systems become the interface for AI workers.
Every team will soon include AI workers. At first, this will be the AI stenographer who keeps the minutes, the admin who reminds everyone of their tasks, and the PowerPoint designer who puts together presentations, but quickly this will evolve, and some AIs will naturally take on supervisory roles until “my boss is an AI” will not seem strange.
What should I ask Tobi Lütke?
Yes, I will be doing a Conversation with him. Here is Wikipedia:
Tobias “Tobi” Lütke (born 1981) MSC is a German/Canadian entrepreneur who is the co-founder and CEO of Shopify, an e-commerce company based in Ottawa, Ontario, Canada. He has been part of the core team of the Ruby on Rails framework and has created open source libraries such as Active Merchant.
He was born in Koblenz, and he is on the board of directors of Coinbase. So what should I ask him?
AI teaching assistants?
“Morehouse College is planning to use AI teaching assistants to help crack the code of education.
Why it matters: Morehouse professor Muhsinah Morris says every professor will have an AI assistant in three to five years.
- Morris says the technology has taken off in the last 24 months faster than it has in the last 24 years. Meanwhile, baby boomers are leaving the workforce amid national teacher shortages and burnout.
How it works: Morehouse professors will collaborate with technology partner VictoryXR to create virtual 3D spatial avatars. The avatars use OpenAI to have two-way oral conversations with students.
- The avatars use professor-created content and 3D models for lessons, such as molecules for chemistry lessons, to help students.
- Avatars can also respond to unrelated topics raised by students and redirect the students back to the lesson.
- Students will have 24/7 online access to the avatar, which can communicate in a student’s native language.
What they’re saying: Morris called the avatars the world’s first spatial AI teaching assistants.”
That is all from Axios. Via Anecdotal.
AI trends relative to AI predictions
In 2021, researchers were asked how smart AI would be in 2022. They predicted 12% pass-rate on the MATH dataset; AI outperformed significantly, scoring 50%. Now it’s at 90%.
Here is the link, including to background material.
Claude Sonnet 3.5, economist
TC:
“Is the Stiglitz-Shapiro 1984 efficiency wage model actually a model of sticky wages, or not? Is either the real or nominal wage sticky in that model?”
Claude:
The Stiglitz-Shapiro efficiency wage model from 1984 is not primarily a model of sticky wages, although it can have implications for wage stickiness. Let’s break this down:
- Core concept of the model: The Stiglitz-Shapiro model is fundamentally about explaining why firms might pay wages above the market-clearing level. The key insight is that higher wages can increase productivity by: a) Reducing shirking (as workers have more to lose if caught and fired) b) Lowering turnover costs c) Improving the quality of job applicants d) Boosting worker morale and effort
- Wage stickiness in the model:
- The model does not directly predict sticky wages in the traditional sense of wages that are slow to adjust to economic shocks.
- However, it does provide a rationale for why wages might not fall even in the face of unemployment. Firms have an incentive to maintain higher wages to preserve the productivity benefits.
- Real vs. Nominal wages:
- The model is primarily concerned with real wages, not nominal wages.
- The efficiency wage is set in real terms, as it’s the purchasing power of the wage that affects worker behavior and productivity.
- Implications for wage rigidity:
- While not a model of sticky wages per se, the efficiency wage concept can contribute to wage rigidity.
- Firms may be reluctant to cut real wages even during economic downturns because doing so could reduce productivity more than it saves on labor costs.
- Difference from traditional sticky wage models:
- Traditional sticky wage models often focus on nominal rigidities due to factors like long-term contracts or menu costs.
- The Stiglitz-Shapiro model provides a real, microeconomic foundation for why wages might not adjust downward, even if they’re not formally “sticky.”
In summary, while the Stiglitz-Shapiro efficiency wage model is not explicitly a model of sticky wages, it does provide insights into why wages might be rigid, particularly in the downward direction. The focus is on real wages rather than nominal wages, and the rigidity emerges from the productivity-enhancing effects of higher wages rather than from institutional or contractual factors.
TC again: What percentage of professional economists could give an answer of comparable quality and nuance?
“US-based academics and those at top-ranked institutions exhibit higher egocentrism and toxicity in their tweets”
Compared to other academics, that is. They are still more reasonable than the general public on Twitter. Here is the paper by Prashant Garg and Thiemo Fetzer. Here is a useful tweet storm about the paper. Via Kris Gulati.
Thomas Schelling meets LLMs?
Drawing on political science and international relations literature about escalation dynamics, we design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents in different scenarios. Contrary to prior studies, our research provides both qualitative and quantitative insights and focuses on large language models (LLMs). We find that all five studied off-the-shelf LLMs show forms of escalation and difficult-to-predict escalation patterns. We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons. Qualitatively, we also collect the models’ reported reasonings for chosen actions and observe worrying justifications based on deterrence and first-strike tactics.
That is from a new paper by Juan-Pablo Rivera, et al., via the excellent Ethan Mollick. Do note that these recommended tactics are for the U.S., so perhaps the LLMs simply are telling us that America should be more hawkish.
Congratulations to SSI Inc.!
Here is the tweet announcement. Here is Bloomberg coverage from Ashlee Vance.
Claude 3 for SCOTUS?
I would vote to confirm:
I decided to do a little more empirical testing of AI’s legal ability. Specifically, I downloaded the briefs in every Supreme Court merits case that has been decided so far this Term, inputted them into Claude 3 Opus (the best version of Claude), and then asked a few follow-up questions. (Although I used Claude for this exercise, one would likely get similar results with GPT-4.)
The results were otherworldly. Claude is fully capable of acting as a Supreme Court Justice right now. When used as a law clerk, Claude is easily as insightful and accurate as human clerks, while towering over humans in efficiency.
Let’s start with the easiest thing I asked Claude to do: adjudicate Supreme Court cases. Claude consistently decides cases correctly. When it gets the case “wrong”—meaning, decides it differently from how the Supreme Court decided it—its disposition is invariably reasonable…
Of the 37 merits cases decided so far this Term, Claude decided 27 in the same way the Supreme Court did. In the other 10 (such as Campos-Chaves), I frequently was more persuaded by Claude’s analysis than the Supreme Court’s. A few of the cases Claude got “wrong” were not Claude’s fault, such as DeVillier v. Texas, in which the Court issued a narrow remand without deciding the question presented.
Although I’ve heard concerns that AI would be “woke,” Claude is studiously moderate.
Here is much more from Adam Unikovsky. A lot of people are still in denial, or not far enough along to even count as “denying.”
China fact of the day
In 2022 more of the top-tier AI researchers working in America hailed from China than from America.
Of course the Canadians do their part to help make this come true. Here is a much longer survey from The Economist about science in China. Via Z.
Nat Friedman discusses Sora, and image wisdom
NF: Yeah, I interviewed the Sora creators yesterday, the day before on stage at an event and it was super interesting to hear their point of view. I think we see Sora as this media production tool, that’s not their view, that’s a side effect. Their view is that it is a world simulator and that in fact it can sort of simulate any kind of behavior in the world, including going as far as saying, “Let’s create a video with Ben and Daniel and Nat and have them discuss this,” and then see where the conversation goes. And their view is also that Sora today is a GPT-1 scale, not a lot of data, not a lot of compute, and so we should expect absolutely dramatic improvement in the future as they simply scale it up and thirdly that there’s just a lot more video data than there is text data on the Internet…
And then Andrej Karpathy, I was talking to him the other day too, and he said, “There’s something strange going on-”
[Ben Thompson] And a picture is worth a thousand words by the way, so the number of tokens there is just astronomically larger.
NF: He was exploring this idea that the world model in image and video models actually might be better than in text models. You ask it for a car engine, someone fixing a carburetor and just the level of detail that can be in there is extraordinary, and maybe we made a mistake by training on the text extracted from Common Crawl and what we should do instead. I asked him for his most unhinged research idea. He said what we should do instead is train on pictures of web pages and when you ask the model a question, it outputs a picture of a web page with the answer and maybe we’d get way more intelligence and better results from that.
That is from his dialogue with Ben Thompson and Daniel Gross, gated but worth paying for.
Another Friday assorted link(s)
Trump on AI. It feels like someone just showed him a bunch of stuff for the first time? And here is Trump on UFOs.
Intel and its superloads
Intel’s new campus coming to New Albany, OH, is in heavy construction, and around 20 super loads are being ferried across Ohio’s roads by the Ohio Department of Transportation after arriving at a port of the Ohio River via barge. Four of these loads, including the one hitting the road now, weigh around 900,000 pounds — that’s 400 metric tons, or 76 elephants. The super loads were first planned for February but were delayed due to the immense planning workload. Large crowds are estimated to accumulate on the route, potentially slowing it even further.
Intel’s 916,000-pound shipment is a “cold box,” a self-standing air-processor structure that facilitates the cryogenic technology needed to fabricate semiconductors. The box is 23 feet tall, 20 feet wide, and 280 feet long, nearly the length of a football field. The immense scale of the cold box necessitates a transit process that moves at a “parade pace” of 5-10 miles per hour. Intel is taking over southern Ohio’s roads for the next several weeks and months as it builds its new Ohio One Campus, a $28 billion project to create a 1,000-acre campus with two chip factories and room for more. Calling it the new “Silicon Heartland,” the project will be the first leading-edge semiconductor fab in the American Midwest, and once operational, will get to work on the “Angstrom era” of Intel processes, 20A and beyond.
Here is the source, via Mathan Glezer.
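For anyone who wants to check the weight comparisons in that passage, here is a minimal sketch of the arithmetic. It assumes "pounds" means avoirdupois pounds and "metric tons" means 1,000 kg; the function names are my own and purely illustrative. On those assumptions, 900,000 pounds comes to roughly 408 metric tons, consistent with the quoted "400," and the 76-elephant comparison implies an elephant of a bit under 12,000 pounds.

```python
# Exact by definition: 1 international avoirdupois pound = 0.45359237 kg.
KG_PER_POUND = 0.45359237


def pounds_to_metric_tons(pounds: float) -> float:
    """Convert pounds to metric tons (1 metric ton = 1,000 kg)."""
    return pounds * KG_PER_POUND / 1000.0


def implied_unit_weight(total_pounds: float, unit_count: int) -> float:
    """Weight per unit, in pounds, implied by 'the total equals N units'."""
    return total_pounds / unit_count


if __name__ == "__main__":
    rounded_load_lb = 900_000  # "around 900,000 pounds" in the quoted article
    cold_box_lb = 916_000      # the cold box figure in the quoted article
    elephants = 76             # the article's comparison

    for label, lb in [("rounded load", rounded_load_lb), ("cold box", cold_box_lb)]:
        print(f"{label}: {lb:,} lb = {pounds_to_metric_tons(lb):.1f} metric tons")

    per_elephant = implied_unit_weight(rounded_load_lb, elephants)
    print(f"implied weight per elephant: {per_elephant:,.0f} lb")
```

The elephant figure is only what the article's own numbers imply, not a claim about how much any particular elephant weighs.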