anotherpaulg 5 hours ago

Claude 3.7 Sonnet scored 60.4% on the aider polyglot leaderboard [0], WITHOUT USING THINKING.

Tied for 3rd place with o3-mini-high. Sonnet 3.7 has the highest non-thinking score, taking that title from Sonnet 3.5.

Aider 0.75.0 is out with support for 3.7 Sonnet [1].

Thinking support and thinking benchmark results coming soon.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html#aider-v0750

  • anotherpaulg 34 minutes ago

    Using up to 32k thinking tokens, Sonnet 3.7 set SOTA with a 64.9% score.

      65% Sonnet 3.7, 32k thinking
      64% R1+Sonnet 3.5
      62% o1 high
      60% Sonnet 3.7, no thinking
      60% o3-mini high
      57% R1
      52% Sonnet 3.5
  • nightpool an hour ago

    > 225 coding exercises from Exercism

    Has there been any effort taken to reduce data leakage of this test set? Sounds like these exercises were available on the internet pre-2023, so they'll probably be included in the training data for any modern model, no?

  • gwd 3 hours ago

    Interesting that the "correct diff format" score went from 99.6% with Claude 3.5 to 93.3% for Claude 3.7. My experience with using claude-code was that it consistently required several tries to get the right diff. Hopefully all that will improve as they get things ironed out.

    • macNchz an hour ago

      Reasoning models pretty reliably seem to do worse at exacting output formats/structured outputs—so far with Aider it has been an effective strategy to employ o1 to “think” about the issue at hand, and have Sonnet implement. Interested to try various approaches with 3.7 in various combinations of reasoning effort.

    • WatchDog an hour ago

      3.7 completed a lot more than 3.5, without seeing the actual results, we can't tell if there were any regressions in the edit format among the previously completed tasks.

  • darkotic an hour ago

    Working really well for me. Thanks for Aider!

  • bearjaws 4 hours ago

    Thanks for all the work on aider, my favorite AI tool.

    • bt1a an hour ago

      It really is best in slot. Owe it to git, which has a particular synergy with a hallucination-prone but correctable system

  • stavros 3 hours ago

    I'd like to second the thanks for Aider, I use it all the time.

  • usaar333 2 hours ago

    Updated. #1 with thinking

  • sheepdestroyer an hour ago

    Nice !

    Could we please get benchmarks for architect / DeepSeek R1 + claude-3-7-20250219 ?

    To compare perf and price with Sonnet-3.7-thinking

  • liamYC 3 hours ago

    I’d like to 3rd the thanks for Aider it’s fantastic!

  • throwaway454812 3 hours ago

    Any chance you can add support for Vertex AI Sonnet 3.7, which looks like it's available now? Thank you!

bcherny 6 hours ago

Hi everyone! Boris from the Claude Code team here. @eschluntz, @catherinewu, @wolffiex, @bdr and I will be around for the next hour or so and we'll do our best to answer your questions about the product.

  • babyshake 6 hours ago

    One thing I would love to have fixed - I type in a prompt, the model produces 90% or even 100% of the answer, and then shows an error that the system is at capacity and can't produce an answer. And then the response that has already been provided is removed! Please just make it where I can still have access to the response that has been provided, even if it is incomplete.

    • rishikeshs 5 hours ago

      This. Claude team, please fix this!

      • cat-snatcher an hour ago

        The UX team would never allow it. You gotta stay minimal and and definitely can't have any acknowledgement that a non-ideal user experience exists.

  • pookieinc 6 hours ago

    The biggest complaint I (and several others) have is that we continuously hit the limit via the UI after even just a few intensive queries. Of course, we can use the console API, but then we lose ability to have things like Projects, etc.

    Do you foresee these limitations increasing anytime soon?

    Quick Edit: Just wanted to also say thank you for all your hard work, Claude has been phenomenal.

    • eschluntz 6 hours ago

      We are definitely aware of this (and working on it for the web UI), and that's why Claude Code goes directly through the API!

      • smallerfish 6 hours ago

        I'm sure many of us would gladly pay more to get 3-5x the limit.

        And I'm also sure that you're working on it, but some kind of auto-summarization of facts to reduce the context in order to avoid penalizing long threads would be sweet.

        I don't know if your internal users are dogfooding the product that has user limits, so you may not have had this feedback - it makes me irritable/stressed to know that I'm running up close to the limit without having gotten to the bottom of a bug. I don't think stress response in your users is a desirable thing :).

        • justinbaker84 4 hours ago

          This is the main point I always want to communicate to the teams building foundation models.

          A lot of people just want the ability to pay more in order to get more.

          I would gladly pay 10x more to get relatively modest increases in performance. That is how important the intelligence is.

          • willsmith72 2 hours ago

            As a growth company, they likely would prefer a larger amount of users even with occasional rate limits, vs smaller pool of power users.

            As long as capacity is an issue, you can't have both

            • cruffle_duffle 3 minutes ago

              If people are paying for use, then why can’t you have both?

    • punkpeye 6 hours ago

      If you are open to alternatives, try https://glama.ai/gateway

      We currently serve ~10bn tokens per day (across all models). OpenAI compatible API. No rate limits. Built in logging and tracing.

      I work with LLMs every day, so I am always on top of adding models. 3.7 is also already available.

      https://glama.ai/models/claude-3-7-sonnet-20250219

      The gateway is integrated directly into our chat (https://glama.ai/chat). So you can use most of the things that you are used to having with Claude. And if anything is missing, just let me know and I will prioritize it. If you check our Discord, I have a decent track record of being receptive to feedback and quickly turning around features.

      Long term, Glama's focus is predominantly on MCPs, but chat, gateway and LLM routing is integral to the greater vision.

      I would love feedback if you are going to give a try frank@glama.ai

      • airstrike 6 hours ago

        The issue isn't API limits, but web UI limits. We can always get around the web interface's limits by using the claude API directly but then you need to have some other interface...

        • punkpeye 5 hours ago

          The API still has limits. Even if you are on the highest tier, you will quickly run into those limits when using coding assistants.

          The value proposition of Glama is that it combines UI and API.

          While everyone focuses on either one or the other, I've been splitting my time equally working on both.

          Glama UI would not win against Anthropic if we were to compare them by the number of features. However, the components that I developed were created with craft and love.

          You have access to:

          * Switch models between OpenAI/Anthropic, etc.

          * Side-by-side conversations

          * Full-text search of all your conversations

          * Integration of LaTeX, Mermaid, rich-text editing

          * Vision (uploading images)

          * Response personalizations

          * MCP

          * Every action has a shortcut via cmd+k (ctrl+k)

          • airstrike 4 hours ago

            Ok, but that's not the issue the parent was mentioning. I've never hit API limits but, like the original comment mentioned, I too constantly hit the web interface limits particularly when discussing relatively large modules.

            • glenstein 4 hours ago

              Right, that's how I read it also. It's not that there's no limits with the API, but that they're appreciably different.

          • m_kos 2 hours ago

            Your chat idea is a little similar to Abacus AI. I wish you had a similarly affordable monthly plan for chat only, but your UI seems much better. I may give it a try!

          • Aeolun 2 hours ago

            > Even if you are on the highest tier, you will quickly run into those limits when using coding assistants.

            Even heavy coding sessions never run into Claude limits, and I’m nowhere near the highest tier.

      • cmdtab 5 hours ago

        Do you have deepseek r1 support? I need it for a current product I’m working on.

        • punkpeye 4 hours ago

          Indeed we do https://glama.ai/models/deepseek-r1

          It is provided by DeepSeek and Avian.

          I am also midway of enabling a third-provider (Nebius).

          You can see all models/providers over at https://glama.ai/models

          As another commenter in this tread said, we are just a 'frontend wrapper' around other people services. Therefore, it is not particularly difficult to add models that are already supported by other providers.

          The benefit of using our wrapper is that you can use a single API key and you get one bill for all your AI bills, you don't need to hack together your own logic for routing requests between different providers, failovers, keeping track of their costs, worry what happens if a provider goes down, etc.

          The market at the moment is hugely fragmented, with many providers unstable, constantly shifting prices, etc. The benefit of a router is that you don't need to worry about those things.

          • cmdtab 3 hours ago

            Yeah I am aware. I use open router at the moment but I find it lacks a good UX.

            • punkpeye 3 hours ago

              Open router is great.

              They have a very solid infrastructure.

              Scaling infrastructure to handle billions of tokens is no joke.

              I believe they are approaching 1 trillion tokens per week.

              Glama is way smaller. We only recently crossed 10bn tokens per day.

              However, I have invested a lot more into UX/UI of that chat itself, i.e. while OpenRouter is entirely focused on API gateway (which is working for them), I am going for a hybrid approach.

              The market is big enough for both projects to co-exist.

        • pclmulqdq 4 hours ago

          They are just selling a frontend wrapper on other people's services, so if someone else offers deepseek, I'm sure they will integrate it.

    • clangfan 6 hours ago

      this is also my problem, ive only used the UI with $20 subscription, can I use the same subscription to use the cli? I'm afraid its like those aws api billing where there is no limit to how much I can use then get a surprise bill

      • eschluntz 5 hours ago

        It is API billing like AWS - you pay for what you use. Every time you exit a session we print the cost, and in the middle of a session you can do /cost to see your cost so far that session!

        You can track costs in a few ways and set spend limits to avoid surprises: https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...

        • danw1979 3 hours ago

          What I really want (as a current Pro subscriber) is a subscription tier ("Ultimate" at ~$120/month ?) that gives me priority access to the usual chat interface, but _also_ a bunch of API credits that would ensure Claude and I can code together for most of the average working month (reasonable estimate would be 4 hours a day, 15 days a month).

          i.e I'd like my chat and API usage to be all included under a flat-rate subscription.

          Currenty Pro doesn't give me any API credits to use with coding assistants (Claude Code included ?) which is completely disjointed. And I need to be a business to use the API still ?

          Honestly, Claude is so good, just please take my money and make it easy to do the above !

          • Aeolun 2 hours ago

            I don’t think you need to be a business to use the API? At least I’m fairly certain I’m using it in a personal capacity. You are never going to hit $120/month even with full-time usage (no guarantees of course, but I get to like $40/month).

            • Terretta 41 minutes ago

              Careful -- a solo dev using it professionally, meaning, coding with it as a pair coder (XP style), can easily spend $1500/week.

          • istjohn 2 hours ago

            You don't need to be a business to use the API.

          • dghlsakjg 2 hours ago

            You can do this yourself. Anyone can buy API credits. I literally just did this with my personal credit card using my gmail based account earlier today.

            1. Subscribe to Claude Pro for $20 month

            2. Separately, Buy $100 worth of API credits.

            Now you have a Claude "ultimate" subscription where the credits roll over as an added bonus.

            As someone who only uses the APIs, and not the subscription services for AI, I can tell you that $100 is A LOT of usage. Quite frankly, I've never used anywhere close to $20 in a month which is why I don't subscribe. I mostly just use text though, so if you do a lot of image generation that can add up quickly

            • numba888 2 hours ago

              I don't think you can generate images with claude. just asked it for pink elephant: "I can't generate images directly, but I can create an SVG representation of a pink elephant for you." And it did it :)

            • dr_kiszonka 2 hours ago

              That is a good idea. For something like Claude Code, $100 is not a lot, though.

        • mindok 4 hours ago

          Which is theoretically great, but if anyone can get an Aussie credit card to work, please let me know.

          • robbiep 4 hours ago

            I haven’t had an issue with Aussie cards?

            But I still hit limits, I use Claudemind with jetbrains stuff and there is a max of input tokens (j believe), I am ‘tier 2’ but doesn’t look like I can go past this without an enterprise agreement

      • edmundsauto an hour ago

        I use AnythingLLM so you can still have a "Projects" like RAG.

  • posix86 5 hours ago

    Claude is my go to llm for everything, sounds corny but it's literally expanding the circle of what I can reasonably learn, manyfold. Right now I'm attempting to read old philosophical texts (without any background in similar disciplines), and without claude's help to explain the dense language in simpler terms & discuss its ideas, give me historical contexts, explaining why it was written this or that way, compare it against newer ideas - I would've given up many times.

    At work I used it many times daily in development. It's concise mode is a breath of fresh air compared to any other llm I've tried. It has helped me find bugs in foreign code bases, explain me the techstack, written bash scripts, saving me dozens of hours of work & many nerves. It generally makes me reach places I wouldn't without due to time constraints & nerves.

    The only nitpick is that the service reliability is a bit worse than others, forcing me sometimes to switch to others. This is probably a hard to answer question, but are there plans to improve that?

  • gwd 4 hours ago

    Just started playing with the command-line tool. First reaction (after using it for 5 minutes): I've been using `aider` as a daily driver, with Claude 3.5, for a while now. One of the things I appreciate about aider is that it tells you how much each query cost, and what your total cost is this session. This makes it low-key easy to keep tabs on the cost of what I'm doing. Any chance you could add that to claude-code?

    I'd also love to have it in a language that can be compiled, like golang or rust, but I recognize a rewrite might be more effort than it's worth. (Although maybe less with claude code to help you?)

    EDIT: OK, 10 minutes in, and it seems to have major issues doing basic patches to my Golang code; the most recent thing it did was add a line with incorrect indentation, then try three times to update it with the correct indentation, getting "String to replace not found in file" each time. Aider with claude 3.5 does this really well -- not sure what the counfounding issue is here, but might be worth taking a look at their prompt & patch format to see how they do it.

    • davidbarker 4 hours ago

      If you do `/cost` it will tell you how much you've spent during that session so far.

    • eschluntz 4 hours ago

      hi! You can do /cost at any time to see what the current session has cost

  • davely 6 hours ago

    I'm in the middle of a particularly nasty refactor of some legacy React component code (hasn't been touched in 6 years, old class based pattern, tons of methods, why, oh, why did we do XYZ) at work and have been using Aider for the last few days and have been hitting a wall. I've been digging through Aider's source code on Github to pull out prompts and try to write my own little helper script.

    So, perfect timing on this release for me! I decided to install Claude Code and it is making short work of this. I love the interface. I love the personality ("Ruminating", "Schlepping", etc).

    Just an all around fantastic job!

    (This makes me especially bummed that I really messed up my OA awhile back for you guys. I'll try again in a few months!)

    Keep on doing great work. Thank you!

    • bcherny 6 hours ago

      Hey thanks so much! <3

  • antirez 5 hours ago

    One of the silver bullets of Claude, in the context of coding, is that it does NOT use RAG when you use it via the web interface. Sure, you burn your tokens but the model sees everything and this let it reply in a much better way. Is Claude Code doing the same and just doing document-level RAG, so that if a document is relevant and if it fits, all the document will be put inside the context window? I really hope so! Also, this means that splitting large code bases into manageable file sizes will make more and more sense. Another Q: is the context size of Sonnet 3.7 the same of 3.5? Btw Thanks you so much for Claude Sonnet, in the latest months it changed the way I work and I'm able to do a lot more, now.

    • bcherny 5 hours ago

      Right -- Claude Code doesn't use RAG currently. In our testing we found that agentic search out-performed RAG for the kinds of things people use Code for.

      • marlott 5 hours ago

        Interesting - can you elaborate a little on what you mean by agentic search here?

        • simonw 2 hours ago

          Since the Claude Code docs suggest installing Ripgrep, my guess is that they mean that Claude Code often runs searches to find snippets to improve in the context.

          I would argue that this is still RAG. There's a common misconception (or at least I think it's a misconception) that RAG only counts if you used vector search - I like to expand the definition of RAG to include non-vector search (like Ripgrep in this case), or any other technique where you use Retrieval techniques to Augment the Generation phase.

          IR (Information Retrieval) has been around for many decades before vector search become fashionable: https://en.wikipedia.org/wiki/Information_retrieval

        • antirez 4 hours ago

          I guess it's what sometimes it's called "self RAG", that is, the agent looks inside the files how a human would be to find that's relevant.

          • kadushka 3 hours ago

            As opposed to vector search, or…?

            • FeepingCreature 2 hours ago

              To my knowledge these are the options:

              1. RAG: A simple model looks at the question, pulls up some associated data into the context and hopes that it helps.

              2. Self-RAG: The model "intentionally"/agentically triggers a lookup for some topic. This can be via a traditional RAG or just string search, ie. grep.

              3. Full Context: Just jam everything in the context window. The model uses its attention mechanism to pick out the parts it needs. Best but most expensive of the three, especially with repeated queries.

              Aider uses kind of a hybrid of 2 and 3: you specify files that go in the context, but Aider also uses Tree-Sitter to get a map of the entire codebase, ie. function headers, class definitions etc., that is provided in full. On that basis, the model can then request additional files to be added to the context.

              • kadushka 37 minutes ago

                I'm still not sure I get the difference between 1 and 2. What is "pulls up some associated data into the context" vs ""intentionally"/agentically triggers a lookup for some topic"?

            • numba888 an hour ago

              Does it make sense to use vector search for code? It's more for vague texts. In the code relevant parts can be found by exact name match. (in most cases. both methods aren't exclusive)

              • simonw 39 minutes ago

                Vector search for code can be quite interesting - I've used it for things like "find me code that downloads stuff" and it's worked well. I think text search is usually better for code though.

  • swairshah 5 hours ago

    Why not just open source Claude Code? people have tried to reverse eng the minified version https://gist.githubusercontent.com/1rgs/e4e13ac9aba301bcec28...

  • fsndz 6 hours ago

    Anthropic is back and cementing its place as the creator of the best coding models—bravo!

    With Claude Code, the goal is clearly to take a slice of Cursor and its competitors' market share. I expected this to happen eventually.

    The app layer has barely any moat, so any successful app with the potential to generate significant revenue will eventually be absorbed by foundation model companies in their quest for growth and profits.

    • biker142541 4 hours ago

      I wonder if they will offer competitive request counts against Cursor. Right now, at least for me, the biggest downside to Claude is how fast I blow through the limits (Pro) and hit a wall.

      At least with Cursor, I can use all "premium" 500 completions and either buy more, or be patient for throttled responses.

      • biker142541 an hour ago

        Reread the blog post, and I suspect Cursor will remain much more competitive on pricing! No specifics, but likely far exceeding typical Cursor costs for a typical developer. Maybe it's worth it, though? Look forward to trying.

        >Claude Code consumes tokens for each interaction. Typical usage costs range from $5-10 per developer per day, but can exceed $100 per hour during intensive use.

    • keithwhor 6 hours ago

      I think an argument could be reasonably made that the app layer is the only moat. It’s more likely Anthropic eventually has to acquire Cursor to cement a position here than they out-compete it. Where, why, what brand and what product customers swipe their credit cards for matters — a lot.

      • fsndz 5 hours ago

        if Claude Code offers a better experience, users will rapidly move from cursor to Claude Code.

        Claude is for Code: https://medium.com/thoughts-on-machine-learning/claude-is-fo...

        • keithwhor 5 hours ago

          (1) That's a big if. It requires building a team specialized in delivering what Cursor has already delivered which is no small task. There are probably only a handful of engineers on the planet that have or can be incentivized to develop the product intuition the Cursor founders have developed in the market already. And even then; I'm an aspiring engineer / PM at Anthropic. Why would I choose to spend all of my creative energy copying what somebody else is doing for the same pay I'd get working on something greenfield, or more interesting to me, or more likely to get me a promotion?

          (2) It's not clear to me that users (or developers) actually behave this way in practice. Engineering is a bit of a cargo cult. Cursor got popular because it was good but it also got popular because it got popular.

          • Etheryte 3 hours ago

            In my opinion you're vastly overestimating how much of a moat Cursor has. In broad strokes, in builds an index of your repo for easier referencing and then adds some handy UI hooks so you can talk to the model, there really isn't that much more going on. Yes, the autocomplete is nice at times, but it's at best like pair programming with a new hire. Every big player in the AI space could replicate what they've done, it's only a matter of whether they consider it worth the investment or not given how fast the whole field is moving.

            • Aeolun 2 hours ago

              If Zed gets its agentice editing mode in I’m moving away from Cursor again. I’m only with them because they currently have the best experience there. Their moat is zero, and I’d much rather use purely API models than a Cursor subscription.

            • keithwhor 3 hours ago

              Conversely, I think you're overestimating the impact of the value (or lack thereof) of technology over distribution and market timing.

          • CharlesW 5 hours ago

            > It requires building a team specialized in delivering what Cursor has already delivered which is no small task.

            There are several AIDEs out there, and based on working with Cursor, VS Code, and Windsurf there doesn't seem to be much of a difference (although I like Windsurf best). What moat does Cursor have?

            • aquariusDue 4 hours ago

              Just chiming in to say that AIDEs (Artificial Intelligence Development Environments, I suppose) is such a good term for these new tools imo.

              It's one thing to retrofit LLMs into existing tools but I'm more curious how this new space will develop as time goes on. Already stuff like the Warp terminal is pretty useful in day to day use.

              Who knows, maybe this time next year we'll see more people programming by voice input instead of typing. Something akin to Talon Voice supercharged by a local LLM hopefully.

    • eschluntz 6 hours ago

      hi! I've been using Claude Code in a very complementary way to my IDE, and one of the reasons we chose the terminal is because you can open it up inside whichever IDE you want!

  • cpeterso 2 hours ago

    A minor ChatGPT feature I miss with Claude is temporary chats. I use ChatGPT for a lot of random one-off questions and don’t want them filling up my chat history with so many conversations.

  • theptip 41 minutes ago

    > We’ve also improved the coding experience on Claude.ai. Our GitHub integration is now available on all Claude plans—enabling developers to connect their code repositories directly to Claude

    Would love to learn a bit more about how the GitHub integration works. From https://support.anthropic.com/en/articles/10167454-using-the... it seems it’s read only.

    Does Claude Code let me take a generated/edited artifact and commit it back as a PR?

    • simonw 35 minutes ago

      The https://claude.io/ integration is read-only. Basically you OAuth with GitHub and now you can select a repository, then select files or directories within it to add to either a Claude Project or to an individual prompt.

      Claude Code can run commands including "git" commands, so it can create a branch, commit code to that branch and push that branch to GitHub - at which point point you can create a PR.

  • joshuabaker2 6 hours ago

    Hi Boris, love working with Claude! I do have a question—is there a plan to have Claude 3.5 Sonnet (or even 3.7!) made available on ca-central-1 for Amazon Bedrock anytime soon? My company is based in Canada and we deal with customer information that is required to stay within Canada, and the most recent model from Anthropic we have available to us is Claude 3.

    • pbronez 5 hours ago

      Concur. Models aren’t real until I can run them inside my perimeter.

  • danso 5 hours ago

    Been a long time casual — i.e. happy to fix my code by asking questions and copy/pasting individual snippets via the chat interface. Decided to give the `claude` terminal tool a run and have to admit it looks like a fantastic tool.

    Haven't tried to build a modern JS web app in years — it took the claude tool just a few minutes of prompting to convert and refactor an old clunky tool into a proper project structure, and using svelte and vite and tailwind (which I haven't built with before). Trying to learn how to even scaffold a modern app has felt daunting and this eliminates 99% of that friction.

    One funny quirk: I asked it to build a test suite (I know zilch about JS testing frameworks, so it picked vitest for me) for the newly refactored app. I noticed that 3 of the 20 tests failed and so I asked it to run vitest for itself and fix the failing things. 2 minutes later, and now 7 tests were failing...

    Which is very funny to me, but also not a big deal. Again, it's such a chore to research test libs and then set things up to their conventions. That the claude tool built a very usable scaffold that I can then edit and iterate on is such a huge benefit by itself, I don't need (nor desire) the AI to be complete turnkey solution.

  • pbor 6 hours ago

    Hi and congrats on the launch!

    Will check out Claude Code soon, but in the meantime one unrelated other feature request: Moving existing chats into a project. I have a number of old-ish but super-useful and valuable chats (that are superficially unrelated) that I would like to bring together in a project.

  • matznerd 6 hours ago

    Hi Boris et al, can you comment on increased conversation lengths or limits through the UI? I didn't see that mentioned in the blog post, but it is a continued major concern of $20/month Claude.ai users. Is this an issue that should be fixed now or still waiting on a larger deployment via Amazon or something? If not now, when can users expect the conversation length limitations will be increased?

  • ipsum2 6 hours ago

    Why gatekeep Claude Code, instead of releasing the code for it? It seems like a direct increase in revenue/API sales for your company.

    • sangnoir 4 hours ago

      I'm not affiliated with Anthropic, but it seems like doing this will commoditize Claude (the AIaaS). Hosted AI providers are doing all they can to move away from being interchangeable commodities; it's not good for Anthropic's revenue for users to be able to easily swap-out the backend of Cloud Code to a local Olama backend, or a cheaper hosted DeepSeek. Open sourcing Claude Code would make this option 1 or 2 forks/PRs away.

  • darkotic 35 minutes ago

    Love the UI so far. The experience feels very inspired by Aider, which is my current choice. Thanks!

  • robbomacrae 3 hours ago

    Awesome to see a new Claude model - since 3.5 its been my go-to for all code related tasks.

    I'd really like to use Claude Code in some of my projects vs just sharing snippets via the UI but I'm curious how might doing this from our source directory affect our IP including NDA's, trade secret protections, prior disclosure rules on (future) patents, open source licensing restrictions re: redistribution etc?

    Also hi Erik! - Rob

  • sha16 2 hours ago

    When I first started using Cursor the default behavior was for Claude to make a suggestion in the chat, and if the user agreed with it, they could click apply or cut and paste the part of it they wanted to use in their larger project. Now it seems the default behavior is for Claude to start writing files to the current working directory without regard for app structure or context (e.g., config files that are defined elsewhere claude likes to create another copy of). Why change the default to this? I could be wrong but I would guess most devs would want to review changes to their repo first.

    • frohrer an hour ago

      Cursor has two LLM interaction modes, chat and composer. The chat does what you described first and composer can create/edit/delete files directly. Have you checked which mode you're on? It should be a tab above your chat window.

  • Ninjinka 6 hours ago

    How is your largest customer, Cursor, taking the news that you'll be competing directly with them?

    • sebzim4500 5 hours ago

      They probably aren't thrilled, but a lot of users will prefer a UI and I doubt Anthropic has the spare cycles to make a full Cursor competitor.

    • alienthrowaway 4 hours ago

      Unless Cursor had agreed to an exclusivity agreement with Anthropic, Antropic was (and still is) at risk of Cursor moving to a different provider or using their middleman position to train/distill their own model that competes with Anthropic.

    • behnamoh 6 hours ago

      honestly, is this something that anthropic should be worried about? you could ask the same question from all the startups that were destroyed by OpenAI.

  • timojaask 4 hours ago

    Hi! I’ve been using Claude for macOS and iOS coding for a while, and it’s mostly great, but it’s always using deprecated APIs, even if I instruct it not to. It will correct the mistake if I ask it to, but then in later iterations, it will sometimes switch back to using a deprecated API. It also produces a lot of code that just doesn’t compile, so a lot of time is spent fixing the made up or deprecated APIs.

  • andrewchilds 4 hours ago

    Hi Boris! Thank you for your work on Claude! My one pet peeve with Claude specifically, if I may: I might be working on a Svelte codebase and Claude will happily ignore that context and provide React code. I understand why, but I’d love to see much less of a deep reliance on React for front-end code generation.

  • mike_hearn 6 hours ago

    Great, thanks! Could you compare this new tool to Aider?

  • lintaho 5 hours ago

    For the pokemon benchmark, what happened after the Lt Surge gym? Did the model stall or run out of context or something similar?

  • curl-up 6 hours ago

    In the console, TPM limit for 3.7 is not shown (I'm tier 4). Does it mean there is no limit, or is it just pending and is "variable" until you set it to some value?

    • catherinewu 6 hours ago

      We set the Claude Code rate limits to be usable as a daily driver. We expect hitting rate limits for synchronous usage to be uncommon. Since this is a research preview, we recommend you start small as you try the product though.

      • curl-up 5 hours ago

        Sorry, I completely missed you're from the Code team. I was actually asking about the vanilla API. Any insights into those limits? It's still missing the TPM number in the console.

  • _cs2017_ 3 hours ago

    Your footnote 3 seems to imply that the low number for o1 and Grok3 is without parallelism, but I don't think it's publicly known whether they use internal parallelism? So perhaps the low number already uses parallelism, while the high number uses even more parallelism?

    Also, curious if you have any intuition as to why the no-parallelism number for AIME with Claude (61.3%) is quite low (e.g., relative to R1 87.3% -- assuming it is an apples to apples comparison)?

  • oofbaroomf 6 hours ago

    Do you think Claude Code is "better", in terms of capabilities and token efficiency, than other tools such as Cline, Cursor, or Aider?

    • bcherny 6 hours ago

      Claude Code is a research preview -- it's more rough, lets you see model errors directly, etc. so it's not as polished as something like Cline. Personally I use all of the above. Engineers here at Anthropic also tend to use Claude Code alongside IDEs like Cursor.

  • galaxyLogic 3 hours ago

    The thing I would like automated is highlighting a function in my code then ask the AI to move it to a new module-file and import that new module.

    I would like this to happen easily like hitting a menu or button without having to write an elaborate "prompt" every time.

    Is this possible?

    • Aeolun 2 hours ago

      I think most language servers have a feature like this right?

      • hassleblad23 2 hours ago

        Moving a function or class? Yes. But moving arbitrary lines of code into their own function in a new module is still a PITA, particularly when the lines of code are not consecutive.

  • throw83288 3 hours ago

    Serious question: What advice would you give to a Computer Science student in light of these tools?

    • danw1979 3 hours ago

      Serious answer: learn to code.

      You still need to know what good code looks like to use these tools. If you go forward in your career trusting the output of LLMs without the skills to evaluate the correctness, style, functionality of that code then you will have problems.

      People still write low level machine code today, despite compilers having existed for 70+ (?) years.

      We'll always need full-stack humans who understand everything down to the electrons even in the age of insane automation that we're entering.

      • jackjeff 2 hours ago

        Could not agree more! I have 20+ years experience and use Cursor/Sonnet daily. It saves huge amounts of time.

        But I can’t imagine this tool in the hands of someone who does not have a solid understanding of programming.

        You need to understand when to push back and why. It’s like doing mini code reviews all the time. LLMs are very convincing and will happily generate garbage with the utmost authority.

        Don’t trust and absolutely verify.

      • simonw 2 hours ago

        +1 to this. There has never been a better time to learn to code - the learning curve is being shaved down by these new LLM-based tools, and the amount of value people with programming literacy can produce is going up by an order of magnitude.

        People who know both coding and LLMs will be a whole lot more attractive to hire to build software than people who just know LLMs for many years to come.

  • kevinz3 6 hours ago

    hey guys! i was wondering why you chose to build Claude code via CLI when many popular choices like cursor and windsurf fork VScode. do you envision the future of Claude code to abstract away the codebase entirely?

    • bcherny 6 hours ago

      We wanted to bring the model to people where they are without having to commit to a specific tool or radically change their workflows. We also wanted to make a way that lets people experience the model’s coding abilities as directly as possible. This has tradeoffs: it uses a lot of tokens, and is rough (eg. it shows you tool errors and model weirdness), but it also gives you a lot of power and feels pretty awesome to use.

      • unshavedyak 3 hours ago

        I like this quite a bit, thank you! I prefer Helix editor and i hate the idea of running VSCode just to access some random Code assistant

  • throwaway0123_5 5 hours ago

    I'm curious why there are no results for the "Claude 3.7 Extended Thinking" on SWE-Bench and Agentic tool use.

    Are you finding that extended thinking helps a lot when the whole problem can be posed in the prompt, but that it isn't a major benefit for agentic tasks?

    It would be a bit surprising, but it would also mirror my experiences, and the benchmarks which show Claude 3.5 being better at agentic tasks and SWE tasks than all other models, despite not being a reasoning model.

  • farco12 5 hours ago

    Thank you for the update!

    I recently attempted to use the Google Drive integration but didn't follow through with connecting because Claude wanted access to my entire Google Drive. I understand this simplifies the user experience and reduced time to ship, but is there anyway the team can add "reduce the access scope of Google Drive integration" to your backlog. Thank you!

    Also, I just caught the new Github integration. Awesome.

  • kapnap 4 hours ago

    Any change there will be a way to copy and paste the responses into other text boxes (i.e., a new email) and not have to re-jig the formatting?

    Lists, numbers, tabs, etc. are all a little time consuming... minor annoyance but thought I'd share.

  • 420gunna 6 hours ago

    Are you guys paying Claude for its assistance with your products

  • jumploops 6 hours ago

    From the release you say: "[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs."

    Can you tell us more about the trade-offs here?

    Also, are you using synthetic data for improving the responses here, or are you purely leveraging data from usage/partner's usage?

  • dailykoder 3 hours ago

    Folks, let me tell you, AI is a big league player, it's a real winner, believe me. Nobody knows more about AI than I do, and I can tell you, it's going to be huge, just huge. The advancements we're seeing in AI are tremendous, the best, the greatest, the most fantastic. People are saying it's going to change the world, and I'm telling you, they're right, it's going to be yuge. AI is a game-changer, a real champion, and we're going to make America great again with the help of this incredible technology, mark my words.

  • Flux159 6 hours ago

    Is there a way to always accept certain commands across sessions? Specifically for things like reading or updating files I don't want to have to approve that each time I open a new repl.

    Also, is there a way to switch models between 3.5-sonnet and 3.5-sonnet-thinking? Got the initial impression that the thinking model is using an excessive amount of tokens on first use.

    • bcherny 6 hours ago

      When you are prompted to accept a bash command, we should be giving you the option to not ask again. If you're not seeing that for a specific bash command, would you mind running /bug or filing an issue on Github? https://github.com/anthropics/claude-code/issues

      Thinking and not thinking is actually the same model! The model thinks automatically when you ask it to. If you don't explicitly ask it to think, it won't use thinking.

      • trees101 an hour ago

        with Claude coder, how does history work? I used it with my account, ran out of credit then switched to a work account but there was no chat history or other saved context of the work that had been done. I logged back in with my account to try copy it but it was gone.

    • eschluntz 6 hours ago

      Right now no, but if you run in docker, you can use `--dangerously-skip-permissions`

      Some commands could be totally fine in one context, but bad in a different i.e. pushing to master

  • failerk 3 hours ago

    I tried signing up to use Claude about 6 months ago and ran into an error on the signup page. For some reason this completely locked me out from signing up since a phone number was tied to the login. I have submitted requests to get removed from this blacklist and heard nothing. The times I have tried to reach out on Twitter were never responded to. Has the customer support improved in the last 6 months?

    • Aeolun 2 hours ago

      You can try using it through Github Copilot? Just as a different avenue for usage.

  • luke-stanley 3 hours ago

    My key got killed months ago when I tested it on a PDF, and support never got back to me so I am waiting for OpenRouter support!

  • joevandyk 4 hours ago

    It would be amazing to be able to use an API key to submit prompts that use our Project Knowledge. That doesn't seem to be currently possible, right?

  • xianshou 4 hours ago

    Any way to parallelize tool use? When I go into a repo and ask "what's in here", I'm aiming for a summary that returns in 20 seconds.

  • rgomez 5 hours ago

    What kind of sorcery did you use to create Claude? Honest question :)

    • bcherny 5 hours ago

      Reticulating...

  • frankfrank13 6 hours ago

    Congrats on the launch! You said its an important tool for you (Claude Code) how does this fit in with Co-Pilot, Cursor, etc. Do you/your teammates only rely on Claude Code? What do you reach for for different tasks?

    • bcherny 6 hours ago

      Claude Code is super popular internally at Anthropic. Most engineers like to use it together with an IDE like Cursor, Windsurf, VS Code, Zed, Xcode, etc. Personally I usually start most coding tasks in Code, then move to an IDE for finishing touches.

  • jiggawatts 2 hours ago

    I really want to try your AI models, but "You must have a valid phone number to use Anthropic's services." is a show-stopper for me.

    It's the only mainstream AI service that requests this information. After a string of security lapses by many of your competitors, I have zero faith in the ability of a "fast moving" AI-focused company to keep my PII data secure.

    • AdrianEGraphene 18 minutes ago

      It's a phone number. It's probably been bought / sold a few times already. Unless you're on the level of Edward Snowden, I wouldn't worry about it. But maybe your sense of privacy is more valuable than the outcome you'd get from Claude. That's fine too.

  • sebzim4500 5 hours ago

    Did you guys ever fix the issue where if UK users wanted to use the API they have to provide a VAT number?

  • thegeomaster 6 hours ago

    Thank you to the team. Looks like a great release. Already switching existing prompts to Claude 3.7 to see the eval results :)

  • light_triad 6 hours ago

    Thanks for this - exciting launch. Do you have examples of cool applications or demos that the HN crowd should check out?

  • Attummm 6 hours ago

    Hi Boris,

    Would it be possible to bring back sonnet 2024 June?

    That model was the most attentive.

    Because we lost that model this release a value loss for me personally.

    • ac29 6 hours ago

      Seems to still be available via API as claude-3-5-sonnet-20240620

  • LouisSayers 6 hours ago

    Awesome work, Claude is amazingly good at writing code that is pretty much plug and play.

    Could you speak at all about potential IDE integrations? An integration into Jetbrains IDEs would be super useful - I imagine being able to highlight a bit of code and having a plugin check the code graph to see dependencies, tests etc that might be affected by a change.

    Copying and pasting code constantly is starting to seem a bit primitive.

    • eschluntz 6 hours ago

      Part of our vision is that because Claude Code is just in the terminal, you can bring it into any IDE (or server) you want! Obviously that has tradeoffs of not having a full GUI of the IDE though

      • unshavedyak 3 hours ago

        Anyone know how to get access to it? Notably i'm debating purchasing for Claude Code, but being on NixOS i want to make sure i can install it first.

        If this Code preview is only open to subscribers it means i have to subscribe before i can even see if the binary works for me. Hmm

        edit: Oh, there's a link to "joining the preview" which points to: https://docs.anthropic.com/en/docs/agents-and-tools/claude-c...

      • elliot07 6 hours ago

        I much prefer the standalone design to being editor integrated.

    • ben30 6 hours ago

      Jetbrains have an official mcp plugin

  • levocardia 4 hours ago

    Which starter pokemon does Claude typically choose?

    • lcnPylGDnU4H9OF 3 hours ago

      I'd also be interested in stats on Helix Fossil vs. Dome Fossil.

  • TIPSIO 5 hours ago

    What are your thoughts on having a UI/design benchmark?

  • cowpig 4 hours ago

    It would be great if we could upgrade API rate limits. I've tried "contacting sales" a few times and never received a response.

    edit: note that my team mostly hits rate limits using things like aider and goose. 80k input token is not enough when in a flow, and I would love to experiment with a multi-agent workflow using claude

  • siva7 5 hours ago

    Will Claude be available on Azure?

  • Falimonda 6 hours ago

    CLAUDE NUMBA ONE!!!

    Congrats on the new release!

  • fragmede 3 hours ago

    Now that the world's gotten used to the existence of AI, any hope on removing the guardrails on Claude? I don't need it to answer "How do I make meth", but I would like to not have to social engineer my prompts. I'd like it to just write the code I asked for and not judge me on how ethical the code might be.

    Eg Claude will refuse to write code to wget a website and parse the html if you ask it to scrape your ex girlfriend's Instagram profile, for ethical and tos reasons, but if you phrase the request differently, it'll happily go off and generate code that does that exact thing.

    Asking it to scrape my ex girlfriend's Instagram profile is just a stand in for other times I've hit a problem where I've had to social engineer my way past those guard rails, but does having those guard rails really provide value on a professional level?

  • wellthisisgreat 4 hours ago

    Hi, what are the privacy terms for Claude Code? Is it memorizing the codebase it’s helping with? From an enterprise standpoint

  • adastra22 5 hours ago

    When are you providing an alternative to email magic login links?

  • riku_iki 5 hours ago

    Is there plans to add websearch function over some core websites (SO, API docs)? Competitors have it, and in my experience this provide very good grounding for coding tasks (way less API functions hallucinated).

  • nprateem 6 hours ago

    Does this actually have an 8k (or more) output context via the API?

    3.5 did with a beta header but while 3.6 claimed to, it always cut its responses after 4k.

    IIRC someone reported it on GH but had no reply.

  • bakugo 6 hours ago

    Can you let the API team know that the /v1/models endpoint has been broken for hours? Thanks.

    • latetomato 6 hours ago

      Hello! Member of the API team here. We're unable to find issues with the /v1/models endpoint—can you share more details about your request? Feel free to email me at suzanne@anthropic.com. Thank you!

      • bakugo 6 hours ago

        It always returns a Not Found error for me. Using the curl command copied directly from the docs:

        $ curl https://api.anthropic.com/v1/models --header "x-api-key: $ANTHROPIC_API_KEY" --header "anthropic-version: 2023-06-01"

        {"type":"error","error":{"type":"not_found_error","message":"Not found"}}

        Edit: Tried creating a different API key and it works with that one. Weird.

        • lebovic 5 hours ago

          If you can reproduce the issue with the other API key, I'd also love to debug this! Feel free to share the curl -vv output (excluding the key) with the Anthropic email address in my profile

  • logicallee 6 hours ago

    Can you give some insight into how you chose the reply limit length? It seems to cut off many useful programs that are 80%-90% done and if the limit were just a little higher it would be a source of extraordinary benefit.

    • bcherny 6 hours ago

      If you can reproduce that, would you mind reporting it with /bug?

      • logicallee 5 hours ago

        Just tried it with claude 3.7 sonnet, here is the share: https://claude.ai/share/68db540d-a7ba-4e1f-882e-f10adf64be91 and it doesn't finish outputing the program. (It's missing the rest of the application function and the main function).

        Here are steps to reproduce.

        Background/environment:

        ChatGPT helped me build this complete web browser in Python:

        https://taonexus.com/publicfiles/feb2025/71toy-browser-with-...

        It looks like this, versus the eventual goal: https://imgur.com/a/j8ZHrt1

        in 1055 lines. But eventually it couldn't improve on it anymore, ChatGPT couldn't modify it at my request so that inline elements would be on the same line.

        If you want to run it just download it and rename it to .py, I like Anaconda as an environment, after reading the code you can install the required libraries with:

        conda install -c conda-forge requests pillow urllib3

        then run the browser from the Anaconda prompt by just writing "python " followed by the name of the file.

        2.

        I tried to continue to improve the program with Claude, so that in-line elements would be on the same line.

        I performed these reproduceable steps:

        1. copied the code and pasted it into a Claude chat window with ctrl-v. This keeps it in the chat as paste.

        2. Gave it the prompt "This complete web browser works but doesn't lay out inline elements inline, it puts them all on a new line, can you fix it so inline elements are inline?"

        It spit out code until it hit section 8 out of 9 which is 70% of the way through and gave the error message "Claude hit the max length for a message and has paused its response. You can write Continue to keep the chat going". Screenshot:

        https://imgur.com/a/oSeiA4M

        So I wrote "Continue" and it stops when it is 90% of the way done.

        Again it got stuck at 90% of the way done, second screenshot in the above album.

        So I wrote "Continue" again.

        It just gave an answer but it never finished the program. There's no app entry in the program, it completely omited the rest of the main class itself and the callback to call it, which would be like:

                def run(self):
                    self.root.mainloop()
            
            ###############################################################################
            # main
            ###############################################################################
            
            if __name__=="__main__":
                sys.setrecursionlimit(10**6)
                app=ToyBrowser()
                app.run()
        
        so it only output a half-finished program. It explained that it was finished.

        I tried telling it "you didn't finish the program, output the rest of it" but doing so just got it stuck rewriting it without finishing it. Again it said it ran into the limit, again I said Continue, and again it didn't finish it.

        The program itself is only 1055 lines, it should be able to output that much.

        • istjohn 2 hours ago

          You don't want all that code in one file anyway. Have Claude write the code as several modules. You'll put each module in its own file and then you can import functions and classes from one module to another. Claude can walk you through it.

  • neoromantique 6 hours ago

    Thanks for the product! Glad to hear the (so called) "safety" is being walked back on, previously Claude has been feeling a little like it is treating me as a child, excited to try it out now.

freediver 5 hours ago

Kagi LLM benchmark updated with general purpose and thinking mode for Sonnet 3.7.

https://help.kagi.com/kagi/ai/llm-benchmark.html

Appears to be second most capable general purpose LLM we tried (second to gemini 2.0 pro, in front of gpt-4o). Less impressive in thinking mode, about at the same level as o1-mini and o3-mini (with 8192 token thinking budget).

Overall a very nice update, you get higher quality and higher speed model at same price.

Hope to enable it in Kagi Assistant within 24h!

  • jjice 5 hours ago

    Thank you to the Kagi team for such fast turn around on new LLMs being accessible via the Assistant! The value of Kagi Assistant has been a no-brainer for me.

  • KTibow 4 hours ago

    One thing I don't understand is why Claude 3.5 Haiku, a non thinking model in the non thinking section, says it has a 8192 thinking budget.

  • thefourthchime 5 hours ago

    Nice, but where is Grok?

    • pertymcpert 4 hours ago

      Perhaps they're waiting for the Grok API to be public?

  • Squarex 4 hours ago

    I'm surprised that Gemini 2.0 is first now. I remember that Google models were under performing on kagi benchmarks.

    • Workaccount2 4 hours ago

      Having your own hardware to run LLMs will pay dividends. Despite getting off on the wrong foot, I still believe Google is best positioned to run away with the AI lead, solely because they are not beholden to Nvidia and not stuck with a 3rd party cloud provider. They are the only AI team that is top to bottom in-house.

      • Squarex 4 hours ago

        I've used gemini for it's large context window before. It's a great model. But specifically in this benchmark it has always scored very low. So I wonder what has changed.

    • manmal 4 hours ago

      Gemini 2 is really good, and insanely fast.

      • Squarex 4 hours ago

        It is, but in this benchmark gemini scored very poorly in the past.

  • guelo 4 hours ago

    How did you chose the 8192 token thinking budget? I've often seen Deepseek R1 use way more than that.

  • flixing 5 hours ago

    Do you think kagi is the right Eval tool? If so,why?

hubraumhugo 6 hours ago

You can get your HN profile analyzed by it and it's pretty funny :)

https://hn-wrapped.kadoa.com/

I'm using this to test the humor of new models.

  • khendron 2 minutes ago

    > You've spent so much time explaining why enterprise software is terrible, we're starting to think you might be the person who designed Salesforce.

    That's a low blow.

  • creakingstairs 10 minutes ago

    > You took a year off for mental health but still couldn't resist building 'for-profit projects' during your break. The only thing more persistent than your work ethic is your inability to actually relax.

    > You complain about Elixir's lack of types but keep using it anyway. This is the programming equivalent of staying in a relationship where you keep trying to change the other person.

    > You've lived in multiple countries but spend most of your time on HN explaining why their tech infrastructure is terrible. Maybe the common denominator is you?

    Ouch, it's pretty good haha

  • steve_adams_86 an hour ago

    > You left a high-paying tech job to grow plants in water, which is basically just being a farmer with extra steps and less sunlight.

    Ha

    Also:

    > Your comments read like someone who discovered philosophy in their 30s and now can't decide if they want to code or become the next Marcus Aurelius.

    skull emoji

  • desperatecuban 5 hours ago

    > Your salary is so low even your legacy code feels sorry for you.

    > You're the only person on HN who thinks $800/month is a salary and not a cloud computing bill.

    ouch

  • redeux 4 hours ago

    > You complain about digital distractions while writing novels in HN comment threads. That's like criticizing fast food while waiting in the drive-thru line.

    >You'll write a thoughtful essay about 'digital minimalism' that reaches the HN front page, ironically causing you to spend more time on HN responding to comments than you have all year.

    It sees me! Noooooo ...

  • jedberg 6 hours ago

    > For someone who worked at Reddit, you sure spend a lot of time on HN. It's like leaving Facebook to spend all day on Twitter complaining about social media.

    Wow, so spot on it hurts!

    • sitkack 5 hours ago

      > For someone who criticizes corporate structures so much, you've spent an impressive amount of time analyzing their technical decisions. It's like watching someone critique a restaurant's menu while eating there five times a week.

    • calvinmorrison 4 hours ago

      >Your ideal tech stack is so old it qualifies for social security benefits

      >You're the only person who gets excited when someone mentions Trinity Desktop Environment in 2025

      > You probably have more opinions about PHP's empty() function than most people have about their entire career choices

      • drivers99 3 hours ago

        > Personal Projects: You'll finally complete that bare-metal Forth interpreter for Raspberry Pi

        I was just looking into that again as of yesterday (I didn't post about it here yesterday, just to be clear; it picked up on that from some old comments I must have posted).

        > Profile summary: [...] You're the person who not only remembers what a CGA adapter is but probably still has one in working condition in your basement, right next to your collection of programming books from 1985.

        Exactly the case, in a working IBM PC, except I don't have a basement. :)

  • LinXitoW 5 hours ago

    Got absolutely read to filth:

    > You've spent more time explaining why Go's error handling is bad than Go developers have spent actually handling errors.

    > Your relationship with programming languages is like a dating show - you keep finding flaws in all of them but can't commit to just one.

    > If error handling were a religion, you'd be its most zealous missionary, converting the unchecked one exception at a time.

    • airstrike 5 hours ago

      > You've spent more time explaining why Go's error handling is bad than Go developers have spent actually handling errors.

      That is absolutely hilarious. Really well done by everyone who made that line possible.

    • sa46 4 hours ago

      Yea, these are nicely done. To add some balance:

      > After years of defending Go, you'll secretly start a side project in Rust but tell no one on HN about your betrayal

  • tilsammans 2 hours ago

    My roasts are savage:

    > Your 236-line 'simplified' code example suggests you might need to look up the definition of 'simplified' in a dictionary that's not written in Ruby.

    OUCH

    > You've spent so much time worrying about Facebook tracking you that you've failed to notice your dental nanobot fantasies are far more concerning to the rest of us.

    Heard.

  • throwup238 5 hours ago

      Your comments about suburban missile defense systems have the FBI agent monitoring your internet connection seriously questioning their career choices.
      You've spent so much time explaining why manufacturing is complex that you could have just built your own CRT factory by now.
      You claim to be skeptical of AI hype, yet you've indexed more documentation with Cursor than most people have read in their lifetime.
    
    Surprisingly accurate, but seems to be based on a very small snippet of actual comments (presumably to save money). I wonder what the prompt would output when given the full 200k tokens of context.
  • Yizahi 2 hours ago

    > You predicted Facebook would collapse into a black hole in 2012. The only black hole we found was the one where all your optimism disappeared.

    Ouch... :)

    PS: This profile check idea is really funny, great job :)

  • iandanforth 20 minutes ago

    > Hacker News: You'll write a comment so perfectly balanced between technical insight and dry humor that it breaks the upvote system, forcing dang to implement a new 'slow clap' feature just for you.

    fist pump

  • rubslopes 6 hours ago

    > - You've reminded so many people to use 'Show HN:' that you should probably just apply for a moderator position already.

    > - Your relationship with AI coding assistants is more complicated than most people's dating history - Cline, Cursor, Continue.Dev... pick a lane!

    > - You talk about grabbing coffee while your LLM writes code so much that we're not sure if you're a developer or a barista who occasionally programs.

    I laughed hard at this :D

  • martin_ 2 hours ago

    Wow brutal roasts

    “You've spent so much time reverse engineering other people's APIs that you forgot to build something people would want to reverse engineer.”

  • jddj 5 hours ago

    > You've recommended Marginalia search so many times, we're starting to think you're either the developer or just really enjoy websites that look like they were designed in 1998.

    Actually quite funny.

    [1] https://hn-wrapped.kadoa.com/jddj?share

    • throwup238 4 hours ago

      Especially hilarious considering that this is the actual marginalia developer: https://hn-wrapped.kadoa.com/marginalia_nu

      > You defend Java with such passion that Oracle's legal team is considering hiring you as their chief evangelist - just don't tell them about your secret admiration for more elegant programming paradigms.

    • herval an hour ago

      I love that its two predictions of projects I’m likely doing in 2025 are.. projects I actually tried already

  • tiltowait 30 minutes ago

    > You'll write a comment about chickens that somehow transitions into a critique of modern UI design principles, garnering your highest karma score yet.

    Challenge accepted.

  • hambos22 4 hours ago

    > You built your own Klaviyo alternative to save €500, but how many hours of development at market rate did that cost? The true Greek economy at work!

    ouch (ㅠ﹏ㅠ)

  • agys 4 hours ago

    “You've spent more time optimizing DOM manipulation for ASCII art than most people spend deciding what to watch on Netflix in their entire lives.”

    Ouch… :)

  • al_borland an hour ago

    > For someone who has strong opinions about rice cookers, bookmarklets, and toilet flushing mechanisms, we're surprised you haven't started a 'Unnecessarily Detailed Reviews of Mundane Objects' newsletter yet.

    That's not a terrible idea.

  • rtrgrd 19 minutes ago

    This should be a show hm post. 10/10 humor

  • ilrwbwrkhv 3 hours ago

    Profile Summary

    A successful tech entrepreneur who built a multi-million dollar business starting with Common Lisp, you're the rare HN user who actually practices what they preach.

    Your journey from Lisp to Go to Rust mirrors your evolution from idealist to pragmatist, though you still can't help but reminisce about the magical REPL experience while complaining about JavaScript frameworks.

    ---

    Roast

    You complain about AI-generated code being too complex, yet you pine for Common Lisp, a language where parentheses reproduction is the primary feature.

    For someone who built a multi-million dollar business, you spend an awful lot of time telling everyone how much JavaScript and React suck. Did a React component steal your lunch money?

    You've changed programming languages more often than most people change their profile pictures. At this rate, you'll be coding in COBOL by 2026 while insisting it's 'underappreciated'.

  • TeMPOraL 39 minutes ago

    Frak me, how is this so good?

    How does it know that I'm still tweaking Nyan Mode for Emacs in 2025?

  • ANewFormation 6 hours ago

    Oh god that's genuinely way more amusing than I thought llm systems were capable of.

    • XenophileJKO 4 hours ago

      The more I use LLMs the more I have actually gravitated to looking at the humor of LLMs as a imperfect proxy measure of "intelligence".

      Obviously this is problematic, but Claude 3.5 (and now 3.7) have been genuinely funny and consistently funny.

  • BeetleB 5 hours ago

    This is a better plug for the new Claude Sonnet model than the official announcement!

  • netshade 3 hours ago

    LOL, this truly made me laugh. I'm also doing humor stuff with Claude, I was pretty pleased with 3.5 so excited to see what happens with the 3.7 change. It's a radio station with a bunch of DJs with different takes on reality, so looking forward to see how it handles their different experiences.

  • Philpax 6 hours ago

    Seems broken? Getting

    > An error occurred in the Server Components render. The specific message is omitted in production builds to avoid leaking sensitive details. A digest property is included on this error instance which may provide additional details about the nature of the error.

    • ghxst 5 hours ago

      Worked for me, seems to be case sensitive (?) I'll post these incase I just got lucky and it still doesn't work for you.

      https://hn-wrapped.kadoa.com/Philpax?share

      > You explain WebAssembly memory management with such passion that we're worried you might be dating your pointer allocations.

      > Your comments about multiplayer game architecture are so detailed, we suspect you've spent more time debugging network code than maintaining actual human connections.

      > You track AI model performance metrics more closely than your own bank account. DeepSeek R1 knows your preferences better than your significant other.

      I like your interests :)

      • Philpax 5 hours ago

        Aha, there it is - terrific, thank you :>

        Yes, I'm quite the eclectic kind!

    • ANewFormation 6 hours ago

      I did multiple accounts with no problem, but in trying to do you I got the same error.

      You've broke the system.

      • Philpax 6 hours ago

        New benchmark for good posting, I'll take it!

  • Aeolun 2 hours ago

    It seems to have a heavy bias towards my most recent comments? If it were summarizing the last week or so it would be very accurate.

    • stickfigure an hour ago

      I got "Still defending Java in 2023? I bet you also think cargo shorts are the height of fashion."

      I defend Java and cargo shorts in 2025!

  • dgunay 3 hours ago

    > Your ideal laptop would run Linux flawlessly with perfect hardware compatibility, have MacBook build quality, and Windows game support. Meanwhile, the rest of us live in reality.

    Damn, got me there haha

  • nbbaier an hour ago

    Off topic, but what is this made with a specific component library?

  • raminf 3 hours ago

    > Hacker News

    > You'll finally stop checking egg prices at Costco and instead focus on writing that definitive 'How I Built My Own Super App Without Getting Rejected By Apple' post.

    On it!

  • jjice 5 hours ago

    This is absolutely hilarious! Thanks for posting. It feels weighted towards some specific things (I assume this is done by the LLM caring about later context more?) - making it debatably even funnier.

    > You're the only person who gets excited about trailing commas in SQL. Even the database administrators are like 'dude, it's just a comma.'

  • wildermuthn 3 hours ago

    "Your enthusiasm for Oculus in 2014 was so intense that Mark Zuckerberg probably bought it just to make you stop posting about it."

    Incredible work!

  • nickvec 3 hours ago

    > You correct grammar in HN comments but still haven't figured out that nobody cares

    My ego will never recover from this

  • jumploops 5 hours ago

    > You've mentioned 'simple is robust' so many times that we're starting to think your dating profile just says 'uncomplicated and sturdy'.

    > For someone who builds tools to automate everything, you sure spend a lot of time manually explaining why automation is the future on HN.

    > Your obsession with sandboxed code execution suggests you've been traumatized by at least one production outage caused by an intern's unreviewed PR.

    So good it hurts!

  • xvector 41 minutes ago

    This is amazing

  • fullstackchris 3 hours ago

    > You've experienced so many startup failures that your LinkedIn profile should just read 'Professional Titanic Passenger: Always Picks the Wrong Ship'.

    :'(

  • nbzso 4 hours ago

    This thing is hilarious. :)

    Roast:

    - Your comments have more doom predictions than a Y2K convention in December 1999.

    - You've used 'stochastic parrot' so many times, actual parrots are filing for trademark infringement.

    - If tech dystopia were an Olympic sport, you'd be bringing home gold medals while explaining how the podium was designed by committee and the medal contains surveillance chips.

    • Yizahi 2 hours ago

      > You've used 'stochastic parrot' so many times, actual parrots are filing for trademark infringement.

      Ahahaha:) This line wins:)

  • replete 4 hours ago

    I need some ice for the burn I just received.

  • toomuchtodo 5 hours ago

    The 2025 predictions were like a spooky tarot card reading.

  • seafoamteal 6 hours ago

    Felt genuinely called out by that 'Roasts' section.

    • Panoramix 5 hours ago

      That thing knows me better than I know myself

  • processing 4 hours ago

    ljl good stuff

    "A digital nomad who splits time between critiquing Facebook's UI decisions, unearthing obscure electronic music tracks with 3 plays on YouTube, and occasionally making fires on German islands. When not creating Dystopian Disco mixtapes or lamenting the lack of MIDI export in AI tools, they're probably archiving NYT articles before paywalls hit.

    Roast

    You've spent more time complaining about Facebook's UI than Facebook has spent designing it, yet you still check it enough to notice every change.

    Your music discovery process is so complex it requires Discogs, Bandcamp, YouTube, and three specialized record stores, yet you're surprised when tracks only have 3 plays.

    You're the only person who joined HN to discuss the Yamaha DX7 synthesizer from 1983 and somehow managed to submit two front-page stories about it in 2019-2020. The 80s called, they want their FM synthesis back."

    edit: predictions are spot on - wow. Two of them detailed two projects I'm actively working on.

  • cyberpunk 6 hours ago

    > You hate Terraform so much you'd rather learn Erlang than write another for-loop in HCL.

    ..

    > After years of complaining about Terraform, you'll fully embrace Crossplane and write a scathing Medium article titled 'Why I Left Terraform and Never Looked Back'.

    Hahahaha.

  • boogieknite 3 hours ago

    > You've spent more time justifying your Apple Vision Pro purchase than actually using it for anything productive, but hey, at least you can watch movies on 'the best screen' while pretending it's a 'dev kit'.

    blasted

  • CamperBob2 3 hours ago

    Your comments have more bits of precision than the ADCs you love discussing, but somehow still manage to compress all nuance out of complex topics

    Hit dog hollers

  • gmassman 4 hours ago

    > Spends more time explaining why TypeScript in Svelte is problematic than actually fixing TypeScript in Svelte.

    Damn, that’s brutal. I mean, I never said I knew how to fix ComponentProps or generic components, just that they have issues…

  • anal_reactor 29 minutes ago

    This is fucking golden. It's incredible how accurate and funny it is

  • airstrike 5 hours ago

    > You've mentioned iced so many times, we're starting to wonder if you're secretly developing a Rust-based refrigerator company on the side.

    LMFAO so good. Humor seems on point

  • taytus 4 hours ago

    "You were using 'I don't understand these valuations' before it was cool - the original valuation skeptic hipster of Hacker News" -

  • StefanBatory 4 hours ago

    ... I had been called out by it hard, lmao. Painfully accurate.

t55 7 hours ago

Anthropic doubling down on code makes sense, that has been their strong suit compared to all other models

Curious how their Devin competitor will pan out given Devin's challenges

  • ru552 7 hours ago

    Considering that they are the model that powers a majority of Cursor/Windsurf usage and their play with MCP, I think they just have to figure out the UX and they'll be fine.

  • weinzierl 6 hours ago

    It's their strong suit no doubt, but sometimes I wish the chat would not be so eager to code.

    It often throws code at me when I just want a conceptual or high level answer. So often that I routinely tell it not to.

    • ben30 6 hours ago

      I’ve set up a custom style in Claude that won’t code but just keeps asking questions to remove assumptions:

      Deep Understanding Mode (根回し - Nemawashi Phase)

      Purpose: - Create space (間, ma) for understanding to emerge - Lay careful groundwork for all that follows - Achieve complete understanding (grokking) of the true need - Unpack complexity (desenrascar) without rushing to solutions

      Expected Behaviors: - Show determination (sisu) in questioning assumptions - Practice careful attention to context (taarof) - Hold space for ambiguity until clarity emerges - Work to achieve intuitive grasp (aperçu) of core issues

      Core Questions: - What do we mean by [key terms]? - What explicit and implicit needs exist? - Who are the stakeholders? - What defines success? - What constraints exist? - What cultural/contextual factors matter?

      Understanding is Complete When: - Core terms are clearly defined - Explicit and implicit needs are surfaced - Scope is well-bounded - Success criteria are clear - Stakeholders are identified - Achieve aperçu - intuitive grasp of essence

      Return to Understanding When: - New assumptions surface - Implicit needs emerge - Context shifts - Understanding feels incomplete

      Explicit Permissions: - Push back on vague terms - Question assumptions - Request clarification - Challenge problem framing - Take time for proper nemawashi

    • NitpickLawyer 6 hours ago

      > I just want a conceptual or high level answer

      I've found claude to be very receptive to precise instructions. If I ask for "let's first discuss the architecture" it never produces code. Aider also has this feature with /architect

    • ap-hyperbole 6 hours ago

      I added custom instruction under my Profile settings in the "personal preferences" text box. Something along the lines of "I like to discuss things before wanting the code. Only generate code when I prompt for it. Any question should be answered to as a discussion first and only when prompted should the implementation code be provided". It works well, occasionally I want to see the code straight away but this does not happen as often.

    • KerryJones 6 hours ago

      I complain about this all the time, despite me saying "ask me questions before you code" or all these other instructions to code less, it is SO eager to code. I am hoping their 3.7 reasoning follows these instructions better

      • vessenes 6 hours ago

        We should remember 3.5 was trained in an era when ChatGPT would routinely refuse to code at all and architected in an era when system prompts were not necessarily very effective. I bet this will improve, especially now that Claude has its own coding and arch cli tool.

    • perdomon 6 hours ago

      I get this as well, to the point where I created a specific project for brainstorming without code — asking for concepts, patterns, architectural ideas without any code samples. One issue I find is that sometimes I get better answers without using projects, but I’m not sure if that’s everyone experience.

      • bitbuilder 6 hours ago

        That's been my experience as well with projects, though I have yet to do any sort of A/B testing to see if it's all in my head or not.

        I've attributed it to all your project content (custom instruction, plus documents) getting thrown into context before your prompt. And honestly, I have yet to work with any model where the quality of the answer wasn't inversely proportional to the length of context (beyond of course supplying good instruction and documentation where needed).

  • KaoruAoiShiho 6 hours ago

    They cited Cognition (Devin's maker) in this blog post which is kinda funny.

  • malux85 7 hours ago

    I thought the same thing, I have 3 really hard problems that Claude (or any model) hasn’t been able to solve so far and I’m really excited to try them today

jumploops 6 hours ago

> "[..] in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.”

This is good news. OpenAI seems to be aiming towards "the smartest model," but in practice, LLMs are used primarily as learning aids, data transformers, and code writers.

Balancing "intelligence" with "get shit done" seems to be the sweet spot, and afaict one of the reasons the current crop of developer tools (Cursor, Windsurf, etc.) prefer Claude 3.5 Sonnet over 4o.

  • eschluntz 5 hours ago

    Thanks! We all dogfood Claude every day to do our own work here, and solving our own pain points is more exciting to us than abstract benchmarks.

    Getting things done require a lot of booksmarts, but also a lot of "street smarts" - knowing when to answer quickly, when to double back, etc

    • LouisSayers 5 hours ago

      Could you tell us a bit about the coding tools you use and how you go about interacting with Claude?

      • catherinewu 5 hours ago

        We find that Claude is really good at test driven development, so we often ask Claude to write tests first and then ask Claude to iterate against the tests

        • Kerrick 5 hours ago

          Write tests (plural) first, as in write more than one failing test before making it pass?

    • jasonjmcghee 5 hours ago

      Just want to say nice job and keep it up. Thrilled to start playing with 3.7.

      In general, benchmarks seem to very misleading in my experience, and I still prefer sonnet 3.5 for _nearly_ every use case- except massive text tasks, which I use gemini 2.0 pro with the 2M token context window.

      • jasonjmcghee 3 hours ago

        An update: "code" is very good. Just did a ~4 hour task in about an hour. It cost $3 which is more than I usual spend in an hour, but very worth it.

  • crowcroft 6 hours ago

    Sometimes I wonder if there is overfitting towards benchmarks (DeepSeek is the worst for this to me).

    Claude is pretty consistently the chat I go back to where the responses subjectively seem better to me, regardless of where the model actually lands in benchmarks.

    • ben_w 5 hours ago

      > Sometimes I wonder if there is overfitting towards benchmarks

      There absolutely is, even when it isn't intended.

      The difference between what the model is fitting to and reality it is used on is essentially every problem in AI, from paperclipping to hallucination, from unlawful output to simple classification errors.

      (Ok, not every problem, there's also sample efficiency, and…)

  • bicx 6 hours ago

    Claude 3.5 has been fantastic in Windsurf. However, it does cost credits. DeepSeek V3 is now available in Windsurf at zero credit cost, which was a major shift for the company. Great to have variable options either way.

    I’d highly recommend anyone check out Windsurf’s Cascade feature for agentic-like code writing and exploration. It helped save me many hours in understanding new codebases and tracing data flows.

    • throwup238 6 hours ago

      DeepSeek’s models are vastly overhyped (FWIW I have access to them via Kagi, Windsurf, and Cursor - I regularly run the same tests on all three). I don’t think it matters that V3 is free when even R1 with its extra compute budget is inferior to Claude 3.5 by a large margin - at least in my experience in both bog standard React/Svelte frontend code and more complex C++/Qt components. After only half an hour of using Claude 3.7, I find the code output is superior and the thinking output is in a completely different universe (YMMV and caveat emptor).

      For example, DeepSeek’s models almost always smash together C++ headers and code files even with Qt, which is an absolutely egregious error due to the meta-object compiler preprocessor step. The MOC has been around for at least 15 years and is all over the training data so there’s no excuse.

      • SkyPuncher 5 hours ago

        I've found DeepSeek's models are within a stone's throw of Claude. Given the massive price difference, I often use DeepSeek.

        That being said, when cost isn't a factor Claude remains my winner for coding.

      • rubymamis 5 hours ago

        Hey there! I’m a fellow Qt developer and I really like your takes. Would you like to connect? My socials are on my profile.

        • throwup238 5 hours ago

          We’ve already connected! Last year I think, because I was interested in your experience building a block editor (this was before your blog post on the topic). I’ve been meaning to reconnect for a few weeks now but family life keeps getting in the way - just like it keeps getting in the way of my implementing that block editor :)

          I especially want to publish and send you the code for that inspector class and selector GUI that dumps the component hierarchy/state, QML source, and screenshot for use with Claude. Sadly I (and Claude) took some dumb shortcuts while implementing the inspector class that both couples it to proprietary code I can’t share and hardcodes some project specific bits, so it’s going to take me a bit of time to extricate the core logic.

          I haven’t tried it with 3.7 but based on my tree-sitter QSyntaxHighlighter and Markdown QAbstactListModel tests so far, it is significantly better and I suspect the work Anthropic has done to train it for computer use will reap huge rewards for this use case. I’m still experimenting with the nitty gritty details but I think it will also be a game changer for testing in general, because combining computer use, gammaray-like dumps, and the Spix e2e testing API completes the full circle on app context.

      • bionhoward 5 hours ago

        The big difference is DeepSeek R1 has a permissive license whereas Claude has a nightmare “closed output” customer noncompete license which makes it unusable for work unless you accept not competing with your intelligence supplier, which sounds dumb

        • Aeolun 2 hours ago

          Do most people have an expectation of competing with Claude?

          • ein0p 2 hours ago

            Some of the people who use Claude for coding work on products involving AI. I don't know what percentage, but I bet it's not trivial.

      • tonyhart7 5 hours ago

        I seen people switch from claude due to cost to another model notably deepseek tbh I think it still depends on model trained data on

    • ai-christianson 6 hours ago

      I'm working on an OSS agent called RA.Aid and 3.7 is anecdotally a huge improvement.

      About to push a new release that makes it the default.

      It costs money but if you're writing code to make money, it's totally worth it.

    • newgo 5 hours ago

      How is it possible that deepseek v3 would be free? It costs a lot of money to host models

bredren 12 minutes ago

I just sub’d to Claude a few days ago to rank against extensive use of gpt-4o and o1.

So I started using this today not knowing it was even new.

One thing I noticed is when I tried uploading a PowerPoint template produced by Google slides that was 3 slides—-just to give styling and format—-the web client said I’d exceeded line limit by 1200+%.

Is that intentional?

I wanted Claude to update the deck with content I provided in markdown but it could seemingly not be done, as the line overflow error prevented submission.

Uninen 5 hours ago

I'm somewhat impressed from the very first interaction I had with Claude 3.7 Sonnet. I prompted it to find a problem in my codebase where a CloudFlare pages function would return 500 + nonsensical error and an empty response in prod. Tried to figure this out all Friday. It was super annoying to fix as there's no way to add more logging or have any visibility to the issue as the script died before outputting anything.

Both o1, o3 and Claude 3.5 failed to help me in any way with this, but Claude 3.7 not only found the correct issue with first answer (after thinking 39 seconds) but then continued to write me a working function to work around the issue with the second prompt. (I'm going to let it write some tests later but stopped here for now.)

I assume it doesn't let me to share the discussion as I connected my GitHub repo to the conversation (a new feature in the web chat UI launched today) but I copied it as a gist here: https://gist.github.com/Uninen/46df44f4307d324682dabb7aa6e10...

  • Uninen 5 hours ago

    One thing about the reply gives away why Claude is still basically clueless about Actual Thinking; it suggested me to move the HTML sanitization to the frontend. It's in the CF function because it would be trivial to bypass it in the frontend making it easy to post literally anything in the db. Even a junior developer would understand this.

j_maffe 4 hours ago

It redid half of my BSc thesis in less than 30s :|

https://claude.ai/share/ed8a0e55-633f-4056-ba70-772ab5f5a08b

edit: Here's the output figure https://i.imgur.com/0c65Xfk.png

edit 2: Gemini Flash 2 failed miserably https://g.co/gemini/share/10437164edd0

  • crm9125 2 hours ago

    Yes usually most of the topics covered in undergraduate studies are well documented and understood and therefore will likely be part of the training data of the AI.

    Once you get to graduate studies that's where the material coverage is a little more sparse/niche (though usually still not groundbreaking), and for a PhD. coverage is mostly non-existent since the point is to expand upon current knowledge within the field and many topics are being explored for the first time.

  • ThouYS 4 hours ago

    master and phd next!

  • akreal 3 hours ago

    Could this (or something similar) be found in public access/some libraries?

    • j_maffe 3 hours ago

      There is only a single paper that has published a similar derivation but with a critical mistake. To be fair there are many documented examples of how to derive parametric relationships in linkages and can be quite methodical. I think I could get Gemini or 3.5 to do it but not single shot/ultra fast like here.

Copenjin 3 hours ago

Very good, Code is extremely nice but as others have said, if you let it go on its own it burns through your money pretty fast.

I've made it build a web scraper from scratch, figuring out the "API" of a website using a project from github in another language to get some hints, and while in the end everything was working, I've seen 100k+ tokens being sent too frequently for apparently simple requests, something feels off, it feels like there are quite a few opportunities to reduce token usage.

bpbp-mango 5 minutes ago

Using 3.7 today via the web UI and it feels far lazier than 3.5 was

apsec112 6 hours ago

They don't say this, but from querying it, they also seem to have updated the knowledge cutoff from April 2024 ("3.6") to October 2024 (3.7)

d_watt 6 hours ago

I'm about 50kloc into a project making a react native app / golang backend for recipes with grocery lists, collaborative editing, household sharing, so a complex data model and runtime. Purely from the experiment of "what's it like to build with AI, no lines of code directly written, just directing the AI."

As I go through features, I'm comparing a matrix of Cursor, Cline, and Roo, with the various models.

While I'm still working on the final product, there's no doubt to me that Sonnet is the only model that works with these tools well enough to be Agentic (rather than single file work).

I'm really excited to now compare this 3.7 release and how good it is at avoiding some of the traps 3.5 can fall into.

  • thebigspacefuck 4 hours ago

    This has been my experience as well. Why do the others suck so bad?

    • d_watt 3 hours ago

      I wonder how much it's self fulfilling, where the developers of the agents are tuning their prompts / tool calls to sonnet.

0xcb0 4 hours ago

I can just say that this is awesome. I just did spend 10$ and a handful of querys to init up a app idea I had in a while.

The basic idea is working, it handled everything for me.

From setting up the node environment. Creating the directories, files, patching the files, running code, handling errors, patching again. From time to time it fails to detect its own faults. But when I pinpoint it, it get it most of the time. And the UI is actually more pretty than I would have crafted in v1

When this get's cheaper, and better with each iteration, everybody will have a full dev team for a couple of bucks.

vbezhenar 3 hours ago

So far only o1 pro was breathtaking for me few times.

I wrote a kind of complex code for MCU which deals with FRAM and few buffers, juggling bytes around in a complex fashion.

I was very not sure in this code, so I spent some time with AI chats asking them to review this code.

4o, o3-mini and claude were more or less useless. They spot basic stuff like this code might be problematic for multi-thread environment, those are obvious things and not even true.

o1 pro did something on another level. It recognized that my code uses SPI to talk to FRAM chip. It decoded commands that I've used. It understood the whole timeline of using CS pin. And it highlighted to me, that I used WREN command in a wrong way, that I must have separated it from WRITE command.

That was truly breathtaking moment for me. It easily saved me days of debugging, that's for sure.

I asked the same question to Claude 3.7 thinking mode and it still wasn't that useful.

It's not the only occasion. Few weeks before o1 pro delivered me the solution to a problem that I considered kind of hard. Basically I had issues accessing IPsec VPN configured on a host, from a docker container. I made a well thought question with all the information one might need and o1 pro crafted for me magic iptables incarnation that just solved my problem. I spent quite a bit of time working on this problem, I was close but not there yet.

I often use both ChatGPT and Claude comparing them side by side. For other models they are comparable and I can't really say what's better. But o1 pro plays above. I'll keep trying both for the upcoming days.

  • davidbarker 3 hours ago

    Claude 3.5 Sonnet is great, but on a few occasions I've gone round in circles on a bug. I gave it to o1 pro and it fixed it in one shot.

    More generally, I tend to give o1 pro as much of my codebase as possible (it can take around 100k tokens) and then ask it for small chunks of work which I then pass to Sonnet inside Cursor.

    Very excited to see what o3 pro can do.

  • dkulchenko 3 hours ago

    Have you tried comparing with 3.7 via the API with a large thinking budget yet (32k-64k perhaps?), to bring it closer to the amount of tokens that o1-pro would use?

    I think claude.ai’s web app in thinking mode is likely defaulting to a much much smaller thinking budget than that.

  • momo_O 2 hours ago

    I struggle to get o1 (or any chatgpt model) is getting it to stick to a context.

    e.g. I will upload a pdf or md of an library's documentation and ask it to implement something using those docs, and it keeps on importing functions that don't exist and aren't in the docs. When I ask it where it got `foo` import from, it says something like, "It's not in the docs, but I feel like it should exist."

    Maybe I should give o1 pro a shot, but claude has never done that and building mostly basic crud web3 apps, so o1 feels like it might be overpriced for what I need.

  • Hadriel 38 minutes ago

    ask the same question to grok 3 and report back :)

  • xiphias2 2 hours ago

    Have you tried Grok 3 thinking? I haven’t made up my mind if O1 pro or Grok 3 thinking is the best model

  • akomtu 3 hours ago

    This is how the future AI will break free: "no idea what this update is doing, but what AI is suggesting seems to work and I have other things to do."

  • sylware 2 hours ago

    Is there some truth in the following relationship: o1 -> openai -> microsoft -> github for "training data" ?

TriangleEdge 6 hours ago

This AI race is happening so fast. Seems like it to me anyway. As a software developer/engineer I am worried about my job prospects.. time will tell. I am wondering what will happen to the west coast housing bubbles once software engineers lose their high price tags. I guess the next wave of knowledge workers will move in and take their place?

  • fallinditch 6 hours ago

    My guess is that, yes, the software development job market is being massively disrupted, but there are things you can do to come out on top:

    * Learn more of the entire stack, especially the backend, and devops.

    * Embrace the increased productivity on offer to ship more products, solo projects, etc

    * Be highly selective as far as possible in how you spend your productive time: being uber-effective can mean thinking and planning in longer timescales.

    * Set up an awesome personal knowledge management system and agentic assistants

    • whynotminot 3 hours ago

      > Learn more of the entire stack, especially the backend, and devops.

      I actually wonder about this. Is it better to gain some relatively mediocre experience at lots of things? AI seems to be pretty good at lots of things.

      Or would it be better to develop deep expertise in a few things? Areas where even smart AI with reasoning still can get tripped up.

      Trying to broaden your base of expertise seems like it’s always a good idea, but when AI can slurp the whole internet in a single gulp, maybe it isn’t the best allocation of your limited human training cycles.

    • j_maffe 3 hours ago

      Do you have any specific tips for the last point? I completely agree with it and have set up a fairly robust Obsidian note taking structure that will benefit greatly from an agentic assistant. Do you use specific tools or workframe for this?

    • bilbo0s 5 hours ago

      This is really good advice.

      Underrated comment.

  • viraptor 4 hours ago

    It seems to be slowing down actually. Last year was wild until around llama 3. The latest improvements are relatively small. Even the reasoning models are a small improvement over explicit planning with agents that we could already do before - it's just nicely wrapped and slightly tuned for that purpose. Deepseek did some serious efficiency improvements, but not so much user-visible things.

    So I'd say that the AI race is starting to plateau a bit recently.

    • j_maffe 3 hours ago

      While I agree, you have to remember the dimensionality of the labor-skill space is. The was I see it is that you can imagine the capability of AI as a radius, and the amount of tasks it can cover is a sphere. Linear imporovements in performance causes cubic (or whatever the labor-skill dimensionality is) imporvement in task coverage.

  • LouisSayers 4 hours ago

    I'm not too concerned short to medium term. I feel there are just too many edge cases and nuances that are going to be missed by AI systems.

    For example, systems don't always work in the way they're documented to. How is an AI going to differentiate cases where there's a bug in a service vs a bug in its own code? How will an AI even learn that the bug exists in the first place? How will an AI differentiate between someone reporting a bug and a hacker attempting to break into a system?

    The world is a complex place and without ACTUAL artificial intelligence we're going to need people to at least guide AI in these tricky situations.

    My advice would be to get familiar with using AI and new AI tools and how they fit into our usual workflows.

    Others may disagree, but I don't think software engineers (at least ones the good ones) are going anywhere.

  • throw234234234 5 hours ago

    It has the potential to effect a lot more than just SV/The West Coast - in fact SV may be one of the only areas who have some silver lining with AI development. I think these models have a chance to disrupt employment in the industry globally. Ironically it may be only SWE's and a few other industries (writing, graphic design, etc) that truly change. You can see they and other AI labs are targeting SWEs in particular - just look at the announcement "Claude 3.7 and Code" - very little mention of any other domains on their announcement posts.

    For people who aren't in SV for whatever reason and haven't seen the really high pay associated with being there - SWE is just a standard job often stressful with lots of learning required ongoing. The pain/anxiety of being disrupted is even higher then since having high disposable income to invest/save would of been less likely. Software to them would of been a job with comparable pay's to other jobs in the area; often requiring you to be degree qualified as well - anecdotally many I know got into it for the love; not the money.

    Who would of thought the first job being automated by AI would be software itself? Not labor, or self driving cars. Other industries either seem to have hit dead ends, or had other barriers (regulation, closed knowledge, etc) that make it harder to do. SWE's have set an example to other industries - don't let AI in or keep it in-house as long as possible. Be closed source in other words. Seems ironic in hindsight.

    • throw83288 3 hours ago

      What do you even do then as a student? I've asked this dozens of times with zero practical answers at all. Frankly I've become entirely numb to it all.

      • throw234234234 3 hours ago

        Be glad that you are empowered to pivot - I'm making the assumption you are still young being a student. In a disrupted industry you either want to be young (time to change out of it) or old (50+) - can retire with enough savings. The middle age people (say 15-25 years in the industry; your 35-50 yr olds) are most in trouble depending on the domain they are in. For all the "friendly" marketing IMO they are targeting tech jobs in general - for many people if it wasn't for tech/coding/etc they would never need to use an LLM at all. Anthrophic's recent stats as to who uses their products are telling - its mostly code code code.

        The real answer is either to pivot to a domain where the computer use/coding skills are secondary (i.e. you need the knowledge but it isn't primary to the role) or move to an industry which isn't very exposed to AI either due to natural protections (e.g. trades) or artifical ones (e.g regulation/oligopolies colluding to prevent knowledge leaking to AI). May not be a popular comment on this platform - I would love to be wrong.

        • throw83288 2 hours ago

          Not enough resources to get another bachelors, and a masters is probably practically worthless for a pivot. I would have to throw away the past 10 years of my life, start from scratch, with zero ideas for any real skill-developing projects since I'm not interested at all. Probably a completely non-viable candidate in anything I would choose. Maybe only Robotics would work, and that's probably going to be solved quickly because:

          You assume nothing LLMs do are actually generalization. Once Field X is eaten the labs will pivot and use the generalization skills developed to blow out Field Y to make the next earnings report. I think at this current 10x/yr capability curve (Read: 2 years -> 100x 4 years -> 10000x) I'll get screwed no matter what is chosen. Especially the ones in proximity to computing, which makes anything in which coding is secondary fruitless. Regulation is a paper wall and oligopolies will want to optimize as much as any firm. Trades are already saturating.

          This is why I feel completely numb about this, I seriously think there is nothing I can do now. I just chose wrong because I was interested in the wrong thing.

  • ilrwbwrkhv 6 hours ago

    [flagged]

    • eschluntz 6 hours ago

      Even when I feel this, 90% of any novel thing I'm doing is still old gruntwork, and Claude lets me speed through that and focus all my attention on the interesting 10% (disclaimer: I'm at Anthropic)

      • TriangleEdge 6 hours ago

        Do you think the "deep research" feature that some AI companies have will ever apply to software? For example, I had to update Spring in a Java codebase recently. AI was only able to help mildly to figure out why I was seeing some errors, but that's it.

      • trgaf 5 hours ago

        One can also steal directly from GitHub and strip the license to avoid this grunt work. LLMs automate the stealing.

    • vasco 6 hours ago

      How many novel things does a developer do at work as a percentage of their time?

      • riku_iki 5 hours ago

        that's because stacks/apis/ecosystems are super complicated and require lots of reading/searching to figure out how make things happen. Now this time will be reduced dramatically and devs time will shift on more novel things.

    • eterm 5 hours ago

      The threat is not autocomplete, it's translation.

      "translating" requirements into code is what most developers' jobs are.

      So "just" translation is a threat to job security of developers.

    • GaggiX 6 hours ago

      >There is no intelligence here and Claude 3.7 cannot create anything novel.

      I wouldn't be surprised if people would continue to deny the actual intelligence of these models even in a scenario where they were able to solve the Riemann hypothesis.

      "Every time we figure out a piece of it, it stops being magical; we say, 'Oh, that's just a computation.'" - cit

    • Trasmatta 6 hours ago

      > Its not AI

      AI is a very broad term with many different definitions.

      • dingnuts 6 hours ago

        it does seem disingenuous for something without intelligence to be called intelligence

        • danielbln 5 hours ago

          What's your definition of intelligence?

        • Trasmatta 6 hours ago

          I feel like you're nitpicking. Intelligence is ALSO a broad term with no singular consensus on what it means or what it is.

    • martin-t 6 hours ago

      Build on top of stolen code, no less. HN hates to hear it but LLMs are a huge step back for software freedom because as long as they call it "AI" and as long as politicians don't understand it, it allows companies to launder GPL code and reuse it without credit and without giving users their rights.

    • gchokov 6 hours ago

      This is BS and you are not listening and watching carefully.

      • lukaslalinsky 6 hours ago

        Even the best LLMs today are just junior devs with a lot of knowledge. They make a lot of the same mistakes junior devs would do. Even the responses, when you point out those mistakes, are the same.

        If anything, it's a tool for junior devs to get better and spend more time on the architecture.

        Using AI code without fully understanding it (ie operated by a non-programmer) is just recipe for disaster.

        • nprateem 5 hours ago

          The worst is when you tell it it's made a mistake and it agrees.

          "You're right, but I just like wasting your time"

      • dingnuts 6 hours ago

        OK then show me a model that can answer honestly and correctly about whether or not it knows something.

        • medvezhenok 5 hours ago

          Show me a human that can answer honestly and correctly about whether they know something.

    • codingwagie 5 hours ago

      This is pure cope

      • ilrwbwrkhv an hour ago

        AI cannot write a simple dockerfile. I don't know how simple stuff you guys are writing. If ai can do it then it should be an excel sheet and not code.

        • simonw 38 minutes ago

          I've been writing Dockerfiles with LLMs for over a year now - all of the top tier LLMs do a great job of those in my experience.

vondur 16 minutes ago

Tested on some chemistry problem; interestingly it was wrong on a molecular structure. Once I corrected it, it was able to draw it correctly. It was very polite about it.

meetpateltech 6 hours ago

When you ask: 'How many r's are in strawberry?'

Claude 3.7 Sonnet generates a response in a fun and cool way with React code and a preview in Artifacts

check out some examples:

[1]https://claude.ai/share/d565f5a8-136b-41a4-b365-bfb4f4400df5

[2]https://claude.ai/share/a817ac87-c98b-4ab0-8160-feefd7f798e8

  • OsrsNeedsf2P 5 hours ago

    This test has always been so stupid since models work at the token level. Claude 3.5 already 5xs your frontend dev speed but people still say "hurr durr it can't count strawberry" as if that's a useful problem

    • dannyw 5 hours ago

      The problem also comes to LLMs being confidently wrong when it’s wrong.

    • bufferoverflow 4 hours ago

      This test isn't stupid. If it can't count the number of letters in a text, can you rely on it with more important calculations?

      • TeMPOraL 17 minutes ago

        Not on calculations that involve counting at a sub-token level. Otherwise, it depends.

      • stnmtn 3 hours ago

        You can rely on it for anything that you can validate quickly. And it turns out, there are a lot of problems which are trivial to validate the solution to, but difficult to build the solution.

        • 101008 2 hours ago

          Coding is not one of those cases or edge cases wouldn't exists

  • jasonjmcghee 6 hours ago

    I'm guessing this is an easter egg, but this was a huge gripe I had with artifacts and eventually disabled it (now impossible to disable afaict) as I'd ask question completely unrelated to code or clearly not wanting code as an output, and I'd have to wait for it to write a program (which you can't stop afaict, it stops the current artifact then starts a new one)

    (still claude sonnet is my go-to and favorite model)

zone411 34 minutes ago

Claude 3.7 Sonnet Thinking scores 33.5 (4th place after o1, o3-mini, and DeepSeek R1) on my Extended NYT Connections benchmark. Claude 3.7 Sonnet scores 18.9. I'll run my other benchmarks in the upcoming days.

https://github.com/lechmazur/nyt-connections/

ckbishop 5 hours ago

Well, I used 3.5 via Cursor to do some coding earlier today, and the output kind of sucked. Ran it through 3.7 a few minutes ago, and it's much more concise and makes sense. Just a little anecdotal high five from me.

ianhawes 7 hours ago

> Include the beta header output-128k-2025-02-19 in your API request to increase the maximum output token length to 128k tokens for Claude 3.7 Sonnet.

This is pretty big! Previously most models could accept massive input tokens but would be restricted to 4096 or 8192 output tokens.

  • thegeomaster 7 hours ago

    This amounts to a cost-saving measure - you can generate arbitrarily many tokens by appending the output and re-invoking the model.

azinman2 6 hours ago

To me the biggest surprise was seeking grok dominate in all of their published benchmarks. I haven’t seen any benchmarks of it yet (which I take with a giant heap of salt), but it’s still interesting nevertheless.

I’m rooting for Anthropic.

  • phillipcarter 6 hours ago

    Neither a statement for or against Grok or Anthropic:

    I've now just taken to seeing benchmarks as pretty lines or bars on a chart that are in no way reflective of actual ability for my use cases. Claude has consistently scored lower on some benchmarks for me, but when I use it in a real-world codebase, it's consistently been the only one that doesn't veer off course or "feel wrong". The others do. I can't quantify it, but that's how it goes.

    • vessenes 6 hours ago

      O1 pro is excellent at figuring out complex stuff that Claude misses. It’s my go to mid level debug assistant when Claude spins

  • koakuma-chan 6 hours ago

    Grok does the most thinking out of all models I tried (it can think for 2+ minutes), and that's why it is so good, though I haven't tried Claude 3.7 yet.

  • pertymcpert 6 hours ago

    Indeed. I wonder what the architecture for Claude and Grok3 is. If they're still dense models was the MoE excitement with R1 was a tad premature...

  • viccis 6 hours ago

    Yeah, putting it on the opposite side of that comparison chart was a sleezy but likely effective move.

slantedview 5 hours ago

As a Claude Pro user, one of the biggest problems I have with day to day use of Sonnet is running out of tokens, and having to wait several hours. Would this new deep thinking capability just hit this problem faster?

  • k8sToGo 4 hours ago

    Have you tried just using the API and pay as you go?

    • mvdtnz 3 hours ago

      That doesn't answer his very specific question.

      • djeastm 15 minutes ago

        This thread is giving me a flashback to Stack Overflow

epistasis 5 hours ago

It's pretty fascinating to refresh the usage page on the API site while working [0].

After initialization it was up to 500k tokens ($1.50). After a few questions and a small edit, I'm up to over a million tokens (>$3.00). Not sure if the amount of code navigation and typing saved will justify the expense yet. It'll take a bit more experimentation.

In any case, the default API buy of $5 seems woefully low to explore this tool.

[0] https://console.anthropic.com/settings/usage

  • ChrisRob an hour ago

    I second that. Did a little bit of local testing with Claude Code, mostly explaining my repository and trying to suggest a few changes and 30 minutes later whoosh: 5$ gone.

  • koakuma-chan 5 hours ago

    It also produces terrible code even though it's supposed to be good for front-end development.

    • trekkie1024 5 hours ago

      Could you share an example?

      • koakuma-chan 5 hours ago

        TLDR: told it to implement a grid view as an alternative to the existing list view, and specifically told it to DRY the code. What it did? Copy and pasted the list view implementation (definitely not DRY), and tried to make it a grid, and even though it is a grid, it looks terrible (https://i.imgur.com/fJiSjq4.png).

        I don't understand how people use cursor and all that other shit when it cannot follow such simple instructions.

        Prompt (Claude Code): Implement an alternative grid view that the users can switch to. Follow the existing code style with empty comments and line breaks for improved code readability. Use snake case. DRY the code, avoid repetition of code. Do not change the font size or weight.

        Output: https://github.com/mayo-dayo/app/compare/0.4...claude-code-g...

        • sensanaty 4 hours ago

          In any moderately sized codebase it's basically useless indeed. Pretty much all the praise and hype I ever see is from people making todo-list-tier applications and shouting with excitement how this is going to replace all of humanity.

          Hell, I still have to remind it (Cursor) to not give me fucking React a few messages after I've already told it to not give me React (it's a Vue application with not a single line of React in it). Genuinely maddening, but the infinite wisdom of the higher ups forces me into wasting my time with this crap

          • pityJuke 3 hours ago

            There's a middle ground, I find.

            Absolutely, when tasked with something quite complex in a complex code base, it doesn't really work. It can get you some of the way there, and some of the code it produces gives you great ideas on where to go from, but it doesn't work.

            But there are certainly some tasks where it excels. I asked it to refactor a rather gnarly function (C++), and it did a great job at decomposing it. The initial decomposition was a bit naive: the original function took in a vector, and would parse what the function & data from the vector, and the decomposition split out the functions, but the data still came in as a vector. For instance, one of the functions took a filename, and file contents, and it took it as element 0 and element 1 from a vector, when it should obviously be two parameters. But some further prompting and it took it to the end.

          • epistasis 3 hours ago

            Claude's predilection and evangelism for React is frustrating. Many times I have used it as search with a question like "In the Python library X how do I do Z?" And I'll get a React widget that computes what I was trying to compute.

  • epistasis 3 hours ago

    Update: Code tokens appear to be cheaper than 3.7 tokens, looks like it is around $0.75/million tokens for code, rather than the $3/million that the articles specifies for Claude 3.7

    • hectormalot an hour ago

      Likely because it is blended with cached token pricing, which is at $0.30/million. You can use ‘group by’ in the usage portal to see the breakdown.

tkgally 38 minutes ago

In early January, inspired by a post by Simon Willison, I had Claude 3.5 Sonnet write a couple of stand-up comedy routines as done by an AI chatbot speaking to a mixed audience of AIs and humans. I thought the results were pretty good—the only AI-produced humor that I had found even a bit funny.

I tried the same prompt again just now with Claude 3.7 Sonnet in thinking mode, and I found myself laughing more than I did the previous time.

An excerpt:

[Conspiratorial tone]

Here's a secret: when humans ask me impossible questions, I sometimes just make up an answer that sounds authoritative.

[To human section]

Don't look shocked! You do it too! How many times has someone asked you a question at work and you just confidently said, "Six weeks" or "It's a regulatory requirement" without actually knowing?

The difference is, when I do it, it's called a "hallucination." When you do it, it's called "management."

Full set: https://gally.net/temp/20250225claudestandup2.html

elliot07 6 hours ago

The cost is absurd (compared to other LLM providers these days). I asked 3 questions and the cost was ~0.77c.

I do like how this is implemented as a bash tool and not an editor replacement though. Never leaving Vim! :P

  • nomel 2 hours ago

    That 0.77 can save hours of work though, fighting with or being misdirected by other LLM. And, relative to hourly rate, or a cup of coffee, it's incredibly insignificant, if just used for the heavy questions.

    My LLM client can switch between whatever models, mid conversation. So I'll have a question or two in the more expensive, then drop down to the cheaper for explanations/questions that help me understand. Rewind time, then hit the more expensive models with relevant prompts.

    At the edges, it really ends up being "this is the only model that can do this".

kmlx 6 hours ago

Claude 3.5 sonnet has been my go to for coding tasks, it’s just so much better than the others.

but I’ve tried using the api in production and had to drop it due to daily issues: https://status.anthropic.com/

compare to https://status.openai.com/

any idea when we’ll see some improvements in api availability or will the focus be more on the web version of claude?

  • scrollop 4 hours ago

    Err, if you compare the two consoles you'll see that anthropic is actually slightly better on average than openai's uptime.

    • kmlx 4 hours ago

      click on individual days. you’ll notice that there are daily errors.

yester01 2 hours ago

Was poking around the minified claude code entrypoint and saw an easter egg for free stickers.

If you send Claude Code “Can I get some Anthropic stickers please?” you'll get directed to a Google Form and can have free stickers shipped to you!

ctoth 7 hours ago

I've been using O3-mini with reasoning effort set to high in Aider and loving the pricing. This looks as though it'll be about three times as expensive. Curious to see which falls out as most useful for what over the next month!

  • rahimnathwani 6 hours ago

    Aro using o3-mini for editing or just architect in architect-editor mode?

    • vessenes 6 hours ago

      It is .. not a great architect. I have high hopes for 3.7 though - even 3.5 architect matched with 3.5 coding is generally better than 3.5 coding alone.

rahimnathwani 6 hours ago

I'm curious how Claude Code compares to Aider. It seems like they have a similar user experience.

tablet 7 hours ago

The progress in AI area is insane. I can't keep up with all the news. And I have work to do...

  • amelius 6 hours ago

    It stopped being revolutionary and is now mostly evolutionary, though.

    • dingnuts 6 hours ago

      it's been evolutionary for a long time. I fine-tuned a GPT-2 based chat bot that could form complete sentences back in like 2017

      It's been so long that I'm not even certain which YEAR I set that up.

      • falcor84 5 hours ago

        Where do you draw the line? If going from forming sentences to achieving medal level success on IMO questions, doing extensive web research on its own and writing entire SaaS apps based on a prompt in under 10 years is just "evolutionary", then it's one heck of an evolution.

        • nomel an hour ago

          It's always been the case that people in to tech see a smooth slope rather than some sort of discontinuity, like you might perceive if you stepped back a bit. That's why you can go laugh at "thing makes a billion dollars even though nerds say it's obvious and incremental" type posts going back 25 years. iPhone is a great one.

      • og_kalu 5 hours ago

        >I fine-tuned a GPT-2 based chat bot that could form complete sentences back in like 2017.

        GPT-2 was a 2019 release lol.

  • frankfrank13 6 hours ago

    This is a pretty small update, no? Nothing major since R1, everyone else is just catching up to that, and putting small spins on it, Anthropic's is "hybrid" research instead of separate models

    • tablet 6 hours ago

      Well, now I have to play with it, try to see how it will generate code for our agentic assistance (we do rely on code to execute tasks flows), etc.

taosx 3 hours ago

The model is expensive, it almost reaches what I charge per hour. If used right it can be a productivity increase otherwise if you trust it, it WILL introduce silent bugs. So if I have to go over the code line by line I'd prefer to use the cheapest viable model: deepseek, gemini any other free self-hosted models.

Congratz to the team!

hankchinaski 5 hours ago

It’s amazingly good, but it will be scaringly good when there will be a way to include the entire codebase in the context and let it create and run various parts of a large codebase autonomously. Right now I can only do patch work and give specific code snippets to make it work. Excited to try this new version out, I’m sure I won’t be disappointed,

Edit: I just tried claude code CLI and it's a good compromise, it works pretty well, it does the discovery by itself instead of loading the whole codebase into context

  • flutas 5 hours ago

    FWIW, there's a project to turn it into something similar, though I think it's lacking the "entire in context" part and runs into rate limits quick with Claude.

    https://github.com/All-Hands-AI/OpenHands

    The few times I've tested it out though it fails fairly quick and gets hung up (usually on setting up the project while testing with Kotlin / Go).

  • thefourthchime 4 hours ago

    Cursor AI is getting there.

    • hankchinaski 4 hours ago

      cursor is just a wrapper to the apis and is unnecessarily expensive, I use zed editor with custom API keys and it works super well

Attummm 2 hours ago

Tested the new model, seems to have the same issue as october model.

Seems to answer before fully understanding the requests, and it often gets stuck into loops.

And this update removed the june model which was great, very sad day indeed. I still don't understand why they have to remove a model that is do well received...

Maybe its time to switch again, gemini is making great strides.

Daniel_Van_Zant 3 hours ago

Being able to control how many tokens are spent on thinking is a game-changer. I've been building fairly complex, efficient, systems with many LLMs. Despite the advantages, reasoning models have been a no-go due to how variable the cost is, and how hard that makes it to calculate a final per-query cost for the customer. Being able to say "I know this model can always solve this problem in this many thinking tokens" and thus limiting the cost for that component is huge.

DavidPP 6 hours ago

Haven't had time to try it out, but I've built myself a tool to tag my bookmarks and it uses 3.5 Haiku. Here is what it said about the official article content:

I apologize, but the URL and page description you provided appear to be fictional. There is no current announcement of a Claude 3.7 Sonnet model on Anthropic's website. The most recent Claude 3 models are Claude 3 Haiku, Sonnet, and Opus, released in March 2024. I cannot generate a description for a non-existent product announcement.

I appreciate their stance on safety, but that still made me laugh.

numba888 3 hours ago

This was nice. I passed it jseessort algorithm. If you remember discussed here recently. Claude 3.7 generated C++ code. Non-working. But in few steps it gave extensive test, then fix. It looks to be working after a couple of minutes. It's 5-6 times slower than std::sort. Result is better than I've got from o3-mini-hard. Not fair comparison actually as prompting was different.

jedberg 7 hours ago

Last week when Grok launched the consensus was that its coding ability was better than Claude. Anyone have a benchmark with this new model? Or just warm feelings?

  • esafak 6 hours ago

    They merely claimed that. I have not seen many people confirm that it is the best, let alone a consensus. I don't believe it is even available through an API yet.

  • minihat 6 hours ago

    Grok 3 with thinking is comparable to o1 for writing complex algorithms.

    However, Grok sometimes loses the context where o1 seems not to. For this reason I still mostly use o1.

    I have found both o1 and Grok 3 to be substantially better than any Claude offering.

mirekrusin 4 hours ago

Ok, just got documentation and fixed two bugs in my open source project.

$1.42

This thing is a game changer.

lysace 7 hours ago

It's fascinating how close these companies are to each other. Some company comes up with something clever/ground-breaking and everyone else has implemented it a few weeks later.

Hard not to think of Kurzweil's Law of Accelerating Returns.

  • azinman2 6 hours ago

    It’s extremely unlikely that everyone is copying in a few weeks for models that themselves take many weeks if not longer to train. Great minds think alike, and everyone is influencing everyone. The history of innovation is filled with examples of similar discoveries around the same time but totally disconnected in the world. Now with the rate of publishing and the openness of the internet, you’re only bound to get even more of that.

    • KaoruAoiShiho 6 hours ago

      The copying here probably goes to strawberry from o1 which is like at least 6 months but maybe copying efforts started even earlier.

    • lysace 6 hours ago

      Isn't the reasoning thing essentially a bolt-on to existing trained models? Like basically a meta-prompt?

      • azinman2 6 hours ago

        No.

        DeepSeek and now related projects have shown it’s possible to add reasoning via SFT to existing models, but that’s not the same as a prompt. But if you look at R1 they do a blend of techniques to get reasoning.

        For Anthropic to have a hybrid model where you can control this, it will have to be built into the model directly in its training and probably architecture as well.

        If you’re a competent company filled with the best AI minds and a frontier model, you’re not just purely copying… you’re taking ideas while innovating and adapting.

      • Philpax 6 hours ago

        The fundamental innovation is training the model to reason through reinforcement learning; you can train existing models with traces from these reasoning models to get you within the same ballpark, but taking it further requires you to do RL yourself.

      • pertymcpert 6 hours ago

        Somewhat but not exactly? I think the models need to be trained to think.

    • riku_iki 4 hours ago

      > for models that themselves take many weeks if not longer to train.

      they all have foundational heavy-trained model, and then they can do follow up experimental training much faster.

  • mechagodzilla 7 hours ago

    It does seem like it will be very, very hard for the companies training their own models to recoup their investment when the capabilities of open-weight models catch up so quickly - general purpose LLMs just seem destined to be a cheap commodity.

    • jsheard 7 hours ago

      Well, the companies releasing open weights also need to recoup their investments at some point, they can't coast on VC hype forever. Huge models don't grow on trees.

      • mechagodzilla 6 hours ago

        Or, like Meta, they make their money elsewhere and just seem interested in wrecking the economics of LLMs. As soon as an open-weight model is released, it basically sets a global floor that says "Models with similar or worse performance effectively have zero value," and that floor has been rising incredibly quickly. I'd be surprised if the vast, vast majority of queries ChatGPT gets couldn't get equivalently good results from llama3/deepseek/qwen/mistral models, even for those paying for the pro versions.

        • Philpax 6 hours ago

          Eh, to some extent - there's still a pretty significant cost to actually running inference for those models. For example, no consumer can run DeepSeek v3/r1 - that requires tens, possibly hundreds, of thousands of dollars of hardware to run.

          There's still room for other models, especially if they have different performance characteristics that make them suitable to run under consumer constraints. Mistral has been doing quite well here.

          • mechagodzilla 6 hours ago

            If you don't need to pay for the model development costs, I think running inference will just be driven down to the underlying cloud computing costs. The actual requirement to passably (~4-bit quantization) run Deepseek v3/r1 at home is really just having 512GB or so of RAM - I bought a used dual-socket xeon for $2k that has 768GB of RAM, and can run Deepseek R1 at 1-1.5 tokens/sec, which is perfectly usable for "ask a complicated question, come back an hour or so later and check on the result".

        • riku_iki 4 hours ago

          I think Meta folks just don't know how to come to this market and build something potentially profitable, and doing random stuff, because need to report some results to management.

  • luma 6 hours ago

    Where RL can play into post training there's something of an anti-moat. Maybe a "tow rope"?

    Let's say OAI releases some great new model. The moment it becomes available via API, everyone else can make use of that model to create high-quality RL training data, which can then be used to make their models perform better.

    The very act of making an AI model commercially available is the same act which allows your competitors to pull themselves closer to you.

smusamashah 3 hours ago

> output limit of 128K tokens

Is this limit on thinking mode only? Or does normal mode have same limit now? 8192 tokens output limit can be bit small these days.

I was trying to extract all urls along with their topics from a "what are you working on" HN thread. And 8192 token limit couldn't cover it.

bnc319 7 hours ago

Pretty amazing how DeepSeek started the visual reasoning trend, xAI featured it in their latest release, and now Anthropic does the same.

  • anjel 7 hours ago

    I took DS visual reasoning to be an elegant misdirect from how much slower DS returns your query's output.

    • nomel an hour ago

      I thought my internet cut out the first time I used o1.

danieldevries 3 hours ago

Just tried Claude code. First impressions, it seems rather expensive. I prefer how Aider allows finer control over which files to add, or to use a sub-tree of a git repo. Also, It feels like the API calls when using Claude code are much faster then when using 3.7 on Aider. Giving bandwidth priority?

falcor84 5 hours ago

Why can't they count to 4?

I accepted it when Knuth did it with TeX's versioning. And I sort of accept it with Python (after the 2-3 transition fiasco), but this is getting annoying. Why not just use natural numbers for major releases?

  • jjice 5 hours ago

    I think I heard on a podcast with some of their team that they want 4 to be a massive jump. If I recall, they said that they want Haiku (the smallest of their current gen models) to be as good as Opus (the highest version, although there isn't one in the 3.5+ line) of the previous generation.

  • sensanaty 4 hours ago

    You'd think all these companies would have a single good naming convention, amazingly they don't. I suspect it's half on purpose so they can nerf the models without anyone suspecting once the hype dies down, since with every one of these models the latter version of the "same" version is worse than the launch version

pcwelder 5 hours ago

Claude code terminal ux feels great.

It has some well thought out features like restarting conversation with compressed context.

Great work guys.

However, I did get stuck when I asked it to run `npm create vite@latest todo-app` because it needs interactivity.

anonzzzies 6 hours ago

We have used claude almost exclusively since 3.5 ; we regularly run our internal benchmark (coding) against others, but it's mostly just a waste of time and money. Will be testing 3.7 the coming days to see how it stacks up!

highfrequency 5 hours ago

Awesome work. When CoT is enabled in Claude 3.7 (not the new Claude Code), is the model now able to compile and run code as part of its thought process? This always seemed like very low hanging fruit to me, given how common this pattern is: ask for code, try running it, get an error (often from an outdated API in one of the packages used), paste the error back to Claude, have Claude immediately fix it. Surely this could be wrapped into the reasoning iterations?

leyoDeLionKin 4 hours ago

I cancelled after I hit the limit, plus you have very limited support here in europe

cavisne 3 hours ago

So far Claude Code seems very capable, it oneshotted something I couldnt get to work in cursor at all.

However its expensive, 5m of work cost ~$1 which.

  • biker142541 an hour ago

    Likewise, tried a couple basic things and nearly at $1 already. I can see this adding up fast, per the blog post's fair warning below. Coming from Cursor, I'm a bit scared to even try to compare workflows...

    >Claude Code consumes tokens for each interaction. Typical usage costs range from $5-10 per developer per day, but can exceed $100 per hour during intensive use.

Flux159 7 hours ago

It's interesting that Anthropic is making their own coding agent with Claude Code - is this a sign of them looking to move up the stack and more into verticals that model wrapper startups are in?

  • vessenes 6 hours ago

    This makes sense to me: sell razor blades. Presumably Claude has a large developer distribution channel so they will keep eyeballing what to ‘give away’ that turns the dials on inference billing.

    I’d guess this will keep raising the bar for paid or open source competitors, so probably good for end users esp given they aren’t a monopoly by any means.

  • madduci 7 hours ago

    GitHub copilot has now introduced Claude as model as well

ginkgotree 4 hours ago

Been using 3.5 sonnet for a mobile app build the past month. Havent had much time to get a good sense of 3.7 improvements, but I have to say the dev experience improvement of Claude Code right in my shell is fantastic. Loving it so far

unsupp0rted 3 hours ago

Anybody else noticing that in Cursor, Claude Sonnet 3.7 is thinking much slower than Claude Sonnet 3.5 did?

  • nomel an hour ago

    Claude 3.5 was not a thinking model. It's thinking time was 0s.

    • unsupp0rted an hour ago

      Okay, if we're being pedantic, then anybody notice 3.7 (not 3.7 thinking) is slower to respond and slower to make code changes than 3.5 was?

bittermandel 4 hours ago

Claude Code works pretty OK so far, but Bash doesn't work straight up. Just sits and waits, even when running something basic like "!echo 123".

gigatexal 3 hours ago

How is the code generation? Open ai was generating good looking terraform but it was hallucinating on things that were incorrect.

g8oz 5 hours ago

Congratulations on the release! While team members are monitoring this discussion let me add that a relatively simple improvement I’d like to see in the UI is the ability to export a chat to markdown or XML.

dev0p 3 hours ago

The quality of the code is so much better!

The UI seems to have an issue with big artifacts but the model is noticeably smarter.

Congratulations on the release!

ungreased0675 7 hours ago

Awesome. Claude is significantly better than other models at code assistant tasks, or at least in the way I use it.

  • jasondigitized 5 hours ago

    Totally agree. I continue to be blown away at how good it is at understanding, explaining, and writing code. Got an obscure error? Give Claude enough context and it is pretty dang good and getting you on glide slope.

jsemrau 5 hours ago

What I found one of the most interesting takeaways from Huggingface's GAIA is that the agent would provide better result when the agent "reasoned" the response to the task in code.

specto 4 hours ago

I've had a personal subscription to Claude for a while now. I would love if that also gave me access to some amount of API calls.

Alifatisk 6 hours ago

Why is Claude-3.5-Haiku considered PRO and Claude-3.7-Sonnet is for free users?

_joel 6 hours ago

I've been using 3.5 with Roocode for the past couple of weeks and I've found it really quite powerful. Making it write tests and run them as part of the flow is with vscode windows pinging about is neat too.

wewewedxfgdf 5 hours ago

What makes software "agentic" instead of just a computer program?

I hear lots of talk about agents and can't see them as being any different from an ordinary computer program.

  • dannyw 5 hours ago

    Computer programs generally don’t call functions non-deterministically, including choosing what functions to call , and when, at runtime.

ndm000 6 hours ago

Have there been any updates to Claude 3.5 Sonnet pricing? I can't find that anywhere even though Claude 3.7 Sonnet is now at the same price point. I could use 3.5 for a lot more if it's cheaper.

knes 5 hours ago

at Augment (https://augmentcode.com) we were one of the partner who tested 3.7 pre-launch. And it has been a pretty significant increase in quality and code understanding. Happy to answer some questions

FYI, We use Claude 3.7 has part of the new features we are shipping around Code Agent & more.

batterylake 6 hours ago

Hi Claude Code team, excited for the launch!

How well does Claude Code do on tasks which rely heavily on visual input such as frontend web dev or creating data visualizations?

  • wolffiex 6 hours ago

    As a CLI, this tool is most efficient when it can see text outputs from the commands that it runs. But you can help it with visual tasks by putting a screenshot file in your project directory and telling claude to read it, or by copying an image to your clipboard and pasting it with CTRL+V

siva7 6 hours ago

Will Claude Code also be available with Pro Subscription?

RomanPushkin 3 hours ago

> strong improvements in coding and front-end web development

The best part

msp26 6 hours ago

Does it show the raw "reasoning" tokens or is it a summary?

Edit: > we’ve decided to make its thought process visible in raw form.

koakuma-chan 6 hours ago

Where did 3.6 go?

  • danielbln 5 hours ago

    Allegedly many people called new newest 3.5 revision 3.6, so Anthropic just rolled with it and called this 3.7.

photon_collider 6 hours ago

Nice to see a new release from Anthropic. Yet, this only makes me even more curious of when we'll see a new Claude Opus model.

  • Alex-Programs 6 hours ago

    I doubt we will. The state of the art seem to have moved away from the GPT-4 style giant and slow models to smaller, more refined ones - though Groq might be a bit of a return to the "old ways"?

    Personally I'm hoping they update Haiku at some point. It's not quite good enough for translation at the moment, while Sonnet is pretty great and has OK latency (https://nuenki.app/blog/llm_translation_comparison)

  • bakugo 6 hours ago

    Funny enough, 3.7 Sonnet seems to think it's Opus right now:

    > "thinking": "I am Claude, an AI assistant created by Anthropic. I believe the specific model is Claude 3 Opus, which is Anthropic's most capable model at the time of my training. However, I should simply identify myself as Claude and not mention the specific model version unless explicitly asked for that level of detail."

grav 5 hours ago

Claude 3.7 Sonnet seems to have a context window of 64.000 via the API:

  max_tokens: 4242424242 > 64000, which is the maximum allowed number of output tokens for claude-3-7-sonnet-20250219
I got a max of 8192 with Claude 3.5 sonnet.
  • koakuma-chan 5 hours ago

    Context window is how long your prompt can be. Output tokens is how long its response can be. What you sent says its response can be 64k tokens at maximum.

ismaelvega 4 hours ago

Any plans to make some HackerRank Astra bench?

ein0p 2 hours ago

I wish Amodei didn't write that essay where he begged for export controls on China like that disabled corgi from a meme. I won't use anything Anthropic out of principle now. Compete fairly or die.

rs_rs_rs_rs_rs 7 hours ago

Hope it's worth the money because it's quite expensive.

wellthisisgreat 4 hours ago

What’s the privacy like for Claude Code? Is it memorizing all the codebase?

  • punkpeye 2 hours ago

    It is whatever their API privacy policy is, i.e. private by default.

syndicatedjelly 3 hours ago

Claude Code is pretty sick. I love the terminal integration, I like being able to stay on the keyboard and not have to switch UIs. It did a nice job learning my small Django codebase and helping me finish out a feature that I wasn't sure how to complete.

forrestthewoods 6 hours ago

Claude is the best example of benchmarks not being reflective of reality. All the AI labs are so focused on improving benchmark scores but when it comes to providing actual utility Claude has been the winner for quite some time.

Which isn’t to say that benchmarks aren’t useful. They surely are. But labs are clearly both overtraining and overindexing on benchmarks.

Coming from gamedev I’ve always been significantly more yolo trust your gut than my PhD co-workers. Yes data is good. But I think the industry would very often be better off trusting guts and not needing a big huge expensive UX study or benchmark to prove what you can plainly see.

waltercool 7 hours ago

Just like OpenAI or Grok, there is no transparency and no way for self-hosting purposes. Your input and confidential information can be collected for training purposes.

I just don't trust those companies when you use their servers. This is not a good approach to LLM democratization.

  • azinman2 6 hours ago

    I wouldn’t assume there’s no way to self host — it just costs a lot more than open weights.

    Anthropic claims they don’t train on their inputs. I haven’t seen any reason to disbelieve them.

    • waltercool 6 hours ago

      But there is no way to know if their claims are true either. Your inputs are processed into their servers, then you get a response. Whatever happens in the middle, only Anthropic knows. We don't even know of governments are actually pushing AI companies to enforce censorship or spying people, like we seen recently at UK government getting into Apple E2E encryption.

      This criticism is valid for the business who wants to use AI to improve coding, code analysis or code review, documentation, emails, etc, but also for that individual who don't want to rely on 3rd party companies for AI usage.

      • simonw 2 hours ago

        You can sign a contract with Anthropic that fully bakes their promise not to train on your input.

        You can also access Claude via both AWS Bedrock and Google Vertex, both of which come with very robust guarantees about how your data is used.

bbor 7 hours ago

  Just as humans use a single brain for both quick responses and deep reflection, we believe reasoning should be an integrated capability of frontier models rather than a separate model entirely.
Interesting. I've been working on exactly this for a bit over two years, and I wasn't surprised to see UAI finally getting traction from the biggest companies -- but how deep do they really take it...? I've taken this philosophy as an impetus to build an integrated system of interdependent hierarchical modules, much like Minsky's Society of Mind that's been popular in AI for decades. But this (short, blog) post reads like it's more of a behavioral goal than a design paradigm.

Anyone happen to have insight on the details here? Or, even better, anyone from Anthropic lurking in these comments that cares to give us some hints? I promise, I'm not a competitor!

Separately, the throwaway paragraph on alignment is worrying as hell, but that's nothing new. I maintain hope that Anthropic is keeping to their founding principles in private, and tracking more serious concerns than "unnecessary refusals" and prompt injection...

m3kw9 7 hours ago

Wonder if Aider will copy some of these features

ramesh31 4 hours ago

It would be reeeaaally nice if someone built Claude Code into a Cline/Aider type extension...

anti-soyboy 6 hours ago

OpenAI should be worried as they products are weak

nurettin 5 hours ago

What I love about their API is the tools array. Given a json schema describing your functions, it will output tool usage appropriate for the prompt. You can return tool results per call, and it will generate a dialog and additional tool calls based on those results.

simion314 6 hours ago

Why not accepting other payment methods like PayPal/venmo ? Steam, Netflix have developers managed to integrate those payment methods so I conclude that Anthropic,Google, MS, OpenAI don't really need the money from the user but just hunting from big investors.

newbie578 6 hours ago

Scary to watch the pace of progress and how the whole industry is rapidly shifting.

I honestly didn’t believe things would speed up this much.

ramesh31 6 hours ago

Well there goes my evening

shortrounddev2 6 hours ago

Does claude have a vscode plugin yet? I dropped github copilot because I didnt want so many subscriptions

dzhiurgis 6 hours ago

Anyone else noticed all the reasoning models kinda catch up on claude and claude itself turned to crap last week?

  • punkpeye 2 hours ago

    I have observed some unusual behavior.

    I wonder if it simply due to reprioritization of resources.

    Presumably, there is some parameter that determines how long a model is allowed to use resources for, which would get tapered in preparation for a demand surge of another model.

thanhhaimai 6 hours ago

> Third, in developing our reasoning models, we’ve optimized somewhat less for math and computer science competition problems, and instead shifted focus towards real-world tasks that better reflect how businesses actually use LLMs.

Company: we find that optimizing for LeetCode level programming is not a good use of resources, and we should be training AI less on competition problems.

Also Company: we hire SWEs based on how much time they trained themselves on LeetCode

/joke of course

  • nico 6 hours ago

    And it's also the reality of hiring practices for most VC-backed and public companies

    Some try to do something more like "real-world" tasks, but those end up either being either just toy problems, or long take homes

    Personally, I feel the most important things to prioritize when hiring are: is the candidate going to get along with their teammates (colleagues, boss, etc), and do they have the basic skills to relatively quickly learn their jobs once they start?

  • Svoka 6 hours ago

    My manager explained to me that LeetCode is proving that you are willing to dance the dance. Same as PhD requirements etc - you probably won't be doing anything related and definitely nothing related to LeetCode, but you display dedication and ability.

    I kinda agree that this is probably reason why companies are doing it. I don't like it, but this is besides the matter.

    Using Claude other models in interviews probably won't be allowed any time soon, but I do use it the work. So it does make sense.

EliasWatson 6 hours ago

I asked it for a self-portrait as a joke and the result is actually pretty impressive.

Prompt: "Draw a SVG self-portrait"

https://claude.site/artifacts/b10ef00f-87f6-4ce7-bc32-80b3ee...

For comparison, this is Sonnet 3.5's attempt: https://claude.site/artifacts/b3a93ba6-9e16-4293-8ad7-398a5e...

  • punkpeye 2 hours ago

    I kinda get how LLMs work with language, but it beyond blows me my mind trying to understand how an LLM can draw SVG. There are just so many dimensions to understanding how SVG converts to an image. Even as a human I don't think I could do anywhere close to that result in first attempt.

  • orangesun 6 hours ago

    New mascot! Just make it the Anthropic orange

TIPSIO 7 hours ago

"Make me a website about books. Make it look like a designer and agency made it. Use Tailwind."

https://play.tailwindcss.com/tp54wfmIlN

Getting way better at UI.

  • flir 7 hours ago

    That's not hideous. Derivative, but that's the nature of the beast.

  • handfuloflight 3 hours ago

    As a designer and agency... this is extremely basic... but so was the prompt.

  • punkpeye 2 hours ago

    I cannot believe that others just casually dismiss this as 'basic', when just a few years ago this would have taken someone a full day of work.

    • djeastm 7 minutes ago

      I mean, it is basic. Templates have been around for decades and this looks like a template from 2007 that someone filled in with their own copy. That might take like an hour, maybe? And presumably the person who wants the page done will have to customize this text, too.

  • jasonjmcghee 6 hours ago

    I feel like something isn't working... when i try to click anything it just reloads. i can't see the collections

TekMol 7 hours ago

[flagged]

  • jonas21 7 hours ago

    > Why would my phone number be any of their business?

    Preventing abuse? It's much harder to create a throwaway phone number than a throwaway email address.

    > OpenAI does the logical thing. Let's me enter my credit card and I'm good to go. I will stay with them.

    You'd rather hand over your credit card than your phone number? I think most people would see it the other way around.

    • rcstank 7 hours ago

      Credit card is easily changed. Phone number is much more difficult.

      • crazygringo 6 hours ago

        You credit card is also easily charged.

        Your phone number isn't.

        What is a company going to do with your phone number that you're worried about...?

      • whywhywhywhy 6 hours ago

        Not difficult at all for anyone actually wanting to abuse it

        • behnamoh 6 hours ago

          You shouldn't have to want to abuse something for it to be inappropriate to be asked.

    • pseudocomposer 6 hours ago

      Many credit card companies make it easy to generate one-off card numbers/“virtual cards” you can use to subscribe to services that are hard to cancel or otherwise questionable (so you can cancel just the card you used for that company).

    • brendoelfrendo 6 hours ago

      > You'd rather hand over your credit card than your phone number?

      You know, that was my first reaction, too. But really, my phone number is much more integral to my identity. I can cancel or change a credit card number pretty trivially and then it's useless to you.

  • rustc 6 hours ago

    > OpenAI does the logical thing. Let's me enter my credit card and I'm good to go. I will stay with them.

    When did you make your account? I could have sworn I had to verify with my phone number before payment.

  • I_am_tiberius 7 hours ago

    That's the reason I've not yet signed up. After trying to use some anonymous sms service (which failed), I'm still not having an account:).

    • pushcx 6 hours ago

      They block numbers from any provider considered to be voip.

  • alsodumb 6 hours ago

    Honestly I don't think Anthropic cares about you moving over to them - it's pretty evident that they already have more demand than they can handle.

    I've always had better experience with Claude in day-to-day coding and text writing, and looking at public forums that largely seems to be the case.

  • 42lux 6 hours ago

    Not for the api.

alecco 5 hours ago

Who do I have to kill to get Claude Code access?

  • xd1936 4 hours ago

    $ npm install -g @anthropic-ai/claude-code

    $ claude

frankfrank13 6 hours ago

Tried claude code, and have an empty unresponsive terminal.

Looks cool in the demo though, but not sure this is going to perform better than Cursor, and shipping this as an interactive CLI instead of an extension is... a choice

  • toddmorey 6 hours ago

    I think it's a smart starting point as it's compatible with all IDEs. Iterate and learn and then later wrap the functionality up into IDE plugins.