How to create a modern C# web API client: An example implementation of the C# SDK for Anthropic Claude

Yoshifumi Kawai
21 min read · Mar 24, 2024

Anthropic Claude 3, a recently emerged rising star among LLMs, has exceptionally high performance and surpasses GPT-4! I am greatly impressed by it. Therefore, I wanted to use it with C#, but since there was no SDK available, I created an unofficial one. The library is named Claudia, derived from Claude. It can be used across the .NET ecosystem, and I have confirmed its functionality in both Unity Runtime and Editor, so I believe it can be utilized in various ways depending on your ideas.

GitHub — Cysharp/Claudia

To give you an idea of what style of Web API SDK you can create in C#, please take a look at the Claudia usage example first.

The primary design principle in creating this SDK was to make it as similar as possible to the official Python SDK and TypeScript SDK. This is because the explanations in the documentation will be based on these official SDKs, and many articles in the world will also be based on the official SDKs. You may also want to use the official prompt library with API requests.

In such cases, a different API style forces the reader to translate between styles, which adds cognitive load. It may look like a trivial matter, but it is an important one and can become a stumbling block, so we remove it as thoroughly as possible. On top of that, it is important for the design to stay true to C# without forcibly introducing dynamic elements.

The appearance of the C# client looks like this:

// C#
using Claudia;

var anthropic = new Anthropic();

var message = await anthropic.Messages.CreateAsync(new()
{
Model = "claude-3-opus-20240229",
MaxTokens = 1024,
Messages = [new() { Role = "user", Content = "Hello, Claude" }]
});

Console.WriteLine(message);

For comparison, the TypeScript version looks like this:

// TypeScript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

const message = await anthropic.messages.create({
model: 'claude-3-opus-20240229',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Hello, Claude' }],
});

console.log(message.content);

They are quite similar, right? On top of that, the C# version doesn't use dynamic or Dictionary<string, object>; everything is specified with typed objects. The example above relies on target-typed new expressions (added in C# 9.0) and collection expressions (added in C# 12). The API is designed on the assumption that these features are available, so that it matches the official style nicely.

Often, APIs of dynamically typed languages appear (visually) simpler and easier to use, so being able to write with the same level of simplicity while being properly typed is a significant strength of modern C#. (The reason I decided to match the official TypeScript SDK in the first place was that I thought the API style of the official SDK was well-designed from my perspective; if it were terrible, I wouldn’t have attempted to match it.)

Streaming and Blazor

The Streaming API is also available, and when combined with Blazor, it’s easy to create a real-time updating Chat UI. The code is really just this, with the method body being just over 10 lines!

[Inject]
public required Anthropic Anthropic { get; init; }

double temperature = 1.0;
string textInput = "";
string systemInput = SystemPrompts.Claude3;
List<Message> chatMessages = new();

async Task SendClick()
{
chatMessages.Add(new() { Role = Roles.User, Content = textInput });

var stream = Anthropic.Messages.CreateStreamAsync(new()
{
Model = Models.Claude3Opus,
MaxTokens = 1024,
Temperature = temperature,
System = string.IsNullOrWhiteSpace(systemInput) ? null : systemInput,
Messages = chatMessages.ToArray()
});

var currentMessage = new Message { Role = Roles.Assistant, Content = "" };
chatMessages.Add(currentMessage);

textInput = "";
StateHasChanged();

await foreach (var messageStreamEvent in stream)
{
if (messageStreamEvent is ContentBlockDelta content)
{
currentMessage.Content[0].Text += content.Delta.Text;
StateHasChanged();
}
}
}

All request/response types are serializable with System.Text.Json.JsonSerializer, so serializing this List<Message> as-is will save it, and deserializing it will load it.

Function Calling

Claudia is not just an SDK that requests a REST API. It utilizes Source Generators to provide a mechanism for easily defining Function Calling.

What are the benefits of Function Calling? Currently, there are several things that LLMs can't do on their own. Calculation is one example: they often return plausible-looking answers, and although having them think step by step makes the answers more plausible, they still can't calculate reliably (given a complex calculation, they tend to produce an answer that looks correct but is wrong). In that case, whenever calculation is needed, you can simply hand it to a calculator and have the model write its sentences based on that answer. They also can't tell you the current date and time. If you ask them to summarize or translate a specified web page, they will say they can't see the contents. Function Calling solves these issues.

First, as an example, let’s define a function that returns a specified URL’s web page to Claude.

public static partial class FunctionTools
{
/// <summary>
/// Retrieves the HTML from the specified URL.
/// </summary>
/// <param name="url">The URL to retrieve the HTML from.</param>
[ClaudiaFunction]
static async Task<string> GetHtmlFromWeb(string url)
{
using var client = new HttpClient();
return await client.GetStringAsync(url);
}
}

For a function marked with [ClaudiaFunction], the Source Generator generates several supporting members. They are used as follows:

var input = new Message
{
Role = Roles.User,
Content = """
Could you summarize this page in three lines?
https://docs.anthropic.com/claude/docs/intro-to-claude
"""
};

var message = await anthropic.Messages.CreateAsync(new()
{
Model = Models.Claude3Haiku,
MaxTokens = 1024,
System = FunctionTools.SystemPrompt, // set generated prompt
StopSequences = [StopSequnces.CloseFunctionCalls], // set </function_calls> as stop sequence
Messages = [input],
});

var partialAssistantMessage = await FunctionTools.InvokeAsync(message);

var callResult = await anthropic.Messages.CreateAsync(new()
{
Model = Models.Claude3Haiku,
MaxTokens = 1024,
System = FunctionTools.SystemPrompt,
Messages = [
input,
new() { Role = Roles.Assistant, Content = partialAssistantMessage! } // set as Assistant
],
});

// The page can be summarized in three lines:
// 1. Claude is a family of large language models developed by Anthropic designed to revolutionize the way you interact with AI.
// 2. This documentation is designed to help you get the most out of Claude, with clear explanations, examples, best practices, and links to additional resources.
// 3. Claude excels at a wide variety of tasks involving language, reasoning, analysis, coding, and more, and the documentation covers key capabilities, getting started with prompting, and using the API.
Console.WriteLine(callResult);

Two requests are made to Claude. First, in the initial request to Claude, the question is sent along with a list and description of available functions. If it is determined that executing a function is optimal, the function name and parameters to be executed are returned. After that, executing the function locally and passing the result back to Claude yields the desired final result.

So what is the Source Generator doing? First, it generates FunctionTools.SystemPrompt that is passed to Claude's system text, and its contents are as follows (partially omitted).

<tools>
<tool_description>
<tool_name>GetHtmlFromWeb</tool_name>
<description>Retrieves the HTML from the specified URL.</description>
<parameters>
<parameter>
<name>url</name>
<type>string</type>
<description>The URL to retrieve the HTML from.</description>
</parameter>
</parameters>
</tool_description>
</tools>

It’s XML. Claude is designed to recognize XML tags, and using XML tags is considered a best practice when you want to provide clear information systematically. Therefore, it automatically generates XML to pass from C# functions to Claude. You wouldn’t want to write this by hand, would you?

Claude then returns a result like the following in response to that request.

<function_calls>
<invoke>
<tool_name>GetHtmlFromWeb</tool_name>
<parameters>
<url>https://docs.anthropic.com/claude/docs/intro-to-claude</url>
</parameters>
</invoke>

Again, it’s XML (the closing tag is missing because it’s stopped by StopSequences. No further information is needed if you want to call a function, so it’s cut off). The Source Generator generates the FunctionTools.InvokeAsync method to parse this, execute the function (GetHtmlFromWeb), and pass it to Claude. The actually generated InvokeAsync method looks like this:

public static async ValueTask<string?> InvokeAsync(MessageResponse message)
{
var content = message.Content.FirstOrDefault(x => x.Text != null);
if (content == null) return null;

var text = content.Text;
var tagStart = text.IndexOf("<function_calls>");
if (tagStart == -1) return null;

var functionCalls = text.Substring(tagStart) + "</function_calls>";
var xmlResult = XElement.Parse(functionCalls);

var sb = new StringBuilder();
sb.AppendLine(functionCalls);
sb.AppendLine("<function_results>");

foreach (var item in xmlResult.Elements("invoke"))
{
var name = (string)item.Element("tool_name")!;
switch (name)
{
case "GetHtmlFromWeb":
{
var parameters = item.Element("parameters")!;

var _0 = (string)parameters.Element("url")!;

BuildResult(sb, "GetHtmlFromWeb", await GetHtmlFromWeb(_0).ConfigureAwait(false));
break;
}

default:
break;
}
}

sb.Append("</function_results>"); // final assistant content cannot end with trailing whitespace

return sb.ToString();

static void BuildResult<T>(StringBuilder sb, string toolName, T result)
{
sb.AppendLine(@$" <result>
<tool_name>{toolName}</tool_name>
<stdout>{result}</stdout>
</result>");
}
}

You wouldn’t want to write this by hand. Especially as the number of functions you want to call increases, it becomes more and more difficult.

By invoking & generating XML and passing it back to Claude as the initial output result by the Assistant, you can obtain the desired answer. This technique is officially introduced as one of the best practices in Prefill Claude’s response and is beneficial for guiding Claude’s responses in the desired direction. For example, if you return { as a prefill response, the probability of Claude outputting the result as JSON increases dramatically.

vs Semantic Kernel

It seems that C# users, in particular, tend to utilize Semantic Kernel for everything, but the functionality of Semantic Kernel is a bit excessive. If you are a C# engineer, it’s better to handle data storage and many other features on your own.

The User Guides in Claude’s API documentation are clear and excellent. Regardless of the framework you go through, ultimately what gets executed is the Raw API. Instead of a generic abstraction, I think it’s good to focus specifically on Claude and consider how to leverage its distinctive XML-based instructions.

How to Create a Modern Web API Client

From here, we’ll discuss how to design a modern API client based on Claudia’s design.

First, use HttpClient as the communication foundation. It’s the only choice. There’s no room for objection. Even Grpc.Net.Client uses HttpClient for HTTP/2 gRPC communication. Like it or not, the foundation of all HTTP-based communication is HttpClient.

Here, it’s a good idea to allow accepting HttpMessageHandler from the outside.

public class Anthropic : IMessages, IDisposable
{
readonly HttpClient httpClient;

// Make it public to allow changes to DefaultRequestHeaders and BaseAddress
public HttpClient HttpClient => httpClient;

public Anthropic()
: this(new HttpClientHandler(), true)
{
}

public Anthropic(HttpMessageHandler handler)
: this(handler, true)
{
}

public Anthropic(HttpMessageHandler handler, bool disposeHandler)
{
this.httpClient = new HttpClient(handler, disposeHandler);
}

public void Dispose()
{
httpClient.Dispose();
}
}

HttpClient is actually just a shell; the real work is done by HttpMessageHandler. A handler can do various things. For example, you can derive from DelegatingHandler to hook processing before and after each request, and Cysharp/YetAnotherHttpHandler replaces the entire communication layer with a Rust implementation, exposed as an HttpMessageHandler. If you want to use UnityWebRequest instead of the .NET runtime's communication implementation in Unity, you can use UnityWebRequestHttpMessageHandler.cs to replace the entire communication layer with Unity's implementation.
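
As a rough illustration of the DelegatingHandler hook mentioned above, here is a minimal sketch. TimingHandler and StubHandler are hypothetical names invented for this example (they are not part of Claudia), and the stub stands in for the network so the code runs offline.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Wrap the innermost handler and hand the chain to HttpClient
// (or to any client constructor that accepts an HttpMessageHandler).
using var client = new HttpClient(new TimingHandler(new StubHandler()));
using var pingResponse = await client.GetAsync("http://localhost/ping");
Console.WriteLine((int)pingResponse.StatusCode);

// Hypothetical handler: measures each request and logs it to the console.
sealed class TimingHandler : DelegatingHandler
{
    public TimingHandler(HttpMessageHandler inner) : base(inner) { }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew(); // runs before the request is sent
        var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        // Runs after the response headers have come back.
        Console.WriteLine($"{request.Method} {request.RequestUri} -> {(int)response.StatusCode} in {stopwatch.ElapsedMilliseconds} ms");
        return response;
    }
}

// Hypothetical stand-in for the real network: always answers 200 OK.
sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
}
```

Because the Anthropic constructor shown earlier accepts an HttpMessageHandler, a chain like this could presumably be passed to it in the same way.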

Let’s also work on how to split the interfaces.

A two-level invocation style like client.Messages.CreateAsync, similar to .Controller.Method in MVC, is an intuitive and easy-to-use design. In particular, it works well with code completion. To achieve this, first split the interface, and then, as a trick, implement it explicitly on the client and expose it through a property that simply does return this;.

public interface IMessages
{
Task<MessageResponse> CreateAsync(MessageRequest request, RequestOptions? overrideOptions = null, CancellationToken cancellationToken = default);
IAsyncEnumerable<IMessageStreamEvent> CreateStreamAsync(MessageRequest request, RequestOptions? overrideOptions = null, CancellationToken cancellationToken = default);
}

public class Anthropic : IMessages, IDisposable
{
public IMessages Messages => this;

async Task<MessageResponse> IMessages.CreateAsync(MessageRequest request, RequestOptions? overrideOptions, CancellationToken cancellationToken)
{
// ...
}

async IAsyncEnumerable<IMessageStreamEvent> IMessages.CreateStreamAsync(MessageRequest request, RequestOptions? overrideOptions, [EnumeratorCancellation] CancellationToken cancellationToken)
{
// ...
}
}

This way, there's no allocation when going down one level (because it returns this), and since it's an explicit implementation, the methods don't appear in code completion at the top level. The result is easy to use, performant, and easy to implement (because you can directly access all the client's fields).

User-Friendly Request Type Generation

The Anthropic request types are well organized and friendly to typed languages, but some fields accept either a single string or an array of content blocks. Handling either/or is a bit troublesome, but modeling it with something like Option<Either<List<>>> would make the API client feel terrible to use. If you think about it, in the Anthropic API a string is equivalent to an array containing a single text content block.

// Instead of this
Content = [ new() { Type = "text", Text = "Hello, Claude" }]

// I want to write like this
Content = "Hello, Claude"

I think this is a good specification. It's tedious to dogmatically write Type = "text", Text = "...". 95% of the usage will probably be single string content (Type can also be image, in which case the binary base64 string is set in Source; it's an array to pass both images and text).

Let’s implement that specification in C#. In this case, it’s like normalizing, so I implemented it with implicit conversion.

public record class Message
{
/// <summary>
/// user or assistant.
/// </summary>
[JsonPropertyName("role")]
public required string Role { get; set; }

/// <summary>
/// single string or an array of content blocks.
/// </summary>
[JsonPropertyName("content")]
public required Contents Content { get; set; }
}

public class Contents : Collection<Content>
{
public static implicit operator Contents(string text)
{
var content = new Content
{
Type = ContentTypes.Text,
Text = text
};
return new Contents { content };
}
}

Instead of Content[], I made it a custom collection and generated single string content from its implicit conversion from a string. It's not even the latest C# feature, but a technique that has been around for a long time. Reckless use is strictly prohibited, but utilizing it in such places is effective for improving the feel of the API client.

Timeout

Timeouts are needed so commonly that the API client should make them easy for the user to configure. Since HttpClient has a Timeout property, setting it is usually sufficient. In Claudia, however, it is intentionally disabled.

public class Anthropic : IMessages, IDisposable
{
public TimeSpan Timeout { get; init; } = TimeSpan.FromMinutes(10);

public Anthropic(HttpMessageHandler handler, bool disposeHandler)
{
this.httpClient = new HttpClient(handler, disposeHandler);
this.httpClient.Timeout = System.Threading.Timeout.InfiniteTimeSpan;
}
}

This is because the official Anthropic client has a specification that allows overriding the timeout setting for each method call, so I followed that specification. HttpClient or equivalent calls should be thread-safe (in fact, API clients may be registered as Singleton), so it’s not good to manipulate the properties of HttpClient in SendAsync. Therefore, the Timeout of HttpClient is disabled and processed manually.

The implementation method is to generate a LinkedTokenSource, create a CancellationToken that gets canceled after the timeout duration using CancelAfter, and pass it to HttpClient.SendAsync. This is the same as the internal implementation when HttpClient.Timeout has a timeout duration.

// The actual code is mixed with retry processing, so it's slightly different
async Task<TResult> RequestWithAsync<TResult>(HttpRequestMessage message, CancellationToken cancellationToken, RequestOptions? overrideOptions)
{
var timeout = overrideOptions?.Timeout ?? Timeout;
using (var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken))
{
cts.CancelAfter(timeout);

try
{
// Pass the linked token (not the caller's token) so that the timeout set by CancelAfter takes effect
var result = await httpClient.SendAsync(message, HttpCompletionOption.ResponseHeadersRead, cts.Token).ConfigureAwait(ConfigureAwait);
return result;
}
catch (OperationCanceledException ex) when (ex.CancellationToken == cts.Token)
{
if (cancellationToken.IsCancellationRequested)
{
throw new OperationCanceledException(ex.Message, ex, cancellationToken);
}
else
{
throw new TimeoutException($"The request was canceled due to the configured Timeout of {timeout.TotalSeconds} seconds elapsing.", ex);
}
}
}
}

Be careful with error handling when cancellation actually occurs (OperationCanceledException is thrown). First, you need to strip the LinkedToken. If passed through as-is, the Token of OperationCanceledException remains the LinkedToken, but this cannot be used to determine the cause of cancellation on the upstream side. If the cause of cancellation was the cancellation of the passed CancellationToken, create a new OperationCanceledException and change the cancellation reason Token.

If it was a timeout, it’s better to throw a TimeoutException instead of an OperationCanceledException. Note that if you use the timeout implementation of HttpClient, it throws a TaskCanceledException due to historical reasons (apparently, they wanted to change it but couldn't due to compatibility; it's not a very good design, so you don't need to follow it).

Retry

There may be some debate as to whether the API client itself should have retry functionality. However, it’s not as simple as just catching an exception when it occurs and retrying; you need to first distinguish between what can be retried and what cannot. For example, if authentication fails or the JSON thrown into the request is corrupted, retrying is pointless no matter how many times you do it, so it shouldn’t be retried. However, since such detailed conditions are only known to the API client itself, it’s good to incorporate retry processing.

In Claudia, following the official client, 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are targeted for retry. Authentication failure (PermissionError(403)) or invalid request content (InvalidRequestError(400)) are not retried. The frequently occurring OverloadedError (error indicating that the result couldn’t be returned due to overload) is 529, which is resolved by hitting it a few times, so it’s retried.

The retry logic also follows the official client. If the response header has retry-after-ms or retry-after, it follows that, and if not (or if retry-after is larger than the specified value), the interval is controlled by Exponential Backoff with jitter.
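
The retry rules described above can be sketched roughly as follows. This is an illustration, not Claudia's implementation: RetryPolicy, the 0.5 second initial delay, the 8 second cap, and the jitter range are all assumptions made for the example, and reading the retry-after headers is left to the caller.

```csharp
using System;
using System.Net;

Console.WriteLine(RetryPolicy.ShouldRetry((HttpStatusCode)429));       // rate limited: retry
Console.WriteLine(RetryPolicy.ShouldRetry(HttpStatusCode.BadRequest)); // invalid request: do not retry
Console.WriteLine(RetryPolicy.ComputeDelay(attempt: 2, retryAfter: null));

// Hypothetical helper; names and constants are assumptions for illustration.
static class RetryPolicy
{
    static readonly TimeSpan InitialDelay = TimeSpan.FromSeconds(0.5);
    static readonly TimeSpan MaxDelay = TimeSpan.FromSeconds(8);

    // 408, 409, 429 and every status >= 500 (including 529 Overloaded) are retried.
    public static bool ShouldRetry(HttpStatusCode status)
    {
        var code = (int)status;
        return code == 408 || code == 409 || code == 429 || code >= 500;
    }

    // attempt is zero-based; retryAfter is the value the server sent, if any.
    public static TimeSpan ComputeDelay(int attempt, TimeSpan? retryAfter)
    {
        // Follow the server's hint when it is present and not larger than the cap.
        if (retryAfter is { } serverDelay && serverDelay > TimeSpan.Zero && serverDelay <= MaxDelay)
        {
            return serverDelay;
        }

        // Otherwise: exponential backoff, capped, then shortened by up to 25% of jitter.
        var backoffSeconds = Math.Min(InitialDelay.TotalSeconds * Math.Pow(2, attempt), MaxDelay.TotalSeconds);
        var jitter = 1.0 - Random.Shared.NextDouble() * 0.25;
        return TimeSpan.FromSeconds(backoffSeconds * jitter);
    }
}
```

The jitter keeps many clients that failed at the same moment from retrying in lockstep, which matters most for overload errors such as 529.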

Cancellation

The client side does not have a .Cancel() method or similar. This is because, in accordance with HttpClient, the client itself can be used almost like a singleton and shared across each call (it may be injected as a singleton by DI, depending on the case). Therefore, instead of .Cancel(), which affects everything, pass a CancellationToken to each call.
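
To make the per-call cancellation concrete, here is a small sketch using a plain HttpClient as the shared client. SlowStubHandler is a hypothetical stand-in for a slow server so that the example runs offline; the point is only that cancelling one token affects one call and leaves the others alone.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// One client instance shared by every call, as with a DI singleton.
using var sharedClient = new HttpClient(new SlowStubHandler());

using var cts = new CancellationTokenSource();
var canceledCall = sharedClient.GetAsync("http://localhost/a", cts.Token);
var otherCall = sharedClient.GetAsync("http://localhost/b", CancellationToken.None);

cts.Cancel(); // cancels only the first call

try
{
    await canceledCall;
}
catch (OperationCanceledException)
{
    Console.WriteLine("call A canceled");
}

// The second call is unaffected and completes normally.
Console.WriteLine((int)(await otherCall).StatusCode);

// Hypothetical stand-in for a slow server.
sealed class SlowStubHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(200, cancellationToken).ConfigureAwait(false);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}
```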

Ultra-Fast Parsing of Server Sent Events

The streaming API delivers its responses using the server-sent events specification. Specifically, text messages like the following are received.

event: message_start
data: {"type":"message_start","message":...}

event: content_block_start
data: {"type":"content_block_start","index":...}

It’s a repetition of event: event name, data: JSON, and so on. Now, when it comes to newline-delimited text messages, using StreamReader and ReadLine is the correct answer, but it’s the wrong answer in modern C#.

ReadLine generates a string. But both determining the event name and deserializing the JSON in data into an object can be done directly on the UTF-8 bytes, without going through strings. In other words, as long as you never pass through a string, you can aim for zero allocation (except for the objects that are handed to the user). Therefore, StreamReader has no role to play.

Let’s look at the specific code. We’ll divide it into the first half (preparation) and the second half (parsing part).

internal class StreamMessageReader
{
readonly PipeReader reader;
readonly bool configureAwait;
MessageStreamEventKind currentEvent;

public StreamMessageReader(Stream stream, bool configureAwait)
{
this.reader = PipeReader.Create(stream);
this.configureAwait = configureAwait;
}

public async IAsyncEnumerable<IMessageStreamEvent> ReadMessagesAsync([EnumeratorCancellation] CancellationToken cancellationToken)
{
READ_AGAIN:
var readResult = await reader.ReadAsync(cancellationToken).ConfigureAwait(configureAwait);

if (!(readResult.IsCompleted | readResult.IsCanceled))
{
var buffer = readResult.Buffer;

while (TryReadData(ref buffer, out var streamEvent))
{
yield return streamEvent;
if (streamEvent.TypeKind == MessageStreamEventKind.MessageStop)
{
yield break;
}
}

reader.AdvanceTo(buffer.Start, buffer.End);
goto READ_AGAIN;
}
}

First, pass the Stream to System.IO.Pipelines.PipeReader. The Stream in this case is an unstable Stream streamed from the server over the network, so buffer management is difficult. PipeReader/PipeWriter has some quirks, but it takes care of that management nicely and is a very important library in modern C#.

The basic flow is: read the buffer (ReadAsync), parse it line by line (TryReadData), and yield return an object whenever a complete event can be parsed (a line whose terminating newline has not arrived yet cannot be parsed). When the buffer runs out, mark how far it has been consumed with AdvanceTo, and then call ReadAsync again.

The user side was shown in the Blazor sample, but the basic approach is to enumerate with await foreach.

await foreach (var messageStreamEvent in Anthropic.Messages.CreateStreamAsync())
{
}

IAsyncEnumerable is very well-suited for streaming processing involving networks like this, and it has become much easier for the data source side to return an asynchronous sequence with yield return. It would be impossible to go back to the days when this didn’t exist.

Next is the second half, the processing to parse from the buffer decomposed by PipeReader.

[SkipLocalsInit]
bool TryReadData(ref ReadOnlySequence<byte> buffer, [NotNullWhen(true)] out IMessageStreamEvent? streamEvent)
{
var reader = new SequenceReader<byte>(buffer);
Span<byte> tempBytes = stackalloc byte[64]; // alloc temp

while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true))
{
if (line.Length == 0)
{
continue; // next.
}
else if (line.FirstSpan[0] == 'e') // event
{
// Parse Event.
if (!line.IsSingleSegment)
{
line.CopyTo(tempBytes);
}
var span = line.IsSingleSegment ? line.FirstSpan : tempBytes.Slice(0, (int)line.Length);

var first = span[7]; // "event: [c|m|p|e]"

if (first == 'c') // content_block_start/delta/stop
{
switch (span[23]) // event: content_block_..[]
{
case (byte)'a': // st[a]rt
currentEvent = MessageStreamEventKind.ContentBlockStart;
break;
case (byte)'o': // st[o]p
currentEvent = MessageStreamEventKind.ContentBlockStop;
break;
case (byte)'l': // de[l]ta
currentEvent = MessageStreamEventKind.ContentBlockDelta;
break;
default:
break;
}
}
else if (first == 'm') // message_start/delta/stop
{
switch (span[17]) // event: message_..[]
{
case (byte)'a': // st[a]rt
currentEvent = MessageStreamEventKind.MessageStart;
break;
case (byte)'o': // st[o]p
currentEvent = MessageStreamEventKind.MessageStop;
break;
case (byte)'l': // de[l]ta
currentEvent = MessageStreamEventKind.MessageDelta;
break;
default:
break;
}
}
else if (first == 'p')
{
currentEvent = MessageStreamEventKind.Ping;
}
else if (first == 'e')
{
currentEvent = (MessageStreamEventKind)(-1);
}
else
{
// Unknown Event, Skip.
// throw new InvalidOperationException("Unknown Event. Line:" + Encoding.UTF8.GetString(line.ToArray()));
currentEvent = (MessageStreamEventKind)(-2);
}

continue;
}
else if (line.FirstSpan[0] == 'd') // data
{
// Parse Data.
Utf8JsonReader jsonReader;
if (line.IsSingleSegment)
{
jsonReader = new Utf8JsonReader(line.FirstSpan.Slice(6)); // skip data:
}
else
{
jsonReader = new Utf8JsonReader(line.Slice(6)); // ReadOnlySequence.Slice is slightly slow
}

switch (currentEvent)
{
case MessageStreamEventKind.Ping:
streamEvent = JsonSerializer.Deserialize<Ping>(ref jsonReader, AnthropicJsonSerialzierContext.Default.Options)!;
break;
case MessageStreamEventKind.MessageStart:
streamEvent = JsonSerializer.Deserialize<MessageStart>(ref jsonReader, AnthropicJsonSerialzierContext.Default.Options)!;
break;
// Omitted (Deserialize<T> for MessageDelta, MessageStop, ContentBlockStart, ContentBlockDelta, ContentBlockStop, error similarly)
default:
// unknown event, skip
goto END;
}

buffer = buffer.Slice(reader.Consumed);
return true;
}
}
END:
streamEvent = default;
buffer = buffer.Slice(reader.Consumed);
return false;
}

The desired processing is to deserialize the JSON of data into an object from the two lines of event and data. The buffer doesn’t necessarily conveniently contain the two lines of event and data; it may contain only the event, only the data, or the data may be cut off (resulting in incomplete JSON). It needs to be structured so that it can be interrupted and resumed, taking these into consideration.

However, assuming that there is sufficient buffer for one line if a newline code exists, it loops with while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true)) and uses this as a substitute for StreamReader.ReadLine. This reader is a SequenceReader, a utility that supports reading from ReadOnlySequence, and since it's a ref struct, there's no allocation for the reader itself. ReadOnlySequence is a type with many pitfalls to use correctly and efficiently, so it's more convenient and safer to implement based on such utilities.
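
To see the TryReadTo loop in isolation, here is a minimal sketch that counts the complete lines in a buffer and reports how many bytes were consumed. LineSplitter is a hypothetical helper written for this example, and the sample input is a single-segment sequence that ends in the middle of a line, as a network read might.

```csharp
using System;
using System.Buffers;
using System.Text;

// Two complete lines, one blank separator line, and an unfinished third line.
var sample = new ReadOnlySequence<byte>(
    Encoding.UTF8.GetBytes("event: ping\ndata: {\"type\":\"ping\"}\n\nevent: mess"));

var consumed = LineSplitter.ReadCompleteLines(sample, out var count);
Console.WriteLine($"lines={count}, consumed={consumed}, remaining={sample.Length - consumed}");

// Hypothetical helper for illustration.
static class LineSplitter
{
    // Returns the number of bytes covered by complete lines; the rest must wait for more data.
    public static long ReadCompleteLines(ReadOnlySequence<byte> buffer, out int lineCount)
    {
        var reader = new SequenceReader<byte>(buffer); // ref struct: the reader itself is not heap-allocated
        lineCount = 0;

        // TryReadTo succeeds only when the '\n' delimiter is present,
        // so an unfinished trailing line is never handed out.
        while (reader.TryReadTo(out ReadOnlySequence<byte> line, (byte)'\n', advancePastDelimiter: true))
        {
            if (line.Length == 0) continue; // blank separator between events
            lineCount++;
        }

        return reader.Consumed;
    }
}
```

The unfinished "event: mess" tail is left unconsumed, which is exactly the part that AdvanceTo keeps in the pipe until the next read completes the line.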

First, in parsing the event, it reads from here what type the data is. The straightforward approach would be to determine with if (span.SequenceEqual("content_block_start")). Calling SequenceEqual on Span<byte> is implemented efficiently, so it's not bad, but is a series of if statements really good? So, in Claudia, the determination is actually simplified as follows.

var first = span[7]; // "event: [c|m|p|e]"

if (first == 'c') // content_block_start/delta/stop
{
switch (span[23]) // event: content_block_..[]
{
case (byte)'a': // st[a]rt
currentEvent = MessageStreamEventKind.ContentBlockStart;
break;
case (byte)'o': // st[o]p
currentEvent = MessageStreamEventKind.ContentBlockStop;
break;
case (byte)'l': // de[l]ta
currentEvent = MessageStreamEventKind.ContentBlockDelta;
break;
default:
break;
}
}
else if (first == 'm') // message_start/delta/stop
{
switch (span[17]) // event: message_..[]
{
case (byte)'a': // st[a]rt
currentEvent = MessageStreamEventKind.MessageStart;
break;
case (byte)'o': // st[o]p
currentEvent = MessageStreamEventKind.MessageStop;
break;
case (byte)'l': // de[l]ta
currentEvent = MessageStreamEventKind.MessageDelta;
break;
default:
break;
}
}

There are 8 types of messages: content_block_start/delta/stop, message_start/delta/stop, ping, and error. First, the first character can be used to determine whether it’s a content system, message system, or other. For start/delta/stop, the third character can be used to determine. So, by checking 1 byte twice, it can be classified. It’s clearly fast! However, it should be noted that there is a non-zero possibility of the check being broken by the addition of message types in the future (for example, if content_block_fforward comes, it may be misidentified as content_block_stop). Claudia is optimistically assuming it will be fine, but it’s something to keep in mind.

This can be said to be a variation of the code in Modern High-Performance C# 2023, which I presented before.

P.56-P.61, How parse NATS protocol effectively

When looking at text protocols, it's hard to resist the urge to somehow cheat the determination. If you want to do strict determination while avoiding a series of if statements, first put in a length check. Make a rough branch with the length and then do an accurate check with SequenceEqual. This is the same thing the C# compiler does when it optimizes a switch on a string (the compiler converts it to that kind of processing!). If there are many branches, it may be a good idea to take a hash code and branch on it, in other words, to implement an inline Dictionary.
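
A sketch of the strict variant described above, branching on length first and confirming with SequenceEqual. This is not Claudia's code: EventKind and EventClassifier are hypothetical names, the input is assumed to be the event name with the leading "event: " already sliced off, and the UTF-8 string literals require C# 11 or later.

```csharp
using System;

Console.WriteLine(EventClassifier.Classify("content_block_delta"u8));
Console.WriteLine(EventClassifier.Classify("content_block_fforward"u8)); // unknown names are not misclassified

// Hypothetical types for illustration.
enum EventKind
{
    Unknown, Ping, Error,
    MessageStart, MessageDelta, MessageStop,
    ContentBlockStart, ContentBlockDelta, ContentBlockStop
}

static class EventClassifier
{
    public static EventKind Classify(ReadOnlySpan<byte> name)
    {
        // Rough branch on length, then an exact byte comparison.
        switch (name.Length)
        {
            case 4:
                if (name.SequenceEqual("ping"u8)) return EventKind.Ping;
                break;
            case 5:
                if (name.SequenceEqual("error"u8)) return EventKind.Error;
                break;
            case 12:
                if (name.SequenceEqual("message_stop"u8)) return EventKind.MessageStop;
                break;
            case 13:
                if (name.SequenceEqual("message_start"u8)) return EventKind.MessageStart;
                if (name.SequenceEqual("message_delta"u8)) return EventKind.MessageDelta;
                break;
            case 18:
                if (name.SequenceEqual("content_block_stop"u8)) return EventKind.ContentBlockStop;
                break;
            case 19:
                if (name.SequenceEqual("content_block_start"u8)) return EventKind.ContentBlockStart;
                if (name.SequenceEqual("content_block_delta"u8)) return EventKind.ContentBlockDelta;
                break;
        }

        return EventKind.Unknown;
    }
}
```

Compared with the two single-byte checks, this costs at most two comparisons per event, and an unexpected name such as content_block_fforward falls through to Unknown.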

Lastly, the data line is JSON deserialization. To deserialize from ReadOnlySpan<byte> or ReadOnlySequence<byte>, you need to pass it through Utf8JsonReader. Note that Utf8JsonReader is also a ref struct, so creating it involves no heap allocation.
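
The following minimal sketch shows the data line being deserialized straight from UTF-8 bytes. PingEvent and SseJson are hypothetical stand-ins for the real event types, and for brevity the sketch uses the default reflection-based serializer options rather than a source-generated context.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

Console.WriteLine(SseJson.ParsePing("data: {\"type\":\"ping\"}"u8).Type);

// Hypothetical event type for illustration.
sealed class PingEvent
{
    [JsonPropertyName("type")]
    public string Type { get; set; } = "";
}

// Hypothetical helper for illustration.
static class SseJson
{
    public static PingEvent ParsePing(ReadOnlySpan<byte> line)
    {
        // Skip the 6-byte "data: " prefix and read the JSON directly from the bytes;
        // the line is never converted to a string.
        var reader = new Utf8JsonReader(line.Slice(6));
        return JsonSerializer.Deserialize<PingEvent>(ref reader)!;
    }
}
```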

With this, we were able to process without going through String at all! There’s a feeling that it would be super simple if we used StreamReader, but we can’t help it because we’re suffering from a disease that makes us think we’ve lost if we go through a string.

Source Generator vs Reflection

For the implementation of Function Calling, Claudia adopted Source Generator. It was possible to create it based on reflection, but in this case, Source Generators yielded more desirable results. First, let’s compare what kind of function definition would be required if it were implemented with reflection, using the case of Semantic Kernel.

public static partial class FunctionTools
{
// Claudia Source Generator

/// <summary>
/// Retrieve the current time of day in Hour-Minute-Second format for a specified time zone. Time zones should be written in standard formats such as UTC, US/Pacific, Europe/London.
/// </summary>
/// <param name="timeZone">The time zone to get the current time for, such as UTC, US/Pacific, Europe/London.</param>
[ClaudiaFunction]
public static string TimeOfDay(string timeZone)
{
var time = TimeZoneInfo.ConvertTimeBySystemTimeZoneId(DateTime.UtcNow, timeZone);
return time.ToString("HH:mm:ss");
}

// Semantic Kernel

[KernelFunction]
[Description("Retrieve the current time of day in Hour-Minute-Second format for a specified time zone. Time zones should be written in standard formats such as UTC, US/Pacific, Europe/London.")]
public static string TimeOfDay([Description("The time zone to get the current time for, such as UTC, US/Pacific, Europe/London.")]string timeZone)
{
var time = TimeZoneInfo.ConvertTimeBySystemTimeZoneId(DateTime.UtcNow, timeZone);
return time.ToString("HH:mm:ss");
}
}

In Function Calling, the information about the function must be given to Claude, so descriptions for both the method and parameters are required. In Claudia’s Source Generator implementation, I made it retrieve them from document comments. In Semantic Kernel, it retrieves them from the Description attribute. Document comments are more natural and easier to write. Attributes for parameters are not only harder to write but also become quite difficult to read when there are multiple parameters.
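
To make concrete what information is pulled out of a document comment, here is a rough sketch that reads the summary and the param descriptions from the comment's XML text using System.Xml.Linq. This is only an illustration under my own assumptions: DocCommentReader is a hypothetical helper, and Claudia's generator works on Roslyn syntax trivia rather than on a string like this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Claudia.
public static class DocCommentReader
{
    // Returns the <summary> text and a map of parameter name -> <param> description.
    public static (string Summary, Dictionary<string, string> Parameters) Read(string docCommentXml)
    {
        // A doc comment is a fragment with several top-level tags, so wrap it in one root.
        var root = XElement.Parse("<doc>" + docCommentXml + "</doc>");

        var summary = root.Element("summary")?.Value.Trim() ?? "";
        var parameters = root.Elements("param")
            .ToDictionary(
                p => (string?)p.Attribute("name") ?? "",
                p => p.Value.Trim());

        return (summary, parameters);
    }
}
```

The method description and one description per parameter are exactly the pieces that have to be handed to Claude along with the function name.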

Also, with Source Generators, the generator can act as an analyzer and turn missing elements into compile errors.

Problems such as a parameter without a document comment or an unsupported type are reported not only at compile time but also in real time while editing.

The drawback is that Source Generators have a higher implementation difficulty, and great care must be taken when using document comments.

To retrieve document comments in Roslyn, ISymbol.GetDocumentationCommentXml() is the easiest, but whether it can be retrieved or not depends on <GenerateDocumentationFile>. If it's false, it always returns null. That makes it too hard to use, so in Claudia, I tried to retrieve it from SyntaxNode, but that was also affected by <GenerateDocumentationFile>.

So, I had no choice but to prepare an extension method like the following to successfully retrieve document comments in all situations (it’s a bit difficult to handle because it’s based on Trivia, but it’s much better than not being able to retrieve it).

public static DocumentationCommentTriviaSyntax? GetDocumentationCommentTriviaSyntax(this SyntaxNode node)
{
if (node.SyntaxTree.Options.DocumentationMode == DocumentationMode.None)
{
var withDocumentationComment = node.SyntaxTree.Options.WithDocumentationMode(DocumentationMode.Parse);
var code = node.ToFullString();
var newTree = CSharpSyntaxTree.ParseText(code, (CSharpParseOptions)withDocumentationComment);
node = newTree.GetRoot();
}

foreach (var leadingTrivia in node.GetLeadingTrivia())
{
if (leadingTrivia.GetStructure() is DocumentationCommentTriviaSyntax structure)
{
return structure;
}
}

return null;
}

The state of DocumentationMode determines whether DocumentationCommentTriviaSyntax can be retrieved (it becomes None when GenerateDocumentationFile=false), so if it's None, the node is parsed again with DocumentationMode.Parse to retrieve it. Simply building a CSharpSyntaxTree from the existing SyntaxNode with new options does not parse it again, so changing DocumentationMode that way has no effect; that is why the node is converted to a string and passed to ParseText.

JSON Serializer

Requests and responses are JSON in today’s world. And the library to use is System.Text.Json.JsonSerializer, period. There is room for objection, but there isn’t. Like it or not, you have to use it now.

A feature of System.Text.Json is that it processes UTF-8 directly, so if you avoid going through strings as much as possible, you can expect high performance. To deserialize ReadOnlySequence<byte>, pass it through Utf8JsonReader (ReadOnlySpan<byte> can go through the reader too, or straight into JsonSerializer.Deserialize). The reader is a ref struct, so there's no allocation; just new it and use it. What about the writer? Utf8JsonWriter is a class. Why? So, for the writer, it depends on how the application is built: if you can hold it in a field and reuse it, do that (there's Reset), and if you can't, pull it from a [ThreadStatic] field.
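
The [ThreadStatic] variant could be sketched roughly as follows. PooledJsonWriter is a hypothetical helper written for this article, not code from Claudia, and it assumes Serialize is never re-entered on the same thread (for example from inside a custom converter).

```csharp
using System;
using System.Buffers;
using System.Text.Json;

// Hypothetical helper for illustration; not part of Claudia.
public static class PooledJsonWriter
{
    // One buffer and one writer per thread, created lazily and then reused.
    [ThreadStatic] private static ArrayBufferWriter<byte>? buffer;
    [ThreadStatic] private static Utf8JsonWriter? writer;

    public static byte[] Serialize<T>(T value, JsonSerializerOptions? options = null)
    {
        var b = buffer ??= new ArrayBufferWriter<byte>();
        var w = writer ??= new Utf8JsonWriter(b);

        b.Clear();   // discard bytes left over from the previous call
        w.Reset(b);  // reset the writer state so the instance can be reused

        JsonSerializer.Serialize(w, value, options);
        w.Flush();

        // Copy out the result; the pooled buffer will be overwritten by the next call.
        return b.WrittenSpan.ToArray();
    }
}
```

After the first call on a thread, the only allocation left per call is the returned byte array. One caveat of this design is that writer-level settings such as indentation are fixed when the Utf8JsonWriter is constructed.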

When providing it in a library, all the types to be used are known in advance, so using the System.Text.Json source generator should improve performance and increase AOT safety. Claudia does this as well.

[JsonSourceGenerationOptions(
GenerationMode = JsonSourceGenerationMode.Default,
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull,
WriteIndented = false)]
[JsonSerializable(typeof(MessageRequest))]
[JsonSerializable(typeof(Message))]
[JsonSerializable(typeof(Contents))]
[JsonSerializable(typeof(Content))]
[JsonSerializable(typeof(Metadata))]
[JsonSerializable(typeof(Source))]
[JsonSerializable(typeof(MessageResponse))]
[JsonSerializable(typeof(Usage))]
[JsonSerializable(typeof(ErrorResponseShape))]
[JsonSerializable(typeof(ErrorResponse))]
[JsonSerializable(typeof(Ping))]
[JsonSerializable(typeof(MessageStart))]
[JsonSerializable(typeof(MessageDelta))]
[JsonSerializable(typeof(MessageStop))]
[JsonSerializable(typeof(ContentBlockStart))]
[JsonSerializable(typeof(ContentBlockDelta))]
[JsonSerializable(typeof(ContentBlockStop))]
[JsonSerializable(typeof(MessageStartBody))]
[JsonSerializable(typeof(MessageDeltaBody))]
public partial class AnthropicJsonSerialzierContext : JsonSerializerContext
{
}

// When used internally, this JsonSerializerContext is always specified
JsonSerializer.SerializeToUtf8Bytes(request, AnthropicJsonSerialzierContext.Default.Options);

One thing I stumbled upon was that JsonIgnoreCondition.WhenWritingNull, which normally (reflection-based) worked for Nullable<T> as well, stopped working with Source Generators and no longer ignored null. I had no choice but to work around it by directly attaching [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)] to all Nullable<T> properties of the target types.

public record class MessageRequest
{
// ...

[JsonPropertyName("temperature")]
[JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingDefault)]
public double? Temperature { get; set; }
}

Honestly, I feel like it’s an implementation leak in the Source Generator version, but since I was able to work around it, I’ll just leave it for now…

Just as Azure OpenAI Service is to the OpenAI API, people on AWS may find it easier to use Claude through Amazon Bedrock. So I have already added Bedrock support! It should be even easier to use now.

Creating a Web API client is not that difficult. However, many SDKs out there are not designed to be easy to use. I hope this article helps you arrive at a better design.

Written by Yoshifumi Kawai

a.k.a. neuecc. Creator of UniRx, UniTask, MessagePack for C#, MagicOnion etc. Microsoft MVP for C#. CEO/CTO of Cysharp Inc. Live and work in Tokyo, Japan.
