Major update to MatMul and MatMul-using operations; significant performance increases in multiple parts of the codebase.
Codebase simplifications and refactors in many areas.
Bugfixes
What's Changed
Add more ops: Sigmoid, (Two)MatVecAdd. Faster TwoMatVec. by @veluca93 in https://github.com/google/gemma.cpp/pull/129
Improve weight handling. by @veluca93 in https://github.com/google/gemma.cpp/pull/130
Remove unused includes by @copybara-service in https://github.com/google/gemma.cpp/pull/132
Add a benchmark and additional tests. by @veluca93 in https://github.com/google/gemma.cpp/pull/131
Adding Griffin implementation. by @pculliton in https://github.com/google/gemma.cpp/pull/136
Change NumGemmaLayers and NumGriffinLayers to constants in configs by @ufownl in https://github.com/google/gemma.cpp/pull/139
Mention Makefile contributed by @jart by @copybara-service in https://github.com/google/gemma.cpp/pull/141
Refactor data structures to reduce memory usage by @ufownl in https://github.com/google/gemma.cpp/pull/142
Added the ability to store layer activation outputs. by @atorero in https://github.com/google/gemma.cpp/pull/145
Further improve IO, enable multiple backends without -D. by @copybara-service in https://github.com/google/gemma.cpp/pull/148
Use a lambda to split the function and allow stream_token to break out of prefill by @zeerd in https://github.com/google/gemma.cpp/pull/156
Simplify prefill early-exit (originally Merge #156) by @copybara-service in https://github.com/google/gemma.cpp/pull/158
Fix underflow in NUQ ClusterCost() by @copybara-service in https://github.com/google/gemma.cpp/pull/162
Add error-checking for py binding, add missing include+hwasan check by @copybara-service in https://github.com/google/gemma.cpp/pull/163
Simplify threading: remove the use of inner_pool. by @szabadka in https://github.com/google/gemma.cpp/pull/167
Use more parallelism in the QKV projections in MQA mode. by @szabadka in https://github.com/google/gemma.cpp/pull/170
Fix kv offset computation for MHA config. by @szabadka in https://github.com/google/gemma.cpp/pull/172
Use more parallelism in the final output of the attention block. by @szabadka in https://github.com/google/gemma.cpp/pull/175
Use more parallelism in the QKV projections of the MHA block. by @szabadka in https://github.com/google/gemma.cpp/pull/176
Factor out deinterleaving of bf16 vectors for MatVecs. by @samkaufman in https://github.com/google/gemma.cpp/pull/166
Use more parallelism in attention block in prefill mode. by @szabadka in https://github.com/google/gemma.cpp/pull/177
Work with cmake install by @xinpingwang in https://github.com/google/gemma.cpp/pull/169
2x speedup of SFP decode (1.4x overall) on AVX3_DL+. by @copybara-service in https://github.com/google/gemma.cpp/pull/178
Support additional scaling by @copybara-service in https://github.com/google/gemma.cpp/pull/181
Store tokens/sec in auxiliary struct TimingInfo. by @copybara-service in https://github.com/google/gemma.cpp/pull/183
Add TTFT to TimingInfo by @copybara-service in https://github.com/google/gemma.cpp/pull/186
Make BlobWriter::Add() accept const void* by @copybara-service in https://github.com/google/gemma.cpp/pull/188
Adds Kaggle testing to CI workflow by @pculliton in https://github.com/google/gemma.cpp/pull/189
Fix normalization in Softmax function. by @szabadka in https://github.com/google/gemma.cpp/pull/194
Clarified README by @zond in https://github.com/google/gemma.cpp/pull/137
Unrolled / tiled 4x4 MatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/199
Refactor GemmaImpl dispatch to use Highway 1.2's HWY_DYNAMIC_DISPATCH_T by @copybara-service in https://github.com/google/gemma.cpp/pull/202
Add first version of backpropagation support. by @szabadka in https://github.com/google/gemma.cpp/pull/203
Fix for GenerateZeroMat call in TestTiledMatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/206
Remove no longer required stats.h - use Highway version instead by @copybara-service in https://github.com/google/gemma.cpp/pull/208
Simplifications: remove GemmaInterface and GemmaImpl by @copybara-service in https://github.com/google/gemma.cpp/pull/209
Implement mixed mode matmul: f32 * bf16 by @copybara-service in https://github.com/google/gemma.cpp/pull/210
Fix Softmax on SVE by @copybara-service in https://github.com/google/gemma.cpp/pull/213
Fix fix for weight type define, refs #198 by @copybara-service in https://github.com/google/gemma.cpp/pull/216
Add Adam optimizer. by @szabadka in https://github.com/google/gemma.cpp/pull/212
Add support for custom sampling function to runtime config. by @szabadka in https://github.com/google/gemma.cpp/pull/217
Shifting large matrix init to heap in ops_test.cc by @copybara-service in https://github.com/google/gemma.cpp/pull/220
Add CPU output, error if not C++17, simplify tokenizer ctor by @copybara-service in https://github.com/google/gemma.cpp/pull/222
Use CompressedWeights<TConfig> in backpropagation. by @szabadka in https://github.com/google/gemma.cpp/pull/224
Update benchmark with internal init by @copybara-service in https://github.com/google/gemma.cpp/pull/225
Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc by @copybara-service in https://github.com/google/gemma.cpp/pull/227
Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix. by @copybara-service in https://github.com/google/gemma.cpp/pull/231
Add benchmark dependency to cmake build. by @szabadka in https://github.com/google/gemma.cpp/pull/234
Fix numerical issue in Softcap by subtracting max. by @copybara-service in https://github.com/google/gemma.cpp/pull/236
Extends Transformer() to prepare for batched processing. by @copybara-service in https://github.com/google/gemma.cpp/pull/238
Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding. by @copybara-service in https://github.com/google/gemma.cpp/pull/239
Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32, by @copybara-service in https://github.com/google/gemma.cpp/pull/237
Increase parallelism in ops_test by @copybara-service in https://github.com/google/gemma.cpp/pull/233
Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time. by @copybara-service in https://github.com/google/gemma.cpp/pull/241
Reduce duplication in Config* by inheriting no-SSM by @copybara-service in https://github.com/google/gemma.cpp/pull/242
Major duplicated code reduction in test/benchmarks by @copybara-service in https://github.com/google/gemma.cpp/pull/240
Implement a missing (bf16, f32) tiled MatMul kernel. by @copybara-service in https://github.com/google/gemma.cpp/pull/245
Removed now redundant non-batch matmul by @copybara-service in https://github.com/google/gemma.cpp/pull/246
Integrate matmul into FFW: 4.3x prefill speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/243
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/244
Added bias vector addition to MatMul by @copybara-service in https://github.com/google/gemma.cpp/pull/247
Refactor CompressedWeights. by @copybara-service in https://github.com/google/gemma.cpp/pull/248
Fix DASSERT - TiledBatch requires at least 2 vectors. by @copybara-service in https://github.com/google/gemma.cpp/pull/253
Move raw_weights into separate header, used mainly by compress_weights. by @copybara-service in https://github.com/google/gemma.cpp/pull/249
Further simplification to ForEachTensor, thanks I.K. by @copybara-service in https://github.com/google/gemma.cpp/pull/254
Update developer docs and mention asan/msan by @copybara-service in https://github.com/google/gemma.cpp/pull/255
1.15x 7b sfp prefill speedup: Matmul in attention by @copybara-service in https://github.com/google/gemma.cpp/pull/256
Fix Py binding/run_example: use GemmaEnv by @copybara-service in https://github.com/google/gemma.cpp/pull/257
Simplify Attention. by @copybara-service in https://github.com/google/gemma.cpp/pull/258
Fix debug_prompt and other binaries (internal init) by @copybara-service in https://github.com/google/gemma.cpp/pull/259
Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly by @copybara-service in https://github.com/google/gemma.cpp/pull/260
Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions. by @copybara-service in https://github.com/google/gemma.cpp/pull/261
Move test placeholder to a later pos. by @copybara-service in https://github.com/google/gemma.cpp/pull/263
Code cleanup by @copybara-service in https://github.com/google/gemma.cpp/pull/264
Refactor kCachePosSize and kCacheLayerSize into separate functors. by @copybara-service in https://github.com/google/gemma.cpp/pull/262
Fixing two typos. by @copybara-service in https://github.com/google/gemma.cpp/pull/265
Fix compilation errors in clang by @ufownl in https://github.com/google/gemma.cpp/pull/267
Fix KV cache size calculation error by @ufownl in https://github.com/google/gemma.cpp/pull/266
Skip the last RMSNormInplaceBatched in the Prefill phase. by @copybara-service in https://github.com/google/gemma.cpp/pull/268
Improve logging when running Gemma examples: fix the issue where max_tokens, max_generated_tokens and temperature were logged without any trailing space/newline. by @copybara-service in https://github.com/google/gemma.cpp/pull/270
Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use. by @copybara-service in https://github.com/google/gemma.cpp/pull/251
Fix a clang tidy warning by @copybara-service in https://github.com/google/gemma.cpp/pull/271
Remove unused BUILD dependency by @copybara-service in https://github.com/google/gemma.cpp/pull/272
Refactor model type / training tables, simplify reverse mapping by @copybara-service in https://github.com/google/gemma.cpp/pull/273
Introduce new Gemma 9B and 27B configs by @copybara-service in https://github.com/google/gemma.cpp/pull/274
Add prompt batching to Gemma.cpp. by @copybara-service in https://github.com/google/gemma.cpp/pull/269
Add config for att/final cap, skip max-subtract. Fixes #278 by @copybara-service in https://github.com/google/gemma.cpp/pull/279
Declutter gemma/ directory, move binaries to evals/ and util/. by @copybara-service in https://github.com/google/gemma.cpp/pull/277
Remove unused kSystemPrompt by @copybara-service in https://github.com/google/gemma.cpp/pull/275
Use benchmark_helper in py bindings (adds BOS) by @copybara-service in https://github.com/google/gemma.cpp/pull/282
Cleanup: add ModelInfo struct, remove gcpp:: by @copybara-service in https://github.com/google/gemma.cpp/pull/281
Prep for sharding gemma.cc: split into kv_cache, tokenizer. by @copybara-service in https://github.com/google/gemma.cpp/pull/284
Add sliding window attention for Gemma 2. by @copybara-service in https://github.com/google/gemma.cpp/pull/280
Small cleanups. Fixes gemma_test build. by @copybara-service in https://github.com/google/gemma.cpp/pull/286
7x compile time speedup: shard gemma.cc by @copybara-service in https://github.com/google/gemma.cpp/pull/288
Fix gemma_test - moved to evals/. by @copybara-service in https://github.com/google/gemma.cpp/pull/289
Add Py bindings for weight compression by @copybara-service in https://github.com/google/gemma.cpp/pull/290
Cleanup: move util/compress and convert_weights to compression/ by @copybara-service in https://github.com/google/gemma.cpp/pull/291
Fix handling of %c and %q if eot_string. Fixes #283, thanks @ljcucc by @copybara-service in https://github.com/google/gemma.cpp/pull/292
Update gemma_test with the expected entropy values for the IT models of size 2B/7B/9B/27B. by @copybara-service in https://github.com/google/gemma.cpp/pull/294
Lint fix - string append, remove stale TODO by @copybara-service in https://github.com/google/gemma.cpp/pull/295
Update gemma_test to also pass for the v1.1 models. by @copybara-service in https://github.com/google/gemma.cpp/pull/296
Add more comments to attention computation (and some small restructuring). by @copybara-service in https://github.com/google/gemma.cpp/pull/298
Fix windows build: min conflict, unused VF by @copybara-service in https://github.com/google/gemma.cpp/pull/299
Refactor configurables. by @copybara-service in https://github.com/google/gemma.cpp/pull/297
Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing by @copybara-service in https://github.com/google/gemma.cpp/pull/303
Simplify matmul: only 2 overloads by @copybara-service in https://github.com/google/gemma.cpp/pull/304
SVE build fix: avoid capturing vectors directly. by @copybara-service in https://github.com/google/gemma.cpp/pull/305
Improve readability with RepeatedAttentionWindowSizes by @copybara-service in https://github.com/google/gemma.cpp/pull/302
Increase the prefill batch size to 64. by @copybara-service in https://github.com/google/gemma.cpp/pull/306
Fix gemma_cpp/examples/hello_world build. by @copybara-service in https://github.com/google/gemma.cpp/pull/307
Further 1.02x prefill speedup from batch 64->512 by @copybara-service in https://github.com/google/gemma.cpp/pull/308
Fix examples/hello_world for real. by @copybara-service in https://github.com/google/gemma.cpp/pull/309
Simplify FFW by using MatMul_4x4_Batch_Add. by @copybara-service in https://github.com/google/gemma.cpp/pull/311
De-templatize Activations, add RowVectorBatch class by @copybara-service in https://github.com/google/gemma.cpp/pull/310
Update gemma-27b to the correct query scaling. by @copybara-service in https://github.com/google/gemma.cpp/pull/312
Add scale parameter to MatMul. by @copybara-service in https://github.com/google/gemma.cpp/pull/313
Fix msan uninitialized scale by @copybara-service in https://github.com/google/gemma.cpp/pull/314
Major Prefill/Generate cleanup, 1.3x Prefill speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/315
Cleanup: add wrapper functions and rename vars to interleaved by @copybara-service in https://github.com/google/gemma.cpp/pull/316
Split up ops.h into ops/ops-inl and matmul-inl by @copybara-service in https://github.com/google/gemma.cpp/pull/317
Use all CPU sockets when pinning threads to cores by @copybara-service in https://github.com/google/gemma.cpp/pull/319
Fix msan uninitialized scale in optimize_test by @copybara-service in https://github.com/google/gemma.cpp/pull/320
Minor polishing: adding comments, renaming variables. by @copybara-service in https://github.com/google/gemma.cpp/pull/321
Fix setting scales in Py binding by @copybara-service in https://github.com/google/gemma.cpp/pull/322
Add offset arg to MatMul, rename, Matmul for logits = ~1.1x decode speedup by @copybara-service in https://github.com/google/gemma.cpp/pull/325
1.05x prefill speedup: matvec -> matmul for !MHA by @copybara-service in https://github.com/google/gemma.cpp/pull/327
Add Python code for converting Griffin Orbax weights. Refs #301 by @copybara-service in https://github.com/google/gemma.cpp/pull/329
MatMul cleanup: Mat struct, simplify args. by @copybara-service in https://github.com/google/gemma.cpp/pull/330
Fix Windows build - macro conflict with param name by @copybara-service in https://github.com/google/gemma.cpp/pull/331
Extend LayersOutputFunc to take query index and auxiliary int by @copybara-service in https://github.com/google/gemma.cpp/pull/328
Split matmul into matvec; add large matrix benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/333
Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/326
SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill by @copybara-service in https://github.com/google/gemma.cpp/pull/335
1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. by @copybara-service in https://github.com/google/gemma.cpp/pull/334
Improve performance logging by @copybara-service in https://github.com/google/gemma.cpp/pull/336
1.03-1.08x decode speedup: precompute Rope theta, fuse by @copybara-service in https://github.com/google/gemma.cpp/pull/339
Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. by @copybara-service in https://github.com/google/gemma.cpp/pull/342
Add pin flag to disable pinning. Refs #338 by @copybara-service in https://github.com/google/gemma.cpp/pull/343
1.3x prefill, 0.95x decode: matmul replacing last matvec by @copybara-service in https://github.com/google/gemma.cpp/pull/345
Fix gemma_test GeographyBatched for 2b-it and add entropy expectations for gemma2-2b-it. by @copybara-service in https://github.com/google/gemma.cpp/pull/346
0.98x prefill: refactor in prep for cache blocking. by @copybara-service in https://github.com/google/gemma.cpp/pull/347
Implement start_pos per query for batch interface (reopen) by @ufownl in https://github.com/google/gemma.cpp/pull/348
Simplify pos handling, auto-increment output arg by @copybara-service in https://github.com/google/gemma.cpp/pull/350
Support directly observing activations, partially replacing LayersOutputFunc by @copybara-service in https://github.com/google/gemma.cpp/pull/351
Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul by @copybara-service in https://github.com/google/gemma.cpp/pull/352
Expose underlying model configuration: number of layers, heads, etc. by @copybara-service in https://github.com/google/gemma.cpp/pull/354
VectorizedRopeAndMulBy. by @copybara-service in https://github.com/google/gemma.cpp/pull/355
Fix prefill for batched queries. by @copybara-service in https://github.com/google/gemma.cpp/pull/353
Vectorize Rope for qkv dim not evenly divisible by number of lanes. by @copybara-service in https://github.com/google/gemma.cpp/pull/356
Fix test for 2b - update prompt by @copybara-service in https://github.com/google/gemma.cpp/pull/358
Minor followup: remainder handling is a single iteration by @copybara-service in https://github.com/google/gemma.cpp/pull/359
Experiment with compensated dot product. by @copybara-service in https://github.com/google/gemma.cpp/pull/357
Avoid duplication of RMSNorm, support all activation/weight types by @copybara-service in https://github.com/google/gemma.cpp/pull/360
Demonstrate constrained decoding in gemma_cpp's hello world example by @copybara-service in https://github.com/google/gemma.cpp/pull/363
Add an additional QueryModel() overload to GemmaEnv. by @copybara-service in https://github.com/google/gemma.cpp/pull/362
Internal change. Slight restructuring of gemma_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/367
1.22x NUQ compress speedup, fix out of bounds access, improve numerics by @copybara-service in https://github.com/google/gemma.cpp/pull/366
Fix NUQ for SVE - incorrect nibble packing by @copybara-service in https://github.com/google/gemma.cpp/pull/368
Further nuq_test speedups to prevent timeout by @copybara-service in https://github.com/google/gemma.cpp/pull/371
Refactor/cleanup, remove even_odd by @copybara-service in https://github.com/google/gemma.cpp/pull/372
Minor cleanup/fixes: by @copybara-service in https://github.com/google/gemma.cpp/pull/375
Major compression update, arbitrary-len unpack + new Dot by @copybara-service in https://github.com/google/gemma.cpp/pull/374
Fix mismatch between blob_store and compress interfaces (bytes) by @copybara-service in https://github.com/google/gemma.cpp/pull/376
Adds insert_float() to SbsWriter() to store a float array directly. by @copybara-service in https://github.com/google/gemma.cpp/pull/378
Implement scalar version of LayerNorm by @copybara-service in https://github.com/google/gemma.cpp/pull/379
Add const batch accessor to RowVectorBatch. by @copybara-service in https://github.com/google/gemma.cpp/pull/381
Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes. by @copybara-service in https://github.com/google/gemma.cpp/pull/382
Add tests for SampleTopK that highlight existing problems and fix those: by @copybara-service in https://github.com/google/gemma.cpp/pull/383
Add pairwise sum dot products for testing by @copybara-service in https://github.com/google/gemma.cpp/pull/386
Fix warnings reported by Clang by @ufownl in https://github.com/google/gemma.cpp/pull/380
Cascaded summation for Softmax by @copybara-service in https://github.com/google/gemma.cpp/pull/388
Fix compress-inl bf16->f32 overrun by @copybara-service in https://github.com/google/gemma.cpp/pull/390
Fix topology display for platforms where it fails (Apple) by @copybara-service in https://github.com/google/gemma.cpp/pull/391
Update expected entropy values for GRIFFIN_2B model. by @copybara-service in https://github.com/google/gemma.cpp/pull/392
Add forward and backward error by @copybara-service in https://github.com/google/gemma.cpp/pull/389
Fix prefix-LM mode assertion by @ufownl in https://github.com/google/gemma.cpp/pull/394
Reduce flakiness of dot_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/396
1.6x speedup of MatMulSlow using compensated Dot by @copybara-service in https://github.com/google/gemma.cpp/pull/397
Add download location of Pali Gemma weights to README.md. by @copybara-service in https://github.com/google/gemma.cpp/pull/398
Tiny update of the README formatting. by @copybara-service in https://github.com/google/gemma.cpp/pull/399
Add double-precision dot variant by @copybara-service in https://github.com/google/gemma.cpp/pull/393
Use f64 Dot and sum in softmax - faster than Cascaded by @copybara-service in https://github.com/google/gemma.cpp/pull/400
1.09x decode speedup for topk=1/temp0: fuse softmax and sample by @copybara-service in https://github.com/google/gemma.cpp/pull/402
Rename one variable in SampleTopK and update TestSampleTopK. by @copybara-service in https://github.com/google/gemma.cpp/pull/404
Minor fix to profiler zone and add comment by @copybara-service in https://github.com/google/gemma.cpp/pull/407
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/408
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/377
Fix MSAN issue for multiturn. Rewind the prior EOS token. by @copybara-service in https://github.com/google/gemma.cpp/pull/412
Reduce number of operations in Gelu() by one Mul. by @copybara-service in https://github.com/google/gemma.cpp/pull/414
Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. by @copybara-service in https://github.com/google/gemma.cpp/pull/417
Update expected ranges in dot_test. by @copybara-service in https://github.com/google/gemma.cpp/pull/420
Remove unused "two-sizes" version of MulByConstAndAdd. by @copybara-service in https://github.com/google/gemma.cpp/pull/421
Benchmark gemma.cpp with different length inputs. by @copybara-service in https://github.com/google/gemma.cpp/pull/416
Fix PaliGemma model loading. by @copybara-service in https://github.com/google/gemma.cpp/pull/425
Fix compilation error of the weights compression tool by @ufownl in https://github.com/google/gemma.cpp/pull/422
Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. by @copybara-service in https://github.com/google/gemma.cpp/pull/419
Eliminated TConfig. by @copybara-service in https://github.com/google/gemma.cpp/pull/428
Fix PaliGemma's GenerateImageTokensT(). by @copybara-service in https://github.com/google/gemma.cpp/pull/430
Use NestedPools, add NUMA infra by @copybara-service in https://github.com/google/gemma.cpp/pull/427
Fix compilation errors of "compress_weights" target by @ufownl in https://github.com/google/gemma.cpp/pull/432
Add overloads of Image::ReadPPM method by @ufownl in https://github.com/google/gemma.cpp/pull/426
New blob_store_test, ensure ReadOne checks actual size against requested size by @copybara-service in https://github.com/google/gemma.cpp/pull/433
Add a compilation option to disable topology by @ufownl in https://github.com/google/gemma.cpp/pull/435
Serialization for class members for use with ModelConfig by @copybara-service in https://github.com/google/gemma.cpp/pull/436
Warning fixes (casts) and fix Windows build for aligned_alloc by @copybara-service in https://github.com/google/gemma.cpp/pull/437
Factor out addition of ViTConfig to a ModelConfig. by @copybara-service in https://github.com/google/gemma.cpp/pull/438
Simpler MatMul interface, vocab types, Tristate for use_spinning by @copybara-service in https://github.com/google/gemma.cpp/pull/442
Expose BlobReader::Keys() by @copybara-service in https://github.com/google/gemma.cpp/pull/443
Fix Griffin model: by @copybara-service in https://github.com/google/gemma.cpp/pull/444
Replace CLIF SbsWriter with pybind-based gcpp extension by @copybara-service in https://github.com/google/gemma.cpp/pull/445
Added a blob_compare tool that compares two sbs files that may have the blobs in a different order by @copybara-service in https://github.com/google/gemma.cpp/pull/448
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/450
Added pybind for configs. by @copybara-service in https://github.com/google/gemma.cpp/pull/449
Improved consistency of compressor API, and added a universal method with a target type arg. by @copybara-service in https://github.com/google/gemma.cpp/pull/452
Add a simple benchmark for batching. by @copybara-service in https://github.com/google/gemma.cpp/pull/453
Threading/infra improvements. by @copybara-service in https://github.com/google/gemma.cpp/pull/455
Print cache info and update Highway version for that by @copybara-service in https://github.com/google/gemma.cpp/pull/456
Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/457
Add support for 448px resolution to PaliGemma and PaliGemma2. by @copybara-service in https://github.com/google/gemma.cpp/pull/459
Tiny cleanup. by @copybara-service in https://github.com/google/gemma.cpp/pull/461
Refactor gemma/common.cc to improve readability and safety by @ericcurtin in https://github.com/google/gemma.cpp/pull/460
Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/462
Fix unhandled switch warning/error by @copybara-service in https://github.com/google/gemma.cpp/pull/463
Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. by @copybara-service in https://github.com/google/gemma.cpp/pull/454
Make prompt wrapping more consistent and fix duplicated tokens for multi-turn. by @copybara-service in https://github.com/google/gemma.cpp/pull/464
Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT by @copybara-service in https://github.com/google/gemma.cpp/pull/465
Rename ModelTraining to PromptWrapping which is a more accurate name. by @copybara-service in https://github.com/google/gemma.cpp/pull/466
Small updates to the README file. by @copybara-service in https://github.com/google/gemma.cpp/pull/467
Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/468
Added ability to load/save a complete model file, including tokenizer. by @copybara-service in https://github.com/google/gemma.cpp/pull/469
Moved the vit config fields to their own config struct by @copybara-service in https://github.com/google/gemma.cpp/pull/471
Allow interactive use with new single-file weight format. by @copybara-service in https://github.com/google/gemma.cpp/pull/472
Add the missing migrate_weights target for CMake by @ufownl in https://github.com/google/gemma.cpp/pull/473
Tiny fix: align template parameter order with parameter order. by @copybara-service in https://github.com/google/gemma.cpp/pull/476
Add parameter for base_frequency to CreateInvTimeScale(). by @copybara-service in https://github.com/google/gemma.cpp/pull/477
Infra improvements (2) by @copybara-service in https://github.com/google/gemma.cpp/pull/474
Internal change by @copybara-service in https://github.com/google/gemma.cpp/pull/478
Allow overriding num threads despite detecting topology by @copybara-service in https://github.com/google/gemma.cpp/pull/480
Assorted small cleanups. by @copybara-service in https://github.com/google/gemma.cpp/pull/482
Add python wrappers for configs and inference. by @copybara-service in https://github.com/google/gemma.cpp/pull/481
Simplified interface class and example for Gemma.cpp usage. by @copybara-service in https://github.com/google/gemma.cpp/pull/483
Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests. by @copybara-service in https://github.com/google/gemma.cpp/pull/484
Allow conversion, loading and inference with NUQ. by @copybara-service in https://github.com/google/gemma.cpp/pull/485
Improved blob diff: parallel, tolerance for float by @copybara-service in https://github.com/google/gemma.cpp/pull/489
Remove srcs_version and python_version attributes, as they already default to "PY3" by @copybara-service in https://github.com/google/gemma.cpp/pull/487
Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts by @copybara-service in https://github.com/google/gemma.cpp/pull/492
Add fork/join latency benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/496
Fix nuq Enc() to handle groups < kGroupSize. by @copybara-service in https://github.com/google/gemma.cpp/pull/497
Using TimingInfo methods and cleaning up args to DecodeStepT by @copybara-service in https://github.com/google/gemma.cpp/pull/499
Fix the link error when building compress_weights with Clang on macOS by @ufownl in https://github.com/google/gemma.cpp/pull/493
Add conversion tool for HF safetensors to gemma.cpp for PaliGemma. by @copybara-service in https://github.com/google/gemma.cpp/pull/498
Less verbose threading_test output, improve formatting. by @copybara-service in https://github.com/google/gemma.cpp/pull/500
Only temporarily enable spinning in threading benchmark by @copybara-service in https://github.com/google/gemma.cpp/pull/503
Implements FusedSoftmaxAndSampleTopK. by @copybara-service in https://github.com/google/gemma.cpp/pull/502
Use vectorized TopK using highway VQSelect by @copybara-service in https://github.com/google/gemma.cpp/pull/505
Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning by @copybara-service in https://github.com/google/gemma.cpp/pull/488
Support bf16 output of Matmul by @copybara-service in https://github.com/google/gemma.cpp/pull/511
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/514
Internal change. by @copybara-service in https://github.com/google/gemma.cpp/pull/515
Update github actions/cache version by @copybara-service in https://github.com/google/gemma.cpp/pull/517
Fix PaliGemma models. by @copybara-service in https://github.com/google/gemma.cpp/pull/519
New Contributors
@veluca93 made their first contribution in https://github.com/google/gemma.cpp/pull/129
@atorero made their first contribution in https://github.com/google/gemma.cpp/pull/145
@samkaufman made their first contribution in https://github.com/google/gemma.cpp/pull/166
@xinpingwang made their first contribution in https://github.com/google/gemma.cpp/pull/169
@zond made their first contribution in https://github.com/google/gemma.cpp/pull/137
@ericcurtin made their first contribution in https://github.com/google/gemma.cpp/pull/460
Full Changelog: https://github.com/google/gemma.cpp/compare/v0.1.2...v0.1.3