From NVIDIA’s newsletter:
The first CUDA 4.1 release candidate (RC1) is now available to GPU Computing Registered Developers.
This is a great opportunity to try the new compiler, enhanced libraries, and improved development tools we’ve added to the CUDA Toolkit for this release. We’re looking forward to hearing about your experiences (good and bad) so we can make CUDA 4.1 the best release yet!
Please log in to download your copy of CUDA Toolkit 4.1 and updated SDK code samples. If you encounter any problems, please use the Bug Report link in your registered developer account.
Feel free to share this opportunity with your colleagues. All they need is a free GPU Computing Registered Developer account, which they can sign up for at: https://registration.nvidia.com/Cuda.aspx
Parallel Nsight 2.1 RC1 is available as a separate file drop on the registered developer portal.
== Release Highlights ==
Try the new compiler!
* New LLVM-based compiler delivers up to 10% faster performance for many applications
New & Improved “drop-in” acceleration with GPU-Accelerated Libraries
* Over 1000 new image processing functions in the NPP library
* New cuSPARSE tri-diagonal solver up to 10x faster than MKL on a 6-core CPU
* New support in cuRAND for MRG32k3a and Mersenne Twister (MTGP11213)
* Bessel functions now supported in the CUDA standard Math library
* Up to 2x faster sparse matrix vector multiply using ELL hybrid format
* Learn more about all the great GPU-Accelerated Libraries at:
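To give a feel for two of the library additions, here is a minimal CUDA sketch (not NVIDIA's sample code) that asks the cuRAND host API for the MRG32k3a generator and then applies the Bessel function j0f() from the device math library. It assumes CUDA 4.1 or later and linking against cuRAND (e.g. `nvcc bessel_rng.cu -lcurand`); the file name, the helpers besselJ0() and runLibraryDemo(), the seed, and the scaling of samples onto (0, 10] are illustrative choices only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand.h>

// Bessel function of the first kind, order 0, applied element-wise.
// j0f() is the single-precision variant from the CUDA device math library.
__global__ void besselJ0(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = j0f(10.0f * in[i]);  // map uniform (0, 1] samples onto (0, 10]
    }
}

// Fills a device buffer with MRG32k3a uniforms, applies j0f(), and copies the
// first result back. Returns 0 on success, 1 on any CUDA or cuRAND error.
int runLibraryDemo(float* firstResult) {
    const int n = 1 << 20;
    float* d_in = nullptr;
    float* d_out = nullptr;
    curandGenerator_t gen = nullptr;
    int status = 1;

    if (cudaMalloc(&d_in, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess &&
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_MRG32K3A) == CURAND_STATUS_SUCCESS &&
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL) == CURAND_STATUS_SUCCESS &&
        curandGenerateUniform(gen, d_in, n) == CURAND_STATUS_SUCCESS) {
        besselJ0<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        // cudaMemcpy waits for the kernel to finish before copying.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(firstResult, d_out, sizeof(float),
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            status = 0;
        }
    }

    if (gen != nullptr) curandDestroyGenerator(gen);
    cudaFree(d_in);   // cudaFree(nullptr) is a harmless no-op
    cudaFree(d_out);
    return status;
}

int main() {
    float first = 0.0f;
    if (runLibraryDemo(&first) != 0) {
        std::fprintf(stderr, "cuRAND/Bessel demo failed\n");
        return 1;
    }
    std::printf("j0f of first scaled sample: %f\n", first);
    return 0;
}
```

Switching to the Mersenne Twister variant should only mean passing CURAND_RNG_PSEUDO_MTGP32 to curandCreateGenerator(); the rest of the host code stays the same.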
Enhanced & Redesigned Developer Tools
* Redesigned Visual Profiler with automated performance analysis and expert guidance
* CUDA-GDB support for debugging MPI applications, multi-context debugging, and assert() in device code
* CUDA-MEMCHECK now detects out-of-bounds accesses for memory allocated in device code
* Parallel Nsight 2.1 CUDA warp watch visualizes variables and expressions across an entire warp
* Parallel Nsight 2.1 CUDA profiler now analyzes kernel memory activities, execution stalls, and instruction throughput
* Learn more about debugging and performance analysis tools for GPU developers at:
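Device-side assert() is easiest to picture in a small kernel. The sketch below uses hypothetical helper names (scaleChecked(), runAssertDemo()) and assumes a GPU of compute capability 2.0 or later and a build without NDEBUG. Each thread checks an invariant before writing; if the check ever failed, the kernel would stop and the failure would surface on the host as an error from the next synchronizing call, such as cudaDeviceSynchronize().

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread verifies an invariant (inputs are non-negative in this demo)
// before scaling its element.
__global__ void scaleChecked(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        assert(data[i] >= 0.0f);
        data[i] *= factor;
    }
}

// Copies 0, 1, ..., n-1 to the device, doubles them there, and copies back.
// Returns 0 on success, 1 if any CUDA call (including a failed assert) reports an error.
int runAssertDemo(float* out, int n) {
    const size_t bytes = n * sizeof(float);
    float* d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) out[i] = static_cast<float>(i);

    cudaError_t err = cudaMemcpy(d, out, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scaleChecked<<<(n + 127) / 128, 128>>>(d, n, 2.0f);
        err = cudaGetLastError();  // launch-configuration errors
    }
    // A failed device-side assert is reported here.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) err = cudaMemcpy(out, d, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    float buf[256];
    if (runAssertDemo(buf, 256) != 0) {
        std::fprintf(stderr, "kernel reported an error\n");
        return 1;
    }
    std::printf("last element: %f\n", buf[255]);
    return 0;
}
```

Feeding a negative value into the input buffer is a quick way to watch the assert fire, which is the situation the CUDA-GDB assert() support is aimed at.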
Advanced Programming Features
* Access to 3D surfaces and cube maps from device code
* Enhanced no-copy pinning of system memory, cudaHostRegister() alignment and size restrictions removed
* Peer-to-peer communication between processes
* Support in nvidia-smi for resetting a GPU without rebooting the system
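The cudaHostRegister() bullet is about pinning memory that was allocated with plain malloc() so the GPU can read and write it in place, without an explicit copy. Here is a minimal sketch, assuming CUDA 4.1 or later (where the notes list changes to cudaHostRegister() alignment and size handling) and a device that supports mapped host memory. The buffer deliberately comes from malloc() with a size that is not a multiple of the page size; addOne() and runHostRegisterDemo() are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Increments every element; the pointer refers to registered host memory.
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Registers an ordinary malloc() buffer, lets a kernel update it in place,
// and reports the last element. Returns 0 on success, 1 on failure.
int runHostRegisterDemo(int n, int* lastValue) {
    // Mapped host memory requires this flag before the context is created;
    // the call fails harmlessly if a context already exists.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const size_t bytes = n * sizeof(int);
    int* host = static_cast<int*>(std::malloc(bytes));  // not page-aligned
    if (host == nullptr) return 1;
    for (int i = 0; i < n; ++i) host[i] = i;

    int status = 1;
    if (cudaHostRegister(host, bytes, cudaHostRegisterMapped) == cudaSuccess) {
        int* dev = nullptr;
        if (cudaHostGetDevicePointer(reinterpret_cast<void**>(&dev), host, 0) == cudaSuccess) {
            addOne<<<(n + 255) / 256, 256>>>(dev, n);
            // After synchronization the kernel's writes are visible in `host`.
            if (cudaDeviceSynchronize() == cudaSuccess) {
                *lastValue = host[n - 1];
                status = 0;
            }
        }
        cudaHostUnregister(host);
    }
    std::free(host);
    return status;
}

int main() {
    int last = 0;
    if (runHostRegisterDemo(1000, &last) != 0) {
        std::fprintf(stderr, "cudaHostRegister demo failed\n");
        return 1;
    }
    std::printf("last element after kernel: %d\n", last);  // 999 + 1
    return 0;
}
```

The appeal of this "no-copy" path is that existing host allocations can be handed to the GPU without changing how they were allocated.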
New & Improved SDK Code Samples
* SimpleP2P sample now supports peer-to-peer communication with any Fermi GPU
* New grabcutNPP sample demonstrates interactive foreground extraction using iterated graph cuts
* New samples showing how to implement the Horn-Schunck Method for optical flow, perform volume filtering, and read cube map textures
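For the peer-to-peer items, a rough host-side sketch in the spirit of the simpleP2P sample (not its actual code) copies a buffer directly between two GPUs within one process using cudaDeviceCanAccessPeer(), cudaDeviceEnablePeerAccess(), and cudaMemcpyPeer(). It assumes a 64-bit system with at least two P2P-capable GPUs and simply reports a skip otherwise; runPeerCopyDemo() is a hypothetical name. Peer-to-peer between separate processes uses a different API and is not covered by this sketch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Copies 1 MiB from device 0 to device 1. Sets *ran to false (and returns 0)
// when the system lacks two GPUs with peer access; otherwise returns 0 on
// success and 1 on failure.
int runPeerCopyDemo(bool* ran) {
    *ran = false;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return 0;

    int canAccess = 0;
    if (cudaDeviceCanAccessPeer(&canAccess, 0, 1) != cudaSuccess || !canAccess) return 0;

    const size_t bytes = 1 << 20;
    void* buf0 = nullptr;
    void* buf1 = nullptr;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // let device 0 access device 1 directly
    cudaMalloc(&buf0, bytes);
    cudaMemset(buf0, 0, bytes);

    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    // Device-to-device copy addressed by (pointer, device) pairs.
    cudaError_t err = cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    *ran = true;
    return err == cudaSuccess ? 0 : 1;
}

int main() {
    bool ran = false;
    int rc = runPeerCopyDemo(&ran);
    if (!ran) {
        std::printf("skipped: need two peer-capable GPUs\n");
        return 0;
    }
    std::printf(rc == 0 ? "peer copy succeeded\n" : "peer copy failed\n");
    return rc;
}
```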
Please refer to the Release Notes and Getting Started Guides for more information.