NVIDIA Pascal architecture review: Nvidia has announced the first GPUs based on the Pascal architecture

The GeForce GTX 1080 Ti has 11 GB of GDDR5X memory running at an effective 11 Gbps, a 1582 MHz boost clock (overclockable to around 2000 MHz even on the stock cooler), and roughly 35% better performance than the GeForce GTX 1080. And all that at a price of $699.

The new graphics card displaces the GeForce GTX 1080 from its flagship position in the GeForce lineup and becomes the fastest graphics card available today, as well as the most powerful card on the Pascal architecture.

The most powerful NVIDIA GeForce GTX 1080 Ti gaming card

The NVIDIA GeForce GTX 1080 Ti is a gamer's dream: its owner can finally enjoy the latest AAA games and play in high-resolution VR headsets with crisp, accurate graphics.

The GTX 1080 Ti was designed as the first full-fledged graphics card for 4K gaming. It is equipped with the newest and most technologically advanced hardware that no other video card can boast of today.

Here is the official NVIDIA GeForce GTX 1080 Ti presentation:

“It's time for something new. One that is 35% faster than the GTX 1080. One that is faster than the Titan X. Let's call it the ultimate...

Video games get more and more beautiful every year, so we are introducing a next-generation top product so that you can enjoy the next generation of games.”

Jen-Hsun Huang

NVIDIA GeForce GTX 1080 Ti specifications

NVIDIA did not skimp on the hardware for its new and ultra-powerful graphics card.

It is equipped with the same Pascal GP102 GPU as the Titan X (Pascal), yet it outperforms the latter in practically every respect.

The chip packs 12 billion transistors and six graphics processing clusters, with two of its streaming multiprocessors disabled. That leaves 28 multiprocessors with 128 cores each.

Thus, the GeForce GTX 1080 Ti has 3584 CUDA cores, 224 texture mapping units and 88 ROPs (the units responsible for z-buffering, anti-aliasing and writing the final image into the frame buffer in video memory).

The boost clock ranges from 1582 MHz up to roughly 2 GHz. The Pascal architecture was designed with overclocking in mind, both on reference boards and, more aggressively, on custom designs.

The GeForce GTX 1080 Ti also features 11 GB of GDDR5X memory on a 352-bit bus, the fastest G5X implementation to date.

With the new compression and tile caching system, NVIDIA quotes an effective bandwidth for the GTX 1080 Ti of up to 1200 GB/s, surpassing AMD's HBM2-based solutions.
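
As a quick sanity check on those figures, here is a back-of-the-envelope calculation (a minimal sketch; the 1200 GB/s value is simply the raw bandwidth multiplied by an implied compression factor, not a measured number):

```python
# Raw memory bandwidth of the GTX 1080 Ti: bus width (bits) * per-pin data rate (Gbit/s) / 8
bus_width_bits = 352
data_rate_gbps = 11                      # GDDR5X effective data rate per pin

raw_bandwidth_gbs = bus_width_bits * data_rate_gbps / 8
print(f"Raw bandwidth: {raw_bandwidth_gbs:.0f} GB/s")                  # ~484 GB/s

# NVIDIA's ~1200 GB/s claim is an *effective* figure: raw bandwidth scaled by the
# savings from delta color compression and tile caching.
implied_factor = 1200 / raw_bandwidth_gbs
print(f"Implied compression/caching factor: {implied_factor:.1f}x")    # ~2.5x
```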

NVIDIA GeForce GTX 1080 Ti Specification:

| Specification        | GTX Titan X (Pascal) | GTX 1080 Ti  | GTX 1080    |
|----------------------|----------------------|--------------|-------------|
| Process technology   | 16 nm                | 16 nm        | 16 nm       |
| Transistors          | 12 billion           | 12 billion   | 7.2 billion |
| Die area             | 471 mm²              | 471 mm²      | 314 mm²     |
| Memory               | 12 GB GDDR5X         | 11 GB GDDR5X | 8 GB GDDR5X |
| Memory speed         | 10 Gbps              | 11 Gbps      | 10 Gbps     |
| Memory interface     | 384-bit              | 352-bit      | 256-bit     |
| Memory bandwidth     | 480 GB/s             | 484 GB/s     | 320 GB/s    |
| CUDA cores           | 3584                 | 3584         | 2560        |
| Base clock           | 1417 MHz             | 1480 MHz     | 1607 MHz    |
| Boost clock          | 1531 MHz             | 1582 MHz     | 1733 MHz    |
| Compute power        | 11 TFLOPS            | 11.5 TFLOPS  | 9 TFLOPS    |
| Thermal power (TDP)  | 250 W                | 250 W        | 180 W       |
| Price                | $1200                | $699         | $499        |
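
The compute power row follows directly from the core counts and boost clocks above: FP32 throughput is CUDA cores times two operations per clock (fused multiply-add) times the clock. A minimal sketch of that arithmetic:

```python
# FP32 throughput = CUDA cores * 2 FLOPs per core per clock (FMA) * boost clock (GHz)
cards = {
    "GTX Titan X (Pascal)": (3584, 1.531),
    "GTX 1080 Ti":          (3584, 1.582),
    "GTX 1080":             (2560, 1.733),
}

for name, (cores, boost_ghz) in cards.items():
    tflops = cores * 2 * boost_ghz / 1000
    print(f"{name}: {tflops:.1f} TFLOPS")
# ~11.0, ~11.3 and ~8.9 TFLOPS, matching the table values within rounding
```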

Cooling the NVIDIA GeForce GTX 1080 Ti graphics card

The GeForce GTX 1080 Ti Founders Edition features a new airflow management solution that cools the board better and runs quieter than previous designs. This leaves more headroom for overclocking and even higher speeds. In addition, efficiency is helped by a 7-phase power supply built on 14 high-efficiency dualFET transistors.

The GeForce GTX 1080 Ti uses the latest NVTTM design, which introduces a new vapor chamber with twice the cooling surface area of the Titan X (Pascal). This thermal design helps achieve optimal cooling and lets GPU Boost 3.0 push the GPU beyond its rated clocks.

NVIDIA GeForce GTX 1080 Ti is an overclocker's dream

So what do we do with all this graphics power? The answer is obvious: overclock it to the limit. During the event, NVIDIA showcased the outstanding overclocking potential of the GTX 1080 Ti; recall that they managed to reach a GPU frequency of 2.03 GHz with the frame rate locked at 60 FPS.

2016 is already drawing to a close, but its contribution to the gaming industry will stay with us for a long time. First, the cards from the red camp received an unexpectedly successful update in the mid-price range, and second, NVIDIA proved once again that it does not hold 70% of the market for nothing. The Maxwells were good, the GTX 970 was rightfully considered one of the best cards for the money, but Pascal is another matter entirely.

The new generation of hardware in the form of the GTX 1080 and 1070 literally buried last year's flagship systems and the market for used high-end hardware, while the junior GTX 1060 and 1050 consolidated that success in more affordable segments. Owners of the GTX 980 Ti and assorted Titans are crying crocodile tears: their uber-guns, bought for many thousands of rubles, lost 50% of their value and 100% of their bragging rights overnight. NVIDIA itself claims that the 1080 is faster than last year's Titan X, the 1070 easily piles on the 980 Ti, and the relatively budget-friendly 1060 will hurt all older cards.

Whether that is true, where this performance actually comes from, and what to do about it all on the eve of the holidays and sudden financial windfalls, as well as how to treat yourself, you can find out in this long and slightly boring article.

You can love Nvidia or not, but only a visitor from an alternate universe would deny that it is currently the leader in graphics card design. Since AMD's Vega has not yet been announced, we have not seen flagship RX cards on Polaris, and the R9 Fury with its 4 GB of experimental memory cannot be considered a promising card (VR and 4K will want a bit more than it has), we have what we have. While the 1080 Ti and the hypothetical RX 490, RX Fury and RX 580 are just rumors and expectations, we have time to sort out NVIDIA's current lineup and see what the company has achieved in recent years.

The mess and the history of the origin of Pascal

NVIDIA regularly gives people reasons to dislike it. The story of the GTX 970 and its "3.5 GB of memory", the "NVIDIA, Fuck you!" from Linus Torvalds, the complete mess in the desktop graphics lineups, the refusal to support the free and far more widespread FreeSync standard in favor of its own proprietary technology... In general, there are plenty of reasons. For me personally, one of the most annoying is what happened over the past two generations of video cards. Roughly speaking, "modern" GPUs date back to the era of DX10 support. And if you look for the "grandfather" of today's 10 series, the beginning of the modern architecture lies around the 400 series of video accelerators and the Fermi architecture. It was there that the idea of a "block" design built from so-called "CUDA cores" (in NVIDIA terminology) first appeared.

Fermi

If the video cards of the 8000, 9000 and 200 series were the first steps in mastering the very concept of a "modern architecture" with unified shader processors (like AMD, yes), then the 400 series was already as close as possible to what we see in, say, a 1070. Yes, Fermi still carried a small legacy crutch from previous generations: the shader units ran at twice the frequency of the core clock responsible for geometry, but the overall picture of a GTX 480 does not differ much from, say, a 780: SM multiprocessors are grouped into clusters, the clusters communicate with memory controllers through a shared cache, and the results are output by a rasterization unit shared by the cluster:


Block diagram of the GF100 processor used in the GTX 480.

The 500 series was the same Fermi, slightly improved "inside" and with fewer defective dies, so the top solutions received 512 CUDA cores instead of the previous generation's 480. Visually, the block diagrams look like twins:


The GF110 is the heart of the GTX 580.

Frequencies went up here and there and the chip design was tweaked slightly, but there was no revolution: the same 40 nm process technology and 1.5 GB of video memory on a 384-bit bus.

Kepler

A lot changed with the arrival of the Kepler architecture. We can say that this generation set the development vector that led to the current models. Not only did the GPU architecture change, but so did the way NVIDIA develops new hardware internally. While Fermi was focused on finding a solution that would deliver high performance, Kepler bet on energy efficiency, efficient use of resources, high frequencies, and ease of optimizing game engines for a high-performance architecture.

Serious changes were made to the GPU design: the starting point was not the "flagship" GF100/GF110 but the "budget" GF104/GF114, the chip used in one of the most popular cards of the time, the GTX 460.


The overall processor architecture is simplified by using only two large blocks with four unified multiprocessor shader units. The layout of the new flagships looked like this:


GK104 installed in GTX 680.

As you can see, each of the computational units has significantly increased in weight relative to the previous architecture, and was named SMX. Compare the block structure with the one shown above in the Fermi section.


GK104 GPU SMX Multiprocessor

The 600 series had no video cards based on a full-fledged processor with six clusters of compute modules: the flagship was the GTX 680 with the GK104, and the only thing cooler was the "two-headed" 690, on which two such GPUs with all the necessary circuitry and memory were laid out on a single board. A year later the flagship GTX 680 turned into the GTX 770 with minor changes, and the crown of the Kepler architecture's evolution was the family of cards based on the GK110 die: the GTX Titan and Titan Z, the 780 Ti and the regular 780. Inside: the same 28 nanometers, and the only qualitative improvement (which did NOT make it into consumer GK110 cards) was performance in double-precision operations.

Maxwell

The first graphics card based on the Maxwell architecture was... the NVIDIA GTX 750 Ti. A little later its cut-down siblings appeared in the form of the GTX 750 and 745 (the latter shipped only as an OEM solution), and at launch these junior cards really shook up the market for inexpensive video accelerators. The new architecture was tested on the GM107 chip: a tiny piece of the future flagships with their huge heatsinks and frightening prices. It looked something like this:


Yes, there is only one computing unit, but how much more complicated it is than that of its predecessor, compare for yourself:


Instead of the large SMX block used as the basic building block in previous GPUs, new, more compact SMM blocks are used. Kepler's basic compute units were good but suffered from poor utilization, a banal hunger for instructions: the scheduler could not spread work across such a large number of execution units. The Pentium 4 had roughly the same problem: execution resources sat idle, and a branch misprediction was very expensive. In Maxwell, each compute module was divided into four parts, each with its own instruction buffer and a warp scheduler issuing the same operation to a group of threads. As a result, efficiency went up, the GPUs became more flexible than their predecessors and, most importantly, the new architecture was proven out at the cost of little blood and a fairly simple die. History develops in a spiral, heh.

Mobile solutions have benefited most from the innovations: the die area has grown by a quarter, and the number of execution units of multiprocessors has almost doubled. As luck would have it, it was the 700th and 800th series that made the main mess in the classification. Inside the 700 alone, there were video cards based on Kepler, Maxwell and even Fermi architectures! That is why desktop Maxwells, in order to move away from the mess in previous generations, received a common 900 series, from which the mobile GTX 9xx M cards later spun off.

Pascal - logical development of Maxwell architecture

What was laid down in Kepler and continued in the Maxwell generation remained in Pascal as well: the first consumer video cards were released on the basis of the not-quite-largest GP104 chip, which consists of four graphics processing clusters. The full-size, six-cluster GP102 went into an expensive semi-professional GPU sold under the TITAN X brand. However, even the "cut-down" 1080 performs so well that past generations feel bad.

Performance improvement

The foundation of the basics

Maxwell became the foundation of the new architecture: the diagrams of the comparable processors (GM204 and GP104) look almost identical, the main difference being the number of multiprocessors packed into each cluster. Kepler (the 700 generation) had two large SMX multiprocessors per cluster, which in Maxwell were each split into four parts with the necessary supporting logic (and renamed SMM). In Pascal, two more were added to the existing eight per block, making ten, and the abbreviation changed yet again: single multiprocessors are once more called simply SM.


The rest is a complete visual similarity. True, there are even more changes inside.

The engine of progress

There are a lot of changes inside the block of multiprocessors. In order not to go into very boring details of what was redone, how it was optimized and how it was before, I will describe the changes very briefly, otherwise some are already yawning.

First of all, Pascal reworked the part responsible for the geometry of the picture. This matters for multi-monitor configurations and VR headsets: with proper support from the game engine (and thanks to NVIDIA's efforts, this support will soon appear), a video card can compute the geometry once and obtain several geometry projections, one for each screen. This significantly reduces the load in VR, not only for triangle processing (where the gain is roughly twofold) but also for the pixel work.

A conventional 980 Ti reads the geometry twice (once per eye), then textures it and performs post-processing for each image, handling a total of about 4.2 million pixels, of which only about 70% are actually used; the rest are cut off or fall into areas that are simply never shown to either eye.

1080 will process the geometry once, and the pixels that will not be included in the final image will simply not be calculated.
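
The pixel-side savings can be illustrated with the numbers just quoted (a rough sketch using the ~4.2 million pixel and ~70% utilization figures from the text; everything else is illustrative):

```python
# Figures from the text: ~4.2 million pixels shaded for both eyes, only ~70% of them ever visible.
pixels_shaded = 4.2e6
useful_fraction = 0.70

wasted = pixels_shaded * (1 - useful_fraction)
print(f"Wasted pixel work per frame without SMP: ~{wasted / 1e6:.2f} M pixels")   # ~1.26 M

# On the geometry side, Simultaneous Multi-Projection submits the scene once
# instead of once per eye, roughly halving the triangle workload.
geometry_passes_without_smp = 2
geometry_passes_with_smp = 1
print(f"Geometry passes per frame: {geometry_passes_without_smp} -> {geometry_passes_with_smp}")
```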


With the pixel side, things are actually even cooler. Memory bandwidth can only be increased on two fronts (higher clocks and more bits per clock), both cost money, and the GPU's "hunger" for memory grows more pronounced every year thanks to rising resolutions and the spread of VR, so NVIDIA had to improve the "free" ways of raising effective bandwidth. If you cannot widen the bus or raise the frequency, you have to compress the data. Hardware compression already existed in previous generations, but Pascal takes it to a new level. Again, we can skip the boring math and take a ready-made example from NVIDIA. On the left is Maxwell, on the right Pascal; the pixels whose color data was compressed without quality loss are highlighted in pink.


Instead of storing raw 8x8 pixel tiles, memory holds an "average" color plus a matrix of deviations from it; such data takes from ½ to ⅛ of the original volume. In real workloads the load on the memory subsystem dropped by 10 to 30%, depending on the number of gradients and the uniformity of fills in complex on-screen scenes.
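
A toy model of such delta color compression could look like the sketch below (purely illustrative: the real hardware works with a few fixed compression ratios such as 2:1, 4:1 and 8:1 per tile, and the actual encoding is not public):

```python
import numpy as np

def compress_tile(tile: np.ndarray, bits_per_delta: int = 4):
    """Toy delta compression of an 8x8 single-channel tile: store one anchor
    value plus per-pixel deviations, if they all fit in `bits_per_delta` bits."""
    anchor = int(tile[0, 0])
    deltas = tile.astype(np.int16) - anchor
    if np.abs(deltas).max() < (1 << (bits_per_delta - 1)):
        compressed_bits = 8 + tile.size * bits_per_delta   # anchor + 64 small deltas
    else:
        compressed_bits = tile.size * 8                    # falls back to uncompressed
    return compressed_bits, tile.size * 8

gradient = np.tile(np.arange(100, 108, dtype=np.uint8), (8, 1))   # smooth fill: compresses
noise = np.random.randint(0, 256, (8, 8), dtype=np.uint8)         # noisy tile: does not

for name, tile in [("gradient", gradient), ("noise", noise)]:
    compressed, raw = compress_tile(tile)
    print(f"{name}: {compressed} bits vs {raw} bits ({raw / compressed:.1f}x)")
```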


This was not enough for the engineers, so the flagship video card (GTX 1080) got memory with increased bandwidth: GDDR5X transfers twice as many data bits per clock and delivers over 10 Gbit/s per pin at peak. Moving data at such speed required a completely new memory layout on the board, and overall the efficiency of the memory subsystem grew by 60-70% compared to the previous generation's flagships.
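
That 60-70% figure roughly decomposes into the raw GDDR5X speedup plus better compression; a minimal sketch of the arithmetic (the ~20% compression gain plugged in here is an assumption within the 10-30% range quoted above):

```python
# GTX 980:  256-bit bus, 7 Gbit/s GDDR5   -> 224 GB/s
# GTX 1080: 256-bit bus, 10 Gbit/s GDDR5X -> 320 GB/s
old_bw = 256 * 7 / 8
new_bw = 256 * 10 / 8
raw_gain = new_bw / old_bw - 1
print(f"Raw bandwidth gain: {raw_gain:.0%}")              # ~43%

compression_gain = 0.20                                   # assumed extra savings from 4th-gen compression
effective_gain = (1 + raw_gain) * (1 + compression_gain) - 1
print(f"Effective gain:     {effective_gain:.0%}")        # ~71%, in the quoted 60-70% ballpark
```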

Reduced latency and capacity downtime

Video cards have long been involved not only in graphics processing, but also in related calculations. Physics is often tied to animation frames and is remarkably parallel, which means it is calculated much more efficiently on the GPU. But the VR industry has become the biggest generator of problems in recent years. Many game engines, development methodologies and a bunch of other technologies used for working with graphics were simply not designed for VR, the case of camera movement or changes in the position of the user's head during the rendering process was simply not handled. If you leave everything as it is, the desynchronization of the video stream and your movements will cause seasickness attacks and simply interfere with the immersion in the game world, which means that you simply have to throw out the "wrong" frames after drawing and start over. And these are new delays in displaying images on the display. This does not have a positive effect on performance.

Pascal took this problem into account and introduced dynamic load balancing and the possibility of asynchronous interrupts: now execution units can either interrupt the current task (saving the results of work in the cache) to process more urgent tasks, or simply drop the under-rendered frame and start a new one, significantly reducing latency in image formation. The main beneficiary here, of course, is VR and games, but even with general-purpose calculations, this technology can help: simulation of particle collisions received a performance increase of 10-20%.

Boost 3.0

NVIDIA video cards gained automatic overclocking a long time ago, back in the Kepler-based 700 generation. It was improved in Maxwell but was still, to put it mildly, so-so: yes, the card ran a little faster as long as the thermal budget allowed, and the extra 20-30 MHz on the core and 50-100 MHz on the memory wired in at the factory gave some gain, but not much. It worked like this:


Even if there was temperature headroom, performance did not increase. With the arrival of Pascal, engineers shook up this dusty swamp. Boost 3.0 operates on three fronts: temperature analysis, overclocking, and on-chip voltage increases. Now every last drop is squeezed out of the GPU: the standard NVIDIA drivers do not do this, but vendor software lets you build a profiling curve in one click that accounts for the quality of your particular sample.

One of the first in this field was EVGA, its Precision XOC utility has a certified NVIDIA scanner, which consistently goes through the entire range of temperatures, frequencies and voltages, achieving maximum performance in all modes.

Add to this the new process node, high-speed memory, assorted optimizations and a lower chip thermal budget, and the result is simply indecent. From the 1500 MHz "base" clock of the GTX 1060 you can squeeze more than 2000 MHz if you get a good sample and the vendor does not skimp on cooling.

Improving the quality of the picture and perception of the game world

Performance has been increased on all fronts, but there are a number of points in which there have been no qualitative changes for several years: the quality of the displayed image. And we are not talking about graphic effects, they are provided by game developers, but about what exactly we see on the monitor and how the game looks for the end consumer.

Fast vertical sync

The most important feature of Pascal here is the triple frame output buffer, which simultaneously provides ultra-low rendering latency and tear-free output. One buffer stores the image being displayed, another the last fully rendered frame, and the third the frame currently being drawn. Goodbye horizontal stripes and tearing, hello high performance. There are none of the delays inherent in classic V-Sync (since nothing holds back the GPU and it always renders at the highest possible frame rate), and only fully formed frames are sent to the monitor. I think that after the new year I will write a separate big post about V-Sync, G-Sync, FreeSync and this new Fast Sync algorithm from Nvidia; there are too many details.
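
Conceptually the three buffers behave roughly like the sketch below (a simplified model, not NVIDIA's implementation: the GPU always renders into a back buffer, the newest complete frame is kept aside, and the display only ever flips to complete frames):

```python
class FastSyncBuffers:
    """Simplified model of Fast Sync's three buffers:
    front         -- the frame currently being scanned out to the monitor
    last_rendered -- the most recent fully rendered frame
    back          -- the frame the GPU is rendering right now
    The GPU is never blocked (unlike classic V-Sync), and the display never
    receives a partially rendered frame (unlike V-Sync off)."""

    def __init__(self):
        self.front, self.last_rendered, self.back = "frame0", "frame1", "frame2"
        self.fresh = False          # is last_rendered newer than front?

    def gpu_finished_frame(self):
        # The completed back buffer becomes the newest complete frame; the GPU
        # immediately starts rendering again into the buffer it just freed up.
        self.last_rendered, self.back = self.back, self.last_rendered
        self.fresh = True

    def vblank(self):
        # At each monitor refresh, flip to the newest complete frame (if any).
        if self.fresh:
            self.front, self.last_rendered = self.last_rendered, self.front
            self.fresh = False


buffers = FastSyncBuffers()
for _ in range(3):              # the GPU renders faster than the display refreshes
    buffers.gpu_finished_frame()
buffers.vblank()                # only the latest complete frame reaches the screen
print(buffers.front)            # -> "frame2"
```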

Normal screenshots

No, the screenshots we have today are simply a disgrace. Almost all games use a pile of technologies to make the picture in motion stunning and breathtaking, and screenshots have become a real nightmare: instead of the wonderfully realistic picture built from animation and special effects that exploit the peculiarities of human vision, you get some angular, incomprehensible mess with strange colors and a completely lifeless image.

The new NVIDIA Ansel technology solves the screenshot problem. Yes, it requires game developers to integrate special code, but the actual work is minimal and the payoff is huge. Ansel can pause the game, hand camera control over to you, and from there it is all up to your creativity. You can simply take a shot without the GUI from any angle you like.


You can render an existing scene in ultra-high resolution, shoot 360-degree panoramas, stitch them into a plane, or leave them in 3D for viewing in a VR headset. Take a photo with 16 bits per channel, save it in a kind of RAW file, and then play with exposure, white balance and other settings so that screenshots become attractive again. We are waiting for tons of cool content from game fans in a year or two.

Sound processing on a video card

The new NVIDIA GameWorks libraries add many features for developers. They are mainly aimed at VR, at speeding up various computations and at improving image quality, but one feature is especially interesting and worth mentioning. VRWorks Audio takes sound to a fundamentally new level: instead of crude averaged formulas based on distance and obstacle thickness, it fully traces the sound signal, with all reflections from the environment, reverberation and absorption in various materials. NVIDIA has a good video example of how this technology works:


Best watched with headphones.

Purely theoretically, nothing prevents you from running such a simulation on Maxwell, but the optimizations for asynchronous instruction execution and the new interrupt system built into Pascal allow you to carry out calculations without greatly affecting the frame rate.

Pascal in total

In fact, there are even more changes, and many of them go so deep into the architecture that a huge article could be written about each. The key innovations are the improved design of the chips themselves, low-level optimization of geometry processing and asynchronous work with full interrupt handling, many features tailored to high resolutions and VR and, of course, insane frequencies that past generations of video cards could only dream of. Two years ago the 780 Ti barely crossed the 1 GHz line; today the 1080 in some cases runs at two. And the credit goes not only to the process shrink from 28 nm to 16 or 14 nm: many things were optimized at the lowest level, from the design of the transistors to their layout and interconnects inside the chip itself.

For each individual case

The NVIDIA 10-series lineup turned out to be genuinely balanced and covers all gaming use cases quite tightly, from "I play strategy games and Diablo" to "I want top titles in 4K". The game tests were chosen by one simple principle: cover the widest possible range with the smallest possible set of tests. BF1 is a great example of good optimization and allows comparing DX11 against DX12 under identical conditions. DOOM was chosen for the same reason, only it compares OpenGL and Vulkan. The third "Witcher" acts here as a so-so-optimized title in which the maximum graphics settings can bring any flagship to its knees simply through the power of sloppy code. It uses classic DX11, which is time-tested, thoroughly worked out in drivers and familiar to game developers. Overwatch stands in for all the "tournament" games with well-optimized code; it is mainly interesting for how high the average FPS gets in a not particularly heavy game tuned to run on an "average" config available anywhere in the world.

I will give some general comments right away: Vulkan is very gluttonous in terms of video memory, for it this characteristic is one of the main indicators, and you will see this thesis reflected in benchmarks. DX12 on AMD cards behaves much better than on NVIDIA cards, if the "green" ones on average show a drawdown in FPS on new APIs, then the "red" ones, on the contrary, show an increase.

Junior division

GTX 1050

The junior NVIDIA card (without the Ti letters) is not as interesting as its charged sister with the Ti suffix. Its destiny is to be a gaming solution for MOBA games, strategies, tournament shooters and other titles where detail and image quality interest hardly anyone, while a stable frame rate for minimal money is just what the doctor ordered.


The charts omit the core frequency because it is individual to each sample: a 1050 without an auxiliary power connector may not overclock at all, while its sister with a 6-pin connector easily hits a nominal 1.9 GHz. The most popular options in terms of power and length are shown; you can always find a card with a different power design or different cooling that does not fit these "standards".

DOOM 2016 (1080p, ULTRA): OpenGL - 68 FPS, Vulkan - 55 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 38 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 49 FPS, DX12 - 40 FPS;
Overwatch (1080p, ULTRA): DX11 - 93 FPS;

The GTX 1050 uses the GP107 graphics processor, inherited from the older card with a few functional blocks trimmed. The 2 GB of video memory will not let you roam free, but for e-sports disciplines and playing some kind of tanks it is perfect, since the junior card starts at 9.5 thousand rubles. No auxiliary power is required; the card draws its 75 watts from the motherboard via the PCI Express slot. True, in this price segment there is also the AMD Radeon RX 460, which with the same 2 GB of memory is cheaper and hardly inferior in performance, and for roughly the same money you can get an RX 460 in a 4 GB version. Not that the extra memory helps it much, but it is some reserve for the future. The choice of vendor is not so important: take what is available and does not drain your pocket of an extra thousand rubles, better spent on the cherished letters Ti.

GTX 1050 Ti

About 10 thousand rubles for the regular 1050 is not bad, but for the charged (or full-fledged, call it what you like) version they ask not much more (on average 1-1.5 thousand extra), and its internals are much more interesting. By the way, the entire 1050 series is not made by cutting down or binning "large" chips unfit for the 1060; it is a completely independent product. It uses a different process node (14 nm) and a different fab (the dies are made by Samsung), and there are extremely interesting specimens with an auxiliary power connector: the thermal budget and base consumption are still the same 75 W, but the overclocking potential and the ability to go beyond the permitted limits are completely different.


If you still play at Full HD (1920x1080), do not plan a full upgrade, and the rest of your hardware is 3-5 years old, this is a great way to boost performance in games with little bloodshed. Focus on ASUS and MSI solutions with an additional 6-pin power connector; the Gigabyte options are not bad either, but the price is less pleasing.

DOOM 2016 (1080p, ULTRA): OpenGL - 83 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 44 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 58 FPS, DX12 - 50 FPS;
Overwatch (1080p, ULTRA): DX11 - 104 FPS.

Middle division

Video cards of the x60 line have long been considered the best choice for those who do not want to spend a lot of money yet still play at high graphics settings in everything released over the next couple of years. It started back in the days of the GTX 260, which had two versions (a simpler one with 192 stream processors and a fatter one with 216 "stones"), continued through the 400, 500 and 700 generations, and now NVIDIA has once again hit an almost perfect combination of price and quality. Two versions of the "midrange" are again available: the GTX 1060 with 3 and with 6 GB of video memory differ not only in the amount of available memory but also in performance.

GTX 1060 3GB

The queen of e-sports. A reasonable price, amazing performance for Full HD (and e-sports rarely uses higher resolutions: results matter more there), and a reasonable amount of memory (3 GB, mind you, is what the flagship GTX 780 Ti had two years ago at an indecent price). In terms of performance, the junior 1060 easily piles on last year's GTX 970 with its ever-memorable 3.5 GB of memory and drags the former super-flagship 780 Ti around by the ears.


DOOM 2016 (1080p, ULTRA): OpenGL - 117 FPS, Vulkan - 87 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 70 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 92 FPS, DX12 - 85 FPS;
Overwatch (1080p, ULTRA): DX11 - 93 FPS.

The undisputed favorite here in terms of price-to-performance is the MSI version. Decent frequencies, a quiet cooling system and sane dimensions. And they ask next to nothing for it, around 15 thousand rubles.

GTX 1060 6GB

The 6GB version is a budget ticket to VR and high resolutions. It will not starve in memory, it will be slightly faster in all tests and will surely outperform the GTX 980 where 4 GB of video memory is not enough for last year's video card.


DOOM 2016 (1080p, ULTRA): OpenGL - 117 FPS, Vulkan - 121 FPS;
The Witcher 3: Wild Hunt (1080p, MAX, HairWorks Off): DX11 - 73 FPS;
Battlefield 1 (1080p, ULTRA): DX11 - 94 FPS, DX12 - 90 FPS;
Overwatch (1080p, ULTRA): DX11 - 166 FPS.

I would like to note once again the behavior of video cards when using the Vulkan API. 1050 with 2 GB of memory - FPS drawdown. 1050 Ti with 4GB is almost on par. 1060 3 GB - drawdown. 1060 6 GB - growth of results. The trend, I think, is clear: for Vulkan you need 4+ GB of video memory.

The trouble is that neither 1060 is a small graphics card. The thermal budget seems reasonable and the board itself really is small, but many vendors decided to simply share the cooling system between the 1080, 1070 and 1060. Some cards are 2 slots high but more than 28 centimeters long, others are shorter but thicker (2.5 slots). Choose carefully.

Unfortunately, the extra 3 GB of video memory and the unlocked compute unit will cost you roughly 5-6 thousand rubles over the price of the 3 GB version. Here Palit has the most interesting options in terms of price and quality. ASUS has released monstrous 28-centimeter coolers shared across its 1080, 1070 and 1060, and such a card will not fit everywhere; the versions without factory overclock cost almost the same while delivering less. The relatively compact MSI cards cost more than the competition at roughly the same quality level and factory overclock.

Major League

It is hard to spend all your money in 2016. Yes, the 1080 is insanely cool, but perfectionists and hardware experts know that NVIDIA is keeping quiet about the existence of the super-flagship 1080 Ti, which should be incredibly cool. The first specs are already leaking online, and it is clear that the greens are waiting for a move from the red-and-white camp: some kind of uber-gun that could immediately be put in its place by the new king of 3D graphics, the great and mighty GTX 1080 Ti. In the meantime, we have what we have.

GTX 1070

Last year's adventures of the mega-popular GTX 970 and its not-quite-honest 4 GB of memory were picked apart all over the Internet. That did not stop it from becoming the most popular gaming video card in the world. As the calendar year draws to a close, it still holds first place in the Steam Hardware & Software Survey. This is understandable: the combination of price and performance was simply perfect. And if you missed last year's upgrade and the 1060 is not cool enough for you, the GTX 1070 is your choice.

The card chews through 2560x1440 and 3840x2160 resolutions with a bang. The Boost 3.0 system throws more wood on the fire when GPU load increases (that is, in the heaviest scenes, when FPS sags under the onslaught of effects), pushing the GPU to a mind-blowing 2100+ MHz. The memory easily gains 15-18% of effective frequency over factory settings. A monstrous thing.


Attention, all tests were carried out in 2.5k (2560x1440):

DOOM 2016 (1440p, ULTRA): OpenGL - 91 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (1440p, MAX, HairWorks Off): DX11 - 73 FPS;
Battlefield 1 (1440p, ULTRA): DX11 - 91 FPS, DX12 - 83 FPS;
Overwatch (1440p, ULTRA): DX11 - 142 FPS.

Obviously, holding ultra settings in 4K without ever dipping below 60 frames per second is beyond either this card or the 1080, but you can play at nominal "high" settings, turning off or slightly reducing the most voracious features at full resolution, and in real-world performance the card easily gives even last year's 980 Ti, which cost almost twice as much, a run for its money. Gigabyte has the most interesting option: they managed to cram a full-fledged 1070 into an ITX-sized card, thanks to the modest thermal budget and energy-efficient design. Prices start at 29-30 thousand rubles for the tasty options.

GTX 1080

Yes, the flagship has no Ti letters. Yes, it does not use the largest GPU NVIDIA makes. Yes, there is no fancy HBM2 memory, and the card does not look like a Death Star or, at the very least, an Imperial Star Destroyer. And yes, it is the coolest gaming graphics card out there. It simply takes DOOM and runs it at 5K resolution at 60 frames per second on ultra settings. Every new title bows to it, and for the next year or two it will have no problems: while the new technologies built into Pascal spread and game engines learn to load the available resources efficiently... Yes, in a couple of years we will say, "Here, look at the GTX 1260, a couple of years ago you needed a flagship to play at these settings," but for now the best of the best video cards is available before the new year at a very reasonable price.


Attention, all tests were carried out in 4k (3840x2160):

DOOM 2016 (2160p, ULTRA): OpenGL - 54 FPS, Vulkan - 78 FPS;
The Witcher 3: Wild Hunt (2160p, MAX, HairWorks Off): DX11 - 55 FPS;
Battlefield 1 (2160p, ULTRA): DX11 - 65 FPS, DX12 - 59 FPS;
Overwatch (2160p, ULTRA): DX11 - 93 FPS.

All that remains is to decide whether you need it, or whether you can save money and take the 1070. There is not much difference between playing at "ultra" or "high" settings, since modern engines draw an excellent picture at high resolution even on medium settings: after all, these are not blurry consoles that cannot deliver honest 4K and a stable 60 fps.

If we ignore the cheapest options, the best combination of price and quality again comes from Palit in the GameRock version (about 43-45 thousand rubles): yes, the cooler is "thick" at 2.5 slots, but the card is shorter than its competitors, and pairs of 1080s are rarely installed anyway... SLI is slowly dying, and even the life-giving injection of high-speed bridges is not really saving it. The ASUS ROG option is not bad if you have many expansion devices installed and do not want to block extra slots: their card is exactly 2 slots thick, but it needs 29 centimeters of clearance from the back panel to the drive cage. I wonder whether Gigabyte will manage to release this monster in ITX format?

Outcomes

The new NVIDIA graphics cards have simply buried the used hardware market. Only the GTX 970 survives on it, and it can be snatched up for 10-12 thousand rubles. Potential buyers of a used 7970 or R9 280 often have nowhere to put one and nothing to feed it with, and many secondary-market options are simply dead ends: as a cheap upgrade for the next couple of years they are useless, with too little memory and no support for new technologies. The beauty of the new generation is that even games not optimized for it run much more briskly than on the veteran GPUs of previous years, and it is hard to imagine what will happen in a year when game engines learn to use the full power of the new technologies.

GTX 1050 and 1050Ti

Alas, I cannot recommend buying the cheapest Pascal. The RX 460 usually sells for a thousand or two less, and if your budget is so tight that you are buying a graphics card with your last money, then the Radeon is objectively the more interesting investment. On the other hand, the 1050 is a little faster, so if the two cards cost about the same in your city, take it.

The 1050 Ti, in turn, is a great option for those who value plot and gameplay over bells, whistles and realistic nose hair. It does not have the 2 GB video memory bottleneck, so it will not run out of steam in a year. If you can spare the money for it, do so. The Witcher on high settings, GTA V, DOOM, BF1: no problem. Yes, you will have to give up a few refinements such as ultra-long shadows, complex tessellation or "expensive" self-shadowing with limited ray tracing, but in the heat of battle you will forget about those beauties after 10 minutes of play, and a stable 50-60 frames per second is far more immersive than nervous jumps from 25 to 40 with the settings on "maximum".

If you have any Radeon 7850, GTX 760 or lower, video cards with 2 GB of video memory or less, you can safely change them.

GTX 1060

The younger 1060 will delight those for whom a frame rate of 100 FPS is more important than graphic bells and whistles. At the same time, it will allow you to comfortably play all released toys in FullHD resolution with high or maximum settings and stable 60 frames per second, and the price is very different from everything that comes after it. The older 1060 with 6 gigabytes of memory is an uncompromising solution for FullHD with a performance margin for a year or two, acquaintance with VR and a perfectly acceptable candidate for playing at high resolutions at medium settings.

It makes no sense to swap a GTX 970 for a GTX 1060; it will hold out for another year. But the tired 960, 770, 780, R9 280X and older units can safely be upgraded to the 1060.

Top segment: GTX 1070 and 1080

The 1070 is unlikely to become as popular as the GTX 970 (after all, by no means every user has a two-year hardware upgrade cycle), but in terms of value for money it is certainly a worthy continuation of the 70 line. It simply grinds through games at the mainstream 1080p resolution, easily copes with 2560x1440, survives the ordeal of unoptimized 21:9, and is quite capable of 4K output, albeit not at maximum settings.


Yes, SLI can be like that.

We say “come on, goodbye” to all sorts of 780 Ti, R9 390X and other last year 980s, especially if we want to play in high definition. And, yes, this is the best option for those who like to assemble a hellish box in Mini-ITX format and scare guests with 4k games on a 60-70 inch TV, which run on a computer the size of a coffee maker.

Last week Jen-Hsun Huang took the stage and officially unveiled the Nvidia GeForce GTX 1070 and GTX 1080 graphics cards. In addition to presenting the accelerators themselves and their overclocking potential, he demonstrated the new technologies used in the Pascal architecture. This article is dedicated to them. Of course, not all the innovations will be covered here; some of the new and/or updated technologies will be discussed in the GTX 1080 review, which will appear very soon.

Pascal and the GP104 GPU

The first and most important change in Pascal is the move away from the 28 nm process technology that has been used in consumer graphics cards since the release of the GeForce GTX 600 series in March 2012. The Pascal architecture is built on TSMC's new 16 nm FinFET manufacturing process, and the move to finer lithography brings dramatic improvements in power consumption and performance scaling.

Above all, a finer process node usually allows higher frequencies. Out of the box, the card runs at more than 1700 MHz. Judging by numerous reviews, the GTX 1080 is also capable of overclocking to 2100+ MHz, and this is a reference card, which is seriously power-limited on top of that.

It should be noted that the process shrink is not the only thing that allowed frequencies to rise this far. According to Jonah Alben, senior vice president of GPU Engineering, after the move to the 16 nm FinFET process the new GPUs could run at around 1325 MHz, and the Nvidia team spent a long time pushing the frequencies higher. The result of that work is the GTX 1080, which operates at 1733 MHz.

How did you achieve this level of clock speed and performance improvement over the Maxwell architecture? Pascal combines several interesting innovations to dramatically increase efficiency.

The optimizations raised not only the clock frequency but also the efficiency of the CUDA cores in the GP104 GPU relative to its predecessor, the GM204. The proof is a performance gain of 70% over the GTX 980, and that is with drivers that are not yet fully mature.

One of the changes is visible in the block diagram above: each GPC cluster now contains five SM (streaming multiprocessor) blocks instead of four.

PolyMorph Engine 4.0

There is only one significant addition in the GPU die itself: a new module in the PolyMorph Engine, the Simultaneous Multi-Projection unit. The new block sits at the very end of the geometry processing path and creates several projections from a single geometry stream.

Without going into details (and everything there is very complicated), the new block takes over a substantial part of the geometry processing, which reduces the load on other GPU units. In addition, PolyMorph helps form a correctly angled picture on multi-monitor configurations, but more on that later.

GP104 Graphics Processor Specifications

| Parameter | Value |
|---|---|
| Chip codename | GP104 |
| Production technology | 16 nm FinFET |
| Number of transistors | 7.2 billion |
| Die area | 314 mm² |
| Architecture | |
| DirectX hardware support | |
| Memory bus | 256-bit (eight 32-bit controllers), GDDR5X |
| GPU frequency | 1607 (1733) MHz |
| Computing units | 20 streaming multiprocessors with 2560 scalar ALUs for floating-point calculations per the IEEE 754-2008 standard |
| Texturing units | 160 texture addressing and filtering units with support for FP16 and FP32 texture components and trilinear and anisotropic filtering for all texture formats |
| Monitor support | |

GeForce GTX 1080 Reference Graphics Card Specifications

| Parameter | Value |
|---|---|
| Core frequency | 1607 (1733) MHz |
| Number of CUDA cores | 2560 |
| Number of texture units | 160 |
| Number of blending (ROP) units | 64 |
| Effective memory frequency | 10000 (4 × 2500) MHz |
| Memory type | GDDR5X |
| Memory bus | 256-bit |
| Memory size | 8 GB |
| Memory bandwidth | 320 GB/s |
| Compute performance (FP32) | about 9 teraflops |
| Fill rate | 103 gigapixels/s |
| Texture fetch rate | 257 gigatexels/s |
| Bus | PCI Express 3.0 |
| Video outputs | one Dual-Link DVI, one HDMI, three DisplayPort |
| Power consumption | up to 180 W |
| Additional power | one 8-pin connector |
| Slots occupied | 2 |
| Recommended price | $599-699 (USA), 54,990 rubles (Russia) |

The new model of the GeForce GTX 1080 graphics card received a logical name for the first solution of the new GeForce series - it differs from its direct predecessor only in the changed generation digit. The novelty not only replaces the top solutions in the company's current lineup, but also became the flagship of the new series for some time, until the Titan X was released on GPUs of even greater power. Below it in the hierarchy is also the already announced GeForce GTX 1070 model, based on a stripped-down version of the GP104 chip, which we will consider below.

Nvidia's new GPUs have MSRPs of $ 599 and $ 699 for the regular and Founders Edition (see below), respectively, which is a pretty good deal considering the GTX 1080 outperforms not only the GTX 980 Ti, but the Titan X as well. Today the new product is the best performance solution on the market for single-chip video cards without any questions, and at the same time it costs less than the most productive video cards of the previous generation. So far, there is essentially no competitor from AMD for the GeForce GTX 1080, so Nvidia was able to set a price that suits them.

The video card in question is based on the GP104 chip with a 256-bit memory bus, but the new type of GDDR5X memory operates at a very high effective frequency of 10 GHz, which gives a high peak bandwidth of 320 GB / s - which is almost on par with the GTX 980 Ti with 384 -bit bus. The amount of memory installed on a video card with such a bus could be equal to 4 or 8 GB, but it would be foolish to set a smaller volume for such a powerful solution in modern conditions, so GTX 1080 quite logically received 8 GB of memory, and this amount is enough to run any 3D applications with any quality settings for several years to come.

The GeForce GTX 1080 PCB is, for obvious reasons, decently different from the company's previous PCBs. The typical power consumption value for the new product is 180 W, which is slightly higher than the GTX 980, but noticeably lower than the less powerful Titan X and GTX 980 Ti. The reference board has the usual set of connectors for connecting video output devices: one Dual-Link DVI, one HDMI and three DisplayPort.

Founders Edition Reference Design

Even with the announcement of the GeForce GTX 1080 in early May, a special edition of the video card called Founders Edition was announced, which has a higher price compared to the usual video cards of the company's partners. In fact, this edition is a reference design for the card and cooling system, and it is produced by Nvidia itself. You can relate differently to such options for video cards, but the reference design developed by the company's engineers and the construction made with the use of high-quality components has its fans.

But whether people will pay a few thousand rubles more for a video card from Nvidia itself is a question only practice can answer. In any case, the reference cards from Nvidia at a premium price will be the first to go on sale, and there will not be much to choose from; that happens with every announcement. What sets the reference GeForce GTX 1080 apart is that it is planned to be sold in this form throughout its entire life cycle, right up to the release of next-generation solutions.

Nvidia believes that this edition has its merits even over the best works of partners. For example, the dual-slot design of the cooler makes it easy to build on the basis of this powerful video card both gaming PCs of a relatively small form factor and multi-chip video systems (even in spite of the three- and four-chip operation modes not recommended by the company). The GeForce GTX 1080 Founders Edition has some of the benefits of an efficient vapor chamber cooler and a fan that blows hot air out of the case - this is the first such Nvidia solution to use less than 250W of power.

Compared to the company's previous reference product designs, the power circuit has been upgraded from four-phase to five-phase. Nvidia is also talking about the improved components on which the novelty is based, and electrical noise has been reduced, allowing for improved voltage stability and overclocking potential. As a result of all the improvements, the reference board is 6% more energy efficient than the GeForce GTX 980.

To make the Founders Edition visibly different from the "regular" GeForce GTX 1080 models, an unusual "chopped" shroud design was developed for it. That, however, probably also complicated the shape of the vapor chamber and heatsink (see photo), and may have been one of the reasons for the $100 premium on this special edition. We repeat that at the start of sales buyers will not have much choice, but later they will be able to pick either a custom design from one of the company's partners or the card made by Nvidia itself.

The next generation of Pascal graphics architecture

The GeForce GTX 1080 graphics card is the company's first solution based on the GP104 chip, which belongs to the new generation of Nvidia's graphics architecture - Pascal. Although the new architecture is based on the solutions worked out in Maxwell, it also has important functional differences, which we will write about later. The main change from a global point of view was the new technological process, according to which the new graphics processor was made.

The use of the 16 nm FinFET process technology in the production of GP104 graphics processors at the factories of the Taiwanese company TSMC made it possible to significantly increase the complexity of the chip while maintaining a relatively low area and cost. Compare the number of transistors and the area of ​​the GP104 and GM204 chips - they are close in area (the crystal of the novelty is even slightly smaller physically), but the Pascal architecture chip has a noticeably larger number of transistors, and, accordingly, execution units, including those providing new functionality.

From an architectural point of view, the first gaming Pascal is very similar to comparable Maxwell solutions, although there are some differences. Like Maxwell, Pascal processors will come in different configurations of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs) and memory controllers. The SM multiprocessor is a highly parallel multiprocessor that schedules and runs warps (groups of 32 instruction threads) on CUDA cores and other execution units within the multiprocessor. You can find detailed information about the design of all these blocks in our reviews of previous Nvidia solutions.

Each of the SM multiprocessors is paired with the PolyMorph Engine, which handles texture fetching, tessellation, transformation, vertex attribute setting, and perspective correction. Unlike previous solutions from the company, the PolyMorph Engine in the GP104 chip also contains the new Simultaneous Multi-Projection unit, which we will discuss below. The combination of an SM multiprocessor with one Polymorph Engine is traditionally for Nvidia called TPC - Texture Processor Cluster.

In total, the GP104 chip in the GeForce GTX 1080 contains four GPC clusters and 20 SM multiprocessors, as well as eight memory controllers combined with 64 ROP units. Each GPC has a dedicated rasterization engine and includes five SM multiprocessors. Each multiprocessor, in turn, consists of 128 CUDA cores, 256 KB of register file, 96 KB of shared memory, 48 KB of L1 cache, and eight TMU texture units. That is, in total, the GP104 contains 2560 CUDA cores and 160 TMUs.

Also, the GPU on which the GeForce GTX 1080 is based contains eight 32-bit (as opposed to the 64-bit previously used) memory controllers, which gives us a final 256-bit memory bus. Eight ROPs and 256 KB of L2 cache are tied to each memory controller. That is, the GP104 chip contains 64 ROPs and 2048 KB of L2 cache in total.
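
All those unit counts follow from the per-cluster and per-controller figures; a quick sketch that reproduces them:

```python
# GP104 as configured in the GeForce GTX 1080
gpcs = 4
sms_per_gpc = 5
cuda_per_sm = 128
tmus_per_sm = 8

mem_controllers = 8          # 32-bit each
rops_per_controller = 8
l2_kb_per_controller = 256

print("CUDA cores:", gpcs * sms_per_gpc * cuda_per_sm)              # 2560
print("TMUs:      ", gpcs * sms_per_gpc * tmus_per_sm)              # 160
print("Memory bus:", mem_controllers * 32, "bit")                   # 256-bit
print("ROPs:      ", mem_controllers * rops_per_controller)         # 64
print("L2 cache:  ", mem_controllers * l2_kb_per_controller, "KB")  # 2048 KB
```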

Thanks to architectural optimizations and the new process technology, the first gaming Pascal has become the most power-efficient GPU ever. Credit goes both to one of the most advanced process technologies, 16 nm FinFET, and to the architectural optimizations in Pascal compared with Maxwell. Nvidia was able to raise the clock speed even more than it expected from the move to the new process: the GP104 runs at a higher frequency than a hypothetical GM204 manufactured on 16 nm would. To achieve this, Nvidia engineers had to carefully check and optimize all the bottlenecks of previous designs that prevented overclocking beyond a certain threshold. As a result, the new GeForce GTX 1080 is clocked over 40% higher than the GeForce GTX 980. But that is not all of the GPU clock changes.

GPU Boost 3.0 Technology

As we know well from previous Nvidia graphics cards, they use GPU Boost hardware technology in their GPUs, designed to increase the operating clock speed of the GPU in modes when it has not yet reached the limits for power consumption and heat dissipation. Over the years, this algorithm has undergone many changes, and the third generation of this technology is used in the Pascal architecture video chip - GPU Boost 3.0, the main innovation of which is a finer setting of turbo frequencies, depending on voltage.

If you remember the principle of operation of previous versions of the technology, then the difference between the base frequency (guaranteed minimum frequency value, below which the GPU does not fall, at least in games) and the turbo frequency was fixed. That is, the turbo frequency was always a certain number of megahertz higher than the base one. GPU Boost 3.0 introduces the ability to set turbo frequency offsets for each voltage separately. The easiest way to understand this is from the illustration:

On the left is the GPU Boost of the second version, on the right - the third, which appeared in Pascal. The fixed difference between the base and turbo frequencies did not allow to fully reveal the capabilities of the GPU, in some cases the previous generation GPUs could work faster at the set voltage, but the fixed excess of the turbo frequency did not allow to do this. In GPU Boost 3.0, such an opportunity appeared, and the turbo frequency can be set for each of the individual voltage values, completely squeezing all the juice out of the GPU.
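
The difference between the two schemes can be expressed as a tiny sketch (illustrative numbers only: a single fixed offset for Boost 2.0-style behaviour versus an individual offset per voltage point for Boost 3.0):

```python
# Illustrative voltage/frequency points for one hypothetical chip (MHz at a given mV)
base_curve = {800: 1500, 900: 1600, 1000: 1700, 1100: 1733}

# GPU Boost 2.0 style: one fixed offset applied to every voltage point
fixed_offset = 100
boost2 = {v: f + fixed_offset for v, f in base_curve.items()}

# GPU Boost 3.0 style: an individual offset per voltage point, found experimentally
per_voltage_offset = {800: 180, 900: 150, 1000: 120, 1100: 80}
boost3 = {v: f + per_voltage_offset[v] for v, f in base_curve.items()}

for v in base_curve:
    print(f"{v} mV: base {base_curve[v]} MHz, Boost2 {boost2[v]} MHz, Boost3 {boost3[v]} MHz")
```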

Handy utilities are required to control overclocking and set the turbo frequency curve. Nvidia itself does not do this, but it helps its partners create such utilities to facilitate overclocking (within reasonable limits, of course). For example, new GPU Boost 3.0 functionality has already been revealed in EVGA Precision XOC, which includes a dedicated overclocking scanner that automatically detects and sets the non-linear difference between base frequency and turbo frequency for various voltages by running an on-board performance and stability test. As a result, the user has a turbo frequency curve that perfectly matches the capabilities of a particular chip. Which, moreover, can be modified manually in any way.

As you can see in the screenshot of the utility, in addition to information about the GPU and the system, there are also settings for overclocking: Power Target (determines the typical power consumption during overclocking, as a percentage of the standard), GPU Temp Target (the maximum allowable core temperature), GPU Clock Offset (excess over the base frequency for all voltage values), Memory Offset (excess of the video memory frequency over the default value), Overvoltage (additional option to increase the voltage).

The Precision XOC utility includes three overclocking modes: Basic, Linear and Manual. In Basic mode you set a single frequency offset (a fixed turbo boost) over the base clock, as with previous GPUs. Linear mode ramps the frequency linearly from the minimum to the maximum voltage value for the GPU. And in Manual mode you can set a unique frequency value for each individual voltage point on the curve.

The utility also includes a special scanner for automatic overclocking. You can either set your own frequency levels or let Precision XOC scan the GPU at all voltages and find the most stable frequencies for each point on the voltage and frequency curve completely automatically. During the scanning process, Precision XOC gradually adds GPU frequency and checks its performance for stability or artifacts, creating an ideal frequency and voltage curve that will be unique for each chip.

This scanner can be customized to suit your own requirements by setting the time interval for testing each voltage value, the minimum and maximum frequency to be tested, and its step. It is clear that in order to achieve stable results, it would be better to set a small step and a decent testing duration. During testing, unstable operation of the video driver and the system may be observed, but if the scanner does not freeze, it will restore operation and continue to find the optimal frequencies.
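The scanning procedure described above boils down to a simple loop. Below is a hypothetical Python sketch of such a scanner; the stability check is a random placeholder rather than a real workload, and the step size and limits are arbitrary, so this is only an outline of the idea, not how Precision XOC is actually implemented.

```python
import random

def is_stable(voltage, freq_mhz):
    # Placeholder for a real performance/stability test at this point.
    return random.random() > (freq_mhz - 1500) / 600

def scan_curve(voltages, start_mhz=1500, max_mhz=2100, step_mhz=13):
    """Raise the frequency in small steps at each voltage point and keep the
    highest frequency that passed the test (or the start value if none did)."""
    curve = {}
    for v in voltages:
        best = start_mhz
        freq = start_mhz
        while freq <= max_mhz and is_stable(v, freq):
            best = freq
            freq += step_mhz      # smaller step -> more reliable result
        curve[v] = best
    return curve

print(scan_curve([0.80, 0.90, 1.00, 1.05]))
```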

New type of video memory GDDR5X and improved compression

So, the power of the GPU has grown noticeably, and the memory bus remains only 256-bit - will the memory bandwidth limit the overall performance and what can you do about it? It seems that the promising second-generation HBM memory is still too expensive to manufacture, so other options had to be looked for. Ever since the introduction of GDDR5 memory in 2009, Nvidia engineers have been exploring the possibilities of using new types of memory. As a result, the development came to the introduction of the new GDDR5X memory standard - the most complex and advanced standard at the moment, giving a transfer rate of 10 Gbps.

Nvidia gives an interesting illustration of just how fast this is. Only 100 picoseconds pass between transmitted bits; in that time a beam of light covers only about 3 cm (a bit over an inch). And with GDDR5X memory, the transmit and receive circuits must sample the value of a bit in less than half that time, before the next bit arrives, which gives a sense of how far this technology has come.
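The figures are easy to verify with a quick back-of-the-envelope calculation:

```python
# Sanity check of the "100 picoseconds per bit" figure for a 10 Gbps
# GDDR5X data line.

bit_rate = 10e9                 # 10 Gbit/s per pin
bit_time = 1 / bit_rate         # seconds per bit
c = 299_792_458                 # speed of light in vacuum, m/s

print(f"Time per bit: {bit_time * 1e12:.0f} ps")           # 100 ps
print(f"Light travels: {c * bit_time * 100:.1f} cm")        # about 3 cm
print(f"Sampling window: < {bit_time / 2 * 1e12:.0f} ps")   # half a bit period
```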

To achieve this speed, it was necessary to develop a new architecture for the data I / O system, which required several years of joint development with memory chip manufacturers. In addition to the increased data transfer rate, energy efficiency has also increased - GDDR5X memory chips use a lower voltage of 1.35 V and are produced using new technologies, which gives the same power consumption at a 43% higher frequency.

The company's engineers had to rework the data transfer lines between the GPU core and memory chips, pay more attention to preventing signal loss and degradation all the way from memory to GPU and back. So, the above illustration shows the captured signal as a large symmetrical "eye", which indicates a good optimization of the entire circuit and the relative ease of capturing data from the signal. Moreover, the changes described above have led not only to the possibility of using GDDR5X at 10 GHz, but also should help to obtain high memory bandwidth on future products using the more familiar GDDR5 memory.

So the new memory alone delivers a bandwidth gain of more than 40%. But is that enough? To further improve memory bandwidth efficiency, Nvidia has continued to refine the advanced data compression introduced in previous architectures. The memory subsystem in the GeForce GTX 1080 uses improved and several new lossless data compression techniques designed to reduce bandwidth requirements - the fourth generation of on-chip compression.

In-memory data compression algorithms bring several benefits at once. Compression reduces the amount of data written to memory, the same applies to data transferred from the video memory to the L2 cache, which improves the efficiency of the L2 cache, since a compressed tile (a block of several framebuffer pixels) has a smaller size than an uncompressed one. It also reduces the amount of data sent between different points, such as the TMU texture unit and framebuffer.

The data compression pipeline in the GPU uses several algorithms, which are determined depending on the "compressibility" of the data - the best available algorithm is selected for them. One of the most important is the delta color compression algorithm. This compression method encodes data as the difference between successive values ​​instead of the data itself. The GPU calculates the difference in color values ​​between pixels in a block (tile) and stores the block as some average color for the entire block plus data on the difference in values ​​for each pixel. For graphic data, this method is usually well suited, since the color within small tiles for all pixels often does not differ too much.

The GP104 GPU in the GeForce GTX 1080 supports more compression algorithms than previous Maxwell chips. Thus, the 2: 1 compression algorithm has become more efficient, and in addition to it, two new algorithms have appeared: the 4: 1 compression mode, suitable for cases where the difference in the color value of the block pixels is very small, and the 8: 1 mode, which combines the constant 4: 1 compression of 2 × 2 pixel blocks with 2x delta compression between blocks. When compression is not possible at all, it is not used.
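To make the idea more concrete, here is a minimal Python sketch of delta color compression for one tile. The tile size, thresholds and the mapping to the 2:1 / 4:1 / 8:1 modes are illustrative assumptions, not the actual GP104 hardware rules.

```python
# Minimal sketch of delta color compression: a tile is stored as one
# reference color plus small per-pixel deltas. Thresholds are illustrative.

def compress_tile(pixels):
    """pixels: list of (r, g, b) values for one framebuffer tile."""
    base = pixels[0]
    deltas = [tuple(p[i] - base[i] for i in range(3)) for p in pixels]
    worst = max(abs(d) for delta in deltas for d in delta)

    if worst == 0:
        return ("8:1", base)              # whole tile is a single color
    if worst < 4:
        return ("4:1", base, deltas)      # tiny deltas need very few bits
    if worst < 32:
        return ("2:1", base, deltas)      # moderate deltas
    return ("uncompressed", pixels)       # deltas too large to help

tile = [(120, 130, 90), (121, 130, 91), (122, 131, 90), (120, 129, 92)]
print(compress_tile(tile)[0])   # "4:1" for this nearly uniform tile
```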

In practice, completely incompressible tiles are quite rare. This can be seen in the Project CARS screenshots that Nvidia provided to illustrate the higher compression ratio of Pascal. In these illustrations, the framebuffer tiles the GPU was able to compress are tinted purple, while the ones that could not be compressed keep their original color (top: Maxwell, bottom: Pascal).

As you can see, the new compression algorithms in GP104 actually perform much better than in Maxwell. Although the old architecture was also able to compress most of the tiles in the scene, a lot of grass and trees around the edges, as well as machine parts, are not subject to legacy compression algorithms. But with the introduction of new techniques in Pascal, very few areas of the image remained uncompressed - the improved efficiency is evident.

As a result of improvements in data compression, the GeForce GTX 1080 is able to significantly reduce the amount of data sent per frame. In terms of numbers, the improved compression saves an additional 20% in effective memory bandwidth. In addition to the more than 40% higher memory bandwidth of the GeForce GTX 1080 compared to the GTX 980 from using GDDR5X memory, all this together gives about a 70% increase in effective memory bandwidth compared to the previous generation model.
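The arithmetic behind these percentages is straightforward (using 7 GHz effective GDDR5 on a 256-bit bus for the GTX 980):

```python
# Raw bandwidth = effective transfer rate per pin * bus width in bytes.

def bandwidth_gbs(effective_gbps_per_pin, bus_bits):
    return effective_gbps_per_pin * bus_bits / 8

gtx980  = bandwidth_gbs(7,  256)    # 224 GB/s, GDDR5
gtx1080 = bandwidth_gbs(10, 256)    # 320 GB/s, GDDR5X

raw_gain       = gtx1080 / gtx980   # ~1.43x from GDDR5X alone
effective_gain = raw_gain * 1.20    # plus ~20% from better compression
print(f"raw: {raw_gain:.2f}x, effective: {effective_gain:.2f}x")   # ~1.71x
```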

Async Compute Support

Most modern games perform complex non-graphics calculations alongside rendering. For example, physics simulation does not have to run before or after the graphics work; it can run simultaneously, since the two are independent within a single frame. Other examples are post-processing of already rendered frames and audio processing, which can likewise run in parallel with rendering.

Another prime example is the Asynchronous Time Warp technique used in VR systems, which adjusts an already rendered frame to the player's latest head position just before it is displayed, interrupting the rendering of the next frame to do so. Loading the GPU asynchronously in this way increases the utilization of its execution units.

Such workloads create two new usage scenarios for the GPU. The first is overlapping workloads: many types of tasks do not fully utilize the GPU, leaving some resources idle. In those cases, two different tasks can simply run on one GPU and share its execution units for better utilization - for example, PhysX effects executed alongside the 3D rendering of a frame.

To improve the performance of this scenario, dynamic load balancing has been introduced to the Pascal architecture. In the previous Maxwell architecture, overlapping workloads were implemented in the form of static allocation of GPU resources to graphics and compute. This approach is effective provided that the balance between the two workloads roughly corresponds to the division of resources and the tasks are completed at the same time. If non-graphical calculations take longer than graphical ones, and both are waiting for the completion of the common work, then part of the GPU will be idle for the remaining time, which will cause a decrease in overall performance and nullify all the benefits. Hardware dynamic load balancing allows the freed up GPU resources to be used as soon as they become available - here's an illustration for understanding.

There are also tasks that are critical to the execution time, and this is the second scenario for asynchronous computation. For example, the execution of the asynchronous time warping algorithm in VR must complete before the scan (scan out) or the frame will be discarded. In this case, the GPU must support very fast interruption of the task and switch to another in order to remove a less critical task from execution on the GPU, freeing up its resources for critical tasks - this is called preemption.

A single render command from the game engine can contain hundreds of draw calls, each draw call in turn contains hundreds of processed triangles, each containing hundreds of pixels to be calculated and drawn. The traditional GPU approach uses only high-level task interruption, and the graphics pipeline has to wait for all this work to complete before switching the task, resulting in very high latency.

To fix this, the Pascal architecture was the first to introduce the ability to interrupt the task at the pixel level - Pixel Level Preemption. Pascal GPU execution units can continuously monitor the progress of rendering tasks, and when an interrupt is requested, they can stop execution, saving the context for later completion by quickly switching to another task.

Interrupting and switching at the thread level for computational operations works similarly to interrupting at the pixel level for graphics computation. Computational workloads consist of several grids, each of which contains many threads. When an interrupt request is received, threads running on the multiprocessor end execution. Other blocks save their own state to continue from the same point in the future, and the GPU switches to another task. The entire task switching process takes less than 100 microseconds after the running threads have finished.

For gaming workloads, the combination of pixel-level interrupts for graphics and thread-level interrupts for compute tasks gives Pascal GPUs the ability to quickly switch between tasks with minimal waste of time. And for computational tasks on CUDA, it is also possible to interrupt with minimal granularity - at the instruction level. In this mode, all threads stop execution immediately, immediately switching to another task. This approach requires storing more information about the state of all registers of each thread, but in some cases of non-graphical calculations, it is quite justified.

Fast interruption and task switching for graphics and compute were added to the Pascal architecture so that workloads can be preempted at a much finer granularity than the whole-task preemption of Maxwell and Kepler. These technologies improve the asynchronous execution of different GPU workloads and improve responsiveness when several tasks run at once. At its event, Nvidia demonstrated asynchronous compute using physics effects as an example: without asynchronous compute, performance sat at 77-79 FPS; with it enabled, the frame rate rose to 93-94 FPS.

We have already mentioned one use of this functionality in games: asynchronous time warp in VR. The illustration shows how the technique works with preemption and fast interruption. In the first case, the asynchronous time warp is performed as late as possible, but still before the display update begins. Without fast interruption, however, the work has to be submitted to the GPU a few milliseconds earlier, since there is no way to start it at exactly the right moment, and the GPU sits idle for some time.

With precise interruption at the pixel and thread level (on the right in the illustration), the moment of interruption can be determined much more accurately, so the asynchronous time warp can be launched much later with confidence that it will finish before the display refresh. The GPU time that was wasted in the first case can then be filled with additional graphics work.

Simultaneous Multi-Projection technology

The new GP104 GPU adds support for Simultaneous Multi-Projection (SMP) technology, which lets the GPU render more efficiently for modern display systems. SMP allows the chip to output geometry into several projections at once, which required a new hardware block in the PolyMorph engine at the end of the geometry pipeline, just before the rasterizer. This block handles multiple projections for a single geometry stream.

The multi-projection engine processes geometry simultaneously for 16 pre-configured projections sharing a projection center (camera); each projection can be independently rotated or tilted. Since each geometry primitive may appear in several projections at once, the SMP engine lets an application instruct the GPU to replicate geometry up to 32 times (16 projections for each of two projection centers) without additional processing.

The entire process is hardware-accelerated, and since multi-projection works after the geometry engine, the stages of geometry processing do not have to be repeated several times. The saved resources matter when rendering speed is limited by geometry processing performance, such as tessellation, where the same geometry work would otherwise be performed for each projection. At its peak, multi-projection can reduce the geometry workload by up to 32 times.
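The 32x figure follows directly from the limits quoted above:

```python
# Maximum geometry replication with SMP in a single geometry pass.

projections_per_center = 16
projection_centers = 2          # e.g. one per eye in stereo rendering
max_replicas = projections_per_center * projection_centers
print(max_replicas)             # 32 copies of each primitive per pass
```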

But why is all this necessary? There are some good examples where multi-projection technology can be useful. For example, a multi-monitor system of three displays installed at an angle to each other close enough to the user (surround configuration). In a typical situation, the scene is rendered in one projection, which leads to geometric distortion and incorrect rendering of the geometry. The correct way is to have three different projections for each of the monitors, according to the angle at which they are positioned.

With the help of a video card on a chip with Pascal architecture, this can be done in one geometry pass, specifying three different projections, each for its own monitor. And the user, thus, will be able to change the angle at which the monitors are located to each other, not only physically, but also virtually - by rotating the projections for the side monitors in order to get the correct perspective in the 3D scene with a noticeably wider viewing angle (FOV). However, there is a limitation - for such support, the application must be able to render a scene with a wide FOV and use special SMP API calls to set it. That is, you can't do that in every game, you need special support.

Anyway, the days of one projection onto a single flat panel monitor are gone, now there are many multi-monitor configurations and curved displays that can also use this technology. Not to mention virtual reality systems that use special lenses between the screens and the user's eyes, which requires new techniques for projecting a 3D image into a 2D image. Many of these technologies and techniques are still early in development, the main thing is that older GPUs cannot effectively use more than one plane projection. They require multiple render passes, multiple processing of the same geometry, etc.

Maxwell chips had limited multi-projection support intended to improve efficiency, but Pascal's SMP can do much more. Maxwell could rotate a projection by 90 degrees for cube mapping or render projections at different resolutions, but this was only useful in a limited number of applications such as VXGI.

Other possibilities of using SMP include rendering with different resolutions and single-pass stereo rendering. For example, Multi-Res Shading can be used in games to optimize performance. When applied, a higher resolution is used in the center of the frame, and at the periphery it is reduced to obtain a higher rendering speed.

Single-pass stereo rendering, already added to VRWorks, uses the multi-projection capability to reduce the geometry work needed for VR. With it, the GeForce GTX 1080 processes the scene geometry only once, generating the two projections, one for each eye, in a single pass, which halves the geometry load on the GPU and also reduces driver and OS overhead.

An even more advanced technique for improving VR rendering efficiency is Lens Matched Shading, which uses multiple projections to approximate the geometric distortion required for VR output. The method renders the 3D scene onto a surface that roughly matches the lens-corrected image seen in a VR headset, avoiding many pixels on the periphery that would otherwise be rendered and then discarded. The idea is easiest to grasp from the illustration: instead of one projection, four slightly rotated projections are used in front of each eye (Pascal can use up to 16 projections per eye for a more accurate imitation of the curved lens):

This approach saves a lot of performance. A typical Oculus Rift image is 1.1 megapixels per eye, but because of the projection mismatch a 2.1-megapixel image is rendered to produce it - 86% more than necessary, according to Nvidia. Multi-projection in the Pascal architecture allows the rendered image to be reduced to 1.4 megapixels, a 1.5x saving in pixel processing, and it saves memory bandwidth as well.
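The savings quoted above reduce to simple arithmetic on the per-eye pixel counts:

```python
# Per-eye figures from the Oculus Rift example above.

naive_mp = 2.1   # megapixels rendered per eye with one planar projection
lms_mp   = 1.4   # megapixels per eye with Lens Matched Shading

print(f"pixel work saved: {naive_mp / lms_mp:.1f}x")   # 1.5x per eye
```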

And together with a two-fold savings in geometry processing speed due to single-pass stereo rendering, the graphics processor of the GeForce GTX 1080 video card is able to provide a significant increase in the performance of VR rendering, which is very demanding on the speed of geometry processing, and even more so on pixel processing.

Improvements to display output and video processing units

In addition to performance and new functionality related to 3D rendering, the image output and video decoding and encoding capabilities also need to be maintained at a good level. And the first GPU of the Pascal architecture did not disappoint - it supports all modern standards in this sense, including the hardware decoding of the HEVC format required for viewing 4K videos on a PC. Also, future owners of GeForce GTX 1080 graphics cards will soon be able to enjoy streaming 4K video from Netflix and other providers on their systems.

In terms of display output, the GeForce GTX 1080 has HDMI 2.0b support with HDCP 2.2 as well as DisplayPort. So far, DP 1.2 has been certified, but the GPU is ready for certification for newer versions of the standard: DP 1.3 Ready and DP 1.4 Ready. The latter allows you to display images on 4K screens at 120 Hz, and on 5K and 8K displays at 60 Hz using a pair of DisplayPort 1.3 cables. If for the GTX 980 the maximum supported resolution was 5120 × 3200 at 60 Hz, then for the new GTX 1080 it increased to 7680 × 4320 at the same 60 Hz. The reference GeForce GTX 1080 has three DisplayPort outputs, one HDMI 2.0b and one digital Dual-Link DVI.

The new Nvidia card also received an improved video decoding and encoding unit. For example, the GP104 chip meets the strict PlayReady 3.0 (SL3000) requirements for streaming video, ensuring that high-quality content from major providers such as Netflix plays back with the highest quality and energy efficiency. Details of the supported video formats for encoding and decoding are given in the table; the new product clearly improves on previous solutions:

An even more interesting addition is support for high dynamic range (HDR) displays, which are about to become widespread. HDR TVs are already on sale in 2016 (with some four million expected to be sold within a year), and monitors are due next year. HDR is the biggest breakthrough in display technology in years: roughly double the number of reproducible color tones (75% of the visible spectrum versus 33% for standard RGB), brighter panels (1000 nits), higher contrast (10,000:1), and richer colors.

The emergence of the ability to reproduce content with a greater difference in brightness and richer and more saturated colors will bring the image on the screen closer to reality, the black color will become deeper, and the bright light will dazzle, as in the real world. Consequently, users will see more detail in the bright and dark areas of images compared to standard monitors and TVs.

To support HDR displays, the GeForce GTX 1080 has everything it needs: 12-bit color output, support for the BT.2020 and SMPTE 2084 standards, and 10/12-bit HDR output over HDMI 2.0b at 4K resolution, which Maxwell already had. On top of that, Pascal adds decoding of HEVC at 4K 60 Hz with 10- or 12-bit color, as used for HDR video, as well as encoding of the same format with the same parameters, but only at 10-bit, for HDR video recording or streaming. The new chip is also ready for the upcoming DisplayPort 1.4 standard for carrying HDR data over that connector.

By the way, HDR video encoding may be needed in the future in order to transfer such data from a home PC to a SHIELD game console that can play 10-bit HEVC. That is, the user will be able to stream the game from a PC in HDR format. Stop, where can I get games with such support? Nvidia is constantly working with game developers to implement this support, providing them with everything they need (driver support, code examples, etc.) to render HDR images correctly, compatible with existing displays.

At the time of the GeForce GTX 1080 launch, HDR output is supported by games such as Obduction, The Witness, Lawbreakers, Rise of the Tomb Raider, Paragon, The Talos Principle and Shadow Warrior 2, and this list is expected to grow in the near future.

Changes to multi-chip SLI rendering

There were also some changes to the proprietary SLI multi-GPU rendering technology, which few people expected. PC gaming enthusiasts use SLI either to push performance to extreme levels by pairing the most powerful single-chip cards, or to reach very high frame rates with a couple of mid-range cards that together sometimes cost less than one top-end model (a controversial choice, but people make it). With 4K monitors, players have little option but to install a pair of cards, since even top models often cannot deliver comfortable frame rates at maximum settings in those conditions.

One of the important components of Nvidia SLI are bridges that connect video cards into a common video subsystem and serve to organize a digital channel for transferring data between them. GeForce video cards traditionally have dual SLI connectors, which served to connect between two or four video cards in 3-Way and 4-Way SLI configurations. Each of the video cards had to be connected to each, since all the GPUs sent the frames they rendered to the main GPU, so two interfaces were needed on each of the cards.

Starting with the GeForce GTX 1080, all Pascal-based Nvidia graphics cards can use both SLI links together to improve the transfer rate between cards, and this new dual-link SLI mode improves performance and smoothness when outputting to very high-resolution displays or multi-monitor systems.

This mode also required new bridges, called SLI HB. They connect a pair of GeForce GTX 1080 cards over both SLI channels at once, though the new cards remain compatible with the old bridges. At 1920 × 1080 and 2560 × 1440 at 60 Hz, standard bridges are sufficient; in more demanding modes (4K, 5K and multi-monitor systems), only the new bridges deliver the best frame pacing, although the old ones will still work, just a little worse.

In addition, with SLI HB bridges the GeForce GTX 1080 data interface runs at 650 MHz, compared with 400 MHz for conventional SLI bridges on older GPUs, and even some of the older rigid bridges can run at the higher transfer frequency with Pascal chips. With the doubled SLI interface and the increased operating frequency raising the transfer rate between GPUs, frames are delivered to the screen more smoothly than with previous solutions:

It should also be noted that support for multi-GPU rendering in DirectX 12 is somewhat different from what was customary before. In the latest version of the graphics API, Microsoft has made many changes related to the operation of such video systems. For software developers, DX12 has two options for using multiple GPUs: Multi Display Adapter (MDA) and Linked Display Adapter (LDA) modes.

LDA mode itself has two forms: Implicit LDA (which Nvidia uses for SLI) and Explicit LDA (in which the game developer takes over the management of multi-GPU rendering). The MDA and Explicit LDA modes were added to DirectX 12 precisely to give game developers more freedom and flexibility when working with multi-GPU systems. The difference between the modes is clear from the following table:

In LDA mode, the memory of each GPU can be linked to the memory of another and displayed as a large total volume, of course, with all the performance limitations when data is retrieved from the “foreign” memory. In MDA mode, the memory of each GPU works separately, and different GPUs cannot directly access data from the memory of another GPU. LDA mode is designed for multi-chip systems of similar performance, while MDA mode has fewer restrictions, and discrete and integrated GPUs or discrete solutions with chips from different manufacturers can work together. But this mode also requires more attention and work from developers when programming collaboration in order for GPUs to communicate with each other.

By default, an SLI system based on GeForce GTX 1080 cards supports only two GPUs; three- and four-GPU configurations are officially discouraged, because in modern games it is increasingly difficult to get performance gains from a third and fourth GPU. Many games become CPU-limited when driving multi-GPU systems, and new games increasingly rely on temporal techniques that reuse data from previous frames, with which efficient operation of several GPUs is simply impossible.

Other (non-SLI) multi-GPU configurations remain possible, however, such as the MDA or Explicit LDA modes in DirectX 12, or a two-card SLI system with a dedicated third GPU for PhysX effects. And what about benchmark records - is Nvidia really abandoning them altogether? Of course not, but since such systems are in demand among only a handful of users worldwide, a special Enthusiast Key was created for these ultra-enthusiasts; it can be downloaded from the Nvidia website and unlocks the feature. To get it, you first obtain a unique GPU ID by running a special application, then request the Enthusiast Key on the website and, after downloading it, install the key on the system, thereby unlocking 3-Way and 4-Way SLI configurations.

Fast Sync technology

Some changes have taken place in synchronization technologies when displaying information on the display. Looking ahead, there is nothing new in G-Sync, and Adaptive Sync technology is not supported either. But Nvidia has decided to improve the smoothness of the output and sync for games that show very high performance when the frame rate is noticeably higher than the monitor's refresh rate. This is especially important for games that require minimal latency and responsiveness, and in which multiplayer battles and competitions take place.

Fast Sync is a new vertical sync alternative that does not have the visual artifacts of picture tearing in the image and is not tied to a fixed refresh rate, which increases latency. What's the problem with vertical sync in games like Counter-Strike: Global Offensive? This game runs on powerful modern GPUs at several hundred frames per second, and the player has a choice: to enable vertical sync or not.

In multiplayer games, users most often chase the minimum latency and disable VSync, getting clearly visible tears in the image, which are extremely unpleasant even at high frame rates. If you enable vertical sync, then the player will experience a significant increase in delays between his actions and the image on the screen when the graphics pipeline slows down to the refresh rate of the monitor.

That is how the traditional pipeline works. With Fast Sync, Nvidia decided to decouple rendering from display. This lets the part of the GPU that renders frames keep working at full speed, storing those frames in a special temporary Last Rendered Buffer.

This method allows you to change the display method and take the best from VSync On and VSync Off modes, getting low latency, but without image artifacts. With Fast Sync, there is no frame flow control, the game engine runs in off-sync mode and is not told to wait until the next one is drawn, so the latency is almost as low as in VSync Off mode. But since Fast Sync independently selects the buffer for displaying and displays the entire frame, there are no picture breaks either.

Fast Sync uses three different buffers, the first two of which work similarly to double buffering in a classic pipeline. The primary buffer (Front Buffer - FB) is a buffer from which information is displayed on the display, a fully rendered frame. The Back Buffer (BB) is the buffer into which information is received during rendering.

When using vertical sync in a high frame rate environment, the game waits for the refresh interval to be reached in order to swap the primary buffer with the secondary buffer to display the entire frame on the screen. This slows down the process, and adding additional buffers like traditional triple buffering will only add latency.

Fast Sync adds a third Last Rendered Buffer (LRB) that is used to hold all the frames just rendered in the back buffer. The name of the buffer speaks for itself, it contains a copy of the last fully rendered frame. And when the moment comes to update the primary buffer, this LRB buffer is copied to the primary as a whole, and not in parts, as from the secondary when vertical sync is disabled. Since copying information from buffers is ineffective, they simply swap (or rename, as it is more convenient to understand), and the new logic for swapping buffers introduced in GP104 controls this process.
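As a rough outline of this buffer juggling, here is a minimal Python sketch of the three-buffer scheme; the class and method names are invented for illustration, and the real buffer-swap logic in GP104 is certainly more involved.

```python
# Sketch of Fast Sync's three buffers: the renderer fills the back buffer at
# full speed, each finished frame becomes the Last Rendered Buffer (LRB), and
# at every display refresh the LRB is swapped with the front buffer. Buffers
# are never copied, only their roles change, and the renderer never waits.

class FastSyncBuffers:
    def __init__(self):
        self.front, self.back, self.lrb = "frame0", "frame1", "frame2"

    def frame_finished(self):
        # A frame was completed in the back buffer: it becomes the LRB,
        # and the old LRB is reused for the next frame being drawn.
        self.lrb, self.back = self.back, self.lrb

    def display_refresh(self):
        # At scan-out time the most recently completed frame is shown whole.
        self.front, self.lrb = self.lrb, self.front
        return self.front

buffers = FastSyncBuffers()
buffers.frame_finished()          # GPU keeps rendering, unthrottled
buffers.frame_finished()
print(buffers.display_refresh())  # the monitor always gets a complete frame
```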

In practice, enabling Fast Sync still adds a little latency compared with vertical sync fully disabled - about 8 ms more on average - but it delivers whole frames to the monitor, with no tearing artifacts on screen. The new method can be enabled in the vertical sync section of the Nvidia Control Panel. The default remains application-controlled, and there is no need to enable Fast Sync in every 3D application; it is best reserved specifically for games with very high FPS.

Nvidia VRWorks virtual reality technologies

We have already touched on the hot topic of virtual reality more than once in this article, but mostly in terms of raising frame rates and keeping latency low, both of which are critical for VR. All of this matters and real progress is being made, yet VR games still look far less impressive than the best "regular" modern 3D games. That is partly because leading game developers are not yet investing heavily in VR titles, and partly because VR's steep frame rate requirements rule out many of the usual rendering techniques.

To narrow the quality gap between VR games and regular ones, Nvidia decided to release a whole package of related technologies, VRWorks, which includes a large number of APIs, libraries, engines and techniques that can significantly improve both the quality and the performance of VR applications. How does this relate to the announcement of the first gaming solution on Pascal? Very simply: some of these technologies are built into it to raise performance and quality, and we have already written about them.

And although it is not only about graphics, we will first tell you a little about it. The set of VRWorks Graphics technologies includes the previously mentioned technologies, such as Lens Matched Shading, which use the multiprojection feature that appeared in the GeForce GTX 1080. The new product allows you to get a performance increase of 1.5-2 times in relation to solutions that do not have such support. We also mentioned other technologies, such as MultiRes Shading, which is designed to render at different resolutions in the center of the frame and at its periphery.

But much more unexpected was the announcement of VRWorks Audio technology, designed for high-quality processing of audio data in 3D scenes, which is especially important in virtual reality systems. In conventional engines, the positioning of sound sources in a virtual environment is calculated quite correctly, if the enemy shoots from the right, then the sound is heard louder from this side of the audio system, and this calculation is not too demanding on computing power.

But in reality, sounds go not only to the player, but in all directions and are reflected from various materials, similar to how rays of light are reflected. And in reality, we hear these reflections, although not as clearly as direct sound waves. These indirect sound reflections are usually simulated by special reverb effects, but this is a very primitive approach to the task.

VRWorks Audio uses rendering of sound waves in a similar way to ray tracing in rendering, where the path of light rays is traced to several reflections from objects in the virtual scene. VRWorks Audio also simulates the propagation of sound waves in the environment, where direct and reflected waves are tracked, depending on the angle of incidence and the properties of reflective materials. In its work, VRWorks Audio uses the high-performance ray tracing engine Nvidia OptiX, known for graphics tasks. OptiX can be used for a variety of tasks such as calculating indirect illumination and preparing lightmaps, and now for tracing sound waves in VRWorks Audio.

Nvidia has built accurate sound wave simulation into its VR Funhouse demo, which traces several thousand rays and calculates up to 12 reflections from objects. To see the advantages of the technology in a clear example, you can watch a video of it in action (in Russian):

It is important that Nvidia's approach differs from traditional sound engines, including the hardware accelerated method using a special block in the GPU from the main competitor. All these methods provide only accurate positioning of sound sources, but do not calculate the reflections of sound waves from objects in a 3D scene, although they can simulate this using the reverberation effect. And yet, using ray tracing technology can be much more realistic, since only this approach will provide an accurate imitation of various sounds, taking into account the size, shape and materials of objects in the scene. It is difficult to say whether such accuracy of calculations is required for a typical player, but we can say for sure: in VR, it can add to users the very realism that is still lacking in regular games.

Well, we just have to tell you about the VR SLI technology, which works in both OpenGL and DirectX. Its principle is extremely simple: a dual-processor video system in a VR application will work in such a way that a separate GPU is allocated to each eye, in contrast to AFR rendering, which is usual for SLI configurations. This significantly improves the overall performance so important for virtual reality systems. In theory, you can use more GPUs, but their number should be even.

This approach was required because AFR is not well suited for VR, since with its help the first GPU will draw an even frame for both eyes, and the second one - an odd one, which does not in any way reduce the latencies that are critical for virtual reality systems. Although the frame rate will be quite high. So with the help of VR SLI, work on each frame is divided into two GPUs - one works on a part of the frame for the left eye, the other for the right, and then these halves of the frame are combined into a whole.

This sharing of work between a pair of GPUs brings nearly 2x the performance gain, allowing for higher frame rates and lower latency than single-GPU systems. However, the use of VR SLI requires special support from the application in order to use this scaling method. But VR SLI technology is already built into VR demo applications such as The Lab from Valve and Trials on Tatooine from ILMxLAB, and this is just the beginning - Nvidia promises the imminent appearance of other applications, as well as the introduction of technology into game engines Unreal Engine 4, Unity and MaxPlay.

Ansel Game Screenshot Platform

One of the most interesting software announcements was a technology for capturing high-quality screenshots in games, named Ansel after the famous photographer Ansel Adams. Games have long since become more than just games; they are also an outlet for all kinds of creative people. Some modify game scripts, some release high-quality texture packs, and some make beautiful screenshots.

Nvidia decided to help the latter by introducing a new platform for creating (precisely creating, because it's not such a simple process) high-quality pictures from games. They believe Ansel can help create a new kind of contemporary art. After all, there are already quite a few artists who spend most of their lives on the PC, creating beautiful screenshots from games, and they still did not have a convenient tool for this.

Ansel allows you not only to capture the image in the game, but to change it the way the creator needs it. With this technology, you can move the camera around the scene, rotate and tilt it in any direction in order to obtain the desired composition of the frame. For example, in games such as first-person shooters, you can only move the player, you can't change anything else, so all the screenshots are pretty monotonous. With a free camera in Ansel, you can go far beyond the game camera, choosing the angle that is needed for a successful picture, or even capture a full 360-degree stereo picture from the required point, and in high resolution for later viewing in a VR headset.

Ansel works quite simply: the platform is embedded in the game code via a special Nvidia library. The developer only needs to add a small piece of code to the project to allow the Nvidia video driver to intercept buffer and shader data. There is very little work involved - integrating Ansel into a game takes less than a day. Adding the feature to The Witness took about 40 lines of code, and to The Witcher 3 about 150.

Ansel will ship with an open SDK. The key point is that the user gets a standard set of controls for changing the camera position and angle, adding effects, and so on, and can save the result as a regular screenshot, a 360-degree image, a stereo pair, or simply a panorama of enormous resolution.

The only caveat: not all games will receive support for all the features of the Ansel game screenshot platform. Some of the game developers, for one reason or another, do not want to include a completely free camera in their games - for example, because of the possibility of using this functionality by cheaters. Or they want to restrict the change in viewing angle for the same reason - so that no one gets an unfair advantage. Well, or so that users do not see the wretched sprites in the background. All these are quite normal desires of game creators.

One of the most interesting features of Ansel is the creation of screenshots at enormous resolutions. It does not matter if the game only supports resolutions up to, say, 4K and the user's monitor is Full HD: the screenshot platform can capture a far higher-resolution image, limited mainly by the capacity and speed of the drive. The platform easily captures screenshots of up to 4.5 gigapixels, stitched together from 3600 pieces!

It is clear that in such pictures you can see all the details, right down to the text on newspapers lying in the distance, if such a level of detail is, in principle, provided in the game - Ansel can control the level of detail, setting the maximum level to get the best picture quality. But you can also enable supersampling. All this allows you to create images from games that you can safely print on large banners and be calm about their quality.

Interestingly, a special CUDA-based hardware accelerated code is used to stitch together large images. After all, no video card can render a multi-gigapixel image as a whole, but it can do it in pieces, which you just need to combine later, taking into account the possible difference in lighting, color, and so on.
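Some quick arithmetic shows why the tiled approach is practical:

```python
# Rough numbers for the tiled capture described above: a 4.5-gigapixel
# screenshot stitched from 3600 individually rendered pieces.

total_pixels = 4.5e9
tiles = 3600
per_tile_mp = total_pixels / tiles / 1e6
raw_gb_rgba8 = total_pixels * 4 / 1e9   # 4 bytes per pixel, uncompressed

print(f"{per_tile_mp:.2f} MP per tile")   # ~1.25 MP, an easy render target
print(f"~{raw_gb_rgba8:.0f} GB of raw RGBA8 data before compression")
```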

After such a panorama is stitched, special post-processing is applied to the whole frame, also GPU-accelerated. For capturing high-dynamic-range images, a special image format can be used: EXR, an open standard from Industrial Light and Magic, in which the color values of each channel are stored as 16-bit floating point (FP16).

This format allows you to change the brightness and dynamic range of the image by post-processing, bringing it to the desired one for each specific display, in the same way as it is done with RAW formats from cameras. And for the subsequent application of post-processing filters in image processing programs, this format is very useful, since it contains much more data than the usual formats for images.

But the Ansel platform itself contains many filters for post-processing, which is especially important because it has access not only to the final image, but also to all the buffers used by the game for rendering, which can be used for very interesting effects, such as depth of field. To do this, Ansel has a dedicated API for post-processing, and any of the effects can be included in a game with support for this platform.

Ansel's post-processing filters include: color curves, color space, transform, desaturation, brightness/contrast, film grain, bloom, lens flare, anamorphic glare, distortion, heat haze, fisheye, chromatic aberration, tone mapping, lens dirt, light shafts, vignette, gamma correction, convolution, sharpening, edge detection, blur, sepia, denoise, FXAA and others.

As for the appearance of Ansel support in games, then you will have to wait a little while the developers implement it and test it. But Nvidia promises such support will soon appear in such famous games as The Division, The Witness, Lawbreakers, The Witcher 3, Paragon, Fortnite, Obduction, No Man's Sky, Unreal Tournament and others.

The new 16 nm FinFET process and architectural optimizations allow the GeForce GTX 1080, based on the GP104 GPU, to reach high clock speeds of 1.6-1.7 GHz even in reference form, while the new generation of GPU Boost technology keeps it running at the highest possible frequencies in games. Together with the increased number of execution units, these improvements make the new product not only the highest-performing single-chip video card ever, but also the most energy-efficient solution on the market.

The GeForce GTX 1080 is the first graphics card to feature a new type of graphics memory, GDDR5X, a new generation of high-speed chips that achieve very high data rates. In the case of the GeForce GTX 1080 modification, this type of memory operates at an effective frequency of 10 GHz. Combined with improved algorithms for compressing information in the framebuffer, this has led to a 1.7x increase in effective memory bandwidth for this GPU, compared to its direct predecessor, the GeForce GTX 980.

Nvidia wisely decided not to release a radically new architecture on a completely new technical process for itself, so as not to face unnecessary problems during development and production. Instead, they have seriously improved on the already good and highly efficient Maxwell architecture, adding some features. As a result, everything is fine with the production of new GPUs, and in the case of the GeForce GTX 1080 model, engineers have achieved a very high frequency potential - in overclocked versions from partners, the GPU frequency is expected up to 2 GHz! This impressive frequency has become a reality thanks to the perfect technical process and the painstaking work of Nvidia engineers in the development of the Pascal GPU.

Although Pascal is a direct successor to Maxwell and the two graphics architectures are fundamentally not that different, Nvidia has introduced many changes and improvements: to display capabilities, to the video encoding and decoding engine, to the asynchronous execution of different kinds of GPU workloads, to multi-GPU rendering, and with the new Fast Sync synchronization method.

The Simultaneous Multi-Projection technology deserves special mention: it improves performance in virtual reality systems, renders scenes more correctly on multi-monitor setups, and opens the door to new performance optimization techniques. The biggest speedup, though, will go to VR applications that support multi-projection, which cuts the geometry workload in half and the pixel workload by a factor of 1.5.

Among the purely software changes, the platform for creating screenshots in games called Ansel stands out - it will be interesting to try it in practice not only for many players, but also for those simply interested in high-quality 3D graphics. The novelty allows you to take the art of creating and retouching screenshots to the next level. Well, Nvidia simply continues to improve such packages for game developers as GameWorks and VRWorks step by step - so, in the latter, an interesting possibility of high-quality sound processing has appeared, taking into account numerous reflections of sound waves using hardware ray tracing.

Overall, a true leader has entered the market in the form of the Nvidia GeForce GTX 1080, with everything needed for the role: high performance, broad functionality, and support for new features and algorithms. Early buyers will appreciate many of these advantages immediately, while other capabilities will reveal themselves a little later, once software support becomes widespread. The main thing is that the GeForce GTX 1080 has turned out very fast and efficient, and we sincerely hope that Nvidia's engineers have managed to fix some of the previous problem areas, such as asynchronous compute.

Graphics Accelerator GeForce GTX 1070

Parameter: Value
Chip codename: GP104
Production technology: 16 nm FinFET
Number of transistors: 7.2 billion
Core area: 314 mm²
Architecture: Unified, with an array of common processors for streaming processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 256-bit, eight independent 32-bit memory controllers supporting GDDR5 and GDDR5X memory
GPU frequency: 1506 (1683) MHz
Computing units: 15 active (out of 20 in the chip) streaming multiprocessors, including 1920 (out of 2560) scalar ALUs for floating point calculations within the IEEE 754-2008 standard
Texturing units: 120 active (out of 160 in the chip) texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Raster operation units (ROPs): 8 wide ROP blocks (64 pixels) with support for various anti-aliasing modes, including programmable ones with an FP16 or FP32 framebuffer format; the blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: Integrated support for up to four monitors over Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1070 Reference Graphics Card Specifications
Parameter: Value
Core frequency: 1506 (1683) MHz
Number of universal processors: 1920
Number of texture units: 120
Number of blending units: 64
Effective memory frequency: 8000 (4 × 2000) MHz
Memory type: GDDR5
Memory bus: 256-bit
Memory size: 8 GB
Memory bandwidth: 256 GB/s
Computational performance (FP32): about 6.5 teraflops
Theoretical maximum fill rate: 96 gigapixels/s
Theoretical texture sampling rate: 181 gigatexels/s
Bus interface: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Power consumption: up to 150 W
Supplementary power: one 8-pin connector
Number of expansion slots occupied: 2
Recommended price: $379-449 (USA), 34,990 (Russia)

The GeForce GTX 1070 also received a logical name, echoing the corresponding model in the previous GeForce series: it differs from its direct predecessor, the GeForce GTX 970, only in the generation digit. In the company's current lineup, the newcomer sits one step below the top-end GeForce GTX 1080, which serves as the temporary flagship of the new series until GPUs of even greater power are released.

Recommended prices for the new Nvidia graphics card are $379 for regular partner versions and $449 for the special Founders Edition. Compared with the top model, that is a very good price considering that the GTX 1070 trails it by about 25% in the worst case, and at the time of its announcement and release the GTX 1070 is the best-performing solution in its class. Like the GeForce GTX 1080, the GTX 1070 has no direct competitor from AMD and can only be compared with the Radeon R9 390X and Fury.

For the GeForce GTX 1070 version of GP104, Nvidia decided to keep the full 256-bit memory bus, though paired not with the new GDDR5X but with very fast GDDR5 running at a high effective frequency of 8 GHz. On such a bus the card could carry either 4 or 8 GB of memory; to ensure maximum performance at high settings and rendering resolutions, the GeForce GTX 1070 was equipped with 8 GB of video memory, like its older sibling. That amount is enough to run any 3D application at maximum quality settings for several years.

Special Edition GeForce GTX 1070 Founders Edition

When the GeForce GTX 1080 was announced in early May, a special edition of the graphics card called Founders Edition was announced, which has a higher price than the usual graphics cards of the company's partners. The same applies to the new product. In this article, we will again talk about a special edition of the GeForce GTX 1070 graphics card called Founders Edition. As in the case of the older model, Nvidia decided to release this version of the manufacturer's reference video card at a higher price. They argue that many gamers and enthusiasts buying high-end graphics cards want a product with a premium look and feel.

It is for these users that the GeForce GTX 1070 Founders Edition is being brought to market. It is designed and built by Nvidia engineers from premium materials and components, such as the aluminum shroud of the cooler and a low-profile backplate covering the rear of the PCB, a feature quite popular with enthusiasts.

As you can see from the photos of the board, the GeForce GTX 1070 Founders Edition inherits exactly the same industrial design inherent in the reference version of the GeForce GTX 1080 Founders Edition. Both models use a radial fan that blows heated air outward, which is very useful in both small cases and multi-chip SLI configurations with limited physical space. By blowing heated air outside instead of circulating it inside the chassis, it can reduce thermal stress, improve overclocking results, and extend the life of system components.

Under the cover of the GeForce GTX 1070 reference cooling system, there is a specially shaped aluminum heatsink with three integrated copper heat pipes that remove heat from the GPU itself. The heat dissipated by the heat pipes is then dissipated by an aluminum heatsink. Well, the low-profile metal plate on the back of the board is also designed to provide better thermal performance. It also has a retractable section for better airflow between multiple graphics cards in SLI configurations.

As for power delivery, the GeForce GTX 1070 Founders Edition uses a four-phase power system optimized for stable supply. Nvidia says it offers better power efficiency, stability and reliability than the GeForce GTX 970, for better overclocking results. In the company's own tests, GeForce GTX 1070 GPUs easily exceeded 1.9 GHz, close to the results of the older GTX 1080 model.

The Nvidia GeForce GTX 1070 will be available in retail stores starting June 10. The recommended prices of the Founders Edition and partner solutions differ, and that is the biggest question around this special edition. While Nvidia partners will sell their GeForce GTX 1070 cards starting at $379 (in the US market), Nvidia's reference-design Founders Edition will cost $449. Are there many enthusiasts willing to pay extra for the, frankly, dubious advantages of the reference version? Time will tell, but we believe the reference board is most interesting as an option available at the very start of sales; later on, the case for buying it (especially at a premium!) shrinks to nothing.

It remains to add that the printed circuit board of the reference GeForce GTX 1070 is similar to that of the older card, and both differ in layout from the company's previous boards. The typical power consumption of the new product is 150 W, almost 20% lower than that of the GTX 1080 and close to the power consumption of the previous-generation GeForce GTX 970. The reference Nvidia board carries the familiar set of display connectors: one Dual-Link DVI, one HDMI and three DisplayPort, with support for the new HDMI and DisplayPort versions we covered above in the GTX 1080 review.

Architectural changes

The GeForce GTX 1070 graphics card is based on the GP104 chip, the first of the new generation of Nvidia's Pascal graphics architecture. This architecture took as a basis the solutions worked out in Maxwell, but it also has some functional differences, which we wrote about in detail above - in the part devoted to the top-end GeForce GTX 1080 video card.

The main change in the new architecture was the technological process by which all new GPUs will be executed. The use of the 16 nm FinFET process technology in the production of GP104 made it possible to significantly increase the complexity of the chip while maintaining a relatively low area and cost, and the very first chip of the Pascal architecture has a noticeably larger number of execution units, including those providing new functionality, in comparison with Maxwell chips of similar positioning.

The GP104 chip is structurally similar to the corresponding Maxwell solutions, and detailed information about the internals of modern GPUs can be found in our reviews of previous Nvidia products. Like previous GPUs, chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers, and the GeForce GTX 1070 already differs here: part of the chip is locked and inactive (highlighted in gray in the diagram):

Although the GP104 GPU includes four GPC clusters and 20 SM multiprocessors, the GeForce GTX 1070 uses a cut-down modification with one GPC cluster disabled in hardware. Since each GPC has a dedicated rasterization engine and includes five SMs, and each SM consists of 128 CUDA cores and eight TMU texture units, this version of GP104 has 1920 CUDA cores and 120 TMUs active out of the 2560 stream processors and 160 texture units physically present.
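A quick back-of-the-envelope check of those unit counts, as a minimal sketch (the per-GPC and per-SM figures are the ones quoted above):

```python
# Active unit count for the GTX 1070 variant of GP104 (figures quoted in the text above).
CORES_PER_SM = 128
TMUS_PER_SM = 8
SMS_PER_GPC = 5

def units(gpc_count):
    """Return (CUDA cores, TMUs) for a GP104 with the given number of active GPCs."""
    sms = gpc_count * SMS_PER_GPC
    return sms * CORES_PER_SM, sms * TMUS_PER_SM

print(units(4))  # (2560, 160) -> full GP104 / GTX 1080
print(units(3))  # (1920, 120) -> GTX 1070 with one GPC disabled
```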

The GPU behind the GeForce GTX 1070 contains eight 32-bit memory controllers, giving a 256-bit memory bus in total, exactly as in the older GTX 1080. The memory subsystem was not cut down, so that the GeForce GTX 1070 still has sufficient bandwidth even with GDDR5 memory. Eight ROPs and 256 KB of L2 cache are tied to each memory controller, so this modification of GP104 also contains 64 ROPs and 2048 KB of L2 cache.

Thanks to architectural optimizations and the new process technology, GP104 has become the most energy-efficient GPU to date. Nvidia's engineers were able to raise the clock speed more than they expected when moving to the new process; to do so they had to carefully examine and optimize all the bottlenecks in previous designs that prevented operation at higher frequencies. As a result, the GeForce GTX 1070 also runs at a very high frequency, more than 40% above the reference clock of the GeForce GTX 970.

Since the GeForce GTX 1070 is, in essence, just a slightly slower GTX 1080 with GDDR5 memory, it supports all of the technologies we described in the previous section. For more details on the Pascal architecture and the technologies it supports, such as the improved video output and processing units, Async Compute support, Simultaneous Multi-Projection, the changes to SLI multi-GPU rendering and the new Fast Sync synchronization mode, see the GTX 1080 section.

High-performance GDDR5 memory and its efficient use

We wrote above about the changes in the memory subsystem of the GP104 GPU, on which the GeForce GTX 1080 and GTX 1070 are based: its memory controllers support both the new GDDR5X video memory, described in detail in the GTX 1080 review, and the good old GDDR5 memory that has been with us for several years.

In order not to lose too much memory bandwidth in the junior GTX 1070 compared to the older GTX 1080, all eight 32-bit memory controllers were left active, giving it a full 256-bit video memory interface. In addition, the card was equipped with the fastest GDDR5 memory available on the market, with an effective operating frequency of 8 GHz. This yields 256 GB/s of memory bandwidth, versus 320 GB/s for the older solution; the compute resources were cut by roughly the same proportion, so the balance is maintained.
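For reference, the bandwidth figures follow directly from the effective memory clock and the bus width; a minimal sketch of the arithmetic (values as quoted above):

```python
def peak_bandwidth_gbs(effective_clock_ghz, bus_width_bits):
    """Peak memory bandwidth in GB/s: effective transfer rate (GT/s) times bus width in bytes."""
    return effective_clock_ghz * bus_width_bits / 8

print(peak_bandwidth_gbs(8.0, 256))   # 256.0 GB/s -> GTX 1070, GDDR5 at 8 GHz effective
print(peak_bandwidth_gbs(10.0, 256))  # 320.0 GB/s -> GTX 1080, GDDR5X at 10 GHz effective
```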

Keep in mind that while peak theoretical bandwidth is important for GPU performance, you need to pay attention to how efficiently it is used. During rendering, many different bottlenecks can limit overall performance, preventing all available memory bandwidth from being used. To minimize these bottlenecks, GPUs use special lossless compression to improve the efficiency of read and write operations.

The Pascal architecture introduces the fourth generation of delta compression for buffer data, which allows the GPU to use the available memory bus more efficiently. The memory subsystem in the GeForce GTX 1070 and GTX 1080 uses improved existing and several new lossless data compression techniques designed to reduce bandwidth requirements. This reduces the amount of data written to memory, improves L2 cache efficiency, and reduces the amount of data transferred between different parts of the GPU, such as between the TMUs and the framebuffer.

GPU Boost 3.0 and overclocking features

Most of Nvidia's partners have already announced factory-overclocked solutions based on the GeForce GTX 1080 and GTX 1070, and many video card manufacturers are creating special overclocking utilities that use the new functionality of GPU Boost 3.0. One example is EVGA Precision XOC, which includes an automatic scanner to determine the voltage-to-frequency curve: for each voltage value it runs a stability test and finds a stable frequency at which the GPU delivers a performance gain. This curve can also be edited manually.

We know GPU Boost technology well from previous Nvidia video cards. In its GPUs, Nvidia uses this hardware feature to raise the operating clock speed in modes where the GPU has not yet reached its power and thermal limits. In Pascal GPUs, the algorithm has undergone several changes, the main one being finer control of turbo frequencies as a function of voltage.

If earlier the difference between the base frequency and the turbo frequency was fixed, then in GPU Boost 3.0 it became possible to set the offsets of the turbo frequencies for each voltage separately. The turbo frequency can now be set for each of the individual voltage values, which allows you to fully squeeze all the overclocking capabilities out of the GPU. We wrote about this in detail in our GeForce GTX 1080 review, and you can use the EVGA Precision XOC and MSI Afterburner utilities for this.

Since some details have changed in the overclocking method with the release of video cards with support for GPU Boost 3.0, Nvidia had to make additional explanations in the instructions for overclocking the new products. There are different overclocking techniques with different variable characteristics that affect the final result. For any given system, a specific method may be better suited, but the basis is always about the same.

Many overclockers use the Unigine Heaven 4.0 benchmark to check system stability: it loads the GPU well, has flexible settings, and can be run in windowed mode next to an overclocking and monitoring utility such as EVGA Precision or MSI Afterburner. However, such a check is only enough for an initial estimate; to firmly confirm the stability of an overclock, it must be tested in several games, because different games load the GPU's functional blocks differently: math, texturing, geometry. Heaven 4.0 is also convenient for overclocking because it has a looped mode, in which it is easy to change overclocking settings, and a built-in benchmark for assessing the speed gain.

When overclocking the new GeForce GTX 1080 and GTX 1070, Nvidia advises running the Heaven 4.0 and EVGA Precision XOC windows side by side. First, it makes sense to raise the fan speed right away. For serious overclocking you can set it straight to 100%, which makes the card very loud but cools the GPU and the other components as much as possible, keeping the temperature as low as possible and preventing throttling (a drop in frequencies when the GPU temperature rises above a certain value).

Next, set the Power Target to the maximum as well. This setting will give the GPU as much power as possible by raising the power consumption limit and the GPU Temp Target. For some purposes, the temperature target can be decoupled from the Power Target change and the two settings adjusted separately, for example to keep the video chip cooler.

The next step is to increase the GPU Clock Offset, which determines how much higher the turbo frequency will be during operation. This value raises the frequency for all voltage points and results in better performance. As usual when overclocking, you should check stability while raising the GPU frequency in small steps of 10 to 50 MHz, until you notice a hang, a driver or application error, or visual artifacts. When that limit is reached, step the frequency back down one notch and verify stability and performance once more.

In addition to the GPU frequency, you can also raise the video memory frequency (Memory Clock Offset), which is especially worthwhile on the GeForce GTX 1070 with its GDDR5 memory, which usually overclocks well. The process for the memory frequency exactly repeats what is done to find a stable GPU frequency; the only difference is that the steps can be larger, adding 50-100 MHz to the base frequency at once.
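The procedure described above is essentially a linear search with a stability check at each step. Below is a minimal sketch of that loop; apply_offset and is_stable are hypothetical stand-ins for whatever overclocking utility and stress test are actually used.

```python
def find_stable_offset(apply_offset, is_stable, step_mhz=25, max_offset_mhz=400):
    """Raise the clock offset step by step until instability appears, then back off one step.

    apply_offset(offset_mhz) -- hypothetical hook into an OC tool (core or memory offset)
    is_stable()              -- hypothetical stress test (a game or a Heaven loop), True/False
    """
    offset = 0
    while offset + step_mhz <= max_offset_mhz:
        apply_offset(offset + step_mhz)
        if not is_stable():          # a hang, driver error or visual artifacts
            apply_offset(offset)     # return to the last known-good value
            break
        offset += step_mhz
    return offset

# For the GPU the text suggests steps of 10-50 MHz; for memory, larger steps of 50-100 MHz.
```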

In addition to the steps described above, you can also increase the voltage limit (Overvoltage), because a higher frequency of the GPU is often achieved at increased voltage, when unstable parts of the GPU receive additional power. True, the potential disadvantage of increasing this value is the possibility of damage to the video chip and its accelerated failure, so you need to use the increase in voltage with extreme caution.

Overclocking enthusiasts use slightly different techniques, changing the parameters in a different order. For example, some overclockers separate their experiments into finding a stable GPU frequency and finding a stable memory frequency, so that the two do not interfere with each other, and only then test the combined overclock of both the GPU and the memory chips; but these are minor details of individual approaches.

Judging by opinions on forums and comments on articles, some users did not like the new GPU Boost 3.0 behavior, where the GPU frequency first rises very high, often above the turbo frequency, and then, as the GPU temperature rises or power consumption exceeds the set limit, drops to much lower values. This is simply how the updated algorithm works; you need to get used to the new behavior of the dynamically changing GPU frequency, and it carries no negative consequences.

The GeForce GTX 1070 is the second model in Nvidia's new line of Pascal GPUs. The new 16 nm FinFET process and architectural optimizations allowed this card to reach high clock speeds, helped along by the new generation of GPU Boost technology. Even though the number of functional units in the form of stream processors and texture units was reduced, enough remain for the GTX 1070 to be the most cost-effective and energy-efficient solution.

The use of GDDR5 memory on the junior of the two GP104-based Nvidia cards, as opposed to the new GDDR5X that distinguishes the GTX 1080, does not prevent it from achieving high performance. First, Nvidia decided not to cut the memory bus of the GeForce GTX 1070; second, it installed the fastest GDDR5 memory available, with an effective frequency of 8 GHz, only slightly below the 10 GHz of the GDDR5X used in the older model. Taking into account the improved delta compression algorithms, the effective memory bandwidth of the GPU is now higher than that of the comparable previous-generation model, the GeForce GTX 970.

The GeForce GTX 1070 is attractive because it offers very high performance and support for new features and algorithms at a significantly lower price than the older model announced a little earlier. If only a few enthusiasts can afford a GTX 1080 at 55,000, a much larger circle of potential buyers can pay 35,000 for a solution that is only about a quarter slower and has exactly the same capabilities. It was this combination of relatively low price and high performance that made the GeForce GTX 1070 arguably the best buy at the time of its release.

Graphics Accelerator GeForce GTX 1060

Parameter: Value
Chip codename: GP106
Production technology: 16 nm FinFET
Number of transistors: 4.4 billion
Core area: 200 mm²
Architecture: Unified, with an array of common processors for streaming processing of numerous types of data: vertices, pixels, etc.
DirectX hardware support: DirectX 12, with support for Feature Level 12_1
Memory bus: 192-bit, six independent 32-bit memory controllers supporting GDDR5 memory
GPU frequency: 1506 (1708) MHz
Computing units: 10 streaming multiprocessors comprising 1280 scalar ALUs for floating-point calculations within the IEEE 754-2008 standard
Texturing blocks: 80 texture addressing and filtering units with support for FP16 and FP32 components in textures and trilinear and anisotropic filtering for all texture formats
Raster Operation Blocks (ROPs): 6 wide ROP blocks (48 pixels) with support for various anti-aliasing modes, including programmable ones with FP16 or FP32 framebuffer formats; the blocks consist of an array of configurable ALUs and are responsible for depth generation and comparison, multisampling and blending
Monitor support: integrated support for up to four monitors over Dual Link DVI, HDMI 2.0b and DisplayPort 1.2 (1.3/1.4 Ready)

GeForce GTX 1060 Reference Graphics Card Specifications
Parameter: Value
Core frequency: 1506 (1708) MHz
Number of universal processors: 1280
Number of texture units: 80
Number of blending units: 48
Effective memory frequency: 8000 (4 × 2000) MHz
Memory type: GDDR5
Memory bus: 192-bit
Memory size: 6 GB
Memory bandwidth: 192 GB/s
Computational performance (FP32): about 4 teraflops
Theoretical maximum fill rate: 72 gigapixels/s
Theoretical texture sampling rate: 121 gigatexels/s
Bus: PCI Express 3.0
Connectors: one Dual Link DVI, one HDMI and three DisplayPort
Typical power consumption: 120 W
Additional power: one 6-pin connector
Number of slots occupied in the system chassis: 2
Recommended price: $249 ($299) in the US and 18,990 rubles in Russia
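Several of the table values can be cross-checked from the unit counts and clocks; a minimal sketch of the arithmetic (the formulas are the standard peak-rate expressions, nothing Nvidia-specific):

```python
# Peak-rate sanity check for the GeForce GTX 1060 reference specifications quoted above.
cores, tmus, rops = 1280, 80, 48
base_ghz, boost_ghz = 1.506, 1.708
mem_effective_ghz, bus_bits = 8.0, 192

print(round(cores * 2 * base_ghz / 1000, 2))   # ~3.86 TFLOPS at base clock ("about 4 teraflops")
print(round(cores * 2 * boost_ghz / 1000, 2))  # ~4.37 TFLOPS at Boost clock
print(round(rops * base_ghz, 1))               # ~72.3 Gpix/s fill rate
print(round(tmus * base_ghz, 1))               # ~120.5 Gtex/s texture rate (table: 121)
print(mem_effective_ghz * bus_bits / 8)        # 192.0 GB/s memory bandwidth
```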

The GeForce GTX 1060 received a name similar to the equivalent solution in the previous GeForce series, differing from its direct predecessor, the GeForce GTX 960, only by the changed first digit of the generation. In the company's current lineup, the newcomer sits one step below the previously released GeForce GTX 1070, the mid-range card of the new series.

Recommended prices for Nvidia's new video card are $249 and $299 for the regular versions from the company's partners and for the special Founder's Edition, respectively. Compared to the two older models this is a very favorable price, since the new GTX 1060, although inferior to the top-end cards, is not nearly as much slower as it is cheaper. At the time of the announcement, the new product was clearly the best-performing solution in its class and one of the most attractive offers in this price range.

This Pascal-family model came out to counter the fresh solution from rival AMD, which had released the Radeon RX 480 a little earlier. The GeForce GTX 1060 is more expensive ($249-299 versus $199-229), but clearly faster than its competitor.

The GP106 graphics processor has a 192-bit memory bus, so the amount of memory installed on a video card with such a bus can be 3 or 6 GB. The lower value is frankly not enough under modern conditions: many game projects, even at Full HD resolution, run into a shortage of video memory, which seriously affects rendering smoothness. To ensure maximum performance at high settings, the GeForce GTX 1060 was equipped with 6 GB of video memory, which is enough to run any 3D application at any quality setting. Moreover, today there is practically no difference between 6 and 8 GB, and such a solution saves a little money.

The typical power consumption value for the new product is 120 W, which is 20% less than the value for the GTX 1070 and is equal to the power consumption of the previous generation GeForce GTX 960, which has much lower performance and capabilities. The reference board has the usual set of connectors for connecting video output devices: one Dual-Link DVI, one HDMI and three DisplayPort. Moreover, support for new versions of HDMI and DisplayPort appeared, which we wrote about in the review of the GTX 1080 model.

The reference GeForce GTX 1060 board is 9.8 inches (25 cm) long. Among the differences from the older models, we note separately that the GeForce GTX 1060 does not support SLI multi-chip rendering and has no dedicated connector for it. Since the board consumes less power than the older models, a single 6-pin PCI-E connector was installed for additional power.

GeForce GTX 1060 video cards appeared on the market on the day of the announcement in the form of products from the company's partners: Asus, EVGA, Gainward, Gigabyte, Innovision 3D, MSI, Palit and Zotac. A special GeForce GTX 1060 Founder's Edition, produced by Nvidia itself, will also be released in limited quantities; it will be sold for $299 exclusively on the Nvidia website and will not be officially offered in Russia. The Founder's Edition is built with high-quality materials and components, including an aluminum casing, an efficient cooling system, low-resistance power circuits and specially designed voltage regulators.

Architectural changes

The GeForce GTX 1060 is based on a completely new GPU, the GP106, which is functionally no different from the first-born of the Pascal architecture, the GP104 chip that powers the GeForce GTX 1080 and GTX 1070 models described above. The new architecture took the solutions worked out in Maxwell as its basis, but it also has some functional differences, which we wrote about in detail earlier.

The GP106 chip is structurally similar to the top-end Pascal chip and to the corresponding Maxwell solutions, and detailed information about the internals of modern GPUs can be found in our reviews of previous Nvidia products. Like previous GPUs, chips of the new architecture come in different configurations of Graphics Processing Clusters (GPC), Streaming Multiprocessors (SM) and memory controllers:

The GP106 GPU contains two GPC clusters, consisting of 10 Streaming Multiprocessors (SM), that is, exactly half of the GP104. As in the older GPU, each of the multiprocessors contains 128 computing cores, 8 TMU texture units, 256 KB of register memory, 96 KB of shared memory and 48 KB of L1 cache. As a result, the GeForce GTX 1060 contains a total of 1280 processing cores and 80 texture units - half the size of the GTX 1080.

But the memory subsystem of the GeForce GTX 1060 has not been cut in half relative to the top solution: it contains six 32-bit memory controllers, giving a 192-bit memory bus in total. With an effective GDDR5 frequency of 8 GHz, bandwidth reaches 192 GB/s, which is quite good for a solution in this price segment, especially given how efficiently it is used in Pascal. Eight ROPs and 256 KB of L2 cache are tied to each memory controller, so in total the full version of the GP106 GPU contains 48 ROPs and 1536 KB of L2 cache.

To reduce memory bandwidth requirements and use the available bandwidth more efficiently, the Pascal architecture further improves lossless intra-chip compression, which compresses data in buffers and yields gains in efficiency and performance. In particular, new delta compression modes with 4:1 and 8:1 ratios were added to the chips of the new family, providing an additional 20% of effective memory bandwidth compared to the previous solutions of the Maxwell family.

The base frequency of the new GPU is 1506 MHz - the frequency should not drop below this mark in principle. Typical Turbo Clock (Boost Clock) is much higher at 1708 MHz, which is the average value of the real clock at which the GeForce GTX 1060 graphics chip runs in a wide range of games and 3D applications. The actual Boost frequency depends on the game and the testing environment.

Like the rest of the Pascal family, the GeForce GTX 1060 not only runs at a high clock speed, providing high performance, but also has a decent headroom for overclocking. The first experiments indicate the possibility of achieving frequencies of the order of 2 GHz. It is not surprising that the company's partners are preparing, among other things, factory-overclocked versions of the GTX 1060 video card.

So, the main change in the new architecture is the 16 nm FinFET process: its use in the production of GP106 made it possible to significantly increase the complexity of the chip while keeping its area at a relatively modest 200 mm², so this Pascal chip has noticeably more execution units than the Maxwell chip of similar positioning, which was manufactured on a 28 nm process.

If the GM206 (GTX 960), with an area of 227 mm², held 3 billion transistors, 1024 ALUs, 64 TMUs, 32 ROPs and a 128-bit bus, then the new GPU fits 4.4 billion transistors, 1280 ALUs, 80 TMUs and 48 ROPs with a 192-bit bus into 200 mm². And it does so at almost one and a half times the clock frequency: 1506 (1708) versus 1126 (1178) MHz, with the same 120 W power consumption. As a result, the GP106 has become one of the most energy-efficient GPUs, alongside the GP104.
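The density and efficiency gains are easy to put side by side; a minimal sketch using the figures quoted above for GM206 and GP106 (peak GFLOPS computed as ALUs × 2 × Boost clock):

```python
# GM206 (GTX 960) vs GP106 (GTX 1060), figures from the text above.
chips = {
    "GM206": {"area_mm2": 227, "transistors_bn": 3.0, "alus": 1024, "boost_mhz": 1178, "tdp_w": 120},
    "GP106": {"area_mm2": 200, "transistors_bn": 4.4, "alus": 1280, "boost_mhz": 1708, "tdp_w": 120},
}

for name, c in chips.items():
    density = c["transistors_bn"] * 1000 / c["area_mm2"]       # million transistors per mm²
    peak_gflops = c["alus"] * 2 * c["boost_mhz"] / 1000
    print(name, round(density, 1), "Mtr/mm²,",
          round(peak_gflops), "GFLOPS peak,",
          round(peak_gflops / c["tdp_w"], 1), "GFLOPS/W")
# GM206: ~13.2 Mtr/mm², ~2413 GFLOPS, ~20.1 GFLOPS/W
# GP106: ~22.0 Mtr/mm², ~4372 GFLOPS, ~36.4 GFLOPS/W
```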

New Nvidia Technologies

One of the most interesting technologies supported by the GeForce GTX 1060 and the other Pascal solutions is Nvidia Simultaneous Multi-Projection. We already wrote about this technology in our GeForce GTX 1080 review; it allows several new techniques to optimize rendering, in particular projecting a VR image for both eyes at once, significantly increasing GPU efficiency in virtual reality.

To support SMP, all Pascal GPUs have a dedicated engine located in the PolyMorph Engine at the end of the geometry pipeline, before the rasterization unit. With it, the GPU can simultaneously project a geometric primitive onto several projections from a single point, and these projections can be stereo (up to 16 projections per eye, or 32 in total, are supported simultaneously). This capability allows Pascal GPUs to accurately reproduce a curved surface for VR rendering, as well as display correctly on multi-monitor systems.

It is important that Simultaneous Multi-Projection is already being integrated into popular game engines (Unreal Engine and Unity) and games; to date, support for the technology has been announced for more than 30 games in development, including such well-known projects as Unreal Tournament, Poolnation VR, Everest VR, Obduction, Adr1ft and Raw Data. Interestingly, although Unreal Tournament is not a VR game, it uses SMP to achieve better visuals and improve performance.

Another highly anticipated technology is Nvidia Ansel, a powerful in-game screenshot tool. It allows you to create unusual, very high-quality screenshots with previously inaccessible capabilities, saving them at very high resolution, adding various effects, and sharing your work. Ansel lets you literally compose a screenshot the way an artist would: place a camera with any parameters anywhere in the scene, apply powerful post-filters to the image, or even capture a 360-degree shot for viewing in a virtual reality headset.

Nvidia has standardized the integration of the Ansel user interface into games, and adding it requires only a few lines of code. There is no need to wait for this feature to appear in games: you can try Ansel right now in Mirror's Edge: Catalyst, and a little later it will become available in The Witcher 3: Wild Hunt. In addition, many Ansel-enabled projects are in development, including Fortnite, Paragon and Unreal Tournament, Obduction, The Witness, Lawbreakers, Tom Clancy's The Division, No Man's Sky and others.

The new GeForce GTX 1060 also supports the Nvidia VRWorks toolkit, which helps developers create impressive VR experiences. The package includes many utilities and tools, among them VRWorks Audio, which performs very accurate calculation of sound-wave reflections from scene objects using ray tracing on the GPU. The package also includes VR integration of PhysX physics effects to provide physically correct behavior of objects in the scene.

One of the most striking VR titles to take advantage of VRWorks is VR Funhouse, Nvidia's own virtual reality game, available for free on Valve's Steam. It is built on Unreal Engine 4 (Epic Games) and runs on GeForce GTX 1080, 1070 and 1060 graphics cards together with the HTC Vive headset. Moreover, the game's source code will be made publicly available, allowing other developers to reuse its ideas and code in their own VR experiences. Take our word for it, this is one of the most impressive demonstrations of what virtual reality can do.

Thanks in part to SMP and VRWorks, the GeForce GTX 1060 delivers sufficient performance for entry-level virtual reality; the GPU meets the minimum required hardware level, including for SteamVR, making it one of the most sensible purchases for systems with official VR support.

Since the GeForce GTX 1060 model is based on the GP106 chip, which is in no way inferior in capabilities to the GP104 graphics processor, which became the basis for the older modifications, it supports absolutely all the technologies described above.

The GeForce GTX 1060 is the third model in Nvidia's new line of Pascal GPUs. The new 16 nm FinFET process and architectural optimizations allowed all the new cards to reach high clock frequencies and to fit more functional blocks into the GPU, in the form of stream processors, texture units and others, compared to the previous generation of chips. That is why the GTX 1060 has become the most cost-effective and energy-efficient solution in its class, and one of the best overall.

It is especially important that the GeForce GTX 1060 offers sufficiently high performance and support for new features and algorithms at a significantly lower price than the older GP104-based solutions. The GP106 chip in the new model delivers best-in-class performance and power efficiency. The GeForce GTX 1060 is specially designed for, and perfectly suited to, all modern games at high and maximum graphics settings at 1920x1080, even with full-screen anti-aliasing enabled by various methods (FXAA, MFAA or MSAA).

And for those looking for even higher performance with ultra-high-resolution displays, Nvidia has the top-end GeForce GTX 1070 and GTX 1080 graphics cards, which are also quite good in terms of performance and power efficiency. Nevertheless, the combination of low price and sufficient performance sets the GeForce GTX 1060 apart quite favorably from the older solutions. Compared to the competing Radeon RX 480, Nvidia's solution is slightly faster with a less complex, smaller GPU, and has significantly better power efficiency. True, it sells for a bit more, so each video card has its own niche.

Nvidia GeForce GTX 1080 Pascal Review | Introducing the GP104 GPU

In anticipation of the Computex exhibition, Nvidia decided to present its long-awaited novelty - the Pascal architecture adapted for gamers. In the new GeForce GTX 1080 and 1070 graphics cards, the manufacturer installs the GP104 GPU. Today, we will consider the older model, and the younger one should be in our hands at the beginning of June.

The Pascal architecture promises faster and more efficient performance, more compute modules, a smaller die area, and faster memory with an upgraded controller. It is better suited for virtual reality, 4K gaming and other high-performance tasks.

As always, we will try to understand the manufacturer's promises and test them in practice. Let's start.

Will the GeForce GTX 1080 change the balance of power in the high-end segment?

The Nvidia GeForce GTX 1080 is the faster of the two gaming graphics cards announced earlier this month. Both use the GP104 GPU, which, incidentally, is already the second GPU with the Pascal microarchitecture (the first was the GP100, which appeared at GTC in April). Nvidia CEO Jen-Hsun Huang teased enthusiasts when introducing the new product to the general public, claiming that the GeForce GTX 1080 would outperform two 980s in SLI.

He also noted that the GTX 1080 delivers higher performance at lower power consumption than the 900 series. It is said to be twice as fast and three times more efficient than the former flagship GeForce Titan X, but a close look at the accompanying graphs and charts shows that such an impressive difference appears in certain tasks related to virtual reality. Still, even if these promises are only partially confirmed, we are in for some very interesting times in high-end PC gaming.

Virtual reality is starting to gain momentum, but the high hardware requirements for the graphics subsystem create a significant barrier to access to these technologies. In addition, most games available today do not know how to take advantage of multiprocessor rendering. That is, you are usually limited by the capabilities of one fast video adapter with one GPU. The GTX 1080 is capable of outperforming two 980s in speed and shouldn't struggle in today's VR games, eliminating the need for future multi-processor configurations.

The 4K ecosystem is progressing at a similar pace. Higher-bandwidth interfaces such as HDMI 2.0b and DisplayPort 1.3/1.4 should open the door to 4K monitors with 120 Hz panels and dynamic refresh rates by the end of this year. While previous generations of top-end GPUs from AMD and Nvidia were marketed as 4K gaming solutions, users had to make quality trade-offs to maintain acceptable frame rates. The Nvidia GeForce GTX 1080 could be the first graphics card fast enough to sustain high frame rates at 3840x2160 with maximum detail settings.

What about multi-monitor configurations? Many gamers are ready to install three 1920x1080 monitors, provided the graphics subsystem can handle the load: in that case the card has to render a combined 7680x1440, roughly 11 million pixels. There are even enthusiasts willing to take three 4K displays with an aggregate resolution of 11520x2160 pixels.

The latter option is too exotic even for a new flagship gaming card. However, the Nvidia GP104 processor is equipped with technology that promises to improve the experience in exactly the tasks typical of the new model, i.e. 4K and Surround. But before we move on to the new technologies, let's take a closer look at the GP104 processor and its underlying Pascal architecture.

What is the GP104 made of?

Since early 2012, AMD and Nvidia have been using a 28 nm process technology. By switching to it, both companies made a significant leap forward with the introduction of the Radeon HD 7970 and GeForce GTX 680 graphics cards. However, over the next four years they had to go to great lengths to get more performance out of the existing technology. The achievements of the Radeon R9 Fury X and GeForce GTX 980 Ti are marvels given their complexity. Nvidia's first 28 nm chip, the GK104, consisted of 3.5 billion transistors; the GM200, which powers the GeForce GTX 980 Ti and Titan X, already has eight billion.

The move to TSMC's 16 nm FinFET Plus technology allowed Nvidia's engineers to bring new ideas to life. According to the technical data, 16FF+ chips are 65% faster, can be twice as dense as 28HPM, or consume 70% less power. Nvidia uses the optimal combination of these advantages to build its GPUs. TSMC says the process is based on the engineering work done for its existing 20 nm node, but uses FinFET transistors instead of planar ones; the company says this approach reduces defects and increases wafer yields. It also notes that it never had a 20 nm process with fast transistors. Again, the world of computer graphics had been sitting on the 28 nm process for more than four years.


GP104 processor block diagram

The successor to the GM204 consists of 7.2 billion transistors in a die area of 314 mm². By comparison, the GM204 has a 398 mm² die with 5.2 billion transistors. In its full version, the GP104 has four Graphics Processing Clusters (GPC). Each GPC includes five Thread/Texture Processing Clusters (TPC) and a raster engine. A TPC combines one Streaming Multiprocessor (SM) with a PolyMorph engine. The SM combines 128 single-precision CUDA cores, 256 KB of register memory, 96 KB of shared memory, 48 KB of L1/texture cache, and eight texture units. The fourth generation of the PolyMorph engine includes a new logic block located at the end of the geometry pipeline, in front of the rasterization block; it drives the Simultaneous Multi-Projection function (more on that below). In total we get 20 SMs, 2560 CUDA cores and 160 texture units.


One streaming multiprocessor (SM) in GP104

The GPU back end includes eight 32-bit memory controllers (256 bits of total channel width), with eight ROPs and 256 KB of L2 cache per controller. As a result we get 64 ROPs and 2 MB of shared L2 cache. Although the block diagram of the Nvidia GM204 processor showed four 64-bit controllers and 16 ROPs, these are simply grouped differently and functionally equivalent.

Some of the building blocks of the GP104 are similar to the GM204, as the new GPU was created from the "building blocks" of its predecessor. There is nothing wrong. If you remember, the company focused on energy efficiency in the Maxwell architecture and did not shake up the blocks, which were Kepler's strong point. We see a similar picture here.

The addition of four SMs might not seem to have a noticeable impact on performance. However, the GP104 has a few tricks up its sleeve. The first trump card is its significantly higher clock speeds. The base GPU clock is 1607 MHz; the GM204 specs, for comparison, list 1126 MHz. The maximum GPU Boost frequency reaches 1733 MHz, but we pushed our sample to 2100 MHz using a beta version of the EVGA PrecisionX utility. Where does such overclocking headroom come from? According to Jonah Alben, senior vice president of GPU engineering, his team knew that TSMC's 16FF+ process would affect the chip's architecture, so they focused on optimizing the chip's timings to remove the bottlenecks that prevented higher clock speeds. As a result, the GP104's single-precision compute rate reaches 8228 GFLOPS (at the base frequency), compared to the 4612 GFLOPS ceiling of the GeForce GTX 980. The texel fill rate jumped from 155.6 Gtex/s on the 980 (with GPU Boost) to 277.3 Gtex/s.
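Those throughput figures follow directly from the unit counts and clocks listed in the table below; a minimal sketch of the arithmetic:

```python
def peak_rates(cuda_cores, tmus, base_ghz, boost_ghz):
    """Peak FP32 GFLOPS at the base clock (2 FLOPs per core per cycle) and texel rate at Boost."""
    return round(cuda_cores * 2 * base_ghz, 1), round(tmus * boost_ghz, 1)

print(peak_rates(2560, 160, 1.607, 1.733))  # (8227.8, 277.3) -> GTX 1080 (GP104)
print(peak_rates(2048, 128, 1.126, 1.216))  # (4612.1, 155.6) -> GTX 980 (GM204)
```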

GPU | GeForce GTX 1080 (GP104) | GeForce GTX 980 (GM204)
SM | 20 | 16
Number of CUDA cores | 2560 | 2048
GPU base frequency, MHz | 1607 | 1126
GPU Boost frequency, MHz | 1733 | 1216
Compute rate, GFLOPS (at base frequency) | 8228 | 4612
Number of texture units | 160 | 128
Texel fill rate, Gtex/s | 277.3 | 155.6
Memory data rate, Gbps | 10 | 7
Memory bandwidth, GB/s | 320 | 224
Number of ROP units | 64 | 64
L2 cache size, MB | 2 | 2
Thermal package, W | 180 | 165
Number of transistors | 7.2 billion | 5.2 billion
Die area, mm² | 314 | 398
Process technology, nm | 16 | 28

The back end still includes 64 ROPs and a 256-bit memory bus, but Nvidia has introduced GDDR5X memory to increase the available bandwidth. The company has put a lot of effort into promoting the new memory type, especially against the background of the HBM memory used in various AMD cards and the HBM2 that Nvidia installs in the Tesla P100. There is a sense that HBM2 is currently in short supply and that the company is not prepared to accept the limitations of HBM (four 1 GB stacks, or the difficulties of implementing eight 1 GB stacks). So we got GDDR5X video memory, whose supply is apparently also limited, since the GeForce GTX 1070 already uses plain GDDR5. But this does not diminish the merits of the new solution. The GDDR5 memory in the GeForce GTX 980 had a data rate of 7 Gbps, providing 224 GB/s of bandwidth over a 256-bit bus. GDDR5X starts at 10 Gbps, raising throughput to 320 GB/s (a ~43% increase). According to Nvidia, the gain comes from a redesigned I/O scheme, without any increase in power consumption.

The Maxwell architecture made more efficient use of bandwidth by optimizing its cache and compression algorithms, and Pascal follows the same path with new lossless compression methods to use the available memory bandwidth more economically. The delta color compression algorithm tries to achieve a 2:1 gain, and this mode has been improved for more frequent use. There is also a new 4:1 mode, used in cases where the per-pixel differences are very small. Finally, Pascal introduces another new 8:1 algorithm that applies 4:1 compression to 2x2 blocks, with the difference between blocks processed by the 2:1 algorithm.
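To make the idea more concrete, here is a toy model of block-based delta color compression. It is purely illustrative: the real hardware formats and mode-selection heuristics are not public, so the sketch only mirrors the 2:1, 4:1 and 8:1 ratios described above.

```python
def compressible_ratio(block):
    """Pick a claimed compression ratio for a 2x2 block of 32-bit pixel values (toy heuristic).

    We pretend to store one anchor pixel plus per-pixel deltas; the smaller the deltas,
    the fewer bits they would need, which loosely mirrors the 2:1 / 4:1 / 8:1 modes.
    """
    anchor = block[0]
    max_delta = max(abs(p - anchor) for p in block)
    if max_delta == 0:
        return 8      # e.g. a completely uniform block
    if max_delta < 16:
        return 4      # very small per-pixel differences
    if max_delta < 4096:
        return 2      # ordinary delta compression
    return 1          # incompressible, stored raw

print(compressible_ratio([0x336699] * 4))                             # 8
print(compressible_ratio([0x336699, 0x33669A, 0x336698, 0x336699]))   # 4
print(compressible_ratio([0x336699, 0x336E99, 0x336099, 0x336999]))   # 2
```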



The difference is not difficult to illustrate. The first image shows an uncompressed screenshot from the Project CARS game. The following image shows the elements that a Maxwell card can compress and are colored purple. In the third image, you can see that Pascal compresses the scene even more. According to Nvidia, this difference translates into about a 20% reduction in byte information that needs to be fetched from memory for each frame.

Nvidia GeForce GTX 1080 Pascal Review | Reference card design

Nvidia has changed its approach to card design. Instead of "reference", it now calls its own version of the card the Founders Edition. It is impossible not to notice that the GeForce GTX 1080 has become more angular in appearance, but the cooling system uses the same proven mechanism of expelling hot air through a side vent.

The card weighs 1020 g and is 27 cm long. It is quite pleasant to the touch: the cooler shroud not only looks like metal, it actually is metal, more precisely aluminum. The matte silver parts are varnished and will scratch quickly if not handled carefully.

The back plate is divided into two parts. It serves only as a decoration and does not have a cooling function. Later we will find out how correct this decision is. Nvidia recommends removing elements of this plate when using SLI in order to achieve better airflow between cards installed close to each other.

There is nothing interesting at the bottom, although we noticed that parts of the black lid may come into contact with elements of the motherboard located underneath, such as the chipset cooler and SATA ports.

At the top of the card we see one auxiliary eight-pin power connector. Considering the official specifications of the video card, as well as the 60W of power drawn from the motherboard slot, one such connector should be enough for a nominal 180W thermal package. Naturally, we will check how much power this card actually draws, and whether it overloads the power lines.

There are also two SLI connectors. Along with the new Pascal graphics cards, Nvidia has introduced new high-bandwidth bridges. We'll look at them in more detail later. In short, only two video card SLI configurations are officially supported so far, and both connectors are used to operate the dual-channel interface between the GPUs.

Three full DisplayPort connectors are available on the I / O panel. The specifications specify the DisplayPort 1.2 standard, but are expected to be DisplayPort 1.3 / 1.4 compliant (at least the display controller can handle the newer standards). There is also an HDMI 2.0 output and dual link DVI-D. You don't have to search for analog connectors.

On the other side of the card there is a large slot for air intake and three screw holes for additional fixing of the card in the case.

Cooler design and power supply

After examining the exterior carefully, it's time to look at the internals hidden under the aluminum shroud. This turned out to be more difficult than it might seem at first glance: after disassembly we counted 51 parts on the table, including screws, and removing the fan adds 12 more.

Nvidia is finally back to using a real vapor chamber. It is attached to the board with four screws on top of the GPU.

The centrifugal fan should be familiar to you. Direct heat dissipation implies air intake in one place, its passage through the radiator fins and out of the case. The cooler shroud, which also serves as a frame, not only stabilizes the card, but also helps cool the voltage converters and memory modules.

After removing all the external components, we get to the printed circuit board. Unlike previous solutions, Nvidia uses a six-phase power scheme. Five phases serve the GPU, and the remaining phase powers the GDDR5X memory.

On the board, you can see a place for another phase, which is empty.

The GP104 GPU covers an area of ​​314 mm2, which is much smaller than its predecessor. Lines of other layers of the board are visible around the processor. To achieve high clock frequencies, the conductors should be as short as possible. Due to the stringent requirements, Nvidia's partners are likely to take longer to get production going.

The GDDR5X memory is represented by Micron's 6HA77 chips. They just went into mass production, as we saw 6GA77 chips in earlier leaked images of Nvidia's new graphics card.

A total of eight memory modules are connected to the 256-bit memory bus through 32-bit controllers. At a frequency of 1251 MHz, the bandwidth reaches 320 GB / s.

Micron's GDDR5X modules use 170-pin packaging instead of 190-pin GDDR5. They are also slightly smaller: 14x10 mm instead of 14x12 mm. That is, they have a higher density and require improved cooling.

Turning the card over, we found free space for the second power connector. Thus, Nvidia partners can install a second auxiliary connector to add power, or move the existing one to a different position.

There is also a slot in the board that allows you to rotate the power connector 180 degrees.

Capacitors are located directly under the GPU to smooth out potential surges. Also on this side of the board is the PWM (previously it was located on the front side). This solution gives Nvidia partners the ability to install other PWM controllers.

But back to the PWM controller of the voltage regulator. Nvidia's GPU Boost 3.0 technology has received a new set of voltage regulation requirements, resulting in significant changes. We expected to see an IR3536A controller from International Rectifier combined with a 5 + 1 phase circuit, but Nvidia used the µP9511P. This is not the best news for overclocking enthusiasts, as the card does not support the interface and protocol of tools like MSI Afterburner and Gigabyte OC Guru. The move to a new controller, which is not well described yet, is most likely due to technical issues.

Since the PWM controller cannot directly drive the individual phases of the voltage converter, Nvidia uses powerful MOSFET drivers with 53603A chips to drive the gate of the MOSFETs. But compared to some of the other options, the layout of the circuit looks neat and tidy.

There are different types of MOSFETs here. The 4C85N is a fairly resilient two-channel MOSFET for voltage conversion. It serves all six phases of power supply and has sufficient electrical and thermal reserves to withstand the loads of the reference design.


I wonder how Nvidia's GPU Boost 3.0 technology and modified voltage regulator circuitry will affect power consumption. We'll check it out for sure.

Nvidia GeForce GTX 1080 Pascal Review | Simultaneous Multi-Projection and Async Compute technology

Simultaneous Multi-Projection Engine

The increased core count, clock speed, and 10Gbps GDDR5X memory accelerate every game we tested. However, the Pascal architecture includes several features that we will only be able to appreciate in the upcoming games.

One of the new features, which Nvidia calls the Simultaneous Multi-Projection Engine, is a hardware unit added to the PolyMorph engines. The new engine can create up to 16 projections of the geometry data from a single viewpoint. It can also shift the viewpoint to create a stereoscopic image, duplicating the geometry up to 32 times in hardware, without the performance penalty you would incur trying to achieve the same effect without SMP.


One-plane projection

Let's try to understand the advantages of this technology. For example, we have three monitors in a Surround configuration. They are slightly turned inward to "wrap" the user, so it is more convenient to play and work. But games do not know about this and render the image in one plane, so it appears to be curved at the place where the monitor frames meet, and the whole picture looks distorted. For this configuration, it would be more correct to visualize one projection straight ahead, the second projection to the left, as if from a panoramic cockpit, and the third projection to the right. Thus, the previously curved panorama will appear smoother and the user will have a much wider viewing angle. The entire scene still needs to be rasterized and painted over, but the GPU doesn't have to render the scene three times, thereby eliminating unnecessary load.
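A rough sketch of the idea in code: one camera position and three view directions that differ only by the yaw of each monitor. This is illustrative only; real SMP is configured through Nvidia's driver and APIs rather than hand-rolled matrices, and the 30-degree panel angle is a made-up example.

```python
import numpy as np

def yaw_matrix(angle_deg):
    """Rotation about the vertical axis; one per monitor in a Surround setup."""
    a = np.radians(angle_deg)
    return np.array([[ np.cos(a), 0.0, np.sin(a)],
                     [ 0.0,       1.0, 0.0      ],
                     [-np.sin(a), 0.0, np.cos(a)]])

monitor_yaws = [-30.0, 0.0, 30.0]        # hypothetical 3-monitor rig, side panels angled inward
view_dir = np.array([0.0, 0.0, -1.0])    # single viewpoint, looking straight ahead

# With SMP the geometry is submitted once and projected per monitor; here we only show
# the three per-monitor view directions that replace the single flat projection plane.
for yaw in monitor_yaws:
    print(yaw, (yaw_matrix(yaw) @ view_dir).round(3))
```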


Incorrect perspective on angled displays



Corrected with SMP perspective

However, the application must support wide viewing angles settings and use SMP API calls. This means that game developers have to master it before you can take advantage of this feature. We're not sure how much effort they're willing to put into a handful of users with multi-monitor Surround configurations. But there are other applications for which it makes sense to implement this feature as soon as possible.


using single-pass stereo rendering, SMP creates one projection for each eye

Take virtual reality as an example: it already requires an individual projection for each eye. Today, games simply render the two views separately, with all the attendant inefficiency. But since SMP supports two projection centers, the scene can be rendered in a single pass using Nvidia's Single Pass Stereo feature: the geometry is processed once, and SMP creates a projection for the left eye and one for the right. SMP can then apply additional projections for a feature called Lens Matched Shading.


Images after the first pass with Lens Matched Shading



The final scene that is sent to the headset

In short, Lens Matched Shading tries to make VR rendering more efficient. With a traditional flat projection, a large amount of work is spent rendering an image that is then warped to match the lens distortion of the headset, so pixels are wasted where the bending is greatest. This effect can be approximated by using SMP to divide the frame into quadrants: instead of rendering one square projection and then warping it, the GPU produces images that already match the lens distortion filter, which avoids generating extra pixels. You won't notice a difference in quality, provided developers match or exceed the headset's per-eye sample rate.

According to Nvidia, the combination of Single Pass Stereo and Lens Matched Shading can deliver up to twice the VR performance compared to a GPU without SMP support. Part of it is pixel work: by using Lens Matched Shading to avoid processing pixels that should never be rendered, the shading rate in a scene with Nvidia's balanced presets dropped from 4.2 MP/s (Oculus Rift) to 2.8 MP/s, cutting the shader load on the GPU by a factor of about 1.5. Single Pass Stereo, which processes the geometry just once instead of re-submitting it for the second eye, effectively eliminates half of the geometry work done today. It is now clear what Jen-Hsun meant when he spoke of "twice the performance and three times the efficiency compared to the Titan X".
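The arithmetic behind that claim, using the numbers above, looks roughly like this (a toy model: the 50/50 split between geometry and pixel work is an assumption, not an Nvidia figure):

```python
# Rough model of the claimed VR gains from SMP; the rates quoted above are Nvidia's.
pixel_rate_without_smp = 4.2   # MP/s shaded for Oculus Rift, balanced preset
pixel_rate_with_lms = 2.8      # MP/s with Lens Matched Shading
pixel_speedup = pixel_rate_without_smp / pixel_rate_with_lms   # 1.5x less shading work
geometry_speedup = 2.0         # Single Pass Stereo submits geometry once instead of twice

# If a frame were split evenly between geometry and pixel work, the combined gain would be:
frame_time = 0.5 / geometry_speedup + 0.5 / pixel_speedup
print(round(pixel_speedup, 2), round(1 / frame_time, 2))  # 1.5, ~1.71x overall for this toy mix
```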

Asynchronous computation

The Pascal architecture also includes some changes related to asynchronous compute, a topic relevant to DirectX 12, VR, and the area where AMD has so far held an architectural advantage.

Nvidia supports static GPU sharing for graphics and compute tasks starting with the Maxwell architecture. In theory, this approach is good when both blocks are active at the same time. But suppose that 75% of the processor's resources are devoted to the graphics, and it completed its part of the task faster. Then this block will be idle, waiting for the computing block to complete its part of the work. Thus, all the possible advantages of performing these tasks at the same time are lost. Pascal addresses this issue by dynamically balancing the load. If the driver decides that one of the partitions is underutilized, it can switch its resources to help the other, preventing downtime that negatively affects performance.

Nvidia also improved the interrupt capabilities in Pascal, that is, the ability to stop the current task in order to solve a more "urgent" one with a very short execution time. As you know, GPUs are highly parallelized machines with large buffers designed to keep similar resources next to each other busy. An idle shader is useless, so you need to involve it in your workflow by all means.


For VR, it's best to send interrupt requests as late as possible to capture the freshest tracking data

A great example is the Asynchronous Time Warp (ATW) feature that Oculus introduced with the Rift. If the video card cannot deliver a new frame every 11 ms on a 90 Hz display, ATW generates an intermediate frame from the last one, with head-position correction. But there must be enough time left to create such a frame, and unfortunately graphics preemption is not very fine-grained. In fact, the Fermi, Kepler and Maxwell architectures only support preemption at the draw-call level, meaning a context switch can happen only between draw calls, which potentially holds back the ATW technique.

Pascal implements a pixel-level interrupt for graphics, so the GP104 can stop the current operation at the pixel-level, save its state, and switch to a different context. Instead of the millisecond interrupt that Oculus wrote about, Nvidia claims less than 100 microseconds.
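The timing budget is easy to put in numbers; a minimal sketch (90 Hz and the sub-100-microsecond preemption figure come from the text above, while the 3 ms draw call is a made-up example of a heavy call on an older GPU):

```python
refresh_hz = 90
frame_budget_ms = 1000 / refresh_hz              # ~11.1 ms to deliver each frame

preempt_pixel_level_ms = 0.1                     # Pascal: claimed < 100 microseconds
example_long_draw_call_ms = 3.0                  # hypothetical heavy draw call (pre-Pascal granularity)

print(round(frame_budget_ms, 1))                                          # 11.1
print(round(preempt_pixel_level_ms / frame_budget_ms * 100, 1), "%")      # ~0.9 % of the frame budget
print(round(example_long_draw_call_ms / frame_budget_ms * 100, 1), "%")   # ~27 % of the frame budget
```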

In the Maxwell architecture, the equivalent of a pixel-level interrupt in the computational unit was implemented through a thread-level interrupt. Pascal also retains this technique, but adds support for instruction-level interrupts in CUDA computational tasks. Nvidia's drivers do not include this feature at the moment, but it will soon be available along with pixel-level interrupt.

Nvidia GeForce GTX 1080 Pascal Review | Output pipeline, SLI and GPU Boost 3.0

Pascal Display Channel: HDR-Ready

Last year, we met with AMD representatives in Sonoma, California, to share some details of their new Polaris architecture, such as the HDR content pipeline and related displays.

Not surprisingly, Nvidia's Pascal architecture is packed with similar features, some of which were already available in Maxwell. For example, the display controller in the GP104 processor supports 12-bit color, the BT.2020 wide color gamut, the SMPTE 2084 electro-optical transfer function, and HDMI 2.0b with HDCP 2.2.

To this list, Pascal adds accelerated HEVC decoding at 4K60p at 10/12-bit color through a dedicated hardware unit that claims to support HEVC Version 2. In the past, Nvidia used a hybrid approach using software resources. In addition, the encoding was limited to eight bits of color information per pixel. But we believe Microsoft PlayReady 3.0 required a faster and more efficient solution to support the controversial specification.

The architecture also supports HEVC encoding in 10-bit color at 4K60p for HDR recording or streaming, Nvidia even has a dedicated app for that. Using the GP104 processor's encoding tools and upcoming GameStream HDR software, you'll be able to stream high dynamic range games to Shield devices connected to an HDR-compatible TV. The Shield features a proprietary HEVC decoder with 10-bit color per pixel, which further offloads the image output pipeline.

Feature | GeForce GTX 1080 | GeForce GTX 980
H.264 encoding | Yes (2x 4K60p) | Yes
HEVC encoding | Yes (2x 4K60p) | Yes
HEVC 10-bit encoding | Yes | No
H.264 decoding | Yes (4K120p, up to 240 Mbps) | Yes
HEVC decoding | Yes (4K120p / 8K30p, up to 320 Mbps) | No
VP9 decoding | Yes (4K120p, up to 320 Mbps) | No
HEVC 10/12-bit decoding | Yes | No

In addition to supporting HDMI 2.0b, the GeForce GTX 1080 is DisplayPort 1.2 certified and DP 1.3/1.4 ready. In this respect it already surpasses the yet-to-be-released Polaris, whose display controller so far supports only DP 1.3. Fortunately for AMD, the 1.4 specification does not add a faster transfer mode, so the ceiling remains the 32.4 Gbps set by HBR3.

As previously mentioned, the GeForce GTX 1080 Founders Edition comes with three Display Port outputs, one HDMI 2.0b connector, and one digital dual-link DVI output. Like the GTX 980, the new product is capable of displaying images on four independent monitors simultaneously. But compared to 5120x3200 resolution via two DP 1.2 cables, the maximum resolution of the GTX 1080 is 7680x4320 dots at a refresh rate of 60 Hz.

SLI now officially supports only two GPUs

Traditionally, high-end Nvidia video cards are equipped with two connectors for connecting two, three or even four accelerators in an SLI bundle. Typically, the best scaling is achieved with dual GPU configurations. Further, the costs themselves are often not justified, since there are many pitfalls. However, some enthusiasts still use three and four graphics adapters in pursuit of each additional frame and the opportunity to show off to friends.

But the situation has changed. Due to performance scaling issues in new games, no doubt DirectX 12-related, the GeForce GTX 1080 only officially supports dual-card SLI configurations, according to Nvidia. So why does the card need two connectors? Thanks to the new SLI bridges, both connectors can be used simultaneously for dual channel data transmission. In addition to the dual-channel mode, the interface also has an I / O frequency increased from 400 MHz to 650 MHz. As a result, the bandwidth between the processors more than doubles.
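The "more than doubles" figure follows from the two changes taken together; a minimal sketch (the 400 MHz and 650 MHz interface clocks are from the text, and relative bandwidth is modeled simply as clock times channel count):

```python
old_clock_mhz, new_clock_mhz = 400, 650
old_channels, new_channels = 1, 2        # the SLI HB bridge drives both connectors at once

print(round(new_clock_mhz / old_clock_mhz, 2))                                    # 1.63x from the clock alone
print(round((new_clock_mhz * new_channels) / (old_clock_mhz * old_channels), 2))  # 3.25x with dual channels
```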


Frame rendering times in Middle earth: Shadow of Mordor with new (blue line in the graph) and old (black) SLI bridge

However, many gamers will not notice the benefits of the faster link; it matters primarily at high resolutions and refresh rates. Nvidia showed FCAT measurements of two GeForce GTX 1080s running Middle-earth: Shadow of Mordor across three 4K displays. Connecting the two cards with the old bridge resulted in constant frame-time spikes, which lead to predictable synchronization problems that show up as stutter. With the new bridge, the number of spikes decreased and they became less pronounced.

Nvidia says it's not just SLI HB bridges that support dual-channel mode. The already familiar LED-backlit bridges can also transmit data at a frequency of 650 MHz when connected to Pascal cards. It is better to refuse flexible or regular bridges if you want to work in 4K or higher. Detailed information regarding compatibility can be found in the table provided by Nvidia:

Bridge type | 1920x1080 @ 60Hz | 2560x1440 | 2560x1440 @ 120Hz+ | 4K | 5K | Surround
Standard bridge | x | x | | | |
LED bridge | x | x | x | x | |
High Data Rate (HB) bridge | x | x | x | x | x | x

What caused the rejection of three- and four-chip configurations? After all, the company always strives to sell more cards and achieve higher performance. One could cynically say that Nvidia does not want to take responsibility for the diminishing benefit of three- and four-card SLI bundles at a time when modern games use increasingly subtle and complex rendering approaches. But the company insists it is acting in buyers' best interests, as Microsoft gives game developers ever more control over multi-GPU configurations, and developers in turn are exploring new techniques such as co-rendering a single frame instead of the current frame-by-frame alternation (AFR).

Enthusiasts who care only about speed records and are not interested in the factors described above can still link three or four GTX 1080s in SLI using the old software. They need to generate a unique "hardware" signature using a program from Nvidia that can request an "unlock" key. Naturally, the new HB SLI bridges will not work with more than two GPUs, so you will have to limit yourself to the old LED bridges to combine the work of three / four GP104s at 650 MHz.

GPU Boost 3.0 at a glance

In order to extract even more performance from its GPUs, Nvidia has refined GPU Boost technology again.

In the previous generation (GPU Boost 2.0), the clock frequency was set by moving the sloped voltage / frequency line by a certain value. The potential headroom above these lines was usually left unused.


GPU Boost 3.0 - setting the frequency boost by one voltage increase step

Now GPU Boost 3.0 allows you to set the frequency increase for individual voltage values, which are limited only by temperature. In addition, you do not have to experiment and check the stability of the card over the entire range of values ​​on the curve. Nvidia has a built-in algorithm to automate this process, creating a voltage / frequency curve unique to your GPU.
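The difference between the two schemes can be expressed very simply: GPU Boost 2.0 shifts the whole voltage/frequency line by a single value, while 3.0 stores an offset per voltage point. A minimal sketch with made-up curve values (not real firmware data):

```python
# Illustrative only: a V/F curve as (voltage_mV, frequency_MHz) points.
base_curve = [(800, 1500), (900, 1650), (1000, 1750), (1062, 1800)]

def boost2_style(curve, fixed_offset_mhz):
    """GPU Boost 2.0 style: one fixed offset applied to every point of the curve."""
    return [(v, f + fixed_offset_mhz) for v, f in curve]

def boost3_style(curve, per_point_offsets_mhz):
    """GPU Boost 3.0 style: an individual offset for each voltage point."""
    return [(v, f + off) for (v, f), off in zip(curve, per_point_offsets_mhz)]

print(boost2_style(base_curve, 100))
print(boost3_style(base_curve, [150, 120, 80, 40]))  # squeeze more where headroom exists
```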