This document details the technology behind modern computing devices, compares those devices, and explains how their design and construction make them useful in video production.
The processor is the key component when looking at video production. This is because although other components play a part in video encoding it is the processor that must calculate the data for each frame of footage. At a basic level, the processor handles all the math functions of a modern computer and its programs.
Binary, 32 and 64-bit Computing
Although significantly more complex, a computer is still an electrical circuit. Because of this, just like a simple light switch circuit, a computer circuit can be on or off. This forms the basis of binary and it allows for conversions of different numbers into a form that a processor understands. Yadin (2016) shows the following table as a tool to assist in migrating numbers between binary (2), decimal (10), Base 4 (4), Octal (8) and Hexadecimal (16).
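The conversions between the bases listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reproduction of the table in Yadin (2016); the helper name to_base and the sample value 45 are hypothetical.

```python
DIGITS = "0123456789ABCDEF"


def to_base(value: int, base: int) -> str:
    """Convert a non-negative integer to its representation in a base from 2 to 16."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        # The remainder is the next digit, working from least significant upwards.
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    for base in (2, 4, 8, 10, 16):
        print(f"45 in base {base}: {to_base(45, base)}")
```

Python's built-in int(text, base) performs the reverse conversion, so int("101101", 2) gives back 45.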
It is the binary system that provided the names for 32 and 64-bit computing. This terminology describes the width of the values, including memory addresses, that a processor can handle at any one time. A 32-bit processor can form 2^32 (4,294,967,296) distinct addresses. Therefore, assuming one byte per address, the maximum RAM for a 32-bit system is limited to approximately 4GB, as anything more is wasted because it cannot be addressed by the processor. Gilmer (2010) states that 32 and 64-bit specifically refer to the number of parallel lines on a bus inside the system, and that a wider bus allows data to be moved more rapidly inside the machine.
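The 4GB limit follows from simple arithmetic. The sketch below assumes byte-addressable memory (one byte per address), which is an assumption made for illustration; the function name addressable_bytes is hypothetical.

```python
GIB = 2 ** 30  # number of bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses available for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (32, 64):
        total = addressable_bytes(bits)
        print(f"{bits}-bit: {total} addresses = {total // GIB} GiB")
```

For 32 bits this gives 4,294,967,296 addresses, or 4 GiB, which matches the limit described above.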
Processor Internal Breakdown
(CPU Block Diagram, n.d)
Using the above diagram, the processor can be broken down into several basic parts. The Control Unit handles instructions and decides which other units need to be used (essentially, which internal circuitry is to be turned on). The Memory Unit moves data from memory to the ALU and back again once the ALU has processed it. The ALU is the Arithmetic Logic Unit, and it is here that (as the name suggests) the various arithmetic and logic operations take place in the previously mentioned binary. Although not shown in this diagram, once the ALU has its results, they are either stored in cache inside the CPU for very short-term use (for example, if a result will be needed for another calculation or function) or passed back to the system memory or on to other output devices. Although useful here, cache will be discussed in the section on storage.
Clock Speeds, Instructions per Clock Cycle
Clock speed is the first processor characteristic that often differs depending on the device the CPU is installed in. A more useful measure of a processor is instructions per clock cycle; however, even this isn’t constant, as it can depend on the software being used and on factors such as memory performance and the performance of other storage devices. Adding to this, processors can have multiple ALUs and multiple (as well as larger) tiers of cache.
Heat Generation and TDP
As mentioned earlier in this section, at its core a computer is an electrical circuit. A processor needs energy to switch its internal circuitry on and off, and the higher the clock speed, the less time each switch is given and the more often it happens, which results in a dangerous by-product: heat. When you put a processor under heavy load (i.e. getting it to perform complex instructions), this rapid transition between on and off generates heat. Therefore, when a processor is clocked higher than the manufacturer specified, there is a risk of permanently damaging its components from excessive heat as it draws more power to accomplish its tasks. The reverse is also true: where cooling and power must be carefully managed in the construction of a device, a manufacturer will underclock a processor to allow it a longer time to cycle, generating less heat and consuming less power.
Manufacturers call this heat output the TDP (Thermal Design Power): the heat (in Watts) that the processor will generate under load, and therefore the amount the cooling solution must dissipate to prevent overheating. Hennessy and Patterson (2014) state that the actual peak power for a processor is often 1.5 times higher than the manufacturer’s stated TDP value, although this can vary with each design.
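As a rough illustration of the 1.5 times rule of thumb attributed above to Hennessy and Patterson (2014), the sketch below estimates peak power from a stated TDP. The factor is only an approximation that varies by design, and the function name estimated_peak_power and the sample TDP values are hypothetical.

```python
PEAK_FACTOR = 1.5  # rule-of-thumb multiplier; real designs vary


def estimated_peak_power(tdp_watts: float, factor: float = PEAK_FACTOR) -> float:
    """Estimate peak power draw (in Watts) from a manufacturer's stated TDP."""
    return tdp_watts * factor


if __name__ == "__main__":
    for tdp in (35, 65, 91):
        print(f"TDP {tdp}W -> estimated peak {estimated_peak_power(tdp)}W")
```

A cooling or power supply design sized only for the stated TDP could therefore be undersized during short bursts of peak load.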
Cores and Threads
Everything described up to this point is true of a single core CPU. Having additional cores simply means having multiple CPUs on the one socket. The advantage for performance is (at a basic level) that 2 cores can handle 2 different operations at the same time. This is all done within the normal die size of the CPU, so the only concern is cooling and powering the one socket.
A CPU with hyperthreading is one where a single physical core presents 2 logical CPUs to the operating system. It is a form of parallel computing where operations are broken down into smaller parts and carried out simultaneously on the CPU (Gottlieb, A and Almasi, G. 1989). As these are virtual cores, the operating system can still use one while waiting for the other to complete an operation. Although useful, hyperthreading is not as powerful as having additional physical cores, and the gain is not fixed, as its usefulness depends on the logical cores not needing the same resources at the same time.
GPUs vs CPUs
On the surface, modern GPUs share a lot of the same fundamental design as CPUs. GPUs are a processor designed for display functions, to render images and video to be displayed on a monitor (see I/O below for connection types and bandwidth figures).
GPUs come in 2 forms: dedicated or integrated. A dedicated GPU is a standalone card which is connected to the motherboard via a PCIe expansion slot (see Internal Connectivity for bandwidth figures). The term dedicated is used because the card has its own RAM, called GDDR, to use for its tasks, in the same way that a CPU uses main system memory in the form of DDR. Both share the same base design, however GDDR is tuned for the high-bandwidth workload of the GPU. For example, the fastest GPU-specific RAM currently is GDDR5X, which has a data transfer rate of 45,640 MB/s. When compared to the fastest system memory, DDR4 (25,600 MB/s), it is this super-fast memory which is making multiple dedicated GPUs popular for compute tasks. The integrated GPU, by contrast, is located on the same die as the main CPU and as such only has access to the main system memory.
As mentioned above, it is the dedicated GPU which has become popular with professionals, thanks to its GDDR RAM being set up for stream processing. For certain mathematical problems in which the same operation is applied to many independent data items, using multiple GPU cores has proven to be faster and more cost efficient than using standard CPU configurations. Nvidia’s platform is called CUDA (Compute Unified Device Architecture), which allows the GPU to use its GDDR RAM as a unified memory structure capable of scattered reads (Nvidia, 2007). As multiple GPUs can be used together across multiple PCIe expansion slots, the performance gain for compute tasks is classed as superior, whereas with a CPU you are limited to the one installed in the motherboard socket, with no simple way to expand.
At this point, modern systems are all built with 64-bit architecture in mind. However, the performance of different devices is fundamentally decided by the size of the processor being installed, the amount of cooling required and the power requirements of the processor. As better manufacturing techniques have evolved, so has the complexity and performance of processor types.
Performance Milestones over 25 to 40 years for microprocessors (Hennessy and Patterson 2014)
The above is an example of the evolution of Intel’s architecture from 1982 to 2010, and it is this evolution that has driven their mainstream processor line to great success over the years. However, when it comes to tablets and mobile phones, Intel has essentially no market share. Apple designs its own processors for its iPhone and iPad ranges and has them manufactured by Taiwan Semiconductor Manufacturing Company (TSMC). This allows Apple to design the processors to very exacting requirements, as it also designs the other major aspect of its products, the iOS operating system. The other mobile system, Android, is a little more fractured, with multiple manufacturers on the same platform (more like a desktop setup); however, the most popular manufacturer on the platform, Samsung, also develops and manufactures its own processors for its newest devices. The question comes when you try to compare these processors. What is known is that both manufacturers tend to set their clocks lower to help with cooling and power efficiency, the key focus for a mobile device running off an internal battery most of the time. The newest Apple iPhones (8 and X) run an Apple A11 Bionic processor at 2.4GHz (official core count unknown) and the new Samsung S8 models run a Samsung Exynos 8895 octa-core at 1.7GHz. Although on paper it looks like Samsung has more cores and Apple has the superior clock frequency, the only comparison to date is on Geekbench (2017), which benchmarks mobile devices through their browser against the Intel Core i7-6600U (one of Intel’s ‘ultra-low power’ processors), which achieved a baseline score of 4000. The A11 scores 10,103 to the Exynos’ 6509, which potentially indicates that Apple’s mobile processor is more powerful. For both manufacturers, a custom GPU is designed to be integrated into the chip to reduce the space needed for implementation. Although these are lower in performance when compared with dedicated GPUs, they are still capable of high-resolution video playback and mobile gaming applications.
Laptop processors, with fewer concerns about cooling than mobiles and tablets and with a larger capacity battery, yield improved results. In the same Geekbench (2017) results, the Intel Core i7 7700T scores 14,783. This is accomplished by a few things, namely a higher clock (2.9GHz) as well as this being a 4 core, 8 thread CPU. As the laptop is still a compact form factor, cooling still has to be considered. According to Intel (2017), the T suffix on the model number indicates that the CPU has been designed for a “Power-optimized lifestyle.” For the cooling, this processor has a TDP of 35W and it can be underclocked to 1.9GHz, which gives a TDP of 25W. This is an important consideration when cooling solutions are smaller and can dissipate less heat than larger desktop solutions. Like the smaller phones and tablets, laptops often rely on the integrated GPU for graphics performance. However, as many manufacturers sell mobile versions of their GPUs (the same way CPU manufacturers do), some high-end laptops can take advantage of having a dedicated GPU installed. As there is still a concern about cooling the GPU, just like the CPU, these GPUs are often underclocked to lower their TDP and allow the laptop chassis to provide sufficient cooling.
This can then be directly compared to the desktop model of this CPU, the Intel i7 7700. With fewer cooling concerns, this has a standard clock of 3.6GHz, and there is also a version (the 7700K) which can be overclocked to 4.2GHz. The TDP is 65W on the standard 7700, with the 7700K having a higher TDP of 91W. The differences, though, are substantial. If you take the 7700K benchmark of 18799, there is now an estimated 86% improvement in performance over Apple’s A11 processor in the iPhone. When you compare it against Samsung’s Exynos processor, it’s an estimated 189% improvement for the i7 7700K. Although these CPUs do have integrated GPUs, the dedicated GPU is now significantly more common and preferred, as there is less concern about cooling and size constraints. However, due to CPUs of this specification generally having a smaller number of PCIe lanes (see Internal Connectivity), most setups limit themselves to 2 GPUs at the absolute most.
Workstation CPUs are different again. Here, they can be clocked lower than desktop CPUs, mainly to allow for more precise operation and cooling. Taking the example of Intel’s Xeon range of workstation CPUs, an E5-2683v3 is only clocked to 2GHz, however that is across 14 physical cores with 28 threads. Despite the lower clock, this processor has a TDP of 120W, which is 32% larger than the overclocked 7700K. This is down to the large core count on the Xeon CPU generating more heat than the 4 cores of the 7700K. Because of the number of cores, the Xeon gets a benchmark score of 22453, which is an estimated 19% improvement over the 7700K. At this point, integrated graphics are generally not available, allowing the available space on the die to be focused entirely on the CPU architecture. However, thanks to significantly larger PCIe lane counts (see Internal Connectivity), multiple dedicated GPUs can now be installed to take advantage of parallel computing performance.
As server setups can use both desktop and workstation grade processors depending on the scale of the server, exact comparisons are even harder to give. However, one of the largest core counts on the market belongs to the Intel Xeon E5-2696 v4. It is a 22 core, 44 thread CPU clocked to 2.2GHz and has a 145W TDP. Its benchmark from Geekbench was 29061, which is an estimated 29% improvement in performance over the smaller Xeon. For more of a high-end desktop CPU comparison, the AMD Ryzen Threadripper 1950X is a 16 core, 32 thread CPU clocked at 3.4GHz with a TDP of 180W. Based on the Geekbench benchmark of 29279, the 1950X is 0.75% faster than the E5-2696 v4.
On a practical level, this means that as you go up the processor scale into more cores and threads you are going to get more and more performance, simply because the processors can handle more data. For video production, this means key tasks like video rendering take less time, however the scale of the benefit varies with the software being used. As is shown by the various Xeon processors, simply having a high clock speed is not always the solution to more performance, especially when you need to consider cooling and power draw by the CPU on mobile platforms. However, when there is ample cooling available, the final Threadripper example shows that different manufacturer architectures can perform very closely even with different cores, threads and clock frequencies. As far as estimated benchmark results are concerned, there is a significant 350% increase in benchmarked performance from the Samsung Exynos 8895 processor for mobile phones to the high-end desktop AMD Ryzen Threadripper 1950X.
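The percentage comparisons quoted in this section can be reproduced from the benchmark scores given in the text. The sketch below is a simple worked calculation; the dictionary SCORES and the function percent_improvement are hypothetical names introduced for illustration, and the scores are those quoted above from Geekbench (2017).

```python
# Benchmark scores as quoted in the text.
SCORES = {
    "Samsung Exynos 8895": 6509,
    "Apple A11 Bionic": 10103,
    "Intel Core i7 7700T": 14783,
    "Intel Core i7 7700K": 18799,
    "Intel Xeon E5-2683v3": 22453,
    "Intel Xeon E5-2696 v4": 29061,
    "AMD Ryzen Threadripper 1950X": 29279,
}


def percent_improvement(baseline: float, candidate: float) -> float:
    """Percentage by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100


if __name__ == "__main__":
    pairs = [
        ("Apple A11 Bionic", "Intel Core i7 7700K"),
        ("Samsung Exynos 8895", "Intel Core i7 7700K"),
        ("Intel Core i7 7700K", "Intel Xeon E5-2683v3"),
        ("Intel Xeon E5-2683v3", "Intel Xeon E5-2696 v4"),
        ("Intel Xeon E5-2696 v4", "AMD Ryzen Threadripper 1950X"),
        ("Samsung Exynos 8895", "AMD Ryzen Threadripper 1950X"),
    ]
    for base, cand in pairs:
        gain = percent_improvement(SCORES[base], SCORES[cand])
        print(f"{cand} vs {base}: {gain:.2f}% higher")
```

Note that a 350% increase means the higher score is roughly 4.5 times the lower one, not 3.5 times.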
An input device is any hardware that allows you to input data into your computer. For modern devices, this is a larger range of hardware and software than ever before. The traditional inputs are the keyboard and mouse, which are still widely used. More portable devices are starting to take advantage of an alternative pointing device to the mouse, the stylus, something which digital graphics designers have been using on graphics tablets for many years now. For portable devices, touch screens have become the standard, although they aren’t used as much on home computers. For video production, there are several well-known input devices: the camera (still and video) for image content and footage, as well as microphones.
Output devices are what the computer sends information out to for further use. Again, for video production, there are a few obvious ones in the form of monitors for video, as well as speakers and headsets for audio. A more general output device is the printer.
As far as how these are all connected to the computer, this is all done with the I/O ports, located at the back and in many cases the front of the case. USB is now one of the most common connections and is used for keyboards, mice and microphones, not to mention external storage devices. There are then specific audio ports for speakers, headphones and microphones, followed by HDMI, VGA, DVI and DisplayPort for monitor connectivity (Pearson Certification 2011).
For video outputs, those different connector types can produce significantly different results. VGA (Video Graphics Array) can theoretically handle similar resolutions to the others (up to 2048×1536px @85Hz), but as it carries an analogue signal it can have problems with signal quality and interference that lower the resulting image quality. DVI and the original HDMI standard are very alike in that they have the same maximum output of 3,840×2,160px at 30 frames per second. The difference between the two at this point is that DVI is generally (with a few exceptions) video output only (no sound), whereas HDMI has enough bandwidth for 8-channel, 24-bit audio along with the video. The newest HDMI standard (2.1) allows for resolutions up to 10K (10,240×4,320px) at 120 frames per second. Unlike HDMI, DisplayPort has remained mainly a computer connection. The current standard (DisplayPort 1.4) has the important advantage of being compatible with various lossless transmission formats to provide 30-bit colour, however the maximum output is still 7680×4320px at 60 Hz.
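To give a sense of why resolution, frame rate and colour depth drive connector bandwidth, the sketch below estimates the raw pixel data rate of an uncompressed video signal. It is a simplification: it ignores blanking intervals, audio and link encoding overhead, and the function name raw_video_gbit_per_s and the default of 24 bits per pixel are assumptions made for illustration.

```python
def raw_video_gbit_per_s(width: int, height: int, fps: float,
                         bits_per_pixel: int = 24) -> float:
    """Raw pixel data rate in gigabits per second (no blanking or encoding overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


if __name__ == "__main__":
    # 4K at 30 frames per second with 24-bit colour.
    print(f"{raw_video_gbit_per_s(3840, 2160, 30):.2f} Gbit/s")
    # 8K at 60 Hz with 30-bit colour.
    print(f"{raw_video_gbit_per_s(7680, 4320, 60, bits_per_pixel=30):.2f} Gbit/s")
```

Under these assumptions, 4K at 30 frames per second needs roughly 6 Gbit/s of pixel data, while 8K at 60 Hz with 30-bit colour needs roughly ten times that, which is why each newer connector standard has to raise its bandwidth or rely on compression.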
For input and output devices, tablets and mobile phones are very similar. They typically have 1 main connection port, with Apple using their own Lightning connection and most Android phones using the more universal USB Type-C. Both connector types have similar bandwidth of up to 625MB/s, although actual throughput can vary depending on the device itself. It is through these multi-use ports that devices are charged and can be connected to a main computer to transfer data. Common inputs include: camera (front and rear of device), touch screen display, internet connectivity via Wi-Fi and cellular network, Bluetooth and microphone. More high-end devices also have several advanced sensors built in to assist in various functions. Because size is a limiting factor, the common outputs are limited to headphones, speakers and display. The diagram below shows the internal structure of an iPhone 5, which includes the various input and output sources. The main feature of note is that all the above are designed into the form factor of the device, often helping with its overall strength and rigidity.
iPhone 5 Parts Diagram (VKRepair 2017)
Laptops have very similar inputs and outputs to a standard desktop or workstation, mirroring the focus of a laptop on being able to do most functions of a desktop or workstation while still being a mobile device. The key differences are that the keyboard and pointing device are built into the chassis, and the screen is also built in and smaller to match. On a desktop and workstation, each of these I/O devices would be standalone, requiring the relevant connection ports (typically USB for keyboard and mouse, HDMI and/or DisplayPort for monitor). Most laptops also come with a webcam and microphone built into the top of the chassis above the screen. As laptops use standard USB ports, any device which is compatible can be installed the same as it would be on a desktop or workstation.
The step up to desktops and workstations has a significant impact on I/O, mainly because the size restrictions of a mobile device no longer apply. With the main system being installed into a larger case, there is flexibility to have not only the main motherboard rear I/O connections, but also a front panel on the case. The advantage here is that more devices can be attached at any one time and, coupled with what tend to be more powerful processors (as discussed above), these can be used without any performance drop. The main difference when moving to workstations is that the number of available ports can increase even further, often facilitated by the available performance of multiple CPUs in multi-socket motherboards.
For servers and clients, I/O requirements are completely custom depending on the required inputs and outputs for the server. With there being such a huge range of configurations, this could be anything from custom large-scale networking with redundant active connections in case of failure to additional sensor setups to handle custom monitoring of system cooling.
Finally, a note on network connectivity. On mobile phones and tablets, the primary connection is over WiFi, with the common radio bands being 2.4GHz and 5GHz. There has been a shift onto the 5GHz band for 2 main reasons. Firstly, the 2.4GHz band is slower, despite being able to cover a larger range, and can be prone to interference. Secondly, with the higher band being able to handle more packets, despite the shorter range, it has become more popular as faster cable broadband connections have become available for the home, allowing connection speeds over WiFi of up to 1Gbit/s (Wei Wei et al., 2008). The initial crossover occurs with laptops: although they can typically access networks over WiFi, they also come with an ethernet (wired) port on the chassis, which (just like on a desktop) tends to be part of the motherboard I/O. This ethernet port is usually capable of the same 1Gbit/s speed. With both desktops and workstations, however, additional speed could be advantageous and is often available through the motherboard directly or via network adapters. The standard speed here is up to 10Gbit/s. Finally, custom switches start to become more commonplace in a server setup, however you can also see them in a smaller client-server setup where fast file transfer is advantageous (for example, moving large video files). For the highest bandwidth servers, a direct connection to an internet data hub is often required, complete with redundancy for connection issues. In these setups, high grade data center switches become the standard, with maximum bandwidth of up to 1032 Tbit/s available (Huawei 2014).
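To show what these link speeds mean when moving large video files, the sketch below estimates an ideal transfer time. It assumes the link speeds are rated in gigabits per second (the usual rating for Ethernet) and ignores protocol overhead and storage speed; the function name transfer_seconds and the 50GB file size are hypothetical.

```python
def transfer_seconds(file_size_gb: float, link_gbit_per_s: float) -> float:
    """Ideal time in seconds to move a file over a link, ignoring all overhead."""
    file_size_gbit = file_size_gb * 8  # 1 byte = 8 bits
    return file_size_gbit / link_gbit_per_s


if __name__ == "__main__":
    for speed in (1, 10):
        seconds = transfer_seconds(50, speed)
        print(f"50GB over {speed} Gbit/s: {seconds:.0f} seconds")
```

Real transfers will take longer than this estimate, but the tenfold difference between the two link speeds carries through.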
At a broad level, storage comes in 2 types: volatile and non-volatile. Although generally faster, volatile memory has the crucial drawback of being effectively wiped if you lose power. It is this trade-off between speed and permanent storage which determines a lot of modern day storage types, both within a processor and a system in general.
The key storage for a processor starts with the cache, which is built into the processor in tiers. It is tiered to allow for quick organisation of and access to data and typically comes in 3 levels marked L1, L2 and L3. As it is made from SRAM (static random-access memory), this memory is classed as volatile. A modern CPU (like the AMD Zen architecture) would only use 64 KB for instructions and 32 KB for data per core for L1 cache, 512 KB per core for L2 cache and 8 MB per quad-core for L3 cache. As this storage is on the processor, the speed of the data transfer varies between manufacturers and between chips, and it typically comes in much smaller capacities, like the example above, than main system memory. The maximum speed is essentially tied to the frequency of the CPU itself, to give the CPU the fastest possible access to data. The theoretical bandwidth of L3 cache is up to 175 GB/s.
Main System Memory
After the processor memory comes the main system memory in the form of DRAM. Like SRAM it is a volatile storage type, however it differs from SRAM in that DRAM stores each state in a capacitor (charged or not), allowing for binary data storage. It is not as fast as SRAM, although as technology and production quality have advanced, this difference is starting to decrease. Standard DRAM comes in the form of synchronous dynamic random-access memory (SDRAM) and operates with a double data rate (DDR) interface. This means that the memory transfers its data on both the rising and falling edges of the clock signal. The main standard of DDR4 allows for each memory module to be up to 64GB in capacity, with a peak transfer rate of 25600MB/s. So, although not as fast as cache (175,000MB/s), the main advantage is the huge jump in capacity while still maintaining a decent transfer rate.
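The 25600MB/s peak figure can be derived from the double data rate principle. The sketch below assumes a DDR4-3200 module, meaning a 1600MHz I/O clock and a 64-bit data bus; these parameters are assumptions made for illustration, as is the function name ddr_peak_mb_per_s.

```python
def ddr_peak_mb_per_s(io_clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak transfer rate of a DDR memory module in MB/s."""
    # Double data rate: one transfer on the rising edge and one on the falling edge.
    mega_transfers_per_second = io_clock_mhz * 2
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_second * bytes_per_transfer


if __name__ == "__main__":
    print(ddr_peak_mb_per_s(1600))  # assumed DDR4-3200 module
```

With these assumed values, 1600MHz becomes 3200 million transfers per second, and at 8 bytes per transfer that gives the 25600MB/s quoted above.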
Standard magnetic hard drives are the most common, coming in 2.5-inch and 3.5-inch variants. They are slower than solid state drives, however they generally have a longer life span. By comparison, solid state drives are much faster thanks to their use of flash memory, and having no moving parts makes them more reliable, but they are also more expensive as a result. Both solid state and magnetic hard drives are non-volatile storage, making them common devices for data that needs to be stored after a system is powered off. For magnetic hard drives, speeds are often quoted as the rotational speed of the internal disk in RPM. Because of this internal moving part, it is important to note that read and write performance depends on whether data is simply being read from or written to the internal platters or whether the drive is transferring data to another device. A standard 7200RPM drive from Seagate has an external transfer rate of 300MB/s (Seagate 2011). Enterprise grade HDDs (made specifically for server use) include 15,000RPM drives, with a maximum transfer rate of up to 1.6Gbit/s (200MB/s). Although this is significantly slower than the DDR4 standard above, the one main advantage is capacity. Standard drives can hold up to 6TB of data, with enterprise models typically being up to 20TB.
Solid state drives, however, are starting to replace HDDs where speed is more desired. Unlike the above HDDs, as the name suggests, solid state drives have no moving mechanical parts. They are typically made with NAND flash memory, although recent advances in production have resulted in modern drives using vertical stacks of NAND (called V-NAND). As a result, these drives can achieve 2500MB/s read and 1500MB/s write speeds, making them significantly faster than even the fastest traditional hard drives. However, achieving this requires specialised connections on a motherboard, which will be discussed further in Internal Connectivity.
External Storage Types
External storage solutions include the CD, DVD, Blu-ray and external drives. For the various disc types, the base transfer speed, capacity and read time are as follows:
|Media||1x transfer speed||Capacity||Full read time|
|CD||0.15 MB/s||700MB||80 min|
|DVD||1.32 MB/s||4.7GB||120 min|
|Blu-ray||4.29 MB/s||25 GB||180 min|
Data compiled from OSTA (2014) and Blu-ray (2005)
Modern devices can read/write significantly faster than the 1x transfer speed noted above. A CD can write as much as 52x and higher, and as Blu-ray movie discs typically need a data transfer of 6.75MB/s, players for movie playback typically need at least a 2x speed. Typically, discs are used for storing content for playback alone, with the faster external storage being used for read/write of data being used daily.
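The speed multipliers can be worked through with the 1x figures from the table. The sketch below is illustrative only; the names BASE_SPEED_MB_S, drive_speed_mb_per_s and minimum_multiplier are hypothetical.

```python
import math

# 1x transfer speeds in MB/s, taken from the table above.
BASE_SPEED_MB_S = {"CD": 0.15, "DVD": 1.32, "Blu-ray": 4.29}


def drive_speed_mb_per_s(media: str, multiplier: int) -> float:
    """Transfer speed of a drive rated at the given multiple of the 1x speed."""
    return BASE_SPEED_MB_S[media] * multiplier


def minimum_multiplier(media: str, required_mb_per_s: float) -> int:
    """Smallest whole speed multiplier that meets a required data rate."""
    return math.ceil(required_mb_per_s / BASE_SPEED_MB_S[media])


if __name__ == "__main__":
    print(f"52x CD: {drive_speed_mb_per_s('CD', 52):.2f} MB/s")
    print(f"Blu-ray movie playback needs {minimum_multiplier('Blu-ray', 6.75)}x")
```

Since 6.75MB/s is about 1.57 times the 1x Blu-ray speed, the next whole multiplier up is 2x, which agrees with the requirement stated above.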
Like larger drives, external storage comes as either traditional HDD (in 2.5” variety) or various flash storage types. Due to costs, generally smaller capacities are used for flash storage, making it useful for USB drives and card formats like SD cards. If capacity is more a requirement, then typically the HDD is used to keep costs low.
Modern Tape Storage
Although the HDD is now the standard drive type, tape storage is still used in medium and large sized data centers. In these setups, the tapes are stored as cartridges, to allow for a quick exchange of data without the risk of causing data loss. The tape’s ability to comfortably store data in uncompressed formats, together with the data being encrypted and unable to be modified once written, makes it ideal for centers where security of data is paramount. Although the tapes tend to be limited to no more than 15TB of actual storage, with a maximum speed of 360MB/s, the cost of these drives is often considered a worthwhile investment for locations where the integrity of the data is the key concern.
Tablets and phones are small devices that are not only likely to be dropped but also subject to significant amounts of motion during day-to-day use, so they tend to use flash storage types. Although this does raise the prices of the devices, having sufficiently fast read/write speeds is advantageous for everyday use, and as these devices are usually used in conjunction with a main PC, large capacities aren’t as vital.
For laptops, a similar crossover to the one noted in I/O also occurs for storage. To keep sizes down within the chassis, the drives are typically 2.5” HDDs. In more expensive models where speed is required, 2.5” SSDs are used instead. For standard desktops, there is a switch to standard 3.5” HDDs, as space is no longer at a premium, yet like laptops, if faster speeds are required, this can be supplemented by an SSD.
For workstations, the task that the system is required for becomes the primary concern. Although a 3.5” HDD will most likely still be present, it is now more likely to be used as a long-term storage drive only, with an SSD taking over as the primary drive. For high speed projects like rendering and animation, this will likely also mean a more top of the range model for the maximum read/write speeds currently available. Workstation processors have one key advantage over standard desktop CPUs and that is ECC RAM compatibility. ECC means error correcting code, which essentially means the RAM can detect and correct data corruption itself. This can be a crucial part of the design when failure (meaning the loss of the data) would not be acceptable, for example in financial institutions. The processors also come with the option for multi-socket motherboards, meaning 2 or more processors can be installed into the motherboard for improved performance. Workstations also typically start to use RAID (Redundant Array of Inexpensive Disks) setups for storage. RAID is where two or more disks are physically linked together to form a single, large capacity storage device that offers many advantages over conventional hard disk storage devices. RAID has 2 major versions, with the variation coming in how the multiple drives are configured. The first is striping, which provides increased performance. It is used to create a large virtual drive, with data divided into stripes that are written sequentially across the drives in the array. This allows the CPU to access data across large capacities at the same speed as a smaller single drive, when normally there would be an increase in delay as the HDD spun to the correct data on a high capacity drive. The other version of RAID is mirroring and, as it sounds, it is where your drive (or multiple drives in a stack) is duplicated exactly onto another drive. This creates superb redundancy for your data and allows for easy data recovery if a single drive fails.
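The difference between striping and mirroring can be illustrated with a toy model in which each drive is just a Python list or bytes object. This is a simplified sketch and not how a real RAID controller is implemented; the function names and the 4-byte stripe size are hypothetical. Striping and mirroring are commonly known as RAID 0 and RAID 1 respectively.

```python
from typing import List, Optional


def stripe(data: bytes, drive_count: int, stripe_size: int = 4) -> List[List[bytes]]:
    """Split data into fixed-size stripes and deal them round-robin across drives."""
    drives: List[List[bytes]] = [[] for _ in range(drive_count)]
    for index, offset in enumerate(range(0, len(data), stripe_size)):
        drives[index % drive_count].append(data[offset:offset + stripe_size])
    return drives


def unstripe(drives: List[List[bytes]]) -> bytes:
    """Reassemble striped data by reading stripes back in round-robin order."""
    chunks = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


def mirror(data: bytes, drive_count: int = 2) -> List[Optional[bytes]]:
    """Write an identical copy of the data to every drive."""
    return [bytes(data) for _ in range(drive_count)]


def read_mirror(drives: List[Optional[bytes]]) -> bytes:
    """Read from the first surviving drive; None marks a failed drive."""
    for drive in drives:
        if drive is not None:
            return drive
    raise IOError("all mirrored drives have failed")


if __name__ == "__main__":
    footage = b"frame data for the current edit"
    striped = stripe(footage, drive_count=3)
    print(unstripe(striped) == footage)

    mirrored = mirror(footage)
    mirrored[0] = None  # simulate a single drive failure
    print(read_mirror(mirrored) == footage)
```

In the striped case each drive holds only part of the data, so losing any one drive loses the file, whereas the mirrored case survives a single failure at the cost of storing everything twice.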
Finally, for servers, this is totally dependent on the scale of the setup required. If data integrity is required, then tape storage solutions or large-scale RAID setups will probably be used in conjunction with SSD storage, to allow for a synergy between when the complex server setups require both fast performance and secure data.
Without internal connectivity in a PC, there would be no way for these components to work together and function as a modern PC. This is where the motherboard comes in, providing the electrical connections which all the components can use. They have different chipsets on motherboards which define what processors it supports, as well as the RAM and the type of I/O support. This can have an impact on overall performance as (for example) a processor can often perform to a higher level with overclocked DDR4 RAM when compared to standard supported speeds.
As mentioned above in I/O, a modern computer system often needs to be compatible with a large range of devices. To that end, a standard was needed so that manufacturers could know that a device they build will function when it is attached to the motherboard. As such, Intel, Dell, HP and IBM created Peripheral Component Interconnect Express (Mayhew & Krishnan, 2003). The PCI Express (PCIe) lanes are one of the key data transfer methods within your system. “Each lane has two pairs of wires – one to send and one to receive.” (Wilson, 2005), so each lane is made up of four wires. The lanes work essentially the same way a motorway does: the more you have, the more traffic they can handle, which is crucial for increasing the available bandwidth when moving very large amounts of data. Lane counts are written with an “×” prefix; for example, “×8” represents an eight-lane card or slot. The current PCIe standard, 3.0, allows for significant amounts of data transfer, depending on how many lanes are being used.
|×1||×2||×4||×8||×16|
|984.6 MB/s||1970 MB/s||3940 MB/s||7900 MB/s||15800 MB/s|
This high level of transfer speed is a main contributing factor to the speed and performance of GPUs mentioned earlier in this document. Having such high throughput allows manufacturers a great deal of flexibility when designing components that use the standard.
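The figures in the table above appear to match the per-direction throughput of PCIe 3.0 at ×1, ×2, ×4, ×8 and ×16, rounded. As a rough check, the sketch below derives them from two facts about PCIe 3.0 that are not stated in this document: each lane runs at 8 gigatransfers per second, and the link uses 128b/130b encoding (128 bits of data for every 130 bits sent). The function name is an assumption for illustration.

```python
TRANSFERS_PER_SECOND = 8e9   # PCIe 3.0 raw rate per lane (8 GT/s, one bit per transfer)
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding overhead


def pcie3_bandwidth_mb_s(lanes: int) -> float:
    """Usable one-direction bandwidth in MB/s for a PCIe 3.0 link of the given width."""
    usable_bits_per_second = TRANSFERS_PER_SECOND * ENCODING_EFFICIENCY * lanes
    return usable_bits_per_second / 8 / 1e6   # bits -> bytes -> megabytes


for lanes in (1, 2, 4, 8, 16):
    print(f"x{lanes}: {pcie3_bandwidth_mb_s(lanes):.1f} MB/s")
```

Bandwidth scales linearly with the lane count, which is why a ×16 graphics card slot carries sixteen times the traffic of a ×1 slot.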
However, all this bandwidth is not without cost. A common statistic shown in processor data sheets is the total number of PCIe lanes available at full bandwidth. This is crucial when looking at total system performance, as it determines how many additional devices can be plugged into the motherboard and still achieve the above speeds.
The M.2 port is an offshoot of the standard PCIe connection. It allows high-speed SSDs to use up to four PCIe 3.0 lanes (×4), letting them reach their maximum read/write speeds, something which wasn’t possible before with SATA ports.
SATA has been the standard connection for traditional hard drives for many years, mainly because (like PCIe) the port is flexible about the type of device that can be connected to it. Although perfectly acceptable for HDD read/write speeds, its 600MB/s data rate limits what a modern SSD is capable of.
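To put the gap between SATA and M.2 into video production terms, the sketch below estimates how long each interface would take to move a project file at its maximum rate. The 50 GB project size is a hypothetical figure chosen for illustration, and the results are best-case interface limits rather than measured drive speeds.

```python
SATA_MB_S = 600          # SATA data rate quoted in the text
PCIE3_X4_MB_S = 3940     # PCIe 3.0 x4 figure from the table, as used by M.2


def transfer_seconds(size_gb: float, rate_mb_s: float) -> float:
    """Best-case time to move size_gb gigabytes at rate_mb_s (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_s


project_gb = 50  # hypothetical size of a video project
print(f"SATA: {transfer_seconds(project_gb, SATA_MB_S):.1f} s")
print(f"M.2:  {transfer_seconds(project_gb, PCIE3_X4_MB_S):.1f} s")
```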
For mobiles and tablets, as with other parts of these devices, internal connectivity is generally custom made. This is mainly down to size considerations, and as such much of the chassis structure itself is part of the connectivity design (see: iPhone 5 Parts Diagram, VKRepair 2017). However, since the iPhone 6S Apple has been using PCIe connectivity for its on-board storage (Baram, 2015), thanks in part to having an existing controller from its main PC range.
Laptops are now an interesting mix of connections. As mentioned above, for standard HDDs they continue to use SATA ports. However, thanks to the small size of M.2 drives, high-end laptops can still fit them into the limited space within the chassis. As their processors are usually just low-energy versions of desktop chips, they still have the same number of available PCIe lanes as you would find on a normal desktop. However, as a laptop is still a portable device, the most demanding use of those lanes is likely to be a high-end mobile GPU installed for high performance (for example, gaming).
For desktops, although size is no longer an issue, the standard system is still limited by the CPU. Newer processors like AMD Ryzen CPUs feature 24 CPU-controlled lanes, with up to 8 additional lanes available on certain motherboards. These are generally populated by a single graphics card, alongside both SSDs and HDDs (in SATA ports). Again, depending on the motherboard, it will come with one or more M.2 ports for superior SSD read/write speeds.
Workstations represent the next jump in available PCIe lanes. AMD’s HEDT (high-end desktop) CPUs, called Threadripper, use the same Zen architecture as the smaller Ryzen CPUs. However, these CPUs have 64 PCIe lanes, allowing multiple devices to have full bandwidth. This could include multiple SSDs connected on M.2 as well as multiple GPUs installed. For a productivity setup, this additional bandwidth allows workstations to have even more flexibility.
For servers, as with the above devices, performance depends on the CPU and the chipset. AMD’s EPYC processors are its server line built on the same Zen architecture, and allow for a total of 128 PCIe lanes. This is to allow for server-specific configurations, many of which (like networking) have been mentioned earlier in this document. Other configurations now include using multiple GPUs for rendering.
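The lane counts quoted above for Ryzen, Threadripper and EPYC can be turned into a simple lane budget. The sketch below is illustrative only: it assumes each GPU wants a full ×16 link and each M.2 SSD a ×4 link, uses a hypothetical rendering build, and ignores any lanes a real platform reserves for the chipset.

```python
from __future__ import annotations

# CPU lane counts as quoted in the text
CPU_LANES = {"Ryzen": 24, "Threadripper": 64, "EPYC": 128}


def lanes_left(cpu: str, devices: dict[str, int]) -> int:
    """Lanes remaining after every device gets its full-width link (negative = short)."""
    return CPU_LANES[cpu] - sum(devices.values())


# Hypothetical rendering build: two GPUs at x16 and two M.2 SSDs at x4
build = {"GPU 1": 16, "GPU 2": 16, "M.2 SSD 1": 4, "M.2 SSD 2": 4}

for cpu in CPU_LANES:
    spare = lanes_left(cpu, build)
    verdict = "fits at full bandwidth" if spare >= 0 else "devices must share or drop lanes"
    print(f"{cpu}: {spare:+d} lanes ({verdict})")
```

Under these assumptions the build needs 40 lanes, which exceeds the desktop figure but fits comfortably within the workstation and server figures.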
Yadin, A. (2016). Computer Systems Architecture (Chapman & Hall/CRC Textbooks in Computing). CRC Press. p.52
Gilmer, B. (2010). Computer architecture. Broadcast Engineering, Vol.52(1), pp.22-25.
CPU Block Diagram [online image] (n.d.)
Available from: http://www.codesandtutorials.com/hardware/computerfundamentals/cpu-block_diagram-working.php
[Accessed: 10th October 2017]
Hennessy, John L.; Patterson, David A. (2014). Computer Architecture: A Quantitative Approach. Morgan Kaufmann. p.22
Hennessy, John L.; Patterson, David A. (2014). Performance Milestones over 25 to 40 years for microprocessors (source: Computer Architecture: A Quantitative Approach. Morgan Kaufmann. p.20)
Geekbench (2017) iOS, Android, Processor Benchmarks [online]
Available at: https://browser.geekbench.com/ios-benchmarks
[Accessed: 11th October 2017]
Gottlieb, A.; Almasi, G. (1989). Highly parallel computing.
Redwood City, Calif.: Benjamin/Cummings.
Intel (2017) Processor Numbers [online]
Available at: https://www.intel.co.uk/content/www/uk/en/processors/processor-numbers.html
[Accessed: 11th October 2017]
Pearson Certification (2011) Computer Structure and Logic
United States of America: Pearson IT Certification
VKRepair (2017) iPhone 5 Parts Diagram [online]
Available from: http://vkrepair.com/iphone-5-parts-diagram/
[Accessed: 10th October 2017]
Wei, W., Wang, B., Zhang, C., Kurose, J., & Towsley, D. (2008) Classification of access network types: Ethernet, wireless LAN, ADSL, cable modem or dialup? Computer Networks. Volume 52, Issue 17, Pages 3205-3217 [First Published online 2nd September 2008]
Available at: http://www.sciencedirect.com/science/article/pii/S1389128608002624
[Accessed: 15th October 2017]
Huawei (2014) CloudEngine 12800 Series Data Center Switches [online]
Available at: http://e.huawei.com/uk/products/enterprise-networking/switches/data-center-switches/ce12800
[Accessed: 15th October 2017]
Seagate (2011) Speed Considerations [online – Archived]
Available at: https://web.archive.org/web/20110920075313/http://www.seagate.com/www/en-us/support/before_you_buy/speed_considerations
Original location: http://www.seagate.com/www/en-us/support/before_you_buy/speed_considerations
[Accessed: 15th October 2017]
OSTA (2004) Understanding CD-R & CD-RW Recording Speed [online]
Available at: http://www.osta.org/technology/cdqa5.htm
[Accessed: 14th October 2017]
OSTA (2004) Understanding Recordable & Rewritable DVD Recording Speed [online]
Available at: http://www.osta.org/technology/dvdqa/dvdqa4.htm
[Accessed: 14th October 2017]
Blu-ray (2005) Blu-ray FAQ [online]
Available at: http://www.blu-ray.com/faq/#bluray_speed
[Accessed: 14th October 2017]
Mayhew, D., & Krishnan, V. (2003). PCI express and advanced switching: Evolutionary path to building next generation interconnects. High Performance Interconnects, 2003. Proceedings. 11th Symposium on, 21-29.
Wilson, Tracy V. (2005) How PCI Express Works [online]
Available at: http://computer.howstuffworks.com/pci-express1.htm
[Accessed: 25th September 2017]
Baram, E. (2015) PCIe/NVMe in Mobile Devices [online]
Available at: https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150811_S101C_Baram.pdf
[Accessed: 15th October 2017]
Nvidia (2007) CUDA Zone [online]
Available at: https://developer.nvidia.com/cuda-zone
[Accessed: 15th October 2017]