One Billion Records In Seconds | Syncfusion Blogs

One Billion Records In Seconds

How long should it take to process 173 GB of data and then create a dashboard to make sense of it? Ten minutes? Three minutes? Two minutes? Those all sounded like reasonable estimates. We decided to put the Syncfusion Big Data and Dashboard Platforms to the test to see just how fast we could get it done.

30 seconds.

We have to admit, that’s faster than we expected. Here’s how we made it happen:

We used a data set of New York City taxi trips between 2009 and 2015. In total, this covered 1.1 billion individual taxi trips, giving us 1.1 billion records to play with. First, we uploaded and processed the data with the Big Data Platform. Then we designed and displayed the results with the Dashboard Platform.

So what does it take to achieve this kind of speed? Are these results typical? And do you need enterprise-level hardware to make it happen?

The good news is that you can get similar results using even the most basic hardware setup. This is the first time commodity hardware can do the heavy lifting, and it’s all because of the Syncfusion Big Data and Dashboard Platforms.

| Node type | Number of nodes | Azure VM instance type | RAM | Hard disk | Cores | OS |
| --- | --- | --- | --- | --- | --- | --- |
| Name node running Syncfusion Big Data Platform | 2 | D4 standard | 28 GB | 400 GB | 8 | Windows Server 2012 |
| Data node running Syncfusion Big Data Platform | 3 | D15 standard | 140 GB | 1 TB | 20 | Windows Server 2012 |

The hardware configuration used in this test.
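To put the headline number in perspective, here is a rough back-of-the-envelope sketch of the throughput implied by the figures quoted in this post. It assumes, purely for illustration, that the 30 seconds covers a full pass over all 173 GB and that the work is spread evenly across the data-node cores. The post does not state either of these, so treat the output as an order-of-magnitude estimate rather than a measured result.

```python
# Figures quoted in the post.
DATA_SIZE_GB = 173
RECORDS = 1_100_000_000
ELAPSED_SECONDS = 30

# From the hardware table: 3 data nodes with 20 cores each.
DATA_NODES = 3
CORES_PER_DATA_NODE = 20


def throughput(total: float, seconds: float) -> float:
    """Return units processed per second."""
    return total / seconds


# Assumption: the whole data set is processed within the elapsed time.
gb_per_second = throughput(DATA_SIZE_GB, ELAPSED_SECONDS)
records_per_second = throughput(RECORDS, ELAPSED_SECONDS)

# Assumption: the load is spread evenly over every data-node core.
total_cores = DATA_NODES * CORES_PER_DATA_NODE
records_per_core = records_per_second / total_cores

print(f"{gb_per_second:.1f} GB/s across the cluster")
print(f"{records_per_second / 1e6:.1f} million records/s across the cluster")
print(f"{records_per_core / 1e3:.0f} thousand records/s per data-node core")
```

Under those assumptions, the cluster would be moving roughly 5.8 GB and 36.7 million records per second, or about 611 thousand records per second per core.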

With Apache Spark 2.0 releasing soon, we decided to re-run our tests. The results were even better this time around: 

| Query | Without tuning Spark | Resilient distributed dataset caching using Spark SQL | Partitioning by year |
| --- | --- | --- | --- |
| Total record count | 1.9 minutes | 15 seconds | 5 seconds |
| Passenger Count | 1.9 minutes | 13 seconds | 5 seconds |
| Total Amount | 2.8 minutes | 12 seconds | 6 seconds |

The results using the Spark 2.0.0 preview release are the most impressive.
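As a quick way to read the table above, the following sketch converts each timing to seconds and computes the speedup of each tuning step relative to the untuned run. The dictionary keys (`untuned`, `cached`, `partitioned`) are labels chosen for this example, not names from the platform. Whether the partitioned runs also used caching is not stated in the post, so each ratio is simply computed against the untuned baseline.

```python
# Timings from the table, in seconds (1.9 minutes = 114 s, 2.8 minutes = 168 s).
timings = {
    "Total record count": {"untuned": 114, "cached": 15, "partitioned": 5},
    "Passenger Count": {"untuned": 114, "cached": 13, "partitioned": 5},
    "Total Amount": {"untuned": 168, "cached": 12, "partitioned": 6},
}


def speedup(baseline_seconds: float, tuned_seconds: float) -> float:
    """Return how many times faster the tuned run is than the baseline."""
    return baseline_seconds / tuned_seconds


speedups = {}
for query, t in timings.items():
    speedups[query] = {
        "cached": speedup(t["untuned"], t["cached"]),
        "partitioned": speedup(t["untuned"], t["partitioned"]),
    }
    print(
        f"{query}: caching {speedups[query]['cached']:.1f}x, "
        f"partitioning by year {speedups[query]['partitioned']:.1f}x"
    )
```

By this measure, caching alone makes the queries roughly 8 to 14 times faster than the untuned run, and the partitioned runs are roughly 23 to 28 times faster.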

For an in-depth look at the entire process, download our white paper.
