In this article, I present a Windows program that converts streaming market events into time-period based summaries which I refer to as briefs.
Tip: Before downloading, please review the Limitations and Requirements.
Introduction
BriefMaker is a Windows program that converts streaming market events into time-period based summaries that I refer to as briefs. Briefs are a more familiar and practical form of data to work with.
Stream data is usually not that useful on its own – it is a seemingly random mix of different events. We usually want to convert this mess into a more logical and useful data structure that might contain things like the highest, lowest and average value. When we look at a stock’s price history table, we often take this conversion for granted, forgetting that behind the scenes this data originally came from a massive number of market events such as trades, bid offers, ask offers, etc. The conversion process takes a varying number of events and converts them into a nice fixed-sized summary for a specific interval of time. The result is data that is easier to read for people and AI algorithms alike.
Below, on the left, is what streaming event data looks like. [78,2,2,24.43] is just my short way of saying that at 78/256ths of a second, for stock id #2 and attribute #2 (bid price update), there was a new value of 24.43. BriefMaker converts this streaming data into time-based snapshots that I refer to as briefs, displayed on the right. The right side is much easier to read.
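To make the short event notation concrete, here is a minimal sketch that parses one such event. The StreamEvent type and its field names are hypothetical, chosen only for illustration; they are not taken from the BriefMaker source.

```csharp
using System;
using System.Globalization;

// Hypothetical type for illustration only; the field names are assumptions.
public struct StreamEvent
{
    public byte SubSecond;   // position within the second, in 1/256ths
    public int StockId;      // which symbol the event belongs to
    public int AttributeId;  // which attribute changed (2 = bid price in the example)
    public float Value;      // the new value for that attribute

    // Fraction of a second represented by SubSecond.
    public double SecondsOffset
    {
        get { return SubSecond / 256.0; }
    }

    // Parses the short form used in the article, e.g. "[78,2,2,24.43]".
    public static StreamEvent Parse(string text)
    {
        string[] parts = text.Trim().Trim('[', ']').Split(',');
        if (parts.Length != 4)
            throw new FormatException("Expected four comma-separated fields.");

        StreamEvent e = new StreamEvent();
        e.SubSecond = byte.Parse(parts[0], CultureInfo.InvariantCulture);
        e.StockId = int.Parse(parts[1], CultureInfo.InvariantCulture);
        e.AttributeId = int.Parse(parts[2], CultureInfo.InvariantCulture);
        e.Value = float.Parse(parts[3], CultureInfo.InvariantCulture);
        return e;
    }
}
```

For example, StreamEvent.Parse("[78,2,2,24.43]") yields an event 78/256 of a second into the current second for stock 2, attribute 2, with a value of 24.43.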
Why the name ‘Brief’? First off, Brief indicates a short version of something much longer. A brief is a fixed-sized summary of a potentially large set of events over a period of time. But in a way, a brief is not that brief either, because it captures so much more than the basic low/high/close/volume: it also captures indicators, statistics, counts, and more. I guess a more accurate name/definition could be “fixed-sized extended summary of market event data over a unit of time”.
There is a disadvantage to converting event data to time-based summaries (briefs). Usually, when summarizing data of any kind, there is some loss of data that cannot be restored. For example, the original real-time event stream cannot be reproduced from briefs – it is a one-way street. Briefs can always be re-created from event data, however, and for this reason I recommend holding onto the original data (StreamMoments). As shown below, briefs can be pretty detailed in how they describe the event data, and we might want to adjust the brief-making process and re-create all our briefs.
Briefs have many benefits over streaming quotes. They can be fit in a table, for example, with time on one axis and volume on the other. Briefs are easier to view and contain most of the important information on what happens to each symbol in each 6-second time frame. They contain data like the period-high, period-low, last-sale-price, last-bid, last-bid-volume, trade counts, and more.
Background
BriefMaker was built because I needed table-like data for my AI. I use something similar to a genetic algorithm to try to predict stocks. It basically reads in X number of rows and then does its best to predict the next line of data so it can do real-time trading. While not required, a table works well for this. However, what I had was almost random event data.
While event data could also be used in a table, it would be very compute intensive to run the algorithm on every event. The only way this would make sense is when using very low latency stuff. If I lived a mile from the exchange and milliseconds counted, I might have just used the event data but I am on the west coast with a basic internet connection.
For my data needs, I had two choices. I could either use existing history data or I could build it myself from streaming data. There are lots of choices online for stock history, but they lacked the high detail that I wanted. Most are basically a bid, ask, high, low, last, and volume with a 5-60 second resolution. I wanted more descriptive details on that data, so I thought I could generate my own with a live data feed. This was one of the reasons I built this program from scratch instead of using existing data.
One great concern I had was missing hidden data when doing the conversion. I wanted the AI algorithm to have access to as much information as possible. The AI was running in CUDA, and in CUDA the warp size is 32, so having 32 symbols and 32 attributes fit the data structure well. At first, it was difficult to fill 32 different attribute fields, but soon it was difficult to keep it under 32. Originally, I had more statistical fields like variance, standard deviation, and kurtosis, but I dropped these in favor of some other descriptive fields. Also, there just was not enough data in only 6 seconds of trades to fill some of these advanced data descriptors.
So for this whole AI predicting algorithm to work well, I needed a number of parts, and one was a way to convert the event data to highly-detailed, fixed-sized, time-interval data. After some playing around, BriefMaker was the result.
Features
Below are what I believe are some nice features of BriefMaker.
- Detailed Briefs - Captures 32 different aspects of each stock, including statistical and indicator-like data.
- Direct WCF connection - Enhances real-time data performance. It can be used with MarketRecorder or customized for use with other programs as well.
- Continue where left off - Re-runs StreamMoments from a little before where it last left off to make sure the data is more accurate. Also, if the program is closed while it is creating briefs and is then re-launched, it will continue where it left off.
- Protection against writing incomplete Briefs - Built-in startup protection against outputting incomplete briefs when an important value, such as the last price, has not been received yet. (See waitingForData in code)
- Out of range checking - Checks data to make sure values are within acceptable ranges such as last price should be between
Data Flow
The data flow for BriefMaker is pretty simple. It basically reads six StreamMoments records at a time from a table, then saves them to a briefs record. After all the StreamMoments are read and it is caught up, it can optionally wait for new StreamMoments via WCF. Take a look at the Interactive Brokers TWS MarketRecorder project to see an example of how to send data via WCF.
Stream-to-Brief Conversion - Capturing Almost Everything
Whenever we summarize the chaotic event data for a stock, or most anything else, we lose data in the conversion process. We are stepping away from what exactly happened in the market. One of the goals of this project was to lose as little information as possible from the original event data. I want the stock predicting algorithm to have access to as many informational fields as possible so that it can find undiscovered patterns in the market.
In BriefMaker, I tried to collect all kinds of detail on what happens every 6 seconds to a particular symbol. In some regards, a brief is not that brief. A brief contains the normal stuff like high, low, last, ask, and bid, but it also captures different kinds of volume information, statistical data (mean price, mode price, median price), indicator data (like MACD, SMA, Bollinger Bands), tick counts, sale counts, etc. The goal was to try to capture as much descriptive information about the stream as possible. In the past, I also had standard deviation and variance, but I gave these up for other kinds of data. Often, there is just not enough data in a 6-second increment for a given ticker symbol for these.
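As a rough illustration of the statistical fields, the sketch below computes a mean, median, and mode over the trade prices seen in one period. This is not BriefMaker's actual code; the PeriodStats class, its method names, and the tie-breaking rule for the mode (lowest price wins) are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the BriefMaker source.
public static class PeriodStats
{
    // Arithmetic mean of the trade prices in the period.
    public static float Mean(IEnumerable<float> prices)
    {
        return prices.Average();
    }

    // Middle value of the sorted prices (average of the two middle values for an even count).
    public static float Median(IEnumerable<float> prices)
    {
        float[] sorted = prices.OrderBy(p => p).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one price is required.");
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2f;
    }

    // Most frequent price and how many times it occurred.
    // Assumption: ties are broken by choosing the lowest price.
    public static float Mode(IEnumerable<float> prices, out int count)
    {
        IGrouping<float, float> best = prices
            .GroupBy(p => p)
            .OrderByDescending(g => g.Count())
            .ThenBy(g => g.Key)
            .First();
        count = best.Count();
        return best.Key;
    }
}
```

With only a handful of trades in a 6-second window, these three values are cheap to compute and still meaningful, which fits the reasoning above for keeping them over variance or standard deviation.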
Below are 32 different aspects captured for each symbol every six seconds. With so many different ways to view the data, we can describe pretty well what happens to that symbol in each 6-second time period. Every 6 seconds and for each stock, BriefMaker collects the following information:
ID | Format | Short Name | Init. Value | New Values | Description
0 | float | volume_day | (none) | always replace | total volume for the day
1 | float | volume_ths | 0 | sum | volume for this 6-sec period (calculated)
2 | float | largTrdPrc | price_last | conditional replace | price at largest volume trade
3 | float | price_high | price_last | conditional replace | highest price in period
4 | float | price_loww | price_last | conditional replace | lowest price in period
5 | float | price_last | price_last | always replace | last trade price
6 | float | price_bidd | price_bidd | always replace | last bid price
7 | float | price_askk | price_askk | always replace | last ask price
8 | float | volume_bid | volume_bid | always replace | last bid size
9 | float | volume_ask | volume_bid | always replace | last ask size
10 | float | price_medn | (none) | calculated | Statistics median for last price
11 | float | price_mean | (none) | calculated | Statistics mean for last price
12 | float | price_mode | (none) | calculated | Statistics mode for last price
13 | float | buyy_price | (none) | calculated | A prediction of the buy price.
14 | float | sell_price | (none) | calculated | A prediction of the sell price.
15 | float | largTrdVol | 0 | conditional replace | The size of largest trade.
16 | float | prcModeCnt | 0 | sum | Statistics mode price count
17 | float | vol_at_ask | 0 | always replace | Volume at ask price
18 | float | vol_no_chg | 0 | always replace | Volume with no last change in last size.
19 | float | vol_at_bid | 0 | always replace | Volume at bid price
20 | float | BidUpTicks | 0 | sum | How many times the bid went up.
21 | float | BidDnTicks | 0 | sum | How many times the bid went down.
22 | float | sale_count | 0 | sum | # of trades counted
23 | float | extIndex00 | overwritten | calculated | 6 sec. calculated ATR of last trades
24 | float | extIndex01 | overwritten | calculated | 6 sec. calculated CCI of last trades
25 | float | extIndex02 | overwritten | calculated | 6 sec. calculated EMA of last trades
26 | float | extIndex03 | overwritten | calculated | 6 sec. calculated Kama of last trades
27 | float | extIndex04 | overwritten | calculated | 6 sec. calculated RSI of last trades
28 | float | extIndex05 | overwritten | calculated | 6 sec. calculated SMA of last trades
29 | float | extIndex06 | overwritten | calculated | 6 sec. calculated SarExt of last trades
30 | float | extIndex07 | overwritten | calculated | 6 sec. calculated MACD of last trades
31 | float | extIndex08 | overwritten | calculated | 6 sec. calculated Bollinger bands of lasts
Format: For simplicity, all values are stored as type float. One drawback to floating point is that it holds only about 7.2 significant digits; however, this is usually beyond the precision of the data.
Initial Value: This is the value that each brief starts out with. Usually, they are reset to zero or carried over from the previous brief. Some values are specified as ‘overwritten’ because they are overwritten when finishing the brief so no initial value is needed.
New Values: This is the action for new stream events.
- Always Replace - will always replace the current value with a new value.
- Conditional Replace - will only replace a value if a condition is met. For example, price_high would only be replaced if the new value is a new high.
- Sum - will add the new value to the running total.
- Calculated – values are calculated on the completion of each brief. For example, this might be an array of sale prices that is fed into a formula.
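The four update actions can be sketched for a single symbol roughly as follows. This is a simplified illustration, not the project's implementation: the SymbolBriefSketch class and its members are hypothetical, only a few of the 32 fields are shown, and falling back to the last price when a period has no trades is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical accumulator for one symbol over one 6-second period.
public class SymbolBriefSketch
{
    public float PriceLast;
    public float PriceHigh;
    public float PriceLow;
    public float VolumeThisPeriod;   // initial value 0
    public float SaleCount;          // initial value 0
    public float PriceMean;          // filled in by Finish()

    private readonly List<float> _tradePrices = new List<float>();

    // Initial values: the price fields start from the previous brief's last price.
    public SymbolBriefSketch(float previousLastPrice)
    {
        PriceLast = previousLastPrice;
        PriceHigh = previousLastPrice;
        PriceLow = previousLastPrice;
    }

    // Applies one trade event to the running brief.
    public void OnTrade(float price, float size)
    {
        PriceLast = price;                          // always replace
        if (price > PriceHigh) PriceHigh = price;   // conditional replace
        if (price < PriceLow) PriceLow = price;     // conditional replace
        VolumeThisPeriod += size;                   // sum
        SaleCount += 1;                             // sum
        _tradePrices.Add(price);                    // kept for the calculated fields
    }

    // Called once when the 6-second period ends.
    public void Finish()
    {
        // calculated: assumption, fall back to the last price when there were no trades
        PriceMean = _tradePrices.Count > 0 ? _tradePrices.Average() : PriceLast;
    }
}
```

Keeping the raw trade prices until Finish() is what allows the calculated fields to be derived from the whole period rather than updated event by event.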
Extracting a Brief
To extract a brief for use in an application, use something like the below…
More details on the Brief structure can be found here.
// Requires: using System.IO;
BinaryReader reader = new BinaryReader(new MemoryStream(lastBrfImage));

// Header: 32 four-byte items (128 bytes). briefID is stored as an int.
int briefID = reader.ReadInt32();
float Day = reader.ReadSingle();
float Hour = reader.ReadSingle();
float Minute = reader.ReadSingle();
float Second = reader.ReadSingle();
float DayOfWeek = reader.ReadSingle();
float MinutesSinceOpen = reader.ReadSingle();
float SecondsSinceOpen = reader.ReadSingle();
float HoursSinceOpen = reader.ReadSingle();
float RemainingHours = reader.ReadSingle();
float RemainingMinutes = reader.ReadSingle();

// Market index values. Underscores replace the hyphens so the names are valid C# identifiers.
float TICK_NASD_4 = reader.ReadSingle();
float VOL_NASD_0 = reader.ReadSingle();
float VOL_NASD_1 = reader.ReadSingle();
float VOL_NASD_2 = reader.ReadSingle();
float AD_NASD_1 = reader.ReadSingle();
float AD_NASD_2 = reader.ReadSingle();
float TICK_NYSE_4 = reader.ReadSingle();
float VOL_NYSE_0 = reader.ReadSingle();
float VOL_NYSE_1 = reader.ReadSingle();
float VOL_NYSE_2 = reader.ReadSingle();
float AD_NYSE_0 = reader.ReadSingle();
float AD_NYSE_1 = reader.ReadSingle();
float AD_NYSE_2 = reader.ReadSingle();
float INDU_1 = reader.ReadSingle();
float INDU_2 = reader.ReadSingle();
float INDU_4 = reader.ReadSingle();

// Skip the unused header bytes (offsets 108-127) so the reader starts at the first ticker.
reader.BaseStream.Seek(128, SeekOrigin.Begin);

// Each symbol holds 32 floats, in the same order as the attribute table.
for (int s = 0; s < symbCt; s++)
{
    symbols[s].volume_day = reader.ReadSingle();
    symbols[s].volume_ths = reader.ReadSingle();
    symbols[s].largTrdPrc = reader.ReadSingle();
    symbols[s].price_high = reader.ReadSingle();
    symbols[s].price_loww = reader.ReadSingle();
    symbols[s].price_last = reader.ReadSingle();
    symbols[s].price_bidd = reader.ReadSingle();
    symbols[s].price_askk = reader.ReadSingle();
    symbols[s].volume_bid = reader.ReadSingle();
    symbols[s].volume_ask = reader.ReadSingle();
    symbols[s].price_medn = reader.ReadSingle();
    symbols[s].price_mean = reader.ReadSingle();
    symbols[s].price_mode = reader.ReadSingle();
    symbols[s].buyy_price = reader.ReadSingle();
    symbols[s].sell_price = reader.ReadSingle();
    symbols[s].largTrdVol = reader.ReadSingle();
    symbols[s].prcModeCnt = reader.ReadSingle();
    symbols[s].vol_at_ask = reader.ReadSingle();
    symbols[s].vol_no_chg = reader.ReadSingle();
    symbols[s].vol_at_bid = reader.ReadSingle();
    symbols[s].BidUpTicks = reader.ReadSingle();
    symbols[s].BidDnTicks = reader.ReadSingle();
    symbols[s].sale_count = reader.ReadSingle();
    symbols[s].extIndex00 = reader.ReadSingle();
    // ...
    symbols[s].extIndex08 = reader.ReadSingle();
}
Points of Interest
At the beginning of this project, I had some unnecessary complication. I needed a system that could receive data, prepare it, and upload it at almost the same time from many different threads. After some playing around, I thought about a coin. One side of the coin can be receiving StreamMoments while the other side is finishing off a brief and uploading it to the database. Every 6 seconds, the coin is flipped and some data is carried over. This out-of-the-box thinking made the program a lot easier to code, maintain, and understand. It also made threading much simpler.
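The coin idea is essentially a two-buffer swap. The sketch below shows one way it could look; the CoinSketch and CoinSide classes are hypothetical, the sides hold plain float values rather than real StreamMoments, and carrying over only the last value is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// One side of the coin: the values received so far plus a carried-over last value.
public class CoinSide
{
    public float Last;
    public readonly List<float> Values = new List<float>();
}

// Hypothetical two-sided buffer; not the BriefMaker implementation.
public class CoinSketch
{
    private CoinSide _receiving = new CoinSide();
    private CoinSide _finishing = new CoinSide();
    private readonly object _lock = new object();

    // Called by the receiving thread for each incoming value.
    public void Add(float value)
    {
        lock (_lock)
        {
            _receiving.Values.Add(value);
            _receiving.Last = value;
        }
    }

    // Called every 6 seconds. Swaps the sides, carries the last value over to
    // the new receiving side, and returns the side that is ready to be finished.
    public CoinSide Flip()
    {
        lock (_lock)
        {
            CoinSide done = _receiving;
            _receiving = _finishing;
            _receiving.Values.Clear();
            _receiving.Last = done.Last;   // data carried over between periods
            _finishing = done;
            return done;
        }
    }
}
```

The side returned by Flip() is only safe to read until the next flip, so in this sketch the finishing work (building and uploading the brief) has to complete within one period. The lock is held only for the swap itself, which keeps the receiving thread from waiting on the database.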
Reading and Storing Data
Reading StreamMoments Records from the Database
The program reads six one-second StreamMoment records to generate one brief. Since the bulk of the source data is stored in a database, when the program starts it does the following:
- Finds and loads the latest brief already written to the SQL database. The goal is to re-load the system state to where it last left off.
- Based on the latest brief, starts replaying StreamMoments from slightly before that moment. Again, the goal is to re-load the system state to where it last left off.
- After it replays those StreamMoments, continues on to process the new StreamMoments that do not have briefs yet.
- After it catches up by reading in all the StreamMoments (that is, it reaches the current moment), waits for new StreamMoments from either a new database record or directly from the WCF connection. The WCF method was added purely to lower the latency when working with real-time data.
Details on the StreamMoments format can be found here.
Writing the Finished Briefs to the Database
After BriefMaker has created a new brief from six StreamMoments, it writes it to the briefs table. Originally, I had a crazy number of columns (32 symbols x 32 attributes), but this was a performance/memory hog, so I changed it to record in byte image format. This is much more efficient, but the data is not as easy to use. With the data in separate columns, reporting was convenient, but it was very slow and used tons of memory.
In the briefs table, there are only two columns:
BriefID: Stored in TinyTime6Sec format. This is my own format, but it can easily be converted to DateTime by a simple cast. It is kind of like DateTime, but it fits in a 32-bit integer. The value for TinyTime6Sec is basically the number of 6-second increments between 8am-4pm M-F since 1/1/2010. This was a time format I created to (a) keep the date/time field small and (b) create a contiguous range without gaps that I could use for an ID. For example, ID 489394 would refer to some 6-second timeframe during market hours and 489395 would be the next 6-second timeframe.
One note is that TinyTime6Sec does not account for holidays. Just because there is a valid TinyTime6Sec ID does not mean the market was open that day. Weekends are skipped, however; there is no valid value for Saturday or Sunday.
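To show how such an ID could be computed, here is a minimal sketch built only from the description above: 4800 six-second slots per trading day (8 hours), counted over Monday-Friday dates starting at 1/1/2010. It is not the project's TinyTime6Sec class; the SixSecondId name, the zero-based starting point, and the exact rounding are all assumptions, so the values it produces may not match real BriefIDs.

```csharp
using System;

// Hypothetical re-creation of the idea behind TinyTime6Sec; not the actual class.
public static class SixSecondId
{
    private static readonly DateTime Epoch = new DateTime(2010, 1, 1); // a Friday
    private const int OpenHour = 8;                        // 8am local time
    private const int SlotsPerDay = 8 * 60 * 60 / 6;       // 4800 slots between 8am and 4pm

    private static bool IsWeekend(DateTime d)
    {
        return d.DayOfWeek == DayOfWeek.Saturday || d.DayOfWeek == DayOfWeek.Sunday;
    }

    // Converts a weekday time between 8am and 4pm into a contiguous ID.
    public static int FromDateTime(DateTime t)
    {
        if (t < Epoch || IsWeekend(t))
            throw new ArgumentException("Time must be a weekday on or after 1/1/2010.");
        double secondsSinceOpen = (t.TimeOfDay - TimeSpan.FromHours(OpenHour)).TotalSeconds;
        if (secondsSinceOpen < 0 || secondsSinceOpen >= SlotsPerDay * 6)
            throw new ArgumentException("Time must be between 8am and 4pm.");

        // Count the Monday-Friday dates from the epoch up to (not including) t's date.
        int weekdays = 0;
        for (DateTime d = Epoch; d < t.Date; d = d.AddDays(1))
            if (!IsWeekend(d)) weekdays++;

        return weekdays * SlotsPerDay + (int)(secondsSinceOpen / 6);
    }

    // Converts an ID back to the start of its 6-second slot.
    public static DateTime ToDateTime(int id)
    {
        if (id < 0) throw new ArgumentOutOfRangeException("id");
        int remainingWeekdays = id / SlotsPerDay;
        DateTime date = Epoch;
        while (remainingWeekdays > 0)
        {
            date = date.AddDays(1);
            if (!IsWeekend(date)) remainingWeekdays--;
        }
        return date.AddHours(OpenHour).AddSeconds(6 * (id % SlotsPerDay));
    }
}
```

In this sketch, the last slot on a Friday and the first slot on the following Monday receive consecutive IDs, which is the gap-free property that makes the value usable as a row ID. Holidays still receive IDs, as noted above.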
BriefBytes: This is where the brief is stored. As mentioned before, it is stored in byte image format for performance reasons. Each byte image is 4224 bytes: (32 header items + (32 stocks x 32 attributes)) * sizeof(float) = (32 + 1024) * 4 = 4224.
The byte layout is as follows:
Offset | Data Stored | Offset | Data Stored (cont.) |
0 | briefID(int) | 64 | VOL-NYSE(2).askPrice |
4 | Day | 68 | TICK-NYSE(3).lastPrice |
8 | Hour | 72 | VOL-NYSE(4).bidSize |
12 | Minute | 76 | VOL-NYSE(4).bidPrice |
16 | Second | 80 | VOL-NYSE(4).askPrice |
20 | DayOfWeek | 84 | AD-NYSE(5).bidSize |
24 | TotalMinutes | 88 | AD-NYSE(5).bidPrice |
28 | TotalSeconds | 92 | AD-NYSE(5).askPrice |
32 | HoursSinceOpened | 96 | DJIA(6).bidPrice |
36 | HoursRemaining | 100 | DJIA(6).askPrice |
40 | MinutesRemaining | 104 | DJIA(6).lastPrice |
44 | TICK-NASD(0).lastPrice | 108-127 | not used |
48 | VOL-NASD(1).bidSize | 128-255 | Ticker 1 (see table) |
52 | VOL-NASD(1).bidPrice | 256-383 | Ticker 2 (see table) |
56 | AD-NASD(1).askPrice | | … |
60 | VOL-NYSE(2).bidPrice | 4096-4223 | Ticker 32 |
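Given this layout, the position of any attribute can be computed directly instead of reading the whole image. The sketch below assumes a zero-based symbol index (Ticker 1 is index 0), uses the ID column of the attribute table as the attribute index, and assumes the image was written on a little-endian machine, which is what BitConverter expects on typical Windows hardware. The BriefImageSketch class is hypothetical.

```csharp
using System;

// Hypothetical helper for reading single values out of a brief byte image.
public static class BriefImageSketch
{
    public const int HeaderItems = 32;
    public const int SymbolCount = 32;
    public const int AttributeCount = 32;

    // (32 + 32 * 32) * 4 = 4224 bytes
    public const int ImageSize = (HeaderItems + SymbolCount * AttributeCount) * sizeof(float);

    // Byte offset of one attribute for one symbol.
    // symbolIndex: 0-31 (Ticker 1 = 0), attributeId: 0-31 (ID column of the attribute table).
    public static int OffsetOf(int symbolIndex, int attributeId)
    {
        if (symbolIndex < 0 || symbolIndex >= SymbolCount)
            throw new ArgumentOutOfRangeException("symbolIndex");
        if (attributeId < 0 || attributeId >= AttributeCount)
            throw new ArgumentOutOfRangeException("attributeId");
        return (HeaderItems + symbolIndex * AttributeCount + attributeId) * sizeof(float);
    }

    // Reads one float attribute from a complete brief image.
    public static float ReadAttribute(byte[] image, int symbolIndex, int attributeId)
    {
        if (image == null || image.Length != ImageSize)
            throw new ArgumentException("Expected a 4224-byte brief image.");
        return BitConverter.ToSingle(image, OffsetOf(symbolIndex, attributeId));
    }
}
```

For example, OffsetOf(0, 0) is 128 (the start of Ticker 1) and OffsetOf(31, 0) is 4096 (the start of Ticker 32), matching the offsets in the table.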
Viewing the Briefs
A small viewer program is included so briefs can easily be viewed. It would not be much fun to run BriefMaker, if there wasn't a way to view the output!
To view the briefs, launch the viewer application and use either the Stock Chart tab or the Brief Raw Data tab. Both tabs really show the same data but with different views - chart vs table. These are the only two tabs that are used here.
The Brief Raw Data view looks like this:
And the chart view...
The executable is included in the downloads at the top of this page and the source can be found here.
Limitations
Before downloading this project, you may want to review some limitations/annoyances. I wanted to share these with readers so they do not have to download the project and figure them out on their own. =)
- BriefMaker is currently mostly hard-coded to work with 32 symbols. To use more/less, some code will need to be modified. This would not be that difficult.
- BriefMaker stores each brief’s time in a proprietary TinyTime6Sec format. This is basically the number of 6-second intervals from M-F 8am-4pm since Jan 1st, 2010. TinyTime6Sec can easily be cast to a DateTime using the TinyTime6Sec class.
- No Level II market data
Wish List
Some items I would like to add in the future: (not sure when or if I will ever get to it though)
- Switch to the QLNet library (uses QuantLib) for the quantitative finance stuff. This is a more recent, up-to-date project than TA-Lib. The C# TA-Lib is a great library, but unfortunately it has not been updated since 2007.
- Get rid of the hard-coded “32 symbol” requirement.
- Add Level II data.
Setup Instructions
- Download the database, extract it, and using SQL Server manager, attach it with the name Focus.
- Download and extract either the code or the executables. If you download the code, you will need to build the project.
- Open the .config file and first look at the BriefsConnectionString. You might need to edit this connection string depending on your setup. Most often, "Data Source=.;Initial Catalog=Focus;Integrated Security=True" would be used for regular SQL Server, or "Data Source=.\SQLEXPRESS;Initial Catalog=Focus;Integrated Security=True" for SQL Server Express. Also review BeginRecordTime, EndRecordTime, and PreBeginBufferTime. These should be set to your local time for when the markets open/close.
- Now run BriefMaker. If there are errors, you can either review the log window or use Visual Studio's debugger.
- After BriefMaker finishes, launch Viewer.exe to view the output. Again, you might need to adjust the viewer's FocusConnectionString before starting the application. After opening it, use the "Stock Chart" and "Raw Brief Data" tabs to view the data. The other tabs are not used for BriefMaker.
Requirements
- .NET 4.5
- SQL Server or SQL Express (Free)
History
- 23rd March, 2016: Initial version
Ryan White is an IT Coordinator, currently living in Pleasanton, California.
He earned his B.S. in Computer Science at California State University East Bay in 2012. Ryan has been writing lines of code since the age of 7 and continues to enjoy programming in his free time.
You can contact Ryan at s u n s e t q u e s t -A-T- h o t m a i l DOT com if you have any questions he can help out with.